
Advanced Linear Algebra

Math 462 - Revised for Fall 2012 - Section 15490


California State University Northridge

This version was LaTeXed on November 3, 2012


This is a Rough Draft and Contains Errors!


This is an incomplete working document that will be revised many times over the semester.
Save a tree and just print what you need - there will be lots of corrections, additions, updates,
and changes, and probably pretty frequently!

Note to Students
Don't try to use this as a replacement for your textbook. These are notes and just provide
an outline of the subject material, not a complete presentation. I have provided a copy to
you to use only as a study aid. Their real purpose is to remind me what to talk about
during my class lectures. They are loosely based on the textbook Linear Algebra Done Right
by Sheldon Axler, and contain some material from other sources as well, but the presentation
in the textbook is more thorough. You should read the textbook in preparation for class, and
just use these notes to aid your own note-taking during class.

Some Legal Stuff


This document is provided in the hope that it will be useful but without any warranty,
without even the implied warranty of merchantability or fitness for a particular purpose.
The document is provided on an "as is" basis and the author has no obligation to provide
corrections or modifications. The author makes no claims as to the accuracy of this
document. In no event shall the author be liable to any party for direct, indirect, special,
incidental, or consequential damages, including lost profits, unsatisfactory class performance,
poor grades, confusion, misunderstanding, emotional disturbance or other general malaise
arising out of the use of this document or any software described herein, even if the author
has been advised of the possibility of such damage. This is not an official document. Any
opinions expressed herein are totally arbitrary, are only presented to expose the student to
diverse perspectives, and do not necessarily reflect the position of any specific individual, the
California State University, Northridge, or any other organization.

© 2012. This work is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-nc-nd/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Please report any errors to bruce.e.shapiro@csun.edu.


All feedback, comments, suggestions for improvement, etc., are appreciated, especially if you've used these notes for a
class, either at CSUN or elsewhere, from both instructors and students.

Page ii

Last revised: November 3, 2012

Contents
Please remember that this is a draft document, and so it is incomplete and buggy. More
sections will be added as the course progresses, the ordering of topics is subject to change,
and errors will be corrected as I become aware of them. This version was last LaTeXed
on November 3, 2012.

Front Cover  i
Table of Contents  iii
Symbols Used  v
1 Complex Numbers  1
2 Vectors in 3-Space  9
3 Matrices and Determinants  15
4 Eigenstuff  27
5 Inner Products and Norms  33
6 Similar Matrices  45
7 Previewing the SVD  47
8 Example: Metabolic Flux  51
9 Vector Spaces  59
10 Subspaces  67
11 Polynomials  73
12 Span and Linear Independence  81
13 Bases and Dimension  87
14 Linear Maps  95
15 Matrices of Linear Maps  103
16 Invertibility of Linear Maps  107
17 Operators and Eigenvalues  115
18 Matrices of Operators  121
19 The Canonical Diagonal Form  131
20 Invariant Subspaces  137
21 Inner Products and Norms  141
22 Fixed Points of Operators  147
23 Orthogonal Bases  159
24 Fourier Series  165
25 Triangular Decomposition  175
26 The Adjoint Map  181
27 The Spectral Theorem  189
28 Normal Operators  197
29 Positive Operators  213
30 Isometries  221
31 Singular Value Decomposition  227
32 Generalized Eigenvectors  235
33 The Characteristic Polynomial  241
34 The Jordan Form  247


A characteristic (typical) Linear Algebra teacher.


Symbols Used
(v1, …, vn)                List containing v1, …, vn
(v1, …, vn | p1, …, pm)    (v1, …) with (p1, …) removed
⟨v, w⟩                     inner product of two vectors
‖v‖                        norm of a vector
V ⊕ W                      Direct Sum of V and W
ℂ                          Field of Complex Numbers
deg(p)                     Degree of polynomial p
det(A)                     Determinant of A
diagonal(x1, …, xn)        Diagonal matrix
dim(V)                     Dimension of a Vector Space
F                          Either ℝ or ℂ
Fⁿ                         Set of tuples (x1, …, xn), xi ∈ F
F^∞                        Set of sequences over F
i                          √−1
length(B)                  Length of a list B
L(V)                       Set of linear operators T : V → V
L(V, W)                    Set of linear maps T : V → W
M(T, B)                    Matrix of a linear map T with respect to a basis B
P(F)                       Set of polynomials over F
Pm(F)                      Polynomials of degree at most m
ℝ                          Field of Real Numbers
span(v1, …, vn)            Span of a list of vectors
U, V, W                    Vector Spaces
V(F)                       Vector Space over F
z̄ or z*                    Complex conjugate of z (scalar or vector)
T*                         Adjoint of T


Working across the chalk board from right to left, an intrepid linear algebra
teacher enters a fractal dimensioned vector space in search of the elusive x, braving
Koch curves, PQ waves, and buggy lecture notes, demonstrating Bruce's law of
learning: No learning takes place after Thanksgiving.¹

¹ The American holiday of Thanksgiving (a national celebration of televised football)
is the fourth Thursday of November. It provides many college students with a 5-day weekend
(the day after Thanksgiving, Black Friday, is the national day of shopping, and pretty
much everyone cuts classes the day before). Final exams begin about a week or two
later. The first week of December thereby degenerates into an educational singularity as
instructors vainly try to make up for lost time.


Topic 1

Complex Numbers
Definition 1.1 Let a, b ∈ ℝ. Then a Complex Number is an ordered
pair
z = (a, b)    (1.1)
with the following properties, where w = (c, d) is a second complex number:
1. Complex Addition:
z + w = (a + c, b + d)    (1.2)
2. Complex Multiplication:
z · w = (ac − bd, ad + bc)    (1.3)
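Definition 1.1 transcribes directly into code. The sketch below represents a complex number as a Python tuple; the helper names `c_add` and `c_mul` are ours, not from the text.

```python
# Complex numbers as ordered pairs (Definition 1.1); c_add and c_mul
# implement equations (1.2) and (1.3) verbatim.
def c_add(z, w):
    (a, b), (c, d) = z, w
    return (a + c, b + d)

def c_mul(z, w):
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

z, w = (1, 2), (3, 4)
print(c_add(z, w))  # (4, 6)
print(c_mul(z, w))  # (-5, 10), matching (1 + 2i)(3 + 4i) = -5 + 10i
```

Compare with Python's built-in complex type: `(1 + 2j) * (3 + 4j)` also gives `-5 + 10j`.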

Definition 1.2 The set of all complex numbers is denoted by ℂ. We
will sometimes call this set the complex plane because we can think of the
point (a, b) ∈ ℂ as a point plotted with coordinates (a, b) in the Euclidean
plane.
Definition 1.3 The Real Part of a complex number z = (a, b) is defined
by
Re z = Re (a, b) = a    (1.4)
Definition 1.4 The Imaginary Part of a complex number z = (a, b) is
defined by
Im z = Im (a, b) = b.    (1.5)
Definition 1.5 The Real Axis is defined as the subset of ℂ
{z = (x, 0) | x ∈ ℝ}    (1.6)


Remark 1.6 There is a one-to-one correspondence between the real numbers ℝ
and the real axis. We will often use the terms interchangeably.
Thus if x ∈ ℝ then (x, 0) ∈ ℂ, and we will sometimes write things like
x = (x, 0)    (1.7)
although the expression on the left is a real number and the expression on
the right is a complex number. This works because the imaginary part of
the right-hand side is zero; hence we can use x and (x, 0) interchangeably.
Definition 1.7 The imaginary axis is the set of complex numbers
{z = (0, y) | y ∈ ℝ}    (1.8)
Remark 1.8 There is also a one-to-one correspondence between the real numbers
ℝ and the imaginary axis.
Multiplication of Complex Numbers. Equations 1.2 and 1.3 tell us
that it is possible to multiply a complex number by a real number in the
way we would expect. Let z = (a, b) ∈ ℂ where a, b ∈ ℝ, and let x ∈ ℝ.
Then we define scalar multiplication by
xz = x(a, b) = (xa, xb) ∈ ℂ    (1.9)
To see why this works, let u = (x, 0) be any point on the real axis. Then
by (1.3),
uz = (x, 0) · (a, b) = (ax − 0b, bx + 0a) = (ax, bx) = x(a, b) = xz    (1.10)
Definition 1.9 We use the symbol i to denote the complex number
i = (0, 1)

(1.11)

Square Roots of Negative Real Numbers. The motivation for equation 1.3 is the following. Since i = (0, 1) by (1.11), then by 1.3,
i² = (0, 1) · (0, 1)    (1.12)
= ((0)(0) − (1)(1), (0)(1) + (1)(0))    (1.13)
= (−1, 0)    (1.14)
= −1    (1.15)

Theorem 1.10 Using the notation defined above,
i = √−1    (1.16)

More Common Notation. Since i = (0, 1) we can write any complex
number z = (a, b) as
z = (a, b)    (1.17)
= (a, 0) + (0, b)    (1.18)
= a(1, 0) + b(0, 1)    (1.19)
= a + bi    (1.20)
where we have used the notation (1.7) to write 1 = (1, 0) and (1.11) to
write i = (0, 1).
Theorem 1.11 Let u = a + bi and v = c + di, where a, b, c, d ∈ ℝ. Then
the complex number uv can be computed using the normal rules of multiplication over ℝ supplemented by the equation i² = −1.
Proof. By equation (1.3)
uv = (a, b) · (c, d)    (1.21)
= (ac − bd, ad + bc)    (1.22)
= (ac − bd) + (ad + bc)i    (1.23)
But by the normal rules of real multiplication,
uv = (a + bi)(c + di)    (1.24)
= ac + adi + bci + bdi²    (1.25)
= (ac − bd) + (ad + bc)i    (1.26)

The proof of the following theorem follows immediately from the properties
of ℝ.
Theorem 1.12 Properties of Complex Numbers
1. Closure. The set ℂ is closed under addition and multiplication, i.e.,
whenever w, z ∈ ℂ it follows that w + z ∈ ℂ and wz ∈ ℂ.
2. Commutative Property.
w + z = z + w,  wz = zw,  ∀ w, z ∈ ℂ    (1.27)
3. Associative Property:
(u + v) + w = u + (v + w),  (uv)w = u(vw),  ∀ u, v, w ∈ ℂ    (1.28)
4. Additive and Multiplicative Identities. ∃ 0, 1 ∈ ℂ such that
z + 0 = 0 + z = z,  z1 = 1z = z,  ∀ z ∈ ℂ    (1.29)
5. Additive Inverse. ∀ z ∈ ℂ, ∃ a unique w ∈ ℂ such that z + w = 0, where
w = −z = (−1)(z):
z + (−z) = (−z) + z = 0    (1.30)
6. Multiplicative Inverse. ∀ z ∈ ℂ with z ≠ 0, ∃ a unique w ∈ ℂ such that
zw = wz = 1    (1.31)
We write w = z⁻¹ = 1/z so that
z(z⁻¹) = (z⁻¹)z = 1, or    (1.32)
z(1/z) = (1/z)z = 1    (1.33)
7. Distributive Property.
u(w + z) = uw + uz,  ∀ u, w, z ∈ ℂ    (1.34)

Definition 1.13 A Field is a set, together with two operations, that we
will call addition (or (+)) and multiplication (or (·)), that satisfy the
properties of closure; associativity and commutativity of both (+) and (·);
existence of identities and inverses for both (+) and (·); and distributivity
of (·) over (+).
Theorem 1.14 R is a field.
Theorem 1.15 C is a field.
Notation. Throughout this course we will use the notation F to represent
a general scalar field (we can take the term scalar to mean "not vector"
because, as we will see, there are such things as vector fields as well). We
will only be interested in two fields: ℝ and ℂ. You can think of F as
representing either ℝ or ℂ; i.e., whenever we make a statement about a
field F we mean that the statement refers to either ℝ or ℂ.
Definition 1.16 Let z = a + bi ∈ ℂ. Then the complex conjugate of z
is defined as
z̄ = z* = a − bi    (1.35)
Note that there are two different but equivalent notations that we will use
for the complex conjugate; you should become comfortable with both since
different texts use different notations and both are pretty standard.
Page 4

Last revised: November 3, 2012

TOPIC 1. COMPLEX NUMBERS

Math 462

Using the complex conjugate gives us a way to extend the way we factor
the difference of two squares. Recall from algebra that if a, b ∈ ℝ then
a² − b² = (a − b)(a + b)    (1.36)
We now observe that if z = a + bi then
zz̄ = (a + bi)(a − bi) = a² + b²    (1.37)

Definition 1.17 Let z = a + bi ∈ ℂ where a, b ∈ ℝ. Then the Absolute
Value of z is defined as the positive square root,
|z| = √(zz̄) = √(a² + b²)    (1.38)
Theorem 1.18 Properties of the Complex Conjugate
z + z̄ = 2 Re z,  ∀ z ∈ ℂ    (1.39)
z − z̄ = 2i Im z,  ∀ z ∈ ℂ    (1.40)
(z + w)* = z* + w*,  ∀ z, w ∈ ℂ    (1.41)
(zw)* = (z*)(w*),  ∀ z, w ∈ ℂ    (1.42)
(z*)* = z,  ∀ z ∈ ℂ    (1.43)
|wz| = |w||z|,  ∀ w, z ∈ ℂ    (1.44)
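These properties are easy to spot-check with Python's built-in complex type, where `z.conjugate()` plays the role of z*. A sketch for one particular pair z, w:

```python
import math

# Spot-check the conjugate properties of Theorem 1.18 for z = 3 + 4i,
# w = 1 - 2i; the last check uses isclose to allow floating-point error.
z, w = 3 + 4j, 1 - 2j
assert z + z.conjugate() == 2 * z.real                        # (1.39)
assert z - z.conjugate() == 2j * z.imag                       # (1.40)
assert (z + w).conjugate() == z.conjugate() + w.conjugate()   # (1.41)
assert (z * w).conjugate() == z.conjugate() * w.conjugate()   # (1.42)
assert z.conjugate().conjugate() == z                         # (1.43)
assert math.isclose(abs(w * z), abs(w) * abs(z))              # (1.44)
print("all six properties hold for this z, w")
```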

The Complex Plane Representation. We can represent the complex
number z = a + bi as a point (a, b) in the complex plane with x-coordinate
a and y-coordinate b. The phase or argument of z is defined to be the
angle θ between the line segment from (0, 0) to (a, b) and the x-axis (the real
axis).
Definition 1.19 Let z = a + bi. Then the phase of z is defined as
Ph(z) = tan⁻¹(b/a)    (1.45)
where the quadrant is determined by the location of the point (a, b) in the
complex plane.
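The quadrant bookkeeping in Definition 1.19 is exactly what the two-argument arctangent does. A sketch (the function name `phase` is ours; the standard library's `cmath.phase` behaves the same way on complex inputs):

```python
import math

# Phase of z = a + bi; atan2(b, a) picks the angle in the correct
# quadrant of the complex plane, unlike atan(b/a) alone.
def phase(a, b):
    return math.atan2(b, a)

print(phase(1, 1))    # pi/4
print(phase(-1, -1))  # -3*pi/4: third quadrant, not pi/4
```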

Since the distance between the origin and (a, b) is √(a² + b²) = |z| we have
Theorem 1.20 Euler's Formula
a + bi = |z|(cos θ + i sin θ) = |z|e^{iθ}    (1.46)
where θ = Ph(z).

Figure 1.1: Illustration of a complex number and its conjugate, showing the phase (θ) and absolute value (r) of both z and z̄. [Attribution: Wikimedia Commons, Creative Commons Attribution-Share Alike
3.0 Unported license, by Oleg Alexandrov, http://en.wikipedia.org/
wiki/File:Complex_conjugate_picture.svg.]

Proof. The second equality in equation 1.46 can be proven by expanding
sin θ, cos θ and e^{iθ} in Maclaurin series.
Corollary 1.21 Euler's Equation
e^{iπ} = −1    (1.47)
Proof. Set θ = π in (1.46).


Theorem 1.22 Every complex number z has a total of n distinct nth roots
given by
r_k = |z|^{1/n} ( cos((θ + 2πk)/n) + i sin((θ + 2πk)/n) ),  k = 0, 1, …, n − 1    (1.48)
Proof. Take the nth root of 1.46.
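Theorem 1.22 can be checked numerically. The helper name `nth_roots` is ours; it evaluates the polar-form expression for each k:

```python
import cmath
import math

# All n distinct nth roots of z, following equation (1.48):
# r_k = |z|^(1/n) * exp(i*(theta + 2*pi*k)/n), k = 0, ..., n-1.
def nth_roots(z, n):
    r, theta = abs(z), cmath.phase(z)
    return [r ** (1 / n) * cmath.exp(1j * (theta + 2 * math.pi * k) / n)
            for k in range(n)]

roots = nth_roots(8, 3)          # the three cube roots of 8
for rk in roots:
    print(rk, abs(rk ** 3 - 8))  # cubing each root recovers 8 (up to rounding)
```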

Example 1.1 Find √i.
From 1.46 we have
i = 0 + (1)i    (1.49)
= cos(π/2) + i sin(π/2)    (1.50)
= cos(π/2 + 2πk) + i sin(π/2 + 2πk)    (1.51)
= e^{i(π/2 + 2πk)}    (1.52)
where the last two lines hold for any integer value of k. Hence
√i = ( e^{i(π/2 + 2πk)} )^{1/2}    (1.53)
= e^{i(π/4 + πk)}    (1.54)
= cos(π/4 + πk) + i sin(π/4 + πk),  k = 0, 1    (1.55)
= (√2/2 + i√2/2),  −(√2/2 + i√2/2)    (1.56)
In fact, (1.55) is valid for all integer values of k, not just k = 0, 1; but it
will only give two unique results, as the result for k will be the same as
for k − 2, for all k, because adding any multiple of 2π does not change the
value of the trigonometric functions.


Topic 2

Vectors in 3-Space
Definition 2.1 A Euclidean 3-vector v is an object with a magnitude and
direction which we will denote by the ordered triple
v = (x, y, z)    (2.1)
The magnitude or absolute value or length of the vector is denoted ‖v‖
or |v|, and is defined to be the positive square root
v = |v| = √(x² + y² + z²)    (2.2)
This definition is motivated by the fact that v is the length of the line
segment from the origin to the point P = (x, y, z) in Euclidean 3-space.
A vector is sometimes represented geometrically by an arrow from the origin
to the point P = (x, y, z), and we will sometimes use the notation (x, y, z)
to refer either to the point P or the vector v from the origin to the point
P. Usually it will be clear from the context which we mean.
Definition 2.2 The set of all Euclidean 3-vectors is isomorphic to the Euclidean 3-space (which we typically refer to as R3 ).
If you are unfamiliar with the term isomorphic, don't worry about it; just
take it to mean "in one-to-one correspondence with," and that will be
sufficient for our purposes.
Definition 2.3 Let v = (x, y, z) and w = (x′, y′, z′) be Euclidean 3-vectors.
Then the angle between v and w is defined as the angle between the line
segments joining the origin and the points P = (x, y, z) and P′ = (x′, y′, z′).


We can define vector addition or vector subtraction by
v ± w = (x, y, z) ± (x′, y′, z′) = (x ± x′, y ± y′, z ± z′)    (2.3)
where v = (x, y, z) and w = (x′, y′, z′), and scalar multiplication (multiplication by a real number) by
kv = (kx, ky, kz)    (2.4)

Theorem 2.4 The set of all Euclidean vectors is closed under vector addition and scalar multiplication.
Definition 2.5 Let v = (x, y, z), w = (x′, y′, z′) be Euclidean 3-vectors.
Their dot product is defined as
v · w = xx′ + yy′ + zz′    (2.5)

Remark 2.6 In terms of matrix multiplication of real vectors (defined later
in this section) the dot product is equivalent to the inner product:
v · w = vᵀw    (2.6)
For complex vectors, we replace the transpose with the Hermitian Conjugate:
v · w = v*w    (2.7)
Alternative notations for the inner product are
v · w = ⟨v, w⟩    (2.8)
= ⟨v|w⟩  (more common in physics)    (2.9)
Note that for complex vectors the dot product does not in general commute.
Theorem 2.7 Let θ be the angle between the line segments from the origin
to the points (x, y, z) and (x′, y′, z′) in Euclidean 3-space. Then
v · w = |v||w| cos θ    (2.10)
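Theorem 2.7 gives a practical way to compute the angle between two vectors: solve (2.10) for θ. A sketch (the helper names `dot`, `norm`, and `angle` are ours):

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    return math.sqrt(dot(v, v))

# Solve (2.10) for the angle: theta = arccos(v.w / (|v||w|)).
def angle(v, w):
    return math.acos(dot(v, w) / (norm(v) * norm(w)))

print(angle((1, 0, 0), (0, 1, 0)))  # pi/2: the coordinate axes are perpendicular
print(angle((1, 1, 0), (1, 0, 0)))  # pi/4
```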

Definition 2.8 Let v = (x, y, z) and w = (x′, y′, z′) be Euclidean 3-vectors.
Their cross product is

        | i   j   k  |
v × w = | x   y   z  | = (yz′ − y′z)i − (xz′ − x′z)j + (xy′ − x′y)k    (2.11)
        | x′  y′  z′ |

where the determinant is defined below in definition 3.13.

Theorem 2.9 Let v = (x, y, z) and w = (x′, y′, z′) be Euclidean 3-vectors,
and let θ be the angle between them. Then
|v × w| = |v||w| sin θ    (2.12)
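Equation (2.11) transcribes directly into code; the helper name `cross` is ours:

```python
# Cross product per (2.11), expanding the symbolic determinant
# along its first row.
def cross(v, w):
    x, y, z = v
    xp, yp, zp = w
    return (y * zp - yp * z,
            -(x * zp - xp * z),
            x * yp - xp * y)

print(cross((1, 0, 0), (0, 1, 0)))  # (0, 0, 1), i.e. i x j = k
print(cross((0, 1, 0), (1, 0, 0)))  # (0, 0, -1): the cross product anticommutes
```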

Definition 2.10 The standard basis vectors for Euclidean 3-space are
the vectors
i = (1, 0, 0)    (2.13)
j = (0, 1, 0)    (2.14)
k = (0, 0, 1)    (2.15)

Theorem 2.11 Let v = (x, y, z) be any Euclidean 3-vector. Then


v = ix + jy + kz

(2.16)

Definition 2.12 A linear combination of a set of m-dimensional vectors v1, v2, …, vn is a vector
v = a1v1 + a2v2 + ⋯ + anvn    (2.17)
for some collection of numbers a1, …, an.


Definition 2.13 The vectors v1, v2, …, vn are said to be linearly dependent if there exist numbers a1, a2, …, an, not all zero, such that
a1v1 + a2v2 + ⋯ + anvn = 0    (2.18)
i.e., there is some non-trivial linear combination of the vectors that sums
to the zero vector.
If no such numbers exist the vectors are said to be linearly independent,
i.e., no non-trivial linear combination of the vectors sums to the zero vector.
Definition 2.14 The span of a set of m-dimensional vectors v1, v2, …, vn
is the subset of ℝᵐ formed by all possible linear combinations of the vi:
span(v1, v2, …, vn) = { v | v = Σ_{i=1}^n aᵢvᵢ,  a1, a2, … ∈ F }    (2.19)
where F is either ℝ or ℂ, depending on the domain of the problem.


Definition 2.15 Let S = span(v1, v2, …, vn) ⊆ Fᵐ. Then the dimension of S is the smallest integer k such that for some set of vectors e1, …, ek ∈ S, every element s ∈ S can be written as a linear combination of the ei:
s = c1e1 + c2e2 + ⋯ + ckek    (2.20)
The vectors e1, …, ek are said to form a basis for S.



Example 2.1 Let S = {a, b, c}, where
a = (1, √2, 1)    (2.21)
b = (10, 0, −10)    (2.22)
c = (24 + √2, 2, √2 − 24)    (2.23)
Then a, b, c ∈ ℝ³ (m = 3).
If we define
e1 = (1/2, 1/√2, 1/2)    (2.24)
e2 = (1/√2, 0, −1/√2)    (2.25)
Then
a = 2e1    (2.26)
b = 10√2 e2    (2.27)
c = 2√2 e1 + 24√2 e2    (2.28)
Since every element of S can be written as a linear combination of {e1, e2},
the dimension of S is 2.
Gram-Schmidt Orthogonalization Process. We can find a basis of a
set of vectors S = {v1, v2, …} as follows. Let T = S and
1. Pick any vector v1 ∈ S. Then let
e1 = v1 / |v1|    (2.29)
then let T′ = T − {v1} and rename T′ to T.
2. Pick any vector v2 ∈ T. Let T′ = T − {v2} and rename T′ to T.
Define
f = v2 − (e1 · v2)e1    (2.30)
If f = 0, pick another vector v from T, let T′ = T − {v}, and rename
T′ to T. Recompute f and continue.
If f ≠ 0, then
e2 = f / |f|    (2.31)
3. Continue picking vectors from T. Define
f = vj − (e1 · vj)e1 − (e2 · vj)e2 − ⋯ − (e_{j−1} · vj)e_{j−1}    (2.32)
If f = 0 then pick another vector from T, and remove it as before. If
you run out of vectors in T you are done. Otherwise,
ej = f / |f|    (2.33)
and continue.
Example 2.2 Calculate a basis for the set S in the previous example.
We start by calculating
|a| = √(1 + 2 + 1) = 2    (2.34)
Hence
e1 = a/|a|    (2.35)
= (1/2)(1, √2, 1)    (2.36)
= (1/2, 1/√2, 1/2)    (2.37)
Next,
f = b − (e1 · b)e1    (2.38)
= (10, 0, −10) − ((10, 0, −10) · (1/2, 1/√2, 1/2)) (1/2, 1/√2, 1/2)    (2.39)
= (10, 0, −10) − 0 · (1/2, 1/√2, 1/2)    (2.40)
= (10, 0, −10)    (2.41)
|f| = √200 = 10√2    (2.42)
e2 = f/|f| = (1/(10√2))(10, 0, −10) = (1/√2, 0, −1/√2)    (2.43)


Finally, there is only one vector left, c, so we compute
f = c − (e1 · c)e1 − (e2 · c)e2    (2.44)
= (24 + √2, 2, √2 − 24)
  − ((24 + √2, 2, √2 − 24) · (1/2, 1/√2, 1/2)) (1/2, 1/√2, 1/2)
  − ((24 + √2, 2, √2 − 24) · (1/√2, 0, −1/√2)) (1/√2, 0, −1/√2)    (2.45)
= (24 + √2, 2, √2 − 24) − 2√2 (1/2, 1/√2, 1/2) − 24√2 (1/√2, 0, −1/√2)    (2.46)
= (24 + √2, 2, √2 − 24) − (√2, 2, √2) − (24, 0, −24)    (2.47)
= (0, 0, 0)    (2.48)
Since f = 0 and there are no more vectors we are done. The basis is {e1, e2}.
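The Gram-Schmidt process above is short to implement. The sketch below (the helper name `gram_schmidt` is ours) runs it on the vectors of Example 2.1, with floating-point arithmetic standing in for the exact radicals in the text:

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

# Gram-Schmidt as described above: project out the directions already
# found; keep the remainder only if it is (numerically) nonzero.
def gram_schmidt(vectors, tol=1e-10):
    basis = []
    for v in vectors:
        f = list(v)
        for e in basis:
            c = dot(e, v)
            f = [fi - c * ei for fi, ei in zip(f, e)]
        n = math.sqrt(dot(f, f))
        if n > tol:  # f = 0 means v depends on the earlier vectors
            basis.append([fi / n for fi in f])
    return basis

r2 = math.sqrt(2)
S = [(1, r2, 1), (10, 0, -10), (24 + r2, 2, r2 - 24)]
basis = gram_schmidt(S)
print(len(basis))  # 2: c is a combination of the first two directions
```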


Topic 3

Matrices and
Determinants
Definition 3.1 An m × n (or m by n) matrix A is a rectangular array of
numbers with m rows and n columns. We will denote the number in the ith
row and jth column as aij:

        [ a11  a12  ⋯  a1n ]
        [ a21  a22  ⋯  a2n ]
    A = [  ⋮    ⋮   ⋱   ⋮  ]    (3.1)
        [ am1  am2  ⋯  amn ]

We will sometimes denote the matrix A by [aij].


Definition 3.2 The transpose of the matrix A is the matrix Aᵀ obtained
by interchanging the row and column indices,
(Aᵀ)ij = aji    (3.2)
or
[aij]ᵀ = [aji]    (3.3)
Remark: The transpose of an m × n matrix is an n × m matrix.
Definition 3.3 The Hermitian Conjugate or Conjugate Transpose
or Hermitian Adjoint Matrix or just adjoint matrix, denoted by Aᴴ
or A*, of a matrix A is the transpose of the complex conjugate of the matrix,
[aᴴij] = [āji]    (3.4)

Example 3.1 Let

A = [ 5 + 7i   3 − 2i ]
    [   i        17   ]    (3.5)

Then

A* = [ 5 − 7i    −i ]
     [ 3 + 2i    17 ]    (3.6)
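In code, with a matrix stored as a list of rows, the Hermitian conjugate is one comprehension; the helper name `conj_transpose` is ours:

```python
# Hermitian conjugate per Definition 3.3: conjugate every entry,
# then swap the row and column indices.
def conj_transpose(A):
    m, n = len(A), len(A[0])
    return [[A[i][j].conjugate() for i in range(m)] for j in range(n)]

A = [[5 + 7j, 3 - 2j],
     [1j, 17 + 0j]]
AH = conj_transpose(A)
print(AH[0])  # first row of A*, matching (3.6)
print(AH[1])  # second row of A*
```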

Theorem 3.4 (Conjugate) Transpose of a Product:
(AB)* = B*A*    (3.7)

Definition 3.5 The inner product between two complex vectors is defined as
v*w    (3.8)
This reduces to the ordinary dot product when the vectors are real. When
the inner product is complex we may use it to define a complex cosine,
where
cos θ = v*w / (|v||w|)    (3.9)
Definition 3.6 A matrix A is called Hermitian or Self-Adjoint if
Aᴴ = A    (3.10)
Example 3.2 The matrix

[   5      6 + 7i ]
[ 6 − 7i    17    ]    (3.11)

is self-adjoint.
Remark 3.7 Do not confuse the adjoint matrix with the Classical adjoint defined below in definition 3.39, as they are not related.
Theorem 3.8 The diagonal entries of a self-adjoint matrix are real.
Column and Row Vectors. We will sometimes represent the vector
v = (x, y, z) by its 3 × 1 column-vector representation

    [ x ]
v = [ y ]    (3.12)
    [ z ]

or its 1 × 3 row-vector representation
vᵀ = [ x  y  z ]    (3.13)

Definition 3.9 Matrix Addition is defined between two matrices of the
same size, by adding corresponding elements:

[ a11  a12 ]   [ b11  b12 ]   [ a11 + b11  a12 + b12 ]
[ a21  a22 ] + [ b21  b22 ] = [ a21 + b21  a22 + b22 ]    (3.14)
[  ⋮    ⋮  ]   [  ⋮    ⋮  ]   [     ⋮          ⋮     ]

Matrices that have different sizes cannot be added.
Definition 3.10 A square matrix is any matrix with the same number
of rows as columns. The order of the square matrix is the number of rows
(or columns).
Definition 3.11 The column (row) rank of a matrix is the dimension
of its column (row) space.
1. Row rank = column rank (follows from the SVD, to be discussed
later)
2. For a square matrix, rank ≤ order
3. A matrix is said to be of full rank if it has the maximum possible
rank, i.e., min(m, n) for an m × n matrix.
Definition 3.12 Let A be a matrix. A submatrix of A is the
matrix A with one (or more) rows and/or one (or more) columns deleted.
Example 3.3 Let

    [ 1   2   3   4 ]
A = [ 5   6   7   8 ]    (3.15)
    [ 9  10  11  12 ]

Then the submatrix formed by rows 1 and 2 and columns 1, 3 and 4 is
given by

[ 1  3  4 ]
[ 5  7  8 ]    (3.16)
Definition 3.13 The determinant of a square matrix A is defined as
follows. Let n be the order of A. Then
1. If n = 1 then A = [a] and det A = a.
2. If n ≥ 2 then
det A = Σ_{i=1}^n aki (−1)^{i+k} det(A′ik)    (3.17)
for any k = 1, …, n, where by A′ik we mean the submatrix of A with the
ith row and kth column deleted. (The choice of which k does not matter
because the result will be the same.)


We denote the determinant by the notation

        | a11  a12  ⋯ |
det A = | a21  a22  ⋯ |    (3.18)
        |  ⋮    ⋮     |

In particular,

| a  b |
| c  d | = ad − bc    (3.19)
and

| A  B  C |
| D  E  F | = A | E  F | − B | D  F | + C | D  E |    (3.20)
| G  H  I |     | H  I |     | G  I |     | G  H |

Definition 3.14 Let A = [aij] be any square matrix of order n. Then the
cofactor of aij, denoted by cof aij, is (−1)^{i+j} det Mij where Mij is
the submatrix of A with row i and column j removed.
Example 3.4 Let

    [ 1  2  3 ]
A = [ 4  5  6 ]    (3.21)
    [ 7  8  9 ]

Then

cof(a12) = (−1)^{1+2} | 4  6 | = (−1)(36 − 42) = 6    (3.22)
                      | 7  9 |

The determinant can be expanded along any row or column:
Remark 3.15 Expansion of Determinant using Cofactor Notation:
det A = Σ_{j=1}^n aij cof(aij)  (Expansion along row i)    (3.23)
det A = Σ_{i=1}^n aij cof(aij)  (Expansion along column j)    (3.24)


Determinant via Permutations on (1, 2, …, n)
A permutation of the sequence (1, 2, …, n) is a rearrangement. For example, the possible permutations of the sequence (1, 2, 3) are
(1, 2, 3)  (3, 1, 2)  (2, 3, 1)
(1, 3, 2)  (2, 1, 3)  (3, 2, 1)    (3.25)

Definition 3.16 Let σ = (j1, j2, …, jn) be any permutation (reordering)
of (1, 2, …, n). Then the sign of the permutation is defined by
s(j1, …, jn) = Π_{1 ≤ p < q ≤ n} sign(jq − jp)    (3.26)
For example, for n = 2:
s(j1, j2) = Π_{1 ≤ p < q ≤ 2} sign(jq − jp) = sign(j2 − j1)    (3.27)
⟹ s(1, 2) = sign(2 − 1) = 1    (3.28)
s(2, 1) = sign(1 − 2) = −1    (3.29)

For n = 3,
s(j1, j2, j3) = Π_{1 ≤ p < q ≤ 3} sign(jq − jp)    (3.30)
= sign(j2 − j1) sign(j3 − j1) sign(j3 − j2)    (3.31)
There are six possible permutations, from (3.25):
s(1, 2, 3) = sign(2 − 1) sign(3 − 1) sign(3 − 2) = (+)(+)(+) = 1    (3.32)
s(3, 1, 2) = sign(1 − 3) sign(2 − 3) sign(2 − 1) = (−)(−)(+) = 1    (3.33)
s(2, 3, 1) = sign(3 − 2) sign(1 − 2) sign(1 − 3) = (+)(−)(−) = 1    (3.34)
s(1, 3, 2) = sign(3 − 1) sign(2 − 1) sign(2 − 3) = (+)(+)(−) = −1    (3.35)
s(2, 1, 3) = sign(1 − 2) sign(3 − 2) sign(3 − 1) = (−)(+)(+) = −1    (3.36)
s(3, 2, 1) = sign(2 − 3) sign(1 − 3) sign(1 − 2) = (−)(−)(−) = −1    (3.37)
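Definition 3.16 is a double loop over index pairs; the helper name `perm_sign` is ours:

```python
# Sign of a permutation: the product of sign(j_q - j_p) over all
# pairs p < q, exactly as in equation (3.26).
def perm_sign(js):
    s = 1
    for p in range(len(js)):
        for q in range(p + 1, len(js)):
            s *= 1 if js[q] > js[p] else -1
    return s

print([perm_sign(p) for p in [(1, 2, 3), (3, 1, 2), (2, 3, 1)]])  # [1, 1, 1]
print([perm_sign(p) for p in [(1, 3, 2), (2, 1, 3), (3, 2, 1)]])  # [-1, -1, -1]
```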

Then we can rewrite the definition of a determinant as:
det(A) = Σ_{j1,…,jn} s(j1, …, jn) a_{1j1} a_{2j2} ⋯ a_{njn}    (3.38)
= Σ_{j1,…,jn} s(j1, …, jn) Π_{i=1}^n a_{i,ji}    (3.39)


A 2 × 2 determinant is:
det(A) = s(1, 2)a11a22 + s(2, 1)a12a21    (3.40)
= a11a22 − a12a21    (3.41)
A 3 × 3 determinant is:
det A = s(1, 2, 3)a11a22a33 + s(3, 1, 2)a13a21a32 +
s(2, 3, 1)a12a23a31 + s(1, 3, 2)a11a23a32 +
s(2, 1, 3)a12a21a33 + s(3, 2, 1)a13a22a31    (3.42)
= a11a22a33 + a13a21a32 + a12a23a31
− a11a23a32 − a12a21a33 − a13a22a31    (3.43)
and so forth.
Remark 3.17 Properties of Permutations
1. If two numbers in a permutation (j1, …, jn) are interchanged, the
sign is reversed.
2. Suppose the permutation (j1, …, jn) can be formed from (1, 2, …, n)
by k successive interchanges. Then s(j1, …, jn) = (−1)^k.
Example 3.5
s(5, 1, 3, 2, 4) = −s(5, 4, 3, 2, 1)    (3.44)

Example 3.6 Find s(4, 3, 2, 1).
Starting from (1, 2, 3, 4) we interchange pairs of numbers:
(1, 2, 3, 4) → (4, 2, 3, 1)  (interchange 1, 4)    (3.45)
→ (4, 3, 2, 1)  (interchange 2, 3)    (3.46)
Two interchanges were required, hence
s(4, 3, 2, 1) = (−1)² = 1    (3.47)

Theorem 3.18 det(Aᵀ) = det(A)
Proof. Let B = Aᵀ, so that bij = aji. Then
det Aᵀ = det B = Σ_{(j)} s(j) b_{1j1} b_{2j2} ⋯ b_{njn}    (3.48)
= Σ_{(j)} s(j) a_{j1 1} a_{j2 2} ⋯ a_{jn n}    (3.49)
But by rearrangement and relabeling
a_{j1 1} a_{j2 2} ⋯ a_{jn n} = a_{1i1} a_{2i2} ⋯ a_{nin}    (3.50)
because each of j1, j2, …, jn occurs precisely once.
Thus as (j) goes through all n! permutations, so does (i).
Thus if (3.50) requires k interchanges to get from (j1, …, jn) to (1, …, n)
then so does (i1, …, in).
This means that
s(j) = (−1)^k = s(i)    (3.51)
hence
det Aᵀ = Σ_{(j)} s(j) a_{j1 1} a_{j2 2} ⋯ a_{jn n}    (3.52)
= Σ_{(i)} s(i) a_{1i1} a_{2i2} ⋯ a_{nin}    (3.53)
= det A    (3.54)

Theorem 3.19 If two rows (columns) of a square matrix are interchanged,
the sign of the determinant is reversed.
Proof. (See Franklin 1.3.4)
Corollary 3.20 If two rows (columns) of a square matrix are identical, the
determinant is zero.
Proof. Form A′ by exchanging any two rows (columns) of A. By the previous
theorem,
det A′ = −det A    (3.55)
But since the rows are identical, A′ = A, hence
det A′ = det A    (3.56)
Since det A = −det A, we conclude that det A = 0.


Definition 3.21 A square matrix A is said to be singular if det A = 0,
and non-singular if det A ≠ 0.
Last revised: November 3, 2012

Page 21

Math 462

TOPIC 3. MATRICES AND DETERMINANTS

Definition 3.22 Matrix Times a Vector. Let A = [aij] be an m × r
matrix and let b = [bj] be a vector of length r. Then the product Ab is
defined by
[Ab]ᵢ = Σ_{j=1}^r aij bj = rowᵢ(A) · b    (3.57)

Theorem 3.23 Cramer's Rule. Let A be nonsingular. Then the solution
of a linear system
Ax = b    (3.58)
is x = (x1, x2, …)ᵀ, where
xᵢ = det Aᵢ / det A    (3.59)
where Aᵢ is matrix A with the ith column replaced by b.


Proof. The expression Ax = b is equivalent to
Σ_{i=1}^n a1i xi = a11x1 + a12x2 + ⋯ + a1nxn = b1    (3.60)
Σ_{i=1}^n a2i xi = a21x1 + a22x2 + ⋯ + a2nxn = b2    (3.61)
⋮
Σ_{i=1}^n ani xi = an1x1 + an2x2 + ⋯ + annxn = bn    (3.62)
or more concisely:
Σ_{i=1}^n aki xi = bk,  k = 1, 2, …, n    (3.63)

Pick any j in 1, …, n and consider the product formed by multiplying
cof(akj) by (3.63),
cof(akj) Σ_{i=1}^n aki xi = cof(akj) bk    (3.64)
Summing over all k gives
Σ_{k=1}^n cof(akj) Σ_{i=1}^n aki xi = Σ_{k=1}^n cof(akj) bk    (3.65)
or
Σ_{k=1}^n cof(akj) bk = Σ_{k=1}^n Σ_{i=1}^n cof(akj) aki xi    (3.66)
= Σ_{k=1}^n cof(akj) akj xj + Σ_{i=1, i≠j}^n Σ_{k=1}^n cof(akj) aki xi    (3.67)
= xj Σ_{k=1}^n cof(akj) akj + Σ_{i=1, i≠j}^n xi Σ_{k=1}^n cof(akj) aki    (3.68)
The sum on the left is det Aj. The first term on the right is xj det A. The
internal sum of the second term,
Σ_{k=1}^n cof(akj) aki,  i ≠ j    (3.69)
is the determinant of a matrix where columns i and j are identical, hence
the sum is zero. This leaves us with
det Aj = xj det A    (3.70)
which is identical to (3.59).
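Theorem 3.23 can be sketched directly in code. The helper names `det_` and `cramer` are ours; `det_` is a cofactor-expansion determinant as in Definition 3.13:

```python
# Cramer's rule: x_i = det(A_i) / det(A), where A_i is A with
# column i replaced by b.
def det_(A):
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det_([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def cramer(A, b):
    d = det_(A)  # nonzero, since A is assumed nonsingular
    return [det_([row[:i] + [bi] + row[i + 1:] for row, bi in zip(A, b)]) / d
            for i in range(len(A))]

A = [[2, 1], [1, 3]]
b = [3, 5]
print(cramer(A, b))  # [0.8, 1.4]: check 2(0.8) + 1.4 = 3 and 0.8 + 3(1.4) = 5
```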


Theorem 3.24 The n columns (or rows) of an n × n square matrix A are
linearly independent if and only if det A ≠ 0.
Definition 3.25 The range of a matrix A is defined as
range(A) = { v | v = Ax for some vector x }    (3.71)
Theorem 3.26 range(A) is the space spanned by the columns of A.
Proof. (Proof that range(A) ⊆ the column space of A):
Let y ∈ range(A). Then there is some vector x such that
y = Ax    (3.72)
Since
yᵢ = Σ_{j=1}^n aij xj    (3.73)
Denote (aj) as the jth column of A. Then
yᵢ = Σ_{j=1}^n (aj)ᵢ xj    (3.74)
Hence
y = Σ_{j=1}^n (aj) xj = x1(a1) + x2(a2) + ⋯    (3.75)
Hence y is a linear combination of the columns of A.
(Proof that the column space of A ⊆ range(A)):
Let y ∈ the column space of A. Then for some numbers xj,
y = Σ_{j=1}^n xj(aj) = Ax    (3.76)
Hence y ∈ range(A) and therefore the column space of A ⊆ range(A).
Remark 3.27 The terms range and column space are used interchangeably, since we have shown that they are equivalent.
Definition 3.28 The nullspace of a matrix A is the set of all vectors v
such that Av = 0. If v ∈ nullspace(A) then by (3.75)
0 = Σ_{j=1}^n (aj) xj = x1(a1) + x2(a2) + ⋯    (3.77)

Definition 3.29 Matrix Multiplication. Let A = [aij] be an m × r
matrix and let B = [bij] be an r × n matrix. Then the matrix product is
defined by
[AB]ij = Σ_{k=1}^r aik bkj = rowᵢ(A) · columnⱼ(B)    (3.78)
i.e., the ijth element of the product is the dot product between the ith row
of A and the jth column of B.
Example 3.7
\[
\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}
\begin{pmatrix} 8 & 9 \\ 10 & 11 \\ 12 & 13 \end{pmatrix}
= \begin{pmatrix} (1,2,3)\cdot(8,10,12) & (1,2,3)\cdot(9,11,13) \\ (4,5,6)\cdot(8,10,12) & (4,5,6)\cdot(9,11,13) \end{pmatrix} \tag{3.79}
\]
\[
= \begin{pmatrix} 64 & 70 \\ 154 & 169 \end{pmatrix} \tag{3.80}
\]
Note that the product of an $[n \times r]$ matrix and an $[r \times m]$ matrix is always an $[n \times m]$ matrix. The product of an $[n \times r]$ matrix and an $[s \times n]$ matrix is undefined unless $r = s$.
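The product in Example 3.7 can be verified numerically. This is a sketch assuming NumPy is available (it is not part of the notes); note that the $(2,1)$ entry is $4\cdot 8 + 5\cdot 10 + 6\cdot 12 = 154$.

```python
import numpy as np

# The two matrices from Example 3.7.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[8, 9],
              [10, 11],
              [12, 13]])

# Each entry of the product is the dot product of a row of A
# with a column of B (Definition 3.29).
C = A @ B
print(C)  # 2x2 result of a (2x3)(3x2) product
```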

Theorem 3.30 If $A$ and $B$ are both $n \times n$ square matrices then
\[ \det AB = (\det A)(\det B) \tag{3.81} \]

Proof. (See Franklin 1.7.1)
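Theorem 3.30 is easy to spot-check numerically; this sketch assumes NumPy (the random matrices are illustrative, not from the text).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# Theorem 3.30: the determinant of a product is the
# product of the determinants.
lhs = np.linalg.det(A @ B)
rhs = np.linalg.det(A) * np.linalg.det(B)
```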


Definition 3.31 The main diagonal of a square matrix $A$ is the list $(a_{11}, a_{22}, \dots, a_{nn})$.

Definition 3.32 A diagonal matrix is a square matrix that has all of its nonzero entries on the main diagonal. A diagonal matrix is sometimes denoted by
\[ \operatorname{diag}(a_1, a_2, \dots, a_n) \tag{3.82} \]

Remark 3.33 The determinant of a diagonal matrix is the product of the elements on the diagonal.

Definition 3.34 Identity Matrix. The $n \times n$ matrix $I$ is defined as the diagonal matrix consisting of only 1s,
\[ I = \operatorname{diag}(1, 1, \dots, 1) \tag{3.83} \]
Theorem 3.35 $I$ is the identity under matrix multiplication: if $A$ is any $n \times n$ matrix and $I$ the $n \times n$ identity matrix, then
\[ AI = IA = A \tag{3.84} \]

Definition 3.36 A square matrix $A$ is said to be invertible if there exists a matrix $A^{-1}$, called the inverse of $A$, such that
\[ AA^{-1} = A^{-1}A = I \tag{3.85} \]

Theorem 3.37 A square matrix is invertible if and only if it is nonsingular, i.e., $\det A \neq 0$.

Proof. From (3.85),
\[ \det(AA^{-1}) = (\det A)(\det A^{-1}) = \det I = 1 \tag{3.86} \]
Hence
\[ \det A = \frac{1}{\det A^{-1}} \tag{3.87} \]
If $\det(A) = 0$ then no matrix $A^{-1}$ can satisfy (3.86), so the inverse does not exist.

If $\det(A) \neq 0$ then the constructive algorithm given by theorem 3.40 can be used to compute the inverse.

Theorem 3.38 There exists a non-trivial (nonzero) vector $v$ such that
\[ Av = 0 \tag{3.88} \]
if and only if $A$ is singular.

Proof. (See Franklin 2.4.1)

Definition 3.39 Let $A$ be a square matrix of order $n$. The Adjugate Matrix or Classical Adjoint of $A$, denoted $\operatorname{adj} A$, is the transpose of the matrix that results when every element of $A$ is replaced by its cofactor.
Example 3.8 Find the adjugate of
\[ A = \begin{pmatrix} 1 & 0 & 3 \\ 4 & 5 & 0 \\ 0 & 3 & 1 \end{pmatrix} \tag{3.89} \]
The adjugate is the transpose of the cofactor matrix,
\[
\operatorname{adj} A = \begin{pmatrix}
+[(5)(1)-(0)(3)] & -[(4)(1)-(0)(0)] & +[(4)(3)-(5)(0)] \\
-[(0)(1)-(3)(3)] & +[(1)(1)-(3)(0)] & -[(1)(3)-(0)(0)] \\
+[(0)(0)-(3)(5)] & -[(1)(0)-(3)(4)] & +[(1)(5)-(0)(4)]
\end{pmatrix}^{T} \tag{3.90}
\]
\[
= \begin{pmatrix} 5 & -4 & 12 \\ 9 & 1 & -3 \\ -15 & 12 & 5 \end{pmatrix}^{T}
= \begin{pmatrix} 5 & 9 & -15 \\ -4 & 1 & 12 \\ 12 & -3 & 5 \end{pmatrix} \tag{3.91}
\]

Theorem 3.40 Let $A$ be a non-singular square matrix. Then
\[ A^{-1} = \frac{1}{\det A} \operatorname{adj} A \tag{3.92} \]

Example 3.9 Let $A$ be the square matrix defined in equation (3.89). Then
\[ \det A = (1)(5 - 0) - (0) + (3)(12 - 0) = 41 \tag{3.93} \]
Hence
\[ A^{-1} = \frac{1}{41} \begin{pmatrix} 5 & 9 & -15 \\ -4 & 1 & 12 \\ 12 & -3 & 5 \end{pmatrix} \tag{3.94} \]

In practical terms, computation of the determinant is computationally inefficient, and there are faster ways to calculate the inverse, such as via Gaussian elimination. In fact, determinants and matrix inverses are very rarely used computationally because there is almost always a better way to solve the problem, where by "better" we mean a smaller total number of computations as measured by the number of required multiplications and additions.
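The adjugate formula of Theorem 3.40 can be checked against a library inverse; this is a sketch assuming NumPy, using the matrix and adjugate from Examples 3.8 and 3.9.

```python
import numpy as np

# The matrix from equation (3.89).
A = np.array([[1.0, 0.0, 3.0],
              [4.0, 5.0, 0.0],
              [0.0, 3.0, 1.0]])

# The adjugate computed in Example 3.8 (transpose of the cofactor matrix).
adjA = np.array([[ 5.0,  9.0, -15.0],
                 [-4.0,  1.0,  12.0],
                 [12.0, -3.0,   5.0]])

detA = np.linalg.det(A)   # should be 41 (equation (3.93))
Ainv = adjA / detA        # Theorem 3.40: A^{-1} = adj(A) / det(A)
```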

Topic 4

Eigenstuff

Definition 4.1 Let $A$ be a square matrix. Then $\lambda$ is called an eigenvalue of $A$ if there exists some nonzero vector $v$ such that
\[ Av = \lambda v \tag{4.1} \]
The vector $v$ is called an eigenvector of $A$ with eigenvalue $\lambda$.

Remark 4.2 The eigenvector corresponding to a given eigenvalue is not unique. In fact, any multiple of an eigenvector is an eigenvector with the same eigenvalue. It is also possible for multiple linearly independent eigenvectors to have the same eigenvalue.
Example 4.1 Let
\[ A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 4 & 0 \\ 6 & 0 & 0 \end{pmatrix} \tag{4.2} \]
Then
\[ v = \begin{pmatrix} 3 \\ 0 \\ 6 \end{pmatrix} \tag{4.3} \]
is an eigenvector of $A$ with eigenvalue $\lambda = 3$. To see this, we calculate
\[ Av = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 4 & 0 \\ 6 & 0 & 0 \end{pmatrix}\begin{pmatrix} 3 \\ 0 \\ 6 \end{pmatrix} = \begin{pmatrix} 9 \\ 0 \\ 18 \end{pmatrix} = 3\begin{pmatrix} 3 \\ 0 \\ 6 \end{pmatrix} = 3v \tag{4.4} \]

In general it is much more difficult to find the eigenvalues and eigenvectors than it is to verify that a given vector is an eigenvector. To do this we need the characteristic equation of a matrix.

Definition 4.3 The characteristic equation of a square matrix of order $n$ is the $n$th order (or possibly lower order) polynomial equation
\[ \det(A - \lambda I) = 0 \tag{4.5} \]

Example 4.2 Let $A$ be the square matrix defined in equation (3.89). Then its characteristic equation is
\[ 0 = \det\begin{pmatrix} 1-\lambda & 0 & 3 \\ 4 & 5-\lambda & 0 \\ 0 & 3 & 1-\lambda \end{pmatrix} \tag{4.6} \]
\[ = (1-\lambda)(5-\lambda)(1-\lambda) - 0 + 3(4)(3) \tag{4.7} \]
\[ = 41 - 11\lambda + 7\lambda^2 - \lambda^3 \tag{4.8} \]

Theorem 4.4 The eigenvalues of a square matrix $A$ are the roots of its characteristic polynomial.

Proof. By definition, $\lambda$ is an eigenvalue of $A$ if and only if there is a nonzero vector $v$ such that
\begin{align*}
Av &= \lambda v \tag{4.9} \\
\Longleftrightarrow \quad Av - \lambda v &= 0 \tag{4.10} \\
\Longleftrightarrow \quad Av - \lambda I v &= 0 \tag{4.11} \\
\Longleftrightarrow \quad (A - \lambda I) v &= 0 \tag{4.12} \\
\Longleftrightarrow \quad A - \lambda I &\text{ is singular} \tag{4.13} \\
\Longleftrightarrow \quad \det(A - \lambda I) &= 0 \tag{4.14}
\end{align*}
Hence $\lambda$ is a root of the characteristic polynomial.

Example 4.3 Let $A$ be the square matrix defined in equation (3.89). Then its eigenvalues are the roots of the cubic equation
\[ 41 - 11\lambda + 7\lambda^2 - \lambda^3 = 0 \tag{4.15} \]
The only real root of this equation is approximately $6.28761$. There are two additional complex roots, $0.356196 - 2.52861i$ and $0.356196 + 2.52861i$.

Example 4.4 Let
\[ A = \begin{pmatrix} 2 & -2 & 3 \\ 1 & 1 & 1 \\ 1 & 3 & -1 \end{pmatrix} \]
Its characteristic equation is
\begin{align*}
0 &= \det\begin{pmatrix} 2-\lambda & -2 & 3 \\ 1 & 1-\lambda & 1 \\ 1 & 3 & -1-\lambda \end{pmatrix} \tag{4.16} \\
&= (2-\lambda)[(1-\lambda)(-1-\lambda) - 3] + 2[(-1-\lambda) - 1] + 3[3 - (1-\lambda)] \tag{4.17} \\
&= (2-\lambda)(\lambda^2 - 4) + 2(-2-\lambda) + 3(2+\lambda) \tag{4.18} \\
&= (2-\lambda)(\lambda+2)(\lambda-2) - 2(\lambda+2) + 3(\lambda+2) \tag{4.19} \\
&= (\lambda+2)\left[(2-\lambda)(\lambda-2) + 1\right] \tag{4.20} \\
&= (\lambda+2)(-\lambda^2 + 4\lambda - 3) \tag{4.21} \\
&= -(\lambda+2)(\lambda-3)(\lambda-1) \tag{4.22}
\end{align*}
Therefore the eigenvalues are $-2$, $3$, and $1$. To find the eigenvector corresponding to $-2$ we would solve the system
\[ \begin{pmatrix} 2 & -2 & 3 \\ 1 & 1 & 1 \\ 1 & 3 & -1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = -2\begin{pmatrix} x \\ y \\ z \end{pmatrix} \tag{4.27} \]
for $x, y, z$. One way to do this is to multiply out the matrix on the left and solve the resulting system of three equations in three unknowns:
\begin{align*}
2x - 2y + 3z &= -2x \tag{4.28} \\
x + y + z &= -2y \tag{4.29} \\
x + 3y - z &= -2z \tag{4.30}
\end{align*}

However, we should observe that the eigenvector is never unique. For example, if $v$ is an eigenvector of $A$ with eigenvalue $\lambda$ then
\[ A(kv) = kAv = k\lambda v = \lambda(kv) \tag{4.31} \]
i.e., $kv$ is also an eigenvector of $A$. So the problem is simplified: we can try to fix one of the elements of the eigenvector. Say we try to find an eigenvector of $A$ corresponding to $\lambda = -2$ with $y = 1$. Then we solve the system
\begin{align*}
2x - 2 + 3z &= -2x \tag{4.32} \\
x + 1 + z &= -2 \tag{4.33} \\
x + 3 - z &= -2z \tag{4.34}
\end{align*}
Simplifying,
\begin{align*}
4x - 2 + 3z &= 0 \tag{4.35} \\
x + 3 + z &= 0 \tag{4.36} \\
x + 3 + z &= 0 \tag{4.37}
\end{align*}
The second and third equations are now the same because we have fixed one of the values. The remaining distinct equations give two equations in two unknowns:
\begin{align*}
4x + 3z &= 2 \tag{4.38} \\
x + z &= -3 \tag{4.39}
\end{align*}
The solution is $x = 11$, $z = -14$. Therefore an eigenvector of $A$ corresponding to $\lambda = -2$ is $v = (11, 1, -14)$, as is any constant multiple of this vector.
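The eigenvector just computed is easy to verify by multiplying it back through the matrix; this is a sketch assuming NumPy, using the matrix and eigenvector of Example 4.4.

```python
import numpy as np

# The matrix from Example 4.4 and the eigenvector found for lambda = -2.
A = np.array([[2.0, -2.0,  3.0],
              [1.0,  1.0,  1.0],
              [1.0,  3.0, -1.0]])
v = np.array([11.0, 1.0, -14.0])

# Av should equal -2v, confirming v is an eigenvector with eigenvalue -2.
Av = A @ v
print(Av)  # expected: -2 * v
```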
Theorem 4.5 The eigenvalues of a diagonal matrix are the elements of the diagonal.

Proof. Let the diagonal elements be $d_1, d_2, \dots$. The characteristic equation is
\[ 0 = \det(A - \lambda I) = \begin{vmatrix} d_1-\lambda & 0 & \cdots \\ 0 & d_2-\lambda & \\ \vdots & & \ddots \end{vmatrix} = (d_1-\lambda)(d_2-\lambda)\cdots \tag{4.40--4.42} \]
Hence the roots are $d_1, d_2, \dots$.


Theorem 4.6 The determinant is the product of the eigenvalues:
\[ \det(A) = \lambda_1 \lambda_2 \cdots \lambda_n \tag{4.43} \]

Proof. The characteristic polynomial is
\[ \det(xI - A) = (x - \lambda_1)(x - \lambda_2)\cdots(x - \lambda_n) \tag{4.44} \]
Setting $x = 0$ gives
\[ \det(-A) = (-1)^n \lambda_1 \lambda_2 \cdots \lambda_n \tag{4.45} \]
The result follows because $\det(-A) = (-1)^n \det(A)$.


Definition 4.7 The trace of a square matrix is the sum of its diagonal elements.

Theorem 4.8 The trace is equal to the sum of the eigenvalues:
\[ \operatorname{trace}(A) = a_{11} + a_{22} + \cdots + a_{nn} = \lambda_1 + \lambda_2 + \cdots + \lambda_n \tag{4.46} \]

Proof. (will be given later).
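Theorems 4.6 and 4.8 can both be spot-checked numerically. This is a sketch assuming NumPy, reusing the matrix of Example 4.4 (eigenvalues $-2$, $3$, $1$, so the product is $-6$ and the sum is $2$).

```python
import numpy as np

# The matrix from Example 4.4.
A = np.array([[2.0, -2.0,  3.0],
              [1.0,  1.0,  1.0],
              [1.0,  3.0, -1.0]])
eigs = np.linalg.eigvals(A)

# Theorem 4.6: det(A) = product of eigenvalues.
prod_ok = np.isclose(np.prod(eigs).real, np.linalg.det(A))
# Theorem 4.8: trace(A) = sum of eigenvalues.
trace_ok = np.isclose(np.sum(eigs).real, np.trace(A))
```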


Definition 4.9 An upper (lower) triangular matrix is a square matrix that has nonzero entries only on or above (below) the main diagonal.

Theorem 4.10 The eigenvalues of an upper (lower) triangular matrix lie on the main diagonal.

Proof. Let $A$ be an upper triangular matrix with $d_1, d_2, \dots$ on the diagonal. The characteristic equation is
\[ 0 = \det(A - \lambda I) = \begin{vmatrix}
d_1-\lambda & a_{12} & a_{13} & \cdots & a_{1n} \\
0 & d_2-\lambda & a_{23} & \cdots & a_{2n} \\
0 & 0 & d_3-\lambda & & \vdots \\
\vdots & & & \ddots & a_{n-1,n} \\
0 & 0 & \cdots & 0 & d_n-\lambda
\end{vmatrix} = (d_1-\lambda)(d_2-\lambda)\cdots(d_n-\lambda) \tag{4.47--4.49} \]
where the determinant is expanded by the first column. Hence the roots are $d_1, d_2, \dots$. A similar calculation holds for lower triangular matrices, expanding the determinant by the first row.


Theorem 4.11 Let $A$ be a square $n \times n$ matrix over $\mathbb{F}$. Then the following are equivalent:
1. $A$ is invertible
2. $\operatorname{rank}(A) = n$
3. $\operatorname{range}(A) = \mathbb{F}^n$
4. $\operatorname{nullspace}(A) = \{0\}$
5. $0$ is not an eigenvalue of $A$
6. $0$ is not a singular value of $A$ (we will define singular values below)
7. $\det(A) \neq 0$
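Several of these equivalent conditions can be observed failing together on a deliberately singular matrix. This is a sketch assuming NumPy; the test matrix is an illustrative choice (its second row is twice its first).

```python
import numpy as np

# A singular matrix: the rows are linearly dependent.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

det_is_zero = np.isclose(np.linalg.det(A), 0.0)           # condition 7 fails
rank_deficient = np.linalg.matrix_rank(A) < 2             # condition 2 fails
zero_eig = np.isclose(np.abs(np.linalg.eigvals(A)).min(), 0.0)   # condition 5 fails
zero_sv = np.isclose(np.linalg.svd(A, compute_uv=False).min(), 0.0)  # condition 6 fails
```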


Topic 5

Inner Products and Norms

Note for Next Year
Move this material to Chapter 21 and integrate the material more closely.

Definition 5.1 The inner product of two vectors $x, y \in \mathbb{F}^n$ is given by
\[ \langle x, y\rangle = x \cdot y = \sum_{i=1}^{n} x_i y_i \tag{5.1} \]

Definition 5.2 The Euclidean Length of a vector $x$ is given by
\[ |x| = \sqrt{\langle x, x\rangle} = \sqrt{\sum_{i=1}^{n} x_i^2} \tag{5.2} \]
(Compare the above definitions with definitions 2.1 and 2.5, and the comments that follow those earlier definitions.) The Euclidean length satisfies the properties of a norm given below in definition 5.8.

Definition 5.3 The Cosine of the Angle between two vectors $x$ and $y$ is given by
\[ \cos\theta = \frac{\langle x, y\rangle}{|x|\,|y|} \tag{5.3} \]
Definition 5.4 Two vectors $x$ and $y$ are said to be orthogonal if
\[ \langle x, y\rangle = 0 \tag{5.4} \]
In $\mathbb{R}^m$ this means the vectors are perpendicular.

A set of nonzero vectors $X = \{x_1, x_2, \dots\}$ is an orthogonal set of vectors (or just orthogonal) if
\[ \langle x_i, x_j\rangle = 0 \text{ whenever } i \neq j \tag{5.5} \]

A set of nonzero vectors $X = \{x_1, x_2, \dots\}$ is an orthonormal set of vectors (or just orthonormal) if $X$ is orthogonal and $|x_j| = 1$ for all $j$. In an orthonormal set we have¹
\[ \langle x_i, x_j\rangle = \delta_{ij} \tag{5.6} \]

Two sets of vectors $X = \{x_1, x_2, \dots\}$ and $Y = \{y_1, y_2, \dots\}$ are orthogonal sets of vectors (or just orthogonal) if
\[ \langle x_i, y_j\rangle = 0 \quad \text{for all } x_i \in X \text{ and } y_j \in Y \tag{5.7} \]

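The Kronecker-delta property (5.6) of an orthonormal set can be seen concretely by tabulating all the pairwise inner products; this is a sketch assuming NumPy, using an illustrative orthonormal pair in $\mathbb{R}^2$.

```python
import numpy as np

# An orthonormal set in R^2: the standard basis rotated by 45 degrees.
x1 = np.array([1.0,  1.0]) / np.sqrt(2)
x2 = np.array([1.0, -1.0]) / np.sqrt(2)

# Gram matrix of pairwise inner products: <x_i, x_j> = delta_ij.
G = np.array([[x1 @ x1, x1 @ x2],
              [x2 @ x1, x2 @ x2]])
```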
Theorem 5.5 The vectors in an orthogonal set are linearly independent.

Proof. Let $V = \{v_1, v_2, \dots\}$ be an orthogonal set. Suppose that $V$ is not linearly independent. Then by definition 2.13 there is some $v_k (\neq 0) \in V$ such that
\[ v_k = \sum_{i=1,\,i\neq k}^{n} c_i v_i \tag{5.8} \]
where the $c_i$ are not all zero. Since $v_k \neq 0$, $|v_k| \neq 0$. But
\[ |v_k|^2 = v_k \cdot v_k = v_k \cdot \sum_{i=1,\,i\neq k}^{n} c_i v_i = \sum_{i=1,\,i\neq k}^{n} c_i\, (v_k \cdot v_i) = 0 \tag{5.9--5.11} \]
where the last equality follows because $\langle v_k, v_i\rangle = 0$ for $i \neq k$. But this contradicts the observation that $|v_k| \neq 0$.

Hence no $v_k$ can be written as a linear combination of the other elements of $V$, and we conclude that $V$ is a linearly independent set.

¹The Kronecker $\delta$ function is defined as $\delta_{ij} = 0$ if $i \neq j$ and $\delta_{ii} = 1$.


Definition 5.6 A matrix is called unitary (or merely orthogonal over $\mathbb{R}$) if
\[ A^* = A^{-1} \tag{5.12} \]
For real matrices, this reduces to
\[ A^T = A^{-1} \tag{5.13} \]

Theorem 5.7 Unitary matrices preserve lengths and angles, in the sense that if $A$ is unitary and $x$ is any vector over $\mathbb{F}$ then
\[ |Ax| = |x| \tag{5.14} \]
and if $x$ and $y$ are any two vectors over $\mathbb{F}$ then
\[ \langle Ax, Ay\rangle = \langle x, y\rangle \tag{5.15} \]

Proof. Let $A$ be unitary.

1. Lengths are preserved: let $u = Av$. Then
\[ |u|^2 = u^* u = (Av)^*(Av) = v^* A^* A v = v^* v = |v|^2 \tag{5.16} \]

2. Angles are preserved: let $x = Au$ and $y = Av$. Then their inner product is
\begin{align*}
x^* y &= (Au)^* Av \tag{5.17} \\
&= u^* A^* A v \tag{5.18} \\
&= u^* v \tag{5.19}
\end{align*}


Orthogonal Components and Projections

Let $V = \{v_1, v_2, \dots, v_n\}$ be an orthonormal set and let $x$ be any vector over $\mathbb{F}$. Then the projection of $x$ onto $v_j$, or the $j$th component of $x$, is
\[ x_j = \langle v_j, x\rangle\, v_j = (v_j \cdot x)\, v_j \tag{5.20} \]
Let
\[ y = x - \sum_{j=1}^{n} (v_j \cdot x)\, v_j \tag{5.21} \]
Then
\[ \langle v_i, y\rangle = v_i \cdot x - \sum_{j=1}^{n} (v_j \cdot x)(v_i \cdot v_j) = 0 \tag{5.22} \]
since the only non-zero term in the sum is the one with $i = j$. Thus $y$ is orthogonal to all the $v_i$; we say that $y$ is orthogonal to the set $V$.

Thus any vector $x$ can be decomposed into $n + 1$ orthogonal components,
\[ x = \underbrace{y}_{\text{orthogonal to } V} + \underbrace{\sum_{j=1}^{n} (v_j \cdot x)\, v_j}_{\text{parallel to } V} \tag{5.23} \]
\[ = \underbrace{y}_{\text{orthogonal to } V} + \underbrace{\sum_{j=1}^{n} (v_j v_j^*)\, x}_{\text{parallel to } V} \tag{5.24--5.25} \]
The matrix
\[ P = \sum_{j=1}^{n} v_j v_j^* \tag{5.26} \]
is the projection matrix onto $V$, so that $Px$ is parallel to $V$ and $(I - P)x$ is orthogonal:
\[ x = Px + (I - P)x \tag{5.27} \]


Vector Norms

We will discuss norms more generally in chapter 21. Here we will review some basic concepts of norms of vectors and matrices.

Definition 5.8 A norm is a function
\[ \|x\| : \mathbb{F}^n \to \mathbb{R} \tag{5.28} \]
such that for all vectors $x, y \in \mathbb{F}^n$ and all scalars $c \in \mathbb{F}$:
1. $\|x\| \ge 0$
2. $\|x\| = 0 \Longleftrightarrow x = 0$
3. $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality)
4. $\|cx\| = |c|\,\|x\|$

There are many ways to define a norm; when it is clear which norm we are using we will often write $\|x\|$ as simply $|x|$, though this simplification is usually reserved for the Euclidean length (2-norm) given in definition 5.2.
Definition 5.9 The p-norm is given by
\[ \|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} \tag{5.29} \]
Some important special cases are given by:

1. The 1-norm or taxicab norm:
\[ \|x\|_1 = \sum_{i=1}^{n} |x_i| \tag{5.30} \]

2. The 2-norm or Euclidean norm:
\[ \|x\|_2 = \sqrt{\sum_{i=1}^{n} |x_i|^2} = \sqrt{\langle x, x\rangle} \tag{5.31} \]

3. The sup-norm or max-norm:
\[ \|x\|_\infty = \max_{1 \le i \le n} |x_i| \tag{5.32} \]

The sup-norm arises as the limit $p \to \infty$ of the p-norm.
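The three special cases, and the limiting behavior of the p-norm, can be checked on a small vector. This is a sketch assuming NumPy; the vector $(3, -4)$ is an illustrative choice.

```python
import numpy as np

x = np.array([3.0, -4.0])

one_norm = np.linalg.norm(x, 1)        # |3| + |-4| = 7
two_norm = np.linalg.norm(x, 2)        # sqrt(9 + 16) = 5
sup_norm = np.linalg.norm(x, np.inf)   # max(|3|, |-4|) = 4

# For large p the p-norm approaches the sup-norm.
big_p_norm = np.linalg.norm(x, 100)
```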



Figure 5.1: Unit balls for several p-norms in $\mathbb{R}^2$ (panels: the 1-, 2-, 3-, 4-, 5-, and sup-norm unit balls).

The pnorm can be visualized in terms of its unit-ball, that is, the collection
of all points such that
kxkp = 1
(5.33)
In 2-dimensions, we can describe a vector with components (x, y) in terms
of the polar coordinates (r, ), as
x = (r cos , r sin )

(5.34)

The unit ball is thus the set of all points


1/p

1 = kxkp = (|r cos |p + (|r sin |)p )

(5.35)

or equivalently (since 1p = 1),


1 = kxkp = |r cos |p + |r sin |p
p

(5.36)
p

= r (| cos | + | sin | )

(5.37)

since r = |r| 0. Solving for r,



r=

1
| cos |p + | sin |p

1/p
(5.38)

These unit balls are plotted for several values of p in figure 5.1. The dashed
box is the unit square [1, 1] [1, 1].

Figure 5.2: Unit balls for several weighted p-norms in $\mathbb{R}^2$ using $W = \operatorname{diag}(1, 2)$ (panels: the weighted 1-, 2-, 3-, and sup-norm unit balls).

A related set of norms is the weighted p-norm,
\[ \|x\|_{W,p} = \|Wx\|_p = \left( \sum_i |w_{ii} x_i|^p \right)^{1/p} \tag{5.39} \]
where $W$ is any diagonal matrix.² Some examples are illustrated in figure 5.2.

²It is possible to extend this concept to an arbitrary non-singular matrix, not just diagonal matrices.


Matrix Norms

Note for Next Year
Include:
Rayleigh Principle and derivation of a simple formula for the induced 2-norm
More on $x^T A x$ as an ellipsoid
Applications: PCA and Least Squares Fit

The easiest way to define a matrix norm is by treating any matrix as the vector of its components. The Frobenius Norm or Hilbert-Schmidt Norm is given by
\[ \|A\|_F = \sqrt{\sum_{i,j} |a_{ij}|^2} \tag{5.40} \]
Similarly, the matrix sup-norm is
\[ \|A\|_\infty = \max_{i,j} |a_{ij}| \tag{5.41} \]
These are the most commonly used matrix norms when proving properties for numerical linear algebra.

Theorem 5.10 The Frobenius norm can be written as
\[ \|A\|_F = \sqrt{\operatorname{trace}(A^* A)} = \sqrt{\operatorname{trace}(A A^*)} \tag{5.42} \]

Proof. (exercise)
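Theorem 5.10 is easy to verify numerically even before working the exercise. This is a sketch assuming NumPy; the random matrix is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))

# Definition (5.40): square root of the sum of squared entries.
fro = np.sqrt(np.sum(np.abs(A) ** 2))
# Theorem 5.10: the same value via trace(A* A).
via_trace = np.sqrt(np.trace(A.conj().T @ A))
```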
One can also define an induced matrix norm as
\[ \|A\| = \sup_{x(\neq 0) \in \mathbb{F}^n} \frac{\|Ax\|}{\|x\|} = \sup_{\{x \in \mathbb{F}^n \,\mid\, \|x\| = 1\}} \|Ax\| \tag{5.43--5.44} \]
The induced matrix norms have geometric interpretations and are more commonly used in analysis.

Figure 5.3 illustrates several induced matrix norms using
\[ A = \begin{pmatrix} 0.7 & 2 \\ 3 & 1 \end{pmatrix} \tag{5.45} \]
The effect of the matrix $A$ under (5.43) is to perturb the unit ball of the p-norm; the matrix norm is the radius of the circumscribed circle, which gives the greatest distance from the origin of the unit ball perturbed by $A$, measured in the vector p-norm.

Figure 5.3: Unit balls for some induced matrix p-norms in $\mathbb{R}^2$ using the matrix $A$ given by (5.45) (panels: the induced 1-, 2-, 3-, and sup-norm unit balls). The p-norm is the radius of the circumscribed circle.

The 2-norm of a diagonal matrix has the interesting property that $A$ maps the hypersphere to a hyperellipse with semi-axes given by the diagonal elements themselves. Hence if
\[ D = \operatorname{diag}(d_1, d_2, \dots) \tag{5.46} \]
then
\[ \|D\|_2 = \max_i |d_i| \tag{5.47} \]
i.e., the matrix 2-norm of a diagonal matrix is the absolute value of the largest diagonal element.
The induced 1-norm $\|A\|_1$ has the property that
\begin{align*}
\|A\|_1 &= \sup_{\{x : \|x\|_1 = 1\}} \|Ax\|_1 \tag{5.48} \\
&= \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}| = \text{maximum column sum} \tag{5.49} \\
&= \max_{j=1,\dots,n} \|a_j\|_1 \tag{5.50}
\end{align*}
where $a_j$ is the $j$th column vector of $A$.



The induced sup ($\infty$)-norm $\|A\|_\infty$ has the property that
\begin{align*}
\|A\|_\infty &= \sup_{\{x : \|x\|_\infty = 1\}} \|Ax\|_\infty \tag{5.51} \\
&= \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}| = \text{maximum row sum} \tag{5.52} \\
&= \max_{i=1,\dots,n} \|a_i\|_1 \tag{5.53}
\end{align*}
where $a_i$ is the $i$th row vector of $A$, or equivalently, the $i$th column vector of $A^*$.
Theorem 5.11 Cauchy-Schwarz Inequality. Let $x$ and $y$ be vectors over $\mathbb{F}^n$. Then
\[ |\langle x, y\rangle| \le \|x\|_2 \|y\|_2 \tag{5.54} \]
or in terms of components
\[ \left| \sum x_i y_i \right| \le \left( \sqrt{\sum x_i^2} \right)\left( \sqrt{\sum y_i^2} \right) \tag{5.55} \]
Proof. If $x$ and $y$ are linearly dependent then there exists some $a \in \mathbb{F}$ such that
\[ y = ax \tag{5.56} \]
Then
\[
(\langle x, y\rangle)^2 = (x \cdot (ax))^2 = \left( \sum x_i a x_i \right)^2 = a^2 \left( \sum x_i x_i \right)^2 = a^2 (\|x\|_2^2)^2 = \|x\|_2^2\, \|ax\|_2^2 = \|x\|_2^2\, \|y\|_2^2
\]
so that $|\langle x, y\rangle| = \|x\|_2 \|y\|_2$ and the inequality holds (with equality).

If $x$ and $y$ are not linearly dependent then define the function $f(u) : \mathbb{F} \to \mathbb{R}$ by
\begin{align*}
f(u) &= \|x - uy\|_2^2 \ge 0 \tag{5.57} \\
0 &\le \langle x - uy,\, x - uy\rangle \tag{5.58} \\
&= (x - uy)^* (x - uy) \tag{5.59} \\
&= x \cdot x - u\,(x \cdot y) - \bar{u}\,(y \cdot x) + u\bar{u}\,(y \cdot y) \tag{5.60} \\
&= \|x\|_2^2 - u\,(x \cdot y) - \bar{u}\,(y \cdot x) + |u|^2\, \|y\|_2^2 \tag{5.61}
\end{align*}
Since this holds for all $u \in \mathbb{F}$, we can pick any $u$ we like, such as
\[ u = \frac{y \cdot x}{\|y\|_2^2}, \qquad \bar{u} = \frac{x \cdot y}{\|y\|_2^2} \tag{5.62--5.63} \]
and
\[ u\bar{u} = \frac{(y \cdot x)(x \cdot y)}{\|y\|_2^2\, \|y\|_2^2} = \frac{|x \cdot y|^2}{\|y\|_2^4} \tag{5.64} \]
With this substitution,
\begin{align*}
0 &\le \|x\|_2^2 - \left( \frac{y \cdot x}{\|y\|_2^2} \right)(x \cdot y) - \left( \frac{x \cdot y}{\|y\|_2^2} \right)(y \cdot x) + \frac{|x \cdot y|^2}{\|y\|_2^4}\, \|y\|_2^2 \tag{5.65} \\
&= \|x\|_2^2 - 2\,\frac{|x \cdot y|^2}{\|y\|_2^2} + \frac{|x \cdot y|^2}{\|y\|_2^2} \tag{5.66--5.67} \\
&= \|x\|_2^2 - \frac{|x \cdot y|^2}{\|y\|_2^2} \tag{5.68}
\end{align*}
Rearranging,
\begin{align*}
\frac{|x \cdot y|^2}{\|y\|_2^2} &\le \|x\|_2^2 \tag{5.69} \\
|x \cdot y|^2 &\le \|x\|_2^2\, \|y\|_2^2 \tag{5.70}
\end{align*}
Taking the positive square root of each side gives the Cauchy-Schwarz inequality.
Theorem 5.12 Unitary Matrices preserve both the 2-norm and the Frobenius norm.
Proof. This result follows from theorem 5.7 applied to each norm.

Topic 6

Similar Matrices

Note for Next Year
Move this material to chapter 19 and integrate more thoroughly.

As we will see again in definition 19.2:

Definition 6.1 Two square matrices $A$ and $B$ are called similar if there exists a third square matrix $T$ with $\det(T) \neq 0$ such that
\[ T^{-1} A T = B \tag{6.1} \]

Remark 6.2 Similar matrices represent the same linear transformation in different coordinate systems (see chapter 19).
Theorem 6.3 Similar matrices have the same eigenvalues with the same multiplicities.

Proof. Let $B = T^{-1}AT$. Then
\begin{align*}
\det(B - \lambda I) &= \det(T^{-1}AT - \lambda I) \tag{6.2} \\
&= \det(T^{-1}AT - T^{-1}\lambda I T) \tag{6.3} \\
&= \det(T^{-1}) \det(A - \lambda I) \det(T) \tag{6.4} \\
&= \det(T^{-1}) \det(T) \det(A - \lambda I) \tag{6.5} \\
&= \det(I) \det(A - \lambda I) \tag{6.6} \\
&= \det(A - \lambda I) \tag{6.7}
\end{align*}
so $A$ and $B$ have the same characteristic polynomial.

Corollary 6.4 Similar matrices have the same trace.


Proof. The trace is the sum of the eigenvalues and the eigenvalues are
preserved.
Definition 6.5 A square matrix is diagonalizable if it is similar to a diagonal matrix, i.e., there exist some $T$ and some diagonal matrix $D$ such that
\[ T^{-1} A T = D \tag{6.8} \]
Theorem 6.6 $A$ is diagonalizable iff it has $n$ linearly independent eigenvectors.

Proof. ($\Rightarrow$) Suppose that $A$ is diagonalizable. Then for some matrix $T$,
\[ T^{-1} A T = \operatorname{diag}(d_1, d_2, \dots, d_n) \tag{6.9} \]
Since $T$ is invertible, its columns $t_1, \dots, t_n$ are linearly independent. Then (multiply by $T$),
\begin{align*}
A T &= T \operatorname{diag}(d_1, \dots, d_n) \tag{6.10} \\
A \begin{pmatrix} t_1 & \cdots & t_n \end{pmatrix} &= \begin{pmatrix} t_1 & \cdots & t_n \end{pmatrix} \operatorname{diag}(d_1, \dots, d_n) \tag{6.11} \\
&= \begin{pmatrix} d_1 t_1 & \cdots & d_n t_n \end{pmatrix} \tag{6.12} \\
\Longrightarrow \quad A t_i &= d_i t_i \tag{6.13}
\end{align*}
Thus the columns of $T$ are eigenvectors of $A$ with eigenvalues $d_i$.

($\Leftarrow$) Suppose that $A$ has $n$ linearly independent eigenvectors $t_1, \dots, t_n$ with eigenvalues $\lambda_1, \dots, \lambda_n$. Then we can construct
\begin{align*}
T &= \begin{pmatrix} t_1 & \cdots & t_n \end{pmatrix} \tag{6.14} \\
A T &= A \begin{pmatrix} t_1 & \cdots & t_n \end{pmatrix} \tag{6.15} \\
&= \begin{pmatrix} A t_1 & \cdots & A t_n \end{pmatrix} \tag{6.16} \\
&= \begin{pmatrix} \lambda_1 t_1 & \cdots & \lambda_n t_n \end{pmatrix} \tag{6.17} \\
&= T D \tag{6.18}
\end{align*}
where $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$. Multiplying by $T^{-1}$ gives
\[ T^{-1} A T = D \tag{6.19} \]
where $D$ is diagonal.
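The construction in the proof of Theorem 6.6 — build $T$ from eigenvectors, then form $T^{-1}AT$ — can be carried out numerically. This is a sketch assuming NumPy, reusing the matrix of Example 4.4 (three distinct eigenvalues, hence three independent eigenvectors).

```python
import numpy as np

# The matrix from Example 4.4.
A = np.array([[2.0, -2.0,  3.0],
              [1.0,  1.0,  1.0],
              [1.0,  3.0, -1.0]])

# np.linalg.eig returns the eigenvalues and a matrix T whose
# columns are the corresponding eigenvectors.
eigvals, T = np.linalg.eig(A)

# By Theorem 6.6, T^{-1} A T should be diag(eigvals).
D = np.linalg.inv(T) @ A @ T
```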


Topic 7

Previewing the SVD

Note for Next Year
Move this material to chapter 31 and integrate more thoroughly.

We will talk more about the Singular Value Decomposition in chapter 31. Here we provide a geometrical description and an algorithm that can be used to calculate the SVD.

A hyperellipse in $\mathbb{R}^n$ is a unit sphere in which each axis has been stretched by some factors $\{\sigma_1, \sigma_2, \dots, \sigma_n\}$ in some collection of orthogonal directions $\{u_1, u_2, \dots, u_n\}$, where $\|u_i\|_2 = 1$.

The vectors $\{\sigma_1 u_1, \sigma_2 u_2, \dots, \sigma_n u_n\}$ are called the principal semi-axes of the hyperellipse; thus axis $j$ (along the $u_j$ direction) has length $\sigma_j$.

Remark 7.1 The image of the unit sphere under any matrix is a hyperellipse, i.e., if $S$ is a unit sphere in $\mathbb{R}^n$ and $A$ is a matrix, then
\[ S' = AS = \{Ax \mid x \in S\} \tag{7.1} \]
is a hyperellipse. Here we have also introduced the notation $AS$ for the image of a set $S$ under a matrix $A$.

Definition 7.2 Let $A$ be an $m \times n$ matrix with $m \ge n$ and let $S$ be a unit ball in $\mathbb{R}^n$. Then
1. The singular values of $A$ are the lengths of the $n$ principal semi-axes of $AS$. We will assume that the singular values are numbered according to
\[ \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0 \tag{7.2} \]
If not, then rename all your variables.


2. The left singular vectors of $A$ are the unit vectors
\[ \{u_1, u_2, \dots, u_n\} \tag{7.3} \]
numbered to correspond with the singular values, i.e., $u_1$ corresponds to the largest singular value and $u_n$ to the smallest.

3. The right singular vectors of $A$ are the unit vector pre-images
\[ \{v_1, v_2, \dots, v_n\} \subseteq S \tag{7.4} \]
of the $u_i$ in $S$, numbered again so that
\[ A v_j = \sigma_j u_j \tag{7.5} \]

From (7.5),
\begin{align*}
A \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix} &= \begin{pmatrix} \sigma_1 u_1 & \sigma_2 u_2 & \cdots & \sigma_n u_n \end{pmatrix} \tag{7.6} \\
&= \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix} \operatorname{diag}(\sigma_1, \sigma_2, \dots, \sigma_n) \tag{7.7}
\end{align*}
Let us define the matrices
\begin{align*}
V &= \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix} \tag{7.8} \\
U &= \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix} \tag{7.9} \\
\Sigma &= \operatorname{diag}(\sigma_1, \sigma_2, \dots, \sigma_n) \tag{7.10}
\end{align*}
Then
\[ A V = U \Sigma \tag{7.11} \]
Since the column vectors of $V$ are orthonormal, the matrix $V$ is unitary (proof: exercise) and hence invertible with inverse $V^*$. This gives the Singular Value Decomposition:
\[ A = U \Sigma V^* \tag{7.12} \]

Theorem 7.3 Every matrix has a singular value decomposition.

Proof. (will be given later in the course in terms of operators)

Theorem 7.4 Computing the SVD.
1. The left singular vectors are the normalized eigenvectors of $AA^*$.
2. The right singular vectors are the normalized eigenvectors of $A^* A$.
3. The singular values are the square roots of the eigenvalues of $A^* A$ (or, equivalently, $AA^*$).

Example 7.1 Compute the Singular Value Decomposition of
\[ A = \begin{pmatrix} 2 & 2 \\ -1 & 1 \end{pmatrix} \tag{7.13} \]
Let
\begin{align*}
R = A^* A &= \begin{pmatrix} 2 & -1 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} 2 & 2 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 5 & 3 \\ 3 & 5 \end{pmatrix} \tag{7.14} \\
L = A A^* &= \begin{pmatrix} 2 & 2 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 2 & -1 \\ 2 & 1 \end{pmatrix} = \begin{pmatrix} 8 & 0 \\ 0 & 2 \end{pmatrix} \tag{7.15}
\end{align*}
The singular values are the square roots of the eigenvalues, i.e., $\sigma_1 = 2\sqrt{2}$ and $\sigma_2 = \sqrt{2}$.

(To see that both $R$ and $L$ have the same eigenvalues, we can compute the characteristic polynomial of $R$:
\begin{align*}
0 &= (5-\lambda)(5-\lambda) - 9 \tag{7.16} \\
&= 25 - 10\lambda + \lambda^2 - 9 \tag{7.17} \\
&= \lambda^2 - 10\lambda + 16 \tag{7.18} \\
&= (\lambda - 2)(\lambda - 8) \tag{7.19}
\end{align*}
which also gives eigenvalues of 2 and 8.)

Thus
\[ \Sigma = \begin{pmatrix} 2\sqrt{2} & 0 \\ 0 & \sqrt{2} \end{pmatrix} \tag{7.20} \]
The left singular vectors are the normalized eigenvectors of $L$, which are $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ (by inspection, because $L$ is diagonal); thus
\[ U = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \tag{7.21} \]
To get $V$ we need the eigenvectors of $R$. The eigenvector corresponding to $\lambda_1 = 8$ satisfies
\begin{align*}
\begin{pmatrix} 5 & 3 \\ 3 & 5 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} &= 8 \begin{pmatrix} x \\ y \end{pmatrix} \tag{7.22} \\
\Longrightarrow \quad 5x + 3y &= 8x \tag{7.23} \\
\Longrightarrow \quad 3y &= 3x \tag{7.24}
\end{align*}
So an eigenvector is $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$. Since we need normalized eigenvectors, we find that
\[ v_1 = \begin{pmatrix} \tfrac{1}{\sqrt{2}} \\[2pt] \tfrac{1}{\sqrt{2}} \end{pmatrix} \tag{7.25} \]
The eigenvectors corresponding to $\lambda_2 = 2$ must satisfy
\begin{align*}
\begin{pmatrix} 5 & 3 \\ 3 & 5 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} &= 2 \begin{pmatrix} x \\ y \end{pmatrix} \tag{7.26} \\
\Longrightarrow \quad 5x + 3y &= 2x \tag{7.27} \\
\Longrightarrow \quad 3y &= -3x \tag{7.28}
\end{align*}
So a normalized eigenvector is
\[ v_2 = \begin{pmatrix} -\tfrac{1}{\sqrt{2}} \\[2pt] \tfrac{1}{\sqrt{2}} \end{pmatrix} \tag{7.29} \]
hence
\[ V = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \\[2pt] \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{pmatrix} \tag{7.30} \]
The SVD is then
\[ A = U \Sigma V^* = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 2\sqrt{2} & 0 \\ 0 & \sqrt{2} \end{pmatrix}\begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\[2pt] -\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{pmatrix} \tag{7.31} \]
Note that the algorithm did not specify how to choose the signs of the eigenvectors, and this must be determined by trial and error: if we had chosen one of the signs wrong it would still give an eigenvector, but the factors might not multiply out to the correct answer.
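A library SVD routine handles the sign bookkeeping automatically and can be used to confirm the singular values computed above. This is a sketch assuming NumPy, using the matrix of Example 7.1.

```python
import numpy as np

# The matrix from Example 7.1.
A = np.array([[ 2.0, 2.0],
              [-1.0, 1.0]])

# numpy returns U, the singular values, and V* (called Vh here),
# with the singular values sorted in decreasing order.
U, s, Vh = np.linalg.svd(A)

# The factors must reassemble A.
reassembled = U @ np.diag(s) @ Vh
```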


Topic 8

Example: Metabolic Flux

As an application of linear algebra we will consider the study of flux analyses in biological systems.¹ In particular, one problem of interest is the identification of conserved moieties in biochemical systems, that is, the conservation of particular subgroups of chemical species that are cyclically transferred from one molecule to another.

Consider, for example, the system of chemical reactions
\begin{align*}
E^* &\to E + X \tag{8.1} \\
X &\to Y \tag{8.2} \\
E + Y &\to E^* \tag{8.3}
\end{align*}
This system represents some enzyme, or catalyst, that exists in two forms, which we call $E$ and $E^*$; the star ($*$) is a common biochemical notation that indicates the activated form of the inactive enzyme. In form $E$, the enzyme can't do anything, but in $E^*$ it can interact with some substrate to do something. In our system of reactions, adding $Y$ to $E$ makes it active (equation (8.3)); and then the activated form $E^*$ spontaneously converts back to the inactive form $E$, emitting a molecule $X$ in the process (equation (8.1)). Meanwhile, when they are free, the molecules $X$ can convert to $Y$ on their own (equation (8.2)). The enzymatic activity of $E^*$, i.e., what it does when it is active, is not actually shown in eqs. (8.1) to (8.3).

In this system, there are two conserved cycles:

¹For further reading on the example in this section see Sauro and Ingalls, "Conservation analysis in biochemical networks: computational issues for software writers," Biophysical Chemistry 109 (2004): 1-15.


1. $\{E, E^*\}$: $E$ can become $E^*$ and vice-versa; the total amount of $E + E^*$ is fixed.
2. $\{E^*, X, Y\}$: every time $E^*$ converts to $E$, it emits an $X$, which in turn becomes a $Y$; but a molecule of $Y$ is used every time an $E^*$ is created. Hence the total amount of $E^* + X + Y$ is fixed.

This may not be at all obvious from the list of reactions.

To see this more clearly, the law of mass action tells us that if we know how fast each of the reactions in eqs. (8.1) to (8.3) proceeds (e.g., in molecules/second or some similar measure), we can define a system of differential equations. Let us assign rates of $k_1, k_2, k_3$ to each of eqs. (8.1) to (8.3), respectively. Biochemically, this is normally written as:

\begin{align*}
E^* &\xrightarrow{k_1} E + X \tag{8.4} \\
X &\xrightarrow{k_2} Y \tag{8.5} \\
E + Y &\xrightarrow{k_3} E^* \tag{8.6}
\end{align*}

Mathematically, the law of mass action says that the affinity, or likelihood of reaction, of the chemicals on the left-hand side of a biochemical equation
\[ n_1 A + n_2 B \xrightarrow{k} P \tag{8.7} \]
(read as: $n_1$ moles of $A$ combine with $n_2$ moles of $B$ to produce one mole of $P$ at a rate $k$) is given by the product $k A^{n_1} B^{n_2}$, so that the rate at which $P$ is produced is
\[ \frac{dP}{dt} = k A^{n_1} B^{n_2} \tag{8.8} \]
The numbers $n_1$ and $n_2$ are called stoichiometries. If $P$ also appears on the left of the reaction, i.e.,
\[ n_1 A + n_2 B + n_3 P \xrightarrow{k} n_4 P \tag{8.9} \]
then the differential equation is modified as
\[ \frac{dP}{dt} = k (n_4 - n_3) A^{n_1} B^{n_2} P^{n_3} \tag{8.10} \]
If $P$ appears in more than one reaction, then the total $dP/dt$ is the sum of the $dP/dt$'s from each individual reaction.

We can now interpret equations (8.4) to (8.6) as follows:
\begin{align*}
\frac{dE^*}{dt} &= -k_1 E^* + k_3 E Y \tag{8.11} \\
\frac{dE}{dt} &= k_1 E^* - k_3 E Y \tag{8.12} \\
\frac{dX}{dt} &= k_1 E^* - k_2 X \tag{8.13} \\
\frac{dY}{dt} &= k_2 X - k_3 E Y \tag{8.14}
\end{align*}
Adding equations (8.11) and (8.12) gives
\begin{align*}
\frac{d(E + E^*)}{dt} &= \frac{dE}{dt} + \frac{dE^*}{dt} \tag{8.15} \\
&= -k_1 E^* + k_3 E Y + k_1 E^* - k_3 E Y \tag{8.16} \\
&= 0 \tag{8.17}
\end{align*}
Hence
\[ E + E^* = \text{constant} \tag{8.18} \]

Next, adding equations (8.11), (8.13) and (8.14),
\begin{align*}
\frac{d(E^* + X + Y)}{dt} &= \frac{dE^*}{dt} + \frac{dX}{dt} + \frac{dY}{dt} \tag{8.19} \\
&= (-k_1 E^* + k_3 E Y) + (k_1 E^* - k_2 X) + (k_2 X - k_3 E Y) \tag{8.20} \\
&= 0 \tag{8.21}
\end{align*}
hence
\[ E^* + X + Y = \text{constant} \tag{8.22} \]
Thus we have two conserved cycles, $\{E, E^*\}$ and $\{E^*, X, Y\}$, as previously stated.
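The two conservation laws can also be observed numerically by integrating the mass-action equations directly. This is a sketch assuming NumPy and a simple forward-Euler step; the rate constants and initial amounts are made-up illustrative values, not from the text.

```python
import numpy as np

# Illustrative (made-up) rate constants and initial amounts.
k1, k2, k3 = 1.0, 0.5, 2.0
Es, E, X, Y = 1.0, 1.0, 0.5, 0.5   # Es denotes E*

c1_0 = E + Es           # conserved cycle {E, E*}
c2_0 = Es + X + Y       # conserved cycle {E*, X, Y}

# Forward-Euler integration of the mass-action system.
dt = 1e-3
for _ in range(5000):
    v1, v2, v3 = k1 * Es, k2 * X, k3 * E * Y
    Es += dt * (-v1 + v3)
    E  += dt * ( v1 - v3)
    X  += dt * ( v1 - v2)
    Y  += dt * ( v2 - v3)
```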
Now, return to our differential equations (8.11) through (8.14):
\begin{align*}
\frac{dE^*}{dt} &= -k_1 E^* + k_3 E Y \tag{8.23} \\
\frac{dE}{dt} &= k_1 E^* - k_3 E Y \tag{8.24} \\
\frac{dX}{dt} &= k_1 E^* - k_2 X \tag{8.25} \\
\frac{dY}{dt} &= k_2 X - k_3 E Y \tag{8.26}
\end{align*}


While it would be easy to prove that a solution exists to this system of equations, it is actually impossible to find such a solution analytically, and one must solve the problem numerically. Furthermore, the actual numbers $k_1$, $k_2$ and $k_3$ are often difficult (if not impossible) to measure experimentally and may not be known with any reasonable degree of accuracy.

Instead, it is often more practical to study a system that describes what can be more easily measured. This means looking at the reactions a little differently. Let us rewrite this system of reactions as follows:
\begin{align*}
E^* &\xrightarrow{k_1} E + X & \text{at rate } v_1 &= k_1 E^* \text{ moles/second} \tag{8.27} \\
X &\xrightarrow{k_2} Y & \text{at rate } v_2 &= k_2 X \text{ moles/second} \tag{8.28} \\
E + Y &\xrightarrow{k_3} E^* & \text{at rate } v_3 &= k_3 E Y \text{ moles/second} \tag{8.29}
\end{align*}

As it turns out, it is sometimes a lot easier to know the numbers $v_1$, $v_2$ and $v_3$ than the numbers $k_1$, $k_2$, and $k_3$. The differential equations then become:
\begin{align*}
\frac{dE^*}{dt} &= -v_1 + v_3 \tag{8.30} \\
\frac{dE}{dt} &= v_1 - v_3 \tag{8.31} \\
\frac{dX}{dt} &= v_1 - v_2 \tag{8.32} \\
\frac{dY}{dt} &= v_2 - v_3 \tag{8.33}
\end{align*}
or in matrix form
\[ \frac{d}{dt}\begin{pmatrix} E^* \\ E \\ X \\ Y \end{pmatrix} = \begin{pmatrix} -1 & 0 & 1 \\ 1 & 0 & -1 \\ 1 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} \tag{8.34} \]
Letting $S = (E^*, E, X, Y)^T$ be the vector of chemical species, $v = (v_1, v_2, v_3)^T$ be the vector of velocities, and $N$ the matrix
\[ N = \begin{pmatrix} -1 & 0 & 1 \\ 1 & 0 & -1 \\ 1 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix} \tag{8.35} \]
the system of differential equations can be compactly expressed as
\[ \frac{dS}{dt} = N v \tag{8.36} \]

The matrix $N$ is called the stoichiometry matrix and has an easy intuitive explanation: $N_{ij}$ is the net number of molecules of species $i$ produced each time reaction $j$ occurs:
\[
\begin{array}{c|ccc}
 & (8.27) & (8.28) & (8.29) \\
\hline
E^* & -1 & 0 & 1 \\
E & 1 & 0 & -1 \\
X & 1 & -1 & 0 \\
Y & 0 & 1 & -1
\end{array} \tag{8.37}
\]
In other words, reading this matrix across the first row, $E^*$ is decreased by one by reaction (8.27) and increased by one by reaction (8.29). It is not affected by reaction (8.28), hence the zero in the second column of the first row.
It turns out that the matrix $N$ contains all the information we need to determine which cycles are conserved. We first partition the matrix $N$ (through, e.g., row reduction) into the linearly independent rows $N_R$ and the dependent rows $N_0$ as follows:
\[ N = \begin{pmatrix} N_R \\ N_0 \end{pmatrix} \tag{8.38} \]
Since the linearly dependent rows in $N_0$ are linear combinations of the independent rows in $N_R$, there exists some matrix $L_0$ such that
\[ N_0 = L_0 N_R \tag{8.39} \]
We call $L_0$ the Link-zero matrix. Augment $L_0$ with an identity matrix to define
\[ L = \begin{pmatrix} I \\ L_0 \end{pmatrix} \tag{8.40} \]
Combining these definitions,
\begin{align*}
N = \begin{pmatrix} N_R \\ N_0 \end{pmatrix} &= \begin{pmatrix} I N_R \\ L_0 N_R \end{pmatrix} \tag{8.41} \\
&= \begin{pmatrix} I \\ L_0 \end{pmatrix} N_R \tag{8.42--8.43} \\
&= L N_R \tag{8.44}
\end{align*}
Hence
\[ \frac{dS}{dt} = \frac{d}{dt}\begin{pmatrix} S_i \\ S_d \end{pmatrix} = N v = L N_R v = \begin{pmatrix} I \\ L_0 \end{pmatrix} N_R v \tag{8.45} \]
where $S_i$ and $S_d$ define the independent and dependent species column vectors, and
\[ S = \begin{pmatrix} S_i \\ S_d \end{pmatrix} \tag{8.46} \]
is the vector of all species.
Hence

  NR v = dSi/dt    (8.47)

  L0 NR v = dSd/dt    (8.48)

Substituting (8.47) into (8.48) gives

  L0 dSi/dt = dSd/dt    (8.49)

Multiplying by dt and integrating from 0 to τ,

  ∫0..τ L0 (dSi/dt) dt = ∫0..τ (dSd/dt) dt    (8.50)

Applying the fundamental theorem of calculus,

  L0 [Si(τ) - Si(0)] = Sd(τ) - Sd(0)    (8.51)

for all time τ. Changing variables back to t and rearranging,

  L0 Si(t) - L0 Si(0) = Sd(t) - Sd(0)    (8.52)
  L0 Si(t) - Sd(t) = L0 Si(0) - Sd(0)    (8.53)

The right hand side of (8.53) is a constant vector. Define

  T = L0 Si(0) - Sd(0)    (8.54)

so that

  L0 Si(t) - Sd(t) = T    (8.55)

  L0 Si(t) - I Sd(t) = T    (8.56)

  [ L0  -I ] [ Si ] = T    (8.57)-(8.58)
             [ Sd ]

  [ L0  -I ] S = T    (8.59)-(8.60)


where (8.46) has been used in the final step. Finally we can define the conservation matrix as

  Γ = [ L0  -I ]    (8.61)

so that

  Γ S = T    (8.62)

Each row of Γ gives us a conserved cycle (because T is a constant).


So discovering conserved cycles requires computing the conservation matrix Γ. To compute Γ we return to the definition of the link-zero matrix given by (8.39), which we manipulate as follows:
  L0 NR = N0    (8.63)

  ⟹ 0 = L0 NR - N0    (8.64)

        = L0 NR - I N0    (8.65)

        = [ L0  -I ] [ NR ]    (8.66)
                     [ N0 ]

        = [ L0  -I ] N    from (8.41)    (8.67)

        = Γ N    from (8.61)    (8.68)

In other words

  Γ N = 0    (8.69)

  ⟹ N^T Γ^T = 0    (8.70)

Thus the rows of Γ lie in the nullspace of N^T, or

  Γ = (null(N^T))^T    (8.71)

As to our example, we see that

          E*   E    X    Y
        [ -1   1    1    0 ]
  N^T = [  0   0   -1    1 ]    (8.72)
        [  1  -1    0   -1 ]

and hence

                E*   E    X    Y
  null(N^T) = [  1   0    1    1 ]    (8.73)
              [  1   1    0    0 ]

Reading across the first row tells us that {E*, X, Y} form a cycle, and the second row tells us that {E*, E} form a cycle.
This can be computed, e.g., in Mathematica as follows:

(* N itself is a Protected symbol in Mathematica, so use a different name *)
NN = {{-1, 0, 1}, {1, 0, -1}, {1, -1, 0}, {0, 1, -1}};
NullSpace[Transpose[NN]]
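For readers without Mathematica, the same nullspace can be computed with a short Gauss-Jordan routine in exact rational arithmetic. This is an illustrative reimplementation, not part of the notes:

```python
from fractions import Fraction

def nullspace(A):
    """Return a basis for the nullspace of A (a list of rows),
    via Gauss-Jordan elimination in exact rational arithmetic."""
    A = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(A), len(A[0])
    pivot_cols, r = [], 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if piv is None:
            continue                      # no pivot here: c is a free column
        A[r], A[piv] = A[piv], A[r]
        A[r] = [x / A[r][c] for x in A[r]]
        for i in range(rows):
            if i != r and A[i][c] != 0:
                A[i] = [a - A[i][c] * b for a, b in zip(A[i], A[r])]
        pivot_cols.append(c)
        r += 1
        if r == rows:
            break
    free_cols = [c for c in range(cols) if c not in pivot_cols]
    basis = []
    for f in free_cols:
        v = [Fraction(0)] * cols
        v[f] = Fraction(1)
        for i, c in enumerate(pivot_cols):
            v[c] = -A[i][f]               # back-substitute the pivot variables
        basis.append(v)
    return basis

N = [[-1, 0, 1], [1, 0, -1], [1, -1, 0], [0, 1, -1]]
NT = [list(col) for col in zip(*N)]       # transpose of N
for vec in nullspace(NT):
    print([int(x) for x in vec])
```

With the columns ordered (E*, E, X, Y), the two basis vectors (1, 1, 0, 0) and (1, 0, 1, 1) recover the conserved cycles {E*, E} and {E*, X, Y} found above.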
In practice, the system of reactions is large (in the thousands rather than just a handful), so the only way to perform this calculation is computationally. For an example of a large system see http://yeast.sourceforge.net, which contains a metabolic network reconstruction of the yeast Saccharomyces cerevisiae.
This is of course only the beginning of the problem. In general we don't even know all of the values of the vi; we only know some of them. The rest we can only bound, with some uncertainty,

  v(i,min) ≤ vi ≤ v(i,max),   i = 1, 2, . . .    (8.74)

and we have some sort of model description of the system (what we think all the reactions are or should be) that is described by a stoichiometry matrix N. The problem thus becomes this: Given

1. a proposed model of the system given by the matrix N;
2. a list of constraints on the velocities given by (8.74);
3. a list of observed fluxes f (i.e., a subset of v that we were actually able to measure in the lab),

then what values of the fluxes v will produce the observed fluxes f?
Mathematically, this problem is typically solved as follows:² maximize f^T v subject to Nv = 0 and the constraints (8.74), where v is the vector of all fluxes and f is a vector of indicators (1s and 0s) that marks which components of v are known. This solution method rests on the underlying assumption that nature will produce an optimal solution, which may not be correct. However, the problem as formulated is solvable because it is a restatement of the basic problem of linear programming, which is described in any textbook on operations research.³ Typical algorithms include the simplex method; more sophisticated techniques rely on nonlinear optimization methods. Linear programming (which has nothing inherently to do with computer programming) is a field of mathematical optimization that was developed originally by Leonid Kantorovich and George Dantzig in the 1930s and 1940s and is extensively utilized in business.
² See Smallbone et al., BMC Systems Biology 4:6 (2010), Towards a genome-scale kinetic model of cellular metabolism.
³ See, for example, Hillier and Lieberman, Introduction to Operations Research, McGraw-Hill.


Topic 9

Vector Spaces
Definition 9.1 A list of length n is a finite ordered collection of n objects, e.g.,

  (x1, x2, . . . , xn)    (9.1)

The expression

  (x1, x2, . . . )    (9.2)

refers to a list of some finite unspecified length (it is not an infinite list). The list of length 0 is written as ().
A list is similar to a set except that the order is critical. For example,

  {1, 3, 4, 2, 3, 5} and {1, 2, 3, 3, 4, 5}    (9.3)

are the same set but

  (1, 3, 4, 2, 3, 5) and (1, 2, 3, 3, 4, 5)    (9.4)

are different lists.

We will sometimes use the terms ordered tuple and list interchangeably.
Definition 9.2 An ordered pair is a list of length 2. An ordered triple is a list of length 3.

Definition 9.3 Two lists are said to be equal if they contain the same elements in the same locations.

We are already familiar with the use of a list to represent a point in space; the list (x, y) represents a point in R2, etc.

2012. Draft of: November 3, 2012


Definition 9.4 The set Fn is defined as the set of lists

  Fn = {(x1, x2, . . . , xn) | x1, x2, . . . , xn ∈ F}    (9.5)

We will use the simplified notation

  x = (x1, x2, . . . , xn)    (9.6)

to represent a list. For example, if x and y are two lists of the same length n then we can define list addition as

  x + y = (x1 + y1, x2 + y2, . . . , xn + yn)    (9.7)

It is not possible to add lists of different lengths, but this will not be a problem since we will in general be interested in using lists to represent points in Fn. For example, if x, y ∈ Fn we can define the point

  z = x + y ∈ Fn    (9.8)

by equation 9.7.
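As a one-line illustration (a sketch, not from the textbook), equation 9.7 translates directly into code:

```python
def list_add(x, y):
    """Componentwise addition of two lists of the same length (equation 9.7)."""
    if len(x) != len(y):
        raise ValueError("cannot add lists of different lengths")
    return tuple(a + b for a, b in zip(x, y))

print(list_add((1, 2, 3), (4, 5, 6)))   # (5, 7, 9)
```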
Definition 9.5 Let F be a field. A vector space V over F is a set V along with two operations, addition and scalar multiplication, with the following properties.¹ Elements of vector spaces are called vectors or points.

1. Closure. ∀v, w ∈ V and ∀c ∈ F,

  v + w ∈ V
  cv ∈ V    (9.9)

2. Commutativity. ∀u, v ∈ V,

  u + v = v + u    (9.10)

3. Associativity. ∀u, v, w ∈ V and ∀a, b ∈ F,

  (u + v) + w = u + (v + w)
  (ab)v = a(bv)    (9.11)

4. Additive Identity. ∃0 ∈ V such that

  v + 0 = 0 + v = v    (9.12)

∀v ∈ V.

5. Additive Inverse. ∀v ∈ V, ∃w ∈ V such that

  v + w = w + v = 0    (9.13)

6. Multiplicative Identity. 1v = v1 = v for all v ∈ V, where 1 is the multiplicative identity in F.

7. Distributive Property. ∀a, b ∈ F and ∀u, v ∈ V,

  a(u + v) = au + av,    (a + b)u = au + bu    (9.14)

¹ Although this looks deceptively like the definition of a field, it is not, because we only require scalar multiplication and not multiplication between elements of the set. Also one should be careful to avoid confusing the terms vector space and vector field, which you might see in your studies. A vector field is actually a function f : Fn → V(F) that associates an element of the vector space V over F with every point in Fn. Vector fields occur frequently in physics.
A vector space over R is called a real vector space. For example, Euclidean 3-space R3 defined by

  R3 = {(x, y, z) | x, y, z ∈ R}    (9.15)

is a real vector space.

A vector space over C is called a complex vector space. For example, the set of all points

  C4 = {(z1, z2, z3, z4) | z1, z2, z3, z4 ∈ C}    (9.16)

is a complex vector space, where each element is a 4-vector with complex values.

Some texts will use the term linear vector space or linear space instead of vector space.
Example 9.1 Let V be the set of all polynomials p(f) : F → F with coefficients in F. The elements of V are then functions, such as

  v = a0 + a1 f + a2 f² + · · · + am f^m    (9.17)

where a0, a1, . . . , am ∈ F. Then V is a vector space over F with the normal definitions of addition and scalar multiplication over F. If

  v = Σ(k=0..m) vk f^k,   w = Σ(k=0..n) wk f^k,   u = Σ(k=0..p) uk f^k    (9.18)


are polynomials over F, then

  v + w = Σ(k=0..m) vk f^k + Σ(k=0..n) wk f^k    (9.19)
        = Σ(k=0..max(m,n)) (vk + wk) f^k ∈ V    (9.20)

where if m < n we define vk = 0 for k > m, and if m > n we define wk = 0 for k > n. This proves closure. Since vk + wk = wk + vk by the commutativity of F, commutativity also follows:

  v + w = Σ(k=0..max(m,n)) (vk + wk) f^k = Σ(k=0..max(m,n)) (wk + vk) f^k = w + v    (9.21)

To see associativity over addition,

  (u + v) + w = Σ(k=0..max(m,n,p)) ((uk + vk) + wk) f^k    (9.22)
              = Σ(k=0..max(m,n,p)) (uk + (vk + wk)) f^k    (9.23)
              = u + (v + w)    (9.24)

where the coefficients uk, vk, wk are all suitably extended to degree max(m, n, p) by setting the missing ones equal to zero in the summation.
The additive identity is the polynomial g(f) = 0, since g(f) + v(f) = v(f) + g(f) = v(f) over F; the additive inverse of v is

  -v = Σ(k=0..m) (-vk) f^k    (9.25)

Multiplying any polynomial by 1 ∈ F returns the original polynomial:

  1v = Σ(k=0..m) (1)(vk) f^k = Σ(k=0..m) vk f^k = v    (9.26)

because 1vk = vk. Finally, we check distributivity. Let a, b ∈ F. Then


  a(u + v) = a (Σ(k=0..p) uk f^k + Σ(k=0..m) vk f^k)    (9.27)
           = a Σ(k=0..p) uk f^k + a Σ(k=0..m) vk f^k = au + av    (9.28)

  (a + b)u = (a + b) Σ(k=0..p) uk f^k    (9.29)
           = a Σ(k=0..p) uk f^k + b Σ(k=0..p) uk f^k = au + bu    (9.30)
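The zero-padding bookkeeping in this example is exactly what one writes in code. Below is an illustrative sketch (not from the notes) with a polynomial stored as its coefficient list, so that v[k] is the coefficient of f^k:

```python
def poly_add(v, w):
    """Add two polynomials given as coefficient lists (equations 9.19-9.20)."""
    n = max(len(v), len(w))
    v = v + [0] * (n - len(v))       # extend v with vk = 0
    w = w + [0] * (n - len(w))       # extend w with wk = 0
    return [a + b for a, b in zip(v, w)]

def poly_scale(a, v):
    """Scalar multiplication of a polynomial by a scalar a."""
    return [a * c for c in v]

v = [1, 0, 2]                        # 1 + 2 f^2
w = [3, 4]                           # 3 + 4 f
print(poly_add(v, w))                  # [4, 4, 2]
print(poly_add(v, poly_scale(-1, v)))  # the additive inverse: [0, 0, 0]
```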

Example 9.2 Let V be the set of all polynomials on F of degree at most n. Then V is a vector space.
Example 9.3 Let V be the set of all 2 × 3 matrices over F:

  v = [ a  b  c ]    (9.31)
      [ d  e  f ]

Then V is a vector space over F with matrix addition as (+) and scalar multiplication of a matrix as (·). For example, the sum of any two 2 × 3 matrices is a 2 × 3 matrix; matrix addition is commutative and associative; the additive inverse of v is the matrix -v and the additive identity is the matrix of all zeros; the multiplicative identity 1 ∈ F acts as the multiplicative identity; and scalar multiplication of matrices distributes over matrix addition.
Example 9.4 Let V be the set of all functions f : F → F. Then V is a vector space.
Theorem 9.6 Let V be a vector space over F. Then V has a unique additive identity.

Proof. Suppose that 0, 0′ ∈ V are both additive identities. Then

  0′ = 0′ + 0 = 0    (9.32)

Hence any additive identity equals 0, proving uniqueness.


Theorem 9.7 Let V be a vector space. Then every element in V has a unique additive inverse, i.e., ∀v ∈ V there is a unique w such that v + w = w + v = 0.

Proof. Let v ∈ V have two additive inverses w and w′. Then since each is an additive inverse,

  w = w + 0 = w + (v + w′) = (w + v) + w′ = 0 + w′ = w′    (9.33)

which proves uniqueness.


Notation: Because, given any v ∈ V, its additive inverse is unique, we write it as -v, i.e.,

  v + (-v) = (-v) + v = 0    (9.34)

Furthermore, given any v, w ∈ V, we define the notation

  w - v = w + (-v)    (9.35)

Notation. Because of associativity we will dispense with parentheses, i.e., given any u, v, w ∈ V and any a, b ∈ F, we define the notations abv and u + v + w to mean the following:

  u + v + w = (u + v) + w = u + (v + w)    (9.36)
  abv = a(bv) = (ab)v    (9.37)

Remark. There are different 0's: sometimes we will use 0 ∈ F and other times 0 ∈ V, and sometimes they will appear in the same equation. We will use the same symbol for both zeroes. Be careful.
Theorem 9.8 Let V be a vector space over F. Then ∀v ∈ V,

  0v = 0    (9.38)

This is one of those cases where we have two different zeroes in the same equation. The 0 on the left is the scalar zero in F, while the 0 on the right is the vector 0 ∈ V.

Proof.

  0v = (0 + 0)v = 0v + 0v  ⟹  0v - 0v = 0v  ⟹  0 = 0v    (9.39)

Theorem 9.9 Let V be a vector space over F, let a ∈ F, and let 0 ∈ V be the additive identity in V. Then

  a0 = 0    (9.40)

Proof.

  a0 = a(0 + 0) = a0 + a0  ⟹  0 = a0 - a0 = a0 + a0 - a0 = a0    (9.41)

Theorem 9.10 Let 1 ∈ F be the multiplicative identity in F and let v ∈ V. Then

  (-1)v = -v    (9.42)

Proof.

  v + (-1)v = (1)v + (-1)v = (1 + (-1))v = 0v = 0    (9.43)

Hence (-1)v is the additive inverse of v, which we call -v. Hence (-1)v = -v.


Topic 10

Subspaces
Definition 10.1 A subset U ⊆ V is called a subspace of V if U is also a vector space. Specifically, the following properties need to hold:

1. Additive identity:

  0 ∈ U    (10.1)

2. Closure under addition:

  ∀u, v ∈ U,   u + v ∈ U    (10.2)

3. Closure under scalar multiplication:

  ∀a ∈ F, ∀u ∈ U,   au ∈ U    (10.3)

For example, V = {(x, y, 0) | x, y ∈ R} is a subspace of R3, and W = {(x, 0) | x ∈ R} is a subspace of R2.
Definition 10.2 Let U1, . . . , Un be subspaces of V. Then the sum of U1, . . . , Un is

  U1 + · · · + Un = {u1 + · · · + un | u1 ∈ U1, . . . , un ∈ Un}    (10.4)

Theorem 10.3 Let U and W be subspaces of V. Then U ∩ W is a subspace of V.

Proof. (a) Since U and W are subspaces of V they each contain 0. Hence their intersection contains 0.

(b) Let x, y ∈ U ∩ W. Then x, y ∈ U and x, y ∈ W. Hence by closure of U, x + y ∈ U, and by closure of W, x + y ∈ W. Hence x + y ∈ U ∩ W.


(c) Let a ∈ F and x ∈ U ∩ W. Hence x ∈ U and x ∈ W. Since U is a subspace, it is closed under scalar multiplication, hence ax ∈ U. Similarly, W is a subspace, so it is also closed under scalar multiplication, hence ax ∈ W. Thus ax ∈ U ∩ W.
Theorem 10.4 If U1, . . . , Un are subspaces of V then U1 + · · · + Un is a subspace of V.

Proof. (exercise.)
Example 10.1 Let V = F3, and

  U = {(x, 0, 0) ∈ F3 | x ∈ F}    (10.5)
  W = {(0, y, 0) ∈ F3 | y ∈ F}    (10.6)

Then U and W are subspaces of V, and

  U + W = {(x, y, 0) ∈ F3 | x, y ∈ F}    (10.7)

is also a subspace of V.
Theorem 10.5 Let U1, . . . , Un be subspaces of V. Then U1 + · · · + Un is the smallest subspace of V that contains all of U1, . . . , Un.

Proof. We need to prove (a) that U1 + · · · + Un contains U1, . . . , Un, and (b) that any subspace that contains U1, . . . , Un also contains U1 + · · · + Un.

To see (a), let u1 ∈ U1. Since 0 ∈ Ui for all i, we can let u2 = 0 ∈ U2, u3 = 0 ∈ U3, . . . , un = 0 ∈ Un. Then

  u1 = u1 + 0 + · · · + 0    (10.8)
     = u1 + u2 + · · · + un ∈ U1 + · · · + Un    (10.9)

Hence U1 ⊆ U1 + · · · + Un. By a similar argument, each of U2, . . . , Un ⊆ U1 + · · · + Un. This proves assertion (a).

To see (b), let W be a subspace of V such that U1 ⊆ W, . . . , Un ⊆ W. Let u1 ∈ U1, . . . , un ∈ Un. By definition of the sum,

  u1 + u2 + · · · + un ∈ U1 + U2 + · · · + Un    (10.10)

But since each Ui ⊆ W, then ui ∈ W. Since W is a vector space, it is closed under addition, hence

  u1 + · · · + un ∈ W    (10.11)

Since this is true for every element of U1 + · · · + Un, then

  U1 + · · · + Un ⊆ W    (10.12)

which proves assertion (b).



Definition 10.6 Let U1, . . . , Un be subspaces of V such that

  V = U1 + · · · + Un    (10.13)

We say that V is the direct sum of U1, . . . , Un if each element of V can be written uniquely as a sum u1 + · · · + un where u1 ∈ U1, . . . , un ∈ Un, and we write

  V = U1 ⊕ U2 ⊕ · · · ⊕ Un    (10.14)
Example 10.2 Let V = R2 . Then V = U W where
U = {(x, 0)|x R}

(10.15)

W = {(0, y)|y R}

(10.16)

V = Fn = {(v1 , v2 , . . . , vn )|vi F}

(10.17)

More generally, if

and

V1 = {(v, 0, . . . , 0)|v F}
V2 = {(0, v, 0, . . . , 0)|v F}
..
.
Vn = {(0, . . . , 0, v)|v F}

(10.18)

Then
V = V1 V2 Vn

(10.19)

Example 10.3 Let V = R3 and suppose that

  U = {(x, y, 0) | x, y ∈ R}
  W = {(0, y, y) | y ∈ R}
  Z = {(0, 0, z) | z ∈ R}    (10.20)

Then U, W, and Z are subspaces of V, and

  V = U + W + Z    (10.21)

but

  V ≠ U ⊕ W ⊕ Z    (10.22)
(1) U is a subspace of V: U contains (0, 0, 0) = 0;

  (x, y, 0) + (x′, y′, 0) = (x + x′, y + y′, 0) ∈ U    (10.23)

hence U is closed under addition; and

  a(x, y, 0) = (ax, ay, 0) ∈ U    (10.24)

hence U is closed under scalar multiplication.

(2) W is a subspace of V: Since y = 0 ∈ R, we know that (0, 0, 0) ∈ W. Since

  (0, y, y) + (0, y′, y′) = (0, y + y′, y + y′) ∈ W    (10.25)

W is closed under addition; and since

  a(0, y, y) = (0, ay, ay) ∈ W    (10.26)

W is closed under scalar multiplication.

(3) Z is a subspace of V: Since z = 0 ∈ R then (0, 0, 0) ∈ Z; since

  (0, 0, z) + (0, 0, z′) = (0, 0, z + z′) ∈ Z    (10.27)

Z is closed under addition; and since

  a(0, 0, z) = (0, 0, az) ∈ Z    (10.28)

it is closed under scalar multiplication.


(4) V = U + W + Z: Two sets are equal if each is a subset of the other. Hence if we can show that (a) V ⊆ U + W + Z and (b) U + W + Z ⊆ V then the two sets must be identical.

To see (a), let (x, y, z) ∈ V. Then

  (x, y, z) = (x, y/2, 0) + (0, y/2, y/2) + (0, 0, z - y/2)    (10.29)

Since (x, y/2, 0) ∈ U, (0, y/2, y/2) ∈ W, and (0, 0, z - y/2) ∈ Z then

  (x, y, z) ∈ U + W + Z    (10.30)

and therefore V ⊆ U + W + Z. Assertion (b) is immediate, since U, W, Z ⊆ V and V is closed under addition.

(5) V ≠ U ⊕ W ⊕ Z: Consider the element (0, 0, 0), which is in each of the subspaces as well as in V. Then

  (0, 0, 0)V = (0, 0, 0)U + (0, 0, 0)W + (0, 0, 0)Z    (10.31)

But we can also write

  (0, 0, 0)V = (0, 17, 0)U + (0, -17, -17)W + (0, 0, 17)Z    (10.32)

This means we can express the vector (0, 0, 0) as two different sums of the form u + w + z, and hence the choice of u, w, z is not unique. Going back to equation 10.29 we see that that sum is also not unique. For example, we could also write

  (x, y, z) = (x, y/4, 0)U + (0, 3y/4, 3y/4)W + (0, 0, z - 3y/4)Z    (10.33)

as well as any number of other combinations! In fact, by the following theorem, we only have to check the zero vector, making sure that it can be formed only from the individual zero vectors.
Theorem 10.7 Let U1, . . . , Un be subspaces of V. Then V = U1 ⊕ · · · ⊕ Un if and only if both of the following are true:

(1) V = U1 + · · · + Un

(2) The only way to write 0 as a sum u1 + · · · + un, with each uj ∈ Uj, is if uj = 0 for all j.
Proof. (⇒) Suppose that

  V = U1 ⊕ · · · ⊕ Un    (10.34)

Then by definition of direct sum,

  V = U1 + · · · + Un    (10.35)

which proves (1). Now suppose that there are ui ∈ Ui, i = 1, . . . , n, such that

  u1 + · · · + un = 0    (10.36)

By the uniqueness part of the definition of direct sums, there must be only one way to choose these ui. Since we know that the choice ui = 0, i = 1, . . . , n works, this choice must be unique. Thus (2) follows.
(⇐) Suppose that (1) and (2) both hold. Let v ∈ V. By (1) we can find u1 ∈ U1, . . . , un ∈ Un so that

  v = u1 + u2 + · · · + un    (10.37)

Suppose that there is some other representation

  v = v1 + v2 + · · · + vn    (10.38)

Then

  0 = v - v = (u1 - v1) + (u2 - v2) + · · · + (un - vn)    (10.39)

where by closure of each Ui we have ui - vi ∈ Ui. Since we have assumed that (2) is true, we must have ui - vi = 0 for all i. Hence ui = vi and therefore the representation is unique.
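Returning to example 10.3, the two decompositions in equations 10.29 and 10.33 are easy to verify numerically. An illustrative sketch (not from the notes; the point (2, 4, 7) is an arbitrary choice):

```python
def add3(u, w, z):
    """Componentwise sum of three vectors in R^3."""
    return tuple(a + b + c for a, b, c in zip(u, w, z))

x, y, z = 2.0, 4.0, 7.0

# the decomposition of equation 10.29: one piece each from U, W, Z
d1 = add3((x, y/2, 0), (0, y/2, y/2), (0, 0, z - y/2))
# the alternative decomposition of equation 10.33
d2 = add3((x, y/4, 0), (0, 3*y/4, 3*y/4), (0, 0, z - 3*y/4))

print(d1, d2)   # both (2.0, 4.0, 7.0): the sum decomposition is not unique
```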


Theorem 10.8 Let V be a vector space with subspaces U and W. Then V = U ⊕ W if and only if V = U + W and U ∩ W = {0}.
Proof. (⇒) Suppose that V = U ⊕ W.

Then V = U + W by the definition of direct sum. Now suppose that v ∈ U ∩ W. Then

  0 = v + (-v)    (10.40)

where v ∈ U and -v ∈ W (since both v and -v are in the intersection).

By the previous theorem, the only way to write 0 is

  0 = 0U + 0W    (10.41)

and this is the only representation of 0 as a sum of vectors in U and W. Hence v = 0U and -v = 0W.

Thus U ∩ W = {0}.

(⇐) Suppose that V = U + W and U ∩ W = {0}. Suppose that there exist u ∈ U and w ∈ W such that

  0 = u + w    (10.42)

But then by equation 10.42,

  -w = u + w - w = u    (10.43)

Since -w ∈ W (W is closed under scalar multiplication) then u ∈ W. Hence u ∈ U ∩ W. But U ∩ W = {0} and therefore u = 0. Since w = -u we also have w = 0. Hence the only way to construct 0 = u + w is with u = w = 0, from which by theorem 10.7 we conclude that V = U ⊕ W.


Topic 11

Polynomials
Definition 11.1 A function p : F → F is called a polynomial (with coefficients) in F if there exist a0, . . . , an ∈ F such that

  p(z) = a0 + a1 z + a2 z² + · · · + an z^n    (11.1)

If an ≠ 0 then we say that the degree of p is n and we write deg(p) = n. If all the coefficients a0 = · · · = an = 0 we say that deg(p) = -∞.

We denote the vector space of all polynomials with coefficients in F as P(F) and the vector space of all polynomials with coefficients in F and degree at most m as Pm(F) (thus, e.g., 3x² + 4 is in P3 as well as P2).

A number λ ∈ F is called a root of p if

  p(λ) = 0    (11.2)

Lemma 11.2 Let λ ∈ F. Then for j = 2, 3, . . . ,

  z^j - λ^j = (z - λ)(z^(j-1) + λ z^(j-2) + · · · + λ^(j-2) z + λ^(j-1))    (11.3)

Proof. (Exercise.)
Theorem 11.3 Let p ∈ P(F) be a polynomial with degree m ≥ 1. Then λ ∈ F is a root of p if and only if there is a polynomial q ∈ P(F) with degree m - 1 such that

  p(z) = (z - λ) q(z)    (11.4)

for all z ∈ F.


Proof. (⇐) Suppose that there exists a q ∈ P(F) such that 11.4 holds. Then

  p(λ) = (λ - λ) q(λ) = 0    (11.5)

hence λ is a root of p.

(⇒) Let λ be a root of p, where

  p(z) = a0 + a1 z + a2 z² + · · · + am z^m    (11.6)

Then

  0 = a0 + a1 λ + a2 λ² + · · · + am λ^m    (11.7)

Subtracting,

  p(z) = a1 (z - λ) + a2 (z² - λ²) + · · · + am (z^m - λ^m)    (11.8)

By the lemma

  z^j - λ^j = (z - λ) q(j-1)(z)    (11.9)

where

  q(j-1)(z) = z^(j-1) + λ z^(j-2) + · · · + λ^(j-2) z + λ^(j-1)    (11.10)

and therefore

  p(z) = a1 (z - λ) + a2 (z - λ) q1(z) + a3 (z - λ) q2(z) + · · · + am (z - λ) q(m-1)(z)    (11.11)
       = (z - λ)(a1 + a2 q1(z) + a3 q2(z) + · · · + am q(m-1)(z))    (11.12)
       = (z - λ) q(z)    (11.13)

where

  q(z) = a1 + a2 q1(z) + a3 q2(z) + · · · + am q(m-1)(z)    (11.14)

is a polynomial of degree m - 1 (because the sum of polynomials of degree at most m - 1 is a polynomial of degree at most m - 1, and because there is a nonzero coefficient on the z^(m-1) term).
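The polynomial q in the proof can be produced concretely by synthetic division. An illustrative sketch (not from the notes), with a polynomial stored as its coefficient list p[k] for z^k:

```python
def divide_out_root(p, lam):
    """Given coefficients of p with p(lam) = 0, return the coefficients of
    q with p(z) = (z - lam) q(z), by synthetic division from the top down."""
    q = [0] * (len(p) - 1)
    carry = 0
    for k in range(len(p) - 1, 0, -1):
        carry = p[k] + lam * carry   # this is the coefficient of z^(k-1) in q
        q[k - 1] = carry
    return q

# p(z) = z^2 - 3z + 2 = (z - 1)(z - 2), so lam = 1 is a root
print(divide_out_root([2, -3, 1], 1))   # [-2, 1], i.e. q(z) = z - 2
```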
Corollary 11.4 Let p ∈ P(F) with deg(p) = m ≥ 0. Then p has at most m distinct roots in F.

Proof. If m = 0, then p(z) = a0 ≠ 0, so p has no roots (and 0 ≤ 0 = m).

If m = 1 then p(z) = a0 + a1 z where a1 ≠ 0, and p has precisely one root, at z = -a0/a1.

Suppose m > 1 and assume that the theorem is true for degree m - 1 (inductive hypothesis).

Either p has a root λ ∈ F or it does not.

If p does not have a root then the number of roots is 0, which is less than m, and the theorem is proven.

Now assume that p does have a root λ. By theorem 11.3 there is a polynomial q(z) with deg(q) = m - 1 such that

  p(z) = (z - λ) q(z)    (11.15)

By the inductive hypothesis, q has at most m - 1 distinct roots. Either λ is one of these roots, in which case p has at most m - 1 distinct roots, or λ is not one of these roots, in which case p has at most m - 1 + 1 = m distinct roots. In either case, p has at most m distinct roots, proving the theorem.
Corollary 11.5 Let a0, . . . , an ∈ F and

  p(z) = a0 + a1 z + a2 z² + · · · + an z^n = 0    (11.16)

for all z ∈ F. Then a0 = a1 = · · · = an = 0.

Proof. Suppose that equation 11.16 holds with at least one aj ≠ 0. Then by corollary 11.4, p(z) has at most n roots. But by equation 11.16 every value of z is a root, which means that p has an infinite number of roots. This is a contradiction. Hence all the aj = 0.
Theorem 11.6 Polynomial Division. Let p(z), q(z) ∈ P(F) with p(z) ≠ 0 (not the zero polynomial). Then there exist polynomials s(z), r(z) ∈ P(F) such that

  q(z) = s(z) p(z) + r(z)    (11.17)

with deg(r) < deg(p).
Proof.¹ Let n = deg(q) and m = deg(p). We need to show that there exist s and r with deg(r) < m.

Case 1. Suppose that n < m (deg(q) < deg(p)). Let s = 0 and r = q(z). Then deg(r(z)) = deg(q(z)) = n < m as required.

Case 2. 0 ≤ m ≤ n. Prove by induction on n.

If n = 0: Since 0 ≤ m ≤ n = 0 we must have m = 0. Hence there exist nonzero constants q0 ∈ F and p0 ∈ F such that

  q(z) = q0,   p(z) = p0    (11.18)

¹ This proof is based on the one given in Friedberg et al., Linear Algebra, 4th edition, Pearson (2003).


We need to find s(z) and r(z) such that

  q0 = p0 s(z) + r(z)    (11.19)

If we choose r(z) = 0 and

  s(z) = q0/p0    (11.20)

then deg(r) = -∞ < 0 = m as required.

If n > 0 (inductive hypothesis): Let n be fixed and assume that the result holds whenever the dividend has degree less than n. Then we can assume that there exist some constants p0, . . . , pm ∈ F and q0, . . . , qn ∈ F, with pm ≠ 0 and qn ≠ 0, such that

  q(z) = q0 + q1 z + q2 z² + · · · + qn z^n    (11.21)

and

  p(z) = p0 + p1 z + p2 z² + · · · + pm z^m    (11.22)

with 0 ≤ m ≤ n. Then define


  h(z) = q(z) - (qn/pm) z^(n-m) p(z)    (11.23)
       = q0 + q1 z + · · · + qn z^n - (qn/pm) z^(n-m) (p0 + p1 z + · · · + pm z^m)    (11.24)

Then since the z^n terms cancel,

  deg(h(z)) < n    (11.25)

Then either deg(h) < m = deg(p) or m ≤ deg(h) < n. In the first instance case 1 applies, and in the second instance the inductive hypothesis applies to h and p, i.e., there exist polynomials s′(z) and r(z) such that

  h(z) = s′(z) p(z) + r(z)    (11.26)

with

  deg(r) < deg(p) = m    (11.27)

Substituting equation 11.26 into equation 11.23 gives

  q(z) - (qn/pm) z^(n-m) p(z) = s′(z) p(z) + r(z)    (11.28)


Solving for q(z),

  q(z) = (qn/pm) z^(n-m) p(z) + s′(z) p(z) + r(z)    (11.29)
       = ((qn/pm) z^(n-m) + s′(z)) p(z) + r(z)    (11.30)
       = s(z) p(z) + r(z)    (11.31)

for some polynomial s(z), as required.
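The induction in this proof is really an algorithm: repeatedly cancel the leading term of the dividend against p, as in equation 11.23. An illustrative sketch (not from the notes), again with coefficient lists (constant term first):

```python
def poly_divmod(q, p):
    """Divide q by p (coefficient lists, constant term first, p nonzero):
    returns (s, r) with q = s*p + r and deg(r) < deg(p)."""
    while len(p) > 1 and p[-1] == 0:    # strip trailing zeros: true degree of p
        p = p[:-1]
    r = list(q)
    m = len(p) - 1                      # deg(p)
    s = [0] * max(len(r) - m, 1)
    while len(r) - 1 >= m and any(r):
        n = len(r) - 1                  # current degree of the remainder
        coeff = r[n] / p[m]             # qn / pm, as in equation 11.23
        s[n - m] = coeff
        for k in range(len(p)):         # subtract coeff * z^(n-m) * p(z)
            r[k + n - m] -= coeff * p[k]
        r = r[:-1]                      # the leading term has cancelled
    return s, r

s, r = poly_divmod([1, 0, 0, 1], [1, 1])   # (z^3 + 1) / (z + 1)
print(s, r)                                # s = z^2 - z + 1, remainder 0
```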


Theorem 11.7 The factorization of theorem 11.6 is unique.
Proof. (Exercise.)
Theorem 11.8 (Fundamental Theorem of Algebra) Every polynomial of degree n over C has precisely n roots (counted with multiplicity).

Descartes proposed the fundamental theorem of algebra in 1637 but did not prove it. Albert Girard had earlier (1629) proposed that an nth order polynomial has n roots but that they may exist in a field larger than the complex numbers. The first published proof of the fundamental theorem of algebra was by d'Alembert in 1746, but his proof was based on an earlier theorem that itself used the theorem, and hence is circular. At about the same time Euler proved it for polynomials with real coefficients up to 6th order. Between 1799 (in his doctoral dissertation) and 1816 Gauss published three different proofs for polynomials with real coefficients, and in 1849 he proved the general case for polynomials with complex coefficients.
Theorem 11.9 Let p(z) be a polynomial with real coefficients. Then if λ ∈ C is a root of p then λ̄ is also a root of p.

Proof. Define

  p(z) = a0 + a1 z + · · · + an z^n    (11.32)

Since λ is a root,

  0 = a0 + a1 λ + · · · + an λ^n    (11.33)

Taking the complex conjugate of this equation proves the theorem:

  0 = 0̄ = ā0 + ā1 λ̄ + · · · + ān λ̄^n    (11.34)
        = a0 + a1 λ̄ + · · · + an λ̄^n    (11.35)

where the last line follows from the fact that all the coefficients are real, hence āi = ai. Thus λ̄ is a root of p.
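Python's complex type makes this easy to verify on an example (an illustrative sketch, not from the notes):

```python
def horner(coeffs, z):
    """Evaluate a polynomial, given constant-term-first coefficients, at z."""
    result = 0
    for c in reversed(coeffs):
        result = result * z + c
    return result

# p(z) = z^2 - 2z + 5 has real coefficients and root 1 + 2i
p = [5, -2, 1]
print(horner(p, 1 + 2j), horner(p, 1 - 2j))   # both evaluate to 0
```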

Theorem 11.10 Let β, γ ∈ R. Then there exist λ1, λ2 ∈ C such that

  z² + βz + γ = (z - λ1)(z - λ2)    (11.36)

and λ1, λ2 ∈ R if and only if β² ≥ 4γ.

Proof. We complete the square in the quadratic,

  z² + βz + γ = z² + βz + (β/2)² - (β/2)² + γ    (11.37)
              = (z + β/2)² + γ - β²/4    (11.38)

If β² ≥ 4γ, then we define c ∈ R by

  c² = β²/4 - γ    (11.39)

and therefore

  z² + βz + γ = (z + β/2)² - c²    (11.40)
              = (z + β/2 - c)(z + β/2 + c)    (11.41)

hence

  λ1,2 = (-β ± √(β² - 4γ))/2    (11.42)

which is the familiar quadratic formula.

If β² < 4γ then the right hand side of equation 11.38 is always positive, and hence there is no real value of z that gives p(z) = 0. Hence there can be no real roots; if any root exists, it must be complex. We can solve for these two roots using the quadratic formula:

  λ1,2 = (-β ± i√(4γ - β²))/2    (11.43)

(substitution proves that these are roots). This is a complex conjugate pair.
pair.
Theorem 11.11 If p ∈ P(C) is a non-constant polynomial then it has a unique factorization

  p(z) = c(z - λ1)(z - λ2) · · · (z - λn)    (11.44)

where c, λ1, . . . , λn ∈ C and λ1, . . . , λn are the roots (note that some of the roots might be repeated).

If some of the roots are complex, then let m be the number of real roots and k = n - m be the number of complex roots, where k is even because the complex roots come in conjugate pairs. Then the complex roots can be written as

  λ(m+2j-1) = aj + i bj,   λ(m+2j) = aj - i bj,   j = 1, . . . , k/2    (11.45)

for some real numbers a1, b1, . . . , a(k/2), b(k/2). Then

  p(x) = c(x - λ1)(x - λ2) · · · (x - λm) [(x - λ(m+1))(x - λ(m+2))] · · · [(x - λ(m+k-1))(x - λ(m+k))]    (11.46)
       = c(x - λ1)(x - λ2) · · · (x - λm)(x² + β1 x + γ1) · · · (x² + β(k/2) x + γ(k/2))    (11.47)

where

  x² + βj x + γj = (x - λ(m+2j-1))(x - λ(m+2j))    (11.48)
                 = (x - aj - i bj)(x - aj + i bj)    (11.49)
                 = (x - aj)² + bj²    (11.50)
                 = x² - 2 aj x + aj² + bj²    (11.51)

Thus βj = -2 aj and γj = aj² + bj² = (βj/2)² + bj². Hence we have the following result.
Theorem 11.12 Let p(x) be a non-constant polynomial of order n with real coefficients. Then p(x) may be uniquely factored as

  p(x) = c(x - λ1)(x - λ2) · · · (x - λm)(x² + β1 x + γ1) · · · (x² + βk x + γk)    (11.52)

where n = m + 2k, λ1, . . . , λm ∈ R, βi, γi ∈ R, and βi² < 4γi. If k > 0 then the roots of the jth quadratic factor are the complex conjugate pair

  λ(j,1), λ(j,2) = -βj/2 ± i √(γj - (βj/2)²)    (11.53)


Topic 12

Span and Linear Independence
Definition 12.1 Let V be a vector space over F and let s = (v1, v2, . . . , vn) be a list of vectors in V. Then a linear combination of s is any vector of the form

  v = a1 v1 + a2 v2 + · · · + an vn    (12.1)

where a1, . . . , an ∈ F.

The span of (v1, . . . , vn) is the set of all linear combinations of (v1, . . . , vn):

  span(v1, . . . , vn) = {a1 v1 + · · · + an vn | a1, . . . , an ∈ F}    (12.2)

We define the span of the empty list as span() = {0}.

We say (v1, . . . , vn) spans V if V = span(v1, . . . , vn).

A vector space is finite dimensional if V = span(v1, . . . , vn) for some v1, . . . , vn ∈ V.¹ A vector space that is not finite dimensional is called infinite dimensional.
Example 12.1 span(1, z, z², . . . , z^m) = Pm(F).

Example 12.2 P(F) is infinite dimensional.
Theorem 12.2 The span of any list of vectors in V is a subspace of V.

Proof. (Exercise.)

¹ Recall that a list has a finite length, by definition.


Theorem 12.3 span(v1, . . . , vn) is the smallest subspace of V that contains all the vectors in the list (v1, . . . , vn).

Proof. (Exercise.)

Theorem 12.4 Pm(F) is a subspace of P(F).

Proof. (Exercise.)
Definition 12.5 A list of vectors (v1, . . . , vn) in V is said to be linearly independent if

  a1 v1 + a2 v2 + · · · + an vn = 0  ⟹  a1 = a2 = · · · = an = 0    (12.3)

for ai ∈ F, and is called linearly dependent if there exist a1, . . . , an, not all zero, such that

  a1 v1 + a2 v2 + · · · + an vn = 0    (12.4)
Example 12.3 The list ((1, 5), (2, 2)) is linearly independent because the only solution to

  a(1, 5) + b(2, 2) = 0    (12.5)

is a = b = 0. To see this, suppose that there is a solution; then

  a + 2b = 0
  5a + 2b = 0    (12.6)

Subtracting the first equation from the second gives 4a = 0 ⟹ a = 0; hence (from either equation), b = 0.
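For a pair of vectors in R², the computation in example 12.3 amounts to checking that a 2 × 2 determinant is nonzero; a sketch (illustrative, not from the notes):

```python
def independent_2d(u, v):
    """Two vectors in R^2 are linearly independent iff det([u v]) != 0."""
    return u[0] * v[1] - u[1] * v[0] != 0

print(independent_2d((1, 5), (2, 2)))   # True, as in example 12.3
print(independent_2d((1, 2), (2, 4)))   # False, since (2, 4) = 2 (1, 2)
```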
Example 12.4 The list ((1, 2), (1, -3), (0, 1)) is linearly dependent because

  (1)(1, 2) + (-1)(1, -3) + (-5)(0, 1) = 0    (12.7)

Theorem 12.6 Linear Dependence Lemma. Let (v1, . . . , vn) be linearly dependent in V with v1 ≠ 0. Then for some integer j, 2 ≤ j ≤ n:

(a) vj ∈ span(v1, . . . , v(j-1))

(b) If the jth term is removed from (v1, . . . , vn) then the span of the remaining list equals span(v1, . . . , vn).
Proof. (a) Let (v1, . . . , vn) be linearly dependent with v1 ≠ 0. Then there exist a1, . . . , an, not all 0, such that

  0 = a1 v1 + a2 v2 + · · · + an vn    (12.8)

Let j be the largest integer 2 ≤ j ≤ n such that aj ≠ 0. At least one aj with j > 1 must be nonzero, for otherwise a1 v1 = 0 with v1 ≠ 0 would force a1 = 0, contradicting the fact that not all the ai are zero. Then, since a(j+1) = a(j+2) = · · · = an = 0,

  0 = a1 v1 + a2 v2 + · · · + aj vj    (12.9)

hence (since aj ≠ 0),

  vj = -(a1/aj) v1 - · · · - (a(j-1)/aj) v(j-1)    (12.10)

This proves (a).

To prove (b), let u ∈ span(v1, . . . , vn). Then for some numbers c1, . . . , cn ∈ F,

  u = c1 v1 + · · · + cj vj + · · · + cn vn    (12.11)
    = c1 v1 + · · · + cj (-(a1/aj) v1 - · · · - (a(j-1)/aj) v(j-1)) + · · · + cn vn    (12.12)

which does not depend on vj, proving (b).


Theorem 12.7 Let V be a finite-dimensional vector space. Then the length of every linearly independent list of vectors in V is less than or equal to the length of every spanning list of vectors in V.
Proof. Let (u1 , . . . , um ) be linearly independant in V and let (w1 , . . . , wn )
be any spanning list of vectors of V. We need to show that m n.
Since (w1, . . . , wn) spans V, there are some numbers a1, . . . , an such
that
    u1 = a1 w1 + ··· + an wn    (12.13)
hence
    0 = (−1)u1 + a1 w1 + ··· + an wn    (12.14)
which tells us that the list
    b1 = (u1, w1, w2, . . . , wn)    (12.15)
is linearly dependent.
By the linear dependence lemma, we can remove one of the wi so that
the remaining elements in 12.15 still span V. Call the vector removed wi1,
and define
    B1 = (u1, w1, w2, . . . , wn | wi1)    (12.16)
where by (a, b, . . . | p, q, . . .) we mean the list (a, b, . . .) with the elements
(p, q, . . .) removed.
Since B1 spans V, if we add any vector to it from V, it becomes linearly
dependent. For example
    b2 = (u1, u2, w1, w2, . . . , wn | wi1)    (12.17)
is linearly dependent. Applying the linear dependence lemma a second time,
we can remove one of the elements and the list still spans V. Since the ui
are linearly independent, it must be one of the wj, call it wi2. So the list
    B2 = (u1, u2, w1, . . . , wn | wi1, wi2)    (12.18)
spans V.
We keep repeating this process. In each step, we add one uk and remove
one wj. If at some point we do not have any w's to remove, this must
mean that we have created a list that contains only u's but is linearly
dependent. This is a contradiction. Hence there must always be at least one
w to remove. Thus there must be at least as many w's as there are u's.
Hence m ≤ n.
Example 12.5 Let V = R² and define the vectors
    a = (1, 0)    (12.19)
    b = (0, 1)    (12.20)
    c = (1, 1)    (12.21)
and the list
    s = (a, b, c)    (12.22)
Then s spans V. To see that, let v ∈ R² be any vector (x, y) ∈ R². To
prove s spans V we must find numbers α, β, γ such that
    v = (x, y) = αa + βb + γc    (12.23)
              = α(1, 0) + β(0, 1) + γ(1, 1)    (12.24)
              = (α + γ, β + γ)    (12.25)
For example, given any x, y ∈ R we can take α = x, β = y, and γ = 0,
    v = (x, y) = x(1, 0) + y(0, 1) = xa + yb    (12.26)
thus s spans V. However, the representation is not unique. For example,
we can write the vector (3, 7) in many different ways in terms of s, two of
which are
    (3, 7) = 3(1, 0) + 7(0, 1) + 0(1, 1)    (12.27)
           = 2(1, 0) + 6(0, 1) + 1(1, 1)    (12.28)
Thus the expansion is not unique.
Theorem 12.7 tells us that since the length of s is 3, every linearly
independent list in R² has length at most 3.
In fact, we can see that s is not linearly independent because we can write
    c = a + b    (12.29)
and as we shall see in the next chapter, this means that while s spans
V, it is not a basis of V.
In fact, we can remove c from s, and still have a list s′ = (a, b) that spans
V. Since the length of s′ is 2, every linearly independent list
in V has length at most 2. In fact, since s′ is linearly independent and
spans V, it is a basis of V (a basis is a linearly independent spanning list of
vectors).
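The two expansions of (3, 7) in equations 12.27 and 12.28 can be verified directly. A small Python sketch (illustrative only; the helper name is ours):

```python
def combo(coeffs, vectors):
    """Form the linear combination sum_i coeffs[i] * vectors[i] in R^2."""
    x = sum(c * v[0] for c, v in zip(coeffs, vectors))
    y = sum(c * v[1] for c, v in zip(coeffs, vectors))
    return (x, y)

s = [(1, 0), (0, 1), (1, 1)]          # the list s = (a, b, c)
print(combo((3, 7, 0), s))            # one representation of (3, 7)
print(combo((2, 6, 1), s))            # a second, different representation
```

Both calls return (3, 7), exhibiting two distinct coefficient lists for the same vector, which is exactly why s fails to be a basis.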
Theorem 12.8 Every subspace of a finite-dimensional vector space is finite
dimensional.
Proof. Let V be finite dimensional, and let U be a subspace of V. Since V
is finite dimensional, for some m there exist w1, . . . , wm ∈ V such that
    L = (w1, . . . , wm)    (12.30)
spans V, i.e., V = span(L).
If U = {0} then we are done.
If U ≠ {0} then there exists at least one v1 ∈ U such that v1 ≠ 0. Define
    B = (v1)    (12.31)
Either U = span(B) or U ≠ span(B).
If U = span(B) then U is finite dimensional (because it is spanned by a list
of length one) and the proof is complete.
If U ≠ span(B), then there is some v2 ∈ U such that
    v2 ∉ span(B)    (12.32)
Append this vector to B so that
    B = (v1, v2)    (12.33)
We then keep repeating this process. If at any point U = span(B) (where
the length of B is some finite integer j; in (12.33), j = 2), then, since U is
spanned by a list of finite length, we conclude that U is finite dimensional,
and the proof is finished.
If after j steps we have U ≠ span(B), there is some vector in U that is
not in span(B) that we can append to B, and we continue the iteration.
After n steps we will have n vectors in B:
    B = (v1, v2, . . . , vn)    (12.34)
Either the process stops with n < m or it does not. If it stops, we are done,
because the length of B is finite.
If it does not, then eventually B will have length n = m.
It is not possible to find any other vector u ∈ U at this point such that
    B′ = (v1, . . . , vm, u)    (12.35)
is linearly independent.
To see this, suppose that there is such a vector, i.e., that B′ is linearly
independent.
Since the length of B′ is m + 1 > m, this means we have found a linearly
independent list longer than a spanning list (L, which has length m).
This contradicts theorem 12.7. Hence no such vector u exists.
Thus the longest possible list that spans U has m elements; since m is finite,
U is finite dimensional, as required.


Topic 13

Bases and Dimension


Definition 13.1 A basis of V is a list of linearly independent vectors in V
that spans V.
The standard basis in Fⁿ is
    ((1, 0, . . . , 0), (0, 1, 0, . . . , 0), (0, 0, 1, 0, . . . , 0), . . . , (0, . . . , 0, 1))    (13.1)
Example 13.1 (1, z, z², . . . , z^m) is a basis of Pm.
Theorem 13.2 Let v1, . . . , vn ∈ V(F). Then (v1, . . . , vn) is a basis of V if
and only if every v ∈ V can be written uniquely in the form
    v = a1 v1 + ··· + an vn    (13.2)
for some a1, . . . , an ∈ F.
Proof. ( ⇒ ) Suppose that B = (v1, . . . , vn) is a basis of V.
Let v ∈ V. Then because B spans V there exist some a1, . . . , an ∈ F such
that
    v = a1 v1 + ··· + an vn    (13.3)
We must show that the numbers a1, . . . , an are unique. To do this, suppose
that there is a second set of numbers b1, . . . , bn ∈ F such that
    v = b1 v1 + ··· + bn vn    (13.4)
Subtracting,
    0 = (a1 − b1)v1 + (a2 − b2)v2 + ··· + (an − bn)vn    (13.5)
Since B is a basis, it is linearly independent. Hence every coefficient in
equation 13.5 must be zero, i.e., ai = bi, i = 1, . . . , n. Thus the representation
is unique.
( ⇐ ) Suppose that every v ∈ V can be written uniquely in the form of
equation 13.3.
Then by definition of a spanning list, (v1, . . . , vn) spans V. To show that
B is a basis of V we need to also show that it is linearly independent.
Suppose that there exist b1, . . . , bn ∈ F such that
    0 = b1 v1 + ··· + bn vn    (13.6)
We also know that
    0 = (0)v1 + ··· + (0)vn    (13.7)
is a representation of the vector 0. By the uniqueness assumption, 0 has
only one such representation, so bi = 0, i = 1, . . . , n.
Thus the list B is linearly independent, and hence a basis of V.
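For a concrete instance of Theorem 13.2, the unique coordinates of a vector in a basis of R² can be computed by solving a 2×2 linear system; Cramer's rule gives them in closed form. A Python sketch (the helper `coords2` is ours, not from the text), using the basis of Example 12.3:

```python
def coords2(u, v, w):
    """Coordinates (a, b) of w in the basis (u, v) of R^2, by Cramer's rule."""
    det = u[0] * v[1] - u[1] * v[0]
    if det == 0:
        raise ValueError("(u, v) is not a basis")
    a = (w[0] * v[1] - w[1] * v[0]) / det
    b = (u[0] * w[1] - u[1] * w[0]) / det
    return a, b

a, b = coords2((1, 5), (2, 2), (3, 7))   # the basis from Example 12.3
print(a, b)                              # the unique coefficients
```

Here w = a(1, 5) + b(2, 2) has exactly one solution because the basis vectors are independent (det ≠ 0), matching the uniqueness half of the theorem.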
Theorem 13.3 Let S be a list of vectors that spans V. Then either S is a
basis of V or it can be reduced to a basis of V by removing some elements
of S.
Proof. Let S = (v1, . . . , vn) span V. Let B = S.
For each element in B, starting with i = 1:
(1) If vi = 0, remove it from B and re-index B.
(2) If vi ∈ span(v1, . . . , vi−1), remove vi from B and re-index.
At each step, we only remove a vector if it is in the span of the vectors to
the left of it in B. Hence what remains still spans V.
Let m = length(B) after the process is finished.
Since no remaining vector is in the span of the vectors to the left of it, no
vi can be written as a linear combination of (v1, . . . , vi−1); by the linear
dependence lemma (Theorem 12.6), (v1, . . . , vm) is therefore linearly
independent.
Hence B spans V and is linearly independent, hence it is a basis of V.
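The removal procedure in the proof of Theorem 13.3 is effectively an algorithm, and can be sketched in Python: keep a vector only when it enlarges the span, which we detect with an exact rank computation over the rationals. (The helper names are ours; this is an illustration, not part of the text.)

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of row vectors) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def reduce_to_basis(vectors):
    """Theorem 13.3: drop each vector lying in the span of those kept so far."""
    basis = []
    for v in vectors:
        if rank(basis + [v]) > rank(basis):   # v is outside span(basis)
            basis.append(v)
    return basis

print(reduce_to_basis([(1, 0), (0, 1), (1, 1)]))   # (1,1) = (1,0)+(0,1) is dropped
```

Applied to the list s of Example 12.5 this recovers exactly the basis s′ = (a, b).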
Theorem 13.4 Every finite dimensional vector space has a basis.
Proof. Let V be finite dimensional. Then it has a spanning list. This list
can be reduced to a basis by theorem 13.3. Thus V has a basis.

Theorem 13.5 Let S = (v1, . . . , vm) be a linearly independent list of
vectors in a finite-dimensional vector space V. Then S can be extended to
a basis of V (by adding additional elements from V to S).
Proof. Let W = (w1, . . . , wn) be any list of vectors that spans V.
If w1 ∈ span(S), let B = S. Otherwise, let B = (v1, . . . , vm, w1).
For each j = 2, . . . , n, repeat this process: if wj ∈ span(B), then ignore
wj; otherwise append wj to B.
At each step, B is still linearly independent, because we only ever add a
vector that is not in the previous span of B.
After the n-th step, every wi is either in B or in the span of B. Thus,
writing B = (b1, b2, . . .), every wi ∈ span(B):
    wi = Σ_k ci,k bk    (13.8)
for some scalars ci,k ∈ F.
Since W spans V, every vector v ∈ V can be expressed as a linear
combination of the wi, i.e., for some a1, a2, . . .,
    v = Σ_i ai wi    (13.9)
Since every wi ∈ span(B), every wi can be expressed as a linear
combination of B (eq. (13.8)). Substituting,
    v = Σ_i ai Σ_k ci,k bk = Σ_{i,k} ai ci,k bk    (13.10)
Hence every vector v ∈ V can be expressed as a linear combination of B.
Hence B spans V.
Since B spans V and B is linearly independent, it is a basis of V.
Theorem 13.6 Let V be a finite-dimensional vector space, and let U be a
subspace of V. Then there is a subspace W of V such that V = U ⊕ W.
Proof. Since any subspace of a finite-dimensional vector space is finite-dimensional,
U is finite-dimensional (Theorem 12.8).
Since every finite-dimensional vector space has a basis, U has a basis
B = (u1, . . . , um) (Theorem 13.4).
Since B is a linearly independent list of vectors in V, it can be extended to
a basis B′ = (u1, . . . , um, w1, . . . , wn) of V (Theorem 13.5).
Let W = span(w1, . . . , wn). Then W is a subspace of V.
To prove that V = U ⊕ W we must show that (a) V = U + W; and (b)
U ∩ W = {0}.
To prove (a): let v ∈ V. Since B′ spans V there exist some numbers
a1, . . . , am, b1, . . . , bn such that
    v = a1 u1 + ··· + am um + b1 w1 + ··· + bn wn = u + w    (13.11)
where
    u = a1 u1 + ··· + am um ∈ U    (13.12)
    w = b1 w1 + ··· + bn wn ∈ W    (13.13)
Hence v ∈ U + W, which proves (a).
To prove (b): let v ∈ U ∩ W. Then v ∈ U and v ∈ W (by the definition of
intersection). Hence there are some numbers a1, . . . , am such that
    v = a1 u1 + ··· + am um    (13.14)
and some numbers b1, . . . , bn such that
    v = b1 w1 + ··· + bn wn    (13.15)
Setting the two expressions for v equal to one another,
    a1 u1 + ··· + am um = b1 w1 + ··· + bn wn    (13.16)
By rearrangement,
    a1 u1 + ··· + am um − b1 w1 − ··· − bn wn = 0    (13.17)
which means that a linear combination of B′ is equal to zero. Since B′ is a
basis and is hence linearly independent, the only way this is possible
is if all the coefficients are zero, namely
    a1 = ··· = am = b1 = ··· = bn = 0    (13.18)
Thus (by substitution of either all the ai or all the bi),
    v = 0    (13.19)
But we chose v as an arbitrary element of U ∩ W. Hence U ∩ W = {0},
which proves (b).
Since V = U + W and U ∩ W = {0}, we conclude by theorem 10.8 that
V = U ⊕ W.

Theorem 13.7 Let V be any finite-dimensional vector space, and let B1
and B2 be any two bases of V. Then
    length(B1) = length(B2)    (13.20)
Proof. Since B1 is linearly independent and B2 spans V, we have by
theorem 12.7
    length(B1) ≤ length(B2)    (13.21)
Similarly, since B2 is linearly independent and B1 spans V,
    length(B2) ≤ length(B1)    (13.22)
Equation 13.20 follows.


Definition 13.8 The dimension of a finite-dimensional vector space V,
denoted by dim(V), is the length of any basis of V. More specifically, if B
is any basis of V, then
    dim(V) = length(B)    (13.23)
Theorem 13.9 Let V be any finite-dimensional vector space and let U be
any subspace of V. Then
    dim(U) ≤ dim(V)    (13.24)
Proof. Let B be a basis of U. Then B is a linearly independent list in V.
By theorem 13.5, B can be extended to a basis of V. Let B′ be any basis
of V obtained by extending B. Then
    dim(U) = length(B) ≤ length(B′) = dim(V)    (13.25)

Theorem 13.10 Let V be a finite-dimensional vector space, and let B be a
list of vectors in V such that (a) span(B) = V and (b) length(B) = dim(V).
Then B is a basis of V.
Proof. Since span(B) = V, either B is already a basis of V or B can be
reduced to a basis of V (by theorem 13.3). But since every basis of V has
length dim(V), no elements of B can be removed, else the basis produced
by removing elements would be shorter than dim(V). Hence B is already
a basis of V.
Theorem 13.11 Let V be a finite-dimensional vector space, and let B be
a linearly independent list of vectors in V such that length(B) = dim(V).
Then B is a basis of V.
Proof. Since B is linearly independent, it can be extended to a basis of V
by theorem 13.5. Let B′ be any such basis.
Since every basis has length dim(V),
    length(B′) = dim(V) = length(B)    (13.26)
Hence it is not necessary to add any vectors to B to make it a basis. Hence
B is already a basis of V.
Theorem 13.12 Let V be a finite-dimensional vector space, and let U and
W be subspaces of V. Then
    dim(U + W) = dim(U) + dim(W) − dim(U ∩ W)    (13.27)
Proof. Let
    B = (v1, . . . , vm)    (13.28)
be a basis of U ∩ W. Hence
    dim(U ∩ W) = m    (13.29)
By theorem 13.5, B can be extended to a basis BU of U,
    BU = (v1, . . . , vm, u1, . . . , uj)    (13.30)
and to a basis BW of W,
    BW = (v1, . . . , vm, w1, . . . , wk)    (13.31)
so that
    dim(U) = m + j
    dim(W) = m + k    (13.32)
for some integers m, j, and k. Let
    B′ = (v1, . . . , vm, u1, . . . , uj, w1, . . . , wk)    (13.33)
Then U ⊆ span(B′) and W ⊆ span(B′), hence U + W ⊆ span(B′).
Furthermore, B′ is linearly independent. [To see this, suppose that there
exist scalars a1, . . . , am, b1, . . . , bj, c1, . . . , ck such that
    a1 v1 + ··· + am vm + b1 u1 + ··· + bj uj + c1 w1 + ··· + ck wk = 0    (13.34)
By rearrangement,
    a1 v1 + ··· + am vm + b1 u1 + ··· + bj uj = −c1 w1 − ··· − ck wk    (13.35)
The left hand side is a linear combination of BU and hence lies in U; since
U is a subspace, its negative lies in U as well. Thus
    w = c1 w1 + ··· + ck wk ∈ U    (13.36)
But all the wi ∈ W; hence w ∈ U ∩ W.
Thus there exist scalars d1, . . . , dm such that
    w = c1 w1 + ··· + ck wk = d1 v1 + ··· + dm vm    (13.37)
because B = (v1, . . . , vm) is a basis of U ∩ W. Rearranging,
    c1 w1 + ··· + ck wk − d1 v1 − ··· − dm vm = 0    (13.38)
Since BW = (v1, . . . , vm, w1, . . . , wk) is a basis of W, it is a linearly
independent set. Hence
    c1 = ··· = ck = d1 = ··· = dm = 0    (13.39)
Substituting back into equation 13.34,
    a1 v1 + ··· + am vm + b1 u1 + ··· + bj uj = 0    (13.40)
Now since BU = (v1, . . . , vm, u1, . . . , uj) is a basis of U, it is a linearly
independent set, which tells us that
    a1 = ··· = am = b1 = ··· = bj = 0    (13.41)
Hence there is no linear combination of B′ that gives the zero vector except
for the one in which all the coefficients are zero. This means that B′ is a
linearly independent list.]
Since B′ is linearly independent and spans U + W, it is a basis of U + W.
Hence
    dim(U + W) = length(B′)    (13.42)
                = m + j + k    (13.43)
                = (m + j) + (m + k) − m    (13.44)
                = dim(U) + dim(W) − dim(U ∩ W)    (13.45)
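Theorem 13.12 can be checked on a concrete pair of subspaces of R³ whose intersection is known by inspection: the xy-plane and the yz-plane meet in the y-axis, which has dimension 1. A Python sketch (illustrative only; rank is computed by exact Gaussian elimination, and the spanning list of U + W is simply the concatenation of the two spanning lists):

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of row vectors) via exact Gaussian elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

U = [(1, 0, 0), (0, 1, 0)]    # the xy-plane in R^3
W = [(0, 1, 0), (0, 0, 1)]    # the yz-plane; the intersection is the y-axis, dim 1
dim_U, dim_W = rank(U), rank(W)
dim_sum = rank(U + W)         # concatenating the bases spans U + W
print(dim_sum == dim_U + dim_W - 1)   # 3 == 2 + 2 - 1, as in (13.27)
```

Note the intersection dimension (1) is supplied by inspection here; computing it from ranks alone would just restate the theorem.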


Theorem 13.13 Let V be a finite-dimensional vector space, and suppose
that U1, . . . , Um are subspaces of V such that
    V = U1 + ··· + Um    (13.46)
and
    dim(V) = dim(U1) + ··· + dim(Um)    (13.47)
Then
    V = U1 ⊕ ··· ⊕ Um    (13.48)
Proof. Define bases B1, . . . , Bm for each of the Ui. Let¹
    B = (B1, B2, . . . , Bm)    (13.49)
Then B spans V (by equation 13.46), and
    length(B) = length(B1) + ··· + length(Bm)    (13.50)
              = dim(U1) + ··· + dim(Um)    (13.51)
              = dim(V)    (13.52)
by equation 13.47. Hence B is a basis of V (theorem 13.10).
Let u1 ∈ U1, . . . , um ∈ Um be chosen such that
    0 = u1 + ··· + um    (13.53)
We can express each ui in terms of the basis vectors of Ui. Suppose that
Bi = (vi,1, vi,2, . . . , vi,dim(Ui)). Then for some scalars ai,1, . . . , ai,dim(Ui),
    ui = Σ_{k=1}^{dim(Ui)} ai,k vi,k    (13.54)
Hence
    0 = Σ_{k=1}^{dim(U1)} a1,k v1,k + Σ_{k=1}^{dim(U2)} a2,k v2,k + ··· + Σ_{k=1}^{dim(Um)} am,k vm,k    (13.55)
But the right hand side is a linear combination of elements of B, which is
a basis of V. Hence every coefficient is zero, which means every ui = 0 in
equation 13.53. Thus the only way to write zero as a sum of vectors in each
of the Ui is if each ui is zero. By theorem 10.7 this means that
    V = U1 ⊕ ··· ⊕ Um    (13.56)
¹By ((a1, a2, . . .), (b1, b2, . . .)) we will mean (a1, a2, . . . , b1, b2, . . .).

Topic 14

Linear Maps
Definition 14.1 Let V, W be vector spaces over a field F. Then a linear
map from V to W is a function T : V → W with the properties:
(1) Additivity: for all u, v ∈ V,
    T(u + v) = T(u) + T(v)    (14.1)
(2) Homogeneity: for all a ∈ F and for all v ∈ V,
    T(av) = aT(v)    (14.2)
It is common notation to omit the parentheses when expressing maps, writing
T(v) as Tv. The reason for this will become clear when we study the
matrix representation of linear maps.
The properties of additivity and homogeneity can be combined into a single
linearity property expressed as
    T(au + bv) = aTu + bTv    (14.3)
where a, b ∈ F and u, v ∈ V.
Definition 14.2 The set of all linear maps from V to W is denoted
by L(V, W).
Definition 14.3 Let V and W be vector spaces and let T ∈ L(V, W) be
a linear map T : V → W. The range of T is the subset of W whose
elements are mapped to by T:
    range(T) = {w ∈ W | w = Tv, v ∈ V}    (14.4)
Example 14.1 Define T : V → V by Tv = 0; here T ∈ L(V, V). The range
of T is {0}.
Example 14.2 Define T : V → V by Tv = v. This is called the Identity
Map and has the special symbol I reserved for it:
    Iv = v    (14.5)
The range of I is V.
Example 14.3 Differentiation: Define D ∈ L(P(R), P(R)) by Dp = p′,
where p′(x) = dp/dx.
Example 14.4 Integration: Define I ∈ L(P(R), R) by
    Ip = ∫₀¹ p(x) dx    (14.6)
(the integral must be taken over some fixed interval for I to map into R;
[0, 1] is used here).

Remark 14.4 Let V and W be vector spaces, let T ∈ L(V, W) be a linear
map, and let B = (v1, . . . , vn) be a basis of V. Then Tv is completely
determined by the values of Tvi, because for every v ∈ V there exist
ai, i = 1, . . . , n, such that
    v = a1 v1 + a2 v2 + ··· + an vn    (14.7)
Hence by linearity
    Tv = a1 Tv1 + a2 Tv2 + ··· + an Tvn    (14.8)
Remark 14.5 Linear maps can be made to take on arbitrary values on a
basis. Let B = (v1, . . . , vn) be a basis of V and let w1, . . . , wn ∈ W. Then
we can construct a linear map such that Tv1 = w1, Tv2 = w2, . . . ,
Tvn = wn by
    T(a1 v1 + ··· + an vn) = a1 w1 + ··· + an wn    (14.9)
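Remark 14.5 is constructive, and for V = Fⁿ with the standard basis the construction is easy to carry out in code: equation 14.9 becomes "sum the prescribed images, weighted by the coordinates". A Python sketch (the helper name and the target vectors are our own choices):

```python
def make_linear_map(images):
    """Build T with T(e_i) = images[i] on the standard basis (Remark 14.5)."""
    def T(coeffs):
        n = len(images[0])
        out = [0] * n
        for a, w in zip(coeffs, images):          # equation (14.9)
            out = [o + a * wi for o, wi in zip(out, w)]
        return tuple(out)
    return T

# T sends e1 -> (1, 1) and e2 -> (0, 2); these target vectors are arbitrary.
T = make_linear_map([(1, 1), (0, 2)])
print(T((3, 4)))   # 3*(1,1) + 4*(0,2)
```

Any choice of images produces a linear map, and distinct choices produce distinct maps, which is the content of the remark.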
Definition 14.6 Addition of Linear Maps. Let V, W be vector spaces
and let S, T ∈ L(V, W). Then we define the map (S + T) : V → W by
    (S + T)v = Sv + Tv    (14.10)
which must hold for all v ∈ V.
Definition 14.7 Scalar Multiplication of Linear Maps. Let V, W be
vector spaces and let T ∈ L(V, W). Then we define the map aT : V → W by
    (aT)v = a(Tv)    (14.11)
which must hold for all a ∈ F and all v ∈ V.



Theorem 14.8 Let V and W be vector spaces. Then L(V, W) is a vector
space under addition and scalar multiplication of linear maps.
Proof. Exercise. You need to prove closure; commutativity and associativity
of addition; existence of additive identity and inverse; existence of identity
for scalar multiplication; and distributivity.
Definition 14.9 Product of Linear Maps. Let U, V, W be vector spaces,
T ∈ L(U, V) and S ∈ L(V, W). Then we define ST ∈ L(U, W) by
    (ST)v = S(Tv)    (14.12)
We normally drop the parentheses and write this as ST.
Remark 14.10 Multiplication of linear maps is associative,
    (ST)U = S(TU) = STU, i.e., (STU)v = S(T(U(v)))    (14.13)
and distributive,
    (S + T)U = SU + TU
    S(T + U) = ST + SU    (14.14)
but it is not commutative. In fact, even if ST is defined, TS might be
undefined.
Remark 14.11 The product ST only makes sense if the range of T lies in
the domain of S, i.e., if
    T : U → V and S : V → W  ⇒  ST : U → W    (14.15)
The product ST is analogous to the matrix product ST, which only makes
sense if
    S is [m × p] and T is [p × n]  ⇒  ST is [m × n]    (14.16)
When written in this way, the middle vector space (or middle dimension p)
must be the same, and the resulting map is between the outer vector spaces
(or has the outer dimensions).
Remark 14.12 Sometimes the product ST is written as the composition
of the linear operators, S ◦ T.
Definition 14.13 Let V, W be vector spaces and let T ∈ L(V, W). Then
the null space of T is defined as the subset of V that T maps to zero:
    null(T) = {v ∈ V | Tv = 0}    (14.17)
Theorem 14.14 Let V and W be vector spaces over F and T ∈ L(V, W).
Then null(T) is a subspace of V.

Proof. By additivity,
    T(0) = T(0 + 0) = T(0) + T(0)    (14.18)
so
    T(0) = 0    (14.19)
hence 0 ∈ null(T).
Now let u, v ∈ null(T). Then
    T(u + v) = Tu + Tv = 0 + 0 = 0    (14.20)
Hence u, v ∈ null(T) ⇒ u + v ∈ null(T).
Finally, let u ∈ null(T) and a ∈ F. Then
    T(au) = aTu = a(0) = 0    (14.21)
so that u ∈ null(T) ⇒ au ∈ null(T).
Thus null(T) contains 0 and is closed under addition and scalar
multiplication. Hence it is a subspace of V.
Definition 14.15 Let V and W be vector spaces and let T ∈ L(V, W).
Then T is called injective or one-to-one if whenever u, v ∈ V,
    Tu = Tv  ⇒  u = v    (14.22)
Definition 14.16 Let V and W be vector spaces and let T ∈ L(V, W).
Then T is called surjective or onto if range(T) = W, i.e., if for every
w ∈ W there exists v ∈ V such that w = Tv.
Theorem 14.17 Let V and W be vector spaces and let T ∈ L(V, W).
Then T is injective (one-to-one) if and only if null(T) = {0}.
Proof. ( ⇒ ) Suppose that T is injective (1-1). By equation 14.19 we
know that T(0) = 0, hence
    {0} ⊆ null(T)    (14.23)
Let v ∈ null(T); then T(v) = 0 = T(0). Since T is injective (1-1), this
means v = 0. Hence
    null(T) ⊆ {0}    (14.24)
Thus
    null(T) = {0}    (14.25)
( ⇐ ) Assume that equation 14.25 is true.
Let u, v ∈ V and suppose that Tu = Tv. Then
    0 = Tu − Tv = T(u − v)    (14.26)
Hence
    u − v ∈ null(T) = {0}.    (14.27)
Therefore u − v = 0, or u = v, which proves that T is injective (1-1)
(because Tu = Tv ⇒ u = v).
Theorem 14.18 Let V and W be vector spaces over F and let T ∈ L(V, W).
Then range(T) is a subspace of W.
Proof. By definition R = range(T) ⊆ W. To show that R is a subspace of
W we need to show that:
(a) 0 ∈ R;
(b) R is closed under addition; and
(c) R is closed under scalar multiplication.
By equation 14.19, T(0) = 0, hence 0 ∈ R, proving (a).
Let w, z ∈ range(T). Then there exist some u, v ∈ V such that T(u) = w
and T(v) = z. Then
    T(u + v) = Tu + Tv = w + z    (14.28)
so that w + z ∈ range(T), proving (b).
Now let w ∈ R and pick any a ∈ F. Then since w ∈ R there is some
v ∈ V such that Tv = w, so that
    T(av) = aTv = aw    (14.29)
and hence aw ∈ range(T), proving (c).


Theorem 14.19 Let V be a finite-dimensional vector space over F, let
W be a vector space over F, and let T ∈ L(V, W). Then range(T) is a
finite-dimensional subspace of W and
    dim V = dim null(T) + dim range(T)    (14.30)
Proof. Let B = (u1, . . . , um) be a basis of null(T). Hence
    dim null(T) = m    (14.31)
We can extend B to a basis B′ of V, i.e., for some integer n,
    B′ = (u1, . . . , um, w1, . . . , wn)    (14.32)
Hence
    dim V = m + n    (14.33)
Let v ∈ V. Because B′ spans V, there are scalars a1, . . . , am, b1, . . . , bn ∈ F
such that
    v = a1 u1 + ··· + am um + b1 w1 + ··· + bn wn    (14.34)
Hence
    Tv = T(a1 u1 + ··· + am um + b1 w1 + ··· + bn wn)    (14.35)
       = a1 Tu1 + ··· + am Tum + b1 Tw1 + ··· + bn Twn    (14.36)
       = b1 Tw1 + ··· + bn Twn    (14.37)
where the last line follows because the ui ∈ null(T).
Thus for every v ∈ V, Tv can be expressed as a linear combination of
    B″ = (Tw1, . . . , Twn)    (14.38)
Hence
    span(B″) = range(T)    (14.39)
Since the length of B″ is finite, range(T) is finite dimensional (because it
is spanned by a finite list).
Suppose that there exist c1, . . . , cn ∈ F such that
    0 = c1 Tw1 + ··· + cn Twn    (14.40)
      = T(c1 w1 + ··· + cn wn)    (14.41)
This implies that
    c1 w1 + ··· + cn wn ∈ null(T)    (14.42)
But B is a basis of null(T), so any vector in null(T) is a linear combination
of the elements of B. So there exist some scalars d1, . . . , dm ∈ F such that
    c1 w1 + ··· + cn wn = d1 u1 + ··· + dm um    (14.43)
or by rearrangement
    c1 w1 + ··· + cn wn − d1 u1 − ··· − dm um = 0    (14.44)
But this is a linear combination of the elements of B′, which is a basis of
V and hence linearly independent. Thus
    c1 = ··· = cn = d1 = ··· = dm = 0    (14.45)
and thus (see equation 14.40) the only linear combination of the B″ that
gives the zero vector is one in which all the coefficients are zero. This means
that B″ is linearly independent.
Since B″ is linearly independent and spans range(T), it is a basis of
range(T), and hence
    dim range(T) = length(B″) = n    (14.46)
Combining equations 14.46 with 14.31 and 14.33,
    dim V = m + n = dim null(T) + dim range(T)    (14.47)
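Theorem 14.19 can be illustrated with a concrete T : R³ → R² whose matrix has proportional rows: the range is one-dimensional, and two independent null vectors (found by hand) show dim null(T) = 2, so 2 + 1 = 3 = dim V. A Python sketch (the matrix and null vectors are our own choices):

```python
def apply(M, v):
    """Apply the matrix M (list of rows) to the vector v."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in M)

M = [(1, 2, 3), (2, 4, 6)]              # T : R^3 -> R^2; second row = 2 * first
null_basis = [(2, -1, 0), (3, 0, -1)]   # found by hand: both map to zero
for v in null_basis:
    print(apply(M, v))                  # (0, 0) each time
# The rows are proportional, so dim range(T) = 1, and the two independent
# null vectors give dim null(T) = 2; indeed 2 + 1 = 3 = dim V.
```

The two null vectors are linearly independent (neither is a multiple of the other), so they form a basis of null(T), in the role played by (u1, . . . , um) in the proof.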

Corollary 14.20 Let V and W be finite-dimensional vector spaces with
    dim V > dim W    (14.48)
and let T ∈ L(V, W). Then T is not injective (one-to-one).
Proof. First, observe that since range(T) ⊆ W,
    dim range(T) ≤ dim W    (14.49)
Then by theorem 14.19,
    dim null(T) = dim V − dim range(T) ≥ dim V − dim W > 0    (14.50)
Since dim null(T) > 0, null(T) must contain vectors other than 0, i.e.,
null(T) ≠ {0}.
Hence (see theorem 14.17) T is not injective.
Corollary 14.21 Let V and W be finite-dimensional vector spaces with
    dim V < dim W    (14.51)
and let T ∈ L(V, W). Then T is not surjective (onto).
Proof. By theorem 14.19,
    dim range(T) = dim V − dim null(T) ≤ dim V < dim W    (14.52)
Hence there are vectors in W that are not mapped to by T, and thus T is
not surjective (onto).


Topic 15

Matrices of Linear Maps


We will denote the set of all m × n matrices with entries in F by Mat(m, n, F).
Under the standard definitions of matrix addition and scalar multiplication,
Mat(m, n, F) is a vector space.
Definition 15.1 We define the Matrix of a Linear Map for T ∈ L(V, W)
as follows. Let (v1, . . . , vn) be a basis of V and let (w1, . . . , wm) be a basis
of W. Then
    M(T, (v1, . . . , vn), (w1, . . . , wm)) = [aij]    (15.1)
where
    Tvk = a1,k w1 + ··· + am,k wm    (15.2)
If the choice of bases for V and W is clear from the context we use the
notation M(T).
The following may help you remember the structure of this matrix: label
the columns by v1, . . . , vn and the rows by w1, . . . , wm; the k-th column
holds the coordinates of Tvk with respect to (w1, . . . , wm):

          v1  ···   vk   ···  vn
    w1  [      ⋯  a1,k  ⋯       ]
    ⋮   [      ⋯   ⋮    ⋯       ]
    wm  [      ⋯  am,k  ⋯       ]    (15.3)

Theorem 15.2 Properties of M(T). Let V, W be vector spaces over F.
Then
(1) Scalar Multiplication: Let T ∈ L(V, W) and c ∈ F. Then
    M(cT) = cM(T)    (15.4)
(2) Matrix Addition: Let T, S ∈ L(V, W). Then
    M(T + S) = M(T) + M(S)    (15.5)
Proof. (sketch) Express the left hand side of each formula as a matrix and
then apply the properties of matrices as reviewed in Chapter 2.
Matrix Multiplication. To derive a rule for matrix multiplication,
suppose that S ∈ L(U, V) and T ∈ L(V, W). The composition TS is a map
TS : U → W, i.e., TS ∈ L(U, W):
    S : U → V and T : V → W  ⇒  TS : U → W    (15.6)
Let n = dim V, m = dim W, and p = dim U, and let (v1, . . . , vn) be a basis
of V, (w1, . . . , wm) be a basis of W, and (u1, . . . , up) be a basis of U.
Suppose that
    M(T) = [ai,j], i ∈ {1 . . . m}, j ∈ {1 . . . n}, an [m × n] matrix    (15.7)
    M(S) = [bj,k], j ∈ {1 . . . n}, k ∈ {1 . . . p}, an [n × p] matrix    (15.8)
Then
    Suk = Σ_{j=1}^{n} bj,k vj    (15.9)
    Tvj = Σ_{i=1}^{m} ai,j wi    (15.10)
and therefore
    TSuk = T( Σ_{j=1}^{n} bj,k vj )    (15.11)
         = Σ_{j=1}^{n} bj,k Tvj    (15.12)
         = Σ_{j=1}^{n} bj,k Σ_{i=1}^{m} ai,j wi    (15.13)
         = Σ_{i=1}^{m} wi Σ_{j=1}^{n} ai,j bj,k    (15.14)
Therefore if we identify
    M(TS)i,k = Σ_{j=1}^{n} ai,j bj,k    (15.15)
for all i = 1, . . . , m and k = 1, . . . , p, as the entries of the [m × p] matrix
M(TS), we can define matrix multiplication by
    M(TS) = M(T) M(S)
    [m×p]   [m×n] [n×p]    (15.16)
This is in fact the same definition of matrix multiplication with which you
are already acquainted.
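Equation 15.15 can be transcribed directly into code and checked against composition of the maps themselves: applying M(T)M(S) to a coordinate vector should give the same result as applying M(S) first and then M(T). A Python sketch (a small 2×2 example of our own choosing):

```python
def matmul(A, B):
    """Entry (i,k) of AB is sum_j A[i][j] * B[j][k], as in equation (15.15)."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def apply(M, v):
    """Apply the matrix M to the coordinate vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]

A = [[1, 2], [3, 4]]      # playing the role of M(T)
B = [[0, 1], [1, 1]]      # playing the role of M(S)
v = [5, 7]
print(apply(matmul(A, B), v) == apply(A, apply(B, v)))   # M(TS)v == T(Sv)
```

This equality for every v is precisely what (15.11)-(15.15) establish.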
Definition 15.3 Matrix of a Vector. Let V be a vector space over F and
let v ∈ V. If (v1, . . . , vn) is a basis of V then for some a1, . . . , an ∈ F,
    v = a1 v1 + a2 v2 + ··· + an vn    (15.17)
We define the matrix of the vector v as the column

    M(v) = [ a1 ]
           [ ⋮  ]
           [ an ]    (15.18)

Theorem 15.4 Let V, W be vector spaces over F with bases (v1, . . . , vn)
and (w1, . . . , wm), and let T ∈ L(V, W). Then for every v ∈ V,
    M(Tv) = M(T)M(v)    (15.19)

Proof. Let

    M(T) = [ a1,1  ···  a1,n ]
           [  ⋮          ⋮   ]
           [ am,1  ···  am,n ]    (15.20)

Then (see equation 15.2)
    Tvk = Σ_{j=1}^{m} aj,k wj    (15.21)
For any vector v ∈ V there are some scalars b1, . . . , bn ∈ F such that
    v = Σ_{k=1}^{n} bk vk    (15.22)
Hence
    Tv = T( Σ_{k=1}^{n} bk vk )    (15.23)
       = Σ_{k=1}^{n} bk Tvk    (15.24)
       = Σ_{k=1}^{n} bk Σ_{j=1}^{m} aj,k wj    (15.25)
       = Σ_{j=1}^{m} wj ( Σ_{k=1}^{n} aj,k bk )    (15.26)
Therefore
    [M(Tv)]j = Σ_{k=1}^{n} aj,k bk    (15.27)
Similarly, since

    M(T)M(v) = [ a1,1  ···  a1,n ] [ b1 ]   [ Σ_{k=1}^{n} a1,k bk ]
               [  ⋮          ⋮   ] [ ⋮  ] = [          ⋮          ]
               [ am,1  ···  am,n ] [ bn ]   [ Σ_{k=1}^{n} am,k bk ]    (15.28)

we conclude that
    [M(T)M(v)]j = Σ_{k=1}^{n} aj,k bk = [M(Tv)]j    (15.29)
and therefore
    M(Tv) = M(T)M(v)    (15.30)
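Theorem 15.4, combined with Example 14.3, gives a concrete computation: the matrix of the differentiation operator D on P3 in the basis (1, z, z², z³) has, in its k-th column, the coordinates of D(z^k) = k z^(k−1); multiplying that matrix by a coordinate vector differentiates the polynomial. A Python sketch (our choice of example):

```python
def apply(M, v):
    """Multiply the matrix M (list of rows) by the coordinate vector v."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in M)

# M(D) for differentiation on P3 in the basis (1, z, z^2, z^3):
# column k holds the coordinates of D(z^k) = k z^(k-1).
D = [(0, 1, 0, 0),
     (0, 0, 2, 0),
     (0, 0, 0, 3),
     (0, 0, 0, 0)]

p = (2, 3, 0, 1)          # coordinates of 2 + 3z + z^3
print(apply(D, p))        # coordinates of its derivative, 3 + 3z^2
```

The result (3, 0, 3, 0) is M(Dp), computed as M(D)M(p), exactly as equation 15.19 asserts.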


Topic 16

Invertibility of Linear Maps
Definition 16.1 Let V, W be vector spaces over F and let T ∈ L(V, W).
Then T is called invertible if there exists a linear map S ∈ L(W, V) such
that
    ST = IV
    TS = IW    (16.1)
The linear map S is called the inverse of T.
Theorem 16.2 The inverse is unique.
Proof. Let T be a linear map with inverses S and S′. Then
    S = SI = S(TS′) = (ST)S′ = IS′ = S′    (16.2)
Notation: Since the inverse is unique we denote the inverse of T by T⁻¹:
    TT⁻¹ = T⁻¹T = I    (16.3)

Theorem 16.3 Let V and W be vector spaces over F and let T L(V, W ).
Then T is invertible if and only if T is both injective (one-to-one) and
surjective (onto).
Proof. ( = ) Assume that T is invertible.
Suppose that u, v V and that T u = T v. Then
u = T 1 (T u) = T 1 T u = T 1 T v = v

2012. Draft of: November 3, 2012

(16.4)
Page 107

Math 462

TOPIC 16. INVERTIBILITY OF LINEAR MAPS

This shows that u, v V,


T u = T v = u = v

(16.5)

Thus T is one-to-one (injective).


Now suppose that w W. Then
w = Iw = T T 1 w = T T 1 w

(16.6)

Thus w W, there exists a vector v V such that w = T v (specifically,


we have shown that v = T 1 w.)
Thus w W, we have w range(T ). This implies that W range(T ).
Since by definition of T , range(T ) W we conclude that range(T ) = W .
Hence T is onto (surjective).
( = ) Suppose that T is both injective and surjective. We must show that
it is invertible. This requires showing that there exists a map S with three
properties: (a)T S is the indentity; (b) ST is the identity; (c) S is linear.
(a) Let w W. Then since T is onto there is some v V such that
w = Tv

(16.7)

Since T is injective (one-to-one), by definition,


(u, v V)(T u = T v = u = v).

(16.8)

Since T is surjective (onto)1 ,


(w W)(!v V)(w = T v)

(16.9)

Define the map S : W 7 V such that v = Sw.Then


w = T Sw

(16.10)

Thus T S = IW (the identity map on W) (since it maps w W to itself).


(b) Next, suppose that v V. Then
T (ST v) = (T S)T v = IW T v = T v

(16.11)

Since T is injective (one-to-one), the fact that T (ST v) = v implies that


ST v = v
1 The

(16.12)

notation !x is read as there exists a unique x.

Page 108

Last revised: November 3, 2012

TOPIC 16. INVERTIBILITY OF LINEAR MAPS

Math 462

Since ST maps every element of V to itself, ST = IV (the identity map on


V)2 .
(c) To show that S is linear, let w₁, w₂ ∈ W. Then

T(Sw₁ + Sw₂) = TSw₁ + TSw₂    because T is linear    (16.13)
             = w₁ + w₂        because TS = I          (16.14)

Apply S to both sides of this equation:

S(w₁ + w₂) = ST(Sw₁ + Sw₂)    by (16.13)-(16.14)    (16.15)
           = Sw₁ + Sw₂        since ST = I           (16.16)

so S is additive. For homogeneity, let a ∈ F and w ∈ W. Then

T(aSw) = aT(Sw)    homogeneity of T    (16.17)
       = a(TS)w    associativity       (16.18)
       = aw        because TS = I      (16.19)

Applying S to both sides,

ST(aSw) = S(aw)    (16.20)
aSw = S(aw)        because ST = I    (16.21)

The last line shows that S is homogeneous. Hence S is a linear map; it has
the properties that ST = I and TS = I on their respective domains, hence
it is the inverse of T. Hence T is invertible.
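The defining properties of the inverse are easy to check numerically. The following sketch uses numpy; the 3 × 3 matrix T is an arbitrary invertible example chosen for illustration (not anything from the text). It verifies TS = ST = I and that the inverse map S is itself linear:

```python
import numpy as np

# T is a hypothetical invertible operator on R^3, represented by a matrix;
# S = T^{-1} is computed numerically.
T = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])
S = np.linalg.inv(T)
I = np.eye(3)

# T T^{-1} = T^{-1} T = I, as in equation (16.3)
ts_is_identity = np.allclose(T @ S, I)
st_is_identity = np.allclose(S @ T, I)

# The inverse is itself linear: S(a*w1 + w2) = a*S(w1) + S(w2)
w1 = np.array([1.0, 2.0, 3.0])
w2 = np.array([0.0, -1.0, 4.0])
a = 2.5
inverse_is_linear = np.allclose(S @ (a * w1 + w2), a * (S @ w1) + S @ w2)
```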
Definition 16.4 Let V and W be vector spaces over F. Then V and W are
said to be isomorphic vector spaces if there exists an invertible linear
map T ∈ L(V, W) (i.e., there also exists a linear map S = T⁻¹ ∈ L(W, V)).

Definition 16.5 An operator is a linear map from a vector space to itself.
We denote the set of operators on V by L(V) (instead of L(V, V)).

Example 16.1 Let V = Rⁿ. Then any n × n matrix is an operator on V.
Theorem 16.6 Let V and W be finite-dimensional vector spaces. Then
they are isomorphic if and only if they have the same dimension.

Proof. (⟹) Assume that V and W are isomorphic.
Then there exists an invertible linear map T ∈ L(V, W).
Since T is invertible, it is injective (one-to-one), hence by theorem 14.17,
null(T) = {0}. Hence

dim null(T) = 0    (16.22)

Because T is invertible, it is surjective (onto) and thus

range(T) = W    (16.23)

By equation 14.30,

dim(V) = dim null(T) + dim range(T) = dim range(T) = dim(W)    (16.24)

²Where the context is clear we generally omit the subscript on I, even
though I_W and I_V are different maps, since they do not even operate on
the same space.
(⟸) Assume that dim(V) = dim(W). To show that this implies
isomorphism, we need to show that there exists some invertible linear map
T ∈ L(V, W).
Let (v₁, …, vₙ) and (w₁, …, wₙ) be bases of V and W.
Define T ∈ L(V, W) such that

T(a₁v₁ + ⋯ + aₙvₙ) = a₁w₁ + ⋯ + aₙwₙ    (16.25)

Since (w₁, …, wₙ) is a basis of W, T spans³ W. Hence range(T) = W,
i.e., T is onto. [More precisely, for every w ∈ W there exist constants
a₁, …, aₙ such that w = Σᵢ aᵢwᵢ, since (w₁, …, wₙ) is a basis. But by
definition of T, v = Σᵢ aᵢvᵢ is mapped to w by (16.25). Hence
(∀w ∈ W)(∃v ∈ V)(w = Tv). Hence T is onto.]
To show that T is one-to-one, suppose that p, q ∈ V satisfy Tp = Tq,
where

p = a₁v₁ + ⋯ + aₙvₙ
q = b₁v₁ + ⋯ + bₙvₙ    (16.26)

Then since Tp = Tq,

T(a₁v₁ + ⋯ + aₙvₙ) = T(b₁v₁ + ⋯ + bₙvₙ)    (16.27)

By linearity and the observation that Tvᵢ = wᵢ,

a₁w₁ + ⋯ + aₙwₙ = b₁w₁ + ⋯ + bₙwₙ    (16.28)

Since (w₁, …, wₙ) is linearly independent, aᵢ = bᵢ for i = 1, …, n. Thus
p = q. Hence T is one-to-one (because we have shown that Tp = Tq ⟹
p = q).
Since T is one-to-one and onto, it is invertible, and therefore V and W are
isomorphic.

³By the expression "T spans W" we mean that {Tv | v ∈ V} spans W.

This theorem gives us the amazing result that any two finite-dimensional
vector spaces of dimension n are isomorphic. In particular, as the following
corollary states, any vector space of dimension n is isomorphic to Fⁿ. Thus
everything we need to know about vector spaces we can learn by studying
Fⁿ.

Corollary 16.7 Let V be a finite-dimensional vector space of dimension n.
Then V is isomorphic to Fⁿ.

In particular, we will be interested in the matrix of a linear map T ∈
L(V, W). This matrix, which represents a linear map in L(Fⁿ, Fᵐ) where
n = dim(V) and m = dim(W), is defined by the coefficients that map one
set of basis vectors to the other.

Theorem 16.8 Let V and W be finite-dimensional vector spaces with bases
(v₁, …, vₙ) and (w₁, …, wₘ), and let T ∈ L(V, W). Then the map

M : L(V, W) → Mat(m, n, F)    (16.29)

that sends T to its matrix M(T) is an invertible linear map, i.e.,

M ∈ L(L(V, W), Mat(m, n, F))    (16.30)

is invertible.
Proof. We have already shown that M is linear (theorem 15.2). To
show invertibility, we need to show that M is one-to-one and onto.
Let T ∈ null(M), that is, T ∈ L(V, W) such that M(T) = 0, where 0 is
the m × n zero matrix.
Then Tvₖ = 0 for all k = 1, …, n.
Because (v₁, …, vₙ) is a basis, every v ∈ V can be written as a linear
combination v = Σᵢ aᵢvᵢ for some a₁, …, aₙ. Therefore

T(a₁v₁ + ⋯ + aₙvₙ) = 0    (16.31)

In fact, this must hold for all values of aᵢ ∈ F, since every collection
a₁, …, aₙ produces some vector in V. By linearity, since each Tvₖ = 0, we
have Tv = 0 for all v ∈ V. Thus T = 0, i.e., T is the map that sends every
vector in V to the zero vector in W.
But since T ∈ null(M) ⟹ T = 0, this means that null(M) = {0}. By
14.17, M is one-to-one (injective).

Now suppose that A ∈ Mat(m, n, F) is any matrix, given by

        ⎡ a₁,₁  ⋯  a₁,ₙ ⎤
    A = ⎢  ⋮        ⋮   ⎥    (16.32)
        ⎣ aₘ,₁  ⋯  aₘ,ₙ ⎦

If we define T ∈ L(V, W) by

    Tvₖ = Σⱼ₌₁ᵐ aⱼ,ₖ wⱼ    (16.33)

(where (v₁, …, vₙ) and (w₁, …, wₘ) are the bases of V and W) then
M(T) = A. Hence range(M) = Mat(m, n, F), meaning M is onto.
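Equation (16.33) is a recipe for building M(T) one column at a time. As a sketch (the spaces and operator here are illustrative choices, not from the text), take V = polynomials of degree ≤ 3 with basis (1, x, x², x³), W = polynomials of degree ≤ 2 with basis (1, x, x²), and T = d/dx:

```python
import numpy as np

# Column k of M(T) holds the coefficients a_{j,k} with T v_k = sum_j a_{j,k} w_j.
# Here T = d/dx, v_k = x^k (k = 0..3), w_j = x^j (j = 0..2), so M(T) is 3 x 4.
M = np.zeros((3, 4))
for k in range(1, 4):
    M[k - 1, k] = k          # d/dx x^k = k x^{k-1}

# Apply T through the matrix: p(x) = 5 + 2x - x^2 + 4x^3, coefficients low-to-high
p = np.array([5.0, 2.0, -1.0, 4.0])
dp = M @ p                   # coefficients of p'(x) = 2 - 2x + 12x^2
```

Multiplying a coordinate vector by M is exactly "apply T and read off coordinates in the w-basis."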
Theorem 16.9 dim(Mat(m, n, F)) = mn.

Proof. Use as a basis the set of all matrices with a 1 in one entry and zeros
everywhere else.

Corollary 16.10 Let V and W be finite-dimensional vector spaces over F.
Then

dim(L(V, W)) = (dim V)(dim W)    (16.34)

Proof. L(V, W) is isomorphic to Mat(m, n, F), which has dimension mn,
where n = dim V and m = dim W.
Theorem 16.11 Let V be a finite-dimensional vector space and let T ∈
L(V). Then the following are equivalent:
(a) T is invertible
(b) T is one-to-one
(c) T is onto.

Proof. ((a) ⟹ (b)) This follows because T invertible ⟹ T is both onto
and one-to-one (theorem 16.3).

((b) ⟹ (c)) Assume that T is one-to-one. Then null(T) = {0}. Hence
by (14.30)

dim range(T) = dim(V) − dim(null(T)) = dim(V)    (16.35)

By exercise 2.11, range(T) = V. Hence T is onto, proving (c).

((c) ⟹ (a)) Assume that T is onto.
Then range(T) = V, and so dim range(T) = dim(V). Hence

dim null(T) = dim V − dim range(T) = 0    (16.36)

Therefore T is one-to-one. Since T is both one-to-one and onto, it is
invertible (theorem 16.3), so (a) is true.
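For a concrete operator on Fⁿ (a square matrix), all three conditions of theorem 16.11 reduce to the matrix having full rank. A minimal numeric sketch, with illustrative matrices:

```python
import numpy as np

def operator_status(A):
    """Return (injective, surjective, invertible) for a square matrix A.

    For an operator on a finite-dimensional space all three are equivalent
    (theorem 16.11); numerically each is captured by full rank."""
    n = A.shape[0]
    full_rank = (np.linalg.matrix_rank(A) == n)
    return full_rank, full_rank, full_rank

invertible_case = operator_status(np.array([[1.0, 2.0], [3.0, 4.0]]))   # rank 2
singular_case = operator_status(np.array([[1.0, 2.0], [2.0, 4.0]]))     # rank 1
```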


Topic 17

Operators and
Eigenvalues

Definition 17.1 An operator is a linear map from a vector space to itself.
We denote the set of operators on V by L(V) (instead of L(V, V)).

If T ∈ L(V) then Tⁿ ∈ L(V) for any positive integer n. We use the notation
T² to denote the product TT, T³ = TTT, etc.
If T is invertible, then we define T⁻ᵐ = (T⁻¹)ᵐ. Furthermore,

Tᵐ Tⁿ = Tᵐ⁺ⁿ,    (Tᵐ)ⁿ = Tᵐⁿ    (17.1)

where m, n are any integers.
If T is not invertible then equation 17.1 still holds for integers m, n ≥ 0.
Definition 17.2 If p(x) = a₀ + a₁x + ⋯ + aₙxⁿ is a polynomial and T is
an operator then we define the polynomial operator as

p(T) = a₀I + a₁T + ⋯ + aₙTⁿ    (17.2)

Thus if v ∈ V, by (17.2) we mean

p(T)v = (a₀I + a₁T + ⋯ + aₙTⁿ)v    (17.3)

If λ₁, …, λₙ are the complex roots of p(x) (counted with multiplicity)
then p(T)v can be factored as

p(T)v = c(T − λ₁I) ⋯ (T − λₙI)v    (17.4)

If p and q are polynomials then we define the product of polynomial
operators as

(pq)(T) = p(T)q(T)    (17.5)
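Definition 17.2 and the factorization (17.4) can be checked on a small example. Here p(x) = 2 − 3x + x² = (x − 1)(x − 2), and T is an arbitrary 2 × 2 matrix chosen for illustration:

```python
import numpy as np

T = np.array([[1.0, 2.0],
              [0.0, 3.0]])
I = np.eye(2)

# p(T) = 2I - 3T + T^2, per equation (17.2)
pT_direct = 2 * I - 3 * T + T @ T

# Since p(x) = (x - 1)(x - 2), equation (17.4) gives p(T) = (T - I)(T - 2I)
pT_factored = (T - 1 * I) @ (T - 2 * I)
```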


Definition 17.3 Let T ∈ L(V) be an operator on a finite-dimensional
non-zero vector space V over F, and let U be a subspace of V. Then the
restriction of T to U, denoted by T|_U, is T operating on U.

Definition 17.4 Let V be a finite-dimensional non-zero vector space over
F, let U be a subspace of V, and let T ∈ L(V) be an operator on V. Then
U is invariant under T if

range(T|_U) ⊆ U    (17.6)

i.e., if for every u ∈ U we have Tu ∈ U. In this case T|_U is an operator
on U.

Definition 17.5 Let V be a vector space and T ∈ L(V). If there exists a
vector v ∈ V, v ≠ 0, and a scalar λ ∈ F such that

Tv = λv    (17.7)

then λ is called an eigenvalue of T with eigenvector v.

From equation 17.7 we see that (λ, v) is an eigenvalue-eigenvector pair if
and only if

(T − λI)v = 0    (17.8)

Remark 17.6 The set of all eigenvectors of T with eigenvalue λ, together
with 0, is null(T − λI).

Remark 17.7 Let T be an operator on V, and let λ be an eigenvalue of T.
Then the set of all eigenvectors of T with eigenvalue λ, together with 0,
is a subspace of V.
Theorem 17.8 The following are equivalent:
(a) λ is an eigenvalue of T
(b) T − λI is not injective (1-1).
(c) T − λI is not invertible.
(d) T − λI is not surjective (onto).

Proof. ((a) ⟺ (b))

λ is an eigenvalue of T ⟺ ∃u ≠ 0 such that (T − λI)u = 0    (17.9)
⟺ u ∈ null(T − λI)    (17.10)
⟺ null(T − λI) ≠ {0}    (17.11)
⟺ T − λI is not injective (1-1), by theorem 14.17.    (17.12)

((b) ⟺ (c)) An operator is injective (1-1) iff it is invertible (theorem
16.11). Hence T − λI is not injective iff it is not invertible.

((b) ⟺ (d)) An operator is invertible iff it is surjective (onto), which
also follows from theorem 16.11.

Theorem 17.9 Let V be a finite-dimensional vector space over F and T ∈
L(V). Let λ₁, …, λₘ be distinct eigenvalues of T with corresponding
nonzero eigenvectors v₁, …, vₘ. Then (v₁, …, vₘ) is linearly independent.

Proof. Suppose (v₁, …, vₘ) is linearly dependent.
Since the eigenvectors are all nonzero, v₁ ≠ 0.
By theorem 12.6, there is some k such that

vₖ ∈ span(v₁, …, v_{k−1})    (17.13)

and if vₖ is removed from (v₁, …, vₘ) then the span of the remaining list
equals the span of the original list. Let k designate the smallest integer
such that this is true.
Since k is the smallest integer for which this is true, the list
(v₁, …, v_{k−1}) is linearly independent.
Hence there exist constants a₁, …, a_{k−1} ∈ F such that

vₖ = a₁v₁ + ⋯ + a_{k−1}v_{k−1}    (17.14)

Applying T to both sides of the equation and applying linearity,

Tvₖ = a₁Tv₁ + ⋯ + a_{k−1}Tv_{k−1}    (17.15)

Since the λⱼ are eigenvalues of T with eigenvectors vⱼ, Tvⱼ = λⱼvⱼ and
therefore

λₖvₖ = a₁λ₁v₁ + ⋯ + a_{k−1}λ_{k−1}v_{k−1}    (17.16)

Multiplying equation 17.14 by λₖ,

λₖvₖ = a₁λₖv₁ + ⋯ + a_{k−1}λₖv_{k−1}    (17.17)

Subtracting equation 17.16 from 17.17,

0 = a₁(λₖ − λ₁)v₁ + ⋯ + a_{k−1}(λₖ − λ_{k−1})v_{k−1}    (17.18)

Since (v₁, …, v_{k−1}) is linearly independent, and since the λⱼ are all
distinct,

a₁ = a₂ = ⋯ = a_{k−1} = 0    (17.19)

From equation 17.14, vₖ = 0. This contradicts the assumption that vₖ ≠ 0.
Therefore (v₁, …, vₘ) must be linearly independent.
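Theorem 17.9 can be observed numerically: for a matrix with distinct eigenvalues, the matrix whose columns are the eigenvectors has full rank. The triangular matrix below is an arbitrary illustration:

```python
import numpy as np

T = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])            # eigenvalues 2, 3, 5 (distinct)
eigenvalues, eigenvectors = np.linalg.eig(T)

# Distinct eigenvalues => the eigenvector columns are linearly independent,
# so the eigenvector matrix has rank 3.
n_distinct = np.unique(np.round(eigenvalues, 8)).size
rank = np.linalg.matrix_rank(eigenvectors)
```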


Corollary 17.10 Let T ∈ L(V) be an operator on V. Then T has at most
dim(V) distinct eigenvalues.

Proof. Let λ₁, …, λₘ be the distinct eigenvalues with corresponding
nonzero eigenvectors (v₁, …, vₘ).
By theorem 17.9, the list (v₁, …, vₘ) is linearly independent.
By theorem 12.7, the length of every linearly independent list is at most
the length of every spanning list of V. By definition of dimension, the
length of any basis of V is dim V. Hence

m = length(v₁, …, vₘ) ≤ dim V    (17.20)

as stated in the corollary.


Theorem 17.11 Let V be a finite-dimensional non-zero complex vector
space, and let T ∈ L(V). Then T has an eigenvalue.

Proof. Let n = dim(V) > 0. Pick any nonzero v ∈ V, and define

ℓ = (v, Tv, T²v, …, Tⁿv)    (17.21)

Since

length(ℓ) = n + 1 > n = dim(V)    (17.22)

the list ℓ is linearly dependent. Hence there exist a₀, …, aₙ, not all zero,
such that

0 = a₀v + a₁Tv + a₂T²v + ⋯ + aₙTⁿv    (17.23)

Define m ≤ n as the largest integer such that aₘ ≠ 0. Then

0 = a₀v + a₁Tv + a₂T²v + ⋯ + aₘTᵐv    (17.24)

Define the polynomial

p(z) = a₀ + a₁z + ⋯ + aₘzᵐ    (17.25)
     = c(z − λ₁)(z − λ₂) ⋯ (z − λₘ)    (17.26)

for some roots λ₁, λ₂, …, λₘ ∈ C. Then

0 = p(T)v    (17.27)
  = (a₀I + a₁T + a₂T² + ⋯ + aₘTᵐ)v    (17.28)
  = c(T − λ₁I) ⋯ (T − λₘI)v    (17.29)

Since the product maps the nonzero vector v to 0, at least one factor fails
to be injective. Hence for some j,

(T − λⱼI)w = 0    (17.30)

where

w = (T − λ_{j+1}I) ⋯ (T − λₘI)v ≠ 0    (17.31)

(choose j as the largest index for which w ≠ 0; take w = v if j = m). Thus

null(T − λⱼI) ≠ {0}    (17.32)

Hence T − λⱼI is not injective (1-1). By theorem 17.8, λⱼ is an eigenvalue
of T.
Hence an eigenvalue exists.
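The proof of theorem 17.11 is constructive, and the construction can be sketched with numpy (a random complex matrix serves as an illustrative operator; nothing here is from the text). The dependent list (v, Tv, …, Tⁿv) yields polynomial coefficients, and at least one root of that polynomial makes T − λI singular:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
v = rng.standard_normal(n) + 0j

# Build the list (v, Tv, T^2 v, ..., T^n v): n+1 vectors in C^n, so dependent
cols = [v]
for _ in range(n):
    cols.append(T @ cols[-1])
K = np.column_stack(cols)                      # n x (n+1) matrix, rank <= n

# A null vector of K gives coefficients a_0, ..., a_n with sum_k a_k T^k v = 0
_, _, Vh = np.linalg.svd(K)
a = Vh[-1].conj()                              # K @ a is (numerically) zero

# Roots of p(z) = a_0 + a_1 z + ... + a_n z^n (np.roots wants highest degree first)
roots = np.roots(a[::-1])

# At least one root lambda_j must make T - lambda_j I singular, i.e., its
# smallest singular value is (numerically) zero, so lambda_j is an eigenvalue
sigma_min = min(np.linalg.svd(T - lam * np.eye(n), compute_uv=False)[-1]
                for lam in roots)
```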


Topic 18

Matrices of Operators

Definition 18.1 Matrix of an Operator. Let T ∈ L(V) and let (v₁, …, vₙ)
be a basis of V. Then there are some numbers aᵢ,ⱼ such that

Tv₁ = a₁₁v₁ + ⋯ + aₙ₁vₙ
Tv₂ = a₁₂v₁ + ⋯ + aₙ₂vₙ
  ⋮                          (18.1)
Tvₙ = a₁ₙv₁ + ⋯ + aₙₙvₙ

Then we define the matrix of T with respect to the basis (v₁, …, vₙ)
as

                             ⎡ a₁₁  a₁₂  ⋯  a₁ₙ ⎤
                             ⎢ a₂₁  a₂₂  ⋯  a₂ₙ ⎥
M(T) = M(T, (v₁, …, vₙ)) =   ⎢  ⋮    ⋮        ⋮ ⎥    (18.2)
                             ⎣ aₙ₁  aₙ₂  ⋯  aₙₙ ⎦

The jth column of M(T) represents Tvⱼ. One way to remember this is by
the matrix product

(Tv₁, …, Tvₙ) = (v₁, …, vₙ)M(T)    (18.3)

or

M(T) = A⁻¹TA = (v₁, …, vₙ)⁻¹ T (v₁, …, vₙ)    (18.4)

where A = (v₁, …, vₙ) is the matrix whose columns are the basis vectors.
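Equation (18.4) gives a direct way to compute M(T) numerically: put the basis vectors into the columns of a matrix A and form A⁻¹TA. A sketch with an illustrative operator (rotation by 90 degrees) and basis:

```python
import numpy as np

T = np.array([[0.0, -1.0],
              [1.0,  0.0]])          # rotation by 90 degrees, as an example
A = np.array([[1.0, 0.0],
              [1.0, 1.0]])           # basis v1 = (1,1), v2 = (0,1) as columns

M = np.linalg.inv(A) @ T @ A         # M(T) with respect to (v1, v2), eq. (18.4)

# Column j of M holds the coordinates of T v_j in the basis (v1, v2):
v1 = A[:, 0]
Tv1_reconstructed = A @ M[:, 0]      # should equal T v1
```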


Theorem 18.2 Let V be a vector space over F with basis (v₁, …, vₙ) and
T ∈ L(V) an operator on V. Then the following are equivalent:
(1) M(T, (v₁, …, vₙ)) is upper triangular.
(2) Tvₖ ∈ span(v₁, …, vₖ) for k = 1, …, n.
(3) span(v₁, …, vₖ) is invariant under T for each k = 1, …, n.

Proof. ((1) ⟹ (2)) Let M(T) be upper triangular,

        ⎡ a₁₁  a₁₂  a₁₃  ⋯  a₁ₙ ⎤
        ⎢  0   a₂₂  a₂₃  ⋯  a₂ₙ ⎥
M(T) =  ⎢  0    0   a₃₃  ⋯  a₃ₙ ⎥    (18.5)
        ⎢  ⋮              ⋱   ⋮ ⎥
        ⎣  0    0    0   ⋯  aₙₙ ⎦

then from equation 18.1

Tv₁ = a₁₁v₁
Tv₂ = a₁₂v₁ + a₂₂v₂
Tv₃ = a₁₃v₁ + a₂₃v₂ + a₃₃v₃    (18.6)
  ⋮
Tvₙ = a₁ₙv₁ + ⋯ + aₙₙvₙ

Thus each Tvᵢ is a linear combination of (v₁, …, vᵢ), i.e.,

Tvᵢ ∈ span(v₁, …, vᵢ)    (18.7)

which is statement (2). Hence ((1) ⟹ (2)).

((2) ⟹ (1)) Assume each Tvᵢ ∈ span(v₁, …, vᵢ).
Then there exist aᵢⱼ such that equation 18.6 holds.
Then M(T) is as in equation 18.5.
Hence M(T) is upper triangular, and ((2) ⟹ (1)).

((3) ⟹ (2)) Assume span(v₁, …, vₖ) is invariant under T for each k.
Then

Tv₁ ∈ span(v₁)
Tv₂ ∈ span(v₁, v₂)
  ⋮                              (18.8)
Tvₖ ∈ span(v₁, …, vₖ)
  ⋮

which is precisely statement (2).

((2) ⟹ (3)) Fix k and assume that Tvⱼ ∈ span(v₁, …, vⱼ) for each
j ≤ k. Hence

Tv₁ ∈ span(v₁) ⊆ span(v₁, …, vₖ)
Tv₂ ∈ span(v₁, v₂) ⊆ span(v₁, …, vₖ)
  ⋮                                            (18.9)
Tvₖ ∈ span(v₁, …, vₖ) ⊆ span(v₁, …, vₖ)

Let v ∈ span(v₁, …, vₖ). Then v is a linear combination v = a₁v₁ + ⋯ +
aₖvₖ of (v₁, …, vₖ), and by equation 18.9,

Tv = a₁Tv₁ + a₂Tv₂ + ⋯ + aₖTvₖ ∈ span(v₁, …, vₖ)    (18.10)

Hence span(v₁, …, vₖ) is invariant under T. Hence ((2) ⟹ (3)).


Theorem 18.3 Let V be a complex vector space and T ∈ L(V). Then
there exists a basis under which M(T) is upper triangular.

Proof. Prove by induction on the dimension of the vector space.
Let n = dim V.
Since any 1 × 1 matrix is upper triangular, the result holds for n = 1.
Assume n > 1 and that the result holds for dim V = 1, 2, …, n − 1.
Let λ be any eigenvalue of T (one exists by theorem 17.11). Define

U = range(T − λI)    (18.11)

Since λ is an eigenvalue of T, null(T − λI) ≠ {0} and T − λI is not
surjective (onto) (theorem 17.8). Hence dim null(T − λI) > 0. Thus

dim V = dim null(T − λI) + dim range(T − λI)    (18.12)
      > dim range(T − λI)    (18.13)
      = dim U    (18.14)

Let m = dim U. Then by (18.14), m < n.
Let u ∈ U. Then

Tu = Tu − λu + λu = (T − λI)u + λu    (18.15)

Since λu ∈ U (because u ∈ U and U is a subspace) and (T − λI)u ∈ U (by
definition of U), we have Tu ∈ U. Hence U is invariant under T.
Since U is invariant under T, T|_U is an operator on U.
Since m < n (from (18.14)), the inductive hypothesis applies to T|_U:
there is a basis (u₁, …, uₘ) of U with respect to which M(T|_U) is upper
triangular. By theorem 18.2,

Tuⱼ = (T|_U)uⱼ ∈ span(u₁, …, uⱼ)    (18.16)

for each j (because the matrix is upper triangular).
We can extend (u₁, …, uₘ) to a basis of V by adding some vectors
v₁, …, v_{n−m} that are not in span(u₁, …, uₘ):

B = (u₁, …, uₘ, v₁, …, v_{n−m})    (18.17)

For each k, these added vectors satisfy

Tvₖ = Tvₖ − λvₖ + λvₖ = (T − λI)vₖ + λvₖ    (18.18)

Since

(T − λI)vₖ ∈ U = span(u₁, …, uₘ)    (by definition of U)    (18.19)
λvₖ ∈ span(vₖ)    (18.20)

we conclude that

Tvₖ ∈ span(u₁, …, uₘ, v₁, …, vₖ)    (18.21)

Hence theorem 18.2 applies again and T has an upper-triangular matrix
with respect to the basis B.
Theorem 18.4 Let V be a vector space over F and let T ∈ L(V) be such
that M(T) is upper triangular with respect to some basis B = (v₁, …, vₙ)
of V. Then T is invertible if and only if all the entries on the diagonal of
M(T) are nonzero.

Proof. (⟹) Let M(T) be upper triangular, and write

            ⎡ λ₁  *   ⋯  *  ⎤
            ⎢ 0   λ₂  ⋯  *  ⎥
M(T, B) =   ⎢ ⋮       ⋱  ⋮  ⎥    (18.22)
            ⎣ 0   ⋯   0  λₙ ⎦

where * means anything.
If λ₁ = 0 then Tv₁ = 0 and hence T is not invertible (because null(T) ≠
{0}, hence T is not injective, hence it is not invertible).
Suppose λₖ = 0 for some k > 1.
By theorem 18.2, because the matrix is upper triangular,

Tvᵢ ∈ span(v₁, …, v_{k−1})    (18.23)

for i = 1, …, k − 1.
Because λₖ = aₖₖ = 0,

Tvₖ = a₁ₖv₁ + a₂ₖv₂ + ⋯ + a_{k−1,k}v_{k−1} + aₖₖvₖ    (18.24)
    = a₁ₖv₁ + a₂ₖv₂ + ⋯ + a_{k−1,k}v_{k−1}    (18.25)
⟹ Tvₖ ∈ span(v₁, …, v_{k−1})    (18.26)

Define the linear map

S : span(v₁, …, vₖ) → span(v₁, …, v_{k−1})    (18.27)

by

Sv = T|_{span(v₁,…,vₖ)} v    (18.28)

But

dim span(v₁, …, v_{k−1}) = k − 1    (18.29)
dim span(v₁, …, vₖ) = k    (18.30)

hence S cannot be injective (Corollary 14.20), and

k = dim span(v₁, …, vₖ)    (18.31)
  = dim range(S) + dim null(S)    (18.32)
  ≤ k − 1 + dim null(S)    (18.33)
⟹ dim null(S) > 0    (18.34)

Hence there exists a nonzero vector v such that Sv = 0 (Theorem 14.17).
Hence Tv = 0 and therefore T is not injective (also by Theorem 14.17).
Since T is not injective, T is not invertible (theorem 16.11).
Hence if any of the λₖ = 0 then T is not invertible. Hence invertibility
requires that all the diagonal elements be nonzero.

(⟸) Suppose T is not invertible (we prove the contrapositive: some
diagonal entry must then be zero).
Then T is not injective (see theorem 16.11).
Thus there exists a v ≠ 0 such that Tv = 0 (Theorem 14.17).
Since B = (v₁, …, vₙ) is a basis of V, there exist a₁, …, aₖ with aₖ ≠ 0,
k ≤ n, such that

v = a₁v₁ + ⋯ + aₖvₖ    (18.35)

Choose k to be the largest index such that equation 18.35 holds (with
aₖ ≠ 0). Then

0 = Tv = a₁Tv₁ + ⋯ + a_{k−1}Tv_{k−1} + aₖTvₖ    (18.36)
⟹ aₖTvₖ = −a₁Tv₁ − ⋯ − a_{k−1}Tv_{k−1}    (18.37)
⟹ Tvₖ = −(a₁/aₖ)Tv₁ − ⋯ − (a_{k−1}/aₖ)Tv_{k−1}    (18.38)

But because M(T, B) is upper triangular,

Tv₁ = b₁,₁v₁
Tv₂ = b₁,₂v₁ + b₂,₂v₂
  ⋮                                                       (18.39)
Tv_{k−1} = b_{1,k−1}v₁ + b_{2,k−1}v₂ + ⋯ + b_{k−1,k−1}v_{k−1}

where the bᵢⱼ are the elements of M(T, B). Hence by substituting the
expressions for each Tvⱼ in (18.39) into the expression for Tvₖ in (18.38),

Tvₖ = −(a₁/aₖ)b₁,₁v₁ − (a₂/aₖ)(b₁,₂v₁ + b₂,₂v₂) − ⋯
      − (a_{k−1}/aₖ)(b_{1,k−1}v₁ + ⋯ + b_{k−1,k−1}v_{k−1})    (18.40)

and consequently Tvₖ ∈ span(v₁, …, v_{k−1}) (because vₖ does not appear
in the above expansion).
Thus for some numbers c₁, …, c_{k−1},

Tvₖ = c₁v₁ + ⋯ + c_{k−1}v_{k−1}    (18.41)

But because M(T, B) is upper triangular,

Tvₖ = b₁,ₖv₁ + b₂,ₖv₂ + ⋯ + b_{k−1,k}v_{k−1} + b_{k,k}vₖ

Comparing the last two expressions gives cᵢ = bᵢ,ₖ for i = 1, …, k − 1, and
0 = b_{k,k} = λₖ (the last equality holds because b_{k,k} is the kth diagonal
entry). Hence some diagonal entry of M(T, B) is zero.
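Theorem 18.4 has a familiar numerical counterpart: for a triangular matrix the determinant is the product of the diagonal entries, so invertibility can be read off the diagonal. The matrices below are illustrative:

```python
import numpy as np

def triangular_invertible(U):
    """Invertibility test for an upper-triangular matrix: every diagonal
    entry must be nonzero (theorem 18.4)."""
    return bool(np.all(np.diag(U) != 0))

U_good = np.array([[1.0, 5.0, 7.0],
                   [0.0, 2.0, 9.0],
                   [0.0, 0.0, 3.0]])
U_bad = np.array([[1.0, 5.0, 7.0],
                  [0.0, 0.0, 9.0],      # zero on the diagonal
                  [0.0, 0.0, 3.0]])

good = triangular_invertible(U_good)    # det = 1 * 2 * 3 = 6, invertible
bad = triangular_invertible(U_bad)      # rank drops to 2, not invertible
rank_bad = np.linalg.matrix_rank(U_bad)
```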
Theorem 18.5 Let V be a vector space and T ∈ L(V). Then M(T, B) is
diagonal if and only if V has a basis B consisting solely of eigenvectors of T.

Proof. Let B = (v₁, …, vₙ) be a basis of V.
Then M(T, B) = [aᵢⱼ] is diagonal if and only if (see equation 18.1)

Tv₁ = a₁₁v₁
Tv₂ = a₂₂v₂
  ⋮                 (18.42)
Tvₙ = aₙₙvₙ

which is true if and only if (v₁, …, vₙ) are eigenvectors of T.


Theorem 18.6 Let V be a vector space and T ∈ L(V) an operator with
dim V distinct eigenvalues. Then V has a basis B such that M(T, B) is
diagonal.

Proof. Suppose that T has n = dim V distinct eigenvalues λ₁, …, λₙ with
corresponding non-zero eigenvectors v₁, …, vₙ. Define the list
B = (v₁, …, vₙ).
By theorem 17.9, since the λᵢ are distinct, B is linearly independent.
Consequently, since

length(B) = n = dim(V)    (18.43)

B is a basis, because every linearly independent list with the same length
as the dimension of the vector space is a basis (theorem 13.11).
Since B consists solely of eigenvectors of T, M(T, B) is diagonal by
theorem 18.5.

Remark 18.7 The converse of theorem 18.6 is not true. It is possible to
find operators that have diagonal matrices even though they do not have n
distinct eigenvalues (the identity operator is one example).
Theorem 18.8 Let V be a vector space and T ∈ L(V); let λ₁, …, λₘ be
the distinct eigenvalues of T. Then the following are equivalent:
1. M(T, B) is diagonal with respect to some basis B of V.
2. V has a basis consisting only of eigenvectors of T.
3. There exist one-dimensional subspaces U₁, …, Uₙ of V that are each
invariant under T such that

V = U₁ ⊕ U₂ ⊕ ⋯ ⊕ Uₙ    (18.44)

4. V = null(T − λ₁I) ⊕ ⋯ ⊕ null(T − λₘI)
5. dim V = dim null(T − λ₁I) + ⋯ + dim null(T − λₘI)
Proof. ((1) ⟺ (2)) This is theorem 18.5.

((2) ⟹ (3)) Assume (2), that V has a basis B = (v₁, …, vₙ) consisting
of eigenvectors of T.
Let

Uⱼ = span(vⱼ),    j = 1, 2, …, n    (18.45)

By definition each Uⱼ is one-dimensional. Furthermore, since vⱼ is an
eigenvector,

Tvⱼ = λⱼvⱼ ∈ Uⱼ    (18.46)

so each Uⱼ is invariant under T.
Since B is a basis, each vector v ∈ V can be written as a linear combination

v = a₁v₁ + a₂v₂ + ⋯ + aₙvₙ    (18.47)

By the closure of each subspace Uⱼ, uⱼ = aⱼvⱼ ∈ Uⱼ and therefore

v = u₁ + ⋯ + uₙ    (18.48)

where uᵢ ∈ Uᵢ, i = 1, …, n. These uᵢ are unique, hence

V = U₁ ⊕ ⋯ ⊕ Uₙ    (18.49)

hence ((2) ⟹ (3)).


((3) ⟹ (2)) Assume that (3) holds. Then there are one-dimensional
subspaces U₁, …, Uₙ, each invariant under T, such that equation 18.49 is
true.
Let vⱼ ∈ Uⱼ with vⱼ ≠ 0, j = 1, …, n.
Since Uⱼ is invariant under T, Tvⱼ ∈ Uⱼ for each j.
Since Uⱼ is one-dimensional, (vⱼ) is a basis for Uⱼ, for each j.
Hence for each j there exists some number λⱼ such that Tvⱼ = λⱼvⱼ.
Hence vⱼ is an eigenvector of T.
By definition of the direct sum, each vector v ∈ V can be written as a sum
as in 18.48, where each uⱼ is a scalar multiple of the corresponding vⱼ
chosen above; i.e., for every v ∈ V there exist scalars α₁, …, αₙ such that

v = u₁ + ⋯ + uₙ    (18.50)
  = α₁v₁ + ⋯ + αₙvₙ    (18.51)

Hence (v₁, …, vₙ) spans V. We have shown that each vⱼ is an eigenvector.
Furthermore, the vⱼ are linearly independent (exercise).
Hence (v₁, …, vₙ) is a basis of eigenvectors. Hence ((3) ⟹ (2)).
((2) ⟹ (4)) Assume (2). Then V has a basis consisting of eigenvectors
(v₁, …, vₙ) of T. Let λ₁, …, λₘ be the distinct eigenvalues.
Let v ∈ V. Then there are constants aᵢ such that

v = a₁v₁ + ⋯ + aₙvₙ    (18.52)

Since each vᵢ is an eigenvector with some eigenvalue λⱼ,

vᵢ ∈ null(T − λⱼI)    (18.53)

Combining equations 18.52 and 18.53,

V = null(T − λ₁I) + ⋯ + null(T − λₘI)    (18.54)

Suppose that there exist uᵢ ∈ null(T − λᵢI) such that

0 = u₁ + ⋯ + uₘ    (18.55)

Each nonzero uᵢ is an eigenvector of T, and eigenvectors corresponding to
distinct eigenvalues are linearly independent (theorem 17.9). Thus each
term in equation 18.55 is zero:

uᵢ = 0    (18.56)

By theorem 10.7, the uniqueness of the expansion of 0 tells us that the sum
in (18.54) is a direct sum:

V = null(T − λ₁I) ⊕ ⋯ ⊕ null(T − λₘI)    (18.57)

and (2) ⟹ (4).

((4) ⟹ (5)) follows from exercise 2.17 in Axler.¹
((5) ⟹ (2)) Assume that (5) is true. Then

dim V = dim null(T − λ₁I) + ⋯ + dim null(T − λₘI)    (18.58)

Define Uᵢ = null(T − λᵢI).
Choose a basis of each Uᵢ and put all these bases together to form a list
B = (v₁, …, vₙ), where n = dim V (by 18.58 the lengths add up to n).
Each vᵢ is an eigenvector of T because vᵢ ∈ Uᵢ = null(T − λᵢI).
Suppose that there exist aᵢ ∈ F such that

0 = a₁v₁ + ⋯ + aₙvₙ    (18.59)

Define uᵢ as the sum of all the terms in (18.59) such that vₖ ∈ null(T −
λᵢI) = Uᵢ (the number of eigenvalues might be smaller than the number of
eigenvectors, so there might be more than one linearly independent
eigenvector in each set). Thus

0 = u₁ + ⋯ + uₘ    (18.60)

But each nonzero uᵢ is an eigenvector of T with eigenvalue λᵢ. [To see
this, suppose that

uᵢ = aᵢ,₁vᵢ,₁ + ⋯ + aᵢ,ₖvᵢ,ₖ    (18.61)

Since each vᵢ,ⱼ ∈ Uᵢ, it has eigenvalue λᵢ, and so

Tuᵢ = aᵢ,₁Tvᵢ,₁ + ⋯ + aᵢ,ₖTvᵢ,ₖ    (18.62)
    = aᵢ,₁λᵢvᵢ,₁ + ⋯ + aᵢ,ₖλᵢvᵢ,ₖ    (18.63)
    = λᵢ(aᵢ,₁vᵢ,₁ + ⋯ + aᵢ,ₖvᵢ,ₖ)    (18.64)
    = λᵢuᵢ    (18.65)

hence uᵢ is an eigenvector with eigenvalue λᵢ whenever it is nonzero.]
By equation 18.60 this means

u₁ + ⋯ + uₘ = 0    (18.66)

Since eigenvectors of distinct eigenvalues are linearly independent (theorem
17.9), this is only true if each uᵢ = 0.
But each uᵢ is a sum as in 18.61, where the vᵢ,ⱼ are a basis of Uᵢ, so the
coefficients in equation 18.59 are all zero. This means that the vᵢ are
linearly independent and hence form a basis.
Thus V has a basis consisting of eigenvectors, hence (5) ⟹ (2).

¹Exercise 2.17 states the following: Let V be finite-dimensional and have
subspaces U₁, …, Uₘ such that V = U₁ ⊕ ⋯ ⊕ Uₘ. Then dim V = Σᵢ dim Uᵢ.

Topic 19

The Canonical Diagonal
Form

Note for Next Year
(1) Move the material from chapter 6 here and integrate it more thoroughly.
(2) The examples in this chapter are confusing and are not well justified;
they need to be re-written and explained more thoroughly in light of the
material in the chapter.

Definition 19.1 A linear transformation on Rⁿ is a linear map

T : Rⁿ → Rⁿ    (19.1)

given by

y = Tx    (19.2)

where x, y ∈ Rⁿ and T is an n × n matrix.

Definition 19.2 Two square matrices A and B, both of dimension n, are
said to be similar if there exists an n × n invertible matrix U such that

U⁻¹AU = B    (19.3)

Similar matrices represent the same linear transformation in different
coordinate systems.
Let E = (e₁, e₂, …, eₙ) be a basis of Rⁿ. Then we can write

x = ξ₁e₁ + ξ₂e₂ + ⋯ + ξₙeₙ    (19.4)
y = η₁e₁ + η₂e₂ + ⋯ + ηₙeₙ    (19.5)

for some ξ₁, …, ξₙ and η₁, …, ηₙ ∈ R. The numbers ξᵢ, ηᵢ are called the
coordinates of the vectors x and y with respect to the basis E.
If we also let E denote the matrix whose columns are the vectors
e₁, …, eₙ, then

x = Eξ,    y = Eη    (19.6)

where ξ and η are the column vectors of coordinates, i.e.,

(x₁, …, xₙ)ᵀ = E(ξ₁, …, ξₙ)ᵀ,    (y₁, …, yₙ)ᵀ = E(η₁, …, ηₙ)ᵀ    (19.7)

If y = Tx then

Eη = TEξ ⟹ η = (E⁻¹TE)ξ    (19.8)

In other words, if y = Tx in a coordinate system defined by basis
(b₁, …, bₙ), then η = T′ξ in the coordinate system defined by basis
(e₁, …, eₙ), where T′ = E⁻¹TE.
Example 19.1 Let A represent the counter-clockwise rotation through 90
degrees:

A = ⎡ 0  −1 ⎤    (19.9)
    ⎣ 1   0 ⎦

For example, if x = (0.5, 0.7)ᵀ then

y = Ax = ⎡ 0  −1 ⎤ ⎡ 0.5 ⎤ = ⎡ −0.7 ⎤    (19.10)
         ⎣ 1   0 ⎦ ⎣ 0.7 ⎦   ⎣  0.5 ⎦

Consider the basis

e₁ = ⎡ 1 ⎤ ,    e₂ = ⎡ 0 ⎤    (19.11)
     ⎣ 1 ⎦           ⎣ 1 ⎦

The transformation matrix is

T = ⎡ 1  0 ⎤    (19.12)
    ⎣ 1  1 ⎦

and its inverse is

T⁻¹ = ⎡  1  0 ⎤    (19.13)
      ⎣ −1  1 ⎦

so that the transformation of the matrix of the linear transformation given
by y = Ax is

B = T⁻¹AT = ⎡  1  0 ⎤ ⎡ 0  −1 ⎤ ⎡ 1  0 ⎤    (19.14)
            ⎣ −1  1 ⎦ ⎣ 1   0 ⎦ ⎣ 1  1 ⎦

          = ⎡  1  0 ⎤ ⎡ −1  −1 ⎤    (19.15)
            ⎣ −1  1 ⎦ ⎣  1   0 ⎦

          = ⎡ −1  −1 ⎤    (19.16)
            ⎣  2   1 ⎦

Now consider the vector

ξ = T⁻¹x = ⎡  1  0 ⎤ ⎡ 0.5 ⎤ = ⎡ 0.5 ⎤    (19.17)
           ⎣ −1  1 ⎦ ⎣ 0.7 ⎦   ⎣ 0.2 ⎦

Then ξ represents the same vector as x but in the coordinate system
(e₁, e₂). In this coordinate system, the transform y = Ax is represented by

η = Bξ = ⎡ −1  −1 ⎤ ⎡ 0.5 ⎤ = ⎡ −0.7 ⎤    (19.18)
         ⎣  2   1 ⎦ ⎣ 0.2 ⎦   ⎣  1.2 ⎦

Then

Tη = ⎡ 1  0 ⎤ ⎡ −0.7 ⎤ = ⎡ −0.7 ⎤ = y    (19.19)
     ⎣ 1  1 ⎦ ⎣  1.2 ⎦   ⎣  0.5 ⎦

In terms of the original basis,

y = −0.7e₁ + 1.2e₂ = −0.7 ⎡ 1 ⎤ + 1.2 ⎡ 0 ⎤ = ⎡ −0.7 ⎤    (19.20)
                          ⎣ 1 ⎦       ⎣ 1 ⎦   ⎣  0.5 ⎦

x = 0.5e₁ + 0.2e₂ = 0.5 ⎡ 1 ⎤ + 0.2 ⎡ 0 ⎤ = ⎡ 0.5 ⎤    (19.21)
                        ⎣ 1 ⎦       ⎣ 1 ⎦   ⎣ 0.7 ⎦

So we have in the standard basis ((1, 0), (0, 1)):

A : ⎡ 0.5 ⎤ ↦ ⎡ −0.7 ⎤    (19.22)
    ⎣ 0.7 ⎦   ⎣  0.5 ⎦

and in the basis (e₁, e₂):

B : ⎡ 0.5 ⎤ ↦ ⎡ −0.7 ⎤    (19.23)
    ⎣ 0.2 ⎦   ⎣  1.2 ⎦

representing the same transform in different coordinate systems.
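The arithmetic in Example 19.1 is easy to verify with numpy:

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
T = np.array([[1.0, 0.0],
              [1.0, 1.0]])           # columns are e1 = (1,1), e2 = (0,1)
B = np.linalg.inv(T) @ A @ T         # equation (19.14)

x = np.array([0.5, 0.7])
y = A @ x                            # (-0.7, 0.5), equation (19.10)
xi = np.linalg.inv(T) @ x            # coordinates of x in (e1, e2), eq. (19.17)
eta = B @ xi                         # transformed coordinates, eq. (19.18)
```

Mapping eta back with T recovers y, confirming that A and B describe the same transform.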


Theorem 19.3 Similar matrices have the same eigenvalues with the same
multiplicities.

Proof. Let B = T⁻¹AT. Then, using the property that det AB =
(det A)(det B),

det(B − λI) = det(T⁻¹AT − λI)    (19.24)
            = det(T⁻¹(A − λTIT⁻¹)T)    (19.25)
            = det(T⁻¹(A − λI)T)    (19.26)
            = (det T⁻¹)(det(A − λI))(det T)    (19.27)
            = det(A − λI)    (19.28)

In the last step we used the fact that

1 = det I = det(TT⁻¹) = (det T)(det T⁻¹)    (19.29)

Since A and B have the same characteristic equation they have the same
eigenvalues with the same multiplicities.
Example 19.2 From the previous example, we had B = T⁻¹AT where

A = ⎡ 0  −1 ⎤ ,    B = ⎡ −1  −1 ⎤    (19.30)
    ⎣ 1   0 ⎦          ⎣  2   1 ⎦

Each of these matrices has the same eigenvalues {i, −i}. To see this,
observe that

det(A − λI) = λ² + 1    (19.31)

while

det(B − λI) = (−1 − λ)(1 − λ) + 2 = λ² − 1 + 2 = λ² + 1    (19.32)
Definition 19.4 A diagonal matrix is called the Diagonal Canonical
Form of a square matrix A if it is similar to A.

Theorem 19.5 Let A be an n × n square matrix. Then A is similar to a
diagonal matrix if and only if A has n linearly independent eigenvectors.

Proof. (⟹) Suppose A is similar to a diagonal matrix. Then there
exists some invertible matrix T with linearly independent column vectors
(e₁, …, eₙ) such that

T⁻¹AT = diag(λ₁, …, λₙ)    (19.33)

Multiplying on the left by T,

AT = (e₁ ⋯ eₙ) diag(λ₁, …, λₙ)    (19.34)
(Ae₁ ⋯ Aeₙ) = (λ₁e₁ ⋯ λₙeₙ)    (19.35)

On the left-hand side, the jth column of AT is Aeⱼ, and on the right-hand
side the jth column is λⱼeⱼ. Hence

Aeⱼ = λⱼeⱼ    (19.36)

Therefore the vectors eⱼ must be eigenvectors of A with eigenvalues λⱼ.
Hence A has n linearly independent eigenvectors.

(⟸) Suppose that A has n linearly independent eigenvectors (e₁, …, eₙ)
with eigenvalues λ₁, …, λₙ. Then let T be the matrix whose columns are
the eᵢ. Then

AT = A(e₁ ⋯ eₙ)    (19.37)
   = (Ae₁ ⋯ Aeₙ)    (19.38)
   = (λ₁e₁ ⋯ λₙeₙ)    (19.39)
   = (e₁ ⋯ eₙ) diag(λ₁, …, λₙ)    (19.40)
   = T diag(λ₁, …, λₙ)    (19.41)
⟹ T⁻¹AT = diag(λ₁, …, λₙ)    (19.42)

Hence A is similar to a diagonal matrix.

Example 19.3 Find the diagonal canonical form of the matrix

A = ⎡ 0  −1 ⎤    (19.43)
    ⎣ 1   0 ⎦

The characteristic equation is 0 = λ² + 1, hence the eigenvalues are
λ = ±i. The eigenvector for i is found by solving

⎡ 0  −1 ⎤ ⎡ x ⎤ = i ⎡ x ⎤    (19.44)
⎣ 1   0 ⎦ ⎣ y ⎦     ⎣ y ⎦

This gives y = −ix and x = iy. One parameter is arbitrary so we choose
x = 1 to give y = −i. Hence e₁ = (1, −i)ᵀ.
Similarly, the eigenvector for −i is found by solving

⎡ 0  −1 ⎤ ⎡ x ⎤ = −i ⎡ x ⎤    (19.45)
⎣ 1   0 ⎦ ⎣ y ⎦      ⎣ y ⎦

Hence x = −iy and y = ix. Again choosing x = 1 gives y = i; hence
e₂ = (1, i)ᵀ.
The transformation matrix is

T = (e₁ e₂) = ⎡  1  1 ⎤    (19.46)
              ⎣ −i  i ⎦

Its inverse is

T⁻¹ = (1/2) ⎡ 1   i ⎤    (19.47)
            ⎣ 1  −i ⎦

Hence

T⁻¹AT = (1/2) ⎡ 1   i ⎤ ⎡ 0  −1 ⎤ ⎡  1  1 ⎤    (19.48)
              ⎣ 1  −i ⎦ ⎣ 1   0 ⎦ ⎣ −i  i ⎦

      = (1/2) ⎡ 1   i ⎤ ⎡ i  −i ⎤    (19.49)
              ⎣ 1  −i ⎦ ⎣ 1   1 ⎦

      = ⎡ i   0 ⎤    (19.50)
        ⎣ 0  −i ⎦

which, as expected, is the diagonal matrix of eigenvalues of A.
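Example 19.3 can likewise be verified numerically over the complex numbers:

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
T = np.array([[1.0, 1.0],
              [-1j, 1j]])            # columns e1 = (1, -i), e2 = (1, i)
D = np.linalg.inv(T) @ A @ T         # should be diag(i, -i), eq. (19.50)
```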


Topic 20

Invariant Subspaces

In this section we will assume that V is a real, finite-dimensional, non-zero vector space.

Theorem 20.1 Let V be a finite dimensional non-zero real vector space. Then every operator T ∈ L(V) has an invariant subspace of either dimension 1 or dimension 2.

Proof. Let n = dim(V) > 0, T ∈ L(V), and pick any v ∈ V such that v ≠ 0. Then define the list

L = (v, T v, T²v, . . . , Tⁿv)          (20.1)

Since Length(L) = n + 1 > n = dim V, L cannot be linearly independent. Hence there exist real numbers (note that we are assuming that V is real, hence F = R) a0, . . . , an, not all zero, such that

0 = a0 v + a1 T v + a2 T²v + ⋯ + an Tⁿv          (20.2)

Define the polynomial p(x) by

p(x) = a0 + a1 x + a2 x² + ⋯ + an xⁿ          (20.3)

Then p(x) has n complex roots, which may be grouped into m real roots λ1, . . . , λm and k = (n − m)/2 complex conjugate pairs of roots (see theorem 11.12), and can be factored as

p(x) = c(x − λ1) ⋯ (x − λm)(x² + α1 x + β1) ⋯ (x² + αk x + βk)          (20.4)


where c and all the λj, αj, βj are real. From equation 20.2,

0 = a0 Iv + a1 T v + a2 T²v + ⋯ + an Tⁿv          (20.5)
  = (a0 I + a1 T + a2 T² + ⋯ + an Tⁿ)v          (20.6)
  = c(T − λ1 I) ⋯ (T − λm I)(T² + α1 T + β1 I) ⋯ (T² + αk T + βk I)v          (20.7)
Since the operator on the right (the entire product) maps the non-zero vector v to 0, it is not injective; and since a composition of injective maps is injective, at least one of the factors on the right is not injective (1-1). That is, either T − λj I is not injective for some j, or T² + αj T + βj I is not injective for some j.

If T − λj I is not injective, then λj is an eigenvalue of T with some eigenvector vj. Then T has an invariant subspace because T vj = λj vj ∈ span(vj). This subspace has dimension 1, hence T has an invariant subspace of dimension 1.
If T² + αj T + βj I is not injective, then there is a non-zero vector u such that

T²u + αj T u + βj u = 0          (20.8)

Let U = span(u, T u). Then either dim U = 1 or dim U = 2. Let v ∈ U. Then in general there are numbers a, b such that

v = au + bT u          (20.9)

because (u, T u) spans U. Hence from equation 20.8,

T v = T (au + bT u)          (20.10)
    = aT u + bT²u          (20.11)
    = aT u + b(−αj T u − βj u)          (20.12)

Rearranging,

T v = (a − bαj)T u − bβj u ∈ span(u, T u)          (20.13)

Thus U is invariant under T . Hence in this case T has an invariant subspace


of either dimension 1 or dimension 2.
Definition 20.2 (Projection.) Suppose that V = U ⊕ W. Then for any vector v ∈ V such that v = u + w, where u ∈ U and w ∈ W, the projection of v onto U (with null space W) is

PU,W v = u          (20.14)

and the projection of v onto W (with null space U) is

PW,U v = w          (20.15)


Remark 20.3 Properties of Projections. Let V = U ⊕ W. Then

(1) v = PU,W v + PW,U v
(2) PU,W² = PU,W
(3) range(PU,W) = U
(4) null(PU,W) = W
Theorem 20.4 Let V be an odd-dimensional real vector space, and let T ∈ L(V). Then T has at least one eigenvalue.

Proof. We prove this by induction on n, the dimension of the vector space.

For n = 1, let v ∈ V be non-zero. Since dim V = 1, the list (v) is a basis of V. Since T is an operator, T v ∈ V. Hence T v = cv for some c ∈ R, so c is an eigenvalue.
Inductive Hypothesis. Let n ≥ 3 be odd and suppose that for each k = 1, 3, 5, . . . , n − 2, if U is a vector space with dimension k, and if T ∈ L(U), then T has at least one eigenvalue.
Let V be a vector space with dimension n and T ∈ L(V). Either T has an eigenvalue or it does not. If it does, the theorem is proven.

Suppose that T does not have an eigenvalue. Then by theorem 20.1, T has an invariant subspace U of dimension 2. (Strictly speaking, the theorem says that the subspace has dimension 1 or 2, but in the proof of the theorem we observed that a one-dimensional invariant subspace gives T v ∈ span(v) for some non-zero v, which would make v an eigenvector, contradicting the assumption that T does not have an eigenvalue.)
Define W by V = U ⊕ W (W exists by theorem 13.6). Define the operator S ∈ L(W) by

Sw = PW,U T w          (20.16)

This definition makes sense because T w ∈ V and PW,U : V → W. Hence S is an operator on W.
Since dim U = 2, dim W = dim V − 2 = n − 2, so the inductive hypothesis applies to S. By the inductive hypothesis, S has an eigenvalue λ with eigenvector w. Hence

(S − λI)w = 0          (20.17)

Let u ∈ U and a ∈ R, and define

v = u + aw          (20.18)


Then

(T − λI)v = (T − λI)(u + aw)          (20.19)
          = T u − λu + aT w − aλw          (20.20)
          = T u − λu + a(T w − λw)          (20.21)

Since we can separate any vector, such as T w, into the sum of its projections onto U and W, i.e. T w = PU,W T w + PW,U T w,

(T − λI)v = T u − λu + a(PU,W T w + PW,U T w − λw)          (20.22)
          = T u − λu + a(PU,W T w + Sw − λw)          (20.23)
          = T u − λu + a PU,W T w + a(S − λI)w          (20.24)

The last term vanishes by equation 20.17. The first term, T u − λu, is in U because u ∈ U and U is invariant under T. The second term is in U because it is the projection of a vector onto U with null space W. Hence (T − λI)v ∈ U.
Thus

(T − λI) : (U + span(w)) → U          (20.25)

This is a mapping of a higher-dimensional subspace to a lower-dimensional subspace, because

dim U < dim(U + span(w))          (20.26)

By corollary 14.20,

(T − λI)|_{U + span(w)}          (20.27)

is not injective (1-1), hence its null space is not {0} (theorem 14.17). Since the null space is not {0}, there exists a non-zero vector v ∈ U + span(w) such that

(T − λI)v = 0          (20.28)

Thus T has an eigenvalue.


Topic 21

Inner Products and Norms
Note for Next Year
Move material from chapter 5 here and integrate them more closely.
Definition 21.1 Let V be a finite dimensional non-zero vector space over F. An inner product on V is a function

⟨u, v⟩ : V × V → F          (21.1)

with the following properties:

1. Positivity: For all v ∈ V,

⟨v, v⟩ ≥ 0          (21.2)

2. Definiteness:

⟨v, v⟩ = 0 ⟺ v = 0          (21.3)

3. Additivity in first variable: for all u, v, w ∈ V,

⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩          (21.4)

4. Homogeneity in first variable: for all v, w ∈ V and all a ∈ F,

⟨av, w⟩ = a⟨v, w⟩          (21.5)

5. Conjugate Symmetry: for all v, w ∈ V,

⟨v, w⟩ = ⟨w, v⟩*          (21.6)

where * denotes the complex conjugate.

Definition 21.2 An inner product space is a vector space with an inner product.

Example 21.1 Let F be the real numbers and V = R². Then the dot product is an inner product, and R² is an inner product space.

Example 21.2 Let Pm(F) be the set of all polynomials of degree at most m with coefficients in F. Then for p, q ∈ Pm(F) define

⟨p, q⟩ = ∫₀¹ p(x) q(x)* dx          (21.7)

Theorem 21.3 Properties of Inner Products

1. Inner product with zero:

⟨0, w⟩ = ⟨w, 0⟩ = 0          (21.8)

2. Additivity in the second slot:

⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩          (21.9)

3. Conjugate homogeneity in the second slot:

⟨u, av⟩ = a*⟨u, v⟩          (21.10)

Proof. Proof of (1). By additivity in the first slot,

⟨0, w⟩ = ⟨0 + 0, w⟩ = ⟨0, w⟩ + ⟨0, w⟩          (21.11)

Hence ⟨0, w⟩ = 0. By conjugate symmetry,

⟨w, 0⟩ = ⟨0, w⟩* = 0* = 0          (21.12)

Proof of (2).

⟨u, v + w⟩ = ⟨v + w, u⟩*          (conjugate symmetry)          (21.13)
           = (⟨v, u⟩ + ⟨w, u⟩)*          (additivity in 1st slot)          (21.14)
           = ⟨v, u⟩* + ⟨w, u⟩*          (additivity of complex conjugate)          (21.15)
           = ⟨u, v⟩ + ⟨u, w⟩          (conjugate symmetry)          (21.16)

Proof of (3).

⟨u, av⟩ = ⟨av, u⟩*          (conjugate symmetry)          (21.17)
        = (a⟨v, u⟩)*          (homogeneity in first slot)          (21.18)
        = a*⟨v, u⟩*          (property of complex numbers)          (21.19)
        = a*⟨u, v⟩          (conjugate symmetry)


Definition 21.4 Let v ∈ V be a vector. Then the norm of the vector is

‖v‖ = √⟨v, v⟩          (21.20)

Example 21.3 The Euclidean norm is

‖(z1, . . . , zn)‖ = √(z1 z1* + ⋯ + zn zn*)          (21.21)

Example 21.4 Let p ∈ Pm(F); then

‖p‖ = √( ∫₀¹ |p(x)|² dx )          (21.22)

Theorem 21.5 ‖v‖ = 0 ⟺ v = 0

Proof. This follows from the definiteness of the inner product.

Theorem 21.6 ‖av‖ = |a|‖v‖ where a ∈ F and v ∈ V(F).

Proof.

‖av‖² = ⟨av, av⟩ = a⟨v, av⟩ = a a*⟨v, v⟩ = |a|²‖v‖²          (21.23)

Definition 21.7 Two vectors u, v are called orthogonal if ⟨u, v⟩ = 0.


Theorem 21.8 Pythagorean Theorem. If u, v are orthogonal vectors then

‖u + v‖² = ‖u‖² + ‖v‖²          (21.24)

Proof.

‖u + v‖² = ⟨u + v, u + v⟩          (21.25)
         = ⟨u, u + v⟩ + ⟨v, u + v⟩          (21.26)
         = ⟨u, u⟩ + ⟨u, v⟩ + ⟨v, u⟩ + ⟨v, v⟩          (21.27)
         = ‖u‖² + ‖v‖²          (21.28)

where the last step follows because ⟨v, u⟩ = ⟨u, v⟩ = 0, by orthogonality.


Definition 21.9 An orthogonal decomposition of a vector is a decomposition of the vector as a sum of orthogonal vectors:

v = v1 + v2 + ⋯ + vn          (21.29)

where ⟨vi, vj⟩ = 0 unless i = j.



Example 21.5 In Euclidean space R³ we can decompose a vector into components parallel to the three Cartesian axes:

(x, y, z) = (x, 0, 0) + (0, y, 0) + (0, 0, z)          (21.30)

Example 21.6 In Euclidean space we commonly decompose a vector u into a part that is parallel to a second (unit) vector v and one that is orthogonal to v. These are given by the dot product (the parallel component) and the original vector minus the parallel component:

u = (v · u)v + (u − (v · u)v)          (21.31)

where the first term is parallel to v and the second is perpendicular to v.

We would like to generalize example 21.6 to general vector spaces:

u = av + (u − av)          (21.32)

where av is the part of u that is parallel to v and (u − av) is the part that is orthogonal to v. (By parallel we mean a scalar multiple of v.) To force orthogonality of the second part of equation 21.32 to v we require

0 = ⟨u − av, v⟩ = ⟨u, v⟩ − a⟨v, v⟩ = ⟨u, v⟩ − a‖v‖²          (21.33)

Thus

a = ⟨u, v⟩ / ‖v‖²          (21.34)

Hence from equation 21.32 we have the following.


Theorem 21.10 Orthogonal Decomposition. Let u, v be vectors with v ≠ 0. Then u can be decomposed into the sum of a scalar multiple of v and a vector that is orthogonal to v as follows:

u = (⟨u, v⟩/‖v‖²) v + ( u − (⟨u, v⟩/‖v‖²) v )          (21.35)

The first term in equation 21.35 is a scalar multiple of v (hence parallel to it) and the second term, by construction, is orthogonal to v.
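Theorem 21.10 is easy to verify numerically. A minimal sketch, assuming NumPy and the standard inner product on Cⁿ (the particular vectors u and v are arbitrary choices):

```python
import numpy as np

def inner(u, v):
    # <u, v> = sum_i u_i * conj(v_i): linear in the first slot,
    # conjugate-linear in the second, matching Definition 21.1.
    return np.sum(u * np.conj(v))

def orthogonal_decomposition(u, v):
    # Theorem 21.10: u = a v + w with a = <u, v>/||v||^2 and <w, v> = 0.
    a = inner(u, v) / inner(v, v)
    parallel = a * v
    w = u - parallel
    return parallel, w

u = np.array([1.0 + 1.0j, 2.0, -1.0j])
v = np.array([1.0, 1.0j, 0.0])
par, perp = orthogonal_decomposition(u, v)
```

The two pieces sum back to u, and the second piece is orthogonal to v, as the theorem asserts.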


Theorem 21.11 Cauchy-Schwarz Inequality. Let u, v ∈ V be vectors. Then

|⟨u, v⟩| ≤ ‖u‖‖v‖          (21.36)

Proof. If v = 0 then both sides are zero and the result holds. So assume that v ≠ 0. By the orthogonal decomposition theorem (theorem 21.10) we can write

u = (⟨u, v⟩/‖v‖²) v + ( u − (⟨u, v⟩/‖v‖²) v ) = (⟨u, v⟩/‖v‖²) v + w          (21.37)

where w is orthogonal to v. By the Pythagorean Theorem (theorem 21.8),

‖u‖² = ‖(⟨u, v⟩/‖v‖²) v + w‖²          (21.38)
     = ‖(⟨u, v⟩/‖v‖²) v‖² + ‖w‖²          (21.39)
     = |⟨u, v⟩|²/‖v‖² + ‖w‖²          (21.40)
     ≥ |⟨u, v⟩|²/‖v‖²          (21.41)

Multiplying through by ‖v‖² gives

‖u‖²‖v‖² ≥ |⟨u, v⟩|²          (21.42)

Taking square roots gives equation 21.36.


Example 21.7 Using the inner product of example 21.2, the Cauchy-Schwarz inequality tells us that

| ∫₀¹ p(x) q(x)* dx |² = |⟨p, q⟩|²          (21.43)
                       ≤ ‖p‖²‖q‖²          (21.44)
                       = ( ∫₀¹ |p(x)|² dx )( ∫₀¹ |q(x)|² dx )          (21.45)
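Example 21.7 can be checked numerically. The sketch below approximates the integrals by a midpoint rule (the two polynomials are hypothetical choices, not taken from the text):

```python
import numpy as np

# Midpoint-rule sample points on [0, 1]; since the interval has length 1,
# each integral is just the mean of the sampled values.
n = 100_000
x = (np.arange(n) + 0.5) / n

p = x**2 - 1.0      # hypothetical p(x) = x^2 - 1
q = 3.0 * x + 2.0   # hypothetical q(x) = 3x + 2

ip = np.mean(p * q)               # <p, q> from Example 21.2 (real-valued here)
norm_p = np.sqrt(np.mean(p * p))  # ||p||
norm_q = np.sqrt(np.mean(q * q))  # ||q||
```

For these polynomials ⟨p, q⟩ = −25/12, and |⟨p, q⟩| is indeed bounded by ‖p‖‖q‖.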


Theorem 21.12 Triangle Inequality. Let u, v ∈ V. Then

‖u + v‖ ≤ ‖u‖ + ‖v‖          (21.46)

Proof.

‖u + v‖² = ⟨u + v, u + v⟩          (21.47)
         = ⟨u, u⟩ + ⟨u, v⟩ + ⟨v, u⟩ + ⟨v, v⟩          (21.48)
         = ‖u‖² + ‖v‖² + ⟨u, v⟩ + ⟨u, v⟩*          (conjugate symmetry)          (21.49)
         = ‖u‖² + ‖v‖² + 2 Re(⟨u, v⟩)          (because Re(z) = (z + z*)/2)          (21.50)
         ≤ ‖u‖² + ‖v‖² + 2|⟨u, v⟩|          (because Re(z) ≤ |z|)          (21.51)
         ≤ ‖u‖² + ‖v‖² + 2‖u‖‖v‖          (Cauchy-Schwarz inequality)          (21.52)
         = (‖u‖ + ‖v‖)²          (21.53)

Taking square roots of both sides gives the triangle inequality.


Combining theorem 21.12 with theorems 21.5 and 21.6 tells us that the definition of a norm given by definition 21.4 is consistent with the definition of a norm we gave earlier in definition 5.8.
Theorem 21.13 Parallelogram Equality. If u, v ∈ V, then

‖u + v‖² + ‖u − v‖² = 2(‖u‖² + ‖v‖²)          (21.54)

Proof.

‖u + v‖² + ‖u − v‖² = ⟨u + v, u + v⟩ + ⟨u − v, u − v⟩          (21.55)
                    = ‖u‖² + ‖v‖² + ⟨u, v⟩ + ⟨v, u⟩ + ‖u‖² + ‖v‖² − ⟨u, v⟩ − ⟨v, u⟩          (21.56)
                    = 2(‖u‖² + ‖v‖²)          (21.57)


Topic 22

Fixed Points of Operators


Fixed Points of Functions
Before we look at fixed points of operators, we review the analogous concept
of fixed points for functions on R. Then we will generalize from functions
on R to operators on a vector space V.
Definition 22.1 Let f : R → R. A number a ∈ R is called a fixed point of f if f(a) = a.
Example 22.1 Find the fixed points of the function f(x) = x⁴ + 2x² + x − 3.

x = x⁴ + 2x² + x − 3
0 = x⁴ + 2x² − 3 = (x − 1)(x + 1)(x² + 3)

Hence the real fixed points are x = 1 and x = −1.
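The fixed-point equation in Example 22.1 can also be solved numerically, for instance with NumPy's polynomial root finder (a sketch, assuming NumPy is available):

```python
import numpy as np

# Example 22.1: fixed points of f(x) = x^4 + 2x^2 + x - 3 solve f(x) = x,
# i.e. x^4 + 2x^2 - 3 = 0.  Coefficients are given in descending order.
roots = np.roots([1.0, 0.0, 2.0, 0.0, -3.0])

# Keep only the (numerically) real roots.
real_fixed_points = sorted(r.real for r in roots if abs(r.imag) < 1e-9)
```

The complex pair ±i√3 is discarded, leaving the two real fixed points ±1.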
A function f : R → R has a fixed point if and only if its graph intersects the line y = x. If there are multiple intersections, then there are multiple fixed points. Consequently a sufficient condition is that the range of f is contained in its domain (see figure 22.1).
Theorem 22.2 (Sufficient condition for fixed point) Suppose that f(t) is a continuous function that maps its domain into a subset of itself, i.e.,

f(t) : [a, b] → S ⊆ [a, b]          (22.1)

Then f(t) has a fixed point in [a, b].


[Figure 22.1: A sufficient condition for a bounded continuous function to have a fixed point is that the range be a subset of the domain. A fixed point occurs whenever the curve of f(t) intersects the line y = t.]

Proof. If f(a) = a or f(b) = b then there is a fixed point at either a or b. So assume that both f(a) ≠ a and f(b) ≠ b. By assumption, f(t) : [a, b] → S ⊆ [a, b], so that

f(a) ≥ a and f(b) ≤ b          (22.2)

Since both f(a) ≠ a and f(b) ≠ b, this means that

f(a) > a and f(b) < b          (22.3)

Let g(t) = f(t) − t. Then g is continuous because f is continuous, and furthermore,

g(a) = f(a) − a > 0          (22.4)
g(b) = f(b) − b < 0          (22.5)

Hence by the intermediate value theorem, g has a root r ∈ (a, b), where g(r) = 0. Then

0 = g(r) = f(r) − r  ⟹  f(r) = r          (22.6)

i.e., r is a fixed point of f.
In the case just proven, there may be multiple fixed points. If the derivative is sufficiently bounded then there will be a unique fixed point.

Theorem 22.3 (Condition for a unique fixed point) Let f be a continuous function on [a, b] such that f : [a, b] → S ⊆ (a, b), and suppose further that there exists some positive constant K < 1 such that

|f′(t)| ≤ K,  ∀t ∈ [a, b]          (22.7)


Then f has a unique fixed point in [a, b].

Proof. By theorem 22.2 a fixed point exists. Call it p,

p = f(p)          (22.8)

Suppose that a second fixed point q ∈ [a, b], q ≠ p, also exists, so that

q = f(q)          (22.9)

Hence

|f(p) − f(q)| = |p − q|          (22.10)

By the mean value theorem there is some number c between p and q such that

f′(c) = (f(p) − f(q)) / (p − q)          (22.11)

Taking absolute values,

|(f(p) − f(q)) / (p − q)| = |f′(c)| ≤ K < 1          (22.12)

and thence

|f(p) − f(q)| < |p − q|          (22.13)

This contradicts equation 22.10. Hence our assumption that a second, different fixed point exists must be incorrect. Hence the fixed point is unique.
Theorem 22.4 Let f be as defined in theorem 22.3, and p0 ∈ (a, b). Then the sequence of numbers

p1 = f(p0)
p2 = f(p1)
  ⋮
pn = f(p_{n−1})
  ⋮          (22.14)

converges to the unique fixed point of f in (a, b).

Proof. We know from theorem 22.3 that a unique fixed point p exists. We need to show that pi → p as i → ∞. Since f maps its domain onto a subset of itself, every point pi ∈ [a, b].

Further, since p itself is a fixed point, p = f(p), and for each i, since pi = f(p_{i−1}), we have

|pi − p| = |f(p_{i−1}) − f(p)|          (22.15)

If for any value of i we have pi = p then we have reached the fixed point and the theorem is proved. So we assume that pi ≠ p for all i. Then by the mean value theorem, for each value of i there exists a number ci between p_{i−1} and p such that

|f(p_{i−1}) − f(p)| = |f′(ci)||p_{i−1} − p| ≤ K|p_{i−1} − p|          (22.16)

where the last inequality follows because f′ is bounded by K < 1 (see equation 22.7). Substituting equation 22.15 into equation 22.16,

|pi − p| = |f(p_{i−1}) − f(p)| ≤ K|p_{i−1} − p|          (22.17)

Restating the same result with i replaced by i − 1, i − 2, . . . ,

|p_{i−1} − p| = |f(p_{i−2}) − f(p)| ≤ K|p_{i−2} − p|
|p_{i−2} − p| = |f(p_{i−3}) − f(p)| ≤ K|p_{i−3} − p|
  ⋮
|p2 − p| = |f(p1) − f(p)| ≤ K|p1 − p|
|p1 − p| = |f(p0) − f(p)| ≤ K|p0 − p|          (22.18)

Putting all these together,

|pi − p| ≤ K²|p_{i−2} − p| ≤ K³|p_{i−3} − p| ≤ ⋯ ≤ Kⁱ|p0 − p|          (22.19)

Since 0 < K < 1,

0 ≤ lim_{i→∞} |pi − p| ≤ |p0 − p| lim_{i→∞} Kⁱ = 0          (22.20)

Thus pi → p as i → ∞.
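The iteration of theorem 22.4 is easy to implement. A minimal sketch in Python; the map f(x) = cos x is a hypothetical example chosen because it satisfies the hypotheses (it maps [0, 1] into itself and |f′(x)| = |sin x| ≤ sin 1 < 1 there):

```python
import math

def fixed_point_iteration(f, p0, tol=1e-12, max_iter=1000):
    # Theorem 22.4: iterate p_n = f(p_{n-1}) until successive
    # iterates agree to within tol.
    p = p0
    for _ in range(max_iter):
        p_next = f(p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    raise RuntimeError("did not converge")

p = fixed_point_iteration(math.cos, 0.5)
```

The iterates converge to the unique solution of cos p = p (approximately 0.7390851).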
A weaker condition that is sufficient for convergence is the Lipschitz condition.

Definition 22.5 A function f on I ⊆ R is said to be Lipschitz (or Lipschitz continuous, or to satisfy a Lipschitz condition) if there exists some constant K > 0 such that for all x1, x2 ∈ I,

|f(x1) − f(x2)| ≤ K|x1 − x2|          (22.21)

The constant K is called a Lipschitz constant for f.


Page 150

Last revised: November 3, 2012

TOPIC 22. FIXED POINTS OF OPERATORS

Math 462

Theorem 22.6 Assume the same conditions as theorem 22.4, except that the condition of equation 22.7 is replaced with the following condition: f(t) is Lipschitz with Lipschitz constant K < 1. Then fixed point iteration converges.

Proof. The Lipschitz condition gives equation 22.16 immediately. The rest of the proof follows as before.

Fixed Points of Operators

Now we turn to a discussion of fixed points of an operator on a general vector space.

Definition 22.7 Let V be a vector space over F and let T be an operator on V. Then we say v is a fixed point of T if T v = v.

To get a condition for fixed points of operators we will need something analogous to the Lipschitz condition, but on vector spaces. First, we look at how the Lipschitz condition can be generalized to a multivariate function f(x1, x2, . . . ). The Lipschitz condition can either be applied to any single argument of f, or to the norm of the entire vector of arguments. If we apply it to a single variable, we will assume this variable is in the final slot of f and call it y, though it could be anywhere. We state it for a two-variable function, but in fact we could replace t in the following definition with t1, t2, . . . .

Definition 22.8 A function f(t, y) on D is said to be Lipschitz in the variable y if there exists some constant K > 0 such that for all (t, y1), (t, y2) ∈ D,

|f(t, y1) − f(t, y2)| ≤ K|y1 − y2|          (22.22)

The constant K is called a Lipschitz constant for f. We will sometimes denote this as f ∈ Lip(y; K)(D).
Theorem 22.9 Suppose that |∂f/∂y| is bounded by K on a set D. Then f(t, y) ∈ Lip(y; K)(D).

Proof. Let (t, y1), (t, y2) ∈ D. Then by the mean value theorem, there is some number c between y1 and y2 such that

|f(t, y1) − f(t, y2)| = |fy(t, c)||y1 − y2| ≤ K|y1 − y2|          (22.23)

Hence f is Lipschitz in y on D.

The generalization of the Lipschitz condition using the norm of the vector of arguments is called a contraction. If we think of an operator as a function of a single variable (v), rather than a function of all the components of that vector (e.g., (v1, . . . , vn)), then the definition of a contraction is identical to the definition of the Lipschitz condition on a function of a single variable, with absolute value replaced by vector norm.

Definition 22.10 Let V be a normed vector space, S ⊆ V, and v, w ∈ S. Then a contraction is any mapping T : S → V that satisfies

‖T v − T w‖ ≤ K‖v − w‖          (22.24)

for some K ∈ R, 0 < K < 1, for all v, w ∈ S. We will call the number K the contraction constant.
Definition 22.11 Let V be a normed vector space over F and let v1, v2, . . . be a sequence in V. Then we say that the sequence is Cauchy if ‖vm − vn‖ → 0 as n, m → ∞. More precisely, the sequence is Cauchy if

(∀ε > 0)(∃N > 0)(∀m, n > N, m, n ∈ Z)(‖vm − vn‖ < ε)          (22.25)

The study of complete spaces and Cauchy sequences is beyond the scope of this class. We will just assume that we are working in a vector space in which Cauchy sequences converge.

Definition 22.12 Let V be a normed vector space over F. Then we say that V is complete if every Cauchy sequence in V converges to some element of V.
Lemma 22.13 Let T be a contraction on a complete normed vector space V with contraction constant K. Then for any v ∈ V,

‖Tⁿv − v‖ ≤ ((1 − Kⁿ)/(1 − K)) ‖T v − v‖          (22.26)

Proof. Use induction. For n = 1, the formula gives

‖T v − v‖ ≤ ((1 − K)/(1 − K)) ‖T v − v‖          (22.27)

which is trivially true.


As our inductive hypothesis, choose any n ≥ 1 and suppose that

‖Tⁿv − v‖ ≤ ((1 − Kⁿ)/(1 − K)) ‖T v − v‖          (22.28)

is true. We need to show that

‖Tⁿ⁺¹v − v‖ ≤ ((1 − Kⁿ⁺¹)/(1 − K)) ‖T v − v‖          (22.29)


By the triangle inequality,

‖Tⁿ⁺¹v − v‖ ≤ ‖Tⁿ⁺¹v − Tⁿv‖ + ‖Tⁿv − v‖          (22.30)

Applying the inductive hypothesis to the second term,

‖Tⁿ⁺¹v − v‖ ≤ ‖Tⁿ⁺¹v − Tⁿv‖ + ((1 − Kⁿ)/(1 − K)) ‖T v − v‖          (22.31)

Since T is a contraction with contraction constant K, we can rewrite the first term as

‖Tⁿ⁺¹v − Tⁿv‖ = ‖T(Tⁿv) − T(Tⁿ⁻¹v)‖          (22.32)
              ≤ K‖Tⁿv − Tⁿ⁻¹v‖          (22.33)
                ⋮ (repeating the step n times)
              ≤ Kⁿ‖T v − v‖          (22.34)

Substituting (22.34) into (22.31) gives

‖Tⁿ⁺¹v − v‖ ≤ Kⁿ‖T v − v‖ + ((1 − Kⁿ)/(1 − K)) ‖T v − v‖          (22.35)
            = (((1 − K)Kⁿ + (1 − Kⁿ))/(1 − K)) ‖T v − v‖          (22.36)
            = ((1 − Kⁿ⁺¹)/(1 − K)) ‖T v − v‖          (22.37)

which is exactly equation (22.29), as we needed to prove.


Theorem 22.14 Contraction Mapping Theorem¹. Let T be a contraction on a complete normed vector space V with contraction constant K < 1. Then T has a unique fixed point u ∈ V such that T u = u. Furthermore, any sequence of vectors v1, v2, . . . defined by vk = T v_{k−1} converges to the unique fixed point u. We denote this by vk → u, or write lim_{k→∞} vk = u.

Proof.² We need to prove three things: (1) that the sequence converges; (2) that the limit of the sequence is a fixed point; and (3) that the fixed point is unique.

1. Proof of Convergence. Let ε > 0 be given and let v ∈ V.

¹The contraction mapping theorem is sometimes called the Banach Fixed Point Theorem.
²The proof follows "Proof of Banach Fixed Point Theorem," Encyclopedia of Mathematics (Volume 2, 54A20:2034), PlanetMath.org.


Since K < 1, Kⁿ/(1 − K) → 0 as n → ∞, so given any v ∈ V it is possible to choose an integer N such that

(Kⁿ/(1 − K)) ‖T v − v‖ < ε          (22.38)

for all n > N. Pick any such integer N. Choose any two integers m ≥ n ≥ N, and define the sequence

v0 = v
v1 = T v0
v2 = T v1
  ⋮
vn = T v_{n−1}
  ⋮          (22.39)

Since T is a contraction,

‖vm − vn‖ = ‖Tᵐv − Tⁿv‖          (22.40)
          = ‖T(Tᵐ⁻¹v) − T(Tⁿ⁻¹v)‖          (22.41)
          ≤ K‖Tᵐ⁻¹v − Tⁿ⁻¹v‖          (22.42)
            ⋮
          ≤ Kⁿ‖Tᵐ⁻ⁿv − v‖          (22.43)

From Lemma 22.13 we have

‖vm − vn‖ ≤ Kⁿ ((1 − Kᵐ⁻ⁿ)/(1 − K)) ‖T v − v‖          (22.44)
          = ((Kⁿ − Kᵐ)/(1 − K)) ‖T v − v‖          (22.45)
          ≤ (Kⁿ/(1 − K)) ‖T v − v‖ < ε          (22.46)

where the last step follows from (22.38). Therefore vn is a Cauchy sequence, and every Cauchy sequence on a complete normed vector space converges. Hence vn → u for some u ∈ V.

2. Proof that the limit is a fixed point. Either u is a fixed point of T or it is not a fixed point of T. Suppose that u is not a fixed point of T. Then T u ≠ u, so there exists some δ > 0 such that

‖T u − u‖ > δ          (22.47)

On the other hand, because vn → u, there exists an integer N such that for all n > N,

‖vn − u‖ < δ/2          (22.48)

By the triangle inequality,

‖T u − u‖ = ‖T u − v_{n+1} + v_{n+1} − u‖          (22.49)
          ≤ ‖T u − v_{n+1}‖ + ‖v_{n+1} − u‖          (22.50)

Since v_{n+1} = T vn,

‖T u − u‖ ≤ ‖T u − T vn‖ + ‖u − v_{n+1}‖          (22.51)

Since T is a contraction, ‖T u − T vn‖ ≤ K‖u − vn‖, with K < 1, so that

‖T u − u‖ ≤ K‖u − vn‖ + ‖u − v_{n+1}‖          (22.52)
          ≤ ‖u − vn‖ + ‖u − v_{n+1}‖          (22.53)
          < δ/2 + δ/2          (22.54)
          = δ          (22.55)

This contradicts (22.47). Hence u must be a fixed point of T.


3. Proof of Uniqueness. To prove uniqueness, suppose that there is another fixed point w ≠ u. Then ‖w − u‖ > 0 (otherwise they are equal). Since both w and u are fixed points, T w = w and T u = u. Furthermore, since T is a contraction with contraction constant K < 1,

‖u − w‖ = ‖T u − T w‖ ≤ K‖u − w‖ < ‖u − w‖          (22.56)

which contradicts ‖w − u‖ > 0. Thus u is the unique fixed point of T.


Application to Differential Equations

Lemma 22.15 Let V be the vector space consisting of integrable functions on an interval (a, b), and let f ∈ V. Then the sup-norm defined by

‖f‖∞ = sup{|f(x)| : x ∈ (a, b)}          (22.57)

is a norm. The proof is left as an exercise.

The notation for the sup-norm comes from the p-norm,

‖f‖p = ( ∫ₐᵇ |f(x)|ᵖ dx )^(1/p)          (22.58)

for which

lim_{p→∞} ‖f‖p = ‖f‖∞          (22.59)

See any text on analysis for a discussion of this.


Theorem 22.16 Existence of Solutions to the Initial Value Problem. Let D ⊆ R² be convex and suppose that f is continuously differentiable on D. Then the initial value problem

y′ = f(t, y),   y(t0) = y0          (22.60)

has a unique solution φ(t), in the sense that φ′(t) = f(t, φ(t)), φ(t0) = y0.

Proof. We begin by observing that φ is a solution of equation 22.60 if and only if it is a solution of

φ(t) = y0 + ∫_{t0}^{t} f(x, φ(x)) dx          (22.61)

Our goal will be to prove 22.61. Let V be the set of all continuous integrable functions on an interval (a, b) that contains t0. Define the map T : φ ↦ Tφ on V by

Tφ = y0 + ∫_{t0}^{t} f(x, φ(x)) dx          (22.62)

We will assume b ≥ t > t0 ≥ a. The proof for t < t0 is completely analogous.


We will use the sup-norm on (a, b). Let g, h ∈ V. Then

‖T g − T h‖ = sup_{a≤t≤b} |T g − T h|          (22.63)
            = sup_{a≤t≤b} | y0 + ∫_{t0}^{t} f(x, g(x)) dx − y0 − ∫_{t0}^{t} f(x, h(x)) dx |          (22.64)
            = sup_{a≤t≤b} | ∫_{t0}^{t} [f(x, g(x)) − f(x, h(x))] dx |          (22.65)

Since f is continuously differentiable, it is differentiable and its derivative is continuous; thus its derivative is bounded on D. Therefore by theorem 22.9, f is Lipschitz in its second argument. Consequently there is some K ∈ R such that

‖T g − T h‖ ≤ K sup_{a≤t≤b} | ∫_{t0}^{t} (g(x) − h(x)) dx |          (22.66)
            ≤ K sup_{a≤t≤b} ∫_{t0}^{t} |g(x) − h(x)| dx          (22.67)
            ≤ K sup_{a≤t≤b} (t − t0) sup_{a≤x≤b} |g(x) − h(x)|          (22.68)
            ≤ K(b − a) sup_{a≤x≤b} |g(x) − h(x)|          (22.69)
            = K(b − a) ‖g − h‖          (22.70)

Since K is fixed, so long as the interval (a, b) has length smaller than 1/K we have

‖T g − T h‖ ≤ K′‖g − h‖          (22.71)

where

K′ = K(b − a) < 1          (22.72)

Thus T is a contraction. By the contraction mapping theorem it has a unique fixed point; call this point φ. Equation 22.61 follows immediately.
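The proof's iteration of the map Tφ is the classical Picard iteration, and it can be carried out numerically. A sketch, assuming NumPy, for the hypothetical test problem y′ = y, y(0) = 1 on [0, 0.5], whose exact solution is eᵗ:

```python
import numpy as np

# Grid on [0, 0.5] and the Picard map (T phi)(t) = y0 + ∫_0^t f(x, phi(x)) dx
# from equation 22.62, with f(t, y) = y and y0 = 1.
t = np.linspace(0.0, 0.5, 501)
dt = t[1] - t[0]

phi = np.ones_like(t)          # phi_0(t) = y0 = 1
for _ in range(30):
    integrand = phi            # f(t, y) = y
    # cumulative trapezoid rule for ∫_0^t phi(x) dx
    integral = np.concatenate(
        ([0.0], np.cumsum((integrand[1:] + integrand[:-1]) * dt / 2.0)))
    phi = 1.0 + integral       # T phi
```

After a few dozen iterations phi is numerically indistinguishable from eᵗ, illustrating the convergence guaranteed by the contraction mapping theorem.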


Topic 23

Orthogonal Bases

Definition 23.1 Kronecker Delta Function.¹

δij = 1 if i = j, and δij = 0 if i ≠ j          (23.1)

Definition 23.2 A list of vectors B = (v1, . . . , vn) is called orthonormal if ‖vi‖ = 1 and ⟨vi, vj⟩ = 0 for i ≠ j, i.e.,

⟨vi, vj⟩ = δij          (23.2)

If B is also a basis of V then it is called an orthonormal basis of V.


Theorem 23.3 Let B = (e1, . . . , em) be an orthonormal list of vectors in V. Then

‖a1 e1 + a2 e2 + ⋯ + am em‖² = |a1|² + |a2|² + ⋯ + |am|²          (23.3)

for all a1, a2, . . . ∈ F.
Proof. This follows immediately from the Pythagorean theorem.

Theorem 23.4 Let B = (e1, . . . , em) be an orthonormal list of vectors in V. Then B is linearly independent.

Proof. Suppose there exist a1, . . . , am ∈ F such that

0 = a1 e1 + ⋯ + am em          (23.4)

¹Named for Leopold Kronecker (1823-1891).


Then by theorem 23.3,

0 = ‖a1 e1 + ⋯ + am em‖² = |a1|² + ⋯ + |am|²          (23.5)

Hence a1 = a2 = ⋯ = am = 0, and therefore B is linearly independent.


Theorem 23.5 Let B = (e1, . . . , en) be an orthonormal basis of V. Then both of the following are true for every v ∈ V:

v = ⟨v, e1⟩e1 + ⋯ + ⟨v, en⟩en          (23.6)

‖v‖² = |⟨v, e1⟩|² + ⋯ + |⟨v, en⟩|²          (23.7)

Proof. Equation 23.7 follows from equation 23.6 and theorem 23.3.

To prove equation 23.6, pick any v ∈ V. Since B is a basis of V there exist scalars a1, . . . , an such that

v = a1 e1 + ⋯ + an en          (23.8)

Hence

⟨v, ej⟩ = Σᵢ₌₁ⁿ ai ⟨ei, ej⟩ = Σᵢ₌₁ⁿ ai δij = aj          (23.9)

Substituting equation 23.9 into equation 23.8 for each aj gives equation 23.6.
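Theorem 23.5 can be checked numerically. A minimal sketch, assuming NumPy; the orthonormal basis of C² and the vector v are arbitrary illustrative choices:

```python
import numpy as np

def inner(u, v):
    # <u, v> = sum_i u_i conj(v_i), as in Definition 21.1.
    return np.sum(u * np.conj(v))

# An orthonormal basis of C^2 (chosen for illustration).
e1 = np.array([1.0, 1.0j]) / np.sqrt(2)
e2 = np.array([1.0, -1.0j]) / np.sqrt(2)

v = np.array([2.0 + 1.0j, 3.0])

# Theorem 23.5: v = <v, e1> e1 + <v, e2> e2
reconstructed = inner(v, e1) * e1 + inner(v, e2) * e2
```

The reconstruction recovers v exactly, and |⟨v, e1⟩|² + |⟨v, e2⟩|² equals ‖v‖², matching equations 23.6 and 23.7.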
Theorem 23.6 Gram-Schmidt Orthonormalization Procedure. Let A = (v1, . . . , vn) be a linearly independent list of vectors in V. Then there exists an orthonormal list of vectors B = (e1, . . . , en) in V such that

span(v1, . . . , vj) = span(e1, . . . , ej),   j = 1, . . . , n          (23.10)

Proof. The proof is constructive. We start by defining

e1 = v1 / ‖v1‖          (23.11)

and then define each ej recursively for j > 1 from e1, . . . , e_{j−1}. Clearly ‖e1‖ = 1.
To illustrate the general form we construct the first few. We define e2 by normalizing² the part of v2 that is orthogonal to e1:

e2 = (v2 − ⟨v2, e1⟩e1) / ‖v2 − ⟨v2, e1⟩e1‖          (23.12)

²When we say we are normalizing a vector v, we mean that we are constructing a vector of unit length that is parallel to v.


which by construction satisfies ‖e2‖ = 1 and ⟨e2, e1⟩ = 0. Furthermore, span(v1, v2) = span(e1, e2).

Next, we obtain e3 by normalizing the part of v3 that is orthogonal to both e1 and e2:

e3 = (v3 − ⟨v3, e1⟩e1 − ⟨v3, e2⟩e2) / ‖v3 − ⟨v3, e1⟩e1 − ⟨v3, e2⟩e2‖          (23.13)

Again, by construction, ‖e3‖ = 1, ⟨e3, e1⟩ = ⟨e3, e2⟩ = 0, and span(e1, e2, e3) = span(v1, v2, v3).

In general we construct each ej by normalizing the part of vj that is orthogonal to all of e1, e2, . . . , e_{j−1}:

ej = (vj − Σᵢ₌₁ʲ⁻¹ ⟨vj, ei⟩ei) / ‖vj − Σᵢ₌₁ʲ⁻¹ ⟨vj, ei⟩ei‖          (23.14)

To prove that span(v1, . . . , vn) ⊆ span(e1, . . . , en), let

v = Σ ai vi          (23.15)

and substitute to get an expression for v in terms of the ei. To show that span(e1, . . . , en) ⊆ span(v1, . . . , vn), suppose that v = Σ bi ei and rearrange to get an expression in terms of the vi. The exact calculation is left as an exercise. This proves that the two spans are equal.
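The construction in the proof of theorem 23.6 translates directly into code. A minimal sketch, assuming NumPy; the three input vectors are an arbitrary linearly independent list chosen for illustration:

```python
import numpy as np

def gram_schmidt(vectors):
    # Theorem 23.6: orthonormalize a linearly independent list.  Each e_j is
    # v_j minus its projections onto the previous e_i, then normalized.
    es = []
    for v in vectors:
        w = v.astype(complex)
        for e in es:
            w = w - np.vdot(e, w) * e   # subtract <w, e> e (vdot conjugates e)
        es.append(w / np.linalg.norm(w))
    return es

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
es = gram_schmidt(vs)
```

Note that np.vdot conjugates its first argument, so np.vdot(e, w) equals ⟨w, e⟩ in the convention of Definition 21.1.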
Corollary 23.7 Let V be a finite-dimensional inner-product space. Then V has an orthonormal basis.

Proof. Pick any basis B of V. Since B is linearly independent, we can find an orthonormal list of vectors B′ with the same span by the Gram-Schmidt orthonormalization process. Since B′ spans V and is orthonormal, it is an orthonormal basis.

Corollary 23.8 Let V be a finite dimensional inner-product space of dimension n and let B = (b1, . . . , bm) be an orthonormal list of vectors in V. Then B can be extended to an orthonormal basis

E = (b1, . . . , bm, v1, . . . , v_{n−m})          (23.16)

of V. In particular, if U is a subspace of V with an orthonormal basis B, then B can be extended to a basis of V.

Corollary 23.9 Let V be a finite dimensional inner-product space; let T ∈ L(V) be an operator on V; and suppose that there exists a basis B of V such that M(T, B) is upper-triangular. Then there exists an orthonormal basis B′ of V such that M(T, B′) is upper-triangular.

Corollary 23.10 Let V be a complex vector space and let T ∈ L(V). Then T has an upper triangular matrix with respect to some orthonormal basis of V.

Proof. From theorem 18.3, there exists some basis under which M(T) is upper triangular. By Corollary 23.9, M(T) is upper triangular with respect to some orthonormal basis.
Definition 23.11 Let U ⊆ V. Then the orthogonal complement of U is the set of all vectors in V that are orthogonal to all vectors in U:

U⊥ = {v ∈ V | ⟨v, u⟩ = 0 for all u ∈ U}          (23.17)

Remark 23.12 Some properties of orthogonal complements:

1. V⊥ = {0}
2. {0}⊥ = V
3. U ⊆ W ⟹ W⊥ ⊆ U⊥
Theorem 23.13 Let U be a subspace of V. Then V = U ⊕ U⊥.

Proof. Let v ∈ V and let B = (e1, . . . , en) be an orthonormal basis of U. Then

v = u + (v − u)          (23.18)

where

u = ⟨v, e1⟩e1 + ⋯ + ⟨v, en⟩en          (23.19)

Define

w = v − u          (23.20)

so that

v = u + w          (23.21)

Since B is a basis of U, u ∈ U, and

⟨w, ej⟩ = ⟨v, ej⟩ − ⟨u, ej⟩ = ⟨v, ej⟩ − ⟨v, ej⟩ = 0          (23.22)

Therefore

⟨w, u⟩ = ⟨w, Σᵢ ⟨v, ei⟩ei⟩ = Σᵢ ⟨v, ei⟩* ⟨w, ei⟩ = 0          (23.23)

Hence w u for all u span(B) and thus w U .


Thus v = u + w where u U and w U . Hence V = U + U .
To show that the sum is actually a direct sum, let v U U .
Since v U U , v U
Since v U U , v U . Thus v is orthogonal to every vector in U.
In particular, since v U, v is orthogonal to itself, which means that
hv, vi = 0. Hence v = 0.
Therefore
U U = {0}

(23.24)

Hence by theorem 10.8 V = U U .
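A quick numerical illustration of this decomposition (a numpy sketch, not part of the notes: the subspace U and the vector v are randomly chosen here):

```python
import numpy as np

# Theorem 23.13 numerically: split v into u + w with u in U and w in U-perp.
rng = np.random.default_rng(0)

A = rng.standard_normal((5, 2))   # columns span a 2-dimensional subspace U of R^5
Q, _ = np.linalg.qr(A)            # Gram-Schmidt: orthonormal basis (e1, e2) of U

v = rng.standard_normal(5)

# u = <v,e1>e1 + <v,e2>e2  (equation 23.19);  w = v - u  (equation 23.20)
u = Q @ (Q.T @ v)
w = v - u

print(np.abs(Q.T @ w).max())      # ~0: w is orthogonal to every basis vector of U
print(np.allclose(v, u + w))      # True: v = u + w
```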


Corollary 23.14 If U is a subspace of V then U = (U^⊥)^⊥.

Definition 23.15 Suppose that U is a subspace of V, and v = u + w, where u ∈ U and w ∈ U^⊥. Then the orthogonal projection of V onto U is given by the operator P_U where P_U v = u. (In the notation of definition 20.2, P_U = P_{U,U^⊥}.)

Remark 23.16 Properties of the Orthogonal Projection. Let U be a subspace of V. Then

1. range P_U = U
2. null P_U = U^⊥
3. v − P_U v ∈ U^⊥ for every v ∈ V.
4. P_U² = P_U
5. ‖P_U v‖ ≤ ‖v‖ for every v ∈ V.
Theorem 23.17 Let U be a subspace of V and let v ∈ V. Then

‖v − P_U v‖ ≤ ‖v − u‖    (23.25)

for every u ∈ U. (This is equivalent to the statement that the shortest path from a point to a line is a perpendicular line segment dropped from the point to the line.)

Proof. Since ‖P_U v − u‖² ≥ 0, we use the property of real numbers that a ≤ a + b for all nonnegative numbers b to write

‖v − P_U v‖² ≤ ‖v − P_U v‖² + ‖P_U v − u‖²    (23.26)

But

v − P_U v ∈ U^⊥    (23.27)

P_U v − u ∈ U    (23.28)

Therefore the vectors v − P_U v and P_U v − u are orthogonal. By the Pythagorean theorem, this means that

‖v − P_U v‖² + ‖P_U v − u‖² = ‖(v − P_U v) + (P_U v − u)‖²    (23.29)

= ‖v − u‖²    (23.30)

Substituting (23.30) into (23.26) gives (23.25) as required.
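A sanity check of the minimizing property (a numpy sketch with a randomly chosen subspace and random competitor points, not part of the notes):

```python
import numpy as np

# Theorem 23.17 numerically: ||v - P_U v|| <= ||v - u|| for every u in U.
rng = np.random.default_rng(1)

Q, _ = np.linalg.qr(rng.standard_normal((6, 3)))   # orthonormal basis of U
v = rng.standard_normal(6)
Pv = Q @ (Q.T @ v)                                  # orthogonal projection P_U v

best = np.linalg.norm(v - Pv)
# every other point u = Qc of U should be at least as far from v
others = [np.linalg.norm(v - Q @ rng.standard_normal(3)) for _ in range(100)]
print(best <= min(others))   # True
```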

Page 164

Last revised: November 3, 2012

Topic 24

Fourier Series

Theorem 24.1 Let V be the set of all integrable functions f : [a, b] → C and let k(x) be any positive real-valued function on [a, b]. Then V is a normed inner product space with inner product

⟨f, g⟩ = ∫_a^b k(x) f(x) ḡ(x) dx    (24.1)

where ḡ denotes the complex conjugate of g.

We observe that V is an infinite-dimensional vector space (consider the set of functions {1, x, x², x³, ...}, which is linearly independent). We can extend our definition of a basis to infinite dimensions as follows.

Definition 24.2 Let V be an infinite dimensional vector space. Then a sequence f_0, f_1, f_2, ... ∈ V is called a complete basis for V if, for every f ∈ V, there exists a sequence of constants c_0, c_1, ... ∈ C such that

f = Σ_{k=0}^∞ c_k f_k = c_0 f_0 + c_1 f_1 + c_2 f_2 + ···    (24.2)

and is called a complete orthonormal basis if ⟨f_i, f_j⟩ = δ_ij, where δ_ij is the Kronecker delta (δ_ij = 1 if i = j and 0 if i ≠ j).

Example 24.1 Let V be the set of integrable functions on [−1, 1] and consider the sequence of functions 1, x, x², x³, .... Then by Taylor's theorem, for any f ∈ V that is infinitely differentiable (and equal to its Taylor series), there exists a sequence of constants a_0, a_1, ... given by

a_k = f^(k)(0) / k!    (24.3)

such that

f(x) = Σ_{k=0}^∞ a_k f_k    (24.4)

Hence the sequence 1, x, x², ... is a complete basis of V.


Given a complete basis for a vector space along with an inner product and
its associated norm, we can use the Gram-Schmidt process to create an
orthogonal basis for the space.
Example 24.2 Using the inner product

⟨f, g⟩ = ∫_{−1}^{1} f(x) g(x) dx    (24.5)

on the real vector space defined in the previous example, use the Gram-Schmidt process to find an orthogonal basis from the complete basis 1, x, x², ....

Denote the original basis by f_j = x^j, j = 0, 1, 2, ..., the orthogonal basis by p_j, and the normalized basis by q_j. Then since

‖f_0‖² = ⟨f_0, f_0⟩ = ∫_{−1}^{1} dx = 2    (24.6)

we can define p_0 = f_0 = 1 and

q_0 = p_0 / ‖p_0‖ = 1/√2    (24.7)

Next we calculate

⟨f_1, q_0⟩ = (1/√2) ∫_{−1}^{1} x dx = 0    (24.8)

and thus

p_1 = f_1 − ⟨f_1, q_0⟩ q_0 = f_1 = x    (24.9)

‖p_1‖² = ∫_{−1}^{1} x² dx = 2/3    (24.10)

q_1 = p_1 / ‖p_1‖ = √(3/2) x    (24.11)

Next,

p_2 = f_2 − ⟨f_2, q_0⟩ q_0 − ⟨f_2, q_1⟩ q_1    (24.12)

⟨f_2, q_0⟩ = ∫_{−1}^{1} x² (1/√2) dx = (1/√2)(2/3) = √2/3    (24.13)

⟨f_2, q_1⟩ = ∫_{−1}^{1} x² √(3/2) x dx = 0  (odd function)    (24.14)

p_2 = x² − (√2/3)(1/√2) − 0 · √(3/2) x = x² − 1/3    (24.15)

‖p_2‖² = ∫_{−1}^{1} (x² − 1/3)² dx = 8/45    (24.16)

‖p_2‖ = (2/3)√(2/5)    (24.17)

q_2 = p_2 / ‖p_2‖ = (3/2)√(5/2) (x² − 1/3)    (24.18)

and so forth.
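The computation above can be replayed numerically, with the integrals replaced by a midpoint-rule sum (a numpy sketch; the grid size is an arbitrary choice of ours):

```python
import numpy as np

# Gram-Schmidt on 1, x, x^2 under <f,g> = integral_{-1}^{1} f g dx (Example 24.2),
# with integrals approximated on a midpoint grid.
n = 100000
dx = 2.0 / n
x = -1.0 + dx * (np.arange(n) + 0.5)

def inner(f, g):
    return float(np.sum(f * g) * dx)

f0, f1, f2 = np.ones_like(x), x, x**2

q0 = f0 / np.sqrt(inner(f0, f0))                      # 1/sqrt(2)
p1 = f1 - inner(f1, q0) * q0                          # = x
q1 = p1 / np.sqrt(inner(p1, p1))                      # sqrt(3/2) x
p2 = f2 - inner(f2, q0) * q0 - inner(f2, q1) * q1     # = x^2 - 1/3

print(inner(p2, p2))   # ~ 8/45, matching equation (24.16)
```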
Remark 24.3 The sequence of orthonormal functions generated in the previous example is related to the Legendre polynomials, which are solutions of

(d/dx)[(1 − x²) (d/dx) P_n(x)] + n(n + 1) P_n(x) = 0,  P_n(1) = 1    (24.19)

In other words, they are eigenfunctions of the operator T ∈ L(V), where V is the vector space of functions on [−1, 1], given by

T f = −[(1 − x²) f′]′    (24.20)

with eigenvalues n(n + 1). It turns out that the eigenfunctions of certain differential operators will always produce orthogonal bases. See any book on boundary value problems or the Sturm-Liouville operator for more details.


Table of the first several Legendre polynomials, solutions to Legendre's equation (equation 24.19). The Legendre polynomials can be generated recursively via the relation (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) − n P_{n−1}(x).

n : P_n(x)
0 : 1
1 : x
2 : (1/2)(3x² − 1)
3 : (1/2)(5x³ − 3x)
4 : (1/8)(35x⁴ − 30x² + 3)
5 : (1/8)(63x⁵ − 70x³ + 15x)
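The recursion in the table can be run directly on coefficient arrays (a sketch of ours, not from the notes; coefficients are stored lowest degree first):

```python
import numpy as np

# Generate Legendre polynomials via (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}.
def legendre(nmax):
    P = [np.array([1.0]), np.array([0.0, 1.0])]      # P0 = 1, P1 = x
    for n in range(1, nmax):
        xPn = np.concatenate(([0.0], P[n]))          # multiply P_n by x
        Pm = np.pad(P[n - 1], (0, len(xPn) - len(P[n - 1])))  # align lengths
        P.append(((2 * n + 1) * xPn - n * Pm) / (n + 1))
    return P

P = legendre(3)
print(P[2])   # coefficients of (1/2)(3x^2 - 1)
print(P[3])   # coefficients of (1/2)(5x^3 - 3x)
```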





Theorem 24.4 Let V be an infinite dimensional real-valued function space on an interval I ⊆ R and let φ_0, φ_1, ... be a complete orthonormal basis on V. Then for any f ∈ V, the Generalized Fourier Series of f is given by

f ∼ Σ_{i=0}^∞ ⟨f, φ_i⟩ φ_i    (24.21)

where the notation ∼ is used instead of = because convergence may not occur at a countable number of points.

Proof. Since φ_0, φ_1, ... is a complete basis, there are some numbers c_i ∈ R such that

f = Σ_{i=0}^∞ c_i φ_i    (24.22)

Then

⟨f, φ_j⟩ = Σ_{i=0}^∞ c_i ⟨φ_i, φ_j⟩ = Σ_{i=0}^∞ c_i δ_ij = c_j    (24.23)

Plugging the second equation into the first gives the desired result.
Remark 24.5 In the previous theorem we overlooked what we mean by convergence of the series Σ_{k=0}^∞ c_k φ_k. This is a subtle point that we will not concern ourselves with in this class. In particular, the convergence of the series only satisfies the concept of convergence in the mean, namely, that with s_n = Σ_{k=0}^n c_k φ_k denoting the nth partial sum,

‖f − s_n‖ → 0 as n → ∞    (24.24)

The consequence is that the equality may not hold at a countable number of points, in the sense that at any point x_0, equation 24.21 really means

(1/2)[f(x_0⁺) + f(x_0⁻)] = Σ_{i=0}^∞ ⟨f, φ_i⟩ φ_i(x_0)    (24.25)
Example 24.3 Consider the space of integrable functions on [−π, π] with an inner product given by

⟨f, g⟩ = (1/2π) ∫_{−π}^{π} f(x) ḡ(x) dx    (24.26)

Then the set of functions φ_k = e^{ikx}, k = 0, ±1, ±2, ... is orthonormal because

⟨φ_n, φ_n⟩ = (1/2π) ∫_{−π}^{π} dx = 1    (24.27)

and for n ≠ m

⟨φ_n, φ_m⟩ = (1/2π) ∫_{−π}^{π} e^{inx} e^{−imx} dx    (24.28)

= (1/2π) ∫_{−π}^{π} e^{i(n−m)x} dx    (24.29)

= (1/2π) [1/(i(n−m))] e^{i(n−m)x} |_{−π}^{π}    (24.30)

= [1/(2πi(n−m))] [e^{i(n−m)π} − e^{−i(n−m)π}]    (24.31)

= [1/(π(n−m))] sin((n−m)π) = 0    (24.32)

The Generalized Fourier Series with respect to this basis² is

Σ_{k=−∞}^{∞} c_k e^{ikx}    (24.33)

where

c_k = (1/2π) ∫_{−π}^{π} f(x) e^{−ikx} dx    (24.34)

² We haven't actually shown that the φ_j form a basis, only that they are orthonormal. To show that it is a basis we have to show that it spans the space.
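Both the orthonormality and a coefficient formula can be checked numerically (a numpy sketch of ours; for f(x) = x the coefficients work out to c_k = i(−1)^k/k for k ≠ 0, a standard computation by parts, not from the notes):

```python
import numpy as np

# Example 24.3 numerically: <f,g> = (1/2pi) integral f conj(g) dx on [-pi, pi],
# phi_k = e^{ikx}, integrals approximated on a midpoint grid.
n = 100000
dx = 2 * np.pi / n
x = -np.pi + dx * (np.arange(n) + 0.5)

def inner(f, g):
    return np.sum(f * np.conj(g)) * dx / (2 * np.pi)

phi = {k: np.exp(1j * k * x) for k in (-2, -1, 0, 1, 2)}

print(abs(inner(phi[2], phi[2]) - 1))   # ~0: unit norm (24.27)
print(abs(inner(phi[2], phi[1])))       # ~0: orthogonality (24.32)

# Fourier coefficient of f(x) = x:  c_1 = (1/2pi) integral x e^{-ix} dx = -i
c1 = inner(x, phi[1])
print(c1)
```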


Example 24.4 Repeat the previous example with the set of functions

φ_k = 1/√(2π),  sin(kx)/√π,  cos(jx)/√π,  k, j = 1, 2, ...    (24.35)

and the inner product

⟨f, g⟩ = ∫_{−π}^{π} f(x) g(x) dx    (24.36)

The standard notation is to define

a_k = (1/π) ∫_{−π}^{π} f(x) cos(kx) dx    (24.37)

b_k = (1/π) ∫_{−π}^{π} f(x) sin(kx) dx    (24.38)

Then the Fourier series is

a_0/2 + Σ_{k=1}^∞ [a_k cos(kx) + b_k sin(kx)]    (24.40)

The coefficients in the Fourier series are called the Fourier Coefficients

c_j = ⟨f, φ_j⟩    (24.41)

The following result tells us that the sum of the squares of the Fourier coefficients is bounded by the square of the norm, ‖f‖², in the sense that

Σ |c_j|² ≤ ‖f‖²    (24.42)

This is different from a finite-dimensional vector space, because we saw above that for a finite dimensional vector space V with orthonormal basis e_1, ..., e_n, if we define a_i = ⟨v, e_i⟩,

‖v‖² = ⟨v, v⟩ = ⟨Σ_i a_i e_i, Σ_j a_j e_j⟩ = Σ_{i,j} a_i ā_j ⟨e_i, e_j⟩ = Σ_i |a_i|²    (24.43)

There is no reason to necessarily expect equality to hold in the case of the infinite dimensional space.


Theorem 24.6 Bessel's Inequality. Let V be an infinite dimensional real-valued function space on an interval I ⊆ R and let φ_0, φ_1, ... be a complete orthonormal basis on V. Then for any smooth function f,

Σ_{k=0}^∞ |⟨f, φ_k⟩|² ≤ ‖f‖²    (24.44)

Proof. Let

s_n = Σ_{k=0}^n ⟨f, φ_k⟩ φ_k    (24.45)

Since the φ_k are orthonormal, for i ≤ n,

⟨s_n, φ_i⟩ = ⟨Σ_{k=0}^n ⟨f, φ_k⟩ φ_k, φ_i⟩ = Σ_{k=0}^n ⟨f, φ_k⟩ ⟨φ_k, φ_i⟩ = ⟨f, φ_i⟩    (24.46)

Therefore,

⟨f − s_n, φ_i⟩ = ⟨f, φ_i⟩ − ⟨s_n, φ_i⟩ = ⟨f, φ_i⟩ − ⟨f, φ_i⟩ = 0    (24.47)

i.e., f − s_n and φ_i are orthogonal; since s_n is a linear combination of φ_0, ..., φ_n, it follows that f − s_n and s_n are orthogonal. Hence we can apply the Pythagorean theorem:

‖f‖² = ‖(f − s_n) + s_n‖² = ‖f − s_n‖² + ‖s_n‖²    (24.48)

Solving for ‖s_n‖² gives

‖s_n‖² = ‖f‖² − ‖f − s_n‖² ≤ ‖f‖²    (24.49)

But, writing c_j = ⟨f, φ_j⟩,

‖s_n‖² = ⟨s_n, s_n⟩    (24.50)

= ⟨Σ_{j=0}^n c_j φ_j, Σ_{k=0}^n c_k φ_k⟩    (24.51)

= Σ_{j=0}^n Σ_{k=0}^n c_j c̄_k ⟨φ_j, φ_k⟩    (24.52)

= Σ_{j=0}^n c_j c̄_j    (24.53)

= Σ_{j=0}^n |⟨f, φ_j⟩|²    (24.54)

Substituting the right hand side of 24.54 into the left hand side of equation 24.49 gives

Σ_{j=0}^n |⟨f, φ_j⟩|² = ‖s_n‖² ≤ ‖f‖²    (24.55)

The sequence of partial sums is a bounded increasing sequence and hence it converges. Taking the limit as n → ∞ gives the desired result.
We mentioned before stating Bessel's Inequality that there is no reason to expect equality in (24.55). As it turns out, however, it is possible to extend the proof to establish equality.

Theorem 24.7 Parseval's Theorem. Under the same conditions as the previous theorem,

Σ_{k=0}^∞ |⟨f, φ_k⟩|² = ‖f‖²    (24.56)

Proof. From equation 24.48,

‖f‖² = ‖(f − s_n) + s_n‖² = ‖f − s_n‖² + ‖s_n‖²    (24.57)

Since s_n → f in the mean, we must also have ‖f − s_n‖ → 0. The desired result follows by taking the limit as n → ∞.
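Bessel's inequality and the Parseval limit can be seen numerically (a sketch of ours using f(x) = x on [−π, π] with the exponential basis of Example 24.3, where ‖f‖² = π²/3; not from the notes):

```python
import numpy as np

# Partial sums of |c_k|^2 increase toward ||f||^2 but never exceed it.
n = 100000
dx = 2 * np.pi / n
x = -np.pi + dx * (np.arange(n) + 0.5)
f = x

def inner(g, h):
    return np.sum(g * np.conj(h)) * dx / (2 * np.pi)

norm2 = inner(f, f).real                      # = pi^2/3 ~ 3.2899
c = {k: inner(f, np.exp(1j * k * x)) for k in range(-50, 51)}
partial = sum(abs(c[k])**2 for k in range(-50, 51))

print(partial <= norm2)      # True (Bessel's inequality)
print(norm2 - partial)       # small remainder, -> 0 as more terms are kept
```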
The following Best Mean Approximation Theorem tells us that the truncated Fourier series is the best approximation to a function among all possible linear combinations of the basis functions.

Theorem 24.8 Best Mean Approximation Theorem. Under the same conditions as the previous theorems, the n-term truncation of the Fourier series is the best approximation (in the mean) of all possible expansions of the form Σ_{k=0}^n a_k φ_k, in the sense that for any sequence of numbers a_0, a_1, ...,

‖f − Σ_{k=0}^n a_k φ_k‖ ≥ ‖f − Σ_{k=0}^n ⟨f, φ_k⟩ φ_k‖    (24.58)

with equality holding only when

a_k = ⟨f, φ_k⟩ for all k = 0, 1, ..., n    (24.59)

Proof. Let f be any function in our vector space and define

c_k = ⟨f, φ_k⟩    (24.60)

s_n = Σ_{k=0}^n c_k φ_k    (24.61)

t_n = Σ_{k=0}^n a_k φ_k    (24.62)

To verify equation 24.58, we need to show that

‖f − t_n‖ ≥ ‖f − s_n‖    (24.63)

and that equality only holds when a_k = c_k for all k.

By the linearity of the inner product,

‖f − t_n‖² = ⟨f − t_n, f − t_n⟩    (24.64)

= ⟨f, f⟩ + ⟨t_n, t_n⟩ − ⟨f, t_n⟩ − ⟨t_n, f⟩    (24.65)

and

⟨t_n, t_n⟩ = ⟨Σ_{j=0}^n a_j φ_j, Σ_{k=0}^n a_k φ_k⟩    (24.66)

= Σ_{j=0}^n Σ_{k=0}^n a_j ā_k ⟨φ_j, φ_k⟩    (24.67)

= Σ_{j=0}^n Σ_{k=0}^n a_j ā_k δ_jk    (24.68)

= Σ_{j=0}^n a_j ā_j    (24.69)

= Σ_{j=0}^n |a_j|²    (24.70)

Furthermore,

⟨f, t_n⟩ = ⟨f, Σ_{k=0}^n a_k φ_k⟩ = Σ_{k=0}^n ā_k ⟨f, φ_k⟩ = Σ_{k=0}^n ā_k c_k    (24.71)

⟨t_n, f⟩ = ⟨Σ_{k=0}^n a_k φ_k, f⟩ = Σ_{k=0}^n a_k ⟨φ_k, f⟩ = Σ_{k=0}^n a_k c̄_k    (24.72)

Therefore

‖f − t_n‖² = ‖f‖² + Σ_{k=0}^n |a_k|² − Σ_{k=0}^n ā_k c_k − Σ_{k=0}^n a_k c̄_k    (24.73)

= ‖f‖² + Σ_{k=0}^n [|a_k|² − 2 Re(a_k c̄_k)]    (24.74)

Similarly,

‖f − s_n‖² = ‖f‖² + Σ_{k=0}^n |c_k|² − 2 Σ_{k=0}^n c_k c̄_k    (24.75)

= ‖f‖² − Σ_{k=0}^n |c_k|²    (24.76)

Hence

Σ_{k=0}^n |c_k|² = ‖f‖² − ‖f − s_n‖²    (24.77)

But for any complex numbers x and y,

|x − y|² = (x − y)(x̄ − ȳ)    (24.78)

= x x̄ − x ȳ − y x̄ + y ȳ    (24.79)

= |x|² + |y|² − 2 Re(x ȳ)    (24.80)

Hence

‖f − t_n‖² = ‖f‖² + Σ_{k=0}^n [|a_k − c_k|² − |c_k|²]    (24.81)

= ‖f‖² + Σ_{k=0}^n |a_k − c_k|² − Σ_{k=0}^n |c_k|²    (24.82)

= Σ_{k=0}^n |a_k − c_k|² + ‖f − s_n‖²  (from eq. 24.77)    (24.83)

≥ ‖f − s_n‖²    (24.84)

with equality holding only if each of the terms in the first sum is zero, namely, when a_k = c_k for all k.

Topic 25

Triangular Decomposition

Corollary 25.1 Schur's Theorem. Let V be a complex inner-product space and let T ∈ L(V) be an operator on V. Then there exists an orthonormal basis B of V such that M(T, B) is upper-triangular.

Proof. This follows from corollary 23.9.

Definition 25.2 Let U be a complex valued matrix. Then the Conjugate Transpose of U, or Adjoint matrix¹, denoted by U*, is the complex conjugate of the matrix transpose:

U* = (Ū)^T    (25.1)

where Ū denotes the entrywise complex conjugate of U.

The conjugate transpose is sometimes called the Hermitian Conjugate or the Hermitian Transpose (notably in the physics literature). Other notations for U* include U† (common in physics), U⁺, and U^H.

Definition 25.3 A matrix U is said to be unitary if U⁻¹ = U*. If U is real and unitary then U⁻¹ = U^T.
Theorem 25.4 Properties of unitary matrices.

1. The rows are orthogonal unit vectors.
2. The columns are orthogonal unit vectors.
3. |det U| = 1
4. U is invertible.
5. The rows form an orthonormal basis of Cⁿ.
6. The columns form an orthonormal basis of Cⁿ.

¹ Not to be confused with the adjoint map defined in the next section.

Proof. (1) and (2) follow from the fact that U*U = I = UU* (because U⁻¹ = U*): the (i, j) entry of U*U is the inner product of column j of U with column i of U, and the (i, j) entry of UU* is the inner product of row i of U with row j of U.

(3) Let λ be an eigenvalue of U with nonzero eigenvector v. Then since v ≠ 0,

Uv = λv ⟹ (Uv)‾ = λ̄ v̄    (25.2)

⟹ v̄^T U* = λ̄ v̄^T    (25.3)

⟹ (v̄^T U*)(Uv) = (λ̄ v̄^T)(λv)    (25.4)

⟹ v̄^T (U*U) v = λ̄λ v̄^T v    (25.5)

⟹ v̄^T I v = |λ|² ‖v‖²    (25.6)

⟹ ‖v‖² = |λ|² ‖v‖²    (25.7)

⟹ |λ| = 1    (25.8)

Since the determinant is the product of the eigenvalues, |det U| = 1.

(4) follows because det U ≠ 0.

(5) and (6) follow from (1) and (2) because the rows (columns) are linearly independent and spanning.
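These properties are easy to verify numerically on a random unitary matrix (a numpy sketch of ours; the Q factor of a QR factorization is unitary):

```python
import numpy as np

# Theorem 25.4 numerically, on a random 4x4 unitary matrix U.
rng = np.random.default_rng(2)
Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(Z)                         # Q of a QR factorization is unitary

I = np.eye(4)
print(np.allclose(U.conj().T @ U, I))          # columns orthonormal: U*U = I
print(np.allclose(U @ U.conj().T, I))          # rows orthonormal: UU* = I
print(abs(np.linalg.det(U)))                   # |det U| = 1
print(np.allclose(abs(np.linalg.eigvals(U)), 1))   # every eigenvalue has modulus 1
```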
Theorem 25.5 The Schur Decomposition. Let A be any n × n square matrix over C. Then there exists a unitary matrix U such that

A = U⁻¹ T U    (25.9)

where T is an upper triangular matrix.

Proof. Prove by induction.

For n = 1 the result automatically holds, since a 1 × 1 matrix is upper triangular.

Inductive hypothesis. Suppose the following is true: for any (n−1) × (n−1) square matrix M, there exists a unitary matrix V and some upper triangular matrix T such that M = V⁻¹TV.

Let A be any n × n square matrix over C.

Let u_1 be any normalized eigenvector of A with eigenvalue λ_1. We know that at least one eigenvector exists because det(A − λI) = 0 has at least one root over C.

Define E = (e_1, ..., e_n) as the standard basis of Cⁿ, namely

e_1 = (1, 0, 0, 0, ..., 0)
e_2 = (0, 1, 0, 0, ..., 0)
e_3 = (0, 0, 1, 0, ..., 0)
...
e_n = (0, 0, ..., 0, 0, 1)    (25.10)

Since the eigenvector u_1 ≠ 0, it has at least one non-zero component. Pick one of these components and designate it as (u_1)_r = e_r · u_1 (i.e., the rth component).

Define E′ as E with e_r removed and u_1 inserted at the front of the list, as follows:

E′ = (u_1, e_1, e_2, ..., e_{r−1}, e_{r+1}, e_{r+2}, ..., e_n)  (with e_r missing)    (25.11)

Then E′ is also a basis of Cⁿ.

Use the Gram-Schmidt process to obtain an orthonormal basis B = (v_1, ..., v_n) of Cⁿ from E′, starting with v_1 = u_1.

Let M be the matrix whose columns are the v_i:

M = [v_1  v_2  ···  v_n]    (25.12)

Then

AM = A[v_1  v_2  ···  v_n] = [λ_1 v_1  Av_2  ···  Av_n]    (25.13)

because Av_1 = Au_1 = λ_1 u_1.

Similarly M*, the conjugate transpose of M, has v̄_i^T (the transpose of the complex conjugate of v_i) as row i:

M*AM = [ v̄_1^T ; v̄_2^T ; ⋮ ; v̄_n^T ] [λ_1 v_1  Av_2  ···  Av_n]    (25.14)

so the (i, 1) entry of M*AM is

λ_1 v̄_i^T v_1    (25.15)

Since the v_i are orthonormal, v̄_i^T v_j = δ_ij, and so in block form (rows separated by semicolons)

M*AM = [ λ_1  * ; 0  B ]    (25.16)

where B is an (n−1) × (n−1) sub-matrix and the * notation refers to values we don't care about.

Now we can make use of the inductive hypothesis. Since B has dimensions (n−1) × (n−1), there exists some unitary matrix W and some upper triangular matrix T_1 such that

W*BW = T_1    (25.17)

Define the matrix Y in block form by

Y = [ 1  0 ; 0  W ]    (25.18)

Since W is unitary, so is Y. From equation (25.16),

Y*(M*AM)Y = [ 1  0 ; 0  W* ] [ λ_1  * ; 0  B ] [ 1  0 ; 0  W ]    (25.19)

= [ λ_1  * ; 0  W*BW ] = [ λ_1  * ; 0  T_1 ]    (25.20)

Since T_1 is upper triangular,

Y*M*AMY = T_2    (25.21)

where T_2 is upper triangular.

Define U = MY. Then U is unitary because the product of unitary matrices is unitary. [Since M and Y are unitary,

U*U = (MY)*(MY) = Y*M*MY = Y*Y = I    (25.22)

and similarly, UU* = I.] From (25.21),

U*AU = T_2    (25.23)

where T_2 is upper triangular, as we set out to prove.


Remark 25.6 If A is self-adjoint (A* = A) then the Schur decomposition of A gives

A = UDU⁻¹    (25.24)

where D is real and diagonal.

Proof. Let A be self-adjoint; then A* = A. The Schur decomposition gives

U*AU = T    (25.25)

for some unitary matrix U and upper triangular matrix T. Thus

A = UTU*    (25.26)

Since A* = A,

(UTU*)* = A* = A = UTU*    (25.27)

Applying the conjugate transpose to the first term,

UT*U* = UTU*    (25.28)

Left multiplying by U* and right multiplying by U gives

T* = T    (25.29)

Since T is upper triangular, this means that all off-diagonal elements must be zero, and all diagonal elements are real. Thus (25.25) leads to

U*AU = D    (25.30)

where D is a real, diagonal matrix. Left-multiplying by U and right-multiplying by U* gives

A = UDU*    (25.31)

The desired result follows because U is unitary, so U* = U⁻¹.
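This collapse of the triangular factor to a real diagonal can be observed numerically. The sketch below uses `numpy.linalg.eigh`, which returns exactly such a factorization A = U diag(d) U* for a Hermitian matrix, rather than a general Schur routine (an illustration of ours, not the proof's construction):

```python
import numpy as np

# Remark 25.6 numerically: a self-adjoint A has A = U D U* with D real diagonal.
rng = np.random.default_rng(3)
Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = Z + Z.conj().T                             # A* = A (self-adjoint)

d, U = np.linalg.eigh(A)                       # A = U diag(d) U*
print(np.isrealobj(d))                         # True: D is real
print(np.allclose(U @ np.diag(d) @ U.conj().T, A))   # True: A = U D U^{-1}
print(np.allclose(U.conj().T @ U, np.eye(4)))        # True: U unitary
```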

Topic 26

The Adjoint Map

Definition 26.1 Let V be a vector space over F. Then a linear functional¹ on V is a linear map φ : V → F.

Theorem 26.2 Let φ be a linear functional on V. Then there is a unique vector v ∈ V such that

φ(u) = ⟨u, v⟩    (26.1)

for every u ∈ V.

Proof. To prove existence, let B = (e_1, ..., e_n) be an orthonormal basis of V and pick any u ∈ V. Then by theorem 23.5

u = ⟨u, e_1⟩e_1 + ··· + ⟨u, e_n⟩e_n    (26.2)

By the linearity of φ,

φ(u) = ⟨u, e_1⟩φ(e_1) + ··· + ⟨u, e_n⟩φ(e_n)    (26.3)

By conjugate homogeneity in the second argument of the inner product (theorem 21.3), writing c_i = φ(e_i) and c̄_i for its complex conjugate,

φ(u) = ⟨u, c̄_1 e_1⟩ + ··· + ⟨u, c̄_n e_n⟩    (26.4)

If we define v by

v = c̄_1 e_1 + ··· + c̄_n e_n    (26.5)

then

⟨u, v⟩ = ⟨u, c̄_1 e_1 + ··· + c̄_n e_n⟩    (26.6)

= ⟨u, c̄_1 e_1⟩ + ··· + ⟨u, c̄_n e_n⟩    (26.7)

= φ(u)    (26.8)

as desired. Note that v does not depend on u, only on φ and the basis.

To prove uniqueness, suppose that there exist v, w such that

φ(u) = ⟨u, v⟩ = ⟨u, w⟩    (26.9)

for all u ∈ V. Then

0 = ⟨u, v⟩ − ⟨u, w⟩ = ⟨u, v − w⟩    (26.10)

This must hold for all u, so if we pick u = v − w then

0 = ⟨v − w, v − w⟩ ⟹ v − w = 0    (26.11)

hence v = w, proving uniqueness.

¹ Recall from definition 14.1 that a linear map is a map that has the properties of additivity (φ(u + v) = φ(u) + φ(v) for all u, v ∈ V) and homogeneity (φ(av) = aφ(v) for all v ∈ V, a ∈ F).
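The construction of v in equation (26.5) can be carried out numerically (a numpy sketch with a made-up functional on C⁴, using the standard basis and ⟨u, v⟩ = Σ u_i v̄_i; not from the notes):

```python
import numpy as np

# Theorem 26.2 numerically: phi(u) = <u, v> with v = conj(phi(e_1))e_1 + ...
rng = np.random.default_rng(8)
n = 4
e = np.eye(n)                                  # standard orthonormal basis of C^4

a = rng.standard_normal(n) + 1j * rng.standard_normal(n)
def phi(u):                                    # an arbitrary linear functional
    return a @ u

v = sum(np.conj(phi(e[i])) * e[i] for i in range(n))   # equation (26.5)

u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
print(np.isclose(phi(u), u @ np.conj(v)))      # True: phi(u) = <u, v>
```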


Definition 26.3 Let V, W be finite dimensional inner product spaces over F and let T ∈ L(V, W). Then the adjoint of T is the function T* : W → V defined as follows: given w ∈ W, T*w is the unique vector in V such that

⟨Tv, w⟩ = ⟨v, T*w⟩    (26.12)

for every v ∈ V.

Theorem 26.4 If T ∈ L(V, W) then T* ∈ L(W, V).

Proof. Additivity: Let v ∈ V and u, w ∈ W be any vectors. Then

⟨v, T*(u + w)⟩ = ⟨Tv, u + w⟩    (26.13)

= ⟨Tv, u⟩ + ⟨Tv, w⟩    (26.14)

= ⟨v, T*u⟩ + ⟨v, T*w⟩    (26.15)

= ⟨v, T*u + T*w⟩    (26.16)

Since this must hold for any vector v ∈ V,

T*(u + w) = T*u + T*w    (26.17)

Homogeneity: Let a ∈ F. Then

⟨v, T*(aw)⟩ = ⟨Tv, aw⟩    (26.18)

= ā⟨Tv, w⟩    (26.19)

= ā⟨v, T*w⟩    (26.20)

= ⟨v, aT*w⟩    (26.21)

Since this holds for all v,

T*(aw) = aT*w    (26.22)

Theorem 26.5 Properties of the Adjoint Map.

1. Additivity: (S + T)* = S* + T*
2. Conjugate Homogeneity: (aT)* = ā T*
3. Adjoint of Adjoint: (T*)* = T
4. Identity: I* = I
5. Adjoint of Product: (ST)* = T* S*

Proof. (Exercise.)
Theorem 26.6 Let T ∈ L(V, W). Then

1. null(T*) = (range(T))^⊥
2. range(T*) = (null(T))^⊥
3. null(T) = (range(T*))^⊥
4. range(T) = (null(T*))^⊥

Proof. (1) Let w ∈ W. Then

w ∈ null(T*) ⟺ T*w = 0 ⟺ ⟨v, T*w⟩ = 0  (∀v ∈ V)    (26.23)

⟺ ⟨Tv, w⟩ = 0  (∀v ∈ V)    (26.24)

⟺ w ∈ (range T)^⊥    (26.25)

(3) Let v ∈ V. Then

v ∈ null(T) ⟺ Tv = 0 ⟺ ⟨w, Tv⟩ = 0  (∀w ∈ W)    (26.26)

⟺ ⟨T*w, v⟩ = 0  (∀w ∈ W)    (26.27)

⟺ v ∈ (range T*)^⊥    (26.28)

Returning to (2), we take the orthogonal complement of (3):

null(T) = (range(T*))^⊥ ⟹ (null T)^⊥ = ((range(T*))^⊥)^⊥    (26.29)

= range(T*)    (26.30)

To prove (4), take the orthogonal complement of (1):

null(T*) = (range T)^⊥ ⟹ (null(T*))^⊥ = ((range T)^⊥)^⊥ = range T    (26.31)
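Relation (1) can be seen numerically via the singular value decomposition, whose left singular vectors split into bases of range T and null T* (a numpy sketch on a random rank-2 real matrix, where T* is just the transpose; not from the notes):

```python
import numpy as np

# Theorem 26.6(1) numerically: null(T*) = (range T)-perp for a rank-2 T on R^5.
rng = np.random.default_rng(4)
T = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 5))   # rank 2

U, s, Vh = np.linalg.svd(T)
rangeT = U[:, :2]       # orthonormal basis of range T (singular values > 0)
nullTstar = U[:, 2:]    # orthonormal basis of null T* (singular values = 0)

print(np.allclose(T.T @ nullTstar, 0, atol=1e-10))   # T* kills these vectors
print(np.allclose(rangeT.T @ nullTstar, 0))          # and they are perp to range T
```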
Next we recall the definition of the adjoint matrix as the conjugate transpose (see definition 25.2). The adjoint operator and the adjoint matrix are related in the following way:

Theorem 26.7 Let T ∈ L(V, W), and let E = (e_1, ..., e_n) and F = (f_1, ..., f_m) be orthonormal bases of V and W respectively. Then the matrix of the adjoint of T is the adjoint (conjugate transpose) of the matrix of T:

M(T*, F, E) = M(T, E, F)*    (26.32)

Proof. (Exercise.)

Definition 26.8 Let T ∈ L(V) be an operator. Then we say T is self-adjoint or Hermitian if T* = T.

Theorem 26.9 Let V be a finite-dimensional, nonzero, inner-product space over F (either R or C) and let T ∈ L(V) be a self-adjoint operator over V. Then the eigenvalues of T are all real.
Proof. Let λ be an eigenvalue of T with nonzero eigenvector v. Then

λ‖v‖² = λ⟨v, v⟩    (26.33)

= ⟨λv, v⟩    (26.34)

= ⟨Tv, v⟩  (because λ is an eigenvalue of T)    (26.35)

= ⟨v, T*v⟩  (definition of the adjoint T*)    (26.36)

= ⟨v, Tv⟩  (because T is self-adjoint)    (26.37)

= ⟨v, λv⟩    (26.38)

= λ̄⟨v, v⟩    (26.39)

= λ̄‖v‖²    (26.40)

Since v ≠ 0, ‖v‖² ≠ 0 and can be cancelled to give λ = λ̄, which is only possible if λ ∈ R.
Theorem 26.10 Let V be a complex inner product space and let T ∈ L(V) such that

⟨Tv, v⟩ = 0    (26.41)

for all v ∈ V. Then T = 0.

Proof. Since ⟨Tv, v⟩ = 0 for all v, it is true for v = u + w and v = u − w, for any vectors u, w ∈ V:

0 = ⟨T(u + w), u + w⟩    (26.42)

= ⟨Tu, u⟩ + ⟨Tu, w⟩ + ⟨Tw, u⟩ + ⟨Tw, w⟩    (26.43)

and

0 = ⟨T(u − w), u − w⟩    (26.44)

= ⟨Tu, u⟩ − ⟨Tu, w⟩ − ⟨Tw, u⟩ + ⟨Tw, w⟩    (26.45)

Subtracting gives

0 = ⟨T(u + w), u + w⟩ − ⟨T(u − w), u − w⟩    (26.46)

= 2⟨Tu, w⟩ + 2⟨Tw, u⟩    (26.47)

⟹ ⟨Tu, w⟩ = −⟨Tw, u⟩    (26.48)

Next we apply the same idea to v = u + iw and v = u − iw:

0 = ⟨T(u + iw), u + iw⟩    (26.49)

= ⟨Tu, u⟩ + ⟨Tu, iw⟩ + ⟨iTw, u⟩ + ⟨iTw, iw⟩    (26.50)

0 = ⟨T(u − iw), u − iw⟩    (26.51)

= ⟨Tu, u⟩ − ⟨Tu, iw⟩ − ⟨iTw, u⟩ + ⟨iTw, iw⟩    (26.52)

Subtracting,

0 = ⟨T(u + iw), u + iw⟩ − ⟨T(u − iw), u − iw⟩    (26.53)

= 2⟨Tu, iw⟩ + 2⟨iTw, u⟩    (26.54)

= −2i⟨Tu, w⟩ + 2i⟨Tw, u⟩    (26.55)

⟹ ⟨Tu, w⟩ = ⟨Tw, u⟩    (26.56)

Adding equations 26.48 and 26.56 gives

⟨Tu, w⟩ = 0    (26.57)

Since this must hold for all u, w, it certainly holds for w = Tu. Then for any u ∈ V,

0 = ⟨Tu, w⟩ = ⟨w, w⟩    (26.58)

which is true if and only if w = 0. Hence for any u ∈ V, w = Tu = 0. Hence T = 0.
Remark 26.11 The last result only holds if V is complex; if V is real, equations 26.49 and following do not hold. Consequently equation 26.48 does not imply that T = 0.

Example 26.1 (Example of Remark 26.11). Let V be a real vector space and let T be the 90 degree rotation about the origin, e.g., if v = (x, y) ∈ V,

Tv = T(x, y) = (−y, x)    (26.59)

Then

⟨Tv, v⟩ = ⟨(−y, x), (x, y)⟩ = −yx + xy = 0    (26.60)

Thus the rotation operator satisfies ⟨Tv, v⟩ = 0 for all v ∈ R², including non-zero v. This illustrates how theorem 26.10 only applies to complex vector spaces, and not real vector spaces.
The following theorem is also only true on complex vector spaces, and is false on real vector spaces.

Theorem 26.12 Let V be a complex inner product space and let T ∈ L(V) be an operator on V. Then T is self-adjoint if and only if ⟨Tv, v⟩ ∈ R for every v ∈ V.

Proof. Let v ∈ V. Since any real number is its own complex conjugate, if ⟨Tv, v⟩ ∈ R then it equals its own conjugate ⟨v, Tv⟩, so

0 = ⟨Tv, v⟩ − ⟨v, Tv⟩    (26.61–26.62)

= ⟨Tv, v⟩ − ⟨T*v, v⟩    (26.63)

= ⟨Tv − T*v, v⟩    (26.64)

= ⟨(T − T*)v, v⟩    (26.65)

Hence T − T* = 0 by theorem 26.10 and thus T = T*.

To prove the converse, suppose that T is self-adjoint. Then equation 26.65 (equal to zero) follows immediately, which (following the steps backwards) tells us that ⟨Tv, v⟩ is its own complex conjugate, hence it must be real.
Theorem 26.13 Let T be a self-adjoint operator on a nonzero inner product space V such that

⟨Tv, v⟩ = 0  (∀v ∈ V)    (26.66)

Then T = 0.

Proof. If V is complex this follows from theorem 26.10. So let us assume that V is real.

Since ⟨Tv, v⟩ = 0, we have

0 = ⟨T(u + w), u + w⟩    (26.67)

= ⟨Tu, u⟩ + ⟨Tu, w⟩ + ⟨Tw, u⟩ + ⟨Tw, w⟩    (26.68)

= ⟨Tu, w⟩ + ⟨Tw, u⟩    (26.69)

Hence

⟨Tu, w⟩ = −⟨Tw, u⟩    (26.70)

= −⟨w, T*u⟩    (26.71)

= −⟨w, Tu⟩  (because T is self-adjoint)    (26.72)

= −⟨Tu, w⟩  (because V is real)    (26.73)

Thus ⟨Tu, w⟩ = −⟨Tu, w⟩, so ⟨Tu, w⟩ = 0 for all u, w ∈ V.

Let w = Tu. Then ⟨Tu, Tu⟩ = 0 for all u ∈ V. Hence ‖Tu‖² = 0 ⟹ Tu = 0 for all u. Hence T = 0.
Definition 26.14 Let T ∈ L(V) be an operator on V. Then we say T is normal or a normal operator if it commutes with its adjoint, i.e.,

T T* = T* T    (26.74)

Corollary 26.15 If T is self-adjoint, then it is normal.

Theorem 26.16 Let T ∈ L(V) be an operator over V. Then T is normal if and only if

‖Tv‖ = ‖T*v‖  ∀v ∈ V    (26.75)

Proof.

T is normal ⟺ T*T − TT* = 0    (26.76)

⟺ ⟨(T*T − TT*)v, v⟩ = 0  ∀v ∈ V    (26.77)

⟺ ⟨T*Tv, v⟩ = ⟨TT*v, v⟩    (26.78)

⟺ ⟨Tv, Tv⟩ = ⟨T*v, T*v⟩ ⟺ ‖Tv‖² = ‖T*v‖²    (26.79)

(The second equivalence uses theorem 26.13, since T*T − TT* is self-adjoint.)
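This characterization is easy to probe numerically (a numpy sketch of ours; the skew-symmetric and Jordan-block examples are illustrations, not from the notes — a real skew-symmetric T is normal since TT^T = −T² = T^T T):

```python
import numpy as np

# Theorem 26.16 numerically: T normal iff ||Tv|| = ||T*v|| for all v.
rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
T = A - A.T                                    # skew-symmetric => normal
v = rng.standard_normal(4)

print(np.allclose(T @ T.T, T.T @ T))           # True: T is normal
print(np.isclose(np.linalg.norm(T @ v), np.linalg.norm(T.T @ v)))   # equal norms

J = np.array([[0.0, 1.0], [0.0, 0.0]])         # Jordan block: not normal
w = np.array([1.0, 0.0])
print(np.linalg.norm(J @ w), np.linalg.norm(J.T @ w))   # 0.0 vs 1.0: norms differ
```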

Corollary 26.17 Let T ∈ L(V) be normal. Then if v ∈ V is an eigenvector of T with eigenvalue λ, then v is also an eigenvector of T* with eigenvalue λ̄.

Proof. Let v be an eigenvector of T with eigenvalue λ. Then

(T − λI)v = 0    (26.80)

Define H = T − λI. Since T is normal,

HH* = (T − λI)(T − λI)*    (26.81)

= (T − λI)(T* − λ̄I)    (26.82)

= TT* − λ̄T − λT* + |λ|²I    (26.83)

= T*T − λ̄T − λT* + |λ|²I    (26.84)

= (T* − λ̄I)(T − λI)    (26.85)

= (T − λI)*(T − λI)    (26.86)

= H*H    (26.87)

Hence H is normal. By theorem 26.16,

‖Hv‖ = ‖H*v‖    (26.88)

Since Hv = 0 (because Tv = λv),

0 = ‖H*v‖ = ‖(T − λI)*v‖ = ‖(T* − λ̄I)v‖    (26.89)

Thus

(T* − λ̄I)v = 0    (26.90)

so that T*v = λ̄v. This means that v is an eigenvector of T* with eigenvalue λ̄.

Theorem 26.18 Let T ∈ L(V) be normal. Then eigenvectors of T with distinct eigenvalues are orthogonal.

Proof. Let λ ≠ μ be eigenvalues of T with eigenvectors u and v respectively. By corollary 26.17,

Tv = μv ⟹ T*v = μ̄v    (26.91)

Hence

(λ − μ)⟨u, v⟩ = λ⟨u, v⟩ − μ⟨u, v⟩    (26.92)

= ⟨λu, v⟩ − ⟨u, μ̄v⟩    (26.93)

= ⟨Tu, v⟩ − ⟨u, T*v⟩    (26.94)

= ⟨Tu, v⟩ − ⟨Tu, v⟩    (26.95)

= 0    (26.96)

Since λ ≠ μ, this means ⟨u, v⟩ = 0.


Topic 27

The Spectral Theorem

Theorem 27.1 Spectral Theorem, Complex Vector Spaces. Let V be a complex inner-product space and T ∈ L(V). Then V has an orthonormal basis consisting of eigenvectors of T if and only if T is normal.

Proof. (⟹) Suppose that V has an orthonormal basis consisting of eigenvectors of T. With respect to this basis, T has a diagonal matrix (this is theorem 18.5). Since the matrix of the adjoint operator is the adjoint matrix of the matrix of the operator, T* also has a diagonal matrix. Since any two diagonal matrices commute, TT* = T*T. Hence T is normal.

(⟸) Suppose T is normal. Then by corollary 23.10, there is some orthonormal basis E = (e_1, ..., e_n) under which M(T) is upper triangular. Let, in block form (rows separated by semicolons),

M(T, E) = [ a_11 ··· a_1n ; ⋱ ⋮ ; 0 a_nn ]  (upper triangular)    (27.1)

where by definition of the matrix of an operator, the a_ij are given by the coefficients of the expansion

T e_i = a_1i e_1 + ··· + a_ni e_n    (27.2)

Since the matrix is upper triangular, the first column has only one nonzero entry:

T e_1 = a_11 e_1 + 0·e_2 + ··· + 0·e_n = a_11 e_1    (27.3)

so that

‖T e_1‖² = |a_11|²    (27.4)

Similarly,

M(T*, E) = M(T, E)* = [ ā_11 0 ; ⋮ ⋱ ; ā_1n ··· ā_nn ]  (lower triangular)    (27.5)

Again looking at the first column,

T* e_1 = ā_11 e_1 + ··· + ā_1n e_n = Σ_{i=1}^n ā_1i e_i    (27.6)

so

‖T* e_1‖² = ⟨Σ_{i=1}^n ā_1i e_i, Σ_{j=1}^n ā_1j e_j⟩    (27.7)

= Σ_{i=1}^n Σ_{j=1}^n ⟨ā_1i e_i, ā_1j e_j⟩    (27.8)

= Σ_{i=1}^n Σ_{j=1}^n ā_1i a_1j ⟨e_i, e_j⟩    (27.9)

= Σ_{i=1}^n Σ_{j=1}^n ā_1i a_1j δ_ij    (27.10)

= |a_11|² + ··· + |a_1n|²    (27.11)

Since T is normal, by theorem 26.16 we require ‖T e_1‖ = ‖T* e_1‖. Hence

|a_11|² = |a_11|² + ··· + |a_1n|²    (27.12)

The right hand side is a sum of non-negative terms; when we cancel the |a_11|² from both sides of the equation, what we have left is a sum of non-negative numbers that equals zero. This is only possible if each of these numbers is equal to zero:

a_12 = a_13 = ··· = a_1n = 0    (27.13)

This means the only non-zero element in the first row of M(T) is the diagonal element:

M(T) = [ a_11 0 ··· 0 ; 0 a_22 ··· a_2n ; ⋮ ⋮ ⋱ ⋮ ; 0 0 ··· a_nn ]    (27.14)

Repeating the same argument with e_2, we find that the only non-zero element in the second row is a_22. We proceed through the matrix and get the same result on each row, giving us a completely diagonal matrix.

Since the matrix is diagonal, by theorem 18.5, E is an orthonormal basis of eigenvectors.
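The complex spectral theorem can be illustrated numerically by building a normal matrix with distinct eigenvalues and checking that its computed eigenvectors are orthonormal (a numpy sketch of ours; the specific eigenvalues are arbitrary choices):

```python
import numpy as np

# A normal matrix A = Q D Q* with distinct eigenvalues has an orthonormal
# basis of eigenvectors (Theorem 27.1).
rng = np.random.default_rng(6)
Z = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Q, _ = np.linalg.qr(Z)                         # random unitary
D = np.diag([1.0, 2.0 + 1.0j, -1.0j])          # distinct eigenvalues
A = Q @ D @ Q.conj().T                         # normal: AA* = A*A

print(np.allclose(A @ A.conj().T, A.conj().T @ A))   # True: A is normal
lam, V = np.linalg.eig(A)
# eigenvectors for distinct eigenvalues of a normal matrix are orthogonal
print(np.allclose(V.conj().T @ V, np.eye(3), atol=1e-8))
```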
Theorem 27.2 Real Spectral Theorem. Let V be a real inner-product space and let T ∈ L(V). Then V has an orthonormal basis consisting of eigenvectors of T if and only if T is self-adjoint.

Corollary 27.3 Let T ∈ L(V) be self-adjoint with distinct eigenvalues λ_1, ..., λ_m. Then

V = null(T − λ_1 I) ⊕ ··· ⊕ null(T − λ_m I)    (27.15)

where each subspace U_i = null(T − λ_i I) is orthogonal to each of the other U_j.

To prove theorem 27.2 we will need two lemmas.
Lemma 27.4 Let T ∈ L(V) be self-adjoint. If α, β ∈ R satisfy

α² < 4β    (27.16)

then

T² + αT + βI    (27.17)

is invertible.

Proof. Let v ∈ V be nonzero. Then

⟨(T² + αT + βI)v, v⟩ = ⟨T²v, v⟩ + α⟨Tv, v⟩ + β⟨v, v⟩    (27.18)

= ⟨Tv, T*v⟩ + α⟨Tv, v⟩ + β‖v‖²    (27.19)

= ⟨Tv, Tv⟩ + α⟨Tv, v⟩ + β‖v‖²  (T is self-adjoint)    (27.20)

= ‖Tv‖² + α⟨Tv, v⟩ + β‖v‖²    (27.21)

From the Cauchy-Schwarz inequality (theorem 21.11),

|⟨Tv, v⟩| ≤ ‖Tv‖ ‖v‖    (27.22)

Multiplying through by |α| and reversing the inequality,

−|α| |⟨Tv, v⟩| ≥ −|α| ‖Tv‖ ‖v‖    (27.23)

By definition of the absolute value,

α⟨Tv, v⟩ ≥ −|α| |⟨Tv, v⟩| ≥ −|α| ‖Tv‖ ‖v‖    (27.24)

Therefore

⟨(T² + αT + βI)v, v⟩ ≥ ‖Tv‖² − |α| ‖Tv‖ ‖v‖ + β‖v‖²    (27.25)

= (‖Tv‖ − |α| ‖v‖/2)² + (β − α²/4) ‖v‖²    (27.26)

> 0    (27.27)

where the last inequality follows because v ≠ 0 and α² < 4β. Hence the inner product on the left is non-zero. Hence

(T² + αT + βI)v ≠ 0    (27.28)

[If it were equal to zero we would have 0 = ⟨0, v⟩ > 0, a contradiction, because the inner product of the zero vector with any vector must be zero.]

Since v is an arbitrary non-zero vector, this means that every non-zero vector in V is not in null(T² + αT + βI), i.e.,

null(T² + αT + βI) = {0}    (27.29)

Thus T² + αT + βI is injective (theorem 14.17). Hence it is invertible.
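A numerical check of the lemma (a numpy sketch of ours: for a symmetric T, the eigenvalues of T² + αT + βI are λ² + αλ + β, which stay positive when α² < 4β):

```python
import numpy as np

# Lemma 27.4 numerically: T self-adjoint and alpha^2 < 4*beta
# => T^2 + alpha*T + beta*I is invertible (in fact positive definite).
rng = np.random.default_rng(7)
A = rng.standard_normal((5, 5))
T = (A + A.T) / 2                              # self-adjoint (symmetric)
alpha, beta = 1.0, 2.0                         # alpha^2 = 1 < 8 = 4*beta

M = T @ T + alpha * T + beta * np.eye(5)
lam = np.linalg.eigvalsh(M)
print(lam.min() > 0)                           # True: positive definite
print(abs(np.linalg.det(M)) > 0)               # True: invertible
```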


Lemma 27.5 If T ∈ L(V) is self-adjoint on a real vector space V, then T has an eigenvalue.

Proof. Let n = dim(V) and pick any v ≠ 0, v ∈ V. Let

S = (v, Tv, T²v, ..., Tⁿv)    (27.30)

Since length(S) = n + 1 > n, S cannot be linearly independent. Thus there exist real numbers a_0, ..., a_n, not all zero, such that

0 = a_0 v + a_1 Tv + a_2 T²v + ··· + a_n Tⁿv = p(T)v    (27.31)

where the polynomial p(x) is defined by

p(x) = a_0 + a_1 x + ··· + a_n xⁿ    (27.32)

Then p(x) can be factored:

p(x) = c(x² + α_1 x + β_1) ··· (x² + α_p x + β_p)(x − λ_1) ··· (x − λ_m)    (27.33)

where c ≠ 0, α_j, β_j, λ_j ∈ R, each α_j² < 4β_j, and m + p ≥ 1 (there are m real roots and p conjugate pairs of complex roots).

Hence we can factor (27.31) as

0 = c(T² + α_1 T + β_1 I) ··· (T² + α_p T + β_p I)(T − λ_1 I) ··· (T − λ_m I)v    (27.34)

Since T is self-adjoint and each α_j² < 4β_j, the previous lemma tells us that each T² + α_j T + β_j I is invertible; hence (also using c ≠ 0),

0 = (T − λ_1 I) ··· (T − λ_m I)v    (27.35)

(Note that m ≥ 1: if there were no linear factors, inverting the quadratic factors in (27.34) would give v = 0, a contradiction.) Thus for at least one j, T − λ_j I is not injective, and λ_j is an eigenvalue of T.
Proof. (Proof of the Real Spectral Theorem 27.2.)
(⇐) Suppose V has an orthonormal basis consisting of eigenvectors of T.
Then with respect to this basis, T has a diagonal matrix (theorem 18.5).
The eigenvalues are all real, because the vector space is real. To see
this, suppose there were a complex eigenvalue λ with non-zero imaginary
part, with eigenvector v. Since v ∈ V and Tv = λv ∈ V, the vector λv
would have to be real. But λv has non-zero imaginary part, because the
product of a non-zero real vector and a number with non-zero imaginary
part is not real. This is not allowed in a real vector space. Hence λ
must be real.
Thus M(T) is both diagonal and real. Hence M(T)* = M(T), so T* = T,
making T self-adjoint.
(⇒) Suppose that T is self-adjoint. We prove the result by induction on
the dimension of V.
If dim V = 1 then V has an orthonormal basis consisting of eigenvectors
(all vectors are parallel, so any unit vector is an eigenvector).
Inductive Hypothesis: suppose dim V = n > 1, and assume that any vector
space with dimension < n has an orthonormal basis consisting solely of
eigenvectors of T.
Let λ, u be an eigenvalue/eigenvector pair with ‖u‖ = 1 (such a pair
exists by lemma 27.5).
Let U be the subspace of V spanned by u.
Let v ∈ U⊥; then ⟨u, v⟩ = 0.
Since T is self-adjoint,

    ⟨u, Tv⟩ = ⟨Tu, v⟩ = ⟨λu, v⟩ = λ⟨u, v⟩ = 0                     (27.36)

Hence for all v ∈ U⊥ we have Tv ∈ U⊥, i.e., U⊥ is invariant under T.
Let S ∈ L(U⊥) be defined as the restriction of T to U⊥, i.e., S = T|U⊥.
Let v, w ∈ U⊥. Then, since T is self-adjoint, it follows that

    ⟨Sv, w⟩ = ⟨Tv, w⟩ = ⟨v, Tw⟩ = ⟨v, Sw⟩                         (27.37)

Hence S is self-adjoint.
The inductive hypothesis applies to U⊥ because it has dimension smaller
than n. By the inductive hypothesis there is an orthonormal basis of U⊥
consisting solely of eigenvectors of S. Joining u to this list of
eigenvectors of S (each of which is also an eigenvector of T) gives an
orthonormal basis of V that consists solely of eigenvectors of T. ∎
We need the following lemma to apply the Spectral Theorem to matrices.
Lemma 27.6 Let T be a normal triangular matrix. Then T is a diagonal
matrix.
Proof. Let T be an upper triangular matrix, so that Tᵢⱼ = 0 for i > j.
Since T is normal, T*T = TT*, and in particular, the diagonal elements
of the two products are equal:

    (T*T)ᵢᵢ = (TT*)ᵢᵢ                                             (27.38)

Then

    (T*T)ᵢᵢ = Σₖ₌₁ⁿ (T*)ᵢₖ Tₖᵢ = Σₖ₌₁ⁱ T̄ₖᵢ Tₖᵢ                     (27.39)

In the final expression, the sum terminates at i rather than n because
Tₖᵢ = 0 for k > i in an upper triangular matrix. Similarly,

    (TT*)ᵢᵢ = Σₖ₌₁ⁿ Tᵢₖ (T*)ₖᵢ = Σₖ₌ᵢⁿ Tᵢₖ T̄ᵢₖ                     (27.40)

The second sum starts at i rather than at 1, because Tᵢₖ = 0 for k < i.
From (27.38), we can equate the expressions in (27.39) and (27.40). This
gives

    T̄₁ᵢT₁ᵢ + ··· + T̄ᵢᵢTᵢᵢ = TᵢᵢT̄ᵢᵢ + ··· + TᵢₙT̄ᵢₙ                  (27.41)

Each term is a squared absolute value, hence non-negative:

    |T₁ᵢ|² + |T₂ᵢ|² + ··· + |Tᵢᵢ|² = |Tᵢᵢ|² + |Tᵢ,ᵢ₊₁|² + ··· + |Tᵢₙ|²  (27.42)

(the left side sums down column i from the top; the right side sums
across row i from the diagonal). Writing these out:

    |T₁₁|² = |T₁₁|² + |T₁₂|² + ··· + |T₁ₙ|²               (i = 1)  (27.43)
    |T₁₂|² + |T₂₂|² = |T₂₂|² + |T₂₃|² + ··· + |T₂ₙ|²      (i = 2)  (27.44)
    |T₁₃|² + |T₂₃|² + |T₃₃|² = |T₃₃|² + |T₃₄|² + ··· + |T₃ₙ|²  (i = 3)  (27.45)
    ...

In the first line (27.43), the |T₁₁|² cancels and we get a sum of
non-negative terms equalling zero, hence

    T₁₂ = T₁₃ = ··· = T₁ₙ = 0                                     (27.46)

Cancelling the |T₂₂|² in (27.44), and using T₁₂ = 0 from (27.46), gives

    T₂₃ = T₂₄ = ··· = T₂ₙ = 0                                     (27.47)

Cancelling the |T₃₃|² in (27.45), and using the facts that T₁₃ = 0 from
(27.46) and that T₂₃ = 0 from (27.47),

    T₃₄ = T₃₅ = ··· = T₃ₙ = 0                                     (27.48)

Progressing this way through the system of equations we obtain Tᵢⱼ = 0
for all i ≠ j. Hence T is diagonal. ∎
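As a quick numerical sanity check of the lemma's contrapositive (a
minimal sketch using NumPy; the matrix entries are arbitrary choices), an
upper triangular matrix with a nonzero off-diagonal entry fails to
commute with its adjoint, while its diagonal part trivially does:

```python
import numpy as np

# Upper triangular with a nonzero off-diagonal entry: not normal.
T = np.array([[1.0, 2.0],
              [0.0, 3.0]])
assert not np.allclose(T @ T.T, T.T @ T)

# Zeroing the off-diagonal entry makes T diagonal, hence normal.
D = np.diag(np.diag(T))
assert np.allclose(D @ D.T, D.T @ D)
```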
Theorem 27.7 (Spectral Theorem for Matrices) Let A be a matrix over
F. Then A is normal if and only if it is diagonalizable by a unitary
matrix; that is, there exists a unitary matrix U such that

    A = UDU*                                                     (27.49)

where D is a diagonal matrix consisting of the eigenvalues of A.
Proof. Suppose A is normal. By the Schur Decomposition (theorem 25.5),

    A = UTU*                                                     (27.50)

where T is upper triangular and U is unitary. Hence

    A*A = (UTU*)*UTU* = UT*U*UTU* = UT*TU*                       (27.51)
    AA* = UTU*UT*U* = UTT*U*                                     (27.52)

Since A is normal, AA* = A*A and therefore

    UT*TU* = UTT*U*                                              (27.53)

Left-multiplying by U* and right-multiplying by U,

    U*(UT*TU*)U = U*(UTT*U*)U                                    (27.54)

Since U is unitary, this reduces to

    T*T = TT*                                                    (27.55)

Hence T is also normal. By the lemma, since T is triangular and normal,
it is diagonal. ∎
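The factorization in theorem 27.7 can be checked numerically. A minimal
sketch using NumPy (the eigenvalues and the random seed are arbitrary
choices, and the unitary U is manufactured by QR rather than computed
from A):

```python
import numpy as np

rng = np.random.default_rng(0)
# Build a random 4x4 unitary Q via QR of a complex Gaussian matrix.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
D = np.diag([1 + 2j, -3, 0.5j, 2])   # arbitrary (complex) eigenvalues
N = Q @ D @ Q.conj().T               # N = QDQ* is normal by construction

# N commutes with its adjoint ...
assert np.allclose(N @ N.conj().T, N.conj().T @ N)
# ... and conjugating by the unitary Q recovers the diagonal matrix D.
assert np.allclose(Q.conj().T @ N @ Q, D)
```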

Topic 28

Normal Operators and Matrices

Recall that an operator is self-adjoint if T* = T and is normal if
T*T = TT*. In this section we will be concerned with real inner product
spaces only, hence

    M(T*) = (M(T))ᵀ                                              (28.1)
Theorem 28.1 Let V be a two-dimensional real inner product space and
suppose that T ∈ L(V) is an operator over V. Then the following are
equivalent:
(1) T is normal but not self-adjoint.
(2) For every orthonormal basis B of V,

    M(T, B) = ⎡ a  −b ⎤                                          (28.2)
              ⎣ b   a ⎦

where b ≠ 0.
(3) For some orthonormal basis B of V, equation 28.2 holds with b > 0.
Proof. We will show that (1) ⇒ (2) ⇒ (3) ⇒ (1).
((1) ⇒ (2)) Assume that (1) is true, i.e.,

    T*T = TT* but T* ≠ T                                         (28.3)

Since V is two-dimensional, M(T) is a 2 × 2 matrix.
Let B = (u, v) be any orthonormal basis of V, and let

    M(T) = ⎡ a  c ⎤                                              (28.4)
           ⎣ b  d ⎦

To prove (2) we must show that a = d and b = −c ≠ 0.
By the definition of M(T, B) (equation 18.1),

    Tu = au + bv                                                 (28.5)
    Tv = cu + dv                                                 (28.6)

Since B is orthonormal, ⟨u, v⟩ = 0 and ⟨u, u⟩ = ⟨v, v⟩ = 1; therefore

    ‖Tu‖² = ⟨Tu, Tu⟩ = ⟨au + bv, au + bv⟩ = a² + b²              (28.7)
    ‖Tv‖² = ⟨Tv, Tv⟩ = ⟨cu + dv, cu + dv⟩ = c² + d²              (28.8)

Since the matrix of the adjoint is the adjoint of the matrix,

    M(T*) = M(T)* = ⎡ ā  b̄ ⎤ = ⎡ a  b ⎤                          (28.9)
                    ⎣ c̄  d̄ ⎦   ⎣ c  d ⎦

where the last step follows because the vector space is real. By the
definition of M(T*) (equation 18.1),

    T*u = au + cv                                                (28.10)
    T*v = bu + dv                                                (28.11)

so that

    ‖T*u‖² = ⟨au + cv, au + cv⟩ = a² + c²                        (28.12)
    ‖T*v‖² = ⟨bu + dv, bu + dv⟩ = b² + d²                        (28.13)

By theorem 26.16, since T is normal,

    ‖Tu‖ = ‖T*u‖                                                 (28.14)
    ‖Tv‖ = ‖T*v‖                                                 (28.15)

and therefore

    a² + b² = a² + c²                                            (28.16)
    c² + d² = b² + d²                                            (28.17)

Thus

    b² = c² ⟹ b = ±c                                             (28.18)

Since T is not self-adjoint (given),

    M(T) ≠ M(T*)                                                 (28.19)

and therefore

    ⎡ a  c ⎤ ≠ ⎡ a  b ⎤                                          (28.20)
    ⎣ b  d ⎦   ⎣ c  d ⎦

so that b ≠ c. Hence b = −c, and we can write the matrix as

    M(T) = ⎡ a  −b ⎤                                             (28.21)
           ⎣ b   d ⎦

Furthermore, b ≠ 0, because otherwise the matrix would be diagonal and
hence self-adjoint; we are given that the operator is not self-adjoint,
hence neither is its matrix.
Since the operator is normal, T*T = TT*, hence

    M(T)M(T*) = M(T*)M(T)                                        (28.22)

But

    M(T)M(T*) = ⎡ a  −b ⎤ ⎡  a  b ⎤ = ⎡ a² + b²   ab − bd ⎤     (28.23)
                ⎣ b   d ⎦ ⎣ −b  d ⎦   ⎣ ab − bd   b² + d² ⎦

    M(T*)M(T) = ⎡  a  b ⎤ ⎡ a  −b ⎤ = ⎡ a² + b²   −ab + bd ⎤    (28.24)
                ⎣ −b  d ⎦ ⎣ b   d ⎦   ⎣ −ab + bd   b² + d² ⎦

Substituting this into equation 28.22 and equating like components of
the matrices,

    ab − bd = −ab + bd ⟹ ab = bd                                 (28.25)

Since b ≠ 0, we have a = d, and thus

    M(T, B) = ⎡ a  −b ⎤                                          (28.26)
              ⎣ b   a ⎦

which is the required expression.


((2) ⇒ (3)) Assume that for every orthonormal basis B = (u, v),
equation 28.2 holds with b ≠ 0:

    M(T, B) = ⎡ a  −b ⎤                                          (28.27)
              ⎣ b   a ⎦

Pick any orthonormal basis B. If b > 0, then the desired result (3)
follows immediately.
If b < 0, consider the basis E = (u, −v). By the definition of M(T, B)
(equation 18.1), we have

    Tu = au + bv                                                 (28.28)
    Tv = −bu + av                                                (28.29)

Thus

    Tu = au + (−b)(−v)                                           (28.30)
    T(−v) = bu + a(−v)                                           (28.31)

so that by the definition of M(T, E),

    M(T, E) = ⎡  a  b ⎤                                          (28.32)
              ⎣ −b  a ⎦

Since b < 0, the lower-left entry −b is positive, and therefore (3)
holds for the basis E.


((3) ⇒ (1)) Assume (3) is true; thus equation 28.2 holds for some
orthonormal basis B with b > 0:

    M(T, B) = ⎡ a  −b ⎤                                          (28.33)
              ⎣ b   a ⎦

We need to show that T is normal but not self-adjoint.
Since b > 0, we have −b ≠ b and therefore M(T) ≠ (M(T))*. So T is not
self-adjoint.
To show that T is normal we must verify T*T = TT*, which we do with
matrix multiplication:

    M(T)(M(T))* = ⎡ a  −b ⎤ ⎡  a  b ⎤ = ⎡ a² + b²      0    ⎤   (28.34)
                  ⎣ b   a ⎦ ⎣ −b  a ⎦   ⎣    0      a² + b² ⎦

    (M(T))*M(T) = ⎡  a  b ⎤ ⎡ a  −b ⎤ = ⎡ a² + b²      0    ⎤   (28.35)
                  ⎣ −b  a ⎦ ⎣ b   a ⎦   ⎣    0      a² + b² ⎦

Since M(T)(M(T))* = (M(T))*M(T) we conclude that TT* = T*T, as
required. ∎
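The key computation above is easy to confirm numerically. A minimal
sketch using NumPy (the values a = 2, b = 3 are arbitrary, with b ≠ 0):

```python
import numpy as np

a, b = 2.0, 3.0
M = np.array([[a, -b],
              [b,  a]])

# M commutes with its (real) adjoint, the transpose, so M is normal;
# both products equal (a^2 + b^2) I, as in equations 28.34-28.35.
assert np.allclose(M @ M.T, M.T @ M)
assert np.allclose(M @ M.T, (a**2 + b**2) * np.eye(2))

# But M is not self-adjoint, since b != 0.
assert not np.allclose(M, M.T)
```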

Block Notation

We will sometimes divide a matrix into blocks and refer to each of the
blocks as a matrix in its own right. For example, we might write

    ⎡ a  b  c  d  e ⎤
    ⎢ f  g  h  i  j ⎥
    ⎢ k  l  m  n  o ⎥ = ⎡ A  B ⎤                                 (28.36)
    ⎢ p  q  r  s  t ⎥   ⎣ C  D ⎦
    ⎣ u  v  w  x  y ⎦

where

    A = ⎡ a  b ⎤ ,   B = ⎡ c  d  e ⎤ ,
        ⎣ f  g ⎦         ⎣ h  i  j ⎦

    C = ⎡ k  l ⎤ ,   D = ⎡ m  n  o ⎤                             (28.37)
        ⎢ p  q ⎥         ⎢ r  s  t ⎥
        ⎣ u  v ⎦         ⎣ w  x  y ⎦

The matrices A, B, C, D are called sub-matrices of the original matrix.
Multiplication of compatibly blocked matrices proceeds like normal
matrix multiplication. Thus

    ⎡ A  B ⎤ ⎡ P  Q ⎤ = ⎡ AP + BR   AQ + BS ⎤                    (28.38)
    ⎣ C  D ⎦ ⎣ R  S ⎦   ⎣ CP + DR   CQ + DS ⎦

so long as all the multiplications are well-defined (i.e., it is
possible to multiply A by P, etc.).
A square matrix M is called block (upper) triangular if it can be
written in the form

    M = ⎡ M₁₁  M₁₂  ...  M₁ₙ ⎤
        ⎢  0   M₂₂  ...  M₂ₙ ⎥                                   (28.39)
        ⎢  ⋮         ⋱    ⋮  ⎥
        ⎣  0    0   ...  Mₙₙ ⎦

where each of the Mᵢⱼ represents a sub-matrix of M and each 0 represents
a matrix of all zeros. A similar definition holds for block (lower)
triangular matrices.
A square matrix A is called block-diagonal if it can be represented as

    A = ⎡ A₁  0   ...  0  ⎤
        ⎢ 0   A₂  ...  0  ⎥                                      (28.40)
        ⎢ ⋮        ⋱   ⋮  ⎥
        ⎣ 0   0   ...  Aₘ ⎦

for some square matrices A₁, . . . , Aₘ. If B is also block diagonal,

    B = ⎡ B₁  0   ...  0  ⎤
        ⎢ 0   B₂  ...  0  ⎥                                      (28.41)
        ⎢ ⋮        ⋱   ⋮  ⎥
        ⎣ 0   0   ...  Bₘ ⎦

with each Bⱼ the same size as the corresponding Aⱼ, then

    AB = ⎡ A₁B₁   0    ...   0   ⎤
         ⎢  0    A₂B₂  ...   0   ⎥                               (28.42)
         ⎢  ⋮          ⋱     ⋮   ⎥
         ⎣  0     0    ...  AₘBₘ ⎦
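The block-multiplication rule (28.38) can be checked numerically. A
minimal sketch using NumPy (the 5 × 5 matrices and the 2 + 3 split are
arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
# Split two 5x5 matrices into 2x2, 2x3, 3x2, 3x3 blocks, as in (28.36).
X = rng.normal(size=(5, 5))
Y = rng.normal(size=(5, 5))
A, B = X[:2, :2], X[:2, 2:]
C, D = X[2:, :2], X[2:, 2:]
P, Q = Y[:2, :2], Y[:2, 2:]
R, S = Y[2:, :2], Y[2:, 2:]

# Equation 28.38: the blockwise product agrees with the ordinary product.
top = np.hstack([A @ P + B @ R, A @ Q + B @ S])
bot = np.hstack([C @ P + D @ R, C @ Q + D @ S])
assert np.allclose(np.vstack([top, bot]), X @ Y)
```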

Invariant Subspaces of Normal Operators

Theorem 28.2 Let T ∈ L(V) be normal, with U a subspace of V that is
invariant under T. Then:
(1) U⊥ is invariant under T.
(2) U is invariant under T*.
(3) (T|U)* = (T*)|U.
(4) T|U is normal on U.
(5) T|U⊥ is normal on U⊥.

Proof. (proof of theorem 28.2 (1))
Let E = (e₁, . . . , eₘ) be an orthonormal basis of U.
Extend E to an orthonormal basis E′ = (e₁, . . . , eₘ, f₁, . . . , fₙ) of
V (corollary 23.8).
Then F = (f₁, . . . , fₙ) is an orthonormal basis of U⊥.
Since U is invariant under T (by the hypothesis of the theorem) there
exist aⱼₖ such that

    Teⱼ = aⱼ₁e₁ + ··· + aⱼₘeₘ                                     (28.43)
        = aⱼ₁e₁ + ··· + aⱼₘeₘ + 0·f₁ + ··· + 0·fₙ                 (28.44)

For a general v ∈ V, we know that Tv ∈ V. Hence, since E′ is an
orthonormal basis of V, there exist bⱼₖ and cⱼₖ such that

    Tfⱼ = bⱼ₁e₁ + ··· + bⱼₘeₘ + cⱼ₁f₁ + ··· + cⱼₙfₙ               (28.45)

Hence by the definition of the matrix of an operator (equation 18.1),

    M(T, E′) = ⎡ a₁₁ ... aₘ₁  b₁₁ ... bₙ₁ ⎤
               ⎢  ⋮       ⋮    ⋮       ⋮  ⎥
               ⎢ a₁ₘ ... aₘₘ  b₁ₘ ... bₙₘ ⎥ = ⎡ A  B ⎤           (28.46)
               ⎢  0  ...  0   c₁₁ ... cₙ₁ ⎥   ⎣ 0  C ⎦
               ⎢  ⋮       ⋮    ⋮       ⋮  ⎥
               ⎣  0  ...  0   c₁ₙ ... cₙₙ ⎦

where A is an m × m matrix, B is m × n, C is n × n, and 0 is the n × m
zero matrix.
By theorem 23.5, for any vector v ∈ V,

    ‖Tv‖² = Σₖ₌₁ᵐ |⟨Tv, eₖ⟩|² + Σₖ₌₁ⁿ |⟨Tv, fₖ⟩|²                 (28.47)

Hence

    ‖Teⱼ‖² = Σₖ₌₁ᵐ |⟨Teⱼ, eₖ⟩|²                                   (28.48)
           = Σₖ₌₁ᵐ |⟨aⱼ₁e₁ + ··· + aⱼₘeₘ, eₖ⟩|²                   (28.49)
           = Σₖ₌₁ᵐ | Σₚ₌₁ᵐ ⟨aⱼₚeₚ, eₖ⟩ |²                         (28.50)
           = Σₖ₌₁ᵐ | Σₚ₌₁ᵐ aⱼₚ⟨eₚ, eₖ⟩ |²                         (28.51)
           = Σₖ₌₁ᵐ | Σₚ₌₁ᵐ aⱼₚδₚₖ |² = Σₖ₌₁ᵐ |aⱼₖ|²               (28.52)

In other words, ‖Teⱼ‖² is the sum of the squares of the entries in the
jth column of A. Furthermore,

    Σⱼ₌₁ᵐ ‖Teⱼ‖² = Σⱼ₌₁ᵐ Σₖ₌₁ᵐ |aⱼₖ|²                             (28.53)

which is the sum of the squares of all the elements of A.
By a similar calculation, ‖T*eⱼ‖² is the sum of the squares of the
entries in the jth row of [A B]:

    ‖T*eⱼ‖² = Σₖ₌₁ᵐ |aₖⱼ|² + Σₖ₌₁ⁿ |bₖⱼ|²                         (28.54)

so that

    Σⱼ₌₁ᵐ ‖T*eⱼ‖² = Σⱼ₌₁ᵐ Σₖ₌₁ᵐ |aₖⱼ|² + Σⱼ₌₁ᵐ Σₖ₌₁ⁿ |bₖⱼ|²       (28.55)

Since T is normal, by theorem 26.16 ‖Tv‖ = ‖T*v‖ for all v ∈ V. In
particular, ‖Teⱼ‖ = ‖T*eⱼ‖, which also means that

    Σⱼ₌₁ᵐ ‖Teⱼ‖² = Σⱼ₌₁ᵐ ‖T*eⱼ‖²                                  (28.56)

Substituting equations 28.53 and 28.55 into equation 28.56 gives

    Σⱼ₌₁ᵐ Σₖ₌₁ᵐ |aⱼₖ|² = Σⱼ₌₁ᵐ Σₖ₌₁ᵐ |aₖⱼ|² + Σⱼ₌₁ᵐ Σₖ₌₁ⁿ |bₖⱼ|²   (28.57)

The first double sum on each side is the sum over all entries of A, so
cancelling them,

    Σⱼ₌₁ᵐ Σₖ₌₁ⁿ |bₖⱼ|² = 0                                        (28.58)

which means that

    bⱼₖ = 0                                                       (28.59)

for all j, k, i.e., B = 0 and

    M(T, E′) = ⎡ A  0 ⎤                                           (28.60)
               ⎣ 0  C ⎦

Since all the bⱼₖ = 0, from (28.45) we have that
Tfₖ ∈ span(f₁, . . . , fₙ).
Since F = (f₁, . . . , fₙ) is a basis of U⊥, this means that Tfₖ ∈ U⊥.
Since any vector u ∈ U⊥ can be written as a linear combination of the
vectors in F, this means that for every u ∈ U⊥ we have Tu ∈ U⊥.
Hence U⊥ is invariant under T.
(proof of theorem 28.2 (2)) By the same type of argument we made in the
proof of (1), we obtain

    M(T*, E′) = ⎡ A*  0  ⎤                                       (28.61)
                ⎣ 0   C* ⎦

Hence T*eⱼ is a linear combination of E = (e₁, . . . , eₘ).
Since any vector u ∈ U is a linear combination of E, we have T*u ∈ U for
every u ∈ U. Thus U is invariant under T*.
(proof of theorem 28.2 (3)) Given that U is invariant under both T
(hypothesis of the theorem) and T* (item (2)), we need to show that
(T|U)* = (T*)|U.
Define S = T|U and pick any v ∈ U. Then for all u ∈ U, we have

    ⟨u, S*v⟩ = ⟨Su, v⟩ = ⟨Tu, v⟩ = ⟨u, T*v⟩                      (28.62)

Since U is invariant under T*,

    v ∈ U ⟹ T*v ∈ U, i.e., (T*)|U v ∈ U                          (28.63)

Hence

    S*v = T*v, i.e., (T|U)* = (T*)|U                             (28.64)

(proof of theorem 28.2 (4)) Given that U is invariant under T and
(T|U)* = (T*)|U, we need to show that T|U is normal on U.
To prove that T|U is normal on U we need to show that it commutes with
its adjoint. But

    (T|U)*(T|U) = (T*|U)(T|U)            (by (3))                (28.65)
                = (T*T)|U = (TT*)|U      (T is normal, so T*T = TT*)  (28.66)
                = (T|U)(T*|U)
                = (T|U)(T|U)*            (by (3))                (28.67)

Thus T|U is normal.
(proof of theorem 28.2 (5)) The proof is analogous to the proof of (4),
using the fact (from (1)) that U⊥ is invariant under T. ∎
Theorem 28.3 Let V be a real inner product space and T ∈ L(V) an
operator over V. Then T is normal if and only if there is some
orthonormal basis of V with respect to which M(T) is block diagonal and
each block is either 1 × 1 or 2 × 2 with the form

    ⎡ a  −b ⎤                                                    (28.68)
    ⎣ b   a ⎦

with b > 0.
Proof. (⇒) Suppose that T is normal; we prove the result by induction
on n = dim(V).
For n = 1, the result follows immediately.
For n = 2, either T is self-adjoint or it is not. If it is self-adjoint,
then by the real spectral theorem (theorem 27.2) it has a basis given
entirely by eigenvectors of T, and hence by theorem 18.8, M(T) is
diagonal with respect to some basis. If T is not self-adjoint, then by
theorem 28.1, M(T) has the form shown.
As our inductive hypothesis, assume n > 2 and that the result holds for
dimensions smaller than n.
By theorem 20.1, T has an invariant subspace U of either dimension 1 or
2, such that in the 1-dimensional case T|U has a real eigenvalue, and in
the 2-dimensional case T|U has only complex-conjugate pairs of
eigenvalues with non-zero imaginary parts (i.e., it has no real
eigenvalues).
If dim(U) = 1, choose any v ∈ U with ‖v‖ = 1 as a basis of U.
If dim(U) = 2, then T|U is normal (theorem 28.2, item (4)). But T|U is
not self-adjoint: if it were, it would have a real eigenvalue by lemma
27.5, and we have just noted that T|U has no real eigenvalues on the
2-dimensional subspaces.
Since T|U is normal but not self-adjoint (in dimension 2), theorem 28.1
applies: there is some orthonormal basis of U in which M(T|U) has the
form given by equation 28.68.
Next observe that U⊥ is invariant under T and T|U⊥ is normal on U⊥ by
theorem 28.2. Since dim U⊥ < n, the inductive hypothesis applies to it.
Hence there is some orthonormal basis B of U⊥ with respect to which
M(T|U⊥) has the properties predicted by the theorem. Adjoining the
basis of U to B gives a basis B′ with respect to which M(T, B′) has the
properties of the theorem.
(⇐) Suppose there is some orthonormal basis E with respect to which
M(T) is block diagonal,

    M(T) = diagonal(A₁, . . . , Aₙ)                              (28.69)

with each block Aᵢ either a 1 × 1 matrix or a 2 × 2 matrix of the form
given in equation 28.68. Then

    M(T*) = diagonal(A₁ᵀ, . . . , Aₙᵀ)                            (28.70)

For each Aᵢ: if Aᵢ is 1 × 1, then Aᵢ = Aᵢᵀ, and hence it commutes with
its transpose. If Aᵢ is 2 × 2 then it has the form of equation 28.68,
and we have already shown in equation 28.34 that such a matrix commutes
with its transpose. Hence

    AᵢAᵢᵀ = AᵢᵀAᵢ                                                 (28.71)

and therefore

    M(T)M(T*) = diagonal(A₁, . . . , Aₙ) diagonal(A₁ᵀ, . . . , Aₙᵀ)  (28.72)
              = diagonal(A₁A₁ᵀ, . . . , AₙAₙᵀ)                    (28.73)
              = diagonal(A₁ᵀA₁, . . . , AₙᵀAₙ)                    (28.74)
              = diagonal(A₁ᵀ, . . . , Aₙᵀ) diagonal(A₁, . . . , Aₙ)  (28.75)
              = M(T*)M(T)                                        (28.76)

Thus T is normal. ∎

Normal Matrices

Definition 28.4 An n × n square matrix is a normal matrix if it has a
complete set of n mutually orthogonal eigenvectors.
Theorem 28.5 A normal matrix is self-adjoint if and only if all of its
eigenvalues are real.
Proof. Let N be a normal matrix. Then it has n orthonormal eigenvectors
v₁, . . . , vₙ with eigenvalues λ₁, . . . , λₙ.
Define U as the matrix whose column vectors are the orthonormal
eigenvectors vᵢ, and let Λ = diagonal(λ₁, . . . , λₙ). Then U⁻¹ = U* and

    NU = N [ v₁ ··· vₙ ]                                         (28.77)
       = [ Nv₁ ··· Nvₙ ]                                         (28.78)
       = [ λ₁v₁ ··· λₙvₙ ]                                       (28.79)
       = [ v₁ ··· vₙ ] diagonal(λ₁, . . . , λₙ)                  (28.80)
       = UΛ                                                      (28.81)

Hence

    U*NU = Λ                                                     (28.82)

or equivalently

    N = UΛU*                                                     (28.83)

Thus N is similar to the diagonal matrix Λ and has the same eigenvalues.
N is self-adjoint if and only if N* = N, but

    N* = N ⟺ (UΛU*)* = UΛU*                                      (28.84)
           ⟺ UΛ*U* = UΛU*                                        (28.85)
           ⟺ Λ* = Λ                                              (28.86)

The last line is true if and only if λ̄ᵢ = λᵢ for all i, i.e., the
eigenvalues are all real. ∎
Theorem 28.6 If N is a normal matrix then there exist square, commuting
matrices A and B such that

    N = A + iB and AB = BA                                       (28.87)

Proof. Let Λ = U*NU be the canonical diagonal form of N. Then

    Λ = Λᵣ + iΛᵢ                                                 (28.88)

where

    Λᵣ = diagonal( Re(λ₁), . . . , Re(λₙ) )                      (28.89)
    Λᵢ = diagonal( Im(λ₁), . . . , Im(λₙ) )                      (28.90)

Hence

    N = U(Λᵣ + iΛᵢ)U* = UΛᵣU* + iUΛᵢU*                           (28.91)

Therefore N = A + iB, where

    A = UΛᵣU* and B = UΛᵢU*                                      (28.92)

To see that A and B commute, we use the fact that diagonal matrices
commute:

    AB = UΛᵣU*UΛᵢU*                                              (28.93)
       = UΛᵣΛᵢU*                                                 (28.94)
       = UΛᵢΛᵣU*                                                 (28.95)
       = UΛᵢU*UΛᵣU*                                              (28.96)
       = BA                                                      (28.97)

Hence A and B commute. ∎
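This decomposition is easy to exhibit numerically using the explicit
formulas A = (N + N*)/2 and B = (N − N*)/(2i), which recover the same A
and B. A minimal sketch using NumPy (the permutation matrix is an
arbitrary choice of a normal, non-self-adjoint N):

```python
import numpy as np

# A normal test matrix: a cyclic permutation (unitary, hence normal).
N = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=complex)
assert np.allclose(N @ N.conj().T, N.conj().T @ N)   # N is normal

# Cartesian decomposition N = A + iB.
A = (N + N.conj().T) / 2
B = (N - N.conj().T) / 2j
assert np.allclose(A, A.conj().T)        # A is self-adjoint
assert np.allclose(B, B.conj().T)        # B is self-adjoint
assert np.allclose(N, A + 1j * B)        # N = A + iB
assert np.allclose(A @ B, B @ A)         # A and B commute
```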


The following theorem shows that our previous definition of normality is
consistent with definition 28.4.


Theorem 28.7 N is normal if and only if it commutes with its adjoint,
i.e.,

    NN* = N*N                                                    (28.98)

Proof. (⇒) Suppose N is normal. Then by equation 28.82, if Λ is the
diagonal matrix of eigenvalues, for some unitary matrix U we have
N = UΛU*, giving us

    NN* = (UΛU*)(UΛ*U*)                                          (28.99)
        = UΛΛ*U*                                                 (28.100)
        = UΛ*ΛU*                                                 (28.101)
        = (UΛ*U*)(UΛU*)                                          (28.102)
        = N*N                                                    (28.103)

(⇐) Suppose that N*N = NN*. We must show that N is normal. By the
Spectral theorem, it is sufficient to show that N has a complete set of
orthogonal eigenvectors.
Define the matrices A and B by

    A = (1/2)(N + N*),   B = (1/(2i))(N − N*)                    (28.104)

Then

    N = A + iB                                                   (28.105)

Furthermore, since NN* = N*N,

    4iAB = (N + N*)(N − N*)                                      (28.106)
         = N² − NN* + N*N − (N*)²                                (28.107)
         = N² + NN* − N*N − (N*)²                                (28.108)
         = N(N + N*) − N*(N + N*)                                (28.109)
         = (N − N*)(N + N*)                                      (28.110)
         = 4iBA                                                  (28.111)

Hence AB = BA.
Let a₁ ≤ a₂ ≤ ··· ≤ aₙ be the eigenvalues of A and let
Λ = diagonal(a₁, . . . , aₙ). Since A* = A, we know the eigenvalues are
real (see theorem 26.9).
Since A is self-adjoint it has n mutually orthogonal unit eigenvectors
(spectral theorem). Let V be the matrix whose columns are eigenvectors
of A. Then

    V*AV = V* [ Av₁ ··· Avₙ ]                                    (28.112)
         = V* [ a₁v₁ ··· aₙvₙ ]                                  (28.113)
         = V*VΛ                                                  (28.114)
         = Λ                                                     (28.115)

Let

    K = V*BV                                                     (28.116)

Then since B is self-adjoint,

    K* = (V*BV)* = V*B*V = V*BV = K                              (28.117)

Consequently K is self-adjoint. However, K is not necessarily diagonal.
Since A and B commute we have

    ΛK = (V*AV)(V*BV)                                            (28.118)
       = V*ABV                                                   (28.119)
       = V*BAV                                                   (28.120)
       = (V*BV)(V*AV)                                            (28.121)
       = KΛ                                                      (28.122)

so K and Λ commute. Hence (ΛK)ᵢⱼ = (KΛ)ᵢⱼ, and so

    Σₚ₌₁ⁿ Λᵢₚ Kₚⱼ = Σₚ₌₁ⁿ Kᵢₚ Λₚⱼ                                 (28.123)
    Σₚ₌₁ⁿ aᵢ δᵢₚ Kₚⱼ = Σₚ₌₁ⁿ Kᵢₚ aⱼ δₚⱼ                            (28.124)
    aᵢ Kᵢⱼ = Kᵢⱼ aⱼ                                               (28.125)

Thus Kᵢⱼ = 0 unless aᵢ = aⱼ.
If the eigenvalues are distinct, then K is diagonal.
If the eigenvalues are not distinct then we can write Λ as a block
diagonal matrix

    Λ = diagonal(Λ₁, . . . , Λₘ)                                 (28.126)

where each Λᵢ is a diagonal matrix with all of its diagonal elements
equal, and correspondingly

    K = diagonal(K₁, . . . , Kₘ)                                 (28.127)

where each of the Kᵢ is self-adjoint and has the same dimensions as the
corresponding Λᵢ. (This follows from equation 28.125.)
Since Kᵢ is self-adjoint, there is some unitary matrix Wᵢ such that
Wᵢ*KᵢWᵢ is diagonal. Let

    W = diagonal(W₁, . . . , Wₘ)                                 (28.128)

Then W is also unitary and

    P = W*KW                                                     (28.129)
      = diagonal(W₁*, . . . , Wₘ*) diagonal(K₁, . . . , Kₘ)
          diagonal(W₁, . . . , Wₘ)                               (28.130)
      = diagonal(W₁*K₁W₁, . . . , Wₘ*KₘWₘ)                       (28.131)

is diagonal. Now compute

    W*ΛW = diagonal(W₁*, . . . , Wₘ*) diagonal(Λ₁, . . . , Λₘ)
             diagonal(W₁, . . . , Wₘ)                            (28.132)
         = diagonal(W₁*Λ₁W₁, . . . , Wₘ*ΛₘWₘ)                    (28.133)
         = diagonal(W₁*a₁IW₁, . . . , Wₘ*aₘIWₘ)                  (28.134)
         = Λ                                                     (28.135)

where the last step uses the fact that each Λᵢ is a multiple of the
identity, so it commutes with Wᵢ and Wᵢ*Wᵢ = I.
Let U = VW. Then

    U*AU = (W*V*)A(VW) = W*(V*AV)W = W*ΛW = Λ                    (28.136)

is a real diagonal matrix, as is

    U*BU = (W*V*)B(VW) = W*(V*BV)W = W*KW = P                    (28.137)

Therefore

    U*NU = U*(A + iB)U = U*AU + iU*BU = Λ + iP                   (28.138)

is diagonal. Hence N is similar to a diagonal matrix, and the columns of
U are the orthogonal eigenvectors of N. ∎


Topic 29

Positive Operators

Definition 29.1 Let V be a finite-dimensional non-zero inner-product
space over F and let T ∈ L(V) be an operator over V. Then we say T is a
positive operator, or a positive semi-definite operator, or just that T
is positive, if T is self-adjoint and

    ⟨Tv, v⟩ ≥ 0                                                  (29.1)

for all v ∈ V.
The term "positive" is a bit misleading because equality is allowed in
equation 29.1; a better term might be "non-negative" rather than
"positive". An operator that satisfies

    ⟨Tv, v⟩ > 0                                                  (29.2)

unless v = 0 is called positive definite. When equality is allowed, the
"definite" becomes "semi-definite", but in our shorthand we simply call
the operator positive.
Example 29.1 Every orthogonal projection is positive.
Definition 29.2 A self-adjoint matrix A is called positive definite if

    vᵀAv > 0                                                     (29.3)

for all v ≠ 0, and is called positive semi-definite if

    vᵀAv ≥ 0                                                     (29.4)

Definition 29.3 An operator S is called a square root of an operator T
if S² = T.

Example 29.2 Let

    T(x, y, z) = (z, 0, 0)                                       (29.5)

and

    S(x, y, z) = (y, z, 0)                                       (29.6)

Then

    S²(x, y, z) = S(y, z, 0) = (z, 0, 0) = T(x, y, z)            (29.7)

Hence S is a square root of T.
Example 29.3 (Square Root of a Matrix) Find the square roots of

    T = ⎡ 125  75 ⎤                                              (29.8)
        ⎣  75 125 ⎦

Let M = √T be denoted by

    M = ⎡ a  b ⎤                                                 (29.9)
        ⎣ c  d ⎦

Then

    T = M² = ⎡ a  b ⎤ ⎡ a  b ⎤ = ⎡ a² + bc   ab + bd ⎤           (29.10)
             ⎣ c  d ⎦ ⎣ c  d ⎦   ⎣ ac + cd   bc + d² ⎦

This gives four equations in four unknowns:

    a² + bc = 125                                                (29.11)
    ab + bd = 75                                                 (29.12)
    ac + cd = 75                                                 (29.13)
    bc + d² = 125                                                (29.14)

The equations are non-linear and there are four solutions for M. To see
this, subtract equation (29.11) from (29.14) to get

    a² = d²

This gives two choices, a = d or a = −d.
We can rule out a = −d, since this would give a contradiction (75 = 0)
in either of (29.12) or (29.13). Hence a = d.
Substituting this into equations (29.12) and (29.13) gives

    2ab = 75 ⟹ b = 75/(2a)                                       (29.15)
    2ac = 75 ⟹ c = 75/(2a)                                       (29.16)

From equation (29.11),

    125 = a² + bc = a² + 75²/(4a²)                               (29.17)

Rearranging,

    a⁴ − 125a² + 75²/4 = 0                                       (29.18)

This is a quadratic in a², so

    a² = (125 ± √(125² − 75²))/2 = (125 ± 100)/2 = 225/2 or 25/2  (29.19)

This gives four solutions for a:

    a = 5/√2, −5/√2, 15/√2, −15/√2                               (29.20)

For a = 5/√2, then d = 5/√2. Hence b = c = 75/(2a) = 15/√2, giving

    M = (1/√2) ⎡  5  15 ⎤                                        (29.21)
               ⎣ 15   5 ⎦

For a = d = −5/√2, this gives b = c = −15/√2:

    M = −(1/√2) ⎡  5  15 ⎤                                       (29.22)
                ⎣ 15   5 ⎦

For a = d = 15/√2, we have b = c = 75/(2a) = 5/√2, so that

    M = (1/√2) ⎡ 15   5 ⎤                                        (29.23)
               ⎣  5  15 ⎦

Finally, for a = d = −15/√2, we have b = c = 75/(2a) = −5/√2, so that

    M = −(1/√2) ⎡ 15   5 ⎤                                       (29.24)
                ⎣  5  15 ⎦
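The four square roots found above can be verified numerically; a minimal
sketch using NumPy:

```python
import numpy as np

T = np.array([[125.0, 75.0],
              [75.0, 125.0]])

s = 1 / np.sqrt(2)
roots = [ s * np.array([[ 5.0, 15.0], [15.0,  5.0]]),
         -s * np.array([[ 5.0, 15.0], [15.0,  5.0]]),
          s * np.array([[15.0,  5.0], [ 5.0, 15.0]]),
         -s * np.array([[15.0,  5.0], [ 5.0, 15.0]])]

# Each candidate M from equations (29.21)-(29.24) squares to T.
for M in roots:
    assert np.allclose(M @ M, T)
```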


Theorem 29.4 Let T ∈ L(V). Then the following are equivalent.
(1) T is positive.
(2) T is self-adjoint and all of its eigenvalues are non-negative.
(3) T has a positive square root.
(4) T has a self-adjoint square root.
(5) There exists some S ∈ L(V) such that T = S*S.
Proof. ((1) ⇒ (2)) Let T be positive. Then by the definition of a
positive operator, it is self-adjoint.
Let λ be an eigenvalue of T with non-zero eigenvector v. Then since T is
positive,

    0 ≤ ⟨Tv, v⟩ = ⟨λv, v⟩ = λ⟨v, v⟩ = λ‖v‖²                      (29.25)

Since ‖v‖² > 0 we conclude that λ ≥ 0, as required.
((2) ⇒ (3)) By (2), T is self-adjoint and has non-negative eigenvalues.
We need to show that T has a positive square root.
If V is real, then by the real spectral theorem (theorem 27.2), T has an
orthonormal basis of eigenvectors.
If V is complex, then since every self-adjoint map is normal, T is
normal, and the complex spectral theorem (theorem 27.1) gives the same
result.
Let E = (e₁, . . . , eₙ) be an orthonormal basis of V consisting of
eigenvectors of T with corresponding eigenvalues λ₁, . . . , λₙ.
We are given that each λᵢ ≥ 0, so each has a real square root. Define
S ∈ L(V) by

    Seⱼ = √λⱼ eⱼ                                                 (29.26)

Any vector v ∈ V has an expansion v = Σᵢ₌₁ⁿ aᵢeᵢ for some
a₁, . . . , aₙ. Hence for any v ∈ V,

    ⟨Sv, v⟩ = ⟨ S Σᵢ₌₁ⁿ aᵢeᵢ, Σⱼ₌₁ⁿ aⱼeⱼ ⟩                        (29.27)
            = ⟨ Σᵢ₌₁ⁿ aᵢSeᵢ, Σⱼ₌₁ⁿ aⱼeⱼ ⟩       (linearity)       (29.28)
            = ⟨ Σᵢ₌₁ⁿ aᵢ√λᵢ eᵢ, Σⱼ₌₁ⁿ aⱼeⱼ ⟩    (eigenvalues)     (29.29)

Hence, by additivity and homogeneity in the first argument and conjugate
homogeneity in the second argument of the inner product,

    ⟨Sv, v⟩ = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ ⟨aᵢ√λᵢ eᵢ, aⱼeⱼ⟩                        (29.30)
            = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ aᵢ√λᵢ āⱼ ⟨eᵢ, eⱼ⟩                       (29.31)
            = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ aᵢ√λᵢ āⱼ δᵢⱼ                            (29.32)
            = Σᵢ₌₁ⁿ √λᵢ |aᵢ|² ≥ 0                                 (29.33)

because it is a sum of non-negative numbers. Since S is also
self-adjoint (its matrix in the basis E is real and diagonal), S is
positive.
Furthermore, since any v = a₁e₁ + ··· + aₙeₙ,

    S²v = S²(a₁e₁ + ··· + aₙeₙ)                                  (29.34)
        = S(a₁Se₁ + ··· + aₙSeₙ)                                 (29.35)
        = S(a₁√λ₁ e₁ + ··· + aₙ√λₙ eₙ)                           (29.36)
        = a₁√λ₁ Se₁ + ··· + aₙ√λₙ Seₙ                            (29.37)
        = a₁λ₁e₁ + ··· + aₙλₙeₙ                                  (29.38)
        = a₁Te₁ + ··· + aₙTeₙ       (because Teⱼ = λⱼeⱼ)         (29.39)
        = T(a₁e₁ + ··· + aₙeₙ)                                   (29.40)
        = Tv                                                     (29.41)

Hence S is a positive square root of T, which proves statement (3).
((3) ⇒ (4)) Given that T has a positive square root, we need to show
that T has a self-adjoint square root.
Since T has a positive square root S, that square root is self-adjoint
by the definition of a positive operator (the definition of a positive
operator includes the words "self-adjoint").
((4) ⇒ (5)) Given that T has a self-adjoint square root, we need to show
that there exists some S ∈ L(V) such that T = S*S.
Since T has a self-adjoint square root S, we know that S = S*. Thus
T = S² = SS = S*S, which is (5).
((5) ⇒ (1)) Given that there exists some operator S such that T = S*S,
we need to show that T is positive. But

    T* = (S*S)* = S*(S*)* = S*S = T                              (29.42)

Therefore T is self-adjoint. To show that T is positive, let v ∈ V:

    ⟨Tv, v⟩ = ⟨S*Sv, v⟩ = ⟨Sv, Sv⟩ = ‖Sv‖² ≥ 0                   (29.43)

Hence T is positive, as required. ∎
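The construction in the proof of (2) ⇒ (3) translates directly into a
computation: take square roots of the eigenvalues in an orthonormal
eigenbasis. A minimal sketch using NumPy (`numpy.linalg.eigh` returns
such a basis for a symmetric matrix; the matrix T from example 29.3 is
reused as test data):

```python
import numpy as np

T = np.array([[125.0, 75.0],
              [75.0, 125.0]])
w, V = np.linalg.eigh(T)         # eigenvalues (ascending) and eigenbasis
assert np.all(w >= 0)            # T is positive

# S e_j = sqrt(lambda_j) e_j in the eigenbasis, as in equation 29.26.
S = V @ np.diag(np.sqrt(w)) @ V.T

assert np.allclose(S @ S, T)               # S is a square root of T
assert np.allclose(S, S.T)                 # S is self-adjoint
assert np.all(np.linalg.eigvalsh(S) >= 0)  # S is positive
```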


Lemma 29.5 Let T be self-adjoint, and let λ₁, . . . , λₙ be the distinct
eigenvalues of T. Then

    V = null(T − λ₁I) ⊕ ··· ⊕ null(T − λₙI)                      (29.44)

and each subspace null(T − λᵢI) is orthogonal to all of the other
subspaces in the decomposition.
Proof. By the spectral theorems (theorems 27.1 and 27.2), V has an
orthonormal basis consisting of eigenvectors of T.
Equation 29.44 then follows from theorem 18.8, statement (5). ∎
Theorem 29.6 Let T be a positive operator on V. Then T has a unique
positive square root.
Proof. Let T ∈ L(V) have distinct eigenvalues λ₁, . . . , λₙ. By theorem
29.4 (2), all of these eigenvalues are non-negative.
By the same theorem, since T is positive, it has a positive square root
S. Since T is positive, it is self-adjoint. By lemma 29.5,

    V = null(T − λ₁I) ⊕ ··· ⊕ null(T − λₙI)                      (29.45)

Let α be any eigenvalue of S. If v ∈ null(S − αI), then Sv = αv. Hence

    Tv = S²v = α²v                                               (29.46)

Thus v ∈ null(T − α²I), so that α² is an eigenvalue of T with
eigenvector v. Thus for some j,

    α² = λⱼ ⟹ α = √λⱼ                                            (29.47)

(the negative root is excluded because S is positive). Furthermore, from
(29.46),

    null(S − √λⱼ I) ⊆ null(T − λⱼI)                              (29.48)

So the only possible eigenvalues of S are √λ₁, . . . , √λₙ. Because S is
self-adjoint, by lemma 29.5,

    V = null(S − √λ₁ I) ⊕ ··· ⊕ null(S − √λₙ I)                  (29.49)

Combining (29.45), (29.48), and (29.49) gives

    null(S − √λⱼ I) = null(T − λⱼI)                              (29.50)

Thus on each null(T − λⱼI), the operator S acts as multiplication by
√λⱼ. Thus S is uniquely determined by T. ∎
Theorem 29.7 A self-adjoint matrix is positive definite if and only if
all of its eigenvalues are positive.
Proof. (⇐) Let M be a self-adjoint matrix with only positive
eigenvalues. We have already observed (remark 25.6) that if M is
self-adjoint then the Schur decomposition gives

    D = U⁻¹MU                                                    (29.51)

where D is a diagonal matrix consisting of the eigenvalues of M and
U⁻¹ = U*.
Let u, v be vectors such that u = Uv, so that v = U⁻¹u = U*u. If we
denote the components of v by v₁, . . . , vₙ, then

    ⟨Mu, u⟩ = ⟨MUv, Uv⟩                                          (29.52)
            = ⟨U*MUv, v⟩                                         (29.53)
            = ⟨Dv, v⟩                                            (29.54)
            = ⟨(λ₁v₁, . . . , λₙvₙ), v⟩                          (29.55)
            = λ₁|v₁|² + ··· + λₙ|vₙ|² > 0                        (29.56)

unless v = 0, which only occurs if u = 0. Hence M is positive definite.
(⇒) Suppose that ⟨Mu, u⟩ > 0 unless u = 0. We need to show that all the
eigenvalues are positive.
Let (u₁, . . . , uₙ) be unit-length eigenvectors of M with eigenvalues
λᵢ. Then

    ⟨Muᵢ, uᵢ⟩ = ⟨λᵢuᵢ, uᵢ⟩ = λᵢ‖uᵢ‖² = λᵢ                        (29.57)

Since any vector v can be written as a linear combination of the uᵢ,

    v = a₁u₁ + ··· + aₙuₙ                                        (29.58)

we have

    ⟨Mv, v⟩ = ⟨ Σᵢ₌₁ⁿ M aᵢuᵢ, Σⱼ₌₁ⁿ aⱼuⱼ ⟩                        (29.59)
            = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ ⟨aᵢλᵢuᵢ, aⱼuⱼ⟩                          (29.60)
            = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ aᵢāⱼλᵢ⟨uᵢ, uⱼ⟩                          (29.61)
            = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ |aᵢ|²λᵢδᵢⱼ                              (29.62)
            = Σᵢ₌₁ⁿ |aᵢ|²λᵢ                                       (29.63)

which is greater than zero for all v ≠ 0 if and only if all eigenvalues
are positive. ∎
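Theorem 29.7 gives a practical test for positive definiteness: check the
signs of the eigenvalues rather than the quadratic form. A minimal
sketch using NumPy (the 2 × 2 matrix is an arbitrary example whose
eigenvalues are 1 and 3):

```python
import numpy as np

rng = np.random.default_rng(2)

M = np.array([[2.0, -1.0],
              [-1.0, 2.0]])    # self-adjoint; eigenvalues 1 and 3
assert np.all(np.linalg.eigvalsh(M) > 0)

# Spot-check the quadratic form v^T M v > 0 on random nonzero vectors.
for _ in range(100):
    v = rng.normal(size=2)
    assert v @ M @ v > 0
```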

Topic 30

Isometries

Definition 30.1 An operator S ∈ L(V) is called an isometry if it
preserves length, in the sense that

    ‖Sv‖ = ‖v‖                                                   (30.1)

for all v ∈ V.
Example 30.1 Suppose that S ∈ L(V) is such that

        S(eⱼ) = λⱼeⱼ        (30.2)

where E = (e₁, . . . , eₙ) is an orthonormal basis and |λⱼ| = 1 for all j =
1, . . . , n. Then S is an isometry. To see this, let v ∈ V; then since E is a
basis, by theorem 23.5 we have

        v = Σᵢ ⟨v, eᵢ⟩eᵢ   and   ‖v‖² = Σᵢ |⟨v, eᵢ⟩|²        (30.3)

Since the λⱼ are eigenvalues of S corresponding to the eigenvectors eⱼ,

        Sv = Σᵢ ⟨v, eᵢ⟩Seᵢ = Σᵢ λᵢ⟨v, eᵢ⟩eᵢ        (30.4)

Using the fact that |λᵢ| = 1 and that ⟨eᵢ, eⱼ⟩ = δᵢⱼ gives

        ‖Sv‖² = ⟨Σᵢ λᵢ⟨v, eᵢ⟩eᵢ, Σⱼ λⱼ⟨v, eⱼ⟩eⱼ⟩        (30.5)
              = Σᵢ Σⱼ λᵢλ̄ⱼ⟨v, eᵢ⟩⟨eⱼ, v⟩⟨eᵢ, eⱼ⟩        (30.6)
              = Σᵢ |λᵢ|²|⟨v, eᵢ⟩|² = ‖v‖²        (30.7)

where the last step follows from equation 30.3. Therefore S is an isometry.
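The computation in example 30.1 is easy to spot-check numerically. A minimal sketch, assuming numpy is available; the basis is the standard orthonormal basis of C⁴ and the unimodular eigenvalues λⱼ = e^(iθⱼ) are chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)

# Diagonal operator S e_j = lam_j e_j with |lam_j| = 1 in the standard
# orthonormal basis of C^4 (arbitrary unimodular phases).
phases = rng.uniform(0, 2 * np.pi, size=4)
lam = np.exp(1j * phases)        # |lam_j| = 1 for every j
S = np.diag(lam)

# A random complex vector: S should preserve its length.
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)
print(np.allclose(np.linalg.norm(S @ v), np.linalg.norm(v)))  # True
```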
Theorem 30.2 (Properties of Isometries). Let S ∈ L(V). Then the
following are equivalent:
1. S is an isometry.
2. ⟨Su, Sv⟩ = ⟨u, v⟩ for all u, v ∈ V.
3. S*S = I.
4. (Se₁, . . . , Seₙ) is orthonormal whenever (e₁, . . . , eₙ) is orthonormal.
5. There exists an orthonormal basis (e₁, . . . , eₙ) such that (Se₁, . . . , Seₙ)
is orthonormal.
6. S* is an isometry.
7. ⟨S*u, S*v⟩ = ⟨u, v⟩ for all u, v ∈ V.
8. SS* = I.
9. (S*e₁, . . . , S*eₙ) is orthonormal whenever (e₁, . . . , eₙ) is orthonormal.
10. There exists an orthonormal basis (e₁, . . . , eₙ) such that (S*e₁, . . . , S*eₙ)
is orthonormal.

Proof. We will show that (1) through (5) are equivalent. The proof that
conclusions (6) through (10) are equivalent is analogous. To see that (1)
through (5) are equivalent to (6) through (10), we compare items (3) and
(8), and observe that S*S = I if and only if SS* = I.
((1) ⇒ (2)) Assume that S is an isometry; then ‖Sx‖ = ‖x‖ for all x ∈ V.
Then for any u, v ∈ V,

        ⟨Su, Sv⟩ = ¼(‖Su + Sv‖² − ‖Su − Sv‖²)        (homework) (30.8)
                 = ¼(‖S(u + v)‖² − ‖S(u − v)‖²)        (30.9)
                 = ¼(‖u + v‖² − ‖u − v‖²)        (isometry) (30.10)
                 = ⟨u, v⟩        (homework) (30.11)
((2) ⇒ (3)) Suppose that ⟨Su, Sv⟩ = ⟨u, v⟩ for all u, v ∈ V. Then

        ⟨(S*S − I)u, v⟩ = ⟨S*Su, v⟩ − ⟨u, v⟩ = ⟨Su, Sv⟩ − ⟨u, v⟩ = 0        (30.12)

This is true for all v ∈ V, including v = (S*S − I)u. Thus

        ⟨(S*S − I)u, (S*S − I)u⟩ = 0        (30.13)

Hence

        (S*S − I)u = 0        (30.14)

for all u ∈ V. Thus

        S*S − I = 0   ⇒   S*S = I        (30.15)

((3) ⇒ (4)) Assume that S*S = I. Let E = (e₁, . . . , eₙ) be orthonormal.
Then

        ⟨Seⱼ, Seₖ⟩ = ⟨S*Seⱼ, eₖ⟩ = ⟨eⱼ, eₖ⟩ = δⱼₖ        (30.16)

Hence (Se₁, . . . , Seₙ) is an orthonormal list of vectors. Thus (3) ⇒ (4).


((4) ⇒ (5)) Assume that (Se₁, . . . , Seₙ) is orthonormal whenever (e₁, . . . , eₙ)
is orthonormal. Pick any orthonormal basis (e₁, . . . , eₙ). Then (Se₁, . . . , Seₙ)
is orthonormal. Thus (4) ⇒ (5).
((5) ⇒ (1)) Assume that there exists an orthonormal basis (e₁, . . . , eₙ)
such that (Se₁, . . . , Seₙ) is orthonormal.
Let v ∈ V. Then since ⟨Seᵢ, Seⱼ⟩ = δᵢⱼ,

        ‖Sv‖² = ‖S Σᵢ ⟨v, eᵢ⟩eᵢ‖²        (30.17)
              = ‖Σᵢ ⟨v, eᵢ⟩Seᵢ‖²        (30.18)
              = ⟨Σᵢ ⟨v, eᵢ⟩Seᵢ, Σⱼ ⟨v, eⱼ⟩Seⱼ⟩        (30.19)
              = Σᵢ Σⱼ ⟨v, eᵢ⟩⟨eⱼ, v⟩⟨Seᵢ, Seⱼ⟩        (30.20)
              = Σᵢ |⟨v, eᵢ⟩|² = ‖v‖²        (30.21)

where the last step comes from theorem 23.5. Thus S is an isometry and
(5) ⇒ (1). ∎
Remark 30.3 Every isometry is normal, since S*S = I = SS*.
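Several of the equivalences in theorem 30.2 can be spot-checked numerically. A quick sketch, assuming numpy; the Q factor of a QR decomposition of a random square matrix serves as a "random" isometry of R⁵:

```python
import numpy as np

rng = np.random.default_rng(1)

# The Q factor of a QR decomposition is orthogonal, i.e. an isometry of R^5.
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))

u = rng.standard_normal(5)
v = rng.standard_normal(5)

print(np.allclose(Q.T @ Q, np.eye(5)))                     # (3) S*S = I: True
print(np.allclose(Q @ Q.T, np.eye(5)))                     # (8) SS* = I: True
print(np.allclose(np.dot(Q @ u, Q @ v), np.dot(u, v)))     # (2) inner products preserved: True
print(np.allclose(np.linalg.norm(Q @ u), np.linalg.norm(u)))  # (1) lengths preserved: True
```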


Theorem 30.4 Let V be a complex non-zero inner-product space and let
S ∈ L(V). Then S is an isometry if and only if there is an orthonormal
basis of V consisting of eigenvectors of S whose eigenvalues satisfy |λᵢ| = 1 for all
i = 1, . . . , n.

Proof. (⇒) Suppose that S is an isometry. Then S is normal (remark 30.3),
so there exists an orthonormal basis (e₁, . . . , eₙ) consisting solely of
eigenvectors of S (Spectral Theorem, theorem 27.1). Let (λ₁, . . . , λₙ) be
the corresponding eigenvalues. Then

        |λⱼ| = ‖λⱼeⱼ‖ = ‖Seⱼ‖ = ‖eⱼ‖ = 1        (30.22)

(⇐) This was proven already as example 30.1. ∎


Theorem 30.5 Let V be a real inner-product space with S ∈ L(V). S is
an isometry if and only if there is an orthonormal basis of V in which S has
a block-diagonal matrix where each block is either

(a) a 1×1 matrix containing 1 or −1; or

(b) a 2×2 matrix of the form

        [ a  −b ]
        [ b   a ]

with a² + b² = 1 and b > 0.

Proof. (⇒) Let S be an isometry. Then S is normal (see remark 30.3).
By theorem 28.3 there is an orthonormal basis under which M(S) is block
diagonal with the blocks either 1×1, or 2×2 of the form above with b > 0.
We need to show that a² + b² = 1 for each 2×2 block, and that the 1×1
blocks are ±1.

Let λ be any 1×1 diagonal block. Then there is a basis vector eⱼ such that
λ is an eigenvalue:

        Seⱼ = λeⱼ        (30.23)

Since S is an isometry, ‖Seⱼ‖ = ‖eⱼ‖ = 1 (the last step follows because eⱼ
is a vector in an orthonormal basis, hence has length 1). Hence |λ| = 1;
since V is real, this means λ = ±1.
Now consider any 2×2 block

        Λ = [ a  −b ]
            [ b   a ]        (30.24)

There must be basis vectors eⱼ, eⱼ₊₁ corresponding to this block such that

        Seⱼ = aeⱼ + beⱼ₊₁        (30.25)

To see why this is true, let ξ = (x, y)ᵀ ∈ R². Then

        Λξ = [ a  −b ][ x ]  =  [ ax − by ]        (30.26)
             [ b   a ][ y ]     [ bx + ay ]
           = a(x, y)ᵀ + b(−y, x)ᵀ        (30.27)
           = aξ + bη        (30.28)

where ⟨ξ, η⟩ = 0 and ‖ξ‖ = ‖η‖. Then let us construct eⱼ and eⱼ₊₁ in Rⁿ
by replacing the remaining components with 0. The result Seⱼ = aeⱼ + beⱼ₊₁
follows.

Since S is an isometry,

        1 = ‖eⱼ‖² = ‖Seⱼ‖²        (30.29)
          = ⟨aeⱼ + beⱼ₊₁, aeⱼ + beⱼ₊₁⟩        (30.30)
          = a²⟨eⱼ, eⱼ⟩ + ab⟨eⱼ, eⱼ₊₁⟩ + ba⟨eⱼ₊₁, eⱼ⟩ + b²⟨eⱼ₊₁, eⱼ₊₁⟩        (30.31)
          = a² + b²        (30.32)

as required.
(⇐) Suppose that for some orthonormal basis the matrix has the desired
form. Then we can decompose V into subspaces such that

        V = U₁ ⊕ ⋯ ⊕ Uₘ        (30.33)

where each Uᵢ is a subspace of dimension 1 or 2, the subspaces are mutually orthogonal,
and, because of the form of the matrix, each S|Uⱼ is an isometry on Uⱼ.

To verify that each S|Uⱼ is an isometry on Uⱼ in dimension 2, let v ∈ Uⱼ,
v = (0, . . . , 0, x, y, 0, . . . , 0)ᵀ, where x, y sit in the positions of the block. Then
Sv affects only those two components:

        [ a  −b ][ x ]  =  a[ x ] + b[ −y ]  =  au + bu′        (30.34–30.37)
        [ b   a ][ y ]      [ y ]    [  x ]

where u = (x, y)ᵀ and u′ = (−y, x)ᵀ, so that ⟨u, u′⟩ = 0 and ‖u‖ = ‖u′‖ = ‖v‖.
Hence

        ‖Sv‖² = ⟨au + bu′, au + bu′⟩ = (a² + b²)‖u‖² = ‖u‖² = ‖v‖²        (30.38)

Therefore each S|Uⱼ is an isometry on Uⱼ.


Finally, to show that S is an isometry, let v ∈ V. Then we can write

        v = u₁ + ⋯ + uₘ        (30.39)

where uⱼ ∈ Uⱼ. Since each S|Uⱼ is an isometry on Uⱼ,

        ‖Sv‖² = ‖Su₁ + ⋯ + Suₘ‖²        (30.40)
              = ⟨Σᵢ Suᵢ, Σⱼ Suⱼ⟩        (30.41)
              = Σᵢ,ⱼ ⟨Suᵢ, Suⱼ⟩ = Σᵢ,ⱼ ⟨S*Suᵢ, uⱼ⟩ = Σᵢ,ⱼ ⟨uᵢ, uⱼ⟩        (30.42)
              = Σᵢ ⟨uᵢ, uᵢ⟩        (because the Uⱼ are orthogonal)        (30.43)
              = Σᵢ ‖uᵢ‖² = ‖v‖²        (30.44)

which proves that S is an isometry. ∎


Remark 30.6 The 2×2 blocks have the form

        [ cos θ  −sin θ ]
        [ sin θ   cos θ ]        (30.45)

which is a rotation by an angle θ in R²; thus every rotation in Rⁿ is composed of a sequence of rotations in the coordinate planes.

Corollary 30.7 Let V = Rⁿ where n is odd, and let S be an isometry on
V. Then S has either 1 or −1 (or possibly both) as an eigenvalue.
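Remark 30.6 and corollary 30.7 can be illustrated with a small numerical example, assuming numpy; the 3×3 matrix below consists of one 2×2 rotation block and one 1×1 block equal to +1:

```python
import numpy as np

theta = 0.7
# Block-diagonal isometry of R^3: one rotation block and one 1x1 block (+1).
S = np.eye(3)
S[0, 0], S[0, 1] = np.cos(theta), -np.sin(theta)
S[1, 0], S[1, 1] = np.sin(theta), np.cos(theta)

# S preserves length ...
v = np.array([1.0, 2.0, 3.0])
print(np.allclose(np.linalg.norm(S @ v), np.linalg.norm(v)))  # True

# ... and, as corollary 30.7 predicts for odd n, has 1 (or -1) as an eigenvalue.
eigs = np.linalg.eigvals(S)
print(np.any(np.isclose(eigs, 1.0)))                          # True
```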


Topic 31

Singular Value Decomposition
Note for Next Year
Move the material from chapter 7 to the end of this chapter and integrate
more thoroughly.
Theorem 31.1 (Polar Decomposition). Let T ∈ L(V). Then there exists
an isometry S ∈ L(V) such that

        T = S√(T*T)        (31.1)

Lemma 31.2 Let U be a subspace of V. Then

        dim V = dim U + dim U⊥        (31.2)

Proof. This follows from V = U ⊕ U⊥. ∎

Proof (of theorem 31.1). Let v ∈ V. Then

        ‖Tv‖² = ⟨Tv, Tv⟩ = ⟨T*Tv, v⟩ ≥ 0        (31.3)

hence T*T is a positive operator and has a positive square root √(T*T). We can continue:

        ‖Tv‖² = ⟨√(T*T)√(T*T)v, v⟩        (31.4)
              = ⟨√(T*T)v, √(T*T)v⟩        (31.5)
              = ‖√(T*T)v‖²        (31.6)

Now define

        S′ : range √(T*T) → range T        (31.7)

by

        S′(√(T*T)v) = Tv        (31.8)

To prove that we can extend S′ to an isometry S ∈ L(V), first check that S′ is
well defined: let v₁, v₂ ∈ V be such that

        √(T*T)v₁ = √(T*T)v₂        (31.9)

Then

        ‖Tv₁ − Tv₂‖ = ‖T(v₁ − v₂)‖        (31.10)
                    = ‖√(T*T)(v₁ − v₂)‖        (by equation 31.6)        (31.11)
                    = ‖√(T*T)v₁ − √(T*T)v₂‖        (31.12)
                    = 0        (31.13)

Hence

        Tv₁ = Tv₂        (31.14)

Thus S′ is well defined; reading the same computation in the other direction shows S′ is injective.

We must also show that S′ is linear. This is left as an exercise.

Let u ∈ range(√(T*T)). Then for some v, we have u = √(T*T)v, hence

        ‖S′u‖ = ‖S′(√(T*T)v)‖        (31.15)
              = ‖Tv‖        (equation 31.8)        (31.16)
              = ‖√(T*T)v‖        (equation 31.6)        (31.17)
              = ‖u‖        (31.18)

Therefore S′ is an isometry.
Since S′ is injective we have by theorem 14.19 that

        dim range √(T*T) = dim null(S′) + dim range(S′)        (31.19)
                         = dim range(S′)        (31.20)
                         = dim range(T)        (31.21)

where the last step follows from equation 31.8 (S′ maps onto range T).



From the lemma,

        dim range(T) + dim (range T)⊥ = dim V        (31.22)
          = dim range √(T*T) + dim (range √(T*T))⊥        (31.23)

hence by equation 31.21,

        dim (range √(T*T))⊥ = dim (range T)⊥        (31.24)

Choose orthonormal bases E = (e₁, . . . , eₘ) of (range √(T*T))⊥ and F =
(f₁, . . . , fₘ) of (range T)⊥; these bases have the same length by equation
31.24.
Define

        S″ : (range √(T*T))⊥ → (range T)⊥        (31.25)

by

        S″(a₁e₁ + ⋯ + aₘeₘ) = a₁f₁ + ⋯ + aₘfₘ        (31.26)

Then for all w = Σᵢ aᵢeᵢ ∈ (range √(T*T))⊥,

        ‖S″w‖² = ‖S″ Σᵢ aᵢeᵢ‖²        (31.27)
               = ⟨Σᵢ aᵢfᵢ, Σⱼ aⱼfⱼ⟩        (31.28)
               = Σᵢ,ⱼ aᵢāⱼ⟨fᵢ, fⱼ⟩        (31.29)
               = Σᵢ |aᵢ|² = ‖w‖²        (31.30)

hence S″ is an isometry.
Define S as the operator

        Sv = S′v   if v ∈ range √(T*T)
        Sv = S″v   if v ∈ (range √(T*T))⊥        (31.31)

extended to all of V by linearity. Let v ∈ V be any vector in V. Then there
exist u ∈ range √(T*T) and w ∈ (range √(T*T))⊥ such that

        v = u + w,   so that   Sv = S′u + S″w        (31.32)

But by definition of S′,

        S(√(T*T)v) = S′(√(T*T)v) = Tv        (31.33)

whence T = S√(T*T), which is the desired formula (equation 31.1). To
prove that S is an isometry, from equation 31.32 (noting that S′u ∈ range T
and S″w ∈ (range T)⊥ are orthogonal),

        ‖Sv‖² = ‖S′u + S″w‖²        (31.34)
              = ‖S′u‖² + ‖S″w‖²        (Pythagorean theorem)        (31.35)
              = ‖u‖² + ‖w‖²        (S′ and S″ are isometries)        (31.36)
              = ‖v‖²        (Pythagorean theorem)        (31.37)

Hence S is an isometry. ∎
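The constructive steps of the proof can be sketched numerically. A minimal sketch, assuming numpy and an invertible T (so that S = TM⁻¹ exists; a non-invertible T would need the extension S″ built in the proof):

```python
import numpy as np

def polar(T):
    """Return (S, M) with T = S @ M, M = sqrt(T*T) positive, S an isometry.

    Sketch for invertible square T, mirroring the proof: build M from an
    eigendecomposition of the self-adjoint positive operator T*T, then
    set S = T M^{-1}.
    """
    TtT = T.conj().T @ T
    w, V = np.linalg.eigh(TtT)                   # T*T is self-adjoint, w >= 0
    M = V @ np.diag(np.sqrt(w)) @ V.conj().T     # the positive square root
    S = T @ np.linalg.inv(M)                     # valid only for invertible T
    return S, M

T = np.array([[11.0, 5.0], [2.0, 10.0]])
S, M = polar(T)
print(np.allclose(S @ M, T))            # True: T = S sqrt(T*T)
print(np.allclose(S.T @ S, np.eye(2)))  # True: S is an isometry
```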
Example 31.1 Find a polar decomposition of

        T = [ 11   5 ]
            [  2  10 ]        (31.38)

The polar decomposition is T = S√(T*T) for some isometry S, which we
must find. Since T is real,

        T* = Tᵀ = [ 11   2 ]
                  [  5  10 ]        (31.39)

Hence

        T*T = [ 11   2 ][ 11   5 ]  =  [ 125   75 ]
              [  5  10 ][  2  10 ]     [  75  125 ]        (31.40)

From example 29.3 we found four square roots M of T*T:

        ± (1/√2)[ 15   5 ],   ± (1/√2)[  5  15 ]
                [  5  15 ]            [ 15   5 ]        (31.41)

Let's look at the first solution:

        M = (1/√2)[ 15   5 ]
                  [  5  15 ]        (31.42)

We can verify that this works, since

        M² = (1/√2)[ 15   5 ] (1/√2)[ 15   5 ]        (31.43)
                   [  5  15 ]       [  5  15 ]
           = ½[ 250  150 ]  =  [ 125   75 ]  =  T*T        (31.44)
              [ 150  250 ]     [  75  125 ]

It only remains to find S. But by the polar decomposition theorem,

        T = S√(T*T) = SM   ⇒   S = TM⁻¹        (31.45)

We can calculate that

        M⁻¹ = (1/(20√2))[  3  −1 ]
                        [ −1   3 ]        (31.46)

Hence

        S = TM⁻¹ = (1/(20√2))[ 11   5 ][  3  −1 ]  =  (1/(5√2))[  7  1 ]
                             [  2  10 ][ −1   3 ]              [ −1  7 ]        (31.47)

To verify that S is an isometry, we calculate

        SᵀS = (1/(5√2))[ 7  −1 ] (1/(5√2))[  7  1 ]  =  (1/50)[ 50   0 ]  =  I        (31.48)
                       [ 1   7 ]          [ −1  7 ]           [  0  50 ]

Hence S is an isometry. Thus a polar decomposition of T is given by

        T = S√(T*T) = (1/(5√2))[  7  1 ] · (1/√2)[ 15   5 ]        (31.49)
                               [ −1  7 ]         [  5  15 ]

where the first factor is S and the second is √(T*T).
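The matrices found in this example are easy to check numerically, assuming numpy:

```python
import numpy as np

T = np.array([[11.0, 5.0], [2.0, 10.0]])
M = np.array([[15.0, 5.0], [5.0, 15.0]]) / np.sqrt(2)
S = np.array([[7.0, 1.0], [-1.0, 7.0]]) / (5 * np.sqrt(2))

print(np.allclose(M @ M, T.T @ T))      # True: M is a square root of T*T
print(np.allclose(S.T @ S, np.eye(2)))  # True: S is an isometry
print(np.allclose(S @ M, T))            # True: T = S M, a polar decomposition
```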

Definition 31.3 (Singular Value). Let T ∈ L(V) and define S = √(T*T).
The eigenvalues of S are called the singular values of T.

Lemma 31.4 Let λ be an eigenvalue of √M. Then λ² is an eigenvalue of
M. The converse also holds, in the sense that if λ² is an eigenvalue of M
then at least one of the square roots ±λ is also an eigenvalue of √M.

Proof. (⇒) Let λ be an eigenvalue of √M with eigenvector v. Then

        √M v = λv   ⇒   √M √M v = λ√M v = λ²v        (31.50)
                    ⇒   Mv = λ²v        (31.51)

(⇐) Let λ² be an eigenvalue of M with eigenvector v. Then

        0 = (M − λ²I)v = (√M − λI)(√M + λI)v        (31.52)

Hence either λ or −λ is an eigenvalue of √M. ∎

Corollary 31.5 The singular values of T are the square roots of the eigenvalues
of T*T.

Example 31.2 Find the singular values of

        T = [ 11   5 ]
            [  2  10 ]        (31.53)

The singular values of T are the square roots of the eigenvalues of M, where
M = T*T. But from the previous example

        T*T = [ 125   75 ]
              [  75  125 ]        (31.54)

The characteristic equation is

        0 = (125 − λ)² − 75² = λ² − 250λ + 10000        (31.55)
          = (λ − 200)(λ − 50)        (31.56)

Since λ = 200, 50, the singular values of T are √200 = 10√2 and √50 = 5√2.
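The same values fall out of a library SVD routine, assuming numpy (np.linalg.svd returns the singular values in descending order):

```python
import numpy as np

T = np.array([[11.0, 5.0], [2.0, 10.0]])
s = np.linalg.svd(T, compute_uv=False)   # singular values, descending
print(np.allclose(s, [10 * np.sqrt(2), 5 * np.sqrt(2)]))  # True
```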

Theorem 31.6 (Singular Value Decomposition). Let T ∈ L(V) with
singular values s₁, . . . , sₙ. Then there exist orthonormal bases E = (e₁, . . . , eₙ)
and F = (f₁, . . . , fₙ) of V such that

        Tv = s₁⟨v, e₁⟩f₁ + ⋯ + sₙ⟨v, eₙ⟩fₙ        (31.57)

for every v ∈ V.

Proof. Let M = √(T*T). We are given that the sᵢ are singular values of T;
hence they are eigenvalues of M. By the spectral theorem (theorem 27.1;
you should verify that M is normal), there is an orthonormal basis E =
(e₁, . . . , eₙ) of V consisting solely of eigenvectors of M,

        Meⱼ = sⱼeⱼ        (31.58)

Hence every v ∈ V can be expanded in this basis,

        v = ⟨v, e₁⟩e₁ + ⋯ + ⟨v, eₙ⟩eₙ        (31.59)

Multiplying by M,

        Mv = ⟨v, e₁⟩Me₁ + ⋯ + ⟨v, eₙ⟩Meₙ        (31.60)
           = s₁⟨v, e₁⟩e₁ + ⋯ + sₙ⟨v, eₙ⟩eₙ        (31.61)

By the polar decomposition theorem, there is an isometry S such that

        T = S√(T*T) = SM        (31.62)

Multiplying equation 31.61 by S,

        Tv = SMv = s₁⟨v, e₁⟩Se₁ + ⋯ + sₙ⟨v, eₙ⟩Seₙ        (31.63)
                 = s₁⟨v, e₁⟩f₁ + ⋯ + sₙ⟨v, eₙ⟩fₙ        (31.64)

where

        fⱼ = Seⱼ,   j = 1, . . . , n        (31.65)

By theorem 30.2, conclusion (4), (f₁, . . . , fₙ) is orthonormal because (e₁, . . . , eₙ)
is orthonormal and S is an isometry, as required. ∎
Let us consider what the singular value decomposition means in terms of
matrices. We will assume the usual Euclidean inner product

        ⟨p, q⟩ = q*p = Σᵢ q̄ᵢpᵢ        (31.66)

Let (e₁, . . . , eₙ) and (f₁, . . . , fₙ) be orthonormal bases. Let E and F be the
matrices whose columns are given by the eᵢ and fᵢ, so that

        F = [ f₁ ⋯ fₙ ]        (31.67)
        E = [ e₁ ⋯ eₙ ]        (31.68)

Define the matrix A = FSE*, where S is the diagonal matrix of singular
values sᵢ. The ith row of E* is eᵢ*, and eᵢ*v = ⟨v, eᵢ⟩. Then for any vector v ∈ V,

        Av = F diag(s₁, . . . , sₙ) E* v        (31.69)
           = F diag(s₁, . . . , sₙ) (⟨v, e₁⟩, . . . , ⟨v, eₙ⟩)ᵀ        (31.70)
           = F (s₁⟨v, e₁⟩, . . . , sₙ⟨v, eₙ⟩)ᵀ        (31.71)
           = s₁⟨v, e₁⟩f₁ + ⋯ + sₙ⟨v, eₙ⟩fₙ        (31.72)

which is exactly the same as (31.57). This proves the following theorem.
Theorem 31.7 (Singular Value Decomposition for Matrices). Let A
be any matrix over R or C. Then there exist unitary matrices E and F
such that

        A = FSE*        (31.73)

where S is a diagonal matrix of singular values of A. In particular, the matrices

        F = [ f₁ ⋯ fₙ ]        (31.74)
        E = [ e₁ ⋯ eₙ ]        (31.75)

form a singular value decomposition (equation 31.73) when the eⱼ are
normalized eigenvectors of M = √(A*A) and fⱼ = Seⱼ, where A =
SM = S√(A*A) is the polar decomposition of A, so that S is an isometry.

Example 31.3 Find a singular value decomposition of

        T = [ 11   5 ]
            [  2  10 ]        (31.76)

From example 31.2, the singular values of T are s₁ = 10√2 and s₂ = 5√2,
and from example 31.1 the polar decomposition of T is T = SM where

        M = √(T*T) = (1/√2)[ 15   5 ]
                           [  5  15 ]        (31.77)

and

        S = (1/(5√2))[  7  1 ]
                     [ −1  7 ]        (31.78)

From the proof of the singular value decomposition theorem we define the eᵢ
as orthonormal eigenvectors of M = √(T*T). Orthonormal eigenvectors of M are

        e₁ = (1/√2)(1, 1)ᵀ   and   e₂ = (1/√2)(−1, 1)ᵀ

We also find the orthonormal basis fⱼ from fⱼ = Seⱼ as

        f₁ = Se₁ = (1/(5√2))[  7  1 ] (1/√2)[ 1 ]  =  [ 4/5 ]        (31.79)
                            [ −1  7 ]       [ 1 ]     [ 3/5 ]

        f₂ = Se₂ = (1/(5√2))[  7  1 ] (1/√2)[ −1 ]  =  [ −3/5 ]        (31.80)
                            [ −1  7 ]       [  1 ]     [  4/5 ]

Hence the singular value decomposition is

        T = FSE* = [ 4/5  −3/5 ][ 10√2    0  ][  1/√2  1/√2 ]        (31.81)
                   [ 3/5   4/5 ][   0   5√2 ][ −1/√2  1/√2 ]

Topic 32

Generalized Eigenvectors
Motivation. Let V be a vector space over F and let T be an operator on
V. Then we would like to describe T by finding subspaces of V in which

        V = U₁ ⊕ ⋯ ⊕ Uₙ        (32.1)

where each Uⱼ is invariant under T. This is possible if and only if V has
a basis consisting only of eigenvectors (see theorem 18.8). By the same
theorem, this is true if and only if

        V = null(T − λ₁I) ⊕ ⋯ ⊕ null(T − λₙI)        (32.2)

where λ₁, . . . , λₙ are distinct eigenvalues.

By the corollary to the spectral theorem (corollary 27.3) we know that this
is possible whenever T is self-adjoint. The goal is to find a way to generalize
this to all operators, not just the ones that are self-adjoint. We do this with
the concept of generalized eigenvectors.

Definition 32.1 Let T ∈ L(V) be an operator with eigenvalue λ. A vector
v ∈ V is called a generalized eigenvector of T with eigenvalue λ if

        (T − λI)ʲ v = 0        (32.3)

for some positive integer j.


Remark 32.2 Every eigenvector is a generalized eigenvector (with j = 1).
Example 32.1 Find the generalized eigenvectors of

        M = [ 1  1  1 ]
            [ 0  2  1 ]
            [ 1  0  2 ]        (32.4)

The characteristic equation is

        p(x) = 3 − 7x + 5x² − x³ = −(x − 1)(x − 1)(x − 3)        (32.5)

so the eigenvalues are 1 (with multiplicity 2) and 3. An eigenvector of 3 satisfies

        [ 1  1  1 ][ a ]   [ 3a ]
        [ 0  2  1 ][ b ] = [ 3b ]
        [ 1  0  2 ][ c ]   [ 3c ]        (32.6)

Hence

        a + b + c = 3a        (32.7)
        2b + c = 3b   ⇒   b = c        (32.8)
        a + 2c = 3c   ⇒   a = c        (32.9)

Hence an eigenvector corresponding to the eigenvalue 3 is (1, 1, 1)ᵀ. This
is also a generalized eigenvector, because all eigenvectors are also generalized
eigenvectors.

We need to find two generalized eigenvectors corresponding to the eigenvalue
1 because it has multiplicity 2. One of them is the eigenvector corresponding
to the eigenvalue 1, which satisfies

        a + b + c = a   ⇒   b = −c        (32.10)
        2b + c = b   ⇒   c = −b        (32.11)
        a + 2c = c   ⇒   a = −c        (32.12)

Hence an eigenvector corresponding to the eigenvalue 1 is (1, 1, −1)ᵀ. This
gives our second generalized eigenvector.

The third generalized eigenvector satisfies

        (M − I)² v = 0        (32.13)

for λ = 1. Hence we need to find a vector in the null space of (M − I)².
But

        (M − I)² = [ 0  1  1 ]²    [ 1  1  2 ]
                   [ 0  1  1 ]  =  [ 1  1  2 ]
                   [ 1  0  1 ]     [ 1  1  2 ]        (32.14)

so a vector in the null space satisfies

        a + b + 2c = 0        (32.15)

If we choose a = b = 1 then c = −1 and we have (1, 1, −1)ᵀ, which is a
multiple of the second eigenvector. If we choose a = 2 and b = 0 then
c = −1, which gives (2, 0, −1)ᵀ, which is not a multiple of either of the other
eigenvectors. There are other possible solutions.

Summarizing, a set of generalized eigenvectors is

        [ 1 ]   [  1 ]   [  2 ]
        [ 1 ] , [  1 ] , [  0 ]
        [ 1 ]   [ −1 ]   [ −1 ]        (32.16)
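The generalized eigenvectors found above can be verified numerically, assuming numpy:

```python
import numpy as np

M = np.array([[1.0, 1.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 0.0, 2.0]])
I = np.eye(3)

v1 = np.array([1.0, 1.0, 1.0])    # eigenvector, eigenvalue 3
v2 = np.array([1.0, 1.0, -1.0])   # eigenvector, eigenvalue 1
v3 = np.array([2.0, 0.0, -1.0])   # generalized eigenvector, eigenvalue 1

print(np.allclose(M @ v1, 3 * v1))                            # True
print(np.allclose(M @ v2, v2))                                # True
print(not np.allclose((M - I) @ v3, 0))                       # True: not an eigenvector
print(np.allclose(np.linalg.matrix_power(M - I, 2) @ v3, 0))  # True: (M - I)^2 v3 = 0
```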

Theorem 32.3 Let T ∈ L(V) be an operator on V and let k ≥ 0 be an
integer. Then

        {0} = null(T⁰) ⊆ null(T¹) ⊆ ⋯ ⊆ null(Tᵏ) ⊆ null(T^(k+1)) ⊆ ⋯        (32.17)

Proof. Let v ∈ null(Tᵏ). Then Tᵏv = 0, so that

        T^(k+1)v = T(Tᵏv) = T0 = 0        (32.18)

Hence null(Tᵏ) ⊆ null(T^(k+1)). Since this holds for all k, the result follows. ∎
Theorem 32.4 If for some m ≥ 0, null(Tᵐ) = null(T^(m+1)), then

        null(T⁰) ⊆ null(T¹) ⊆ ⋯ ⊆ null(Tᵐ) = null(T^(m+1)) = ⋯        (32.19)

In other words, once two successive nullspaces in equation 32.17 are equal, all
successive nullspaces are equal.

Proof. Suppose that for some m ≥ 0,

        null(Tᵐ) = null(T^(m+1))        (32.20)

Let k ≥ 0. We know by equation 32.17 that

        null(T^(m+k)) ⊆ null(T^(m+k+1))        (32.21)

Let v ∈ null(T^(m+k+1)). Then

        0 = T^(m+k+1)v = T^(m+1)(Tᵏv)        (32.22)

Hence

        Tᵏv ∈ null(T^(m+1)) = null(Tᵐ)        (32.23)

where the second equality follows from our initial assumption (equation
32.20). Hence, since

        T^(m+k)v = Tᵐ(Tᵏv) = Tᵐ0 = 0        (32.24)

we have v ∈ null(T^(m+k)). As a consequence of this,

        null(T^(m+k+1)) ⊆ null(T^(m+k))        (32.25)

and comparison with equation 32.21 gives equality of the two nullspaces. ∎


Theorem 32.5 Let T ∈ L(V) be an operator on V. Then equality holds
in equation 32.17 for k ≥ dim V:

        null(T^(dim V)) = null(T^(dim V+1)) = null(T^(dim V+2)) = ⋯        (32.26)

Proof. Let m = dim V and suppose that null(T^(dim V)) ≠ null(T^(dim V+1))
(proof by contradiction). Then by theorem 32.4,

        {0} = null(T⁰) ⊊ null(T¹) ⊊ ⋯ ⊊ null(Tᵐ) ⊊ null(T^(m+1))        (32.27)

where the inclusions are strict. Since the inclusions are strict, the
dimension of each set must increase by at least 1. Hence

        0 = dim null(T⁰) < dim null(T¹) < ⋯ < dim null(Tᵐ) < dim null(T^(m+1))        (32.28)

so that

        dim null(T^(dim V+1)) = dim null(T^(m+1)) ≥ m + 1 = dim V + 1        (32.29)

But null(T^(dim V+1)) is a subspace of V and cannot have dimension larger than
dim V. Hence this is a contradiction, our assumption is false, and the
theorem follows. ∎
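Theorems 32.3–32.5 predict that the nullspaces grow with the power and freeze once the power reaches dim V. A small numerical illustration, assuming numpy, using a nilpotent 4×4 matrix (a single Jordan block with eigenvalue 0):

```python
import numpy as np

# Nilpotent 4x4 matrix: ones on the superdiagonal, zeros elsewhere.
# dim null(N^k) grows by one per power and freezes at k = dim V = 4.
N = np.diag([1.0, 1.0, 1.0], k=1)

dim_null = [4 - np.linalg.matrix_rank(np.linalg.matrix_power(N, k))
            for k in range(6)]
print(dim_null)   # [0, 1, 2, 3, 4, 4]
```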
Theorem 32.6 Let λ be an eigenvalue of T ∈ L(V). The set of generalized
eigenvectors of T corresponding to the eigenvalue λ equals null((T − λI)^(dim V)).

Proof. Let v ∈ null((T − λI)^(dim V)). Then by definition of generalized
eigenvectors, v is a generalized eigenvector with eigenvalue λ. This proves that

        null((T − λI)^(dim V)) ⊆ the set of generalized eigenvectors for λ        (32.30)

Now assume that v is a generalized eigenvector with eigenvalue λ. Then
there exists some positive integer j such that

        (T − λI)ʲ v = 0   ⇒   v ∈ null((T − λI)ʲ)        (32.31)

Let S = T − λI. Then by theorems 32.3 and 32.5,

        v ∈ null(Sʲ) ⊆ null(S^(dim V))        (32.32)

Hence

        v ∈ null((T − λI)^(dim V))        (32.33)

Thus

        the set of generalized eigenvectors for λ ⊆ null((T − λI)^(dim V))        (32.34)

Hence the set of generalized eigenvectors for λ is equal to null((T − λI)^(dim V)). ∎


Definition 32.7 An operator T is called nilpotent if Tᵏ = 0 for some k > 0.

Corollary 32.8 Let N ∈ L(V) be nilpotent. Then N^(dim V) = 0.

Proof. Let v ∈ V. Then since N is nilpotent there exists some j such that
Nʲv = 0. Hence (N − 0I)ʲv = 0, which makes v a generalized eigenvector
with eigenvalue 0, i.e., every vector in V is a generalized eigenvector of N
with eigenvalue zero.

By theorem 32.6 the set of generalized eigenvectors of N with eigenvalue
zero is null(N^(dim V)), i.e.,

        V = null(N^(dim V))        (32.35)

hence for every v ∈ V we have N^(dim V)v = 0. Thus N^(dim V) = 0. ∎

Theorem 32.9 Let T ∈ L(V). Then

        V ⊇ range(T) ⊇ range(T²) ⊇ ⋯ ⊇ range(Tᵏ) ⊇ range(T^(k+1)) ⊇ ⋯        (32.36)

Proof. Let w ∈ range(T^(k+1)). Then for some v ∈ V,

        w = T^(k+1)v = Tᵏ(Tv) ∈ range(Tᵏ)        (32.37)

Hence range(T^(k+1)) ⊆ range(Tᵏ). ∎

The sequence stabilizes when k = dim V, as the following shows.
Theorem 32.10 Let V be a vector space and T ∈ L(V) an operator on
V. Then

        range(T^(dim V)) = range(T^(dim V+1)) = range(T^(dim V+2)) = ⋯        (32.38)

Proof. (exercise)

Topic 33

The Characteristic Polynomial
Definition 33.1 The multiplicity of an eigenvalue λ is the dimension
of the subspace of generalized eigenvectors corresponding to λ, i.e.,

        multiplicity(λ) = dim null((T − λI)^(dim V))        (33.1)

The following theorem shows that if the matrix of T is upper triangular,
the multiplicity is equal to the number of times that λ occurs on the main
diagonal.

Theorem 33.2 Let V be a vector space over F; let T ∈ L(V) be an operator
on V; and let λ ∈ F. If B = (v₁, . . . , vₙ) is a basis of V such that
M = M(T, B) is upper triangular, then λ appears on the diagonal of M

        dim null((T − λI)^(dim V))        (33.2)

times.
Proof. Consider first the case with λ = 0. To prove the general case, replace
T with T′ = T − λI in what follows.

Define n = dim V. For n = 1 the result holds because M is 1×1.
Let n > 1 and assume the result holds for n − 1.

Suppose that (with respect to B) M is upper triangular; define λᵢ as the
diagonal elements, so that

        M = [ λ₁  *       *  ]
            [     ⋱          ]
            [ 0       λₙ₋₁   ]
            [ 0           λₙ ]        (33.3)

Let U = span(v₁, . . . , vₙ₋₁). By theorem 18.2, U is invariant under T.
Furthermore,

        M′ = M(T|U, (v₁, . . . , vₙ₋₁)) = [ λ₁  *       ]
                                         [     ⋱       ]
                                         [ 0      λₙ₋₁ ]        (33.4)

By the inductive hypothesis, 0 appears on the diagonal of M′

        dim null((T|U)^(dim U)) = dim null((T|U)^(n−1))        (33.5)

times, because dim U = n − 1. Furthermore, by theorem 32.5 we have

        null((T|U)^(n−1)) = null((T|U)ⁿ)        (33.6)

Hence (combining the last two equations), the number of zeros on the diagonal
of M′ is

        dim null((T|U)ⁿ)        (33.7)

We consider two cases: λₙ ≠ 0 and λₙ = 0.
Case 1: λₙ ≠ 0. By equation 33.3,

        M(Tⁿ) = M(T)ⁿ = Mⁿ = [ λ₁ⁿ  *      ]
                             [      ⋱      ]
                             [ 0       λₙⁿ ]        (33.8)

Since vₙ is the nth basis vector,

        Tⁿvₙ = u + λₙⁿvₙ        (33.9)

for some u ∈ U.

[To see this: in the basis B, vₙ has coordinates (0, . . . , 0, 1)ᵀ, so Mⁿvₙ is the
last column of Mⁿ. Its first n − 1 components give a vector u ∈ U =
span(v₁, . . . , vₙ₋₁); the last term is λₙⁿvₙ.]


Now let v ∈ null(Tⁿ). Then (since B is a basis),

        v = u′ + avₙ        (33.11)

where u′ ∈ U and a ∈ F. Since v ∈ null(Tⁿ),

        0 = Tⁿv = Tⁿu′ + aTⁿvₙ = (Tⁿu′ + au) + aλₙⁿvₙ        (33.12)

where we have used equation 33.9 in the last step. Since U is invariant
under T, and hence under Tⁿ, the first two terms are in U, while the last
term is in span(vₙ), which meets U only in 0. Since the sum is 0, each part must
be zero. Hence

        aλₙⁿ = 0        (33.13)

Since we have assumed (case 1) that λₙ ≠ 0, this means a = 0. Hence
v = u′ + avₙ = u′ ∈ U. But we chose v as any element of null(Tⁿ). Hence

        null(Tⁿ) ⊆ U        (33.14)

and therefore

        null(Tⁿ) = null((T|U)ⁿ)        (33.15)

(otherwise there would be some element of null(Tⁿ) that was not in U).
Hence by equation 33.7 the number of zeros on the diagonal of M′ is

        dim null((T|U)ⁿ) = dim null(Tⁿ)        (33.16)

which is the objective we want to prove (eq. 33.2).


Case 2: λₙ = 0. By theorem 13.12,

        dim(U + null(Tⁿ)) = dim U + dim null(Tⁿ) − dim(U ∩ null(Tⁿ))        (33.17)

Since

        dim U = n − 1        (33.18)

and

        dim(U ∩ null(Tⁿ)) = dim null((T|U)ⁿ)        (33.19)

equation 33.17 gives us

        dim null(Tⁿ) = dim(U + null(Tⁿ)) + dim(U ∩ null(Tⁿ)) − dim U        (33.20)
                     = dim null((T|U)ⁿ) + dim(U + null(Tⁿ)) − (n − 1)        (33.21)

Consider any vector of the form

        w = u − vₙ        (33.22)

where u ∈ U. Then w ∉ U. We want to choose u such that w ∈ null(Tⁿ).
This requires

        Tⁿw = Tⁿ(u − vₙ) = Tⁿu − Tⁿvₙ = 0        (33.23)
          ⇒   Tⁿu = Tⁿvₙ        (33.24)

so such a u exists provided

        Tⁿvₙ ∈ range((T|U)ⁿ)        (33.25)

But because M is upper triangular (see equation 33.9, for example),

        Tvₙ = u′ + λₙvₙ = u′        (33.26)

where u′ ∈ U; the second equality follows because we are assuming λₙ = 0.
Hence

        Tvₙ ∈ U        (33.27)

and consequently

        Tⁿvₙ = T^(n−1)(Tvₙ) = T^(n−1)u′ ∈ range((T|U)^(n−1)) = range((T|U)ⁿ)        (33.28)

where the last step comes from theorem 32.10 (dim U = n − 1).
By choosing u in this manner we have w ∉ U but w ∈ null(Tⁿ). Hence

        n = dim V ≥ dim(U + null(Tⁿ)) > dim U = n − 1        (33.29)
          ⇒   dim(U + null(Tⁿ)) = n        (33.30)

Substituting equation 33.30 into equation 33.21,

        dim null(Tⁿ) = dim null((T|U)ⁿ) + dim(U + null(Tⁿ)) − (n − 1)        (33.31)
                     = dim null((T|U)ⁿ) + n − (n − 1)        (33.32)
                     = dim null((T|U)ⁿ) + 1        (33.33)

Using the last result in equation 33.7 gives, since λₙ = 0,

        number of zeros on diagonal of M        (33.34)
          = 1 + number of zeros on diagonal of M′        (33.35)
          = 1 + dim null((T|U)ⁿ) = dim null(Tⁿ)        (33.36)

which is the required result. ∎
Theorem 33.3 Let V be a complex vector space and suppose that T ∈
L(V). Then the sum of the multiplicities of all the eigenvalues of T equals
dim V.

Proof. The previous theorem showed that the multiplicity of each eigenvalue λ
is the number of times that λ appears on the diagonal (of an upper-triangular
matrix of T, which exists because V is complex). The total
number of diagonal elements is dim V. ∎

Definition 33.4 (Characteristic Polynomial). Let V be a complex vector
space and let T ∈ L(V) have distinct eigenvalues λ₁, . . . , λₘ with multiplicities
d₁, . . . , dₘ. Then the characteristic polynomial of T is given by

        p(z) = (z − λ₁)^d₁ (z − λ₂)^d₂ ⋯ (z − λₘ)^dₘ        (33.37)

Theorem 33.5 (Cayley-Hamilton Theorem). An operator satisfies its
own characteristic polynomial, i.e., if T ∈ L(V) is an operator on a complex
vector space V with characteristic polynomial p(z), then

        p(T) = 0        (33.38)

Proof. Let B = (v₁, . . . , vₙ) be a basis such that M(T, B) is upper triangular,
and let μ₁, . . . , μₙ denote its diagonal entries (the eigenvalues of T, listed
with multiplicity). We need to prove that p(T) = 0. This is equivalent to
proving that

        p(T)v = 0        (33.39)

for all v ∈ V. Since any v can be expanded as a sum of the basis vectors, this
is also equivalent to proving that

        p(T)vⱼ = 0        (33.40)

for all j = 1, 2, . . . , n. Since

        p(T) = (T − λ₁I)^d₁ ⋯ (T − λₘI)^dₘ = (T − μ₁I)(T − μ₂I) ⋯ (T − μₙI)        (33.41)

and the factors commute, we need only prove that

        0 = (T − μ₁I)(T − μ₂I) ⋯ (T − μⱼI)vⱼ        (33.42)

because equation 33.42 implies equation 33.40.

The proof is by induction. For j = 1, v₁ is an eigenvector, which
satisfies

        Tv₁ = μ₁v₁   ⇒   (T − μ₁I)v₁ = 0        (33.43)

which is precisely equation 33.42 for j = 1.
For the inductive step, suppose that for some j with 1 < j ≤ n, equation 33.42 holds
for 1, 2, . . . , j − 1.

Let Mⱼ = M(T|Uⱼ) where Uⱼ = span(v₁, . . . , vⱼ). Then the bottom right
diagonal element of Mⱼ − μⱼI is zero, hence

        (T − μⱼI)vⱼ ∈ span(v₁, . . . , vⱼ₋₁)        (33.44)

i.e.,

        (T − μⱼI)vⱼ = Σₖ₌₁^(j−1) aₖvₖ        (33.45)

Therefore

        ∏ₖ₌₁^j (T − μₖI)vⱼ = ∏ₖ₌₁^(j−1) (T − μₖI) Σₗ₌₁^(j−1) aₗvₗ = 0        (33.46)

by the inductive hypothesis (the factors commute, so the product annihilates
each vₗ with ℓ < j), which proves equation 33.42. ∎


Example 33.1 Verify the Cayley-Hamilton theorem for

        M = [ −1   3 ]
            [  2  −4 ]        (33.47)

The characteristic equation is

        p(x) = | −1 − x     3    |        (33.48)
               |    2    −4 − x  |
             = (1 + x)(4 + x) − 6        (33.49)
             = x² + 5x − 2        (33.50)

The Cayley-Hamilton theorem says that

        p(M) = M² + 5M − 2I = 0        (33.51)

But

        p(M) = [ −1   3 ]² + 5[ −1   3 ] − 2[ 1  0 ]        (33.52)
               [  2  −4 ]     [  2  −4 ]    [ 0  1 ]
             = [   7  −15 ] + [ −5   15 ] − [ 2  0 ]        (33.53)
               [ −10   22 ]   [ 10  −20 ]   [ 0  2 ]
             = [ 0  0 ]
               [ 0  0 ]        (33.54)
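The verification in example 33.1 can be repeated numerically, assuming numpy:

```python
import numpy as np

M = np.array([[-1.0, 3.0], [2.0, -4.0]])
I = np.eye(2)

# p(x) = x^2 + 5x - 2, the characteristic polynomial computed above.
pM = M @ M + 5 * M - 2 * I
print(np.allclose(pM, 0))   # True: M satisfies its characteristic polynomial
```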

Topic 34

The Jordan Form


Theorem 34.1 Let V be a vector space over F, let T ∈ L(V), and let p be a
polynomial over F. Then null(p(T)) is invariant under T.

Proof. Let v ∈ null(p(T)). Then

        p(T)v = 0        (34.1)

and since T commutes with p(T) (because T commutes with itself),

        p(T)Tv = Tp(T)v = T(0) = 0        (34.2)

Therefore

        Tv ∈ null(p(T))        (34.3)

hence null(p(T)) is invariant under T. ∎
Theorem 34.2 Let V be a complex vector space, and let T be an operator
over V, with distinct eigenvalues λ₁, . . . , λₘ and corresponding generalized
eigenspaces (subspaces spanned by the generalized eigenvectors) U₁, . . . , Uₘ. Then

(1) each Uⱼ is invariant under T;
(2) each (T − λⱼI)|Uⱼ is nilpotent;
(3) V = U₁ ⊕ ⋯ ⊕ Uₘ.

Proof of (1). By theorem 32.6,

        Uⱼ = null((T − λⱼI)^(dim V))        (34.4)

Let

        p(z) = (z − λⱼ)^(dim V)        (34.5)

By theorem 34.1, null(p(T)) = Uⱼ is invariant under T.


Proof of (2) This follows because corresponding to the j there are generalized eigenvectors vj1 , . . . , vjp such that
(T j I)q vjq = 0

(34.6)

Hence T j I is nilpotent.
Proof of (3). The multiplicity of each λⱼ is dim(Uⱼ) (def. 33.1) and the
sum of the multiplicities is dim(V) (theorem 33.3), so

        dim V = dim U₁ + ⋯ + dim Uₘ        (34.7)

Define

        U = U₁ + ⋯ + Uₘ        (34.8)

By (1) each of the Uᵢ is invariant under T; hence U is invariant under T.
Define S = T|U. Every generalized eigenvector of T is a generalized eigenvector
of S, with the same eigenvalue; and the eigenvalues have the same
multiplicities. Hence

        dim U = dim U₁ + ⋯ + dim Uₘ = dim V        (34.9)

But U is a subspace of V, so it can have dimension dim V only if it is equal
to V. Hence

        V = U = U₁ ⊕ ⋯ ⊕ Uₘ        (34.10)

which is the desired result. ∎


Theorem 34.3 Let V be a complex vector space and let T ∈ L(V) be an
operator on V. Then there is a basis of V consisting of generalized eigenvectors
of T.

Proof. Let λ₁, . . . , λₘ be the eigenvalues of T.

Define Uⱼ as the generalized eigenspace corresponding to λⱼ. By definition
Uⱼ is spanned by the generalized eigenvectors corresponding to λⱼ, so some
subset of them forms a basis of Uⱼ.

Join together all of the bases of the Uⱼ formed in this way. By theorem
34.2 (3), the result is a basis of V which consists solely of generalized
eigenvectors of T. ∎
TOPIC 34. THE JORDAN FORM

Math 462

Theorem 34.4 Let V be a vector space and let N ∈ L(V) be nilpotent. Then
there is a basis B of V such that M(N, B) is strictly upper triangular (i.e.,
all the diagonal elements are zero).
Proof. Since the result follows trivially if N is the zero operator (then
M(N, B) is the zero matrix), we assume that N is not the zero operator.
Since N is nilpotent, N^m = 0 for some m. For this m, null(N^m) = V,
because N^m v = 0 for every v ∈ V.
Choose a basis of V = null(N^m) as follows: choose any basis of null(N);
then extend it to a basis of null(N^2); then extend the result to a basis of
null(N^3); and so on. The result is a basis B of V.
Form M(N, B). The columns correspond to the basis vectors of V: the first
group of columns to vectors in null(N), followed by vectors in null(N^2), etc.
Since the first basis vector v_1 is in null(N), we have N v_1 = 0; hence
the first column of M(N, B) is entirely zeros. The same argument holds for every other
vector in the first group of columns.
The next group of columns corresponds to vectors in null(N^2). Let v_j be any such vector. Then

0 = N^2 v_j = N(N v_j), so N v_j ∈ null(N)    (34.11)

Hence N v_j is a linear combination of the basis vectors to the left of v_j; thus the
entries of column j on and below the diagonal are zero.
Repeat the argument for each succeeding group of columns. ∎
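The basis construction in this proof can be carried out concretely. In the following Python/sympy sketch (the matrix N is an illustrative choice, not taken from the text), a basis of null(N) is extended to a basis of V = null(N^2), and the matrix of N in that basis comes out strictly upper triangular.

```python
import sympy as sp

# A nilpotent operator that is not already upper triangular: N^2 = 0.
N = sp.Matrix([[1, -1], [1, -1]])
assert (N ** 2).is_zero_matrix

# Basis built as in the proof: a basis of null(N), extended to null(N^2) = V.
b1 = N.nullspace()[0]            # spans null(N): the vector (1, 1)
b2 = sp.Matrix([1, 0])           # any vector extending it to a basis of V
P = sp.Matrix.hstack(b1, b2)

# The matrix of N in this basis is P^{-1} N P, strictly upper triangular.
A = P.inv() * N * P
print(A)   # Matrix([[0, 1], [0, 0]])
```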
Theorem 34.5 Let V be a complex vector space and T ∈ L(V). Let
λ_1, ..., λ_m be the distinct eigenvalues of T. Then there is a basis B of
V such that

M(T, B) = diag(A_1, ..., A_m)    (34.12)

where each A_j is upper triangular with λ_j on the diagonal.
Proof. For each λ_j, let U_j be the subspace spanned by the corresponding
generalized eigenvectors.
By theorem 34.2, (T − λ_j I)|_{U_j} is nilpotent.
Hence for each j we can choose (by theorem 34.4) a basis B_j of U_j such that

M(T|_{U_j} − λ_j I, B_j) = \begin{pmatrix} 0 & & * \\ & \ddots & \\ 0 & & 0 \end{pmatrix}    (34.13)

Define A_j = M(T|_{U_j}, B_j). Then (adding λ_j I to the above equation)

A_j = \begin{pmatrix} λ_j & & * \\ & \ddots & \\ 0 & & λ_j \end{pmatrix}    (34.14)

Now form a basis B of V by combining the bases of the U_j; the resulting matrix is

M(T, B) = \begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_m \end{pmatrix}    (34.15)

∎
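Theorem 34.5 can be illustrated with the matrix of example 34.1 below. Choosing a basis of each generalized eigenspace as in theorem 34.4 and combining them produces a block upper triangular matrix with the eigenvalues on the diagonal. This Python/sympy sketch is not from the notes; the basis vectors were found by hand from the nullspaces indicated in the comments.

```python
import sympy as sp

# The matrix of example 34.1; eigenvalues 1 (multiplicity 2) and 3 (multiplicity 1).
M = sp.Matrix([[1, 1, 1], [0, 2, 1], [1, 0, 2]])

# Basis of U_1 chosen as in theorem 34.4: a basis of null(M - I),
# extended to a basis of null((M - I)^2) = U_1; then a basis of U_3.
v1 = sp.Matrix([1, 1, -1])       # spans null(M - I)
v2 = sp.Matrix([1, -1, 0])       # extends v1 to a basis of null((M - I)^2)
v3 = sp.Matrix([1, 1, 1])        # spans null(M - 3I) = U_3
P = sp.Matrix.hstack(v1, v2, v3)

# Block upper triangular: diagonal (1, 1, 3), zeros below the diagonal.
A = P.inv() * M * P
print(A)
```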

Definition 34.6 Let V be a vector space and T ∈ L(V). A basis B of V is
called a Jordan basis for T if the matrix of T is block diagonal,

M(T, B) = \begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_m \end{pmatrix}    (34.16)

with each block A_j upper triangular with λ_j on the main diagonal and 1s on
the super-diagonal:

A_j = \begin{pmatrix} λ_j & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & λ_j \end{pmatrix}    (34.17)
Theorem 34.7 Let V be a complex vector space. If T ∈ L(V) then there
is a basis of V that is a Jordan basis for T.
Lemma 34.8 Let V be a vector space and let N ∈ L(V) be nilpotent. Then
there exist vectors v_1, ..., v_k ∈ V such that
1. The following is a basis of V:

(v_1, N v_1, ..., N^{m(v_1)} v_1, ..., v_k, N v_k, ..., N^{m(v_k)} v_k)    (34.18)

2. The following is a basis of null(N):

(N^{m(v_1)} v_1, ..., N^{m(v_k)} v_k)    (34.19)

Proof. (See Axler.) ∎



Proof. (Theorem 34.7). First let N ∈ L(V) be nilpotent. Construct the vectors
v_1, ..., v_k as in the lemma.
Observe that N sends the first vector in the list

B_j = (N^{m(v_j)} v_j, ..., N v_j, v_j)    (34.20)

to zero, and each subsequent vector in the list to the vector to its left.
Reversing each B_j defined in equation 34.20 and forming the list

B′ = (reverse(B_1), ..., reverse(B_k))    (34.21)

we obtain the basis in part (1) of the lemma. Hence

B = (B_1, ..., B_k)    (34.22)

is also a basis of V, but with the property that M(N, B) is block diagonal
with each block of the form

\begin{pmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{pmatrix}    (34.23)

This proves the theorem when T is nilpotent.


Now suppose T is any operator on V, with distinct eigenvalues λ_1, ..., λ_m
and corresponding generalized eigenspaces U_1, ..., U_m. Then

V = U_1 ⊕ ··· ⊕ U_m    (34.24)

where each

S_j = (T − λ_j I)|_{U_j}    (34.25)

is nilpotent, by theorem 34.2 (2).
Now apply the argument of the previous paragraph: for each nilpotent
operator S_j there is a basis B_j of U_j such that M(S_j, B_j) has the form of
equation 34.23. By equation 34.25, adding λ_j I to the form in equation 34.23
gives each block the desired form, a Jordan block. Combining the bases B_j
proves the theorem. ∎
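The chain construction can be traced numerically. In this Python/sympy sketch (the operator N and the chain vectors are made-up examples, not from the text), a Jordan basis for a nilpotent N is assembled from the chains (N v_1, v_1) and (v_2), and the resulting matrix has the form of equation 34.23.

```python
import sympy as sp

# A nilpotent operator of index 2 on C^3: N^2 = 0 but N != 0.
N = sp.Matrix([[0, 1, 1], [0, 0, 0], [0, 0, 0]])
assert (N ** 2).is_zero_matrix

# Build a Jordan basis by chains, as in the proof of theorem 34.7:
v1 = sp.Matrix([0, 1, 0])        # N v1 != 0, so (N v1, v1) is a length-2 chain
v2 = sp.Matrix([0, 1, -1])       # a vector of null(N) independent of N v1
P = sp.Matrix.hstack(N * v1, v1, v2)

# Block diagonal: one 2x2 nilpotent Jordan block and one 1x1 zero block.
J = P.inv() * N * P
print(J)
```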
Example 34.1 Find the Jordan form of

M = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 1 \\ 1 & 0 & 2 \end{pmatrix}    (34.26)

From example 32.1 the eigenvalues of M are 1 (with multiplicity 2) and 3
(with multiplicity 1). Since null(M − I) is one-dimensional, the eigenvalue 1
contributes a single 2×2 Jordan block rather than two 1×1 blocks. Hence a Jordan
form is

J = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{pmatrix}    (34.27)
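As a check, sympy can compute Jordan forms directly (this is not part of the notes, and jordan_form may order the blocks differently than in equation 34.27).

```python
import sympy as sp

# The matrix of example 34.1.
M = sp.Matrix([[1, 1, 1], [0, 2, 1], [1, 0, 2]])

# jordan_form returns a basis-change matrix P and the Jordan form J with M = P J P^{-1}.
P, J = M.jordan_form()
print(J)

# Sanity checks: J is similar to M, its diagonal holds the eigenvalues 1, 1, 3,
# and exactly one super-diagonal entry is 1 (the 2x2 block for eigenvalue 1).
assert (P * J * P.inv() - M).is_zero_matrix
assert sorted(J[i, i] for i in range(3)) == [1, 1, 3]
```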
