
TEXTBOOKS in MATHEMATICS

ADVANCED
LINEAR ALGEBRA

Hugo J. Woerdeman
Drexel University
Philadelphia, Pennsylvania, USA
TEXTBOOKS in MATHEMATICS
Series Editors: Al Boggess and Ken Rosen

PUBLISHED TITLES
ABSTRACT ALGEBRA: AN INQUIRY-BASED APPROACH
Jonathan K. Hodge, Steven Schlicker, and Ted Sundstrom
APPLIED ABSTRACT ALGEBRA WITH MAPLE™ AND MATLAB®, THIRD EDITION
Richard Klima, Neil Sigmon, and Ernest Stitzinger
APPLIED DIFFERENTIAL EQUATIONS: THE PRIMARY COURSE
Vladimir Dobrushkin
COMPUTATIONAL MATHEMATICS: MODELS, METHODS, AND ANALYSIS WITH MATLAB® AND MPI,
SECOND EDITION
Robert E. White
DIFFERENTIAL EQUATIONS: THEORY, TECHNIQUE, AND PRACTICE, SECOND EDITION
Steven G. Krantz
DIFFERENTIAL EQUATIONS: THEORY, TECHNIQUE, AND PRACTICE WITH BOUNDARY VALUE PROBLEMS
Steven G. Krantz
DIFFERENTIAL EQUATIONS WITH MATLAB®: EXPLORATION, APPLICATIONS, AND THEORY
Mark A. McKibben and Micah D. Webster
ELEMENTARY NUMBER THEORY
James S. Kraft and Lawrence C. Washington
EXPLORING LINEAR ALGEBRA: LABS AND PROJECTS WITH MATHEMATICA®
Crista Arangala
GRAPHS & DIGRAPHS, SIXTH EDITION
Gary Chartrand, Linda Lesniak, and Ping Zhang
INTRODUCTION TO ABSTRACT ALGEBRA, SECOND EDITION
Jonathan D. H. Smith
INTRODUCTION TO MATHEMATICAL PROOFS: A TRANSITION TO ADVANCED MATHEMATICS, SECOND EDITION
Charles E. Roberts, Jr.
INTRODUCTION TO NUMBER THEORY, SECOND EDITION
Marty Erickson, Anthony Vazzana, and David Garth

LINEAR ALGEBRA, GEOMETRY AND TRANSFORMATION


Bruce Solomon
MATHEMATICAL MODELLING WITH CASE STUDIES: USING MAPLE™ AND MATLAB®, THIRD EDITION
B. Barnes and G. R. Fulford
MATHEMATICS IN GAMES, SPORTS, AND GAMBLING–THE GAMES PEOPLE PLAY, SECOND EDITION
Ronald J. Gould
THE MATHEMATICS OF GAMES: AN INTRODUCTION TO PROBABILITY
David G. Taylor
MEASURE THEORY AND FINE PROPERTIES OF FUNCTIONS, REVISED EDITION
Lawrence C. Evans and Ronald F. Gariepy
NUMERICAL ANALYSIS FOR ENGINEERS: METHODS AND APPLICATIONS, SECOND EDITION
Bilal Ayyub and Richard H. McCuen
ORDINARY DIFFERENTIAL EQUATIONS: AN INTRODUCTION TO THE FUNDAMENTALS
Kenneth B. Howell
RISK ANALYSIS IN ENGINEERING AND ECONOMICS, SECOND EDITION
Bilal M. Ayyub
TRANSFORMATIONAL PLANE GEOMETRY
Ronald N. Umble and Zhigang Han
MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does
not warrant the accuracy of the text or exercises in this book. This book’s use or discussion of MAT-
LAB® software or related products does not constitute endorsement or sponsorship by The MathWorks
of a particular pedagogical approach or particular use of the MATLAB® software.

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2016 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works


Version Date: 20151021

International Standard Book Number-13: 978-1-4987-5404-0 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information stor-
age or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copy-
right.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222
Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that pro-
vides licenses and registration for a variety of users. For organizations that have been granted a photo-
copy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
To my very supportive family:
Dara, Sloane, Sam, Ruth, and Myra.
Contents

Preface to the Instructor xi

Preface to the Student xiii

Acknowledgments xv

Notation xvii

List of Figures xxi

1 Fields and Matrix Algebra 1

1.1 The field Z3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 The field axioms . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Field examples . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.3.1 Complex numbers . . . . . . . . . . . . . . . . . . . . 7

1.3.2 The finite field Zp , with p prime . . . . . . . . . . . . 9

1.4 Matrix algebra over different fields . . . . . . . . . . . . . . . 11

1.4.1 Reminders about Cramer’s rule and the adjugate


matrix. . . . . . . . . . . . . . . . . . . . . . . . . . . 17

1.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2 Vector Spaces 27

2.1 Definition of a vector space . . . . . . . . . . . . . . . . . . . 27


2.2 Vector spaces of functions . . . . . . . . . . . . . . . . . . . . 29

2.2.1 The special case when X is finite . . . . . . . . . . . . 31

2.3 Subspaces and more examples of vector spaces . . . . . . . . 32

2.3.1 Vector spaces of polynomials . . . . . . . . . . . . . . 34

2.3.2 Vector spaces of matrices . . . . . . . . . . . . . . . . 36

2.4 Linear independence, span, and basis . . . . . . . . . . . . . 37

2.5 Coordinate systems . . . . . . . . . . . . . . . . . . . . . . . 45

2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3 Linear Transformations 55

3.1 Definition of a linear transformation . . . . . . . . . . . . . . 55

3.2 Range and kernel of linear transformations . . . . . . . . . . 57

3.3 Matrix representations of linear maps . . . . . . . . . . . . . 61

3.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

4 The Jordan Canonical Form 69

4.1 The Cayley–Hamilton theorem . . . . . . . . . . . . . . . . . 69

4.2 Jordan canonical form for nilpotent matrices . . . . . . . . . 71

4.3 An intermezzo about polynomials . . . . . . . . . . . . . . . 75

4.4 The Jordan canonical form . . . . . . . . . . . . . . . . . . . 78

4.5 The minimal polynomial . . . . . . . . . . . . . . . . . . . . 82

4.6 Commuting matrices . . . . . . . . . . . . . . . . . . . . . . 84

4.7 Systems of linear differential equations . . . . . . . . . . . . 87

4.8 Functions of matrices . . . . . . . . . . . . . . . . . . . . . . 90

4.9 The resolvent . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

4.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100



5 Inner Product and Normed Vector Spaces 109

5.1 Inner products and norms . . . . . . . . . . . . . . . . . . . . 109

5.2 Orthogonal and orthonormal sets and bases . . . . . . . . . . 119

5.3 The adjoint of a linear map . . . . . . . . . . . . . . . . . . . 122

5.4 Unitary matrices, QR, and Schur triangularization . . . . . . 125

5.5 Normal and Hermitian matrices . . . . . . . . . . . . . . . . 128

5.6 Singular value decomposition . . . . . . . . . . . . . . . . . . 132

5.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

6 Constructing New Vector Spaces from Given Ones 147

6.1 The Cartesian product . . . . . . . . . . . . . . . . . . . . . 147

6.2 The quotient space . . . . . . . . . . . . . . . . . . . . . . . 149

6.3 The dual space . . . . . . . . . . . . . . . . . . . . . . . . . . 157

6.4 Multilinear maps and functionals . . . . . . . . . . . . . . . . 166

6.5 The tensor product . . . . . . . . . . . . . . . . . . . . . . . 168

6.6 Anti-symmetric and symmetric tensors . . . . . . . . . . . . 179

6.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189

7 How to Use Linear Algebra 195

7.1 Matrices you can’t write down, but would still like to use . . 196

7.2 Algorithms based on matrix vector products . . . . . . . . . 198

7.3 Why use matrices when computing roots of polynomials? . . 203

7.4 How to find functions with linear algebra? . . . . . . . . . . 209

7.5 How to deal with incomplete matrices . . . . . . . . . . . . . 217

7.6 Solving millennium prize problems with linear algebra . . . . 222

7.6.1 The Riemann hypothesis . . . . . . . . . . . . . . . . . 223



7.6.2 P vs. NP . . . . . . . . . . . . . . . . . . . . . . . . . 225

7.7 How secure is RSA encryption? . . . . . . . . . . . . . . . . 229

7.8 Quantum computation and positive maps . . . . . . . . . . . 232

7.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238

How to Start Your Own Research Project 247

Answers to Exercises 249

Bibliography 323

Index 325
Preface to the Instructor

This book is intended for a second linear algebra course. Students are
expected to be familiar with (computational) material from a first linear
algebra course: matrix multiplication; row reduction; pivots; solving systems
of linear equations; checking whether a vector is a linear combination of
other vectors; finding eigenvalues and eigenvectors; finding a basis of a
nullspace, column space, row space, and eigenspace; computing
determinants; and finding inverses. The assumption is that so far they have
worked over the real numbers R.

In my view, the core material in this book is the following and takes about
24 academic hours¹ of lectures:

• Chapter 1: Introduction to the notion of a general field, with a focus on


Zp and C, and a refresher of the computational items from the first
linear algebra course but now presented over different fields. (4 hours)
• Chapter 2: Vector spaces, subspaces, linear independence, span, basis,
dimension, coordinate systems. (6 hours)
• Chapter 3: Linear transformations, range and kernel, matrix
representations. My suggestion is to do a lot of different examples of
finding matrix representations. (5 hours)
• Chapter 4, Sections 4.1, 4.2 and 4.4: Cayley–Hamilton, presenting
the Jordan canonical form (without complete proof) and doing
computational examples. (3 hours)
• Chapter 5: Inner products, norms, orthogonality, adjoint, QR, normal
matrices (including unitary, Hermitian, and positive (semi-)definite),
singular value decomposition. (6 hours)

To supplement the core material there are several options:

• Chapter 4, Sections 4.2–4.5: Provide the details of the proof of the
Jordan canonical form and introduce the minimal polynomial (2–3 hours).

¹ Academic hour = 50 minutes.

• Chapter 4, Sections 4.6 and 4.7: These two sections are independent
of one another, and each takes about 1 hour. Clearly, how the Jordan
canonical form helps in solving differential equations is a classical topic for
this course. The result on commuting matrices is one that sometimes
makes it into my course, but other times does not. (1–2 hours)

• Chapter 4, Section 4.8 (and 4.9): The section “Functions of


matrices” provides a way to introduce etA and discuss the application to
systems of linear differential equations in a more conceptual way. Section
4.9 requires knowledge of Cauchy’s integral formula and may be
somewhat of a stretch for this course. Still, accepting Cauchy’s formula, I
believe that the corresponding exercises are accessible to the students. (2
hours)
• Chapter 6, Sections 6.1–6.3: These three sections are independent of
one another. They provide fundamental constructions of new vector
spaces from given ones. (1–3 hours)
• Chapter 6, Sections 6.4–6.6: Tensor (or Kronecker) products provide
a really exciting tool. I especially like how the determinants and
permanents show up in anti-symmetric and symmetric tensors, and how
for instance, the Cauchy–Binet formula is derived. I would strongly
consider including this if I had a semester-long course. (4–5 hours)
• Chapter 7: I use the items in this chapter to try to (re)energize the
students at the end of a lecture, and ask questions like “What made
Google so successful?” (Response: Their page rank algorithm). “Does it
surprise you when I tell you that to compute roots of a polynomial, one
builds a matrix and then computes its eigenvalues?” (Response: Yes
(hopefully) and isn’t the QR algorithm really neat?), “Do you want to
win a million bucks?” (Response: Solve a millennium prize problem). Of
course, there is the option to treat these items in much more detail or
assign them as projects (if only I had the time!). (1–7 hours)

I hope that my suggestions are helpful, and that you find this a useful text
for your course. I would be very happy to hear from you! I realize that it
takes a special effort to provide someone with constructive criticism, so when
you take time to do that, I will be especially appreciative.
Preface to the Student

I think that linear algebra is a great subject, and I strongly hope that you
(will) agree. It has a strong theoretical side, ample opportunity to explore
the subject with computation, and a (continuously growing) number of great
applications. With this book, I hope to do justice to all these aspects. I chose
to treat the main concepts (vector space and linear transformations) in their
full abstraction. Abstraction (taking operations out of their context, and
studying them on their own merit) is really the strength of mathematics;
how else can a theory that started in the 18th and 19th centuries have all
these great 21st-century applications (web search engines, data mining,
etc.)? In addition, I hope that when you are used to the full abstraction of
the theory, it will allow you to think of possibilities of applying the theory in
the broadest sense. And, maybe as a more direct benefit, I hope that it will
help when you take abstract algebra. Which brings me to my last point.
While current curriculum structure has different mathematical subfields
neatly separated, this is not reality. Especially when you apply mathematics,
you will need to pull from different areas of mathematics. This is why this
book does not shy away from occasionally using some calculus, abstract
algebra, real analysis and (a little bit of) complex analysis.

Just a note regarding the exercises: I have chosen to include full solutions to
almost all exercises. It is up to you how you use these. Of course, the less
you rely on these solutions, the better. There are a few
exercises (designated as “Honors”) for which no solution is included. These
are somewhat more challenging. Try them and if you succeed, use them to
impress your teacher or yourself!

Acknowledgments

In the thirty years that I have taught (starting as a teaching assistant), I


have used many textbooks for different versions of a linear algebra course
(Linear Algebra I, Linear Algebra II, Graduate Linear Algebra; semester
course and quarter course). All these different textbooks have influenced me.
In addition, discussions with students and colleagues, sitting in on lectures,
reading papers, etc., have all shaped my linear algebra courses. I wish I had
a way to thank you all specifically for how you helped me, but I am afraid
that is simply impossible. So I hope that a general thank you to all of you
who have influenced me, will do: THANK YOU!

This book came about while I was immobile due to an ankle fracture. So,
first of all, I would like to thank my wonderful wife Dara and my great kids
Sloane, Sam, Ruth, and Myra, for taking care of me during the four months
I spent recovering. I am very thankful to my colleagues at Drexel University:
Dannis Yang, who used a first version of this text for his course, for
providing me with detailed comments on Chapters 1–5; Shari Moskow, R.
Andrew Hicks, and Robert Boyer for their feedback on the manuscript, and
in Robert Boyer’s case for also providing me with one of the figures. In
addition, I am grateful to graduate student Charles Burnette for his
feedback. I am also very thankful to those at CRC Press who helped me
bring this manuscript to publication. Finally, I would like to thank you, the
reader, for picking up this book. Without you there would have been no
point to produce this. So, MANY THANKS to all of you!

Notation

Here are some often-used notations:

• N = {1, 2, 3, . . .}
• N0 = {0, 1, 2, . . .}
• Z = the set of all integers
• Q = the field of rational numbers
• R = the field of real numbers
• R(t) = the field of real rational functions (in t)
• C = the field of complex numbers
• Re z = real part of z
• Im z = imaginary part of z
• z̄ = complex conjugate of z
• |z| = absolute value (modulus) of z
• Zp (with p prime) = the finite field {0, 1, . . . , p − 1}
• rem(q|p) = remainder of q after division by p
• F = a generic field
• det(A) = the determinant of the matrix A
• tr(A) = the trace of a matrix A (= the sum of its diagonal entries)
• adj(A) = the adjugate of the matrix A
• rank(A) = the rank of a matrix A
• F[X] = the vector space of polynomials in X with coefficients in F
• Fn [X] = the vector space of polynomials of degree ≤ n in X with
coefficients in F


• Fn×m = the vector space of n × m matrices with entries in F

• F^X = the vector space of functions from X to F


• Hn = {A ∈ Cn×n : A = A∗ }, the vector space over R consisting of all
n × n Hermitian matrices
• 0 = the zero vector

• dim V = the dimension of the vector space V


• Span{v1 , . . . , vn } = the span of the vectors v1 , . . . , vn
• {e1 , . . . , en } = the standard basis in Fn
• [v]B = the vector of coordinates of v relative to the basis B
• Ker T = the kernel (or nullspace) of a linear map (or matrix) T
• Ran T = the range of a linear map (or matrix) T
• T [W ] = {T (w) : w ∈ W } = {y : there exists w ∈ W so that y =
T (w)} ⊆ Ran T
• idV = the identity map on the vector space V
• [T ]C←B = the matrix representation of T with respect to the bases B and
C
• In = the n × n identity matrix
• Jk (λ) = the k × k Jordan block with eigenvalue λ
• wk (A, λ) = dim Ker(A − λIn )k − dim Ker(A − λIn )k−1 ; Weyr
characteristic of A
 
• ⊕pk=1 Ak = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_p \end{pmatrix}

• diag(dii )ni=1 = \begin{pmatrix} d_{11} & 0 & \cdots & 0 \\ 0 & d_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_{nn} \end{pmatrix}

• +̇ = direct sum
• pA (t) = the characteristic polynomial of the matrix A
• mA (t) = the minimal polynomial of the matrix A

• AT = the transpose of the matrix A

• A∗ = the conjugate transpose of the matrix A


• T ? = the adjoint of the linear map T
• h·, ·i = an inner product
• k · k = a norm

• σj (A) = the jth singular value of the matrix A, where σ1 (A) = kAk is
the largest singular value
• ρ(A) = max{|λ| : λ is an eigenvalue of A} is the spectral radius of A
• PSDn = {A ∈ Cn×n : A is positive semidefinite} ⊆ Hn
• v + W = {v + w : w ∈ W } = {x : x − v ∈ W }
• V /W = {v + W : v ∈ V }, the quotient space
• V 0 = the dual space of V
• L(V, W ) = {T : V → W : T is linear}
• v ⊗ w = the tensor product of v and w
• v ∧ w = the anti-symmetric tensor product of v and w
• v ∨ w = the symmetric tensor product of v and w
• A[P, Q] = (aij )i∈P,j∈Q , a submatrix of A = (aij )i,j
List of Figures

1.1 The complex number z in the complex plane. . . . . . . . . . . . 9

7.1  The roots of the polynomial \sum_{k=1}^{10,000} p_k(10,000) x^k, where
     pk (n) is the number of partitions of n in k parts, which is the number
     of ways n can be written as the sum of k positive integers. . . . . 207

7.2 A Meyer wavelet. . . . . . . . . . . . . . . . . . . . . . . . . . . 210

7.3 Blurring function. . . . . . . . . . . . . . . . . . . . . . . . . . 216

7.4 The original image (of size 3000 × 4000 × 3). . . . . . . . . . . . 217

7.5 The Redheffer matrix of size 500 × 500. . . . . . . . . . . . . . . 224

7.6 A sample graph. . . . . . . . . . . . . . . . . . . . . . . . . . . 225

5.7 The original image (of size 672 × 524 × 3). . . . . . . . . . . . . . 299

1
Fields and Matrix Algebra

CONTENTS
1.1 The field Z3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 The field axioms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Field examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 Complex numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.2 The finite field Zp , with p prime . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Matrix algebra over different fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4.1 Reminders about Cramer’s rule and the adjugate
matrix. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

The central notions in linear algebra are vector spaces and linear
transformations that act between vector spaces. We will define these notions
in Chapters 2 and 3, respectively. But before we can introduce the general
notion of a vector space we need to talk about the notion of a field. In your
first Linear Algebra course you probably did not worry about fields because
it was chosen to only talk about the real numbers R, a field you have been
familiar with for a long time. In this chapter we ask you to get used to the
general notion of a field, which is a set of mathematical objects on which you
can define algebraic operations such as addition, subtraction, multiplication
and division with all the rules that also hold for real numbers
(commutativity, associativity, distributivity, existence of an additive neutral
element, existence of an additive inverse, existence of a multiplicative neutral
element, existence of a multiplicative inverse for nonzeros). We start with an
example.


1.1 The field Z3

Let us consider the set Z3 = {0, 1, 2}, and use the following tables to define
addition and multiplication:

  +  | 0 1 2          ·  | 0 1 2
  ---+-------         ---+-------
   0 | 0 1 2           0 | 0 0 0
   1 | 1 2 0           1 | 0 1 2
   2 | 2 0 1           2 | 0 2 1

So, in other words, 1 + 1 = 2, 2 + 1 = 0, 2 · 2 = 1, 0 · 1 = 0, etc. In fact, to


take the sum of two elements we take the usual sum, and then take the
remainder after division by 3. For example, to compute 2 + 2 we take the
remainder of 4 after division by 3, which is 1. Similarly for multiplication.

What you notice in the table is that when you add 0 to any number, it does
not change that number (namely, 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 0 + 2 = 2,
2 + 0 = 2). We say that 0 is the neutral element for addition. Analogously, 1
is the neutral element for multiplication, which means that when we multiply
a number in this field by 1, it does not change that number (0 · 1 = 0,
1 · 2 = 2, etc.). Every field has these neutral elements, and they are typically
denoted by 0 and 1, although there is no rule that you have to denote them
this way.

Another important observation is that in the core part of the addition table

    0 1 2
    1 2 0
    2 0 1

the 0 appears exactly once in every row and column. What this means is
that whatever x we choose in Z3 = {0, 1, 2}, we can always find exactly one
y ∈ Z3 so that
x + y = 0.
We are going to call y the additive inverse of x, and we are going to write
y = −x. So
0 = −0, 2 = −1, 1 = −2.
It is important to keep in mind that the equation y = −x is just a shorthand
of the equation x + y = 0. So, whenever you wonder “what does this −
mean?,” you have to go back to an equation that only involves + and look at

how addition is defined. One of the rules in any field is that any element of a
field has an additive inverse.

How about multiplicative inverses? For real numbers, any number has a
multiplicative inverse except for 0. Indeed, no number x satisfies x · 0 = 1!
In other fields, the same holds true. This means that in looking at the
multiplication table for multiplicative inverses, we should only look at the
part that does not involve 0:

    1 2
    2 1

And here we notice that 1 appears exactly once in each row and column.
This means that whenever x ∈ Z3 \ {0} = {1, 2}, there exists exactly one y
so that
x · y = 1.
We are going to call y the multiplicative inverse of x, and denote this as x^{−1}.
Thus
1^{−1} = 1, 2^{−1} = 2.
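
If you want to experiment with these tables on a computer, the following small sketch (plain Python 3; the names used here are ours, not the book's) builds the Z3 addition and multiplication tables by reducing the ordinary sum and product modulo 3, and then reads off the additive and multiplicative inverses exactly as we just did by eye.

    # Build the addition and multiplication tables of Z_3 (reduce mod 3),
    # then locate inverses by searching the tables.
    p = 3
    elems = range(p)
    add = {(a, b): (a + b) % p for a in elems for b in elems}
    mul = {(a, b): (a * b) % p for a in elems for b in elems}

    # Additive inverse of a: the unique y with a + y = 0.
    neg = {a: next(y for y in elems if add[(a, y)] == 0) for a in elems}
    # Multiplicative inverse of a != 0: the unique y with a * y = 1.
    inv = {a: next(y for y in elems if mul[(a, y)] == 1) for a in elems if a != 0}

    print(neg)  # {0: 0, 1: 2, 2: 1}, i.e., -0 = 0, -1 = 2, -2 = 1
    print(inv)  # {1: 1, 2: 2},       i.e., 1^{-1} = 1, 2^{-1} = 2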

In addition to the existence of neutral elements and inverses, the addition


and multiplication operations also satisfy commutativity, associativity and
distributive laws, so let us next give the full list of axioms that define a field.
And after that we will present more examples of fields, both with a finite
number of elements (such as the field Z3 we defined in this subsection) as
well as with an infinite number of elements (such as the real numbers R).

1.2 The field axioms

A field is a set F on which two operations

+ : F × F → F, · : F × F → F

are defined satisfying the following rules:

1. Closure of addition: for all x, y ∈ F we have that x + y ∈ F.


2. Associativity of addition: for all x, y, z ∈ F we have that
(x + y) + z = x + (y + z).

3. Commutativity of addition: for all x, y ∈ F we have that x + y = y + x.


4. Existence of a neutral element for addition: there exists a 0 ∈ F so that
x + 0 = x = 0 + x for all x ∈ F.
5. Existence of an additive inverse: for every x ∈ F there exists a y ∈ F so
that x + y = 0 = y + x.
6. Closure of multiplication: for all x, y ∈ F we have that x · y ∈ F.
7. Associativity of multiplication: for all x, y, z ∈ F we have that
(x · y) · z = x · (y · z).
8. Commutativity of multiplication: for all x, y ∈ F we have that x · y = y · x.
9. Existence of a neutral element for multiplication: there exists a
1 ∈ F \ {0} so that x · 1 = x = 1 · x for all x ∈ F.
10. Existence of a multiplicative inverse for nonzeros: for every x ∈ F \ {0}
there exists a y ∈ F so that x · y = 1 = y · x.
11. Distributive law: for all x, y, z ∈ F we have that x · (y + z) = x · y + x · z.
We will denote the additive inverse of x by −x, and we will denote the
multiplicative inverse of x by x−1 .

First notice that any field has at least two elements, namely 0, 1 ∈ F, and
part of rule 9 is that 0 ≠ 1. Next, notice that rules 1–5 only involve addition,
while rules 6–10 only involve multiplication. The distributive law is the only
one that combines both addition and multiplication. In an Abstract Algebra
course, one studies various other mathematical notions that involve addition
and/or multiplication where only some of the rules above apply.

Some notational shorthands:

• Since addition is associative, we can just write x + y + z instead of


(x + y) + z or x + (y + z), because we do not have to worry whether we
first add x and y together, and then add z to it, or whether we first add
y and z together, and subsequently add x.
• When we are adding several numbers x_1 , . . . , x_k together, we can write
  this as x_1 + · · · + x_k or also as \sum_{j=1}^{k} x_j. For example, when k = 5, we have

      \sum_{j=1}^{5} x_j = x_1 + x_2 + x_3 + x_4 + x_5 = x_1 + · · · + x_5 .

  We now also have rules like

      \sum_{j=1}^{k-1} x_j + x_k = \sum_{j=1}^{k} x_j ,     \sum_{j=1}^{p} x_j + \sum_{j=p+1}^{q} x_j = \sum_{j=1}^{q} x_j .

• While above we use · to denote multiplication, we will often leave the ·


out. Indeed, instead of writing x · y we will write xy. Occasionally,
though, we will write the · just to avoid confusion: for instance, if we
want to write 1 times 2, and leave out the · it looks like 1 2. As this looks
too much like twelve, we will continue to write 1 · 2.
• As multiplication is associative, we can just write xyz instead of (xy)z or
x(yz).
• When multiplying x_1 , . . . , x_k , we write \prod_{j=1}^{k} x_j or x_1 · · · x_k . For instance,
  when k = 5, we have

      \prod_{j=1}^{5} x_j = x_1 x_2 x_3 x_4 x_5 = x_1 · · · x_5 .

  We now also have rules like

      \left(\prod_{j=1}^{k-1} x_j\right) x_k = \prod_{j=1}^{k} x_j ,     \left(\prod_{j=1}^{p} x_j\right)\left(\prod_{j=p+1}^{q} x_j\right) = \prod_{j=1}^{q} x_j .

• We may write x^2 instead of xx, or x^3 instead of xxx, x^{−2} instead of
  x^{−1}x^{−1}, etc. Clearly, when we use a negative exponent we need to insist
  that x ≠ 0. Using this convention, we have the familiar rule x^k x^ℓ = x^{k+ℓ},
  with the convention that x^0 = 1 when x ≠ 0.

• For the multiplicative inverse we will use both x^{−1} and 1/x. It is
  important, though, that we only use 1/x for certain infinite fields (such as
  Q, R and C), as there we are familiar with 1/2 (half), 3/8 (three eighths),
  etc. However, in a finite field such as Z3 we will always use the notation
  x^{−1}. So do not write 1/2 when you mean the multiplicative inverse of 2 in
  Z3 !

1.3 Field examples

In this book we will be using the following fields:

• The real numbers R with the usual definition of addition and
  multiplication. As you have already taken a first course in linear algebra,
  we know that you are familiar with this field.

• The rational numbers Q, which are all numbers of the form p/q, where
p ∈ Z = {. . . , −2, −1, 0, 1, 2, . . . } and q ∈ N = {1, 2, 3, . . .}. Again,
addition and multiplication are defined as usual. We assume that you are
familiar with this field as well. In fact, Q is a field that is also a subset of
the field R, with matching definitions for addition and multiplication. We
say that Q is a subfield of R.
• The complex numbers C, which consist of numbers a + bi, where a, b ∈ R
and i2 = −1. We will dedicate the next subsection to this field.
• The finite fields Zp , where p is a prime number. We already introduced
you to Z3 , and later in this section we will see how for any prime number
one can define a field Zp , where addition and multiplication are defined
via the usual addition and multiplication of integers followed by taking
the remainder after division by p.
• The field R(t) of rational functions with real coefficients and independent
  variable t. This field consists of functions r(t)/s(t) where r(t) and s(t) are
  polynomials in t, with s(t) not being the constant 0 polynomial. For
  instance,

      (13t^2 + 5t − 8)/(t^8 − 3t^5) ,     (5t^{10} − 27)/(t + 5)                 (1.1)

  are elements of R(t). Addition and multiplication are defined as usual.
  We are going to assume that you will be able to work with this field. The
  only thing that requires some special attention is to think about the
  neutral elements. Indeed, the 0 in this field is the constant function 0,
  where r(t) ≡ 0 for all t and s(t) ≡ 1 for all t. The 1 in this field is the
  constant function 1, where r(t) ≡ 1 for all t and s(t) ≡ 1 for all t. Now
  sometimes, these elements appear in “hidden” form, for instance,

      0/(t + 1) ≡ 0,     (t + 5)/(t + 5) ≡ 1.

  In calculus you had to worry that (t + 5)/(t + 5) is not defined at t = −5, but in
  this setting we always automatically get rid of common factors in the
  numerator and denominator. More formally, R(t) is defined as the field of
  fractions r(t)/s(t), where r(t) and s(t) ≢ 0 are polynomials in t that do not have a
  common factor. If one insists on uniqueness in the representation r(t)/s(t),
  one can, in addition, require that s(t) is monic, which means that the
  highest power of t has coefficient 1 (as is the case in (1.1)).

1.3.1 Complex numbers

The complex numbers are defined as

C = {a + bi ; a, b ∈ R},

with addition and multiplication defined by

(a + bi) + (c + di) := (a + c) + (b + d)i,

(a + bi)(c + di) := (ac − bd) + (ad + bc)i.


Notice that with these rules, we have that (0 + 1i)(0 + 1i) = −1 + 0i, or in
shorthand i2 = −1. Indeed, this is how to remember the multiplication rule:

(a + bi)(c + di) = ac + bdi2 + (ad + bc)i = ac − bd + (ad + bc)i,

where in the last step we used that i2 = −1. It may be obvious, but we
should state it clearly anyway: two complex numbers a + bi and c + di, with
a, b, c, d ∈ R are equal if and only if a = c and b = d. A typical complex
number may be denoted by z or w. When

z = a + bi with a, b ∈ R,

we say that the real part of z equals a and the imaginary part of z equals b.
The notation for this is,

Re z = a, Im z = b.

It is quite laborious, but in principle elementary, to prove that C satisfies all


the field axioms. In fact, in doing so one needs to use that R satisfies the
field axioms, as addition and multiplication in C are defined via addition and
multiplication in R. As always, it is important to realize what the neutral
elements are:
0 = 0 + 0i, 1 = 1 + 0i.
Another tricky part of this is the multiplicative inverse, for instance,

    (1 + i)^{−1} ,     (2 − 3i)^{−1} .                                  (1.2)

Here it is useful to look at the multiplication

    (a + bi)(a − bi) = a^2 + b^2 + 0i = a^2 + b^2 .                     (1.3)

This means that as soon as a or b is not zero, we have that
(a + bi)(a − bi) = a^2 + b^2 is a nonzero (actually, positive) real number. From
this we can conclude that

    1/(a + bi) = (a + bi)^{−1} = (a − bi)/(a^2 + b^2) = a/(a^2 + b^2) − (b/(a^2 + b^2)) i.

So, getting back to (1.2),

    1/(1 + i) = 1/2 − (1/2)i,     1/(2 − 3i) = 2/13 + (3/13)i.
Now you should be fully equipped to check all the field axioms for C.

As you notice, the complex number a − bi is a useful “counterpart” of a + bi,
so that we are going to give it a special name. The complex conjugate of
z = a + bi, a, b ∈ R, is the complex number z̄ := a − bi. So, for example,

    \overline{2 + 3i} = 2 − 3i,     \overline{1/2 + (6/5)i} = 1/2 − (6/5)i.

Thus, we have

    Re z̄ = Re z,     Im z̄ = −Im z.
Finally, we introduce the absolute value or modulus of z, via

    |a + bi| := \sqrt{a^2 + b^2} ,     a, b ∈ R.

For example,

    |1 + 3i| = \sqrt{10} ,     |1/2 − i/2| = \sqrt{1/4 + 1/4} = \sqrt{2}/2 .
Note that we have the rule

    z z̄ = |z|^2 ,

as observed in (1.3), and its consequence

    1/z = z̄/|z|^2

when z ≠ 0.

A complex number is often depicted as a point in R^2 , which we refer to as
the complex plane. The x-axis is the “real axis” and the y-axis is the
“imaginary axis.” Indeed, if z = x + iy then we represent z as the point
(x, y) as in the following figure.

The distance from the point z to the origin corresponds to |z| = \sqrt{x^2 + y^2}.
The angle t the point z makes with the positive x-axis is referred to as the
argument of z. It can be found via

    cos t = Re z / |z| ,     sin t = Im z / |z| .

Thus we can write

    z = |z|(cos t + i sin t).

Figure 1.1: The complex number z in the complex plane.

The following notation, due to Euler, is convenient:

eit := cos t + i sin t.

Using the rules for cos(t + s) and sin(t + s), one can easily check that

eit eis = ei(t+s) .

In addition, note that

    \overline{e^{it}} = e^{−it} .

Thus for z = |z|e^{it} ≠ 0, we have that z^{−1} = (1/|z|) e^{−it} .
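
A quick way to get comfortable with these rules is to let a computer check a few of them. The sketch below (plain Python 3 with its built-in complex type and the standard cmath/math modules; it is only an illustration of ours, not part of the text) verifies the inverse of 2 − 3i, the modulus of 1 + 3i, and Euler's formula.

    import cmath, math

    print(1 / (2 - 3j))         # (0.1538...+0.2307...j), i.e., 2/13 + (3/13)i
    print(abs(1 + 3j))          # 3.1622... = sqrt(10)
    print((1 + 1j) * (1 - 1j))  # (2+0j), matching (a+bi)(a-bi) = a^2 + b^2

    t = 0.7
    print(cmath.exp(1j * t))                  # e^{it}
    print(complex(math.cos(t), math.sin(t)))  # cos t + i sin t, the same number
    # The conjugate of e^{it} is e^{-it}:
    print(cmath.exp(1j * t).conjugate(), cmath.exp(-1j * t))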

1.3.2 The finite field Zp , with p prime

Addition and multiplication in the field Zp are based on the following result
you discovered in elementary school when you did long division.

Proposition 1.3.1 For every q ∈ Z and every p ∈ {2, 3, . . .}, there exist
unique a ∈ Z and r ∈ {0, 1, . . . , p − 1} so that

q = ap + r.

We call r the remainder of q after division by p, and write r = rem(q|p). For


example,

rem(9|2) = 1, rem(27|5) = 2, rem(−30|7) = 5, rem(−19|3) = 2.

Let now p be a prime number, and let Zp = {0, 1, . . . , p − 1}. Define the
addition and multiplication

+ : Zp × Zp → Zp , · : Zp × Zp → Zp

via
a + b := rem(a + b|p), a · b := rem(ab|p). (1.4)
Proposition 1.3.1 guarantees that for any integer q we have that
rem(q|p) ∈ {0, . . . , p − 1} = Zp , so that the closure rules are clearly satisfied.
Also, as expected, 0 and 1 are easily seen to be the neutral elements for
addition and multiplication, respectively. Next, the additive inverse −a of a
is easily identified via
    −a = a          if a = 0,
    −a = p − a      if a ∈ {1, . . . , p − 1}.

The trickier part is the multiplicative inverse, and here we are going to use
that p is prime. We need to remind you of the following rule for the greatest
common divisor gcd(a, b) of two integers a and b, not both zero.

Proposition 1.3.2 Let a, b ∈ Z not both zero. Then there exist m, n ∈ Z so


that
am + bn = gcd(a, b). (1.5)

Equation (1.5) is sometimes referred to as Bezout’s identity. To solve


Bezout’s identity, one applies Euclid’s algorithm to find the greatest
common divisor (see below), keep track of the division equations, and
ultimately put the equations together.

Algorithm 1 Euclid’s algorithm

1: procedure Euclid(a, b)            ▷ The g.c.d. of a and b ≠ 0
2:     r ← rem(a|b)
3:     while r ≠ 0 do                ▷ We have the answer if r is 0
4:         a ← b
5:         b ← r
6:         r ← rem(a|b)
7:     return b                      ▷ The gcd is b

Example 1.3.3 Let a = 17 and b = 5. Then 2 = rem(17|5), which comes


from the equality
2 = 17 − 3 · 5. (1.6)
Next, we look at the pair 5 and 2, and see that 1 = rem(5|2), which comes
from the equality
1 = 5 − 2 · 2. (1.7)
Next we look at the pair 2 and 1, and see that 0 = rem(2|1). This means
that Euclid’s algorithm stops and we find that 1 = gcd(17, 5). To next solve
Bezout’s identity (1.5) with a = 17 and b = 5, we put (1.7) and (1.6)
together, and write

1 = 5 − 2 · 2 = 5 − 2(17 − 3 · 5) = −2 · 17 + 7 · 5,

and find that with the choices m = −2 and n = 7 we have solved (1.5).
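
The bookkeeping in Example 1.3.3 can be automated. Below is a minimal recursive sketch in Python (the helper name extended_gcd is ours) that returns gcd(a, b) together with Bezout coefficients m, n; for a = 17, b = 5 it reproduces m = −2 and n = 7.

    def extended_gcd(a, b):
        """Return (g, m, n) with a*m + b*n == g == gcd(a, b), for a, b >= 0."""
        if b == 0:
            return (a, 1, 0)
        g, m1, n1 = extended_gcd(b, a % b)
        # b*m1 + (a % b)*n1 = g  and  a % b = a - (a // b)*b
        return (g, n1, m1 - (a // b) * n1)

    print(extended_gcd(17, 5))  # (1, -2, 7): indeed 17*(-2) + 5*7 = 1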

We have now all we need to be able to prove the following.

Theorem 1.3.4 Let p be a prime number. Then the set


Zp = {0, 1, . . . , p − 1} with addition and multiplication defined via (1.4) is a
field.

Proof of existence of a multiplicative inverse. Let a ∈ Zp . As p is prime, we


have that gcd(a, p) = 1. By Proposition 1.3.2 there exist integers m, n so
that am + pn = 1. Next we let r = rem(m|p) and let q be so that
r = m − qp. We claim that a−1 = r. Indeed,

ar = am − apq = 1 − pn − apq = 1 − p(n + aq).

From this we see that


1 = rem(ar|p),
and thus in the multiplication defined by (1.4) we have that a · r = 1. 

As said, the trickiest part of the proof of Theorem 1.3.4 is the existence of a
multiplicative inverse, so the remainder of the proof we leave to the reader.
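
For readers who want to check inverses in Zp numerically: since Python 3.8, pow(a, −1, p) returns the multiplicative inverse of a modulo p, which is exactly what the Bezout argument above produces. A tiny sanity check (ours, not the book's):

    p = 17
    for a in range(1, p):
        r = pow(a, -1, p)          # multiplicative inverse of a in Z_p
        assert (a * r) % p == 1

    # The proof's recipe for a = 5, p = 17: Bezout gives 5*7 + 17*(-2) = 1,
    # so 5^{-1} = 7 in Z_17, and pow agrees.
    print(pow(5, -1, 17))          # 7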

1.4 Matrix algebra over different fields

All the matrix algebra techniques that you learned in the first Linear
Algebra course carry over to any field. Indeed, these algebra techniques were

based on elementary algebraic operations, which work exactly the same in


another field. In this section we illustrate these techniques by going through
several examples with different fields. You will be reminded of matrix
multiplication, row reduction, pivots, solving systems of linear equations,
checking whether a vector is a linear combination of other vectors, finding a
basis of a nullspace, column space, row space, eigenspace, computing
determinants, finding inverses, Cramer’s rule, etc., but now we do these
techniques in other fields.

One notable exception where R differs from the other fields we are
considering, is that R is an ordered field (that is, ≥ defines an order relation
on pairs of real numbers, that satisfies x ≥ y ⇒ x + z ≥ z + y and
x, y ≥ 0 ⇒ xy ≥ 0). So anytime we want to use ≤, <, ≥ or >, we will have
to make sure we are dealing with real numbers. We will do this when we talk
about inner products and related concepts in Chapter 5.

Example 1.4.1 Let F = Z3 . Compute the product

    \begin{pmatrix} 1 & 0 & 2 \\ 2 & 2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 2 & 1 \\ 0 & 1 \end{pmatrix}.

The product equals

    \begin{pmatrix} 1·1+0·2+2·0 & 1·2+0·1+2·1 \\ 2·1+2·2+1·0 & 2·2+2·1+1·1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.

Example 1.4.2 Let F = C. Compute the product

    \begin{pmatrix} 1+i \\ 2−i \\ i \end{pmatrix} \begin{pmatrix} 1−i & 2+i \end{pmatrix}.

The product equals

    \begin{pmatrix} (1+i)(1−i) & (1+i)(2+i) \\ (2−i)(1−i) & (2−i)(2+i) \\ i(1−i) & i(2+i) \end{pmatrix} = \begin{pmatrix} 2 & 1+3i \\ 1−3i & 5 \\ 1+i & −1+2i \end{pmatrix}.

Example 1.4.3 Let F = Z5 . Put the matrix

    \begin{pmatrix} 1 & 0 & 2 \\ 2 & 3 & 1 \\ 1 & 4 & 0 \end{pmatrix}

in row echelon form. We start with the (1, 1) element as our first pivot.

    \begin{pmatrix} 1 & 0 & 2 \\ 2 & 3 & 1 \\ 1 & 4 & 0 \end{pmatrix} → \begin{pmatrix} 1 & 0 & 2 \\ 0 & 3 & (1 − 4 =)2 \\ 0 & 4 & (0 − 2 =)3 \end{pmatrix}.

Next, let us multiply the second row with 3^{−1} = 2, and use the (2, 2) entry
as our next pivot:

    \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 4 \\ 0 & 4 & 3 \end{pmatrix} → \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 4 \\ 0 & 0 & (3 − 1 =)2 \end{pmatrix},

bringing it to row echelon form. After having done this, we can now also
easily compute

    det \begin{pmatrix} 1 & 0 & 2 \\ 2 & 3 & 1 \\ 1 & 4 & 0 \end{pmatrix} = 3 det \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 4 \\ 0 & 0 & 2 \end{pmatrix} = 3 · 2 = 1.

Alternatively, we can compute the determinant by expanding along (for
instance) the first row, giving

    det \begin{pmatrix} 1 & 0 & 2 \\ 2 & 3 & 1 \\ 1 & 4 & 0 \end{pmatrix} = 1 · (3 · 0 − 1 · 4) − 0 · (2 · 0 − 1 · 1) + 2 · (2 · 4 − 3 · 1) = 1.
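
Since the determinant is a polynomial with integer coefficients in the matrix entries, one can also compute it over the integers and reduce modulo 5 at the end. A short check with SymPy (our own sketch, assuming the library is available):

    from sympy import Matrix

    A = Matrix([[1, 0, 2], [2, 3, 1], [1, 4, 0]])
    # The integer determinant is 6; reducing mod 5 gives the Z_5 determinant.
    print(A.det() % 5)   # 1, as found above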

Example 1.4.4 Let F = Z3 . Find the set of all solutions to the system of
linear equations

    x1 + 2x2      = 0
    x1 + x2 + x3  = 1.

We set up the associated augmented system and put it in row reduced
echelon form:

    \left(\begin{array}{ccc|c} 1 & 2 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{array}\right) → \left(\begin{array}{ccc|c} 1 & 2 & 0 & 0 \\ 0 & 2 & 1 & 1 \end{array}\right) → \left(\begin{array}{ccc|c} 1 & 0 & 2 & 2 \\ 0 & 1 & 2 & 2 \end{array}\right).

We find that columns 1 and 2 are pivot columns, and column 3 is not, so x3
is a free variable, and we get the equalities
x1 = 2 − 2x3 = 2 + x3 , x2 = 2 − 2x3 = 2 + x3 . So we find that all solutions
are given by

    x = \begin{pmatrix} x1 \\ x2 \\ x3 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ 0 \end{pmatrix} + x3 \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},     x3 ∈ Z3 .
In a typical Linear Algebra I course, systems of linear equations would be
over the field of real numbers, and as soon as there was a free variable, one

would have infinitely many solutions. This is due to R being an infinite field.
In this example, though, we are dealing with a finite field, and thus when we
let x3 range over all elements of Z3 , we only get a finite number of solutions.
This will happen when dealing with any finite field. In this case, all solutions
are found by letting x3 = 0, 1, 2; thus we get that

    \begin{pmatrix} 2 \\ 2 \\ 0 \end{pmatrix},  \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},  \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}

are all solutions.
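
Because Z3 is finite, the solution set can even be found by brute force; the following sketch (plain Python, our own illustration) tries all 27 vectors in (Z3)^3 and recovers exactly the three solutions above.

    from itertools import product

    sols = [(x1, x2, x3)
            for x1, x2, x3 in product(range(3), repeat=3)
            if (x1 + 2 * x2) % 3 == 0 and (x1 + x2 + x3) % 3 == 1]
    print(sols)   # [(0, 0, 1), (1, 1, 2), (2, 2, 0)]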

Example 1.4.5 Let F = C. Determine whether b is a linear combination of
a1 , a2 , a3 , where

    a1 = \begin{pmatrix} 1+i \\ −1−i \\ 2 \\ 0 \end{pmatrix},  a2 = \begin{pmatrix} 0 \\ 2−i \\ −1+2i \\ 3 \end{pmatrix},  a3 = \begin{pmatrix} −1+i \\ 3−2i \\ −1+4i \\ 3 \end{pmatrix},  b = \begin{pmatrix} 2i \\ 2−3i \\ 1 \\ 3+i \end{pmatrix}.

We set up the augmented system and put it in echelon form:

    \begin{pmatrix} 1+i & 0 & −1+i & 2i \\ −1−i & 2−i & 3−2i & 2−3i \\ 2 & −1+2i & −1+4i & 1 \\ 0 & 3 & 3 & 3+i \end{pmatrix} →
    \begin{pmatrix} 1 & 0 & i & 1+i \\ −1−i & 2−i & 3−2i & 2−3i \\ 2 & −1+2i & −1+4i & 1 \\ 0 & 3 & 3 & 3+i \end{pmatrix} →
    \begin{pmatrix} 1 & 0 & i & 1+i \\ 0 & 2−i & 2−i & 2−i \\ 0 & −1+2i & −1+2i & −1−2i \\ 0 & 3 & 3 & 3+i \end{pmatrix} →
    \begin{pmatrix} 1 & 0 & i & 1+i \\ 0 & 1 & 1 & 1 \\ 0 & 1 & 1 & (−3+4i)/5 \\ 0 & 3 & 3 & 3+i \end{pmatrix} →
    \begin{pmatrix} 1 & 0 & i & 1+i \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.

As the augmented column has a pivot, b is not a linear combination of
a1 , a2 , a3 .
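
Numerically, the same conclusion can be reached by comparing ranks: b is a linear combination of a1 , a2 , a3 exactly when appending b to the matrix with columns a1 , a2 , a3 does not increase the rank. A quick check with NumPy (our own sketch, assuming the library is installed):

    import numpy as np

    a1 = np.array([1 + 1j, -1 - 1j, 2, 0])
    a2 = np.array([0, 2 - 1j, -1 + 2j, 3])
    a3 = np.array([-1 + 1j, 3 - 2j, -1 + 4j, 3])
    b  = np.array([2j, 2 - 3j, 1, 3 + 1j])

    A = np.column_stack([a1, a2, a3])
    print(np.linalg.matrix_rank(A),                        # 2
          np.linalg.matrix_rank(np.column_stack([A, b])))  # 3, so b is not in the span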

Example 1.4.6 Let F = Z5 . Compute the inverse of

    \begin{pmatrix} 1 & 0 & 2 \\ 2 & 3 & 1 \\ 1 & 4 & 0 \end{pmatrix}.

By Example 1.4.3 we know that this matrix is invertible, as every row and
column has a pivot (or equivalently, since its determinant is nonzero). Let us
compute the inverse:

    \left(\begin{array}{ccc|ccc} 1 & 0 & 2 & 1 & 0 & 0 \\ 2 & 3 & 1 & 0 & 1 & 0 \\ 1 & 4 & 0 & 0 & 0 & 1 \end{array}\right) → \left(\begin{array}{ccc|ccc} 1 & 0 & 2 & 1 & 0 & 0 \\ 0 & 3 & (1−4=)2 & (0−2=)3 & 1 & 0 \\ 0 & 4 & (0−2=)3 & (0−1=)4 & 0 & 1 \end{array}\right) →

    \left(\begin{array}{ccc|ccc} 1 & 0 & 2 & 1 & 0 & 0 \\ 0 & 1 & 4 & 1 & 2 & 0 \\ 0 & 0 & (3−1=)2 & (4−4=)0 & (0−3=)2 & 1 \end{array}\right) →

    \left(\begin{array}{ccc|ccc} 1 & 0 & 2 & 1 & 0 & 0 \\ 0 & 1 & 4 & 1 & 2 & 0 \\ 0 & 0 & 1 & 0 & 1 & 3 \end{array}\right) → \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & (0−2=)3 & (0−1=)4 \\ 0 & 1 & 0 & 1 & (2−4=)3 & (0−2=)3 \\ 0 & 0 & 1 & 0 & 1 & 3 \end{array}\right),

so the inverse is

    \begin{pmatrix} 1 & 3 & 4 \\ 1 & 3 & 3 \\ 0 & 1 & 3 \end{pmatrix}.

Computing the product

    \begin{pmatrix} 1 & 0 & 2 \\ 2 & 3 & 1 \\ 1 & 4 & 0 \end{pmatrix} \begin{pmatrix} 1 & 3 & 4 \\ 1 & 3 & 3 \\ 0 & 1 & 3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},

we see that we computed the inverse correctly.

Example 1.4.7 Let F = C. Find bases of the column space, row space and
null space of the matrix

    A = \begin{pmatrix} i & 1−i & 2−i \\ 1+i & −2 & −3+i \\ 1−i & 1+2i & 3+3i \end{pmatrix}.

Let us put A in echelon form:

    \begin{pmatrix} i & 1−i & 2−i \\ 1+i & −2 & −3+i \\ 1−i & 1+2i & 3+3i \end{pmatrix} → \begin{pmatrix} 1 & −1−i & −1−2i \\ 0 & −2+2i & −4+4i \\ 0 & 3+2i & 6+4i \end{pmatrix} → \begin{pmatrix} 1 & −1−i & −1−2i \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}.

There are pivots in columns 1 and 2, and thus we find that

    \left\{ \begin{pmatrix} i \\ 1+i \\ 1−i \end{pmatrix}, \begin{pmatrix} 1−i \\ −2 \\ 1+2i \end{pmatrix} \right\}

is a basis for ColA. Next, for a basis of RowA we
simply have to pick the nonzero rows of the row echelon form of A, and thus
we find that

    \{ (1  −1−i  −1−2i), (0  1  2) \}

is a basis for RowA. To find a basis for the null space, we put A in row
reduced echelon form:

    \begin{pmatrix} 1 & −1−i & −1−2i \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix} → \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}.

As there is no pivot in column 3, x3 is a free variable. From

    \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} x1 \\ x2 \\ x3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}

we find x1 = −x3 and x2 = −2x3 . Thus

    x = \begin{pmatrix} x1 \\ x2 \\ x3 \end{pmatrix} = x3 \begin{pmatrix} −1 \\ −2 \\ 1 \end{pmatrix},

yielding that \left\{ \begin{pmatrix} −1 \\ −2 \\ 1 \end{pmatrix} \right\} is a basis for the null space of A. It is easily checked
that

    \begin{pmatrix} i & 1−i & 2−i \\ 1+i & −2 & −3+i \\ 1−i & 1+2i & 3+3i \end{pmatrix} \begin{pmatrix} −1 \\ −2 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
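
As a numerical cross-check (our own sketch, assuming NumPy), one can verify that the vector found above is indeed in the null space and that A has rank 2:

    import numpy as np

    A = np.array([[1j, 1 - 1j, 2 - 1j],
                  [1 + 1j, -2, -3 + 1j],
                  [1 - 1j, 1 + 2j, 3 + 3j]])
    x = np.array([-1, -2, 1])         # basis vector of Ker A found above
    print(np.allclose(A @ x, 0))      # True
    print(np.linalg.matrix_rank(A))   # 2, so dim Ker A = 3 - 2 = 1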

Let A ∈ F^{n×n} be a square matrix. Recall that λ ∈ F is an eigenvalue of A, if
there exists a nonzero vector x ∈ F^n so that Ax = λx. Such a vector x ≠ 0 is
called an eigenvector of A at the eigenvalue λ. Rewriting Ax = λx as
(A − λIn )x = 0, one sees that for λ to be an eigenvalue of A, one needs that
A − λIn is singular, and thus det(A − λIn ) = 0. The null space Ker(A − λIn )
of A − λIn is called the eigenspace of A at λ, and consists of all the
eigenvectors of A at λ and the zero vector.

Example 1.4.8 Let F = Z7 . Find a basis for the eigenspace of
A = \begin{pmatrix} 4 & 0 & 6 \\ 3 & 0 & 3 \\ 2 & 5 & 5 \end{pmatrix} corresponding to the eigenvalue λ = 3.

We have to find a basis for the null space of A − 3I = \begin{pmatrix} 1 & 0 & 6 \\ 3 & 4 & 3 \\ 2 & 5 & 2 \end{pmatrix}, so we
put A − 3I in row-reduced echelon form:

    \begin{pmatrix} 1 & 0 & 6 \\ 3 & 4 & 3 \\ 2 & 5 & 2 \end{pmatrix} → \begin{pmatrix} 1 & 0 & 6 \\ 0 & 4 & (3 − 4 =)6 \\ 0 & 5 & (2 − 5 =)4 \end{pmatrix} → \begin{pmatrix} 1 & 0 & 6 \\ 0 & 1 & 5 \\ 0 & 0 & (4 − 4 =)0 \end{pmatrix}.

We find that x3 is a free variable, and x1 = −6x3 = x3 , x2 = −5x3 = 2x3 ,
leading to the basis \left\{ \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \right\}.

Let us do a check: A \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 & 0 & 6 \\ 3 & 0 & 3 \\ 2 & 5 & 5 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 6 \\ 3 \end{pmatrix} = 3 \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, confirming that
\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} is an eigenvector of A corresponding to the eigenvalue λ = 3.
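
The check at the end can also be done with integer arithmetic and a reduction mod 7 (a small NumPy sketch of ours):

    import numpy as np

    A = np.array([[4, 0, 6], [3, 0, 3], [2, 5, 5]])
    v = np.array([1, 2, 1])
    print((A @ v) % 7, (3 * v) % 7)   # [3 6 3] [3 6 3], so Av = 3v in Z_7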

1.4.1 Reminders about Cramer’s rule and the adjugate matrix

Let F be a field, and let the n × n matrix A ∈ F^{n×n} and vector b ∈ F^n be given.
Let ai denote the ith column of A. Now we define

    Ai (b) := ( a1 · · · ai−1  b  ai+1 · · · an ),     i = 1, . . . , n.

Thus Ai (b) is the matrix obtained from A by replacing its ith column by b.

We now have the following result.

Theorem 1.4.9 (Cramer’s rule) Let A ∈ F^{n×n} be invertible. For any
b ∈ F^n , the unique solution x = (xi )ni=1 to the equation Ax = b has entries
given by

    xi = det Ai (b) (det A)^{−1} ,     i = 1, . . . , n.                (1.8)

Proof. We denote the columns of the n × n identity matrix I by e1 , . . . , en .
Let us compute

    A Ii (x) = A ( e1 · · · ei−1  x  ei+1 · · · en ) = ( Ae1 · · · Aei−1  Ax  Aei+1 · · · Aen ) = Ai (b).

But then, using the multiplicativity of the determinant, we get
det A det Ii (x) = det Ai (b). It is easy to see that det Ii (x) = xi , and (1.8)
follows. 

Example 1.4.10 Let F = Z3 . Find the solution to the system of linear
equations

    x1 + 2x2 = 0
    x1 + x2  = 1.

Applying Cramer’s rule, we get

    x1 = det \begin{pmatrix} 0 & 2 \\ 1 & 1 \end{pmatrix} \left(det \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}\right)^{−1} = 1 · 2^{−1} = 2,

    x2 = det \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \left(det \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}\right)^{−1} = 1 · 2^{−1} = 2.

Checking the answer (2 + 2 · 2 = 0, 2 + 2 = 1) confirms that the answer is
correct.
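
Here is a small Python sketch (ours) carrying out the same Cramer computation: the 2 × 2 determinants are taken over the integers and reduced mod 3, and the inverse of det A is obtained with pow(·, −1, 3).

    def det2(M):   # determinant of a 2x2 matrix, reduced mod 3
        return (M[0][0] * M[1][1] - M[0][1] * M[1][0]) % 3

    A, b = [[1, 2], [1, 1]], [0, 1]
    dA_inv = pow(det2(A), -1, 3)                       # (det A)^{-1} = 2
    x1 = det2([[b[0], 2], [b[1], 1]]) * dA_inv % 3     # 2
    x2 = det2([[1, b[0]], [1, b[1]]]) * dA_inv % 3     # 2
    print(x1, x2)                                      # 2 2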

While Cramer’s rule provides a direct formula to solve a system of linear


equations (when the coefficient matrix is invertible), in many ways it is much
better to solve a system of linear equations via row reduction as the latter
requires in general fewer algebraic operations. Cramer’s rule can be useful
for more theoretical considerations. Here is such an example.

Example 1.4.11 Let F = C. Consider the matrix vector equation Ax = b
given by

    \begin{pmatrix} i & 1−i & 2 \\ 1+i & α & 0 \\ 1−i & 1+2i & 3+5i \end{pmatrix} \begin{pmatrix} x1 \\ x2 \\ x3 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \\ 5i \end{pmatrix}.

Find all α ∈ C so that A is invertible and x2 is real.

Applying Cramer’s rule, we get

    x2 = det \begin{pmatrix} i & 2 & 2 \\ 1+i & 0 & 0 \\ 1−i & 5i & 3+5i \end{pmatrix} \left(det \begin{pmatrix} i & 1−i & 2 \\ 1+i & α & 0 \\ 1−i & 1+2i & 3+5i \end{pmatrix}\right)^{−1}.

Expanding along the second row we obtain

    x2 = −(1 + i)(2(3 + 5i) − 2(5i)) / (−(1 + i)((1 − i)(3 + 5i) − 2(1 + 2i)) + α(i(3 + 5i) − 2(1 − i)))
       = (−6 − 6i) / (−8 − 4i + α(−7 + 5i)).

For det A ≠ 0, we need α ≠ (−8 − 4i)/(7 − 5i) = −18/37 − (34/37)i. Next, notice that x2 cannot
equal 0, so we may write −8 − 4i + α(−7 + 5i) = (−6 − 6i)/x2 . Let t = 1/x2 , and
arrive at

    α = (1/(−7 + 5i)) (8 + 4i − (6 + 6i)t) = −18/37 − (34/37)i + (6/37 + (36/37)i) t,     t ∈ R \ {0},

as the set of solutions for α.
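
One can spot-check this parametrization numerically: pick a real t ≠ 0, form α from the formula above, and solve the system; x2 should come out real and equal to 1/t. A NumPy sketch (ours, assuming the library is installed):

    import numpy as np

    t = 2.0
    alpha = -18/37 - 34j/37 + (6/37 + 36j/37) * t
    A = np.array([[1j, 1 - 1j, 2],
                  [1 + 1j, alpha, 0],
                  [1 - 1j, 1 + 2j, 3 + 5j]])
    b = np.array([2, 0, 5j])
    x = np.linalg.solve(A, b)
    print(x[1])   # approximately 0.5+0j, i.e., real and equal to 1/t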

Let A = (aij )ni,j=1 ∈ F^{n×n} be given. We let Aij ∈ F^{(n−1)×(n−1)} be the matrix
obtained from A by removing the ith row and the jth column, and we put

    Cij = (−1)^{i+j} det Aij ,     i, j = 1, . . . , n.

The number Cij is called the (i, j)th cofactor of A. Given

    A = \begin{pmatrix} a11 & a12 & \cdots & a1n \\ a21 & a22 & \cdots & a2n \\ \vdots & \vdots & \ddots & \vdots \\ an1 & an2 & \cdots & ann \end{pmatrix},

the adjugate of A is defined by

    adj(A) = \begin{pmatrix} C11 & C21 & \cdots & Cn1 \\ C12 & C22 & \cdots & Cn2 \\ \vdots & \vdots & \ddots & \vdots \\ C1n & C2n & \cdots & Cnn \end{pmatrix}.          (1.9)

Thus the (i, j)th entry of adj(A) is Cji (notice the switch in the indices!).

Example 1.4.12 Let F = Z5 . Compute the adjugate of

    A = \begin{pmatrix} 1 & 0 & 2 \\ 2 & 3 & 1 \\ 1 & 4 & 0 \end{pmatrix}.

We get

    adj(A) = \begin{pmatrix} 3 · 0 − 1 · 4 & −0 · 0 + 2 · 4 & 0 · 1 − 2 · 3 \\ −2 · 0 + 1 · 1 & 1 · 0 − 2 · 1 & −1 · 1 + 2 · 2 \\ 2 · 4 − 3 · 1 & −1 · 4 + 0 · 1 & 1 · 3 − 0 · 2 \end{pmatrix} = \begin{pmatrix} 1 & 3 & 4 \\ 1 & 3 & 3 \\ 0 & 1 & 3 \end{pmatrix}.

The usefulness of the adjugate matrix is given by the following result.

Theorem 1.4.13 Let A ∈ F^{n×n} . Then

    A adj(A) = (det A)In = adj(A) A.                                    (1.10)

In particular, if det A ≠ 0, then

    A^{−1} = (det A)^{−1} adj(A).                                       (1.11)

Proof. As before, we let ai denote the ith column of A. Consider Ai (aj ),
which is the matrix A with the ith column replaced by aj . Thus, when i ≠ j
we have that Ai (aj ) has two identical columns (namely the ith and the jth)
and thus det Ai (aj ) = 0, i ≠ j. When i = j, then Ai (aj ) = A, and thus
det Ai (aj ) = det A, i = j. Computing the (i, j)th entry of the product
adj(A) A, we get

    (adj(A) A)ij = \sum_{k=1}^{n} Cki akj = det Ai (aj ) = \begin{cases} det A & \text{if } i = j \\ 0 & \text{if } i ≠ j, \end{cases}

where we expanded det Ai (aj ) along the ith column. This proves the second
equality in (1.10). The proof of the first equality in (1.10) is similar. 

Notice that if we apply (1.11) to a 2 × 2 matrix, we obtain the familiar
formula

    \begin{pmatrix} a & b \\ c & d \end{pmatrix}^{−1} = \frac{1}{ad − bc} \begin{pmatrix} d & −b \\ −c & a \end{pmatrix}.

In Example 1.4.3 we have det A = 1, so the adjugate matrix (which we


computed in Example 1.4.12) equals in this case the inverse, confirming the
computation in Example 1.4.6.
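
The identities (1.10) and (1.11), and the computations of Examples 1.4.3, 1.4.6 and 1.4.12, can be confirmed with SymPy (our own sketch, assuming the library is available; cofactors are integer expressions in the entries, so reducing mod 5 at the end is legitimate):

    from sympy import Matrix, eye

    A = Matrix([[1, 0, 2], [2, 3, 1], [1, 4, 0]])
    adjA = A.adjugate()                     # integer adjugate
    print(adjA.applyfunc(lambda x: x % 5))  # Matrix([[1, 3, 4], [1, 3, 3], [0, 1, 3]])

    # A * adj(A) = det(A) * I entrywise, also after reducing mod 5:
    lhs = (A * adjA).applyfunc(lambda x: x % 5)
    rhs = (A.det() % 5) * eye(3)
    print(lhs == rhs)                       # True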

1.5 Exercises

Exercise 1.5.1 The set of integers Z with the usual addition and
multiplication is not a field. Which of the field axioms does Z satisfy, and
which one(s) are not satisfied?

Exercise 1.5.2 Write down the addition and multiplication tables for Z2
and Z5 . How is commutativity reflected in the tables?

Exercise 1.5.3 The addition and multiplication defined in (1.4) also works
when p is not prime. Write down the addition and multiplication tables for
Z4 . How can you tell from the tables that Z4 is not a field?

Exercise 1.5.4 Solve Bezout’s identity for the following choices of a and b:

(i) a = 25 and b = 7;

(ii) a = −50 and b = 3.

Exercise 1.5.5 In this exercise we are working in the field Z3 .

(i) 2 + 2 + 2 =
(ii) 2(2 + 2)−1 =
(iii) Solve for x in 2x + 1 = 2.
 
(iv) Find det \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix}.

(v) Compute \begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 2 & 1 \end{pmatrix}.

(vi) Find \begin{pmatrix} 2 & 0 \\ 1 & 1 \end{pmatrix}^{−1}.

Exercise 1.5.6 In this exercise we are working in the field Z5 .

(i) 4 + 3 + 2 =
(ii) 4(1 + 2)−1 =
(iii) Solve for x in 3x + 1 = 3.
 
(iv) Find det \begin{pmatrix} 4 & 2 \\ 1 & 0 \end{pmatrix}.

(v) Compute \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 2 & 1 \end{pmatrix}.

(vi) Find \begin{pmatrix} 2 & 2 \\ 4 & 3 \end{pmatrix}^{−1}.

Exercise 1.5.7 In this exercise we are working in the field C. Make sure
you write the final answers in the form a + bi, with a, b ∈ R. For instance,
(1 + i)/(2 − i) should not be left as a final answer, but be reworked as

    (1 + i)/(2 − i) = ((1 + i)/(2 − i)) ((2 + i)/(2 + i)) = (2 + i + 2i + i^2)/(2^2 + 1^2) = (1 + 3i)/5 = 1/5 + (3/5)i.

Notice that in order to get rid of i in the denominator, we decided to
multiply both numerator and denominator with the complex conjugate of
the denominator.

(i) (1 + 2i)(3 − 4i) − (7 + 8i) =

(ii) (1 + i)/(3 + 4i) =

(iii) Solve for x in (3 + i)x + 6 − 5i = −3 + 2i.

(iv) Find det \begin{pmatrix} 4+i & 2−2i \\ 1+i & −i \end{pmatrix}.

(v) Compute \begin{pmatrix} −1+i & 2+2i \\ −3i & −6+i \end{pmatrix} \begin{pmatrix} 0 & 1−i \\ −5+4i & 1−2i \end{pmatrix}.

(vi) Find \begin{pmatrix} 2+i & 2−i \\ 4 & 4 \end{pmatrix}^{−1}.

Exercise 1.5.8 Here the field is R(t). Find the inverse of the matrix
 1 
2 + 3t t2 +2t+1
3t−4 ,
t+1 1+t

if it exists.

Exercise 1.5.9 Let F = Z3 . Compute the product

    \begin{pmatrix} 1 & 1 & 0 \\ 2 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 2 \\ 1 & 2 & 1 \\ 2 & 0 & 1 \end{pmatrix}.

Exercise 1.5.10 Let F = C. Compute the product

    \begin{pmatrix} 2−i & 2+i \\ 2−i & −10 \end{pmatrix} \begin{pmatrix} 5+i & 6−i \\ 1−i & 2+i \end{pmatrix}.

Exercise 1.5.11 Let F = Z5 . Put the matrix

    \begin{pmatrix} 3 & 1 & 4 \\ 2 & 1 & 0 \\ 2 & 2 & 1 \end{pmatrix}

in row echelon form, and compute its determinant.

Exercise 1.5.12 Let F = Z3 . Find the set of all solutions to the system of
linear equations

    2x1 + x2       = 1
    2x1 + 2x2 + x3 = 0.

Exercise 1.5.13 Let F = C. Determine whether b is a linear combination
of a1 , a2 , a3 , where

    a1 = \begin{pmatrix} i \\ 1−i \\ 2−i \\ 1 \end{pmatrix},  a2 = \begin{pmatrix} 0 \\ 3+i \\ −1+i \\ −3 \end{pmatrix},  a3 = \begin{pmatrix} −i \\ 2+2i \\ −3+2i \\ 3 \end{pmatrix},  b = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.

Exercise 1.5.14 Let F = Z5 . Compute the inverse of

    \begin{pmatrix} 2 & 3 & 1 \\ 1 & 4 & 1 \\ 1 & 1 & 2 \end{pmatrix}

in two different ways (row reduction and by applying (1.11)).

Exercise 1.5.15 Let F = C. Find bases of the column space, row space and
null space of the matrix

    A = \begin{pmatrix} 1 & 1+i & 2 \\ 1+i & 2i & 3+i \\ 1−i & 2 & 3+5i \end{pmatrix}.

Exercise 1.5.16 Let F = Z7 . Find a basis for the eigenspace of
A = \begin{pmatrix} 3 & 5 & 0 \\ 4 & 6 & 5 \\ 2 & 2 & 4 \end{pmatrix} corresponding to the eigenvalue λ = 1.

Exercise 1.5.17 Let F = Z3 . Use Cramer’s rule to find the solution to the
system of linear equations

    2x1 + 2x2 = 1
    x1 + 2x2  = 1.

Exercise 1.5.18 Let F = C. Consider the matrix vector equation Ax = b
given by

    \begin{pmatrix} i & 1−i & 2 \\ 1+i & α & 0 \\ 1−i & 1+2i & 3+5i \end{pmatrix} \begin{pmatrix} x1 \\ x2 \\ x3 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \\ 5i \end{pmatrix}.

Determine α ∈ C so that A is invertible and x1 = x2 .

Exercise 1.5.19 Let F = R(t). Compute the adjugate of


 1
2 + t2 2 − t

t
2
A =  1+t 3t 1 − t .
2
1 4+t 0

Exercise 1.5.20 Recall that the trace of a square matrix is defined to be the
sum of its diagonal entries. Thus tr[(aij )ni,j=1 ] = a11 + · · · + ann = \sum_{j=1}^{n} ajj .

(a) Show that if A ∈ Fn×m and B ∈ Fm×n , then tr(AB) = tr(BA).


(b) Show that if A ∈ Fn×m , B ∈ Fm×k , and C ∈ Fk×n , then
tr(ABC) = tr(CAB) = tr(BCA).

(c) Give an example of matrices A, B, C ∈ F^{n×n} so that
tr(ABC) ≠ tr(BAC).

Exercise 1.5.21 Let A, B ∈ Fn×n . The commutator [A, B] of A and B is


defined by [A, B] := AB − BA.

(a) Show that tr([A, B]) = 0.


(b) Show that when n = 2, we have that [A, B]2 = − det([A, B])I2 .
(c) Show that if C ∈ Fn×n as well, then tr(C[A, B]) = tr([B, C]A).

The following two exercises provide a very introductory illustration of how


finite fields may be used in coding. To learn more, please look for texts on
linear coding theory.

Exercise 1.5.22 The 10-digit ISBN number makes use of the field
Z11 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, X} (notice that X is the roman numeral for
10). The first digit(s) present the group. For English-speaking countries, the
first digit is a 0 or a 1. The next set of digits represents the publisher. For
instance, Princeton University Press has the digits 691. Some of the bigger
publishers have a 2-digit publisher code, leaving them more digits for their
titles. The next set of digits represent the specific title. Finally, the last digit
of the 10-digit ISBN number a1 a2 . . . a10 is a check digit, which needs to
satisfy the equation

a1 + 2a2 + 3a3 + · · · + 9a9 + 10a10 = 0

in Z11 . For instance, 0691128898 is the 10-digit ISBN of Matrix Completions,


Moments, and Sums of Hermitian Squares by Mihály Bakonyi and Hugo J.
Woerdeman. Indeed, we have

1 · 0 + 2 · 6 + 3 · 9 + 4 · 1 + 5 · 1 + 6 · 2 + 7 · 8 + 8 · 8 + 9 · 9 + 10 · 8 = rem(341|11) = 0.

Check that the 10-digit ISBN number 3034806388 has a correct check digit.

Exercise 1.5.23 A not so secure way to convert a secret message is to


replace letters by numbers, e.g., AWESOME = 1 23 5 19 15 13 5. Whatever
numbers are chosen with the letters, knowing that it corresponds to an
English text, one can use general information about English (such as that
“E” is the letter that appears most and “Z” is the letter that appears least),
to crack the code. What will make cracking the code more challenging is to
use a matrix to convert the list of numbers. We are going to work with Z29
and 29 characters, with 0 standing for “space”, the numbers 1–26 standing
for the letters A–Z, the number 27 standing for “period,” and 28 standing
for “comma.” Thus, for example,
Wow, he said. ⇔ 23 15 23 28 0 8 5 0 19 1 9 4 27
Next we are going to use 3 × 3 matrices in Z29 to convert the code as
follows. Letting

    A = \begin{pmatrix} 2 & 1 & 6 \\ 2 & 0 & 10 \\ 11 & 2 & 3 \end{pmatrix},

we can take the first three numbers in the sequence, put them in a vector,
multiply it by the matrix A, and convert them back to characters:

    A \begin{pmatrix} 23 \\ 15 \\ 23 \end{pmatrix} = \begin{pmatrix} 25 \\ 15 \\ 4 \end{pmatrix},     25 15 4 ⇔ YOD.

If we do this for the whole sentence, putting the numbers in groups of three,
adding spaces (=0) at the end to make sure we have a multiple of three, we
have that “Wow, he said. ” (notice the two spaces at the end) converts to
“YODQTMHZYFMLYYG.” In order to decode, one performs the same algorithm with
$$A^{-1} = \begin{pmatrix} 9 & 9 & 10 \\ 17 & 27 & 21 \\ 4 & 7 & 27 \end{pmatrix}.$$
Decode the word “ZWNOWQJJZ.”
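The encoding step described above is easy to script. The sketch below (an illustration of the scheme in this exercise; the function and variable names are my own) reproduces the “Wow, he said. ” → “YODQTMHZYFMLYYG” example; decoding works the same way with A^{-1} in place of A, which you can use to check your answer.

```python
import numpy as np

ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ.,"      # indices 0..28, as in the exercise

A = np.array([[2, 1, 6], [2, 0, 10], [11, 2, 3]])
A_inv = np.array([[9, 9, 10], [17, 27, 21], [4, 7, 27]])   # inverse of A mod 29

def apply_blocks(M, text):
    nums = [ALPHABET.index(ch) for ch in text.upper()]
    nums += [0] * (-len(nums) % 3)                # pad with spaces to a multiple of 3
    out = []
    for k in range(0, len(nums), 3):
        block = np.array(nums[k:k + 3])
        out.extend((M @ block) % 29)
    return "".join(ALPHABET[n] for n in out)

print(apply_blocks(A, "Wow, he said. "))          # expected: YODQTMHZYFMLYYG
print(apply_blocks(A_inv, "YODQTMHZYFMLYYG"))     # recovers WOW, HE SAID. (plus padding)
```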

Exercise 1.5.24 (Honors) The field axioms imply several things that one
might take for granted but that really require a formal proof. In this
exercise, we address the uniqueness of the neutral element and the inverse.

Claim In a field there is a unique neutral element of addition.

Proof. Suppose that both 0 and 0′ satisfy Axiom 4. Thus 0 + x = x = x + 0 and 0′ + x = x = x + 0′ hold for every x. Then 0 = 0 + 0′ = 0′, proving the uniqueness. □

(i) Prove the uniqueness of the neutral element of multiplication.



(ii) Prove uniqueness of the additive inverse. To do this, one needs to show
that if x + y = 0 = x + z, it implies that y = z. Of course, it is tempting
to just remove the x’s from the equation x + y = x + z (as you are used
to), but the exact purpose of this exercise is to make you aware that
these familiar rules need to be reproven by exclusively using the field
axioms. So use exclusively the field axioms to fill in the blanks:

y = y + 0 = y + (x + z) = · · · = · · · = · · · = z.

Exercise 1.5.25 (Honors) Let F be a field, and K ⊆ F. Show that K is a


subfield of F if and only if

(i) 0, 1 ∈ K, and
(ii) x, y ∈ K implies x + y, xy, −x belong to K, and when x ≠ 0, x^{-1} also belongs to K.

Exercise 1.5.26 (Honors) Let $Q + Q\sqrt{2} := \{a + b\sqrt{2} : a, b \in Q\}$. So $Q + Q\sqrt{2}$ contains elements such as
$$-\frac{5}{6} + \frac{\sqrt{2}}{2} \qquad \text{and} \qquad \frac{1}{2+3\sqrt{2}} = \frac{1}{2+3\sqrt{2}} \cdot \frac{2-3\sqrt{2}}{2-3\sqrt{2}} = -\frac{1}{7} + \frac{3}{14}\sqrt{2}.$$
Show that $Q + Q\sqrt{2}$ is a subfield of R.

Exercise 1.5.27 (Honors) Let


$$A = \{z \in C : \text{there exist } n \in N \text{ and } a_0, \ldots, a_n \in Z \text{ so that } \sum_{k=0}^{n} a_k z^k = 0\}.$$
In other words, A consists of all roots of polynomials with integer coefficients (also known as algebraic numbers). Numbers such as $\sqrt[3]{2} - 5$, $\cos(\pi/7)$, and $5 - i\sqrt{3}$ belong to A. The numbers π and e do not belong to A (such numbers are called transcendental).

Formulate the statements about polynomials and their roots that would need
to be proven to show that A is closed under addition and multiplication. It
turns out that A is a subfield of C, and you are welcome to look up the proof.
2
Vector Spaces

CONTENTS
2.1 Definition of a vector space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2 Vector spaces of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2.1 The special case when X is finite . . . . . . . . . . . . . . . . . . . . . . . 31
2.3 Subspaces and more examples of vector spaces . . . . . . . . . . . . . . . . . . 32
2.3.1 Vector spaces of polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.3.2 Vector spaces of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.4 Linear independence, span, and basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.5 Coordinate systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

The foundation for linear algebra is the notion of a vector space over a field.
Two operations are important in a vector space (i) addition: any two
elements in a vector space can be added together; (ii) multiplication by a
scalar: an element in a vector space can be multiplied by a scalar (= an
element of the field). Anytime one has mathematical objects where these two
operations are well-defined and satisfy some basic properties, one has a
vector space. Allowing this generality and developing a theory that just uses
these basic rules, leads to results that can be applied in many settings.

2.1 Definition of a vector space

A vector space over a field F is a set V along with two operations

+ : V × V → V, · : F × V → V

satisfying the following rules:

1. Closure of addition: for all u, v ∈ V we have that u + v ∈ V .


2. Associativity of addition: for all u, v, w ∈ V we have that


(u + v) + w = u + (v + w).
3. Commutativity of addition: for all u, v ∈ V we have that u + v = v + u.
4. Existence of a neutral element for addition: there exists a 0 ∈ V so that
u + 0 = u = 0 + u for all u ∈ V .
5. Existence of an additive inverse: for every u ∈ V there exists a −u ∈ V
so that u + (−u) = 0 = (−u) + u.
6. Closure of scalar multiplication: for all c ∈ F and u ∈ V we have that
cu ∈ V .
7. First distributive law: for all c ∈ F and u, v ∈ V we have that
c(u + v) = cu + cv.
8. Second distributive law: for all c, d ∈ F and u ∈ V we have that
(c + d)u = cu + du.
9. Associativity for scalar multiplication: for all c, d ∈ F and u ∈ V we have
that c(du) = (cd)u.
10. Unit multiplication rule: for every u ∈ V we have that 1u = u.

These axioms imply several rules that seem “obvious,” but as all properties
in vector spaces have to be traced back to the axioms, we need to reprove
these obvious rules. Here are two such examples.

Lemma 2.1.1 Let V be a vector space over F. Then for all u ∈ V we have
that

(i) 0u = 0.
(ii) (−1)u = −u.

Proof. (i) As 0u ∈ V , we have that 0u has an additive inverse; call it v. Then

0 = 0u + v = (0 + 0)u + v = (0u + 0u) + v = 0u + (0u + v) = 0u + 0 = 0u.

For (ii) we observe

−u = 0 + (−u) = 0u + (−u) = ((−1) + 1)u + (−u) = ((−1)u + 1u) + (−u) =

(−1)u + (1u + (−u)) = (−1)u + (u + (−u)) = (−1)u + 0 = (−1)u.




2.2 Vector spaces of functions

The set of all functions from a set X to a field F is denoted by FX . Thus

FX := {f : X → F : f is a function}.

When f, g : X → F we can define the sum of f and g as the function

f + g : X → F, (f + g)(x) = f (x) + g(x).

Thus, by virtue that F has a well-defined addition, the set FX now also has a
well-defined addition. It is a fine point, but it is important to recognize that
in the equation
(f + g)(x) = f (x) + g(x)
the first + sign represents addition between functions, while the second +
sign represents addition in F, so really the two +s are different. We still
choose to use the same + sign for both, although technically we could have
made them different (+FX and +F , say) and written

(f +FX g)(x) = f (x) +F g(x).

Next, it is also easy to define the scalar multiplication on FX as follows.


Given c ∈ F and f : X → F, we define the function cf via

cf : X → F, (cf )(x) = c(f (x)).

Again, let us make the fine point that there are two different multiplications
here, namely the multiplication of a scalar (i.e., an element of F) with a
function and the multiplication of two scalars. Again, if we want to highlight
this difference, one would write this for instance as

(c ·FX f )(x) = c ·F f (x).

We now have the following claim.

Proposition 2.2.1 The set FX with the above definitions of addition and
scalar multiplication is a vector space over the field F.

Checking that all the vector space axioms is not hard. For instance, to check
commutativity of addition, we have to show that f + g = g + f . This

introduces the question: When are two functions equal? The answer to this
is:

Two functions h, k : X → F are equal if and only if for all x ∈ X:


h(x) = k(x).

Thus, to show that f + g = g + f , we simply need to show that for all x ∈ X


we have that (f + g)(x) = (g + f )(x). The proof of this is:

(f +FX g)(x) = f (x) +F g(x) = g(x) +F f (x) = (g +FX f )(x) for all x ∈ X,

where in the first and third equality we applied the definition of the sum of
two functions, while in the middle equality we applied commutativity of
addition in F.

Important to realize is what the neutral element of addition in FX is: it is a


function, and when added to another function it should not change the other
function. This gives:

the function 0 : X → F defined via 0(x) = 0 for all x ∈ X, is the neutral


element in FX .

Notice that again we have two different mathematical objects: the constant
zero function (= the neutral element of addition in FX ) and the neutral
element of addition in F. If we want to highlight this difference, one would
write for instance:
0FX (x) = 0F for all x ∈ X.

For the inverse element for addition in FX , we have similar considerations.


Given f : X → F,

the additive inverse is the function −f : X → F defined via


(−f )(x) = −f (x), x ∈ X.

The two minuses are different, which can be highlighted by writing

(−FX f )(x) = −F f (x).

Now all the ingredients are there to write a complete proof of Proposition 2.2.1. We already showed how to address the commutativity of addition, and
as the proofs of the other rules are similar, we will leave them to the reader.

2.2.1 The special case when X is finite

The case when X is a finite set is special in the sense that in this case we
can simply write out all the values of the function. For instance, if
X = {1, . . . , n}, then the function f : X → F simply corresponds to choosing
elements f (1), . . . , f (n) ∈ F. Thus we can identify
 
$$f : \{1, \ldots, n\} \to F \quad \Leftrightarrow \quad \begin{pmatrix} f(1) \\ \vdots \\ f(n) \end{pmatrix} \in F^n.$$

When f, g : {1, . . . , n} → F, the sum function is defined by

(f + g)(1) = f (1) + g(1), . . . , (f + g)(n) = f (n) + g(n),

which in the notation above amounts to
$$f + g : \{1, \ldots, n\} \to F \quad \Leftrightarrow \quad \begin{pmatrix} f(1)+g(1) \\ \vdots \\ f(n)+g(n) \end{pmatrix} \in F^n.$$
This corresponds to the definition of adding elements in F^n:
$$\begin{pmatrix} f(1) \\ \vdots \\ f(n) \end{pmatrix} + \begin{pmatrix} g(1) \\ \vdots \\ g(n) \end{pmatrix} = \begin{pmatrix} f(1)+g(1) \\ \vdots \\ f(n)+g(n) \end{pmatrix}.$$
Similarly, for scalar multiplication we have
$$cf : \{1, \ldots, n\} \to F \quad \Leftrightarrow \quad \begin{pmatrix} cf(1) \\ \vdots \\ cf(n) \end{pmatrix} \in F^n.$$
This corresponds to scalar multiplication of elements in F^n:
$$c \begin{pmatrix} f(1) \\ \vdots \\ f(n) \end{pmatrix} = \begin{pmatrix} cf(1) \\ \vdots \\ cf(n) \end{pmatrix}.$$

Thus when we deal with a function f : X → F with X a finite set with n


elements, the vector space FX corresponds exactly to the vector space Fn .
Clearly, when X has n elements, it does not have to equal the set
X = {1, . . . , n}, however, it will be our default choice. Sometimes, though, it
may be convenient to use X = {0, . . . , n − 1} instead.

As a final remark in this subsection, we note that typically we write a vector


x ∈ F^n as
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},$$
instead of the function notation. Of course, this is just a notational choice.
Conceptually, we can still think of this vector as representing a function on a
finite set of n elements.

2.3 Subspaces and more examples of vector spaces

Let V be a vector space over a field F, and let W ⊆ V . When w, y ∈ W , then as


W is a subset of V , we also have w, y ∈ V , and thus w + y is well-defined.
In addition, when c ∈ F and w ∈ W ⊆ V , then cw is well-defined. Thus we
can consider the question whether W with the operations as defined on V , is
itself a vector space. If so, we call W a subspace of V .

Proposition 2.3.1 Given a vector space V over a field F, and W ⊆ V , then


W is a subspace of V if and only if

(i) 0 ∈ W .
(ii) W is closed under addition: for all w, y ∈ W , we have w + y ∈ W .
(iii) W is closed under scalar multiplication: for all c ∈ F and w ∈ W , we
have that cw ∈ W .

Proof. If W is a vector space, then (i), (ii) and (iii) are clearly satisfied.

For the converse, we need to check that when W satisfies (i), (ii) and (iii), it
satisfies all ten axioms in the definition of a vector space. Clearly properties
(i), (ii) and (iii) above take care of axioms 1, 4 and 6 in the definition of a
vector space. Axiom 5 follows from (iii) in combination with Lemma
2.1.1(ii). The other properties (associativity, commutativity, distributivity,
unit multiplication) are satisfied as they hold for all elements of V , and thus
also for elements of W . 

In Proposition 2.3.1 one may replace (i) by



(i)′ W ≠ ∅.

Clearly, if (i) holds then (i)’ holds.

For the other direction, note that if w ∈ W (existence of such w is


guaranteed by (i)’) then by (iii) and Lemma 2.1.1(i), we get that
0 = 0w ∈ W . Thus (i)’ and (iii) together imply (i).

Given two subspaces U and W of a vector space V , we introduce

U + W := {v ∈ V : there exist u ∈ U and w ∈ W so that v = u + w},

U ∩ W := {v ∈ V : v ∈ U and v ∈ W }.

Proposition 2.3.2 Given two subspaces U and W of a vector space V over


F, then U + W and U ∩ W are also subspaces of V .

Proof. Clearly 0 = 0 + 0 ∈ U + W as 0 ∈ U and 0 ∈ W . Let v, v̂ ∈ U + W


and c ∈ F. Then there exist u, û ∈ U and w, ŵ ∈ W so that v = u + w and
v̂ = û + ŵ. Then v + v̂ = (u + w) + (û + ŵ) = (u + û) + (w + ŵ) ∈ U + W ,
since u + û ∈ U and w + ŵ ∈ W . Also cv = c(u + w) = cu + cw ∈ U + W as
cu ∈ U and cw ∈ W . This proves that U + W is a subspace.

As 0 ∈ U and 0 ∈ W , we have that 0 ∈ U ∩ W . Next, let v, v̂ ∈ U ∩ W and


c ∈ F. Then v, v̂ ∈ U , and since U is a subspace, we have v + v̂ ∈ U .
Similarly, v + v̂ ∈ W . Thus v + v̂ ∈ U ∩ W . Finally, since v ∈ U and U is a
subspace, cv ∈ U . Similarly, cv ∈ W . Thus cv ∈ U ∩ W . 

When U ∩ W = {0}, then we refer to U + W as a direct sum of U and W ,


and write U +̇W . More generally, when U1 , . . . , Uk are subspaces of V , then
we define the following

$$+_{j=1}^{k} U_j = U_1 + \cdots + U_k = \{v \in V : \text{there exist } u_j \in U_j,\ j = 1, \ldots, k, \text{ so that } v = u_1 + \cdots + u_k\},$$
$$\cap_{j=1}^{k} U_j = U_1 \cap \ldots \cap U_k = \{v \in V : v \in U_j \text{ for all } j = 1, \ldots, k\}.$$
It is straightforward to prove that $+_{j=1}^{k} U_j$ and $\cap_{j=1}^{k} U_j$ are subspaces of V. We say that U_1 + · · · + U_k is a direct sum if for all j = 1, . . . , k, we have that
$$U_j \cap (U_1 + \cdots + U_{j-1} + U_{j+1} + \cdots + U_k) = \{0\}.$$
In that case we write $U_1 \dot{+} \cdots \dot{+} U_k$ or $\dot{+}_{j=1}^{k} U_j$.

Proposition 2.3.3 Consider the direct sum U1 +̇ · · · +̇Uk , then for every
v ∈ U1 +̇ · · · +̇Uk there exists unique uj ∈ Uj , j = 1, . . . , k, so that
v = u1 + · · · + uk . In particular, if uj ∈ Uj , j = 1, . . . , k, are so that
u1 + · · · + uk = 0, then uj = 0, j = 1, . . . , k.

Proof. Suppose v = u1 + · · · + uk = û1 + · · · + ûk , with uj , ûj ∈ Uj ,


j = 1, . . . , k. Then

−(uj − ûj ) = (u1 − û1 ) + · · · + (uj−1 − ûj−1 ) + (uj+1 − ûj+1 ) + · · · + (uk − ûk )

belongs to both Uj and U1 + · · · + Uj−1 + Uj+1 + · · · + Uk , and thus to their


intersection. As the intersection equals {0}, we obtain that uj − ûj = 0. As
j ∈ {1, . . . , k} was arbitrary, we get uj = ûj , j = 1, . . . , k, as desired.

When u1 + · · · + uk = 0 = 0 + · · · + 0, then the uniqueness of the


representation implies that uj = 0, j = 1, . . . , k. 

2.3.1 Vector spaces of polynomials

We let F[X] be the set of all polynomials in X with coefficients in F. Thus a


typical element of F[X] has the form
$$p(X) = \sum_{j=0}^{n} p_j X^j = p_0 X^0 + p_1 X + p_2 X^2 + \cdots + p_n X^n,$$

where n ∈ N and p0 , . . . , pn ∈ F. Here X is merely a symbol and so are its


powers X i , with the understanding that X i X j = X i+j . Often X 0 is omitted,
as when we specify X we will have that X 0 is a multiplicative neutral
element (as for instance the equality X 0 X i = X i suggests).
When we have two polynomials $p(X) = \sum_{j=0}^{n} p_j X^j$ and $q(X) = \sum_{j=0}^{m} q_j X^j$,
it is often convenient to have m = n. We do this by introducing additional
terms with a zero coefficient. For instance, if we want to view

p(X) = 1 + X and q(X) = 1 + 2X 2 − X 5

as having the same number of terms we may view them as

p(X) = 1+X +0X 2 +0X 3 +0X 4 +0X 5 , q(X) = 1+0X +2X 2 +0X 3 +0X 4 −X 5 .

Notice that the term X is really 1X, and −X 5 is (−1)X 5 .


Two polynomials $p(X) = \sum_{j=0}^{n} p_j X^j$ and $q(X) = \sum_{j=0}^{n} q_j X^j$ are equal exactly when all their coefficients are equal: p_j = q_j, j = 0, . . . , n.
The sum of two polynomials $p(X) = \sum_{j=0}^{n} p_j X^j$ and $q(X) = \sum_{j=0}^{n} q_j X^j$ is given by
$$(p + q)(X) = \sum_{j=0}^{n} (p_j + q_j) X^j.$$
When c ∈ F and $p(X) = \sum_{j=0}^{n} p_j X^j$ are given, we define the polynomial (cp)(X) via
$$(cp)(X) = \sum_{j=0}^{n} (c\,p_j) X^j.$$

Proposition 2.3.4 The set F[X] with the above defined addition and scalar
multiplication, is a vector space over F.

The proof is straightforward, so we will leave it as an exercise. Of course, the zero in F[X] is the polynomial with all its coefficients equal to 0, and when $p(X) = \sum_{j=0}^{n} p_j X^j$ then $(-p)(X) = \sum_{j=0}^{n} (-p_j) X^j$.

Given two equal polynomials p(X), q(X) ∈ F[X], then obviously p(x) = q(x)
for all x ∈ F. However, the converse is not always the case, as the following
example shows.

Example 2.3.5 Let F = Z2 , and p(X) = 0 and q(X) = X − X 2 . Then p(X)


and q(X) are different polynomials (e.g., p_1 = 0 ≠ 1 = q_1), but p(x) = q(x)
for all x ∈ Z2 . Indeed, p(0) = 0 = q(0) and p(1) = 0 = q(1).

We do have the following observation. When A ∈ F^{m×m} (i.e., A is an m × m matrix with entries in F) and $p(X) = \sum_{j=0}^{n} p_j X^j$, then we define
$$p(A) = p_0 I_m + p_1 A + p_2 A^2 + \cdots + p_n A^n \in F^{m \times m},$$
where I_m denotes the m × m identity matrix. For future use, we define A^0 := I_m.

Proposition 2.3.6 Two polynomials p(X), q(X) ∈ F[X] are equal if and
only if for all m ∈ N

p(A) = q(A) for all A ∈ Fm×m . (2.1)

Proof. When p(X) = q(X), clearly (2.1) holds for all m ∈ N.


For the converse, suppose that $p(X) = \sum_{j=0}^{n} p_j X^j$ and $q(X) = \sum_{j=0}^{n} q_j X^j$ (padding with zero coefficients, if needed, so that both sums run to the same n) satisfy (2.1) for all m ∈ N. Let J be the (n + 1) × (n + 1) matrix
$$J = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 0 & 0 \end{pmatrix}.$$
Then
$$\begin{pmatrix} p_0 & p_1 & p_2 & \cdots & p_n \\ 0 & p_0 & p_1 & \cdots & p_{n-1} \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & p_0 & p_1 \\ 0 & 0 & \cdots & 0 & p_0 \end{pmatrix} = p(J) = q(J) = \begin{pmatrix} q_0 & q_1 & q_2 & \cdots & q_n \\ 0 & q_0 & q_1 & \cdots & q_{n-1} \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & q_0 & q_1 \\ 0 & 0 & \cdots & 0 & q_0 \end{pmatrix},$$
and thus p_j = q_j, j = 0, . . . , n, follows. □
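The key step in this proof is that the coefficients of p can be read off from p(J). Here is a small numerical illustration of that fact (my own example, not from the text), using the 3 × 3 shift matrix and p(X) = 3 + 2X + 5X².

```python
import numpy as np

J = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]])            # the shift matrix: ones on the superdiagonal
p = [3, 2, 5]                        # coefficients p0, p1, p2 of p(X) = 3 + 2X + 5X^2
pJ = sum(c * np.linalg.matrix_power(J, k) for k, c in enumerate(p))
print(pJ)
# expected:
# [[3 2 5]
#  [0 3 2]
#  [0 0 3]]
```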
For a polynomial $p(X) = \sum_{j=0}^{n} p_j X^j$ with $p_n \neq 0$ we say that its degree equals n, and we write deg p = n. It is convenient to assign −∞ as the degree of the zero polynomial (in this way, with the convention that −∞ + n = −∞, we have that the degree of a product of polynomials is the sum of the degrees).

Proposition 2.3.7 Let Fn [X] := {p(X) ∈ F[X] : deg p ≤ n}, where


n ∈ {0, 1, . . .}. Then Fn [X] is a subspace of F[X].

Proof. Clearly 0 ∈ Fn [X]. Next, if deg p ≤ n, deg q ≤ n and c ∈ F, then


deg p + q ≤ n and deg cp ≤ n. Thus, Fn [X] is closed under addition and
scalar multiplication. Apply now Proposition 2.3.1 to conclude that Fn [X] is
a subspace of F[X]. 

One can also consider polynomials in several variables X1 , . . . , Xk , which can


either be commuting variables (so that, for instance, X1 X2 and X2 X1 are
the same polynomial) or non-commuting variables (so that X1 X2 and X2 X1
are different polynomials). We will not pursue this here.

2.3.2 Vector spaces of matrices

Let Fn×m denote the set of n × m matrices with entries in F. So a typical


element of Fn×m is
 
a11 · · · a1m
A = (ai,j )ni=1,j=1
m
=  ... ..  .

. 
an1 ··· anm

Addition and scalar multiplication are defined via
$$(a_{i,j})_{i=1,j=1}^{n,m} + (b_{i,j})_{i=1,j=1}^{n,m} = (a_{i,j} + b_{i,j})_{i=1,j=1}^{n,m}, \qquad c\,(a_{i,j})_{i=1,j=1}^{n,m} = (c\,a_{i,j})_{i=1,j=1}^{n,m}.$$

Proposition 2.3.8 The set Fn×m with the above definitions of addition and
scalar multiplication is a vector space over F.

When m = 1, we have F^{n×1} = F^n. The vector space F^{1×m} can be identified with F^m (by simply turning a row vector into a column vector). In fact, we can identify F^{n×m} with F^{nm}, for instance by stacking the columns of a matrix into a large vector. For example, when n = m = 2, the identification would be
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \leftrightarrow \begin{pmatrix} a \\ c \\ b \\ d \end{pmatrix}.$$
This identification works when we are only interested in the vector space
properties of Fn×m . However, if at the same time we are interested in other
properties of n × m matrices (such as, that one can multiply such matrices
on the left with a k × n matrix), one should not make this identification.

2.4 Linear independence, span, and basis

The notion of a basis is a crucial one; it basically singles out few elements in
the vector space with which we can reconstruct the whole vector space. For
example, the monomials 1, X, X 2 , . . . form a basis of the vector space of
polynomials. When we start to do certain (namely, linear) operations on
elements of a vector space, we will see in the next chapter that it will suffice
to know how these operations act on the basis elements. Differentiation is an
example: as soon as we know that the derivatives of 1, X, X 2 , X 3 , . . . are
0, 1, 2X, 3X 2 , . . ., respectively, it is easy to find the derivative of a
polynomial. Before we get to the notion of a basis, we first need to introduce
linear independence and span.

Let V be a vector space over F. A set of vectors {v1 , . . . , vp } in V is said to


be linearly independent if the vector equation

c1 v1 + c2 v2 + · · · + cp vp = 0, (2.2)

with c1 , . . . , cp ∈ F, only has the solution c1 = 0, . . . , cp = 0 (the trivial



solution). The set {v1 , . . . , vp } is said to be linearly dependent if (2.2) has a


solution where not all of c1 , . . . , cp are zero (a nontrivial solution). In such a
case, (2.2) with at least one ci nonzero gives a linear dependence relation
among {v1 , . . . , vp }. An arbitrary set S ⊆ V is said to be linearly
independent if every finite subset of S is linearly independent. The set S is
linearly dependent, if it is not linearly independent.

Example 2.4.1 Let V = RR = {f : R → R : f is a function}, and consider


the finite set of vectors {cos(x), ex , x2 } in RR . We claim that this set is
linearly independent. For this, consider a linear combination
c1 cos(x) + c2 ex + c3 x2 and set it equal to the zero function 0(x), which is
the neutral element of addition in RR :

c1 cos(x) + c2 ex + c3 x2 = 0(x) = 0 for all x ∈ R.

If we take different values for x we get linear equations for c_1, c_2, c_3. Taking x = 0, x = π/2, x = −π/2, we get the following three equations:
$$\begin{cases} c_1 + c_2 e^0 = 0 \\ c_2 e^{\pi/2} + c_3 \frac{\pi^2}{4} = 0 \\ c_2 e^{-\pi/2} + c_3 \frac{\pi^2}{4} = 0. \end{cases}$$
As $\det \begin{pmatrix} 1 & e^0 & 0 \\ 0 & e^{\pi/2} & \frac{\pi^2}{4} \\ 0 & e^{-\pi/2} & \frac{\pi^2}{4} \end{pmatrix} \neq 0$, we get that we must have c_1 = c_2 = c_3 = 0. Thus linear independence of {cos(x), e^x, x^2} follows.

Let us also consider the set of vectors {1, cos(x), sin(x), cos2 (x), sin2 (x)}. We
claim this set is linearly dependent, as the nontrivial choice
c1 = 1, c2 = 0, c3 = 0, c4 = −1, c5 = −1 gives the linear dependence relation

c1 1 + c2 cos(x) + c3 sin(x) + c4 cos2 (x) + c5 sin2 (x) =

1 − cos2 (x) − sin2 (x) = 0(x) = 0 for all x ∈ R.

Example 2.4.2 Let V = Z_3^{2×2}. Let us check whether
$$S = \left\{ \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 2 \\ 1 & 1 \end{pmatrix} \right\}$$
is linearly independent or not. Notice that $\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$ is the neutral element of addition in this vector space. Consider the equation
$$c_1 \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix} + c_2 \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} + c_3 \begin{pmatrix} 0 & 2 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$

Rewriting, we get
$$\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 2 \\ 2 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}. \tag{2.3}$$
Bringing this 4 × 3 matrix in row echelon form gives
$$\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 2 \\ 2 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \rightarrow \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 2 \\ 0 & 2 & 1 \\ 0 & 0 & 1 \end{pmatrix} \rightarrow \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$
As there are pivots in all columns, the system (2.3) only has the trivial solution c_1 = c_2 = c_3 = 0. Thus S is linearly independent.

Next, consider
$$\hat{S} = \left\{ \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix} \right\}.$$
Following the same reasoning as above we arrive at the system
$$\begin{pmatrix} 1 & 1 & 2 \\ 0 & 1 & 1 \\ 2 & 1 & 0 \\ 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \tag{2.4}$$
which after row reduction leads to
$$\begin{pmatrix} 1 & 1 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}. \tag{2.5}$$
So, c_3 is a free variable. Letting c_3 = 1, we get c_2 = −c_3 = 2 and c_1 = −c_2 − 2c_3 = 2, so we find the linear dependence relation
$$2 \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix} + 2 \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} + \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},$$
and thus $\hat{S}$ is linearly dependent.
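Because Z_3 is finite, linear (in)dependence can also be checked by brute force: simply try every coefficient vector. The following sketch (an editorial illustration; flattening each 2 × 2 matrix to a 4-tuple is my own bookkeeping, not part of the text) confirms both conclusions.

```python
from itertools import product

# The matrices of S and of S-hat, flattened to vectors over Z_3
# in the entry order (1,1), (1,2), (2,1), (2,2).
S     = [(1, 0, 2, 1), (1, 1, 1, 1), (0, 2, 1, 1)]
S_hat = [(1, 0, 2, 1), (1, 1, 1, 1), (2, 1, 0, 2)]

def dependence_relations(vectors, p=3):
    """All nontrivial (c1, c2, c3) in Z_p^3 with c1 v1 + c2 v2 + c3 v3 = 0."""
    relations = []
    for cs in product(range(p), repeat=len(vectors)):
        if any(cs) and all(
            sum(c * v[i] for c, v in zip(cs, vectors)) % p == 0
            for i in range(len(vectors[0]))
        ):
            relations.append(cs)
    return relations

print(dependence_relations(S))      # expected: []  (linearly independent)
print(dependence_relations(S_hat))  # nontrivial relations, e.g. (2, 2, 1)
```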

Given a set S ⊆ V we define

Span S := {c1 v1 + · · · + cp vp : p ∈ N, c1 , . . . , cp ∈ F, v1 , . . . , vp ∈ S}.

Thus, Span S consists of all linear combinations of a finite set of vectors in


S. It is straightforward to check that Span S is a subspace of V . Indeed,

0 ∈ Span S as one can choose p = 1, c1 = 0, and any v1 ∈ S, to get that


0 = 0v_1 ∈ Span S. Next, the sum of two linear combinations of vectors in S is again a linear combination of vectors of S. Finally, for c ∈ F we have that $c \sum_{j=1}^{p} c_j v_j = \sum_{j=1}^{p} (c c_j) v_j$ is again a linear combination of elements in S.
Thus by Proposition 2.3.1 we have that Span S is a subspace of V .

Example 2.4.3 Let V = (Z5 )3 [X] = {p(X) ∈ Z5 [X] : deg p ≤ 3}. We


claim that

$$\mathrm{Span}\{X - 1,\ X^2 - 2X + 1,\ X^3 - 3X^2 + 3X - 1\} = \{p(X) \in V : p(1) = 0\} =: W. \tag{2.6}$$

First, observe that if

p(X) = c1 (X − 1) + c2 (X 2 − 2X + 1) + c3 (X 3 − 3X 2 + 3X − 1),

then deg p ≤ 3 and p(1) = 0, and thus


Span{X − 1, X 2 − 2X + 1, X 3 − 3X 2 + 3X − 1} ⊆ W.

To prove the converse inclusion ⊇ in (2.6), let


p(X) = p0 + p1 X + p2 X 2 + p3 X 3 be an arbitrary element of W . The
condition p(1) = 0 gives that p0 + p1 + p2 + p3 = 0. We need to show that we
can write

p(X) = c1 (X − 1) + c2 (X 2 − 2X + 1) + c3 (X 3 − 3X 2 + 3X − 1),

for some c1 , c2 , c3 ∈ Z5 . As two polynomials are equal if and only if all the
coefficients are equal, we arrive at the following set of equations
$$\begin{cases} -c_1 + c_2 - c_3 = p_0 \\ c_1 - 2c_2 + 3c_3 = p_1 \\ c_2 - 3c_3 = p_2 \\ c_3 = p_3. \end{cases}$$
Setting up the corresponding augmented matrix and putting it in row reduced echelon form, we find
$$\begin{pmatrix} 4 & 1 & 4 & p_0 \\ 1 & 3 & 3 & p_1 \\ 0 & 1 & 2 & p_2 \\ 0 & 0 & 1 & p_3 \end{pmatrix} \rightarrow \begin{pmatrix} 1 & 0 & 0 & p_1 + 2p_2 + 3p_3 \\ 0 & 1 & 0 & p_2 + 3p_3 \\ 0 & 0 & 1 & p_3 \\ 0 & 0 & 0 & 0 \end{pmatrix},$$
where we used that p0 + p1 + p2 + p3 = 0. Thus the system is consistent. We
find that

p(X) = (p1 +2p2 +3p3 )(X−1)+(p2 +3p3 )(X 2 −2X+1)+p3 (X 3 −3X 2 +3X−1).

Thus p(X) ∈ Span{X − 1, X 2 − 2X + 1, X 3 − 3X 2 + 3X − 1}, and the proof


is complete.
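Since both sides of (2.6) are finite sets here, the equality can also be confirmed by enumeration over Z_5 (an illustrative sketch, not the text's method; polynomials are represented by their coefficient tuples (p_0, p_1, p_2, p_3)).

```python
from itertools import product

p = 5
# Coefficient tuples (p0, p1, p2, p3) of the three spanning polynomials.
gens = [(-1, 1, 0, 0),      # X - 1
        (1, -2, 1, 0),      # X^2 - 2X + 1
        (-1, 3, -3, 1)]     # X^3 - 3X^2 + 3X - 1

span = {tuple(sum(c * g[i] for c, g in zip(cs, gens)) % p for i in range(4))
        for cs in product(range(p), repeat=3)}

W = {coeffs for coeffs in product(range(p), repeat=4) if sum(coeffs) % p == 0}

print(span == W, len(span))   # expected: True 125
```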

Let W be a vector space. We say that S ⊂ W is a basis for W if the


following two conditions are both satisfied:
(i) Span S = W .
(ii) S is linearly independent.

If S has a finite number of elements, then for any other basis of W it will
have the same number of elements, as the following result shows.

Proposition 2.4.4 Let B = {v1 , . . . , vn } and C = {w1 , . . . , wm } be bases


for the vector space W . Then n = m.

Proof. Suppose that n 6= m. Without loss of generality, we may assume that


n < m. As B is a basis, we can express wj as a linear combination of
elements of B:

wj = a1j v1 + · · · + anj vn , j = 1, . . . , m.

The matrix $A = (a_{ij})_{i=1,j=1}^{n,m}$ has more columns than rows (and thus a non-pivot column), so the equation Ac = 0 has a nontrivial solution $c = \begin{pmatrix} c_1 \\ \vdots \\ c_m \end{pmatrix} \neq 0$. But then it follows that
$$\sum_{j=1}^{m} c_j w_j = \sum_{j=1}^{m} \Big[ c_j \Big( \sum_{i=1}^{n} a_{ij} v_i \Big) \Big] = \sum_{i=1}^{n} \Big( \sum_{j=1}^{m} a_{ij} c_j \Big) v_i = \sum_{i=1}^{n} 0\, v_i = 0.$$

Thus a nontrivial linear combination of elements of C equals 0, and thus C is


linearly dependent. Contradiction. Consequently, we must have n = m. 

We can now define the dimension of a vector space as:

dim W := the number of elements in a basis of W.

When no basis with a finite number of elements exists for W , we say


dim W = ∞.

Remark 2.4.5 Notice that the proof of Proposition 2.4.4 also shows that in
an n-dimensional vector space any set of vectors with more than n elements
must be linearly dependent.

Example 2.4.6 Let $W = \{p(X) \in R_2[X] : \int_{-1}^{1} p(x)\,dx = 0\}$. Show that W is a subspace of $R_2[X]$ and find a basis for W.

Clearly, the zero polynomial 0(x) belongs to W as $\int_{-1}^{1} 0(x)\,dx = \int_{-1}^{1} 0\,dx = 0$. Next, when p(X), q(X) ∈ W, then
$$\int_{-1}^{1} (p(x) + q(x))\,dx = \int_{-1}^{1} p(x)\,dx + \int_{-1}^{1} q(x)\,dx = 0 + 0 = 0,$$
so (p + q)(X) ∈ W. Similarly, when c ∈ R and p(X) ∈ W, then
$$\int_{-1}^{1} (cp)(x)\,dx = \int_{-1}^{1} c\,p(x)\,dx = c \int_{-1}^{1} p(x)\,dx = c\,0 = 0,$$
so (cp)(X) ∈ W. Thus by Proposition 2.3.1, W is a subspace of $R_2[X]$.

To find a basis, let us take an arbitrary element p(X) = p_0 + p_1 X + p_2 X^2 ∈ W, which means that
$$\int_{-1}^{1} p(x)\,dx = 2p_0 + \frac{2}{3} p_2 = 0.$$
This yields the linear system
$$\begin{pmatrix} 2 & 0 & \frac{2}{3} \end{pmatrix} \begin{pmatrix} p_0 \\ p_1 \\ p_2 \end{pmatrix} = 0.$$
The coefficient matrix only has a pivot in column 1, so we let p_1 and p_2 be the free variables (as they correspond to the variables corresponding to the 2nd and 3rd column) and observe that $p_0 = -\frac{1}{3} p_2$. Expressing p(X) solely in the free variables we get
$$p(X) = p_0 + p_1 X + p_2 X^2 = p_1 X + p_2 \left(X^2 - \tfrac{1}{3}\right).$$
Thus $p(X) \in \mathrm{Span}\{X, X^2 - \tfrac{1}{3}\}$. As we started with an arbitrary p(X) ∈ W, we now proved that $W \subseteq \mathrm{Span}\{X, X^2 - \tfrac{1}{3}\}$. As X ∈ W and $X^2 - \tfrac{1}{3} \in W$ and W is a subspace, we also have $\mathrm{Span}\{X, X^2 - \tfrac{1}{3}\} \subseteq W$. Thus $\mathrm{Span}\{X, X^2 - \tfrac{1}{3}\} = W$.

Next, one easily checks that $\{X, X^2 - \tfrac{1}{3}\}$ is linearly independent, and thus $\{X, X^2 - \tfrac{1}{3}\}$ is a basis for W. In particular, dim W = 2.
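A quick symbolic check of the two integrals used above (an illustrative SymPy sketch, not part of the text):

```python
from sympy import symbols, integrate, Rational

x = symbols('x')
# Both basis candidates should integrate to 0 over [-1, 1].
print(integrate(x, (x, -1, 1)))                       # expected: 0
print(integrate(x**2 - Rational(1, 3), (x, -1, 1)))   # expected: 0
```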

Example 2.4.7 Let V = R^4,
$$U = \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 0 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} \right\}, \qquad W = \mathrm{Span}\left\{ \begin{pmatrix} 4 \\ 2 \\ 2 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 \\ 0 \\ 2 \\ 0 \end{pmatrix} \right\}.$$

Find bases for U ∩ W and U + W .

Vectors in U ∩ W are of the form


       
$$x_1 \begin{pmatrix} 1 \\ 0 \\ 2 \\ 1 \end{pmatrix} + x_2 \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = x_3 \begin{pmatrix} 4 \\ 2 \\ 2 \\ 0 \end{pmatrix} + x_4 \begin{pmatrix} 2 \\ 0 \\ 2 \\ 0 \end{pmatrix}. \tag{2.7}$$
Setting up the homogeneous system of linear equations, and subsequently row reducing, we get
$$\begin{pmatrix} 1 & 1 & -4 & -2 \\ 0 & 1 & -2 & 0 \\ 2 & 1 & -2 & -2 \\ 1 & 1 & 0 & 0 \end{pmatrix} \rightarrow \begin{pmatrix} 1 & 1 & -4 & -2 \\ 0 & 1 & -2 & 0 \\ 0 & -1 & 6 & 2 \\ 0 & 0 & 4 & 2 \end{pmatrix} \rightarrow \begin{pmatrix} 1 & 1 & -4 & -2 \\ 0 & 1 & -2 & 0 \\ 0 & 0 & 4 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
This gives that x_4 is free and $x_3 = -\frac{x_4}{2}$. Plugging this into the right-hand side of (2.7) gives
$$-\frac{x_4}{2} \begin{pmatrix} 4 \\ 2 \\ 2 \\ 0 \end{pmatrix} + x_4 \begin{pmatrix} 2 \\ 0 \\ 2 \\ 0 \end{pmatrix} = x_4 \begin{pmatrix} 0 \\ -1 \\ 1 \\ 0 \end{pmatrix}$$
as a typical element of U ∩ W. So
$$\left\{ \begin{pmatrix} 0 \\ -1 \\ 1 \\ 0 \end{pmatrix} \right\}$$
is a basis for U ∩ W.

Notice that
$$U + W = \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 0 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 4 \\ 2 \\ 2 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 \\ 0 \\ 2 \\ 0 \end{pmatrix} \right\}.$$
From the row reductions above, we see that the fourth vector is a linear combination of the first three, while the first three are linearly independent. Thus a basis for U + W is
$$\left\{ \begin{pmatrix} 1 \\ 0 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 4 \\ 2 \\ 2 \\ 0 \end{pmatrix} \right\}.$$

Notice that

dim(U + W ) = 3 = 2 + 2 − 1 = dim U + dim W − dim(U ∩ W ).

In Exercise 2.6.15 we will see that this holds in general.
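The intersection computation can be reproduced with exact arithmetic in SymPy (an illustrative sketch; variable names are my own):

```python
from sympy import Matrix

u1, u2 = Matrix([1, 0, 2, 1]), Matrix([1, 1, 1, 1])
w1, w2 = Matrix([4, 2, 2, 0]), Matrix([2, 0, 2, 0])

# Solve x1*u1 + x2*u2 - x3*w1 - x4*w2 = 0, as in (2.7).
M = Matrix.hstack(u1, u2, -w1, -w2)
for sol in M.nullspace():
    x3, x4 = sol[2], sol[3]
    print((x3 * w1 + x4 * w2).T)   # a vector spanning U ∩ W, proportional to (0, -1, 1, 0)
```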

Example 2.4.8 The vectors
$$e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad e_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}$$
form a basis for F^n. Thus dim F^n = n. We call this the standard basis for F^n.

Example 2.4.9 The previous example shows that Cn has dimension n.


Here it is understood that we view Cn as a vector space over C, which is the
default. However, we may also view Cn as a vector space over R. In this
case, we only allow scalar multiplication with real scalars. In this setting the
vectors
           
1 i 0 0 0 0
0 0 1 i 0 0
e1 =  .  , ie1 =  .  , e2 =  .  , ie2 =  .  . . . , en =  .  , ien =  . 
           
 ..   ..   ..   ..   ..   .. 
0 0 0 0 1 i
(2.8)
are linearly independent. Indeed, if a1 , b1 , . . . , an , bn ∈ R, then the equality

a1 e1 + b1 (ie1 ) + · · · + an en + bn (ien ) = 0,

leads to a1 + ib1 = 0, . . . , an + ibn = 0, which yields


a1 = b1 = · · · = an = bn = 0 (in this last step we used that aj and bj are all
real). It is easy to check that taking all real linear combinations of the
vectors in (2.8) we get all of Cn . Thus the vectors in (2.8) form a basis of Cn
when viewed as a vector space over R, and thus its dimension viewed as a
vector space over R is 2n. We write this as dimR Cn = 2n.

Example 2.4.10 For polynomial spaces we have the following observations.

• The set {1, X, X 2 , X 3 , . . .} is a basis for F[X]. We have dim F[X] = ∞.

• The set {1, X, X 2 , . . . , X n } is a basis for Fn [X]. We have


dim Fn [X] = n + 1.

• The set {1, i, X, iX, X 2 , iX 2 , . . . , X n , iX n } is a basis for Cn [X] when


viewed as a vector space over R. We have dimR Cn [X] = 2n + 2.

These bases are all referred to as the “standard” basis for their respective
vector space.

We let Ejk be the matrix with all zero entries except for the (j, k) entry
which equals 1. We expect the size of the matrices Ejk to be clear from the
context in which we use them.

Example 2.4.11 For matrix spaces we have the following observations.

• The set {Ej,k : j = 1, . . . , n, k = 1, . . . , m} is a basis for Fn×m . We have


dim Fn×m = nm.
• The set
{Ej,k : j = 1, . . . , n, k = 1, . . . , m}∪{iEj,k : j = 1, . . . , n, k = 1, . . . , m}
is a basis for Cn×m when viewed as a vector space over R. We have
dimR Cn×m = 2nm.

These bases are all referred to as the standard basis for their respective
vector space.

2.5 Coordinate systems

We will see in this section that any n-dimensional vector space over F
“works the same” as Fn , which simplifies the study of such vector spaces
tremendously. To make this idea more precise, we have to discuss coordinate
systems. We start with the following result.

Theorem 2.5.1 Let B = {v1 , . . . , vn } be a basis for a vector space V over


F. Then for each v ∈ V there exists unique c1 , . . . , cn ∈ F so that

v = c1 v1 + · · · + cn vn . (2.9)

Proof. Let v ∈ V . As Span B = V , we have that v = c1 v1 + · · · + cn vn for


some c1 , . . . , cn ∈ F. Suppose that we also have v = d1 v1 + · · · + dn vn for

some d_1, . . . , d_n ∈ F. Then
$$0 = v - v = \sum_{j=1}^{n} c_j v_j - \sum_{j=1}^{n} d_j v_j = (c_1 - d_1) v_1 + \cdots + (c_n - d_n) v_n.$$

As {v1 , . . . , vn } is linearly independent, we must have


c1 − d1 = 0, . . . , cn − dn = 0. This yields c1 = d1 , . . . , cn = dn , yielding the
uniqueness. 

When (2.9) holds, we say that c1 , . . . , cn are the coordinates of v relative to


the basis B, and we write
$$[v]_B = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}.$$
Thus, when B = {v_1, . . . , v_n} we have
$$v = c_1 v_1 + \cdots + c_n v_n \quad \Leftrightarrow \quad [v]_B = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}. \tag{2.10}$$

Clearly, when v = c_1 v_1 + · · · + c_n v_n and w = d_1 v_1 + · · · + d_n v_n, then $v + w = \sum_{j=1}^{n} (c_j + d_j) v_j$, and thus
$$[v + w]_B = \begin{pmatrix} c_1 + d_1 \\ \vdots \\ c_n + d_n \end{pmatrix} = [v]_B + [w]_B.$$
Similarly,
$$[\alpha v]_B = \begin{pmatrix} \alpha c_1 \\ \vdots \\ \alpha c_n \end{pmatrix} = \alpha [v]_B.$$
Thus adding two vectors in V corresponds to adding their corresponding
coordinate vectors (which are both with respect to the basis B), and
multiplying a vector by a scalar in V corresponds to multiplying the
corresponding coordinate vector by the same scalar. As we will see in the
next chapter, the map v 7→ [v]B is a bijective linear map (also called an
isomorphism). This map allows one to view an n-dimensional vector space V
over F as essentially being the vector space Fn .

       
Example 2.5.2 Let V = Z_7^3 and $B = \left\{ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix} \right\}$. Let $v = \begin{pmatrix} 6 \\ 5 \\ 4 \end{pmatrix}$. Find $[v]_B$.
 
Denoting $[v]_B = \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}$ we need to solve for c_1, c_2, c_3 in the vector equation
$$c_1 \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + c_3 \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix} = \begin{pmatrix} 6 \\ 5 \\ 4 \end{pmatrix}.$$
Setting up the augmented matrix and row reducing gives
$$\begin{pmatrix} 1 & 1 & 1 & 6 \\ 1 & 2 & 3 & 5 \\ 1 & 3 & 6 & 4 \end{pmatrix} \rightarrow \begin{pmatrix} 1 & 1 & 1 & 6 \\ 0 & 1 & 2 & 6 \\ 0 & 2 & 5 & 5 \end{pmatrix} \rightarrow \begin{pmatrix} 1 & 1 & 1 & 6 \\ 0 & 1 & 2 & 6 \\ 0 & 0 & 1 & 0 \end{pmatrix} \rightarrow \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 6 \\ 0 & 0 & 1 & 0 \end{pmatrix},$$
yielding c_1 = 0, c_2 = 6, c_3 = 0. Thus $[v]_B = \begin{pmatrix} 0 \\ 6 \\ 0 \end{pmatrix}$.
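Since Z_7^3 is finite, the coordinate vector can also be found by brute force (an illustrative sketch, not the text's method):

```python
from itertools import product

p = 7
basis = [(1, 1, 1), (1, 2, 3), (1, 3, 6)]
v = (6, 5, 4)

# Try every coefficient triple in Z_7^3 until the combination hits v.
for c in product(range(p), repeat=3):
    if all(sum(ci * bi[k] for ci, bi in zip(c, basis)) % p == v[k] for k in range(3)):
        print(c)   # expected: (0, 6, 0)
```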

Example 2.5.3 Let V = C3 [X] and


B = {1, X − 1, X 2 − 2X + 1, X 3 − 3X 2 + 3X − 1}. Find [X 3 + X 2 + X + 1]B .

We need to find c1 , c2 , c3 , c4 ∈ C so that

c1 1 + c2 (X − 1) + c3 (X 2 − 2X + 1) + c4 (X 3 − 3X 2 + 3X − 1) = X 3 + X 2 + X + 1.

Equating the coefficients of 1, X, X^2, X^3, setting up the augmented matrix, and row reducing gives
$$\begin{pmatrix} 1 & -1 & 1 & -1 & 1 \\ 0 & 1 & -2 & 3 & 1 \\ 0 & 0 & 1 & -3 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix} \rightarrow \begin{pmatrix} 1 & -1 & 1 & 0 & 2 \\ 0 & 1 & -2 & 0 & -2 \\ 0 & 0 & 1 & 0 & 4 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix} \rightarrow \begin{pmatrix} 1 & -1 & 0 & 0 & -2 \\ 0 & 1 & 0 & 0 & 6 \\ 0 & 0 & 1 & 0 & 4 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix} \rightarrow \begin{pmatrix} 1 & 0 & 0 & 0 & 4 \\ 0 & 1 & 0 & 0 & 6 \\ 0 & 0 & 1 & 0 & 4 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix}.$$
Thus we find $[X^3 + X^2 + X + 1]_B = \begin{pmatrix} 4 \\ 6 \\ 4 \\ 1 \end{pmatrix}$.
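A one-line symbolic check of this coordinate vector (an illustrative SymPy sketch):

```python
from sympy import symbols, expand

X = symbols('X')
combo = 4*1 + 6*(X - 1) + 4*(X**2 - 2*X + 1) + 1*(X**3 - 3*X**2 + 3*X - 1)
print(expand(combo))   # expected: X**3 + X**2 + X + 1
```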

2.6 Exercises

Exercise 2.6.1 For the proof of Lemma 2.1.1 provide a reason why each
equality holds. For instance, the equality 0 = 0u + v is due to Axiom 5 in
the definition of a vector space and v being the additive inverse of 0u.

Exercise 2.6.2 Consider p(X), q(X) ∈ F[X] with F = R or F = C. Show


that p(X) = q(X) if and only if p(x) = q(x) for all x ∈ F. (One way to do it
is by using derivatives. Indeed, using calculus one can observe that if two
polynomials are equal, then so are all their derivatives. Next observe that $p_j = \frac{1}{j!} \frac{d^j p}{dx^j}(0)$.) Where do you use in your proof that F = R or F = C?

Exercise 2.6.3 When the underlying field is Zp , why does closure under
addition automatically imply closure under scalar multiplication?

Exercise 2.6.4 Let V = RR . For W ⊂ V , show that W is a subspace of V .

(a) W = {f : R → R : f is continuous}.
(b) W = {f : R → R : f is differentiable}.

Exercise 2.6.5 For the following choices of F, V and W , determine whether


W is a subspace of V over F. In case the answer is yes, provide a basis for W .

(a) Let F = R and V = R^3,
$$W = \left\{ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} : x_1, x_2, x_3 \in R,\ x_1 - 2x_2 + x_3^2 = 0 \right\}.$$

(b) F = C and V = C^{3×3},
$$W = \left\{ \begin{pmatrix} a & b & c \\ 0 & a & b \\ 0 & 0 & a \end{pmatrix} : a, b, c \in C \right\}.$$

(c) F = C and V = C^{2×2},
$$W = \left\{ \begin{pmatrix} a & \bar{b} \\ b & c \end{pmatrix} : a, b, c \in C \right\}.$$

(d) F = R, V = R_2[X] and
$$W = \left\{ p(x) \in V : \int_0^1 p(x) \cos x \, dx = 0 \right\}.$$

(e) F = R, V = R_2[X] and
$$W = \{p(x) \in V : p(1) = p(2)p(3)\}.$$

(f) F = C, V = C^3, and
$$W = \left\{ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in C^3 : x_1 - x_2 = x_3 - x_2 \right\}.$$

Exercise 2.6.6 For the following vector spaces (V over F) and vectors, determine whether the vectors are linearly independent or linearly dependent.

(a) Let F = Z_5, V = Z_5^4 and consider the vectors
$$\begin{pmatrix} 3 \\ 0 \\ 2 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} 2 \\ 1 \\ 0 \\ 3 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 2 \\ 1 \\ 0 \end{pmatrix}.$$

(b) Let F = R, V = {f | f : (0, ∞) → R is a continuous function}, and consider the vectors
$$t, \quad t^2, \quad \frac{1}{t}.$$

(c) Let F = Z_5, V = Z_5^4 and consider the vectors
$$\begin{pmatrix} 4 \\ 0 \\ 2 \\ 3 \end{pmatrix}, \quad \begin{pmatrix} 2 \\ 1 \\ 0 \\ 3 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 2 \\ 1 \\ 0 \end{pmatrix}.$$

(d) Let F = R, V = {f | f : R → R is a continuous function}, and consider the vectors
$$\cos 2x, \quad \sin 2x, \quad \cos^2 x, \quad \sin^2 x.$$

(e) Let F = C, V = C^{2×2}, and consider the vectors
$$\begin{pmatrix} i & 1 \\ -1 & -i \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}, \quad \begin{pmatrix} -1 & i \\ -i & 1 \end{pmatrix}.$$

(f) Let F = R, V = C^{2×2}, and consider the vectors
$$\begin{pmatrix} i & 1 \\ -1 & -i \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}, \quad \begin{pmatrix} -1 & i \\ -i & 1 \end{pmatrix}.$$

(g) Let F = Z_5, V = F^{3×2}, and consider the vectors
$$\begin{pmatrix} 3 & 4 \\ 1 & 0 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 \\ 4 & 2 \\ 1 & 2 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 \\ 3 & 1 \\ 1 & 2 \end{pmatrix}.$$

(h) Let F = R, V = {f | f : R → R is a continuous function}, and consider the vectors
$$1, \quad e^t, \quad e^{2t}.$$

Exercise 2.6.7 Let v1 , v2 , v3 be linearly independent vectors in a vector


space V .

(a) For which k are kv1 + v2 , kv2 − v3 , v3 + v1 linearly independent?


(b) Show that if v is in the span of v1 , v2 and in the span of
v2 + v3 , v2 − v3 , then v is a multiple of v2 .

Exercise 2.6.8 (a) Show that if the set {v1 , . . . , vk } is linearly


independent, and vk+1 is not in Span{v1 , . . . , vk }, then the set
{v1 , . . . , vk , vk+1 } is linearly independent.
(b) Let W be a subspace of an n-dimensional vector space V , and let
{v1 , . . . , vp } be a basis for W . Show that there exist vectors
vp+1 , . . . , vn ∈ V so that {v1 , . . . , vp , vp+1 , . . . , vn } is a basis for V .
(Hint: once v1 , . . . , vk are found and k < n, observe that one can choose
vk+1 ∈ V \ (Span{v1 , . . . , vk }). Argue that this process stops when
k = n, and that at that point a basis for V is found.)

Exercise 2.6.9 Let V = R2 [X] and

W = {p ∈ V : p(2) = 0}.

(a) Show that W is a subspace of V .


(b) Find a basis for W .

Exercise 2.6.10 For the following choices of subspaces U and W in V , find


bases for U + W and U ∩ W .

(a) V = R5 [X], U = Span{X + 1, X 2 − 1}, W = {p(X) : p(2) = 0}.


(b) V = Z_5^4,
$$U = \mathrm{Span}\left\{ \begin{pmatrix} 3 \\ 0 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \\ 0 \\ 0 \end{pmatrix} \right\}, \qquad W = \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 2 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 4 \\ 4 \\ 1 \\ 1 \end{pmatrix} \right\}.$$

Exercise 2.6.11 Let {v1 , v2 , v3 , v4 , v5 } be linearly independent vectors in


a vector space V . Determine whether the following sets are linearly
dependent or linearly independent.

(a) {v1 + v2 + v3 + v4 , v1 − v2 + v3 − v4 , v1 − v2 − v3 − v4 }.
(b) {v1 + v2 , v2 + v3 , v3 + v4 , v4 + v5 , v5 + v2 }.
(c) {v1 + v3 , v4 − v2 , v5 + v1 , v4 − v2 , v5 + v3 , v1 + v2 }.

When you did this exercise, did you make any assumptions on the
underlying field?

Exercise 2.6.12

Let {v1 , v2 , v3 , v4 } be a basis for a vector space V over Z3 . Determine


whether the following are also bases for V .

(a) {v1 + v2 + v3 + v4 , v1 − v2 + v3 − v4 , v1 − v2 − v3 − v4 }.
(b) {v1 , v2 + v3 + v4 , v1 − v2 + v3 − v4 , v1 − v2 − v3 − v4 }.
(c) {v1 + v2 + v3 + v4 , v1 − v2 + v3 − v4 , v1 − v2 − v3 − v4 , v2 + v4 , v1 + v3 }.

Exercise 2.6.13

For the following choices of vector spaces V over the field F, bases B and
vectors v, determine [v]B .

(a) Let F = Z_5, V = Z_5^4,
$$B = \left\{ \begin{pmatrix} 3 \\ 0 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 2 \\ 1 \\ 0 \end{pmatrix} \right\}, \qquad v = \begin{pmatrix} 1 \\ 3 \\ 2 \\ 2 \end{pmatrix}.$$

(b) Let F = R, B = {t, t^2, 1/t}, V = Span B and $v = \frac{t^3 + 3t^2 + 5}{t}$.

(c) Let F = C, V = C^{2×2},
$$B = \left\{ \begin{pmatrix} 0 & 1 \\ -1 & -i \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}, \begin{pmatrix} i & 0 \\ -1 & -i \end{pmatrix}, \begin{pmatrix} i & 1 \\ -1 & -i \end{pmatrix} \right\}, \qquad v = \begin{pmatrix} -2+i & 3-2i \\ -5-i & 10 \end{pmatrix}.$$

(d) Let F = R, V = C^{2×2}, and consider
$$B = \{E_{11}, E_{12}, E_{21}, E_{22}, iE_{11}, iE_{12}, iE_{21}, iE_{22}\}, \qquad v = \begin{pmatrix} -1 & i \\ -i & 1 \end{pmatrix}.$$

(e) Let F = Z_5, V = Span B,
$$B = \left\{ \begin{pmatrix} 3 & 4 \\ 1 & 0 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 4 & 2 \\ 1 & 2 \end{pmatrix}, \begin{pmatrix} 1 & 2 \\ 3 & 3 \\ 3 & 0 \end{pmatrix} \right\}, \qquad v = \begin{pmatrix} 0 & 2 \\ 3 & 0 \\ 0 & 2 \end{pmatrix}.$$
 

Exercise 2.6.14 Given a matrix $A = (a_{jk})_{j=1,k=1}^{n,m} \in C^{n \times m}$, we define $A^* = (\overline{a_{kj}})_{j=1,k=1}^{m,n} \in C^{m \times n}$. For instance,
$$\begin{pmatrix} 1+2i & 3+4i & 5+6i \\ 7+8i & 9+10i & 11+12i \end{pmatrix}^* = \begin{pmatrix} 1-2i & 7-8i \\ 3-4i & 9-10i \\ 5-6i & 11-12i \end{pmatrix}.$$
We call a matrix A ∈ C^{n×n} Hermitian if A* = A. For instance, $\begin{pmatrix} 2 & 1-3i \\ 1+3i & 5 \end{pmatrix}$ is Hermitian. Let H_n ⊆ C^{n×n} be the set of all n × n Hermitian matrices.

(a) Show that Hn is not a vector space over C.


(b) Show that Hn is a vector space over R. Determine dimR Hn .

(Hint: Do it first for 2 × 2 matrices.)

Exercise 2.6.15 (a) Show that for finite-dimensional subspaces U and W


of V we have that dim(U + W ) = dim U + dim W − dim(U ∩ W ).
(Hint: Start with a basis {v1 , . . . , vp } for U ∩ W . Next, find u1 , . . . , uk
so that {v1 , . . . , vp , u1 , . . . , uk } is a basis for U . Similarly, find
w1 , . . . , wl so that {v1 , . . . , vp , w1 , . . . , wl } is a basis for W . Finally,
argue that {v1 , . . . , vp , u1 , . . . , uk , w1 , . . . , wl } is a basis for U + W .)
(b) Show that for a direct sum U1 +̇ · · · +̇Uk of finite-dimensional subspaces
U1 , . . . , Uk , we have that
dim(U1 +̇ · · · +̇Uk ) = dim U1 + · · · + dim Uk .

Exercise 2.6.16 (Honors) Let $P_n = \{(p_i)_{i=1}^{n} \in R^n : p_i > 0,\ i = 1, \ldots, n,\ \text{and } \sum_{i=1}^{n} p_i = 1\}$. Define the operations
$$\oplus : P_n \times P_n \to P_n, \qquad \circ : R \times P_n \to P_n,$$
via
$$(p_i)_{i=1}^{n} \oplus (q_i)_{i=1}^{n} := \frac{1}{\sum_{j=1}^{n} p_j q_j}\, (p_i q_i)_{i=1}^{n},$$
and
$$c \circ (p_i)_{i=1}^{n} := \frac{1}{\sum_{j=1}^{n} p_j^{\,c}}\, (p_i^{\,c})_{i=1}^{n}.$$
Show that P_n with the operations ⊕ and ◦ is a vector space over R. Why is P_n not a subspace of R^n?

Hint: observe that $\begin{pmatrix} \frac{1}{n} \\ \vdots \\ \frac{1}{n} \end{pmatrix}$ is the neutral element for ⊕.

This exercise is based on the paper [A. Sgarro, An informational divergence


geometry for stochastic matrices. Calcolo 15 (1978), no. 1, 41–49.] Thanks
are due to Valerie Girardin for making the author aware of the example.

Exercise 2.6.17 (Honors) Let V be a vector space over F and W ⊆ V a


subspace. Define the relation

v ∼ v̂ ⇔ v − v̂ ∈ W.

(a) Show that ∼ is an equivalence relation.

Let
v + W := {v̂ : v ∼ v̂}
denote the equivalence class of v ∈ V , and let

V /W := {v + W : v ∈ V }

denote the set of equivalence classes. Define addition and scalar


multiplication on V /W via

(v + W ) + (v̂ + W ) := (v + v̂) + W , c(v + W ) := (cv) + W.

(b) Show that addition on V /W is well-defined. (It needs to be shown that


if v + W = w + W and v̂ + W = ŵ + W , then
(v + v̂) + W = (w + ŵ) + W , as the sum of two equivalence classes
should be independent on the particular representatives chosen.)

(c) Show that scalar multiplication on V /W is well-defined.

(d) Show that V /W is a vector space.


 
(e) Let V = R^2 and $W = \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\}$. Explain that V/W consists of all lines parallel to W, and explain how the addition and scalar multiplication are defined on these parallel lines.
3
Linear Transformations

CONTENTS
3.1 Definition of a linear transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.2 Range and kernel of linear transformations . . . . . . . . . . . . . . . . . . . . . . 57
3.3 Matrix representations of linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

Now that we have introduced vector spaces, we can move on to the next
main object in linear algebra: linear transformations. These are functions
between vector spaces that behave nicely with respect to the two
fundamental operations on a vector space: addition and scalar
multiplication. Differentiation and taking integrals are two important
examples of linear transformation. Regarding the nice behavior, note for
example that if we take the derivative of the sum of two functions, it is the
same as if we would take the derivative of each and then the sum. Let us
start with the precise definition.

3.1 Definition of a linear transformation

Let V and W be vector spaces over the same field F. A function T : V → W


is called linear if

(i) T (u + v) = T (u) + T (v) for all u, v ∈ V , and


(ii) T (cu) = cT (u) for all c ∈ F and all u ∈ V . In this case, we say that T is
a linear transformation or a linear map.

When T is linear, we must have that T (0) = 0. Indeed, by using (ii) we have
T (0) = T (0 · 0) = 0T (0) = 0, where in the first and last step we used
Lemma 2.1.1.


Example 3.1.1 Let T : Z_3^2 → Z_3^3 be defined by
$$T \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 2x_1 + x_2 \\ x_1 + x_2 \\ x_2 \end{pmatrix}.$$
Then
$$T\left( \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \right) = T \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \end{pmatrix} = \begin{pmatrix} 2(x_1 + y_1) + x_2 + y_2 \\ x_1 + y_1 + x_2 + y_2 \\ x_2 + y_2 \end{pmatrix} = \begin{pmatrix} 2x_1 + x_2 \\ x_1 + x_2 \\ x_2 \end{pmatrix} + \begin{pmatrix} 2y_1 + y_2 \\ y_1 + y_2 \\ y_2 \end{pmatrix} = T \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + T \begin{pmatrix} y_1 \\ y_2 \end{pmatrix},$$
and
$$T\left( c \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \right) = T \begin{pmatrix} cx_1 \\ cx_2 \end{pmatrix} = \begin{pmatrix} 2cx_1 + cx_2 \\ cx_1 + cx_2 \\ cx_2 \end{pmatrix} = c \begin{pmatrix} 2x_1 + x_2 \\ x_1 + x_2 \\ x_2 \end{pmatrix} = c\, T \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$
Thus T is linear.

Example 3.1.2 Let T : C^3 → C^2 be defined by
$$T \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} x_1 x_2 \\ x_1 + x_2 + x_3 \end{pmatrix}.$$
Then
$$T \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix}, \qquad T \begin{pmatrix} 2 \\ 2 \\ 2 \end{pmatrix} = \begin{pmatrix} 4 \\ 6 \end{pmatrix} \neq 2 \begin{pmatrix} 1 \\ 3 \end{pmatrix} = 2\, T \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},$$
thus T fails to satisfy (ii) above. Thus T is not linear.

Notice that in order to show that a function is not linear, one only needs to
provide one example where the above rules (i) or (ii) are not satisfied.

The linear map in Example 3.1.1 can be written in the form
$$T \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$

We have the following general result, from which linearity in Example 3.1.1
directly follows.

Proposition 3.1.3 Let A ∈ Fn×m and define T : Fm → Fn via T (x) := Ax.


Then T is linear.

Proof. This follows directly from rules on matrix vector multiplication:


A(x + y) = Ax + Ay and A(cx) = cAx. 

3.2 Range and kernel of linear transformations

With a linear transformation there are two subspaces associated with it: the
range (which lies in the co-domain) and the kernel (which lies in the
domain). These subspaces provide us with crucial information about the
linear transformation. We start with discussing the range.

Let T : V → W be a linear map. Define the range of T by

Ran T := {w ∈ W : there exists a v ∈ V so that T (v) = w}.

Proposition 3.2.1 Let T : V → W be a linear map. Then Ran T is a


subspace of W . Moreover, if {v1 , . . . , vp } is a basis for V , then
Ran T = Span{T (v1 ), . . . , T (vp )}. In particular dim Ran T ≤ dim V .

Proof. First observe that T (0) = 0 gives that 0 ∈ Ran T . Next, let w,
ŵ ∈ Ran T and c ∈ F. Then there exist v, v̂ ∈ V so that T (v) = w and
T (v̂) = ŵ. Then w + ŵ = T (v + v̂) ∈ Ran T and cw = T (cv) ∈ Ran T .
Thus, by Proposition 2.3.1, Ran T is a subspace of W .

Clearly, T (v1 ), . . . , T (vp ) ∈ Ran T , and since Ran T is a subspace we have


that Span{T (v1 ), . . . , T (vp )} ⊆ Ran T . For the converse inclusion, let
w ∈ Ran T . Then there exists a v ∈ V so that T (v) = w. As {v1 , . . . , vp } is
a basis for V , there exist c1 , . . . , cp ∈ F so that v = c1 v1 + · · · + cp vp . Then
$$w = T(v) = T\Big(\sum_{j=1}^{p} c_j v_j\Big) = \sum_{j=1}^{p} c_j T(v_j) \in \mathrm{Span}\{T(v_1), \ldots, T(v_p)\}.$$

Thus Ran T ⊆ Span{T (v1 ), . . . , T (vp )}. We have shown both inclusions, and
consequently Ran T = Span{T (v1 ), . . . , T (vp )} follows. 

We say that T : V → W is onto (or surjective) if Ran T = W . Equivalently,


T is onto if and only if for every w ∈ W there exists a v ∈ V so that
T (v) = w.

Example 3.1.1 continued. As the standard basis {e_1, e_2} is a basis for Z_3^2, we have that
$$\mathrm{Ran}\, T = \mathrm{Span}\{T(e_1), T(e_2)\} = \mathrm{Span}\left\{ \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\}.$$
In fact, as these two vectors are linearly independent, they form a basis for Ran T. The map T is not onto as dim Ran T = 2, while dim W = dim Z_3^3 = 3, and thus Ran T ≠ Z_3^3.

Define the kernel of T by

Ker T := {v ∈ V : T (v) = 0}.

Proposition 3.2.2 Let T : V → W be a linear map. Then Ker T is


subspace of V .

Proof. First observe that T (0) = 0 gives that 0 ∈ Ker T . Next, let v,
v̂ ∈ Ker T and c ∈ F. Then T (v + v̂) = T (v) + T (v̂) = 0 + 0 = 0 and
T (cv) = cT (v) = c0 = 0, so v + v̂, cv ∈ Ker T . Thus, by Proposition 2.3.1,
Ker T is a subspace of V . 

We say that T : V → W is one-to-one (or injective) if T (v) = T (w) only


holds when v = w. We have the following way to check for injectivity of a
linear map.

Lemma 3.2.3 The linear map T is one-to-one if and only if Ker T = {0}.

Proof. Suppose that T is one-to-one, and v ∈ Ker T . Then T (v) = 0 = T (0),


where in the last step we used that T is linear. Since T is one-to-one,
T (v) = T (0) implies that v = 0. Thus Ker T = {0}.

Next, suppose that Ker T = {0}, and let T (v) = T (w). Then, using linearity
we get 0 = T (v) − T (w) = T (v − w), implying that v − w ∈ Ker T = {0},
and thus v − w = 0. Thus v = w, and we can conclude that T is one-to-one.


Example 3.2.4 Let V = R_3[X], W = R^2, and
$$T(p(X)) = \begin{pmatrix} p(1) \\ \int_0^2 p(x)\,dx \end{pmatrix}.$$
Let p(X) = a + bX + cX^2 + dX^3 ∈ Ker T, then
$$0 = T(p(X)) = \begin{pmatrix} a + b + c + d \\ 2a + 2b + \frac{8}{3}c + 4d \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 2 & 2 & \frac{8}{3} & 4 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}.$$
Row reducing
$$\begin{pmatrix} 1 & 1 & 1 & 1 \\ 2 & 2 & \frac{8}{3} & 4 \end{pmatrix} \rightarrow \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 0 & \frac{2}{3} & 2 \end{pmatrix}, \tag{3.1}$$
gives that b and d are free variables and c = −3d, a = −b − c − d = −b + 2d. Thus
$$p(X) = b(-1 + X) + d(2 - 3X^2 + X^3).$$
We get that Ker T = Span{−1 + X, 2 − 3X² + X³}. In fact, the two polynomials are a basis for Ker T.

As {1, X, X^2, X^3} is a basis for R_3[X], we get that
$$\mathrm{Ran}\, T = \mathrm{Span}\{T(1), T(X), T(X^2), T(X^3)\} = \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ \frac{8}{3} \end{pmatrix}, \begin{pmatrix} 1 \\ 4 \end{pmatrix} \right\} = \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ \frac{8}{3} \end{pmatrix} \right\}.$$
In the last step, we reduced the set of vectors to a basis for Ran T by just keeping the columns corresponding to pivot columns in (3.1).
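To double-check the kernel and range computations, one can hand the coefficient matrix from (3.1) to SymPy (an illustrative sketch, not part of the text):

```python
from sympy import Matrix, Rational

# Matrix of T with respect to the bases {1, X, X^2, X^3} and the standard basis of R^2.
M = Matrix([[1, 1, 1, 1],
            [2, 2, Rational(8, 3), 4]])
print(M.nullspace())   # basis of Ker T; expect vectors matching -1 + X and 2 - 3X^2 + X^3
print(M.rank())        # dim Ran T; expected: 2
```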

Notice that
dim Ker T + dim Ran T = 2 + 2 = 4 = dim R3 [X].
As the next result shows, this is not a coincidence.

Theorem 3.2.5 Let T : V → W be linear, and suppose that dim V < ∞.


Then
dim Ker T + dim Ran T = dim V. (3.2)

Proof. Let {v1 , . . . , vp } be a basis for Ker T (⊆ V ), and {w1 , . . . , wq } a basis


for Ran T (notice that by Proposition 3.2.1 it follows that Ran T is finite
dimensional as V is finite dimensional). Let x1 , . . . , xq ∈ V be so that
T (xj ) = wj , j = 1, . . . , q. We claim that B = {v1 , . . . , vp , x1 , . . . , xq } is a
basis for V , which then yields that dim V = p + q = dim Ker T + dim Ran T .

Let v ∈ V. Then T(v) ∈ Ran T, and thus there exist b_1, . . . , b_q so that $T(v) = \sum_{j=1}^{q} b_j w_j$. Then
$$T\Big(v - \sum_{j=1}^{q} b_j x_j\Big) = T(v) - \sum_{j=1}^{q} b_j w_j = 0.$$
Thus $v - \sum_{j=1}^{q} b_j x_j \in \mathrm{Ker}\, T$. Therefore, there exist a_1, . . . , a_p ∈ F so that $v - \sum_{j=1}^{q} b_j x_j = \sum_{j=1}^{p} a_j v_j$. Consequently, $v = \sum_{j=1}^{p} a_j v_j + \sum_{j=1}^{q} b_j x_j \in \mathrm{Span}\, B$. This proves that V = Span B.

It remains to show that B is linearly independent, so assume $\sum_{j=1}^{p} a_j v_j + \sum_{j=1}^{q} b_j x_j = 0$. Then
$$0 = T\Big(\sum_{j=1}^{p} a_j v_j + \sum_{j=1}^{q} b_j x_j\Big) = \sum_{j=1}^{p} a_j T(v_j) + \sum_{j=1}^{q} b_j T(x_j) = \sum_{j=1}^{q} b_j w_j,$$
where we use that v_j ∈ Ker T, j = 1, . . . , p. As {w_1, . . . , w_q} is linearly independent, we now get that b_1 = · · · = b_q = 0. But then we obtain that $\sum_{j=1}^{p} a_j v_j = 0$, and as {v_1, . . . , v_p} is linearly independent, we get a_1 = · · · = a_p = 0. Thus $\sum_{j=1}^{p} a_j v_j + \sum_{j=1}^{q} b_j x_j = 0$ implies a_1 = · · · = a_p = b_1 = · · · = b_q = 0, showing the linear independence of B. □

We say that T is bijective if T is both onto and one-to-one. We let


idV : V → V denote the identity mapping, that is idV (v) = v, v ∈ V .

Proposition 3.2.6 Let T : V → W be bijective. Then T has an inverse


T −1 . That is, T −1 : W → V exists so that T ◦ T −1 = idW and
T −1 ◦ T = idV . Moreover, T −1 is linear. Conversely, if T has an inverse,
then T is bijective.

Proof. Let w ∈ W . As T is onto, there exists a v ∈ V so that T (v) = w, and


as T is one-to-one, this v is unique. Define T −1 (w) := v, making
T −1 : W → V well-defined. It is straightforward to check that
T (T −1 (w)) = w for all w ∈ W , and T −1 (T (v)) = v for all v ∈ V .

Next suppose T −1 (w) = v and T −1 (ŵ) = v̂. This means that T (v) = w and
T (v̂) = ŵ. Thus T (v + v̂) = w + ŵ. But then, by definition,
T −1 (w + ŵ) = v + v̂ and, consequently, T −1 (w + ŵ) = T −1 (w) + T −1 (ŵ).
Similarly, one proves T −1 (cw) = cT −1 (w). Thus T −1 is linear.

Next, suppose that T has an inverse T −1 . Let w ∈ W . Put v = T −1 (w).


Then T (v) = w, and thus w ∈ Ran T . This shows that T is onto. Finally,
suppose that T (v) = T (v̂). Applying T −1 on both sides, gives
v = T −1 (T (v)) = T −1 (T (v̂)) = v̂, showing that T is one-to-one. 

A bijective linear map T is also called an isomorphism. We call two vector


spaces V and W isomorphic if there exists an isomorphism T : V → W .
When two vector spaces are isomorphic they essentially have the same vector
space properties. Indeed, whatever vector space properties V has, are carried
over by T to W , and whatever vector space properties W has, are carried

over by T −1 to V . As the following results shows, any n-dimensional vector


space over the field F is isomorphic to Fn .

Theorem 3.2.7 Let V be a n-dimensional vector space over F. Let


B = {v1 , . . . , vn } be a basis for V . Then the map T : V → Fn defined via
T (v) = [v]B is an isomorphism. In particular, V and Fn are isomorphic.

Proof. In Section 2.5 we have already seen that T is linear. Next suppose that T(v) = 0. Thus $[v]_B = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}$, which means that $v = \sum_{j=1}^{n} 0\, v_j = 0$. Thus Ker T = {0}, giving that T is one-to-one. Next, let $\begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} \in F^n$. Put $v = \sum_{j=1}^{n} c_j v_j$. Then $T(v) = [v]_B = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} \in \mathrm{Ran}\, T$. This shows that Ran T = F^n, and thus T is onto. □

The following example illustrates this result.

Example 3.2.8 Let T : F_{n−1}[X] → F^n be defined by
$$T(a_0 + a_1 X + \cdots + a_{n-1} X^{n-1}) := \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{pmatrix}.$$
It is easy to see that T is an isomorphism, and thus F_{n−1}[X] and F^n are isomorphic. The underlying basis B here is the standard basis {1, X, . . . , X^{n−1}}.

3.3 Matrix representations of linear maps

The following results show that any linear map between finite-dimensional
spaces allows a matrix representation with respect to chosen bases. The
significance of this result is that one can study linear maps between
finite-dimensional spaces by studying matrices.

Theorem 3.3.1 Given vector spaces V and W over F, with bases


B = {v1 , . . . , vn } and C = {w1 , . . . , wm }, respectively. Let T : V → W .
Represent T (vj ) with respect to the basis C:
 
$$T(v_j) = a_{1j} w_1 + \cdots + a_{mj} w_m \quad \Leftrightarrow \quad [T(v_j)]_C = \begin{pmatrix} a_{1j} \\ \vdots \\ a_{mj} \end{pmatrix}, \qquad j = 1, \ldots, n. \tag{3.3}$$
Introduce the matrix $[T]_{C \leftarrow B} = (a_{ij})_{i=1,j=1}^{m,n}$. Then we have that
$$T(v) = w \quad \Leftrightarrow \quad [w]_C = [T]_{C \leftarrow B}\,[v]_B. \tag{3.4}$$
Conversely, if $A = (a_{ij})_{i=1,j=1}^{m,n} \in F^{m \times n}$ is given, then defining T : V → W via (3.3) and extending by linearity via $T(\sum_{j=1}^{n} c_j v_j) := \sum_{j=1}^{n} c_j T(v_j)$, yields a linear map T : V → W with matrix representation $[T]_{C \leftarrow B} = A$.

Proof. The proof follows directly from the following observation. If
$$v = c_1 v_1 + \cdots + c_n v_n \quad \Leftrightarrow \quad [v]_B = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix},$$
then
$$w = T(v) = \sum_{j=1}^{n} c_j T(v_j) = \sum_{j=1}^{n} c_j \Big( \sum_{k=1}^{m} a_{kj} w_k \Big) \quad \Leftrightarrow \quad [w]_C = \begin{pmatrix} \sum_{j=1}^{n} a_{1j} c_j \\ \vdots \\ \sum_{j=1}^{n} a_{mj} c_j \end{pmatrix} = (a_{ij})_{i=1,j=1}^{m,n} \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}. \qquad \square$$


Example 3.3.2 Let V = C^{2×2} and F = C. Let B be the standard basis {E_{11}, E_{12}, E_{21}, E_{22}}. Define T : V → V via
$$T(A) = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} A \begin{pmatrix} i & 3i \\ 5i & 7i \end{pmatrix}.$$
Find the matrix representation $[T]_{B \leftarrow B}$.

Compute
$$T(E_{11}) = \begin{pmatrix} i & 3i \\ 3i & 9i \end{pmatrix} = i E_{11} + 3i E_{12} + 3i E_{21} + 9i E_{22},$$
$$T(E_{12}) = \begin{pmatrix} 5i & 7i \\ 15i & 21i \end{pmatrix} = 5i E_{11} + 7i E_{12} + 15i E_{21} + 21i E_{22},$$
$$T(E_{21}) = \begin{pmatrix} 2i & 6i \\ 4i & 12i \end{pmatrix} = 2i E_{11} + 6i E_{12} + 4i E_{21} + 12i E_{22},$$
$$T(E_{22}) = \begin{pmatrix} 10i & 14i \\ 20i & 28i \end{pmatrix} = 10i E_{11} + 14i E_{12} + 20i E_{21} + 28i E_{22}.$$
This gives that
$$[T]_{B \leftarrow B} = \begin{pmatrix} i & 5i & 2i & 10i \\ 3i & 7i & 6i & 14i \\ 3i & 15i & 4i & 20i \\ 9i & 21i & 12i & 28i \end{pmatrix}.$$
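The representation can be verified numerically (an illustrative NumPy sketch; flattening a 2 × 2 matrix row by row matches the coordinate order E_{11}, E_{12}, E_{21}, E_{22}):

```python
import numpy as np

L = np.array([[1, 2], [3, 4]], dtype=complex)
R = np.array([[1j, 3j], [5j, 7j]], dtype=complex)
T_mat = 1j * np.array([[1, 5, 2, 10],
                       [3, 7, 6, 14],
                       [3, 15, 4, 20],
                       [9, 21, 12, 28]])     # the matrix [T]_{B<-B} found above

A = np.array([[1 + 2j, -1], [0.5, 3j]])       # an arbitrary test matrix
lhs = (L @ A @ R).flatten()                   # coordinates of T(A) in the basis B
rhs = T_mat @ A.flatten()                     # [T]_{B<-B} applied to [A]_B
print(np.allclose(lhs, rhs))                  # expected: True
```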

Example 3.3.3 Let V = Z_5^3 and
$$B = \left\{ \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix}, \begin{pmatrix} 1 \\ 4 \\ 1 \end{pmatrix} \right\}, \qquad C = \left\{ \begin{pmatrix} 0 \\ 2 \\ 4 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} \right\}.$$
Find the matrix representation $[\mathrm{id}_V]_{C \leftarrow B}$.

Compute
$$\mathrm{id}_V \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix} = 2 \begin{pmatrix} 0 \\ 2 \\ 4 \end{pmatrix} + 1 \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} + 1 \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix},$$
$$\mathrm{id}_V \begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix} = 1 \begin{pmatrix} 0 \\ 2 \\ 4 \end{pmatrix} + 0 \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} + 1 \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix},$$
$$\mathrm{id}_V \begin{pmatrix} 1 \\ 4 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 4 \\ 1 \end{pmatrix} = 2 \begin{pmatrix} 0 \\ 2 \\ 4 \end{pmatrix} + 1 \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} + 0 \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}.$$
This gives that
$$[\mathrm{id}_V]_{C \leftarrow B} = \begin{pmatrix} 2 & 1 & 2 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}.$$

The next result shows that composition of linear maps corresponds to matrix
multiplication of the matrix representation, when the bases match. Please be
reminded that the composition is defined via (S ◦ T )(x) = S(T (x)).

Theorem 3.3.4 Let T : V → W and S : W → X be linear maps between


finite-dimensional vector spaces over F, and let B, C, and D be bases for
V, W , and X, respectively. Then
[S ◦ T ]D←B = [S]D←C [T ]C←B . (3.5)
64 Advanced Linear Algebra

Proof. Denoting

B = {v1 , . . . , vn }, C = {w1 , . . . , wm }, D = {x1 , . . . , xp },

[S ◦ T ]D←B = (cij )pi=1,j=1


n
, [S]D←C = (bij )pi=1,j=1
m
, [T ]C←B = (aij )m n
i=1,j=1 .

We thus have that


m
X p
X
T (vj ) = aij wi , j = 1, . . . , n, S(wk ) = blk xl , k = 1, . . . , m.
i=1 l=1

Then
Xm m
X
(S ◦ T )(vj ) = S(T (vj )) = S( aij wj ) = aij S(wi ) =
i=1 i=1
m
X p
X p X
X m
[aij bli xl ] = ( bli aij )xl , j = 1, . . . , n.
i=1 l=1 l=1 i=1
Pm
Thus we get that clj = i=1 bli aij , l = 1, . . . , p, j = 1, . . . , n, which
corresponds exactly to (3.5). 

Corollary 3.3.5 Let V be a n-dimensional vector space over F with bases B


and C. Then
[idV ]−1
B←C = [idV ]C←B . (3.6)

Proof. Clearly, idV ◦ idV = idV . In addition, it is easy to see that


[idV ]B←B = In = [idV ]C←C . Then from Theorem 3.3.4 we get that

[idV ]B←C [idV ]C←B = [idV ]B←B = In .

As the matrices involved are all square, we can now conclude that (3.6)
holds. 

Example 3.3.3 continued.


 −1  
2 1 2 4 2 1
[idV ]B←C = 1 0 1 = 1 3 0 .
1 1 0 1 4 4

Let us check:
         
0 0 3 2 1
idV 2 = 2 = 4 0 + 1 3 + 1 4 ,
4 4 1 4 1
         
1 1 3 2 1
idV 0 = 0 = 2 0 + 3 3 + 4 4 ,
3 3 1 4 1
Linear Transformations 65
         
2 2 3 2 1
idV 1 = 1 = 1 0 + 0 3 + 4 4 ,
0 0 1 4 1
confirming that our calculations were correct.

In the next corollary, we present an important special case where we change


bases in a vector space, and express a linear map with respect to the new
basis. Recall that two n × n matrices A and B are called similar if there
exists an invertible n × n matrix P so that

A = P BP −1 .

We have the following corollary.

Corollary 3.3.6 Let T : V → V and let B and C be two bases in the


n-dimensional vector space V . Then

[T ]B←B = [idV ]B←C [T ]C←C [idV ]C←B = [idV ]−1


C←B [T ]C←C [idV ]C←B . (3.7)

In particular, [T ]B←B and [T ]C←C are similar.

In the next chapter we will find bases of generalized eigenvectors of a linear


T , making the corresponding matrix representation of a particular simple
form (the Jordan canonical form). In the case of a basis of eigenvectors, the
matrix representation is diagonal.

3.4 Exercises

Exercise 3.4.1 Let T : V → W and S : W → X be linear maps. Show that


the composition S ◦ T : V → X is also linear.

Exercise 3.4.2 For the following choices of V , W and T : V → W ,


determine whether T is linear or not.

(a) V = R3 , W = R4 ,  
  x1 − 5x3
x1  7x2 + 5 
T  x2  = 
3x1 − 6x2  .

x3
8x3
66 Advanced Linear Algebra

(b) V = Z35 , W = Z25 ,  


x1  
x1 − 2x3
T x2 =
  .
3x2 x3
x3

(c) V = W = C2×2 (over F = C), T (A) = A − AT .


(d) V = W = C2×2 (over F = C), T (A) = A − A∗ .
(e) V = W = C2×2 (over F = R), T (A) = A − A∗ .
(f) V = {f : R → R : f is differentiable}, W = RR ,

(T (f ))(x) = f 0 (x)(x2 + 5).

(g) V = {f : R → R : f is continuous}, W = R,
Z 10
T (f ) = f (x)dx.
−5

Exercise 3.4.3 Show that if T : V → W is linear and the set


{T (v1 ), . . . , T (vk )} is linearly independent, then the set {v1 , . . . , vk } is
linearly independent.

Exercise 3.4.4 Show that if T : V → W is linear and onto, and {v1 . . . , vk }


is a basis for V , then the set {T (v1 ), . . . , T (vk )} spans W . When is
{T (v1 ), . . . , T (vk )} a basis for W ?

Exercise 3.4.5 Let T : V → W be linear, and let U ⊆ V be a subspace of


V . Define

T [U ] := {w ∈ W ; there exists u ∈ U so that w = T (u)}.

Observe that T [V ] = Ran T .

(a) Show that T [U ] is a subspace of W .


(b) Assuming dim U < ∞, show that dim T [U ] ≤ dim U .
(c) If Û is another subspace of V , is it always true that
T [U + Û ] = T [U ] + T [Û ]? If so, provide a proof. If not, provide a
counterexample.
(d) If Û is another subspace of V , is it always true that
T [U ∩ Û ] = T [U ] ∩ T [Û ]? If so, provide a proof. If not, provide a
counterexample.
Linear Transformations 67

Exercise 3.4.6 Let v1 , v2 , v3 , v4 be a basis for a vector space V .

(a) Let T : V → V be given by T (vi ) = vi+1 , i = 1, 2, 3, and T (v4 ) = v1 .


Determine the matrix representation of T with respect to the basis
{v1 , v2 , v3 , v4 }.
(b) If the matrix representation of a linear map S : V → V with respect to
the {v1 , v2 , v3 , v4 } is given by
 
1 0 1 1
0 2 0 2
 ,
1 2 1 3
−1 0 −1 −1

determine S(v1 − v4 ).
(c) Determine bases for Ran S and Ker S.

Exercise 3.4.7
 Consider
 the linear map T : R2 [X] → R2 given by
p(1)
T (p(X)) = .
p(3)

(a) Find a basis for the kernel of T .


(b) Find a basis for the range of T .

Exercise 3.4.8 Let T : V → W with V = Z45 and W = Z2×2


5 be defined by
 
a  
b
T ( ) = a + b b + c .
c c+d d+a
d

(a) Find a basis for the kernel of T .


(b) Find a basis for the range of T .

Exercise 3.4.9 For the following T : V → W with bases B and C,


respectively, determine the matrix representation for T with respect to the
bases B and C. In addition, find bases for the range and kernel of T .

d2 d
(a) B = C = {sin t, cos t, sin 2t, cos 2t}, V = W = Span B, and T = dt2 + dt .
68 Advanced Linear Algebra
   
1 1
(b) B = {1, t, t2 , t3 }, C = { , }, V = C3 [X], and W = C2 , and
0 −1
 
p(3)
T (p) = .
p(5)
d
(c) B = C = {et cos t, et sin t, e3t , te3t }, V = W = Span B, and T = dt .
   
2 1 1
(d) B = {1, t, t }, C = { , }, V = C2 [X], and W = C2 , and
1 0
R 1 
p(t)dt
T (p) = 0 .
p(1)

Exercise 3.4.10 Let V = Cn×n . Define L : V → V via L(A) = 21 (A + AT ).

(a) Let        
1 0 0 1 0 0 0 0
B={ , , , }.
0 0 0 0 1 0 0 1
Determine the matrix representation of L with respect to the basis B.
(b) Determine the dimensions of the subspaces
W = {A ∈ V : L(A) = A}, and
Ker L = {A ∈ V : L(A) = 0}.
(c) Determine the eigenvalues of L.

Exercise 3.4.11 Let B = {1, t, . . . , tn }, C = {1, t, . . . , tn+1 }, V = Span B


and W = Span C. Define A : V → W via
Af (t) := (2t2 − 3t + 4)f 0 (t),
where f 0 is the derivative of f .

(a) Find the matrix representation of A with respect to the bases B and C.
(b) Find bases for Ran A and Ker A.

Exercise 3.4.12 (Honors) Let V and W be vector spaces. Let L(V, W ) be


the set of all linear maps acting V → W :
L(V, W ) = {T : V → W : T is linear}.
Notice that L(V, W ) ⊆ W V , and as addition and scalar multiplication are
defined in W , one may define addition and scalar multiplication on W V as is
done in vector spaces of functions. Show that L(V, W ) is a subspace of W V .
What is the dimension of L(V, W ) when dim V = n and dim W = m?
4
The Jordan Canonical Form

CONTENTS
4.1 The Cayley–Hamilton theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.2 Jordan canonical form for nilpotent matrices . . . . . . . . . . . . . . . . . . . . 71
4.3 An intermezzo about polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.4 The Jordan canonical form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.5 The minimal polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.6 Commuting matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.7 Systems of linear differential equations . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.8 Functions of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.9 The resolvent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

The main result in this chapter allows us to write a square matrix A as


A = SJS −1 , where J is a particularly simple matrix (in some cases a
diagonal matrix). In light of the results in Section 3.3, this means that for a
linear transformation on a finite-dimensional vector space we can find a
simple matrix representation J (called the Jordan canonical form). This is
helpful when one wants to work with this linear transformation. For
example, we will see how the Jordan canonical form is helpful in solving a
system of linear differential equations.

4.1 The Cayley–Hamilton theorem

It will take a few sections before we get to the general Jordan canonical form.
First we need to develop the following polynomial identity for a matrix.

Let A ∈ Fn×n . We define the characteristic polynomial pA (λ) of A to be the


degree n polynomial
pA (λ) := det(λIn − A).

69
70 Advanced Linear Algebra

Note that pA (λ) has the form

pA (λ) = λn + an−1 λn−1 + · · · + a1 λ + a0 , (4.1)

where an−1 , . . . , a0 ∈ F. When the leading coefficient in a polynomial is 1, we


call the polynomial monic. Thus the characteristic polynomial of A is monic.
We have the following result.

Theorem 4.1.1 (Cayley–Hamilton) Let A ∈ Fn×n with characteristic


polynomial pA (λ) as in (4.1). Then

pA (A) = An + an−1 An−1 + · · · + a1 A + a0 In = 0. (4.2)

convention A0 = In and an = 1, we can write (4.2) also as


With the P
n
pA (A) = j=0 aj Aj = 0.

 
1 2
Example 4.1.2 Let A = . Then
3 4
pA (λ) = (λ − 1)(λ − 4) − (−2)(−3) = λ2 − 5λ − 2. Let us check (4.1) for this
matrix:
 2        
1 2 1 2 1 0 7−5−2 10 − 10 − 0 0 0
−5 −2 = = .
3 4 3 4 0 1 15 − 15 − 0 22 − 20 − 2 0 0

In the proof of Theorem 4.1.1 we use matrices in which the entries are
polynomials in λ, such as for instance
 2 
λ − 6λ + 1 2λ − 10
. (4.3)
3λ2 + 5λ − 7 −λ2 + 4λ − 25
Pn
We will rewrite such polynomials in the form j=0 λj Aj , with Aj constant
matrices (i.e., Aj does not depend on λ). For (4.3) it looks like
     
2 1 0 −6 2 1 −10
λ +λ + .
3 −1 5 4 −7 −25

Proof of Theorem 4.1.1. Applying Theorem 1.4.13 to the matrix λIn − A, we


get that
(λIn − A) adj(λIn − A) = pA (λ)In . (4.4)
It is easy to see that adj(λIn − A) is of the form

adj(λIn − A) = λn−1 In + λn−2 An−2 + · · · + λA1 + A0 ,


The Jordan Canonical Form 71

with Aj ∈ Fn×n constant matrices. Using the notation (4.1) and equating
the coefficients of λj , j = 0, . . . , n, on both sides of (4.4) we get

−A + An−2 = an−1 In , −AAn−2 + An−3 = an−2 In , . . . ,

−AA1 + A0 = a1 In , −AA0 = a0 In .
But then pA (A) equals
n
X
aj Aj = An + An−1 (−A + An−2 ) + An−2 (−AAn−2 + An−3 )+
j=0

· · · + A(−AA1 + A0 ) − AA0 = 0.


4.2 Jordan canonical form for nilpotent matrices

We will see that the Cayley–Hamilton theorem (Theorem 4.1.1) plays a


crucial role in obtaining the Jordan canonical of a matrix. In this section we
focus on the case when pA (λ) = λn . Thus An = 0. A matrix with this
property is called nilpotent.

Given a matrix A, we introduce the following quantities:

wk (A, λ) = dim Ker(A − λIn )k − dim Ker(A − λIn )k−1 , k = 1, . . . , n. (4.5)

Here (A − λIn )0 = In , so w1 (A, λ) = dim Ker(A − λIn ). The numbers


wk (A, λ) are collectively called the Weyr characteristics of A. The spaces
Ker(A − λIn )k are called generalized eigenspaces of A at λ.

We also introduce the Jordan block Jk (λ) of size k at λ, as being the k × k


upper triangular matrix
 
λ 1 0 ··· 0
0
 λ 1 ··· 0 
Jk (λ) =  ... .. .. ..  (4.6)

 . . .
0 0 ··· λ 1
0 0 ··· 0 λ
72 Advanced Linear Algebra

We write ⊕pk=1 Ak for the block diagonal matrix


 
A1 0 · · · 0
 0 A2 · · · 0 
⊕pk=1 Ak =  . .
 
.. .. ..
 .. . . . 
0 0 ··· Ap
When we have a block diagonal matrix with p copies of the same matrix B,
we write ⊕pk=1 B.

Theorem 4.2.1 Let A ∈ Fn×n be so that An = 0. Let wj = wj (A, 0),


j = 1, . . . , n + 1. Note that wn+1 = 0. Then A is similar to the matrix
w −wk+1
J = ⊕nk=1 (⊕j=1
k
Jk (0)).

Thus J is a block diagonal matrix with Jordan blocks, where for k = 1, . . . , n


the Jordan block Jk (0) appears exactly wk − wk+1 times.

Example 4.2.2 Let


 
0 1 0 −1 1 −1
0
 1 1 −2 2 −2

0 1 0 −1 2 −2
A= .
0
 1 0 −1 2 −2

0 1 0 −1 1 −1
0 1 0 −1 1 −1
Then one finds that

dim KerA = 3, dim KerA2 = 5, and dim KerAj = 6, j = 3, 4, 5, 6.

Thus w1 = 3, w2 = 2, w3 = 1, w4 = w5 = w6 = 0. Theorem 4.2.1 now states


that A is similar to the matrix
 
0
 0 1 
 
 0 0 
J = , (4.7)

 0 1 0

 0 0 1
0 0 0
where the empty entries are zeros.

Proof of Theorem 4.2.1. Put sk = wk − wk+1 . Choose linearly independent


vectors xn1 , . . . , xnsn so that

Span{xn1 , . . . , xn,sn }+̇KerAn−1 = KerAn (= Fn ).


The Jordan Canonical Form 73

Next, for j = n − 1, . . . , 1, choose linearly independent vectors xj,1 , . . . , xj,sj


so that
n
Span{xj,1 , . . . , xj,sj }+̇ +̇k=j+1 Span{Ak−j xk,1 , . . . , Ak−j xk,sk }


+̇KerAj−1 = KerAj . (4.8)

We claim that the set of vectors


B = ∪nk=1 ∪sj=1 {Ak−1 xk,j , . . . , Axk,j , xk,j

k
(4.9)
is a basis of Fn , and that
[A]B←B = J.

First we observe that the number of elements in B equals


n
X n
X
ksk = w1 − w2 + 2(w2 − w3 ) + 3(w3 − w4 ) + · · · + n(wn − 0) = wk =
k=1 k=1
n
X
(dim KerAk − dim KerAk−1 ) = dim KerAn − dim KerIn = n,
k=1
where in the last step we used that An = 0. Thus it remains to prove that B
(l)
is linearly independent. For this purpose, let ck,j ∈ F be so that
sk k−1
n X
(l)
X X
ck,j Al xk,j = 0. (4.10)
k=1 j=1 l=0

When we multiply (4.10) on the left with An−1 and use that Ak xk,j = 0, we
get that
sn
(0)
X
n−1
A ( ck,j xn,j ) = 0.
j=1

Then
sn
(0)
X
cn,j xn,j ∈ (Span{xn1 , . . . , xn,sn }) ∩ KerAn−1 = {0},
j=1

and thus
sn
(0)
X
cn,j xn,j = 0.
j=1
(0)
As {xn1 , . . . , xn,sn } is linearly independent, we get that cn,j = 0,
j = 1, . . . , sn . If n = 1, we are done. If n ≥ 2, we multiply (4.10) with An−2
on the left, to obtain
sn−1 sn
(0) (1)
X X
An−2 ( cn−1,j xn−1,j ) + An−1 ( cn,j xn,j ) = 0.
j=1 j=1
74 Advanced Linear Algebra

Then
sn−1 sn
(0) (1)
X X
cn−1,j xn−1,j + A cn,j xn,j ∈
j=1 j=1

Span{xn−1,1 , . . . , xn−1,sn−1 }+̇Span{Axn,1 , . . . , Axn,sn } ∩ KerAn−2 .


 

By (4.8) this intersection equals {0}, and thus


sn−1 sn
(0) (1)
X X
cn−1,j xn−1,j + A cn,j xn,j = 0.
j=1 j=1

Next, using Proposition 2.3.3, we get that


sn−1 sn
(0) (1)
X X
cn−1,j xn−1,j = 0, A cn,j xn,j = 0.
j=1 j=1

(0)
Since {xn−1,1 , . . . , xn−1,sn−1 } is linearly independent, we get cn−1,j = 0,
j = 1, . . . , sn−1 . In addition, as KerA ⊆ KerAn−1 we get that
sn
(1)
X
cn,j xn,j ∈ Span{xn1 , . . . , xn,sn } ∩ KerAn−1 = {0}.
j=1

(1)
and using linear independence of {xn1 , . . . , xn,sn }, we obtain cn,j = 0,
j = 1, . . . , sn . If n = 2, we are done. If n ≥ 3, we continue by multiplying
(4.10) with An−3 on the left and argue in a similar manner as above.
(l)
Ultimately, we arrive at ck,j = 0 for all k, j, and l, showing that B is linearly
independent, and thus a basis for Fn .

To show that [A]B←B = J, notice that if we apply A to an element of B, two


possibilities occur: we either get 0 or we get another element of the basis B.
Indeed, taking the element Ak−1 xk,j ∈ B and applying A to it, we get (since
xk,j ∈ KerAk ) that
A(Ak−1 xk,j ) = Ak xk,j = 0,
and thus the corresponding column in [A]B←B consists of only zeros. If we
apply A to any other element Al−1 xk,j , l < k, of B, we get

A(Al−1 xk,j ) = Al xk,l ∈ B,

and as Al xk,l precedes Al−1 xk,j in B, we get exactly a 1 in the entry above
the diagonal in the column of [A]B←B corresponding to Al−1 xk,j , and zeros
elsewhere in this column. This shows that [A]B←B = J, completing the
proof. 
The Jordan Canonical Form 75

Example 4.2.2 continued. We compute


 
0 0 1 −1 0 0
0 0
 1 −1 0 0

0 0 1 −1 0 0
A2 =   , Aj = 0, j ≥ 3.
0 0
 1 −1 0 0

0 0 1 −1 0 0
0 0 1 −1 0 0

Letting ej , j = 1, . . . , 6, denote the standard basis elements of F6 , we find

KerAj = F6 , j ≥ 3, KerA2 = Span{e1 , e2 , e3 + e4 , e5 , e6 },

KerA = Span{e1 , e2 + e3 + e4 , e5 + e6 }.
We can now choose x3,1 = e3 . Next, we need to choose x2,1 so that

Span{x2,1 }+̇Span{Ax3,1 (= e2 )}+̇KerA = KerA2 .

Take for instance x2,1 = e5 . Finally, we need to choose x1,1 so that

Span{x1,1 }+̇Span{A2 x3,1 , Ax2,1 }+̇KerI6 = KerA.

One can for instance choose x1,1 = e1 . We now get that

B = {x1,1 , Ax2,1 , x2,1 , A2 x3,1 , Ax3,1 , x3,1 } = {e1 , Ae5 , e5 , A2 e3 , Ae3 , e3 }.

Letting P = [idF6 ]E←B , we get that the columns of P are exactly the vectors
in B (with coordinates with respect to the standard basis E), and thus
 
1 1 0 1 0 0
0 2 0 1 1 0
 
0 2 0 1 0 1
P =  .
0 2 0 1 0 0

0 1 1 1 0 0
0 1 0 1 0 0

Then we find indeed that P −1 AP = J with J as in (4.7). Writing this


equality as AP = P J, it is easy to verify this by hand.

4.3 An intermezzo about polynomials

Given two polynomials f (X), g(X) ∈ F[X], we say that f (X) divides g(X)
(notation: f (X)|g(X)) if there exists an h(X) ∈ F[X] so that
76 Advanced Linear Algebra

f (X)h(X) = g(X). Clearly, if f (X)|g(X) and g(X) is not the zero


polynomial, then deg f ≤ deg g. We say that f (X) is a common divisor of
g(X) and h(X) if f (X) divides both g(X) and h(X). We call a nonzero
polynomial f (X) a greatest common divisor of the nonzero polynomials
g(X) and h(X) if f (X) is a common divisor of g(X) and h(X) and among
all common divisors f (X) has the highest possible degree.

Analogous to the results on integers as presented in Subsection 1.3.2 we have


the following result for polynomials. We will not provide proofs for these
results.

Proposition 4.3.1 For every pair of nonzero polynomials


g(X), h(X) ∈ F[X], there exists unique q(X), r(X) ∈ F[X] so that
g(X) = h(X)q(X) + r(X) and deg r < deg h.

We call r(X) the remainder of g(X) after division by h(X). One can find
q(X) and r(X) via long division. We present an example.

Example 4.3.2 Let g(X) = X 3 + X 2 − 1 and h(X) = X − 1. Then we


perform the long division

X 2 + 2X + 2

X −1 X3 + X2 −1
− X3 + X2
2X 2
− 2X 2 + 2X
2X − 1
− 2X + 2
1

resulting in q(X) = X 2 + 2X + 2 and r(X) = 1.

Proposition 4.3.3 For every pair of nonzero polynomials


g(X), h(X) ∈ F[X], the greatest common divisor is unique up to
multiplication with a nonzero element of F. Consequently, every pair of
nonzero polynomials g(X), h(X) ∈ F[X] has a unique monic greatest
common divisor.

We denote the unique monic greatest common divisor of g(X) and h(X) by
gcd(g(X), h(X)). We say that g(X) and h(X) are coprime if
gcd(g(X), h(X)) = 1. In this setting we now also have a Bezout equation
result.
The Jordan Canonical Form 77

Proposition 4.3.4 For every pair of nonzero polynomials


g(X), h(X) ∈ F[X], there exists a(X), b(X) ∈ F[X] so that

a(X)g(X) + b(X)h(X) = gcd(g(X), h(X)). (4.11)

In particular, if g(X) and h(X) are coprime, then there exists


a(X), b(X) ∈ F[X] so that

a(X)g(X) + b(X)h(X) = 1. (4.12)

As in Subsection 1.3.2, to solve Bezout’s identity (4.11), one applies Euclid’s


algorithm to find the greatest common divisor, keep track of the division
equations, and ultimately put the equations together.

Example 4.3.5 Let us solve (4.11) for g(X) = X 4 − 2X 3 + 2X 2 − 2X + 1


and h(X) = X 3 + X 2 − X − 1, both in R[X]. We perform Euclid’s algorithm:

X 4 − 2X 3 + 2X 2 − 2X + 1 = (X 3 + X 2 − X − 1)(X − 3) + (6X 2 − 4X − 2)
1 5 4 4
X 3 + X 2 − X − 1 = (6X 2 − 4X − 2)( X + ) + ( X − )
6 18 9 9
4 4 27 9
6X 2 − 4X − 2 = ( X − )( X + ) + 0.
9 9 2 2
(4.13)

So we find that 94 X − 49 is a greatest common divisor. Making it monic, we


get gcd(g(X), h(X)) = X − 1. Using the above equations, we get
9 3 1 5
X −1= [X + X 2 − X − 1 − (6X 2 − 4X − 2)( X + )] =
4 6 18
9 3 1 5
[X +X 2 −X−1−[X 4 −2X 3 +2X 2 −2X−1−(X 3 +X 2 −X−1)(X−3)]( X+ )].
4 6 18
Thus we find
9 1 5 3 5
a(X) = − ( X + ) = − X − ,
4 6 18 8 8
and
9 1 5 3 1 3
b(X) = [1 + (X − 3)( X + )] = X 2 − X + .
4 6 18 8 2 8

Given nonzero polynomials g1 (X), . . . , gk (X) ∈ F[X], we call f (X) ∈ F[X] a


common divisor of g1 (X), . . . , gk (X) if f (X)|gj (X), j = 1, . . . , k. A common
divisor of g1 (X), . . . , gk (X) is called a greatest common divisor of
g1 (X), . . . , gk (X) if among all common divisors of g1 (X), . . . , gk (X) it has
the highest possible degree. Analogous to the case k = 2, we have the
following.
78 Advanced Linear Algebra

Proposition 4.3.6 For every k nonzero polynomials


g1 (X), . . . , gk (X) ∈ F[X], the greatest common divisor is unique up to
multiplication with a nonzero element of F. Consequently, every pair of
nonzero polynomials g1 (X), . . . , gk (X) ∈ F[X] has a unique monic greatest
common divisor (notation: gcd(g1 (X), . . . , gk (X)). Moreover, there exists
a1 (X), . . . , ak (X) ∈ F[X] so that
a1 (X)g1 (X) + · · · + ak (X)gk (X) = gcd(g1 (X), . . . , gk (X)). (4.14)

The above result follows easily from the k = 2 case after first observing that
gcd(g1 (X), . . . , gk (X)) = gcd(g1 (X), gcd(g2 (X) . . . , gk (X))).

4.4 The Jordan canonical form

We now come to the main result of this chapter.

Theorem 4.4.1 (Jordan canonical form) Let A ∈ Fn×n and suppose we


may write pA (λ) = (λ − λ1 )n1 · · · (λ − λm )nm , where λ1 , . . . , λm ∈ F are the
different roots of pA (λ). Then A is similar to the matrix
 
J(λ1 ) 0 ··· 0
 0 J(λ2 ) · · · 0 
J = . ..  ,
 
. ..
 .. .. . . 
0 0 ··· J(λm )
where J(λj ) is the nj × nj matrix
 
nj wk (A,λj )−wk+1 (A,λj )
J(λj ) = ⊕k=1 ⊕l=1 Jk (λj ) , j = 1, . . . , m.

Here wk (A, λ) is defined in (4.5).

Remark 4.4.2 A field F is called algebraically closed if every polynomial


p(X) ∈ F[X] with deg p ≥ 1 has a root in F. If a field is algebraically closed,
one can factor any monic polynomial p(λ) of degree ≥ 1 as
p(λ) = (λ − λ1 )n1 · · · (λ − λm )nm with λ1 , . . . , λm ∈ F. Thus for an
algebraically closed field it is not necessary to assume in Theorem 4.4.1 that
pA (λ) factors in this way, as it automatically does. By the fundamental
theorem of algebra C is an algebraically closed field. The fields Zp and R are
not algebraically closed: 1 + X(X − 1)(X − 2) · · · (X − (p − 1)) does not have
any roots in Zp , while X 2 + 1 does not have any real roots.
The Jordan Canonical Form 79

Our first step in the proof of Theorem 4.4.1 is the following.

Proposition 4.4.3 Let A ∈ Fn×n and suppose


pA (λ) = (λ − λ1 )n1 · · · (λ − λm )nm , where λ1 , . . . , λm ∈ F are different. Then
Fn = Ker(A − λ1 In )n1 +̇ · · · +̇Ker(A − λm In )nm . (4.15)

Proof. Let gj (λ) = pA (λ)/(λ − λj )nj ∈ F[λ], j = 1, . . . , m. Then


gcd(g1 (λ), . . . , gm (λ)) = 1, thus by Proposition 4.3.6 there exist
a1 (λ), . . . , am (λ) ∈ F[λ] so that
a1 (λ)g1 (λ) + · · · + am (λ)gm (λ) = 1.
But then,
a1 (A)g1 (A) + · · · + am (A)gm (A) = In . (4.16)
Let now v ∈ Fn be arbitrary, and put vj = aj (A)gj (A)v, j = 1, . . . , m.
Then, due to (4.16) we have that v = v1 + · · · + vm . Moreover,
(A − λj In )nj vj = aj (A)pA (A)vj = 0, due to Theorem 4.1.1. Thus
vj ∈ Ker(A − λj In )nj , j = 1, . . . , m, and thus
v = v1 + · · · + vm ∈ Ker(A − λ1 In )n1 + · · · +Ker(A − λm In )nm ,
proving the inclusion ⊆ in (4.15). The other inclusion ⊇ is trivial, so equality
in (4.15) holds.

It remains to show that the right-hand side of (4.15) is a direct sum. We


show that the first + is a direct sum, as this is notationwise the most
convenient. The argument is that same for all the others. Thus we let
v ∈ Ker(A − λ1 In )n1 ∩ [Ker(A − λ2 In )n2 + · · · +Ker(A − λm In )nm ].
We need to show that v = 0. Using that (λ − λ1 )n1 and g1 (λ) are coprime,
we have by Proposition 4.3.3 that there exist a(λ), b(λ) ∈ F[λ] so that
a(λ)(λ − λ1 )n1 + b(λ)g1 (λ) = 1.
Thus
a(A)(A − λ1 In )n1 + b(A)g1 (A) = In . (4.17)
n1 n1
Next, observe that v ∈ Ker(A − λ1 In ) gives that (A − λ1 In ) v = 0, and
that
v ∈ Ker(A − λ2 In )n2 + · · · +Ker(A − λm In )nm
implies that g1 (A)v = 0. But then, using (4.17), we get that
v = a(A)(A − λ1 In )n1 v + b(A)g1 (A)v = 0 + 0 = 0,
showing that
Ker(A − λ1 In )n1 ∩ [Ker(A − λ2 In )n2 + · · · +Ker(A − λm In )nm ] = {0},
as desired. 
80 Advanced Linear Algebra

Lemma 4.4.4 Let A ∈ Fn×n , λ ∈ F, and s ∈ N. Put W = Ker(A − λIn )s .


Then
A[W ] ⊆ W.
Let B : W → W be defined by Bw = Aw. Then B is a linear map, and
(B − λ idW )s = 0. Moreover, λ is the only eigenvalue of B.

When W is a subspace satisfying A[W ] ⊆ W , we say that W is an invariant


subspace of A. We denote the linear map B in Lemma 4.4.4 by A|W and call
it the restriction of A to the invariant subspace W .

Proof of Lemma 4.4.4. Let w ∈ W = Ker(A − λIn )s , thus (A − λIn )s w = 0.


But then (A − λIn )s Aw = A(A − λIn )s w = 0, and thus Aw ∈ W . Clearly, B
is linear. Notice that for any w ∈ W , we have that

(B − λ idW )s w = (A − λIn )s w = 0,

due to w ∈ Ker(A − λIn )s . This shows that (B − λ idW )s = 0. Finally, let µ


be an eigenvalue of B, with eigenvector v(6= 0), say. Then
(B − λ idWj )v = (µ − λ)v, and thus 0 = (B − λ idW )s v = (µ − λ)s v. As
v=6 0, this implies that µ = λ. 

Proof of Theorem 4.4.1. Let Wj = Ker(A − λj In )nj , j = 1, . . . , m. First note


that by Proposition 4.4.3 and Lemma 4.4.4 we have that A = ⊕m j=1 A|Wj ,
and thus
m
Y m
Y
pA (λ) = det(λIn − A) = det(λ idWj − A|Wj ) = det(λ − λj )dim Wj .
j=1 j=1

We now obtain that dim Wj = nj , j = 1, . . . , m.

Next, by Lemma 4.4.4 we have that (A|Wj − λj idWj )nj is nilpotent, and
thus by Theorem 4.2.1 there is a basis Bj for Wj , so that

[(A − λj idWj )|Wj ]Bj ←Bj

is in Jordan form as described in Theorem 4.2.1. But then, using that


[idWj ]Bj ←Bj = Inj , we get that

[A|Wj ]Bj ←Bj = λj Inj + [(A − λj idWj )|Wj ]Bj ←Bj = J(λj ).

Letting now B = ∪m
j=1 Bj , we get by Proposition 4.4.3 that B is a basis for
Fn . Moreover,

[A]B←B = ⊕m m
j=1 [A|Wj ]Bj ←Bj = ⊕j=1 J(λj ) = J,

proving the result. 


The Jordan Canonical Form 81
 
2 2 3
Example 4.4.5 Let A =  1 3 3 . Computing the characteristic
−1 −2 −2
polynomial pA of A we find pA (λ) = (λ − 1)3 . Thus 1 is the only eigenvalue
of A. Computing the eigenspace at λ = 1, we row reduce
   
1 2 3 1 2 3
A−I = 1 2 3  → 0 0 0 .
−1 −2 −3 0 0 0

Thus    
−2 −3
Ker (A − I) = Span{ 1  ,  0 }.
0 1
One finds that (A − I)2 = 0, and thus w1 (A, 1) = 2, wj (A, 1) = 3, j ≥ 2.
Thus A has one Jordan block of size 1 and one of size 2, giving that A is
similar to  
1
J =  1 1 .
0 1
For the basis B = {b1 , b2 , 3}, we choose b3 so that

Ker (A − I)+̇Span{b3 } = Ker (A − I)2 = C3 .


 
1
Choose, for instance, b3 = e1 . Then b2 = (A − I)e1 =  1  . Next we
−1
choose b1 so that

Span{b1 }+̇Span{b2 } = Ker (A − I).


 
−2
For instance b1 =  1 . Letting
0
 
−2 1 1
P = [idC3 ]E←B =  1 1 0 ,
0 −1 0

we indeed find that P −1 AP = J.


82 Advanced Linear Algebra

4.5 The minimal polynomial

As we have seen in Theorem 4.1.1, the characteristic polynomial pA (t) of a


matrix A has the property that pA (A) = 0. There are many other monic
polynomials p(t) that also satisfy p(A) = 0. Of particular interest is the one
of lowest possible degree. This so-called “minimal polynomial” of A captures
some essential features of the Jordan canonical form of the matrix A.

Given A ∈ Fn×n we define its minimal polynomial mA (t) to be the


lowest-degree monic polynomial so that mA (A) = 0.

Example 4.5.1 Let  


1 0 0
A = 0 1 0 .
0 0 2
Then mA (t) = (t − 1)(t − 2). Indeed,

mA (A) = (A − I3 )(A − 2I3 ) = 0,

and any monic degree-1 polynomial has the form t − λ, but A − λI3 6= 0 for
all λ.

Proposition 4.5.2 Every A ∈ Fn has a unique minimal polynomial mA (t),


and every eigenvalue of A is a root of mA (t). Moreover, if p(A) = 0, then
mA (t) divides p(t). In particular, mA (t) divides pA (t).

Proof. As pA (A) = 0, there certainly exists a degree-n polynomial satisfying


p(A) = 0, and thus there exists also a nonzero polynomial of lowest degree
which can always be made monic by multiplying by a nonzero element of F.
Next suppose that m1 (t) and m2 (t) are both monic polynomials of lowest
possible degree k so that m1 (A) = 0 = m2 (A). Then by Proposition 4.3.1
there exists q(t) and r(t) with deg r < k so that

m1 (t) = q(t)m2 (t) + r(t).

Note that r(A) = m1 (A) − q(A)m2 (A) = 0. If r(t) is not the zero
polynomial, then after multiplying by a nonzero constant r(t) will be a
monic polynomial of degree < k so that r(A) = 0. This contradicts m1 (t)
and m2 (t) being minimal polynomials for A. Thus r(t) is the zero
polynomial, and thus m1 (t) = q(t)m2 (t). Since deg m1 = deg m2 = k and m1
The Jordan Canonical Form 83

and m2 are both monic, we must have that q(t) ≡ 1, and thus
m1 (t) = m2 (t). This proves uniqueness.

Let λ be an eigenvalue with corresponding eigenvector v(6= 0). Thus


Av = λv. Then 0 = mA (A)v = mA (λ)v, and since v 6= 0 it follows that
mA (λ) = 0. Thus λ is a root of mA (t).

Finally, let p(t) be so that p(A) = 0. If p(t) ≡ 0, then clearly mA (t) divides
p(t). If p(t) is not the zero polynomial, apply Proposition 4.3.1 providing the
existence of q(t) and r(t) with deg r < deg mA so that

pA (t) = q(t)mA (t) + r(t).

As in the previous paragraph, r(t) not being the zero polynomial contradicts
that mA (t) is the minimal polynomial. Thus r(t) ≡ 0, yielding that mA (t)
divides p(t). As pA (A) by Theorem 4.1.1 we get in particular that mA (t)
divides pA (t). 

Theorem 4.5.3 Let A ∈ Fn×n and suppose


pA (t) = (t − λ1 )n1 · · · (t − λm )nm , where λ1 , . . . , λm ∈ F are different. Then

mA (t) = (t − λ1 )k1 · · · (t − λm )km , (4.18)

where kj is the size of the largest Jordan block at λj , j = 1, . . . , m.


Equivalently, kj is the largest index k so that wk−1 (A, λj ) 6= wk (A, λj ).

Proof. It is easy to see that the minimal polynomial for Jk (λ) is (t − λ)k . As
mA (t) divides pA (t) we must have that mA (t) is of the form (4.18) for some
kj ≤ nj , j = 1, . . . , m. Observing that A = P JP −1 implies
m(A) = P m(J)P −1 for any polynomial m(t), it is easy to see by inspection
that kj must correspond exactly to the size of the largest Jordan block
corresponding to λj . 

Example 4.2.2 continued. The minimal polynomial for A is mA (t) = t3 as


0 is the only eigenvalue of A and the largest Jordan block associated with it
is of size 3.

Example 4.5.4 Let A ∈ Z4×4 5 satisfy A3 − 4A3 + 2I6 = 0. What are the
possible Jordan canonical forms of A?

Let p(t) = t3 − 4t2 − 2 = (t − 1)2 (t − 2). Then p(A) = 0. Since mA (t) divides
p(t), there are 5 possibilities:
mA (t) = t − 1, mA (t) = (t − 1)2 , mA (t) = t − 2, mA (t) = (t − 1)(t − 2), or
84 Advanced Linear Algebra

mA (t) = (t − 1)2 (t − 2). Possibilities for the Jordan canonical form are:
     
1 0 0 0 1 1 0 0 1 1 0 0
0 1 0 0 0 1 0 0 0 1 0 0
J =0 0 1 0 , 0 0 1 0 , 0 0 1 1 ,
    

0 0 0 1 0 0 0 1 0 0 0 1
     
2 0 0 0 1 0 0 0 1 0 0 0
0 2 0 0 0 1 0 0 0 1 0 0
0 0 2 0 , 0 0 1 0 , 0 0 2 0 ,
     

0 0 0 2 0 0 0 2 0 0 0 2
     
1 0 0 0 1 1 0 0 1 1 0 0
0 2 0 0 0 1 0 0 0 1 0 0
0 0 2 0 , 0 0 2 0 , 0 0 1 0 .
     

0 0 0 2 0 0 0 2 0 0 0 2

We say that a matrix A is diagonalizable if its Jordan canonical form is a


diagonal matrix. In other words, a matrix is diagonalizable if and only if all
its Jordan blocks are of size 1.

Corollary 4.5.5 A matrix A is diagonalizable if and only if its minimal


polynomial mA (t) has only roots of multiplicity 1.

Proof. Follows directly from Theorem 4.5.3 as a matrix is diagonalizable if


and only if the largest Jordan block for each eigenvalue is 1. 

4.6 Commuting matrices

One learns early on when dealing with matrices that in general they do not
commute (indeed, in general AB = 6 BA). Sometimes, though, one does
encounter commuting matrices; for example, if they are matrix
representations of taking partial derivatives with respect to different
variables on a vector space of “nice” functions. It is of interest to relate such
commuting matrices to one another. We focus on the case when one of the
matrices is nonderogatory.

We call a matrix nonderogatory if the matrix only has a single Jordan block
associated with each eigenvalue. The following results is easily proven.
The Jordan Canonical Form 85

Proposition 4.6.1 Let A ∈ Fn×n . The following are equivalent.

(i) A is nonderogatory.
(ii) w1 (A, λ) = dim Ker(A − λIn ) = 1 for every eigenvalue λ of A.

(iii) mA (t) = pA (t).

The main result of this section is the following. We say that matrices A and
B commute if AB = BA.

Theorem Qm4.6.2 Let A ∈ Fn×n be nonderogatory with


pA (λ) = i=1 (λ − λi )ni with λ1 , . . . , λm ∈ F all different. Then B ∈ Fn×n
commutes with A if and only if there exists a polynomial p(X) ∈ F[X] so that
B = p(A). In that case, one can always choose p(X) to have degree ≤ n − 1.

When A is not nonderogatory, there is no guarantee that commuting


matrices have to be of the form p(A), as the following example shows.


  
0 1 1 2
Example 4.6.3 Let F = R, A = , and B = . Clearly
1 0 0 3
 
p(1) 0
AB = BA. If p(X) is some polynomial, then p(A) = , which
0 p(1)
never equals B.

We will need the following result.

Lemma 4.6.4 Let λ = 6 µ, C = Jn (λ) ∈ Fn×n , D = Jm (µ) ∈ Fm×m , and


Y ∈ Fn×m . Suppose that CY = Y D. Then Y = 0.

Proof. We first show that

C k Y = Y Dk for all k ∈ {0, 1, 2, . . .}. (4.19)

For k = 0 this is trivial, while for k = 1 it is an assumption of this lemma.


Next, C 2 Y = C(CY ) = C(Y D) = (CY )D = (Y D)D = Y D2 . Proceeding by
induction, assume that C k Y = Y Dk holds for some k ∈ {2, 3 . . .}. Then
C k+1 Y = CC k Y = CY Dk = Y DDk = Y Dk+1 . This proves (4.19). By
taking linear combinations, we get that for all polynomials p(t) we have
p(C)Y = Y p(D). Let now p(t) = (t − λ)n . Then p(C) = 0, while p(D) is
upper triangular with (µ − λ)n = 6 0 on the main diagonal. Thus p(D) is
86 Advanced Linear Algebra

invertible. So from p(C)Y = Y p(D), we get that 0 = Y p(D), and since p(D)
is invertible, we get Y = 0(p(D))−1 = 0. 

Proof of Theorem 4.6.2. When B = p(A), then clearly A and B commute.


Thus the main part concerns the converse statement. Thus, suppose that A
is as in the statement, and let B commute with A.

We first consider the case when A = Jn (0). Let B = (bij )ni,j=1 . Then
AB = (bi+1,j )ni,j=1 , where we let bn+1,j = 0 for all j. Furthermore,
BA = (bi,j−1 )ni,j=1 , where we let bi,0 = 0 for all i. Equating AB and BA, we
therefore obtain

bi+1,j = bi,j−1 , i, j = 1, . . . , n, where bi,0 = 0 = bn+1,j . (4.20)

Set now bk = bi,j , whenever j − i = k and i ∈ {1, . . . , n + 1} and


j ∈ {0, . . . , n}. This is well-defined due to (4.20). Then we see that bk = 0
when k < 0, and that B is the upper-triangular Toeplitz matrix
 
b0 b1 b2 · · · bn−1
 0 b0 b1 · · · bn−2 
 
B =  ... .. . . . . ..  .

 . . . . 

 0 0 · · · b0 b1 
0 0 ··· 0 b0

If we put p(X) = b0 + b1 X + · · · + bn−1 X n−1 , we get B = p(A).

Next, if A = Jn (λ), then we have that AB = BA if and only if


(A − λIn )B = B(A − λIn ). By the previous paragraph, we have that
B = p(A − λIn ) for some polynomial p(X). But then, B = q(A), where
q(X) = p(X − λ). Notice that deg q = deg p, so we are done in this case as
well.

Next, let A = ⊕mj=1 Jnj (λj ) with λ1 , . . . , λm different, and decompose


B = (Bij )m
i,j=1 where Bij has size ni × nj . The equation AB = BA leads
now to the equalities

Jni (λi )Bij = Bij Jnj (λj ), i, j = 1, . . . , m. (4.21)

When i =6 j, we get by Lemma 4.6.4 that Bij = 0, and for j = i we get that
Bii is upper-triangular Toeplitz. Define

pA (t)
qj (t) = .
(t − λj )nj

Then qj (t) is a polynomial of degree n − nj , and qj (Jni (λi )) = 0, i 6= j. Also,


observe that qj (Jnj (λj )) is an upper-triangular invertible Toeplitz matrix,
and thus (qj (Jnj (λj ))−1 Bjj is upper-triangular Toeplitz. But then there
The Jordan Canonical Form 87

exists a polynomial rj (t) of degree ≤ nj − 1 so that


r(Jnj (λj )) = (qj (Jnj (λj ))−1 Bjj . It is now straightforward to check that the
polynomial
p(t) = q1 (t)r1 (t) + · · · + qm (t)rm (t)
satisfies p(A) = B.
−1
Finally, we consider A = P (⊕m
j=1 Jnj (λj ))P . Then AB = BA implies that
−1 m
B̂ = P BP commutes with ⊕j=1 Jnj (λj ). The polynomial from the
previous paragraph, now establishes p(⊕m j=1 Jnj (λj )) = B̂. But then
p(A) = B also holds. 

4.7 Systems of linear differential equations

The Jordan canonical form is useful for solving systems of linear differential
equations. We set F = C, as we are dealing with differentiable functions. A
system of linear differential equations has the form

 0
 x1 (t) = a11 x1 (t) + · · · + a1n xn (t),
 x1 (0) = c1 ,
.. ..
 . .
 0
xn (t) = an1 x1 (t) + · · · + ann xn (t), xn (0) = cn ,
which in shorthand we can write as
 0
x (t) = Ax(t)
x(0) = c.

If A = Jn (0) (and, for later convenience, changing x to z), the system is


 0

 z1 (t) = z2 (t), z1 (0) = c1 ,

 .. ..
. .
0
 z (t) = z n (t), zn−1 (0) = cn−1 ,
 n−10


zn (t) = 0, zn (0) = cn .

Solving from the bottom up, one easily sees that the solution is
cn 2
zn (t) = cn , zn−1 (t) = cn−1 + cn t, zn−2 = cn−2 + cn−1 t + t ,
2!
n
X ck
, . . . , z1 (t) = tk−1 .
(k − 1)!
k=1
88 Advanced Linear Algebra
 
z1 (t)
Next, if A = Jn (λ) = λIn + Jn (0), then with z(t) =  ...  as above, it is
 

zn (t)
straightforward to see that y(t) = eλt z(t) solves
 0
y (t) = Jn (λ)y(t),
y(0) = c.

Clearly, y(0) = z(0) = c. Furthermore,

y 0 (t) = λeλt z(t)+eλt z 0 (t) = λy(t)+eλt Jn (0)z(t) = λy(t)+Jn (0)y(t) = Jn (λ)y(t).

To solve the general system


 0
x (t) = Ax(t)
(4.22)
x(0) = c,

one writes A in Jordan canonical form A = P JP −1 . If we now put


y(t) = P −1 x(t), then we get that y(t) satisfies
 0
y (t) = Jy(t)
y(0) = P −1 c.

With J = ⊕m j=1 Jnj (λj ), this system converts to m systems treated in the
previous paragraph. We can subsequently solve these m systems, leading to
a solution y(t). Then, in the end, x(t) = P y(t) solves the system (4.22). We
will illustrate this in an example below.

We have the following observation.

Theorem 4.7.1 Consider the system 4.22,  where  A is similar to


x1 (t)
 . 
⊕m
j=1 Jnj (λj ). Then the solution x(t) =  ..  consists of functions xj (t)
xn (t)
that are linear combinations of the functions

eλj t , teλj t , . . . , tnj −1 eλj t , j = 1, . . . , m.

Example 4.7.2 Consider the system

x01 (t) = 5x1 (t) + 4x2 (t) + 2x3 (t) + x4 (t),



 x1 (0) = 0,
x02 (t) =

x2 (t) − x3 (t) − x4 (t), x2 (0) = 1,

0
 x 3 (t) = −x 1 (t) − x 2 (t) + 3x3 (t), x3 (0) = 1,
x04 (t) = x1 (t) + x2 (t) − x3 (t) + 2x4 (t),

x4 (0) = 0.

The Jordan Canonical Form 89

We find that A = P −1 JP , where


   
−1 1 1 1 1 0 0 0
 1 −1 0 0 0 2 0 0
P = , J =  .
0 0 −1 0 0 0 4 1
0 1 1 0 0 0 0 4

Thus  0
y (t) = Jy(t)
T
y(0) = P c = 2 −1 −1 2 ,
has the solution
2et
   
y1 (t)
y2 (t)  −e2t 
y3 (t) = −e4t + 2te4t  .
   

y4 (t) 2e4t
And thus
−e2t + e4t + 2te4t
 
e4t + 2te4t
x(t) = P −1 y(t) = 
 
4t 4t
.
 e − 2te 
2et − e2t − e4t + 2te4t

A higher-order linear differential equation can be converted to a system of


first-order differential equations, as in the following example.

Example 4.7.3 Consider the third-order differential equation

f (3) (t) − 5f (2) (t) + 8f 0 (t) − 4f (t) = 0, f (2) (0) = 3, f 0 (0) = 2, f (0) = 1.

If we let x1 (t) = f (t), x2 (t) = f 0 (t), x3 (t) = f (2) (t), we get the system
 0        
x1 (t) 0 1 0 x1 (t) x1 (0) 1
x02 (t) = 0 0 1 x2 (t) , x2 (0) = 1 .
x03 (t) 4 −8 5 x3 (t) x3 (0) 3

For the eigenvalues of the coefficient matrix, we find 1,2,2, and we find that
there is a Jordan block of size 2 at the eigenvalue 2. Thus the solution is a
linear combination of et , e2t , and teet . Letting f (t) = c1 et + c2 e2t + c3 te2t ,
and plugging in the initial conditions f (2) (0) = 3, f 0 (0) = 2, f (0) = 1, we get
the equations 
 c1 + c2 = 1,
c1 + 2c2 + c3 = 2,
c1 + 4c2 + 4c3 = 3.

Solving, we obtain c1 = −1, c2 = 2, c3 = −1, yielding the solution


f (t) = −et + 2e2t − te2t .
90 Advanced Linear Algebra

4.8 Functions of matrices


Pm
We have already used many times that for a polynomial p(t) = j=1 pj tj
and a square
Pm matrix A, the matrix p(A) is well-defined, simply by setting
p(A) = j=1 pj Aj . Can we also define in a sensible way f (A), where F is
some other function, such as for instance f (t) = et , f (t) = sin t, etc.? For
this, let us start with the case when A is a Jordan block A = Jk (λ). We first
observe that in this case
 2  3
λ 3λ2 3λ
 
λ 2λ 1 1

 λ2 2λ 1 


 λ3 3λ2 3λ 1 

 . .
.. .. ...   . .. . .
.. .. 
A2 =  , A3 = 
   
,
 .. ..   .. .. 

 . . 


 . . 

 λ2 2λ  λ3 3λ2 
λ2 λ3

(−1)k−1
 
1
λ − λ12 1
λ3 ··· ··· λk
1 (−1)k−2 
− λ12 1


λ λ3 ··· λk−1 

..

 .. .. 
A−1 =
 . . .  , λ 6= 0.

 .. .. .. 

 . . . 

1

λ − λ12 
1
λ

In all cases, it has the form


0
f (k−1) (λ)
 
f (λ) f 1!(λ) f ”(λ)
2! ··· ··· (k−1)!
f 0 (λ) f (k−2) (λ) 
 f ”(λ)


 f (λ) 1! 2! ··· (k−2)! 

.. .. .. 
. . .
 
f (Jk (λ)) =  . (4.23)

.. .. .. 
. .
 
 . 
f 0 (λ)
 
 f (λ) 1!

f (λ)

This observation leads to the following Qm definition. Let A ∈ Cn×n have


minimal polynomial mA (t) = j=1 (t − λj )kj , and let f be a complex-valued
function on a domain in C so that
f (λj ), f 0 (λj ), . . . , f (kj −1) (λj ), j = 1, . . . , m, are well-defined. If A is given in
The Jordan Canonical Form 91

Jordan canonical form A = SJS −1 , with


 
J(λ1 ) 0 ··· 0
 0 J(λ 2 ) ··· 0 
J = . ,
 
. .. ..
 .. .. . . 
0 0 ··· J(λm )

where J(λj ) is the nj × nj matrix


 
nj wk (A,λj )−wk+1 (A,λj )
J(λj ) = ⊕k=1 ⊕l=1 Jk (λj ) , j = 1, . . . , m,

we define
 
f (J(λ1 )) 0 ··· 0
 0 f (J(λ 2 )) ··· 0 
f (A) := Sf (J)S −1 , f (J) :=   (4.24)
 
.. .. .. ..
 . . . . 
0 0 ··· f (J(λm ))

and
 
nj wk (A,λj )−wk+1 (A,λj )
f (J(λj )) := ⊕k=1 ⊕l=1 f (Jk (λj )) , j = 1, . . . , m,

with f (Jk (λj )) given via (4.23).

Let us do an example.

 
2 2 3
Example 4.8.1 Let A =  1 3 3 . In Example 4.4.5 we calculated
−1 −2 −2
that A = SJS −1 , where
   
1 −2 1 1
J = 1 1 , S =  1 1 0 .
1 0 −1 0

If f (t) = ewt , we find that f (1) = ew and f 0 (1) = wew , and thus
  w  −1
−2 1 1 e −2 1 1
f (A) =  1 1 0  ew wew   1 1 0 .
0 −1 0 ew 0 −1 0

Notice that we need to check that f (A) is well-defined. Indeed, we need to


check that if A = SJS −1 = S̃J S̃ −1 , then Sf (J)S −1 = S̃f (J)S̃ −1 (where we
used that J is unique up to permutation of its blocks, so we do not have to
92 Advanced Linear Algebra

worry about different J’s). In other words, if we let T = S̃ −1 S, we need to


check that
T J = JT implies T f (J) = f (J)T. (4.25)
Using the techniques in Section 4.6, this is fairly straightforward to check,
and we will leave this for the reader.

Qm
Remark 4.8.2 It should be noticed that with mA (t) = j=1 (t − λj )kj and
with functions f and g so that

f (r) (λj ) = g (r) (λj ), r = 0, . . . , kj − 1, j = 1, . . . , m, (4.26)

we have that f (A) = g(A). Thus, as an alternative way of defining f (A), one
can construct a polynomial g satisfying (4.26) and define f (A) via
f (A) := g(A). In this way, one avoids having to use the Jordan canonical
form in the definition of f (A), which may be preferable in some cases.

When h(t) = f (t)g(t) we expect that h(A) = f (A)g(A). This is indeed true,
but it is something that we need to prove. For this we need to remind
ourselves of the product rule for differentiation:

h(t) = f (t)g(t) implies that h0 (t) = f (t)g 0 (t) + f 0 (t)g(t).

Taking a second and third derivative we obtain

h00 (t) = f (t)g 00 (t) + 2f 0 (t)g 0 (t) + f 00 (t)g(t),

h(3) (t) = f (t)g (3) (t) + 3f 0 (t)g 00 (t) + 3f 00 (t)g 0 (t) + f (3) (t)g(t).
In general, we obtain that the kth derivative of h is given by
k  
X k
h(k) (t) = f (r) (t)g (k−r) (t), (4.27)
r=0
r

which is referred to as the Leibniz rule. We will use the Leibniz rule in the
following proof.

TheoremQ4.8.3 Let A ∈ Cn×n with minimal polynomial


m
mA (t) = j=1 (t − λj )kj and let f and g be functions so that
f (λj ), f 0 (λj ), . . . , f (kj −1) (λj ), g(λj ), g 0 (λj ), . . . , g (kj −1) (λj ), j = 1, . . . , m, are
well-defined. Put k(t) = f (t) + g(t) and h(t) = f (t)g(t). Then

k(A) = f (A) + g(A) and h(A) = f (A)g(A).


The Jordan Canonical Form 93

Proof. We will show the equation h(A) = f (A)g(A). The equation


k(A) = f (A) + g(A) can be proven in a similar manner (and is actually
easier to prove, as k (j) (λ) = f (j) (λ) + g (j) (λ) for all j).

First, let A = Jk (λ). Then


  
f (k−1) (λ) g (k−1) (λ)
f (λ) · · · (k−1)!  g(λ) ··· (k−1)! 
.. ..

f (A)g(A) =  ..  .. =
. .  . . 
f (λ) g(λ)
   
Pk−1 f (j) (λ) g (k−j−1) (λ) h(k−1) (λ)
f (λ)g(λ) · · · j=0 j! (k−j−1)!  h(λ) · · · (k−1)! 
.. ..
 
 .. =  ..  = h(A),
 . .   . . 
f (λ)g(λ) h(λ)
where we used that Leibniz’s rule yields
k−1 k−1
X f (j) (λ) g (k−j−1) (λ) 1 X k − 1 h(k−1) (λ)
= f (j) (λ)g (k−j−1) (λ) = .
j=0
j! (k − j − 1)! (k − 1)! j=0 j (k − 1)!

As the rule works for a Jordan block, it will also work for a direct sum of
Jordan blocks. Finally, when A = SJS −1 , we get that f (A)g(A) =
Sf (J)S −1 Sg(J)S −1 = S[f (J)g(J)]S −1 = Sh(J)S −1 = h(A). 

Observe that the matrix in (4.23) can be written as

f 0 (λ) f 00 (λ) f (k−1) (λ)


f (λ)Jk (0)0 + Jk (0)1 + Jk (0)2 + · · · + Jk (0)k−1 ,
1! 2! (k − 1)!

where Jk (0)0 = Ik . Applying this to each summand in (4.24), we arrive at


the following theorem.

TheoremQ4.8.4 Let A ∈ Cn×n with minimal polynomial


m
mA (t) = j=1 (t − λj )kj . Then there exist matrices

Pjk , j = 1, . . . , m, k = 0, . . . , kj − 1,

so that for every complex-valued function f so that f (λj ), f 0 (λj ), . . .,


f (kj −1) (λj ), j = 1, . . . , m, are well-defined, we have that

j −1
m kX
X
f (A) = f (k) (λj )Pjk . (4.28)
j=1 k=0

Moreover, these matrices Pjk satisfy


94 Advanced Linear Algebra
2
(i) Pj0 = Pj0 ,
(ii) Pjk Prs = 0, j 6= r,

(iii) Pjk Pjs = k+s



k Pj,k+s , and
Pm
(iv) j=1 Pj0 = In .

Here Pjk = 0 when k ≥ kj − 1.

Proof. Let A be given in Jordan canonical form A = SJS −1 , with


 
J(λ1 ) 0 ··· 0
 0 J(λ2 ) · · · 0 
J = . ..  ,
 
. .
 .. .. .. . 
0 0 · · · J(λm )
where J(λj ) is the nj × nj matrix
 
nj wk (A,λj )−wk+1 (A,λj )
J(λj ) = ⊕k=1 ⊕l=1 Jk (λj ) , j = 1, . . . , m.

We define  
0
 .. 

 . 


 0 
 −1
Pj0 := S 
 Inj S ,


 0 

 .. 
 . 
0
 
0
 .. 

 . 

 0 
1   −1
Pjk := S  Jjk S , (4.29)
k!  

 0 

 .. 
 . 
0
where  
nj wk (A,λj )−wk+1 (A,λj )
Jj = ⊕k=1 ⊕l=1 Jk (0) , j = 1, . . . , m.
Notice that Jjs = 0 when s ≥ kj . Equality (4.28) now follows directly from
(4.24). Moreover, from the definitions (4.29) it is easy to check that (i)–(iv)
hold. 

Let us compute decomposition (4.28) for an example.


The Jordan Canonical Form 95

Example 4.8.5 Let


 
4 0 0 0 2 0
−2 3 −1 0 −2 −1
 
 0 −2 6 0 2 2
A=
 2 −4 8 2 6
.
 6 
−2 3 −5 0 −2 −3
2 −3 5 0 4 5

Then A = SJS −1 , where


   
1 0 0 −1 0 1 2 0 0 0 0 0
1
 0 1 1 −1 0

0
 2 2 0 0 0

1 0 0 1 0 0
 , J = 0 0 2 0 0 0

S= −1
.
 1 0 1 1 0
 0
 0 0 4 2 0

−1 0 0 0 −1 0 0 0 0 0 4 2
0 0 1 0 1 0 0 0 0 0 0 4

Thus λ2 = 1 and λ2 = 4,
   
0 0 0 0 1 0
J 1 = 0 0 1 , J2 = 0 0 1 .
0 0 0 0 0 0

Now
1
− 12 −1 − 12
 
0 2 0
0 1
 −1 0 −1 0 
1
− 12 −1 − 12 
  
I 0 0 0
P10 = S 3 S −1 =  2 ,
0 0 0 1
 −2 1 −1 −1  
0 − 1 1
0 1 1 
2 2 2
1
0 2 − 12 0 0 1
2
 
0 0 0 0 0 0
0 0 0 0 0 0
   
J1 0 0 0 0 0 0 0
P11 = S S −1 = 
0 1
.
0 0  2 − 21 0 0 12 

0 0 0 0 0 0
0 0 0 0 0 0
We leave the other computations as an exercise.

We will next see that the formalism introduced above provides a useful tool
in the setting of systems of differential equations. We first need that if
B(t) = (bij )ni=1,j=1
m
is a matrix whose entries are functions in t, then we
define
d d
B(t) = B 0 (t) := (b0ij )ni=1,j=1
m
= ( bij )ni=1,j=1
m
.
dt dt
96 Advanced Linear Algebra

Thus the derivative of a matrix function is simply defined by taking the


derivative in each entry. For instance
d t2 cos t
   
2t − sin t
= .
dt e5t 7 5e5t 0
As taking the derivative is a linear operation, we have that
d d
(SB(t)W ) = S( (B(t))W, (4.30)
dt dt
where S and W are matrices of appropriate size. Indeed, looking at the (r, s)
entry of this product, we have that
d XX XX d
( sri bij (t)wjs ) = sri ( bij (t))wjs .
dt i j i j
dt
d at
The following proposition now shows that the equality dt e = aeat
generalizes to the case when a is replaced by a matrix A.

Proposition 4.8.6 Given A ∈ Cn×n . Then


d tA
e = AetA = etA A. (4.31)
dt

Proof. We first show (4.31) for a Jordan block A = Jk (λ). If A = Jk (λ) and
6 0, we have that
t=
   
tλ t 1
1

 tλ t  
  t


tA = 
 .. .. =
  .. ×

 . .   . 
1
 tλ t  
tk−2

1
tλ tk−1
  
tλ 1 1
 tλ 1  t 
  
 .. ..  .. ,


 . . 
 . 
 tλ 1  tk−2 
tλ tk−1
bringing A in the SJk (tλ)S −1 format. Then with f (x) = ex we get
f (tA) = Sf (Jk (tλ))S −1 , yielding
  tλ etλ etλ

e · · · · · · (k−1)!

1 1!

1 etλ 

 t

 etλ e1! · · · (k−2)! 
tA . ..

.. .. .

e = ×
 




. . .. 
1  tλ

  tλ e
tk−2
1
 e 1!


tk−1
e
The Jordan Canonical Form 97
 
1
 t 
 
 .. .


 . 
 tk−2 
tk−1
Thus we find
tetλ tk−1 etλ
 tλ 
e 1! ··· ··· (k−1)!
tλ tetλ tk−2 etλ 
e ···

 1! (k−2)! 
tA ..
 
e = .. .. ,

 . . . 

tetλ
 etλ 1!


e
j tλ j−1 tλ
d t e λtj etλ
which also holds when t = 0. As dt ( j! ) = t(j−1)!
e
+ j! , j ≥ 1, one
easily sees that that (4.31) holds for A = Jk (λ)).

Next, one needs to observe that (4.31) holds for A a direct sum of Jordan
blocks. Finally, using (4.30), one obtains that (4.31) holds when A = SJS −1 ,
thus proving the statement for general A.

We can now write down the solution of a system of differential equations


very efficiently as follows.

Corollary 4.8.7 The system of differential equations


 0
x (t) = Ax(t)
x(0) = c.

has the solution


x(t) = etA x0 .

d
Proof. With x(t) = etA x0 , we have x(0) = e0 x0 = Ix0 = x0 , and dt x(t) =
d tA tA
dt e x0 = Ae x0 = Ax(t). 

Using these techniques, we can now also handle non-homogenous systems of


differential equations of the form

x0 (t) = Ax(t) + B(t). (4.32)

Indeed, if we set x(t) = etA f (t), for some differentiable function f (t), then
using the product rule we obtain

x0 (t) = AetA f (t) + etA f 0 (t) = Ax(t) + etA f 0 (t).

If x(t) is a solution to (4.32), then we need B(t) = etA f 0 (t), yielding


98 Advanced Linear Algebra

f 0 (t) = e−tA B(t), and thus


Z t
f (t) = s−sA B(s)ds + K,
t0

where K is some constant vector. Let us illustrate how this works in an


example.

Example 4.8.8 Consider the system


 0
x1 (t) = −x2 (t),
x02 (t) = x1 (t) + t.
Then    
0 −1 0
A= , B(t) = ,
1 0 t
and  
−sA cos s sin s
e = .
− sin s cos s
Thus
Z t Z t    
cos s sin s 0 K1
f (t) := e−sA B(s)ds + K = + =
0 0 − sin s cos s s K2
Z t     
s sin s K1 sin t − t cos t + K1
= ds + = .
0 s cos s K2 cos t + t sin t − 1 + K2
We now find
    
x1 (t) cos t − sin t sin t − t cos t + K1
= =
x2 (t) sin t cos t cos t + t sin t − 1 + K2
 
−t + K1 cos t + (1 − K2 ) sin t
.
1 − (1 − K2 ) cos t + K1 sin t

4.9 The resolvent

One matrix function that is of particular interest is the resolvent. The


resolvent of a matrix A ∈ Cn×n is the function
R(λ) := (λIn − A)−1 ,
which is well-defined on C \ σ(A), where
σ(A) = {z ∈ C : z is an eigenvalue of A} is the spectrum of A. We have the
following observation.
The Jordan Canonical Form 99

Proposition
Qm 4.9.1 Let A ∈ Cn×n with minimal polynomial
kj
mA (t) = j=1 (t − λj ) , and let Pjk , j = 1, . . . , m, k = 0, . . . , kj − 1, be as
in Theorem 4.8.4. Then
j −1
m nX
X k!
R(λ) = (λIn − A)−1 = Pjk . (4.33)
j=1 k=0
(λ − λj )k+1

1
Proof. Fix λ ∈ C \ σ(A), and define g(z) = λ−z , which is well-defined and k
times differentiable for every k ∈ N on the domain C \ {λ}. Notice that
g(A) = (λIn − A)−1 = R(λ). Also observe that
1 2 k!
g 0 (t) = , g 00 (t) = , . . . , g (k) (t) = .
(λ − t)2 (λ − t)3 (λ − t)k+1
Thus, by Theorem 4.8.4,
j −1
m nX j −1
m nX
X X k!
R(λ) = g(A) = g (k) (t)Pjk = Pjk .
j=1 k=0 j=1 k=0
(λ − λj )k+1

If we make use of a fundamental complex analysis result, Cauchy’s integral


formula, we can develop an integral formula for f (A) that is used, for
instance, in analyzing differential operators. Let us start by stating Cauchy’s
result. A function f of a complex variable is called analytic on an open set
D ⊆ C if f is continuously differentiable at every point z ∈ D. If f is
analytic on a domain D bounded by a contour γ and continuous on the
closure D, then Cauchy’s integral formula states that
Z
j! f (z)
f (j) (λ0 ) = dz, for all λ0 ∈ D and j = 0, 1, . . . . (4.34)
2πi γ (z − λ0 )j+1
Applying this to Proposition 4.9.1 we obtain the following result.

Theorem 4.9.2 Let A ∈ Cn×n with spectrum σ(A) = {λ1 , . . . , λm }. Let D


be a domain bounded by the contour γ, and assume that σ(A) ⊂ D. For
functions f analytic on D and continuous on the closure D, we have that
Z Z
1 1
f (A) = f (λ)(λI − A)−1 dλ = f (λ)R(λ)dλ. (4.35)
2πi γ 2πi γ

Proof. Follows directly from combining Proposition 4.9.1 with Cauchy’s


integral formula (4.34) and equation (4.28). 

By choosing particular functions for f we can retrieve the matrices Pjk from
Theorem 4.8.4.
100 Advanced Linear Algebra

TheoremQ4.9.3 Let A ∈ Cn×n with minimal polynomial


m
mA (t) = j=1 (t − λj )kj . Let γj be a contour that contains λj in its interior,
but none of the other eigenvalues of A are in the interior of or on γj . Then
the matrices Pjk as defined in Theorem 4.8.4 can be found via
Z Z
1 1
Pjk = (λ − λj )k (λI − A)−1 dλ = (λ − λj )k R(λ)dλ, (4.36)
2πi γj 2πi γk

where j = 1, . . . , m, k = 0, . . . , kj − 1.

4.10 Exercises

Exercise 4.10.1 Let F = Z3 . Check the Cayley–Hamilton theorem on the


matrix  
1 0 2
A = 2 1 0 .
2 2 2

Exercise 4.10.2 For the following matrices A (and B) determine its Jordan
canonical form J and a similarity matrix P , so that P −1 AP = J.

(a)  
−1 1 0 0
−1 0 1 0
A=
−1
.
0 0 1
−1 0 0 1
This matrix is nilpotent.
(b)  
10 −1 1 −4 −6
9
 −1 1 −3 −6

4
A= −1 1 −3 −1
.
9 −1 1 −4 −5
10 −1 1 −4 −6
This matrix is nilpotent.
(c)  
0 1 0
A = −1 0 0 .
1 1 1
The Jordan Canonical Form 101

(d)  
2 0 −1 1
0 1 0 0
A=
1
.
0 0 0
0 0 0 1
(e)  
1 −5 0 −3
1 1 −1 0 
 0 −3 1 −2 .
B= 

−2 0 2 1
(Hint: 1 is an eigenvalue.)
(f) For the matrix B, compute B 100 , by using the decomposition
B = P JP −1 .

Exercise 4.10.3 Let


 
3 1 0 0 0 0 0
0 3 1 0 0 0 0
 
0 0 3 0 0 0 0
 
0
A 0 0 3 1 0 0
.
0 0 0 0 3 0 0
 
0 0 0 0 0 3 1
0 0 0 0 0 0 3
Determine bases for the following spaces:

(a) Ker(3I − A).


(b) Ker(3I − A)2 .
(c) Ker(3I − A)3 .

Exercise 4.10.4 Let M and N be 6 × 6 matrices over C, both having


minimal polynomial x3 .

(a) Prove that M and N are similar if and only if they have the same rank.
(b) Give a counterexample to show that the statement is false if 6 is
replaced by 7.
(c) Compute the minimal and characteristic polynomials of the following
matrix. Is it diagonalizable?
 
5 −2 0 0
6 −2 0 0 
 
0 0 0 6 
0 0 1 −1
102 Advanced Linear Algebra

Exercise 4.10.5 (a) Let A be a 7 × 7 matrix of rank 4 and with minimal


polynomial equal to qA (λ) = λ2 (λ + 1). Give all possible Jordan
canonical forms of A.
(b) Let A ∈ Cn . Show that if there exists a vector v so that
v, Av, . . . , An−1 v are linearly independent, then the characteristic
polynomial of A equals the minimal polynomial of A. (Hint: use the
basis B = {v, Av, . . . , An−1 v}.)

Exercise 4.10.6 Let A ∈ Fn×n and AT denote its transpose. Show that
wk (A, λ) = wk (AT , λ), for all λ ∈ F and k ∈ N. Conclude that A and AT
have the same Jordan canonical form, and are therefore similar.

Exercise 4.10.7 Let A ∈ C4×4 matrix satisfying A2 = −I.

(a) Determine the possible eigenvalues of A.


(b) Determine the possible Jordan structures of A.

Exercise 4.10.8 Let p(x) = (x − 2)2 (x − 3)2 . Determine a matrix A for


which p(A) = 0 and for which q(A) =6 0 for all nonzero polynomials q of
degree ≤ 3. Explain why q(A) 6= 0 for such q.

Exercise 4.10.9 Let mA (t) = (t − 1)2 (t − 2)(t − 3) be the minimal


polynomial of A ∈ M6 .

(a) What possible Jordan forms can A have?


(b) If it is known that rank(A − I) = 3, what possible Jordan forms can A
have?

Exercise 4.10.10 Let A be a 4 × 4 matrix satisfying A2 = −A.

(a) Determine the possible eigenvalues of A.


(b) Determine the possible Jordan structures of A (Hint: notice that
(A + I)A = 0.)

Exercise 4.10.11 Let A ∈ Cn×n . For the following, answer True or False.
Provide an explanation.

(a) If det(A) = 0, then 0 is an eigenvalue of A.


The Jordan Canonical Form 103
n
(b) If A2 = 0, then the rank of A is at most 2.

(c) There exists a matrix A with minimal polynomial mA (t) = (t − 1)(t − 2)


and characteristic polynomial pA (t) = tn−2 (t − 1)(t − 2) (here n > 2).
(d) If all eigenvalues of A are 1, then A = In (=the n × n identity matrix).

Exercise 4.10.12 Show that if A is similar to B, then tr A = tr B.

Exercise 4.10.13 Let P be a matrix so that P 2 = P.

(a) Show that P only has eigenvalues 0 or 1.


(b) Show that rank P = trace P. (Hint: determine the possible Jordan
canonical form of P .)

Exercise 4.10.14 Let A = P JP −1 . Show that Ran A = P [Ran J] and


Ker A = P [Ker J]. In addition, dim Ran A = dim Ran J and
dim Ker A = dim Ker J.

Exercise 4.10.15 Show that matrices A and B are similar if and only if
they have the same Jordan canonical form.

Exercise 4.10.16 Show that if A and B are square matrices of the same
size, with A invertible, then AB and BA have the same Jordan canonical
form.

Exercise 4.10.17 Let A ∈ F^{n×m} and B ∈ F^{m×n}. Observe that

\begin{pmatrix} I_n & -A \\ 0 & I_m \end{pmatrix} \begin{pmatrix} AB & 0 \\ B & 0_m \end{pmatrix} \begin{pmatrix} I_n & A \\ 0 & I_m \end{pmatrix} = \begin{pmatrix} 0_n & 0 \\ B & BA \end{pmatrix}.

(a) Show that the Weyr characteristics at λ ≠ 0 of AB and BA satisfy

w_k(AB, λ) = w_k(BA, λ), k ∈ N.

(b) Show that λ ≠ 0 is an eigenvalue of AB if and only if it is an eigenvalue of BA, and that AB and BA have the same Jordan structure at λ.

(c) Provide an example of matrices A and B so that AB and BA have different Jordan structures at 0.

Exercise 4.10.18 Let A, B ∈ C^{n×n} be such that (AB)^n = 0. Prove that (BA)^n = 0.

Exercise 4.10.19 (a) Let A ∈ R^{8×8} with characteristic polynomial p(x) = (x + 3)^4 (x^2 + 1)^2 and minimal polynomial m(x) = (x + 3)^2 (x^2 + 1). What are the possible Jordan canonical form(s) for A (up to permutation of Jordan blocks)?

(b) Suppose that A ∈ C^{n×n} satisfies A^k ≠ 0 and A^{k+1} = 0. Prove that there exists x ∈ C^n such that {x, Ax, . . . , A^k x} is linearly independent.

(c) Let A, B ∈ C^{n×n} be such that A^2 − 2AB + B^2 = 0. Prove that every eigenvalue of B is an eigenvalue of A, and conversely that every eigenvalue of A is an eigenvalue of B.

Exercise 4.10.20 (a) Prove Proposition 4.6.1.

(b) Let

A = \begin{pmatrix} 0 & 0 & 0 & \cdots & -a_0 \\ 1 & 0 & 0 & \cdots & -a_1 \\ 0 & 1 & 0 & \cdots & -a_2 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & 0 & -a_{n-2} \\ 0 & \cdots & 0 & 1 & -a_{n-1} \end{pmatrix}.

Show that

p_A(t) = t^n + a_{n-1}t^{n-1} + \cdots + a_1 t + a_0 = m_A(t).

This matrix is called the companion matrix of the polynomial p(t) = p_A(t). Thus a companion matrix is nonderogatory.

Exercise 4.10.21 For the following pairs of matrices A and B, find a polynomial p(t) so that p(A) = B, or show that it is impossible.

(a) A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}, B = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 3 \\ 0 & 0 & 3 \end{pmatrix}.

(b) A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}, B = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}.

Exercise 4.10.22 Solve the system of differential equations

x'(t) = Ax(t), x(0) = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix},

where

A = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & -1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & -1 \\ 0 & 1 & 0 \end{pmatrix}^{-1}.

Exercise 4.10.23 Solve the following systems of linear differential equations:

(a) x_1'(t) = 3x_1(t) − x_2(t), x_1(0) = 1,
    x_2'(t) = x_1(t) + x_2(t), x_2(0) = 2.

(b) x_1'(t) = 3x_1(t) + x_2(t) + x_3(t), x_1(0) = 1,
    x_2'(t) = 2x_1(t) + 4x_2(t) + 2x_3(t), x_2(0) = −1,
    x_3'(t) = −x_1(t) − x_2(t) + x_3(t), x_3(0) = 1.

(c) x_1'(t) = −x_2(t), x_1(0) = 1,
    x_2'(t) = x_1(t), x_2(0) = 2.

(d) x''(t) − 6x'(t) + 9x(t) = 0, x(0) = 2, x'(0) = 1.

(e) x''(t) − 4x'(t) + 4x(t) = 0, x(0) = 6, x'(0) = −1.

Exercise 4.10.24 For the following matrices, we determined their Jordan canonical form in Exercise 4.10.2.

(a) Compute cos A for
A = \begin{pmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \\ -1 & 0 & 0 & 1 \end{pmatrix}.

(b) Compute A^{24} for
A = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix}.

(c) Compute e^A for
A = \begin{pmatrix} 2 & 0 & -1 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.

Exercise 4.10.25 (a) Find matrices A, B ∈ C^{n×n} so that e^A e^B ≠ e^{A+B}.

(b) When AB = BA, then e^A e^B = e^{A+B}. Prove this statement when A is nonderogatory.

Exercise 4.10.26 Compute the matrices P20 , P21 , P22 from Example 4.8.5.

Exercise 4.10.27 (a) Show that if A = A^*, then e^A is positive definite.

(b) If e^A is positive definite, is A necessarily Hermitian?

(c) What can you say about e^A when A is skew-Hermitian?

Exercise 4.10.28 Let A = \begin{pmatrix} \pi/2 & 1 & -1 \\ 0 & \pi/2 & -\pi/4 \\ 0 & 0 & \pi/4 \end{pmatrix}.

(a) Compute cos A and sin A.

(b) Check that (cos A)^2 + (sin A)^2 = I.

Exercise 4.10.29 Show that for A ∈ C4×4 , one has that

sin 2A = 2 sin A cos A.

Exercise 4.10.30 Solve the inhomogeneous system of differential equations

x_1'(t) = x_1(t) + 2x_2(t) + e^{-2t},
x_2'(t) = 4x_1(t) − x_2(t).

Exercise 4.10.31 With the notation of Section 4.9 show that

I = \frac{1}{2\pi i} \int_{\gamma} R(\lambda)\, d\lambda, \qquad A = \frac{1}{2\pi i} \int_{\gamma} \lambda R(\lambda)\, d\lambda.

Exercise 4.10.32 Show that the resolvent satisfies

(a) \frac{R(\lambda) - R(\mu)}{\lambda - \mu} = -R(\lambda)R(\mu).

(b) \frac{dR(\lambda)}{d\lambda} = -R(\lambda)^2.

(c) \frac{d^j R(\lambda)}{d\lambda^j} = (-1)^j j!\, R(\lambda)^{j+1}.

Exercise 4.10.33 With the notation of Theorem 4.9.3, show that

\lambda_j P_{j0} + P_{j1} = AP_{j0} = \frac{1}{2\pi i} \int_{\gamma_j} \lambda R(\lambda)\, d\lambda.

Exercise 4.10.34 (Honors) In this exercise we develop the real Jordan canonical form. Let A ∈ R^{n×n}.

(a) Show that if λ ∈ C is an eigenvalue of A, then so is \overline{λ}.

(b) Show that if Ax = λx with λ = a + ib ∈ C \ R, then A\overline{x} = \overline{λ}\,\overline{x},

A Re x = a Re x − b Im x and A Im x = b Re x + a Im x.

Here \overline{x} is the vector obtained from x by taking the complex conjugate of each entry, Re x is the vector obtained from x by taking the real part of each entry, and Im x is the vector obtained from x by taking the imaginary part of each entry.

(c) Show that for all λ ∈ C, we have that w_k(A, λ) = w_k(A, \overline{λ}), k ∈ N.

(d) Show that J_k(λ) ⊕ J_k(\overline{λ}), where λ = a + ib, is similar to the 2k × 2k matrix

K_k(a, b) := \begin{pmatrix} C(a,b) & I_2 & 0 & \cdots & 0 \\ 0 & C(a,b) & I_2 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & C(a,b) & I_2 \\ 0 & 0 & \cdots & 0 & C(a,b) \end{pmatrix},

where C(a, b) = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}.

(e) Show that if A ∈ R^{n×n}, then there exists a real invertible matrix S and a matrix K so that A = SKS^{−1}, where K is a block diagonal matrix with blocks J_k(λ), λ ∈ R, and blocks K_k(a, b) on the diagonal. (Hint: First find the Jordan canonical form of A over C, where for complex eigenvalues the (generalized) eigenvectors x and \overline{x} are paired up. Then use the similarity in (d) to simultaneously convert P to a real matrix S and J to the matrix K.)

(f) Show that for systems of real differential equations with real initial conditions, the solutions are combinations of functions t^k e^{λt}, k ∈ N_0, λ ∈ R, and t^k e^{αt} cos(βt), t^k e^{αt} sin(βt), k ∈ N_0, α, β ∈ R.

Exercise 4.10.35 (Honors) Show that the function f : C^{2×2} × C^{2×2} → C^5 defined by

f(A, B) = (tr A, det A, tr B, det B, tr(AB))

is surjective. What happens for other fields? (Hint: Notice that a 2 × 2 matrix A has a single eigenvalue if and only if (tr A)^2 = 4 det A. To show that (a, b, c, d, e) lies in the range of f, first consider the case when a^2 ≠ 4b, so that A has two different eigenvalues.)

This exercise is based on a result that can be found in Section 1.2 of the
book by L. Le Bruyn, entitled Noncommutative geometry and Cayley-smooth
orders, Volume 290 of Pure and Applied Mathematics, Chapman &
Hall/CRC, Boca Raton, FL. Thanks are due to Paul Muhly for making the
author aware of this result.
5
Inner Product and Normed Vector Spaces

CONTENTS
5.1 Inner products and norms
5.2 Orthogonal and orthonormal sets and bases
5.3 The adjoint of a linear map
5.4 Unitary matrices, QR, and Schur triangularization
5.5 Normal and Hermitian matrices
5.6 Singular value decomposition
5.7 Exercises

Vector spaces may have additional structure. For instance, there may be a
natural notion of length of a vector and/or angle between vectors. The
properties of length and angle will be formally captured in the notions of
norm and inner product. These notions require us to restrict ourselves to
vector spaces over R or C. Indeed, a length is always nonnegative and thus
we will need the inequalities ≤, ≥, <, > (with properties like x, y ≥ 0 ⇒
xy ≥ 0 and x ≥ y ⇒ x + z ≥ y + z).

5.1 Inner products and norms

Let F be R or C. We will write most results for the choice F = C. To


interpret these results for the choice F = R, one simply ignores the complex
conjugates that are part of the definitions.

Let V be a vector space over F. A function

⟨·, ·⟩ : V × V → F

is called a Hermitian form if

(i) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ for all x, y, z ∈ V.

(ii) ⟨ax, y⟩ = a⟨x, y⟩ for all x, y ∈ V and all a ∈ F.

(iii) ⟨x, y⟩ = \overline{⟨y, x⟩} for all x, y ∈ V.

Notice that (iii) implies that ⟨x, x⟩ ∈ R for all x ∈ V. In addition, (ii) implies that ⟨0, y⟩ = 0 for all y ∈ V. Also, (ii) and (iii) imply that ⟨x, ay⟩ = \overline{a}⟨x, y⟩ for all x, y ∈ V and all a ∈ F. Finally, (i) and (iii) imply that ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩ for all x, y, z ∈ V.

The Hermitian form ⟨·, ·⟩ is called an inner product if in addition

(iv) ⟨x, x⟩ > 0 for all 0 ≠ x ∈ V.

If V has an inner product h·, ·i (or sometimes we say “V is endowed with the
inner product h·, ·i ”), then we call the pair (V, h·, ·i) an inner product space.
At times we do not explicitly mention the inner product h·, ·i, and we refer
to V as an inner product space. In the latter case it is implicitly understood
what the underlying inner product is, and typically it would be one of the
standard inner products which we will encounter below.

Example 5.1.1 On F^n define

⟨(x_1, . . . , x_n)^T, (y_1, . . . , y_n)^T⟩ = x_1\overline{y_1} + · · · + x_n\overline{y_n},

or in shorthand notation ⟨x, y⟩ = y^*x, where y^* = (\overline{y_1} · · · \overline{y_n}). Properties (i)–(iv) are easily checked. This is the standard inner product or Euclidean inner product on F^n, where F = R or C.

Example 5.1.2 On F^2 define

⟨(x_1, x_2)^T, (y_1, y_2)^T⟩ = 2x_1\overline{y_1} + x_1\overline{y_2} + x_2\overline{y_1} + 3x_2\overline{y_2}.

Properties (i)–(iii) are easily checked. For (iv) observe that ⟨x, x⟩ = |x_1|^2 + |x_1 + x_2|^2 + 2|x_2|^2, so as soon as x_1 ≠ 0 or x_2 ≠ 0, we have that ⟨x, x⟩ > 0.

Example 5.1.3 On F^2 define

⟨(x_1, x_2)^T, (y_1, y_2)^T⟩ = x_1\overline{y_1} + 2x_1\overline{y_2} + 2x_2\overline{y_1} + x_2\overline{y_2}.

Properties (i)–(iii) are easily checked, so it is a Hermitian form. In order to check (iv) observe that ⟨x, x⟩ = −|x_1|^2 + 2|x_1 + x_2|^2 − |x_2|^2. So for instance

⟨(1, −1)^T, (1, −1)^T⟩ = −1 + 0 − 1 = −2,

so ⟨·, ·⟩ is not an inner product.

Example 5.1.4 Let V = {f : [0, 1] → F : f is continuous}, and

⟨f, g⟩ = \int_0^1 f(x)\overline{g(x)}\, dx.

Properties (i)–(iii) are easily checked. For (iv) notice that ⟨f, f⟩ = \int_0^1 |f(x)|^2\, dx ≥ 0, and as soon as f(x) is not the zero function, by continuity |f(x)|^2 is positive on an interval (a, b), where 0 < a < b < 1, so that ⟨f, f⟩ > 0. This is the standard inner product on V.

Example 5.1.5 Let V = {f : [0, 1] → F : f is continuous}, and

⟨f, g⟩ = \int_0^1 (x^2 + 1) f(x)\overline{g(x)}\, dx.

Properties (i)–(iii) are easily checked. For (iv) notice that ⟨f, f⟩ = \int_0^1 (x^2 + 1)|f(x)|^2\, dx ≥ 0, and as soon as f(x) is not the zero function, by continuity (x^2 + 1)|f(x)|^2 is positive on an interval (a, b), where 0 < a < b < 1, so that ⟨f, f⟩ > 0.

Example 5.1.6 On F_n[X] define

⟨p(X), q(X)⟩ = p(x_1)\overline{q(x_1)} + · · · + p(x_{n+1})\overline{q(x_{n+1})},

where x_1, . . . , x_{n+1} ∈ F are different points chosen in advance. Properties (i)–(iii) are easily checked. For (iv) observe that ⟨p(X), p(X)⟩ = \sum_{j=1}^{n+1} |p(x_j)|^2 ≥ 0, and that ⟨p(X), p(X)⟩ = 0 if and only if p(x_1) = · · · = p(x_{n+1}) = 0. As a polynomial of degree ≤ n with n + 1 different roots must be the zero polynomial, we get that as soon as p(X) is not the zero polynomial, then ⟨p(X), p(X)⟩ > 0. Thus (iv) holds.

Example 5.1.7 On F_3[X] define

⟨p(X), q(X)⟩ = p(0)\overline{q(0)} + p(1)\overline{q(1)} + p(2)\overline{q(2)}.

Properties (i)–(iii) are easily checked. However, (iv) does not hold. If we let p(X) = X(X − 1)(X − 2) ∈ F_3[X], then p(X) is not the zero polynomial, but ⟨p(X), p(X)⟩ = 0.

Example 5.1.8 On F^{n×m} define

⟨A, B⟩ = tr(AB^*).

Properties (i)–(iii) are easily checked. For (iv) we observe that if A = (a_{ij})_{i=1,j=1}^{n,m}, then ⟨A, A⟩ = \sum_{i=1}^n \sum_{j=1}^m |a_{ij}|^2, which is strictly positive as soon as A ≠ 0. This is the standard inner product on F^{n×m}, where F = R or C.

Example 5.1.9 On F^{2×2} define

⟨A, B⟩ = tr(AWB^*),

where W = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}. Properties (i)–(iii) are easily checked. For (iv) we observe that if A = (a_{ij})_{i,j=1}^2, then

⟨A, A⟩ = (a_{11} + a_{12})\overline{a_{11}} + (a_{11} + 2a_{12})\overline{a_{12}} + (a_{21} + a_{22})\overline{a_{21}} + (a_{21} + 2a_{22})\overline{a_{22}} = |a_{11} + a_{12}|^2 + |a_{12}|^2 + |a_{21} + a_{22}|^2 + |a_{22}|^2,

which is always nonnegative, and which can only equal zero when

a_{11} + a_{12} = 0, a_{12} = 0, a_{21} + a_{22} = 0, a_{22} = 0,

that is, when A = 0. Thus as soon as A ≠ 0 we have ⟨A, A⟩ > 0. Thus (iv) holds, and therefore ⟨·, ·⟩ is an inner product.

Proposition 5.1.10 (Cauchy–Schwarz inequality) For an inner product space (V, ⟨·, ·⟩), we have that

|⟨x, y⟩|^2 ≤ ⟨x, x⟩⟨y, y⟩ for all x, y ∈ V.    (5.1)

Moreover, equality in (5.1) holds if and only if {x, y} is linearly dependent.

Proof. When x = 0, inequality (5.1) clearly holds. Next, suppose that x ≠ 0. Put z = y − \frac{⟨y, x⟩}{⟨x, x⟩} x. As ⟨·, ·⟩ is an inner product, we have that ⟨z, z⟩ ≥ 0. This gives that

0 ≤ ⟨y, y⟩ − 2\frac{|⟨y, x⟩|^2}{⟨x, x⟩} + \frac{|⟨y, x⟩|^2}{⟨x, x⟩^2}⟨x, x⟩ = ⟨y, y⟩ − \frac{|⟨y, x⟩|^2}{⟨x, x⟩}.

But now (5.1) follows.

If {x, y} is linearly dependent, it is easy to check that equality in (5.1) holds (as x = 0 or y is a multiple of x). Conversely, suppose that equality holds in (5.1). If x = 0, then clearly {x, y} is linearly dependent. Next, let us suppose that x ≠ 0. As before, put z = y − \frac{⟨y, x⟩}{⟨x, x⟩} x. Using equality in (5.1) one computes directly that ⟨z, z⟩ = 0. Thus z = 0, showing that {x, y} is linearly dependent. □

Remark 5.1.11 Notice that in the first paragraph of the proof of


Proposition 5.1.10 we did not use the full strength of property (iv) of an
inner product. Indeed, we only needed to use

(v) hx, xi ≥ 0 for all x ∈ V .

In Section 6.2 we will encounter so-called pre-inner products that only


satisfy (i)–(iii) and (v). We will use in the proof of Proposition 6.2.12 that in
such case the Cauchy–Schwarz inequality (5.1) still holds.

Next we define the notion of a norm. Let V be a vector space over F = R or


F = C. A function
k·k:V →R
is called a norm if

(i) kxk ≥ 0 for all x ∈ V , and kxk = 0 if and only if x = 0.


(ii) kcxk = |c|kxk for all x ∈ V and c ∈ F.
(iii) kx + yk ≤ kxk + kyk for all x, y ∈ V . (Triangle inequality.)

Every norm satisfies the following inequality.

Lemma 5.1.12 Let V be a vector space with norm k · k. Then for every
x, y ∈ V we have
| kxk − kyk | ≤ kx − yk. (5.2)

Proof. Note that the triangle inequality implies


kxk = kx − y + yk ≤ kx − yk + kyk,
and thus
kxk − kyk ≤ kx − yk. (5.3)
Reversing the roles of x and y, we also obtain that
kyk − kxk ≤ ky − xk = kx − yk. (5.4)
Combining (5.3) and (5.4) yields (5.2). 

Example 5.1.13 On F^n define

‖(x_1, . . . , x_n)^T‖_∞ = \max_{j=1,...,n} |x_j|.

One easily checks that ‖ · ‖_∞ is a norm.

Example 5.1.14 On F^n define

‖(x_1, . . . , x_n)^T‖_1 = \sum_{j=1}^n |x_j|.

One easily checks that ‖ · ‖_1 is a norm.

Example 5.1.15 On F^2 define

‖(x_1, x_2)^T‖ = 2|x_1| + 3|x_2|.

One easily checks that ‖ · ‖ is a norm.

Example 5.1.16 Let V = {f : [0, 1] → F : f is continuous}, and

‖f‖_∞ = \sup_{x∈[0,1]} |f(x)|.

One easily checks that ‖ · ‖_∞ is a norm.

Example 5.1.17 On F_n[X] define

‖\sum_{j=0}^n p_j X^j‖ = \sum_{j=0}^n |p_j|.

One easily checks that ‖ · ‖ is a norm.

Example 5.1.18 On F^{n×m} define

‖(a_{ij})_{i=1,j=1}^{n,m}‖ = \sum_{i=1}^n \sum_{j=1}^m |a_{ij}|.

One easily checks that ‖ · ‖ is a norm.

We are mostly interested in the norm associated with an inner product.

Theorem 5.1.19 Let (V, ⟨·, ·⟩) be an inner product space. Define

‖x‖ := \sqrt{⟨x, x⟩}.
Then ‖ · ‖ is a norm, which satisfies the parallelogram identity:

‖x + y‖^2 + ‖x − y‖^2 = 2‖x‖^2 + 2‖y‖^2 for all x, y ∈ V.    (5.5)

Moreover,

‖x_1 + · · · + x_n‖ = ‖x_1‖ + · · · + ‖x_n‖    (5.6)

if and only if dim Span{x_1, . . . , x_n} ≤ 1 and ⟨x_i, x_j⟩ ≥ 0 for all i, j = 1, . . . , n.

Proof. Conditions (i) and (ii) in the definition of a norm follow directly from the definition of an inner product. For (iii) we observe that

‖x + y‖^2 = ⟨x + y, x + y⟩ = ⟨x, x⟩ + 2 Re ⟨x, y⟩ + ⟨y, y⟩ ≤ ⟨x, x⟩ + 2|⟨x, y⟩| + ⟨y, y⟩ ≤ ⟨x, x⟩ + 2\sqrt{⟨x, x⟩}\sqrt{⟨y, y⟩} + ⟨y, y⟩ = (‖x‖ + ‖y‖)^2,    (5.7)

where we used the Cauchy–Schwarz inequality (5.1). Taking square roots on both sides proves (iii). Notice that if ‖x + y‖ = ‖x‖ + ‖y‖, then we must have equality in (5.7). This then gives Re ⟨x, y⟩ = |⟨x, y⟩| = \sqrt{⟨x, x⟩}\sqrt{⟨y, y⟩}. In particular, we have equality in the Cauchy–Schwarz inequality, which by Proposition 5.1.10 implies that {x, y} is linearly dependent. Moreover, Re ⟨x, y⟩ = |⟨x, y⟩| yields that ⟨x, y⟩ ≥ 0. If (5.6) holds, we obtain

‖x_1 + · · · + x_n‖ = ‖x_1‖ + ‖x_2 + · · · + x_n‖ = · · · = ‖x_1‖ + · · · + ‖x_{n−2}‖ + ‖x_{n−1} + x_n‖ = ‖x_1‖ + · · · + ‖x_n‖.

This gives that

{x_{n−1}, x_n}, {x_{n−2}, x_{n−1} + x_n}, . . . , {x_1, x_2 + · · · + x_n}

are all linearly dependent, which easily yields that dim Span{x_1, . . . , x_n} ≤ 1. In addition, we get that

⟨x_{n−1}, x_n⟩ ≥ 0, ⟨x_{n−2}, x_{n−1} + x_n⟩ ≥ 0, . . . , ⟨x_1, x_2 + · · · + x_n⟩ ≥ 0.

Combining this with dim Span{x_1, . . . , x_n} ≤ 1 it is easy to deduce that ⟨x_i, x_j⟩ ≥ 0 for all i, j = 1, . . . , n. The converse statement is straightforward.

To prove the parallelogram identity (5.5), one simply expands ⟨x ± y, x ± y⟩, and it follows immediately. □

It is easy to see that the norms in Examples 5.1.13–5.1.18 do not satisfy the parallelogram identity (5.5), and thus these norms are not associated with an inner product.

Example 5.1.20 On F^n define

‖(x_1, . . . , x_n)^T‖_2 = \sqrt{\sum_{j=1}^n |x_j|^2}.

This norm, sometimes referred to as the Euclidean norm, is the norm associated with the standard inner product on F^n, where F = R or C.

Corollary 5.1.21 Let z_1, . . . , z_n ∈ C. Then

|z_1 + · · · + z_n| = |z_1| + · · · + |z_n|

if and only if there exists a θ ∈ R so that z_j = |z_j| e^{iθ}, j = 1, . . . , n (i.e., z_1, . . . , z_n all have the same argument).

Proof. If we view a complex number z as a vector (Re z, Im z)^T ∈ R^2 with the Euclidean norm, then |z| = ‖(Re z, Im z)^T‖. Apply now Theorem 5.1.19 to obtain the result. □

Example 5.1.22 Let V = {f : [0, 1] → F : f is continuous}, and define

‖f‖_2 = \sqrt{\int_0^1 |f(x)|^2\, dx}.

This “2-norm” on V is associated with the standard inner product on V.

Example 5.1.23 On F_n[X] define

‖p(X)‖ = \sqrt{\sum_{j=1}^{n+1} |p(x_j)|^2},

where x_1, . . . , x_{n+1} ∈ F are different points. This is the norm associated with the inner product defined in Example 5.1.6.

Example 5.1.24 On F^{n×m} define

‖A‖_F = \sqrt{tr(AA^*)}.

This norm is called the Frobenius norm, and is the norm associated with the inner product defined in Example 5.1.8.

Given a vector space V , we say that two norms k · ka and k · kb are equivalent
if there exist constants c, C > 0 so that

ckvka ≤ kvkb ≤ Ckvka for all v ∈ V. (5.8)

Notice that if k · ka and k · kb are equivalent and k · kb and k · kc are


equivalent, then k · ka and k · kc are equivalent.

Using the Heine–Borel theorem from analysis, along with the result that a
continuous real-valued function defined on a compact set attains a maximum
and a minimum, we can prove the following.

Theorem 5.1.25 Let V be a finite-dimensional vector space over F = R or


C, and let k · ka and k · kb be two norms. Then k · ka and k · kb are equivalent.

Proof. Let B = {b_1, . . . , b_n} be a basis for V, and define the norm

‖v‖_c := ‖[v]_B‖_∞,

where ‖ · ‖_∞ is as in Example 5.1.13. We will show that any other norm ‖ · ‖ on V is equivalent to ‖ · ‖_c. This will yield the result.

We first claim that ‖ · ‖ : V → R is a continuous function, where distance in V is measured using ‖ · ‖_c. In other words, we claim that for every ε > 0 there exists a δ > 0 so that

‖x − y‖_c ≤ δ implies | ‖x‖ − ‖y‖ | ≤ ε.

Indeed, if ε > 0 is given, we choose δ = ε / (\sum_{i=1}^n ‖b_i‖). Let us write

[x]_B = (x_1, . . . , x_n)^T, [y]_B = (y_1, . . . , y_n)^T.

Then we get that

‖x − y‖_c = \max_{i=1,...,n} |x_i − y_i| < δ = \frac{ε}{\sum_{i=1}^n ‖b_i‖}

yields that

| ‖x‖ − ‖y‖ | ≤ ‖x − y‖ = ‖\sum_{i=1}^n (x_i − y_i)b_i‖ ≤ \sum_{i=1}^n |x_i − y_i|\,‖b_i‖ ≤ (\max_{i=1,...,n} |x_i − y_i|) \sum_{i=1}^n ‖b_i‖ < \frac{ε}{\sum_{i=1}^n ‖b_i‖} \sum_{i=1}^n ‖b_i‖ = ε.

Consider now the set S of ‖ · ‖_c-unit vectors in V; thus S = {v ∈ V : ‖v‖_c = 1}. By the Heine–Borel theorem (identifying V with F^n) this set S is compact, as S is closed and bounded. As ‖ · ‖ is a real-valued continuous function on this set, we have that

c := \min_{v∈S} ‖v‖, C := \max_{v∈S} ‖v‖

exist, and as ‖v‖ > 0 for all v ∈ S, we get that c, C > 0. Take now an arbitrary nonzero v ∈ V. Then \frac{1}{‖v‖_c} v ∈ S, and thus

c ≤ ‖\frac{1}{‖v‖_c} v‖ ≤ C,

which implies

c‖v‖_c ≤ ‖v‖ ≤ C‖v‖_c.

Clearly, this inequality also holds for v = 0, and thus the proof is complete. □
Comparing, for instance, the norms ‖ · ‖_∞ and ‖ · ‖_2 on F^n, we have

‖x‖_∞ ≤ ‖x‖_2 ≤ \sqrt{n}\, ‖x‖_∞.

Notice that the upper bound (which is attained by the vector of all 1’s) depends on the dimension n, and tends to ∞ as n goes to ∞. Therefore, one may expect Theorem 5.1.25 not to hold for infinite-dimensional vector spaces. This is confirmed by the following example.

Example 5.1.26 Let V = {f : [0, 1] → R : f is continuous}, and take the norms

‖f‖_2 = \sqrt{\int_0^1 |f(x)|^2\, dx}, \qquad ‖f‖_∞ = \max_{x∈[0,1]} |f(x)|.

Let g_k ∈ V, k ∈ N, be defined by

g_k(x) = 1 − kx for 0 ≤ x ≤ 1/k, and g_k(x) = 0 for 1/k < x ≤ 1.

Then

‖g_k‖_∞ = 1, \qquad ‖g_k‖_2^2 = \int_0^{1/k} (1 − kx)^2\, dx = \frac{1}{3k}.

No constant C > 0 exists so that 1 ≤ C\sqrt{1/(3k)} for all k ∈ N, and thus the norms ‖ · ‖_2 and ‖ · ‖_∞ on V are not equivalent.
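For readers who want to see this numerically, the following short NumPy sketch (an editorial illustration, not part of the original text; the grid size is an arbitrary choice) approximates ‖g_k‖_∞ and ‖g_k‖_2 for a few values of k. The ratio ‖g_k‖_∞ / ‖g_k‖_2 grows without bound, which is exactly why no single constant C can work.

```python
import numpy as np

def g(k, x):
    # the functions g_k from Example 5.1.26
    return np.where(x <= 1.0 / k, 1.0 - k * x, 0.0)

x = np.linspace(0.0, 1.0, 200001)                 # fine grid on [0, 1]
dx = x[1] - x[0]
for k in [1, 10, 100, 1000]:
    vals = g(k, x)
    sup_norm = np.max(np.abs(vals))               # ||g_k||_inf = 1 for every k
    two_norm = np.sqrt(np.sum(vals**2) * dx)      # approximates sqrt(1/(3k))
    print(k, sup_norm, two_norm, sup_norm / two_norm)   # the ratio grows with k
```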

5.2 Orthogonal and orthonormal sets and bases

When a vector space has an inner product, it is natural to study objects


that behave nicely with respect to the inner product. For bases this leads to
the notions of orthogonality and orthonormality.

Given is an inner product space (V, ⟨·, ·⟩). When in an inner product space a norm ‖ · ‖ is used, then this norm is by default the associated norm ‖ · ‖ = \sqrt{⟨·, ·⟩} unless stated otherwise. We say that v and w are orthogonal if ⟨v, w⟩ = 0, and we will denote this as v ⊥ w. Notice that 0 is orthogonal to any vector, and it is the only vector that is orthogonal to itself.

For ∅ 6= W ⊆ V we define
W ⊥ = {v ∈ V : hv, wi = 0 for all w ∈ W } = {v ∈ V : v ⊥ w for all w ∈ W }.
(5.9)
Notice that in this definition we do not require that W is a subspace, thus
W can be any set of vectors of V .

Lemma 5.2.1 For ∅ 6= W ⊆ V we have that W ⊥ is a subspace of V .

Proof. Clearly 0 ∈ W ⊥ as 0 is orthogonal to any vector, in particular to those


in W . Next, let x, y ∈ W ⊥ and c ∈ F. Then for every w ∈ W we have that
hx + y, wi = hx, wi + hy, wi = 0 + 0 = 0, and hcx, wi = chx, wi = c0 = 0.
Thus x + y, cx ∈ W ⊥ , showing that W ⊥ is a subspace. 

In Exercise 5.7.4 we will see that in case W is a subspace of a


finite-dimensional space V , then
dim W + dim W ⊥ = dim V.

A set of vectors {v_1, . . . , v_p} is called orthogonal if v_i ⊥ v_j for i ≠ j. The set {v_1, . . . , v_p} is called orthonormal if v_i ⊥ v_j for i ≠ j and ‖v_i‖ = 1, i = 1, . . . , p. For several reasons it will be convenient to work with orthogonal, or even better, orthonormal sets of vectors. We first show how any set of linearly independent vectors can be converted to an orthogonal or orthonormal set. Before we state the theorem, let us just see how it works with two vectors.

Example 5.2.2 Let {v, w} be linearly independent, and let us make a new
vector z of the form z = w + cv so that z ⊥ v. Thus we would like that
0 = hz, vi = hw, vi + chv, vi.
This is accomplished by taking

c = −\frac{⟨w, v⟩}{⟨v, v⟩}.

Note that we are not dividing by zero, as v ≠ 0. Thus, by putting

z = w − \frac{⟨w, v⟩}{⟨v, v⟩} v,

we obtain an orthogonal set {v, z} so that Span{v, z} = Span{v, w}. If we want to convert it to an orthonormal set, we simply divide v and z by their respective lengths, obtaining the set {v/‖v‖, z/‖z‖}.

We can do the above process for a set of p linearly independent vectors as well. It is called the Gram–Schmidt process.

Theorem 5.2.3 Let (V, ⟨·, ·⟩) be an inner product space, and let {v_1, . . . , v_p} be linearly independent. Construct {z_1, . . . , z_p} as follows:

z_1 = v_1,
z_k = v_k − \frac{⟨v_k, z_{k−1}⟩}{⟨z_{k−1}, z_{k−1}⟩} z_{k−1} − · · · − \frac{⟨v_k, z_1⟩}{⟨z_1, z_1⟩} z_1, \quad k = 2, . . . , p.    (5.10)

Then {z_1, . . . , z_p} is an orthogonal linearly independent set with the property that

Span{v_1, . . . , v_k} = Span{z_1, . . . , z_k} = Span{z_1/‖z_1‖, . . . , z_k/‖z_k‖}, k = 1, . . . , p.

The set {z_1/‖z_1‖, . . . , z_p/‖z_p‖} is an orthonormal set.

The proof is straightforward, and left to the reader. It is important to note


that none of the zk ’s are zero, otherwise it would indicate that
vk ∈ Span{v1 , . . . , vk−1 } (which is impossible due to the linear
independence). If one applies the Gram–Schmidt process to a set that is not
necessarily linearly independent, one may encounter a case where zk = 0. In
that case, vk ∈ Span{v1 , . . . , vk−1 }. In order to produce linearly
independent zj ’s, one would simple leave out vk and zk , and continue with
constructing zk+1 skipping over vk and zk .
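The procedure of Theorem 5.2.3, including the skipping rule just described, is easy to experiment with numerically. The sketch below is an added illustration using the standard inner product of Example 5.1.1 (the tolerance `tol` is an arbitrary cutoff, not something prescribed by the text); it orthogonalizes a list of vectors and drops any vector that lies in the span of the earlier ones.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Apply (5.10) to the given vectors; skip v_k whenever z_k = 0."""
    zs = []
    for v in vectors:
        z = np.array(v, dtype=complex)
        for w in zs:
            # subtract the component of v along w:  <v, w>/<w, w> * w
            z = z - (np.vdot(w, v) / np.vdot(w, w)) * w
        if np.linalg.norm(z) > tol:   # z = 0 means v lies in the span of earlier vectors
            zs.append(z)
    orthonormal = [z / np.linalg.norm(z) for z in zs]
    return zs, orthonormal

# four vectors in R^4 whose third vector lies in the span of the first two
zs, q = gram_schmidt([[1, 1, 1, 1], [0, -2, 0, -2], [1, 0, 1, 0], [2, -2, 0, 0]])
print(len(q))   # 3: the dependent vector was skipped
```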

Example 5.2.4 Let V = R_2[X], with

⟨p, q⟩ = p(−1)q(−1) + p(0)q(0) + p(1)q(1).

Let {1, X, X^2} be the linearly independent set. Applying Gram–Schmidt, we get

z_1(X) = 1,
z_2(X) = X − \frac{⟨X, 1⟩}{⟨1, 1⟩} 1 = X − 0 = X,
z_3(X) = X^2 − \frac{⟨X^2, X⟩}{⟨X, X⟩} X − \frac{⟨X^2, 1⟩}{⟨1, 1⟩} 1 = X^2 − \frac{2}{3}.    (5.11)

The orthonormal set would be { \frac{1}{\sqrt{3}}, \frac{X}{\sqrt{2}}, \frac{3}{\sqrt{6}} X^2 − \frac{2}{\sqrt{6}} }.

We call B = {v1 , . . . , vn } an orthogonal/orthonormal basis of V , if B is a


basis and is orthogonal/orthonormal.

One of the reasons why it is easy to work with an orthonormal basis, is that
it is easy to find the coordinates of a vector with respect to an orthonormal
basis.

Lemma 5.2.5 Let B = {v_1, . . . , v_n} be an orthonormal basis of the inner product space (V, ⟨·, ·⟩). Let x ∈ V. Then

[x]_B = (⟨x, v_1⟩, . . . , ⟨x, v_n⟩)^T.

Proof. Let x = \sum_{i=1}^n c_i v_i. Then ⟨x, v_j⟩ = \sum_{i=1}^n c_i ⟨v_i, v_j⟩ = c_j, proving the lemma. □

Proposition 5.2.6 Let B = {v_1, . . . , v_n} be an orthonormal basis of the inner product space (V, ⟨·, ·⟩_V). Let x, y ∈ V, and write

[x]_B = (c_1, . . . , c_n)^T, [y]_B = (d_1, . . . , d_n)^T.

Then

⟨x, y⟩_V = c_1\overline{d_1} + · · · + c_n\overline{d_n} = ⟨[x]_B, [y]_B⟩,    (5.12)

where the last inner product is the standard (Euclidean) inner product for F^n.

Proof. We have x = \sum_{i=1}^n c_i v_i, y = \sum_{j=1}^n d_j v_j, and thus

⟨x, y⟩_V = \sum_{i=1}^n \sum_{j=1}^n c_i \overline{d_j} ⟨v_i, v_j⟩_V = \sum_{j=1}^n c_j \overline{d_j},

where we used that ⟨v_j, v_j⟩_V = 1, and ⟨v_i, v_j⟩_V = 0 when i ≠ j. □

Let (V, h·, ·iV ) and (W, h·, ·iW ) be inner product spaces, and let T : V → W
be linear. We call T an isometry if

hT (x), T (y)iW = hx, yiV for all x, y ∈ V.

Two inner product spaces V and W are called isometrically isomorphic if


there exists an isomorphism T : V → W that is also isometric.

Corollary 5.2.7 Let (V, h·, ·iV ) be an n-dimensional inner product space
over F, with F equal to R or C. Then V is isometrically isomorphic to Fn
with the standard inner product.

Proof. Let B = {v1 , . . . , vn } be an orthonormal basis for V , and define the


map T : V → Fn via T (v) = [v]B . By Theorem 3.2.7, T is an isomorphism,
and by Proposition 5.2.6, T is an isometry. Thus V and Fn are isometrically
isomorphic. .

A consequence of Corollary 5.2.7 is that to understand finite-dimensional


inner product spaces, it essentially suffices to study Fn with the standard
inner product. We will gladly make use of this observation. It is important to
remember, though, that to view an n-dimensional inner product space as Fn
one needs to start by choosing an orthonormal basis and represent vectors
with respect to this fixed chosen basis.

5.3 The adjoint of a linear map

Via the inner product, one can relate with a linear map another linear map
(called the adjoint). On a vector space over the reals, the adjoint of
multiplication with a matrix A corresponds to multiplication with the
transpose AT of the matrix A. Over the complex numbers, it also involves
taking a complex conjugate. We now provide you with the definition.

Let (V, ⟨·, ·⟩_V) and (W, ⟨·, ·⟩_W) be inner product spaces, and let T : V → W be linear. We call a map T^* : W → V the adjoint of T if

⟨T(v), w⟩_W = ⟨v, T^*(w)⟩_V for all v ∈ V, w ∈ W.

Notice that the adjoint is unique. Indeed if S is another adjoint for T , we get
that hv, T ? (w)iV = hv, S(w)iV for all v, w. Choosing v = T ? (w) − S(w)
yields
hT ? (w) − S(w), T ? (w) − S(w)iV = 0,
and thus T ? (w) − S(w) = 0. As this holds for all w, we must have that
T ? = S.

Lemma 5.3.1 If T : V → W is an isometry, then T^*T = id_V.

Proof. Since T is an isometry, we have that

⟨T(v), T(v̂)⟩_W = ⟨v, v̂⟩_V for all v, v̂ ∈ V.

But then, we get that

⟨T^*T(v), v̂⟩_V = ⟨v, v̂⟩_V for all v, v̂ ∈ V,

or equivalently,

⟨T^*T(v) − v, v̂⟩_V = 0 for all v, v̂ ∈ V.

Letting v̂ = T^*T(v) − v, this yields T^*T(v) − v = 0 for all v ∈ V. Thus T^*T = id_V. □

We call T unitary, if T is an isometric isomorphism. In that case T ? is the


inverse of T and we have T ? T = idV and T T ? = idW .

A linear map T : V → V is called self-adjoint if

hT (v), v̂iV = hv, T (v̂)iV for all v, v̂ ∈ V.

In other words, T is self-adjoint if T ? = T .

Example 5.3.2 Let k ∈ N and V = Span{sin(x), sin(2x), . . . , sin(kx)}, with the inner product

⟨f, g⟩_V = \int_0^{\pi} f(x)g(x)\, dx.

Let T = −\frac{d^2}{dx^2} : V → V. Notice that indeed −\frac{d^2}{dx^2} sin(mx) = m^2 sin(mx) ∈ V, thus T is well-defined. We claim that T is self-adjoint. For this, we need to apply integration by parts twice, and it is important to note that for all f in V we have that f(0) = f(π) = 0. So let us compute

⟨T(f), g⟩_V = \int_0^{\pi} −f''(x)g(x)\, dx = −f'(π)g(π) + f'(0)g(0) + \int_0^{\pi} f'(x)g'(x)\, dx = f(π)g'(π) − f(0)g'(0) − \int_0^{\pi} f(x)g''(x)\, dx = ⟨f, T(g)⟩_V.

Theorem 5.3.3 Let (V, ⟨·, ·⟩_V) and (W, ⟨·, ·⟩_W) be inner product spaces with orthonormal bases B = {v_1, . . . , v_n} and C = {w_1, . . . , w_m}. Let T : V → W be linear. If

[T]_{C←B} = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix},

then

[T^*]_{B←C} = \begin{pmatrix} \overline{a_{11}} & \cdots & \overline{a_{m1}} \\ \vdots & & \vdots \\ \overline{a_{1n}} & \cdots & \overline{a_{mn}} \end{pmatrix}.

In other words,

[T^*]_{B←C} = ([T]_{C←B})^*,    (5.13)

where as before A^* is the conjugate transpose of the matrix A.

Proof. The matrix representation for T tells us, in conjunction with Lemma 5.2.5, that a_{ij} = ⟨T(v_j), w_i⟩_W. The (k, l)th entry of the matrix representation of T^* is, again by using the observation in Lemma 5.2.5, equal to

⟨T^*(w_l), v_k⟩_V = ⟨w_l, T(v_k)⟩_W = \overline{⟨T(v_k), w_l⟩_W} = \overline{a_{lk}},

proving the statement. □

Thus, when we identify via a choice of an orthonormal basis a


finite-dimensional inner product space V with Fdim V endowed with the
standard inner product, the corresponding matrix representation has the
property that the adjoint corresponds to taking the conjugate transpose.
One of the consequences of this correspondence is that any linear map
between finite-dimensional vector spaces has an adjoint. Indeed, this follows
from the observation that any matrix has a conjugate transpose. In what
follows we will focus on using matrices and their conjugate transposes.
Having the material of this and previous sections in mind, the results that
follow may be interpreted on the level of general finite-dimensional inner
product spaces. It is always good to remember that when adjoints appear,
there are necessarily inner products in the background.

5.4 Unitary matrices, QR, and Schur triangularization

Unitary transformations are ones where a pair of vectors is mapped to a new pair of vectors without changing their lengths or the angle between them. Thus, one can think of a unitary transformation as viewing the vector space from a different viewpoint. Unitary transformations (represented by unitary matrices) can be used to put general transformations in a simpler form. These simpler forms give rise to the QR and Schur triangular decompositions.

Let F = R or C. We call a matrix A ∈ F^{n×m} an isometry if A^*A = I_m. Notice that necessarily we need to have that n ≥ m. The equation A^*A = I_m can also be interpreted as saying that the columns of A are orthonormal. When A ∈ F^{n×n} is square, then A^*A = I_n automatically implies that AA^* = I_n. Such a matrix is called unitary. Thus a square isometry is unitary. From the Gram–Schmidt process we can deduce the following.

Theorem 5.4.1 (QR factorization) Let A ∈ Fn×m with n ≥ m. Then there


exists an isometry Q ∈ Fn×m and an upper triangular matrix R ∈ Fm×m
with nonnegative entries on the diagonal, so that

A = QR.

If A has rank equal to m, then the diagonal entries of R are positive, and R
is invertible. If n = m, then Q is unitary.

Proof. First we consider the case when rank A = m. Let v_1, . . . , v_m denote the columns of A, and let z_1, . . . , z_m denote the resulting vectors when we apply the Gram–Schmidt process to v_1, . . . , v_m as in Theorem 5.2.3. Let now Q be the matrix with columns z_1/‖z_1‖, . . . , z_m/‖z_m‖. Then Q^*Q = I_m as the columns of Q are orthonormal. Moreover, we have that

v_k = ‖z_k‖ \frac{z_k}{‖z_k‖} + \sum_{j=1}^{k−1} r_{jk} \frac{z_j}{‖z_j‖},

for some r_{jk} ∈ F, j < k. Putting r_{kk} = ‖z_k‖, and r_{jk} = 0 for j > k, and letting R = (r_{jk})_{j,k=1}^m, we get the desired upper triangular matrix R yielding A = QR.

When rank A < m, apply the Gram–Schmidt process with those columns of A that do not lie in the span of the preceding columns. Place the vectors z/‖z‖ that are found in this way in the corresponding columns of Q. Next, one can fill up the remaining columns of Q with any vectors making the matrix an isometry. The upper triangular entries in R are obtained from writing the columns of A as linear combinations of the z/‖z‖’s found in the process above. □
Let us illustrate the QR factorization on an example where the columns of A are linearly dependent.

Example 5.4.2 Let A = \begin{pmatrix} 1 & 0 & 1 & 2 \\ 1 & -2 & 0 & -2 \\ 1 & 0 & 1 & 0 \\ 1 & -2 & 0 & 0 \end{pmatrix}. Applying the Gram–Schmidt process we obtain

z_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix},
z_2 = \begin{pmatrix} 0 \\ -2 \\ 0 \\ -2 \end{pmatrix} - \frac{-4}{4}\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix},
z_3 = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix} - \frac{2}{4}\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} - \frac{2}{4}\begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.    (5.14)

We thus notice that the third column of A is a linear combination of the first two columns of A, so we continue to compute z_4 without using z_3:

z_4 = \begin{pmatrix} 2 \\ -2 \\ 0 \\ 0 \end{pmatrix} - 0\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} - \frac{4}{4}\begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix}.

Dividing z_1, z_2, z_4 by their respective lengths, and putting them in the matrix Q, we get the equality

A = \begin{pmatrix} 1 & 0 & 1 & 2 \\ 1 & -2 & 0 & -2 \\ 1 & 0 & 1 & 0 \\ 1 & -2 & 0 & 0 \end{pmatrix} = \begin{pmatrix} \frac12 & \frac12 & * & \frac12 \\ \frac12 & -\frac12 & * & -\frac12 \\ \frac12 & \frac12 & * & -\frac12 \\ \frac12 & -\frac12 & * & \frac12 \end{pmatrix} \begin{pmatrix} 2 & -2 & 1 & 0 \\ 0 & 2 & 1 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix},

where it remains to fill in the third column (marked *) of Q. To make the columns of Q orthonormal, we choose the third column to be (\frac12, \frac12, -\frac12, -\frac12)^T, so we get

Q = \begin{pmatrix} \frac12 & \frac12 & \frac12 & \frac12 \\ \frac12 & -\frac12 & \frac12 & -\frac12 \\ \frac12 & \frac12 & -\frac12 & -\frac12 \\ \frac12 & -\frac12 & -\frac12 & \frac12 \end{pmatrix}, \quad R = \begin{pmatrix} 2 & -2 & 1 & 0 \\ 0 & 2 & 1 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}.
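As a quick check (added for illustration, not part of the original example), one can verify numerically that the Q and R found above satisfy Q^*Q = I and QR = A. Note that a library routine such as numpy.linalg.qr may return a different, equally valid factorization, since Q is not unique when the columns of A are dependent.

```python
import numpy as np

A = np.array([[1, 0, 1, 2],
              [1, -2, 0, -2],
              [1, 0, 1, 0],
              [1, -2, 0, 0]], dtype=float)
Q = 0.5 * np.array([[1, 1, 1, 1],
                    [1, -1, 1, -1],
                    [1, 1, -1, -1],
                    [1, -1, -1, 1]])
R = np.array([[2, -2, 1, 0],
              [0, 2, 1, 2],
              [0, 0, 0, 0],
              [0, 0, 0, 2]], dtype=float)

print(np.allclose(Q.T @ Q, np.eye(4)))   # True: the columns of Q are orthonormal
print(np.allclose(Q @ R, A))             # True: A = QR
```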

Notice that finding the QR factorization only requires simple algebraic


operations. Surprisingly, it can be used very effectively to find eigenvalues of
a matrix. The QR algorithm is based on the following iteration scheme. Let
A ∈ Fn×n be given. Let A0 = A, and perform the iteration:

find QR factorization Ak = QR, then put Ak+1 = RQ.

Notice that Ak+1 = Q−1 Ak Q, so that Ak+1 and Ak have the same
eigenvalues. As it turns out, Ak converges to an upper triangular matrix,
manageable exceptions aside, and thus one can read the eigenvalues from the
diagonal of this upper triangular limit. In a numerical linear algebra course,
one studies the details of this algorithm. It is noteworthy, though, to remark
that when one wants to find numerically the roots of a polynomial, it is
often very effective to build the associated companion matrix, and
subsequently use the QR algorithm to find the eigenvalues of this companion
matrix, which coincide with the roots of the polynomial. Thus contrary to
how we do things by hand, we rather find roots by computing eigenvalues
than the other way around. We will discuss this further in Section 7.3.

By combining the Jordan canonical form theorem and the QR factorization


theorem, we can prove the following result.

Theorem 5.4.3 (Schur triangularization) Let A ∈ F^{n×n} and suppose that all its eigenvalues are in F. Then there exists a unitary U ∈ F^{n×n} and an upper triangular T ∈ F^{n×n} so that A = UTU^*.

Proof. By Theorem 4.4.1 there exists an invertible P ∈ F^{n×n} such that A = PJP^{−1}, where the matrix J is a direct sum of Jordan blocks, and thus J is upper triangular. By Theorem 5.4.1 there exists a unitary Q and an invertible upper triangular R such that P = QR. Now,

A = PJP^{−1} = QRJ(QR)^{−1} = Q(RJR^{−1})Q^{−1} = Q(RJR^{−1})Q^*,

where Q−1 = Q∗ since Q is unitary. The inverse of an upper triangular


matrix is upper triangular, and the product of upper triangular matrices is
also upper triangular. It follows that T := RJR−1 is upper triangular, and
thus A = QT Q∗ with Q unitary and T upper triangular. 

5.5 Normal and Hermitian matrices

In this section we study transformations that interact particularly nicely with respect to the inner product. A main feature of these normal and Hermitian transformations is that their eigenvectors can be used to form an orthonormal basis for the underlying space.

A matrix A ∈ Fn×n is called normal if A∗ A = AA∗ .

Lemma 5.5.1 (a) If U is unitary, then A is normal if and only if U ∗ AU is


normal.
(b) If T is upper triangular and normal, then T is diagonal.

Proof. (a) Let us compute U^*AU(U^*AU)^* = U^*AUU^*A^*U = U^*AA^*U, and (U^*AU)^*(U^*AU) = U^*A^*UU^*AU = U^*A^*AU. The two are equal if and only if AA^* = A^*A (where we used that U is invertible). This proves the first part.

(b). Suppose that T = (tij )ni,j=1 is upper triangular. Thus tij = 0 for i > j.
Since T is normal we have that T ∗ T = T T ∗ . Comparing the (1, 1) entry on
both sides of this equation we get

|t11 |2 = |t11 |2 + |t12 |2 + · · · + |t1n |2 .

This gives that t12 = t13 = · · · = t1n = 0. Next, comparing the (2, 2) entry on
both sides of T ∗ T = T T ∗ we get

|t22 |2 = |t22 |2 + |t23 |2 + · · · + |t2n |2 .

This gives that t23 = t24 = · · · = t2n = 0. Continuing this way, we find that
tij = 0 for all i < j. Thus T is diagonal. 

Theorem 5.5.2 (Spectral theorem for normal matrices) Let A ∈ Fn×n be


normal, and suppose that all eigenvalues of A lie in F. Then there exists a
unitary U ∈ Fn×n and a diagonal D ∈ Fn×n so that

A = U DU ∗ .

Proof. By Theorem 5.4.3 we have that A = U T U ∗ , where U is unitary and T


is upper triangular. By Lemma 5.5.1, since A is normal, so is T . Again, by

Lemma 5.5.1, as T is upper triangular and normal, we must have that T is


diagonal. But then, with D = T we have the desired factorization
A = U DU ∗ . 

Examples of normal matrices are the following:

• A is Hermitian if A = A∗ .

• A is skew-Hermitian if A = −A∗ .
• A is unitary if AA∗ = A∗ A = I.

Hermitian, skew-Hermitian, and unitary matrices are all normal. Hermitian


and skew-Hermitian matrices have the following characterization.

Proposition 5.5.3 Let A ∈ Cn×n .

(i) A is Hermitian if and only if x∗ Ax ∈ R for all x ∈ Cn .


(ii) A is skew-Hermitian if and only if x∗ Ax ∈ iR for all x ∈ Cn .

Proof. (i) If A = A∗ , then (x ∗ Ax)∗ = x∗ A∗ x = x∗ Ax, and thus x∗ Ax ∈ R.


Conversely, suppose that x^*Ax ∈ R for all x ∈ C^n. Write A = (a_{jk})_{j,k=1}^n. First, let x = e_j. Then e_j^*Ae_j = a_{jj}, so we get that a_{jj} ∈ R, j = 1, . . . , n. Next, let x = e_j + e_k. Then x^*Ax = a_{jj} + a_{jk} + a_{kj} + a_{kk}. Thus we get that a_{jk} + a_{kj} ∈ R, and consequently Im a_{jk} = −Im a_{kj}. Finally, let x = e_j + i e_k. Then x^*Ax = a_{jj} + i a_{jk} − i a_{kj} + a_{kk}. Thus we get that i(a_{jk} − a_{kj}) ∈ R, and thus Re a_{jk} = Re a_{kj}. Thus a_{jk} = \overline{a_{kj}}. Thus A = A^*.

(ii) Replace A by iA and use (i). □

We say that a matrix A ∈ Cn×n is positive semidefinite if x∗ Ax ≥ 0 for all


x ∈ Cn . The matrix A ∈ Cn×n is positive definite if x∗ Ax > 0 for all
x ∈ Cn \ {0}. Clearly, by Proposition 5.5.3(i), if A is positive (semi)definite,
then A is Hermitian.

We have the following result.

Theorem 5.5.4 Let A ∈ Cn×n . Then the following hold.

(i) A is Hermitian if and only if there exists a unitary U and a real


diagonal D so that A = U DU ∗ . If A ∈ Rn×n , then U can be chosen to
be real as well.
130 Advanced Linear Algebra

(ii) A is skew-Hermitian if and only if there exists a unitary U and a


purely imaginary diagonal D so that A = U DU ∗ .

(iii) A is unitary if and only if there exists a unitary U and a diagonal


D = diag(dii )ni=1 with |dii | = 1 so that A = U DU ∗ .
(iv) A is positive semidefinite if and only if there exists a unitary U and a
nonnegative real diagonal D so that A = U DU ∗ . If A ∈ Rn×n , then U
can be chosen to be real as well.
(v) A is positive definite if and only if there exists a unitary U and a positive real diagonal D so that A = UDU^*. If A ∈ R^{n×n}, then U can be chosen to be real as well.

Proof. It is easy to see that when A = U DU ∗ with U unitary, then A is


Hermitian/skew-Hermitian/unitary/positive (semi)definite if and only if D is
Hermitian/skew-Hermitian/unitary/positive (semi)definite. Next, for a
diagonal matrix one easily observes that D is Hermitian if and only if D is
real, D is skew-Hermitian if and only if D is purely imaginary, D is unitary
if and only if its diagonal entries have modulus 1, D is positive semidefinite
if and only if D is nonnegative, and D is positive definite if and only if D is
positive. Combining these observations with Theorem 5.5.2 yields the result.
.
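For a concrete Hermitian matrix, part (i) can be checked numerically; the sketch below (an added illustration with an arbitrarily chosen matrix) uses numpy.linalg.eigh, which for Hermitian input returns real eigenvalues and orthonormal eigenvectors.

```python
import numpy as np

A = np.array([[2, 1 + 1j],
              [1 - 1j, 3]])            # a Hermitian matrix: A equals its conjugate transpose
d, U = np.linalg.eigh(A)               # d is real, the columns of U are orthonormal
D = np.diag(d)

print(np.allclose(U.conj().T @ U, np.eye(2)))   # True: U is unitary
print(np.allclose(U @ D @ U.conj().T, A))       # True: A = U D U*
print(d)                                        # real eigenvalues, here [1., 4.]
```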

We end this section with Sylvester’s Law of Inertia. Given a Hermitian


matrix A ∈ Cn×n , its inertia In(A) is a triple In(A) = (i+ (A), i− (A), i0 (A)),
where i+ (A) is the number of positive eigenvalues of A (counting
multiplicity), i− (A) is the number of negative eigenvalues of A (counting
multiplicity), and i0 (A) is the number of zero eigenvalues of A (counting
multiplicity). For example,

In \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix} = (3, 0, 1).
Note that i0 (A) = dim Ker(A), and i+ (A) + i− (A) + i0 (A) = n. We now
have the following result.

Theorem 5.5.5 (Sylvester’s Law of Inertia) Let A, B ∈ Cn×n be


Hermitian. Then In(A) = In(B) if and only if there exists an invertible S so
that A = SBS ∗ .

We will be using the following lemma.


Lemma 5.5.6 Let A ∈ C^{n×n} be Hermitian with In(A) = (µ, ν, δ). Then there exists an invertible T so that

A = T \begin{pmatrix} I_µ & & \\ & -I_ν & \\ & & 0 \end{pmatrix} T^*.

Proof. Let A = UΛU^* with U unitary, Λ = diag(λ_i)_{i=1}^n, λ_1, . . . , λ_µ > 0, λ_{µ+1}, . . . , λ_{µ+ν} < 0, and λ_{µ+ν+1} = · · · = λ_n = 0. If we let

D = diag(\sqrt{λ_1}, . . . , \sqrt{λ_µ}, \sqrt{-λ_{µ+1}}, . . . , \sqrt{-λ_{µ+ν}}, 1, . . . , 1)

and T = UD, then the lemma follows. □

Proof of Theorem 5.5.5. First suppose that In(A) = (µ, ν, δ) = In(B). By Lemma 5.5.6 there exist invertible T and W so that

A = T \begin{pmatrix} I_µ & & \\ & -I_ν & \\ & & 0 \end{pmatrix} T^*, \quad B = W \begin{pmatrix} I_µ & & \\ & -I_ν & \\ & & 0 \end{pmatrix} W^*.

But then by letting S = TW^{−1} we obtain that A = SBS^*.

Conversely, suppose that A = SBS^* for some invertible S. We first notice that i_0(A) = dim Ker(A) = dim Ker(B) = i_0(B). By applying Lemma 5.5.6 to both A and B, and combining the results with A = SBS^*, we obtain that there exists an invertible W so that

\begin{pmatrix} I_{i_+(A)} & & \\ & -I_{i_-(A)} & \\ & & 0 \end{pmatrix} = W \begin{pmatrix} I_{i_+(B)} & & \\ & -I_{i_-(B)} & \\ & & 0 \end{pmatrix} W^*,    (5.15)

where the diagonal zeros have equal size. Let us partition W = (W_{ij})_{i,j=1}^3 as an appropriately sized block matrix (so, for instance, W_{11} has size i_+(A) × i_+(B) and W_{22} has size i_-(A) × i_-(B)). Then from the (1, 1) block entry of the equality (5.15) we get that

I_{i_+(A)} = W_{11} I_{i_+(B)} W_{11}^* − W_{12} I_{i_-(B)} W_{12}^*.

This gives that rank W_{11}W_{11}^* ≤ i_+(B), and W_{11}W_{11}^* = I_{i_+(A)} + W_{12}W_{12}^* is positive definite of size i_+(A) × i_+(A), and thus rank W_{11}W_{11}^* ≥ i_+(A). Combining these observations gives i_+(B) ≥ i_+(A). Reversing the roles of A and B, one can apply the same argument and arrive at the inequality i_+(A) ≥ i_+(B). But then i_+(B) = i_+(A) follows. Finally,

i_-(A) = n − i_+(A) − i_0(A) = n − i_+(B) − i_0(B) = i_-(B),

and we are done. □

5.6 Singular value decomposition

The singular value decomposition gives a way to write a general (typically non-square) matrix A as the product A = VΣW^*, where V and W are unitary and Σ is essentially diagonal with nonnegative entries. This means
non-square) matrix A as the product A = V ΣW ∗ , where V and W ∗ are
unitary and Σ is essentially diagonal with nonnegative entries. This means
that by changing the viewpoint in both the domain and the co-domain, a
linear transformation between finite-dimensional spaces can be viewed as
multiplying with a relatively few (at most the dimension of the domain
and/or co-domain) nonnegative numbers. One of the main applications of
the singular value decomposition is that it gives an easy way to approximate
a matrix with one that has a low rank. The advantage of a low rank matrix
is that it requires less memory to store it. If you take a look at the solution
of Exercise 5.7.31, you will see how a rank 524 matrix (the original image) is
approximated by a rank 10, 30, and 50 one by using the singular value
decomposition.

Here is the main result of this section.

Theorem 5.6.1 Let A ∈ F^{n×m} have rank k. Then there exist unitary matrices V ∈ F^{n×n}, W ∈ F^{m×m}, and a matrix Σ ∈ F^{n×m} of the form

Σ = \begin{pmatrix} σ_1 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & σ_2 & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & \cdots & σ_k & \cdots & 0 \\ \vdots & \vdots & & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \cdots & 0 \end{pmatrix}, \quad σ_1 ≥ σ_2 ≥ \cdots ≥ σ_k > 0,    (5.16)

so that A = VΣW^*.

Proof. As A^*A is positive semidefinite, there exists a unitary W and a diagonal matrix Λ = diag(λ_i)_{i=1}^m, with λ_1 ≥ · · · ≥ λ_k > 0 = λ_{k+1} = · · · = λ_m, so that A^*A = WΛW^*. Notice that rank A = rank A^*A = k. Put σ_j = \sqrt{λ_j}, j = 1, . . . , k, and write W = (w_1 · · · w_m). Next, put v_j = \frac{1}{σ_j} Aw_j, j = 1, . . . , k, and let {v_{k+1}, . . . , v_n} be an orthonormal basis for Ker A^*. Put V = (v_1 · · · v_n). First, let us show that V is unitary. When i, j ∈ {1, . . . , k}, then

v_j^* v_i = \frac{1}{σ_i σ_j} w_j^* A^*A w_i = \frac{1}{σ_i σ_j} w_j^* WΛW^* w_i = \frac{1}{σ_i σ_j} e_j^* Λ e_i = \begin{cases} 0 & \text{when } i ≠ j, \\ \frac{λ_j}{σ_j^2} = 1 & \text{when } i = j. \end{cases}

Next, when j ∈ {1, . . . , k} and i ∈ {k + 1, . . . , n}, we get that v_j^* v_i = \frac{1}{σ_j} w_j^* A^* v_i = 0 as v_i ∈ Ker A^*. Similarly, v_j^* v_i = 0 when i ∈ {1, . . . , k} and j ∈ {k + 1, . . . , n}. Finally, {v_{k+1}, . . . , v_n} is an orthonormal set, and thus we find that V^*V = I_n.

It remains to show that A = VΣW^*, or equivalently, AW = VΣ. The equality in the first k columns follows from the definition of v_j, j = 1, . . . , k. In columns k + 1, . . . , m we have 0 on both sides, and thus AW = VΣ follows. □

Alternative proof. First assume that n = m. Consider the Hermitian 2n × 2n matrix

M = \begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix}.

Observe that Mv = λv, where v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}, yields

\begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = λ \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}.

Then

\begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix} \begin{pmatrix} v_1 \\ -v_2 \end{pmatrix} = -λ \begin{pmatrix} v_1 \\ -v_2 \end{pmatrix}.

Thus, if we denote the positive eigenvalues of M by σ_1 ≥ σ_2 ≥ · · · ≥ σ_k > 0, then −σ_1, . . . , −σ_k are also eigenvalues of M. Notice that when λ = 0, we can take a basis {v_1^{(1)}, . . . , v_1^{(n−k)}} of Ker A^*, and a basis {v_2^{(1)}, . . . , v_2^{(n−k)}} of Ker A, and then

\{ \begin{pmatrix} v_1^{(1)} \\ v_2^{(1)} \end{pmatrix}, . . . , \begin{pmatrix} v_1^{(n−k)} \\ v_2^{(n−k)} \end{pmatrix}, \begin{pmatrix} v_1^{(1)} \\ -v_2^{(1)} \end{pmatrix}, . . . , \begin{pmatrix} v_1^{(n−k)} \\ -v_2^{(n−k)} \end{pmatrix} \}

is a basis for Ker M. By Theorem 5.5.4 there exists a unitary U and a diagonal D so that M = UDU^*, and by the previous observations we can organize it so that

U = \begin{pmatrix} X & X \\ Y & -Y \end{pmatrix}, \quad D = \begin{pmatrix} Σ & 0 \\ 0 & -Σ \end{pmatrix}.

Now, we get

\begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix} = \begin{pmatrix} X & X \\ Y & -Y \end{pmatrix} \begin{pmatrix} Σ & 0 \\ 0 & -Σ \end{pmatrix} \begin{pmatrix} X^* & Y^* \\ X^* & -Y^* \end{pmatrix},

and we also have that

\begin{pmatrix} X & X \\ Y & -Y \end{pmatrix} \begin{pmatrix} X^* & Y^* \\ X^* & -Y^* \end{pmatrix} = \begin{pmatrix} I_n & 0 \\ 0 & I_n \end{pmatrix}.

Writing out these equalities, we get that

A = (\sqrt{2}X)\, Σ\, (\sqrt{2}Y)^*,

with \sqrt{2}X and \sqrt{2}Y unitary.

When A is of size n × m with n > m, one can do a QR factorization A = QR. Next, obtain a singular value decomposition of the m × m matrix R: R = V̂ Σ̂ W^*. Then A = (QV̂)Σ̂W^*. The matrix QV̂ is an isometry, and we can make it a square unitary by adding columns Q_2 so that the square matrix

V := (QV̂ \; Q_2)

has columns that form an orthonormal basis of F^n; in other words, so that V is unitary. Next let

Σ = \begin{pmatrix} Σ̂ \\ 0 \end{pmatrix} ∈ F^{n×m}.

Then A = VΣW^* as desired.

Finally, when A is of size n × m with n < m, apply the previous paragraph to the m × n matrix A^*, to obtain A^* = V̂ Σ̂ Ŵ^*. Then by letting V = Ŵ, Σ = Σ̂^*, and W = V̂, we get the desired singular value decomposition A = VΣW^*. □

The values σ_j are called the singular values of A, and they are uniquely determined by A. We also denote them by σ_j(A).

Proposition 5.6.2 Let A ∈ F^{n×m}, and let ‖ · ‖ be the Euclidean norm. Then

σ_1(A) = \max_{‖x‖=1} ‖Ax‖.    (5.17)

In particular, σ_1(·) is a norm on F^{n×m}. Finally, if A ∈ F^{n×m} and B ∈ F^{m×k}, then

σ_1(AB) ≤ σ_1(A)σ_1(B).    (5.18)
Proof. Write A = VΣW^* in its singular value decomposition. For U unitary we have that ‖Uv‖ = ‖v‖ for all vectors v. Thus ‖Ax‖ = ‖VΣW^*x‖ = ‖ΣW^*x‖. Let u = W^*x. Then ‖x‖ = ‖Wu‖ = ‖u‖. Combining these observations, we have that

\max_{‖x‖=1} ‖Ax‖ = \max_{‖u‖=1} ‖Σu‖ = \max_{‖u‖=1} \sqrt{σ_1^2|u_1|^2 + · · · + σ_k^2|u_k|^2},

which is clearly bounded above by \sqrt{σ_1^2|u_1|^2 + · · · + σ_1^2|u_k|^2} ≤ σ_1‖u‖ = σ_1. When u = e_1, then we get that ‖Σu‖ = σ_1. Thus \max_{‖u‖=1} ‖Σu‖ = σ_1 follows.

To check that σ_1(·) is a norm, the only condition that is not immediate is the triangle inequality. This now follows by observing that

σ_1(A + B) = \max_{‖x‖=1} ‖(A + B)x‖ ≤ \max_{‖x‖=1} ‖Ax‖ + \max_{‖x‖=1} ‖Bx‖ = σ_1(A) + σ_1(B).

To prove (5.18) we first observe that for every vector v ∈ F^m we have that ‖Av‖ ≤ σ_1(A)‖v‖, as w := \frac{1}{‖v‖} v has norm 1, and thus ‖Aw‖ ≤ \max_{‖x‖=1} ‖Ax‖ = σ_1(A). Now, we obtain that

σ_1(AB) = \max_{‖x‖=1} ‖(AB)x‖ ≤ \max_{‖x‖=1} σ_1(A)‖Bx‖ = σ_1(A) \max_{‖x‖=1} ‖Bx‖ = σ_1(A)σ_1(B). □

An important application of the singular value decomposition is low rank approximation of matrices. The advantage of a low rank matrix is that it requires less memory to store.

Proposition 5.6.3 Let A have singular value decomposition A = VΣW^* with Σ as in (5.16). Let l ≤ k. Put Â = VΣ̂W^* with

Σ̂ = \begin{pmatrix} σ_1 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & σ_2 & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & \cdots & σ_l & \cdots & 0 \\ \vdots & \vdots & & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \cdots & 0 \end{pmatrix}.    (5.19)

Then rank Â = l, σ_1(A − Â) = σ_{l+1}, and for any matrix B with rank B ≤ l we have σ_1(A − B) ≥ σ_1(A − Â).
Proof. Clearly rank Â = l and σ_1(A − Â) = σ_{l+1}. Next, let B be a matrix with rank B ≤ l. Put C = V^*BW. Then rank C = rank B ≤ l, and σ_1(A − B) = σ_1(Σ − C). Notice that dim Ker C ≥ m − l, and thus Ker C ∩ Span{e_1, . . . , e_{l+1}} has dimension ≥ 1. Thus we can find a v ∈ Ker C ∩ Span{e_1, . . . , e_{l+1}} with ‖v‖ = 1. Then

σ_1(Σ − C) ≥ ‖(Σ − C)v‖ = ‖Σv‖ ≥ σ_{l+1},

where in the last step we used that v ∈ Span{e_1, . . . , e_{l+1}}. This proves the statement. □

Low rank approximations are used in several places, for instance in data
compression and in search engines.
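As a small numerical illustration of Proposition 5.6.3 (an added sketch; the random test matrix is an arbitrary choice, not from the text), the function below forms the truncated factorization Â = VΣ̂W^* from a computed singular value decomposition and confirms that the approximation error σ_1(A − Â) equals σ_{l+1}.

```python
import numpy as np

def best_rank_l(A, l):
    """Truncate the SVD of A after l singular values, as in Proposition 5.6.3."""
    V, s, Wh = np.linalg.svd(A, full_matrices=False)   # A = V @ diag(s) @ Wh
    return (V[:, :l] * s[:l]) @ Wh[:l, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
s = np.linalg.svd(A, compute_uv=False)
for l in range(1, 5):
    err = np.linalg.norm(A - best_rank_l(A, l), 2)     # spectral norm = sigma_1(A - A_hat)
    expected = s[l] if l < len(s) else 0.0
    print(l, err, expected)                            # err equals sigma_{l+1} (0 when l = rank)
```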

We end this section with an example where we compute the singular value decomposition of a matrix. For this it is useful to notice that if A = VΣW^*, then AA^* = VΣΣ^*V^* and A^*A = WΣ^*ΣW^*. Thus the columns of V are eigenvectors of AA^*, and the diagonal elements σ_j^2 of the diagonal matrix ΣΣ^* are the eigenvalues of AA^*. Thus the singular values can be found by computing the square roots of the nonzero eigenvalues of AA^*. Similarly, the columns of W are eigenvectors of A^*A, and the nonzero diagonal elements σ_j^2 of the diagonal matrix Σ^*Σ are the nonzero eigenvalues of A^*A, as we have seen in the proof of Theorem 5.6.1.

Example 5.6.4 Let A = \begin{pmatrix} 3 & 2 & 2 \\ 2 & 3 & -2 \end{pmatrix}. Find the singular value decomposition of A.

Compute

AA^* = \begin{pmatrix} 17 & 8 \\ 8 & 17 \end{pmatrix},

which has eigenvalues 9 and 25. So the singular values of A are 3 and 5, and we get

Σ = \begin{pmatrix} 5 & 0 & 0 \\ 0 & 3 & 0 \end{pmatrix}.

To find V, we find unit eigenvectors of AA^*, giving

V = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}.

For W observe that V^*A = ΣW^*. Writing W = (w_1 \; w_2 \; w_3), we get

\begin{pmatrix} 5/\sqrt{2} & 5/\sqrt{2} & 0 \\ 1/\sqrt{2} & -1/\sqrt{2} & 4/\sqrt{2} \end{pmatrix} = \begin{pmatrix} 5w_1^* \\ 3w_2^* \end{pmatrix}.

This yields w_1 and w_2. To find w_3, we need to make sure that W is unitary, and thus w_3 needs to be a unit vector orthogonal to w_1 and w_2. We find

W = \begin{pmatrix} 1/\sqrt{2} & 1/(3\sqrt{2}) & 2/3 \\ 1/\sqrt{2} & -1/(3\sqrt{2}) & -2/3 \\ 0 & 4/(3\sqrt{2}) & -1/3 \end{pmatrix}.
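The hand computation above can be double-checked with numpy.linalg.svd (an added sketch, not part of the original example). Note that the routine returns W^* directly, and that the columns of V and W are determined only up to unimodular factors, so only the singular values and the product VΣW^* are expected to match.

```python
import numpy as np

A = np.array([[3.0, 2.0, 2.0],
              [2.0, 3.0, -2.0]])
V, s, Wh = np.linalg.svd(A)            # A = V @ Sigma @ Wh, with Wh = W*
print(s)                               # [5. 3.], the singular values found above

Sigma = np.zeros((2, 3))
Sigma[0, 0], Sigma[1, 1] = s
print(np.allclose(V @ Sigma @ Wh, A))  # True: the factorization reproduces A
```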

5.7 Exercises

Exercise 5.7.1 For the following, check whether ⟨·, ·⟩ is an inner product.

(a) V = R^2, F = R,
⟨(x_1, x_2)^T, (y_1, y_2)^T⟩ = 3x_1y_1 + x_1y_2 + x_2y_1 + 2x_2y_2.

(b) V = C^2, F = C,
⟨(x_1, x_2)^T, (y_1, y_2)^T⟩ = 3x_1y_1 + x_1y_2 + x_2y_1 + 2x_2y_2.

(c) Let V = {f : [0, 1] → R : f is continuous}, F = R,
⟨f, g⟩ = f(0)g(0) + f(1)g(1) + f(2)g(2).

(d) Let V = R_2[X], F = R,
⟨f, g⟩ = f(0)g(0) + f(1)g(1) + f(2)g(2).

(e) Let V = {f : [0, 1] → C : f is continuous}, F = C,
⟨f, g⟩ = \int_0^1 f(x)g(x)(x^2 + 1)\, dx.

Exercise 5.7.2 For the following, check whether ‖ · ‖ is a norm.

(a) V = C^2, F = C,
‖(x_1, x_2)^T‖ = x_1^2 + x_2^2.

(b) V = C^2, F = C,
‖(x_1, x_2)^T‖ = |x_1| + 2|x_2|.

(c) Let V = {f : [0, 2] → R : f is continuous}, F = R,
‖f‖ = \int_0^2 |f(x)|(1 − x)\, dx.

(d) Let V = {f : [0, 1] → R : f is continuous}, F = R,
‖f‖ = \int_0^1 |f(x)|(1 − x)\, dx.

Exercise 5.7.3 Let v1 , . . . , vn be nonzero orthogonal vectors in an inner


product space V . Show that {v1 , . . . , vn } is linearly independent.

Exercise 5.7.4 Let V be an inner product space.

(a) Determine {0}^⊥ and V^⊥.

(b) Let V = C^4 and W = { (1, i, 1 + i, 2)^T, (0, −i, 1 + 2i, 0)^T }. Find a basis for W^⊥.

(c) In case V is finite dimensional and W is a subspace, show that dim W^⊥ = dim V − dim W. (Hint: start with an orthonormal basis for W and add vectors to it to obtain an orthonormal basis for V.)

Exercise 5.7.5 Let ⟨·, ·⟩ be the Euclidean inner product on F^n, and ‖ · ‖ the associated norm.

(a) Let F = C. Show that A ∈ C^{n×n} is the zero matrix if and only if ⟨Ax, x⟩ = 0 for all x ∈ C^n. (Hint: for x, y ∈ C^n, use that ⟨A(x + y), x + y⟩ = 0 = ⟨A(x + iy), x + iy⟩.)

(b) Show that when F = R, there exist nonzero matrices A ∈ R^{n×n}, n > 1, so that ⟨Ax, x⟩ = 0 for all x ∈ R^n.

(c) For A ∈ C^{n×n} define

w(A) = \max_{x∈C^n, ‖x‖=1} |⟨Ax, x⟩|.    (5.20)

Show that w(·) is a norm on C^{n×n}. This norm is called the numerical radius of A.

(d) Explain why \max_{x∈R^n, ‖x‖=1} |⟨Ax, x⟩| does not define a norm.

Exercise 5.7.6 Find an orthonormal basis for the subspace in R^4 spanned by

(1, 1, 1, 1)^T, (1, 2, 1, 2)^T, (3, 1, 3, 1)^T.

Exercise 5.7.7 Let V = R[t] over the field R. Define the inner product

⟨p, q⟩ := \int_{-1}^1 p(t)q(t)\, dt.

For the following linear maps on V, determine whether they are self-adjoint.

(a) Lp(t) := (t^2 + 1)p(t).

(b) Lp(t) := \frac{dp}{dt}(t).

(c) Lp(t) := −p(−t).

Exercise 5.7.8 Let V = R[t] over the field R. Define the inner product

⟨p, q⟩ := \int_0^2 p(t)q(t)\, dt.

For the following linear maps on V, determine whether they are unitary.

(a) Lp(t) := tp(t).

(b) Lp(t) := −p(2 − t).

Exercise 5.7.9 Let U : V → V be unitary, where the inner product on V is denoted by ⟨·, ·⟩.

(a) Show that |⟨x, Ux⟩| ≤ ‖x‖^2 for all x in V.

(b) Show that |⟨x, Ux⟩| = ‖x‖^2 for all x in V implies that U = αI for some |α| = 1.

Exercise 5.7.10 Let V = C^{n×n}, and define

⟨A, B⟩ = tr(AB^*).

(a) Let W = Span{ \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix} }. Find an orthonormal basis for W.

(b) Find a basis for W^⊥ := {B ∈ V : B ⊥ C for all C ∈ W}.

Exercise 5.7.11 Let A ∈ Cn×n . Show that if A is normal and Ak = 0 for


some k ∈ N, then A = 0.

Exercise 5.7.12 Let A ∈ Cn×n and a ∈ C. Show that A is normal if and


only if A − aI is normal.

Exercise 5.7.13 Show that the sum of two Hermitian matrices is


Hermitian. How about the product?

Exercise 5.7.14 Show that the product of two unitary matrices is unitary.
How about the sum?

Exercise 5.7.15 Is the product of two normal matrices normal? How about the sum?

about the sum?

Exercise 5.7.16 Show that the following matrices are unitary.

(a) \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.

(b) \frac{1}{\sqrt{3}} \begin{pmatrix} 1 & 1 & 1 \\ 1 & e^{2i\pi/3} & e^{4i\pi/3} \\ 1 & e^{4i\pi/3} & e^{8i\pi/3} \end{pmatrix}.

(c) \frac{1}{2} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{pmatrix}.

(d) Can you guess the general rule? (Hint: the answer is in Proposition 7.4.3.)

Exercise 5.7.17 For the following matrices A find the spectral decomposition U D U^* of A.

(a) \( A = \begin{pmatrix} 2 & i \\ -i & 2 \end{pmatrix} \).

(b) \( A = \begin{pmatrix} 2 & \sqrt{3} \\ \sqrt{3} & 4 \end{pmatrix} \).

(c) \( A = \begin{pmatrix} 3 & 1 & 1 \\ 1 & 3 & 1 \\ 1 & 1 & 3 \end{pmatrix} \).

(d) \( A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \).

 
Exercise 5.7.18 Let \( A = \begin{pmatrix} 3 & 2i \\ -2i & 3 \end{pmatrix} \).

(a) Show that A is positive semidefinite.


(b) Find the positive square root of A; that is, find a positive semidefinite B so that B^2 = A. We denote B by A^{1/2}.

Exercise 5.7.19 Let A ∈ Cn×n be positive semidefinite, and let k ∈ N. Show that there exists a unique positive semidefinite B so that B^k = A. We call B the kth root of A and denote B = A^{1/k}.

Exercise 5.7.20 Let A ∈ Cn×n be positive semidefinite. Show that
\[ \lim_{k\to\infty} \operatorname{tr} A^{1/k} = \operatorname{rank} A. \]
(Hint: use that for λ > 0 we have that \( \lim_{k\to\infty} \lambda^{1/k} = 1 \).)

Exercise 5.7.21 Let A = A∗ be an n × n Hermitian matrix, with


eigenvalues λ1 ≥ · · · ≥ λn .

(a) Show tI − A is positive semidefinite if and only if t ≥ λ1 .


(b) Show that \( \lambda_{\max}(A) = \lambda_1 = \max_{\langle x, x \rangle = 1} \langle Ax, x \rangle \), where h·, ·i is the Euclidean inner product.

(c) Let Â be the matrix obtained from A by removing row and column i. Then λmax(Â) ≤ λmax(A).

Exercise 5.7.22 (a) Show that a square matrix A is Hermitian iff A^2 = A^*A.
(b) Let H be positive semidefinite, and write H = A + iB where A and B
are real matrices. Show that if A is singular, then H is singular as well.

Exercise 5.7.23 (a) Let A be positive definite. Show that A + A^{-1} − 2I is positive semidefinite.
(b) Show that A is normal if and only if A∗ = AU for some unitary matrix
U.
 
Exercise 5.7.24 Find a QR factorization of \( \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} \).

Exercise 5.7.25 Find the Schur factorization A = U T U^*, with U unitary and T triangular, for the matrix
\[ A = \begin{pmatrix} -1 & -2 & 3 \\ 2 & 4 & -2 \\ 1 & -2 & 1 \end{pmatrix}. \]
Note: 2 is an eigenvalue of A.

Exercise 5.7.26 Let
\[ T = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \tag{5.21} \]
be a block matrix, and suppose that D is invertible. Define the Schur complement S of D in T by S = A − BD^{-1}C. Show that
\[ \operatorname{rank} T = \operatorname{rank}(A - BD^{-1}C) + \operatorname{rank} D. \]

Exercise 5.7.27 Using Sylvester's law of inertia, show that if
\[ M = \begin{pmatrix} A & B \\ B^* & C \end{pmatrix} = M^* \in \mathbb{C}^{(n+m)\times(n+m)} \]
with C invertible, then
\[ \operatorname{In} M = \operatorname{In} C + \operatorname{In}(A - BC^{-1}B^*). \tag{5.22} \]
(Hint: Let \( S = \begin{pmatrix} I & 0 \\ -B^*A^{-1} & I \end{pmatrix} \) and compute S M S^*.)

Exercise 5.7.28 Determine the singular value decomposition of the following matrices.

(a) \( A = \begin{pmatrix} 1 & 1 & 2\sqrt{2}\,i \\ -1 & -1 & 2\sqrt{2}\,i \\ \sqrt{2}\,i & -\sqrt{2}\,i & 0 \end{pmatrix} \).

(b) \( A = \begin{pmatrix} -2 & 4 & 5 \\ 6 & 0 & -3 \\ 6 & 0 & -3 \\ -2 & 4 & 5 \end{pmatrix} \).

Exercise 5.7.29 Let A be a 4 × 4 matrix with spectrum σ(A) = {−2i, 2i, 3 + i, 3 + 4i} and singular values σ1 ≥ σ2 ≥ σ3 ≥ σ4.

(a) Determine the product σ1 σ2 σ3 σ4 .


(b) Show that σ1 ≥ 5.
(c) Assuming A is normal, determine tr(A + AA∗ ).

 
Exercise 5.7.30 Let \( A = \begin{pmatrix} P & Q \\ R & S \end{pmatrix} \in \mathbb{C}^{(k+l)\times(m+n)} \), where P is of size k × m. Show that
\[ \sigma_1(P) \le \sigma_1(A). \]
Conclude that σ1(Q) ≤ σ1(A), σ1(R) ≤ σ1(A), σ1(S) ≤ σ1(A) as well.

Exercise 5.7.31 This is an exercise that uses MATLAB®, and its purpose is to show what happens with an image if you take a low rank approximation of it.

1. Take an image.

2. Load it into MATLAB® (using “imread”). This produces a matrix (three matrices, organized as a three-dimensional array, for a color image). The elements are of type “uint8.”

3. Convert the elements to type “double” (using the command “double”); otherwise you cannot do computations.

4. Take a singular value decomposition (using “svd”).

5. Keep only the first k largest singular values.

6. Compute the rank k approximation.

7. Look at the image (using “imshow”).

Here are the commands I used on a color image (thus the array has three
levels) with k = 30:

A=imread('Hugo2.png');   % read the image file (the filename must be in quotes)

AA=double(A);            % convert the imported uint8 array to double so computations can be done

[U,S,V]=svd(AA(:,:,1));

[U2,S2,V2]=svd(AA(:,:,2));

[U3,S3,V3]=svd(AA(:,:,3));

H=zeros(size(S,1),size(S,2));

for i=1:30, H(i,i)=1; end;

Snew=S.*H;

Snew2=S2.*H;

Snew3=S3.*H;

Anew(:,:,1)=U*Snew*V’;

Anew(:,:,2)=U2*Snew2*V2’;

Anew(:,:,3)=U3*Snew3*V3’;

Anew=uint8(Anew);

imshow(Anew)
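
A more compact way to form the same rank-k approximation (a sketch of ours, not from the text; it reuses the variables U, S, V and the value k = 30 produced above) keeps only the first k singular triples directly:

k=30;
Anew1=U(:,1:k)*S(1:k,1:k)*V(:,1:k)';   % rank-k approximation of the first color channel

This yields the same matrix as U*Snew*V' above, since zeroing the remaining singular values discards exactly the corresponding columns of U and V.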

Exercise 5.7.32 The condition number κ(A) of an invertible n × n matrix A is given by κ(A) = σ1(A)/σn(A), where σ1(A) ≥ · · · ≥ σn(A) are the singular values of A. Show that for all invertible matrices A and B, we have that κ(AB) ≤ κ(A)κ(B). (Hint: use that σ1(A^{-1}) = (σn(A))^{-1} and (5.18).)

Exercise 5.7.33 Prove that if X and Y are positive definite n × n matrices such that Y − X is positive semidefinite, then det X ≤ det Y. Moreover, det X = det Y if and only if X = Y.

Exercise 5.7.34 (Least squares solution) When the equation Ax = b does


not have a solution, one may be interested in finding an x so that kAx − bk
is minimal. Such an x is called a least squares solution to Ax = b. In this
exercise we will show that if A = QR, with R invertible, then the least
squares solution is given by x = R−1 Q∗ b. Let A ∈ Fn×m with rank A = m.

(a) Let A = QR be a QR-factorization of A. Show that Ran A = Ran Q.


(b) Observe that QQ^*b ∈ Ran Q. Show that for all v ∈ Ran Q we have ‖v − b‖ ≥ ‖QQ^*b − b‖ and that the inequality is strict if v ≠ QQ^*b.
(c) Show that x := R−1 Q∗ b is the least squares solution to Ax = b.
   
(d) Let \( A = \begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{pmatrix} \) and \( b = \begin{pmatrix} 3 \\ 5 \\ 4 \end{pmatrix} \). Find the least squares solution to Ax = b.
(e) In trying to fit a line y = cx + d through the points (1, 3), (2, 5), and
(3, 4), one sets up the equations

3 = c + d, 5 = 2c + d, 4 = 3c + d.

Writing this in matrix form we get
\[ A \begin{pmatrix} c \\ d \end{pmatrix} = b, \]
where A and b are as above. One way to get a “fitting line” y = cx + d is to solve for c and d via least squares, as we did in the previous part.
This is the most common way to find a so-called regression line. Plot the
three points (1, 3), (2, 5), and (3, 4) and the line y = cx + d, where c and
d are found via least squares as in the previous part.
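
As a quick numerical check of parts (d) and (e) (a sketch of ours in MATLAB; the variable names are ours), the QR-based formula of this exercise can be compared with MATLAB's backslash solver:

A=[1 1; 2 1; 3 1]; b=[3; 5; 4];
[Q,R]=qr(A,0);     % economy-size QR factorization of A
x=R\(Q'*b);        % least squares solution R^{-1} Q* b
A\b                % MATLAB's built-in least squares solve; agrees with x

Both computations return c = 0.5 and d = 3, so the fitted regression line is y = x/2 + 3.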

Exercise 5.7.35 Let A, X be m × m matrices such that A = A∗ is


invertible and
H := A − X ∗ AX (5.23)
is positive definite.

(a) Show that X has no eigenvalues on the unit circle T = {z ∈ C : |z| = 1}.

(b) Show that A is positive definite if and only if X has all eigenvalues in D = {z ∈ C : |z| < 1}. (Hint: When X has all eigenvalues in D, we have that X^n → 0 as n → ∞. Use this to show that \( A = H + \sum_{k=1}^{\infty} X^{*k} H X^{k} \).)

Exercise 5.7.36 (Honors) On both C^{4×4} and C^{6×6}, we have the inner product given via ⟨A, B⟩ = tr(B^*A). Let T : C^{4×4} → C^{6×6} be given via
\[ T\bigl( (m_{ij})_{i,j=1}^{4} \bigr) := \begin{pmatrix}
m_{11} & m_{12} & m_{13} & m_{14} & m_{13} & m_{14} \\
m_{21} & m_{22} & m_{23} & m_{24} & m_{23} & m_{24} \\
m_{31} & m_{32} & m_{33} & m_{34} & m_{33} & m_{34} \\
m_{41} & m_{42} & m_{43} & m_{44} & m_{43} & m_{44} \\
m_{31} & m_{32} & m_{33} & m_{34} & m_{33} & m_{34} \\
m_{41} & m_{42} & m_{43} & m_{44} & m_{43} & m_{44}
\end{pmatrix}. \]
Determine the dual of T.

Exercise 5.7.37 (Honors) Let A have no eigenvalues on the unit circle, and
let C = −(A∗ + I)(A∗ − I)−1 .

(a) Show that C is well-defined.


(b) Show that A satisfies the Stein equation H − A∗ HA = V , with V
positive definite, if and only if C satisfies a Lyapunov equation
CH + HC ∗ = G with G positive definite.
(c) With C as above, show that C has no purely imaginary eigenvalues.
(d) Show that H is positive definite if and only if C has all its eigenvalues in
the right half-plane H = {z ∈ C : Re z > 0}. (Hint: use Exercise 5.7.35.)

Exercise 5.7.38 (Honors) Let A have all its eigenvalues in the left half-plane −H = {z ∈ C : Re z < 0}, and let C be a positive semidefinite matrix of the same size. Show that
\[ X = \int_0^\infty e^{A^*t}\, C\, e^{At}\, dt \]
exists (where an integral of a matrix function is defined entrywise), is positive semidefinite, and satisfies the Lyapunov equation XA + A^*X = −C.
6
Constructing New Vector Spaces from Given Ones

CONTENTS
6.1 The Cartesian product
6.2 The quotient space
6.3 The dual space
6.4 Multilinear maps and functionals
6.5 The tensor product
6.6 Anti-symmetric and symmetric tensors
6.7 Exercises

In this chapter we study several useful constructions that yield a new vector
space based on given ones. We also study how inner products and linear
maps yield associated constructions.

6.1 The Cartesian product

Given vector spaces V1, . . . , Vk over the same field F, the Cartesian product vector space V1 × · · · × Vk is defined via
\[ V_1 \times \cdots \times V_k = \left\{ \begin{pmatrix} v_1 \\ \vdots \\ v_k \end{pmatrix} : v_i \in V_i,\ i = 1, \ldots, k \right\}, \]
\[ \begin{pmatrix} v_1 \\ \vdots \\ v_k \end{pmatrix} + \begin{pmatrix} w_1 \\ \vdots \\ w_k \end{pmatrix} := \begin{pmatrix} v_1 + w_1 \\ \vdots \\ v_k + w_k \end{pmatrix}, \]


and
\[ c \begin{pmatrix} v_1 \\ \vdots \\ v_k \end{pmatrix} := \begin{pmatrix} c v_1 \\ \vdots \\ c v_k \end{pmatrix}. \]
Clearly, one may view F^k as the Cartesian product F × · · · × F (where F appears k times). Sometimes V1 × · · · × Vk is viewed as a direct sum
\[ (V_1 \times \{0\} \times \cdots \times \{0\}) \dotplus (\{0\} \times V_2 \times \{0\} \times \cdots \times \{0\}) \dotplus \cdots \dotplus (\{0\} \times \cdots \times \{0\} \times V_k). \]
It is not hard to determine the dimension of a Cartesian product.

Proposition 6.1.1 Let V1, . . . , Vk be finite-dimensional vector spaces. Then
\[ \dim(V_1 \times \cdots \times V_k) = \dim V_1 + \cdots + \dim V_k. \]

Proof. Let Bi be a basis for Vi, i = 1, . . . , k. Put
\[ B = \left\{ \begin{pmatrix} b_1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} : b_1 \in B_1 \right\} \cup \left\{ \begin{pmatrix} 0 \\ b_2 \\ \vdots \\ 0 \end{pmatrix} : b_2 \in B_2 \right\} \cup \cdots \cup \left\{ \begin{pmatrix} 0 \\ \vdots \\ 0 \\ b_k \end{pmatrix} : b_k \in B_k \right\}. \]
It is easy to check that B is a basis for V1 × · · · × Vk. □

When Vi has an inner product ⟨·, ·⟩i, i = 1, . . . , k, then it is straightforward to check that
\[ \left\langle \begin{pmatrix} v_1 \\ \vdots \\ v_k \end{pmatrix}, \begin{pmatrix} w_1 \\ \vdots \\ w_k \end{pmatrix} \right\rangle := \sum_{i=1}^{k} \langle v_i, w_i \rangle_i \]
defines an inner product on V1 × · · · × Vk. While this is the default way to make an inner product on the Cartesian product, one can also take for instance \( \sum_{i=1}^{k} \beta_i \langle v_i, w_i \rangle_i \), where βi > 0, i = 1, . . . , k.

When Vi has a norm ‖·‖i, i = 1, . . . , k, then there are infinitely many ways to make a norm on V1 × · · · × Vk. For instance, one can take any p ≥ 1, and put
\[ \left\| \begin{pmatrix} v_1 \\ \vdots \\ v_k \end{pmatrix} \right\|_p := \Bigl( \sum_{i=1}^{k} \|v_i\|_i^p \Bigr)^{1/p}. \]
It takes some effort to prove that this is a norm, and we will outline the proof in Exercise 6.7.1. Also,
\[ \left\| \begin{pmatrix} v_1 \\ \vdots \\ v_k \end{pmatrix} \right\|_\infty := \max_{i=1,\ldots,k} \|v_i\|_i \]
defines a norm on the Cartesian product.

Finally, when Aij : Vj → Vi, 1 ≤ i, j ≤ k, are linear maps, then
\[ A := \begin{pmatrix} A_{11} & \cdots & A_{1k} \\ \vdots & & \vdots \\ A_{k1} & \cdots & A_{kk} \end{pmatrix} : V_1 \times \cdots \times V_k \to V_1 \times \cdots \times V_k \]
defines a linear map via usual block matrix multiplication
\[ A \begin{pmatrix} v_1 \\ \vdots \\ v_k \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^{k} A_{1j} v_j \\ \vdots \\ \sum_{j=1}^{k} A_{kj} v_j \end{pmatrix}. \]
A similar construction also works when Aij : Vj → Wi, 1 ≤ i ≤ l, 1 ≤ j ≤ k. Then \( A = (A_{ij})_{i=1,j=1}^{l,k} \) acts V1 × · · · × Vk → W1 × · · · × Wl.
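
As a small illustration (a sketch of ours, not from the text), for V1 = R^2 and V2 = R^3 a block map on V1 × V2 can be assembled and applied in MATLAB by concatenating the blocks and stacking the vectors:

A11=eye(2); A12=zeros(2,3); A21=ones(3,2); A22=2*eye(3);   % four hypothetical blocks
A=[A11 A12; A21 A22];        % block matrix acting on V1 x V2, identified with R^5
v1=[1;2]; v2=[3;4;5];
w=A*[v1;v2];                 % first two entries: A11*v1+A12*v2; last three: A21*v1+A22*v2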

6.2 The quotient space

Let V be a vector space over F and W ⊆ V a subspace. We define the


relation ∼ via
v1 ∼ v2 ⇔ v1 − v2 ∈ W.
Then ∼ is an equivalence relation:

(i) Reflexivity: v ∼ v for all v ∈ V , since v − v = 0 ∈ W .


(ii) Symmetry: Suppose v1 ∼ v2 . Then v1 − v2 ∈ W . Thus
−(v1 − v2 ) = v2 − v1 ∈ W , which yields v2 ∼ v1 .
(iii) Transitivity: Suppose v1 ∼ v2 and v2 ∼ v3 . Then v1 − v2 ∈ W and
v2 − v3 ∈ W . Thus v1 − v3 = (v1 − v2 ) + (v2 − v3 ) ∈ W . This yields
v1 ∼ v3 .

As ∼ is an equivalence relation, it has equivalence classes, which we will


denote as v + W :
v + W := {v̂ : v ∼ v̂} = {v̂ : v − v̂ ∈ W } =
{v̂ : there exists w ∈ W with v̂ = v + w}.
Any member of an equivalence class is called a representative of the
equivalence class.
Example 6.2.1 Let V = R^2 and W = Span{e1}. Then the equivalence class of \( v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \) is the horizontal line through v. In this example it is simple to see how one would add two equivalence classes. Indeed, to add the horizontal line through \( \begin{pmatrix} 0 \\ c \end{pmatrix} \) to the horizontal line through \( \begin{pmatrix} 0 \\ d \end{pmatrix} \), would result in the horizontal line through \( \begin{pmatrix} 0 \\ c+d \end{pmatrix} \). Or, what is equivalent, to add the horizontal line through \( \begin{pmatrix} 5 \\ c \end{pmatrix} \) to the horizontal line through \( \begin{pmatrix} 10 \\ d \end{pmatrix} \), would result in the horizontal line through \( \begin{pmatrix} 15 \\ c+d \end{pmatrix} \). Similarly, one can define scalar multiplication for these equivalence classes. We give the general definition below.

The set of equivalence classes is denoted by V /W :


V /W := {v + W : v ∈ V }.
We define addition and scalar multiplication on V /W via
(v1 + W ) + (v2 + W ) := (v1 + v2 ) + W, (6.1)
c(v + W ) := (cv) + W. (6.2)
These two operations are defined via representatives (namely, v1 , v2 , and v)
of the equivalence classes, so we need to make sure that if we had chosen
different representatives for the same equivalence classes, the outcome would
be the same. We do this in the following lemma.

Lemma 6.2.2 The addition and scalar multiplication on V /W as defined in


(6.1) and (6.2) are well-defined.

Proof. Suppose v1 + W = x1 + W and v2 + W = x2 + W . Then v1 − x1 ∈ W


and v2 − x2 ∈ W . As W is a subspace, it follows that
v1 + v2 − (x1 + x2 ) = v1 − x1 + v2 − x2 ∈ W.
Thus (v1 + v2 ) + W = (x1 + x2 ) + W follows.

Next, suppose v + W = x + W . Then v − x ∈ W and as W is a subspace, it


follows that
cv − cx = c(v − x) ∈ W.
Thus (cv) + W = (cx) + W . 

It is now a straightforward (and tedious) exercise to show that V /W with


addition and scalar multiplication defined via (6.1) and (6.2), yields a vector
space, called the quotient space. Let us next determine its dimension.

Proposition 6.2.3 Let V be a finite-dimensional vector space and W a


subspace. Then
dimV /W = dimV − dimW.

Proof. Choose a basis {w1 , . . . , wl } for W , and complement this linearly


independent set with vectors {v1 , . . . , vk } in V , so that the resulting set

{w1 , . . . , wl , v1 , . . . , vk }

is a basis for V (see in Exercise 2.6.8 why this is always possible). We now
claim that
B = {v1 + W, . . . , vk + W }
is a basis for V /W , which then proves the proposition.

First, let us prove that B is a linearly independent set. Suppose that

c1 (v1 + W ) + · · · + ck (vk + W ) = 0 + W

(where we use the observation that 0 + W is the neutral element for addition
in V /W ). Then
c1 v1 + · · · + ck vk − 0 ∈ W.
Thus there exist d1 , . . . , dl so that

c1 v1 + · · · + ck vk = d1 w1 + · · · + dl wl .

This gives that

c1 v1 + · · · + ck vk + (−d1 )w1 + · · · + (−dl )wl = 0.

As {w1 , . . . , wl , v1 , . . . , vk } is a linearly independent set, we get that c1 =


· · · = ck = d1 = . . . = dl = 0. Thus, in particular c1 = · · · = ck = 0, yielding
that B is a linearly independent set.

Next, we need to show that B spans V /W . Let v + W ∈ V /W . As v ∈ V ,


there exist c1 , . . . , ck , d1 , . . . , dl ∈ F so that

v = c1 v1 + · · · + ck vk + d1 w1 + · · · + dl wl .

But then
v − (c1 v1 + · · · + ck vk ) = d1 w1 + · · · + dl wl ∈ W,
and thus

v + W = (c1 v1 + · · · + ck vk ) + W = c1 (v1 + W ) + · · · + ck (vk + W ).
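
For instance (a quick illustration of ours, not from the text): take V = R^3 and W = Span{e1, e2}. The construction in the proof gives the single-element basis {e3 + W} for V/W, so dim V/W = 3 − 2 = 1; indeed, two vectors of R^3 represent the same equivalence class exactly when their third coordinates agree.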



In case a finite-dimensional vector space V has an inner product h·, ·i, then
the spaces V /W and
W ⊥ = {v ∈ V : hv, wi = 0 for every w ∈ W }
are isomorphic. This follows immediately from a dimension count (see
Exercise 5.7.4), but let us elaborate and provide the explicit isomorphism in
the proof below.

Proposition 6.2.4 Let V be a finite-dimensional vector space with an inner


product h·, ·i, and let W ⊆ V be a subspace. Then V /W and W ⊥ are
isomorphic.

Proof. Let T : W ⊥ → V /W be defined by T (v) = v + W . We claim that T is


an isomorphism. Clearly, T is linear. Next, let v ∈ W ⊥ be so that
v + W = 0 + W . Then v = v − 0 ∈ W . We also have that v ∈ W ⊥ , and thus
v is orthogonal to itself: hv, vi = 0. But then v = 0 follows, and thus
KerT = {0}. As the dimensions of V /W and W ⊥ are the same, we also
obtain that T is onto. 

When V has a norm ‖·‖, we say that a subset W ⊆ V is closed with respect to the norm ‖·‖, if wn ∈ W, n ∈ N, and limn→∞ ‖wn − v‖ = 0 imply that v ∈ W. In finite-dimensional vector spaces all subspaces are closed, as the following proposition shows.

Proposition 6.2.5 If V is a finite-dimensional vector space with a norm


k · k, and W is a subspace, then W is closed.

Proof. Let {v1, . . . , vk} be a basis for W, and extend it to a basis B = {v1, . . . , vn} for V. Define the norm ‖·‖V via
\[ \|v\|_V = \| [v]_B \|_E, \]
where ‖·‖E is the Euclidean norm on F^n. In other words, if \( v = \sum_{i=1}^{n} c_i v_i \), then \( \|v\|_V = \sqrt{\sum_{i=1}^{n} |c_i|^2} \).

Let w^{(m)}, m = 1, 2, . . ., be vectors in W, and suppose that lim_{m→∞} ‖w^{(m)} − v‖ = 0 for some v ∈ V. By Theorem 5.1.25 any two norms on a finite-dimensional space are equivalent, thus we also have lim_{m→∞} ‖w^{(m)} − v‖_V = 0. We need to prove that v ∈ W. As {v1, . . . , vk} is a basis for W, we obtain w^{(m)} = c_1^{(m)} v_1 + · · · + c_k^{(m)} v_k for some scalars c_1^{(m)}, . . . , c_k^{(m)}. In addition, we have \( v = \sum_{i=1}^{n} c_i v_i \) for some scalars ci. Then for j = k + 1, . . . , n we observe
\[ |c_j|^2 \le \sum_{i=1}^{k} |c_i^{(m)} - c_i|^2 + \sum_{i=k+1}^{n} |c_i|^2 = \|w^{(m)} - v\|_V^2, \]

and thus |cj | ≤ limm→∞ kw(m) − vkV = 0. Consequently,


ck+1 = · · · = cn = 0, yielding that v ∈ W . 

The following example shows that in infinite dimensions, not all subspaces
are closed.

Example 6.2.6 Let
\[ V = \Bigl\{ x = (x_j)_{j=1}^{\infty} : \|x\|_V := \sum_{j=1}^{\infty} |x_j| < \infty \Bigr\}. \]
Thus V consists of vectors with infinitely many entries whose absolute values have a finite sum. As an example, since \( \sum_{j=1}^{\infty} \frac{1}{j^2} \ (= \frac{\pi^2}{6}) < \infty \),
\[ v = \bigl(1, \tfrac14, \tfrac19, \tfrac{1}{16}, \ldots \bigr) = \Bigl( \tfrac{1}{j^2} \Bigr)_{j=1}^{\infty} \in V. \]
The addition and scalar multiplication are defined entrywise. Thus
\[ (x_j)_{j=1}^{\infty} + (y_j)_{j=1}^{\infty} = (x_j + y_j)_{j=1}^{\infty}, \qquad c\, (x_j)_{j=1}^{\infty} = (c x_j)_{j=1}^{\infty}. \]

With these definitions, V is a vector space and ‖·‖V is a norm on V. Let now
\[ W = \{ x = (x_j)_{j=1}^{\infty} \in V : \text{only finitely many } x_j \text{ are nonzero} \}. \]
It is clear that W is closed under addition and scalar multiplication, and that 0 = (0, 0, . . .) ∈ W. Thus W is a subspace. Moreover, if we let
\[ v_k = \bigl(1, \tfrac14, \ldots, \tfrac{1}{k^2}, 0, 0, \ldots \bigr), \]
then vk ∈ W, k ∈ N. Also, lim_{k→∞} ‖v_k − v‖_V = 0, where v is as above. However, v ∉ W, and thus W is not closed with respect to the norm ‖·‖V.

When V has a norm ‖·‖V and the subspace W ⊆ V is closed with respect to ‖·‖V, one defines a norm on V/W as follows:
\[ \|v + W\| = \inf_{w \in W} \|v + w\|_V. \tag{6.3} \]
Let us show that this is indeed a norm.

Proposition 6.2.7 Let V have a norm k · kV and let the subspace W ⊆ V


be closed with respect to k · kV . Then k · k defined via (6.3) defines a norm on
V /W .

Proof. Clearly kv + W k ≥ 0 for all v + W ∈ V /W . Next, suppose that


kv + W k = 0. Then inf w∈W kv + wkV = 0, and thus for every n ∈ N, there
exists a wn ∈ W so that kv + wn kV < n1 . Thus limn→∞ kv − (−wn )kV = 0,
and since −wn ∈ W , we use that W is closed to conclude that v ∈ W . But
then v + W = 0 + W , taking care of the first property of a norm.

Next,
\[ \|c(v + W)\| = \inf_{w \in W} \|cv + w\|_V = \inf_{\hat w \in W} \|c(v + \hat w)\|_V = \inf_{\hat w \in W} |c| \|v + \hat w\|_V = |c| \|v + W\|. \]

Finally, for the triangle inequality let v + W and v̂ + W be in V/W. We show that for every ε > 0 we can find w ∈ W so that
\[ \|v + \hat v + w\|_V \le \|v + W\| + \|\hat v + W\| + \varepsilon. \tag{6.4} \]
Indeed, let w1 be so that ‖v + w1‖V ≤ ‖v + W‖ + ε/2 and let w2 be so that ‖v̂ + w2‖V ≤ ‖v̂ + W‖ + ε/2. Put w = w1 + w2, and then (6.4) holds. As ε was arbitrary, we obtain that
\[ \|(v+W) + (\hat v+W)\| = \|(v+\hat v) + W\| = \inf_{w \in W} \|v + \hat v + w\|_V \le \|v+W\| + \|\hat v+W\|. \]
□

Next, we see how a linear map A : V → V̂ induces a map acting


V /W → V̂ /Ŵ , provided A[W ] := {Aw : w ∈ W } is a subset of Ŵ .

Proposition 6.2.8 Let A : V → V̂ be linear, and suppose that W is a


subspace of V , Ŵ a subspace V̂ , so that A[W ] ⊆ Ŵ . Then
A∼ (v + W ) := Av + Ŵ defines a linear map A∼ : V /W → V̂ /Ŵ .

Proof. We need to check that if v + W = x + W , then Av + Ŵ = Ax + Ŵ .


As v − x ∈ W , we have that A(v − x) ∈ Ŵ , and thus Av + Ŵ = Ax + Ŵ
follows. This makes A∼ well-defined. The linearity of A∼ is straightforward
to check. 

Typically, the induced map A∼ is simply denoted by A again. While this is a


slight abuse of notation, it usually does not lead to any confusion. We will
adopt this convention as well.

The techniques introduced in this section provide a useful way to look at the Jordan canonical form. Let us return to Theorem 4.2.1 and consider a nilpotent A ∈ F^{n×n}. The crucial subspaces of F^n here are

Wj := KerAj , j = 0, . . . , n,

as we observed before. We have

{0} = W0 ⊆ W1 ⊆ · · · ⊆ Wn = Fn .

In addition, the following holds.

Proposition 6.2.9 We have that A[Wj+1 ] ⊆ Wj . Moreover, the induced


map
A : Wj+1 /Wj → Wj /Wj−1
is one-to-one.

Proof. Let x ∈ Wl+1 . Then Al+1 x = 0. Thus Al (Ax) = 0, yielding that


Ax ∈ Wl . Thus with V = Wj+1 , V̂ = Wj = W , Ŵ = Wj−1 , we satisfy the
conditions of Proposition 6.2.8, and thus the induced map
A : Wj+1 /Wj → Wj /Wj−1 is well-defined.

Next, suppose that x + Wj ∈ Wj+1 /Wj is so that A(x + Wj ) = 0 + Wj−1 .


Then Ax ∈ Wj−1 , and thus 0 = Aj−1 (Ax) = Aj x. This gives that x ∈ Wj ,
and thus x + Wj = 0 + Wj . This proves that A : Wj+1 /Wj → Wj /Wj−1 is
one-to-one. 

We let wj = dim Wj/Wj−1. As a consequence of Proposition 6.2.9 we have that when \( B_{j+1} = \{ b_1^{(j+1)} + W_j, \ldots, b_{w_{j+1}}^{(j+1)} + W_j \} \) is a basis for Wj+1/Wj, then
\[ \{ A b_1^{(j+1)} + W_{j-1}, \ldots, A b_{w_{j+1}}^{(j+1)} + W_{j-1} \} \]
is a linearly independent set in Wj/Wj−1. This set can be complemented by vectors {x_{j,1} + W_{j−1}, . . . , x_{j,s_j} + W_{j−1}}, where s_j = w_j − w_{j+1}, so that
\[ B_j := \{ x_{j,1} + W_{j-1}, \ldots, x_{j,s_j} + W_{j-1}, A b_1^{(j+1)} + W_{j-1}, \ldots, A b_{w_{j+1}}^{(j+1)} + W_{j-1} \} \]

is a basis for Wj /Wj−1 . Starting with a basis for Wn /Wn−1 and repeating
the iteration outlined in this paragraph, one ultimately arrives at bases Bj
for Wj /Wj−1 , j = 1, . . . , n. Picking the specific representatives of these basis
elements (thus by taking the vector x when x + Wj−1 appears in Bj ), one
arrives at the desired basis for Fn giving the Jordan canonical form of A.
These observations form the essence of the construction in the proof of
Theorem 4.2.1.

A scenario where the quotient space shows up, is in the case we have a
vector space V with a Hermitian form [·, ·] that satisfies [v, v] ≥ 0 for all
v ∈ V . Such a Hermitian form is sometimes called a pre-inner product. It is
not an inner product as [x, x] = 0 does not necessarily imply x = 0, but all
the other rules of an inner product are satisfied. The following example is
the type of setting where this may occur.

Example 6.2.10 Let
\[ V = \{ f : [0, 1] \to \mathbb{R} : f \text{ is continuous except at a finite number of points} \}. \]
Define
\[ [f, g] := \int_0^1 f(t)\, g(t)\, dt. \]
Then [·, ·] is a Hermitian form and \( [f, f] = \int_0^1 f(t)^2\, dt \ge 0 \). However, there are nonzero functions f in V so that [f, f] = 0; for instance,
\[ f(x) = \begin{cases} 0 & \text{if } x \ne \tfrac12, \\ 1 & \text{if } x = \tfrac12, \end{cases} \]
satisfies [f, f] = 0. Thus [·, ·] is a pre-inner product, but not an inner product.

So, what prevents a pre-inner product [·, ·] from being an inner product, is
that W := {v ∈ V : [v, v] = 0} contains nonzero elements. It turns out that
this set W is a subspace.

Lemma 6.2.11 Let the vector space V over F = R or C have a pre-inner product [·, ·]. Then W = {v ∈ V : [v, v] = 0} is a subspace.

Proof. Let x, y ∈ W . As [·, ·] is a pre-inner product, we have that for all


c ∈ F the inequality [x + cy, x + cy] ≥ 0 holds. Thus

0 ≤ [x + cy, x + cy] = [x, x] + c[y, x] + c̄[x, y] + |c|2 [y, y] = 2Re(c[y, x]).

By choosing \( c = -\overline{[y, x]} \), we get that −|[y, x]|^2 ≥ 0, and thus [y, x] = 0. But
then it follows that x + y ∈ W , proving that W is closed under addition.

Since 0 ∈ W and W is clearly closed under scalar multiplication, we obtain


that W is a subspace. 

By considering the vector space V /W we can turn a pre-inner product into


an inner product, as we see next.

Proposition 6.2.12 Let the vector space V over F = R or C have a pre-inner product [·, ·]. Let W be the subspace W = {v ∈ V : [v, v] = 0}, and
define h·, ·i on V /W via

hx + W, y + W i := [x, y].

Then h·, ·i defines an inner product on V /W .



Proof. First we need to show that h·, ·i is well-defined. Assume that


x + W = x̂ + W and let us show that

hx + W, y + W i = hx̂ + W, y + W i. (6.5)

We have x − x̂ ∈ W . As [·, ·] satisfies the Cauchy–Schwarz inequality (see


Remark 5.1.11) we have that

|[x − x̂, y]|2 ≤ [x − x̂, x − x̂][y, y] = 0,

since x − x̂ ∈ W . Thus (6.5) follows. Similarly, when y + W = ŷ + W , we


have hx̂ + W, y + W i = hx̂ + W, ŷ + W i. But then, when x + W = x̂ + W
and y + W = ŷ + W , we find that hx + W, y + W i = hx̂ + W, y + W i =
hx̂ + W, ŷ + W i, showing that h·, ·i is well-defined.

That h·, ·i defines a pre-inner product on V /W is easily checked, so let us


just address the definiteness property. Assume that hx + W, x + W i = 0.
Then [x, x] = 0, and thus x ∈ W . This gives that x + W = 0 + W , which is
exactly what we were after. 

Getting back to Example 6.2.10, studying V/W instead of V means that we are identifying functions whose values only differ in a finite number of points. In a setting of a vector space consisting of functions, where the interest lies in taking integrals, this is a common feature. In a Functional Analysis course this idea will be pursued further.

6.3 The dual space

Let V be a vector space over the field F. We call a linear map f : V → F that takes values in the underlying field a linear functional. Linear functionals, as all functions with values in a field, allow for addition among them, as well as scalar multiplication:

(f + g)(v) := f(v) + g(v), (cf)(x) := cf(x).

With these operations the linear functions form a vector space V 0 , the dual
space of V . Thus
V 0 = {f : V → F : f is linear}.

The first observation is that the dual space of a finite-dimensional space V


has the same dimension as V .

Proposition 6.3.1 Let V be a finite-dimensional space, and V′ be its dual space. Then
\[ \dim V = \dim V'. \]
When {v1, . . . , vn} is a basis for V, then a basis for V′ is given by {f1, . . . , fn}, where fj ∈ V′, j = 1, . . . , n, is so that
\[ f_j(v_k) = \begin{cases} 0 & \text{if } k \ne j, \\ 1 & \text{if } k = j. \end{cases} \]

The basis {f1 , . . . , fn } above is called the dual basis of {v1 , . . . , vn }.


Proof. When \( v = \sum_{k=1}^{n} c_k v_k \), then fj(v) = cj, yielding a well-defined linear functional on V. Let us show that {f1, . . . , fn} is linearly independent. For this, suppose that d1 f1 + · · · + dn fn = 0. Then
\[ 0 = 0(v_k) = \Bigl( \sum_{j=1}^{n} d_j f_j \Bigr)(v_k) = \sum_{j=1}^{n} d_j f_j(v_k) = d_k, \quad k = 1, \ldots, n, \]

showing linear independence.

Next, we need to show that Span{f1, . . . , fn} = V′, so let f ∈ V′ be arbitrary. We claim that
\[ f = f(v_1) f_1 + \cdots + f(v_n) f_n. \tag{6.6} \]
Indeed, for k = 1, . . . , n, we have that
\[ f(v_k) = f(v_k) f_k(v_k) = \sum_{j=1}^{n} f(v_j) f_j(v_k). \]
Thus the functionals in the left- and right-hand sides of (6.6) coincide on the basis elements vk, k = 1, . . . , n. But then, by linearity, the functionals in the left- and right-hand sides of (6.6) coincide for all v ∈ V. □
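
As a concrete illustration (ours, not from the text): in V = R^2 take the basis v1 = (1, 0)^T, v2 = (1, 1)^T. The corresponding dual basis consists of f1(x1, x2) = x1 − x2 and f2(x1, x2) = x2; one checks directly that f1(v1) = 1, f1(v2) = 0, f2(v1) = 0, and f2(v2) = 1, and formula (6.6) then expresses any f ∈ V′ as f = f(v1) f1 + f(v2) f2.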

When h·, ·i is an inner product, then for a fixed v ∈ V , the function


fv = h·, vi defined via
fv (x) = hx, vi
is a linear functional; that is, fv ∈ V 0 . In the case of finite-dimensional inner
product vector spaces, these functionals fv comprise all of V 0 .

Theorem 6.3.2 (Riesz representation theorem) Let V be a


finite-dimensional vector space with inner product h·, ·i. Then for every

f ∈ V′ there exists a v ∈ V so that f = fv; that is, f(x) = ⟨x, v⟩ for all x ∈ V. Moreover, we have that
\[ \|f_v\|_{V'} := \sup_{\|x\|_V \le 1} |f_v(x)| = \|v\|_V, \]
where \( \|x\|_V = \sqrt{\langle x, x \rangle} \).

Proof. Let B = {e1, . . . , en} be an orthonormal basis for V. Given f ∈ V′, let v = f(e1)e1 + · · · + f(en)en. Then f = fv. Indeed, if \( x = \sum_{j=1}^{n} c_j e_j \), then
\[ f_v(x) = \Bigl\langle \sum_{j=1}^{n} c_j e_j, \sum_{k=1}^{n} f(e_k) e_k \Bigr\rangle = \sum_{k=1}^{n} c_k f(e_k) = f\Bigl( \sum_{k=1}^{n} c_k e_k \Bigr) = f(x). \]
Next, suppose that ‖x‖V ≤ 1. Then, by the Cauchy–Schwarz inequality (5.1),
\[ |f_v(x)| = |\langle x, v \rangle| \le \sqrt{\langle x, x \rangle}\, \sqrt{\langle v, v \rangle} = \|x\|_V \|v\|_V \le \|v\|_V. \]
As for v ≠ 0,
\[ \Bigl| f_v\Bigl( \frac{1}{\|v\|_V}\, v \Bigr) \Bigr| = \frac{\langle v, v \rangle}{\|v\|_V} = \|v\|_V, \]
we obtain that ‖fv‖_{V′} = ‖v‖V (an equality that trivially holds for v = 0 as well). □

One may define a map Φ : V → V 0 via

Φ(v) = fv = h·, vi. (6.7)

Notice that
Φ(v + v̂) = fv+v̂ = fv + fv̂ = Φ(v) + Φ(v̂), (6.8)
and
\[ \Phi(cv) = f_{cv} = \bar{c}\, f_v = \bar{c}\, \Phi(v). \tag{6.9} \]
Thus, when the underlying field is C, the map Φ is not linear, due to the complex conjugate showing up in (6.9). A map Φ satisfying
\[ \Phi(v + \hat v) = \Phi(v) + \Phi(\hat v), \qquad \Phi(cv) = \bar{c}\, \Phi(v) \]

is called a conjugate linear map. Thus, for a finite-dimensional vector space,


the map Φ defined in (6.7) is a bijective conjugate linear map. Moreover,
kΦ(v)kV 0 = kvkV , so Φ also has an isometry property. For
infinite-dimensional, so-called Hilbert spaces, the same result is true (provided we only consider bounded linear functionals), but this requires more
analysis results than we are ready to address here. The following example
shows that in the infinite-dimensional case, one indeed needs to proceed with
caution.

Example 6.3.3 Let V = {f : [0, 1] → R : f is continuous}, and
\[ \langle f, g \rangle := \int_0^1 f(t)\, g(t)\, dt, \]
which defines an inner product on V. Let L : V → R be defined by L(f) = f(0). Then L ∈ V′. However, there is no function g ∈ V so that
\[ f(0) = \int_0^1 f(t)\, g(t)\, dt \quad \text{for all } f \in V. \tag{6.10} \]
Indeed, if (6.10) holds then by Cauchy–Schwarz,
\[ |L(f)| = |\langle f, g \rangle| \le \sqrt{\langle f, f \rangle}\, \sqrt{\langle g, g \rangle} \quad \text{for all } f \in V. \tag{6.11} \]
For n ∈ N we define the function fn ∈ V via
\[ f_n(t) = \begin{cases} \sqrt{n - n^2 t} & \text{if } 0 \le t \le \tfrac{1}{n}, \\ 0 & \text{if } \tfrac{1}{n} \le t \le 1. \end{cases} \]
Then \( L(f_n) = \sqrt{n} \) and
\[ \langle f_n, f_n \rangle = \int_0^1 f_n(t)^2\, dt = \int_0^{1/n} (n - n^2 t)\, dt = \Bigl( nt - \frac{n^2 t^2}{2} \Bigr) \Big|_{t=0}^{t=1/n} = 1 - \frac12 = \frac12. \]
If (6.10) holds we would need by (6.11) that \( \sqrt{\langle g, g \rangle} \ge \sqrt{2n} \) for all n ∈ N, which is clearly impossible as ⟨g, g⟩ is a real number that does not depend on n.

When V has a norm ‖·‖V, we define ‖·‖V′ on V′ via
\[ \|f\|_{V'} := \sup_{\|x\|_V \le 1} |f(x)|, \quad f \in V'. \]
If the supremum is finite we say that f is a bounded functional. As we will see, in finite dimensions every linear functional is bounded. However, as the previous example shows, this is not true in infinite dimensions. We therefore introduce
\[ V'_{\mathrm{bdd}} = \{ f \in V' : \|f\|_{V'} < \infty \} = \{ f \in V' : f \text{ is bounded} \}. \]

Proposition 6.3.4 Let V have a norm ‖·‖V. Then ‖·‖V′ defined above is a norm on the vector space V′_bdd. When dim V < ∞, then V′ = V′_bdd.

Proof. First suppose that f, g ∈ V′_bdd, thus ‖f‖_{V′}, ‖g‖_{V′} < ∞. Then
\[ \|f+g\|_{V'} = \sup_{\|x\|_V \le 1} |(f+g)(x)| \le \sup_{\|x\|_V \le 1} \bigl( |f(x)| + |g(x)| \bigr) \le \sup_{\|x\|_V \le 1} |f(x)| + \sup_{\|x\|_V \le 1} |g(x)| = \|f\|_{V'} + \|g\|_{V'}, \]
and thus f + g ∈ V′_bdd. Next, ‖cf‖_{V′} = |c| ‖f‖_{V′} follows immediately by using the corresponding property of ‖·‖V. Thus V′_bdd is closed under scalar multiplication. As the zero functional also belongs to V′_bdd, we obtain that V′_bdd is a vector space.

To show that kf kV 0 is a norm, it remains to show that item (i) in the


definition of a norm is satisfied. Clearly, kf kV 0 ≥ 0. Next, if kf kV 0 = 0, then
|f (x)| = 0 for all kxk ≤ 1. Thus f (x) = 0 for all kxk ≤ 1, and thus by scaling
f (x) = 0 for all x ∈ V .

In the case that dim V = n < ∞, we may choose a basis in V and identify V with F^n. Defining the standard inner product on F^n, we obtain also an inner product ⟨·, ·⟩ on V. Using Theorem 6.3.2 we obtain that for every f ∈ V′ we have that
\[ \sup_{\langle x, x \rangle \le 1} |f(x)| < \infty, \]
as f = fv for some v ∈ V and \( \sup_{\langle x, x \rangle \le 1} |f_v(x)| \le \sqrt{\langle v, v \rangle} \) (by the Cauchy–Schwarz inequality). Using Theorem 5.1.25, we have that \( \sqrt{\langle \cdot, \cdot \rangle} \) and ‖·‖V are equivalent norms. From this ‖f‖_{V′} < ∞ now easily follows. □

If A : V → W is a linear map, then the induced map A0 : W 0 → V 0 is given


by
A0 g = f, where f (v) = g(Av).
Note that indeed g acts on elements of W while f acts on elements of V . We
show next that if the matrix representation of A with respect to some bases
is B, then the matrix representation of A0 with respect to the corresponding
dual bases is B T , the transpose of B.

Proposition 6.3.5 Let A : V → W be linear and let B and C be bases for V


and W , respectively. Let B 0 and C 0 be the dual bases of B and C, respectively.
Then
([A]C←B )T = [A0 ]B0 ←C 0 .

Proof. Let us denote B = {b1, . . . , bn}, C = {c1, . . . , cm}, B′ = {f1, . . . , fn}, C′ = {g1, . . . , gm}. Also let
\[ B = (b_{ij})_{i=1,j=1}^{m,n} = [A]_{C \leftarrow B}. \]
Let us compute A′gk. For \( v = \sum_{l=1}^{n} d_l b_l \) we have
\[ A'g_k(v) = A'g_k\Bigl( \sum_{l=1}^{n} d_l b_l \Bigr) = g_k\Bigl( A\Bigl( \sum_{l=1}^{n} d_l b_l \Bigr) \Bigr) = g_k\Bigl( \sum_{l=1}^{n} d_l A b_l \Bigr) = g_k\Bigl( \sum_{l=1}^{n} d_l \Bigl( \sum_{i=1}^{m} b_{il} c_i \Bigr) \Bigr) = \sum_{l=1}^{n} d_l \Bigl( \sum_{i=1}^{m} b_{il} g_k(c_i) \Bigr) = \sum_{l=1}^{n} d_l b_{kl}. \]
Observing that \( d_l = f_l\bigl( \sum_{j=1}^{n} d_j b_j \bigr) = f_l(v) \), we thus obtain that
\[ A'g_k(v) = \sum_{l=1}^{n} b_{kl} f_l(v) \quad \text{for all } v \in V. \]
Consequently,
\[ A'g_k = \sum_{l=1}^{n} b_{kl} f_l, \]
and thus the kth column of [A′]_{B′←C′} equals
\[ \begin{pmatrix} b_{k1} \\ \vdots \\ b_{kn} \end{pmatrix}, \]
which is the transpose of the kth row of B. □

As V 0 is a vector space, we can study its dual space

V 00 = {E : V 0 → F : E linear},

also referred to as the double dual of V . One way to generate an element of


V 00 is to introduce the evaluation map Ev at v ∈ V as follows:

Ev (f ) = f (v).

Clearly, Ev (f + g) = Ev (f ) + Ev (g) and Ev (cf ) = cEv (f ), and thus Ev is


indeed linear. In case V is finite dimensional, we have that every element of
V 00 corresponds to an evaluation map.

Proposition 6.3.6 Let V be finite dimensional, and consider the map


Φ : V → V 00 defined by
Φ(v) = Ev .
Then Φ is an isomorphism.

Proof. First we observe that


Ev+w (f ) = f (v + w) = f (v) + f (w) = Ev (f ) + Ew (f ) and
Ecv (f ) = f (cv) = cf (v) = cEv (f ). Thus Φ is linear.

As dimV = dimV 0 = dimV 00 , it suffices to show that Φ is one-to-one.


Suppose that v ≠ 0. Then we can choose a basis B = {v, v2, . . . , vn} of V (where dim V = n). Let now f ∈ V′ be so that f(v) = 1 and f(vj) = 0, j = 2, . . . , n. Then Ev(f) = f(v) = 1, and thus Ev ≠ 0. This shows that v ≠ 0 yields that Φ(v) ≠ 0. Thus Φ is one-to-one. □
v 6= 0 yields that Φ(v) 6= 0. Thus Φ is one-to-one. 

The notion of a dual space is useful in the context of optimization. For instance, let
\[ f : \mathbb{R} \to \mathbb{R}^n, \quad f(t) = \begin{pmatrix} f_1(t) \\ \vdots \\ f_n(t) \end{pmatrix} \]
be a differentiable function. With the Euclidean norm on R^n we have that
\[ \frac{d}{dt} \|f(t)\|^2 = \frac{d}{dt} \bigl( f_1(t)^2 + \cdots + f_n(t)^2 \bigr) = 2\bigl( f_1'(t) f_1(t) + \cdots + f_n'(t) f_n(t) \bigr) = 2 \begin{pmatrix} f_1'(t) & \cdots & f_n'(t) \end{pmatrix} \begin{pmatrix} f_1(t) \\ \vdots \\ f_n(t) \end{pmatrix}. \]
The row vector
\[ \nabla f(t) = \begin{pmatrix} f_1'(t) & \cdots & f_n'(t) \end{pmatrix} \]
is called the gradient of f at t. In a more general setting, where f : F → V, it turns out that viewing ∇f(t) as an element of the dual space (or, equivalently, viewing ∇f as a function acting F → V′) is a natural way to develop a solid theory.

While we focused in the section on the vector space of linear functionals, one
can, in more generality, study the vector space

L(V, W ) = {T : V → W : T is linear},

with the usual definition of adding linear maps and multiplying them with a
scalar. In finite dimensions, we have seen that after choosing bases B and C
in V and W , respectively, every linear map T : V → W is uniquely identified
by its matrix representation [T ]C←B . Using this, one immediately sees that

dimL(V, W ) = (dimV )(dimW ).

The main item we would like to address here is when V and W have norms ‖·‖V and ‖·‖W, respectively. In this case there is a natural norm on L(V, W), as follows:
\[ \|T\|_{L(V,W)} := \sup_{\|v\|_V = 1} \|T(v)\|_W. \tag{6.12} \]
When V and W are finite dimensional, this supremum is always finite and thus ‖T‖_{L(V,W)} is a nonnegative real number. We say that ‖·‖_{L(V,W)} is the induced operator norm, as its definition relies on the norms on V and W and on the property of T as a linear operator.

Proposition 6.3.7 Let V and W be finite-dimensional vector spaces with


norms k · kV and k · kW , respectively. Then k · kL(V,W ) defines a norm on
L(V, W ). In addition, for every v ∈ V , we have that

kT (v)kW ≤ kT kL(V,W ) kvkV . (6.13)

Proof. Since V and W are finite dimensional, the set {T v : kvkV = 1} is a


compact set, and thus k · kW attains a maximum on this set. This gives that
the supremum in (6.12) is in fact a maximum, and is finite. Next, clearly ‖T‖_{L(V,W)} ≥ 0. Next, suppose that ‖T‖_{L(V,W)} = 0. This implies that for every v ∈ V with ‖v‖_V = 1, we have that ‖T(v)‖_W = 0, and thus T(v) = 0. But then T = 0.

When c ∈ F, we have that
\[ \|cT\|_{L(V,W)} = \sup_{\|v\|_V=1} \|cT(v)\|_W = \sup_{\|v\|_V=1} |c| \|T(v)\|_W = |c| \|T\|_{L(V,W)}. \]

Next, note that for T1 , T2 ∈ L(V, W ), we have that

k(T1 + T2 )(v)kW = kT1 (v) + T2 (v)kW ≤ kT1 (v)kW + kT2 (v)kW .

Using this it is straightforward to see that

kT1 + T2 kL(V,W ) ≤ kT1 kL(V,W ) + kT2 kL(V,W ) .

Finally, if v ≠ 0, then v/‖v‖V has norm 1, and thus
\[ \Bigl\| T\Bigl( \frac{v}{\|v\|_V} \Bigr) \Bigr\|_W \le \|T\|_{L(V,W)}. \]
Multiplying both sides with ‖v‖V, and using the norm properties, yields (6.13). When v = 0, then (6.13) obviously holds as well. □

Example 6.3.8 Let T : C^n → C^m be the linear map given by multiplication with the matrix \( A = (a_{ij})_{i=1,j=1}^{m,n} \). Let the norm on both V and W be given by ‖·‖1, as in Example 5.1.14. Then
\[ \|T\|_{L(V,W)} = \max_j \bigl( |a_{1j}| + \cdots + |a_{mj}| \bigr). \tag{6.14} \]
Indeed, if we take ej ∈ C^n, which is a unit vector in the ‖·‖1 norm, then \( T(e_j) = (a_{ij})_{i=1}^{m} \), and thus we find
\[ \|T(e_j)\|_W = \|T(e_j)\|_1 = |a_{1j}| + \cdots + |a_{mj}|. \]
Thus the inequality ≥ holds in (6.14). To prove the other inequality, we observe that for \( x = \sum_{j=1}^{n} x_j e_j \) with \( \sum_{j=1}^{n} |x_j| = 1 \), we have that ‖T(x)‖_W equals
\[ \Bigl\| \sum_{j=1}^{n} x_j T(e_j) \Bigr\|_W \le \sum_{j=1}^{n} |x_j| \|T(e_j)\|_W \le \Bigl( \sum_{j=1}^{n} |x_j| \Bigr) \Bigl( \max_{j=1,\ldots,n} \|T(e_j)\|_W \Bigr) = \max_j \bigl( |a_{1j}| + \cdots + |a_{mj}| \bigr). \]
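
A quick numerical sanity check (a sketch of ours; the test matrix is arbitrary): MATLAB's norm(A,1) returns exactly this maximum column sum.

A=[1 -2 3; -4 5 -6];
max(sum(abs(A)))     % maximum column sum of absolute values; here 9
norm(A,1)            % the induced 1-norm computed by MATLAB; agrees with the line above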

Example 6.3.9 Let T : C^n → C^m be the linear map given by multiplication with the matrix \( A = (a_{ij})_{i=1,j=1}^{m,n} \). Let the norm on both V and W be given
by the Euclidean norm k · k2 . Then

kT kL(V,W ) = σ1 (A). (6.15)

This was already observed in Proposition 5.6.2.

When the vector spaces are not finite dimensional, it could happen that a
linear map does not have a finite norm. When this happens, we say that the
linear map is unbounded. A typical example of an unbounded linear map is
taking the derivative. We provide the details next.

Example 6.3.10 Let

V = {f : (0, 1) → R : f is bounded and differentiable with f 0 bounded}

and
W = {f : (0, 1) → R : f is bounded}.
On both spaces let
\[ \|f\|_\infty = \sup_{t \in (0,1)} |f(t)| \]

be the norm. Note that f being bounded means exactly that kf k∞ < ∞. Let
d
T = dt : V → W be the differentiation map. Then T is linear. Let now
fn (t) = tn , n ∈ N. Then kfn k∞ = 1 for all n ∈ N. However,
(T fn )(t) = fn0 (t) = ntn−1 has the norm equal to kT fn k∞ = n, n ∈ N. Thus,
it follows that
\[ \sup_{\|f\|_\infty = 1} \|Tf\|_\infty \ge \|Tf_n\|_\infty = n \]

for all n ∈ N, and thus T is unbounded.

We end this section with the following norm of a product inequality.



Proposition 6.3.11 Let V, W and X be finite-dimensional vector spaces


with norms k · kV , k · kW , and k · kX , respectively. Let T : V → W and
S : W → X be linear maps. Then

kST kL(V,X) ≤ kSkL(W,X) kT kL(V,W ) . (6.16)

Proof. Let v ∈ V with kvkV = 1. By (6.13) applied to the vector T (v) and
the map S we have that

kS(T (v))kX ≤ kSkL(W,X) kT (v)kW .

Next we use (6.13) again, and obtain that
\[ \|S(T(v))\|_X \le \|S\|_{L(W,X)} \|T(v)\|_W \le \|S\|_{L(W,X)} \|T\|_{L(V,W)} \|v\|_V = \|S\|_{L(W,X)} \|T\|_{L(V,W)}. \]


Thus kSkL(W,X) kT kL(V,W ) is an upper bound for kS(T (v))kX for all unit
vectors v in V , and therefore the least upper bound is at most
kSkL(W,X) kT kL(V,W ) . 

6.4 Multilinear maps and functionals

Let V1 , . . . , Vk , W be vector spaces over a field F. We say that a function

φ : V1 × · · · × Vk → W

is multilinear if the function is linear in each coordinate. Thus, for each


i ∈ {1, . . . , k}, if we fix vj ∈ Vj , j 6= i, we require that the map

u 7→ φ(v1 , . . . , vi−1 , u, vi+1 , . . . , vn )

is linear. Thus

φ(v1 , . . . , vi−1 , u + û, vi+1 , . . . , vn ) = φ(v1 , . . . , vi−1 , u, vi+1 , . . . , vn )+

φ(v1 , . . . , vi−1 , û, vi+1 , . . . , vn )


and

φ(v1 , . . . , vi−1 , cu, vi+1 , . . . , vn ) = cφ(v1 , . . . , vi−1 , u, vi+1 , . . . , vn ).

When W = F we call φ a multilinear functional. When k = 2, we say that φ


is bilinear.
 
Example 6.4.1 Let φ : F^k → F be defined by \( \varphi \begin{pmatrix} x_1 \\ \vdots \\ x_k \end{pmatrix} = x_1 x_2 \cdots x_k \). Then φ is a multilinear functional.

Example 6.4.2 Let Φ : F^k × · · · × F^k → F be defined by
\[ \Phi(v_1, \ldots, v_k) = \det \begin{pmatrix} v_1 & \cdots & v_k \end{pmatrix}. \]
Then Φ is a multilinear functional.

Example 6.4.3 Let Φ : R^3 × R^3 → R^3 be defined by
\[ \Phi\left( \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} \right) = \begin{pmatrix} x_2 y_3 - x_3 y_2 \\ x_3 y_1 - x_1 y_3 \\ x_1 y_2 - x_2 y_1 \end{pmatrix} =: \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \times \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}. \tag{6.17} \]
Then Φ is a bilinear map, which corresponds to the so-called cross product in R^3. Typically, the cross product of x and y is denoted as x × y.
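
For a quick check of (6.17) (a sketch of ours), MATLAB's built-in cross computes the same bilinear map:

x=[1;2;3]; y=[4;5;6];
cross(x,y)       % returns [-3; 6; -3], matching the componentwise formula in (6.17)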

Example 6.4.4 Given matrices Aj ∈ Fnj ×mj , j = 0, . . . , k. Define


Φ : Fm0 ×n1 × · · · × Fmk−1 ×nk → Fn0 ×mk ,
Φ(X1 , . . . , Xk ) = A0 X1 A1 X2 A2 · · · Ak−1 Xk Ak .
Then Φ is a multilinear map.

If we let
M = {φ : V1 × · · · × Vk → W : φ is multilinear},
then by usual addition and scalar multiplication of functions, we have that
M is a vector space over F. When the vector spaces V1 , . . . , Vk have inner
products, h·, ·i1 , . . . , h·, ·ik , respectively, then for fixed u1 ∈ V1 , . . . , uk ∈ Vk
the map
\[ \varphi_{u_1,\ldots,u_k}(v_1, \ldots, v_k) := \langle v_1, u_1 \rangle_1 \cdots \langle v_k, u_k \rangle_k = \prod_{j=1}^{k} \langle v_j, u_j \rangle_j \]
is a multilinear functional acting V1 × · · · × Vk → F. Notice that, due to the Cauchy–Schwarz inequality, we have
\[ |\varphi_{u_1,\ldots,u_k}(v_1, \ldots, v_k)| = \prod_{j=1}^{k} |\langle v_j, u_j \rangle_j| \le \prod_{j=1}^{k} \|u_j\|_j \prod_{j=1}^{k} \|v_j\|_j. \tag{6.18} \]

In finite dimensions, any multilinear functional is a linear combination of


φu1 ,...,uk , u1 ∈ V1 , . . . , uk ∈ Vk , as we will now see.

Proposition 6.4.5 Let V1 , . . . , Vk be finite-dimensional vector spaces with


inner products h·, ·i1 , . . . , h·, ·ik , respectively. Then every multilinear
functional on V1 × · · · × Vk is a linear combination of multilinear functionals
φu1 ,...,uk , where u1 ∈ V1 , . . . , uk ∈ Vk .

Proof. Let φ be a multilinear functional on V1 × · · · × Vk, and let \( \{e_1^{(j)}, \ldots, e_{n_j}^{(j)}\} \) be an orthonormal basis for Vj, j = 1, . . . , k. Writing \( v_j = \sum_{r=1}^{n_j} \langle v_j, e_r^{(j)} \rangle_j\, e_r^{(j)} \), we obtain that
\[ \varphi(v_1, \ldots, v_k) = \sum_{r_1=1}^{n_1} \cdots \sum_{r_k=1}^{n_k} \langle v_1, e_{r_1}^{(1)} \rangle_1 \cdots \langle v_k, e_{r_k}^{(k)} \rangle_k\, \varphi(e_{r_1}^{(1)}, \ldots, e_{r_k}^{(k)}). \]
Thus φ is a linear combination of \( \varphi_{e_{r_1}^{(1)}, \ldots, e_{r_k}^{(k)}} \), r_j = 1, . . . , n_j, j = 1, . . . , k. □

When ‖·‖j is a norm on Vj, j = 1, . . . , k, and ‖·‖W a norm on W, then we say that φ is bounded if
\[ \sup_{\|v_1\|_1 \le 1, \ldots, \|v_k\|_k \le 1} \|\varphi(v_1, \ldots, v_k)\|_W < \infty. \]

Similar to the proof of Proposition 6.3.4, one can show that if V1 , . . . , Vk are
finite dimensional and W = F, then φ is automatically bounded. Indeed, if
the norms come from inner products, one can use Proposition 6.4.5 and
(6.18) to see that φ is bounded. Next, using that on finite-dimensional
spaces any two norms are equivalent, one obtains the boundedness with
respect to any norms on V1 , . . . , Vk .

For a detailed study of multilinear functionals, it is actually useful to introduce tensor products. We will do this in the next section.

6.5 The tensor product

Given two vector spaces V1 and V2 over a field F, we introduce a tensor


product ⊗ : V1 × V2 → V1 ⊗ V2 with the properties

(x + y) ⊗ v = x ⊗ v + y ⊗ v for all x, y ∈ V1 , v ∈ V2 , (6.19)

x ⊗ (v + w) = x ⊗ v + x ⊗ w for all x ∈ V1 , v, w ∈ V2 , (6.20)


and

(cx) ⊗ v = c(x ⊗ v) = x ⊗ (cv) for all x ∈ V1 , v ∈ V2 , c ∈ F. (6.21)



The set V1 ⊗ V2 is defined by
\[ V_1 \otimes V_2 = \{0\} \cup \Bigl\{ \sum_{j=1}^{m} c_j (x_j \otimes v_j) : m \in \mathbb{N}_0,\ c_j \in \mathbb{F},\ x_j \in V_1,\ v_j \in V_2 \Bigr\}, \]
where we say that two elements in V1 ⊗ V2 are equal, if by applying rules (6.19)–(6.21) one element can be converted into the other.

Example 6.5.1 We have that the elements


(x1 + x2 ) ⊗ (v1 + v2 ) − 2(x1 ⊗ v2 )
and
(x1 − x2 ) ⊗ (v1 − v2 ) + 2(x2 ⊗ v1 )
are equal. Indeed, applying rules (6.19)–(6.21), we get
(x1 + x2 ) ⊗ (v1 + v2 ) − 2(x1 ⊗ v2 ) = x1 ⊗ v1 + x2 ⊗ v1 − x1 ⊗ v2 + x2 ⊗ v2
and
(x1 − x2 ) ⊗ (v1 − v2 ) + 2(x2 ⊗ v1 ) = x1 ⊗ v1 + x2 ⊗ v1 − x1 ⊗ v2 + x2 ⊗ v2 .

It is convenient to allow m = 0 in the expression \( \sum_{j=1}^{m} c_j (x_j \otimes v_j) \), in which case the sum should just be interpreted as 0. We define addition and scalar multiplication on V1 ⊗ V2 by
\[ \sum_{j=1}^{m} c_j (x_j \otimes v_j) + \sum_{j=m+1}^{l} c_j (x_j \otimes v_j) = \sum_{j=1}^{l} c_j (x_j \otimes v_j), \]
and
\[ d \sum_{j=1}^{m} c_j (x_j \otimes v_j) = \sum_{j=1}^{m} (d c_j)(x_j \otimes v_j). \]

With these operations, one can easily check that V1 ⊗ V2 is a vector space.
An element of the form x ⊗ v is called a simple tensor. In general, the
elements of V1 ⊗ V2 are linear combinations of simple tensors. This definition
of the tensor product of two vector spaces is perhaps the most abstract
notion in this book. The elements of this space are just sums of a set of
symbols, and then we have equality when we can convert one sum to the
other by using the rules (6.19)–(6.21). We intend to make things more
concrete in the remainder of this section.

First, let us figure out a way to determine whether an equality like
\[ \sum_{j=1}^{m} c_j (x_j \otimes v_j) = \sum_{j=1}^{l} d_j (y_j \otimes w_j) \]
holds. For this, the following proposition is helpful.



Proposition 6.5.2 Consider the vector space V1 ⊗ V2 over F, and let \( \sum_{j=1}^{m} c_j (x_j \otimes v_j) \in V_1 \otimes V_2 \). Let W1 = Span{x1, . . . , xm} and W2 = Span{v1, . . . , vm}. The following are equivalent:

(i) \( \sum_{j=1}^{m} c_j (x_j \otimes v_j) = 0 \),

(ii) for all bilinear maps F : W1 × W2 → W we have that \( \sum_{j=1}^{m} c_j F(x_j, v_j) = 0_W \),

(iii) for all bilinear functionals f : W1 × W2 → F we have that \( \sum_{j=1}^{m} c_j f(x_j, v_j) = 0 \).

Proof. (i) → (ii): Let F : W1 × W2 → W be bilinear. It is clear that if we apply F to the left-hand side of (6.19) and to the right-hand side of (6.19), we get the same outcome; that is,
\[ F(x + y, v) = F(x, v) + F(y, v). \]
The same holds for (6.20) and (6.21). Thus if the expression \( \sum_{j=1}^{m} c_j (x_j \otimes v_j) \) can be converted to 0 by applying (6.19)–(6.21), then we must have that \( \sum_{j=1}^{m} c_j F(x_j, v_j) = F(0, 0) = 0_W \). (It could be that in the conversion of \( \sum_{j=1}^{m} c_j (x_j \otimes v_j) \) to 0, one encounters vectors in V1 that do not lie in W1 and/or vectors in V2 that do not lie in W2. In this case, one needs to extend the definition of F to a larger space Ŵ1 × Ŵ2. In the end, one can restrict again to W1 × W2, as in the equality \( \sum_{j=1}^{m} c_j F(x_j, v_j) = F(0, 0) \), the bilinear map F acts only on W1 × W2.)

(ii) → (iii): Note that (iii) is just a special case of (ii), by taking W = F, and
thus (iii) holds when (ii) holds.

(iii) → (i): We prove the contrapositive, so we assume that (i) does not hold. Suppose that \( \sum_{j=1}^{m} c_j (x_j \otimes v_j) \ne 0 \). Thus the expression \( \sum_{j=1}^{m} c_j (x_j \otimes v_j) \) cannot be converted to 0 by rules (6.19)–(6.21). Let B = {y1, . . . , ys} be a basis for W1 = Span{x1, . . . , xm}, and C = {w1, . . . , wt} be a basis for W2 = Span{v1, . . . , vm}. We introduce the s × m matrix S = (s_{ij}) and the t × m matrix T = (t_{ij}) as follows:
\[ S = \begin{pmatrix} [c_1 x_1]_B & \cdots & [c_m x_m]_B \end{pmatrix}, \qquad T = \begin{pmatrix} [v_1]_C & \cdots & [v_m]_C \end{pmatrix}. \]
We now claim that ST^T ≠ 0. Indeed, note that by applying (6.19)–(6.21) we may write
\[ \sum_{j=1}^{m} (c_j x_j) \otimes v_j = \sum_{j=1}^{m} \Bigl[ \Bigl( \sum_{l=1}^{s} s_{lj} y_l \Bigr) \otimes \Bigl( \sum_{n=1}^{t} t_{nj} w_n \Bigr) \Bigr] = \sum_{l=1}^{s} \sum_{n=1}^{t} \Bigl( \sum_{j=1}^{m} s_{lj} t_{nj} \Bigr) y_l \otimes w_n. \]
The number \( \sum_{j=1}^{m} s_{lj} t_{nj} \) is exactly the (l, n)th entry of ST^T, so if ST^T = 0, it would mean that \( \sum_{j=1}^{m} c_j (x_j \otimes v_j) = 0 \).

As ST^T ≠ 0, we have that some entry of it is nonzero. Say, entry (p, q) of ST^T is nonzero. Let now g : W1 → F be linear so that g(y_p) = 1 and g(y_j) = 0, j ≠ p. Thus g ∈ W1′. Similarly, let h ∈ W2′ be so that h(w_q) = 1 and h(w_n) = 0, n ≠ q. Let now f : W1 × W2 → F be defined by f(x, v) = g(x)h(v). Then f is bilinear. Furthermore,
\[ \sum_{j=1}^{m} f(c_j x_j, v_j) = \sum_{j=1}^{m} f\Bigl( \sum_{l=1}^{s} s_{lj} y_l, \sum_{n=1}^{t} t_{nj} w_n \Bigr) = \sum_{j=1}^{m} g\Bigl( \sum_{l=1}^{s} s_{lj} y_l \Bigr) h\Bigl( \sum_{n=1}^{t} t_{nj} w_n \Bigr) = \sum_{j=1}^{m} s_{pj} t_{qj} \ne 0, \]
as this number is exactly equal to the (p, q) entry of ST^T. This finishes the proof. □

The proof of Proposition 6.5.2 provides a way for checking whether an element \( \sum_{j=1}^{m} c_j (x_j \otimes v_j) \in V_1 \otimes V_2 \) equals 0 or not. Indeed, we would produce the matrices S and T as in the proof, and check whether ST^T = 0 or not. Let us do an example.

Example 6.5.3 In \( \mathbb{Z}_5^3 \otimes \mathbb{Z}_5^2 \) consider the element
\[ \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 3 \\ 4 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \begin{pmatrix} 3 \\ 3 \\ 3 \end{pmatrix} \otimes \begin{pmatrix} 2 \\ 1 \end{pmatrix}. \tag{6.22} \]
We choose
\[ B = \left\{ \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\}, \qquad C = \left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\}, \]
and find that
\[ S = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 2 & 3 \end{pmatrix}, \qquad T = \begin{pmatrix} 1 & 1 & 0 & 4 \\ 1 & 0 & 1 & 2 \end{pmatrix}. \]
Compute now
\[ S T^T = \begin{pmatrix} 1 & 2 \\ 3 & 3 \end{pmatrix}. \]
Thus (6.22) is not 0. Using any factorization of ST^T, for instance
\[ S T^T = \begin{pmatrix} 1 & 2 \\ 3 & 3 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} \begin{pmatrix} 1 & 0 \end{pmatrix} + \begin{pmatrix} 2 \\ 3 \end{pmatrix} \begin{pmatrix} 0 & 1 \end{pmatrix}, \]
we can write (6.22) differently. Indeed, choose x1, x2, v1 and v2 so that
\[ [x_1]_B = \begin{pmatrix} 1 \\ 3 \end{pmatrix}, \quad [x_2]_B = \begin{pmatrix} 2 \\ 3 \end{pmatrix}, \quad [v_1]_C = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad [v_2]_C = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \]

Thus
\[ x_1 = \begin{pmatrix} 4 \\ 0 \\ 1 \end{pmatrix}, \quad x_2 = \begin{pmatrix} 0 \\ 2 \\ 4 \end{pmatrix}, \quad v_1 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]
Then (6.22) equals
\[ x_1 \otimes v_1 + x_2 \otimes v_2 = \begin{pmatrix} 4 \\ 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 0 \\ 2 \\ 4 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \tag{6.23} \]
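
The matrix computation in this example is easy to verify numerically (a sketch of ours; arithmetic in Z5 is carried out with mod):

S=[1 0 1 0; 0 1 2 3];   % columns are the coordinate vectors [c_j x_j]_B
T=[1 1 0 4; 1 0 1 2];   % columns are the coordinate vectors [v_j]_C
mod(S*T.',5)            % returns [1 2; 3 3], so the element (6.22) is indeed nonzero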

We can now also determine the dimension of V1 ⊗ V2 .

Proposition 6.5.4 Let V1 and V2 be finite-dimensional spaces. Then
\[ \dim V_1 \otimes V_2 = (\dim V_1)(\dim V_2). \]
More specifically, if B = {x1, . . . , xn} is a basis for V1 and C = {v1, . . . , vm} is a basis for V2, then {xi ⊗ vj : 1 ≤ i ≤ n, 1 ≤ j ≤ m} is a basis for V1 ⊗ V2.

Proof. For any y ⊗ w ∈ V1 ⊗ V2, we can write \( y = \sum_{i=1}^{n} c_i x_i \) and \( w = \sum_{j=1}^{m} d_j v_j \). Then
\[ y \otimes w = \sum_{i=1}^{n} \sum_{j=1}^{m} c_i d_j\, x_i \otimes v_j. \]
For a linear combination \( \sum_{r=1}^{k} a_r\, y_r \otimes w_r \) we can write each term as a linear combination of {xi ⊗ vj : 1 ≤ i ≤ n, 1 ≤ j ≤ m}, and thus this linear combination also lies in Span{xi ⊗ vj : 1 ≤ i ≤ n, 1 ≤ j ≤ m}. This shows that {xi ⊗ vj : 1 ≤ i ≤ n, 1 ≤ j ≤ m} spans V1 ⊗ V2.

To show that {xi ⊗ vj : 1 ≤ i ≤ n, 1 ≤ j ≤ m} is linearly independent, suppose that \( \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij}\, x_i \otimes v_j = 0 \). Performing the procedure in the proof of Proposition 6.5.2 with B and C as above we obtain that
\[ S T^T = \begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nm} \end{pmatrix}. \]
Thus \( \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij}\, x_i \otimes v_j = 0 \) holds if and only if a_{ij} = 0 for all i and j. This proves linear independence. □

By Proposition 6.5.4 we have that F^n ⊗ F^m has dimension nm. Thus F^n ⊗ F^m is isomorphic to F^{nm}. This isomorphism can be obtained via a bijection between the basis
\[ \{ e_i \otimes e_j : 1 \le i \le n,\ 1 \le j \le m \} \]
of F^n ⊗ F^m and the basis
\[ \{ e_i : 1 \le i \le nm \} \]
of F^{nm}. The canonical way to do this is to order {ei ⊗ ej : 1 ≤ i ≤ n, 1 ≤ j ≤ m} lexicographically. The ordering on pairs (i, j) is lexicographical if
\[ (i, j) \le (k, l) \ \Leftrightarrow\ i < k \ \text{or}\ (i = k \ \text{and}\ j \le l). \]
For example, ordering {1, 2} × {1, 2, 3} lexicographically results in
\[ (1, 1) \le (1, 2) \le (1, 3) \le (2, 1) \le (2, 2) \le (2, 3). \]
In this example we would match the bases by
\[ e_1 \otimes e_1 \leftrightarrow e_1,\ e_1 \otimes e_2 \leftrightarrow e_2,\ e_1 \otimes e_3 \leftrightarrow e_3,\ e_2 \otimes e_1 \leftrightarrow e_4,\ e_2 \otimes e_2 \leftrightarrow e_5,\ e_2 \otimes e_3 \leftrightarrow e_6. \]
In general, we match
\[ e_i \otimes e_j \in \mathbb{F}^n \otimes \mathbb{F}^m \ \leftrightarrow\ e_{(i-1)m+j} \in \mathbb{F}^{nm}, \quad 1 \le i \le n,\ 1 \le j \le m. \]
For a general \( x = (x_j)_{j=1}^{n} \in \mathbb{F}^n \) and v ∈ F^m we now get the correspondence
\[ x \otimes v \in \mathbb{F}^n \otimes \mathbb{F}^m \ \leftrightarrow\ \begin{pmatrix} x_1 v \\ \vdots \\ x_n v \end{pmatrix} \in \mathbb{F}^{nm}. \]
For example,
\[ \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \otimes \begin{pmatrix} 4 \\ 5 \end{pmatrix} \in \mathbb{R}^3 \otimes \mathbb{R}^2 \ \leftrightarrow\ \begin{pmatrix} 1 \begin{pmatrix} 4 \\ 5 \end{pmatrix} \\ 2 \begin{pmatrix} 4 \\ 5 \end{pmatrix} \\ 3 \begin{pmatrix} 4 \\ 5 \end{pmatrix} \end{pmatrix} = \begin{pmatrix} 4 \\ 5 \\ 8 \\ 10 \\ 12 \\ 15 \end{pmatrix} \in \mathbb{R}^6. \]
In other words, if we define
\[ \Phi : \mathbb{F}^n \otimes \mathbb{F}^m \to \mathbb{F}^{nm} \ \text{by}\ \Phi(e_i \otimes e_j) = e_{(i-1)m+j}, \]
or equivalently, by
\[ \Phi(x \otimes v) = \begin{pmatrix} x_1 v \\ \vdots \\ x_n v \end{pmatrix}, \]
and extend it to the full space by linear extension
\[ \Phi\Bigl( \sum_{j=1}^{m} c_j (x_j \otimes v_j) \Bigr) = \sum_{j=1}^{m} c_j \Phi(x_j \otimes v_j), \]
then Φ is an isomorphism. We call this the canonical isomorphism between F^n ⊗ F^m and F^{nm}.
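
In MATLAB the canonical isomorphism on simple tensors is exactly the Kronecker product of the coordinate vectors (a sketch of ours, reproducing the R^3 ⊗ R^2 example above):

x=[1;2;3]; v=[4;5];
kron(x,v)       % returns [4; 5; 8; 10; 12; 15], the stacked vector Phi(x tensor v)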

Example 6.5.5 For the vector in \( \mathbb{Z}_5^3 \otimes \mathbb{Z}_5^2 \)
\[ f = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 3 \\ 4 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \begin{pmatrix} 3 \\ 3 \\ 3 \end{pmatrix} \otimes \begin{pmatrix} 2 \\ 1 \end{pmatrix} \tag{6.24} \]
from (6.22), we have that
\[ \Phi(f) = \begin{pmatrix} 1 \\ 2 \\ 2 \\ 4 \\ 3 \\ 1 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \\ 0 \\ 1 \\ 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 3 \\ 3 \\ 4 \\ 4 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 1 \\ 3 \\ 1 \\ 3 \\ 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 0 \\ 4 \\ 2 \\ 2 \\ 4 \\ 0 \end{pmatrix}. \]
If we apply Φ to the vector in (6.23) we obtain
\[ \begin{pmatrix} 0 \\ 4 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 2 \\ 2 \\ 4 \\ 4 \end{pmatrix} = \begin{pmatrix} 0 \\ 4 \\ 2 \\ 2 \\ 4 \\ 0 \end{pmatrix}, \]
which is the same vector in \( \mathbb{Z}_5^6 \) as expected.

When V1 and V2 have inner products, the tensor product space V1 ⊗ V2 has a
natural associated inner product, as follows.

Proposition 6.5.6 Let V1 and V2 have inner products ⟨·, ·⟩1 and ⟨·, ·⟩2, respectively. Define ⟨·, ·⟩ on V1 ⊗ V2 via
\[ \langle x \otimes v, y \otimes w \rangle = \langle x, y \rangle_1 \langle v, w \rangle_2, \]
and extend ⟨·, ·⟩ via the rules of a Hermitian form to all of V1 ⊗ V2. Then ⟨·, ·⟩ is an inner product.

By the extension via the rules of a Hermitian form, we mean that we set
\[ \Bigl\langle \sum_{i=1}^{n} c_i\, x_i \otimes v_i, \sum_{j=1}^{m} d_j\, y_j \otimes w_j \Bigr\rangle = \sum_{i=1}^{n} \sum_{j=1}^{m} \bar{c}_i d_j \langle x_i \otimes v_i, y_j \otimes w_j \rangle = \sum_{i=1}^{n} \sum_{j=1}^{m} \bar{c}_i d_j \langle x_i, y_j \rangle_1 \langle v_i, w_j \rangle_2. \]
Pn
Proof. The only tricky part is to check that when f = i=1 ci xi ⊗ vi has the
property that from hf , f i = 0 we obtain f = 0. For this, we choose an
orthonormal basis {z1 , . . . , zk } for Span{v1 , . . . , vn }, and rewrite f as
k
X
f= dj yj ⊗ zj .
j=1

This can always be done by writing vi as linear combinations of {z1 , . . . , zk },


and reworking the expression for f using the rules (6.19)–(6.21). From
hf , f i = 0, we now obtain that
k X
X k k
X
0= d¯i dj hyi , yj i1 hzi ⊗ zj i2 = hdi yi , di yi i1 ,
i=1 j=1 i=1

yielding for each i that hdi yi , di yi i1 = 0, and thus di yi = 0. This gives that
f = 0.

It is straightforward to check that h·, ·i satisfies all the other rules of an inner
product, and we will leave this to the reader. 

When V1 and V2 have norms ‖·‖1 and ‖·‖2, it is possible to provide V1 ⊗ V2 with an associated norm as well. However, there are many ways of doing this. One way is to define
\[ \|f\| := \inf \sum_{j=1}^{k} |c_j| \|x_j\|_1 \|v_j\|_2, \]
where the infimum is taken over all possible ways of writing f as \( f = \sum_{j=1}^{k} c_j\, x_j \otimes v_j \). We will not further pursue this here.

When we have linear maps A : V1 → W1 and B : V2 → W2 , one can define a


linear map A ⊗ B : V1 ⊗ V2 → W1 ⊗ W2 via

(A ⊗ B)(x ⊗ v) := (Ax) ⊗ (Bv),

and extend by linearity. Thus
\[ (A \otimes B)\Bigl( \sum_{j=1}^{n} x_j \otimes v_j \Bigr) := \sum_{j=1}^{n} (A x_j) \otimes (B v_j). \]

Since
(A ⊗ B)[(x + y) ⊗ v] = (A ⊗ B)(x ⊗ v + y ⊗ v) (6.25)
(A ⊗ B)[x ⊗ (v + w)] = (A ⊗ B)(x ⊗ v + x ⊗ w), (6.26)
and

(A ⊗ B)[(cx) ⊗ v] = (A ⊗ B)[c(x ⊗ v)] = (A ⊗ B)[x ⊗ (cv)], (6.27)



A ⊗ B is well-defined. Let us see how this “tensor” map works on a small


example.

Example 6.5.7 Consider the linear maps given by matrix multiplication with the matrices
\[ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} : \mathbb{F}^2 \to \mathbb{F}^2, \qquad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} : \mathbb{F}^2 \to \mathbb{F}^2. \]

Then

(A ⊗ B)(e1 ⊗ e1 ) = (a11 e1 + a21 e2 ) ⊗ (b11 e1 + b21 e2 ) =

a11 b11 e1 ⊗ e1 + a11 b21 e1 ⊗ e2 + a21 b11 e2 ⊗ e1 + a21 b21 e2 ⊗ e2 .


Similarly,

(A⊗B)(e1 ⊗e2 ) = a11 b12 e1 ⊗e1 +a11 b22 e1 ⊗e2 +a21 b12 e2 ⊗e1 +a21 b22 e2 ⊗e2 ,

(A⊗B)(e2 ⊗e1 ) = a12 b11 e1 ⊗e1 +a12 b21 e1 ⊗e2 +a22 b11 e2 ⊗e1 +a22 b21 e2 ⊗e2 ,
(A⊗B)(e2 ⊗e2 ) = a12 b12 e1 ⊗e1 +a12 b22 e1 ⊗e2 +a22 b12 e2 ⊗e1 +a22 b22 e2 ⊗e2 .
Thus, if we take the canonical basis E = {e1 ⊗ e1, e1 ⊗ e2, e2 ⊗ e1, e2 ⊗ e2}, we obtain that
\[ [A \otimes B]_{E \leftarrow E} = \begin{pmatrix}
a_{11} b_{11} & a_{11} b_{12} & a_{12} b_{11} & a_{12} b_{12} \\
a_{11} b_{21} & a_{11} b_{22} & a_{12} b_{21} & a_{12} b_{22} \\
a_{21} b_{11} & a_{21} b_{12} & a_{22} b_{11} & a_{22} b_{12} \\
a_{21} b_{21} & a_{21} b_{22} & a_{22} b_{21} & a_{22} b_{22}
\end{pmatrix}. \]
Note that we may write this as
\[ [A \otimes B]_{E \leftarrow E} = \begin{pmatrix} a_{11} B & a_{12} B \\ a_{21} B & a_{22} B \end{pmatrix}. \]
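
In MATLAB this block matrix is produced by the command kron (a sketch of ours; the numerical entries are arbitrary test values):

A=[1 2; 3 4]; B=[0 5; 6 7];
kron(A,B)       % the 4 x 4 block matrix [a11*B a12*B; a21*B a22*B]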

The above example indicates how to find a matrix representation for T ⊗ S in general.

Proposition 6.5.8 Let V1 , V2 , W1 , W2 be vector spaces over F with bases


B1 = {xj : j = 1, . . . , n1 }, B1 = {vj : j = 1, . . . , n2 },
C1 = {yj : j = 1, . . . , m1 }, C2 = {wj : j = 1, . . . , m2 }, respectively. For
V1 ⊗ V2 and W1 ⊗ W2 , we choose the bases

E = {xj ⊗ vl : j = 1, . . . , n1 , l = 1, . . . , n2 },

F = {yj ⊗ wl : j = 1, . . . , m1 , l = 1, . . . , m2 },
Constructing New Vector Spaces from Given Ones 177

respectively, where we order the elements lexicographically. If T : V1 → W1


and S : V2 → W2 are linear maps, with matrix representations

A = (ajl )m n1
j=1,l=1 = [T ]C1 ←B1 , B = [S]C2 ←B2 ,
1

then the matrix representation for T ⊗ S is given by the (m1 m2 ) × (n1 n2 )


matrix
$$[T \otimes S]_{F \leftarrow E} = \begin{pmatrix} a_{11}B & \cdots & a_{1,n_1}B \\ \vdots & & \vdots \\ a_{m_1,1}B & \cdots & a_{m_1,n_1}B \end{pmatrix}. \qquad (6.28)$$

Remark 6.5.9 Sometimes the matrix in (6.28) is taken as the definition of


A ⊗ B. It is important to realize that this particular form of the matrix is
due to the chosen (lexicographically ordered) bases of the underlying spaces,
and that changing the convention for choosing these bases will also change
the matrix.

Proof of Proposition 6.5.8. Writing $B = (b_{ij})_{i=1,\,j=1}^{m_2,\,n_2}$, we have that
$$(T \otimes S)(x_j \otimes v_l) = \sum_{r=1}^{m_1}\sum_{s=1}^{m_2} a_{rj} b_{sl}\; y_r \otimes w_s , \qquad j = 1, \ldots , n_1 , \; l = 1, \ldots , n_2 .$$

Organizing this information appropriately in the representation matrix, we


find that (6.28) holds. 

Several important properties of linear maps carry over to their tensor


products. We first note the following.

Lemma 6.5.10 Let T : V1 → W1 , T̂ : W1 → Z1 , S : V2 → W2 , and Ŝ : W2 → Z2
be linear maps. Then

(T̂ ⊗ Ŝ)(T ⊗ S) = (T̂ T ) ⊗ (ŜS).

Proof. For a simple tensor x ⊗ v we clearly have that

(T̂ ⊗Ŝ)(T ⊗S)(x⊗v) = (T̂ ⊗Ŝ)(T x⊗Sv) = (T̂ T x)⊗(ŜSv) = (T̂ T )⊗(ŜS)(x⊗v).

But then (T̂ ⊗ Ŝ)(T ⊗ S) and (T̂ T ) ⊗ (ŜS) also act the same on linear
combinations of simple tensors. Thus the lemma follows. 

Proposition 6.5.11 Let T : V1 → W1 and S : V2 → W2 be linear, where the


vector spaces are over F. Then the following hold:

(i) If T and S are invertible, then so is T ⊗ S and (T ⊗ S)−1 = T −1 ⊗ S −1 .


(ii) If V1 = W1 and V2 = W2 , and x and v are eigenvectors for T and S
with eigenvalues λ and µ, respectively, then x ⊗ v is an eigenvector for
T ⊗ S, with eigenvalue λµ; thus (T ⊗ S)(x ⊗ v) = λµ(x ⊗ v).

For the remaining parts, the vector spaces are assumed to be inner product
spaces (and thus necessarily, F = R or C), and the inner product on the
tensor product is given via the construction in Proposition 6.5.6.

(iii) (T ⊗ S)? = T ? ⊗ S ? .
(iv) If T and S are isometries, then so is T ⊗ S.
(v) If T and S are unitary, then so is T ⊗ S.
(vi) If T and S are normal, then so is T ⊗ S.
(vii) If T and S are Hermitian, then so is T ⊗ S.
(viii) If T and S are positive (semi-)definite, then so is T ⊗ S.

Proof. The proof is straightforward. For instance, using Lemma 6.5.10,

(T ⊗ S)(T −1 ⊗ S −1 ) = (T T −1 ) ⊗ (SS −1 ) = idW1 ⊗ idW2 = idW1 ⊗W2 ,

and

(T −1 ⊗ S −1 )(T ⊗ S) = (T −1 T ) ⊗ (S −1 S) = idV1 ⊗ idV2 = idV1 ⊗V2 ,

proving (i).

For parts (iii)–(viii) it is important to observe that

h(T ⊗ S)(x ⊗ v), y ⊗ wi = hT x ⊗ Sv, y ⊗ wi = hT x, yihSv, wi =

hx, T ? yihv, S ? wi = hx ⊗ v, T ? y ⊗ S ? wi = hx ⊗ v, (T ? ⊗ S ? )(y ⊗ w)i.


This equality extends to linear combinations of simple tensors, showing (iii).

The remaining details of the proof are left to the reader. For part (viii) use
that T is positive semidefinite if and only if T = CC ∗ for some C, which can
be chosen to be invertible when T is positive definite. 
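As an aside (not from the text), property (ii) and property (vii) are easy to check numerically. The sketch below assumes NumPy; it builds random symmetric T and S, so that T ⊗ S is symmetric as well, and verifies that the eigenvalues of the Kronecker product are exactly the products λ_i µ_j.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((3, 3)); T = T + T.T   # symmetric, hence real eigenvalues
S = rng.standard_normal((2, 2)); S = S + S.T

lam = np.linalg.eigvalsh(T)
mu = np.linalg.eigvalsh(S)

products = np.sort(np.outer(lam, mu).ravel())          # all products λ_i * µ_j
eigs_kron = np.sort(np.linalg.eigvalsh(np.kron(T, S))) # eigenvalues of T ⊗ S
assert np.allclose(products, eigs_kron)
```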

The theory we developed in this section for two vector spaces can also be
extended to a tensor product V1 ⊗ · · · ⊗ Vk of k vector spaces. In that case
V1 ⊗ · · · ⊗ Vk is generated by elements

v1 ⊗ · · · ⊗ vk ,

where v1 ∈ V1 , . . . , vk ∈ Vk . The tensor product needs to satisfy the rules


(v1 ⊗· · ·⊗vr ⊗· · ·⊗vk )+(v1 ⊗· · ·⊗ v̂r ⊗· · ·⊗vk ) = v1 ⊗· · ·⊗(vr + v̂r )⊗· · ·⊗vk ,
(6.29)
and
v1 ⊗ · · · ⊗ (cvr ) ⊗ · · · ⊗ vk = c(v1 ⊗ · · · ⊗ vr ⊗ · · · ⊗ vk ). (6.30)
Alternatively, one can first construct V1 ⊗ V2 , and then (V1 ⊗ V2 ) ⊗ V3 and so
forth, arriving at a vector space generated by elements
(· · · ((v1 ⊗ v2 ) ⊗ v3 ) ⊗ · · · ) ⊗ vk .
These vector spaces V1 ⊗ · · · ⊗ Vk and (· · · ((V1 ⊗ V2 ) ⊗ V3 ) ⊗ · · · ) ⊗ Vk are
isomorphic, by introducing the isomorphism Φ via
Φ(v1 ⊗ · · · ⊗ vk ) = (· · · ((v1 ⊗ v2 ) ⊗ v3 ) ⊗ · · · ) ⊗ vk .
As these vector spaces are isomorphic, we will not draw a distinction
between them and treat the tensor product as an associative operation, so
that for instance
(v ⊗ w) ⊗ x = v ⊗ w ⊗ x = v ⊗ (w ⊗ x).
In the following section, we will use the tensor product of k vector spaces,
where each vector space is the same vector space. In other words,
V1 = · · · = Vk = V . In this case we write
V1 ⊗ · · · ⊗ Vk = V ⊗ · · · ⊗ V =: ⊗k V.

6.6 Anti-symmetric and symmetric tensors

In this section we define two important subspaces of V ⊗ · · · ⊗ V =: ⊗k V ,


the vector space obtained by taking a vector space V and taking the kth
tensor product of itself. Elements in ⊗k V are linear combinations of vectors
v1 ⊗ · · · ⊗ vk , where v1 , . . . , vk ∈ V .

The anti-symmetric tensor product of vectors v1 , . . . , vk ∈ V is defined to be


the vector
$$v_1 \wedge \cdots \wedge v_k = \sum_{\sigma \in S_k} \operatorname{sign}\sigma \; v_{\sigma(1)} \otimes \cdots \otimes v_{\sigma(k)} ,$$

where Sk denotes the set of all permutations on {1, . . . , k} and signσ = 1


when σ is an even permutation and signσ = −1 when σ is an odd
permutation.
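In coordinates this definition can be carried out literally; the following small sketch (assuming NumPy, with helper names of our own choosing) represents each $v_{\sigma(1)} \otimes \cdots \otimes v_{\sigma(k)}$ by an iterated Kronecker product and sums over all permutations with signs.

```python
import numpy as np
from itertools import permutations
from functools import reduce

def sign(perm):
    """Sign of a permutation given as a tuple of the indices 0..k-1."""
    s, perm = 1, list(perm)
    for i in range(len(perm)):
        while perm[i] != i:          # sort by transpositions, flipping the sign each swap
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]
            s = -s
    return s

def wedge(*vectors):
    """Anti-symmetric tensor product v1 ∧ ... ∧ vk as a vector in F^(n^k)."""
    k = len(vectors)
    return sum(sign(p) * reduce(np.kron, [vectors[i] for i in p])
               for p in permutations(range(k)))

e1, e2 = np.eye(2)
print(wedge(e1, e2))   # [0, 1, -1, 0], matching Example 6.6.1 below
```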

Example 6.6.1 In $\mathbb{F}^2$, we have
$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} \wedge \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} - \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} \leftrightarrow \begin{pmatrix} 0 \\ 1 \\ -1 \\ 0 \end{pmatrix} \in \mathbb{F}^4 .$$

In $\mathbb{F}^3$, we have
$$\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \wedge \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \leftrightarrow \begin{pmatrix} 0 \\ 1 \\ 0 \\ -1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \wedge \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \leftrightarrow \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ -1 \\ 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \wedge \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \leftrightarrow \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ -1 \\ 0 \end{pmatrix} \in \mathbb{F}^9 .$$

Lemma 6.6.2 The anti-symmetric tensor is linear in each of its parts; that
is

v1 ∧· · ·∧(cvi +dv̂i )∧· · ·∧vk = c(v1 ∧· · ·∧vi ∧· · ·∧vk )+d(v1 ∧· · ·∧v̂i ∧· · ·∧vk ).

Proof. Follows immediately from the corresponding property of the tensor


product. 

Proposition 6.6.3 If two vectors in an anti-symmetric tensor are switched,


it will change sign; that is,

v1 ∧ · · · ∧ vi ∧ · · · ∧ vj ∧ · · · ∧ vk = −v1 ∧ · · · ∧ vj ∧ · · · ∧ vi ∧ · · · ∧ vk .

Proof. Let $\tau = (i\;j)$ be the permutation that switches $i$ and $j$. Then
$$v_1 \wedge \cdots \wedge v_j \wedge \cdots \wedge v_i \wedge \cdots \wedge v_k = \sum_{\sigma \in S_k} \operatorname{sign}\sigma\; v_{\sigma(\tau(1))} \otimes \cdots \otimes v_{\sigma(\tau(k))}$$
$$= -\sum_{\sigma \in S_k} \operatorname{sign}(\sigma\tau)\; v_{\sigma(\tau(1))} \otimes \cdots \otimes v_{\sigma(\tau(k))} = -\sum_{\hat{\sigma} \in S_k} \operatorname{sign}\hat{\sigma}\; v_{\hat{\sigma}(1)} \otimes \cdots \otimes v_{\hat{\sigma}(k)} = -v_1 \wedge \cdots \wedge v_i \wedge \cdots \wedge v_j \wedge \cdots \wedge v_k ,$$
where we used that if σ runs through all of Sk , then so does σ̂ = στ . 

An immediate consequence is the following.



Corollary 6.6.4 If a vector appears twice in an anti-symmetric tensor, then


the anti-symmetric tensor is zero; that is,

v1 ∧ · · · ∧ vi ∧ · · · ∧ vi ∧ · · · ∧ vk = 0.

Proof. When F is such that 2 ≠ 0, this is a consequence of Proposition 6.6.3 as
follows. Let f = v1 ∧ · · · ∧ vi ∧ · · · ∧ vi ∧ · · · ∧ vk . By Proposition 6.6.3 we
have that f = −f . Thus 2f = 0. As 2 ≠ 0, we obtain that f = 0.

When F is so that 1 + 1 = 0 (which is also referred to as a field of


characteristic 2), we have the following argument which actually works for
any field. Let Ek = {σ ∈ Sk : σ is even}. Let i and j be the two locations
where vi appears, and let τ = (i j). Then all odd permutations in Sk are of
the form στ with σ ∈ Ek . Thus

$$v_1 \wedge \cdots \wedge v_i \wedge \cdots \wedge v_i \wedge \cdots \wedge v_k = \sum_{\sigma \in E_k} \big( \operatorname{sign}\sigma\; v_{\sigma(1)} \otimes \cdots \otimes v_{\sigma(k)} + \operatorname{sign}(\sigma\tau)\; v_{\sigma(\tau(1))} \otimes \cdots \otimes v_{\sigma(\tau(k))} \big).$$

As sign(στ ) = −signσ and

vσ(1) ⊗ · · · ⊗ vσ(k) = vσ(τ (1)) ⊗ · · · ⊗ vσ(τ (k)) ,

we get that all terms cancel. 

We define

∧k V := Span{v1 ∧ · · · ∧ vk : vj ∈ V , j = 1, . . . , k}.

Then ∧k V is a subspace of ⊗k V .

Proposition 6.6.5 When $\dim V = n$, then $\dim \wedge^k V = \binom{n}{k}$. In fact, if
$\{v_1 , \ldots , v_n\}$ is a basis of $V$, then
$E = \{v_{i_1} \wedge \cdots \wedge v_{i_k} : 1 \le i_1 < \cdots < i_k \le n\}$ is a basis of $\wedge^k V$.

Proof. Let f ∈ ∧k V . Then f is a linear combination of elements of the form


$x_1 \wedge \cdots \wedge x_k$ where $x_j \in V$, $j = 1, \ldots , k$. Each $x_j$ is a linear combination of
$v_1 , \ldots , v_n$, thus $x_j = \sum_{l=1}^{n} c_{lj} v_l$ for some scalars $c_{lj}$. Plugging this into
$x_1 \wedge \cdots \wedge x_k$ and using Lemma 6.6.2 we get that
$$x_1 \wedge \cdots \wedge x_k = \sum_{l_1 , \ldots , l_k = 1}^{n} c_{l_1,1} \cdots c_{l_k,k}\; v_{l_1} \wedge \cdots \wedge v_{l_k} .$$

When $l_r = l_s$ for some $r \ne s$, we have that $v_{l_1} \wedge \cdots \wedge v_{l_k} = 0$, so we only
have nonzero terms when all $l_1 , \ldots , l_k$ are different. Moreover, by applying

Proposition 6.6.3, we can always do several switches so that vl1 ∧ · · · ∧ vlk


turns into vi1 ∧ · · · ∧ vik or −vi1 ∧ · · · ∧ vik , where now i1 < · · · < ik (and
{i1 , . . . , ik } = {l1 , . . . , lk }). Putting these observations together, we obtain
that f ∈ Span E.

For linear independence, suppose that $\sum_{1 \le i_1 < \cdots < i_k \le n} c_{i_1,\ldots,i_k} v_{i_1} \wedge \cdots \wedge v_{i_k} = 0$ for some scalars $c_{i_1,\ldots,i_k} \in \mathbb{F}$,
$1 \le i_1 < \cdots < i_k \le n$. Putting in the definition of $v_{i_1} \wedge \cdots \wedge v_{i_k}$, we arrive at
the equality
$$\sum_{l_1 , \ldots , l_k = 1}^{n} a_{l_1,\ldots,l_k}\; v_{l_1} \otimes \cdots \otimes v_{l_k} = 0 , \qquad (6.31)$$

where either $a_{l_1,\ldots,l_k} = 0$ (when $l_r = l_s$ for some $r \ne s$) or $a_{l_1,\ldots,l_k}$
equals one of the numbers $\pm c_{i_1,\ldots,i_k}$. As the tensors
$\{v_{l_1} \otimes \cdots \otimes v_{l_k} : l_j = 1, \ldots , n ,\; j = 1, \ldots , k\}$ form a basis of $\otimes^k V$, we have
that (6.31) implies that all $a_{l_1,\ldots,l_k}$ are equal to 0. This implies that all
$c_{i_1,\ldots,i_k}$ are equal to 0. This shows that $E$ is linearly independent.

It remains to observe that the number of elements of $E$ corresponds to the
number of ways one can choose $k$ numbers from $\{1, \ldots , n\}$, which equals $\binom{n}{k}$.


Remark 6.6.6 Note that when V has an inner product, and when
{v1 , . . . , vn } is chosen to be an orthonormal basis of V , the basis E is not
orthonormal. It is however an orthogonal basis, and thus all that needs to be
done is to make the elements of E of unit length. As all have length $\sqrt{k!}$, one
obtains an orthonormal basis for ∧k V by taking
$$E_{on} = \Big\{ \frac{1}{\sqrt{k!}} (v_{i_1} \wedge \cdots \wedge v_{i_k}) : 1 \le i_1 < \cdots < i_k \le n \Big\}.$$

We next analyze how linear operators and anti-symmetric tensors interact.

Proposition 6.6.7 Let T : V → W be a linear map. Denote


⊗k T = T ⊗ · · · ⊗ T : ⊗k V → ⊗k W . Then ⊗k T [∧k V ] ⊆ ∧k W .

Proof. Let $v_1 \wedge \cdots \wedge v_k \in \wedge^k V$. Then
$$\otimes^k T (v_1 \wedge \cdots \wedge v_k) = \otimes^k T \Big( \sum_{\sigma \in S_k} \operatorname{sign}\sigma\; v_{\sigma(1)} \otimes \cdots \otimes v_{\sigma(k)} \Big) = \sum_{\sigma \in S_k} \operatorname{sign}\sigma\; \otimes^k T (v_{\sigma(1)} \otimes \cdots \otimes v_{\sigma(k)})$$
$$= \sum_{\sigma \in S_k} \operatorname{sign}\sigma\; (T v_{\sigma(1)}) \otimes \cdots \otimes (T v_{\sigma(k)}) = (T v_1) \wedge \cdots \wedge (T v_k) \in \wedge^k W.$$

As every element of ∧k V is a linear combination of elements of the type


v1 ∧ · · · ∧ vk , we obtain the result. 

The restriction of ⊗k T to the subspace ∧k V is denoted as ∧k T .


Equivalently, ∧k T : ∧k V → ∧k W is defined via
∧k T (v1 ∧ · · · ∧ vk ) = (T v1 ) ∧ · · · ∧ (T vk ).
Let us find ∧k T in the following example.

Example 6.6.8 Let T : F3 → F3 be multiplication with the matrix


A = (aij )3i,j=1 . Thus T (x) = Ax. The standard basis on ∧2 F3 is
E = {e1 ∧ e2 , e1 ∧ e3 , e2 ∧ e3 }. Let us compute [∧2 T ]E←E . We apply ∧2 T to
the first basis element:
∧2 T (e1 ∧e2 ) = (T e1 )∧(T e2 ) = (a11 e1 +a21 e2 +a31 e3 )∧(a12 e1 +a22 e2 +a32 e3 ) =
(a11 a22 − a21 a12 )e1 ∧ e2 + (a11 a32 − a31 a12 )e1 ∧ e3 + (a21 a32 − a31 a22 )e2 ∧ e3 .
Continuing, we find
$$[\wedge^2 T]_{E \leftarrow E} = \begin{pmatrix} a_{11}a_{22} - a_{21}a_{12} & a_{11}a_{23} - a_{21}a_{13} & a_{12}a_{23} - a_{22}a_{13} \\ a_{11}a_{32} - a_{31}a_{12} & a_{11}a_{33} - a_{31}a_{13} & a_{12}a_{33} - a_{32}a_{13} \\ a_{21}a_{32} - a_{31}a_{22} & a_{21}a_{33} - a_{31}a_{23} & a_{22}a_{33} - a_{32}a_{23} \end{pmatrix}.$$
If we denote $A[I, J] = (a_{ij})_{i \in I, j \in J}$, then we obtain
$$[\wedge^2 T]_{E \leftarrow E} = \begin{pmatrix} \det A[\{1,2\},\{1,2\}] & \det A[\{1,2\},\{1,3\}] & \det A[\{1,2\},\{2,3\}] \\ \det A[\{1,3\},\{1,2\}] & \det A[\{1,3\},\{1,3\}] & \det A[\{1,3\},\{2,3\}] \\ \det A[\{2,3\},\{1,2\}] & \det A[\{2,3\},\{1,3\}] & \det A[\{2,3\},\{2,3\}] \end{pmatrix}. \qquad (6.32)$$
The matrix on the right-hand side of (6.32) is called the second compound
matrix of $A$. In general, for an $n \times n$ matrix $A$ the $k$th compound matrix of $A$ is
a $\binom{n}{k} \times \binom{n}{k}$ matrix whose entries are $\det A[I, J]$, where $I$ and $J$ run through
all subsets of $\{1, \ldots , n\}$ with $k$ elements. This matrix corresponds to
$[\wedge^k T]_{E \leftarrow E}$ when $T(x) = Ax$ and $E = \{e_{i_1} \wedge \cdots \wedge e_{i_k} : 1 \le i_1 < \cdots < i_k \le n\}$.
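A compound matrix is easy to build directly from the minors, as in (6.32). The sketch below (assuming NumPy; the helper name `compound` is ours) does exactly that and checks numerically the multiplicativity $(\wedge^k A)(\wedge^k B) = \wedge^k (AB)$ of Lemma 6.6.9 below.

```python
import numpy as np
from itertools import combinations

def compound(A, k):
    """k-th compound matrix of a square matrix A, indices in lexicographic order."""
    n = A.shape[0]
    idx = list(combinations(range(n), k))
    return np.array([[np.linalg.det(A[np.ix_(I, J)]) for J in idx] for I in idx])

rng = np.random.default_rng(1)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
assert np.allclose(compound(A @ B, 2), compound(A, 2) @ compound(B, 2))
```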

Lemma 6.6.9 Let T : V → W and S : X → Y be linear, where the vector


spaces are over F. Then
(∧k T )(∧k S) = ∧k (T S).

Proof. Since (⊗k T )(⊗k S) = ⊗k (T S), and as in general ∧k U is defined as


⊗k U on the subspace of anti-symmetric tensors, the lemma follows. 

Proposition 6.6.10 Let T : V → W be linear, where the vector spaces are


over F. Then the following hold:

(i) If T is invertible, then so is ∧k T and (∧k T )−1 = ∧k (T −1 ).


(ii) If V = W and x1 , . . . , xk are linearly independent eigenvectors for T
with eigenvalues λ1 , . . . , λk , respectively, then x1 ∧ · · · ∧ xk is an
eigenvector for ∧k T , with eigenvalue $\prod_{i=1}^{k} \lambda_i = \lambda_1 \cdots \lambda_k$; thus
∧k T (x1 ∧ · · · ∧ xk ) = λ1 · · · λk (x1 ∧ · · · ∧ xk ).

For the remaining parts, V is assumed to be an inner product space (and


thus necessarily, F = R or C), and the inner product on ∧k V is inherited from
the inner product on the tensor product given via the construction in
Proposition 6.5.6.

(iii) (∧k T )? = ∧k T ? .
(iv) If T is an isometry, then so is ∧k T .
(v) If T is unitary, then so is ∧k T .
(vi) If T is normal, then so is ∧k T .
(vii) If T is Hermitian, then so is ∧k T .
(viii) If T is positive (semi-)definite, then so is ∧k T .

Proof. Use Lemma 6.6.9 and Proposition 6.5.11. 

We next switch to the symmetric tensor product. The development is very


similar to the anti-symmetric tensor product, where now determinants are
replaced by permanents (see (6.34) for the definition). The symmetric tensor
product of vectors $v_1 , \ldots , v_k \in V$ is defined to be the vector
$$v_1 \vee \cdots \vee v_k = \sum_{\sigma \in S_k} v_{\sigma(1)} \otimes \cdots \otimes v_{\sigma(k)} .$$

Thus, the difference with the anti-symmetric tensor product is the absence
of the factor signσ.

Example 6.6.11 In $\mathbb{F}^2$ with $k = 2$, we have
$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} \vee \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} \leftrightarrow \begin{pmatrix} 2 \\ 0 \\ 0 \\ 0 \end{pmatrix} \in \mathbb{F}^4$$
and
$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} \vee \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} \leftrightarrow \begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \end{pmatrix} \in \mathbb{F}^4 .$$

In $\mathbb{F}^2$ with $k = 3$, we have
$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} \vee \begin{pmatrix} 1 \\ 0 \end{pmatrix} \vee \begin{pmatrix} 1 \\ 0 \end{pmatrix} \leftrightarrow \begin{pmatrix} 6 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \in \mathbb{F}^8 , \qquad \begin{pmatrix} 1 \\ 0 \end{pmatrix} \vee \begin{pmatrix} 1 \\ 0 \end{pmatrix} \vee \begin{pmatrix} 0 \\ 1 \end{pmatrix} \leftrightarrow \begin{pmatrix} 0 \\ 2 \\ 2 \\ 0 \\ 2 \\ 0 \\ 0 \\ 0 \end{pmatrix} \in \mathbb{F}^8 .$$

Lemma 6.6.12 The symmetric tensor is linear in each of its parts; that is

v1 ∨· · ·∨(cvi +dv̂i )∨· · ·∨vk = c(v1 ∨· · ·∨vi ∨· · ·∨vk )+d(v1 ∨· · ·∨v̂i ∨· · ·∨vk ).

Proof. Follows immediately from the corresponding property of the tensor


product. 

Proposition 6.6.13 If two vectors in a symmetric tensor are switched, it


will not change the symmetric tensor; that is,

v1 ∨ · · · ∨ vi ∨ · · · ∨ vj ∨ · · · ∨ vk = v1 ∨ · · · ∨ vj ∨ · · · ∨ vi ∨ · · · ∨ vk .

Proof. Let $\tau = (i\;j)$ be the permutation that switches $i$ and $j$. Then
$$v_1 \vee \cdots \vee v_j \vee \cdots \vee v_i \vee \cdots \vee v_k = \sum_{\sigma \in S_k} v_{\sigma(\tau(1))} \otimes \cdots \otimes v_{\sigma(\tau(k))} = \sum_{\hat{\sigma} \in S_k} v_{\hat{\sigma}(1)} \otimes \cdots \otimes v_{\hat{\sigma}(k)} = v_1 \vee \cdots \vee v_i \vee \cdots \vee v_j \vee \cdots \vee v_k ,$$
where we used that if σ runs through all of Sk , then so does σ̂ = στ . 

We define

∨k V := Span{v1 ∨ · · · ∨ vk : vj ∈ V , j = 1, . . . , k}.

Then ∨k V is a subspace of ⊗k V .

Proposition 6.6.14 When $\dim V = n$, then $\dim \vee^k V = \binom{n+k-1}{k}$. In fact, if
$\{v_1 , \ldots , v_n\}$ is a basis of $V$, then
$F = \{v_{i_1} \vee \cdots \vee v_{i_k} : 1 \le i_1 \le \cdots \le i_k \le n\}$ is a basis of $\vee^k V$.

Proof. Let f ∈ ∨k V . Then f is a linear combination of elements of the form


$x_1 \vee \cdots \vee x_k$ where $x_j \in V$, $j = 1, \ldots , k$. Each $x_j$ is a linear combination of
$v_1 , \ldots , v_n$, thus $x_j = \sum_{l=1}^{n} c_{lj} v_l$ for some scalars $c_{lj}$. Plugging this into
$x_1 \vee \cdots \vee x_k$ and using Lemma 6.6.12 we get that
$$x_1 \vee \cdots \vee x_k = \sum_{l_1 , \ldots , l_k = 1}^{n} c_{l_1,1} \cdots c_{l_k,k}\; v_{l_1} \vee \cdots \vee v_{l_k} .$$

By applying Proposition 6.6.13, we can always do several switches so that


vl1 ∨ · · · ∨ vlk turns into vi1 ∨ · · · ∨ vik , where now i1 ≤ · · · ≤ ik . Putting
these observations together, we obtain that f ∈ Span F.

For linear independence, suppose that $\sum_{1 \le i_1 \le \cdots \le i_k \le n} c_{i_1,\ldots,i_k} v_{i_1} \vee \cdots \vee v_{i_k} = 0$ for some scalars $c_{i_1,\ldots,i_k} \in \mathbb{F}$,
$1 \le i_1 \le \cdots \le i_k \le n$. Putting in the definition of $v_{i_1} \vee \cdots \vee v_{i_k}$, we arrive at
the equality
$$\sum_{l_1 , \ldots , l_k = 1}^{n} a_{l_1,\ldots,l_k}\; v_{l_1} \otimes \cdots \otimes v_{l_k} = 0 , \qquad (6.33)$$

where al1 ,...,lk equals one of the numbers ci1 ,...,ik or a positive integer
multiple of it. As the tensors {vl1 ⊗ · · · ⊗ vlk : lj = 1, . . . , n, j = 1, . . . , k}
form a basis of ⊗k V , we have that (6.33) implies that al1 ,...,lk are all equal to
0. This implies that all ci1 ,...,ik are equal to 0. This shows that F is linearly
independent.

It remains to observe that the number of elements of $F$ equals $\binom{n+k-1}{k}$. To
see this, one chooses $k$ numbers $m_1 < \cdots < m_k$ among the numbers $2, 3, \ldots , n+k$.
This choice corresponds in a one-to-one way to a choice
$1 \le i_1 \le \cdots \le i_k \le n$, by letting $i_j = m_j - j$, $j = 1, \ldots , k$. 

Remark 6.6.15 Note that when V has an inner product, and when
{v1 , . . . , vn } is chosen to be an orthonormal basis of V , the basis F is not
orthonormal. It is however an orthogonal basis, and thus all that needs to be
done is to make the elements of F of unit length. The elements of F have
different lengths, so some care needs to be taken in doing this.

We next analyze how linear operators and symmetric tensors interact.

Proposition 6.6.16 Let T : V → W be a linear map. Denote


⊗k T = T ⊗ · · · ⊗ T : ⊗k V → ⊗k W . Then ⊗k T [∨k V ] ⊆ ∨k W .

Proof. Let $v_1 \vee \cdots \vee v_k \in \vee^k V$. Then
$$\otimes^k T (v_1 \vee \cdots \vee v_k) = \otimes^k T \Big( \sum_{\sigma \in S_k} v_{\sigma(1)} \otimes \cdots \otimes v_{\sigma(k)} \Big) = \sum_{\sigma \in S_k} \otimes^k T (v_{\sigma(1)} \otimes \cdots \otimes v_{\sigma(k)})$$
$$= \sum_{\sigma \in S_k} (T v_{\sigma(1)}) \otimes \cdots \otimes (T v_{\sigma(k)}) = (T v_1) \vee \cdots \vee (T v_k) \in \vee^k W.$$

As every element of ∨k V is a linear combination of elements of the type


v1 ∨ · · · ∨ vk , we obtain the result. 

The restriction of ⊗k T to the subspace ∨k V is denoted as ∨k T . Thus,


∨k T : ∨k V → ∨k W is defined via

∨k T (v1 ∨ · · · ∨ vk ) = (T v1 ) ∨ · · · ∨ (T vk ).

For a matrix $B = (b_{ij})_{i,j=1}^{n}$ we define its permanent by
$$\operatorname{per} B = \sum_{\sigma \in S_n} b_{1,\sigma(1)} \cdots b_{n,\sigma(n)} . \qquad (6.34)$$

This is almost the same expression as for the determinant except that signσ
does not appear. Thus all terms have a +. For instance,
$$\operatorname{per} \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = b_{11} b_{22} + b_{21} b_{12} .$$
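Definition (6.34) can be computed by brute force for small matrices; the sketch below (assuming NumPy, with the helper name `per` of our own choosing) is the determinant expansion with every sign replaced by +1.

```python
import numpy as np
from itertools import permutations

def per(B):
    """Permanent of a square matrix, by summing over all permutations."""
    n = B.shape[0]
    return sum(np.prod([B[i, s[i]] for i in range(n)]) for s in permutations(range(n)))

B = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(per(B))   # b11*b22 + b21*b12 = 1*4 + 3*2 = 10
```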

Example 6.6.17 Let T : F2 → F2 be matrix multiplication with the matrix


A = (aij )2i,j=1 . The standard basis on ∨2 F2 is F = {e1 ∨ e1 , e1 ∨ e2 , e2 ∨ e2 }.
Let us compute [∨2 T ]F ←F . We apply ∨2 T to the first basis element:

∨2 T (e1 ∨ e1 ) = (T e1 ) ∨ (T e1 ) = (a11 e1 + a21 e2 ) ∨ (a11 e1 + a21 e2 ) =

$a_{11}^2\, e_1 \vee e_1 + 2\, a_{11} a_{21}\, e_1 \vee e_2 + a_{21}^2\, e_2 \vee e_2$.


Similarly,

∨2 T (e1 ∨ e2 ) = (T e1 ) ∨ (T e2 ) = (a11 e1 + a21 e2 ) ∨ (a12 e1 + a22 e2 ) =

a11 a12 e1 ∨ e1 + (a11 a22 + a21 a12 )e1 ∨ e2 + a21 a22 e2 ∨ e2 .


Continuing, we find
$$[\vee^2 T]_{F \leftarrow F} = \begin{pmatrix} a_{11}^2 & a_{11}a_{12} & a_{12}^2 \\ 2a_{11}a_{21} & a_{11}a_{22} + a_{21}a_{12} & 2a_{12}a_{22} \\ a_{21}^2 & a_{21}a_{22} & a_{22}^2 \end{pmatrix}. \qquad (6.35)$$

Notice that the (2, 2) element is equal to perA[{1, 2}, {1, 2}].

Lemma 6.6.18 Let T : V → W and S : X → Y be linear, where the vector


spaces are over F. Then

(∨k T )(∨k S) = ∨k (T S).



Proof. Since (⊗k T )(⊗k S) = ⊗k (T S), and as in general ∨k U is defined as


⊗k U on the subspace of symmetric tensors, the lemma follows. 

Proposition 6.6.19 Let T : V → W be linear, where the vector spaces are


over F. Then the following hold:

(i) If T is invertible, then so is ∨k T and (∨k T )−1 = ∨k (T −1 ).


(ii) If V = W and x1 , . . . , xk are eigenvectors for T with eigenvalues
λ1 , . . . , λk , respectively, then x1 ∨ · · · ∨ xk is an eigenvector for ∨k T ,
with eigenvalue λ1 · · · λk ; thus
∨k T (x1 ∨ · · · ∨ xk ) = λ1 · · · λk (x1 ∨ · · · ∨ xk ).

For the remaining parts, V is assumed to be an inner product space (and


thus necessarily, F = R or C), and the inner product on ∨k V is inherited from
the inner product on the tensor product given via the construction in
Proposition 6.5.6.

(iii) (∨k T )? = ∨k T ? .
(iv) If T is an isometry, then so is ∨k T .
(v) If T is unitary, then so is ∨k T .
(vi) If T is normal, then so is ∨k T .
(vii) If T is Hermitian, then so is ∨k T .
(viii) If T is positive (semi-)definite, then so is ∨k T .

Proof. Use Lemma 6.6.18 and Proposition 6.5.11. 

Remark 6.6.20 It should be noted that when A = (aij )2i,j=1 is a Hermitian


matrix, the matrix in (6.35) is not (necessarily) Hermitian. Why is this not
in contradiction with Proposition 6.6.19 (vii)? The reason is that the
basis F = {e1 ∨ e1 , e1 ∨ e2 , e2 ∨ e2 } is not orthonormal. For the Hermitian
property to be necessarily reflected in the matrix representation, we need the
basis to be orthonormal. If we introduce
$$F_{on} = \Big\{ \tfrac{1}{2}(e_1 \vee e_1),\; \tfrac{1}{\sqrt{2}}(e_1 \vee e_2),\; \tfrac{1}{2}(e_2 \vee e_2) \Big\},$$
then
$$[\vee^2 T]_{F_{on} \leftarrow F_{on}} = \begin{pmatrix} a_{11}^2 & \sqrt{2}\, a_{11}a_{12} & a_{12}^2 \\ \sqrt{2}\, a_{11}a_{21} & a_{11}a_{22} + a_{21}a_{12} & \sqrt{2}\, a_{12}a_{22} \\ a_{21}^2 & \sqrt{2}\, a_{21}a_{22} & a_{22}^2 \end{pmatrix}, \qquad (6.36)$$

which now is Hermitian when A is. The same remark holds for the other
properties in Proposition 6.6.19 (iii)–(viii).

6.7 Exercises

Exercise 6.7.1 The purpose of this exercise is to show (the vector form of)
Minkowski’s inequality, which says that for complex numbers xi , yi ,
i = 1, . . . , n, and p ≥ 1, we have

$$\Big( \sum_{i=1}^{n} |x_i + y_i|^p \Big)^{\frac{1}{p}} \le \Big( \sum_{i=1}^{n} |x_i|^p \Big)^{\frac{1}{p}} + \Big( \sum_{i=1}^{n} |y_i|^p \Big)^{\frac{1}{p}} . \qquad (6.37)$$

Recall that a real-valued function f defined on an interval in R is called


convex if for all c, d in the domain of f , we have that

f (tc + (1 − t)d) ≤ tf (c) + (1 − t)f (d), 0 ≤ t ≤ 1.

(a) Show that $f(x) = -\log x$ is a convex function on $(0, \infty)$. (One can do
this by showing that $f''(x) \ge 0$.)
(b) Use (a) to show that for $a, b > 0$ and $p, q \ge 1$ with $\frac{1}{p} + \frac{1}{q} = 1$, we have
$ab \le \frac{a^p}{p} + \frac{b^q}{q}$. This inequality is called Young's inequality.

(c) Show Hölder's inequality: when $a_i, b_i \ge 0$, $i = 1, \ldots , n$, then
$$\sum_{i=1}^{n} a_i b_i \le \Big( \sum_{i=1}^{n} a_i^p \Big)^{\frac{1}{p}} \Big( \sum_{i=1}^{n} b_i^q \Big)^{\frac{1}{q}} .$$
(Hint: Let $\lambda = (\sum_{i=1}^{n} a_i^p)^{\frac{1}{p}}$ and $\mu = (\sum_{i=1}^{n} b_i^q)^{\frac{1}{q}}$, divide each $a_i$ by $\lambda$ and each $b_i$ by $\mu$, and use this to argue that it is enough to prove
the inequality when $\lambda = \mu = 1$. Next use (b).)
(d) Use (c) to prove (6.37) in the case when $x_i, y_i \ge 0$. (Hint: Write
$(x_i + y_i)^p = x_i (x_i + y_i)^{p-1} + y_i (x_i + y_i)^{p-1}$, take the sum on both sides,
and apply Hölder's inequality to each of the terms on the right-hand
side. Rework the resulting inequality, and use that $p + q = pq$.)

(e) Prove Minkowski’s inequality (6.37).



(f) Show that when Vi has a norm k · ki , i = 1, . . . , k, then for p ≥ 1 we have


that
$$\Big\| \begin{pmatrix} v_1 \\ \vdots \\ v_k \end{pmatrix} \Big\|_p := \Big( \sum_{i=1}^{k} \|v_i\|_i^p \Big)^{\frac{1}{p}}$$

defines a norm on V1 × · · · × Vk .

Exercise 6.7.2 Let V and Z be vector spaces over F and T : V → Z be


linear. Suppose W ⊆ Ker T . Show there exists a linear transformation
S : V /W → Ran T such that S(v + W ) = T v for v ∈ V . Show that S is
surjective and that Ker S is isomorphic to (Ker T )/W .

Exercise 6.7.3 Consider the vector space Fn×m , where F = R or F = C,


and let k · k be norm on Fn×m .

(a) Let $A = (a_{ij})_{i=1,j=1}^{n,m}$, $A_k = (a_{ij}^{(k)})_{i=1,j=1}^{n,m}$, $k = 1, 2, \ldots$, be matrices in
$\mathbb{F}^{n \times m}$. Show that $\lim_{k\to\infty} \|A_k - A\| = 0$ if and only if
$\lim_{k\to\infty} |a_{ij}^{(k)} - a_{ij}| = 0$ for every $i = 1, \ldots , n$ and $j = 1, \ldots , m$.
(b) Let n = m. Show that limk→∞ kAk − Ak = 0 and limk→∞ kBk − Bk = 0
imply that limk→∞ kAk Bk − ABk = 0.

Exercise 6.7.4 Given A ∈ Cn×n , we define its similarity orbit to be the set
of matrices
O(A) = {SAS −1 : S ∈ Cn×n is invertible}.
Thus the similarity orbit of a matrix A consists of all matrices that are
similar to A.

(a) Show that if A is diagonalizable, then its similarity orbit O(A) is closed.
(Hint: notice that due to A being diagonalizable, we have that B ∈ O(A)
if and only if mA (B) = 0.)
(b) Show that if A is not diagonalizable, then its similarity orbit is not
closed.

Exercise 6.7.5 Suppose that V is an infinite-dimensional vector space with


basis {vj }j∈J . Let fj ∈ V 0 , j ∈ J, be so that fj (vj ) = 1 and fj (vk ) = 0 for
k 6= j. Show that {fj }j∈J is a linearly independent set in V 0 but is not a
basis of V 0 .

Exercise 6.7.6 Describe the linear functionals on Cn [X] that form the dual
basis of {1, X, . . . , X n }.

Exercise 6.7.7 Let a0 , . . . , an be different complex numbers, and define


Ej ∈ (Cn [X])0 , j = 0, . . . , n, via Ej (p(X)) = p(aj ). Find a basis of Cn [X] for
which {E0 , . . . , En } is the dual basis.

Exercise 6.7.8 Let V = W +̇X.

(a) Show how given f ∈ W 0 and g ∈ X 0 , one can define h ∈ V 0 so that


h(w) = f (w) for w ∈ W and h(x) = g(x) for x ∈ X.
(b) Using the construction in part (a), show that V 0 = W 0 +̇X 0 . Here it is
understood that we view W 0 as a subspace of V 0 , by letting f ∈ W 0 be
defined on all of V by putting f (w + x) = f (w), when w ∈ W and
x ∈ X. Similarly, we view X 0 as a subspace of V 0 , by letting g ∈ X 0 be
defined on all of V by putting g(w + x) = g(x), when w ∈ W and x ∈ X.

Exercise 6.7.9 Let W be a subspace of V . Define


Wann = {f ∈ V 0 : f (w) = 0 for all w ∈ W },
the annihilator of W .

(a) Show that Wann is a subspace of V 0 .


   
(b) Determine the annihilator of ${\rm Span}\Big\{ \begin{pmatrix} 1 \\ -1 \\ 2 \\ -2 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix} \Big\} \subseteq \mathbb{C}^4$.
(c) Determine the annihilator of Span{1 + 2X, X + X 2 } ⊆ R3 [X].

Exercise 6.7.10 Let V be a finite-dimensional vector space over R, and let


{v1 , . . . , vk } be linearly independent. We define
$$C = \Big\{ v \in V : \text{there exist } c_1 , \ldots , c_k \ge 0 \text{ so that } v = \sum_{i=1}^{k} c_i v_i \Big\}.$$

Show that v ∈ C if and only if for all f ∈ V 0 with f (vj ) ≥ 0, j = 1, . . . , k, we


have that f (v) ≥ 0.

Remark. The statement is also true when {v1 , . . . , vk } are not linearly
independent, but in that case the proof is more involved. The corresponding
result is the Farkas–Minkowski Theorem, which plays an important role in
linear programming.

Exercise 6.7.11 Let V and W be finite-dimensional vector spaces and


A : V → W a linear map. Show that Av = w has a solution if and only if for
all f ∈ (RanA)ann we have that f (w) = 0. Here the definition of the
annihilator is used as defined in Exercise 6.7.9.

Exercise 6.7.12 For x, y ∈ R3 , let the cross product x × y be defined as in


(6.17).

(a) Show that hx, x × yi = hy, x × yi = 0.

(b) Show that x × y = −y × x.


(c) Show that x × y = 0 if and only if {x, y} is linearly dependent.

Exercise 6.7.13 Let


 
$$A = \begin{pmatrix} i & 1-i & 2-i \\ 1+i & -2 & -3+i \end{pmatrix}, \qquad B = \begin{pmatrix} -1 & 0 \\ -2 & 5 \\ 1 & 3 \end{pmatrix}.$$

Compute A ⊗ B and B ⊗ A, and show that they are similar via a


permutation matrix.

Exercise 6.7.14 Let A ∈ Fn×n and B ∈ Fm×m .

(a) Show that tr(A ⊗ B) = (tr A)(tr B).


(b) Show that rank(A ⊗ B) = (rank A)(rank B).

Exercise 6.7.15 Given Schur triangularization decompositions for A and


B, find a Schur triangularization decomposition for A ⊗ B. Conclude that if
λ1 , . . . , λn are the eigenvalues for A and µ1 , . . . , µm are the eigenvalues for
B, then λi µj , i = 1, . . . , n, j = 1, . . . , m, are the nm eigenvalues of A ⊗ B.

Exercise 6.7.16 Given singular value decompositions for A and B, find a


singular value decomposition for A ⊗ B. Conclude that if σ1 , . . . , σk are the
nonzero singular values for A and σ̂1 , . . . , σ̂l are the nonzero singular values
for B, then σi σ̂j , i = 1, . . . , k, j = 1, . . . , l, are the kl nonzero singular values
of A ⊗ B.

Exercise 6.7.17 Show that det(I ⊗ A + A ⊗ I) = (−1)n det pA (−A), where


A ∈ Cn×n .

Exercise 6.7.18 Show that if A is a matrix and f a function, so that f (A)


is well-defined, then f (Im ⊗ A) is well-defined as well, and
f (Im ⊗ A) = Im ⊗ f (A).

Exercise 6.7.19 For a diagonal matrix A = diag(λi )ni=1 , find matrix


representations for A ∧ A and A ∨ A using the canonical (lexicographically
ordered) bases for Fn ∧ Fn and Fn ∨ Fn , respectively.

Exercise 6.7.20 Show that


hv1 ∧ · · · ∧ vk , w1 ∧ · · · ∧ wk i = k! det(hvi , wj i)ki,j=1 .

Exercise 6.7.21 Find an orthonormal basis for ∨2 C3 .

Exercise 6.7.22 (a) Let $A = (a_{ij})_{i=1,j=1}^{2,m} \in \mathbb{F}^{2 \times m}$ and
$B = (b_{ij})_{i=1,j=1}^{m,2} \in \mathbb{F}^{m \times 2}$. Find the matrix representations for A ∧ A,
B ∧ B and AB ∧ AB using the canonical (lexicographically ordered)
bases for ∧k Fn , k = 2, n = 2, m, 1, respectively.
(b) Show that the equality AB ∧ AB = (A ∧ A)(B ∧ B) implies that
$$\Big( \sum_{j=1}^{m} a_{1j} b_{j1} \Big) \Big( \sum_{j=1}^{m} a_{2j} b_{j2} \Big) - \Big( \sum_{j=1}^{m} a_{1j} b_{j2} \Big) \Big( \sum_{j=1}^{m} a_{2j} b_{j1} \Big) = \sum_{1 \le j < k \le m} (a_{1j} a_{2k} - a_{1k} a_{2j})(b_{1j} b_{2k} - b_{1k} b_{2j}). \qquad (6.38)$$

(c) Let M = {1, . . . , m} and P = {1, . . . , p}. For A ∈ Fp×m and B ∈ Fm×p ,
show that
$$\det AB = \sum_{S \subseteq M,\, |S| = p} \det(A[P, S]) \det(B[S, P]). \qquad (6.39)$$

(Hint: Use that (∧p A)(∧p B) = ∧p (AB) = det AB.)

Remark: Equation (6.39) is called the Cauchy–Binet identity. When p = 2


it reduces to (6.38), which when B = AT (or B = A∗ when F = C) is called
the Lagrange identity.

Exercise 6.7.23 For x, y ∈ R3 , let the cross product x × y be defined as in


(6.17). Show, using (6.38) (with B = AT ), that

$$\|x \times y\|^2 = \|x\|^2 \|y\|^2 - (\langle x, y \rangle)^2 . \qquad (6.40)$$

Notice that this equality implies the Cauchy–Schwarz inequality.



Exercise 6.7.24 (Honors) Let V be a vector space and let W ⊆ Y ⊆ V be


subspaces.

(a) Show that (V /W )/(Y /W ) is isomorphic to V /Y .

(b) Show that dim(V /W ) = dim(V /Y ) + dim(Y /W ), assuming that


dim(V /W ) is finite.

Exercise 6.7.25 (Honors)

(a) Show that when $a \ne 0$, the Jordan canonical form of $J_s(a) \otimes J_t(0)$ is
given by $\oplus_{i=1}^{s} J_t(0)$.
(b) Show that when $b \ne 0$, the Jordan canonical form of $J_s(0) \otimes J_t(b)$ is
given by $\oplus_{i=1}^{t} J_s(0)$.
(c) Show that when t ≥ s, the Jordan canonical form of Js (0) ⊗ Jt (0) is
given by
$$[\oplus_{i=1}^{t-s+1} J_s(0)] \oplus [\oplus_{i=1}^{s-1} (J_{s-i}(0) \oplus J_{s-i}(0))].$$

(d) Show that when $a, b \ne 0$ and t ≥ s, the Jordan canonical form of


Js (a) ⊗ Jt (b) is given by

Jt+s−1 (ab) ⊕ Jt+s−3 (ab) ⊕ · · · ⊕ Jt+s−(2s−3) (ab) ⊕ Jt+s−(2s−1) (ab).

This is also the Jordan canonical form of Jt (a) ⊗ Js (b).

Using the above information one can now find the Jordan canonical form of
A ⊗ B, when one is given the Jordan canonical forms of A and B.
7
How to Use Linear Algebra

CONTENTS
7.1 Matrices you can't write down, but would still like to use  196
7.2 Algorithms based on matrix vector products  198
7.3 Why use matrices when computing roots of polynomials?  203
7.4 How to find functions with linear algebra?  209
7.5 How to deal with incomplete matrices  217
7.6 Solving millennium prize problems with linear algebra  222
7.6.1 The Riemann hypothesis  223
7.6.2 P vs. NP  225
7.7 How secure is RSA encryption?  229
7.8 Quantum computation and positive maps  232
7.9 Exercises  238
Bibliography for Chapter 7  245

In this chapter we would like to give you an idea of how creative thinking has led
to some very useful ways to exploit the power of linear algebra. The hope is
that it will inspire you to think of new ways to use linear algebra in areas of
your interest. It would be great if one day we were remiss by not
including your ideas in this chapter. So, go for it!

This chapter has a somewhat different flavor than the other chapters. As
applications use mathematics from different fields, we will be mentioning
and using some results from other areas of mathematics without proofs. In
addition, not everything will have a complete theory. Some of the algorithms
described may be based on heuristic arguments and do not necessarily have
a full theoretical justification. It is natural that these things happen:
mathematics is a discipline with several different, often useful, aspects and
continues to develop as a discipline. There will always be mathematical
research continuing to improve on existing results.


7.1 Matrices you can’t write down, but would still like
to use

In previous chapters we have done computations with matrices to learn the


concepts, and they were all small matrices (at most 8 × 8). Bigger matrices
(say, number of rows and columns in the thousands) you may not want to
deal with by hand, but working with them in a spreadsheet or other software
seems doable. But what do we do when matrices are simply too big to store
anywhere (say, if the number of rows or columns runs in the billions), or if it is
simply impossible to gather all the data? Can we still work with the matrix?

Here are two examples to begin with, both used in search engines:

• A matrix P where there is a row and column for every existing web page,
and the (i, j)th entry pij represents the probability that you go from web
page i to web page j. Currently (October 2015), there are about 4.76
billion indexed web pages, so this matrix is huge. However, if you have a
way of looking at a page i and determining all the probabilities pij , then
determining a row of this matrix is not a big deal.
• A matrix M where there is a row for every web page, and a column for
every search word. The (i, j)th entry mij of this matrix is set to be 1 if
search word j appears on page i, and 0 otherwise. Again, this matrix is
huge, but determining row i is easily done by looking at this particular
page.

One big difference between these two matrices is obvious: P is square and M
is not. Thus P has eigenvectors, and M does not. In fact, it is the eigenvector
of P T at the eigenvalue 1 that is of interest. Notice that for these matrices it
may not be convenient to use numbers 1, 2, . . . as indices for the rows and
columns, as we usually do. Rather one may just use the name of the web
page or the actual search word as the index. So, for instance, we would write
$$p_{\text{www.linear-algebra.edu},\ \text{www.woerdeman.edu}} = \frac{1}{10}, \qquad m_{\text{www.linear-algebra.edu},\ \text{Hugo}} = 1.$$
Notice that this means that the rows and columns are not ordered in a
natural way (although we can order them if we have to), and thus anything
meaningful that we should be looking for should not depend on any
particular order. In the case of P , though, the rows and columns are indexed
by the same index set, so if any ordering is chosen for the rows we should use

the same for the columns. Let us also observe that any vector x for which we
would want to consider the product P x needs to be indexed by the same index
set as is used for the columns of P . Thus x would have entries like

xwww.linear− algebra.edu and xwww.woerdeman.edu .

Here are some more matrices that may be of interest:

• A matrix K where the columns represent the products you sell and the
rows represent your customers. The entry Kij is the rating customer i
gives to product j. So for instance

$K_{\text{Hugo Woerdeman},\ \text{Advanced Linear Algebra by Woerdeman}} = \star\star\star\star\star .$

Notice that the ratings, one through five stars, do not form a field (we
can make it a matrix with entries in Z5 , but it would not be meaningful
in this context). Still, as it turns out, it is useful to consider this as a
matrix over R. Why the real numbers? Because the real numbers have a
natural ordering, and the ratings are ordered as well. In fact, the
ordering of the ratings is the only thing we care about! The main
problem with this matrix is that you will never know all of its entries
(unless you are running a really small business), and the ones you think
you know may not be accurate.
• A matrix C where both the rows and columns represent all the people
(in the world, in a country, in a community), and the entries cij are 1 or
0 depending whether person i knows person j. If we believe the “six
degrees of separation” theory, the matrix C + C 2 + · · · + C 6 will only
have positive entries. (The matrix C is an adjacency matrix; see Exercise
7.9.16 for the definition.)
• A matrix H where each row represents the genetic data of each known
(DNA) virus. Does this even make sense? Can anything be done with
this? The entries would be letters (A, C, G, T ), without any (obvious)
addition and multiplication to make it into a meaningful field.
• (Make up your own.)

There are at least two types of techniques that can help in dealing with
these types of matrices:

1. Techniques where one just needs to multiply a matrix with a vector. In


the case of matrices P and M we can figure out the rows fairly easily, so
if our techniques just involve multiplying with row vectors on the left,
preferably with ones that only have few nonzero entries, then we can
apply this technique.

2. If we can assume that the matrix is low rank, then knowing and/or storing
just part of the matrix gives us enough to work with the whole matrix.

We will explore these ideas further in the next sections.

7.2 Algorithms based on matrix vector products

If we are in a situation where it is hard to deal with the whole square matrix
A, but we are able to compute a matrix-vector product Av, are we still able to
compute eigenvalues of A, or to solve an equation Ax = b? Examples of such a
situation include

• A sparse matrix A; that is, a matrix with relatively few nonzero entries.
While the matrix may be huge, computing a product Av may be doable.
• A situation where the matrix A represents the action of some system in
which we can give inputs and measure outputs. If the input is u and the
output is y, then by giving the system the input u and by measuring the
output y we would in effect be computing the product y = Au. In this
situation we would not know the (complete) inner workings of this
system, but assume (or just guess as a first try) that the system can be
modeled/approximated by a simple matrix multiplication.
• The matrices M and P from Section 7.1.

Here is a first algorithm that computes the eigenvalue of the largest modulus
in case it has geometric multiplicity 1.

Theorem 7.2.1 (Power method) Let $A \in \mathbb{C}^{n \times n}$ have eigenvalues
$\{\lambda_1 , \ldots , \lambda_n\}$ with $\lambda_1 > \max_{j=2,\ldots,n} |\lambda_j|$. Let $v$ be such that
$v \notin {\rm Ker} \prod_{j=2}^{n} (A - \lambda_j)$. Then the iteration
$$v_0 := v, \qquad v_{k+1} = \frac{1}{\|A v_k\|} A v_k , \qquad \mu_k := \frac{v_k^* A v_k}{v_k^* v_k}, \qquad k = 1, 2, \ldots ,$$
has the property that $\lambda_1 = \lim_{k\to\infty} \mu_k$ and $w := \lim_{k\to\infty} v_k$ is a unit
eigenvector for $A$ at $\lambda_1$, thus $Aw = \lambda_1 w$ and $\|w\| = 1$.

Example 7.2.2 For illustration, let us see how the algorithm works on the
matrix
$$A = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
with initial vector
$$v_0 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.$$
Then
$$v_1 = \frac{1}{\sqrt{3^2 + 2^2 + 1^2}} \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix}, \qquad v_2 = \frac{1}{\sqrt{3^4 + 2^4 + 1^4}} \begin{pmatrix} 9 \\ 4 \\ 1 \end{pmatrix}, \qquad v_k = \frac{1}{\sqrt{3^{2k} + 2^{2k} + 1^{2k}}} \begin{pmatrix} 3^k \\ 2^k \\ 1^k \end{pmatrix}, \quad k \in \mathbb{N}.$$

Notice that
$$\frac{3^k}{\sqrt{3^{2k} + 2^{2k} + 1^{2k}}} = \frac{1}{\sqrt{1 + (\tfrac{2}{3})^{2k} + (\tfrac{1}{3})^{2k}}} \to 1,$$
$$\frac{2^k}{\sqrt{3^{2k} + 2^{2k} + 1^{2k}}} = \frac{(\tfrac{2}{3})^k}{\sqrt{1 + (\tfrac{2}{3})^{2k} + (\tfrac{1}{3})^{2k}}} \to 0, \qquad \frac{1^k}{\sqrt{3^{2k} + 2^{2k} + 1^{2k}}} = \frac{(\tfrac{1}{3})^k}{\sqrt{1 + (\tfrac{2}{3})^{2k} + (\tfrac{1}{3})^{2k}}} \to 0,$$
so that $v_k \to e_1$ as $k \to \infty$. In addition,

$$\lim_{k\to\infty} \mu_k = \lim_{k\to\infty} \frac{3^{2k+1} + 2^{2k+1} + 1^{2k+1}}{3^{2k} + 2^{2k} + 1^{2k}} = 3.$$
As long as the initial vector v0 does not have a 0 as the first entry, we will
always have that limk→∞ vk = e1 .
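The iteration in Theorem 7.2.1 is easy to implement; the following minimal sketch (assuming NumPy, with a helper name of our own choosing, and not the book's code) reproduces the behavior of this example.

```python
import numpy as np

def power_method(A, v0, steps=100):
    """Power method: returns the Rayleigh-quotient estimate and the unit vector."""
    v = v0 / np.linalg.norm(v0)
    for _ in range(steps):
        w = A @ v
        v = w / np.linalg.norm(w)
    mu = (v.conj() @ A @ v) / (v.conj() @ v)   # µ_k = v* A v / (v* v)
    return mu, v

A = np.diag([3.0, 2.0, 1.0])
mu, v = power_method(A, np.ones(3))
print(mu)   # ≈ 3, with v ≈ e1, as in Example 7.2.2
```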

In order to prove Theorem 7.2.1 we need to show that powers of a Jordan


block with an eigenvalue of modulus less than 1, converge to the zero matrix.

Lemma 7.2.3 Let |µ| < 1 and k ∈ N. Then limm→∞ Jk (µ)m = 0.

Proof. Let $1 \le i \le j \le k$. Then the $(i,j)$th entry of $J_k(\mu)^m$ equals $\binom{m}{j-i} \mu^{m-(j-i)}$.
Notice that
$$\binom{m}{j-i} = \frac{m(m-1)\cdots(m-(j-i)+1)}{(j-i)!}$$

is a polynomial $p(m)$ of degree $j - i \le k - 1$ in $m$. But then
$$\lim_{m\to\infty} \binom{m}{j-i} \mu^{m-(j-i)} = \lim_{m\to\infty} p(m)\, \mu^{m-(j-i)} = 0,$$

where we used that |µ| < 1. For details on the last step, please see Exercise
7.9.1. 

For A ∈ Cn×n we define its spectral radius ρ(A) via

$$\rho(A) = \max\{|\lambda| : \lambda \text{ is an eigenvalue of } A\} = \max_{\lambda \in \sigma(A)} |\lambda| .$$

Corollary 7.2.4 Let A ∈ Cn×n . Then limm→∞ Am = 0 if and only if


ρ(A) < 1.

Proof. Let ρ(A) < 1. Then A = SJS −1 with J = ⊕sj=1 Jnj (λj ), where
|λj | < 1, j = 1, . . . , s. Then, by Lemma 7.2.3, J m = ⊕sj=1 (Jnj (λj ))m → 0 as
m → ∞. But then Am = SJ m S −1 → 0 as m → ∞.

Next, suppose that ρ(A) ≥ 1. Then A has an eigenvalue λ with |λ| ≥ 1. Let
x be a corresponding eigenvector. Then Am x = λm x 6→ 0 as m → ∞. But
then it follows that Am 6→ 0 as m → ∞. 

Proof of Theorem 7.2.1. Use Theorem 4.4.1 to write A = SJS −1 , with


 
$$J = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & J(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J(\lambda_m) \end{pmatrix},$$

where we use that $\lambda_1$ has only one Jordan block, of size 1. Denote
$$S^{-1} v = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}.$$
Since $v \notin {\rm Ker} \prod_{j=2}^{n} (A - \lambda_j)$, we have that $c_1 \ne 0$. Put
$$w_k = \frac{1}{\lambda_1^k} A^k v = S \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & \frac{1}{\lambda_1^k} J(\lambda_2)^k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\lambda_1^k} J(\lambda_m)^k \end{pmatrix} \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}, \qquad k = 0, 1, \ldots .$$
Then $v_k = \frac{w_k}{\|w_k\|}$. Also, using Lemma 7.2.3, we see that for $j > 1$ we have that
$$\frac{1}{\lambda_1^k} J(\lambda_j)^k = {\rm diag}(\lambda_1^{r-1})_{r=1}^{n}\; J\Big(\frac{\lambda_j}{\lambda_1}\Big)^{\!k}\; {\rm diag}(\lambda_1^{-r+1})_{r=1}^{n} \to 0 \quad \text{when } k \to \infty.$$
Thus
$$w_k \to S \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} = S \begin{pmatrix} c_1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} =: x \quad \text{when } k \to \infty.$$
Notice that $x$, a multiple of the first column of $S$, is an eigenvector of $A$ at
$\lambda_1$. We now get that $v_k = \frac{w_k}{\|w_k\|} \to \frac{x}{\|x\|} =: w$ is a unit eigenvector of $A$ at $\lambda_1$, and
$$\mu_k = \frac{v_k^* A v_k}{v_k^* v_k} = \frac{w_k^* A w_k}{w_k^* w_k} \to \frac{x^* A x}{x^* x} = w^* A w = \lambda_1 w^* w = \lambda_1 \quad \text{when } k \to \infty.$$


If one is interested in more than just one eigenvalue of the matrix, one can
introduce so-called Krylov spaces:
$${\rm Span}\{v, Av, A^2 v, \ldots , A^k v\}.$$
Typically one finds an orthonormal basis for this space, and then studies
how powers of the matrix A act on this space. In this way one can
approximate more than one eigenvalue of A.

Another problem of interest is to find a solution x to the equation Ax = b,


where we expect the equation to have a solution x with only few nonzero
entries. In this case, A typically has far more columns than rows, so that
solutions to the equation are never unique. We are however interested in the
solution that only has a few nonzero entries, say at most s nonzero entries.
The system typically is of the form
$$A \begin{pmatrix} 0 \\ \vdots \\ 0 \\ * \\ 0 \\ \vdots \\ 0 \\ * \\ 0 \\ \vdots \end{pmatrix} = b,$$

where the ∗’s indicate the few nonzero entries in the desired solution x. It is
important to realize that the locations of the nonzero entries in x are not
known; otherwise one can simply remove all the columns in A that
correspond to a 0 in x and solve the much smaller system.

To solve the above problem one needs to use some non-linear operations.
One possibility is to use the hard thresholding operator Hs : Cn → Cn , which
keeps the s largest (in magnitude) entries of a vector x and sets the other
entries equal to zero. For instance,
$$H_2 \begin{pmatrix} 3+i \\ 2-8i \\ 2-i \\ 10 \end{pmatrix} = \begin{pmatrix} 0 \\ 2-8i \\ 0 \\ 10 \end{pmatrix}, \qquad H_3 \begin{pmatrix} 1 \\ 5 \\ -20 \\ 2 \\ 11 \\ -7 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ -20 \\ 0 \\ 11 \\ -7 \end{pmatrix}.$$

Notice that these hard thresholding operators are not linear; for instance
$$H_1 \begin{pmatrix} 3 \\ 1 \end{pmatrix} + H_1 \begin{pmatrix} -2 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 0 \end{pmatrix} + \begin{pmatrix} -2 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \ne \begin{pmatrix} 0 \\ 2 \end{pmatrix} = H_1 \Big( \begin{pmatrix} 3 \\ 1 \end{pmatrix} + \begin{pmatrix} -2 \\ 1 \end{pmatrix} \Big).$$

Notice that Hs is actually not well-defined on vectors where the sth largest
element and the (s + 1)th largest element have the same magnitude. For
instance, is
$$H_2 \begin{pmatrix} 3+i \\ 3-i \\ 2-i \\ 10 \end{pmatrix} = \begin{pmatrix} 3+i \\ 0 \\ 0 \\ 10 \end{pmatrix} \quad \text{or} \quad H_2 \begin{pmatrix} 3+i \\ 3-i \\ 2-i \\ 10 \end{pmatrix} = \begin{pmatrix} 0 \\ 3-i \\ 0 \\ 10 \end{pmatrix}?$$
When the algorithm below is used, this scenario either does not show up, or
the choice one makes does not affect the outcome, so this detail is usually
ignored. Of course, it may cause a serious problem in some future
application, at which point one needs to rethink the algorithm. There are
other thresholding functions where some of the values are diminished, but
not quite set to 0. The fact that one completely annihilates some elements
(by setting them to 0, thus completely ignoring their value) gives it the term
“hard” in hard thresholding.

The hard thresholding algorithm is now as follows:

Let A ∈ Cm×n be such that σ1 (A) < 1.

1. Let x0 = 0.
2. Put xk+1 = Hs (xk + A∗ (b − Axk )).
3. Stop when kxk+1 − xk k < ε.

The above algorithm (without stopping it) converges to a local minimum of


the problem
$$\min_x \|b - Ax\| \quad \text{subject to} \quad H_s(x) = x.$$
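A minimal sketch of the iteration above follows (assuming NumPy; the helper names are ours, and the requirement σ1(A) < 1 from the text is assumed to hold for the input). Ties in the thresholding step are broken arbitrarily, in line with the remark made earlier about Hs not being well-defined in that case.

```python
import numpy as np

def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x, set the rest to zero."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]
    out[keep] = x[keep]
    return out

def iterative_hard_thresholding(A, b, s, tol=1e-10, max_iter=10_000):
    x = np.zeros(A.shape[1], dtype=A.dtype)
    for _ in range(max_iter):
        x_new = hard_threshold(x + A.conj().T @ (b - A @ x), s)
        if np.linalg.norm(x_new - x) < tol:   # stopping criterion from step 3
            return x_new
        x = x_new
    return x
```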

Finding a solution x to Ax = b that is sparse (only a few entries nonzero) is
referred to as a compressed sensing problem. It has been successfully applied in
several settings. For instance, in [T. Zhang, J. M. Pauly, S. S. Vasanawala
and M. Lustig, 2013] one can see how compressed sensing was used in
reducing MRI acquisition time substantially.

7.3 Why use matrices when computing roots of


polynomials?

We saw in Section 5.4 that in order to compute the QR factorization of a


matrix only simple arithmetic computations are required. Indeed, one only
needs addition, subtraction, multiplication, division, and taking square roots
to find the QR factorization of a matrix. Amazingly, doing it repeatedly in a
clever way provides an excellent way to compute eigenvalues of a matrix.
This is surprising since finding roots of a polynomial is not as easy as
performing simple algebraic operations (other than for degree 1, 2, 3, 4
polynomials, using the quadratic formula (for degree 2) and its
generalizations; for polynomials of degree 5 and higher it was shown by Niels
Henrik Abel in 1823 that no general algebraic formula exists for the roots). In fact,
it works so well that for finding roots of a polynomial one can just build its
corresponding companion matrix, and subsequently apply the QR algorithm
to compute its roots. Let us give an example.

Example 7.3.1 Let p(t) = t3 − 6t2 + 11t − 6 (= (t − 1)(t − 2)(t − 3)). Its
companion matrix is
$$A = \begin{pmatrix} 0 & 0 & 6 \\ 1 & 0 & -11 \\ 0 & 1 & 6 \end{pmatrix}.$$
Computing its QR factorization, we find
$$A = QR = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & -11 \\ 0 & 1 & 6 \\ 0 & 0 & 6 \end{pmatrix}.$$

If we now let A1 = RQ = Q−1 QRQ = Q−1 AQ, then A1 has the same
eigenvalues as A. We find
$$A_1 = \begin{pmatrix} 1 & 0 & -11 \\ 0 & 1 & 6 \\ 0 & 0 & 6 \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -11 & 1 \\ 1 & 6 & 0 \\ 0 & 6 & 0 \end{pmatrix}.$$

Again, we do a QR factorization of A1 = Q1 R1 , and let


$A_2 = R_1 Q_1 \,(= Q_1^{-1} A_1 Q_1)$. We find
$$A_2 = \begin{pmatrix} 6.0000 & -0.8779 & 0.4789 \\ 12.5300 & -0.4204 & -0.7707 \\ 0 & 0.2293 & 0.4204 \end{pmatrix}.$$

After 8 more iterations ($A_i = Q_i R_i$, $A_{i+1} := R_i Q_i$) we find that
$$A_{10} = \begin{pmatrix} 3.0493 & -10.9830 & 7.5430 \\ 0.0047 & 1.9551 & -1.8346 \\ 0 & 0.0023 & 0.9956 \end{pmatrix}.$$

Notice that the entries below the diagonal are relatively small. In addition,
the diagonal entries are not too far off from the eigenvalues of the matrix:
1, 2, 3. Let us do another 20 iterations. We find
$$A_{30} = \begin{pmatrix} 3.0000 & -10.9697 & 7.5609 \\ 0.0000 & 2.0000 & -1.8708 \\ 0 & 0.0000 & 1.0000 \end{pmatrix}.$$

As A30 is upper triangular (to the displayed accuracy), we obtain that its diagonal entries 3, 2, 1 are the
eigenvalues of A30 , and therefore they are also the eigenvalues of A.
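The example is easily reproduced; here is a small sketch (assuming NumPy; the helper name `companion` is ours). Note that `numpy.linalg.qr` need not return R with a positive diagonal, as used in Theorem 7.3.2 below, but the two conventions differ only by a diagonal ±1 similarity, which leaves the diagonal entries of the iterates unchanged.

```python
import numpy as np

def companion(coeffs):
    """Companion matrix of t^n + a_{n-1} t^{n-1} + ... + a_0, coeffs = [a_0, ..., a_{n-1}]."""
    n = len(coeffs)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)          # ones on the subdiagonal
    C[:, -1] = -np.asarray(coeffs)      # last column -a_0, ..., -a_{n-1}
    return C

A = companion([-6.0, 11.0, -6.0])       # p(t) = t^3 - 6t^2 + 11t - 6
for _ in range(30):
    Q, R = np.linalg.qr(A)
    A = R @ Q
print(np.diag(A))   # approximately [3., 2., 1.]
```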

The QR algorithm converges to an upper triangular matrix for large classes


of matrices. We provide the proof for the following class of Hermitian
matrices.

Theorem 7.3.2 If A = A∗ ∈ Cn×n has eigenvalues

|λ1 | > |λ2 | > · · · > |λn | > 0,

and A = V ΛV ∗ where Λ = diag(λi )ni=1 and V is unitary with V ∗ = LU


where L is lower triangular and U is upper triangular, then the iteration

A1 = A, Ai = Qi Ri , Ai+1 = Ri Qi , i = 1, 2, . . . ,

with Qi Q∗i = In and Ri upper triangular with positive diagonal entries, gives
that
$$\lim_{k\to\infty} A_k = \Lambda .$$

We first need a lemma.

Lemma 7.3.3 Let Vk ∈ Cn×n , k ∈ N, be unitary matrices and Uk ∈ Cn×n ,


k ∈ N, be upper triangular matrices with positive diagonal entries. Suppose
that limk→∞ Vk Uk = In . Then limk→∞ Vk = In and limk→∞ Uk = In .

Proof. Let us write


 
$$V_k = \begin{pmatrix} v_1^{(k)} & \cdots & v_n^{(k)} \end{pmatrix}, \qquad U_k = (u_{ij}^{(k)})_{i,j=1}^{n} .$$

Then, looking at the first column of the equality limk→∞ Vk Uk = In , we have


that
$$u_{11}^{(k)} v_1^{(k)} \to e_1 , \qquad (7.1)$$
and thus
$$u_{11}^{(k)} = u_{11}^{(k)} \|v_1^{(k)}\| = \|u_{11}^{(k)} v_1^{(k)}\| \to \|e_1\| = 1,$$
giving that $\lim_{k\to\infty} u_{11}^{(k)} = 1$. Combining this with (7.1) gives that $v_1^{(k)} \to e_1$.
Next, from the second column of the equality $\lim_{k\to\infty} V_k U_k = I_n$, we have that
$$u_{12}^{(k)} v_1^{(k)} + u_{22}^{(k)} v_2^{(k)} \to e_2 . \qquad (7.2)$$
Taking the inner product with $v_1^{(k)}$ gives
$$u_{12}^{(k)} = \langle u_{12}^{(k)} v_1^{(k)} + u_{22}^{(k)} v_2^{(k)} , v_1^{(k)} \rangle \to \langle e_2 , e_1 \rangle = 0. \qquad (7.3)$$
Then by (7.2) we find that $u_{22}^{(k)} v_2^{(k)} \to e_2$, which in a similar manner as
before implies that $u_{22}^{(k)} \to 1$ and $v_2^{(k)} \to e_2$. Continuing this way, we find
that $u_{ij}^{(k)} \to 0$, $i < j$, $u_{ii}^{(k)} \to 1$, $i = 1, \ldots , n$, and $v_j^{(k)} \to e_j$, $j = 1, \ldots , n$.
This proves the result. 

Proof of Theorem 7.3.2. Notice that $A^2 = Q_1 R_1 Q_1 R_1 = Q_1 Q_2 R_2 R_1$, and
that in general we have that
$$A^k = Q_1 Q_2 \cdots Q_k R_k \cdots R_2 R_1 .$$
In addition,
$$A^k = V \Lambda^k V^* = V \Lambda^k L U .$$
Notice that we may choose $L$ to have diagonal elements equal to 1. Combining, we obtain
$$\Lambda^k L = (V^* Q_1 Q_2 \cdots Q_k)(R_k \cdots R_2 R_1 U^{-1}),$$
and thus
$$\Lambda^k L \Lambda^{-k} = (V^* Q_1 Q_2 \cdots Q_k)(R_k \cdots R_2 R_1 U^{-1} \Lambda^{-k}).$$



Write $L = (l_{ij})_{i,j=1}^{n}$ with $l_{ii} = 1$, $i = 1, \ldots , n$, and $l_{ij} = 0$ for $i < j$. We now
have that $\Lambda^k L \Lambda^{-k}$ is lower triangular with a unit diagonal, and with $(i,j)$th
entry $l_{ij} (\frac{\lambda_i}{\lambda_j})^k$, $i > j$, in the lower triangular part. As $|\frac{\lambda_i}{\lambda_j}| < 1$ for $i > j$, we have
that $\lim_{k\to\infty} l_{ij} (\frac{\lambda_i}{\lambda_j})^k = 0$, and thus $\lim_{k\to\infty} \Lambda^k L \Lambda^{-k} = I_n$. Let
$$\Delta = {\rm diag}\Big( \frac{\lambda_i}{|\lambda_i|} \Big)_{i=1}^{n}, \qquad E = {\rm diag}\Big( \frac{u_{ii}}{|u_{ii}|} \Big)_{i=1}^{n},$$

where U = (uij )ni,j=1 . Let

Wk = V ∗ Q1 Q2 · · · Qk E −1 ∆−k , Uk = ∆k ERk · · · R2 R1 U −1 Λ−k . (7.4)

Then Wk is unitary, Uk is upper triangular with positive diagonal entries,


and Wk Uk → In . By Lemma 7.3.3 it now follows that Wk → In and
Uk → In . Now
$$A_k = Q_k R_k = E^* \Delta^{-(k-1)} W_{k-1}^{-1} W_k \Delta^k E \Delta^{-k} E^* U_k \Lambda U_{k-1}^{-1} E \Delta^{k-1} \to \Lambda .$$

Indeed, if we write Wk = I + Gk and Uk = I + Hk , then Gk → 0 and


Hk → 0. Reworking the expression

$$E^* \Delta^{-(k-1)} W_{k-1}^{-1} W_k \Delta^k E \Delta^{-k} E^* U_k \Lambda U_{k-1}^{-1} E \Delta^{k-1} - \Lambda \qquad (7.5)$$
gives that each term has at least one of $G_k$, $G_{k-1}^*$, $H_k$, $H_{k-1}$ in it, while
multiplying with diagonal unitaries $E$ and $\Delta$ does not affect the norms of
the expression. This shows that (7.5) converges to 0 as $k \to \infty$. 

While Theorem 7.3.2 only addresses the case of Hermitian matrices, the
convergence result goes well beyond this case. In particular, it works for
large classes of companion matrices. Due to the structure of companion
matrices, one can set up the algorithm quite efficiently, so that one can
actually compute roots of polynomials of very high degree accurately. In
Figure 7.1, we give an example of degree 10,000.

Concerns with large matrices (say, $10^4 \times 10^4 = 10^8$ entries) are (i) how do
you update them quickly? (ii) how do you store them? As it happens,
companion matrices have a lot of structure that can be maintained
throughout the QR algorithm. First observe that a companion matrix has
zeros in the lower triangular part under the subdiagonal. The terminology is
as follows. We say that A = (aij )ni,j=1 is upper Hessenberg if aij = 0 when
i > j + 1. The upper Hessenberg structure is maintained throughout the QR
algorithm, as we will see now.

Proposition 7.3.4 If A is upper Hessenberg, Q is unitary and R is upper


triangular, and A = QR, then RQ is upper Hessenberg as well.
Figure 7.1: These are the roots of the polynomial $\sum_{k=1}^{10{,}000} p_k(10{,}000)\, x^k$, where $p_k(n)$ is the number of partitions of $n$ in $k$ parts, which is the number of ways $n$ can be written as the sum of $k$ positive integers.

Proof. As the jth column of Q is a linear combination of columns 1, . . . , j of


A, the jth column of Q has zeroes in positions j + 2, . . . , n. This gives that
Q is upper Hessenberg. As R is upper triangular, the ith row of RQ is a
linear combination of rows i, . . . , n of Q, and thus the ith row of RQ has
zeros in positions 1, . . . , i − 2. This gives that RQ is upper Hessenberg. 

Corollary 7.3.5 If A is upper Hessenberg, then its iterates in the QR


algorithm are also upper Hessenberg.

Proof. Follows directly from Proposition 7.3.4. 

Aside from the upper Hessenberg property, a companion matrix has more
structure: it is the sum of a unitary matrix and a rank 1 matrix. Indeed, the
companion matrix
$$C = \begin{pmatrix} 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ \vdots & \ddots & & \vdots & \vdots \\ 0 & \cdots & 1 & 0 & -a_{n-2} \\ 0 & \cdots & 0 & 1 & -a_{n-1} \end{pmatrix},$$

can be written as
C = Z + xy∗ ,

where
$$Z = \begin{pmatrix} 0 & 0 & \cdots & 0 & e^{i\theta} \\ 1 & 0 & \cdots & 0 & 0 \\ \vdots & \ddots & & \vdots & \vdots \\ 0 & \cdots & 1 & 0 & 0 \\ 0 & \cdots & 0 & 1 & 0 \end{pmatrix}, \qquad x = \begin{pmatrix} -a_0 - e^{i\theta} \\ -a_1 \\ \vdots \\ -a_{n-2} \\ -a_{n-1} \end{pmatrix}, \qquad y = e_n .$$
Here Z is unitary and xy∗ has rank 1. Notice that θ can be chosen to be any
real number. The property of being the sum of a unitary and a rank 1 is
maintained throughout the QR algorithm, as we prove next.

Proposition 7.3.6 If A = Z + K with Z unitary and rank K = 1, then its


iterates in the QR algorithm are also the sum of a unitary matrix and a rank
1 matrix.

Proof. Let A = Z + K and A = QR. Then R = Q∗ Z + Q∗ K, and thus


RQ = Q∗ ZQ + Q∗ KQ. As Q∗ ZQ is unitary, and rank Q∗ KQ = rank K = 1,
we find that the first iterate has the required form. But then repeating the
argument we get that the same follows for every iterate. 

Combining the observations in Corollary 7.3.5 and Proposition 7.3.6 it is


clear that when starting with a companion matrix, all its iterates continue to
have a lot of structure that can be used to perform computations and store
them efficiently. Taking advantage of this can lower the number of arithmetic
operations required in each iteration, as well as the amount of storage
required to store the information. As a result, one can deal with high-degree
polynomials in this way.

Let us observe that in finding roots of non-linear systems one often still relies
on linear algebra. Indeed, Newton’s method is based on the idea that if we
would like to find a root of a function f , we start at a first guess, and if this
is not a root, we pretend that the graph at this point is a line (the tangent
line) and find the root of that line. This is our next guess for our root of f . If
the guess is right, we stop. If not, we continue as before by computing the
root of the tangent line there, and repeat this process iteratively.

There are many iterative linear schemes that solve a nonlinear problem. One
such example is an image enhancement scheme that was used in law
enforcement. Such methods need to be defendable in court, convincing a jury
that the information extracted was there to begin with rather than that the
program “invented” information. In the riots in Los Angeles in 1992 one of
the convictions was based on the enhancement of video images taken from a
helicopter. Indeed, after enhancement of these images a tattoo became
recognizable leading to the identity of one of the rioters.

7.4 How to find functions with linear algebra?

Many scenarios require finding a function based on partial data:

• In medical imaging, one is looking for a function f (x, y, z) which


describes the material density of one’s body at a point (x, y, z). To do
this, one sends radiation through the body and measures on the other
side the intensities at different locations. These intensities will be
different based on the different densities the rays of radiation
encountered in the body. Mathematically, one measures integrals $\int f g_L$
along lines L (here gL is the function that takes on the value 1 on the
line L and is zero elsewhere), this being the data one collects from which
one would like to reconstruct the function.
• In prediction theory, one tries to predict what will happen in the future
based on measurements in the past and present. In this situation, one
has data f (w1 ), . . . , f (wn+1 ), and one would like to find values
f (wn+2 ), f (wn+3 ), . . ..

These are the problems we will focus on in this section: reconstruct a


function
R f basedR on either interpolating data f (w1 ), . . . , f (wn ), or integral
data f g1 , . . . , f gn . As the maps
$$f \mapsto f(w) \quad \text{and} \quad f \mapsto \int f g$$
are linear, linear algebra plays a very useful role here. In both cases we will
restrict the discussion to collecting a finite number of data points. For more
general data collection, one would need some tools from functional analysis
to set up a robust theory.

We are thus considering the problem: Given a linear map

E : FX → Fn

and a vector v ∈ Fn , find a function f ∈ FX so that E(f ) = v. In the case of


interpolation data the field F can be any field, while in the case of integral
data the underlying field is R or C. Certainly in the last two cases the vector
space FX is infinite dimensional, and one typically would like to restrict the
question to a finite-dimensional subspace W of FX . Thus, rather than trying
to find just any type of function, one restricts the attention to a
(finite-dimensional) subspace. This leads to an important question: What
subspace W makes the most sense in your application? When one deals with

Figure 7.2: A Meyer wavelet.

sound signals, cosines and sines are great functions to work with. In this case
one could take
W = Span{1, cos x, sin x, cos 2x, sin 2x, . . . , cos N x, sin N x}.
The number k in cos kx, sin kx is referred to as the frequency, and our ear
hears a higher tone when the frequency is higher. In addition, the range that
the human ear can hear is between 20 Hz and 20,000 Hz (with Hz
corresponding to 1 cycle per second). Thus, when it is about sounds the
human ear can hear, it makes perfectly sense to use a finite-dimensional
subspace. As
eix = cos x + i sin x, e−ix = cos x − i sin x
one can also deal with the subspace
W = Span{e−iN x , ei(N −1)x , . . . , ei(N −1)x , eiN x },
often simplifying the calculations (which may sound counterintuitive when
you are still getting used to complex numbers, but for instance, simple rules
like ea eb = ea+b are easier to work with than the formulas for cos(a + b) and
sin(a + b)). In some cases it is better to work with functions that are nonzero
only on a finite interval (which is not true for cos and sin), and so-called
wavelet functions were invented to have this property while still keeping some
advantages that cos and sin have. In Figure 7.2 is an example of a wavelet.
Once we have settled on a finite dimensional subspace of functions, we can
start to use linear algebra. We begin the exposition using polynomials.

Example 7.4.1 Consider Fn−1 (X) with basis B = {1, X, X 2 , . . . , X n−1 }.


Let {x1 , . . . , xn } ⊆ F and E : W → Fn be given by
 
p(x1 )
E(p(X)) =  ...  .
 

p(xn )
How to Use Linear Algebra 211

Then
xn−1
 
1 x1 ··· 1
1 x2 ··· x2n−1 
[E]E←B = . ..  =: V (x1 , . . . , xn ), (7.6)
 
..
 .. . . 
1 xn ··· xn−1
n
where E is the standard basis of Fn . The matrix V (x1 , . . . , xn ) is called the
Vandermonde matrix. Thus interpolation with polynomials leads to a system
of equations with a Vandermonde matrix.

Proposition 7.4.2 The Vandermonde matrix V (x1 , . . . , xn ) satisfies


Y
det V (x1 , . . . , xn ) = (xi − xj ). (7.7)
1≤j<i≤n

6 xj when i 6= j.
In particular, V (x1 , . . . , xn ) is invertible as soon as xi =

Proof. We prove this by induction. When n = 2 we have


 
1 x1
det V (x1 , x2 ) = det = x2 − x1 .
1 x2
Next, suppose the satement holds for V (w1 , . . . , wn−1 ). We now take
V (x1 , . . . , xn ) and subtract row 1 from all the other rows, leaving the
determinant unchanged and arriving at the matrix
xn−1
 
1 x1 ··· 1
n−1
0 x2 − x1 · · · x2 − x1  n−1

.
 
 .. .. ..
. . . 
0 xn − x1 ··· xn−1
n − xn−1
1

Next, we subtract, in order, x1 times column n − 1 from column n, x1 times


column n − 2 from column n − 1, and so on, until we subtract x1 times
column 1 from column 2. This again leaves the determinant unchanged, and
leads to the matrix
 
1 0 0 ··· 0 0
0 x2 − x1 (x2 − x1 )x2 · · · (x2 − x1 )xn−32 (x2 − x1 )xn−2
2

.
 
 .. .. .. .. ..
. . . . . 
0 xn − x1 (xn − x1 )xn ··· (xn − x1 )xn−3
n (xn − x1 )xn−2
n

This matrix equals


  
1 0 ··· 0 1 0 0 ··· 0
0 x2 − x1 ··· 0  0 1 x2 ··· xn−2
2

..  .
  
 .. .. .. ..   .. .. ..
. . . .  . . . . 
0 0 ··· xn − x1 0 1 xn ··· xn−2
n
212 Advanced Linear Algebra

So we find that
n
Y
det V (x1 , . . . , xn ) = [ (xj − x1 )] det V (x2 , . . . , xn ),
j=2

and (7.7) follows by using the induction assumption. 

A particular useful Vandermonde matrix is the Fourier matrix


2πi
Fn = V (1, α, α2 , . . . , αn−1 ), α = e n .

Proposition 7.4.3 The matrix √1 Fn


n
is unitary. In particular, Fn−1 = 1 ∗
n Fn .

Proof. Notice that for k ∈ {1, . . . , n − 1}, we have that


0 = 1 − (αk )n = (1 − αk )(1 + αk + (αk )2 + · · · + (αk )n−1 ).
As αk 6= 1, we get that
1 + αk + (αk )2 + · · · + (αk )n−1 = 0.
Now one can easily check that Fn Fn∗ = nIn . 

Aside from having an easily computable inverse, the Fourier matrix (when n
is a power of 2) also has the advantage that it factors in simpler matrices.
This makes multiplication with the Fourier matrix easy (and fast!) to
compute. We just illustrate the idea for n = 4 and n = 8:
    
1 1 1 1 1 1 0 0 1 0 1 0
1 i −1 −i  0 0 1 i  0 1 0 1
F4 = 
1 −1 1 −1 = 1 −1 0 0  1 0 −1 0  ,
   

1 −i −1 i 0 0 1 −i 0 1 0 −1
  
1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0
0 0 1 α 0 0 0 0  0 1 0 1 0 0 0 0
  
0 0 0 0 1 α2 0 0  0 0 0 0 1 0 α 2
0
   
0 0 0 0 0 0 1 α3  0 0 0 0 0 1 0 α2 
F8 =    ×
1 α4 0 0 0 0 0 0  1 0 α4 0 0 0 0 0
   
0 0 1 α5 0 0 0 0  0 1 0 α4 0 0 0 0
   
0 0 0 0 1 α6 0 0  0 0 0 0 1 0 α6 0 
0 0 0 0 0 0 1 α7 0 0 0 0 0 1 0 α6
 
1 0 0 0 1 0 0 0
0 1 0 0 0 1 0 0
 
0 0 1 0 0 0 1 0
 
0 0 0 1 0 0 0 1
  , where α = e 2πi
8 .
1 0 0 0 −1 0 0 0 
 
0 1 0 0 0 −1 0 0
 
0 0 1 0 0 0 −1 0 
0 0 0 1 0 0 0 −1
How to Use Linear Algebra 213

Notice that when multiplying with one of these simpler matrices, for each
entry one only needs to do one multiplication and one addition. Thus
multiplying with F8 requires 24 multiplications and 24 additions. In general,
we have that multiplying with Fn requires n log2 n multiplications and
n log2 n additions. This is a lot better than n2 multiplications and (n − 1)n
additions, which one has with a regular n × n matrix vector multiplication
(this number can be reduced somewhat, but still for a general matrix it is of
the order n2 ).

Interpolation techniques are also useful over finite fields. An example where
this is used is secret sharing.

Example 7.4.4 (Shamir’s secret sharing) Suppose that we have a secret


number and we would like, among N people, that every k of them can piece
together the secret, making sure that if only k − 1 people get together they
cannot figure out the secret. An example would be of a bank, where with
any 3 people of the upper management, one would like to be able to open
the vault, but not with just 2 of them. We explain the idea in an example.
Suppose that m = 1432 is our secret number. Let us choose a prime number
p > m, for instance p = 2309 (which happens to be a primorial prime).
Suppose that N = 10 and k = 3. We choose a degree 3 − 1 = 2 polynomial

a(x) = a0 + a1 x + a2 x2 ,

where a0 = m = 1432 and a1 and a2 are some other numbers in Zp \ {0}. For
instance, a1 = 132 and a2 = 547. Now we generate interpolation data, for
instance:
x f (x)
1 2111
2 1575
3 2133
4 1476
5 1913
6 1135
7 1451
8 552
9 747
10 2036
If now three people get together, one will be able to reconstruct the
polynomial a(x), and thus the secret a(0). With only two people (thus, with
only two interpolation points), one will not be able to reconstruct the secret
code. For instance, with the data (2, 1575), (5, 1913), (9, 747), one finds the
214 Advanced Linear Algebra

secret by computing
 −1  
 1 2 4 1575
a0 = 1 0 0 1 5 25 1913 , (7.8)
1 9 81 747
where one is working over the field Z2309 . The calculation (7.8) can be
programmed so that those holding the interpolation points do not need to
know the prime number p. When the three data points are known, but the
prime p is unknown, one still will not be able to reconstruct the secret,
providing some protection when someone listening in is able to get 3
interpolation points. This secret sharing scheme was introduced by Adi
Shamir.

We will next explain how one arrives at problems where a function is to be


found satisfying certain integral conditions. We start by explaining the ideas
behind the Galerkin method. Let X and Y be vector spaces of functions, and
Φ : X → Y a linear operator. Consider the problem of solving the equation
Φ(f ) = g. Typically, X and Y are infinite-dimensional spaces. If, however, Y
has a Hermitian form h·, ·i (which on a function space is often given via an
integral), and w1 , . . . , wn ∈ Y , we can instead solve:
hΦ(f ), wi i = hg, wi i, i = 1, . . . , n.

Pn U = Span{u1 , . . . , un },
In addition, we can take a finite-dimensional subspace
and seek a solution f in this subspace, thus f = j=1 aj uj for some scalars
a1 , . . . , an . Now we obtain the system of equations
Xn n
X
hΦ(f ), wi i = hΦ( aj uj ), wi i = aj hΦ(uj ), wi i = hg, wi i, i = 1, . . . , n.
j=1 j=1

If we let B be the matrix B = (hΦ(uj ), wi i)ni,j=1 , then we obtain the equation


      
a1 hΦ(u1 ), w1 i · · · hΦ(un ), w1 i a1 hg, w1 i
B  ...  =  .. ..   ..   .. 
  .  =  .  . (7.9)
  
. .
an hΦ(u1 ), wn i ··· hΦ(un ), wn i an hg, wn i
Pn
Now we are in a position to solve for a1 , . . . , an , and build f = j=1 aj uj .
Clearly, whether this is a meaningful solution to our original problem all
depends on whether we made good choices for u1 , . . . , un ∈ X and
w1 , . . . , wn ∈ Y (and, potentially, also on our choice for the Hermitian form
h·, ·i on Y ). One particular construction involves dividing the domain up in
small subdomains (elements) and having functions that are patched together
by taking, on each of these subdomains, a very simple function (linear,
quadratic, etc.). This is the main idea behind the finite element method.

Next, let us compute the matrix B in an important example that involves


the Laplace operator.
How to Use Linear Algebra 215

Example 7.4.5 Let Ω ⊆ R2 be a bounded region with boundary ∂Ω. One


can think of Ω being the inside of a circle, an ellipse, a rectangle, or some
other shape. We consider real-valued functions defined on the set Ω ∪ ∂Ω. We
∂2 ∂2
let Φ be the Laplace operator ∂x 2 + ∂y 2 , arriving at the Poisson equation

∂2f ∂2f
2
+ 2 = g,
∂x ∂y
and let us add the zero boundary condition

f = 0 on ∂Ω.

Thus our vector space X consists of functions that are differentiable twice
with respect to each of the variables x and y, and that are zero on the
boundary ∂Ω. We introduce the Hermitian form
Z Z
hk, hi := k(x, y)h(x, y) dx dy,

which is actually an inner product as we are dealing with continuous


functions. Let us choose functions u1 (x, y), . . . , un (x, y) ∈ X, and let
wi (x, y) = ui (x, y), i = 1, . . . , n, be the same set of functions. Now the
matrix B = (bij )ni,j=1 in (7.9) is given by

∂ 2 uj ∂ 2 uj ∂ 2 uj ∂ 2 uj
Z Z
bij = h 2 + , ui i = ( + )ui dx dy.
∂x ∂y 2 Ω ∂x2 ∂y 2
Performing partial integration, and using the zero boundary condition, we
arrive at
Z Z
∂uj ∂ui ∂uj ∂ui ∂uj ∂ui ∂uj ∂ui
bij = + dx dy = h , i+h , i.
Ω ∂x ∂x ∂y ∂y ∂x ∂x ∂y ∂y
Note that B is symmetric, and when ui , i = 1, . . . , n are chosen so that
{ ∂u ∂un ∂u1 ∂un
∂x , . . . , ∂x } or { ∂y , . . . , ∂y } is linearly independent, we have that B is
1

positive definite. This guarantees that one can solve for a1 , . . . , an in (7.9)
and thus construct a solution f .

Another widely used construction involves the Fourier transform.

Example 7.4.6 Given a function f : R → C, we define its Fourier transform


fˆ via Z ∞
fˆ(ω) = f (x)e−2πiωx dx,
−∞

where ω ∈ R. Of course, one needs to worry ˆ


R ∞ whether f is well-defined (which
it is if, for instance, f is continuous and −∞ |f (x)|dx < ∞), but we will not
216 Advanced Linear Algebra

go into a detailed discussion about this. The quantity fˆ(ω) measures


intuitively how well f (x) matches the function e−2πiωx . The variable ω is
referred to as the frequency, and as mentioned in the beginning of this
section, this is a meaningful notion in sound. For instance, if f represents a
noisy recording of a conversation, one could take its Fourier transform and
analyze which frequencies correspond to the noise (typically the high
frequencies) and which frequencies correspond to the actual conversation. By
keeping only the frequencies corresponding to the conversation, and
performing an inverse Fourier transform, one obtains a noise-free
conversation. This process is referred to as filtering and can be done in real
time (as opposed to first having to record the full conversation). In many of
our communication devices filters are being used. Filters have their flaws, of
course, and can for instance create an echo. A signal processing course would
explain all this in detail.

Example 7.4.7 Blurring of an image is represented by an integral. If


f (x, y), (x, y) ∈ Ω represents the image (at each location (x, y) there is an
intensity), then the blurred image will look like
Z Z
Bf (x, y) = f (x − s, y − t)g(s, t) ds dt,

which is a so-called convolution integral. The function g will have the


following shape. The effect of the convolution integral is that the value

Figure 7.3: Blurring function.

Bf (x, y) is a weighted average of the values of f in a region around the


point (x, y). To deblur a picture, one would start with Bf (x, y) and try to
solve for f (x, y). As blurring is like taking averages, the deblurring is not
going to be perfect. The following shows some typical effects.
How to Use Linear Algebra 217

Figure 7.4: The original image (of size 3000 × 4000 × 3).

(a) Blurred image. (b) Deblurred image.

7.5 How to deal with incomplete matrices

In 2006 Netflix put out a one-million-dollar challenge: improve on their


existing movie recommendation scheme. They provided anonymous rating
data, and the assignment was to predict ratings by customers 10% better
than Netflix’s program Cinematchr did based on the same training set. In
September 2009 the $1M grand prize was awarded to team Bellkor’s
Pragmatic Chaos, after a nail-biting finish as The Ensemble submitted their
solution only 19 minutes and 54 seconds after the winners did. These teams
were groups joining together after they submitted solutions that were in the
9% range, not yet quite achieving the desired 10%. An important ingredient
in these solutions is the idea of minimal rank completions, which we will
explain in this section.
218 Advanced Linear Algebra

A partial matrix over F is a matrix with some entries in F given and others
unknown. For instance  
1 0 ?
A= (7.10)
? 1 ?
is a 2 × 3 partial matrix with entries (1, 3), (2, 1) and (2, 3) unknown. When
convenient, we indicate the unknowns by variables:
 
1 0 x13
A= .
x21 1 x23
We view the unknown as variables xij that take value in the field F. The set
of locations J ⊆ {1, . . . , n} × {1, . . . , m} of known entries is called the
pattern of the partial matrix. For instance, for the partial matrix (7.10) the
pattern is {(1, 1), (1, 2), (2, 2)}. A completion of a partial matrix is obtained
by choosing values in F for the unknowns. For instance, if F = R, then
!

1 0 1
 1 0 qπ
A= ,A = 5
10 1 −7 e2 1 17

are completions of the partial matrix (7.10). We will denote partial matrices
by A, B, etc., and their completions by A, B, etc.

Going back to the Netflix challenge, a partial matrix corresponding to


ratings data may look like
 
1 ? 4
? ? 3
A= 5
,
2 ?
3 ? ?
where each customer is represented by a row and each movie is represented
by a column. So, for instance, customer 1 rated movie 3 with 4 stars, while
customer 3 did not rate movie 3.

Given a partial matrix A, we call a completion A a minimal rank completion


of A if among all completions B of A the rank of A is minimal. Thus

rank A = min rank B.


B a completion of A

The minimal rank of a partial matrix A is defined to be the rank of a


minimal rank completion of A. In other words

min rank A = min rank B.


B a completion of A

For instance,
   
1 0 ? 1 1 ?
min rank = 2, min rank = 1.
? 1 ? ? 1 ?
How to Use Linear Algebra 219

Indeed, independent of the choice for x13 , x21 and x23 , we have that
 
1 0 x13
rank = 2,
x21 1 x23
 
1 1 ?
while any completion of B = has rank at least 1, and
? 1 ?
 
1 1 1
1 1 1

is a completion of B with rank 1.

With the partial ranking data, one obtains a large matrix (say, of size
1, 000, 000, 000 × 100, 000) where only a small percentage of the values are
known. It turned out that looking for (an approximation of) a minimal rank
completion was a good move. Apparently, a model where our individual
movie rankings are a linear combination of the ranking of a relatively few
number of people provides a reasonable way to predict a person’s movie
rankings. Of course, a minimal rank completion of a partial matrix that has
entries in the set {1, 2, 3, 4, 5} will not necessarily have its entries in this set,
so additional steps need to be taken to get ranking predictions.

So, how does one find a minimal rank completion? Here we discuss one
algorithm, which assumes that F = R or C, based on an initial guess of an
upper bound of the minimal rank. For
 
σ1 0 · · · 0 · · · 0
 0 σ2 · · · 0 · · · 0
 .. .. . . .. .. 
 
. . . . . , σ1 ≥ · · · ≥ σm ,
Σ= (7.11)
0
 0 · · · σm · · · 0 
. .. .. .
 .. . . .
0. .. 
0 0 ··· 0 ··· 0

and k ≤ m, let us define


 
σ1 0 ··· 0 ··· 0
0 σ2 ··· 0 ··· 0
 .. .. .. .. 
 
..
. . . . .
Hk (Σ) :=  ,
0
 0 ··· σk ··· 0 
. .. .. .. 
 .. . . .
0. .
0 0 ··· 0 ··· 0

thus just keeping the k largest singular values. Notice that the operation Hk
is like the hard thresholding operator introduced in Section 7.2.
220 Advanced Linear Algebra

The algorithm to find a minimal rank completion is now as follows.

Given are a real or complex partial matrix A with pattern J, an integer k,


and a tolerance  > 0.

1. Choose a completion A0 of A.

2. While sk+1 (Ai ) ≥ , do the following:

(i) Find a singular value decomposition Ai = Ui Σi Vi∗ of Ai . Compute


Bi = Ui Hk (Σi )Vi∗ .
(ii) Let Ai+1 be defined by
(
(Ai )rs if (r, s) ∈ J,
(Ai+1 )rs =
(Bi )rs if (r, s) 6∈ J.

3. If the algorithm fails to stop in a reasonable time, raise the integer k.

For this algorithm to work, one needs to be able to find a (good


approximation of a) singular value decomposition of a large matrix. Such
algorithms have been developed, and are used for instance in search engines.

Another area where incomplete matrices appear involve distance matrices. A


matrix D = (dij )ni,j=1 is called a (Euclidean) distance matrix if there exist an
n ∈ N and vectors v1 , . . . , vk ∈ Rn such that dij = kvi − vj k2 , where k · k
denotes the Euclidean distance.

T T
Example 7.5.1 Let v1 = 0 1 1 , v2 = 1 −1 1 , and
T
v3 = 0 0 2 . Then the corresponding distance matrix is given by
 
0 5 2
5 0 3 .
2 3 0

The following result gives a characterization of distance matrices.

Theorem 7.5.2 A real symmetric matrix D = (dij )ki,j=1 , with dii = 0,


i = 1, . . . , k, is a distance matrix if and only if the (k + 1) × (k + 1) bordered
matrix
0 eT
 
k+1
B = (bij )i,j=1 := (7.12)
e D
has only one positive eigenvalue. Here, e is the vector with all of its entries
How to Use Linear Algebra 221

equal to 1. In that case, the minimal dimension n for which there exists
vectors v1 , . . . , vk ∈ Rn such that dij = kvi − vj k2 , i, j = 1, . . . , k, is given
by the rank of the matrix
−1
S = B22 − B21 B11 B12 , (7.13)

where
 
0 1
B11 = (bij )2i,j=1 = T
, B12 = B21 = (bij )2i=1,j=3
k+1
, B22 = (bij )k+1
i,j=3 .
1 0

Proof. We first note that


   T  
I2 0 B11 B12 I2 0 B11 0
−1 −1 = .
−B21 B11 Ik−1 B21 B22 −B21 B11 Ik−1 0 S

Thus, by Theorem 5.5.5, we obtain


 
B11 0
In B = In = In B11 + In S = (1, 1, 0) + In S. (7.14)
0 S

Assume without loss of generality that v1 = 0 (by replacing vj by vj − v1 ,


j = 1, . . . , j, which does not affect the matrix) and consider the distance
matrix
kv2 k2 kvk k2
 
0 ···
kv2 k2 0 · · · kv2 − vk k2 
2 2 2
 
D = kv3 k kv3 − v2 k · · · kv3 − vk k  .

 .. .. .. 
 . . . 
kvk k2 kvk − v2 k2 ··· 0
Computing the matrix S in (7.13), one obtains

kv2 − v3 k2 · · · kv2 − vk k2
 
0
kv3 − v2 k2 0 · · · kv3 − vk k2 
 
 .. .. .. .. 
 . . . . 
kvk − v2 k2 kvk − v3 k2 ··· 0

kv2 k2
 
1
1 kv3 k2  
 0 1

1 1 ··· 1

− . ,

..  1 kv2 k2 kv3 k2 kvk k2
 .. .  0 ···
1 kvk k2
222 Advanced Linear Algebra

which equals the matrix


k k
kvi − vj k2 − kvi k2 − kvj k2 i,j=2
= −2 viT vj i,j=2
 T
v2
v3T  
= −2  .  v2 v3 ··· vk ,
 
 .. 
vkT
which is negative semidefinite of rank n, where n is the dimension of
Span{vi : i = 2, . . . , k}. Thus In S = (0, n, k − 1 − n), and by using (7.14) we
find that In B = (1, 1, 0) + (0, n, k − 1 − n) = (1, n + 1, k − 1 − n) and thus B
has only one positive eigenvalue.

Conversely, if B has only one positive eigenvalue, then In S = In B − (1, 1, 0),


gives that S has no positive eigenvalues. Thus −S is positive semidefinite.
Let us write − 12 S = QT Q, with Q of size n × k − 1, where n = rank S. Write

Q = q2 · · · qk ,
where q2 , . . . , qk ∈ Rn . Put q1 = 0. We claim that dij = kqi − qj k2 . From
− 21 S = QT Q we obtain that
dij − di1 − d1j = −2qTi qj = kqi − qj k2 − kqi k2 − kqj k2 , i, j = 2, . . . , k. (7.15)
Letting i = j ∈ {2, . . . , k} gives that di1 = kqi k2 , i = 2, . . . , k. Using this
(7.15) now gives that dij = kqi − qj k2 , i, j = 2, . . . , k, finishing the proof. 

To know the interatomic distances in a molecule is important in


understanding the molecule and its chemical behavior. By using nuclear
magnetic resonance (NMR) data, one would like to determine these
interatomic distances. Clearly, this is a challenge as the distances are so
small, so unavoidably there are errors in the measurements, and moreover
one may not be able to determine all the distances from the data. Now, we
do know that the data comes from a three-dimensional space, so when one
writes down the corresponding distance matrix, it should have the property
that the matrix S in (7.13) has rank 3. This gives the opportunity to fill in
some missing data, as well as correct some inaccurate data.

7.6 Solving millennium prize problems with linear


algebra

The Clay Mathematics Institute (CMI) of Cambridge, Massachusetts,


established the Millennium Prize Problems, seven problems for which the
How to Use Linear Algebra 223

solution carries a $1 million prize payable by CMI, not to mention with a


place in the (mathematics) history books. The prizes were announced at a
meeting in Paris, held on May 24, 2000 at the Collège de France. In this
section we will discuss two of these problems from a linear algebra
perspective.

7.6.1 The Riemann hypothesis

The Riemann zeta function is defined by



X 1
ζ(s) := .
n=1
ns
Pk
This infinite sum (a series) is defined by letting sk = n=1 n1s and when
limk→∞ sk exists,
P∞ we say that the series converges and call its limit the sum
of the series n=1 n1s . As it turns out, ζ(s) is well-defined when s is a
complex number with Re s > 1. The convergence when s = 2, 3, . . . , thus for
1 1 1 1 1 1 1 1 1
ζ(2) = 2
+ 2 + 2 +· · · , ζ(3) = 3 + 3 + 3 +· · · , ζ(4) = 4 + 4 + 4 +· · · ,
1 2 3 1 2 3 1 2 3
is typically addressed in a first treatment on series. Riemann showed that a
(necessarily unique) analytic function exists (also denoted by ζ) defined on
C \ {1} that coincides with ζ(s) on the domain {s ∈ C : Re s > 1}. If you are
not familiar with the notion of a function being analytic, one can think of
this property as being complex differentiable k times for every k ∈ N (also,
referred to as infinitely complex differentiable). The Riemann hypothesis can
now be formulated as follows.

Riemann hypothesis If s is a zero of ζ(s), then either s is a negative even


integer −2, −4, . . . or s has a real part equal to 12 .

The negative even integers are considered to be the trivial zeros of ζ(s), so
the Riemann hypothesis can also be stated asthe non-trivial zeros of the
Riemann zeta function have a real part 21 . There is a lot to say about the
Riemann hypothesis as the vast literature on the subject shows. A good
place to start to read up on it would be the website of the Clay Mathematics
Institute. In this subsection we would just like to introduce a linear algebra
problem, the solution of which would imply the Riemann hypothesis.

Define n × n matrices Dn = (dij )ni,j=1 and Cn = (cij )ni,j=1 by


(
i if i divides j
dij =
0 otherwise,
224 Advanced Linear Algebra

Figure 7.5: The Redheffer matrix of size 500 × 500.

and
Cn = (e2 + · · · + en )T e1 .
Let An = Dn + Cn , which is called the Redheffer matrix, after its inventor.
So, for instance  
1 1 1 1 1 1
1 1 0 1 0 1
 
1 0 1 0 0 1
A6 =  1 0 0 1 0 0 .

 
1 0 0 0 1 0
1 0 0 0 0 1
In Figure 7.5 one can see what A500 looks like.

We now have the following result:

The Riemann hypothesis holds if and only if for every  > 0 there exist
1
M, N > 0 so that | det An | ≤ M n 2 + for all n ≥ N .

If you are familiar with big O notation, then you will recognize that the last
1
statement can be written as | det An | = O(n 2 + ) as n → ∞. The proof of
this result requires material beyond the scope of this book; please see
[Redheffer, 1977] for more information. While this formulation may be an
interesting way to familiarize oneself with the Riemann hypothesis, the
machinery to solve this problem will most likely tap into many fields of
mathematics. Certainly the solution has been elusive to many
How to Use Linear Algebra 225

mathematicians since the problem was introduced in 1859, and continues to


capture the interest of many.

7.6.2 P vs. NP

A major unresolved problem in computational complexity theory is the P


versus NP problem. The way to solve this problem is to find a polynomial
time algorithm for one of the problems that are identified as NP hard. In
this section we will discuss the NP hard problem MaxCut. By a polynomial
time algorithm we mean an algorithm for which the running time can be
bounded above by a polynomial expression in the size of the input for the
algorithm. The P versus NP problem was formally introduced in 1971 by
Stephen Cook in his paper “The complexity of theorem proving procedures,”
but earlier versions go back at least to a 1956 letter written by Kurt Gődel
to John von Neumann.

An (undirected) graph is an ordered pair G = (V, E) comprising a set V of


vertices (or nodes) together with a set E ⊆ V × V , the elements of which are
called edges. The set E is required to be symmetric, that is (i, j) ∈ E if and
only if (j, i) ∈ E. For this reason we write {i, j} instead of both (i, j) and
(j, i). In addition, when we count edges we count {i, j} only once. The edges
are depicted as lines between the corresponding vertices, so {1, 2} ∈ E means
that a line (edge) is drawn n between vertex 1 and vertex 2. An example with o
V = {1, 2, 3, 4, 5, 6}, E = {1, 2}, {1, 5}, {2, 5}, {2, 3}, {3, 4}, {4, 5}, {4, 6} is:

Figure 7.6: A sample graph.

A cut in a graph is a disjoint union V = V1 ∪ V2 of the vertices; that is


V1 ∩ V2 = ∅. The size s(V1 , V2 ) of the cut (V1 , V2 ) is the number of edges
where one endpoint lies in V1 and the other in V2 . So, for instance, for the
226 Advanced Linear Algebra

graph above and the choice V1 = {1, 3, 6}, V2 = {2, 4, 5}, the size equals
s(V1 , V2 ) = 5. A maximum cut of a graph G is a cut whose size is at least the
size of any other cut of G.

MaxCut problem: Given a graph G, find a maximum cut for G.

A graph G with n vertices has 2n−1 cuts, and thus to find a maximum cut
one may simply check the size of each cut and pick one for which the size is
maximal. There is one major problem with this approach: there are simply
too many cuts to check. For instance, if n = 100 and one can check 100,000
cuts per second, it will take more than 1017 years to finish the search. The
main problem is that the time it takes is proportional to 2n−1 , which is an
exponential function of n. We would rather have an algorithm taking time
that is proportional to a polynomial p(n) in n. We call such an algorithm a
polynomial time algorithm. As an example of a problem that can be solved in
polynomial time, putting n numbers in order from smallest to largest can be
done in a time proportional to n2 , for instance by using the Quicksort
algorithm. The MaxCut problem is one of many for which no polynomial
time has been established. We will describe now a polynomial time
algorithm that finds a cut with a size of at least 0.878 times the maximal cut
size. The development of a polynomial time algorithm that would bring this
16
number up to 17 ≈ 0.941 would show that P=NP and thus solve one of the
millennium prize problems.

Let G = (V, E) with V = {1, . . . , n}. Introduce the symmetric matrix


W = (wij )ni,j=1 given by
(
1 if {i, j} ∈ E
wij = (7.16)
0 otherwise.

The MaxCut problem may be rephrased as finding


1X
mc(G) = max wij (1 − yi yj ). (7.17)
yi ∈{−1,1} 2
i<j

Indeed, with a cut {1, . . . , n} = V1 ∪ V2 we set


(
1 if i ∈ V1 ,
yi =
−1 if i ∈ V2 .

Notice that
(
0 if i, j ∈ V1 or i, j ∈ V2 ,
1 − yi yj =
2 if i ∈ V1 , j ∈ V2 or i ∈ V2 , j ∈ V1 .
1
P
Thus 2 i<j wij (1 − yi yj ) indeed corresponds to the size of the cut
How to Use Linear Algebra 227

{1, . . . , n} = V1 ∪ V2 , and thus (7.17) corresponds to the MaxCut problem.


Notice that if y = (yi )ni=1 is a vector, then the matrix Y = (yij )ni,j=1 := yyT
is positive semidefinite and has diagonal entries equal to 1. In addition, Y
has rank 1. If we drop the rank 1 condition, we arrive at a larger set of
matrices Y . This now leads to the following problem.

Problem: Find Y = (yij )ni,j=1 positive semidefinite with yii = 1, i = 1, . . . , n,


which maximizes
1X
mcr(G) = max wij (1 − yij ). (7.18)
Y 2 i<j

As we showed, when we have a solution y = (yi )ni=1 to (7.17), we have that


Y = (yij )ni,j=1 := yyT is one of the matrices we are maximizing over in
(7.18), and thus we find that mc(G) ≤ mcr(G). If the optimum for (7.18) is
achieved at a Y of rank 1, then Y = yyT with y = (yi )ni=1 and yi ∈ {−1, 1},
so in that case we find that mc(G) = mcr(G). The usefulness of considering
problem (7.18) is that given an accuracy  > 0, it can be solved in
polynomial time within the accuracy . The algorithm to solve (7.18) is
based on semidefinite programming, which is a generalization of linear
programming where instead of maximizing over the set of vectors with
nonnegative entries, one now optimizes over the set of positive semidefinite
matrices. In the remainder of this section we will explain why

0.878576 mcr(G) ≤ mc(G).

Let S n = {v ∈ Rn : kvk = 1} be the unit sphere in Rn . We define a function


sign : R → {−1, 1} by
(
1 if x ≥ 0,
sign(x) =
−1 if x < 0.

Lemma 7.6.1 Let v1 , v2 ∈ S n . Choose r ∈ S n randomly (uniform


distribution). Then the probability that sign rT v1 =
6 sign rT v2 equals
arccos(v1T v2 )
π .

Proof. The subspace {x ∈ Rn : rT x = 0} cuts the unit sphere S n in two equal


half-spheres. The chance that v1 and v2 end up in different half-spheres is
proportional to the angle arccos(v1T v2 ) between the vectors v1 and v2 . When
this angle is π (thus v1 = −v2 ) the chance is 1 that they end up in different
half-spheres, while if the angle is 0 (thus v1 = v2 ) the chance is 0 that they
end up in different half-spheres. The proportionality now leads to the chance
arccos(v1T v2 )
of ending up in different half-spheres being in general equal to π .

We now propose the following algorithm.


228 Advanced Linear Algebra

Given is a graph G = (V, E), with V = {1, . . . , n}.

1. Define the matrix W via (7.16).

2. Solve (7.18), resulting in a maximizing matrix Y .



3. Factor Y = QT Q, where Q = v1 · · · vn .

4. Choose r ∈ S n randomly (uniform distribution).

5. Set V1 = {i ∈ V : sign rT vi = 1} and V2 = {i ∈ V : sign rT vi = −1}.

This leads to a cut V = V1 ∪ V2 .

The following proposition gives an estimate for the expected size of a cut
obtained by the above algorithm.

Proposition 7.6.2 Using the above algorithm, the expected size of a cut is
≥ 0.87856 mcr(G).

Proof. We let E denote the expectation (of a probability distribution). Then


1X
E[size of a cut] = wij Probability(sign rT vi =
6 sign rT vj ) =
2 i,j

1X arccos(viT vj ) 1X arccos(yij )
wij = wij =
2 i,j π 2 i,j π

1X 2 arccos(yij ) 1X 2 arccos(t)
wij (1 − yij ) ≥ wij (1 − yij ) min =
4 i,j π 1 − yij 4 i,j −1≤t≤1 π 1−t

2 θ
mcr(G) min ≥ 0.87856 mcr(G),
0≤θ≤π π 1 − cos θ
where we used the substitution t = cos θ, and where the last step is the result
θ
of a calculus exercise (of determining the minimum of 1−cos θ on [0, π]). 

Corollary 7.6.3 For a graph G, we have

0.87856 mcr(G) ≤ mc(G) ≤ mcr(G).

Proof. Indeed, using Proposition 7.6.2, we obtain

mc(G) ≥ E[size of a cut] ≥ 0.87856 mcr(G).

The inequality mcr(G) ≥ mc(G) holds, as we observed before. 


How to Use Linear Algebra 229

Thus the outcome of the polynomial time algorithm provides an answer that
bounds the value of the NP hard problem MaxCut within 12.2% accuracy. It
has been proven that if the approximation ratio can be made better than
16
17 ≈ 0.941, then a polynomial time algorithm for MaxCut can be obtained.
Thus if a polynomial time algorithm can be found achieving this
approximation ratio of 0.941 (instead of 0.87856), one obtains that P=NP
(and a million US dollars).

7.7 How secure is RSA encryption?

Algorithms in public-key cryptography are only secure if the mathematical


steps to crack the code are impossible to perform within a time frame to
take advantage of the solution. The RSA (Rivest-Shamir-Adleman)
encryption scheme is based the challenge to find a factorization n = pq of a
given number n that is known to be the product of two primes. For instance,
the 250-digit number RSA-250

214032465024074496126442307283933356300861471514475501779775492088
141802344714013664334551909580467961099285187247091458768739626192
155736304745477052080511905649310668769159001975940569345745223058
9325976697471681738069364894699871578494975937497937

is known to be the product of two primes p and q, but finding this


factorization is a hard task. Trying out all possibilities will require time that
is beyond our lifetime. So how to do it much much faster? We will explain
some of the ideas behind one method, which ultimately leads to finding
nonzero vectors in the kernel of a large sparse matrix over Z2 . One may
think in terms of a matrix with 100,000 columns, with on average 15 nonzero
entries in each column.

The approach we describe is based on a so-called quadratic sieve. The first


observation is that with p and q odd we have that
p+q 2 p−q 2
n = pq = ( ) −( )
2 2
is a difference of squares. So, if we want to√factor n as a product pq, as a
first step, one can try to take integers s > n and see if s2 − n happens to
be
√ a square. In that case we would choose p and q so that s = (p + q)/2 and
s2 − n = (p − q)/2 and a factorization of n would follow. Most likely this
will not work. However, adjusting this idea (initiated by Carl Pomerance)
leads to the following. Let Φ be the function Φ(x) = x2 − n. Suppose that
230 Advanced Linear Algebra

x1 , . . . , xk are integers so that


Φ(x1 ) · · · Φ(xk )
is a square. Say, we have that v 2 = Φ(x1 ) · · · Φ(xk ). Let now u = x1 · · · xk ,
and compute
u2 = x21 · · · x2k = Φ(x1 ) · · · Φ(xk ) + rn = v 2 + rn,
for some integer r ∈ Z. Thus n divides u2 − v 2 = (u + v)(u − v). Let now
s = gcd(n, u − v). Then there is a chance that x is a nontrivial factor of n.
The question now becomes: Q Given {Φ(x1 ), . . . , Φ(xk )}, how do we find
J ⊆ {1, . . . , l} so that j∈J Φ(xj ) is a square? For this we factorize Φ(xj ) in
primes
n (j) n (j)
Φ(xj ) = p1 1 · · · pl l , j = 1 . . . , k.

P obtain a square we need to see to it that we choose J ⊆ {1, . . . , k} so that


To
j∈J ni (j) is even for all i = 1, . . . , k. If we set up the matrix
bij = ni (j)(mod 2) we get a l × k matrix A with entries in Z2 for which we
would like to find a vector x ∈ Zk2 so that Ax = 0. We now choose
J = {j : xj = 1}.

Let us consider an example.

Example 7.7.1 Let the numbers 3675, 7865, 165, 231, 7007 be given. Can we
find a square among the different products of these numbers? For this we
first do a prime factorization of each of them:
3675 = 3·52 ·72 , 7865 = 5·112 ·13, 165 = 3·5·11, 231 = 3·7·11, 7007 = 72 ·11·13.
Notice that these are all products of the primes 3, 5, 7, 11, and 13. For each
of these numbers we make a column consisting of the power of these primes
modulo 2. For instance, the column corresponding to 3675 is
T
1 0 0 0 0 ; as only the power of 3 in the prime factorization of 3675
is odd, we only get a 1 in the first position (the row corresponding to the
prime 3). Doing it for all 5 numbers we get the matrix
 
1 0 1 1 0
0 1 1 0 0
 
A= 0 0 0 1 0 .

0 0 1 1 1
0 1 0 0 1
T
Taking the vector x = 1 1 1 0 1 , we get that Ax = 0. This now
gives that the product
3675 · 7865 · 165 · 7007 = 32 · 54 · 74 · 114 · 132
is a square.
How to Use Linear Algebra 231

To find a solution of Ax = 0 one may use Gaussian elimination, but in doing


so we will lose the sparse structure of the matrix, and too much storage may
be required to do it effectively. The following algorithm, due to Douglas H.
Wiedemann, has smaller storage requirements.

We present an algorithm to find a nonzero vector x ∈ Zm


2 as the solution of
the homogeneous system Ax = 0, where A ∈ Zm×m 2 .

1. Randomly choose 0 6= y ∈ Zm
2 . If u = Ay = 0 we are done.

2. Randomly choose 0 6= v ∈ Zm
2 .

3. Compute si = vT Ai u, i = 0, 1, . . . ..
4. Compute a nonzero polynomial m(X) = m0 + m1 X + · · · + md X d so that
Pd
j=0 sk+j mj = 0 for all k. Equivalently, find a nontrivial solution of the
equation
 
  m0
s0 s1 s2 · · · sd−1 sd  m1 
 
s1 s2 s3 · · · sd sd+1   m2 
 
s2 s3 s4 · · · sd+1 sd+2   ..  = 0. (7.19)

 . 
.. .. .. .. ..

 
. . . . . md−1 
md

m(X)
5. Let j be so that m0 = · · · = mj−1 = 0 and mj =
6 0. Put p(X) = mj X j .

6. Let now x = p(A)y, and check whether Al−1 x 6= 0 and Al x = 0, for some
l = 1, . . . , j. If so, Al−1 x 6= 0 is the desired vector. If not, start over with
another random vector y.

Note that most steps only require a vector matrix multiplication.

It should be noted that Step 4 in the above algorithm can actually be used
to find the minimal polynomial of a matrix. Let us illustrate this on a small
example over R.

 
3 0 0 0
0 2 1 0
Example 7.7.2 Let A = 
0 0 2 0. If we take

0 0 0 2
T
v = u = 1 1 1 1 , one easily calulates that

(sj )j∈N0 = (vT Aj u)j∈N0 = (3j + 3 · 2j + j · 2j−1 )j∈N0 =


232 Advanced Linear Algebra

4, 10, 25, 63, 161, 419, 1113, 3019, . . . .


Setting up Equation (7.19) (with d = 3), we get
 
4 10 25 63  
 10 25 63 161  m0
  m1 
 25 63 161 419 
   = 0,
 
 63 161 419 1113 m2

m3
161 419 1113 3019

where we are looking for a nontrivial solution. Such a solution is given by


   
m0 −12
m1   16 
 =
m2   −7  ,

m3 1

and indeed the minimal polynomial for A is


mA (z) = −12 + 16z − 7z 2 + z 3 = (z − 3)(z − 2)2 . This works for most choices
of u and v. What one needs to avoid is that certain eigenspaces are
overlooked by u and v. For instance, if v is an eigenvector of A at eigenvalue
3 (and thus v does not have any component in the generalized eigenspace
with eigenvalue 2), one would find m(z) = z − 3 (as sj = u1 3j v1 , in this
case). Finally, note that we set d = 3, thus making use of some advance
knowledge that in general one cannot count on. In an algorithm due to
Elwyn R. Berlekamp and James L. Massey, which is beyond the scope of this
book, one does not need to know the degree in advance.

Remember that finding u and v so that u2 − v 2 = (u − v)(u + v) is a


multiple of n does not automatically lead to the desired factorization n = pq.
Thus one needs to repeat the above steps several times. Using the ideas
presented here, the 232-digit number RSA-768 was factored in 2009. It took
two years to do the factorization and involved more than 1020 operations
(performed on several computer clusters located in different countries).

7.8 Quantum computation and positive maps

Quantum computing provides another threat to RSA encryption. In a


quantum computer information is stored in characteristics of physical
particles. The elementary objects in quantum computing are called qubits as
opposed to bits in a classical computer. A bit can take value 0 or 1 (on or
off) while a qubit is mathematically described by a unit vector in C2 , thus
How to Use Linear Algebra 233
 
x1
∈ C2 with |x1 |2 + |x2 |2 = 1. The numbers |x1 |2 and |x2 |2 are thought
x2
 
1
of as probabilities, with |x1 |2 the probability the vector is in state |0i =
0
 
0
and |x2 |2 the probability the vector is in state |1i = . While in quantum
1
computing vectors are often denoted in “ket” notation |xi, we will stick to
the notation as used in the rest of this book.

What makes quantum computing powerful is that many computations can


be done simultaneously, though the outcome can only be measured with a
certain possibility. One of the results that gave the development of a
quantum computer a strong impulse of energy is the algorithm that Peter
Shor developed to factor an integer in polynomial time with the help of a
quantum computer. Thus, if a powerful enough quantum computer exists,
the RSA encryption method will need to be rethought as the code can now
be broken. Currently, the quantum computers that exists are very small.
There are still significant problems in managing a large number of physical
particles in close proximity of one another without destroying the stored
information. Time will tell whether these physical problems can be overcome.

The mathematical development of quantum computing leads to many


interesting linear algebra questions. In this section we will discuss one of
these, the so-called separability problem.

We have seen that if A ∈ Cn×n and B ∈ Cm×m are positive semidefinite,


then so is A ⊗ B ∈ C(n+m)×(n+m) . If we take sums of such tensor products
then the resulting matrix is also positive semidefinite. The separability
question is the reverse question: given M ∈ C(n+m)×(n+m)
Pk , how can I
determine whether M can be written as M = i=1 Ai ⊗ Bi with
A1 , . . . , Ak ∈ Cn×n and B1 , . . . , Bk ∈ Cm×m all positive semidefinite? When
M is of this form, we say that M is (n, m)-separable. Positive semidefinite
matrices that are not separable are called entangled. These entangled
matrices provide in some sense the avenue to perform quantum (thus,
non-classical) computations and therefore represent the power of quantum
computing. Before we discuss the separablility problem in more detail, let us
develop some more terminology.

As before, we let Hk be the vector space over R of k × k self-adjoint complex


matrices. Thus dimR Hk = k 2 . A subset C of a vector space V over R is called
a (convex) cone if

(i) v ∈ C and c ≥ 0 implies cv ∈ C, and


(ii) v, w ∈ C implies v + w ∈ C.
234 Advanced Linear Algebra

For example,

PSDk := {A ∈ Hk : A is positive semidefinite}

is a cone in Hk . If the vector space V has a norm k · k, then we say that the
cone C is closed if

An ∈ C, n = 1, 2, . . . , A ∈ V, and lim kAn − Ak = 0 imply that A ∈ C.


n→∞

The vector space Hk has an inner product

hA, Bi = tr(AB),

and therefore also an induced norm.

Proposition 7.8.1 The set PSDk is a closed cone in Hk .

(n)
Proof. Let An = (aij )ki,j=1 ∈ PSDk , n = 1, 2, . . ., and A = (aij )ni,j=1 ∈ Hk be
(n)
so that limn→∞ kAn − Ak = 0. Then limn→∞ aij = aij for all 1 ≤ i, j ≤ n. If
we let x ∈ Ck , then hAn x, xi ≥ 0. Also limn→∞ hAn x, xi = hAx, xi, and thus
hAx, xi ≥ 0. As this is true for every x ∈ Ck , we obtain that A ∈ PSDk . 

In Hnm we define

SEPn,m = {M ∈ Hnm : there exist k ∈ N, Ai ∈ PSDn , Bi ∈ PSDm


k
X
so that M = Ai ⊗ Bi }.
i=1

It is easy to see that SEPn,m is a cone. It is actually a closed cone, but we


will not provide the proof as it requires more analysis results than we are
covering here. We next provide a first way of seeing how some elements of
PSDnm do not lie in SEPn,m .

Proposition 7.8.2 Let M ∈ SEPn,m , and let us write M = (Mij )ni,j=1


where Mij ∈ Cm×m . Then M Γ := (Mij
T
) ∈ SEPn,m ⊆ PSDnm .

Proof. As M ∈ SEPn,m we have that there exist k ∈ N, and


Ai ∈ PSDn , Bi ∈ PSDm , i = 1, . . . , k, so that M = (Mij )ni,j=1 =
Pk Pk (r) (r)
r=1 Ar ⊗ Br . Notice that Mij = r=1 aij Br , where aij is the (i, j)th
T k (r)
= r=1 aij BrT , and thus
P
entry of Ar . But then Mij
k
M Γ = r=1 Ar ⊗ BrT ∈ SEPn,m ⊆ PSDnm , as the transpose of a positive
P
semidefinite matrix Bi is also positive semidefinite. 
How to Use Linear Algebra 235

When M Γ ∈ PSDnm , we say that M “passes the Peres test.” Asher Peres
discovered Proposition 7.8.2 in 1996. In addition, the map M 7→ M Γ is
referred to as taking the partial transpose.

Example 7.8.3 The matrix


 
1 0 0 1
0 0 0 0
M =
0

0 0 0
1 0 0 1

is not (2, 2)-separable, as


 
1 0 0 0
Γ
0 0 1 0
M =
0 1

0 0
0 0 0 1

is not positive semidefinite. Thus M does not pass the Peres test.

Proposition 7.8.2 relies on the fact that taking the transpose maps PSDk
into PSDk . For other maps that have this property the same test can be
applied as well. We call a linear map Φ : Cm×m → Cl×l positive if
Φ(PSDm ) ⊆ PSDl . Thus, Φ is positive if it maps positive semidefinite
matrices to positive semidefinite matrices.

Example 7.8.4 Let Si ∈ Cl×m , i = 1, . . . , k. Then


k
X
Φ(X) = Si XSi∗ (7.20)
i=1

is an example of a positive map. Indeed, if X ∈ PSDm , we have that


Si XSi∗ ∈ PSDl , i = 1, . . . , k, and thus Φ(X) ∈ PSDl . As a special case, we
can take Si = e∗i ∈ C1×m , i = 1, . . . , m. Then
m
X
Φ(X) = e∗i Xei = trX.
i=1

Thus taking the trace is a positive map. If in addition, we let Ti ∈ Cl×m ,


i = 1, . . . , r, then
k
X r
X
Φ(X) = Si XSi∗ + Ti X T Ti∗ (7.21)
i=1 i=1

defines a positive map.


236 Advanced Linear Algebra

We can now state a more general version of Proposition 7.8.2.

Proposition 7.8.5 Let M ∈ SEPn,m , and let us write M = (Mij )ni,j=1


where Mij ∈ Cm×m . Let Φ : Cm×m → Cl×l be a positive map. Then

(idCn×n ⊗ Φ)(M ) = (Φ(Mij ))ni,j=1 ∈ SEPn,l ⊆ PSDnl .

Proof. As M ∈ SEPn,m we have that there exist k ∈ N, and


Ai ∈ PSDn , Bi ∈ PSDm , i = 1, . . . , k, so that
Pk Pk (r)
M = (Mij )ni,j=1 = r=1 Ar ⊗ Br . Notice that Mij = r=1 aij Br , where
(r) P k (r)
aij is the (i, j)th entry of Ar . But then Φ(Mij ) = r=1 aij Φ(Br ), and thus

k
X
(idCn×n ⊗ Φ)(M ) = Ar ⊗ Φ(Br ) ∈ SEPn,l ⊆ PSDnl .
r=1

There are positive maps Φ for which idCk×k ⊗ Φ is positive for every k ∈ N.
We call such maps completely positive. These completely positive maps are
useful in several contexts, however they are unable to identify M ∈ PSDmn
that are not separable, as (idCn×n ⊗ Φ)(M ) will in this case always be
positive semidefinite. The completely positive maps are characterized in the
following result, due to Man-Duen Choi.

Theorem 7.8.6 Let Φ : Cm×m → Cl×l be a linear map. Let


{Eij : 1 ≤ i, j ≤ m} be the standard basis of Cm×m . Then the following are
equivalent.

(i) Φ is completely positive.


(ii) idCm×m ⊗ Φ is positive.
(iii) The matrix (Φ(Eij ))m
i,j=1 is positive semidefinite.
Ps
(iv) There exist Sr ∈ Cl×m , r = 1, . . . , s, so that Φ(X) = r=1 Sr XSr∗ .

When one (and thus all) of (i)–(iv) hold, then s in (iv) can be chosen to be
at most ml.

Proof. (i) → (ii) is trivial, as when idCk×k ⊗ Φ is positive for all k ∈ N, then
it is certainly positive for k = m.
2
×m2
(ii) → (iii): The matrix H = (Eij )m
i,j=1 ∈ C
m
is easily seen to be
How to Use Linear Algebra 237

positive semidefinite. As idCm×m ⊗ Φ is positive, we thus get that


(idCm×m ⊗ Φ)(H) = (Φ(Eij ))m i,j=1 is positive semidefinite.

(iii) → (iv): Since M = (Φ(Eij ))mi,j=1 ∈ C


ml×ml
is positive semidefinite, we
Plm
can find vectors vr ∈ C , r = 1, . . . , ml, so that M = r=1 vr vr∗ . Write the
ml

vectors vr as  
vr1
vr =  ...  , where vr1 , . . . , vrm ∈ Cl .
 

vrm
We now have that
ml
X

Mij = Φ(Eij ) = vri vrj .
r=1

= Sr Eij Sr∗ . Thus

If we introduce Sr = vr1 · · · vrm , we have that vri vrj
Plm ∗ ml
= r=1 Sr Eij Sr∗ . As any X is a
P
we find that Mij = Φ(Eij ) = r=1 vri vrj
linear combination of the basis elements Eij , we thus find that
Pml
Φ(X) = i=1 Si XSi∗ .
Ps
(iv) → (i): When Φ(X) = r=1 Sr XSr∗ , we may write for M = (Mij )ki,j=1
with Mij ∈ Cm×m ,
s
X n
X
(idCk×k ⊗ Φ)(M ) = ( Sr Mij Sr∗ )m
i,j=1 = (Ik ⊗ Sr )M (Ik ⊗ Sr∗ ).
r=1 r=1

When M is positive semidefinite, then so is (Ik ⊗ Sr )M (Ik ⊗ Sr )∗ , and thus


also Φ(M ) is positive semidefinite. 

Thus completely positive maps are well-characterized in the sense that there
is a simple way to check that a linear map is completely positive, as well as
that it is easy to generate all completely positive maps. The set of positive
maps (which actually forms a cone) is not that well understood. First of all,
it is typically not so easy to check whether a map is positive, and secondly
there is not a way to generate all positive maps. Let us end this section with
a positive map that is not completely positive, also due to Man-Duen Choi.

Example 7.8.7 Let Φ : C3×3 → C3×3 be defined by


 
a11 + 2a22 −a12 −a13
Φ((aij )3i,j=1 ) =  −a21 a22 + 2a33 −a23  .
−a31 −a32 a33 + 2a11
238 Advanced Linear Algebra

Then
 
1 0 0 0 −1 0 0 0 −1
0 0 0 0 0 0 0 0 0
 
0 0 2 0 0 0 0 0 0
 
0 0 0 2 0 0 0 0 0
3
 
−1
(Φ(Eij )i,j=1 ) =  0 0 0 1 0 0 0 −1
,
0 0 0 0 0 0 0 0 0
 
0 0 0 0 0 0 0 0 0
 
0 0 0 0 0 0 0 2 0
−1 0 0 0 −1 0 0 0 1

which is not positive semidefinite. Thus, by Proposition 7.8.6, Φ is not


completely positive. To show that Φ is positive, it suffices to show that
Φ(xx∗ ) is positive semidefinite for all x = (xi )3i=1 ∈ C3 as every positive
semidefinite is a sum of positive semidefinite rank 1 matrices xx∗ . To show
that Φ(xx∗ ) is positive semidefinite, we need to show that y∗ Φ(xx∗ )y ≥ 0
for all y = (yi )3i=1 ∈ C3 . We show the proof in case |x3 | ≤ |x2 |. In this case
we observe that

y∗ Φ(xx∗ )y = |x1 ȳ1 − x2 ȳ2 + x3 ȳ3 |2 + 2|x3 |2 |y2 |2 +

2(|x2 | − |x3 |2 )|y1 |2 + 2|x1 y3 − x3 y1 |2 ≥ 0.


The proof for the case |x2 | ≤ |x3 | is similar.

7.9 Exercises

Exercise 7.9.1 Let p(n) be a polynomial in n of degree k, and let λ ∈ C be


of modulus greater than one. Show that limn→∞ p(n) λn = 0. (Hint: write
|λ| = 1 P
+ ,  >0, and use the binomial formula to give that
n
|λn | = j=0 nj j , which for n large enough can be bounded below by a
polynomial of degree greater than k.)

Exercise 7.9.2 Let A = (aij )ni,j=1 ∈ Rn×n . Let A be column-stochastic,


Pn
which means that aij ≥ 0 for all i, j = 1, . . . , n, and i=1 aij = 1,
j = 1, . . . , n.

(i) Show that 1 is an eigenvalue of A.


How to Use Linear Algebra 239

(ii) Show that Am is column-stochastic for all m ∈ N. (Hint: use that


eA = e.)
(iii) Show that forPevery x, yP∈ Rn we have that
n n
|yT Am x| ≤ ( j=1 |xj |)( j=1 |yj |) for all m ∈ N. In particular, the
sequence {yT Am x}m∈N is bounded.
(iv) Show that A cannot have Jordan blocks at 1 of size greater than 1.
(Hint: use that when k > 1 some of the entries of Jk (1)m do not stay
bounded as m → ∞. With this observation, find a contradiction with
the previous part.)
6 0, then |λ| ≤ 1.
(v) Show that if xA = λx, for some x =
(vi) For a vector v = (vi )ni=1 we define |v| = (|vi |)ni=1 . Show that if λ is an
eigenvalue of A with |λ| = 1, and xA = λx, then y := |x|A − |x| has all
nonnegative entries.
For the remainder of this exercise, assume that A only has positive
entries; thus aij > 0 for all i, j = 1, . . . , n.
(vii) Show that y = 0. (Hint: put z = |x|A, and show that y = 6 0 implies
that zA − zPhas all positive entries. The latter can be shown to
n
contradict i=1 aij = 1, j = 1, . . . , n.)
(viii) Show that if xA = λx with |λ| = 1, then x is a multiple of e and λ = 1.
(Hint: first show that all entries of x have the same modulus.)
(ix) Conclude that we can apply the power method. Starting with a vector
v0 with positive entries, show that there is a vector w with positive
entries so that Aw = w. In addition, show that w is unique when we
require in addition that eT w = 1.

Exercise 7.9.3 Let k · k be a norm on Cn×n , and let A ∈ Cn×n . Show that
1
ρ(A) = lim kAk k k , (7.22)
k→∞

where ρ(·) is the spectral radius. (Hint: use that for any  > 0 the spectral
1
radius of ρ(A)+ A is less than one, and apply Corollary 7.2.4.)

Exercise 7.9.4 Let A = (aij )ni,j=1 , B = (bij )ni,j=1 ∈ Cn×n so that |aij | ≤ bij
that ρ(A) ≤ ρ(B). (Hint: use (7.22) with the
for i, j = 1, . . . , n. Show q
Pn 2
Frobenius norm kM k = i,j=1 |mij | .)

Exercise 7.9.5 Show that if {u1 , . . . , um } and {v1 , . . . , vm } are


orthonormal sets, then the coherence µ := maxi,j |hui , vj i|, satisfies
√1 ≤ µ ≤ 1.
m
240 Advanced Linear Algebra

Exercise 7.9.6 Show that if A has the property that every 2s columns are
linearly independent, then the equation Ax = b can have at most one
solution x with at most s nonzero entries.

Exercise 7.9.7 Let A = (aij )ni,j=1 . Show that for all permutations σ on
{1, . . . , , n} we have a1,σ(1) a2,σ(2) · · · an,σ(n) = 0 if and only if there exist r
(1 ≤ r ≤ n) rows and n + 1 − r columns in A so that the entries they have in
common are all 0.

Exercise 7.9.8 We say that A = (aij )ni,j=1 ∈ Rn×n is row-stochastic if AT is


columns-stochastic. We call A doubly stochastic if A is both column- and
row-stochastic. The matrix P = (pij )ni,j=1 is called a permutation matrix if
every row and column of P has exactly one entry equal to 1 and all the
others equal to zero.

(i) Show that a permutation matrix is doubly stochastic.


(ii) Show that if A is a doubly stochastic matrix, then there exists a
permutation σ on {1, . . . , , n}, so that a1,σ(1) a2,σ(2) · · · an,σ(n) =
6 0.
(iii) Let σ be as in the previous part, and put α = minj=1,...,n aj,σ(j) (> 0),
and let Pσ be the permutation matrix with a 1 in positions
(1, σ(1)), . . . , (n, σ(n)) and zeros elsewhere. Show that either A is a
1
permutation matrix, or 1−α (A − αPσ ) is a doubly stochastic matrix
with fewer nonzero entries than A.
(iv) Prove

Theorem 7.9.9 (Birkhoff ) Let A be doubly stochastic. Then there


exists a k ∈ N, permutation matrices P1 , . . . , Pk and positive numbers
α1 , . . . , αk so that
k
X
A = α1 P1 + · · · + αk Pk , αj = 1.
j=1

In other words, every doubly stochastic matrix is a convex combination


of permutation matrices.

(Hint: Use induction on the number of nonzero entries of A.)

 
1/6 1/2 1/3
Exercise 7.9.10 Write the matrix 7/12 0 5/12 as a convex
1/4 1/2 1/4
combination of permutation matrices.
How to Use Linear Algebra 241

Exercise 7.9.11 (a) Show that


   
A ? A 
min rank = rank + rank B C .
B C B

(b) Show that the lower triangular partial matrix


 
A11 ?
A =  ... ..
 
. 
An1 · · · Ann

has minimal rank min rank A equal to


   
n Ai1 · · · Aii n−1 Ai+1,1 ... Ai+1,i
 .. . rank  ... ..  . (7.23)
X X
rank  . ..  −
 
. 
i=1 An1 . . . Ani i=1 An1 ··· Ani

Exercise 7.9.12 Show that all minimal rank completions of


 
? ? ?
1 0 ?
0 1 1
are  
x1 x2 x1 x3 + x2
1 0 x3 .
0 1 1

Exercise 7.9.13 Consider the partial matrix


 
1 ? ?
A =  ? 1 ? .
−1 ? 1

Show that there exists a completion of A that is a Toeplitz matrix of rank 1,


but that such a completion cannot be chosen to be real.

Exercise 7.9.14 Consider the n × n tri-diagonal Toeplitz matrix


 
2 −1 0 · · · 0
−1 2 −1 · · · 0 
 
An =  ... .. .. .. ..  .

 . . . . 

 0 · · · −1 2 −1
0 · · · 0 −1 2
242 Advanced Linear Algebra
π
Show that λj = 2 − 2 cos(jθ), j = 1, . . . , n, where θ = n+1 , are the
eigenvalues. In addition, an eigenvector associated with λj is
 
sin(jθ)
 sin(2jθ) 
vj =  .
 
..
 . 
sin(njθ)

Exercise 7.9.15 Let A = (aij )ni,j=1 ∈ Cn×n be given.

 
1 0
(a) Let U = ∈ Cn×n , with U1 ∈ C(n−1)×(n−1) a unitary matrix
0 U1
chosen so that
   
a21 σ v
 a31   0  u n
uX
U1  .  =  .  , σ = t |aj1 |2 .
   
 ..   ..  j=2
an1 0

Show that U AU ∗ has the form


 
a11 ∗ ∗ ··· ∗
σ ∗ ∗ ··· ∗   
a11 ∗

U AU ∗ =  0 ∗ ∗ ··· ∗ = .

 .. .. .. ..  σe1 A1
 . . . .
0 ∗ ∗ ··· ∗

(b) Show that there exists a unitary V so that V AV∗ is upper Hessenberg.
1 0
(Hint: after part (a), find a unitary U2 = so that U2 A1 U2∗ has
  0 ∗
∗ ∗
the form , and observe that
σ2 e1 A2
     
1 0 1 0 1 0 1 0
 = A
0 U2 0 U1 0 U1∗ 0 U2∗

has now zeros in positions (2, 1), . . . , (n, 1), (3, 2), . . . , (n, 2). Continue the
process.)

Remark. If one puts a matrix in upper Hessenberg form before starting the
QR algorithm, it (in general) speeds up the convergence of the QR
algorithm, so this is standard practice when numerically finding eigenvalues.
How to Use Linear Algebra 243

Exercise 7.9.16 The adjacency matrix $A_G$ of a graph $G = (V, E)$ is an $n \times n$ matrix, where $n = |V|$ is the number of vertices of the graph, and the entry $(i,j)$ equals 1 when $\{i,j\}$ is an edge, and 0 otherwise. For instance, the graph in Figure 7.6 has adjacency matrix
$$\begin{pmatrix}
0 & 1 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 1 \\
1 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0
\end{pmatrix}.$$
The adjacency matrix is a symmetric real matrix. Some properties of graphs can be studied by studying associated matrices. In this exercise we show this for the so-called chromatic number $\chi(G)$ of a graph $G$. It is defined as follows. A $k$-coloring of a graph is a function $c : V \to \{1, \ldots, k\}$ so that $c(i) \neq c(j)$ whenever $\{i,j\} \in E$. Thus, there are $k$ colors and adjacent vertices should not be given the same color. The smallest number $k$ so that $G$ has a $k$-coloring is defined to be the chromatic number $\chi(G)$ of the graph $G$.

(a) Find the chromatic number of the graph in Figure 7.6.

(b) The degree $d_i$ of a vertex $i$ is the number of vertices it is adjacent to. For instance, for the graph in Figure 7.6 we have that the degree of vertex 1 is 2, and the degree of vertex 6 is 1. Let $e = \begin{pmatrix} 1 & \cdots & 1 \end{pmatrix}^T \in \mathbb{R}^n$. Show that $e^T A_G e = \sum_{i \in V} d_i$.

(c) For a real number $x$ let $\lfloor x \rfloor$ denote the largest integer $\leq x$. For instance, $\lfloor \pi \rfloor = 3$, $\lfloor -\pi \rfloor = -4$, $\lfloor 5 \rfloor = 5$. Let $\alpha = \lambda_{\max}(A_G)$ be the largest eigenvalue of the adjacency matrix of $G$. Show that $G$ must have a vertex of degree at most $\lfloor \alpha \rfloor$. (Hint: use Exercise 5.7.21(b).)

(d) Show that
$$\chi(G) \leq \lfloor \lambda_{\max}(A_G) \rfloor + 1, \qquad (7.24)$$
which is a result due to Herbert S. Wilf. (Hint: use induction and Exercise 5.7.21(c).)
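As a numerical aside (not part of the exercise), one can evaluate the bound (7.24) for the adjacency matrix printed above. Its largest eigenvalue lies strictly between 2 and 3 (it is at least the average degree and, since the graph is connected but not regular, strictly below the maximum degree 3), so the right-hand side of (7.24) equals 3; vertices 1, 2, 5 form a triangle, so at least three colors are needed, and the bound is attained here.

    import numpy as np

    # adjacency matrix of the graph in Figure 7.6, as printed above
    A = np.array([
        [0, 1, 0, 0, 1, 0],
        [1, 0, 1, 0, 1, 0],
        [0, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 1],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 0, 1, 0, 0],
    ])

    lam_max = np.linalg.eigvalsh(A).max()      # largest eigenvalue of the symmetric matrix
    wilf_bound = int(np.floor(lam_max)) + 1    # right-hand side of (7.24)
    print(lam_max, wilf_bound)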

Exercise 7.9.17 Let
$$\rho_\alpha = \frac{1}{7}\begin{pmatrix}
\frac{2}{3} & 0 & 0 & 0 & \frac{2}{3} & 0 & 0 & 0 & \frac{2}{3} \\
0 & \frac{\alpha}{3} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \frac{5-\alpha}{3} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \frac{5-\alpha}{3} & 0 & 0 & 0 & 0 & 0 \\
\frac{2}{3} & 0 & 0 & 0 & \frac{2}{3} & 0 & 0 & 0 & \frac{2}{3} \\
0 & 0 & 0 & 0 & 0 & \frac{\alpha}{3} & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \frac{\alpha}{3} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \frac{5-\alpha}{3} & 0 \\
\frac{2}{3} & 0 & 0 & 0 & \frac{2}{3} & 0 & 0 & 0 & \frac{2}{3}
\end{pmatrix},$$
where $0 \leq \alpha \leq 5$. We want to investigate when $\rho_\alpha$ is $3 \times 3$ separable.

(a) Show that $\rho_\alpha$ passes the Peres test if and only if $1 \leq \alpha \leq 4$.

(b) Let
$$Z = \begin{pmatrix}
1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 \\
-1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 \\
-1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 1
\end{pmatrix}.$$
Show that for $x, y \in \mathbb{C}^3$ we have that $(x \otimes y)^* Z (x \otimes y) \geq 0$.

(c) Show that $\operatorname{tr}(\rho_\alpha Z) = \frac{1}{7}(3 - \alpha)$, and conclude that $\rho_\alpha$ is not $3 \times 3$ separable for $3 < \alpha \leq 5$.

(d) (Honors) Show that $\rho_\alpha$ is not $3 \times 3$ separable for $0 \leq \alpha < 2$.

(e) (Honors) Show that $\rho_\alpha$ is $3 \times 3$ separable for $2 \leq \alpha \leq 3$.

Exercise 7.9.18 (Honors) A matrix is $2 \times 2 \times 2$ separable if it lies in the cone generated by matrices of the form $A \otimes B \otimes C$ with $A, B, C \in \text{PSD}_2$. Put
$$R = I - x_1 x_1^* - x_2 x_2^* - x_3 x_3^* - x_4 x_4^*,$$
where
$$x_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} \tfrac{1}{2}\sqrt{2} \\ \tfrac{1}{2}\sqrt{2} \end{pmatrix}, \qquad
x_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} \tfrac{1}{2}\sqrt{2} \\ \tfrac{1}{2}\sqrt{2} \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix},$$
$$x_3 = \begin{pmatrix} \tfrac{1}{2}\sqrt{2} \\ \tfrac{1}{2}\sqrt{2} \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad
x_4 = \begin{pmatrix} \tfrac{1}{2}\sqrt{2} \\ -\tfrac{1}{2}\sqrt{2} \end{pmatrix} \otimes \begin{pmatrix} \tfrac{1}{2}\sqrt{2} \\ -\tfrac{1}{2}\sqrt{2} \end{pmatrix} \otimes \begin{pmatrix} \tfrac{1}{2}\sqrt{2} \\ -\tfrac{1}{2}\sqrt{2} \end{pmatrix}.$$
Show that $R$ is not $2 \times 2 \times 2$ separable.

Hint: Let
$$Z = \begin{pmatrix}
1 & -1 & -1 & 1 & -1 & 1 & 1 & -1 \\
-1 & 4 & 1 & 0 & 1 & 3 & -1 & 1 \\
-1 & 1 & 4 & 3 & 1 & -1 & 0 & 1 \\
1 & 0 & 3 & 4 & -1 & 1 & 1 & -1 \\
-1 & 1 & 1 & -1 & 4 & 0 & 3 & 1 \\
1 & 3 & -1 & 1 & 0 & 4 & 1 & -1 \\
1 & -1 & 0 & 1 & 3 & 1 & 4 & -1 \\
-1 & 1 & 1 & -1 & 1 & -1 & -1 & 1
\end{pmatrix},$$
and show that $\operatorname{trace}(RZ) = -\frac{3}{8}$ but
$$(v \otimes w \otimes z)^* Z (v \otimes w \otimes z) \geq 0,$$
for all $v, w, z \in \mathbb{C}^2$.

Bibliography for Chapter 7

It is beyond the scope of this book to provide complete references for the
topics discussed in this chapter. Rather, we provide just a few references,
which can be a starting point for further reading on these topics. With the
references in the papers and books below as well as the sources that refer to
them (see the chapter “How to start your own research project” on how to
look for these), we hope that you will be able to familiarize yourself in more
depth with the topics of your interest.

• M. Bakonyi, H. J. Woerdeman, Matrix completions, moments, and sums


of Hermitian squares. Princeton University Press, Princeton, NJ, 2011.
• W. W. Barrett, R. W. Forcade, A. D. Pollington, On the spectral radius
of a (0,1) matrix related to Mertens’ function. Linear Algebra Appl. 107
(1988), 151–159.
• M. Bellare, O. Goldreich, M. Sudan, Free bits, PCPs, and
nonapproximability–towards tight results. SIAM J. Comput. 27 (1998),
no. 3, 804–915.
• T. Blumensath, M. E. Davies, Iterative thresholding for sparse
approximations. J. Fourier Anal. Appl. 14 (2008), no. 5–6, 629–654.
• R. P. Boyer and D. T. Parry, On the zeros of plane partition
polynomials. Electron. J. Combin. 18 (2011), no. 2, Paper 30, 26 pp.
• K. Bryan and T. Leise, The $25,000,000,000 eigenvector. The linear
algebra behind Google. SIAM Rev. 48 (2006), 569–581.
• S. Chandrasekaran, M. Gu, J. Xia and J. Zhu, A fast QR algorithm for
companion matrices. Operator Theory: Adv. Appl., 179 (2007), 111–143.
• M. D. Choi, Positive semidefinite biquadratic forms. Linear Algebra and
Appl. 12 (1975), no. 2, 95–100.

• S. Foucart and H. Rauhut, A mathematical introduction to compressive


sensing. Applied and Numerical Harmonic Analysis. Birkhäuser/
Springer, New York, 2013.
• M.X. Goemans and D.P. Williamson, Improved approximation
algorithms for maximum cut and satisfiability problems using
semidefinite programming, J. ACM 42 (1995) 1115–1145.

• K. Kaplan, Cognitech thinks it’s got a better forensic tool: The firm uses
complex math in video image-enhancing technology that helps in finding
suspects, Los Angeles Times, September 5, 1994;
http://articles.latimes.com/1994-09-05/business/fi-35101 1 image-
enhancement.
• A. K. Lenstra and M. S. Manasse, Factoring with two large primes,
Math. Comp. 63 (1994), no. 208, 785–798.
• P. J. Olver, Orthogonal bases and the QR algorithm, University of
Minnesota, http://www.math.umn.edu/∼olver/aims /qr.pdf.
• L. Page, S. Brin, R. Motwani, T. Winograd, The PageRank citation
ranking: Bringing order to the web (1999),
http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf.
• R. Redheffer, Eine explizit lösbare Optimierungsaufgabe. (German)
Numerische Methoden bei Optimierungsaufgaben, Band 3 (Tagung,
Math. Forschungsinst., Oberwolfach, (1976), pp. 213–216. Internat. Ser.
Numer. Math., Vol. 36, Birkhäuser, Basel, 1977.
• P. W. Shor, Polynomial-time algorithms for prime factorization and
discrete logarithms on a quantum computer, SIAM J. Comput. 26
(1997), no. 5, 1484–1509.
• H. J. Woerdeman, Minimal rank completions for block matrices. Linear
Algebra Appl. 121 (1989), 105–122.
• T. Zhang, J. M. Pauly, S. S. Vasanawala and M. Lustig, Coil
compression for accelerated imaging with Cartesian sampling. Magnetic
Resonance in Medicine, 69 (2013), 571–582.
How to Start Your Own Research
Project

For a research problem you need

• A problem nobody solved,


• One that you will be able to make some headway on, and
• One that people are interested in.

So how do you go about finding such a problem?

In MathSciNet (a database of reviews of mathematical journal articles and


books maintained by the American Mathematical Society) you can do a
search. For instance, with search term “Anywhere” you can put a topic such
as “Normal matrix,” “QR algorithm,” etc., and see what comes up. If you
click on the review of a paper you can see in a box “Citations” what other
papers or reviews in the database refer to this paper. Of course, very recent
papers will have no or few citations, but earlier ones typically have some.
The number of citations is a measure of the influence of the paper.

Of course, you can also search terms in any search engine. Some search
engines, when you give them titles of papers, will indicate what other papers
cite this paper. I find this a very useful feature. Again, it gives a sense of
how that particular line of research is developing and how much interest
there is for it.

If you want to get a sense of how hot a topic is, you can see if government
agencies or private industry give grants for this line of research. For instance,
in the United States the National Science Foundation (NSF) gives grants for
basic research. On the NSF web page (www.nsf.gov) you can go to “Search
Awards,” and type terms like eigenvalue, singular value decomposition, etc.,
and see which funded grants have that term in the title or abstract. Again, it
gives you an idea of what types of questions people are interested in, enough
to put US tax dollars toward the research. Of course, many countries have
government agencies that support research, for instance in the Netherlands
it is the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO,


www.nwo.nl). If you are searching in the Dutch language it is useful to know


that “wiskunde” is the Dutch word for mathematics.

Another source for hot topics is to see what areas of mathematics receive the
major prizes. The Fields medal and the Abel Prize are two well-known
prestigious prizes for mathematical research, but there are many others. In
addition, some of the prize winners and other well-known mathematicians
started their own blogs, which are also a source for exciting ideas.

There is some time lag between finishing a paper and it appearing in a


journal, as professional journals have a review process that in fact could take
quite a while. But there are also so-called preprint servers where researchers
can post their finished paper as soon as it is ready. One such example is the
ArXiv (arxiv.org), which many mathematicians (and other scientists) use. So
this is a source where you can find results of some very fresh research. ArXiv
also has an option to get regular email updates on new articles in areas of
your choice.

Of course, you should also leverage all your contacts. Your professor would
be a good person to talk about this, or other professors at your school. In
addition, don’t be afraid to contact a person you do not know. It has been
my experience that when you put some thought in an email message to a
mathematician, a good number of them will take the effort to write back.
For instance, if you would write to me and say something along the lines “I
looked at your paper X, and I thought of changing the problem to Y. Would
that be of interest? Has anyone looked at this?”, you would probably get an
answer from me. And if you don’t within a few weeks, maybe just send the
message again as it may just have ended up in a SPAM filter or it somehow
fell off my radar screen.

Finally, let me mention that in my research I found it often useful to try out
ideas numerically using MATLAB®, Maple™ or Mathematica®. In some
cases I discovered patterns this way that turned out to be essential. In
addition, try to write things up along the way as it will help you document
what you have done, and it will lower the bar to eventually write a paper.
Typically mathematical texts (such as this book) are written up using the
program LaTeX (or TeX), so it is definitely useful to get used to this
freely available program. For instance, you can write up your homework
using LaTeX, which will surely score some points with your professor.

It would be great if you picked up a research project. One thing about


mathematical research: we will never run out of questions. In fact, when you
answer a question, it often generates new ones. So, good luck, and maybe I
will see you at a conference sometime when you present your result!
Answers to Exercises

Chapter 1

Exercise 1.5.1 The set of integers Z with the usual addition and multiplication is not a
field. Which of the field axioms does Z satisfy, and which one(s) are not satisfied?

Answer: The only axiom that is not satisfied is number 10, involving the existence of a
multiplicative inverse. For instance, 2 does not have a multiplicative inverse in Z.

Exercise 1.5.2 Write down the addition and multiplication tables for Z2 and Z5 . How is
commutativity reflected in the tables?

Answer: Here are the tables for Z2 and Z5 :


+ 0 1 . 0 1
0 0 1 , 0 0 0
1 1 0 1 0 1

+ 0 1 2 3 4 . 0 1 2 3 4
0 0 1 2 3 4 0 0 0 0 0 0
1 1 2 3 4 0 , 1 0 1 2 3 4
2 2 3 4 0 1 2 0 2 4 1 3
3 3 4 0 1 2 3 0 3 1 4 2
4 4 0 1 2 3 4 0 4 3 2 1
The symmetry in the tables is due to commutativity.

Exercise 1.5.3 The addition and multiplication defined in (1.4) also works when p is not
prime. Write down the addition and multiplication tables for Z4 . How can you tell from
the tables that Z4 is not a field?

Answer: The tables for Z4 are:


+ 0 1 2 3 . 0 1 2 3
0 0 1 2 3 0 0 0 0 0
1 1 2 3 0 , 1 0 1 2 3
2 2 3 0 1 2 0 2 0 2
3 3 0 1 2 3 0 3 2 1
In the multiplication table there is no 1 in the row involving 2. Indeed, 2 does not have a
multiplicative inverse in Z4 , so therefore it is not a field.


Exercise 1.5.4 Solve Bezout’s identity for the following choices of a and b:

(i) a = 25 and b = 7;
Answer: 25 − 3 · 7 = 4, 7 − 1 · 4 = 3, 4 − 1 · 3 = 1, thus 1 = gcd(25, 7), and we get
1 = 4 − 1 · 3 = 4 − (7 − 1 · 4) = −7 + 2 · 4 = −7 + 2(25 − 3 · 7) = 2 · 25 − 7 · 7.
Thus m = 2 and n = −7 is a solution to (1.5).
(ii) a = −50 and b = 3.
Answer: −50 + 17 · 3 = 1, thus 1 = gcd(−50, 3) and m = 1 and n = 17 is a solution
to (1.5).
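The back-substitution in parts (i) and (ii) is the extended Euclidean algorithm; as an aside (not from the text, and written in Python purely for illustration), here is a small recursive version that returns gcd(a, b) together with m and n solving m·a + n·b = gcd(a, b).

    def extended_gcd(a, b):
        """Return (g, m, n) with g = gcd(a, b) and m*a + n*b = g."""
        if b == 0:
            return (a, 1, 0)
        g, m, n = extended_gcd(b, a % b)
        # gcd(a, b) = m*b + n*(a - (a//b)*b), so regroup the coefficients of a and b
        return (g, n, m - (a // b) * n)

    print(extended_gcd(25, 7))    # (1, 2, -7), matching part (i)
    print(extended_gcd(-50, 3))   # (1, 1, 17), matching part (ii)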

Exercise 1.5.5 In this exercise we are working in the field Z3 .

(i) 2 + 2 + 2 =
Answer: 0
(ii) 2(2 + 2)−1 =
Answer: 2
(iii) Solve for x in 2x + 1 = 2.
Answer: 2
 
1 2
(iv) Find det .
1 0
Answer: 1
  
1 2 1 1
(v) Compute .
0 2 2 1
 
2 0
Answer:
1 2
 −1
2 0
(vi) Find .
1 1
 
2 0
Answer:
1 1

Exercise 1.5.6 In this exercise we are working in the field Z5 .

(i) 4 + 3 + 2 =
Answer: 4
(ii) 4(1 + 2)−1 =
Answer: 3
(iii) Solve for x in 3x + 1 = 3.
Answer: 4
 
4 2
(iv) Find det .
1 0
Answer: 3
  
1 2 0 1
(v) Compute .
3 4 2 1
 
4 3
Answer: .
3 2
(vi) Find $\begin{pmatrix} 2 & 2 \\ 3 & 4 \end{pmatrix}^{-1}$.
Answer: $\begin{pmatrix} 2 & 4 \\ 1 & 1 \end{pmatrix}$.

Exercise 1.5.7 In this exercise we are working in the field C. Make sure you write the final answers in the form $a + bi$, with $a, b \in \mathbb{R}$. For instance, $\frac{1+i}{2-i}$ should not be left as a final answer, but be reworked as
$$\frac{1+i}{2-i} = \left(\frac{1+i}{2-i}\right)\left(\frac{2+i}{2+i}\right) = \frac{2+i+2i+i^2}{2^2+1^2} = \frac{1+3i}{5} = \frac{1}{5} + \frac{3i}{5}.$$
Notice that in order to get rid of $i$ in the denominator, we decided to multiply both numerator and denominator with the complex conjugate of the denominator.

(i) $(1+2i)(3-4i) - (7+8i) =$
Answer: $4 - 6i$.

(ii) $\frac{1+i}{3+4i} =$ Answer: $\frac{7}{25} - \frac{i}{25}$.

(iii) Solve for $x$ in $(3+i)x + 6 - 5i = -3 + 2i$. Answer: $-2 + 3i$.

(iv) Find $\det\begin{pmatrix} 4+i & 2-2i \\ 1+i & -i \end{pmatrix}$. Answer: $-3 - 4i$.

(v) Compute $\begin{pmatrix} -1+i & 2+2i \\ -3i & -6+i \end{pmatrix}\begin{pmatrix} 0 & 1-i \\ -5+4i & 1-2i \end{pmatrix}$. Answer: $\begin{pmatrix} -18-2i & 6 \\ 26-29i & -7+10i \end{pmatrix}$.

(vi) Find $\begin{pmatrix} 2+i & 2-i \\ 4 & 4 \end{pmatrix}^{-1}$. Answer: $\begin{pmatrix} -\frac{i}{2} & \frac{1}{8}+\frac{i}{4} \\ \frac{i}{2} & \frac{1}{8}-\frac{i}{4} \end{pmatrix}$.

Exercise 1.5.8 Here


! the field is R(t). Find the inverse of the matrix
1 −1
2 + 3t t2 +2t+1
 
1 3t − 4 t+1
3t−4 , if it exists. Answer: 9t2 −6t−9 2 2 .
t+1 1+t −t − 2t − 1 3t + 5t + 2

 
Exercise 1.5.9 Let $F = \mathbb{Z}_3$. Compute the product $\begin{pmatrix} 1 & 1 & 0 \\ 2 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 2 \\ 1 & 2 & 1 \\ 2 & 0 & 1 \end{pmatrix}$. Answer: $\begin{pmatrix} 2 & 2 & 0 \\ 2 & 2 & 0 \end{pmatrix}$.

  
Exercise 1.5.10 Let $F = \mathbb{C}$. Compute the product $\begin{pmatrix} 2-i & 2+i \\ 2-i & -10 \end{pmatrix}\begin{pmatrix} 5+i & 6-i \\ 1-i & 2+i \end{pmatrix}$. Answer: $\begin{pmatrix} 14-4i & 14-4i \\ 1+7i & -9-18i \end{pmatrix}$.

Exercise 1.5.11 Let $F = \mathbb{Z}_5$. Put the matrix
$$\begin{pmatrix} 3 & 1 & 4 \\ 2 & 1 & 0 \\ 2 & 2 & 1 \end{pmatrix}$$
in row echelon form, and compute its determinant.

Answer: Multiply the first row with $3^{-1} = 2$ and row reduce:
$$\begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 4 \\ 0 & 3 & 0 \end{pmatrix} \to \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 4 \end{pmatrix},$$
where we subsequently switched rows 2 and 3, and multiplied (the new) row 2 with $3^{-1}$. Then
$$\det\begin{pmatrix} 3 & 1 & 4 \\ 2 & 1 & 0 \\ 2 & 2 & 1 \end{pmatrix} = -3 \cdot 3 \det\begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 4 \end{pmatrix} = 4.$$
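As an independent check (an aside, not from the text), one can let SymPy compute the integer determinant exactly and then reduce it modulo 5.

    import sympy as sp

    A = sp.Matrix([[3, 1, 4],
                   [2, 1, 0],
                   [2, 2, 1]])

    # exact integer determinant, then reduce mod 5
    print(A.det() % 5)   # expected: 4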

Exercise 1.5.12 Let F = Z3 . Find the set of all solutions to the system of linear
equations 
2x1 + x2 =1
.
2x1 + 2x2 + x3 = 0

     
2 1 0 1 1 2 0 2 1 0 1 1
Answer: → → , so all solutions
2 2 1 0 0 1 1 2 0 1 1 2
are      
x1 1 2
x2  = 2 + x3 2 , x3 ∈ Z3 ,
x3 0 1
or, equivalently,      
 1 0 2 
2 , 1 , 0 .
0 1 2
 

Exercise 1.5.13 Let F = C. Determine whether b is a linear combination of a1 , a2 , a3 ,


where        
i 0 −i 0
1 − i  3+i   2 + 2i 
 , b = 0 .
 
a1 =   , a2 = 
2 − i
 , a3 = 
−1 + i −3 + 2i 0
1 −3 3 1

Answer: Row reducing the augmented matrix yields the row echelon form
 
1 0 −1 0
0 1 1 0
 .
0 0 7 1
0 0 0 0
No pivot in the augmented column, thus b is a linear combination of a1 , a2 , a3 ; in fact
1 1 1
b= a1 − a2 + a3 .
7 7 7

 
2 3 1
Exercise 1.5.14 Let F = Z5 . Compute the inverse of 1 4 1 in two different ways
1 1 2
(row reduction and by applying (1.11)).
 
4 0 3
Answer: 3 1 3.
4 2 0

Exercise 1.5.15 Let F = C. Find bases of the column space, row space and null space of
the matrix  
1 1+i 2
A = 1 + i 2i 3 + i .
1−i 2 3 + 5i

   
 1 2 
Answer: Basis for ColA is 1 + i ,  3 + i  .
1−i 3 + 5i
 

  
Basis for RowA is 1 1+i 2 , 0 0 1 .
 
 1+i 
Basis for NulA is  −1  .
0
 

 
3 5 0
Exercise 1.5.16 Let F = Z7 . Find a basis for the eigenspace of A = 4 6 5
2 2 4
corresponding to the eigenvalue λ = 1.
 
 1 
Answer: 1 .
1
 

Exercise 1.5.17 Let F = Z3 . Use Cramer’s rule to find the solution to the system of
linear equations 
2x1 + 2x2 = 1
.
x1 + 2x2 = 1

Answer: x1 = 0, x2 = 2.

Exercise 1.5.18 Let F = C. Consider the matrix vector equation Ax = b given by


    
i 1−i 2 x1 2
1 + i α 0   x2  =  0  .
1 − i 1 + 2i 3 + 5i x3 5i
Determine α ∈ C so that A is invertible and x1 = x2 .

Answer: α = −1 − i.

Exercise 1.5.19 Let $F = \mathbb{R}(t)$. Compute the adjugate of
$$A = \begin{pmatrix} \frac{1}{t} & 2+t^2 & 2-t \\ \frac{2}{1+t} & 3t & 1-t \\ 1 & 4+t^2 & 0 \end{pmatrix}.$$

Answer:
$$\operatorname{adj}(A) = \begin{pmatrix}
-(1-t)(4+t^2) & (2-t)(4+t^2) & 2-8t+4t^2-t^3 \\
1-t & t-2 & -\frac{1}{t}+1+\frac{4-2t}{1+t} \\
\frac{8+2t^2}{1+t}-3t & -\frac{4}{t}-t+2+t^2 & 3-\frac{4+2t^2}{1+t}
\end{pmatrix}.$$

Exercise 1.5.20 Recall that the trace of a square matrix is defined to be the sum of its diagonal entries. Thus $\operatorname{tr}[(a_{ij})_{i,j=1}^n] = a_{11} + \cdots + a_{nn} = \sum_{j=1}^{n} a_{jj}$.

(a) Show that if A ∈ Fn×m and B ∈ Fm×n , then tr(AB) = tr(BA).


Answer: Write $A = (a_{ij})$ and $B = (b_{ij})$. Then
$$\operatorname{tr}(AB) = \sum_{k=1}^{n} (AB)_{kk} = \sum_{k=1}^{n} \Big( \sum_{j=1}^{m} a_{kj} b_{jk} \Big).$$
Similarly,
$$\operatorname{tr}(BA) = \sum_{j=1}^{m} (BA)_{jj} = \sum_{j=1}^{m} \Big( \sum_{k=1}^{n} b_{jk} a_{kj} \Big).$$
As $a_{kj} b_{jk} = b_{jk} a_{kj}$ for all $j$ and $k$, the equality $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ follows.
(b) Show that if A ∈ Fn×m , B ∈ Fm×k , and C ∈ Fk×n , then
tr(ABC) = tr(CAB) = tr(BCA).
Answer: By the previous part, we have that tr((AB)C) = tr(C(AB)) and also
tr(A(BC)) = tr((BC)A). Thus tr(BCA) = tr(ABC) = tr(CAB) follows.
(c) Give an example of matrices $A, B, C \in F^{n \times n}$ so that $\operatorname{tr}(ABC) \neq \operatorname{tr}(BAC)$.
Answer: For instance $A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, $B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$, and $C = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$. Then $\operatorname{tr}(ABC) = 0 \neq 1 = \operatorname{tr}(BAC)$.
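A quick numerical illustration of parts (a) and (c) (an aside, not from the text): the trace is invariant under cyclic permutations of a product, but not under arbitrary reorderings.

    import numpy as np

    rng = np.random.default_rng(2)
    A, B, C = rng.standard_normal((3, 4, 4))

    # cyclic permutations leave the trace unchanged
    print(np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B)))   # True
    print(np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A)))   # True

    # swapping two factors is not a cyclic permutation, so this generally differs
    print(np.isclose(np.trace(A @ B @ C), np.trace(B @ A @ C)))   # typically False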

Exercise 1.5.21 Let A, B ∈ Fn×n . The commutator [A, B] of A and B is defined by


[A, B] := AB − BA.

(a) Show that tr([A, B]) = 0.


Answer: By the previous exercise tr(AB) = tr(BA), and thus
tr(AB − BA) = tr(AB) − tr(BA) = 0.
(b) Show that when $n = 2$, we have that $[A,B]^2 = -\det([A,B]) I_2$.
Answer: Write $A = (a_{ij})$ and $B = (b_{ij})$. Then $AB - BA$ equals
$$\begin{pmatrix} a_{12}b_{21} - b_{12}a_{21} & a_{11}b_{12} + a_{12}b_{22} - b_{11}a_{12} - b_{12}a_{22} \\ a_{21}b_{11} + a_{22}b_{21} - b_{21}a_{11} - b_{22}a_{21} & a_{21}b_{12} - b_{21}a_{12} \end{pmatrix},$$
which is of the form $\begin{pmatrix} x & y \\ z & -x \end{pmatrix}$. Then
$$[A,B]^2 = \begin{pmatrix} x & y \\ z & -x \end{pmatrix}^2 = \begin{pmatrix} x^2+yz & 0 \\ 0 & x^2+yz \end{pmatrix} = -\det([A,B]) I_2,$$
since $\det([A,B]) = -x^2 - yz$.
(c) Show that if C ∈ Fn×n as well, then tr(C[A, B]) = tr([B, C]A).
Answer: Using the previous exercise
tr(C(AB − BA)) = tr(CAB) − tr(CBA) =
tr(BCA) − tr(CBA) = tr((BC − CB)A).

Exercise 1.5.22 Answer:
$1\cdot 3 + 2\cdot 0 + 3\cdot 3 + 4\cdot 4 + 5\cdot 8 + 6\cdot 0 + 7\cdot 6 + 8\cdot 3 + 9\cdot 8 + 10\cdot 8 = 286$, and $\operatorname{rem}(286|11) = 0$.
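The sum above is a weighted checksum reduced modulo 11 (ISBN-10 style; the exercise statement itself is not reproduced in this answer section). The same arithmetic, as an aside, in a couple of lines of Python:

    digits = [3, 0, 3, 4, 8, 0, 6, 3, 8, 8]

    # weighted sum with weights 1..10, reduced modulo 11
    total = sum(w * d for w, d in zip(range(1, 11), digits))
    print(total, total % 11)   # 286 0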

Exercise 1.5.23 Answer: AWESOME



Chapter 2

Exercise 2.6.1 For the proof of Lemma 2.1.1 provide a reason why each equality holds.
For instance, the equality 0 = 0u + v is due to Axiom 5 in the definition of a vector space
and v being the additive inverse of 0u.

Answer:
0 = (Axiom 5) = 0u + v = (Field Axiom 4) = (0 + 0)u + v =
= (Axiom 8) = (0u + 0u) + v = (Axiom 2) = 0u + (0u + v) =
= (Axiom 5) = 0u + 0 = (Axiom 4) = 0u.

Exercise 2.6.2 Consider $p(X), q(X) \in F[X]$ with $F = \mathbb{R}$ or $F = \mathbb{C}$. Show that $p(X) = q(X)$ if and only if $p(x) = q(x)$ for all $x \in F$. (One way to do it is by using derivatives. Indeed, using calculus one can observe that if two polynomials are equal, then so are all their derivatives. Next observe that $p_j = \frac{1}{j!}\frac{d^j p}{dx^j}(0)$.) Where do you use in your proof that $F = \mathbb{R}$ or $F = \mathbb{C}$?

Answer: When $f(x) = g(x)$ for all $x \in \mathbb{R}$, then $\frac{f(x+h)-f(x)}{h} = \frac{g(x+h)-g(x)}{h}$ for all $x \in F$ and all $h \in F \setminus \{0\}$. And thus, after taking limits, we get $f'(x) = g'(x)$, assuming $f$ (and thus $g$) is differentiable at $x$. Thus when two differentiable functions are equal, then so are their derivatives.

As $p(x) = q(x)$ for all $x \in F$, we get that $p^{(j)}(x) = q^{(j)}(x)$ for all $j$. In particular, $p^{(j)}(0) = q^{(j)}(0)$ for all $j$. When $p(X) = \sum_{j=0}^{n} p_j X^j$ and $q(X) = \sum_{j=0}^{n} q_j X^j$, then $p_j = \frac{1}{j!} p^{(j)}(0) = \frac{1}{j!} q^{(j)}(0) = q_j$ for all $j$. This proves that $p(X) = q(X)$.

When we took derivatives we used that we are working over F = R or F = C. For the other
fields F we are considering in this chapter, derivatives of functions are not defined.

Exercise 2.6.3 When the underlying field is Zp , why does closure under addition
automatically imply closure under scalar multiplication?

Answer: To show that cx lies in the subspace, one simply needs to observe that
cx = x + · · · + x, where in the right-hand side there are c terms. When the subspace is
closed under addition, x + · · · + x will be in the subspace, and thus cx lies in the subspace.

Exercise 2.6.4 Let V = RR . For W ⊂ V , show that W is a subspace of V .

(a) W = {f : R → R : f is continuous}.
(b) W = {f : R → R : f is differentiable}.

Answer: (a). The constant zero function is continuous. As was shown in calculus, when f
and g are continuous, then so are f + g and cf . This gives that W is a subspace.

(b). The constant zero function is differentiable. As was shown in calculus, when f and g
are differentiable, then so are f + g and cf . This gives that W is a subspace.

Exercise 2.6.5 For the following choices of F, V and W , determine whether W is a


subspace of V over F. In case the answer is yes, provide a basis for W .

(a) Let F = R and V = R3 ,


 
x1
W = {x2  : x1 , x2 , x3 ∈ R, x1 − 2x2 + x23 = 0}.
x3
 
−1
Answer: Not closed under scalar multiplication. For example, x =  0  ∈ W , but
1
(−1)x 6∈ W .
(b) F = C and V = C3×3 ,
 
a b c
W = {0 a b  : a, b, c ∈ C}.
0 0 a

Answer: This is a subspace: the zero matrix lies in W (choose a = b = c = 0), the sum
of two matrices in W is again of the same type, and a scalar multiple of a matrix in
W is again of the same type. In fact,
     
 1 0 0 0 1 0 0 0 1 
W = Span  0 1 0 , 0 0 1 , 0 0 0 .
   
0 0 1 0 0 0 0 0 0
 

(c) F = C and V = C2×2 ,  


a b̄
W ={ : a, b, c ∈ C}.
b c
 
0 −i
Answer: Not closed under scalar multiplication. For example, ∈ W , but
i 0
   
0 −i 0 1
i = 6∈ W .
i 0 −1 0
(d) F = R, V = R2 [X] and
Z 1
W = {p(x) ∈ V : p(x) cos xdx = 0}.
0

Answer: This is a subspace. If p(x) ≡ 0, then 01 p(x) cos xdx = 0, so W contains 0. If


R
R1
p(x) cos xdx = 0 and 0 q(x) cos xdx = 0, then 01 (p + q)(x) cos xdx = 0 and
R1 R
R01
0 (cp)(x) cos xdx = 0, thus W is closed under addition and scalar multiplication.
(e) F = R, V = R2 [X] and
W = {p(x) ∈ V : p(1) = p(2)p(3)}.
Answer: Not closed under scalar multiplication (or addition). For example, p(X) ≡ 1
is in W , but (2p)(X) ≡ 2 is not in W .
(f) F = C, V = C3 , and
 
x1
W = {x2  ∈ C3 : x1 − x2 = x3 − x2 }.
x3

Answer: This is a subspace; it is in fact the kernel of the matrix 1 0 −1 .

Exercise 2.6.6 For the following vector spaces (V over F) and vectors, determine
whether the vectors are linearly dependent or linearly independent.

(a) Let F = Z5 , V = Z45 and consider the vectors


     
3 2 1
0 1 2
 , , .
2 0 1
1 3 0
Answer: Making these vectors the columns of a matrix, and performing row reduction
yields that all columns have a pivot. Thus linearly independent.
(b) Let F = R, V = {f | f : (0, ∞) → R is a continuous function}, and consider the vectors
1
t, t2 , .
t
Answer: Suppose at + bt2 + c 1t ≡ 0. As this equality holds for all t, we can choose for
instance t = 1, t = 2 and t = 12 , giving the system
    
1 1 1 a 0
1 4 1   b  = 0 .
2
1 14 2 c 0
Row reducing this matrix gives pivots in all columns, thus a = b = c = 0 is the only
solution. Thus the vectors t, t2 , 1t are linearly independent.
(c) Let F = Z5 , V = Z45 and consider the vectors
     
4 2 1
0 1 2
 , , .
2 0 1
3 3 0
Answer: Making these vectors the columns of a matrix, and performing row reduction
yields that not all columns have a pivot. Thus linearly dependent. In fact,
       
4 2 1 0
0 1 2 0
22 + 3 0 + 1 1 = 0 .
      

3 3 0 0
(d) Let F = R, V = {f | f : R → R is a continuous function}, and consider the vectors
cos 2x, sin 2x, cos2 x, sin2 x.
Answer: The equality cos 2x = cos2 x − sin2 x holds for all x ∈ R. Thus
cos 2x + 0(sin 2x) − cos2 x + sin2 x = 0(x) for all x ∈ R, thus the vectors are linearly
dependent.
(e) Let F = C, V = C2×2 , and consider the vectors
     
i 1 1 1 −1 i
, , .
−1 −i i −i −i 1
Answer: Suppose
       
i 1 1 1 −1 i 0 0
a +b +c = .
−1 −i i −i −i 1 0 0
Rewriting we get  
i 1 −1    
 1 a 0
 1 i 
  b  = 0 .
−1 i −i 
c 0
−i −i 1

Row reducing this matrix gives no pivot in column three. We find that
   
a −i
b = c  0 
c 1
is the general solution. Indeed,
       
i 1 1 1 −1 i 0 0
−i +0 + = ,
−1 −i i −i −i 1 0 0
and thus these vectors are linearly dependent.
(f) Let F = R, V = C2×2 , and consider the vectors
     
i 1 1 1 −1 i
, , .
−1 −i i −i −i 1

Answer: Suppose
       
i 1 1 1 −1 i 0 0
a +b +c = ,
−1 −i i −i −i 1 0 0
with now a, b, c ∈ R. As before we find that this implies that a = −ic and b = 0. As
a, b, c ∈ R, this implies that a = b = c = 0, and thus these vectors are linearly
independent over R.
(g) Let F = Z5 , V = F3×2 , and consider the vectors
     
3 4 1 1 1 2
1 0 , 4 2 , 3 1 .
1 0 1 2 1 2

Answer: Suppose
       
3 4 1 1 1 2 0 0
a 1 0 + b 4
 2 + c 3
 1 = 0 0 .
1 0 1 2 1 2 0 0
Rewriting, we get
3 1 1 0
   
4 1 2   0
 a
1 4 3   0
  
 b =  .
0 2 1 0

1 c
1 1 0
0 2 2 0
Row reducing this matrix gives pivots in all columns, thus a = b = c = 0 is the only
solution. Thus the vectors are linearly independent.
(h) Let F = R, V = {f | f : R → R is a continuous function}, and consider the vectors
1, et , e2t .

Answer: Suppose a + bet + ce2t ≡ 0. As this equality holds for all t, we can choose for
instance t = 0, t = ln 2 and t = ln 3, giving the system
    
1 1 1 a 0
1 2 4  b  = 0 .
1 3 9 c 0
Row reducing this matrix gives pivots in all columns, thus a = b = c = 0 is the only
solution. Thus the vectors 1, et , e2t are linearly independent.

Exercise 2.6.7 Let v1 , v2 , v3 be linearly independent vectors in a vector space V .



(a) For which k are kv1 + v2 , kv2 − v3 , v3 + v1 linearly independent?


(b) Show that if v is in the span of v1 , v2 and in the span of v2 + v3 , v2 − v3 , then v is a
multiple of v2 .

Answer: (a) Suppose a(kv1 + v2 ) + b(kv2 − v3 ) + c(v3 + v1 ) = 0. Then


(ak + c)v1 + (a + bk)v2 + (−b + c)v3 = 0.
As v1 , v2 , v3 are linearly independent, we get ak + c = 0, a + bk = 0, and −b + c = 0. Thus
    
k 0 1 a 0
1 k 0  b  = 0 .
0 −1 1 c 0
For this system to have a nontrivial solution, we need that the determinant of the matrix
equals 0. This yields the equation k2 − 1 = 0. Thus for k = 1 and k = −1 we get linearly
dependent vectors.

(b) v = av1 + bv2 and v = c(v2 + v3 ) + d(v2 − v3 ), gives


av1 + bv2 = c(v2 + v3 ) + d(v2 − v3 ). Then av1 + (b − c − d)v2 + (−c + d)v3 = 0. As
v1 , v2 , v3 are linearly independent, we get a = 0, b − c − d = 0, and −c + d = 0. Since
a = 0, we have v = bv2 , and thus is v a multiple of v2 .

Exercise 2.6.8 (a) Show that if the set {v1 , . . . , vk } is linearly independent, and vk+1
is not in Span{v1 , . . . , vk }, then the set {v1 , . . . , vk , vk+1 } is linearly independent.
(b) Let W be a subspace of an n-dimensional vector space V , and let {v1 , . . . , vp } be a
basis for W . Show that there exist vectors vp+1 , . . . , vn ∈ V so that
{v1 , . . . , vp , vp+1 , . . . , vn } is a basis for V .
(Hint: once v1 , . . . , vk are found and k < n, observe that one can choose
vk+1 ∈ V \ (Span{v1 , . . . , vk }). Argue that this process stops when k = n, and that
at that point a basis for V is found.)

Answer: (a) Let c1 , . . . , ck , ck+1 be so that c1 v1 + · · · + ck vk + ck+1 vk+1 = 0. Suppose


6 0. Then vk+1 = − c c1 v1 − · · · − c ck vk ∈ Span{v1 , . . . , vk }. Contradiction.
that ck+1 =
k+1 k+1
Thus we must have ck+1 = 0. Then we get that c1 v1 + · · · + ck vk = 0. As {v1 , . . . , vk } is
linearly independent, we now must have c1 = · · · = ck = 0. Thus
c1 = · · · = ck = ck+1 = 0, and linear independence of {v1 , . . . , vk , vk+1 } follows.

(b) Suppose that v1 , . . . , vk are found and k < n. Then Span{v1 , . . . , vk } is a


k-dimensional subspace of V . As dim V = n > k, there must exist a
vk+1 ∈ V \ (Span{v1 , . . . , vk }). By (a) we have that the set {v1 , . . . , vk , vk+1 } is linearly
independent. If k + 1 < n, one continues this process. Ultimately one finds a linearly
independent set {v1 , . . . , vp , vp+1 , . . . , vn }. This set must span V . Indeed, if we take
v ∈ V , then by Remark 2.4.5 {v1 , . . . , vn , v} is a linear dependent set. Due to linear
independence of {v1 , . . . , vn } this implies that v ∈ Span{v1 , . . . , vn }. Thus
V = Span{v1 , . . . , vn } and {v1 , . . . , vn } is linearly independent, thus {v1 , . . . , vn } is a
basis for V .

Exercise 2.6.9 Let V = R2 [X] and


W = {p ∈ V : p(2) = 0}.

(a) Show that W is a subspace of V .



(b) Find a basis for W .


Answer: (a) We have 0(2) = 0, so 0 ∈ W . Also, when p, q ∈ W and c ∈ R, we have
(p + q)(2) = p(2) + q(2) = 0 + 0 = 0 and (cp)(2) = cp(2) = c0 = 0, so p + q ∈ W and
cp ∈ W . Thus W is a subspace.
(b) A general element in V is of the form p0 + p1 X + p2 X 2 . For this element to be in
W we have the condition p(2) = 0, yielding p0 + 2p1 + 4p2 = 0. Thus
 
 p0
1 2 4 p1  = 0.
p2
With p1 and p2 as free variables, we find p0 = −2p1 − 4p2 , thus we get
p0 + p1 X + p2 X 2 = −2p1 − 4p2 + p1 X + p2 X 2 = p1 (−2 + X) + p2 (−4 + X 2 ).
Thus {−2 + X, −4 + X 2 } is a basis for W .

Exercise 2.6.10 For the following choices of subspaces U and W in V , find bases for
U + W and U ∩ W .

(a) V = R5 [X], U = Span{X + 1, X 2 − 1}, W = {p(X) : p(2) = 0}.


(b) V = Z45 ,        
3 2 1 4
0 1 2 4
U = Span{  ,  }, W = Span{  ,  
       }.
2 0 1 1
1 0 0 1

Answer: (a) A general element in U is of the form a(X + 1) + b(X 2 − 1). For this to be in
W , we need a(2 + 1) + b(4 − 1) = 0. Thus 3a + 3b = 0, yielding a = −b. Thus a general
element in U ∩ W is of the form a(X + 1 − (X 2 − 1)) = a(2 + X − X 2 ). A basis for U ∩ W
is {2 + X − X 2 }.

A basis for W is {−2 + X, −4 + X 2 , −8 + X 3 , −16 + X 4 , −32 + X 5 }, thus U + W is


spanned by {X + 1, X 2 − 1, −2 + X, −4 + X 2 , −8 + X 3 , −16 + X 4 , −32 + X 5 }. This is a
linear dependent set. Removing −4 + X 2 , makes it a basis for U + W , so we get that
{X + 1, X 2 − 1, −2 + X, −8 + X 3 , −16 + X 4 , −32 + X 5 } is a basis for U + W . In fact,
U + W = R5 [X], so we can also take {1, X, X 2 , X 3 , X 4 , X 5 } as a basis for U + W .

(b) If v ∈ U ∩ W , then there exist a, b, c, d so that


       
3 2 1 4
0 1 2 4
v = a  + b  = c  + d 
       .
2 0 1 1
1 0 0 1
This gives     
3 2 1 4 a 0
0 1 2 4
  b  = 0 .
   

2 0 1 1  −c  0
1 0 0 1 −d 0
Row reduction yields the echelon form
 
1 4 2 3
0 1 2 4
 ,
0 0 3 2
0 0 0 0

making d a free variable, and c = d. Thus


     
1 4 0
2 4 1
c  + c  = c 
    
1 1 2
0 1 1
is a general element of U ∩ W . Thus  
0
1
{
2}

1
is a basis for U ∩ W .

For a basis for U + W , we find a basis for the column space of


 
3 2 1 4
0 1 2 4
2 0 1 1 .
 

1 0 0 1
From the calculations above, we see that the first three columns are pivot columns. Thus
     
3 2 1
0 1 2
{
2 , 0 , 1}
    

1 0 0
is a basis for U + W .

Exercise 2.6.11 Let {v1 , v2 , v3 , v4 , v5 } be linearly independent vectors in a vector space


V . Determine whether the following sets are linearly dependent or linearly independent.

(a) {v1 + v2 + v3 + v4 , v1 − v2 + v3 − v4 , v1 − v2 − v3 − v4 }
(b) {v1 + v2 , v2 + v3 , v3 + v4 , v4 + v5 , v5 + v2 }
(c) {v1 + v3 , v4 − v2 , v5 + v1 , v4 − v2 , v5 + v3 , v1 + v2 }.

When you did this exercise, did you make any assumptions on the underlying field?

Answer: (a) Let a, b, c be so that


a(v1 + v2 + v3 + v4 ) + b(v1 − v2 + v3 − v4 ) + c(v1 − v2 − v3 − v4 ) = 0.
Rewriting, we get
(a + b + c)v1 + (a − b − c)v2 + (a + b − c)v3 + (a − b − c)v4 = 0.
As {v1 , v2 , v3 , v4 , v5 } is linearly independent, we get
   
1 1 1   0
1 −1 −1 a
  b  = 0 .
 

1 1 −1 0
c
1 −1 −1 0
Row reduction gives the echelon form
 
1 1 1
0
 −2 −2,
0 0 −2
0 0 0

where we assumed that $-2 \neq 0$. As there is a pivot in every column, we get that $\{v_1 + v_2 + v_3 + v_4,\; v_1 - v_2 + v_3 - v_4,\; v_1 - v_2 - v_3 - v_4\}$ is linearly independent. We assumed that $F \neq \mathbb{Z}_2$.

If F = Z2 , then {v1 + v2 + v3 + v4 , v1 − v2 + v3 − v4 , v1 − v2 − v3 − v4 } is linearly


dependent.

(b) Here we obtain the matrix


1 0 0 0 0
 
1 1 0 0 1
0 1 1 0 0 .
 
0 0 1 1 0
0 0 0 1 1
The echelon form is
1 0 0 0 0
 
0 1 0 0 1 
0 0 1 0 −1 .
 
0 0 0 1 1 
0 0 0 0 0
No pivot in the last column, so {v1 + v2 , v2 + v3 , v3 + v4 , v4 + v5 , v5 + v2 } is linearly
dependent. This works for all fields.

(c) Here we have six vectors in the five-dimensional space Span{v1 , v2 , v3 , v4 , v5 }. Thus
these vectors are linearly dependent. This works for all fields.

Exercise 2.6.12

Let {v1 , v2 , v3 , v4 } be a basis for a vector space V over Z3 . Determine whether the
following are also bases for V .

(a) {v1 + v2 + v3 + v4 , v1 − v2 + v3 − v4 , v1 − v2 − v3 − v4 .}
(b) {v1 , v2 + v3 + v4 , v1 − v2 + v3 − v4 , v1 − v2 − v3 − v4 .}
(c) {v1 + v2 + v3 + v4 , v1 − v2 + v3 − v4 , v1 − v2 − v3 − v4 , v2 + v4 , v1 + v3 .}

Answer: (a) These three vectors can never span the four-dimensional space V , so this is
not a basis.

(b) Here we obtain the matrix


 
1 0 1 1
0
 1 −1 −1
.
0 1 1 −1
0 1 −1 −1
The echelon form is  
1 0 1 1
0
 1 −1 −1
.
0 0 −2 0 
0 0 0 0
No pivot in the last column, so linearly dependent. Thus not a basis.

(c) Here we have five vectors in a four-dimensional vector space, thus not a basis.

Exercise 2.6.13

For the following choices of vector spaces V over the field F, bases B and vectors v,
determine [v]B .

(a) Let F = Z5 , V = Z45 ,


         
3 2 1 0 1
0 1 2 2
 ,   ,   ,  }, v = 3 .
 
B = {
2 0 1 1 2
1 0 0 0 2

(b) Let $F = \mathbb{R}$, $B = \{t, t^2, \frac{1}{t}\}$, $V = \operatorname{Span} B$ and $v = \frac{t^3 + 3t^2 + 5}{t}$.
.
(c) Let F = C, V = C2×2 ,
        
0 1 1 1 i 0 i 1 −2 + i 3 − 2i
B={ , , }, v = .
−1 −i i −i −1 −i −1 −i −5 − i 10

(d) Let F = R, V = C2×2 , and consider the vectors


 
−1 i
B = {E11 , E12 , E21 , E22 , iE11 , iE12 , iE21 , iE22 }, v = .
−i 1

(e) Let F = Z5 , V = SpanB,


       
 3 4 1 1 1 2  0 2
B = 1 0 , 4 2 , 3 3 , v = 3 0 .
1 0 1 2 3 0 0 2
 

 
2
2
Answer: (a) [v]B = 
1 .

2
 
3
(b) [v]B = 1 .
5
 
4−i
 2 + 7i 
(c) [v]B = 
−3 + 12i .

−3 − 8i

−1
 
 0 
 0 
 
 1 
 
(d) [v]B =   .
 0 
 1 
 
−1
0
 
1
(e) [v]B = 1 .
1

Exercise 2.6.14 Given a matrix A = (ajk )n m


j=1,k=1 ∈ C
n×m , we define
∗ m n
A = (akj )j=1,k=1 ∈ C m×n . For instance,
 
 ∗ 1 − 2i 7 − 8i
1 + 2i 3 + 4i 5 + 6i
= 3 − 4i 9 − 10i  .
7 + 8i 9 + 10i 11 + 12i
5 − 6i 11 − 12i
 
2 1 − 3i
We call a matrix A ∈ Cn×n Hermitian if A∗ = A. For instance, is
1 + 3i 5
Hermitian. Let Hn ⊆ C n×n be the set of all n × n Hermitian matrices.

(a) Show that Hn is not a vector space over C.


(b) Show that Hn is a vector space over R. Determine dimR Hn .

(Hint: Do it first for 2 × 2 matrices.)


     
0 i 0 i 0 −1
Answer: (a) ∈ H2 , but i = 6∈ H2 .
−i 0 −i 0 1 0

(b) We observe that (A + B)∗ = A∗ + B ∗ and (cA)∗ = cA∗ , when c ∈ R. Observe that the
zero matrix is in Hn . Next, if A, B ∈ Hn , then (A + B)∗ = A∗ + B ∗ = A + B, thus
A + B ∈ Hn . Finally, if c ∈ R and A ∈ Hn , then (cA)∗ = cA∗ = cA, thus cA ∈ Hn . This
shows that Hn is a subspace over R.

As a basis for Hn we can choose


{Ejj : 1 ≤ j ≤ n} ∪ {Ejk + Ekj : 1 ≤ j < k ≤ n} ∪ {iEjk − iEkj : 1 ≤ j < k ≤ n}.

There are $n + 2\sum_{j=1}^{n-1} j = n^2$ elements in this basis, thus $\dim_{\mathbb{R}} H_n = n^2$.

Exercise 2.6.15 (a) Show that for finite-dimensional subspaces U and W of V we have
that dim(U + W ) = dim U + dim W − dim(U ∩ W ).
(Hint: Start with a basis {v1 , . . . , vp } for U ∩ W . Next, find u1 , . . . , uk so that
{v1 , . . . , vp , u1 , . . . , uk } is a basis for U . Similarly, find w1 , . . . , wl so that
{v1 , . . . , vp , w1 , . . . , wl } is a basis for W . Finally, argue that
{v1 , . . . , vp , u1 , . . . , uk , w1 , . . . , wl } is a basis for U + W .)
(b) Show that for a direct sum U1 +̇ · · · +̇Uk of finite-dimensional subspaces U1 , . . . , Uk ,
we have that
dim(U1 +̇ · · · +̇Uk ) = dim U1 + · · · + dim Uk .

Answer: (a) Following the hint we need to show that {v1 , . . . , vp , u1 , . . . , uk , w1 , . . . , wl }


is a basis for U + W . First, let v in U + W . Then there exists a u ∈ U and a w ∈ W so
that v = u + w. As u ∈ U , there exists ai and bi so that
p
X k
X
u= ai vi + bi ui .
i=1 i=1

As w ∈ W , there exists ci and di so that


p
X l
X
w= ci v i + di wi .
i=1 i=1

Then v = u + w = pi=1 (ai + ci )vi + ki=1 bi ui + li=1 di wi , thus


P P P
{v1 , . . . , vp , u1 , . . . , uk , w1 , . . . , wl } span U + W . Next, to show linear independence,

suppose that
p
X k
X l
X
ai vi + bi ui + ci wi = 0.
i=1 i=1 i=1
Then
p
X k
X l
X
ai vi + bi ui = − ci wi ∈ U ∩ W.
i=1 i=1 i=1
As {v1 , . . . , vp } is a basis for U ∩ W , there exist di so that
l
X p
X
− ci wi = di vi .
i=1 i=1

Then pi=1 di vi + li=1 ci wi = 0. As {v1 , . . . , vp , w1 , . . . , wl } is linearly independent, we


P P
get that d1 = · · · = dp = c1 = · · · = cl = 0. But then we get that
Pp Pk
i=1 ai vi + i=1 bi ui = 0. Using now that {v1 , . . . , vp , u1 , . . . , uk } is linearly
independent, we get a1 = · · · = ap = b1 = · · · = bk = 0. This shows that
{v1 , . . . , vp , u1 , . . . , uk , w1 , . . . , wl } is linearly independent, proving that it is a basis for
U + W.

Thus dim U + W = p + k + l = (p + k) + (p + l) − p = dim U + dim W − dim(U ∩ W ).

(b) We show this by induction. It is trivial for k = 1. Suppose we have proven the
statement for k − 1, giving dim(U1 +̇ · · · +̇Uk−1 ) = dim U1 + · · · + dim Uk−1 . Then, using
(a) we get
dim[(U1 +̇ · · · +̇Uk−1 )+̇Uk ] = dim(U1 +̇ · · · +̇Uk−1 ) + dim Uk −
dim[(U1 +̇ · · · +̇Uk−1 ) ∩ Uk ] = dim(U1 +̇ · · · +̇Uk−1 ) + dim Uk − 0,
where we used that (U1 +̇ · · · +̇Uk−1 ) ∩ Uk = {0}. Now using the induction assumption, we
get
dim[(U1 +̇ · · · +̇Uk−1 )+̇Uk ] = dim(U1 +̇ · · · +̇Uk−1 ) + dim Uk =
dim U1 + · · · + dim Uk−1 + dim Uk .
This proves the statement.

Chapter 3

Exercise 3.4.1 Let T : V → W and S : W → X be linear maps. Show that the


composition S ◦ T : V → X is also linear.

Answer: (S ◦ T )(v + w) = S(T (v + w)) = S(T (v) + T (w)) = S(T (v)) + S(T (w)) =
(S ◦ T )(v) + (S ◦ T )(w), and
(S ◦ T )(cv) = S(T (cv)) = S(cT (v)) = cS(T (v)) = c(S ◦ T )(v), proving linearity.

Exercise 3.4.2 For the following choices of V , W and T : V → W , determine whether T


is linear or not.

(a) V = R3 , W = R4 ,  
  x1 − 5x3
x1  7x2 + 5 
T  x2  = 
3x1 − 6x2  .

x3
8x3
(b) V = Z35 , W = Z25 ,  
x1  
x1 − 2x3
T  x2  = .
3x2 x3
x3
(c) V = W = C2×2 (over F = C), T (A) = A − AT .
(d) V = W = C2×2 (over F = C), T (A) = A − A∗ .
(e) V = W = C2×2 (over F = R), T (A) = A − A∗ .
(f) V = {f : R → R : f is differentiable}, W = RR ,
(T (f ))(x) = f 0 (x)(x2 + 5).

(g) V = {f : R → R : f is continuous}, W = R,
Z 10
T (f ) = f (x)dx.
−5

 
  0
0 5
Answer: (a) T 0 =  
   6= 0, thus T is not linear.
0
0
0
   
0     0
−2 −4
(b) 2T 1 = 2
  6= = T 2, so T is not linear.
3 12
1 2

(c) T (A + B) = A + B − (A + B)T = A + B − AT − B T = T (A) + T (B) and


T (cA) = (cA) − (cA)T = cA − cAT = cT (A), thus T is linear.
           
0 1 0 i 0 0 0 1 0 1 0 0
(d) T (i )= − , however iT = i( − ),
0 0 0 0 −i 0 0 0 0 0 1 0
thus T does not satisfy the rule T (cA) = cT (A).

(e) T (A + B) = A + B − (A + B)∗ = A + B − A∗ − B ∗ = T (A) + T (B) and


T (cA) = (cA) − (cA)∗ = cA − c̄A∗ = cT (A), where in the last step we used that c̄ = c as c
is real. Thus T is linear.

(f) (T (f + g))(x) = (f + g)0 (x)(x2 + 5) = (f 0 (x) + g 0 (x))(x2 + 5) = T (f )(x) + T (g)(x) =


(T (f ) + T (g))(x), and T (cf )(x) = (cf )0 (x)(x2 + 5) = cf 0 (x)(x2 + 5) = c(T (f ))(x), thus T
is linear.
R 10 R 10 R 10
(g) T (f + g) = −5 f (x) + g(x)dx = −5 f (x)dx + −5 g(x)dx = T (f ) + T (g), and
R 10 R 10
T (cf ) = −5 cf (x)dx = c −5 f (x)dx = cT (f ), and thus T is linear.

Exercise 3.4.3 Show that if T : V → W is linear and the set {T (v1 ), . . . , T (vk )} is
linearly independent, then the set {v1 , . . . , vk } is linearly independent.

Answer: Let c1 , . . . , ck be so that c1 v1 + · · · + ck vk = 0. We need to show that


c1 = · · · = ck = 0. We have T (c1 v1 + · · · + ck vk ) = T (0), which gives that
c1 T (v1 ) + · · · + ck T (vk ) = 0. As {T (v1 ), . . . , T (vk )} is linearly independent, we get
c1 = · · · = ck = 0.

Exercise 3.4.4 Show that if T : V → W is linear and onto, and {v1 . . . , vk } is a basis for
V , then the set {T (v1 ), . . . , T (vk )} spans W . When is {T (v1 ), . . . , T (vk )} a basis for W ?

Answer: We need to show that every w ∈ W is a linear combination of T (v1 ), . . . , T (vk ).


So, let w ∈ W . As T is onto, there exists a v ∈ V so that T (v) = w. As {v1 . . . , vk } is a
basis for V , there exist scalars c1 , . . . , ck so that v = c1 v1 + · · · + ck vk . Then
w = T (v) = T (c1 v1 + · · · + ck vk ) = c1 T (v1 ) + · · · + ck T (vk ),
where in the last equality we use the linearity of T . Thus w is a linear combination of
T (v1 ), . . . , T (vk ).

When T is one-to-one, then {T (v1 ), . . . , T (vk )} is linearly independent, and therefore
a basis. Indeed, suppose that c1 T (v1 ) + · · · + ck T (vk ) = 0. Then
T (c1 v1 + · · · + ck vk ) = T (0). When T is one-to-one, this implies c1 v1 + · · · + ck vk = 0.
As {v1 . . . , vk } is linearly independent, this yields c1 = · · · = ck = 0.

Exercise 3.4.5 Let T : V → W be linear, and let U ⊆ V be a subspace of V . Define


T [U ] := {w ∈ W ; there exists u ∈ U so that w = T (u)}. (3.25)
Observe that T [V ] = Ran T .

(a) Show that T [U ] is a subspace of W .


(b) Assuming dim U < ∞, show that dim T [U ] ≤ dim U .
(c) If Û is another subspace of V , is it always true that T [U + Û ] = T [U ] + T [Û ]? If so,
provide a proof. If not, provide a counterexample.
(d) If Û is another subspace of V , is it always true that T [U ∩ Û ] = T [U ] ∩ T [Û ]? If so,
provide a proof. If not, provide a counterexample.

Answer: (a) First observe that 0 ∈ U and T (0) = 0 gives that 0 ∈ T [U ]. Next, let w,
ŵ ∈ T [U ] and c ∈ F. Then there exist u, û ∈ U so that T (u) = w and T (û) = ŵ. Then
w + ŵ = T (u + û) ∈ T [U ] and cw = T (cu) ∈ T [U ]. Thus, by Proposition 2.3.1, T [U ] is a
subspace of W .

(b) Let {v1 , . . . , vp } be a basis for U . We claim that T [U ] = Span{T (v1 ), . . . , T (vp )},
from which it then follows that dim T [U ] ≤ dim U .

Clearly, T (v1 ), . . . , T (vp ) ∈ T [U ], and since T [U ] is a subspace we have that


Span{T (v1 ), . . . , T (vp )} ⊆ T [U ]. For the converse inclusion, let w ∈ T [U ]. Then there
exists a v ∈ U so that T (v) = w. As {v1 , . . . , vp } is a basis for U , there exist
c1 , . . . , cp ∈ F so that v = c1 v1 + · · · + cp vp . Then
p
X p
X
w = T (v) = T ( cj v j ) = cj T (vj ) ∈ Span{T (v1 ), . . . , T (vp )}.
j=1 j=1

Thus T [U ] ⊆ Span{T (v1 ), . . . , T (vp )}. We have shown both inclusions, and consequently
T [U ] = Span{T (v1 ), . . . , T (vp )} follows.

(c) Let w ∈ T [U + Û ]. Then there exists a v ∈ U + Û so that w = T (v). As v ∈ U + Û


there exists u ∈ U and û ∈ Û so that v = u + û. Then w = T (v) = T (u + û) =
T (u) + T (û) ∈ T [U ] + T [Û ]. This proves T [U + Û ] ⊆ T [U ] + T [Û ].

For the converse inclusion, let w ∈ T [U ] + T [Û ]. Then there is an x ∈ T [U ] and a


x̂ ∈ T [Û ], so that w = x + x̂. As x ∈ T [U ], there exists a u ∈ U so that x = T (u). As
x̂ ∈ T [Û ], there exists a û ∈ Û so that x̂ = T (û). Then w = x + x̂ = T (u) + T (û) =
T (u + û) ∈ T [U + Û ]. This proves T [U ] + T [Û ] ⊆ T [U + Û ], and we are done.

  
x1 x1 + x2
(d) Let T : R2 → R2 be given via T = , and let U = Span{e1 } and
x2 0
Û = Span{e2 }. Then T [U ∩ Û ] = T [{0}] = {0}, while T [U ] ∩ T [Û ] =
Span{e1 } ∩ Span{e1 } = Span{e1 }. So T [U ∩ Û ] 6= T [U ] ∩ T [Û ] in this case.

Exercise 3.4.6 Let v1 , v2 , v3 , v4 be a basis for a vector space V .

(a) Let T : V → V be given by T (vi ) = vi+1 , i = 1, 2, 3, and T (v4 ) = v1 . Determine the


matrix representation of T with respect to the basis {v1 , v2 , v3 , v4 }.
(b) If the matrix representation of a linear map S : V → V with respect to the
{v1 , v2 , v3 , v4 } is given by  
1 0 1 1
 0 2 0 2 
 ,
 1 2 1 3 
−1 0 −1 −1
determine S(v1 − v4 ).
(c) Determine bases for Ran S and Ker S.

 
0 0 0 1
1 0 0 0
Answer: (a) 
 .
0 1 0 0
0 0 1 0

(b) S(v1 − v4 ) = S(v1 ) − S(v4 ) = v1 + v3 − v4 − (v1 + 2v2 + 3v3 − v4 ) = −2v2 − 2v3 .


 
1 0 1 1
0 1 0 1
(c) The reduced echelon form of the matrix representation in (b) is 
 .
0 0 0 0
0 0 0 0

From
 this we deduce
 that with respect to
thebasis
  {v1 , v2 , v3 , v4 } we have that
−1 −1 1 0
 0  −1  0  2
{
 1  ,  0 } is a basis for Ker T , { 1  , 2} is a basis for Ran T . In other
      

0 1 −1 0
words, {−v1 + v3 , −v1 − v2 + v4 } is a basis for Ker T , {v1 + v3 − v4 , 2v2 + 2v3 } is a
basis for Ran T .

 
p(1)
Exercise 3.4.7 Consider the linear map T : R2 [X] → R2 given by T (p(X)) = .
p(3)

(a) Find a basis for the kernel of T .


(b) Find a basis for the range of T .

Answer: (a) {(X − 1)(X − 3)} = {X 2 − 4X + 3}.

(b) Ran T = R2 , so a possible basis is {e1 , e2 }.

Exercise 3.4.8 Let T : V → W with V = Z45 and W = Z2×2


5 be defined by
 
a  
b a+b b+c
 c ) = c + d d + a .
T ( 

(a) Find a basis for the kernel of T .


(b) Find a basis for the range of T .

 
4
1
Answer: (a) { 
 }.
4
1
     
1 0 1 1 0 1
(b) { , , }.
0 1 0 0 1 0

Exercise 3.4.9 For the following T : V → W with bases B and C, respectively, determine
the matrix representation for T with respect to the bases B and C. In addition, find bases
for the range and kernel of T .

2
d d
(a) B = C = {sin t, cos t, sin 2t, cos 2t}, V = W = Span B, and T = dt 2 + dt .
     
1 1 p(3)
(b) B = {1, t, t2 , t3 }, C = { , }, V = C3 [X], and W = C2 , and T (p) = .
0 −1 p(5)
d
(c) B = C = {et cos t, et sin t, e3t , te3t }, V = W = Span B, and T = dt .
    R 1 
1 1 0 p(t)dt .
(d) B = {1, t, t2 }, C = { , }, V = C2 [X], and W = C2 , and T (p) =
1 0 p(1)
 
−1 −1 0 0
 1 −1 0 0 , Ker T = {0}, B is a basis for Ran T .
Answer: (a)   0 0 −4 −2
0 0 2 −4
   
  15 120
2 8 34 152 −8 −49
(b)  1   0 } is a basis for Ker T , {e1 , e2 } is a
, { , 
−1 −5 −25 −125
0 1
basis
for Ran T. 
1 1 0 0
−1 1 0 0
(c)  , Ker T = {0}, {e1 , e2 , e3 , e4 } is a basis for Ran T .
 0 0 3 1
0 0 0 3
 
1 1 1 1 4 2
(d) 2 , { 3 − 3 X + X } is a basis for Ker T , {e1 , e2 } is a basis for
0 − 21 −3
Ran T .

1
Exercise 3.4.10 Let V = Cn×n . Define L : V → V via L(A) = 2
(A + AT ).

(a) Let        
1 0 0 1 0 0 0 0
B={ , , , }.
0 0 0 0 1 0 0 1
Determine the matrix representation of L with respect to the basis B.
(b) Determine the dimensions of the subspaces
W = {A ∈ V : L(A) = A} and Ker L = {A ∈ V : L(A) = 0}.

(c) Determine the eigenvalues of L.

 
1 0 0 0
0 1 1
2 2
0
Answer: (a) C := [L]B←B =
 1 1
.
0 2 2
0
0 0 0 1

− 21 1
   
0 0 0 0 0 0
2
0 − 12 1
2
0
 0 0 0 0
(b) Row reduce C − I =  1 →   , so dim W = 3.
0
2
− 12 0 0 0 0 0
0 0 0 0 0 0 0 0
   
1 0 0 0 1 0 0 0
0 1 1 1 1
2 2
0 → 0

2 2
0
Row reduce C = 
 1 1 0
 , so dim Ker L = 1.
0 2 2
0 0 0 1
0 0 0 1 0 0 0 0

(c) 0 and 1 are the only eigenvalues of L.

Exercise 3.4.11 Let B = {1, t, . . . , tn }, C = {1, t, . . . , tn+1 }, V = Span B and


W = Span C. Define A : V → W via
Af (t) := (2t2 − 3t + 4)f 0 (t),
where f 0 is the derivative of f .

(a) Find the matrix representation of A with respect to the bases B and C.
(b) Find bases for Ran A and Ker A.
0 4 0 ··· 0
 
 .. .. 
0
 −3 8 . . 

..
 
 
 2 −6 . 0 
Answer: (a)  .
 .. 
.
 
 4 4n 
 
 .. 
 . −3n
0 2n

(b) {1} is a basis for Ker A.

{2t2 − 3t + 4, 2t3 − 3t2 + 4t, 2t4 − 3t3 + 4t2 , · · · , 2tn+1 − 3tn + 4tn−1 } is a basis for Ran A.

Chapter 4

Exercise 4.10.1 Let F = Z3 . Check the Cayley–Hamilton Theorem on the matrix


 
1 0 2
A = 2 1 0 .
2 2 2

Answer: We have pA (λ) = λ3 + 2λ2 + λ. Now


     
1 1 1 2 1 0 1 0 2
2 0 1 + 2 1 1 1 + 2 1 0 = 0.
2 1 0 1 0 2 2 2 2

Exercise 4.10.2 For the following matrices A (and B) determine its Jordan canonical
form J and a similarity matrix P , so that P −1 AP = J.

(a)  
−1 1 0 0
−1 0 1 0
A=
 .
−1 0 0 1
−1 0 0 1
This matrix is nilpotent.
Answer:    
1 0 0 0 0 1 0 0
1 1 0 0
 , J = 0 0 1 0

P =
1
.
1 1 0 0 0 0 1
1 1 1 1 0 0 0 0
(b)
10 −1 1 −4 −6
 
9 −1 1 −3 −6
A=4 −1 1 −3 −1 .
 
9 −1 1 −4 −5
10 −1 1 −4 −6
This matrix is nilpotent.
Answer:
1 3 1 −1 2 0 1 0 0 0
   
1 2 2 −1 0  0 0 0 0 0
P = 1 3 2 4 5  , J = 0 0 0 1 0
   
1 3 1 0 2  0 0 0 0 1
1 3 1 −1 −3 0 0 0 0 0
(c)  
0 1 0
A = −1 0 0 .
1 1 1
Answer: The eigenvalues are 1, i, −i.
   
0 −1 −1 1 0 0
P = 0 −i i  , J = 0 i 0
1 i −i 0 0 −i

(d)  
2 0 −1 1
0 1 0 0
A=
 .
1 0 0 0
0 0 0 1
Answer: The only eigenvalue is 1. We have
 
0 0 0 1
0 0 0 0
(A − I)2 =  3
0 0 0 1 , and (A − I) = 0.

0 0 0 0
So we get  
1
 1 1 0
J = .
 0 1 1
0 0 1
For P we can choose  
0 1 1 0
1 0 0 0
 .
0 1 0 0
0 0 0 1
(e)  
1 −5 0 −3
 1 1 −1 0 
B=
 .
0 −3 1 −2
−2 0 2 1
(Hint: 1 is an eigenvalue)
Answer: We find that 1 is the only eigenvalue.
   
1 0 −1 0 0 −2 0 −1
0 −2 0 −1 0 0 0 0 
(B − I)2 =  3
 
, and (A − I) =  .
1 0 −1 0  0 −2 0 1 
0 4 0 2 0 0 0 0
So we get  
1 1 0 0
0 1 1 0
J =
0
.
0 1 1
0 0 0 1
For P we can choose  
−1 0 −3 0
 0 −1 0 0
P =
−1
.
0 −2 0
0 2 0 1
(f) For the matrix B, compute B 100 , by using the decomposition B = P JP −1 .
Answer: As
   
2 0 −3 0 1 100 4950 161700
−1
 0 −1 0 0 , J 100 = 0
 1 100 49500 
P = ,
−1 0 1 0  0 1 100 
0 2 0 1 0 2 0 1
we find that
 
4951 −323900 −4950 −162000
 100 −9899 −100 −4950 
B 100 = P J 100 P −1 = .
 4950 −323700 −4949 −161900
−200 19800 200 9901
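If you want to verify Jordan form computations like the ones above, SymPy can produce the transformation symbolically. Below is a sketch (not part of the text) for the matrix of part (c); Matrix.jordan_form returns P and J with A = P J P⁻¹.

    import sympy as sp

    A = sp.Matrix([[0, 1, 0],
                   [-1, 0, 0],
                   [1, 1, 1]])

    # jordan_form returns (P, J) with A = P * J * P**-1
    P, J = A.jordan_form()
    print(J)                                   # diagonal with entries 1, i, -i (in some order)
    print(sp.simplify(P * J * P.inv() - A))    # zero matrix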

Exercise 4.10.3 Let
$$A = \begin{pmatrix}
3 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 3 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 3 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 3 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 3 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 3
\end{pmatrix}.$$
Determine bases for the following spaces:

(a) Ker(3I − A).


(b) Ker(3I − A)2 .
(c) Ker(3I − A)3 .

Answer: (a) {e1 , e4 , e6 }.

(b) {e1 , e2 , e4 , e5 , e6 , e7 }.

(c) {e1 , e2 , e3 , e4 , e5 , e6 , e7 }.

Exercise 4.10.4 Let M and N be 6 × 6 matrices over C, both having minimal


polynomial x3 .

(a) Prove that M and N are similar if and only if they have the same rank.
(b) Give a counterexample to show that the statement is false if 6 is replaced by 7.
(c) Compute the minimal and characteristic polynomials of the following matrix. Is it
diagonalizable?  
5 −2 0 0
6 −2 0 0 
 
0 0 0 6 
0 0 1 −1

Answer: (a) both M and N have only 0 as the eigenvalue, and at least one Jordan block
at 0 is of size 3 × 3. So the possible Jordan forms are
0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0
     
0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     
, , .
0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0

0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Knowing the rank uniquely identifies the Jordan canonical form.

0 1 0 0 0 0 0 0 1 0 0 0 0 0
   
0 0 1 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
   
(b) M = 0 0 0 0 1 0 0 and N = 0 0 0 0 1 0 0 have the
   
0 0 0 0 0 1 0 0 0 0 0 0 0 0
   
0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0
same rank and same minimal polynomial x3 , but are not similar.

(c) pA (x) = (x − 2)2 (x − 1)(x + 3) and mA (x) = (x − 2)(x − 1)(x + 3). As all roots of
mA (x) have multiplicity 1, the matrix A is diagonalizable.

Exercise 4.10.5 (a) Let A be a 7 × 7 matrix of rank 4 and with minimal polynomial
equal to qA (λ) = λ2 (λ + 1). Give all possible Jordan canonical forms of A.
(b) Let A ∈ Cn . Show that if there exists a vector v so that v, Av, . . . , An−1 v are linearly
independent, then the characteristic polynomial of A equals the minimal polynomial
of A. (Hint: use the basis B = {v, Av, . . . , An−1 v}.)

0 1 0 0 0 0 0 0 1 0 0 0 0 0
   
0 0 0 0 0 0 0  0 0 0 0 0 0 0 
0 0 0 1 0 0 0  0 0 0 1 0 0 0 
   
Answer: (a) 0 0 0 0 0 0 0 , 0 0 0 0 0 0 0 ,
   
0 0 0 0 0 1 0  0 0 0 0 0 0 0 
   
0 0 0 0 0 0 0  0 0 0 0 0 −1 0 
0 0 0 0 0 0 −1 0 0 0 0 0 0 −1
0 1 0 0 0 0 0
 
0 0 0 0 0 0 0 
0 0 0 0 0 0 0 
 
0 0 0 0 0 0 0 .
 
0 0 0 0 −1 0 0 
 
0 0 0 0 0 −1 0 
0 0 0 0 0 0 −1

(b) As B = {v, Av, . . . , An−1 v} is a linearly independent set with n elements in Cn , it is a


basis for $\mathbb{C}^n$. Now $\hat{A} := [A]_{B \leftarrow B}$ has the form
$$\hat{A} = \begin{pmatrix}
0 & 0 & \cdots & 0 & * \\
1 & 0 & \cdots & 0 & * \\
\vdots & \ddots & \ddots & \vdots & \vdots \\
0 & \cdots & 1 & 0 & * \\
0 & \cdots & 0 & 1 & *
\end{pmatrix}.$$
Then
$$\hat{A} - \lambda I_n = \begin{pmatrix}
-\lambda & 0 & \cdots & 0 & * \\
1 & -\lambda & \cdots & 0 & * \\
\vdots & \ddots & \ddots & \vdots & \vdots \\
0 & \cdots & 1 & -\lambda & * \\
0 & \cdots & 0 & 1 & *
\end{pmatrix},$$
which has rank $\geq n - 1$. Thus $\dim\operatorname{Ker}(A - \lambda I_n) = \dim\operatorname{Ker}(\hat{A} - \lambda I_n) \leq 1$, and thus $w_1(A, \lambda) = 1$ for every eigenvalue of $A$. This shows that $A$ is nonderogatory, and thus $p_A(t) = m_A(t)$.

Exercise 4.10.6 Let A ∈ Fn×n and AT denote its transpose. Show that
wk (A, λ) = wk (AT , λ), for all λ ∈ F and k ∈ N. Conclude that A and AT have the same
Jordan canonical form, and are therefore similar.

Answer: In general we have that for any matrix rank B = rankB T . If B is square of size
n × n, we therefore have that
dim Ker B = n − rank B = n − rankB T = dim Ker B T .

Applying this to B = (A − λI)k , we have B T = [(A − λI)k ]T = (AT − λI)k , we get



dim Ker(A − λI)k = dim Ker(AT − λI)k . Then it follows that wk (A, λ) = wk (AT , λ), for
all λ ∈ F and k ∈ N. Thus A and AT have the same Jordan canonical form, and are
therefore similar.

Exercise 4.10.7 Let A ∈ C4×4 be a matrix satisfying A2 = −I.

(a) Determine the possible eigenvalues of A.


(b) Determine the possible Jordan structures of A.

Answer: (a) Let m(t) = t2 + 1 = (t − i)(t + i). Then m(A) = 0, so the minimal polynomial
of A divides m(t). Thus the only possible eigenvalues of A are i and −i.

(b) As the minimal polynomial of A only has roots of multiplicity 1, the Jordan canonical
form will only have 1 × 1 Jordan blocks. Thus the Jordan canonical form of A is a
diagonal matrix with i and/or −i appearing on the diagonal.

Exercise 4.10.8 Let $p(x) = (x-2)^2(x-3)^2$. Determine a matrix $A$ for which $p(A) = 0$ and for which $q(A) \neq 0$ for all nonzero polynomials $q$ of degree $\leq 3$. Explain why $q(A) \neq 0$ for such $q$.
 
Answer: Let $A = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 1 \\ 0 & 0 & 0 & 3 \end{pmatrix}$. Then $p(x)$ is the minimal polynomial for $A$, and thus $p(A) = 0$, and for any nonzero polynomial $q(x)$ with degree less than $\deg p = 4$ we have $q(A) \neq 0$.

Exercise 4.10.9 Let mA (t) = (t − 1)2 (t − 2)(t − 3) be the minimal polynomial of


A ∈ M6 .

(a) What possible Jordan forms can A have?


(b) If it is known that rank(A − I) = 3, what possible Jordan forms can A have?

Answer: (a)
1 1 0 0 0 0 1 1 0 0 0 0
   
0 1 0 0 0 0 0 1 0 0 0 0
0 0 2 0 0 0 0 0 2 0 0 0
   
 or   , where a, b ∈ {1, 2, 3}.
0 0 0 3 0 0 0 0 0 3 0 0

0 0 0 0 1 1 0 0 0 0 a 0
0 0 0 0 0 1 0 0 0 0 0 b

(b)
1 1 0 0 0 0
 
0 1 0 0 0 0
0 0 2 0 0 0
 
.
0 0 0 3 0 0

0 0 0 0 1 0
0 0 0 0 0 1

Exercise 4.10.10 Let A be a 4 × 4 matrix satisfying A2 = −A.



(a) Determine the possible eigenvalues of A.


(b) Determine the possible Jordan structures of A (Hint: notice that (A + I)A = 0.)

Answer: Let m(t) = t2 + t = t(t + 1). Then m(A) = 0, and thus the minimal polynomial mA(t) of A divides m(t). Thus there are three possibilities: mA(t) = t, mA(t) = t + 1, or mA(t) = t(t + 1). The only possible eigenvalues of A are therefore 0 or −1. Next, since the
minimal polynomial has roots of multiplicity 1 only, the Jordan blocks are all of size 1 × 1.
Thus the Jordan canonical of A is a diagonal matrix with 0 and/or −1 on the diagonal.

Exercise 4.10.11 Let A ∈ Cn×n . For the following answer True or False. Provide an
explanation.

(a) If det(A) = 0, then 0 is an eigenvalue of A.


Answer: True, if det(A) = 0, then pA (0) = 0.
n
(b) If A2 = 0, then the rank of A is at most 2
.
Answer: True, when m(t) = t2 ,
then m(A) = 0. Thus mA (t) divides m(t), and thus
mA (t) = t or mA (t) = t2 . When mA (t) = t, then A = 0, thus rank A = 0. If
mA (t) = t2 , then the Jordan canonical form has 1 × 1 and 2 × 2 Jordan blocks at 0.
The rank of A equals the number of 2 × 2 Jordan blocks at 0, which is at most n 2
.
(c) There exists a matrix A with minimal polynomial mA (t) = (t − 1)(t − 2) and
characteristic polynomial pA (t) = tn−2 (t − 1)(t − 2) (here n > 2).
Answer: False, since 0 is a root of pA (t) it must also be a root of mA (t).
(d) If all eigenvalues of A are 1, then A = In (=the n × n identity matrix).
Answer: False, A can have a Jordan block at 1 of size 2 × 2 or larger. For example,
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
has only 1 as eigenvalue, but A ≠ I_2.

Exercise 4.10.12 Show that if A is similar to B, then tr A = tr B.

Answer: Let P be so that A = P BP −1 . Put now C = P B, D = P −1 . Then since


tr(CD) = tr(DC) we obtain tr A = tr(CD) = tr(DC) = tr B.

Exercise 4.10.13 Let P be a matrix so that P 2 = P.

(a) Show that P only has eigenvalues 0 or 1.


(b) Show that rank P = trace P. (Hint: determine the possible Jordan canonical form of
P .)

Answer: (a) and (b). Let m(t) = t^2 − t = t(t − 1). Then m(P) = 0, and thus the minimal polynomial m_P(t) of P divides m(t). Thus there are three possibilities: m_P(t) = t, m_P(t) = t − 1, or m_P(t) = t(t − 1). The only possible eigenvalues of P are therefore 0 or 1. Next, since the minimal polynomial has roots of multiplicity 1 only, the Jordan blocks are all of size 1 × 1. Thus the Jordan canonical form J of P is a diagonal matrix with zeros and/or ones on the diagonal. The rank of J is equal to the sum of its diagonal entries, and as rank and trace do not change when applying a similarity, we get rank P = trace P.
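A small numerical illustration (a NumPy sketch, not part of the book's argument): an orthogonal projection is one convenient idempotent, and its rank and trace agree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Orthogonal projection onto the column space of a random X, so P @ P = P.
X = rng.standard_normal((6, 3))
P = X @ np.linalg.inv(X.T @ X) @ X.T

assert np.allclose(P @ P, P)                          # P is idempotent
print(np.linalg.matrix_rank(P), round(np.trace(P)))   # both equal 3
```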

Exercise 4.10.14 Let A = P JP −1 . Show that Ran A = P [Ran J] and


Ker A = P [Ker J]. In addition, dim Ran A = dim Ran J, and dim Ker A = dim Ker J.

Answer: Let v ∈ Ran A. Then there exists an x so that v = Ax = P (JP −1 x). Then
v ∈ P [Ran J] follows. Conversely, let v ∈ P [Ran J]. Then there exists an x so that
v = P (Jx). Then v = P Jx = A(P −1 x). Thus v ∈ Ran A.

Let v ∈ Ker A. Then Av = 0. Thus P JP −1 v = 0. Let x = P −1 v. Then x ∈ Ker J and


v = P x. Thus v ∈ P [Ker J]. Conversely, let v ∈ P [Ker J]. Then there exists a x ∈ Ker J
so that v = P x. Thus Jx = 0, and Av = P JP −1 v = P Jx = 0. Thus v ∈ Ker A.

By Exercise 3.4.5, it follows that dim Ran A = dim P[Ran J] ≤ dim Ran J. As Ran J = P^{−1}[Ran A], it also follows that dim Ran J = dim P^{−1}[Ran A] ≤ dim Ran A. Thus dim Ran A = dim Ran J. Similarly, dim Ker A = dim Ker J.

Exercise 4.10.15 Show that matrices A and B are similar if and only if they have the
same Jordan canonical form.

Answer: Suppose A and B have the same Jordan canonical form J. Then there exist invertible P and S so that A = PJP^{−1} and B = SJS^{−1}. But then A = P(S^{−1}BS)P^{−1} = (PS^{−1})B(PS^{−1})^{−1}, and thus A and B are similar.

Next suppose that A and B are similar. Thus there exists an invertible P so that
A = P BP −1 . Then A − λIn = P BP −1 − λP P −1 = P (B − λIn )P −1 , and thus A − λIn
and B − λIn are similar for all λ ∈ F. Also,
(A − λIn )k = (P (B − λIn )P −1 )k = P (B − λIn )k P −1 ,
and thus (A − λIn )k and (B − λIn )k are similar for all λ ∈ F and k ∈ N. By Exercise
4.10.14 it follows that dim Ker(A − λIn )k = dim Ker(B − λIn )k for all λ ∈ F and k ∈ N.
Thus wk (A, λ) = wk (B, λ) for all λ ∈ F. Consequently, A and B have the same Jordan
canonical form.

Exercise 4.10.16 Show that if A and B are square matrices of the same size, with A
invertible, then AB and BA have the same Jordan canonical form.

Answer: A−1 (AB)A = BA, so AB and BA are similar, and thus have the same Jordan
canonical form.

Exercise 4.10.17 Let A ∈ F^{n×m} and B ∈ F^{m×n}. Observe that
\begin{pmatrix} I_n & -A \\ 0 & I_m \end{pmatrix} \begin{pmatrix} AB & 0 \\ B & 0_m \end{pmatrix} \begin{pmatrix} I_n & A \\ 0 & I_m \end{pmatrix} = \begin{pmatrix} 0_n & 0 \\ B & BA \end{pmatrix}.   (4.26)

(a) Show that the Weyr characteristics at λ ≠ 0 of AB and BA satisfy


wk (AB, λ) = wk (BA, λ), k ∈ N.

(b) Show that λ ≠ 0 is an eigenvalue of AB if and only if it is an eigenvalue of BA, and


that AB and BA have the same Jordan structure at λ.
(c) Provide an example of matrices A and B so that AB and BA have different Jordan
structures at 0.

Answer: (a) and (b). From (4.26) it follows that
\begin{pmatrix} I_n & -A \\ 0 & I_m \end{pmatrix} \begin{pmatrix} AB - \lambda I & 0 \\ B & -\lambda I \end{pmatrix}^k \begin{pmatrix} I_n & A \\ 0 & I_m \end{pmatrix} = \begin{pmatrix} -\lambda I & 0 \\ B & BA - \lambda I \end{pmatrix}^k, \quad k ∈ N.

Thus we get that
dim Ker \begin{pmatrix} AB - \lambda I & 0 \\ B & -\lambda I \end{pmatrix}^k = dim Ker \begin{pmatrix} -\lambda I & 0 \\ B & BA - \lambda I \end{pmatrix}^k, \quad k ∈ N.
Next we observe that when λ ≠ 0, we have that
dim Ker \begin{pmatrix} AB - \lambda I & 0 \\ B & -\lambda I \end{pmatrix}^k = dim Ker \begin{pmatrix} (AB - \lambda I)^k & 0 \\ * & (-\lambda)^k I \end{pmatrix} = dim Ker (AB - \lambda I)^k,
and
dim Ker \begin{pmatrix} -\lambda I & 0 \\ B & BA - \lambda I \end{pmatrix}^k = dim Ker \begin{pmatrix} (-\lambda)^k I & 0 \\ * & (BA - \lambda I)^k \end{pmatrix} = dim Ker (BA - \lambda I)^k.
Combining the above equalities, we get that
dim Ker(AB − λI)^k = dim Ker(BA − λI)^k, λ ∈ F \ {0}, k ∈ N.
Thus
w_k(AB, λ) = w_k(BA, λ), λ ∈ F \ {0}, k ∈ N.
From this it follows that AB and BA have the same Jordan structure at λ ≠ 0.
 
(c) Let A = \begin{pmatrix} 1 & 0 \end{pmatrix}, B = A^*. Then BA = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} and thus has a 1 × 1 Jordan block at 0. On the other hand, AB = (1) does not have 0 as an eigenvalue.
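A short NumPy check (an illustrative sketch, not from the book) of this example: AB and BA share the nonzero eigenvalue 1, but only BA picks up the extra eigenvalue 0.

```python
import numpy as np

A = np.array([[1.0, 0.0]])     # 1 x 2
B = A.T                        # 2 x 1 (A* = A^T here, since A is real)

AB = A @ B                     # the 1 x 1 matrix (1)
BA = B @ A                     # the 2 x 2 matrix diag(1, 0)

print(np.linalg.eigvals(AB))   # [1.]      -- 0 is not an eigenvalue of AB
print(np.linalg.eigvals(BA))   # [1. 0.]   -- BA has the extra eigenvalue 0
```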

Exercise 4.10.18 Let A, B ∈ Cn×n be such that (AB)n = 0. Prove that (BA)n = 0.

Answer: As (AB)n = 0, we have that 0 is the only eigenvalue of AB. By Exercise 4.10.17
this means that 0 is also the only eigenvalue of BA. Thus BA is nilpotent, and thus
(BA)n = 0.

Exercise 4.10.19 (a) Let A ∈ R8×8 with characteristic polynomial


p(x) = (x + 3)4 (x2 + 1)2 and minimal polynomial m(x) = (x + 3)2 (x2 + 1). What are
the possible Jordan canonical form(s) for A (up to permutation of Jordan blocks)?
(b) Suppose that A ∈ C^{n×n} satisfies A^k ≠ 0 and A^{k+1} = 0. Prove that there exists
x ∈ Cn such that {x, Ax, . . . , Ak x} is linearly independent.
(c) Let A, B ∈ Cn×n be such that A2 − 2AB + B 2 = 0. Prove that every eigenvalue of B
is an eigenvalue of A, and conversely that every eigenvalue of A is an eigenvalue of B.

Answer: (a) Up to permutation of the Jordan blocks, the possibilities are
J_2(−3) ⊕ J_2(−3) ⊕ (i) ⊕ (i) ⊕ (−i) ⊕ (−i)   or   J_2(−3) ⊕ (−3) ⊕ (−3) ⊕ (i) ⊕ (i) ⊕ (−i) ⊕ (−i).

(b) Choose x so that A^k x ≠ 0. We claim that {x, Ax, . . . , A^k x} is linearly independent. Let c_0, c_1, . . . , c_k be so that
c_0 x + c_1 Ax + · · · + c_k A^k x = 0.   (4.27)

Multiply (4.27) by A^k. Using that A^{k+1} = 0, we now get that c_0 A^k x = 0. As A^k x ≠ 0, we must have c_0 = 0. Next, multiply (4.27) by A^{k−1}. Then we get that c_1 A^k x = 0. As A^k x ≠ 0, we must have c_1 = 0. Continuing this way, we also get c_2 = 0, c_3 = 0, . . . , c_k = 0, thus showing the linear independence.

(c) Suppose Bv = λv, v ≠ 0. Then A^2 v − 2λAv + λ^2 v = 0, which gives that (A − λI_n)^2 v = 0. Thus (A − λI_n)^2 has a nontrivial kernel, and thus is not invertible. But then it follows that A − λI_n is not invertible, thus λ is an eigenvalue of A. In a similar way one proves that every eigenvalue of A is an eigenvalue of B.

Exercise 4.10.20 (a) Prove Proposition 4.6.1.

(b) Let
A = \begin{pmatrix} 0 & 0 & 0 & \cdots & -a_0 \\ 1 & 0 & 0 & \cdots & -a_1 \\ 0 & 1 & 0 & \cdots & -a_2 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & 0 & -a_{n-2} \\ 0 & \cdots & 0 & 1 & -a_{n-1} \end{pmatrix}.
Show that
p_A(t) = t^n + a_{n−1} t^{n−1} + · · · + a_1 t + a_0 = m_A(t).


This matrix is called the companion matrix of the polynomial p(t) = pA (t). Thus a
companion matrix is nonderogatory.

Answer: (a) The number of Jordan blocks of A at λ equals


(w1 (A, λ) − w2 (A, λ)) + (w2 (A, λ) − w3 (A, λ)) + · · · + (wn (A, λ) − wn+1 (A, λ)),
which in turn equals w1 (A, λ). Thus at λ there is one Jordan block if and only if
w1 (A, λ) = 1. This gives the equivalence of (i) and (ii) in Proposition 4.6.1. Next the
multiplicity of λ as a root in pA (t) equals the sum of the sizes of the Jordan blocks at λ,
while the multiplicity in mA (t) corresponds to the size of the largest Jordan block at λ.
The two multiplicities are the same if and only if there is one Jordan block at λ. This shows
the equivalence of (i) and (iii).

(b) Notice that
A − λI_n = \begin{pmatrix} -\lambda & 0 & 0 & \cdots & -a_0 \\ 1 & -\lambda & 0 & \cdots & -a_1 \\ 0 & 1 & -\lambda & \cdots & -a_2 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & -\lambda & -a_{n-2} \\ 0 & \cdots & 0 & 1 & -a_{n-1} - \lambda \end{pmatrix}.
Leaving out the first column and the last row, one obtains an invertible (n − 1) × (n − 1)
submatrix. Thus dim Ker (A − λIn ) ≤ 1, and thus w1 (A, λ) = 1 for every eigenvalue of A.
This shows that A is nonderogatory. It is straightforward to check that pA (t) is as
described.
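The following NumPy sketch (not part of the book's solution) illustrates part (b) for a sample degree-4 polynomial: the companion matrix has exactly the stated characteristic polynomial, and every eigenvalue has a one-dimensional eigenspace, so the matrix is nonderogatory.

```python
import numpy as np

# Companion matrix of p(t) = t^4 + a3 t^3 + a2 t^2 + a1 t + a0,
# laid out as in part (b): ones on the subdiagonal, last column -a_k.
a = np.array([5.0, -2.0, 3.0, 1.0])       # a0, a1, a2, a3 (arbitrary sample values)
n = len(a)
A = np.zeros((n, n))
A[1:, :-1] = np.eye(n - 1)
A[:, -1] = -a

# np.poly returns characteristic-polynomial coefficients, highest degree first
print(np.poly(A))        # approx [1, 1, 3, -2, 5] = [1, a3, a2, a1, a0]

# each eigenvalue has geometric multiplicity 1
for lam in np.linalg.eigvals(A):
    print(n - np.linalg.matrix_rank(A - lam * np.eye(n)))   # 1 every time
```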

Exercise 4.10.21 For the following pairs of matrices A and B, find a polynomial p(t) so
that p(A) = B, or show that it is impossible.

   
(a) A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}, B = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 3 \\ 0 & 0 & 3 \end{pmatrix}.
Answer: AB ≠ BA and A is nonderogatory, so no polynomial p exists.
   
(b) A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}, B = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}.

Answer: B = I + 2(A − I) + 3(A − I)^2, thus p(t) = 2 − 4t + 3t^2 works.
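A quick numerical confirmation of part (b) (a NumPy sketch, not from the book):

```python
import numpy as np

A = np.array([[1., 1, 0], [0, 1, 1], [0, 0, 1]])
B = np.array([[1., 2, 3], [0, 1, 2], [0, 0, 1]])
I = np.eye(3)

# p(t) = 2 - 4t + 3t^2 evaluated at A
pA = 2 * I - 4 * A + 3 * A @ A
print(np.allclose(pA, B))   # True
```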

Exercise 4.10.22 Solve the system of differential equations
x'(t) = Ax(t),  x(0) = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix},
where
A = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & -1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & -1 \\ 0 & 1 & 0 \end{pmatrix}^{-1}.

Answer:
x(t) = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & -1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} e^{2t} & t e^{2t} & \tfrac12 t^2 e^{2t} \\ 0 & e^{2t} & t e^{2t} \\ 0 & 0 & e^{2t} \end{pmatrix} \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & -1 \\ 0 & 1 & 0 \end{pmatrix}^{-1} \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix} = \begin{pmatrix} (\tfrac12 t^2 - t + 1) e^{2t} \\ (t - 1) e^{2t} \\ t e^{2t} \end{pmatrix}.
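As a numerical cross-check (a sketch assuming SciPy is available; not part of the book), one can compare the closed-form solution with the matrix exponential e^{At} x(0):

```python
import numpy as np
from scipy.linalg import expm

P = np.array([[1., -1, 1], [0, 1, -1], [0, 1, 0]])
J = np.array([[2., 1, 0], [0, 2, 1], [0, 0, 2]])
A = P @ J @ np.linalg.inv(P)
x0 = np.array([1., -1, 0])

t = 0.7
x_numeric = expm(A * t) @ x0
x_closed = np.array([(0.5 * t**2 - t + 1) * np.exp(2 * t),
                     (t - 1) * np.exp(2 * t),
                     t * np.exp(2 * t)])
print(np.allclose(x_numeric, x_closed))   # True
```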

Exercise 4.10.23 Solve the following systems of linear differential equations:

(a) x_1'(t) = 3x_1(t) − x_2(t), x_2'(t) = x_1(t) + x_2(t), with x_1(0) = 1, x_2(0) = 2.
Answer: x_1(t) = e^{2t} − te^{2t}, x_2(t) = 2e^{2t} − te^{2t}.
(b) x_1'(t) = 3x_1(t) + x_2(t) + x_3(t), x_2'(t) = 2x_1(t) + 4x_2(t) + 2x_3(t), x_3'(t) = −x_1(t) − x_2(t) + x_3(t), with x_1(0) = 1, x_2(0) = −1, x_3(0) = 1.
Answer: x_1(t) = \tfrac12 e^{4t} + \tfrac12 e^{2t}, x_2(t) = e^{4t} − 2e^{2t}, x_3(t) = −\tfrac12 e^{4t} + \tfrac32 e^{2t}.
(c) x_1'(t) = −x_2(t), x_2'(t) = x_1(t), with x_1(0) = 1, x_2(0) = 2.
Answer: x_1(t) = cos t − 2 sin t, x_2(t) = sin t + 2 cos t.
(d) x''(t) − 6x'(t) + 9x(t) = 0, x(0) = 2, x'(0) = 1.
Answer: 2e^{3t} − 5te^{3t}.
(e) x''(t) − 4x'(t) + 4x(t) = 0, x(0) = 6, x'(0) = −1.
Answer: 6e^{2t} − 13te^{2t}.

Exercise 4.10.24 For the following matrices we determined their Jordan canonical form
in Exercise 4.10.2.

(a) Compute cos A for
A = \begin{pmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \\ -1 & 0 & 0 & 1 \end{pmatrix}.
Answer: We have A = PJP^{-1}, where
P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix}, \quad J = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
Thus cos A = P(cos J)P^{-1}, with
cos J = \begin{pmatrix} \cos 0 & -\sin 0 & -\tfrac12 \cos 0 & \tfrac16 \sin 0 \\ 0 & \cos 0 & -\sin 0 & -\tfrac12 \cos 0 \\ 0 & 0 & \cos 0 & -\sin 0 \\ 0 & 0 & 0 & \cos 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & -\tfrac12 & 0 \\ 0 & 1 & 0 & -\tfrac12 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
Thus
cos A = \begin{pmatrix} 1 & \tfrac12 & -\tfrac12 & 0 \\ 0 & \tfrac32 & 0 & -\tfrac12 \\ 0 & \tfrac12 & 1 & -\tfrac12 \\ 0 & \tfrac12 & 0 & \tfrac12 \end{pmatrix}.
(b) Compute A^{24} for
A = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix}.
Answer: We have A = PJP^{-1}, where
P = \begin{pmatrix} 0 & -1 & -1 \\ 0 & -i & i \\ 1 & i & -i \end{pmatrix}, \quad J = \begin{pmatrix} 1 & 0 & 0 \\ 0 & i & 0 \\ 0 & 0 & -i \end{pmatrix}.
As J^{24} = I, we find A^{24} = I.
(c) Compute e^A for
A = \begin{pmatrix} 2 & 0 & -1 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
Answer: We have A = PJP^{-1}, where
P = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad J = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
Thus
e^A = P \begin{pmatrix} e & 0 & 0 & 0 \\ 0 & e & e & \tfrac{e}{2} \\ 0 & 0 & e & e \\ 0 & 0 & 0 & e \end{pmatrix} P^{-1} = \begin{pmatrix} 2e & 0 & -e & \tfrac{3e}{2} \\ 0 & e & 0 & 0 \\ e & 0 & 0 & \tfrac{e}{2} \\ 0 & 0 & 0 & e \end{pmatrix}.
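The matrix functions above can be checked numerically; the sketch below (assuming SciPy, which is not part of the book) verifies parts (a) and (c).

```python
import numpy as np
from scipy.linalg import cosm, expm

# part (a): cos A
A = np.array([[-1., 1, 0, 0], [-1, 0, 1, 0], [-1, 0, 0, 1], [-1, 0, 0, 1]])
expected_cos = np.array([[1, .5, -.5, 0], [0, 1.5, 0, -.5], [0, .5, 1, -.5], [0, .5, 0, .5]])
print(np.allclose(cosm(A), expected_cos))   # True

# part (c): e^A
C = np.array([[2., 0, -1, 1], [0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]])
e = np.e
expected_exp = np.array([[2*e, 0, -e, 1.5*e], [0, e, 0, 0], [e, 0, 0, 0.5*e], [0, 0, 0, e]])
print(np.allclose(expm(C), expected_exp))   # True
```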

Exercise 4.10.25 (a) Find matrices A, B ∈ C^{n×n} so that e^A e^B ≠ e^{A+B}.

Answer: Let A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}. Then e^A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, e^B = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, while
e^{A+B} = \frac{1}{2} \begin{pmatrix} e + \tfrac{1}{e} & e - \tfrac{1}{e} \\ e - \tfrac{1}{e} & e + \tfrac{1}{e} \end{pmatrix} ≠ e^A e^B.

(b) When AB = BA, then eA eB = eA+B . Prove this statement when A is nonderogatory.
Answer: When A is nonderogatory and AB = BA, we have by Theorem 4.6.2 that B = p(A) for some polynomial p. We can now introduce the functions

f (t) = et , g(t) = ep(t) , h(t) = et+p(t) = f (t)g(t).


It follows from Theorem 4.8.3 that h(A) = f (A)g(A). But then, we obtain that
eA+p(A) = eA ep(A) , and thus eA+B = eA eB .

Exercise 4.10.26 Compute the matrices P20 , P21 , P22 from Example 4.8.5.

Answer:
− 12 1 1
 
1 2
0 1 2
0 0 1 0 1 0 
− 12 3 1 
 
0 0 1
P20 =  2 2 ,
0 −1 2 0 1 1 
1
− 12 −1
 
0 0 0
2 2
0 − 12 1
2
0 0 1
2
1
− 12 − 12
 
0 2
0 0
1
−1
 2
− 32 0 −2 −21
 0 − 12 1
0 0 1 
P21 =  2 2 ,
 1
 − 32 5
2
0 2 3 
2 
−1 1 −2 0 −2 −1 
1 −1 2 0 2 1
−1 1 −2 0 −2 −1
 
 1 −1 2 0 2 1 
1 1 −1 2 0 2 1 

P22 =  .
2 1 −1 2 0 2 1 
 0 0 0 0 0 0 
0 0 0 0 0 0

Exercise 4.10.27 (a) Show that if A = A∗ , then eA is positive definite.


Answer: Let A = U ΛU^* be a spectral decomposition of A. Then e^A = U e^Λ U^*. Since Λ is diagonal with real entries, we have e^Λ = diag(e^{λ_i}), λ_i ∈ R, i = 1, . . . , n. Since the exponential function maps R into (0, ∞), the matrix e^A = U e^Λ U^* is Hermitian with all positive eigenvalues. Thus e^A is positive definite.
(b) If eA is positive definite, is then A necessarily Hermitian?
Answer: No. For instance, let A = (2πi). Then eA = (1) is positive definite, but A is
not Hermitian.
(c) What can you say about eA when A is skew-Hermitian?
Answer: Once again, let A = U ΛU^* be a spectral decomposition of A. Because A is skew-Hermitian, we know that Λ has pure imaginary entries. Then we may write e^A = U e^Λ U^*, where e^Λ = diag(e^{iλ_i}), λ_i ∈ R, i = 1, . . . , n. Because Λ commutes with itself, we have e^A (e^A)^* = U e^Λ e^{−Λ} U^* = U I U^* = I. Thus e^A must be unitary.
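A small numerical illustration of (a) and (c) (a sketch using NumPy/SciPy, not part of the book's solution):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

H = (M + M.conj().T) / 2          # Hermitian part
S = (M - M.conj().T) / 2          # skew-Hermitian part

eH = expm(H)
print(np.allclose(eH, eH.conj().T), np.all(np.linalg.eigvalsh(eH) > 0))  # True True

eS = expm(S)
print(np.allclose(eS @ eS.conj().T, np.eye(4)))                          # True: e^S is unitary
```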

Exercise 4.10.28 Let A = \begin{pmatrix} \tfrac{\pi}{2} & 1 & -1 \\ 0 & \tfrac{\pi}{2} & -\tfrac{\pi}{4} \\ 0 & 0 & \tfrac{\pi}{4} \end{pmatrix}.

(a) Compute cos A and sin A.



Answer: Computing the Jordan canonical form decomposition of A, we have
A = SJS^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} \tfrac{\pi}{2} & 1 & 0 \\ 0 & \tfrac{\pi}{2} & 0 \\ 0 & 0 & \tfrac{\pi}{4} \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & -1 \end{pmatrix}.

Then cos A = cos(SJS^{-1}) = S cos(J) S^{-1}, and because we have a 2 × 2 Jordan block,
cos J = \begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \tfrac{\sqrt{2}}{2} \end{pmatrix},
and finally we may compute
cos A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \tfrac{\sqrt{2}}{2} \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} 0 & -1 & 1 \\ 0 & 0 & \tfrac{\sqrt{2}}{2} \\ 0 & 0 & \tfrac{\sqrt{2}}{2} \end{pmatrix}.

Similarly, we can compute sin A as
sin A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \tfrac{\sqrt{2}}{2} \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & \tfrac{\sqrt{2}}{2} - 1 \\ 0 & 0 & \tfrac{\sqrt{2}}{2} \end{pmatrix}.

(b) Check that (cos A)^2 + (sin A)^2 = I.

Answer:
\begin{pmatrix} 0 & -1 & 1 \\ 0 & 0 & \tfrac{\sqrt{2}}{2} \\ 0 & 0 & \tfrac{\sqrt{2}}{2} \end{pmatrix}^2 + \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & \tfrac{\sqrt{2}}{2} - 1 \\ 0 & 0 & \tfrac{\sqrt{2}}{2} \end{pmatrix}^2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & \tfrac12 \\ 0 & 0 & \tfrac12 \end{pmatrix} + \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -\tfrac12 \\ 0 & 0 & \tfrac12 \end{pmatrix} = I.
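These computations can be reproduced numerically; the following sketch (assuming SciPy's matrix trig functions, not part of the book) checks both parts.

```python
import numpy as np
from scipy.linalg import cosm, sinm

pi = np.pi
A = np.array([[pi/2, 1, -1], [0, pi/2, -pi/4], [0, 0, pi/4]])

C, S = cosm(A), sinm(A)
s2 = np.sqrt(2) / 2
print(np.allclose(C, [[0, -1, 1], [0, 0, s2], [0, 0, s2]]))        # True
print(np.allclose(S, [[1, 0, 0], [0, 1, s2 - 1], [0, 0, s2]]))     # True
print(np.allclose(C @ C + S @ S, np.eye(3)))                       # True
```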

Exercise 4.10.29 Show that for A ∈ C4×4 , one has that


sin 2A = 2 sin A cos A.

Answer: Let A = SJ_4(λ)S^{-1}. Then
sin A = S \begin{pmatrix} \sin λ & \cos λ & -\tfrac12 \sin λ & -\tfrac16 \cos λ \\ & \sin λ & \cos λ & -\tfrac12 \sin λ \\ & & \sin λ & \cos λ \\ & & & \sin λ \end{pmatrix} S^{-1},
cos A = S \begin{pmatrix} \cos λ & -\sin λ & -\tfrac12 \cos λ & \tfrac16 \sin λ \\ & \cos λ & -\sin λ & -\tfrac12 \cos λ \\ & & \cos λ & -\sin λ \\ & & & \cos λ \end{pmatrix} S^{-1},
and
sin 2A = S \begin{pmatrix} \sin 2λ & 2\cos 2λ & -2\sin 2λ & -\tfrac43 \cos 2λ \\ & \sin 2λ & 2\cos 2λ & -2\sin 2λ \\ & & \sin 2λ & 2\cos 2λ \\ & & & \sin 2λ \end{pmatrix} S^{-1}.
Using now double angle formulas such as
sin 2λ = 2 sin λ cos λ, cos 2λ = cos2 λ − sin2 λ,
one checks that sin 2A = 2 sin A cos A in this case. The more general case where
A = SJS −1 , with J a direct sum of single Jordan blocks, now also follows easily.

Exercise 4.10.30 Solve the inhomogeneous system of differential equations
x_1'(t) = x_1(t) + 2x_2(t) + e^{−2t},
x_2'(t) = 4x_1(t) − x_2(t).

Answer:
\begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} = \begin{pmatrix} c_1 e^{3t} + c_2 e^{−3t} + \tfrac15 e^{−2t} \\ c_1 e^{3t} − 2 c_2 e^{−3t} − \tfrac45 e^{−2t} \end{pmatrix}.
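A quick numerical check of this general solution (a NumPy sketch with arbitrarily chosen constants c1, c2; not part of the book): a finite-difference derivative of the proposed x(t) matches the right-hand side of the system.

```python
import numpy as np

c1, c2 = 0.3, -1.2          # arbitrary constants
def x(t):
    return np.array([c1*np.exp(3*t) + c2*np.exp(-3*t) + np.exp(-2*t)/5,
                     c1*np.exp(3*t) - 2*c2*np.exp(-3*t) - 4*np.exp(-2*t)/5])

t, h = 0.4, 1e-6
lhs = (x(t + h) - x(t - h)) / (2 * h)            # numerical derivative x'(t)
rhs = np.array([x(t)[0] + 2*x(t)[1] + np.exp(-2*t),
                4*x(t)[0] - x(t)[1]])
print(np.allclose(lhs, rhs, atol=1e-5))          # True
```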

Exercise 4.10.31 With the notation of Section 4.9 show that
I = \frac{1}{2\pi i} \int_\gamma R(\lambda)\, d\lambda, \qquad A = \frac{1}{2\pi i} \int_\gamma \lambda R(\lambda)\, d\lambda.

Answer: First note that by Cauchy's integral formula (4.34) with f(z) ≡ 1, we have that
1 = \frac{1}{2\pi i} \int_\gamma \frac{1}{z - \lambda_j}\, dz, \qquad 0 = \frac{1}{2\pi i} \int_\gamma \frac{1}{(z - \lambda_j)^{k+1}}\, dz, \; k ≥ 1.
Using now (4.33) we get that
\frac{1}{2\pi i} \int_\gamma R(z)\, dz = \sum_{l=1}^m \sum_{k=0}^{n_l - 1} k! \left[ \frac{1}{2\pi i} \int_\gamma \frac{1}{(z - \lambda_l)^{k+1}}\, dz \right] P_{lk} = \sum_{j=1}^m P_{j0} = I.

Next, by Cauchy's integral formula (4.34) with f(z) = z, we have that
\lambda_j = \frac{1}{2\pi i} \int_\gamma \frac{z}{z - \lambda_j}\, dz, \qquad 1 = \frac{1}{2\pi i} \int_\gamma \frac{z}{(z - \lambda_j)^2}\, dz, \qquad 0 = \frac{1}{2\pi i} \int_\gamma \frac{z}{(z - \lambda_j)^{k+1}}\, dz, \; k ≥ 2.
Using (4.33) we get that
\frac{1}{2\pi i} \int_\gamma z R(z)\, dz = \sum_{l=1}^m \sum_{k=0}^{n_l - 1} k! \left[ \frac{1}{2\pi i} \int_\gamma \frac{z}{(z - \lambda_l)^{k+1}}\, dz \right] P_{lk} = \sum_{j=1}^m (\lambda_j P_{j0} + P_{j1}).
Using the definitions of P_{jk} as in Theorem 4.8.4, one sees that this equals A, as desired.

Exercise 4.10.32 Show that the resolvent satisfies

(a) \frac{R(\lambda) - R(\mu)}{\lambda - \mu} = -R(\lambda)R(\mu).

(b) \frac{dR(\lambda)}{d\lambda} = -R(\lambda)^2.

(c) \frac{d^j R(\lambda)}{d\lambda^j} = (-1)^j j!\, R(\lambda)^{j+1}, j = 1, 2, . . . .

Answer: (a) We observe that


(λ − A)^{−1} − (μ − A)^{−1} = (λ − A)^{−1}[(μ − A) − (λ − A)](μ − A)^{−1} = (λ − A)^{−1}[μ − λ](μ − A)^{−1}.
Divide now both sides by λ − μ, and \frac{R(\lambda) - R(\mu)}{\lambda - \mu} = -R(\lambda)R(\mu) follows.

(b) Using (a) we get that


R(λ + h) − R(λ)
lim = − lim R(λ + h)R(λ) = −R(λ)2 .
h→0 h h→0

(c) Similar to part (b), we have


R(λ + h)k+1 R(λ)l − R(λ + h)k R(λ)l+1
lim =
h→0 h
R(λ + h) − R(λ)
lim R(λ + h)k R(λ)l =
h→0 h
− lim R(λ + h)k R(λ + h)R(λ)R(λ)l = −R(λ)k+l+2 . (4.28)
h→0

Let us now prove \frac{d^j R(\lambda)}{d\lambda^j} = (-1)^j j!\, R(\lambda)^{j+1} by induction on j. The j = 1 case was covered in part (b). Assume now that it has been proven that
\frac{d^{j-1} R(\lambda)}{d\lambda^{j-1}} = (-1)^{j-1}(j-1)!\, R(\lambda)^j.
Then
\frac{d^j R(\lambda)}{d\lambda^j} = \frac{d}{d\lambda}\left[(-1)^{j-1}(j-1)!\, R(\lambda)^j\right] = (-1)^{j-1}(j-1)! \lim_{h \to 0} \frac{R(\lambda + h)^j - R(\lambda)^j}{h}.
Write now
j−1
X
R(λ + h)j − R(λ)j = (R(λ + h)j−k R(λ)k − R(λ + h)j−k−1 R(λ)k+1 ).
k=0
Using observation (4.28), we have that
R(λ + h)j−k R(λ)k − R(λ + h)j−k−1 R(λ)k+1
lim =
h→0 h
j−k+k+1 j+1
−R(λ) = −R(λ) .
And thus
j−1
R(λ + h)j − R(λ)j X
lim = −R(λ)j+1 = −jR(λ)j+1 .
h→0 h k=0
Consequently,
dj R(λ)
= (−1)j−1 (j − 1)!(−j)R(λ)j+1 = (−1)j j!R(λ)j+1 ,
dλj
as desired.

Exercise 4.10.33 With the notation of Theorem 4.9.3 show that
\lambda_j P_{j0} + P_{j1} = A P_{j0} = \frac{1}{2\pi i} \int_{\gamma_j} \lambda R(\lambda)\, d\lambda.

Answer: First note that by Cauchy’s integral formula (4.34) with f (z) = z, we have that
Z Z
1 z 1 z
λj = dz, 0 = dz when j =6 l,
2πi γj z − λj 2πi γl z − λj
Z
1 z
1= dz,
2πi γj (z − λj )2
Z Z
1 z 1 z
0= dz when j 6
= l, 0 = dz, k ≥ 2.
2πi γl (z − λj )2 2πi γl (z − λj )k+1
Using now (4.33) we get that
Z m nXj −1 Z
1 X 1 z
zR(z)dz = k![ dz]Plk = λj Pj0 + Pj1 .
2πi γj l=1 k=0
2πi γj (z − λj )k+1
Using Theorem 4.8.4 one sees that
A P_{j0} = S(\oplus_{l=1}^m J(\lambda_l)) S^{-1} P_{j0} = S(0 \oplus \cdots \oplus J(\lambda_j) \oplus \cdots \oplus 0) S^{-1} = \lambda_j P_{j0} + P_{j1}.

Chapter 5

Exercise 5.7.1 For the following, check whether h·, ·i is an inner product.

(a) V = R2 , F = R,
   
x y
h 1 , 1 i = 3x1 y1 + x1 y2 + x2 y1 + 2x2 y2 .
x2 y2

(b) V = C2 , F = C,
   
x y
h 1 , 1 i = 3x1 y1 + x1 y2 + x2 y1 + 2x2 y2 .
x2 y2

(c) Let V = {f : [0, 1] → R : f is continuous}, F = R,


hf, gi = f (0)g(0) + f (1)g(1) + f (2)g(2).

(d) Let V = R2 [X], F = R,


hf, gi = f (0)g(0) + f (1)g(1) + f (2)g(2).

(e) Let V = {f : [0, 1] → C : f is continuous}, F = C,


Z 1
hf, gi = f (x)g(x)(x2 + 1)dx.
0

   
x y
Answer: (a) Write h 1 , 1 i = (x1 + x2 )(y1 + y2 ) + 2x1 y1 + x2 y2 and realizing that
x2 y2
everything is over R, it is easy to see that this defines an inner product.

(b) hie1 , ie1 i = −3, thus this is not an inner product.

(c) Let f (t) = t(t − 1)(t − 2), then f 6= 0, but hf, f i = 0. Thus this is not an inner product.

(d) Nonnegativity, linearity and symmetry are easy to check. Next suppose that hf, f i = 0.
Then we get that f (0) = f (1) = f (2) = 0. As f ∈ R2 [X], this implies that f = 0 (as a
degree ≤ 2 polynomial with three roots is the zero polynomial).

(e) Nonnegativity, linearity and (complex conjugate) symmetry are easy to check. Next suppose that ⟨f, f⟩ = 0. This implies that \int_0^1 |f(x)|^2 (x^2 + 1)\, dx = 0. Since the integrand is continuous and nonnegative, we must have that |f(x)|^2(x^2 + 1) = 0 for x ∈ [0, 1]. Thus f(x) = 0 for x ∈ [0, 1]. Thus f = 0. This shows that this is an inner product.

Exercise 5.7.2 For the following, check whether k · k is a norm.

(a) V = C2 , F = C,  
x1
k k = x21 + x22 .
x2
(b) V = C2 , F = C,  
x1
k k = |x1 | + 2|x2 |.
x2

(c) Let V = {f : [0, 2] → R : f is continuous}, F = R,


Z 2
kf k = |f (x)|(1 − x)dx.
0

(d) Let V = {f : [0, 1] → R : f is continuous}, F = R,


Z 1
kf k = |f (x)|(1 − x)dx.
0

 
Answer: (a) Not a norm. For instance, ∥(i, 0)^T∥ = i^2 + 0^2 = −1, which is not ≥ 0.

(b) Clearly this quantity is always nonnegative, and when it equals 0 we need that
|x1 | = 0 = |x2 |, yielding that x = 0. Thus the first property of a norm is satisfied.

Next, kcxk = |cx1 | + 2|cx2 | = |c|(|x1 | + 2|x2 |) = |c|kxk.

Finally,
kx + yk = |x1 + y1 | + 2|x2 + y2 | ≤ |x1 | + |y1 | + 2(|x2 | + |y2 |) = kxk + kyk,
yielding that k · k is a norm.

(c) This is not a norm. For instance, if f (x) = 1 + x, then


∥f∥ = \int_0^2 (1 + x)(1 − x)\, dx = \left( x − \frac{x^3}{3} \right)\Big|_{x=0}^{x=2} = 2 − \frac{8}{3} < 0.

(d) Notice that 1 − x ≥ 0 when 0 ≤ x ≤ 1, thus kf k ≥ 0 for all f ∈ V . Next, suppose that
kf k = 0. As |f (x)|(1 − x) ≥ 0 on [0, 1], the only way the integral can be zero is when
|f (x)|(1 − x) = 0 for x ∈ [0, 1]. Thus f (x) = 0 for x ∈ (0, 1], and then, by continuity, it also
follows that f (0) = limx→0+ f (x) = 0. Thus f is the zero function. This takes care of the
first condition of a norm.

For properties (ii) and (iii), observe that |cf (x)| = |c||f (x)| and
|(f + g)(x)| = |f (x) + g(x)| ≤ |f (x)| + |g(x)|. Using this, it is easy to see that
kcf k = |c|kf k and kf + gk ≤ kf k + kgk, giving that k · k is a norm.

Exercise 5.7.3 Let v1 , . . . , vn be nonzero orthogonal vectors in an inner product space


V . Show that {v1 , . . . , vn } is linearly independent.

Answer: Let c1 , . . . , cn be so that c1 v1 + · · · + cn vn = 0. We need to show that cj = 0,


j = 1, . . . , n. For this, observe that for j = 1, . . . , n,
0 = h0, vj i = hc1 v1 + · · · + cn vn , vj i = cj hvj , vj i.
As v_j ≠ 0, we have that ⟨v_j, v_j⟩ ≠ 0, and thus c_j = 0 follows.

Exercise 5.7.4 Let V be an inner product space.

(a) Determine {0}⊥ and V ⊥ .


   
1 0
 i   −i  ⊥
(b) Let V = C4 and W = {
1 + i , 1 + 2i}. Find a basis for W .
  

2 0

(c) In case V is finite-dimensional and W is a subspace, show that


dim W ⊥ = dim V − dim W . (Hint: start with an orthonormal basis for W and add
vectors to it to obtain an orthonormal basis for V ).

Answer: (a) {0}⊥ = V and V ⊥ = {0}.


 
1 −i 1−i 2
(b) We need to find a basis for the null space of , which in
0 i 1 − 2i 0
 
1 0 −3i 2
row-reduced echelon form is the matrix . This gives the basis
0 1 −2 + i 0
   
3i −2
2 − i   0 
{
  ,  }
1   0 
0 1

for W ⊥ .

(c) Let k = dim W and n = dim V . Clearly k ≤ n. Let {v1 , . . . , vk } be an orthonormal


basis for W , and extend this basis to an orthonormal basis {v1 , . . . , vn } for V (which can
be done, as one can extend to a basis of V and then make it orthonormal via the
Gram–Schmidt process). We now claim that {vk+1 , . . . , vn } is a basis Pfor W ⊥ . Let
⊥ n
x ∈ W . As x ∈ V , we have that there exists c1 , . . . , cn so that x = i=1 ci vi . Since
x ⊥ vi , i = 1, . . . , k, we get that c1 = · · · = ck = 0. Thus x ∈ Span{vk+1 , . . . , vn },
yielding that W ⊥ ⊆ Span{vk+1 , . . . , vn }. Due to orthonormality of the basis for V , we
have that vk+1 , . . . , vn are orthogonal to the vectors v1 , . . . , vk , and thus orthogonal to
any vector in W . Thus Span{vk+1 , . . . , vn } ⊆ W ⊥ , and we obtain equality. As we already
have the linear independence of {vk+1 , . . . , vn }, we obtain that it is a basis for W ⊥ . Thus
dim W ⊥ = n − k.

Exercise 5.7.5 Let h·, ·i be the Euclidean inner product on Fn , and k · k the associated
norm.

(a) Let F = C. Show that A ∈ Cn×n is the zero matrix if and only if hAx, xi = 0 for all
x ∈ Cn . (Hint: for x, y ∈ C, use that hA(x + y), x + yi = 0 = hA(x + iy), x + iyi.)
(b) Show that when F = R, there exists nonzero matrices A ∈ Rn×n , n > 1, so that
hAx, xi = 0 for all x ∈ Rn .
(c) For A ∈ Cn×n define
w(A) = max |hAx, xi|. (5.29)
x∈Cn ,kxk=1

Show that w(·) is a norm on Cn×n . This norm is called the numerical radius of A.
(d) Explain why maxx∈Rn ,kxk=1 |hAx, xi| does not define a norm.

Answer: (a) Clearly, if A = 0, then hAx, xi = 0 for all x ∈ Cn .

For the converse, assume that hAx, xi = 0 for all x ∈ Cn . Let now x, y ∈ C. Then
0 = hA(x + y), x + yi = hAx, xi + hAy, xi + hAx, yi + hAy, yi = hAy, xi + hAx, yi (5.30)
and, similarly,
0 = hA(x + iy), x + iyi = ihAy, xi − ihAx, yi. (5.31)
Combining (5.30) and (5.31), we obtain that hAx, yi = 0 for all x, y ∈ C. Applying this
with x = ej and y = ek , we obtain that the (k, j)th entry of A equals zero. As this holds
for all k, j = 1, . . . , n, we obtain that A = 0.
 
(b) When n = 2 one may choose A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. For larger n one can add zero rows and columns to this matrix.

(c) Clearly, w(A) ≥ 0. Next, suppose that w(A) = 0. Then for all kxk = 1, we have that
hAx, xi = 0. This implies that for all x ∈ Cn we have that hAx, xi = 0. By (a), this
implies that A = 0. Next, for kxk = 1, we have
|h(A + B)x, xi| ≤ |hAx, xi| + |hBx, xi| ≤ w(A) + w(B),
and thus w(A + B) ≤ w(A) + w(B). Finally, when c ∈ C, one has that
|h(cA)x, xi| = |c||hAx, xi|, and thus w(cA) = |c|w(A) follows easily.

(d) With A as in part (b), we have that maxx∈Rn ,kxk=1 |hAx, xi| = 0, and thus the first
property in the definition of a norm fails.

Exercise 5.7.6 Find an orthonormal basis for the subspace in R^4 spanned by
\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 3 \\ 1 \\ 3 \\ 1 \end{pmatrix}.

Answer: Applying Gram–Schmidt, we find
Q = \begin{pmatrix} \tfrac12 & \tfrac12 & \tfrac12 \\ \tfrac12 & -\tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac12 & -\tfrac12 \\ \tfrac12 & -\tfrac12 & -\tfrac12 \end{pmatrix}, \quad R = \begin{pmatrix} 2 & 3 & 4 \\ 0 & -1 & 2 \\ 0 & 0 & 0 \end{pmatrix}.
Thus {(\tfrac12, \tfrac12, \tfrac12, \tfrac12)^T, (\tfrac12, -\tfrac12, \tfrac12, -\tfrac12)^T} is the requested orthonormal basis.

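A numerical cross-check (a NumPy sketch, not part of the book's solution): the built-in QR factorization produces an orthonormal basis of the same two-dimensional column space, possibly differing by signs.

```python
import numpy as np

A = np.array([[1., 1, 3],
              [1, 2, 1],
              [1, 1, 3],
              [1, 2, 1]])

Q, R = np.linalg.qr(A)
print(np.round(Q[:, :2], 3))      # two orthonormal columns spanning the same plane (up to signs)
print(np.linalg.matrix_rank(A))   # 2: the three given vectors span a 2-dimensional subspace
```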
Exercise 5.7.7

Let V = R[t] over the field R. Define the inner product


Z 1
hp, qi := p(t)q(t)dt.
−1
For the following linear maps on V determine whether they are self-adjoint.

(a) Lp(t) := (t2 + 1)p(t).


dp
(b) Lp(t) := dt
(t).
(c) Lp(t) = −p(−t).

Answer: (a) ⟨L(p), q⟩ = \int_{-1}^{1} (t^2 + 1)p(t)q(t)\, dt = ⟨p, L(q)⟩. Thus L is self-adjoint.

(b) Let p(t) ≡ 1 and q(t) = t. Then ⟨L(p), q⟩ = 0, and ⟨p, L(q)⟩ = \int_{-1}^{1} 1\, dt = 2. Thus L is not self-adjoint.

(c) ⟨L(p), q⟩ = \int_{-1}^{1} -p(-t)q(t)\, dt = \int_{1}^{-1} -p(s)q(-s)(-ds) = ⟨p, L(q)⟩. Thus L is self-adjoint.

Exercise 5.7.8 Let V = R[t] over the field R. Define the inner product
Z 2
hp, qi := p(t)q(t)dt.
0

For the following linear maps on V determine whether they are unitary.

(a) Lp(t) := tp(t).


(b) Lp(t) = −p(2 − t).

Answer: (a) Let p(t) = q(t) = t. Then ⟨p, q⟩ = \int_0^2 t^2\, dt = \tfrac{8}{3}, while ⟨L(p), L(q)⟩ = \int_0^2 t^4\, dt = \tfrac{32}{5}. Thus L is not unitary.
unitary.

(b) Doing a change of variables s = 2 − t, we get
⟨L(p), L(q)⟩ = \int_0^2 (-p(2-t))(-q(2-t))\, dt = \int_2^0 p(s)q(s)(-ds) = ⟨p, q⟩. Thus L is unitary.

Exercise 5.7.9 Let U : V → V be unitary, where the inner product on V is denoted by


h·, ·i.

(a) Show that |hx, U xi| ≤ kxk2 for all x in V .


(b) Show that |hx, U xi| = kxk2 for all x in V , implies that U = αI for some |α| = 1.

Answer: (a) By (5.1) we have that |hx, U xi| ≤ kxkkU xk = kxk2 , where in the last step we
used that kxk = kU xk as U is unitary.

(b) Let x be a unit vector. As |⟨x, Ux⟩| = ∥x∥^2, we have by (the last part of) Theorem 5.1.10 that Ux = αx for some α. As ∥x∥ = ∥Ux∥ we must have |α| = 1. If we are in a one-dimensional vector space, we are done. If not, let v be a unit vector orthogonal to x. As above, we get Uv = βv for some |β| = 1. In addition, we get that U(x + v) = μ(x + v) with |μ| = 1. Now, we get that
μ = ⟨μx, x⟩ = ⟨μ(x + v), x⟩ = ⟨U(x + v), x⟩ = ⟨αx + βv, x⟩ = α.
Similarly, we prove μ = β. Thus α = β. Thus we have shown that Uy = αy for all y ⊥ x and also for y = x. But then the same holds for linear combinations of x and y, and we obtain that U = αI.

Exercise 5.7.10 Let V = Cn×n , and define


hA, Bi = tr(AB ∗ ).

   
1 2 1 0
(a) Let W = span{ , }. Find an orthonormal basis for W .
0 1 2 1
(b) Find a basis for W ⊥ := {B ∈ V : B ⊥ C for all C ∈ W }.

Answer: (a) Performing Gram–Schmidt, we get


 2
− 23
   
1 0 2 1 2
− = 3 2 .
2 1 6 0 1 2 3
 √ 2
− 23
 
1 2
Thus { √1 , √3 3 2 } is the requested orthonormal basis.
6 0 1 10 2 3
 
a b
(b) Let B = ∈ W ⊥ . Then we get a + 2c + d = 0, 1a + 2b + d = 0. With c and d as
c d
free variables, we get
       
a b −2c − d c −2 1 −1 0
= =c +d .
c d c d 1 0 0 1

Performing Gram–Schmidt, we get


  1
− 3 − 13
   
−1 0 2 −2 1
− = 1 .
0 1 6 1 0 −3 1
 √  1
− 3 − 13
 
−2 1
Thus { √1 , 23 1 } is the requested orthonormal basis.
6 1 0 −3 1

Exercise 5.7.11 Let A ∈ Cn×n . Show that if A is normal and Ak = 0 for some k ∈ N,
then A = 0.

Answer: As A is normal, A = UDU^* for some unitary U and diagonal D. Then 0 = A^k = UD^kU^*, thus D^k = 0. As D is diagonal, we get D = 0. Thus A = 0.

Exercise 5.7.12 Let A ∈ Cn×n and a ∈ C. Show that A is normal if and only if A − aI
is normal.

Answer: If AA∗ = A∗ A, then


(A − aI)(A − aI)∗ = AA∗ − aA − aA∗ + |a|2 I = A∗ A − aA − aA∗ + |a|2 I = (A − aI)∗ (A − aI).

Exercise 5.7.13 Show that the sum of two Hermitian matrices is Hermitian. How about
the product?

Answer: If A = A∗ and B = B ∗ , then (A + B)∗ = A∗ + B ∗ = A + B. Thus the sum of two


Hermitian matrices is Hermitian.

A product of two Hermitian matrices is not necessarily Hermitian. For example,
A = \begin{pmatrix} 2 & i \\ -i & 2 \end{pmatrix}, B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
are Hermitian but AB is not.

Exercise 5.7.14 Show that the product of two unitary matrices is unitary. How about
the sum?

Answer: Let U and V be unitary. Then (U V )(U V )∗ = U V V ∗ U ∗ = U U ∗ = I and


(U V )∗ (U V ) = V ∗ U ∗ U V = V ∗ V = I, thus U V is unitary.

The sum of two unitary matrices is in general not unitary. For example, U = I is unitary,
but U + U = 2I is not.

Exercise 5.7.15 Is the product of two normal matrices normal? How about the sum?

Answer: No, e.g., A = \begin{pmatrix} 2 & i \\ -i & 2 \end{pmatrix}, B = \begin{pmatrix} 0 & 1 \\ i & 0 \end{pmatrix} are normal, but neither AB nor A + B is normal.

Exercise 5.7.16 Show that the following matrices are unitary.


 
1 1
1
(a) √ .
2 −11
     
1 1 1 1 2 0
Answer: √1 √1
= 21 = I2 .
2 1 −1 2 1 −1 0 2
 
1 1 1
2iπ 4iπ 
(b) √1 1 e 3 e 3 .

3 4iπ 8iπ
1 e 3 e 3
   
1 1 1 1 1 1 
3 0 0

2iπ 4iπ 2iπ 4iπ
Answer: √1 1 e 3 e 3  √1 1 e− 3 e− 3  = 1 0 3 0 = I3 .
   
3 4iπ 8iπ 3 4iπ 8iπ
3
1 e 3 e 3 1 e 3 − −
e 3 0 0 3
 
1 1 1 1
1 i −1 −i 
(c) 21 
1 −1
.
1 −1
1 −i −1 i
     
1 1 1 1 1 1 1 1 4 0 0 0
1 i −1 −i   1 1 −i −1 i  0 4 0 0
Answer: 12  = 1

  = I4 .
1 −1 1 −1 2 1 −1 1 −1 4 0 0 4 0
1 −i −1 i 1 i −1 −i 0 0 0 4
(d) Can you guess the general rule?
Answer: The matrices in the previous parts are all Fourier matrices. The general form
of a Fourier matrix is given before Proposition 7.4.3.

Exercise 5.7.17 For the following matrices A, find the spectral decomposition U DU ∗ of
A.
 
i2
(a) A = .
2−i
 √ 
2 3
(b) A = √ .
3 4
 
3 1 1
(c) A = 1 3 1.
1 1 3
 
0 1 0
(d) A = 0 0 1.
1 0 0

1 1
!
√ √
 
2 2 1 0
Answer: (a) U = ,D = .
√i − √i 0 3
2 2

√ !
3 1  
− 2 √2 1 0
(b) U = ,D = .
1 3 0 5
2 2
 1 1 1

√ √  
 16 2 3 2 0 0
(c) U =  √ − 12 √1 
= 0 2 0 .
3,D

 √6
− 36 0 1
√ 0 0 5
3
   
1 1 1 1 0 0
2πi 4πi  2πi
1
(d) U = √ 1 e 3 e 3  , D = 0 e 0 .
  3

3 2πi 2πi 4πi
1 e 3 e 3 0 0 e 3

 
Exercise 5.7.18 Let A = \begin{pmatrix} 3 & 2i \\ -2i & 3 \end{pmatrix}.

(a) Show that A is positive semidefinite.

(b) Find the positive square root of A; that is, find a positive semidefinite B so that B^2 = A.

Answer: (a) p_A(λ) = (3 − λ)^2 − |2i|^2 = λ^2 − 6λ + 5, which has roots 1, 5. As A is Hermitian with nonnegative eigenvalues, A is positive semidefinite.

(b) Let U = \begin{pmatrix} \tfrac{1}{\sqrt2} & \tfrac{1}{\sqrt2} \\ \tfrac{i}{\sqrt2} & -\tfrac{i}{\sqrt2} \end{pmatrix}, D = \begin{pmatrix} 1 & 0 \\ 0 & 5 \end{pmatrix}. Then A = UDU^*. Now let B = U\sqrt{D}U^*, where \sqrt{D} = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt5 \end{pmatrix}. Then B is positive semidefinite, and B^2 = A.
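A numerical verification (a sketch assuming SciPy, not part of the book): the principal matrix square root of this positive semidefinite A is Hermitian with eigenvalues 1 and √5, and squares back to A.

```python
import numpy as np
from scipy.linalg import sqrtm

A = np.array([[3, 2j], [-2j, 3]])
B = sqrtm(A)                      # principal (here: positive semidefinite) square root
print(np.allclose(B @ B, A))                               # True
print(np.allclose(B, B.conj().T), np.linalg.eigvalsh(B))   # Hermitian; eigenvalues approx [1, 2.236]
```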

Exercise 5.7.19 Let A ∈ Cn×n be positive semidefinite, and let k ∈ N. Show that there
exists a unique positive semidefinite B so that B k = A. We call B the kth root of A and
1
denote B = A k .

Answer: Since A is positive semidefinite, there exists a unitary U and a diagonal


D = diag(di )n ∗
i=1 so that A = U DU . Moreover, the diagonal entries di of D are
nonnegative, and let us order them so that d1 ≥ · · · ≥ dn . Thus we may define
1 1 1
D k := diag(dik )n ∗ k
i=1 . Let now B = U D U . Then B is positive semidefinite and B = A.
k

Next, suppose that C is positive semidefinite with C k = A. For uniqueness of the kth
root, we need to show that C = B. As C is positive semidefinite, we may write C = V ΛV ∗
with V unitary, and Λ = diag(λi )n k k ∗
i=1 with λ1 ≥ · · · ≥ λn (≥ 0). Then C = V Λ V = A,
and as the eigenvalues of C k are λk1 ≥ · · · ≥ λkn and the eigenvalues of A are
d1 ≥ · · · ≥ dn , we must have that λki = di , i = 1, . . . , n. And thus, since λi ≥ 0 for all i, we
1
have λi = dik , i = 1, . . . , n. From the equalities V Λk V ∗ = U DU ∗ and Λk = D, we obtain
that (U ∗ V )D = D(U ∗ V ). Let W = U ∗ V and write W = (wij )n i,j=1 . Then W D = DW
implies that wij dj = di wij for all i, j = 1, . . . , n. When dj 6= di we thus get that wij = 0
1 1
(since wij (dj − di ) = 0). But then it follows that wij djk = dik wij for all i, j = 1, . . . , n
(indeed, when wij = 0 this is trivial, and when di = dj this also follows from
1 1 1 1
wij dj = di wij ). Now we obtain that U ∗ V D k = W D k = D k W = D k U ∗ V , and thus
1 1
C = V D k V ∗ = U D k U ∗ = B.

Exercise 5.7.20 Let A ∈ Cn×n be positive semidefinite. Show that


1
lim trA k = rankA.
k→∞
1
(Hint: use that for λ > 0 we have that limk→∞ λ k = 1.)

Answer: We may write A = U ΛU ∗ with U unitary and Λ = diag(λi )n


i=1 with

λ1 ≥ · · · ≥ λr > λr+1 = · · · = λn = 0 and r = rankA. Then for i = 1, . . . , r we have that


1 1
limk→∞ λik = 1, while for i = r + 1, . . . , n we have that limk→∞ λik = 0. Thus
1 1 1
lim trA k = lim λ1k + · · · + λnk = 1 + · · · + 1 + 0 + · · · + 0 = r = rankA.
k→∞ k→∞

Exercise 5.7.21 Let A = A∗ be an n × n Hermitian matrix, with eigenvalues


λ1 ≥ · · · ≥ λn .

(a) Show tI − A is positive semidefinite if and only if t ≥ λ1 .


Answer: Write A = U DU ∗ , with U unitary and D = diag(λi )n i=1 . Then
tI − A = U (tI − D)U ∗ . Thus tI − A is positive semidefinite if and only if t − λi ≥ 0,
i = 1, . . . , n, which holds if and only if t − λ1 ≥ 0.
(b) Show that λmax (A) = λ1 = maxhx,xi=1 hAx, xi, where h·, ·i is the Euclidean inner
product.
Answer: By part (a) λ1 I − A is positive semidefinite, and thus h(λ1 I − A)x, xi ≥ 0.
This gives that λ1 hx, xi ≥ hAx, xi for all vectors x. Choosing x to be a unit
eigenvector of A at λ1 we obtain equality. This proves the result.
(c) Let Â be the matrix obtained from A by removing row and column i. Then
λmax (Â) ≤ λmax (A).
Answer: For y ∈ Fn a vector with a 0 in the ith position, we let ŷ denote the vector
obtained from y by removing the ith coordinate. Note that hŷ, ŷi = hy, yi and
hÂŷ, ŷi = hAy, yi. By part (b), we have that
λmax (Â) = max hÂŷ, ŷi = max hAy, yi ≤ max hAx, xi = λmax (A).
hŷ,ŷi=1 hy,yi=1,yi =0 hx,xi=1

Exercise 5.7.22 (a) Show that a square matrix A is Hermitian iff A2 = A∗ A.


(b) Let H be positive semidefinite, and write H = A + iB where A and B are real
matrices. Show that if A is singular, then H is singular as well.

Answer: (a) Clearly, if A is Hermitian, then A∗ A = AA = A2 .

Conversely, suppose that A is so that A2 = A∗ A. Apply Schur’s triangularization theorem


to obtain a unitary U and an upper triangular T so that A = U T U ∗ . But then A2 = A∗ A
implies T 2 = T ∗ T . Write T = (tij )ni,j=1 , with tij = 0 when i > j. Then the (1,1) entry of
the equation T 2 = T ∗ T gives that t211 = |t11 |2 , which shows that t11 ∈ R. Next the (2,2)
entry of T 2 = T ∗ T yields that t222 = |t12 |2 + |t22 |2 , which yields that t222 ≥ 0, and thus
t222 = |t22 |2 . But then it follows that t12 = 0 and t22 ∈ R. In general, from the (k, k)th
entry of T 2 = T ∗ T , one finds that tik = 0, i < k, and tkk ∈ R. Thus T is a real diagonal
matrix, which gives that A = U T U ∗ is Hermitian.

(b) Let 0 ≠ v ∈ R^n be so that Av = 0. Then v^* Hv ≥ 0, and thus 0 ≤ v^*(A + iB)v = i v^T Bv, and thus we must have that v^T Bv = 0. This gives that v^* Hv = 0. Next write H = UDU^*, with D = (d_{ij})_{i,j=1}^n a nonnegative diagonal matrix.
Let v = (v_1, . . . , v_n)^T. Then d_{11}|v_1|^2 + · · · + d_{nn}|v_n|^2 = 0. As v ≠ 0, some v_i is nonzero. But then 0 ≤ d_{ii}|v_i|^2 ≤ d_{11}|v_1|^2 + · · · + d_{nn}|v_n|^2 = 0 implies that d_{ii} = 0. Thus H is singular.

Exercise 5.7.23 (a) Let A be positive definite. Show that A + A−1 − 2I is positive
semidefinite.

(b) Show that A is normal if and only if A∗ = AU for some unitary matrix U .

Answer: (a) Clearly, since A is Hermitian, we have that A + A−1 − 2I is Hermitian. Next,
every eigenvalue of A + A−1 − 2I is of the form λ + λ−1 − 2 = (λ − 1)2 /λ, where λ is an
eigenvalue of A. As λ > 0, we have that (λ − 1)2 /λ ≥ 0.

(b) If A∗ = AU for some unitary matrix U , then A∗ A = AU (U ∗ A∗ ) = AA∗ . Thus A is


normal.

Conversely, let A be normal. Then there exists a diagonal D (with diagonal entries |djj |)
and unitary V so that A = V DV ∗ . Let W be a unitary diagonal matrix so that DW = D∗
djj
(by taking wjj = djj
, when djj 6= 0, and wjj = 1 when djj = 0). Then U = V W V ∗ is
unitary and AU = A(V W V ∗ ) = V DV ∗ V W V ∗ = V DW V ∗ = V D∗ V ∗ = A∗ .

 
Exercise 5.7.24 Find a QR factorization of \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}.

Answer:
Q = \begin{pmatrix} \tfrac{\sqrt2}{2} & \tfrac{\sqrt6}{6} & -\tfrac{\sqrt3}{3} \\ \tfrac{\sqrt2}{2} & -\tfrac{\sqrt6}{6} & \tfrac{\sqrt3}{3} \\ 0 & \tfrac{\sqrt6}{3} & \tfrac{\sqrt3}{3} \end{pmatrix}, \quad R = \begin{pmatrix} \sqrt2 & \tfrac{\sqrt2}{2} & \tfrac{\sqrt2}{2} \\ 0 & \tfrac{\sqrt6}{2} & \tfrac{\sqrt6}{6} \\ 0 & 0 & \tfrac{2\sqrt3}{3} \end{pmatrix}.
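A numerical check (a NumPy sketch, not from the book): the built-in QR factorization reproduces A, possibly with a factor pair differing from the hand computation by column signs.

```python
import numpy as np

A = np.array([[1., 1, 0], [1, 0, 1], [0, 1, 1]])
Q, R = np.linalg.qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))  # True True
```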

Exercise 5.7.25 Find the Schur factorization A = U T U ∗ , with U unitary and T


triangular, for the matrix  
−1 −2 3
A=  2 4 −2 .
1 −2 1
Note: 2 is an eigenvalue of A.

Answer:
√  √ √ 
− 2 2
 
2 2 2 2
√ 0
 2 2
T = 0 4 2 2 , U =  0√ 1 0√  .
 
0 0 −2 − 22 0 − 2
2

Exercise 5.7.26 Let
T = \begin{pmatrix} A & B \\ C & D \end{pmatrix}   (5.32)
be a block matrix, and suppose that D is invertible. Define the Schur complement S of D in T by S = A − BD^{−1}C. Show that rank T = rank(A − BD^{−1}C) + rank D.

Answer: Observe that
\begin{pmatrix} I & -BD^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} I & 0 \\ -D^{-1}C & I \end{pmatrix} = \begin{pmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{pmatrix}.
But then we get that
rank T = rank \begin{pmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{pmatrix} = rank(A − BD^{−1}C) + rank D.

Exercise 5.7.27 Using Sylvester’s law of inertia, show that if


 
A B
M = ∗ = M ∗ ∈ C(n+m)×(n+m)
B C
with C invertible, then
In M = In C + In(A − BC −1 B ∗ ). (5.33)
 
I 0
(Hint: Let S = and compute SM S ∗ .)
−B ∗ A−1 I

Answer: Let  
I 0
S= ,
−B ∗ A−1 I
and observe that
−A−1 B
     
I 0 A B I A 0
SM S ∗ = = .
−B ∗ A−1 I B∗ C 0 I 0 C− B ∗ A−1 B
Theorem 5.5.5 now yields that
 
A 0
In M = In ∗ −1 = In A + In(C − B ∗ A−1 B). (5.34)
0 C−B A B

Exercise 5.7.28 Determine the singular value decomposition of the following matrices.

 √ 
1 1 2√2i
(a) A = √−1 −1
√ 2 2i.
2i − 2i 0
 
−2 4 5
 6 0 −3
(b) A =  .
 6 0 −3
−2 4 5

√ 
2i
− 21 1    
4 0 0 0 0 1
 √2 2
Answer: (a) V =  22i 1
−√12  , Σ = 0 2 0 , W = −1 0 0 .

2

0 − 22i − 22i 0 0 2 0 1 0

 1
−2 − 21 − 21 − 12
  
12 0 0  2
− 23 − 13

 1 − 12 1
− 12  0 6 0 3
(b) V =  21
 2
1 ,Σ =  0
   , W = − 1 − 23 2 
.
− 12 − 21 0 0 3 3
2 2 − 23 − 13 − 23
− 21 − 21 1
2
1
2
0 0 0
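Since the decompositions above are hard to read in this rendering, the following sketch (using NumPy, not part of the book) at least confirms the singular values claimed for the two matrices: (4, 2, 2) for part (a) and (12, 6, 0) for part (b).

```python
import numpy as np

s2 = np.sqrt(2)
A = np.array([[1, 1, 2*s2*1j],
              [-1, -1, 2*s2*1j],
              [s2*1j, -s2*1j, 0]])
B = np.array([[-2., 4, 5],
              [6, 0, -3],
              [6, 0, -3],
              [-2, 4, 5]])

print(np.round(np.linalg.svd(A, compute_uv=False), 6))  # [4. 2. 2.]
print(np.round(np.linalg.svd(B, compute_uv=False), 6))  # [12. 6. 0.]
```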

Exercise 5.7.29 Let A be a 4 × 4 matrix with spectrum σ(A) = {−2i, 2i, 3 + i, 3 + 4i}
and singular values σ1 ≥ σ2 ≥ σ3 ≥ σ4 .

(a) Determine the product σ1 σ2 σ3 σ4 .


(b) Show that σ1 ≥ 5.
(c) Assuming A is normal, determine tr(A + AA∗ ).

Answer: (a) Write A = V ΣW ∗ , then | det A| = | det V | det Σ| det W ∗ | = σ1 σ2 σ3 σ4 as the


determinant of a unitary matrix has absolute value 1. Next observe that |det A| = |−2i| |2i| |3 + i| |3 + 4i| = 20√10, which is the answer.

(b) Let v be a unit vector at the eigenvalue 3 + 4i. Then


σ1 = max kAxk ≥ kAvk = k(3 + 4i)vk = |3 + 4i| = 5.
kxk=1

(c) Since A is normal, we can write A = U DU ∗ with U unitary and


 
−2i 0 0 0
 0 2i 0 0 
D=  0
.
0 3+i 0 
0 0 0 3 + 4i
Thus
tr(A + AA∗ ) = tr(D + DD∗ ) = (−2i + 2i + 3 + i + 3 + 4i) + (4 + 4 + 10 + 25) = 49 + 5i.

 
P Q
Exercise 5.7.30 Let A = ∈ C(k+l)×(m+n) , where P is of size k × m. Show that
R S
σ1 (P ) ≤ σ1 (A).
Conclude that σ1 (Q) ≤ σ1 (A), σ1 (R) ≤ σ1 (A), σ1 (S) ≤ σ1 (A) as well.

Answer: By (5.17),
 
x
σ1 (P ) = max kP xk ≤ max kA k≤
kxk=1,x∈Cm kxk=1,x∈Cm 0

max kAzk = σ1 (A).


kzk=1,z∈Cm+n

The same type of reasoning can be applied to obtain the other inequalities. Alternatively,
one can use that permuting block rows and/or block columns
 does
 notchangethe singular
P Q Q P
values of a matrix. For instance, the singular values of and are the
R S S R
 
0 Im
same, as multiplying on the left with the unitary matrix J = does not change
In 0
the singular values (it only changes the singular value decomposition from V ΣW ∗ to
V ΣW ∗ J = V Σ(J ∗ W )∗ ).

Exercise 5.7.31 This is an exercise that uses MATLAB®, and its purpose is to show what happens with an image if you take a low rank approximation of it.

Answer:

Figure 5.7: The original image (of size 672 × 524 × 3), with panels (a)–(c) showing the low rank approximations using 10, 30, and 50 singular values, respectively.

Exercise 5.7.32 The condition number κ(A) of an invertible n × n matrix A is given by κ(A) = σ_1(A)/σ_n(A), where σ_1(A) ≥ · · · ≥ σ_n(A) are the singular values of A. Show that for all invertible matrices A and B, we have that κ(AB) ≤ κ(A)κ(B). (Hint: use that σ_1(A^{−1}) = (σ_n(A))^{−1} and (5.18).)

Answer: Notice that for any invertible matrix A, κ(A) = σ_1(A) σ_1(A^{−1}). So by (5.18),
κ(AB) = σ_1(AB) σ_1(B^{−1}A^{−1}) ≤ σ_1(A) σ_1(A^{−1}) σ_1(B) σ_1(B^{−1}) = \frac{σ_1(A)}{σ_n(A)} \cdot \frac{σ_1(B)}{σ_n(B)} = κ(A) κ(B).
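A tiny numerical illustration of this submultiplicativity (a NumPy sketch with random matrices, not part of the book):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))

# np.linalg.cond with the default 2-norm equals sigma_1 / sigma_n
print(np.linalg.cond(A @ B) <= np.linalg.cond(A) * np.linalg.cond(B) + 1e-9)  # True
```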

Exercise 5.7.33 Prove that if X and Y are positive definite n × n matrices such that
Y − X is positive semidefinite, then det X ≤ det Y . Moreover, det X = det Y if and only
if X = Y .
Answer: Notice that Y^{−1/2}(Y − X)Y^{−1/2} = I − Y^{−1/2}XY^{−1/2} is positive semidefinite. Thus the eigenvalues μ_1, . . . , μ_n of Y^{−1/2}XY^{−1/2} satisfy 0 ≤ μ_j ≤ 1, j = 1, . . . , n. But then \frac{\det X}{\det Y} = \det(Y^{−1/2}XY^{−1/2}) = \prod_{j=1}^n μ_j ≤ 1. Next, det X = det Y if and only if μ_1 = · · · = μ_n = 1, which in turn holds if and only if Y^{−1/2}XY^{−1/2} = I_n. The latter holds if and only if X = Y.

Exercise 5.7.34 (Least squares solution) When the equation Ax = b does not have a
solution, one may be interested in finding an x so that kAx − bk is minimal. Such an x is
called a least squares solution to Ax = b. In this exercise we will show that if A = QR,
with R invertible, then the least squares solution is given by x = R−1 Q∗ b. Let A ∈ Fn×m
with rank A = m.

(a) Let A = QR be a QR-factorization of A. Show that Ran A = RanQ.


(b) Observe that QQ∗ b ∈ Ran Q. Show that for all v ∈ Ran Q we have
6 QQ∗ b.
kv − bk ≥ kQQ∗ b − bk and that the inequality is strict if v =
(c) Show that x := R−1 Q∗ b is the least squares solution to Ax = b.
   
1 1 3
(d) Let A = 2 1 and b = 5. Find the least squares solution to Ax = b.
 
3 1 4
(e) In trying to fit a line y = cx + d through the points (1, 3), (2, 5), and (3, 4), one sets
up the equations
3 = c + d, 5 = 2c + d, 4 = 3c + d.
Writing this in matrix form we get
 
c
A = b,
d
where A and b as above. One way to get a “fitting line” y = cx + d, is to solve for c
and d via least squares, as we did in the previous part. This is the most common way
to find a so-called regression line. Plot the three points (1, 3), (2, 5), and (3, 4) and the
line y = cx + d, where c and d are found via least squares as in the previous part.

Answer: (a) Since rank A = m, the columns of A are linearly independent. This gives that
the m × m matrix R is invertible. Thus Ran A = Ran Q follows.

(b) Clearly, QQ∗ b ∈ Ran Q. Let v ∈ Ran Q. Thus there exists a w so that v = Qw.
Then, since (I − QQ∗ )b ⊥ x for every x ∈ Ran Q (use that Q∗ Q = I),
kv − bk2 = kQw − QQ∗ b + QQ∗ b − bk2 = kQw − QQ∗ bk2 + kQQ∗ b − bk2 .
Thus kv − bk ≥ kQQ∗ b − bk, and equality only holds when v = QQ∗ b.

(c) For kAx − bk to be minimal, we need Ax = QQ∗ b, as QQ∗ b is the element in


Ran A = RanQ closest to b. Now, Ax = QQ∗ b, gives QRx = QQ∗ b. Putting
x = R−1 Q∗ b, we indeed get that QRx = QQ∗ b.
(d) x = \begin{pmatrix} \tfrac12 \\ 3 \end{pmatrix}.

(e) The fitted regression line from part (d) is y = \tfrac12 x + 3, plotted together with the points (1, 3), (2, 5), and (3, 4).
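The computation in parts (c)–(d) can be reproduced numerically; the sketch below (using NumPy, not part of the book) solves the least squares problem via the QR factorization and via the library routine.

```python
import numpy as np

A = np.array([[1., 1], [2, 1], [3, 1]])
b = np.array([3., 5, 4])

# least squares via the (reduced) QR factorization: x = R^{-1} Q^T b
Q, R = np.linalg.qr(A)
x = np.linalg.solve(R, Q.T @ b)
print(x)                                                      # [0.5 3. ]  -> line y = 0.5 x + 3
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True
```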

Exercise 5.7.35 Let A, X be m × m matrices such that A = A∗ is invertible and


H := A − X ∗ AX (5.35)
is positive definite.

(a) Show that X has no eigenvalues on the unit circle T = {z ∈ C : |z| = 1}.
(b) Show that A is positive definite if and only if X has all eigenvalues in
D = {z ∈ C : |z| < 1}. (Hint: When X has allPeigenvalues in D, we have that X n → 0
as n → ∞. Use this to show that A = H + ∞ k=1 X ∗k HX k .)

Answer: (a) Suppose that x is an eigenvector of X with eigenvalue λ. Then (5.35) yields
that
0 < x∗ Hx = (1 − |λ|2 )x∗ Ax,
and thus |λ| 6= 1.

(b) Assuming that A is positive definite we get that


1 1 1 1 1 1
A− 2 HA− 2 = I − (A− 2 X ∗ A 2 )(A 2 XA− 2 )
1 1
is positive definite, and thus σ1 (A 2 XA− 2 ) < 1. Consequently, the eigenvalues of
1 −1 1 1
A XA
2 2 lie in D. But then, the same follows for X (as X is similar A 2 XA− 2 ).

For the converse, suppose that the eigenvalues of S lie in D. Then X n → 0 as n → ∞.


Rewriting (5.35) and reusing it over and over again we get that
A = H + X ∗ AX = H + X ∗ HX + X ∗2 AX 2 = · · · =
n−1
X ∞
X ∞
X
X ∗k HX k + X ∗n AX n → X ∗k HX k = H + X ∗k HX k . (5.36)
k=0 k=0 k=1
P∞
Thus A = H + k=1 X ∗k HX k is positive definite.

Chapter 6

Exercise 6.7.1 The purpose of this exercise is to show (the vector form of) Minkowski’s
inequality, which says that for complex numbers xi , yi , i = 1, . . . , n, and p ≥ 1, we have
\left( \sum_{i=1}^n |x_i + y_i|^p \right)^{1/p} ≤ \left( \sum_{i=1}^n |x_i|^p \right)^{1/p} + \left( \sum_{i=1}^n |y_i|^p \right)^{1/p}.   (6.37)

Recall that a real-valued function f defined on an interval in R is called convex if for all
c, d in the domain of f , we have that f (tc + (1 − t)d) ≤ tf (c) + (1 − t)f (d), 0 ≤ t ≤ 1.

(a) Show that f (x) = − log x is a convex function on (0, ∞). (One can do this by showing
that f 00 (x) ≥ 0.)
Answer: f 00 (x) = x12 > 0.
(b) Use (a) to show that for a, b > 0 and p, q ≥ 1 with \tfrac1p + \tfrac1q = 1, we have ab ≤ \tfrac{a^p}{p} + \tfrac{b^q}{q}. This inequality is called Young's inequality.
Answer: Taking c = a^p and d = b^q, t = \tfrac1p (and thus 1 − t = \tfrac1q), we obtain from the convexity of − log that
− log\left(\tfrac1p a^p + \tfrac1q b^q\right) ≤ −\tfrac1p \log a^p − \tfrac1q \log b^q.
Multiplying by −1 and applying s ↦ e^s on both sides gives
\tfrac1p a^p + \tfrac1q b^q ≥ (a^p)^{1/p} (b^q)^{1/q} = ab.
(c) Show Hőlder’s inequality: when ai , bi ≥ 0, i = 1, . . . , n, then
n n
!1 n
!1
p q
p q
X X X
ai bi ≤ ai bi .
i=1 i=1 i=1
1 1
Pn p p Pn q q
(Hint: Let λ = ( i=1 ai ) and µ = ( i=1 bi ) , and divide on both sides ai by λ
and bi by µ. Use this to argue that it is enough to prove the inequality when
λ = µ = 1. Next use (b)).
Answer: If λ or µ equals 0, the inequality is trivial, so let us assume λ, µ > 0. Put
1
p p p
αi = aλi and βi = bµi , i = 1, . . . , n. Then ( n
P Pn
i=1 αi ) = 1, and thus i=1 αi = 1.
Pn q P n
Similarly, i=1 βi = 1. We need to prove that i=1 αi βi ≤ 1. By (b) we have that
αi βi ≤ p1 αpi + 1q βiq , ı = 1, . . . , n. Taking the sum, we obtain
n n n
X 1X p 1X q 1 1
αi βi ≤ αi + β = + = 1,
i=1
p i=1 q i=1 i p q
and we are done.
(d) Use (c) to prove (6.37) in the case when xi , yi ≥ 0. (Hint: Write
(xi + yi )p = xi (xi + yi )p−1 + yi ((xi + yi )p−1 , take the sum on both sides, and now
apply Hőlder’s inequality to each of the terms on the right-hand side. Rework the
resulting inequality, and use that p + q = pq.)
Answer: Using (c) we have that
n
X n
X n
X
(xi + yi )p = xi (xi + yi )p−1 + yi (xi + yi )p−1 ≤
i=1 i=1 i=1
n n n n
1 X 1 1 X 1
xpi ) p ( yip ) p (
X X
( (xi + yi )(p−1)q ) q + ( (xi + yi )(p−1)q ) q =
i=1 i=1 i=1 i=1
n n n
1 1 1
xpi ) p yip ) p ](
X X X
[( +( (xi + yi )p ) q ,
i=1 i=1 i=1
where in the last step we used that (p − 1)q = p. Dividing both sides by
1
( n p q
P
i=1 (xi + yi ) ) , we obtain
n n n
1− 1 1 1
xpi ) p + ( yip ) p ,
X X X
( (xi + yi )p ) q ≤ (
i=1 i=1 i=1
1 1
and using that 1 − q
= p
, we are done.
(e) Prove Minkowski’s inequality (6.37).
Answer: We just need to observe Pthat for complex numbers xi and yi we have that
|xi + yi | ≤ |xi | + |yi |, and thus n p n p
P
i=1 |xi + yi | ≤ i=1 (|xi | + |yi |) . Using (d) we
obtain
n n n
X 1 X 1 X 1
( (|xi | + |yi |)p ) p ≤ ( |xi |p ) p + ( |yi |p ) p ,
i=1 i=1 i=1
and we are done.
(f) Show that when Vi has a norm k · ki , i = 1, . . . , k, then for p ≥ 1 we have that
v1
 
k
! p1
 .  X p
k  .  kp := kvi ki
.
i=1
vk
defines a norm on V1 × · · · × Vk .

Answer: The only part that is not trivial is the triangle inequality. For this we need to
observe that kvi + wP i ki ≤ kvi ki + kwi ki , and thus
P n p n p
i=1 kvi + wi ki ≤ i=1 (kvi ki + kwi ki ) . Now we can apply (d) with xi = kvi ki and
yi = kwi ki , and obtain
n n n n
1 1 1 1
kvi kpi ) p + ( kwi kpi ) p ,
X X X X
( (kvi + wi ki )p ) p ≤ ( (kvi ki + kwi ki )p ) p ≤ (
i=1 i=1 i=1 i=1
proving the triangle inequality.

Exercise 6.7.2 Let V and Z be vector spaces over F and T : V → Z be linear. Suppose
W ⊆ Ker T . Show there exists a linear transformation S : V /W → Ran T such that
S(v + W ) = T v for v ∈ V . Show that S is surjective and that Ker S is isomorphic to
(Ker T )/W .

Answer: Define S : V /W → Ran T via S(v + W ) = T v. We need to check that S is


well-defined. For this, suppose that v + W = x + W . Then v − x ∈ W . As W ⊆ Ker T , we
thus have that v − x ∈ Ker T , which implies that T v = T x. This shows that S is
well-defined.

Next, to show surjectivity of S, let y ∈ RanT . Then there exists a v ∈ V so that T v = y.


As S(v + W ) = T v = y, we obtain that y ∈ RanS, showing surjectivity.

Finally, let us define φ : (Ker T )/W → Ker S via φ(v + W ) = v + W , where v ∈ KerT .
We claim that φ is an isomorphism. First note that S(v + W ) = T v = 0, as v ∈ KerT .
Clearly, φ is linear and one-to-one, so it remains to check that φ is surjective. When
v + W ∈ KerS, then we must have that T v = 0. Thus v ∈ KerT , yielding that
v + W ∈ (Ker T )/W . Clearly, φ(v + W ) = v + W , and thus v + W ∈ Ranφ.

Exercise 6.7.3 Consider the vector space Fn×m , where F = R or F = C, and let k · k be
norm on Fn×m .

(k)
(a) Let A = (aij )n m n m
i=1,j=1 , Ak = (aij )i=1,j=1 , k = 1, 2, . . . , be matrices in F
n×m . Show

(k)
that limk→∞ kAk − Ak = 0 if and only if limk→∞ |aij − aij | = 0 for every
i = 1, . . . , n and j = 1, . . . , m.
(b) Let n = m. Show that limk→∞ kAk − Ak = 0 and limk→∞ kBk − Bk = 0 imply that
limk→∞ kAk Bk − ABk = 0.

Answer: (a) Notice that if c∥A_k − A∥_a ≤ ∥A_k − A∥_b ≤ C∥A_k − A∥_a for some c, C > 0, and lim_{k→∞} ∥A_k − A∥_a = 0, then lim_{k→∞} ∥A_k − A∥_b = 0. Thus, by Theorem 5.1.25,
when we have limk→∞ kAk − Ak = 0 in one norm on Fn×m , we automatically have it for
every norm on Fn×m . Let us use the norm
kM k∞ = k(mij )n m
i=1,j=1 k := max |mij |.
i=1,...,n;j=1...,m

Notice that |mij | ≤ kM k∞ for every i and j.

Suppose that limk→∞ kAk − Ak∞ = 0. Then for every i = 1, . . . , n and j = 1, . . . , m, we


(k)
have |aij − aij | ≤ kAk − Ak∞ , and thus
(k) (k)
0 ≤ limk→∞ |aij − aij | ≤ limk→∞ kAk − Ak = 0, giving limk→∞ |aij − aij | = 0.

(k)
Next, let limk→∞ |aij − aij | = 0 for every i and j. Let  > 0. Then for every i and j,
(k)
there exists a Kij ∈ N so that for k > Kij we have |aij − aij | < . Let now
K = maxi=1,...,n;j=1...,m Kij . Then for every k > K we have that
(k)
kAk − Ak∞ = maxi=1,...,n;j=1...,m |aij − aij | < . Thus, by definition of a limit, we have
limk→∞ kAk − Ak∞ = 0.

(b) For scalars we have that limk→∞ |ak − a| = 0 = limk→∞ |bk − b| implies
limk→∞ |(ak + bk ) − (a + b)| and limk→∞ |ak bk − ab| = 0 (which you can prove by using
inequalities like |ak bk − ab| = |ak bk − ak b + ak b − ab| ≤ |ak ||bk − b| + |ak − a||b|).
Equivalently, limk→∞ ak = a and limk→∞ bk = b implies limk→∞ ak bk = ab.

Suppose now that limk→∞ kAk − Ak = 0 = limk→∞ kBk − Bk = 0. Then, using (a),
(k) (k)
limk→∞ aij = aij and limk→∞ bij = bij for all i, j = 1, . . . , n. Now, for the (r, s)
element of the product Ak Bk we obtain
n n
(k) (k) (k) (k)
X X
lim (Ak Bk )rs = lim arj bjs = ( lim arj )( lim bjs ) =
k→∞ k→∞ k→∞ k→∞
j=1 j=1
n
X
arj bjs = (AB)rs , r, s = 1, . . . , n.
j=1
Again using (a), we may conclude limk→∞ kAk Bk − ABk = 0.

Exercise 6.7.4 Given A ∈ Cn×n , we define its similarity orbit to be the set of matrices
O(A) = {SAS −1 : S ∈ Cn×n is invertible}.
Thus the similarity orbit of a matrix A consists of all matrices that are similar to A.

(a) Show that if A is diagonalizable, then its similarity orbit O(A) is closed. (Hint: notice
that due to A being diagonalizable, we have that B ∈ O(A) if and only if mA (B) = 0.)

(b) Show that if A is not diagonalizable, then its similarity orbit is not closed.

Answer: (a) Suppose that B_k ∈ O(A), k ∈ N, and that lim_{k→∞} ∥B_k − B∥ = 0. We need to show that B ∈ O(A), or equivalently, m_A(B) = 0. Write m_A(t) = a_n t^n + a_{n−1} t^{n−1} + · · · + a_0 (where a_n = 1). By Exercise 6.7.3(b) we have that lim_{k→∞} ∥B_k − B∥ = 0 implies that lim_{k→∞} ∥B_k^j − B^j∥ = 0 for all j ∈ N. But then
lim_{k→∞} ∥m_A(B_k) − m_A(B)∥ ≤ lim_{k→∞} \sum_{j=0}^n |a_j| ∥B_k^j − B^j∥ = 0.

As mA (Bk ) = 0 for every k, we thus also have that mA (B) = 0. Thus B ∈ O(A) follows.

(b) First let A = J_k(λ), k ≥ 2, be a Jordan block. For ε > 0 put D_ε = diag(ε^j)_{j=0}^{k−1}. Then
A_ε := D_ε^{−1} J_k(λ) D_ε = \begin{pmatrix} λ & ε & 0 & \cdots & 0 \\ 0 & λ & ε & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & λ & ε \\ 0 & 0 & \cdots & 0 & λ \end{pmatrix} ∈ O(A).   (6.38)
Notice that lim_{m→∞} A_{1/m} = λI_k ∉ O(A), and thus O(A) is not closed.

Using the reasoning above, one can show that if A = SJS^{−1} with J = ⊕_{l=1}^s J_{n_l}(λ_l) and some n_l > 1, then S(⊕_{l=1}^s λ_l I_{n_l})S^{−1} ∉ O(A) is the limit of elements in O(A). This gives that O(A) is not closed.

Exercise 6.7.5 Suppose that V is an infinite-dimensional vector space with basis


{vj }j∈J . Let fj ∈ V 0 , j ∈ J, be so that fj (vj ) = 1 and fj (vk ) = 0 for k =
6 j. Show that
{fj }j∈J is a linearly independent set in V 0 but is not a basis of V 0 .

Answer: Consider a finite linear combination f = \sum_{r=1}^s c_r f_{j_r} and set it equal to 0. Then f(v_{j_k}) = 0, and thus 0 = \sum_{r=1}^s c_r f_{j_r}(v_{j_k}) = c_k. As this holds for all k = 1, . . . , s, we get that c_1 = · · · = c_s = 0, proving linear independence.

Next, let f ∈ V′ be defined by f(v_j) = 1 for all j ∈ J. In other words, if v = \sum_{r=1}^s c_r v_{j_r} is a vector in V, then f(v) = \sum_{r=1}^s c_r. Clearly, f is a linear functional on V. In addition, f is not a finite linear combination of elements in {f_j : j ∈ J}. Indeed, suppose that f = \sum_{r=1}^s c_r f_{j_r}. Choose now a j ∈ J \ {j_1, . . . , j_s}, which can always be done since J is infinite. Then f(v_j) = 1, while f_{j_r}(v_j) = 0 as j ≠ j_r. Thus f(v_j) ≠ \sum_{r=1}^s c_r f_{j_r}(v_j), giving that f ∉ Span{f_j : j ∈ J}.

Exercise 6.7.6 Describe the linear functionals on Cn [X] that form the dual basis of
{1, X, . . . , X n }.

Answer: If Φ0 , . . . , Φn are the dual basis elements, and p(X) = p0 + p1 X + · · · + pn X n ,


then we need that
Φj (p(X)) = pj , j = 0, . . . , n.
One way to find the number j! p_j is to take the jth derivative of p(X) and evaluate it at 0. Thus we can describe Φ_j as
Φ_j(p(X)) = \frac{1}{j!} \frac{d^j p}{dX^j}(0), \quad j = 0, . . . , n.

Exercise 6.7.7 Let a0 , . . . , an be different complex numbers, and define Ej ∈ (Cn [X])0 ,
j = 0, . . . , n, via Ej (p(X)) = p(aj ). Find a basis of Cn [X] for which {E0 , . . . , En } is the
dual basis.

Answer: If we let {q0 (X), . . . , qn (X)} be the basis of Cn [X] we are looking for, then we
need that Ej (qk (X)) = 1 if j = k, and Ej (qk (X)) = 0 if j = 6 k. Thus, we need to find a
polynomial qk (X) so that qk (ak ) = 1, while a0 , . . . , ak−1 , ak+1 , . . . , an are roots of qk (X).
Thus
qk (X) = c(X − a0 ) · · · (X − ak−1 )(X − ak+1 ) · · · (X − an ),
with c chosen so that qk (ak ) = 1. Thus we find
Y X − ar
qk (X) = ,
r=0,...,n;r6=k
ak − ar

which are called the Lagrange interpolation polynomials.

Exercise 6.7.8 Let V = W +̇X.

(a) Show how given f ∈ W 0 and g ∈ X 0 , one can define h ∈ V 0 so that h(w) = f (w) for
w ∈ W and h(x) = g(x) for x ∈ X.
(b) Using the construction in part (a), show that V 0 = W 0 +̇X 0 . Here it is understood
that we view W 0 as a subspace of V 0 , by letting f ∈ W 0 be defined on all of V by
putting f (w + x) = f (w), when w ∈ W and x ∈ X. Similarly, we view X 0 as a
subspace of V 0 , by letting g ∈ W 0 be defined on all of V by putting g(w + x) = g(x),
when w ∈ W and x ∈ X.

Answer: (a) Let f ∈ W 0 and g ∈ X 0 , and v ∈ V . As V = W +̇X, there exist unique


w ∈ W and x ∈ X so that v = w + x. We now define h(v) = f (w) + g(x). Then h ∈ V 0
and satisfies the desired conditions.

(b) We first show that W 0 ∩ X 0 = {0}. Indeed, let f ∈ W 0 ∩ X 0 . By the way of viewing
f ∈ W 0 as a function on all of V , we have that f (x) = 0 for all x ∈ X. Similarly, by the
way of viewing f ∈ X 0 as a function on all of V , we have that f (w) = 0 for all w ∈ W .
But then for a general v ∈ V , which can always be written as v = w + x for some w ∈ W
and x ∈ X, we have that f (v) = f (w + x) = f w) + f (x) = 0 + 0 = 0. Thus f is the zero
functional, yielding W 0 ∩ X 0 = {0}.

Next, when h ∈ V 0 , we can define f ∈ W 0 and g ∈ X 0 as by f (w) = h(w), w ∈ W and


g(x) = h(x), x ∈ X. Then, with the understanding as in (b), we have that h = f + g. This
shows that V 0 = W 0 + X 0 . Together with W 0 ∩ X 0 = {0}, we obtain V 0 = W 0 +̇X 0 .

Exercise 6.7.9 Let W be a subspace of V . Define


Wann = {f ∈ V 0 : f (w) = 0 for all w ∈ W },
the annihilator of W .

(a) Show that Wann is a subspace of V 0 .


   
1 1
−1 0
(b) Determine the annihilator of Span{  ,  } ⊆ C 4 .
 2  1
−2 0
(c) Determine the annihilator of Span{1 + 2X, X + X 2 } ⊆ R3 [X].

Answer: (a) Let f, g ∈ W_ann and c be a scalar. Then for w ∈ W we have that (f + g)(w) = f(w) + g(w) = 0 + 0 = 0 and (cf)(w) = c f(w) = c·0 = 0. This shows that f + g, cf ∈ W_ann, and thus W_ann is a subspace.
 
1 −1 2 −2
(b) This amounts to finding the null space of , which in row-reduced
1 0 1 0
 
1 0 1 0
echelon form is . The null space is spanned by
0 1 −1 2
   
v_1 = \begin{pmatrix} -1 \\ 1 \\ 1 \\ 0 \end{pmatrix}, v_2 = \begin{pmatrix} 0 \\ -2 \\ 0 \\ 1 \end{pmatrix}. Thus W_ann = Span{f_1, f_2}, where (using the Euclidean inner product) f_i(v) = ⟨v, v_i⟩, i = 1, 2.
 
1 2 0 0
(c) This amounts to finding the null space of , which is spanned by
0 1 1 0
   
2 0
−1
 , v2 = 0 . Now define f1 (p0 + p1 X + p2 X 2 + p3 X 3 ) = 2p0 − p1 + p2 and
 
v1 =  1  0
0 1
f2 (p0 + p1 X + p2 X 2 + p3 X 3 ) = p3 . Then Wann = Span{f1 , f2 }.

Exercise 6.7.10 Let V be a finite-dimensional vector space over R, and let {v1 , . . . , vk }
be linearly independent. We define

    C = {v ∈ V : there exist c1 , . . . , ck ≥ 0 so that v = Σ_{i=1}^k ci vi }.

Show that v ∈ C if and only if for all f ∈ V 0 with f (vj ) ≥ 0, j = 1, . . . , k, we have that
f (v) ≥ 0.

Remark. The statement is also true when {v1 , . . . , vk } are not linearly independent, but
in that case the proof is more involved. The corresponding result is the Farkas–Minkowski
Theorem, which plays an important role in linear programming.

Answer: Clearly, when v = Σ_{i=1}^k ci vi ∈ C and f (vj ) ≥ 0, j = 1, . . . , k, then
f (v) = Σ_{i=1}^k ci f (vi ) ≥ 0, since ci ≥ 0 and f (vi ) ≥ 0, i = 1, . . . , k.

Conversely, suppose that v ∈ V has the property that for all f ∈ V 0 with f (vj ) ≥ 0,
j = 1, . . . , k, we have that f (v) ≥ 0. First, we show that v ∈ Span{v1 , . . . , vk }. If not, we
can find a linear functional f ∈ V 0 so that on the (k + 1)-dimensional space Span{v, v1 , . . . , vk }
we have f (v) = −1 and f (vj ) = 0, j = 1, . . . , k. But this contradicts that v ∈ V has the
property that for all f ∈ V 0 with f (vj ) ≥ 0, j = 1, . . . , k, we have that f (v) ≥ 0.

As v ∈ Span{v1 , . . . , vk }, we may write v = Σ_{i=1}^k ci vi for some scalars c1 , . . . , ck . Fix a
j ∈ {1, . . . , k}. Let now f ∈ V 0 be so that f (vj ) = 1 and f (vr ) = 0, r ≠ j. Then f ∈ V 0
with f (vr ) ≥ 0, r = 1, . . . , k, and thus we must have that f (v) ≥ 0. As this number equals
cj , we obtain that cj ≥ 0. This holds for every j = 1, . . . , k, and thus we find that v ∈ C.

Exercise 6.7.11 Let V and W be finite-dimensional vector spaces and A : V → W a


linear map. Show that Av = w has a solution if and only if for all f ∈ (RanA)ann we have
that f (w) = 0. Here the definition of the annihilator is used as defined in Exercise 6.7.9.

Answer: If Av = w and f ∈ (RanA)ann , then 0 = f (Av) = f (w), proving the only if



statement. Next, suppose that for all f ∈ (RanA)ann we have that f (w) = 0. If
w ∉ RanA, then letting {w1 , . . . , wk } be a basis of RanA, we can find a linear functional f
so that f (wj ) = 0, j = 1, . . . , k, and f (w) = 1. Then f ∈ (RanA)ann , but f (w) ≠ 0, giving
a contradiction. Thus we must have that w ∈ RanA, yielding the existence of a v so that
Av = w.

Exercise 6.7.12 For x, y ∈ R3 , let the cross product x × y be defined as in (6.17).

(a) Show that hx, x × yi = hy, x × yi = 0.


(b) Show that x × y = −y × x.
(c) Show that x × y = 0 if and only if {x, y} is linearly dependent.

Answer: (a) and (b) are direct computations. For (c), let x × y = 0, and assume that one
of the entries of x and y is nonzero (otherwise, we are done). Without loss of
generality, we assume that x1 ≠ 0. Then, reworking the equations one obtains from
x × y = 0, one sees that y = (y1 /x1 )x.

Exercise 6.7.13 Let

    A = [  i      1 − i    2 − i ]          [ −1  0 ]
        [ 1 + i    −2     −3 + i ] ,    B = [ −2  5 ]
                                            [  1  3 ] .

Compute A ⊗ B and B ⊗ A, and show that they are similar via a permutation matrix.

Answer:
−i 0 −1 + i 0 −2 + i 0
 
 −2i 5i −2 + 2i 5 − 5i −4 + 2i 10 − 5i 
i 3i 1−i 3 − 3i 2−i 6 − 3i 
 
A⊗B = ,

 −1 − i 0 2 0 3−i 0 
−2 − 2i 5 + 5i 4 −10 6 − 2i −15 + 5i
1+i 3 + 3i −2 −6 −3 + i −9 + 3i

−i −1 + i −2 + i 0 0 0
 
 −1 − i 2 3−i 0 0 0 
 −2i −2 + 2i −4 + 2i 5i 5 − 5i 10 − 5i 
 
B⊗A= .
−2 − 2i 4 6 − 2i 5 + 5i −10 −15 + 5i
 i 1−i 2−i 3i 3 − 3i 6 − 3i 
1+i −2 −3 + i 3 + 3i −6 −9 + 3i
We have that A ⊗ B = P (B ⊗ A)P T , where
1 0 0 0 0 0
 
0 0 0 1 0 0
0 1 0 0 0 0
 
P = .
0 0 0 0 1 0
0 0 1 0 0 0
0 0 0 0 0 1

Exercise 6.7.14 Let A ∈ Fn×n and B ∈ Fm×m .

(a) Show that tr(A ⊗ B) = (tr A)(tr B).


(b) Show that rank(A ⊗ B) = (rank A)(rank B).

Answer: (a) The diagonal entries of A ⊗ B are aii bjj , i = 1, . . . , n, j = 1, . . . , m. Adding


them all up, we obtain
    tr(A ⊗ B) = Σ_{i=1}^n Σ_{j=1}^m aii bjj = Σ_{i=1}^n aii ( Σ_{j=1}^m bjj ) = Σ_{i=1}^n aii (tr B) = (tr A)(tr B).

(b) From the Gaussian elimination algorithm we know that we can write A = SET and
B = Ŝ Ê T̂ , where S, T , Ŝ and T̂ are invertible, and

    E = [ Ik  0 ]           Ê = [ Il  0 ]
        [ 0   0 ] ,              [ 0   0 ] ,
where k = rankA and l = rankB. Then
A ⊗ B = (S ⊗ Ŝ)(E ⊗ Ê)(T ⊗ T̂ ).

Notice that E ⊗ Ê has rank kl (as there are exactly kl entries equal to 1 in different rows
and in different columns, and all the other entries are 0). Since (S ⊗ Ŝ) and (T ⊗ T̂ ) are
invertible, we get that
rank(A ⊗ B) = rank(E ⊗ Ê) = kl = (rank A)(rank B).
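Both identities are easy to confirm numerically. The sketch below (not from the text) uses NumPy with two arbitrarily chosen matrices, one made rank deficient on purpose so that the rank identity is not trivial.

# Numerical check of tr(A (x) B) = tr(A) tr(B) and rank(A (x) B) = rank(A) rank(B).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((3, 3))
B[2] = B[0] + B[1]          # force B to be rank deficient

K = np.kron(A, B)
print(np.isclose(np.trace(K), np.trace(A) * np.trace(B)))                               # True
print(np.linalg.matrix_rank(K) == np.linalg.matrix_rank(A) * np.linalg.matrix_rank(B))  # True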

Exercise 6.7.15 Given Schur triangularization decompositions for A and B, find a Schur
triangularization decomposition for A ⊗ B. Conclude that if λ1 , . . . , λn are the eigenvalues
for A and µ1 , . . . , µm are the eigenvalues for B, then λi µj , i = 1, . . . , n, j = 1, . . . , m, are
the nm eigenvalues of A ⊗ B.

Answer: If A = U T U ∗ and B = V SV ∗ , with U, V unitary and T, S upper triangular, then


A ⊗ B = (U ⊗ V )(T ⊗ S)(U ⊗ V )∗ . (6.39)
We have that U ⊗ V is unitary, and it is easy to see that T ⊗ S is upper triangular with its
diagonal entries equal to the products of diagonal entries of T and S. Thus λi µj ,
i = 1, . . . , n, j = 1, . . . , m, are the nm diagonal entries (and thus eigenvalues) of T ⊗ S,
and (6.39) is a Schur triangularization decomposition for A ⊗ B.
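A quick numerical check of this eigenvalue statement (not part of the original answer), assuming NumPy; symmetric test matrices are chosen only so that both eigenvalue lists are real and easy to sort and compare.

# Check that the eigenvalues of A (x) B are the pairwise products of eigenvalues of A and B.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)); A = A + A.T    # symmetric, so eigenvalues are real
B = rng.standard_normal((3, 3)); B = B + B.T

eig_kron = np.sort(np.linalg.eigvalsh(np.kron(A, B)))
eig_prod = np.sort(np.array([lam * mu for lam in np.linalg.eigvalsh(A)
                             for mu in np.linalg.eigvalsh(B)]))
print(np.allclose(eig_kron, eig_prod))    # True (up to roundoff)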

Exercise 6.7.16 Given singular value decompositions for A and B, find a singular value
decomposition for A ⊗ B. Conclude that if σ1 , . . . , σk are the nonzero singular values for A
and σ̂1 , . . . , σ̂l are the nonzero singular values for B, then σi σ̂j , i = 1, . . . , k, j = 1, . . . , l,
are the kl nonzero singular values of A ⊗ B.

Answer: If A = V ΣW ∗ and B = V̂ Σ̂Ŵ ∗ are singular value decompositions, then


A ⊗ B = (V ⊗ V̂ )(Σ ⊗ Σ̂)(W ⊗ Ŵ )∗ . (6.40)

We have that V ⊗ V̂ and W ⊗ Ŵ are unitary, and Σ ⊗ Σ̂ is, up to permutation of rows and
columns, of the form

    [ R  0 ]
    [ 0  0 ] ,
where R is a kl × kl diagonal matrix with diagonal entries σi σ̂j , i = 1, . . . , k, j = 1, . . . , l.
Thus a singular value decomposition of A ⊗ B is given by
A ⊗ B = [(V ⊗ V̂ )P T ]P (Σ ⊗ Σ̂)P T [P (W ⊗ Ŵ )∗ ], (6.41)

where the permutation matrix P is chosen so that P (Σ ⊗ Σ̂)P T has the nonzero singular
values σi σ̂j , i = 1, . . . , k, j = 1, . . . , l in nonincreasing order in the entries
(1, 1), (2, 2), . . . , (kl, kl) and zeros everywhere else.
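The corresponding numerical check for singular values, again assuming NumPy and arbitrary square test matrices (so that all singular values are nonzero and the two lists have the same length):

# Check that the singular values of A (x) B are the products of the singular values of A and B.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2))

sv_kron = np.linalg.svd(np.kron(A, B), compute_uv=False)          # nonincreasing order
sv_prod = np.sort(np.outer(np.linalg.svd(A, compute_uv=False),
                           np.linalg.svd(B, compute_uv=False)).ravel())[::-1]
print(np.allclose(sv_kron, sv_prod))    # True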

Exercise 6.7.17 Show that det(I ⊗ A + A ⊗ I) = (−1)n det pA (−A), where A ∈ Cn×n .

Answer: Let A = U T U ∗ be a Schur triangularization decomposition, where the diagonal


entries of T are λ1 , . . . , λn . Then
I ⊗ A + A ⊗ I = (U ⊗ U )(I ⊗ T + T ⊗ I)(U ⊗ U )∗ .
Notice that I ⊗ T + T ⊗ I is upper triangular with diagonal entries λi + λj , i, j = 1, . . . , n.
Thus det(I ⊗ A + A ⊗ I) = Π_{i,j=1}^n (λi + λj ). On the other hand, pA (t) = Π_{j=1}^n (t − λj ), so
pA (−A) = U pA (−T )U ∗ = U (−T − λ1 I) · · · (−T − λn I)U ∗ . This gives that

    det pA (−A) = det(−T − λ1 I) · · · det(−T − λn I) = Π_{j=1}^n [ Π_{i=1}^n (−λi − λj ) ] = (−1)^{n^2} Π_{i,j=1}^n (λi + λj ).

It remains to observe that (−1)^{n^2} = (−1)^n since n^2 is even if and only if n is even.

Exercise 6.7.18 Show that if A is a matrix and f a function, so that f (A) is


well-defined, then f (Im ⊗ A) is well-defined as well, and f (Im ⊗ A) = Im ⊗ f (A).

Answer: Let A = SJS −1 be a Jordan canonical decomposition of A. Then Im ⊗ A =


(Im ⊗ S)(Im ⊗ J)(Im ⊗ S)−1 . Since Im ⊗ J is a direct sum of m copies of J, we have that
Im ⊗ J gives the Jordan canonical form of Im ⊗ A. Thus
f (Im ⊗ A) = (Im ⊗ S)f (Im ⊗ J)(Im ⊗ S)−1 . Moreover, as Im ⊗ J is a direct sum of m
copies of J, we obtain that f (Im ⊗ J) = Im ⊗ f (J). Now
f (Im ⊗ A) = (Im ⊗ S)(Im ⊗ f (J))(Im ⊗ S)−1 = Im ⊗ (Sf (J)S −1 ) = Im ⊗ f (A).
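For a concrete choice of f this can be tested numerically; the sketch below (not from the text) takes f = exp and assumes SciPy's expm is available.

# Check f(I_m (x) A) = I_m (x) f(A) for f = exp, using scipy.linalg.expm.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
m = 3
Im = np.eye(m)

lhs = expm(np.kron(Im, A))
rhs = np.kron(Im, expm(A))
print(np.allclose(lhs, rhs))    # True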

Exercise 6.7.19 For a diagonal matrix A = diag(λi )n i=1 , find matrix representations for
A ∧ A and A ∨ A using the canonical (lexicographically ordered) bases for Fn ∧ Fn and
Fn ∨ Fn , respectively.

Answer: The diagonal elements of the diagonal matrix A ∧ A are ordered as


λ1 λ2 , . . . , λ1 λn , λ2 λ3 , . . . , λ2 λn , . . . , λn−2 λn−1 , λn−2 λn , λn−1 λn .

The diagonal elements of the diagonal matrix A ∨ A are ordered as


λ1 λ1 , . . . , λ1 λn , λ2 λ2 , . . . , λ2 λn , . . .
. . . , λn−2 λn−2 , . . . , λn−2 λn , λn−1 λn−1 , λn−1 λn , λn λn .

Exercise 6.7.20 Show that hv1 ∧ · · · ∧ vk , w1 ∧ · · · ∧ wk i = k! det(hvi , wj i)ki,j=1 .

Answer: Applying the definition of the anti-symmetric wedge product and using the
linearity of the inner product we have that
    hv1 ∧ · · · ∧ vk , w1 ∧ · · · ∧ wk i = Σ_{σ∈Sk} Σ_{τ∈Sk} (−1)^σ (−1)^τ hvσ(1) ⊗ · · · ⊗ vσ(k) , wτ(1) ⊗ · · · ⊗ wτ(k) i.

Using the definition of the inner product on the tensor space, we obtain that the above
equals

    Σ_{σ∈Sk} Σ_{τ∈Sk} (−1)^σ (−1)^τ hvσ(1) , wτ(1) i · · · hvσ(k) , wτ(k) i.
Since Π_{i=1}^k hvσ(i) , wτ(i) i = Π_{i=1}^k hvi , wτ◦σ^{-1}(i) i, we obtain that the above equals

    Σ_{σ∈Sk} Σ_{τ∈Sk} (−1)^{τ◦σ^{-1}} Π_{i=1}^k hvi , wτ◦σ^{-1}(i) i = Σ_{σ∈Sk} det(hvi , wj i)^k_{i,j=1} = k! det(hvi , wj i)^k_{i,j=1} .

Exercise 6.7.21 Find an orthonormal basis for ∨2 C3 .

Answer: { (1/2) e1 ∨ e1 , (1/√2) e1 ∨ e2 , (1/2) e2 ∨ e2 , (1/√2) e1 ∨ e3 , (1/√2) e2 ∨ e3 , (1/2) e3 ∨ e3 }.

Exercise 6.7.22 (a) Let A = (aij ) ∈ F2×m and B = (bij ) ∈ Fm×2 . Find the matrix
representations for A ∧ A, B ∧ B and AB ∧ AB using the canonical
(lexicographically ordered) bases for ∧k Fn , k = 2, n = 2, m, 1, respectively.

(b) Show that the equality AB ∧ AB = (A ∧ A)(B ∧ B) implies that

    ( Σ_{j=1}^m a1j bj1 )( Σ_{j=1}^m a2j bj2 ) − ( Σ_{j=1}^m a1j bj2 )( Σ_{j=1}^m a2j bj1 )
        = Σ_{1≤j<k≤m} (a1j a2k − a1k a2j )(b1j b2k − b1k b2j ).                        (6.42)

(c) Let M = {1, . . . , m} and P = {1, . . . , p}. For A ∈ Fp×m and B ∈ Fm×p , show that

    det AB = Σ_{S⊆M,|S|=p} det(A[P, S]) det(B[S, P ]).                                 (6.43)

(Hint: Use that (∧p A)(∧p B) = ∧p (AB) = det AB.)

Remark: Equation (6.43) is called the Cauchy–Binet identity. When p = 2 it reduces to


(6.42), which when B = AT (or B = A∗ when F = C) is called the Lagrange identity.

Answer: (a)

    A ∧ A = [ a11 a22 − a12 a21   a11 a23 − a13 a21   · · ·   a11 a2m − a1m a21   a12 a23 − a13 a22   · · ·
              a12 a2m − a1m a22   · · ·   a1,m−1 a2m − a2,m−1 a1m ] ,

    B ∧ B = [ b11 b22 − b21 b12   b11 b32 − b31 b12   · · ·   b11 bm2 − bm1 b12   b21 b32 − b31 b22   · · ·
              b21 bm2 − bm1 b22   · · ·   bm−1,1 bm2 − bm−1,2 bm1 ]T .

As

    AB = [ Σ_{j=1}^m a1j bj1    Σ_{j=1}^m a1j bj2 ]
         [ Σ_{j=1}^m a2j bj1    Σ_{j=1}^m a2j bj2 ] ,

we get that AB ∧ AB is the 1 × 1 matrix

    AB ∧ AB = [ ( Σ_{j=1}^m a1j bj1 )( Σ_{j=1}^m a2j bj2 ) − ( Σ_{j=1}^m a1j bj2 )( Σ_{j=1}^m a2j bj1 ) ] .

(b) This follows immediately from using (a) and multiplying A ∧ A with B ∧ B.

(c) To show (6.43) one needs to use that (∧p A)(∧p B) = ∧p (AB) = det(AB), where in the
last step we used that AB is of size p × p. The 1 × (m choose p) matrix ∧p A is given by

    ∧p A = (det A[P, S])S⊆M,|S|=p .

Similarly, the (m choose p) × 1 matrix ∧p B is given by

    ∧p B = (det B[S, P ])S⊆M,|S|=p .

Equation (6.43) now immediately follows from (∧p A)(∧p B) = ∧p (AB).

Exercise 6.7.23 For x, y ∈ R3 , let the cross product x × y be defined as in (6.17). Show,
using (6.42) (with B = AT ), that
kx × yk2 = kxk2 kyk2 − (hx, yi)2 . (6.44)

Notice that this equality implies the Cauchy–Schwarz inequality.

Answer: Let

    A = [ x1  x2  x3 ]
        [ y1  y2  y3 ]  = B T .

Then

    AB = [ kxk2    hx, yi ]
         [ hy, xi  kyk2   ] ,

thus
det AB = kxk2 kyk2 − (hx, yi)2 . Next B ∧ B = x × y = (A ∧ A)T . And thus
(A ∧ A)(B ∧ B) = kx × yk2 . As this equals AB ∧ AB = det AB, we obtain (6.44). Since
kx × yk2 ≥ 0, equation (6.44) implies the Cauchy–Schwarz inequality for the Euclidean
inner product on R3 .
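A quick numerical confirmation of (6.44) (not part of the original answer), assuming NumPy and a randomly chosen pair of vectors:

# Numerical check of ||x x y||^2 = ||x||^2 ||y||^2 - <x, y>^2 for vectors in R^3.
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(3)
y = rng.standard_normal(3)

lhs = np.linalg.norm(np.cross(x, y)) ** 2
rhs = np.linalg.norm(x) ** 2 * np.linalg.norm(y) ** 2 - np.dot(x, y) ** 2
print(np.isclose(lhs, rhs))    # True; in particular |<x, y>| <= ||x|| ||y|| (Cauchy-Schwarz)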

Chapter 7

Exercise 7.9.1 Let p(n) be a polynomial in n of degree k, and let λ ∈ C be of modulus
greater than one. Show that lim_{n→∞} p(n)/λ^n = 0. (Hint: write |λ| = 1 + ε, ε > 0, and use the
binomial formula to give that |λ^n | = Σ_{j=0}^n (n choose j) ε^j , which for n large enough can be
bounded below by a polynomial of degree greater than k.)

Answer: Let n > 3k. Then each of the k + 1 factors n, n − 1, . . . , n − k is at least 2n/3, so

    (n choose k+1) = n(n − 1) · · · (n − k) / (k + 1)!  ≥  (1/(k + 1)!) (2n/3)^{k+1} .

Thus

    | p(n)/λ^n | = |p0 + · · · + pk n^k | / Σ_{j=0}^n (n choose j) ε^j
                 ≤ (k + 1)! ( Σ_{j=0}^k |pj | ) n^k / ( (2/3)^{k+1} ε^{k+1} n^{k+1} ) → 0 as n → ∞.

Exercise 7.9.2 Let A = (aij )^n_{i,j=1} ∈ Rn×n . Let A be column-stochastic, which means
that aij ≥ 0 for all i, j = 1, . . . , n, and Σ_{i=1}^n aij = 1, j = 1, . . . , n.

(i) Show that 1 is an eigenvalue of A.



Answer: If we let e = 1 · · · 1 be the row vector of all ones, then eA = e, and
thus 1 is an eigenvalue of A (with left eigenvector e).
(ii) Show that Am is column-stochastic for all m ∈ N. (Hint: use that eA = e.)
Answer: Clearly Am has all nonnegative entries, and the equality eAm = e gives
that the column sums of Am are 1.
(iii) Show that for every x, y ∈ Rn we have that |yT Am x| ≤ ( Σ_{j=1}^n |xj | )( Σ_{j=1}^n |yj | ) for
all m ∈ N. In particular, the sequence {yT Am x}m∈N is bounded.
Answer: As Am is column-stochastic, we have that each entry (Am )ij of Am satisfies
0 ≤ (Am )ij ≤ 1. Then |yT Am x| ≤ Σ_{i,j=1}^n |yi | (Am )ij |xj | ≤ Σ_{i,j=1}^n |yi | |xj | =
( Σ_{i=1}^n |yi | )( Σ_{j=1}^n |xj | ).

(iv) Show that A cannot have Jordan blocks at 1 of size greater than 1. (Hint: Use that
when k > 1 some of the entries of Jk (1)m do not stay bounded as m → ∞. With this
observation, find a contradiction with the previous part.)
Answer: First notice that when k > 1 the (1, 2) entry of Jk (1)m equals m. Suppose
now that the Jordan canonical decomposition A = SJS −1 of A has Jk (1) in the
upper left corner of J for some k > 1. Put x = Se2 and y = (S T )−1 e1 . Then
yT Am x = m → ∞ as m → ∞. This is in contradiction with the previous part.
(v) Show that if xA = λx, for some x 6= 0, then |λ| ≤ 1.

Answer: If x = (x1 , . . . , xn ), let k be so that |xk | = maxj=1,...,n |xj |. Note that
|xk | > 0. Then the kth component of xA satisfies

    |λxk | = |(xA)k | = | Σ_{i=1}^n xi aik | ≤ Σ_{i=1}^n aik |xk | = |xk |,

and thus after dividing by |xk |, we get |λ| ≤ 1.


(vi) For a vector v = (vi )^n_{i=1} we define |v| = (|vi |)^n_{i=1} . Show that if λ is an eigenvalue of
A with |λ| = 1, and xA = λx, then y := |x|A − |x| has all nonnegative entries.

Answer: We have

    |xj | = |λxj | = |(xA)j | = | Σ_{i=1}^n xi aij | ≤ Σ_{i=1}^n |xi | aij = (|x|A)j ,   j = 1, . . . , n.

For the remainder of this exercise, assume that A only has positive entries; thus
aij > 0 for all i, j = 1, . . . , n.
(vii) Show that y = 0. (Hint: Put z = |x|A, and show that y ≠ 0 implies that zA − z has
all positive entries. The latter can be shown to contradict Σ_{i=1}^n aij = 1,
j = 1, . . . , n.)
Answer: Suppose that y ≠ 0. Then yA = |x|A2 − |x|A has all positive entries (as at
least one entry of y is positive and the others are nonnegative). Put z = |x|A. Then
zA − z has all positive entries. If we let zk = maxj=1,...,n zj , then
(zA)k = Σ_{i=1}^n zi aik ≤ zk Σ_{i=1}^n aik = zk , which contradicts that zA − z has all
positive entries. Thus we must have y = 0.
(viii) Show that if xA = λx with |λ| = 1, then x is a multiple of e and λ = 1. (Hint: first
show that all entries of x have the same modulus.)
Answer: Let k be so that |xk | = maxj=1,...,n |xj |. Suppose that |xr | < |xk | for some
r ∈ {1, . . . , n}. Then

    |xk | = |λxk | = |(xA)k | = | Σ_{i=1}^n xi aik | ≤ Σ_{i=1}^n |xi | aik < |xk | Σ_{i=1}^n aik = |xk |,

giving a contradiction. Thus |xk | = |xj | for j = 1, . . . , n. Now

    |xk | = |λxk | = |(xA)k | = | Σ_{i=1}^n xi aik | ≤ Σ_{i=1}^n |xi | aik = |xk | Σ_{i=1}^n aik = |xk |

implies that we have

    | Σ_{i=1}^n xi aik | = Σ_{i=1}^n |xi | aik .
But then, using Corollary 5.1.21, we must have that xj = eiθ |xj |, j = 1, . . . , n, for
some θ ∈ R. Thus it follows that x = eiθ |xk |e. As eA = e, it follows that λ = 1.
(ix) Conclude that we can apply the power method. Starting with a vector v0 with
positive entries, show that there is a vector w with positive entries so that Aw = w.
In addition, show that w is unique when we require in addition that eT w = 1.
Answer: The previous parts show that λ1 = 1 is the eigenvalue of A of largest
modulus, and that 1 > maxj=2,...,n |λj |. The vectors {e, e + e1 , . . . , e + en−1 } span
Rn , so at least one of these vectors does not lie in Ker Π_{j=2}^n (A − λj ). Choose such a
vector as v0 , and apply Theorem 7.2.1. All the vectors vk have nonnegative entries,
and thus so does w. As w ≠ 0, we get that Aw has all positive entries, and thus so
does w. Since dim Ker(A − I) = 1, the vector w is unique up to multiplication by a
scalar. Thus if we require that eT w = 1, we get that w is unique.
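The power method of part (ix) is easy to run in practice. The following sketch (not part of the original answer) assumes NumPy and uses a randomly generated column-stochastic matrix with strictly positive entries.

# Power method for a column-stochastic matrix with positive entries:
# the iterates converge to the unique positive vector w with Aw = w and sum(w) = 1.
import numpy as np

rng = np.random.default_rng(5)
n = 6
A = rng.random((n, n)) + 0.1          # strictly positive entries
A = A / A.sum(axis=0)                  # normalize columns so each column sums to 1

v = np.full(n, 1.0 / n)                # start with a positive vector v0
for _ in range(200):
    v = A @ v
    v = v / v.sum()                    # renormalize so the entries sum to 1

print(np.allclose(A @ v, v))           # True: w = v satisfies Aw = w
print(np.isclose(v.sum(), 1.0), np.all(v > 0))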

Exercise 7.9.3 Let k · k be a norm on Cn×n , and let A ∈ Cn×n . Show that

    ρ(A) = lim_{k→∞} kA^k k^{1/k} ,                                            (7.45)

where ρ(·) is the spectral radius. (Hint: use that for any ε > 0 the spectral radius of
(1/(ρ(A) + ε)) A is less than one, and apply Corollary 7.2.4.)

Answer: As lim_{k→∞} C^{1/k} = 1 for all C > 0, it follows from Theorem 5.1.25 that the limit
in (7.45) is independent of the chosen norm. Let us choose k · k = σ1 (·).

If λ is an eigenvalue and x a corresponding unit eigenvector, then

    |λ|^k = kλ^k xk = kA^k xk ≤ max_{kyk=1} kA^k yk = σ1 (A^k ),

and thus |λ| ≤ (σ1 (A^k ))^{1/k} . This also holds for the eigenvalue of maximal modulus, and thus

    ρ(A) ≤ (σ1 (A^k ))^{1/k} .                                                 (7.46)

Next, let ε > 0. Then the spectral radius of B = (1/(ρ(A) + ε)) A is less than one. Thus, by
Corollary 7.2.4, we have that B^k → 0 as k → ∞. In particular, there exists a K so that for
k > K we have that σ1 (B^k ) ≤ 1. Then σ1 (A^k ) ≤ (ρ(A) + ε)^k , which gives that
(σ1 (A^k ))^{1/k} ≤ ρ(A) + ε. Together with (7.46), this now gives that lim_{k→∞} (σ1 (A^k ))^{1/k} = ρ(A).
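The following sketch (not from the text) illustrates (7.45) numerically, assuming NumPy and using the spectral norm σ1 (·); the estimates approach ρ(A) slowly as k grows.

# Numerical illustration of rho(A) = lim_k ||A^k||^(1/k) (Gelfand's formula).
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5))
rho = np.max(np.abs(np.linalg.eigvals(A)))      # spectral radius

for k in (1, 5, 20, 80):
    est = np.linalg.norm(np.linalg.matrix_power(A, k), 2) ** (1.0 / k)   # sigma_1(A^k)^(1/k)
    print(k, est, rho)
# The printed estimates approach rho as k grows.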

Exercise 7.9.4 Let A = (aij )^n_{i,j=1} , B = (bij )^n_{i,j=1} ∈ Cn×n be so that |aij | ≤ bij for
i, j = 1, . . . , n. Show that ρ(A) ≤ ρ(B). (Hint: use (7.45) with the Frobenius norm
kM k = ( Σ_{i,j=1}^n |mij |^2 )^{1/2} .)

Answer: If we denote A^k = (a^{(k)}_{ij} )^n_{i,j=1} , B^k = (b^{(k)}_{ij} )^n_{i,j=1} , then it is easy to check that
|a^{(k)}_{ij} | ≤ b^{(k)}_{ij} for all i, j, k. Using the Frobenius norm this implies that kA^k k ≤ kB^k k for all
k ∈ N. But then

    ρ(A) = lim_{k→∞} kA^k k^{1/k} ≤ lim_{k→∞} kB^k k^{1/k} = ρ(B)

follows.

Exercise 7.9.5 Show that if {u1 , . . . , um } and {v1 , . . . , vm } are orthonormal sets, then
the coherence µ := maxi,j |hui , vj i| satisfies 1/√m ≤ µ ≤ 1.

Answer: By Proposition 5.1.10 we have that |hui , vj i| ≤ kui k kvj k = 1. Thus µ ≤ 1
follows. Next, suppose that µ < 1/√m. As v1 = Σ_{i=1}^m hui , v1 i ui , we have
1 = kv1 k2 = Σ_{i=1}^m |hui , v1 i|^2 < Σ_{i=1}^m 1/m = 1, giving a contradiction. Thus we have
µ ≥ 1/√m.

Exercise 7.9.6 Show that if A has the property that every 2s columns are linearly
independent, then the equation Ax = b can have at most one solution x with at most s
nonzero entries.

Answer: Suppose that Ax1 = b = Ax2 , where both x1 and x2 have at most s nonzero
entries. Then A(x1 − x2 ) = 0, and x1 − x2 has at most 2s nonzero entries. If x1 − x2 ≠ 0,
we obtain that the columns of A that hit a nonzero entry in x1 − x2 are linearly
dependent. This contradicts the assumption that every 2s columns in A are linearly
independent. Thus x1 = x2 .

Exercise 7.9.7 Let A = (aij )^n_{i,j=1} . Show that for all permutations σ on {1, . . . , n} we
have a1,σ(1) a2,σ(2) · · · an,σ(n) = 0 if and only if there exist r (1 ≤ r ≤ n) rows and
n + 1 − r columns in A so that the entries they have in common are all 0.

Answer: When there exist rows j1 , . . . , jr , 1 ≤ r ≤ n − 1, and columns k1 , . . . , kn+1−r in A
so that the entries they have in common are all 0, then for all permutations σ we have that
{σ(j1 ), . . . , σ(jr )} ∩ {k1 , . . . , kn+1−r } ≠ ∅.
If σ(jl ) lies in this intersection, we have that ajl ,σ(jl ) = 0, and thus Π_{i=1}^n ai,σ(i) = 0. When
r ∈ {1, n}, a full row or column is 0, and thus Π_{i=1}^n ai,σ(i) = 0 follows as well.

For the converse, we use induction on the size of the matrix n. When n = 1, the statement
is trivial, so suppose that the result holds for matrices of size up to n − 1. Let now
A = (aij )^n_{i,j=1} and suppose that a1,σ(1) a2,σ(2) · · · an,σ(n) = 0 for all σ. If A = 0, we are
done. Next, let A have a nonzero entry, say ai0 ,j0 ≠ 0. Deleting the row and column of
this nonzero entry, we must have that the resulting (n − 1) × (n − 1) submatrix has a zero
in each of its generalized diagonals {(1, τ (1)), . . . , (n − 1, τ (n − 1))} with τ a permutation
on {1, . . . , n − 1}. By the induction assumption, we can identify rows
j1 , . . . , jr ∈ {1, . . . , n} \ {i0 } and columns k1 , . . . , kn−r ∈ {1, . . . , n} \ {j0 }, so that the
entries they have in common are all 0. By permuting rows and columns of A, we may
assume {j1 , . . . , jr } = {1, . . . , r} and {k1 , . . . , kn−r } = {r + 1, . . . , n}. Thus we have that

    A = [ A11   0  ]
        [ A21  A22 ] ,
where A11 is r × r and A22 is (n − r) × (n − r). Due to the assumption on A, we must
have that either A11 or A22 also has the property that each of its generalized diagonals
has a zero element. By applying the induction assumption on A11 or A22 , we obtain that
one of these matrices has (possibly after a permutation of rows and columns) an upper
triangular zero block which includes a diagonal entry. But then A has an upper triangular
zero block which includes a diagonal zero entry, and thus we obtain the desired s rows and
n − s + 1 columns.

Exercise 7.9.8 We say that A = (aij )^n_{i,j=1} ∈ Rn×n is row-stochastic if AT is
column-stochastic. We call A doubly stochastic if A is both column- and row-stochastic.
The matrix P = (pij )^n_{i,j=1} is called a permutation matrix if every row and column of P
has exactly one entry equal to 1 and all the others equal to zero.

(i) Show that a permutation matrix is doubly stochastic.


Answer: Every row and column has exactly one entry equal to 1 and all others equal
to 0, so all row and column sums equal 1. In addition, all entries (being either 0 or
1) are nonnegative.
(ii) Show that if A is a doubly stochastic matrix, then there exists a permutation σ on
{1, . . . , n}, so that a1,σ(1) a2,σ(2) · · · an,σ(n) ≠ 0.
Answer: Suppose that a1,σ(1) a2,σ(2) · · · an,σ(n) = 0 for all permutations σ. By
Exercise 7.9.7 there exist permutation matrices P1 and P2 so that

    P1 AP2 = [ B  C ]
             [ 0  D ] ,

where the 0 has size r × (n + 1 − r). As B has n + 1 − r columns, all its entries sum
up to n + 1 − r. As [ B  C ] has n − r rows, all its entries add up to n − r, leading to
a contradiction as the entries of C are nonnegative.
(iii) Let σ be as in the previous part, and put α = minj=1,...,n aj,σ(j) (> 0), and let Pσ be
the permutation matrix with a 1 in positions (1, σ(1)), . . . , (n, σ(n)) and zeros
1
elsewhere. Show that either A is a permutation matrix, or 1−α (A − αPσ ) is a
doubly stochastic matrix with fewer nonzero entries than A.
Answer: If A is not a permutation matrix, then α < 1. By the definition of α we have
that A − αPσ only has nonnegative entries. In addition, notice that each row and
1
column sum of A − αPσ is 1 − α. Thus 1−α (A − αPσ ) is doubly stochastic. Finally,
the entry in A that corresponds to α is zero in A − αPσ , and all zero entries in A are
1
still zero in A − αPσ . Thus 1−α (A − αPσ ) has fewer nonzero entries than A.

(iv) Prove
Theorem 7.9.9 (Birkhoff ) Let A be doubly stochastic. Then there exist k ∈ N,
permutation matrices P1 , . . . , Pk and positive numbers α1 , . . . , αk so that

    A = α1 P1 + · · · + αk Pk ,     Σ_{j=1}^k αj = 1.

In other words, every doubly stochastic matrix is a convex combination of


permutation matrices.
(Hint: Use induction on the number of nonzero entries of A.)
Answer: Since A is doubly stochastic, every column has a nonzero entry, thus A has
at least n nonzero entries. If A has exactly n nonzero entries, then A is a
permutation matrix, and we are done. Next, suppose as our induction hypothesis
that Birkhoff's theorem holds when A has at most l nonzero entries, where l ≥ n.
Next, let A have l + 1 nonzero entries. Then by the previous part we can identify a
permutation σ and an 0 < α < 1 so that Â = (1/(1 − α))(A − αPσ ) is a doubly stochastic
matrix so that Â has at most l nonzero entries. By our induction assumption we
have that Â = Σ_{j=1}^{k̂} βj Pj with Pj permutation matrices and βj nonnegative so that
Σ_{j=1}^{k̂} βj = 1. But then

    A = αPσ + (1 − α)Â = αPσ + Σ_{j=1}^{k̂} (1 − α)βj Pj

is of the desired form.

 
Exercise 7.9.10 Write the matrix

    [ 1/6    1/2   1/3  ]
    [ 7/12    0    5/12 ]
    [ 1/4    1/2   1/4  ]

as a convex combination of permutation matrices.

Answer:

    (1/6) [ 1 0 0 ]     (1/4) [ 0 1 0 ]     (1/4) [ 0 1 0 ]     (1/3) [ 0 0 1 ]
          [ 0 0 1 ]  +        [ 1 0 0 ]  +        [ 0 0 1 ]  +        [ 1 0 0 ]  .
          [ 0 1 0 ]           [ 0 0 1 ]           [ 1 0 0 ]           [ 0 1 0 ]
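The decomposition can be verified numerically; the sketch below (not part of the original answer) assumes NumPy.

# Verify that the convex combination above reproduces the given doubly stochastic matrix.
import numpy as np

A = np.array([[1/6, 1/2, 1/3],
              [7/12, 0, 5/12],
              [1/4, 1/2, 1/4]])

P1 = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]])
P2 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])
P3 = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
P4 = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]])

combo = P1 / 6 + P2 / 4 + P3 / 4 + P4 / 3
print(np.allclose(A, combo))                      # True
print(np.isclose(1/6 + 1/4 + 1/4 + 1/3, 1.0))     # the weights sum to 1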

Exercise 7.9.11 (a) Show that

             [ A  ? ]          [ A ]
    min rank [ B  C ]  =  rank [ B ]  +  rank [ B  C ]  −  rank B.

(b) Show that the lower triangular partial matrix

        [ A11           ?   ]
    A = [  ⋮     ⋱          ]
        [ An1   · · ·  Ann  ]

has minimal rank min rank A equal to

     n        [ Ai1   · · ·  Aii ]       n−1        [ Ai+1,1  · · ·  Ai+1,i ]
     Σ   rank [  ⋮            ⋮   ]  −    Σ    rank [   ⋮              ⋮    ] .              (7.47)
    i=1       [ An1   · · ·  Ani ]       i=1        [  An1    · · ·   Ani   ]

Answer: We prove (b), as it will imply (a). For a matrix M , we let coli (M ) denote the ith
scalar column of the matrix M . For p = 1, . . . , n we let Jp ⊆ {1, . . . , µp } be a smallest
possible set such that the columns
         [ App ]
    coli [  ⋮  ] ,   i ∈ Jp ,                                                (7.48)
         [ Anp ]

satisfy

              [ App ]                      [ Ap1  · · ·  Ap,p−1 ]          [ Ap1  · · ·  App ]
    Span{coli [  ⋮  ] : i ∈ Jp }  +  Ran   [  ⋮            ⋮    ]  =  Ran  [  ⋮           ⋮  ] .
              [ Anp ]                      [ An1  · · ·  An,p−1 ]          [ An1  · · ·  Anp ]

Note that the number of elements in Jp equals

         [ Ap1  · · ·  App ]          [ Ap1  · · ·  Ap,p−1 ]
    rank [  ⋮           ⋮  ]  −  rank [  ⋮            ⋮    ] .
         [ An1  · · ·  Anp ]          [ An1  · · ·  An,p−1 ]

Thus Σ_{p=1}^n card Jp equals the right-hand side of (7.47). It is clear that regardless of the
choice for Aij , i < j, the collection of columns

         [ A1p ]
    coli [  ⋮  ] ,   i ∈ Jp , p = 1, . . . , n,                              (7.49)
         [ Anp ]
will be linearly independent. This gives that the minimal rank is greater than or equal to
the right-hand side of (7.47). On the other hand, when one has identified the columns
(7.48) one can freely choose entries above these columns. Once such a choice is made,
every other column of the matrix can be written as a linear combination of the columns
(7.49), and thus a so constructed completion has rank equal to the right-hand side of
(7.47). This yields (7.47).

Exercise 7.9.12 Show that all minimal rank completions of

    [ ?  ?  ? ]
    [ 1  0  ? ]
    [ 0  1  1 ]

are

    [ x1   x2   x1 x3 + x2 ]
    [ 1    0    x3         ]
    [ 0    1    1          ] .

Answer: Let

    [ x1   x2   x4 ]
    [ 1    0    x3 ]
    [ 0    1    1  ]

be a completion. As [ 1 0 ; 0 1 ] is a submatrix, the rank is at
least 2. For the rank to equal 2, we need that the determinant is 0. This leads to
x4 = x1 x3 + x2 .

Exercise 7.9.13 Consider the partial matrix

        [  1  ?  ? ]
    A = [  ?  1  ? ] .
        [ −1  ?  1 ]

Show that there exists a completion of A that is a Toeplitz matrix of rank 1, but that such
a completion cannot be chosen to be real.

Answer: Let

    [  1  b  c ]
    [  a  1  b ]
    [ −1  a  1 ]

be a Toeplitz completion. For this to be of rank 1 we need all 2 × 2 submatrices to have
determinant 0. Thus 0 = det [ a 1 ; −1 a ] = a^2 + 1, giving a = ±i, and thus a ∉ R.
Next 0 = det [ 1 b ; a 1 ] = 1 − ab, thus b = 1/a. Finally, det [ b c ; 1 b ] = b^2 − c,
giving c = b^2 . We find that

    [  1  −i  −1 ]
    [  i   1  −i ]
    [ −1   i   1 ]

is a rank 1 Toeplitz completion.

Exercise 7.9.14 Consider the n × n tri-diagonal Toeplitz matrix

         [  2   −1    0   · · ·   0 ]
         [ −1    2   −1   · · ·   0 ]
    An = [  ⋮    ⋱    ⋱     ⋱     ⋮ ]
         [  0   · · ·  −1    2   −1 ]
         [  0   · · ·   0   −1    2 ] .

Show that λj = 2 − 2 cos(jθ), j = 1, . . . , n, where θ = π/(n + 1), are the eigenvalues. In
addition, an eigenvector associated with λj is

    vj = ( sin(jθ), sin(2jθ), . . . , sin(njθ) )T .

Answer: Let k ∈ {1, . . . , n}, and compute sin(kjθ − jθ) + sin(kjθ + jθ) =
sin(kjθ) cos(jθ) − cos(kjθ) sin(jθ) + sin(kjθ) cos(jθ) + cos(kjθ) sin(jθ) =
2 sin(kjθ) cos(jθ).
Thus
− sin((k − 1)jθ) + 2 sin(kjθ) − sin((k + 1)jθ) = (2 − 2 cos(jθ)) sin(kjθ).
Using this, and the observation that for k = 1 we have sin((k − 1)jθ) = 0, and for k = n
we have sin((k + 1)jθ) = 0 (here is where the definition of θ is used), it follows that
An vj = (2 − 2 cos(jθ))vj , j = 1, . . . , n.
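A numerical check of these formulas (not from the text), assuming NumPy and a hypothetical size n = 8:

# Numerical check of the eigenvalues and eigenvectors of the tridiagonal Toeplitz matrix A_n.
import numpy as np

n = 8
An = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
theta = np.pi / (n + 1)

lam = 2 - 2 * np.cos(np.arange(1, n + 1) * theta)
print(np.allclose(np.sort(np.linalg.eigvalsh(An)), np.sort(lam)))    # True

j = 3                                                 # check one eigenvector explicitly
vj = np.sin(np.arange(1, n + 1) * j * theta)
print(np.allclose(An @ vj, lam[j - 1] * vj))           # True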

Exercise 7.9.15 Let A = (aij )^n_{i,j=1} ∈ Cn×n be given.

(a) Let U = [ 1 0 ; 0 U1 ] ∈ Cn×n , with U1 ∈ C(n−1)×(n−1) a unitary matrix chosen so that

    U1 ( a21 , a31 , . . . , an1 )T = ( σ, 0, . . . , 0 )T ,    σ = ( Σ_{j=2}^n |aj1 |^2 )^{1/2} .

Show that U AU ∗ has the form

             [ a11  ∗  ∗  · · ·  ∗ ]
             [  σ   ∗  ∗  · · ·  ∗ ]        [ a11    ∗  ]
    U AU ∗ = [  0   ∗  ∗  · · ·  ∗ ]   =    [ σe1   A1  ] .
             [  ⋮   ⋮  ⋮          ⋮ ]
             [  0   ∗  ∗  · · ·  ∗ ]

(b) Show that there exists a unitary V so that V AV ∗ is upper Hessenberg. (Hint: after
part (a), find a unitary U2 = [ 1 0 ; 0 ∗ ] so that U2 A1 U2∗ has the form [ ∗ ∗ ; σ2 e1 A2 ],
and observe that

    Â = [ 1  0  ] [ 1  0  ]  A  [ 1  0  ]∗ [ 1  0  ]∗
        [ 0  U2 ] [ 0  U1 ]     [ 0  U1 ]  [ 0  U2 ]

has now zeros in positions (3, 1), . . . , (n, 1), (4, 2), . . . , (n, 2). Continue the process.)

Remark. If one puts a matrix in upper Hessenberg form before starting the QR algorithm,
it (in general) speeds up the convergence of the QR algorithm, so this is standard practice
when numerically finding eigenvalues.
 
Answer: (a) Writing A = [ a11  A12 ; A21  A22 ] , we have that

    U AU ∗ = [ a11    A12 U1∗    ]
             [ σe1    U1 A22 U1∗ ] ,

since U1 A21 = σe1 , and it is thus of the required form.

(b) As U2 has the special form, the first column of Â coincides with the first column of
U AU ∗ , and has therefore zeros in positions (3, 1), . . . , (n, 1). Next, the second column of Â
below the main diagonal corresponds to σ2 e1 . Thus Â also has zeros in positions
(4, 2), . . . , (n, 2). Continuing this way, one can find Uk , k = 3, . . . , n − 2, making new zeros
in positions (k + 2, k), . . . , (n, k), while keeping the previously obtained zeros. Letting V
equal the product of the unitaries, we obtain the desired result.
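The reduction described in this exercise can be implemented with Householder reflections. The sketch below is not the book's construction verbatim (it uses standard complex Householder reflectors as a concrete choice for the unspecified unitaries U1 , U2 , . . .), but it produces a unitary V with V AV ∗ upper Hessenberg; NumPy is assumed.

# Reduce A to upper Hessenberg form by unitary similarity (illustrative sketch).
import numpy as np

def hessenberg_reduce(A):
    A = np.array(A, dtype=complex)
    n = A.shape[0]
    V = np.eye(n, dtype=complex)
    for k in range(n - 2):
        x = A[k + 1:, k].copy()
        if np.allclose(x, 0):
            continue
        # Householder reflector sending x to alpha * e1, with |alpha| = ||x||
        e1 = np.zeros_like(x); e1[0] = 1.0
        alpha = -np.exp(1j * np.angle(x[0])) * np.linalg.norm(x) if x[0] != 0 else -np.linalg.norm(x)
        u = x - alpha * e1
        u = u / np.linalg.norm(u)
        Uk = np.eye(n, dtype=complex)
        Uk[k + 1:, k + 1:] -= 2.0 * np.outer(u, u.conj())   # Uk is unitary and Hermitian
        A = Uk @ A @ Uk.conj().T
        V = Uk @ V
    return V, A

rng = np.random.default_rng(7)
M = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
V, H = hessenberg_reduce(M)

print(np.allclose(V @ M @ V.conj().T, H))                 # H = V M V*
print(np.allclose(V @ V.conj().T, np.eye(6)))             # V is unitary
print(np.allclose(np.tril(H, -2), 0))                     # H is upper Hessenberg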

Exercise 7.9.16 The adjacency matrix AG of a graph G = (V, E) is an n × n matrix,


where n = |V | is the number of vertices of the graph, and the entry (i, j) equals 1 when
{i, j} is an edge, and 0 otherwise. For instance, the graph in Figure 7.6 has adjacency
matrix
0 1 0 0 1 0
 
1 0 1 0 1 0
0 1 0 1 0 0
 
.
0 0 1 0 1 1

1 1 0 1 0 0
0 0 0 1 0 0
The adjacency matrix is a symmetric real matrix. Some properties of graphs can be
studied by studying associated matrices. In this exercise we show this for the so-called
chromatic number χ(G) of a graph G. It is defined as follows. A k-coloring of a graph is a
function c : V → {1, . . . , k} so that c(i) 6= c(j) whenever {i, j} ∈ E. Thus, there are k
colors and adjacent vertices should not be given the same color. The smallest number k so
that G has a k-coloring is defined to be the chromatic number χ(G) of the graph G.

(a) Find the chromatic number of the graph in Figure 7.6.


Answer: The answer is 3. Indeed, for the vertices 1, 2 and 5, which are all adjacent to
one another, we need at least three colors. Giving then 3 and 4 the same color as 1,
and 6 the same color as 2, yields a 3-coloring of the graph.

(b) The degree di of a vertex i is the number of vertices it is adjacent to. For instance, for
the graph in Figure 7.6 we have that the degree of vertex 1 is 2, and the degree of
vertex 6 is 1. Let e = (1 · · · 1)T ∈ Rn . Show that eT AG e = Σ_{i∈V} di .
Answer: Notice that di is equal to the sum of the entries in the ith row of AG . Next,
eT AG e is the sum of all the entries of AG , which thus equals Σ_{i∈V} di .
(c) For a real number x let bxc denote the largest integer ≤ x. For instance, bπc = 3,
b−πc = −4, b5c = 5. Let α = λmax (AG ) be the largest eigenvalue of the adjacency
matrix of G. Show that G must have a vertex of degree at most bαc. (Hint: use
Exercise 5.7.21(b).)
Answer: If we take y = (1/√n) e, we get by Exercise 5.7.21 and part (b) that

    α = max_{hx,xi=1} xT Ax ≥ yT Ay = (1/n) Σ_{i∈V} di .                     (7.50)

If every vertex i has the property that di > α, then Σ_{i∈V} di > nα, which contradicts
(7.50). Thus, for some i we have di ≤ α. As di is an integer, this implies di ≤ bαc.
(d) Show that
χ(G) ≤ bλmax (AG )c + 1, (7.51)
which is a result due to Herbert S. Wilf. (Hint: use induction and Exercise 5.7.21(c).)
Answer: Denote α = λmax (AG ). We use induction. When the graph has one vertex,
we have that AG = (0) and χ(G) = 1 (there is only one vertex to color), and thus
inequality (7.51) holds.
Let us assume that (7.51) holds for all graphs with at most n − 1 vertices, and let
G = (V, E) have n vertices. By part (c) there is a vertex i so that di ≤ bαc. Let us
remove vertex i (and the edges with endpoint i) from the graph G, to give us a graph
Ĝ = (V̂ , Ê). Notice that AĜ is obtained from AG by removing row and column i. By
Exercise 5.7.21(c) we have that λmax (AĜ ) ≤ λmax (AG ) = α. Using the induction
assumption on Ĝ (which has n − 1 vertices), we obtain that

χ(Ĝ) ≤ bλmax (AĜ )c + 1 ≤ bαc + 1.

Thus Ĝ has a (bαc + 1)-coloring. As the vertex i in G has degree ≤ bαc, there is at
least one color left for the vertex i, and thus we find that G also has a
(bαc + 1)-coloring.
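Wilf's bound can be evaluated for the example graph of Figure 7.6; the sketch below (not from the text) assumes NumPy and uses the adjacency matrix displayed in the exercise.

# Check Wilf's bound chi(G) <= floor(lambda_max(A_G)) + 1 for the graph of Figure 7.6.
import numpy as np

AG = np.array([[0, 1, 0, 0, 1, 0],
               [1, 0, 1, 0, 1, 0],
               [0, 1, 0, 1, 0, 0],
               [0, 0, 1, 0, 1, 1],
               [1, 1, 0, 1, 0, 0],
               [0, 0, 0, 1, 0, 0]])

lam_max = np.max(np.linalg.eigvalsh(AG))
bound = int(np.floor(lam_max)) + 1
print(lam_max, bound)          # the bound is at least 3, the chromatic number found in part (a)
print(bound >= 3)              # True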

Exercise 7.9.17 Let

               [ 2/3    0       0        0      2/3     0      0       0       2/3 ]
               [  0    α/3      0        0       0      0      0       0        0  ]
               [  0     0    (5−α)/3     0       0      0      0       0        0  ]
               [  0     0       0     (5−α)/3    0      0      0       0        0  ]
    ρα = (1/7) [ 2/3    0       0        0      2/3     0      0       0       2/3 ]
               [  0     0       0        0       0     α/3     0       0        0  ]
               [  0     0       0        0       0      0     α/3      0        0  ]
               [  0     0       0        0       0      0      0    (5−α)/3     0  ]
               [ 2/3    0       0        0      2/3     0      0       0       2/3 ] ,

where 0 ≤ α ≤ 5. We want to investigate when ρα is 3 × 3 separable.

(a) Show that ρα passes the Peres test if and only if 1 ≤ α ≤ 4.



(b) Let
1 0 0 0 −1 0 0 0 −1
 
 0 0 0 0 0 0 0 0 0 
 0 0 2 0 0 0 0 0 0 
 
 0 0 0 2 0 0 0 0 0 
 
Z = −1 0 0 0 1 0 0 0 −1 .
 
 0 0 0 0 0 0 0 0 0 
 
 0 0 0 0 0 0 0 0 0 
 
 0 0 0 0 0 0 0 2 0 
−1 0 0 0 −1 0 0 0 1
Show that for x, y ∈ C3 we have that (x ⊗ y)∗ Z(x ⊗ y) ≥ 0.
(c) Show that tr(ρα Z) = (1/7)(3 − α), and conclude that ρα is not 3 × 3 separable for
3 < α ≤ 5.
3 < α ≤ 5.

Answer: (a) Applying the Peres test, we need to check whether the partial transpose

                [ 2/3    0       0        0      0      0       0       0        0  ]
                [  0    α/3      0       2/3     0      0       0       0        0  ]
                [  0     0    (5−α)/3     0      0      0      2/3      0        0  ]
                [  0    2/3      0     (5−α)/3   0      0       0       0        0  ]
    ρΓα = (1/7) [  0     0       0        0     2/3     0       0       0        0  ]
                [  0     0       0        0      0     α/3      0      2/3       0  ]
                [  0     0      2/3       0      0      0      α/3      0        0  ]
                [  0     0       0        0      0     2/3      0    (5−α)/3     0  ]
                [  0     0       0        0      0      0       0       0       2/3 ]

is positive semidefinite. This matrix is, up to a permutation of rows and columns, a direct
sum of diagonal entries and (three copies of) the 2 × 2 submatrix

    [  α/3      2/3   ]
    [  2/3   (5−α)/3  ] .

Computing its determinant we obtain the requirement (4 − α)(α − 1) ≥ 0, which gives 1 ≤ α ≤ 4.
It is easy to see that for these values ρΓα is indeed positive semidefinite.
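Part (a) can be confirmed numerically by forming ρα as displayed above, taking the partial transpose of the second factor, and inspecting the smallest eigenvalue; the sketch below (not from the text) assumes NumPy, and the sampled values of α are arbitrary.

# Numerical Peres (PPT) test for rho_alpha: the partial transpose is PSD iff 1 <= alpha <= 4.
import numpy as np

def rho(alpha):
    R = np.zeros((9, 9))
    for i in (0, 4, 8):              # the 2/3 entries on the {1, 5, 9} block
        for j in (0, 4, 8):
            R[i, j] = 2 / 3
    for i in (1, 5, 6):              # alpha/3 on diagonal positions 2, 6, 7
        R[i, i] = alpha / 3
    for i in (2, 3, 7):              # (5 - alpha)/3 on diagonal positions 3, 4, 8
        R[i, i] = (5 - alpha) / 3
    return R / 7

def partial_transpose(R):
    # transpose the second (3-dimensional) factor: indices (i,j),(k,l) -> (i,l),(k,j)
    T = R.reshape(3, 3, 3, 3)
    return T.transpose(0, 3, 2, 1).reshape(9, 9)

for alpha in (0.5, 1.0, 2.5, 4.0, 4.5):
    min_eig = np.min(np.linalg.eigvalsh(partial_transpose(rho(alpha))))
    print(alpha, min_eig >= -1e-12)    # True exactly for 1 <= alpha <= 4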

(b) Note that x ⊗ y = (x1 y1 , x1 y2 , x1 y3 , x2 y1 , x2 y2 , x2 y3 , x3 y1 , x3 y2 , x3 y3 )T .
If we assume that |y2 | ≥ |y1 |, we write (x ⊗ y)∗ Z(x ⊗ y) as
|x1 y1 − x2 y2 + x3 y3 |2 + 2|x1 ȳ3 − x3 ȳ1 |2 + 2|x2 y1 |2 + 2|x3 |2 (|y2 |2 − |y1 |2 ),
which is nonnegative. The case |y1 | ≥ |y2 | can be dealt with in a similar manner.

(c) It is straightforward to compute that tr(ρα Z) = (1/7)(3 − α), which is negative when
3 < α ≤ 5. If ρα is separable, it can be written as Σ_{i=1}^k Ai ⊗ Bi , with Ai and Bi positive
semidefinite. As each positive semidefinite matrix can be written as Σ_{j=1}^l vj vj∗ , with vj
vectors, we can actually write the separable ρα as

    ρα = Σ_{j=1}^s xj x∗j ⊗ yj yj∗ ,

where xj , yj ∈ C3 , j = 1, . . . , s. Observe now that (b) yields

    tr((xj x∗j ⊗ yj yj∗ )Z) = (xj ⊗ yj )∗ Z(xj ⊗ yj ) ≥ 0,

which implies

    tr(ρα Z) = tr( ( Σ_{j=1}^s xj x∗j ⊗ yj yj∗ ) Z ) ≥ 0.

When 3 < α ≤ 5, we have reached a contradiction, thus ρα is not separable for these
values of α.
Bibliography

• S. Axler, Linear algebra done right. Second edition. Undergraduate Texts


in Mathematics. Springer-Verlag, New York, 1997.
• R. Bhatia, Matrix analysis. Graduate Texts in Mathematics, 169.
Springer-Verlag, New York, 1997.
• J. B. Carrell, Fundamentals of linear algebra,
http://www.math.ubc.ca/~carrell/NB.pdf.
• K. Hoffman, R. Kunze, Linear algebra. Second edition. Prentice-Hall,
Inc., Englewood Cliffs, NJ, 1971.
• R. A. Horn, C. R. Johnson, Matrix analysis. Second edition. Cambridge
University Press, Cambridge, 2013.
• R. A. Horn, C. R. Johnson, Topics in matrix analysis. Corrected reprint
of the 1991 original. Cambridge University Press, Cambridge, 1994.
• S. H. Friedberg, A. J. Insel, L. E. Spence, Linear algebra. Second edition.
Prentice Hall, Inc., Englewood Cliffs, NJ, 1989.
• P. Lancaster, M. Tismenetsky, The theory of matrices. Second edition.
Computer Science and Applied Mathematics. Academic Press, Inc.,
Orlando, FL, 1985.

• P. D. Lax, Linear algebra. Pure and Applied Mathematics (New York).


A Wiley-Interscience Publication. John Wiley & Sons, Inc., New York,
1997.
• D. C. Lay, Linear algebra and its applications, Third Edition.
Addison-Wesley, 2003.
• M. Marcus, Finite dimensional multilinear algebra. Part 1. Pure and
Applied Mathematics, Vol. 23. Marcel Dekker, Inc., New York, 1973.
• B. Noble, J. W. Daniel, Applied linear algebra. Second edition.
Prentice-Hall, Inc., Englewood Cliffs, N.J., 1977.
• G. Strang, Linear algebra and its applications. Second edition. Academic
Press [Harcourt Brace Jovanovich, Publishers], New York–London, 1980.


• S. Treil, Linear algebra done wrong,
http://www.math.brown.edu/~treil/papers/LADW/LADW.html.

• F. Zhang, Matrix theory: Basic results and techniques, Second edition,


Universitext. Springer, New York, 2011.
