ADVANCED
LINEAR ALGEBRA
Hugo J. Woerdeman
Drexel University
Philadelphia, Pennsylvania, USA
TEXTBOOKS in MATHEMATICS
Series Editors: Al Boggess and Ken Rosen
PUBLISHED TITLES
ABSTRACT ALGEBRA: AN INQUIRY-BASED APPROACH
Jonathan K. Hodge, Steven Schlicker, and Ted Sundstrom
APPLIED ABSTRACT ALGEBRA WITH MAPLE™ AND MATLAB®, THIRD EDITION
Richard Klima, Neil Sigmon, and Ernest Stitzinger
APPLIED DIFFERENTIAL EQUATIONS: THE PRIMARY COURSE
Vladimir Dobrushkin
COMPUTATIONAL MATHEMATICS: MODELS, METHODS, AND ANALYSIS WITH MATLAB® AND MPI,
SECOND EDITION
Robert E. White
DIFFERENTIAL EQUATIONS: THEORY, TECHNIQUE, AND PRACTICE, SECOND EDITION
Steven G. Krantz
DIFFERENTIAL EQUATIONS: THEORY, TECHNIQUE, AND PRACTICE WITH BOUNDARY VALUE PROBLEMS
Steven G. Krantz
DIFFERENTIAL EQUATIONS WITH MATLAB®: EXPLORATION, APPLICATIONS, AND THEORY
Mark A. McKibben and Micah D. Webster
ELEMENTARY NUMBER THEORY
James S. Kraft and Lawrence C. Washington
EXPLORING LINEAR ALGEBRA: LABS AND PROJECTS WITH MATHEMATICA®
Crista Arangala
GRAPHS & DIGRAPHS, SIXTH EDITION
Gary Chartrand, Linda Lesniak, and Ping Zhang
INTRODUCTION TO ABSTRACT ALGEBRA, SECOND EDITION
Jonathan D. H. Smith
INTRODUCTION TO MATHEMATICAL PROOFS: A TRANSITION TO ADVANCED MATHEMATICS, SECOND EDITION
Charles E. Roberts, Jr.
INTRODUCTION TO NUMBER THEORY, SECOND EDITION
Marty Erickson, Anthony Vazzana, and David Garth
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2016 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
To my very supportive family:
Dara, Sloane, Sam, Ruth, and Myra.
Contents
Acknowledgments
Notation
1.5 Exercises
2 Vector Spaces
2.6 Exercises
3 Linear Transformations
3.4 Exercises
7.1 Matrices you can’t write down, but would still like to use
Bibliography
Index
Preface to the Instructor
This book is intended for a second linear algebra course. Students are
expected to be familiar with (computational) material from a first linear
algebra course: matrix multiplication; row reduction; pivots; solving systems
of linear equations; checking whether a vector is a linear combination of
other vectors; finding eigenvalues and eigenvectors; finding a basis of a
nullspace, column space, row space, and eigenspace; computing
determinants; and finding inverses. The assumption is that so far they have
worked over the real numbers R.
In my view, the core material in this book is the following and takes about 24 academic hours of lectures:
• Chapter 4, Sections 4.6 and 4.7: These two sections are independent of one another, and each takes about 1 hour. How the Jordan canonical form helps in solving differential equations is a classical topic for this course. The result on commuting matrices is one that sometimes makes it into my course, but other times does not. (1–2 hours)
I hope that my suggestions are helpful, and that you find this a useful text
for your course. I would be very happy to hear from you! I realize that it
takes a special effort to provide someone with constructive criticism, so when
you take time to do that, I will be especially appreciative.
Preface to the Student
I think that linear algebra is a great subject, and I strongly hope that you
(will) agree. It has a strong theoretical side, ample opportunity to explore
the subject with computation, and a (continuously growing) number of great
applications. With this book, I hope to do justice to all these aspects. I chose
to treat the main concepts (vector space and linear transformations) in their
full abstraction. Abstraction (taking operations out of their context, and
studying them on their own merit) is really the strength of mathematics;
how else can a theory that started in the 18th and 19th centuries have all
these great 21st-century applications (web search engines, data mining,
etc.)? In addition, I hope that when you are used to the full abstraction of
the theory, it will allow you to think of possibilities of applying the theory in
the broadest sense. And, maybe as a more direct benefit, I hope that it will
help when you take abstract algebra. Which brings me to my last point.
While current curriculum structure has different mathematical subfields
neatly separated, this is not reality. Especially when you apply mathematics,
you will need to pull from different areas of mathematics. This is why this
book does not shy away from occasionally using some calculus, abstract
algebra, real analysis and (a little bit of) complex analysis.
Just a note regarding the exercises: I have chosen to include full solutions to almost all exercises. It is up to you how you use these. Of course, the less you rely on these solutions, the better. There are a few
exercises (designated as “Honors”) for which no solution is included. These
are somewhat more challenging. Try them and if you succeed, use them to
impress your teacher or yourself!
Acknowledgments
This book came about while I was immobile due to an ankle fracture. So,
first of all, I would like to thank my wonderful wife Dara and my great kids
Sloane, Sam, Ruth, and Myra, for taking care of me during the four months
I spent recovering. I am very thankful to my colleagues at Drexel University:
Dannis Yang, who used a first version of this text for his course, for
providing me with detailed comments on Chapters 1–5; Shari Moskow, R.
Andrew Hicks, and Robert Boyer for their feedback on the manuscript, and
in Robert Boyer’s case for also providing me with one of the figures. In
addition, I am grateful to graduate student Charles Burnette for his
feedback. I am also very thankful to those at CRC Press who helped me
bring this manuscript to publication. Finally, I would like to thank you, the
reader, for picking up this book. Without you there would have been no
point to produce this. So, MANY THANKS to all of you!
Notation
• N = {1, 2, 3, . . .}
• N0 = {0, 1, 2, . . .}
• Z = the set of all integers
• Q = the field of rational numbers
• R = the field of real numbers
• R(t) = the field of real rational functions (in t)
• C = the field of complex numbers
• Re z = real part of z
• Im z = imaginary part of z
• z̄ = complex conjugate of z
• |z| = absolute value (modulus) of z
• Zp (with p prime) = the finite field {0, 1, . . . , p − 1}
• rem(q|p) = remainder of q after division by p
• F = a generic field
• det(A) = the determinant of the matrix A
• tr(A) = the trace of a matrix A (= the sum of its diagonal entries)
• adj(A) = the adjugate of the matrix A
• rank(A) = the rank of a matrix A
• F[X] = the vector space of polynomials in X with coefficients in F
• Fn [X] = the vector space of polynomials of degree ≤ n in X with
coefficients in F
• +̇ = direct sum
• pA (t) = the characteristic polynomial of the matrix A
• mA (t) = the minimal polynomial of the matrix A
• σj(A) = the jth singular value of the matrix A, where σ₁(A) = ‖A‖ is the largest singular value
• ρ(A) = max{|λ| : λ is an eigenvalue of A} is the spectral radius of A
• PSDn = {A ∈ Cn×n : A is positive semidefinite} ⊆ Hn
• v + W = {v + w : w ∈ W } = {x : x − v ∈ W }
• V /W = {v + W : v ∈ V }, the quotient space
• V 0 = the dual space of V
• L(V, W ) = {T : V → W : T is linear}
• v ⊗ w = the tensor product of v and w
• v ∧ w = the anti-symmetric tensor product of v and w
• v ∨ w = the symmetric tensor product of v and w
• A[P, Q] = (aij )i∈P,j∈Q , a submatrix of A = (aij )i,j
List of Figures
7.1 These are the roots of the polynomial $\sum_{k=1}^{10{,}000} p_k(10{,}000)\,x^k$, where $p_k(n)$ is the number of partitions of n in k parts, which is the number of ways n can be written as the sum of k positive integers.
7.4 The original image (of size 3000 × 4000 × 3).
5.7 The original image (of size 672 × 524 × 3).
1
Fields and Matrix Algebra
CONTENTS
1.1 The field Z3
1.2 The field axioms
1.3 Field examples
1.3.1 Complex numbers
1.3.2 The finite field Zp, with p prime
1.4 Matrix algebra over different fields
1.4.1 Reminders about Cramer’s rule and the adjugate matrix
1.5 Exercises
The central notions in linear algebra are vector spaces and linear
transformations that act between vector spaces. We will define these notions
in Chapters 2 and 3, respectively. But before we can introduce the general
notion of a vector space we need to talk about the notion of a field. In your
first Linear Algebra course you probably did not worry about fields because the course was restricted to the real numbers R, a field you have been familiar with for a long time. In this chapter we ask you to get used to the
general notion of a field, which is a set of mathematical objects on which you
can define algebraic operations such as addition, subtraction, multiplication
and division with all the rules that also hold for real numbers
(commutativity, associativity, distributivity, existence of an additive neutral
element, existence of an additive inverse, existence of a multiplicative neutral
element, existence of a multiplicative inverse for nonzeros). We start with an
example.
Let us consider the set Z3 = {0, 1, 2}, and use the following tables to define
addition and multiplication:
  +  | 0  1  2        ·  | 0  1  2
  ---+---------      ----+---------
  0  | 0  1  2        0  | 0  0  0
  1  | 1  2  0        1  | 0  1  2
  2  | 2  0  1        2  | 0  2  1
What you notice in the table is that when you add 0 to any number, it does
not change that number (namely, 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 0 + 2 = 2,
2 + 0 = 2). We say that 0 is the neutral element for addition. Analogously, 1
is the neutral element for multiplication, which means that when we multiply
a number in this field by 1, it does not change that number (0 · 1 = 0,
1 · 2 = 2, etc.). Every field has these neutral elements, and they are typically
denoted by 0 and 1, although there is no rule that you have to denote them
this way.
Another important observation is that in the core part of the addition table
  0  1  2
  1  2  0
  2  0  1
the 0 appears exactly once in every row and column. What this means is
that whatever x we choose in Z3 = {0, 1, 2}, we can always find exactly one
y ∈ Z3 so that
x + y = 0.
We are going to call y the additive inverse of x, and we are going to write
y = −x. So
0 = −0, 2 = −1, 1 = −2.
It is important to keep in mind that the equation y = −x is just a shorthand
of the equation x + y = 0. So, whenever you wonder “what does this −
mean?,” you have to go back to an equation that only involves + and look at
how addition is defined. One of the rules in any field is that any element of a
field has an additive inverse.
How about multiplicative inverses? For real numbers, any number has a
multiplicative inverse except for 0. Indeed, no number x satisfies x · 0 = 1!
In other fields, the same holds true. This means that in looking at the
multiplication table for multiplicative inverses, we should only look at the
part that does not involve 0:
  1  2
  2  1
And here we notice that 1 appears exactly once in each row and column.
This means that whenever x ∈ Z3 \ {0} = {1, 2}, there exists exactly one y
so that
x · y = 1.
We are going to call y the multiplicative inverse of x, and denote this as x−1 .
Thus
1−1 = 1, 2−1 = 2.
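The tables above lend themselves to a quick computational check. Here is a small sketch (not from the text) in Python, assuming only that addition and multiplication in Z3 are ordinary integer operations followed by reduction modulo 3:

```python
# Build the Z_3 addition and multiplication tables and read off inverses.
p = 3
add = [[(a + b) % p for b in range(p)] for a in range(p)]
mul = [[(a * b) % p for b in range(p)] for a in range(p)]

# Additive inverse of x: the unique y with x + y = 0.
neg = {x: next(y for y in range(p) if (x + y) % p == 0) for x in range(p)}
# Multiplicative inverse of x != 0: the unique y with x * y = 1.
inv = {x: next(y for y in range(1, p) if (x * y) % p == 1) for x in range(1, p)}

print(add)  # [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
print(mul)  # [[0, 0, 0], [0, 1, 2], [0, 2, 1]]
print(neg)  # {0: 0, 1: 2, 2: 1}, i.e., -0 = 0, -1 = 2, -2 = 1
print(inv)  # {1: 1, 2: 2}, i.e., 1^{-1} = 1 and 2^{-1} = 2
```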
+ : F × F → F, · : F × F → F
First notice that any field has at least two elements, namely 0, 1 ∈ F, and
part of rule 9 is that 0 ≠ 1. Next, notice that rules 1–5 only involve addition,
while rules 6–10 only involve multiplication. The distributive law is the only
one that combines both addition and multiplication. In an Abstract Algebra
course, one studies various other mathematical notions that involve addition
and/or multiplication where only some of the rules above apply.
• The rational numbers Q, which are all numbers of the form p/q, where p ∈ Z = {. . . , −2, −1, 0, 1, 2, . . .} and q ∈ N = {1, 2, 3, . . .}. Again,
addition and multiplication are defined as usual. We assume that you are
familiar with this field as well. In fact, Q is a field that is also a subset of
the field R, with matching definitions for addition and multiplication. We
say that Q is a subfield of R.
• The complex numbers C, which consist of numbers a + bi, where a, b ∈ R and i² = −1. We will dedicate the next subsection to this field.
• The finite fields Zp , where p is a prime number. We already introduced
you to Z3 , and later in this section we will see how for any prime number
one can define a field Zp , where addition and multiplication are defined
via the usual addition and multiplication of integers followed by taking
the remainder after division by p.
• The field R(t) of rational functions with real coefficients and independent variable t. This field consists of functions r(t)/s(t), where r(t) and s(t) are polynomials in t, with s(t) not being the constant 0 polynomial. For instance,

$$\frac{13t^2 + 5t - 8}{t^8 - 3t^5}, \qquad \frac{5t^{10} - 27}{t + 5} \tag{1.1}$$

are elements of R(t). Addition and multiplication are defined as usual. We are going to assume that you will be able to work with this field. The only thing that requires some special attention is to think about the neutral elements. Indeed, the 0 in this field is the constant function 0, where r(t) ≡ 0 for all t and s(t) ≡ 1 for all t. The 1 in this field is the constant function 1, where r(t) ≡ 1 for all t and s(t) ≡ 1 for all t. Now sometimes, these elements appear in “hidden” form, for instance,

$$\frac{0}{t+1} \equiv 0, \qquad \frac{t+5}{t+5} \equiv 1.$$

In calculus you had to worry that (t+5)/(t+5) is not defined at t = −5, but in this setting we always automatically get rid of common factors in the numerator and denominator. More formally, R(t) is defined as the field of quotients r(t)/s(t), where r(t) and s(t) ≢ 0 are polynomials in t that do not have a common factor. If one insists on uniqueness in the representation r(t)/s(t), one can, in addition, require that s(t) is monic, which means that the highest power of t has coefficient 1 (as is the case in (1.1)).
C = {a + bi ; a, b ∈ R},
where in the last step we used that i² = −1. It may be obvious, but we
should state it clearly anyway: two complex numbers a + bi and c + di, with
a, b, c, d ∈ R are equal if and only if a = c and b = d. A typical complex
number may be denoted by z or w. When
z = a + bi with a, b ∈ R,
we say that the real part of z equals a and the imaginary part of z equals b.
The notation for this is,
Re z = a, Im z = b.
$$\overline{2 + 3i} = 2 - 3i, \qquad \overline{\frac{1}{2} + \frac{6i}{5}} = \frac{1}{2} - \frac{6i}{5}.$$

Thus, we have

$$\operatorname{Re} \bar{z} = \operatorname{Re} z, \qquad \operatorname{Im} \bar{z} = -\operatorname{Im} z.$$
Finally, we introduce the absolute value or modulus of z, via

$$|a + bi| := \sqrt{a^2 + b^2}, \qquad a, b \in \mathbb{R}.$$

For example,

$$|1 + 3i| = \sqrt{10}, \qquad \left|\frac{1}{2} - \frac{i}{2}\right| = \sqrt{\frac{1}{4} + \frac{1}{4}} = \frac{\sqrt{2}}{2}.$$

Note that we have the rule

$$z\bar{z} = |z|^2,$$

as observed in (1.3), and its consequence

$$\frac{1}{z} = \frac{\bar{z}}{|z|^2}$$

when z ≠ 0.
Using the rules for cos(t + s) and sin(t + s), one can easily check that
Addition and multiplication in the field Zp are based on the following result
you discovered in elementary school when you did long division.
Proposition 1.3.1 For every q ∈ Z and every p ∈ {2, 3, . . .}, there exist unique a ∈ Z and r ∈ {0, 1, . . . , p − 1} so that
q = ap + r.
Let now p be a prime number, and let Zp = {0, 1, . . . , p − 1}. Define the
addition and multiplication
+ : Zp × Zp → Zp , · : Zp × Zp → Zp
via
a + b := rem(a + b|p), a · b := rem(ab|p). (1.4)
Proposition 1.3.1 guarantees that for any integer q we have that
rem(q|p) ∈ {0, . . . , p − 1} = Zp , so that the closure rules are clearly satisfied.
Also, as expected, 0 and 1 are easily seen to be the neutral elements for
addition and multiplication, respectively. Next, the additive inverse −a of a
is easily identified via
$$-a = \begin{cases} a & \text{if } a = 0, \\ p - a & \text{if } a \in \{1, \dots, p-1\}. \end{cases}$$
The trickier part is the multiplicative inverse, and here we are going to use
that p is prime. We need to remind you of the following rule for the greatest
common divisor gcd(a, b) of two integers a and b, not both zero.
1 = 5 − 2 · 2 = 5 − 2(17 − 3 · 5) = −2 · 17 + 7 · 5,
and find that with the choices m = −2 and n = 7 we have solved (1.5).
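The computation above is the extended Euclidean algorithm. A short Python sketch (not part of the text) that produces the coefficients m and n in the Bezout equation, and hence multiplicative inverses in Zp:

```python
def extended_gcd(a, b):
    """Return (g, m, n) with g = gcd(a, b) and g = m*a + n*b."""
    if b == 0:
        return a, 1, 0
    g, m, n = extended_gcd(b, a % b)
    # g = m*b + n*(a % b) = n*a + (m - (a // b)*n)*b
    return g, n, m - (a // b) * n

print(extended_gcd(17, 5))  # (1, -2, 7): 1 = -2*17 + 7*5, as above

# In Z_p with p prime, a^{-1} is m mod p, where 1 = m*a + n*p.
def inv_mod(a, p):
    g, m, _ = extended_gcd(a, p)
    assert g == 1
    return m % p

print(inv_mod(5, 17))  # 7, since 5*7 = 35 = 2*17 + 1
```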
As said, the trickiest part of the proof of Theorem 1.3.4 is the existence of a
multiplicative inverse, so the remainder of the proof we leave to the reader.
All the matrix algebra techniques that you learned in the first Linear
Algebra course carry over to any field. Indeed, these algebra techniques were
One notable exception where R differs from the other fields we are considering is that R is an ordered field (that is, ≥ defines an order relation on pairs of real numbers that satisfies x ≥ y ⇒ x + z ≥ y + z and x, y ≥ 0 ⇒ xy ≥ 0). So anytime we want to use ≤, <, ≥ or >, we will have to make sure we are dealing with real numbers. We will do this when we talk about inner products and related concepts in Chapter 5.
Example 1.4.3 Let F = Z5. Let us put the matrix

$$\begin{pmatrix} 1 & 0 & 2 \\ 2 & 3 & 1 \\ 1 & 4 & 0 \end{pmatrix}$$

in row echelon form. We start with the (1,1) element as our first pivot.

$$\begin{pmatrix} 1 & 0 & 2 \\ 2 & 3 & 1 \\ 1 & 4 & 0 \end{pmatrix} \to \begin{pmatrix} 1 & 0 & 2 \\ 0 & 3 & (1-4=)\,2 \\ 0 & 4 & (0-2=)\,3 \end{pmatrix}.$$

Next, let us multiply the second row with 3⁻¹ = 2, and use the (2,2) entry as our next pivot:

$$\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 4 \\ 0 & 4 & 3 \end{pmatrix} \to \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 4 \\ 0 & 0 & (3-1=)\,2 \end{pmatrix},$$

bringing it to row echelon form. After having done this, we can now also easily compute

$$\det \begin{pmatrix} 1 & 0 & 2 \\ 2 & 3 & 1 \\ 1 & 4 & 0 \end{pmatrix} = 3 \det \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 4 \\ 0 & 0 & 2 \end{pmatrix} = 3 \cdot 2 = 1.$$
Example 1.4.4 Let F = Z3 . Find the set of all solutions to the system of
linear equations
$$\begin{cases} x_1 + 2x_2 = 0 \\ x_1 + x_2 + x_3 = 1. \end{cases}$$

We set up the associated augmented system and put it in row reduced echelon form:

$$\left(\begin{array}{ccc|c} 1 & 2 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 2 & 0 & 0 \\ 0 & 2 & 1 & 1 \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 0 & 2 & 2 \\ 0 & 1 & 2 & 2 \end{array}\right).$$

We find that columns 1 and 2 are pivot columns, and column 3 is not, so x₃ is a free variable, and we get the equalities x₁ = 2 − 2x₃ = 2 + x₃, x₂ = 2 − 2x₃ = 2 + x₃. So we find that all solutions are given by

$$x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ 0 \end{pmatrix} + x_3 \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \qquad x_3 \in \mathbb{Z}_3.$$
In a typical Linear Algebra I course, systems of linear equations would be
over the field of real numbers, and as soon as there was a free variable, one
would have infinitely many solutions. This is due to R being an infinite field.
In this example, though, we are dealing with a finite field, and thus when we
let x3 range over all elements of Z3 , we only get a finite number of solutions.
This will happen when dealing with any finite field. In this case, all solutions
are found by letting x₃ = 0, 1, 2; thus we get that

$$\begin{pmatrix} 2 \\ 2 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}$$

is the complete set of solutions.
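Because the field is finite, the full solution set can also be found by exhaustive search. A brute-force sketch (not from the text), assuming Python:

```python
from itertools import product

# Enumerate Z_3^3 and keep the triples satisfying both equations mod 3.
solutions = [(x1, x2, x3)
             for x1, x2, x3 in product(range(3), repeat=3)
             if (x1 + 2 * x2) % 3 == 0 and (x1 + x2 + x3) % 3 == 1]
print(solutions)  # [(0, 0, 1), (1, 1, 2), (2, 2, 0)] -- exactly three solutions
```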
We set up the augmented system and put it in echelon form:

$$\left(\begin{array}{ccc|c} 1+i & 0 & -1+i & 2i \\ -1-i & 2-i & 3-2i & 2-3i \\ 2 & -1+2i & -1+4i & 1 \\ 0 & 3 & 3 & 3+i \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 0 & i & 1+i \\ -1-i & 2-i & 3-2i & 2-3i \\ 2 & -1+2i & -1+4i & 1 \\ 0 & 3 & 3 & 3+i \end{array}\right) \to$$

$$\left(\begin{array}{ccc|c} 1 & 0 & i & 1+i \\ 0 & 2-i & 2-i & 2-i \\ 0 & -1+2i & -1+2i & -1-2i \\ 0 & 3 & 3 & 3+i \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 0 & i & 1+i \\ 0 & 1 & 1 & 1 \\ 0 & 1 & 1 & \frac{-3+4i}{5} \\ 0 & 3 & 3 & 3+i \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 0 & i & 1+i \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right).$$
As the augmented column has a pivot, b is not a linear combination of
a1 , a2 , a3 .
By Example 1.4.3 we know that this matrix is invertible, as every row and column has a pivot (or, equivalently, since its determinant is nonzero). Let us compute the inverse:

$$\left(\begin{array}{ccc|ccc} 1&0&2&1&0&0 \\ 2&3&1&0&1&0 \\ 1&4&0&0&0&1 \end{array}\right) \to \left(\begin{array}{ccc|ccc} 1&0&2&1&0&0 \\ 0&3&(1-4=)\,2&(0-2=)\,3&1&0 \\ 0&4&(0-2=)\,3&(0-1=)\,4&0&1 \end{array}\right) \to$$

$$\left(\begin{array}{ccc|ccc} 1&0&2&1&0&0 \\ 0&1&4&1&2&0 \\ 0&0&(3-1=)\,2&(4-4=)\,0&(0-3=)\,2&1 \end{array}\right) \to \left(\begin{array}{ccc|ccc} 1&0&2&1&0&0 \\ 0&1&4&1&2&0 \\ 0&0&1&0&1&3 \end{array}\right) \to$$

$$\left(\begin{array}{ccc|ccc} 1&0&0&1&0-2&0-1 \\ 0&1&0&1&2-4&0-2 \\ 0&0&1&0&1&3 \end{array}\right),$$

so the inverse is

$$\begin{pmatrix} 1 & 3 & 4 \\ 1 & 3 & 3 \\ 0 & 1 & 3 \end{pmatrix}.$$

Computing the product

$$\begin{pmatrix} 1&0&2 \\ 2&3&1 \\ 1&4&0 \end{pmatrix} \begin{pmatrix} 1&3&4 \\ 1&3&3 \\ 0&1&3 \end{pmatrix} = \begin{pmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{pmatrix},$$

we confirm the computation.
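For experimenting with such computations, a computer algebra system can do modular matrix arithmetic directly. A sketch using SymPy (an assumption; any exact-arithmetic library would do), whose Matrix.inv_mod method inverts an integer matrix modulo p:

```python
from sympy import Matrix

A = Matrix([[1, 0, 2], [2, 3, 1], [1, 4, 0]])
A_inv = A.inv_mod(5)        # inverse over Z_5
print(A_inv)                # Matrix([[1, 3, 4], [1, 3, 3], [0, 1, 3]])
print((A * A_inv) % 5)      # the identity matrix, entrywise mod 5
```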
Example 1.4.7 Let F = C. Find bases of the column space, row space and null space of the matrix

$$A = \begin{pmatrix} i & 1-i & 2-i \\ 1+i & -2 & -3+i \\ 1-i & 1+2i & 3+3i \end{pmatrix}.$$

For the row space, we simply have to pick the nonzero rows of the row echelon form of A, and thus we find that

$$\left\{ \begin{pmatrix} 1 & -1-i & -1-2i \end{pmatrix}, \begin{pmatrix} 0 & 1 & 2 \end{pmatrix} \right\}$$

is a basis for Row A. To find a basis for the null space, we put A in row reduced echelon form:

$$\begin{pmatrix} 1 & -1-i & -1-2i \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix} \to \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}.$$
Example 1.4.8 Let F = Z7. Find a basis for the eigenspace of

$$A = \begin{pmatrix} 4 & 0 & 6 \\ 3 & 0 & 3 \\ 2 & 5 & 5 \end{pmatrix}$$

corresponding to the eigenvalue λ = 3. We have to find a basis for the null space of

$$A - 3I = \begin{pmatrix} 1 & 0 & 6 \\ 3 & 4 & 3 \\ 2 & 5 & 2 \end{pmatrix}.$$
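Since Z7³ has only 343 elements, the eigenspace can even be found by brute force. A sketch (not from the text), assuming Python:

```python
from itertools import product

M = [[1, 0, 6], [3, 4, 3], [2, 5, 2]]   # the matrix A - 3I over Z_7
null_space = [v for v in product(range(7), repeat=3)
              if all(sum(M[i][j] * v[j] for j in range(3)) % 7 == 0
                     for i in range(3))]
print(null_space)  # 7 vectors: the scalar multiples of (1, 2, 1) over Z_7
```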
$$\begin{pmatrix} -6-6i \\ -8-4i + \alpha(-7+5i) \end{pmatrix}.$$

For det A ≠ 0, we need $\alpha \ne \frac{-8-4i}{7-5i} = -\frac{18}{37} - \frac{34i}{37}$. Next, notice that x₂ cannot
We get

$$\operatorname{adj}(A) = \begin{pmatrix} 3\cdot 0 - 1\cdot 4 & -0\cdot 0 + 2\cdot 4 & 0\cdot 1 - 2\cdot 3 \\ -2\cdot 0 + 1\cdot 1 & 1\cdot 0 - 2\cdot 1 & -1\cdot 1 + 2\cdot 2 \\ 2\cdot 4 - 3\cdot 1 & -1\cdot 4 + 0\cdot 1 & 1\cdot 3 - 0\cdot 2 \end{pmatrix} = \begin{pmatrix} 1 & 3 & 4 \\ 1 & 3 & 3 \\ 0 & 1 & 3 \end{pmatrix}.$$
where we expanded det Ai (aj ) along the ith column. This proves the second
equality in (1.10). The proof of the first equality in (1.10) is similar.
1.5 Exercises
Exercise 1.5.1 The set of integers Z with the usual addition and
multiplication is not a field. Which of the field axioms does Z satisfy, and
which one(s) are not satisfied?
Exercise 1.5.2 Write down the addition and multiplication tables for Z2
and Z5 . How is commutativity reflected in the tables?
Exercise 1.5.3 The addition and multiplication defined in (1.4) also works
when p is not prime. Write down the addition and multiplication tables for
Z4 . How can you tell from the tables that Z4 is not a field?
Exercise 1.5.4 Solve Bezout’s identity for the following choices of a and b:
(i) a = 25 and b = 7;
(i) 2 + 2 + 2 =
(ii) 2(2 + 2)⁻¹ =
(iii) Solve for x in 2x + 1 = 2.
(iv) Find $\det\begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix}$.
(v) Compute $\begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 2 & 1 \end{pmatrix}$.
(vi) Find $\begin{pmatrix} 2 & 0 \\ 1 & 1 \end{pmatrix}^{-1}$.
(i) 4 + 3 + 2 =
(ii) 4(1 + 2)⁻¹ =
(iii) Solve for x in 3x + 1 = 3.
(iv) Find $\det\begin{pmatrix} 4 & 2 \\ 1 & 0 \end{pmatrix}$.
(v) Compute $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 2 & 1 \end{pmatrix}$.
(vi) Find $\begin{pmatrix} 2 & 2 \\ 4 & 3 \end{pmatrix}^{-1}$.
Exercise 1.5.7 In this exercise we are working in the field C. Make sure you write the final answers in the form a + bi, with a, b ∈ R. For instance, $\frac{1+i}{2-i}$ should not be left as a final answer, but be reworked as

$$\frac{1+i}{2-i} = \frac{(1+i)(2+i)}{(2-i)(2+i)} = \frac{1+3i}{5} = \frac{1}{5} + \frac{3}{5}i.$$
Exercise 1.5.8 Here the field is R(t). Find the inverse of the matrix

$$\begin{pmatrix} 2 + 3t & \dfrac{1}{t^2 + 2t + 1} \\[4pt] t + 1 & \dfrac{3t - 4}{1 + t} \end{pmatrix},$$

if it exists.
Exercise 1.5.12 Let F = Z3 . Find the set of all solutions to the system of
linear equations
$$\begin{cases} 2x_1 + x_2 = 1 \\ 2x_1 + 2x_2 + x_3 = 0. \end{cases}$$
Exercise 1.5.15 Let F = C. Find bases of the column space, row space and
null space of the matrix
$$A = \begin{pmatrix} 1 & 1+i & 2 \\ 1+i & 2i & 3+i \\ 1-i & 2 & 3+5i \end{pmatrix}.$$
Exercise 1.5.16 Let F = Z7. Find a basis for the eigenspace of

$$A = \begin{pmatrix} 3 & 5 & 0 \\ 4 & 6 & 5 \\ 2 & 2 & 4 \end{pmatrix}$$

corresponding to the eigenvalue λ = 1.
Exercise 1.5.17 Let F = Z3 . Use Cramer’s rule to find the solution to the
system of linear equations
$$\begin{cases} 2x_1 + 2x_2 = 1 \\ x_1 + 2x_2 = 1. \end{cases}$$
Exercise 1.5.20 Recall that the trace of a square matrix is defined to be the sum of its diagonal entries. Thus $\operatorname{tr}[(a_{ij})_{i,j=1}^{n}] = a_{11} + \cdots + a_{nn} = \sum_{j=1}^{n} a_{jj}$.
Exercise 1.5.22 The 10-digit ISBN number makes use of the field
Z11 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, X} (notice that X is the roman numeral for
10). The first digit(s) present the group. For English-speaking countries, the
first digit is a 0 or a 1. The next set of digits represents the publisher. For
instance, Princeton University Press has the digits 691. Some of the bigger
publishers have a 2-digit publisher code, leaving them more digits for their
titles. The next set of digits represents the specific title. Finally, the last digit of the 10-digit ISBN number a₁a₂ . . . a₁₀ is a check digit, which needs to satisfy the equation

$$\operatorname{rem}\left(\sum_{i=1}^{10} i\,a_i \;\Big|\; 11\right) = 0.$$

For instance, for the 10-digit ISBN 0691128898 we get

$$1 \cdot 0 + 2 \cdot 6 + 3 \cdot 9 + 4 \cdot 1 + 5 \cdot 1 + 6 \cdot 2 + 7 \cdot 8 + 8 \cdot 8 + 9 \cdot 9 + 10 \cdot 8 = 341, \qquad \operatorname{rem}(341|11) = 0.$$
Check that the 10-digit ISBN number 3034806388 has a correct check digit.
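A sketch (not from the text) of this check in Python, where the letter X stands for the element 10 of Z11:

```python
def isbn10_check(isbn):
    """Return True when the weighted digit sum is 0 mod 11."""
    digits = [10 if ch == 'X' else int(ch) for ch in isbn]
    return sum(i * a for i, a in enumerate(digits, start=1)) % 11 == 0

print(isbn10_check("0691128898"))  # True: the computation displayed above
print(isbn10_check("3034806388"))  # run this to verify the exercise's ISBN
```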
If we do this for the whole sentence, putting the numbers in groups of three, adding spaces (= 0) at the end to make sure we have a multiple of three, we have that “Wow, he said. ” (notice the two spaces at the end) converts to “YODQTMHZYFMLYYG.” In order to decode, one performs the same algorithm with

$$A^{-1} = \begin{pmatrix} 9 & 9 & 10 \\ 17 & 27 & 21 \\ 4 & 7 & 27 \end{pmatrix}.$$

Decode the word “ZWNOWQJJZ.”
Exercise 1.5.24 (Honors) The field axioms imply several things that one
might take for granted but that really require a formal proof. In this
exercise, we address the uniqueness of the neutral element and the inverse.
(ii) Prove uniqueness of the additive inverse. To do this, one needs to show
that if x + y = 0 = x + z, it implies that y = z. Of course, it is tempting
to just remove the x’s from the equation x + y = x + z (as you are used
to), but the exact purpose of this exercise is to make you aware that
these familiar rules need to be reproven by exclusively using the field
axioms. So use exclusively the fields axioms to fill in the blanks:
y = y + 0 = y + (x + z) = · · · = · · · = · · · = z.
(i) 0, 1 ∈ K, and
(ii) x, y ∈ K implies that x + y, xy, and −x belong to K, and when x ≠ 0, x⁻¹ also belongs to K.
Exercise 1.5.26 (Honors) Let $\mathbb{Q} + \mathbb{Q}\sqrt{2} := \{a + b\sqrt{2} : a, b \in \mathbb{Q}\}$. So $\mathbb{Q} + \mathbb{Q}\sqrt{2}$ contains elements such as

$$-\frac{5}{6} + \frac{\sqrt{2}}{2} \quad \text{and} \quad \frac{1}{2+3\sqrt{2}} = \frac{1}{2+3\sqrt{2}} \cdot \frac{2-3\sqrt{2}}{2-3\sqrt{2}} = -\frac{1}{7} + \frac{3}{14}\sqrt{2}.$$

Show that $\mathbb{Q} + \mathbb{Q}\sqrt{2}$ is a subfield of R.
Formulate the statements about polynomials and their roots that would need
to be proven to show that A is closed under addition and multiplication. It
turns out that A is a subfield of C, and you are welcome to look up the proof.
2
Vector Spaces
CONTENTS
2.1 Definition of a vector space
2.2 Vector spaces of functions
2.2.1 The special case when X is finite
2.3 Subspaces and more examples of vector spaces
2.3.1 Vector spaces of polynomials
2.3.2 Vector spaces of matrices
2.4 Linear independence, span, and basis
2.5 Coordinate systems
2.6 Exercises
The foundation for linear algebra is the notion of a vector space over a field. Two operations are important in a vector space: (i) addition: any two elements in a vector space can be added together; (ii) multiplication by a scalar: an element in a vector space can be multiplied by a scalar (= an element of the field). Anytime one has mathematical objects where these two operations are well-defined and satisfy some basic properties, one has a vector space. Allowing this generality and developing a theory that just uses these basic rules leads to results that can be applied in many settings.
+ : V × V → V, · : F × V → V
These axioms imply several rules that seem “obvious,” but as all properties
in vector spaces have to be traced back to the axioms, we need to reprove
these obvious rules. Here are two such examples.
Lemma 2.1.1 Let V be a vector space over F. Then for all u ∈ V we have
that
(i) 0u = 0.
(ii) (−1)u = −u.
F^X := {f : X → F : f is a function}.
Thus, by virtue of the fact that F has a well-defined addition, the set F^X now also has a well-defined addition. It is a fine point, but it is important to recognize that in the equation

(f + g)(x) = f(x) + g(x)

the first + sign represents addition between functions, while the second + sign represents addition in F, so really the two +s are different. We still choose to use the same + sign for both, although technically we could have made them different ($+_{\mathbb{F}^X}$ and $+_{\mathbb{F}}$, say) and written
Again, let us make the fine point that there are two different multiplications
here, namely the multiplication of a scalar (i.e., an element of F) with a
function and the multiplication of two scalars. Again, if we want to highlight
this difference, one would write this for instance as
Proposition 2.2.1 The set F^X with the above definitions of addition and scalar multiplication is a vector space over the field F.
Checking all the vector space axioms is not hard. For instance, to check commutativity of addition, we have to show that f + g = g + f. This
introduces the question: When are two functions equal? The answer to this
is:
$$(f +_{\mathbb{F}^X} g)(x) = f(x) +_{\mathbb{F}} g(x) = g(x) +_{\mathbb{F}} f(x) = (g +_{\mathbb{F}^X} f)(x) \quad \text{for all } x \in X,$$
where in the first and third equality we applied the definition of the sum of
two functions, while in the middle equality we applied commutativity of
addition in F.
Notice that again we have two different mathematical objects: the constant
zero function (= the neutral element of addition in FX ) and the neutral
element of addition in F. If we want to highlight this difference, one would
write for instance:
$$0_{\mathbb{F}^X}(x) = 0_{\mathbb{F}} \quad \text{for all } x \in X.$$
Now all the ingredients are there to write a complete proof of Proposition 2.2.1. We already showed how to address the commutativity of addition, and as the proofs of the other rules are similar, we will leave them to the reader.
The case when X is a finite set is special in the sense that in this case we can simply write out all the values of the function. For instance, if X = {1, . . . , n}, then the function f : X → F simply corresponds to choosing elements f(1), . . . , f(n) ∈ F. Thus we can identify

$$f : \{1, \dots, n\} \to \mathbb{F} \quad \Leftrightarrow \quad \begin{pmatrix} f(1) \\ \vdots \\ f(n) \end{pmatrix} \in \mathbb{F}^n,$$

and under this identification addition and scalar multiplication act entrywise:

$$f + g \leftrightarrow \begin{pmatrix} f(1) + g(1) \\ \vdots \\ f(n) + g(n) \end{pmatrix}, \qquad cf \leftrightarrow \begin{pmatrix} cf(1) \\ \vdots \\ cf(n) \end{pmatrix}.$$
(i) 0 ∈ W .
(ii) W is closed under addition: for all w, y ∈ W , we have w + y ∈ W .
(iii) W is closed under scalar multiplication: for all c ∈ F and w ∈ W , we
have that cw ∈ W .
Proof. If W is a vector space, then (i), (ii) and (iii) are clearly satisfied.
For the converse, we need to check that when W satisfies (i), (ii) and (iii), it
satisfies all ten axioms in the definition of a vector space. Clearly properties
(i), (ii) and (iii) above take care of axioms 1, 4 and 6 in the definition of a
vector space. Axiom 5 follows from (iii) in combination with Lemma
2.1.1(ii). The other properties (associativity, commutativity, distributivity,
unit multiplication) are satisfied as they hold for all elements of V , and thus
also for elements of W .
(i)’ W ≠ ∅.
U ∩ W := {v ∈ V : v ∈ U and v ∈ W }.
$$+_{j=1}^{k} U_j = U_1 + \cdots + U_k = \{u_1 + \cdots + u_k : u_j \in U_j,\ j = 1, \dots, k\}.$$
Proposition 2.3.3 Consider the direct sum U1 +̇ · · · +̇Uk , then for every
v ∈ U1 +̇ · · · +̇Uk there exists unique uj ∈ Uj , j = 1, . . . , k, so that
v = u1 + · · · + uk . In particular, if uj ∈ Uj , j = 1, . . . , k, are so that
u1 + · · · + uk = 0, then uj = 0, j = 1, . . . , k.
−(uj − ûj ) = (u1 − û1 ) + · · · + (uj−1 − ûj−1 ) + (uj+1 − ûj+1 ) + · · · + (uk − ûk )
p(X) = 1 + X + 0X² + 0X³ + 0X⁴ + 0X⁵, q(X) = 1 + 0X + 2X² + 0X³ + 0X⁴ − X⁵.
Proposition 2.3.4 The set F[X] with the above defined addition and scalar
multiplication, is a vector space over F.
Given two equal polynomials p(X), q(X) ∈ F[X], obviously p(x) = q(x)
for all x ∈ F. However, the converse is not always the case, as the following
example shows.
$$p(A) = p_0 I_m + p_1 A + p_2 A^2 + \cdots + p_n A^n \in \mathbb{F}^{m \times m},$$
Proposition 2.3.6 Two polynomials p(X), q(X) ∈ F[X] are equal if and only if for all m ∈ N and all A ∈ F^{m×m} we have p(A) = q(A).
$$(a_{i,j})_{i=1,j=1}^{n,m} + (b_{i,j})_{i=1,j=1}^{n,m} = (a_{i,j} + b_{i,j})_{i=1,j=1}^{n,m}, \qquad c\,(a_{i,j})_{i=1,j=1}^{n,m} = (c\,a_{i,j})_{i=1,j=1}^{n,m}.$$
Proposition 2.3.8 The set Fn×m with the above definitions of addition and
scalar multiplication is a vector space over F.
The notion of a basis is a crucial one; it basically singles out a few elements in
the vector space with which we can reconstruct the whole vector space. For
example, the monomials 1, X, X 2 , . . . form a basis of the vector space of
polynomials. When we start to do certain (namely, linear) operations on
elements of a vector space, we will see in the next chapter that it will suffice
to know how these operations act on the basis elements. Differentiation is an
example: as soon as we know that the derivatives of 1, X, X 2 , X 3 , . . . are
0, 1, 2X, 3X 2 , . . ., respectively, it is easy to find the derivative of a
polynomial. Before we get to the notion of a basis, we first need to introduce
linear independence and span.
c1 v1 + c2 v2 + · · · + cp vp = 0, (2.2)
As

$$\det \begin{pmatrix} 1 & e^0 & 0 \\ 0 & e^{\pi/2} & \frac{\pi^2}{4} \\ 0 & e^{-\pi/2} & \frac{\pi^2}{4} \end{pmatrix} \neq 0,$$

we get that we must have c₁ = c₂ = c₃ = 0. Thus linear independence of {cos(x), eˣ, x²} follows.
Let us also consider the set of vectors {1, cos(x), sin(x), cos²(x), sin²(x)}. We claim this set is linearly dependent, as the nontrivial choice c₁ = 1, c₂ = 0, c₃ = 0, c₄ = −1, c₅ = −1 gives the linear dependence relation

$$1 \cdot 1 + 0 \cdot \cos(x) + 0 \cdot \sin(x) - \cos^2(x) - \sin^2(x) = 0.$$
Rewriting, we get

$$\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 2 \\ 2 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}. \tag{2.3}$$

Bringing this 4 × 3 matrix in row echelon form gives

$$\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 2 \\ 2 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \to \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 2 \\ 0 & 2 & 1 \\ 0 & 0 & 1 \end{pmatrix} \to \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$

As there are pivots in all columns, the system (2.3) only has the trivial solution c₁ = c₂ = c₃ = 0. Thus S is linearly independent.
Next, consider

$$\hat{S} = \left\{ \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix} \right\}.$$

Following the same reasoning as above we arrive at the system

$$\begin{pmatrix} 1 & 1 & 2 \\ 0 & 1 & 1 \\ 2 & 1 & 0 \\ 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}. \tag{2.4}$$
$$p(X) = c_1(X - 1) + c_2(X^2 - 2X + 1) + c_3(X^3 - 3X^2 + 3X - 1),$$

for some c₁, c₂, c₃ ∈ Z5. As two polynomials are equal if and only if all the coefficients are equal, we arrive at the following set of equations:

$$\begin{cases} -c_1 + c_2 - c_3 = p_0 \\ c_1 - 2c_2 + 3c_3 = p_1 \\ c_2 - 3c_3 = p_2 \\ c_3 = p_3. \end{cases}$$

$$p(X) = (p_1 + 2p_2 + 3p_3)(X - 1) + (p_2 + 3p_3)(X^2 - 2X + 1) + p_3(X^3 - 3X^2 + 3X - 1).$$
If S has a finite number of elements, then any other basis of W will have the same number of elements, as the following result shows.
wj = a1j v1 + · · · + anj vn , j = 1, . . . , m.
$$\sum_{j=1}^{m} c_j w_j = \sum_{j=1}^{m} \left[ c_j \left( \sum_{i=1}^{n} a_{ij} v_i \right) \right] = \sum_{i=1}^{n} \left( \sum_{j=1}^{m} a_{ij} c_j \right) v_i = \sum_{i=1}^{n} 0\, v_i = 0.$$
Remark 2.4.5 Notice that the proof of Proposition 2.4.4 also shows that in
an n-dimensional vector space any set of vectors with more than n elements
must be linearly dependent.
Example 2.4.6 Let $W = \{p(X) \in \mathbb{R}_2[X] : \int_{-1}^{1} p(x)\,dx = 0\}$. Show that W is a subspace of R₂[X] and find a basis for W.
Next, one easily checks that {X, X² − 1/3} is linearly independent, and thus {X, X² − 1/3} is a basis for W. In particular, dim W = 2.
This gives that x₄ is free and $x_3 = -\frac{x_4}{2}$. Plugging this into the right-hand side of (2.7) gives

$$-\frac{x_4}{2}\begin{pmatrix} 4 \\ 2 \\ 2 \\ 0 \end{pmatrix} + x_4\begin{pmatrix} 2 \\ 0 \\ 2 \\ 0 \end{pmatrix} = x_4\begin{pmatrix} 0 \\ -1 \\ 1 \\ 0 \end{pmatrix}$$

as a typical element of U ∩ W. So

$$\left\{ \begin{pmatrix} 0 \\ -1 \\ 1 \\ 0 \end{pmatrix} \right\}$$

is a basis for U ∩ W.
Notice that

$$U + W = \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 0 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 4 \\ 2 \\ 2 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 \\ 0 \\ 2 \\ 0 \end{pmatrix} \right\}.$$

From the row reductions above, we see that the fourth vector is a linear combination of the first three, while the first three are linearly independent. Thus a basis for U + W is

$$\left\{ \begin{pmatrix} 1 \\ 0 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 4 \\ 2 \\ 2 \\ 0 \end{pmatrix} \right\}.$$
Notice that the vectors $e_1 = (1, 0, \dots, 0)^T, \dots, e_n = (0, \dots, 0, 1)^T$ form a basis for F^n. Thus dim F^n = n. We call this the standard basis for F^n.
a1 e1 + b1 (ie1 ) + · · · + an en + bn (ien ) = 0,
These bases are all referred to as the “standard” basis for their respective
vector space.
We let Ejk be the matrix with all zero entries except for the (j, k) entry
which equals 1. We expect the size of the matrices Ejk to be clear from the
context in which we use them.
These bases are all referred to as the standard basis for their respective
vector space.
We will see in this section that any n-dimensional vector space over F
“works the same” as Fn , which simplifies the study of such vector spaces
tremendously. To make this idea more precise, we have to discuss coordinate
systems. We start with the following result.
v = c1 v1 + · · · + cn vn . (2.9)
some d₁, . . . , dₙ ∈ F. Then

$$0 = v - v = \sum_{j=1}^{n} c_j v_j - \sum_{j=1}^{n} d_j v_j = (c_1 - d_1)v_1 + \cdots + (c_n - d_n)v_n.$$
Clearly, when $v = c_1v_1 + \cdots + c_nv_n$ and $w = d_1v_1 + \cdots + d_nv_n$, then $v + w = \sum_{j=1}^{n} (c_j + d_j)v_j$, and thus

$$[v+w]_B = \begin{pmatrix} c_1 + d_1 \\ \vdots \\ c_n + d_n \end{pmatrix} = [v]_B + [w]_B.$$

Similarly,

$$[\alpha v]_B = \begin{pmatrix} \alpha c_1 \\ \vdots \\ \alpha c_n \end{pmatrix} = \alpha [v]_B.$$
Thus adding two vectors in V corresponds to adding their corresponding
coordinate vectors (which are both with respect to the basis B), and
multiplying a vector by a scalar in V corresponds to multiplying the
corresponding coordinate vector by the same scalar. As we will see in the
next chapter, the map v 7→ [v]B is a bijective linear map (also called an
isomorphism). This map allows one to view an n-dimensional vector space V
over F as essentially being the vector space Fn .
Example 2.5.2 Let V = Z7³ and

$$B = \left\{ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix} \right\}. \qquad \text{Let } v = \begin{pmatrix} 6 \\ 5 \\ 4 \end{pmatrix}.$$

Find [v]_B.
Denoting $[v]_B = \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}$, we need to solve for c₁, c₂, c₃ in the vector equation

$$c_1 \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + c_3 \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix} = \begin{pmatrix} 6 \\ 5 \\ 4 \end{pmatrix}.$$
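Over Z7 this can again be checked by brute force. A sketch (not from the text), assuming Python:

```python
from itertools import product

basis = [(1, 1, 1), (1, 2, 3), (1, 3, 6)]
v = (6, 5, 4)
# Search Z_7^3 for coordinates with c1*b1 + c2*b2 + c3*b3 = v (mod 7).
for c in product(range(7), repeat=3):
    if all(sum(c[j] * basis[j][i] for j in range(3)) % 7 == v[i]
           for i in range(3)):
        print(c)  # since B is a basis, exactly one triple is printed
```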
$$c_1 \cdot 1 + c_2(X - 1) + c_3(X^2 - 2X + 1) + c_4(X^3 - 3X^2 + 3X - 1) = X^3 + X^2 + X + 1.$$
2.6 Exercises
Exercise 2.6.1 For the proof of Lemma 2.1.1 provide a reason why each
equality holds. For instance, the equality 0 = 0u + v is due to Axiom 5 in
the definition of a vector space and v being the additive inverse of 0u.
Exercise 2.6.3 When the underlying field is Zp , why does closure under
addition automatically imply closure under scalar multiplication?
(a) W = {f : R → R : f is continuous}.
(b) W = {f : R → R : f is differentiable}.
(f) F = C, V = C³, and

$$W = \left\{ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in \mathbb{C}^3 : x_1 - x_2 = x_3 - x_2 \right\}.$$
Exercise 2.6.6 For the following vector spaces (V over F) and vectors, determine whether the vectors are linearly independent or linearly dependent.
W = {p ∈ V : p(2) = 0}.
(a) {v1 + v2 + v3 + v4 , v1 − v2 + v3 − v4 , v1 − v2 − v3 − v4 }.
(b) {v1 + v2 , v2 + v3 , v3 + v4 , v4 + v5 , v5 + v2 }.
(c) {v1 + v3 , v4 − v2 , v5 + v1 , v4 − v2 , v5 + v3 , v1 + v2 }.
When you did this exercise, did you make any assumptions on the
underlying field?
Exercise 2.6.12
(a) {v1 + v2 + v3 + v4 , v1 − v2 + v3 − v4 , v1 − v2 − v3 − v4 }.
(b) {v1 , v2 + v3 + v4 , v1 − v2 + v3 − v4 , v1 − v2 − v3 − v4 }.
(c) {v1 + v2 + v3 + v4 , v1 − v2 + v3 − v4 , v1 − v2 − v3 − v4 , v2 + v4 , v1 + v3 }.
Exercise 2.6.13
For the following choices of vector spaces V over the field F, bases B and
vectors v, determine [v]B .
(b) Let F = R, B = {t, t², 1/t}, V = Span B and v = (t³ + 3t² + 5)/t.
(c) Let F = C, V = C^{2×2},

$$B = \left\{ \begin{pmatrix} 0 & 1 \\ -1 & -i \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}, \begin{pmatrix} i & 0 \\ -1 & -i \end{pmatrix}, \begin{pmatrix} i & 1 \\ -1 & -i \end{pmatrix} \right\}, \qquad v = \begin{pmatrix} -2+i & 3-2i \\ -5-i & 10 \end{pmatrix}.$$
Show that Pn with the operations ⊕ and ◦ is a vector space over R. Why is Pn not a subspace of R^n?

Hint: observe that $\begin{pmatrix} 1/n \\ \vdots \\ 1/n \end{pmatrix}$ is the neutral element for ⊕.
v ∼ v̂ ⇔ v − v̂ ∈ W.
Let
v + W := {v̂ : v ∼ v̂}
denote the equivalence class of v ∈ V , and let
V /W := {v + W : v ∈ V }
3
Linear Transformations

CONTENTS
3.1 Definition of a linear transformation
3.2 Range and kernel of linear transformations
3.3 Matrix representations of linear maps
3.4 Exercises
Now that we have introduced vector spaces, we can move on to the next
main object in linear algebra: linear transformations. These are functions
between vector spaces that behave nicely with respect to the two
fundamental operations on a vector space: addition and scalar
multiplication. Differentiation and taking integrals are two important examples of linear transformations. Regarding the nice behavior, note for example that taking the derivative of the sum of two functions gives the same result as taking the derivative of each function and then adding. Let us start with the precise definition.
When T is linear, we must have that T (0) = 0. Indeed, by using (ii) we have
T (0) = T (0 · 0) = 0T (0) = 0, where in the first and last step we used
Lemma 2.1.1.
Then

$$T\left( \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \right) = T\begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \end{pmatrix} = \begin{pmatrix} 2(x_1 + y_1) + x_2 + y_2 \\ x_1 + y_1 + x_2 + y_2 \\ x_2 + y_2 \end{pmatrix} = \begin{pmatrix} 2x_1 + x_2 \\ x_1 + x_2 \\ x_2 \end{pmatrix} + \begin{pmatrix} 2y_1 + y_2 \\ y_1 + y_2 \\ y_2 \end{pmatrix} = T\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + T\begin{pmatrix} y_1 \\ y_2 \end{pmatrix},$$

and

$$T\left( c\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \right) = T\begin{pmatrix} cx_1 \\ cx_2 \end{pmatrix} = \begin{pmatrix} 2cx_1 + cx_2 \\ cx_1 + cx_2 \\ cx_2 \end{pmatrix} = c\begin{pmatrix} 2x_1 + x_2 \\ x_1 + x_2 \\ x_2 \end{pmatrix} = cT\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$

Thus T is linear.
Then

$$T(1) = \begin{pmatrix} 1 \\ 1 \\ 3 \\ 1 \end{pmatrix}, \qquad T(2) = \begin{pmatrix} 2 \\ 4 \\ 6 \\ 2 \end{pmatrix} \ne 2\begin{pmatrix} 1 \\ 1 \\ 3 \\ 1 \end{pmatrix} = 2T(1),$$

thus T fails to satisfy (ii) above. Thus T is not linear.
Notice that in order to show that a function is not linear, one only needs to
provide one example where the above rules (i) or (ii) are not satisfied.
We have the following general result, from which linearity in Example 3.1.1
directly follows.
With a linear transformation there are two subspaces associated with it: the
range (which lies in the co-domain) and the kernel (which lies in the
domain). These subspaces provide us with crucial information about the
linear transformation. We start with discussing the range.
Proof. First observe that T (0) = 0 gives that 0 ∈ Ran T . Next, let w,
ŵ ∈ Ran T and c ∈ F. Then there exist v, v̂ ∈ V so that T (v) = w and
T (v̂) = ŵ. Then w + ŵ = T (v + v̂) ∈ Ran T and cw = T (cv) ∈ Ran T .
Thus, by Proposition 2.3.1, Ran T is a subspace of W .
Thus Ran T ⊆ Span{T (v1 ), . . . , T (vp )}. We have shown both inclusions, and
consequently Ran T = Span{T (v1 ), . . . , T (vp )} follows.
Example 3.1.1 continued. As the standard basis {e₁, e₂} is a basis for Z3², we have that

$$\operatorname{Ran} T = \operatorname{Span}\{T(e_1), T(e_2)\} = \operatorname{Span}\left\{ \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\}.$$

In fact, as these two vectors are linearly independent, they form a basis for Ran T. The map T is not onto as dim Ran T = 2, while dim W = dim Z3³ = 3, and thus Ran T ≠ Z3³.
Proof. First observe that T (0) = 0 gives that 0 ∈ Ker T . Next, let v,
v̂ ∈ Ker T and c ∈ F. Then T (v + v̂) = T (v) + T (v̂) = 0 + 0 = 0 and
T (cv) = cT (v) = c0 = 0, so v + v̂, cv ∈ Ker T . Thus, by Proposition 2.3.1,
Ker T is a subspace of V .
Lemma 3.2.3 The linear map T is one-to-one if and only if Ker T = {0}.
Next, suppose that Ker T = {0}, and let T (v) = T (w). Then, using linearity
we get 0 = T (v) − T (w) = T (v − w), implying that v − w ∈ Ker T = {0},
and thus v − w = 0. Thus v = w, and we can conclude that T is one-to-one.
Notice that
dim Ker T + dim Ran T = 2 + 2 = 4 = dim R3 [X].
As the next result shows, this is not a coincidence.
Let v ∈ V. Then T(v) ∈ Ran T, and thus there exist b₁, . . . , b_q so that $T(v) = \sum_{j=1}^{q} b_j w_j$. Then

$$T\left(v - \sum_{j=1}^{q} b_j x_j\right) = T(v) - \sum_{j=1}^{q} b_j w_j = 0.$$

Thus $v - \sum_{j=1}^{q} b_j x_j \in \operatorname{Ker} T$. Therefore, there exist a₁, . . . , a_p ∈ F so that $v - \sum_{j=1}^{q} b_j x_j = \sum_{j=1}^{p} a_j v_j$. Consequently, $v = \sum_{j=1}^{p} a_j v_j + \sum_{j=1}^{q} b_j x_j \in \operatorname{Span} B$. This proves that V = Span B.

It remains to show that B is linearly independent, so assume $\sum_{j=1}^{p} a_j v_j + \sum_{j=1}^{q} b_j x_j = 0$. Then

$$0 = T\left(\sum_{j=1}^{p} a_j v_j + \sum_{j=1}^{q} b_j x_j\right) = \sum_{j=1}^{p} a_j T(v_j) + \sum_{j=1}^{q} b_j T(x_j) = \sum_{j=1}^{q} b_j w_j,$$
Next suppose T −1 (w) = v and T −1 (ŵ) = v̂. This means that T (v) = w and
T (v̂) = ŵ. Thus T (v + v̂) = w + ŵ. But then, by definition,
T −1 (w + ŵ) = v + v̂ and, consequently, T −1 (w + ŵ) = T −1 (w) + T −1 (ŵ).
Similarly, one proves T −1 (cw) = cT −1 (w). Thus T −1 is linear.
Thus Ker T = {0}, giving that T is one-to-one. Next, let $\begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} \in \mathbb{F}^n$. Put $v = \sum_{j=1}^{n} c_j v_j$. Then $T(v) = [v]_B = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} \in \operatorname{Ran} T$. This shows that Ran T = F^n, and thus T is onto.
The following results show that any linear map between finite-dimensional
spaces allows a matrix representation with respect to chosen bases. The
significance of this result is that one can study linear maps between
finite-dimensional spaces by studying matrices.
Conversely, if $A = (a_{ij})_{i=1,j=1}^{m,n} \in \mathbb{F}^{m \times n}$ is given, then defining T : V → W via (3.3) and extending by linearity via $T(\sum_{j=1}^{n} c_j v_j) := \sum_{j=1}^{n} c_j T(v_j)$ yields a linear map T : V → W with matrix representation $[T]_{C \leftarrow B} = A$.
then

$$w = T(v) = \sum_{j=1}^{n} c_j T(v_j) = \sum_{j=1}^{n} c_j \left( \sum_{k=1}^{m} a_{kj} w_k \right) \quad \Leftrightarrow \quad [w]_C = \begin{pmatrix} \sum_{j=1}^{n} a_{1j} c_j \\ \vdots \\ \sum_{j=1}^{n} a_{mj} c_j \end{pmatrix} = (a_{ij})_{i=1,j=1}^{m,n} \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}.$$
Compute

$$T(E_{11}) = \begin{pmatrix} i & 3i \\ 3i & 9i \end{pmatrix} = iE_{11} + 3iE_{12} + 3iE_{21} + 9iE_{22},$$

$$T(E_{12}) = \begin{pmatrix} 5i & 7i \\ 15i & 21i \end{pmatrix} = 5iE_{11} + 7iE_{12} + 15iE_{21} + 21iE_{22},$$

$$T(E_{21}) = \begin{pmatrix} 2i & 6i \\ 4i & 12i \end{pmatrix} = 2iE_{11} + 6iE_{12} + 4iE_{21} + 12iE_{22},$$

$$T(E_{22}) = \begin{pmatrix} 10i & 14i \\ 20i & 28i \end{pmatrix} = 10iE_{11} + 14iE_{12} + 20iE_{21} + 28iE_{22}.$$

This gives that

$$[T]_{B \leftarrow B} = \begin{pmatrix} i & 5i & 2i & 10i \\ 3i & 7i & 6i & 14i \\ 3i & 15i & 4i & 20i \\ 9i & 21i & 12i & 28i \end{pmatrix}.$$
Compute

$$\operatorname{id}_V \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix} = 2\begin{pmatrix} 0 \\ 2 \\ 4 \end{pmatrix} + 1\begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} + 1\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix},$$

$$\operatorname{id}_V \begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix} = 1\begin{pmatrix} 0 \\ 2 \\ 4 \end{pmatrix} + 0\begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} + 1\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix},$$

$$\operatorname{id}_V \begin{pmatrix} 1 \\ 4 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 4 \\ 1 \end{pmatrix} = 2\begin{pmatrix} 0 \\ 2 \\ 4 \end{pmatrix} + 1\begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} + 0\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}.$$

This gives that

$$[\operatorname{id}_V]_{C \leftarrow B} = \begin{pmatrix} 2 & 1 & 2 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}.$$
The next result shows that composition of linear maps corresponds to matrix
multiplication of the matrix representation, when the bases match. Please be
reminded that the composition is defined via (S ◦ T )(x) = S(T (x)).
Proof. Denote $[T]_{C \leftarrow B} = (a_{ij})_{i=1,j=1}^{m,n}$, $[S]_{D \leftarrow C} = (b_{li})_{l=1,i=1}^{p,m}$, and $[S \circ T]_{D \leftarrow B} = (c_{lj})_{l=1,j=1}^{p,n}$. Then

$$(S \circ T)(v_j) = S(T(v_j)) = S\left( \sum_{i=1}^{m} a_{ij} w_i \right) = \sum_{i=1}^{m} a_{ij} S(w_i) = \sum_{i=1}^{m} \left[ a_{ij} \sum_{l=1}^{p} b_{li} x_l \right] = \sum_{l=1}^{p} \left( \sum_{i=1}^{m} b_{li} a_{ij} \right) x_l, \quad j = 1, \dots, n.$$

Thus we get that $c_{lj} = \sum_{i=1}^{m} b_{li} a_{ij}$, l = 1, . . . , p, j = 1, . . . , n, which corresponds exactly to (3.5).
As the matrices involved are all square, we can now conclude that (3.6)
holds.
Let us check:

$$\operatorname{id}_V \begin{pmatrix} 0 \\ 2 \\ 4 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \\ 4 \end{pmatrix} = 4\begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix} + 1\begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix} + 1\begin{pmatrix} 1 \\ 4 \\ 1 \end{pmatrix},$$

$$\operatorname{id}_V \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} = 2\begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix} + 3\begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix} + 4\begin{pmatrix} 1 \\ 4 \\ 1 \end{pmatrix},$$

$$\operatorname{id}_V \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} = 1\begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix} + 0\begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix} + 4\begin{pmatrix} 1 \\ 4 \\ 1 \end{pmatrix},$$

confirming that our calculations were correct.
A = PBP⁻¹.
3.4 Exercises
(a) V = R³, W = R⁴,

$$T\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} x_1 - 5x_3 \\ 7x_2 + 5 \\ 3x_1 - 6x_2 \\ 8x_3 \end{pmatrix}.$$
(g) V = {f : R → R : f is continuous}, W = R,

$$T(f) = \int_{-5}^{10} f(x)\,dx.$$
determine S(v1 − v4 ).
(c) Determine bases for Ran S and Ker S.
Exercise 3.4.7 Consider the linear map T : R₂[X] → R² given by

$$T(p(X)) = \begin{pmatrix} p(1) \\ p(3) \end{pmatrix}.$$
(a) B = C = {sin t, cos t, sin 2t, cos 2t}, V = W = Span B, and T = d²/dt² + d/dt.
(b) B = {1, t, t², t³}, C = {$\begin{pmatrix} 1 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$}, V = C₃[X], and W = C², and

$$T(p) = \begin{pmatrix} p(3) \\ p(5) \end{pmatrix}.$$
(c) B = C = {eᵗ cos t, eᵗ sin t, e³ᵗ, te³ᵗ}, V = W = Span B, and T = d/dt.
(d) B = {1, t, t²}, C = {$\begin{pmatrix} 1 \\ 1 \end{pmatrix}$, $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$}, V = C₂[X], and W = C², and

$$T(p) = \begin{pmatrix} \int_0^1 p(t)\,dt \\ p(1) \end{pmatrix}.$$
(a) Let

$$B = \left\{ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \right\}.$$

Determine the matrix representation of L with respect to the basis B.
(b) Determine the dimensions of the subspaces
W = {A ∈ V : L(A) = A}, and
Ker L = {A ∈ V : L(A) = 0}.
(c) Determine the eigenvalues of L.
(a) Find the matrix representation of A with respect to the bases B and C.
(b) Find bases for Ran A and Ker A.
4
The Jordan Canonical Form

CONTENTS
4.1 The Cayley–Hamilton theorem
4.2 Jordan canonical form for nilpotent matrices
4.3 An intermezzo about polynomials
4.4 The Jordan canonical form
4.5 The minimal polynomial
4.6 Commuting matrices
4.7 Systems of linear differential equations
4.8 Functions of matrices
4.9 The resolvent
4.10 Exercises
It will take a few sections before we get to the general Jordan canonical form.
First we need to develop the following polynomial identity for a matrix.
Example 4.1.2 Let $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$. Then $p_A(\lambda) = (\lambda - 1)(\lambda - 4) - (-2)(-3) = \lambda^2 - 5\lambda - 2$. Let us check (4.1) for this matrix:

$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}^2 - 5\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} - 2\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 7-5-2 & 10-10-0 \\ 15-15-0 & 22-20-2 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$
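A quick numerical sketch (not from the text) of the same check, assuming NumPy:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
# p_A(A) = A^2 - 5A - 2I should be the zero matrix.
print(A @ A - 5 * A - 2 * np.eye(2, dtype=int))  # [[0 0], [0 0]]
```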
In the proof of Theorem 4.1.1 we use matrices in which the entries are polynomials in λ, such as for instance

$$\begin{pmatrix} \lambda^2 - 6\lambda + 1 & 2\lambda - 10 \\ 3\lambda^2 + 5\lambda - 7 & -\lambda^2 + 4\lambda - 25 \end{pmatrix}. \tag{4.3}$$

We will rewrite such polynomials in the form $\sum_{j=0}^{n} \lambda^j A_j$, with $A_j$ constant matrices (i.e., $A_j$ does not depend on λ). For (4.3) it looks like

$$\lambda^2 \begin{pmatrix} 1 & 0 \\ 3 & -1 \end{pmatrix} + \lambda \begin{pmatrix} -6 & 2 \\ 5 & 4 \end{pmatrix} + \begin{pmatrix} 1 & -10 \\ -7 & -25 \end{pmatrix}.$$
with $A_j \in \mathbb{F}^{n \times n}$ constant matrices. Using the notation (4.1) and equating the coefficients of $\lambda^j$, j = 0, . . . , n, on both sides of (4.4) we get

$$\dots, \quad -AA_1 + A_0 = a_1 I_n, \quad -AA_0 = a_0 I_n.$$

But then $p_A(A)$ equals

$$\sum_{j=0}^{n} a_j A^j = A^n + A^{n-1}(-AA_{n-1} + A_{n-2}) + A^{n-2}(-AA_{n-2} + A_{n-3}) + \cdots + A(-AA_1 + A_0) - AA_0 = 0.$$
When we multiply (4.10) on the left with $A^{n-1}$ and use that $A^k x_{k,j} = 0$, we get that

$$A^{n-1}\left( \sum_{j=1}^{s_n} c_{n,j}^{(0)} x_{n,j} \right) = 0.$$

Then

$$\sum_{j=1}^{s_n} c_{n,j}^{(0)} x_{n,j} \in \left( \operatorname{Span}\{x_{n,1}, \dots, x_{n,s_n}\} \right) \cap \operatorname{Ker} A^{n-1} = \{0\},$$

and thus

$$\sum_{j=1}^{s_n} c_{n,j}^{(0)} x_{n,j} = 0.$$

As $\{x_{n,1}, \dots, x_{n,s_n}\}$ is linearly independent, we get that $c_{n,j}^{(0)} = 0$, j = 1, . . . , sₙ. If n = 1, we are done. If n ≥ 2, we multiply (4.10) with $A^{n-2}$ on the left, to obtain

$$A^{n-2}\left( \sum_{j=1}^{s_{n-1}} c_{n-1,j}^{(0)} x_{n-1,j} \right) + A^{n-1}\left( \sum_{j=1}^{s_n} c_{n,j}^{(1)} x_{n,j} \right) = 0.$$

Then

$$\sum_{j=1}^{s_{n-1}} c_{n-1,j}^{(0)} x_{n-1,j} + A \sum_{j=1}^{s_n} c_{n,j}^{(1)} x_{n,j} \in \operatorname{Ker} A^{n-2}.$$

Since $\{x_{n-1,1}, \dots, x_{n-1,s_{n-1}}\}$ is linearly independent, we get $c_{n-1,j}^{(0)} = 0$, j = 1, . . . , s₍ₙ₋₁₎. In addition, as Ker A ⊆ Ker $A^{n-1}$ we get that

$$\sum_{j=1}^{s_n} c_{n,j}^{(1)} x_{n,j} \in \operatorname{Span}\{x_{n,1}, \dots, x_{n,s_n}\} \cap \operatorname{Ker} A^{n-1} = \{0\},$$

and using linear independence of $\{x_{n,1}, \dots, x_{n,s_n}\}$, we obtain $c_{n,j}^{(1)} = 0$, j = 1, . . . , sₙ. If n = 2, we are done. If n ≥ 3, we continue by multiplying (4.10) with $A^{n-3}$ on the left and argue in a similar manner as above. Ultimately, we arrive at $c_{k,j}^{(l)} = 0$ for all k, j, and l, showing that B is linearly independent, and thus a basis for $\mathbb{F}^n$.
and as $A^l x_{k,j}$ precedes $A^{l-1} x_{k,j}$ in B, we get exactly a 1 in the entry above the diagonal in the column of $[A]_{B \leftarrow B}$ corresponding to $A^{l-1} x_{k,j}$, and zeros elsewhere in this column. This shows that $[A]_{B \leftarrow B} = J$, completing the proof.
KerA = Span{e1 , e2 + e3 + e4 , e5 + e6 }.
We can now choose x3,1 = e3 . Next, we need to choose x2,1 so that
Letting $P = [\operatorname{id}_{\mathbb{F}^6}]_{E \leftarrow B}$, we get that the columns of P are exactly the vectors in B (with coordinates with respect to the standard basis E), and thus

$$P = \begin{pmatrix} 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 2 & 0 & 1 & 1 & 0 \\ 0 & 2 & 0 & 1 & 0 & 1 \\ 0 & 2 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \end{pmatrix}.$$
Given two polynomials f(X), g(X) ∈ F[X], we say that f(X) divides g(X) (notation: f(X)|g(X)) if there exists an h(X) ∈ F[X] so that g(X) = f(X)h(X).
We call r(X) the remainder of g(X) after division by h(X). One can find
q(X) and r(X) via long division. We present an example.
              X² + 2X  + 2
            ------------------
  X − 1  )  X³ +  X²        − 1
           −X³ +  X²
            ------------------
                 2X²
                −2X² + 2X
            ------------------
                       2X  − 1
                      −2X  + 2
            ------------------
                             1
We denote the unique monic greatest common divisor of g(X) and h(X) by
gcd(g(X), h(X)). We say that g(X) and h(X) are coprime if
gcd(g(X), h(X)) = 1. In this setting we now also have a Bezout equation
result.
$$\begin{aligned} X^4 - 2X^3 + 2X^2 - 2X + 1 &= (X^3 + X^2 - X - 1)(X - 3) + (6X^2 - 4X - 2), \\ X^3 + X^2 - X - 1 &= (6X^2 - 4X - 2)\left(\tfrac{1}{6}X + \tfrac{5}{18}\right) + \left(\tfrac{4}{9}X - \tfrac{4}{9}\right), \\ 6X^2 - 4X - 2 &= \left(\tfrac{4}{9}X - \tfrac{4}{9}\right)\left(\tfrac{27}{2}X + \tfrac{9}{2}\right) + 0. \end{aligned} \tag{4.13}$$
The above result follows easily from the k = 2 case after first observing that
gcd(g₁(X), . . . , g_k(X)) = gcd(g₁(X), gcd(g₂(X), . . . , g_k(X))).
$$(B - \lambda\, \operatorname{id}_W)^s w = (A - \lambda I_n)^s w = 0,$$
Next, by Lemma 4.4.4 we have that $A|_{W_j} - \lambda_j \operatorname{id}_{W_j}$ is nilpotent, and thus by Theorem 4.2.1 there is a basis $B_j$ for $W_j$, so that

$$[A|_{W_j}]_{B_j \leftarrow B_j} = \lambda_j I_{n_j} + [(A - \lambda_j \operatorname{id}_{W_j})|_{W_j}]_{B_j \leftarrow B_j} = J(\lambda_j).$$

Letting now $B = \cup_{j=1}^{m} B_j$, we get by Proposition 4.4.3 that B is a basis for $\mathbb{F}^n$. Moreover,

$$[A]_{B \leftarrow B} = \oplus_{j=1}^{m} [A|_{W_j}]_{B_j \leftarrow B_j} = \oplus_{j=1}^{m} J(\lambda_j) = J,$$
Thus

$$\operatorname{Ker}(A - I) = \operatorname{Span}\left\{ \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -3 \\ 0 \\ 1 \end{pmatrix} \right\}.$$

One finds that (A − I)² = 0, and thus w₁(A, 1) = 2, w_j(A, 1) = 3, j ≥ 2. Thus A has one Jordan block of size 1 and one of size 2, giving that A is similar to

$$J = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}.$$
For the basis B = {b₁, b₂, b₃}, we choose b₃ so that
and any monic degree-1 polynomial has the form t − λ, but A − λI₃ ≠ 0 for all λ.
Note that r(A) = m1 (A) − q(A)m2 (A) = 0. If r(t) is not the zero
polynomial, then after multiplying by a nonzero constant r(t) will be a
monic polynomial of degree < k so that r(A) = 0. This contradicts m1 (t)
and m2 (t) being minimal polynomials for A. Thus r(t) is the zero
polynomial, and thus m1 (t) = q(t)m2 (t). Since deg m1 = deg m2 = k and m1
and m2 are both monic, we must have that q(t) ≡ 1, and thus
m1 (t) = m2 (t). This proves uniqueness.
Finally, let p(t) be so that p(A) = 0. If p(t) ≡ 0, then clearly mA (t) divides
p(t). If p(t) is not the zero polynomial, apply Proposition 4.3.1 providing the
existence of q(t) and r(t) with deg r < deg m_A so that

$$p(t) = q(t)\,m_A(t) + r(t).$$

As in the previous paragraph, r(t) not being the zero polynomial contradicts that m_A(t) is the minimal polynomial. Thus r(t) ≡ 0, yielding that m_A(t) divides p(t). As p_A(A) = 0 by Theorem 4.1.1, we get in particular that m_A(t) divides p_A(t).
Proof. It is easy to see that the minimal polynomial for J_k(λ) is (t − λ)^k. As
mA (t) divides pA (t) we must have that mA (t) is of the form (4.18) for some
kj ≤ nj , j = 1, . . . , m. Observing that A = P JP −1 implies
m(A) = P m(J)P −1 for any polynomial m(t), it is easy to see by inspection
that kj must correspond exactly to the size of the largest Jordan block
corresponding to λj .
Example 4.5.4 Let $A \in \mathbb{Z}_5^{4 \times 4}$ satisfy $A^3 - 4A^2 + 5A - 2I_4 = 0$. What are the possible Jordan canonical forms of A?

Let $p(t) = t^3 - 4t^2 + 5t - 2 = (t-1)^2(t-2)$. Then p(A) = 0. Since m_A(t) divides p(t), there are 5 possibilities: m_A(t) = t − 1, m_A(t) = (t − 1)², m_A(t) = t − 2, m_A(t) = (t − 1)(t − 2), or m_A(t) = (t − 1)²(t − 2). Possibilities for the Jordan canonical form are:

$$J = \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{pmatrix}, \quad \begin{pmatrix} 1&1&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{pmatrix}, \quad \begin{pmatrix} 1&1&0&0 \\ 0&1&0&0 \\ 0&0&1&1 \\ 0&0&0&1 \end{pmatrix},$$

$$\begin{pmatrix} 2&0&0&0 \\ 0&2&0&0 \\ 0&0&2&0 \\ 0&0&0&2 \end{pmatrix}, \quad \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&2 \end{pmatrix}, \quad \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&2&0 \\ 0&0&0&2 \end{pmatrix},$$

$$\begin{pmatrix} 1&0&0&0 \\ 0&2&0&0 \\ 0&0&2&0 \\ 0&0&0&2 \end{pmatrix}, \quad \begin{pmatrix} 1&1&0&0 \\ 0&1&0&0 \\ 0&0&2&0 \\ 0&0&0&2 \end{pmatrix}, \quad \begin{pmatrix} 1&1&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&2 \end{pmatrix}.$$
One learns early on when dealing with matrices that in general they do not commute (indeed, in general AB ≠ BA). Sometimes, though, one does encounter commuting matrices; for example, if they are matrix representations of taking partial derivatives with respect to different variables on a vector space of “nice” functions. It is of interest to relate such commuting matrices to one another. We focus on the case when one of the matrices is nonderogatory.

We call a matrix nonderogatory if the matrix only has a single Jordan block associated with each eigenvalue. The following result is easily proven.
(i) A is nonderogatory.
(ii) w1 (A, λ) = dim Ker(A − λIn ) = 1 for every eigenvalue λ of A.
The main result of this section is the following. We say that matrices A and
B commute if AB = BA.
Example 4.6.3 Let F = R, $A = \begin{pmatrix}1&0\\0&1\end{pmatrix}$, and $B = \begin{pmatrix}1&2\\0&3\end{pmatrix}$. Clearly AB = BA. If p(X) is some polynomial, then $p(A) = \begin{pmatrix}p(1)&0\\0&p(1)\end{pmatrix}$, which never equals B.
invertible. So from p(C)Y = Y p(D), we get that 0 = Y p(D), and since p(D)
is invertible, we get Y = 0(p(D))−1 = 0.
We first consider the case when A = Jn (0). Let $B = (b_{ij})_{i,j=1}^n$. Then $AB = (b_{i+1,j})_{i,j=1}^n$, where we let $b_{n+1,j} = 0$ for all j. Furthermore, $BA = (b_{i,j-1})_{i,j=1}^n$, where we let $b_{i,0} = 0$ for all i. Equating AB and BA, we therefore obtain $b_{i+1,j} = b_{i,j-1}$ for all i and j.
When i ≠ j, we get by Lemma 4.6.4 that Bij = 0, and for j = i we get that
Bii is upper-triangular Toeplitz. Define
$$q_j(t) = \frac{p_A(t)}{(t-\lambda_j)^{n_j}}.$$
The Jordan canonical form is useful for solving systems of linear differential
equations. We set F = C, as we are dealing with differentiable functions. A
system of linear differential equations has the form
$$x_1'(t) = a_{11}x_1(t) + \cdots + a_{1n}x_n(t), \qquad x_1(0) = c_1,$$
$$\vdots$$
$$x_n'(t) = a_{n1}x_1(t) + \cdots + a_{nn}x_n(t), \qquad x_n(0) = c_n,$$
which in shorthand we can write as
$$x'(t) = Ax(t), \qquad x(0) = c.$$
Solving from the bottom up, one easily sees that the solution is
$$z_n(t) = c_n, \quad z_{n-1}(t) = c_{n-1} + c_nt, \quad z_{n-2}(t) = c_{n-2} + c_{n-1}t + \frac{c_n}{2!}t^2, \quad \ldots, \quad z_1(t) = \sum_{k=1}^n \frac{c_k}{(k-1)!}t^{k-1}.$$
Next, if $A = J_n(\lambda) = \lambda I_n + J_n(0)$, then with $z(t) = \begin{pmatrix}z_1(t)\\ \vdots\\ z_n(t)\end{pmatrix}$ as above, it is straightforward to see that $y(t) = e^{\lambda t}z(t)$ solves
$$y'(t) = J_n(\lambda)y(t), \qquad y(0) = c.$$
With $J = \oplus_{j=1}^m J_{n_j}(\lambda_j)$, this system converts to m systems treated in the
previous paragraph. We can subsequently solve these m systems, leading to
a solution y(t). Then, in the end, x(t) = P y(t) solves the system (4.22). We
will illustrate this in an example below.
Thus
$$y'(t) = Jy(t), \qquad y(0) = Pc = \begin{pmatrix}2&-1&-1&2\end{pmatrix}^T,$$
has the solution
$$\begin{pmatrix}y_1(t)\\y_2(t)\\y_3(t)\\y_4(t)\end{pmatrix} = \begin{pmatrix}2e^t\\-e^{2t}\\-e^{4t}+2te^{4t}\\2e^{4t}\end{pmatrix}.$$
And thus
$$x(t) = P^{-1}y(t) = \begin{pmatrix}-e^{2t}+e^{4t}+2te^{4t}\\ e^{4t}+2te^{4t}\\ e^{4t}-2te^{4t}\\ 2e^t-e^{2t}-e^{4t}+2te^{4t}\end{pmatrix}.$$
$$f^{(3)}(t) - 5f^{(2)}(t) + 8f'(t) - 4f(t) = 0, \quad f^{(2)}(0) = 3,\ f'(0) = 2,\ f(0) = 1.$$
If we let $x_1(t) = f(t)$, $x_2(t) = f'(t)$, $x_3(t) = f^{(2)}(t)$, we get the system
$$\begin{pmatrix}x_1'(t)\\x_2'(t)\\x_3'(t)\end{pmatrix} = \begin{pmatrix}0&1&0\\0&0&1\\4&-8&5\end{pmatrix}\begin{pmatrix}x_1(t)\\x_2(t)\\x_3(t)\end{pmatrix}, \qquad \begin{pmatrix}x_1(0)\\x_2(0)\\x_3(0)\end{pmatrix} = \begin{pmatrix}1\\2\\3\end{pmatrix}.$$
For the eigenvalues of the coefficient matrix, we find 1,2,2, and we find that
there is a Jordan block of size 2 at the eigenvalue 2. Thus the solution is a
linear combination of $e^t$, $e^{2t}$, and $te^{2t}$. Letting $f(t) = c_1e^t + c_2e^{2t} + c_3te^{2t}$,
and plugging in the initial conditions f (2) (0) = 3, f 0 (0) = 2, f (0) = 1, we get
the equations
c1 + c2 = 1,
c1 + 2c2 + c3 = 2,
c1 + 4c2 + 4c3 = 3.
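Solving this linear system gives c1 = −1, c2 = 2, c3 = −1, so that $f(t) = -e^t + 2e^{2t} - te^{2t}$. A minimal MATLAB check (with made-up variable names):

M = [1 1 0; 1 2 1; 1 4 4];   % coefficients of c1, c2, c3 in the three equations
c = M \ [1; 2; 3]            % returns c = [-1; 2; -1]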
$$A^{-1} = \begin{pmatrix}\frac{1}{\lambda}&-\frac{1}{\lambda^2}&\frac{1}{\lambda^3}&\cdots&\frac{(-1)^{k-1}}{\lambda^k}\\ &\frac{1}{\lambda}&-\frac{1}{\lambda^2}&\cdots&\frac{(-1)^{k-2}}{\lambda^{k-1}}\\ &&\ddots&\ddots&\vdots\\ &&&\frac{1}{\lambda}&-\frac{1}{\lambda^2}\\ &&&&\frac{1}{\lambda}\end{pmatrix}, \qquad \lambda \neq 0.$$
we define
$$f(A) := Sf(J)S^{-1}, \qquad f(J) := \begin{pmatrix}f(J(\lambda_1))&0&\cdots&0\\0&f(J(\lambda_2))&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&f(J(\lambda_m))\end{pmatrix}, \tag{4.24}$$
and
$$f(J(\lambda_j)) := \oplus_{k=1}^{n_j}\left[\oplus_{l=1}^{w_k(A,\lambda_j)-w_{k+1}(A,\lambda_j)} f(J_k(\lambda_j))\right], \quad j = 1,\ldots,m,$$
Let us do an example.
Example 4.8.1 Let $A = \begin{pmatrix}2&2&3\\1&3&3\\-1&-2&-2\end{pmatrix}$. In Example 4.4.5 we calculated that $A = SJS^{-1}$, where
$$J = \begin{pmatrix}1&0&0\\0&1&1\\0&0&1\end{pmatrix}, \qquad S = \begin{pmatrix}-2&1&1\\1&1&0\\0&-1&0\end{pmatrix}.$$
If $f(t) = e^{wt}$, we find that $f(1) = e^w$ and $f'(1) = we^w$, and thus
$$f(A) = \begin{pmatrix}-2&1&1\\1&1&0\\0&-1&0\end{pmatrix}\begin{pmatrix}e^w&0&0\\0&e^w&we^w\\0&0&e^w\end{pmatrix}\begin{pmatrix}-2&1&1\\1&1&0\\0&-1&0\end{pmatrix}^{-1}.$$
Remark 4.8.2 It should be noticed that with $m_A(t) = \prod_{j=1}^m (t-\lambda_j)^{k_j}$ and with functions f and g so that
$$f^{(i)}(\lambda_j) = g^{(i)}(\lambda_j), \quad i = 0,\ldots,k_j-1,\ j = 1,\ldots,m, \tag{4.26}$$
we have that f (A) = g(A). Thus, as an alternative way of defining f (A), one
can construct a polynomial g satisfying (4.26) and define f (A) via
f (A) := g(A). In this way, one avoids having to use the Jordan canonical
form in the definition of f (A), which may be preferable in some cases.
When h(t) = f (t)g(t) we expect that h(A) = f (A)g(A). This is indeed true,
but it is something that we need to prove. For this we need to remind
ourselves of the product rule for differentiation:
h(3) (t) = f (t)g (3) (t) + 3f 0 (t)g 00 (t) + 3f 00 (t)g 0 (t) + f (3) (t)g(t).
In general, we obtain that the kth derivative of h is given by
$$h^{(k)}(t) = \sum_{r=0}^k \binom{k}{r} f^{(r)}(t)g^{(k-r)}(t), \tag{4.27}$$
which is referred to as the Leibniz rule. We will use the Leibniz rule in the
following proof.
As the rule works for a Jordan block, it will also work for a direct sum of
Jordan blocks. Finally, when A = SJS −1 , we get that f (A)g(A) =
Sf (J)S −1 Sg(J)S −1 = S[f (J)g(J)]S −1 = Sh(J)S −1 = h(A).
$P_{jk}$, $j = 1,\ldots,m$, $k = 0,\ldots,k_j-1$,
$$f(A) = \sum_{j=1}^m\sum_{k=0}^{k_j-1} f^{(k)}(\lambda_j)P_{jk}. \tag{4.28}$$
We define
$$P_{j0} := S\begin{pmatrix}0&&&&\\&\ddots&&&\\&&I_{n_j}&&\\&&&\ddots&\\&&&&0\end{pmatrix}S^{-1}, \qquad P_{jk} := \frac{1}{k!}\,S\begin{pmatrix}0&&&&\\&\ddots&&&\\&&J_j^k&&\\&&&\ddots&\\&&&&0\end{pmatrix}S^{-1}, \tag{4.29}$$
where
$$J_j = \oplus_{k=1}^{n_j}\left[\oplus_{l=1}^{w_k(A,\lambda_j)-w_{k+1}(A,\lambda_j)} J_k(0)\right], \quad j = 1,\ldots,m.$$
Notice that $J_j^s = 0$ when $s \geq k_j$. Equality (4.28) now follows directly from
(4.24). Moreover, from the definitions (4.29) it is easy to check that (i)–(iv)
hold.
Thus $\lambda_1 = 1$ and $\lambda_2 = 4$, and
$$J_1 = \begin{pmatrix}0&0&0\\0&0&1\\0&0&0\end{pmatrix}, \qquad J_2 = \begin{pmatrix}0&1&0\\0&0&1\\0&0&0\end{pmatrix}.$$
Now
$$P_{10} = S\begin{pmatrix}I_3&0\\0&0\end{pmatrix}S^{-1}, \qquad P_{11} = S\begin{pmatrix}J_1&0\\0&0\end{pmatrix}S^{-1},$$
which one computes to be explicit $6\times 6$ matrices.
We leave the other computations as an exercise.
We will next see that the formalism introduced above provides a useful tool in the setting of systems of differential equations. We first need that if $B(t) = (b_{ij}(t))_{i=1,j=1}^{n,m}$ is a matrix whose entries are functions of t, then we define
$$\frac{d}{dt}B(t) = B'(t) := (b_{ij}'(t))_{i=1,j=1}^{n,m}.$$
Proof. We first show (4.31) for a Jordan block $A = J_k(\lambda)$. If $A = J_k(\lambda)$ and $t \neq 0$, we have that
$$tA = \begin{pmatrix}t\lambda&t&&\\&t\lambda&\ddots&\\&&\ddots&t\\&&&t\lambda\end{pmatrix} = \begin{pmatrix}1&&&\\&\frac1t&&\\&&\ddots&\\&&&\frac{1}{t^{k-1}}\end{pmatrix}\begin{pmatrix}t\lambda&1&&\\&t\lambda&\ddots&\\&&\ddots&1\\&&&t\lambda\end{pmatrix}\begin{pmatrix}1&&&\\&t&&\\&&\ddots&\\&&&t^{k-1}\end{pmatrix},$$
bringing tA into the $SJ_k(t\lambda)S^{-1}$ format. Then with $f(x) = e^x$ we get $f(tA) = Sf(J_k(t\lambda))S^{-1}$, yielding
$$e^{tA} = \begin{pmatrix}1&&&\\&\frac1t&&\\&&\ddots&\\&&&\frac{1}{t^{k-1}}\end{pmatrix}\begin{pmatrix}e^{t\lambda}&\frac{e^{t\lambda}}{1!}&\cdots&\frac{e^{t\lambda}}{(k-1)!}\\&e^{t\lambda}&\ddots&\vdots\\&&\ddots&\frac{e^{t\lambda}}{1!}\\&&&e^{t\lambda}\end{pmatrix}\begin{pmatrix}1&&&\\&t&&\\&&\ddots&\\&&&t^{k-1}\end{pmatrix}.$$
Thus we find
$$e^{tA} = \begin{pmatrix}e^{t\lambda}&\frac{te^{t\lambda}}{1!}&\cdots&\frac{t^{k-1}e^{t\lambda}}{(k-1)!}\\&e^{t\lambda}&\ddots&\vdots\\&&\ddots&\frac{te^{t\lambda}}{1!}\\&&&e^{t\lambda}\end{pmatrix},$$
which also holds when $t = 0$. As $\frac{d}{dt}\left(\frac{t^je^{t\lambda}}{j!}\right) = \frac{t^{j-1}e^{t\lambda}}{(j-1)!} + \frac{\lambda t^je^{t\lambda}}{j!}$, $j \geq 1$, one easily sees that (4.31) holds for $A = J_k(\lambda)$.
Next, one needs to observe that (4.31) holds for A a direct sum of Jordan
blocks. Finally, using (4.30), one obtains that (4.31) holds when A = SJS −1 ,
thus proving the statement for general A.
Proof. With $x(t) = e^{tA}x_0$, we have $x(0) = e^0x_0 = Ix_0 = x_0$, and $\frac{d}{dt}x(t) = \frac{d}{dt}e^{tA}x_0 = Ae^{tA}x_0 = Ax(t)$.
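Numerically, this is exactly what MATLAB's expm provides; a minimal sketch comparing it with a standard ODE solver on made-up data:

A = [0 1; -1 0]; x0 = [1; 2]; T = 2;
[~, X] = ode45(@(t, x) A*x, [0 T], x0);
norm(X(end, :)' - expm(T*A)*x0)        % small, up to the solver's tolerance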
Indeed, if we set x(t) = etA f (t), for some differentiable function f (t), then
using the product rule we obtain
Proposition 4.9.1 Let $A \in \mathbb{C}^{n\times n}$ with minimal polynomial $m_A(t) = \prod_{j=1}^m (t-\lambda_j)^{k_j}$, and let $P_{jk}$, $j = 1,\ldots,m$, $k = 0,\ldots,k_j-1$, be as in Theorem 4.8.4. Then
$$R(\lambda) = (\lambda I_n - A)^{-1} = \sum_{j=1}^m\sum_{k=0}^{k_j-1}\frac{k!}{(\lambda-\lambda_j)^{k+1}}P_{jk}. \tag{4.33}$$
Proof. Fix $\lambda \in \mathbb{C}\setminus\sigma(A)$, and define $g(z) = \frac{1}{\lambda - z}$, which is well-defined and k times differentiable for every $k \in \mathbb{N}$ on the domain $\mathbb{C}\setminus\{\lambda\}$. Notice that $g(A) = (\lambda I_n - A)^{-1} = R(\lambda)$. Also observe that
$$g'(t) = \frac{1}{(\lambda-t)^2}, \quad g''(t) = \frac{2}{(\lambda-t)^3}, \quad \ldots, \quad g^{(k)}(t) = \frac{k!}{(\lambda-t)^{k+1}}.$$
Thus, by Theorem 4.8.4,
$$R(\lambda) = g(A) = \sum_{j=1}^m\sum_{k=0}^{k_j-1} g^{(k)}(\lambda_j)P_{jk} = \sum_{j=1}^m\sum_{k=0}^{k_j-1}\frac{k!}{(\lambda-\lambda_j)^{k+1}}P_{jk}.$$
By choosing particular functions for f we can retrieve the matrices Pjk from
Theorem 4.8.4.
where j = 1, . . . , m, k = 0, . . . , kj − 1.
4.10 Exercises
Exercise 4.10.2 For the following matrices A (and B) determine its Jordan
canonical form J and a similarity matrix P , so that P −1 AP = J.
(a)
$$A = \begin{pmatrix}-1&1&0&0\\-1&0&1&0\\-1&0&0&1\\-1&0&0&1\end{pmatrix}.$$
This matrix is nilpotent.
(b)
$$A = \begin{pmatrix}10&-1&1&-4&-6\\9&-1&1&-3&-6\\4&-1&1&-3&-1\\9&-1&1&-4&-5\\10&-1&1&-4&-6\end{pmatrix}.$$
This matrix is nilpotent.
(c)
$$A = \begin{pmatrix}0&1&0\\-1&0&0\\1&1&1\end{pmatrix}.$$
(d)
$$A = \begin{pmatrix}2&0&-1&1\\0&1&0&0\\1&0&0&0\\0&0&0&1\end{pmatrix}.$$
(e)
$$B = \begin{pmatrix}1&-5&0&-3\\1&1&-1&0\\0&-3&1&-2\\-2&0&2&1\end{pmatrix}.$$
(Hint: 1 is an eigenvalue.)
(f) For the matrix B, compute B 100 , by using the decomposition
B = P JP −1 .
(a) Prove that M and N are similar if and only if they have the same rank.
(b) Give a counterexample to show that the statement is false if 6 is
replaced by 7.
(c) Compute the minimal and characteristic polynomials of the following matrix. Is it diagonalizable?
$$\begin{pmatrix}5&-2&0&0\\6&-2&0&0\\0&0&0&6\\0&0&1&-1\end{pmatrix}$$
Exercise 4.10.6 Let A ∈ Fn×n and AT denote its transpose. Show that
wk (A, λ) = wk (AT , λ), for all λ ∈ F and k ∈ N. Conclude that A and AT
have the same Jordan canonical form, and are therefore similar.
Exercise 4.10.11 Let A ∈ Cn×n . For the following, answer True or False.
Provide an explanation.
Exercise 4.10.15 Show that matrices A and B are similar if and only if
they have the same Jordan canonical form.
Exercise 4.10.16 Show that if A and B are square matrices of the same
size, with A invertible, then AB and BA have the same Jordan canonical
form.
(a) $A = \begin{pmatrix}1&1&0\\0&1&1\\0&0&1\end{pmatrix}$, $B = \begin{pmatrix}1&2&3\\0&2&3\\0&0&3\end{pmatrix}$.
(b) $A = \begin{pmatrix}1&1&0\\0&1&1\\0&0&1\end{pmatrix}$, $B = \begin{pmatrix}1&2&3\\0&1&2\\0&0&1\end{pmatrix}$.
where
$$A = \begin{pmatrix}1&-1&1\\0&1&-1\\0&1&0\end{pmatrix}\begin{pmatrix}2&1&0\\0&2&1\\0&0&2\end{pmatrix}\begin{pmatrix}1&-1&1\\0&1&-1\\0&1&0\end{pmatrix}^{-1}.$$
(a) $x_1'(t) = 3x_1(t) - x_2(t)$, $x_1(0) = 1$; $x_2'(t) = x_1(t) + x_2(t)$, $x_2(0) = 2$.
(b) $x_1'(t) = 3x_1(t) + x_2(t) + x_3(t)$, $x_1(0) = 1$; $x_2'(t) = 2x_1(t) + 4x_2(t) + 2x_3(t)$, $x_2(0) = -1$; $x_3'(t) = -x_1(t) - x_2(t) + x_3(t)$, $x_3(0) = 1$.
(c) $x_1'(t) = -x_2(t)$, $x_1(0) = 1$; $x_2'(t) = x_1(t)$, $x_2(0) = 2$.
(d) $x''(t) - 6x'(t) + 9x(t) = 0$, $x(0) = 2$, $x'(0) = 1$.
(e) $x''(t) - 4x'(t) + 4x(t) = 0$, $x(0) = 6$, $x'(0) = -1$.
Exercise 4.10.26 Compute the matrices P20 , P21 , P22 from Example 4.8.5.
Exercise 4.10.28 Let $A = \begin{pmatrix}\frac{\pi}{2}&1&-1\\0&\frac{\pi}{2}&-\frac{\pi}{4}\\0&0&\frac{\pi}{4}\end{pmatrix}$.
(a) $\frac{R(\lambda)-R(\mu)}{\lambda-\mu} = -R(\lambda)R(\mu)$.
(b) $\frac{dR(\lambda)}{d\lambda} = -R(\lambda)^2$.
(c) $\frac{d^jR(\lambda)}{d\lambda^j} = (-1)^j j!\,R(\lambda)^{j+1}$.
(e) Show that if A ∈ Rn×n , then there exists a real invertible matrix S and
a matrix K so that A = SKS −1 , where K is a block diagonal matrix
with blocks Jk (λ), λ ∈ R, and blocks Kk (a, b) on the diagonal.
(Hint: First find the Jordan canonical form of A over C, where for
complex eigenvalues the (generalized) eigenvectors x and x are paired
up. Then use the similarity in (d) to simultaneously convert P to a real
matrix S and J to the matrix K.)
(f) Show that for systems of real differential equations with real initial
conditions, the solutions are combinations of functions tk eλt , k ∈ N0 ,
λ ∈ R, and tk eαt cos(βt), tk eαt sin(βt), k ∈ N0 , α, β ∈ R.
This exercise is based on a result that can be found in Section 1.2 of the
book by L. Le Bruyn, entitled Noncommutative geometry and Cayley-smooth
orders, Volume 290 of Pure and Applied Mathematics, Chapman &
Hall/CRC, Boca Raton, FL. Thanks are due to Paul Muhly for making the
author aware of this result.
5
Inner Product and Normed Vector Spaces
CONTENTS
5.1 Inner products and norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.2 Orthogonal and orthonormal sets and bases . . . . . . . . . . . . . . . . . . . . . 119
5.3 The adjoint of a linear map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.4 Unitary matrices, QR, and Schur triangularization . . . . . . . . . . . . . 125
5.5 Normal and Hermitian matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.6 Singular value decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Vector spaces may have additional structure. For instance, there may be a
natural notion of length of a vector and/or angle between vectors. The
properties of length and angle will be formally captured in the notions of
norm and inner product. These notions require us to restrict ourselves to
vector spaces over R or C. Indeed, a length is always nonnegative and thus
we will need the inequalities ≤, ≥, <, > (with properties like x, y ≥ 0 ⇒
xy ≥ 0 and x ≥ y ⇒ x + z ≥ y + z).
$$\langle\cdot,\cdot\rangle : V\times V \to \mathbb{F}$$
Notice that (iii) implies that ⟨x, x⟩ ∈ R for all x ∈ V . In addition, (ii) implies that ⟨0, y⟩ = 0 for all y ∈ V . Also, (ii) and (iii) imply that $\langle x, ay\rangle = \bar{a}\langle x, y\rangle$ for all x, y ∈ V and all a ∈ F. Finally, (i) and (iii) imply that ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩ for all x, y, z ∈ V . Also,
If V has an inner product h·, ·i (or sometimes we say “V is endowed with the
inner product h·, ·i ”), then we call the pair (V, h·, ·i) an inner product space.
At times we do not explicitly mention the inner product h·, ·i, and we refer
to V as an inner product space. In the latter case it is implicitly understood
what the underlying inner product is, and typically it would be one of the
standard inner products which we will encounter below.
(i)–(iv) are easily checked. This is the standard inner product or Euclidean
inner product on Fn , where F = R or C.
Properties (i)–(iii) are easily checked. However, (iv) does not hold. If we let
p(X) = X(X − 1)(X − 2) ∈ F3 [X], then p(X) is not the zero polynomial,
but hp(X), p(X)i = 0.
Lemma 5.1.12 Let V be a vector space with norm k · k. Then for every
x, y ∈ V we have
| kxk − kyk | ≤ kx − yk. (5.2)
$$\|f\|_\infty = \sup_{x\in[0,1]}|f(x)|.$$
Theorem 5.1.19 Let (V, ⟨·, ·⟩) be an inner product space. Define
$$\|x\| := \sqrt{\langle x, x\rangle}.$$
Moreover,
kx1 + · · · + xn k = kx1 k + · · · + kxn k (5.6)
if and only if dim Span{x1 , . . . , xn } ≤ 1 and hxi , xj i ≥ 0 for all
i, j = 1, . . . , n.
Proof. Conditions (i) and (ii) in the definition of a norm follow directly from
the definition of an inner product. For (iii) we observe that
It is easy to see that the norms in Examples 5.1.13–5.1.18 do not satisfy the
parallelogram identity (5.5), and thus these norms are not associated with
an inner product.
Proof. If we view a complex number z as a vector $\begin{pmatrix}\operatorname{Re}z\\ \operatorname{Im}z\end{pmatrix} \in \mathbb{R}^2$ with the Euclidean norm, then $|z| = \left\|\begin{pmatrix}\operatorname{Re}z\\ \operatorname{Im}z\end{pmatrix}\right\|$. Apply now Theorem 5.1.19 to obtain the result.
where x1 , . . . , xn+1 ∈ F are different points. This is the norm associated with
the inner product defined in Example 5.1.6.
This norm is called the Frobenius norm, and is the norm associated with the
inner product defined in Example 5.1.8.
Given a vector space V , we say that two norms ∥·∥a and ∥·∥b are equivalent if there exist constants c, C > 0 so that
$$c\|v\|_a \leq \|v\|_b \leq C\|v\|_a \ \text{ for all } v \in V.$$
Using the Heine–Borel theorem from analysis, along with the result that a
continuous real-valued function defined on a compact set attains a maximum
and a minimum, we can prove the following.
$$\|v\|_c := \|[v]_B\|_\infty,$$
yields that
$$|\,\|x\|-\|y\|\,| \leq \|x-y\| = \left\|\sum_{i=1}^n(x_i-y_i)b_i\right\| \leq \sum_{i=1}^n|x_i-y_i|\,\|b_i\| \leq \left(\max_{i=1,\ldots,n}|x_i-y_i|\right)\sum_{i=1}^n\|b_i\| < \frac{\epsilon}{\sum_{i=1}^n\|b_i\|}\sum_{i=1}^n\|b_i\| = \epsilon.$$
exist, and as $\|v\| > 0$ for all $v \in S$, we get that $c, C > 0$. Take now an arbitrary nonzero $v \in V$. Then $\frac{1}{\|v\|_c}v \in S$, and thus
$$c \leq \left\|\frac{1}{\|v\|_c}v\right\| \leq C,$$
which implies
$$c\|v\|_c \leq \|v\| \leq C\|v\|_c.$$
Clearly, this inequality also holds for v = 0, and thus the proof is complete.
Notice that the upper bound (which is attained by the vector of all 1’s)
depends on the dimension n, and tends to ∞ as n goes to ∞. Therefore, one
may expect Theorem 5.1.25 not to hold for infinite-dimensional vector
spaces. This is confirmed by the following example.
Let $g_k \in V$, $k \in \mathbb{N}$, be defined by
$$g_k(x) = \begin{cases}1-kx, & \text{for } 0\leq x\leq \frac1k,\\ 0, & \text{for } \frac1k < x \leq 1.\end{cases}$$
Then
$$\|g_k\|_\infty = 1, \qquad \|g_k\|_2^2 = \int_0^{1/k}(1-kx)^2\,dx = \frac{1}{3k}.$$
No constant $C > 0$ exists so that $1 \leq C\sqrt{\frac{1}{3k}}$ for all $k \in \mathbb{N}$, and thus the norms ∥·∥2 and ∥·∥∞ on V are not equivalent.
Given is an inner product space (V, ⟨·, ·⟩). When in an inner product space a norm ∥·∥ is used, then this norm is by default the associated norm $\|\cdot\| = \sqrt{\langle\cdot,\cdot\rangle}$ unless stated otherwise. We say that v and w are orthogonal if ⟨v, w⟩ = 0, and we will denote this as v ⊥ w. Notice that 0 is orthogonal to any vector, and it is the only vector that is orthogonal to itself.
For ∅ 6= W ⊆ V we define
W ⊥ = {v ∈ V : hv, wi = 0 for all w ∈ W } = {v ∈ V : v ⊥ w for all w ∈ W }.
(5.9)
Notice that in this definition we do not require that W is a subspace, thus
W can be any set of vectors of V .
Example 5.2.2 Let {v, w} be linearly independent, and let us make a new vector z of the form z = w + cv so that z ⊥ v. Thus we would like that
$$0 = \langle z, v\rangle = \langle w, v\rangle + c\langle v, v\rangle,$$
that is, $c = -\frac{\langle w, v\rangle}{\langle v, v\rangle}$.
Theorem 5.2.3 Let (V, h·, ·i) be an inner product space, and let
{v1 , . . . , vp } be linearly independent. Construct {z1 , . . . , zp } as follows:
$$z_1 = v_1, \qquad z_k = v_k - \frac{\langle v_k, z_{k-1}\rangle}{\langle z_{k-1}, z_{k-1}\rangle}z_{k-1} - \cdots - \frac{\langle v_k, z_1\rangle}{\langle z_1, z_1\rangle}z_1, \quad k = 2,\ldots,p. \tag{5.10}$$
$$z_1(X) = 1,$$
$$z_2(X) = X - \frac{\langle X, 1\rangle}{\langle 1, 1\rangle}1 = X - 0 = X,$$
$$z_3(X) = X^2 - \frac{\langle X^2, X\rangle}{\langle X, X\rangle}X - \frac{\langle X^2, 1\rangle}{\langle 1, 1\rangle}1 = X^2 - \frac{2}{3}. \tag{5.11}$$
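The recursion (5.10) is also straightforward to program; a minimal MATLAB sketch for real column vectors (with made-up sample data):

V = [1 1; 1 0; 0 1];     % linearly independent columns
Z = V;
for k = 2:size(V, 2)
    for i = 1:k-1
        Z(:, k) = Z(:, k) - (Z(:, i)'*V(:, k))/(Z(:, i)'*Z(:, i)) * Z(:, i);
    end
end
Z(:, 1)'*Z(:, 2)          % approximately 0: the columns are now orthogonal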
One of the reasons why it is easy to work with an orthonormal basis, is that
it is easy to find the coordinates of a vector with respect to an orthonormal
basis.
Proof. Let $x = \sum_{i=1}^n c_iv_i$. Then $\langle x, v_j\rangle = \sum_{i=1}^n c_i\langle v_i, v_j\rangle = c_j$, proving the lemma.
Then
$$\langle x, y\rangle_V = c_1\bar{d}_1 + \cdots + c_n\bar{d}_n = \langle [x]_B, [y]_B\rangle, \tag{5.12}$$
where the last inner product is the standard (Euclidean) inner product for $\mathbb{F}^n$.
Proof. We have $x = \sum_{i=1}^n c_iv_i$, $y = \sum_{j=1}^n d_jv_j$, and thus
$$\langle x, y\rangle_V = \sum_{i=1}^n\sum_{j=1}^n c_i\bar{d}_j\langle v_i, v_j\rangle_V = \sum_{j=1}^n c_j\bar{d}_j,$$
Let (V, ⟨·, ·⟩V ) and (W, ⟨·, ·⟩W ) be inner product spaces, and let T : V → W be linear. We call T an isometry if
$$\langle T(x), T(y)\rangle_W = \langle x, y\rangle_V \ \text{ for all } x, y \in V.$$
Corollary 5.2.7 Let (V, h·, ·iV ) be an n-dimensional inner product space
over F, with F equal to R or C. Then V is isometrically isomorphic to Fn
with the standard inner product.
Via the inner product, one can relate with a linear map another linear map
(called the adjoint). On a vector space over the reals, the adjoint of
multiplication with a matrix A corresponds to multiplication with the
transpose AT of the matrix A. Over the complex numbers, it also involves
taking a complex conjugate. We now provide you with the definition.
Let (V, ⟨·, ·⟩V ) and (W, ⟨·, ·⟩W ) be inner product spaces, and let T : V → W be linear. A map $T^\star : W \to V$ satisfying
$$\langle T(v), w\rangle_W = \langle v, T^\star(w)\rangle_V \ \text{ for all } v \in V,\ w \in W,$$
is called the adjoint of T.
Notice that the adjoint is unique. Indeed if S is another adjoint for T , we get
that hv, T ? (w)iV = hv, S(w)iV for all v, w. Choosing v = T ? (w) − S(w)
yields
hT ? (w) − S(w), T ? (w) − S(w)iV = 0,
and thus T ? (w) − S(w) = 0. As this holds for all w, we must have that
T ? = S.
or equivalently, apply integration by parts twice; it is important to note that for all f in V we have that f(0) = f(π) = 0. So let us compute
$$\langle T(f), g\rangle_V = \int_0^\pi -f''(x)g(x)\,dx = -f'(\pi)g(\pi) + f'(0)g(0) + \int_0^\pi f'(x)g'(x)\,dx$$
$$= f(\pi)g'(\pi) - f(0)g'(0) - \int_0^\pi f(x)g''(x)\,dx = \langle f, T(g)\rangle_V.$$
Theorem 5.3.3 Let (V, h·, ·iV ) and (W, h·, ·iW ) be inner product spaces with
orthonormal bases B = {v1 , . . . , vn } and C = {w1 , . . . , wm }. Let T : V → W
be linear. If
$$[T]_{C\leftarrow B} = \begin{pmatrix}a_{11}&\cdots&a_{1n}\\\vdots&&\vdots\\a_{m1}&\cdots&a_{mn}\end{pmatrix},$$
then
$$[T^\star]_{B\leftarrow C} = \begin{pmatrix}\bar{a}_{11}&\cdots&\bar{a}_{m1}\\\vdots&&\vdots\\\bar{a}_{1n}&\cdots&\bar{a}_{mn}\end{pmatrix}.$$
In other words,
[T ? ]B←C = ([T ]C←B )∗ , (5.13)
∗
where as before A is the conjugate transpose of the matrix A.
Proof. The matrix representation for T tells us, in conjunction with Lemma
5.2.5, that aij = hT (vj ), wi iW . The (k, l)th entry of the matrix
representation of T ? is, again by using the observation in Lemma 5.2.5,
equal to
$$\langle T^\star(w_l), v_k\rangle_V = \langle w_l, T(v_k)\rangle_W = \overline{\langle T(v_k), w_l\rangle_W} = \bar{a}_{lk},$$
proving the statement.
A = QR.
If A has rank equal to m, then the diagonal entries of R are positive, and R
is invertible. If n = m, then Q is unitary.
for some $r_{kj} \in \mathbb{F}$, $k < j$. Putting $r_{kk} = \|z_k\|$, and $r_{kj} = 0$, $k > j$, and letting $R = (r_{kj})_{k,j=1}^m$, we get the desired upper triangular matrix R yielding
A = QR.
When rank A < m, apply the Gram–Schmidt process with those columns of A that do not lie in the span of the preceding columns. Place the vectors $\frac{z}{\|z\|}$ that are found in this way in the corresponding columns of Q. Next, one can fill up the remaining columns of Q with any vectors making the matrix an isometry. The upper triangular entries in R are obtained from writing the columns of A as linear combinations of the $\frac{z}{\|z\|}$'s found in the process above.
Example 5.4.2 Let $A = \begin{pmatrix}1&0&1&2\\1&-2&0&-2\\1&0&1&0\\1&-2&0&0\end{pmatrix}$. Applying the Gram–Schmidt process we obtain
$$z_1 = \begin{pmatrix}1\\1\\1\\1\end{pmatrix}, \qquad z_2 = \begin{pmatrix}0\\-2\\0\\-2\end{pmatrix} - \frac{-4}{4}\begin{pmatrix}1\\1\\1\\1\end{pmatrix} = \begin{pmatrix}1\\-1\\1\\-1\end{pmatrix},$$
$$z_3 = \begin{pmatrix}1\\0\\1\\0\end{pmatrix} - \frac{2}{4}\begin{pmatrix}1\\1\\1\\1\end{pmatrix} - \frac{2}{4}\begin{pmatrix}1\\-1\\1\\-1\end{pmatrix} = \begin{pmatrix}0\\0\\0\\0\end{pmatrix}. \tag{5.14}$$
We thus notice that the third column of A is a linear combination of the first two columns of A, so we continue to compute $z_4$ without using $z_3$:
$$z_4 = \begin{pmatrix}2\\-2\\0\\0\end{pmatrix} - \frac{0}{4}\begin{pmatrix}1\\1\\1\\1\end{pmatrix} - \frac{4}{4}\begin{pmatrix}1\\-1\\1\\-1\end{pmatrix} = \begin{pmatrix}1\\-1\\-1\\1\end{pmatrix}.$$
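One can compare with MATLAB's built-in QR factorization; note that the built-in routine normalizes differently (its R need not have positive diagonal entries), so Q and R may differ from a hand computation by signs:

A = [1 0 1 2; 1 -2 0 -2; 1 0 1 0; 1 -2 0 0];
[Q, R] = qr(A);    % unitary Q and upper triangular R with A = Q*R
norm(A - Q*R)      % numerically zero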
Notice that $A_{k+1} = Q^{-1}A_kQ$, so that $A_{k+1}$ and $A_k$ have the same eigenvalues. As it turns out, $A_k$ converges to an upper triangular matrix, manageable exceptions aside, and thus one can read the eigenvalues from the diagonal of this upper triangular limit. In a numerical linear algebra course, one studies the details of this algorithm. It is noteworthy to remark that when one wants to find numerically the roots of a polynomial, it is often very effective to build the associated companion matrix, and subsequently use the QR algorithm to find the eigenvalues of this companion matrix, which coincide with the roots of the polynomial. Thus, contrary to how we do things by hand, one rather finds roots by computing eigenvalues than the other way around. We will discuss this further in Section 7.3.
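For instance, the roots of the polynomial $t^3 - 5t^2 + 8t - 4$ from the earlier example can be found this way; this is essentially what MATLAB's roots command does internally:

p = [1 -5 8 -4];
C = compan(p);     % companion matrix of p
eig(C)             % returns the roots 2, 2, 1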
(b). Suppose that T = (tij )ni,j=1 is upper triangular. Thus tij = 0 for i > j.
Since T is normal we have that T∗T = TT∗. Comparing the (1, 1) entry on both sides of this equation we get
$$|t_{11}|^2 = |t_{11}|^2 + |t_{12}|^2 + \cdots + |t_{1n}|^2.$$
This gives that t12 = t13 = · · · = t1n = 0. Next, comparing the (2, 2) entry on
both sides of T∗T = TT∗ we get
$$|t_{22}|^2 = |t_{22}|^2 + |t_{23}|^2 + \cdots + |t_{2n}|^2.$$
This gives that t23 = t24 = · · · = t2n = 0. Continuing this way, we find that
tij = 0 for all i < j. Thus T is diagonal.
A = U DU ∗ .
• A is Hermitian if A = A∗ .
• A is skew-Hermitian if A = −A∗ .
• A is unitary if AA∗ = A∗ A = I.
Note that i0 (A) = dim Ker(A), and i+ (A) + i− (A) + i0 (A) = n. We now
have the following result.
Lemma 5.5.6 Let $A \in \mathbb{C}^{n\times n}$ be Hermitian with $\operatorname{In}(A) = (\mu, \nu, \delta)$. Then there exists an invertible T so that
$$A = T\begin{pmatrix}I_\mu&&\\&-I_\nu&\\&&0\end{pmatrix}T^*,$$
where the diagonal zeros have equal size. Let us partition $W = (W_{ij})_{i,j=1}^3$ in an appropriately sized block matrix (so, for instance, $W_{11}$ has size $i_+(A)\times i_+(B)$ and $W_{22}$ has size $i_-(A)\times i_-(B)$). Then from the (1, 1) block entry of the equality (5.15) we get that
$$I_{i_+(A)} = W_{11}I_{i_+(B)}W_{11}^* - W_{12}I_{i_-(B)}W_{12}^*.$$
This gives that $\operatorname{rank}W_{11}W_{11}^* \leq i_+(B)$ and $W_{11}W_{11}^* = I_{i_+(A)} + W_{12}W_{12}^*$ is positive definite of size $i_+(A)\times i_+(A)$, and thus $\operatorname{rank}W_{11}W_{11}^* \geq i_+(A)$.
Combining these observations, gives i+ (B) ≥ i+ (A). Reversing the roles of A
and B, one can apply the same argument and arrive at the inequality
i+ (A) ≥ i+ (B). But then i+ (B) = i+ (A) follows. Finally,
Theorem 5.6.1 Let $A \in \mathbb{F}^{n\times m}$ have rank k. Then there exist unitary matrices $V \in \mathbb{F}^{n\times n}$, $W \in \mathbb{F}^{m\times m}$, and a matrix $\Sigma \in \mathbb{F}^{n\times m}$ of the form
$$\Sigma = \begin{pmatrix}\sigma_1&0&\cdots&0&\cdots&0\\0&\sigma_2&\cdots&0&\cdots&0\\\vdots&\vdots&\ddots&\vdots&&\vdots\\0&0&\cdots&\sigma_k&\cdots&0\\\vdots&\vdots&&\vdots&\ddots&\vdots\\0&0&\cdots&0&\cdots&0\end{pmatrix}, \qquad \sigma_1\geq\sigma_2\geq\ldots\geq\sigma_k>0, \tag{5.16}$$
so that $A = V\Sigma W^*$.
Then
$$\begin{pmatrix}0&A\\A^*&0\end{pmatrix}\begin{pmatrix}v_1\\-v_2\end{pmatrix} = -\lambda\begin{pmatrix}v_1\\-v_2\end{pmatrix}.$$
Thus, if we denote the positive eigenvalues of M by $\sigma_1\geq\sigma_2\geq\ldots\geq\sigma_k>0$, then $-\sigma_1,\ldots,-\sigma_k$ are also eigenvalues of M. Notice that when $\lambda = 0$, we can take a basis $\{v_1^{(1)},\ldots,v_1^{(n-k)}\}$ of $\operatorname{Ker}A^*$, and a basis $\{v_2^{(1)},\ldots,v_2^{(m-k)}\}$ of $\operatorname{Ker}A$, and then
$$\left\{\begin{pmatrix}v_1^{(1)}\\0\end{pmatrix},\ldots,\begin{pmatrix}v_1^{(n-k)}\\0\end{pmatrix},\begin{pmatrix}0\\v_2^{(1)}\end{pmatrix},\ldots,\begin{pmatrix}0\\v_2^{(m-k)}\end{pmatrix}\right\}$$
is a basis of $\operatorname{Ker}M$.
The values σj are called the singular values of A, and they are uniquely
determined by A. We also denote them by σj (A).
Proposition 5.6.2 Let $A \in \mathbb{F}^{n\times m}$, and let ∥·∥ be the Euclidean norm. Then
$$\sigma_1(A) = \max_{\|x\|=1}\|Ax\|. \tag{5.17}$$
which is clearly bounded above by $\sqrt{\sigma_1^2|u_1|^2+\cdots+\sigma_1^2|u_k|^2} \leq \sigma_1\|u\| = \sigma_1$. When $u = e_1$, then we get that $\|\Sigma u\| = \sigma_1$. Thus $\max_{\|u\|=1}\|\Sigma u\| = \sigma_1$ follows.
To check that σ1 (·) is a norm, the only condition that is not immediate is
the triangle inequality. This now follows by observing that
To prove (5.18) we first observe that for every vector $v \in \mathbb{F}^m$ we have that $\|Av\| \leq \sigma_1(A)\|v\|$, as $w := \frac{1}{\|v\|}v$ has norm 1, and thus $\|Aw\| \leq \max_{\|x\|=1}\|Ax\| = \sigma_1(A)$. Now, we obtain that
Then $\operatorname{rank}\hat{A} = l$, $\sigma_1(A-\hat{A}) = \sigma_{l+1}$, and for any matrix B with $\operatorname{rank}B \leq l$ we have $\sigma_1(A-B) \geq \sigma_1(A-\hat{A})$.
where in the last step we used that v ∈ Span{e1 , . . . , el+1 }. This proves the
statement.
Low rank approximations are used in several places, for instance in data
compression and in search engines.
We end this section with an example where we compute the singular value
decomposition of a matrix. For this it is useful to notice that if A = V ΣW ∗ ,
then AA∗ = V ΣΣ∗ V ∗ and A∗ A = W Σ∗ ΣW ∗ . Thus the columns of V are
eigenvectors of AA∗ , and the diagonal elements σj2 of the diagonal matrix
ΣΣ∗ are the eigenvalues of AA∗ . Thus the singular values can be found by
computing the square roots of the nonzero eigenvalues of AA∗ . Similarly, the
columns of W are eigenvectors of A∗ A, and the diagonal elements σj2 of the
diagonal matrix Σ∗ Σ are the nonzero eigenvalues of A∗ A, as we have seen in
the proof of Theorem 5.6.1.
Example 5.6.4 Let $A = \begin{pmatrix}3&2&2\\2&3&-2\end{pmatrix}$. Find the singular value decomposition of A.
Compute
$$AA^* = \begin{pmatrix}17&8\\8&17\end{pmatrix},$$
which has eigenvalues 9 and 25. So the singular values of A are 5 and 3, and we get
$$\Sigma = \begin{pmatrix}5&0&0\\0&3&0\end{pmatrix}.$$
To find V , we find unit eigenvectors of AA∗, giving
$$V = \begin{pmatrix}1/\sqrt{2}&1/\sqrt{2}\\1/\sqrt{2}&-1/\sqrt{2}\end{pmatrix}.$$
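A minimal MATLAB check of this example (note that svd may choose the signs of the columns of V and W differently):

A = [3 2 2; 2 3 -2];
[V, S, W] = svd(A);   % A = V*S*W'
diag(S)               % the singular values 5 and 3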
5.7 Exercises
Exercise 5.7.1 For the following, check whether h·, ·i is an inner product.
(a) V = R², F = R,
$$\left\langle\begin{pmatrix}x_1\\x_2\end{pmatrix},\begin{pmatrix}y_1\\y_2\end{pmatrix}\right\rangle = 3x_1y_1 + x_1y_2 + x_2y_1 + 2x_2y_2.$$
(b) V = C², F = C,
$$\left\langle\begin{pmatrix}x_1\\x_2\end{pmatrix},\begin{pmatrix}y_1\\y_2\end{pmatrix}\right\rangle = 3x_1y_1 + x_1y_2 + x_2y_1 + 2x_2y_2.$$
(a) V = C², F = C,
$$\left\|\begin{pmatrix}x_1\\x_2\end{pmatrix}\right\| = |x_1|^2 + |x_2|^2.$$
(b) V = C², F = C,
$$\left\|\begin{pmatrix}x_1\\x_2\end{pmatrix}\right\| = |x_1| + 2|x_2|.$$
(c) In case V is finite dimensional and W is a subspace, show that
dim W ⊥ = dim V − dim W . (Hint: start with an orthonormal basis for
W and add vectors to it to obtain an orthonormal basis for V ).
Exercise 5.7.5 Let h·, ·i be the Euclidean inner product on Fn , and k · k the
associated norm.
(a) Let F = C. Show that $A \in \mathbb{C}^{n\times n}$ is the zero matrix if and only if ⟨Ax, x⟩ = 0 for all $x \in \mathbb{C}^n$. (Hint: for $x, y \in \mathbb{C}^n$, use that ⟨A(x + y), x + y⟩ = 0 = ⟨A(x + iy), x + iy⟩.)
(b) Show that when F = R, there exist nonzero matrices $A \in \mathbb{R}^{n\times n}$, n > 1,
so that hAx, xi = 0 for all x ∈ Rn .
(c) For $A \in \mathbb{C}^{n\times n}$ define
$$w(A) = \max_{x\in\mathbb{C}^n,\,\|x\|=1}|\langle Ax, x\rangle|. \tag{5.20}$$
Show that w(·) is a norm on Cn×n . This norm is called the numerical
radius of A.
(d) Explain why maxx∈Rn ,kxk=1 |hAx, xi| does not define a norm.
Exercise 5.7.7
For the following linear maps on V , determine whether they are self-adjoint.
Exercise 5.7.8 Let V = R[t] over the field R. Define the inner product
$$\langle p, q\rangle := \int_0^2 p(t)q(t)\,dt.$$
For the following linear maps on V , determine whether they are unitary.
hA, Bi = tr(AB ∗ ).
(a) Let $W = \operatorname{span}\left\{\begin{pmatrix}1&2\\0&1\end{pmatrix},\ \begin{pmatrix}1&0\\2&1\end{pmatrix}\right\}$. Find an orthonormal basis for W.
Exercise 5.7.14 Show that the product of two unitary matrices is unitary.
How about the sum?
(a) $\frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\1&-1\end{pmatrix}$.
(b) $\frac{1}{\sqrt{3}}\begin{pmatrix}1&1&1\\1&e^{\frac{2i\pi}{3}}&e^{\frac{4i\pi}{3}}\\1&e^{\frac{4i\pi}{3}}&e^{\frac{8i\pi}{3}}\end{pmatrix}$.
(c) $\frac{1}{2}\begin{pmatrix}1&1&1&1\\1&i&-1&-i\\1&-1&1&-1\\1&-i&-1&i\end{pmatrix}$.
(d) Can you guess the general rule? (Hint: the answer is in Proposition 7.4.3).
(a) $A = \begin{pmatrix}2&i\\-i&2\end{pmatrix}$.
(b) $A = \begin{pmatrix}2&\sqrt{3}\\\sqrt{3}&4\end{pmatrix}$.
(c) $A = \begin{pmatrix}3&1&1\\1&3&1\\1&1&3\end{pmatrix}$.
(d) $A = \begin{pmatrix}0&1&0\\0&0&1\\1&0&0\end{pmatrix}$.
Exercise 5.7.18 Let $A = \begin{pmatrix}3&2i\\-2i&3\end{pmatrix}$.
(Hint: use that for $\lambda > 0$ we have that $\lim_{k\to\infty}\lambda^{\frac{1}{k}} = 1$.)
(c) Let  be the matrix obtained from A by removing row and column i.
Then λmax (Â) ≤ λmax (A).
(a) $A = \begin{pmatrix}1&1&2\sqrt{2}i\\-1&-1&2\sqrt{2}i\\\sqrt{2}i&-\sqrt{2}i&0\end{pmatrix}$.
(b) $A = \begin{pmatrix}-2&4&5\\6&0&-3\\6&0&-3\\-2&4&5\end{pmatrix}$.
Exercise 5.7.30 Let $A = \begin{pmatrix}P&Q\\R&S\end{pmatrix} \in \mathbb{C}^{(k+l)\times(m+n)}$, where P is of size $k\times m$. Show that
$$\sigma_1(P) \leq \sigma_1(A).$$
Conclude that σ1 (Q) ≤ σ1 (A), σ1 (R) ≤ σ1 (A), σ1 (S) ≤ σ1 (A) as well.
1. Take an image.
Here are the commands I used on a color image (thus the array has three
levels) with k = 30:
A=imread('Hugo2.png');         % read the image into an m x n x 3 array
AA=double(A);                  % convert to double so svd can be applied
[U,S,V]=svd(AA(:,:,1));
[U2,S2,V2]=svd(AA(:,:,2));
[U3,S3,V3]=svd(AA(:,:,3));
H=zeros(size(S,1),size(S,2));
H(1:30,1:30)=1;                % keep only the k=30 largest singular values
Snew=S.*H;
Snew2=S2.*H;
Snew3=S3.*H;
Anew(:,:,1)=U*Snew*V';
Anew(:,:,2)=U2*Snew2*V2';
Anew(:,:,3)=U3*Snew3*V3';
Anew=uint8(Anew);
imshow(Anew)
3 = c + d, 5 = 2c + d, 4 = 3c + d.
(a) Show that X has no eigenvalues on the unit circle T = {z ∈ C : |z| = 1}.
(b) Show that A is positive definite if and only if X has all eigenvalues in D = {z ∈ C : |z| < 1}. (Hint: When X has all eigenvalues in D, we have that $X^n \to 0$ as $n \to \infty$. Use this to show that $A = H + \sum_{k=1}^\infty X^{*k}HX^k$.)
Exercise 5.7.36 (Honors) On both C4×4 and C6×6 , we have the inner
product given via hA, Bi = tr(B ∗ A). Let T : C4×4 → C6×6 be given via
$$T\bigl((m_{ij})_{i,j=1}^4\bigr) := \begin{pmatrix}m_{11}&m_{12}&m_{13}&m_{14}&m_{13}&m_{14}\\m_{21}&m_{22}&m_{23}&m_{24}&m_{23}&m_{24}\\m_{31}&m_{32}&m_{33}&m_{34}&m_{33}&m_{34}\\m_{41}&m_{42}&m_{43}&m_{44}&m_{43}&m_{44}\\m_{31}&m_{32}&m_{33}&m_{34}&m_{33}&m_{34}\\m_{41}&m_{42}&m_{43}&m_{44}&m_{43}&m_{44}\end{pmatrix}.$$
Exercise 5.7.37 (Honors) Let A have no eigenvalues on the unit circle, and
let C = −(A∗ + I)(A∗ − I)−1 .
Exercise 5.7.38 (Honors) Let A have all its eigenvalues in the left half-plane −H = {z ∈ C : Re z < 0}, and let C be a positive semidefinite matrix of the same size. Show that
$$X = \int_0^\infty e^{At}Ce^{A^*t}\,dt$$
is well defined and satisfies $AX + XA^* = -C$.
CONTENTS
6.1 The Cartesian product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
6.2 The quotient space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
6.3 The dual space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
6.4 Multilinear maps and functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
6.5 The tensor product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
6.6 Anti-symmetric and symmetric tensors . . . . . . . . . . . . . . . . . . . . . . . . . . 179
6.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
In this chapter we study several useful constructions that yield a new vector
space based on given ones. We also study how inner products and linear
maps yield associated constructions.
Given vector spaces V1 , . . . , Vk over the same field F, the Cartesian product
vector space V1 × · · · × Vk is defined via
$$V_1\times\cdots\times V_k = \left\{\begin{pmatrix}v_1\\\vdots\\v_k\end{pmatrix} : v_i \in V_i,\ i = 1,\ldots,k\right\},$$
$$\begin{pmatrix}v_1\\\vdots\\v_k\end{pmatrix} + \begin{pmatrix}w_1\\\vdots\\w_k\end{pmatrix} := \begin{pmatrix}v_1+w_1\\\vdots\\v_k+w_k\end{pmatrix},$$
and
$$c\begin{pmatrix}v_1\\\vdots\\v_k\end{pmatrix} := \begin{pmatrix}cv_1\\\vdots\\cv_k\end{pmatrix}.$$
Clearly, one may view $\mathbb{F}^k$ as the Cartesian product $\mathbb{F}\times\cdots\times\mathbb{F}$ (where $\mathbb{F}$ appears k times). Sometimes $V_1\times\cdots\times V_k$ is viewed as a direct sum
$$(V_1\times\{0\}\times\cdots\times\{0\})\,\dot{+}\,(\{0\}\times V_2\times\{0\}\times\cdots\times\{0\})\,\dot{+}\,\cdots\,\dot{+}\,(\{0\}\times\cdots\times\{0\}\times V_k).$$
It is not hard to determine the dimension of a Cartesian product.
It takes some effort to prove that this is a norm, and we will outline the
proof in Exercise 6.7.1. Also,
$$\left\|\begin{pmatrix}v_1\\\vdots\\v_k\end{pmatrix}\right\|_\infty := \max_{i=1,\ldots,k}\|v_i\|$$
$$A\begin{pmatrix}v_1\\\vdots\\v_k\end{pmatrix} = \begin{pmatrix}\sum_{j=1}^k A_{1j}v_j\\\vdots\\\sum_{j=1}^k A_{kj}v_j\end{pmatrix}$$
{w1 , . . . , wl , v1 , . . . , vk }
is a basis for V (see in Exercise 2.6.8 why this is always possible). We now
claim that
B = {v1 + W, . . . , vk + W }
is a basis for V /W , which then proves the proposition.
c1 (v1 + W ) + · · · + ck (vk + W ) = 0 + W
(where we use the observation that 0 + W is the neutral element for addition
in V /W ). Then
c1 v1 + · · · + ck vk − 0 ∈ W.
Thus there exist d1 , . . . , dl so that
c1 v1 + · · · + ck vk = d1 w1 + · · · + dl wl .
v = c1 v1 + · · · + ck vk + d1 w1 + · · · + dl wl .
But then
v − (c1 v1 + · · · + ck vk ) = d1 w1 + · · · + dl wl ∈ W,
and thus
In case a finite-dimensional vector space V has an inner product h·, ·i, then
the spaces V /W and
W ⊥ = {v ∈ V : hv, wi = 0 for every w ∈ W }
are isomorphic. This follows immediately from a dimension count (see
Exercise 5.7.4), but let us elaborate and provide the explicit isomorphism in
the proof below.
The following example shows that in infinite dimensions, not all subspaces
are closed.
Thus V consists of vectors with infinitely many entries whose absolute values have a finite sum. As an example, since $\sum_{j=1}^\infty \frac{1}{j^2}\ (= \frac{\pi^2}{6}) < \infty$,
$$v = \left(1, \frac{1}{4}, \frac{1}{9}, \frac{1}{16}, \ldots\right) = \left(\frac{1}{j^2}\right)_{j=1}^\infty \in V.$$
The addition and scalar multiplication are defined entrywise. Thus
$$(x_j)_{j=1}^\infty + (y_j)_{j=1}^\infty = (x_j+y_j)_{j=1}^\infty, \qquad c(x_j)_{j=1}^\infty = (cx_j)_{j=1}^\infty.$$
$$W = \{x = (x_j)_{j=1}^\infty \in V : \text{only finitely many } x_j \text{ are nonzero}\}.$$
Next,
$$\|c(v+W)\| = \inf_{w\in W}\|cv+w\|_V = \inf_{\hat{w}\in W}\|c(v+\hat{w})\|_V =$$
The techniques introduced in this section provide a useful way to look at the Jordan canonical form. Let us return to Theorem 4.2.1 and a nilpotent $A \in \mathbb{F}^{n\times n}$. The crucial subspaces of $\mathbb{F}^n$ here are
$$W_j := \operatorname{Ker}A^j, \quad j = 0,\ldots,n,$$
{0} = W0 ⊆ W1 ⊆ · · · ⊆ Wn = Fn .
is a basis for Wj /Wj−1 . Starting with a basis for Wn /Wn−1 and repeating
the iteration outlined in this paragraph, one ultimately arrives at bases Bj
for Wj /Wj−1 , j = 1, . . . , n. Picking the specific representatives of these basis
elements (thus by taking the vector x when x + Wj−1 appears in Bj ), one
arrives at the desired basis for Fn giving the Jordan canonical form of A.
These observations form the essence of the construction in the proof of
Theorem 4.2.1.
A scenario where the quotient space shows up, is in the case we have a
vector space V with a Hermitian form [·, ·] that satisfies [v, v] ≥ 0 for all
v ∈ V . Such a Hermitian form is sometimes called a pre-inner product. It is
not an inner product as [x, x] = 0 does not necessarily imply x = 0, but all
the other rules of an inner product are satisfied. The following example is
the type of setting where this may occur.
Define
$$[f, g] := \int_0^1 f(t)g(t)\,dt.$$
Then [·, ·] is a Hermitian form and $[f, f] = \int_0^1 f(t)^2\,dt \geq 0$. However, there are nonzero functions f in V so that [f, f] = 0; for instance,
$$f(x) = \begin{cases}0 & \text{if } x \neq \frac12,\\ 1 & \text{if } x = \frac12,\end{cases}$$
satisfies [f, f ] = 0. Thus [·, ·] is a pre-inner product, but not an inner product.
So, what prevents a pre-inner product [·, ·] from being an inner product, is
that W := {v ∈ V : [v, v] = 0} contains nonzero elements. It turns out that
this set W is a subspace.
By choosing c = −[y, x], we get that −|[y, x]|2 ≥ 0, and thus [y, x] = 0. But
then it follows that x + y ∈ W , proving that W is closed under addition.
hx + W, y + W i := [x, y].
hx + W, y + W i = hx̂ + W, y + W i. (6.5)
With these operations the linear functions form a vector space V 0 , the dual
space of V . Thus
V 0 = {f : V → F : f is linear}.
Thus the functionals in the left- and right-hand sides of (6.6) coincide on the
basis elements vk , k = 1, . . . , n. But then, by linearity, the functionals in the
left- and right-hand sides of (6.6) coincide for all v ∈ V .
f ∈ V 0 there exists a v ∈ V so that f = fv ; that is, f (x) = hx, vi, for all
x ∈ V . Moreover, we have that
$$f_v(x) = \left\langle\sum_{j=1}^n c_je_j,\ \sum_{k=1}^n \overline{f(e_k)}\,e_k\right\rangle = \sum_{k=1}^n c_kf(e_k) = f\left(\sum_{k=1}^n c_ke_k\right) = f(x).$$
As for $v \neq 0$,
$$\left|f_v\left(\frac{1}{\|v\|_V}v\right)\right| = \frac{\langle v, v\rangle}{\|v\|_V} = \|v\|_V,$$
we obtain that $\|f_v\|_{V'} = \|v\|_V$ (an equality that trivially holds for v = 0 as well).
Notice that
$$\Phi(v+\hat{v}) = f_{v+\hat{v}} = f_v + f_{\hat{v}} = \Phi(v) + \Phi(\hat{v}), \tag{6.8}$$
and
$$\Phi(cv) = f_{cv} = \bar{c}f_v = \bar{c}\,\Phi(v). \tag{6.9}$$
Thus, when the underlying field is C, the map Φ is not linear, due to the complex conjugate showing up in (6.9). A map Φ satisfying
$$\|f\|_{V'} := \sup_{\|x\|_V\leq 1}|f(x)|, \qquad f \in V'.$$
Proof. First suppose that $f, g \in V'_{\mathrm{bdd}}$, thus $\|f\|_{V'}, \|g\|_{V'} < \infty$. Then
In the case that dimV = n < ∞, we may choose a basis in V and identify V
with Fn . Defining the standard inner product on Fn , we obtain also an inner
product h·, ·i on V . Using Theorem 6.3.2 we obtain that for every f ∈ V 0 we
have that
$$\sup_{\langle x,x\rangle\leq 1}|f(x)| < \infty,$$
as $f = f_v$ for some $v \in V$ and $\sup_{\langle x,x\rangle\leq 1}|f_v(x)| \leq \sqrt{\langle v, v\rangle}$ (by the Cauchy–Schwarz inequality). Using Theorem 5.1.25, we have that $\sqrt{\langle\cdot,\cdot\rangle}$ and ∥·∥V are equivalent norms. From this $\|f\|_{V'} < \infty$ now easily follows.
$$B = (b_{ij})_{i=1,j=1}^{m,n} = [A]_{C\leftarrow B}.$$
Let us compute $A'g_k$. For $v = \sum_{l=1}^n d_lb_l$ we have
$$A'g_k(v) = g_k\left(A\left(\sum_{l=1}^n d_lb_l\right)\right) = g_k\left(\sum_{l=1}^n d_lAb_l\right) = g_k\left(\sum_{l=1}^n d_l\sum_{i=1}^m b_{il}c_i\right) = \sum_{l=1}^n d_l\sum_{i=1}^m b_{il}g_k(c_i) = \sum_{l=1}^n d_lb_{kl}.$$
Observing that $d_l = f_l(\sum_{j=1}^n d_jb_j) = f_l(v)$, we thus obtain that
$$A'g_k(v) = \sum_{l=1}^n b_{kl}f_l(v) \ \text{ for all } v \in V.$$
Consequently,
$$A'g_k = \sum_{l=1}^n b_{kl}f_l,$$
V 00 = {E : V 0 → F : E linear},
Ev (f ) = f (v).
$$f(t) = \begin{pmatrix}f_1(t)\\\vdots\\f_n(t)\end{pmatrix}$$
be a differentiable function. With the Euclidean norm on $\mathbb{R}^n$ we have that
$$\frac{d}{dt}\|f(t)\|^2 = \frac{d}{dt}\bigl(f_1(t)^2+\cdots+f_n(t)^2\bigr) = 2\bigl(f_1'(t)f_1(t)+\cdots+f_n'(t)f_n(t)\bigr) = 2\begin{pmatrix}f_1'(t)&\cdots&f_n'(t)\end{pmatrix}\begin{pmatrix}f_1(t)\\\vdots\\f_n(t)\end{pmatrix}.$$
The row vector
$$\nabla f(t) = \begin{pmatrix}f_1'(t)&\cdots&f_n'(t)\end{pmatrix}$$
While we focused in the section on the vector space of linear functionals, one
can, in more generality, study the vector space
L(V, W ) = {T : V → W : T is linear},
with the usual definition of adding linear maps and multiplying them with a
scalar. In finite dimensions, we have seen that after choosing bases B and C
in V and W , respectively, every linear map T : V → W is uniquely identified
by its matrix representation [T ]C←B . Using this, one immediately sees that
The main item we would like to address here is when V and W have norms
k · kV and k · kW , respectively. In this case there is a natural norm on
L(V, W ), as follows:
When V and W are finite dimensional, this supremum is always finite and
thus kT kL(V,W ) is a nonnegative real number. We say that k · kL(V,W ) is the
induced operator norm, as its definition relies on the norms on V and W and
on the property of T as a linear operator.
$$\|cT\|_{L(V,W)} = \sup_{\|v\|_V=1}\|cT(v)\|_W = \sup_{\|v\|_V=1}|c|\,\|T(v)\|_W = |c|\,\|T\|_{L(V,W)}.$$
Finally, if $v \neq 0$, then $\frac{v}{\|v\|_V}$ has norm 1, and thus
$$\left\|T\left(\frac{v}{\|v\|_V}\right)\right\|_W \leq \|T\|_{L(V,W)}.$$
Multiplying both sides with $\|v\|_V$, and using the norm properties, yields (6.13). When v = 0, then (6.13) obviously holds as well.
When the vector spaces are not finite dimensional, it could happen that a
linear map does not have a finite norm. When this happens, we say that the
linear map is unbounded. A typical example of an unbounded linear map is
taking the derivative. We provide the details next.
and
$$W = \{f : (0,1)\to\mathbb{R} : f \text{ is bounded}\}.$$
On both spaces let
$$\|f\|_\infty = \sup_{t\in(0,1)}|f(t)|$$
be the norm. Note that f being bounded means exactly that ∥f∥∞ < ∞. Let $T = \frac{d}{dt} : V \to W$ be the differentiation map. Then T is linear. Let now $f_n(t) = t^n$, $n \in \mathbb{N}$. Then $\|f_n\|_\infty = 1$ for all $n \in \mathbb{N}$. However, $(Tf_n)(t) = f_n'(t) = nt^{n-1}$ has norm $\|Tf_n\|_\infty = n$, $n \in \mathbb{N}$. Thus, it follows that
$$\sup_{\|f\|_\infty=1}\|Tf\|_\infty \geq \|Tf_n\|_\infty = n$$
Proof. Let v ∈ V with kvkV = 1. By (6.13) applied to the vector T (v) and
the map S we have that
φ : V1 × · · · × Vk → W
is linear. Thus
If we let
M = {φ : V1 × · · · × Vk → W : φ is multilinear},
then by usual addition and scalar multiplication of functions, we have that
M is a vector space over F. When the vector spaces V1 , . . . , Vk have inner
products, h·, ·i1 , . . . , h·, ·ik , respectively, then for fixed u1 ∈ V1 , . . . , uk ∈ Vk
the map
$$\varphi_{u_1,\ldots,u_k}(v_1,\ldots,v_k) := \langle v_1,u_1\rangle_1\cdots\langle v_k,u_k\rangle_k = \prod_{j=1}^k\langle v_j,u_j\rangle_j$$
Similar to the proof of Proposition 6.3.4, one can show that if V1 , . . . , Vk are
finite dimensional and W = F, then φ is automatically bounded. Indeed, if
the norms come from inner products, one can use Proposition 6.4.5 and
(6.18) to see that φ is bounded. Next, using that on finite-dimensional
spaces any two norms are equivalent, one obtains the boundedness with
respect to any norms on V1 , . . . , Vk .
It is convenient to allow m = 0 in the expression $\sum_{j=1}^m c_j(x_j\otimes v_j)$, in which case the sum should just be interpreted as 0. We define addition and scalar multiplication on $V_1\otimes V_2$ by
$$\sum_{j=1}^m c_j(x_j\otimes v_j) + \sum_{j=m+1}^l c_j(x_j\otimes v_j) = \sum_{j=1}^l c_j(x_j\otimes v_j),$$
and
$$d\sum_{j=1}^m c_j(x_j\otimes v_j) = \sum_{j=1}^m (dc_j)(x_j\otimes v_j).$$
With these operations, one can easily check that V1 ⊗ V2 is a vector space.
An element of the form x ⊗ v is called a simple tensor. In general, the
elements of V1 ⊗ V2 are linear combinations of simple tensors. This definition
of the tensor product of two vector spaces is perhaps the most abstract
notion in this book. The elements of this space are just sums of a set of
symbols, and then we have equality when we can convert one sum to the
other by using the rules (6.19)–(6.21). We intend to make things more
concrete in the remainder of this section.
Proposition 6.5.2 Consider the vector space $V_1\otimes V_2$ over $\mathbb{F}$, and let $\sum_{j=1}^m c_j(x_j\otimes v_j) \in V_1\otimes V_2$. Let $W_1 = \operatorname{Span}\{x_1,\ldots,x_m\}$ and $W_2 = \operatorname{Span}\{v_1,\ldots,v_m\}$. The following are equivalent:
(i) $\sum_{j=1}^m c_j(x_j\otimes v_j) = 0$,
(ii) for all bilinear maps $F : W_1\times W_2\to W$ we have that $\sum_{j=1}^m c_jF(x_j,v_j) = 0_W$,
(iii) for all bilinear maps $F : W_1\times W_2\to\mathbb{F}$ we have that $\sum_{j=1}^m c_jF(x_j,v_j) = 0$.
(ii) → (iii): Note that (iii) is just a special case of (ii), by taking W = F, and thus (iii) holds when (ii) holds.
(iii) → (i): We prove the contrapositive, so we assume that (i) does not hold. Suppose that $\sum_{j=1}^m c_j(x_j\otimes v_j) \neq 0$. Thus the expression $\sum_{j=1}^m c_j(x_j\otimes v_j)$ cannot be converted to 0 by rules (6.19)–(6.21). Let $B = \{y_1,\ldots,y_s\}$ be a basis for $W_1 = \operatorname{Span}\{x_1,\ldots,x_m\}$, and $C = \{w_1,\ldots,w_t\}$ be a basis for $W_2 = \operatorname{Span}\{v_1,\ldots,v_m\}$. We introduce the $s\times m$ matrix $S = (s_{ij})$ and the $t\times m$ matrix $T = (t_{ij})$ as follows:
$$S = \begin{pmatrix}[c_1x_1]_B&\cdots&[c_mx_m]_B\end{pmatrix}, \qquad T = \begin{pmatrix}[v_1]_C&\cdots&[v_m]_C\end{pmatrix}.$$
We now claim that $ST^T \neq 0$. Indeed, note that by applying (6.19)–(6.21) we may write
$$\sum_{j=1}^m (c_jx_j)\otimes v_j = \sum_{j=1}^m\left[\left(\sum_{l=1}^s s_{lj}y_l\right)\otimes\left(\sum_{n=1}^t t_{nj}w_n\right)\right] = \sum_{l=1}^s\sum_{n=1}^t\left(\sum_{j=1}^m s_{lj}t_{nj}\right)y_l\otimes w_n.$$
The number $\sum_{j=1}^m s_{lj}t_{nj}$ is exactly the $(l,n)$th entry of $ST^T$, so if $ST^T = 0$, it would mean that $\sum_{j=1}^m c_j(x_j\otimes v_j) = 0$.
$$\sum_{j=1}^m g\left(\sum_{l=1}^s s_{lj}y_l\right)h\left(\sum_{n=1}^t t_{nj}w_n\right) = \sum_{j=1}^m s_{pj}t_{qj} \neq 0,$$
as this number is exactly equal to the (p, q) entry of ST T . This finishes the
proof.
We choose
$$B = \left\{\begin{pmatrix}1\\2\\3\end{pmatrix},\ \begin{pmatrix}1\\1\\1\end{pmatrix}\right\}, \qquad C = \left\{\begin{pmatrix}0\\1\end{pmatrix},\ \begin{pmatrix}1\\1\end{pmatrix}\right\},$$
and find that
$$S = \begin{pmatrix}1&0&1&0\\0&1&2&3\end{pmatrix}, \qquad T = \begin{pmatrix}1&1&0&4\\1&0&1&2\end{pmatrix}.$$
Compute now
$$ST^T = \begin{pmatrix}1&2\\3&3\end{pmatrix}.$$
Thus (6.22) is not 0. Using any factorization of $ST^T$, for instance
$$ST^T = \begin{pmatrix}1&2\\3&3\end{pmatrix} = \begin{pmatrix}1\\0\end{pmatrix}\begin{pmatrix}1&2\end{pmatrix} + \begin{pmatrix}0\\1\end{pmatrix}\begin{pmatrix}3&3\end{pmatrix},$$
we obtain
$$x_1 = \begin{pmatrix}4\\0\\1\end{pmatrix}, \quad x_2 = \begin{pmatrix}0\\2\\4\end{pmatrix}, \quad v_1 = \begin{pmatrix}0\\1\end{pmatrix}, \quad v_2 = \begin{pmatrix}1\\1\end{pmatrix}.$$
Then (6.22) equals
$$x_1\otimes v_1 + x_2\otimes v_2 = \begin{pmatrix}4\\0\\1\end{pmatrix}\otimes\begin{pmatrix}0\\1\end{pmatrix} + \begin{pmatrix}0\\2\\4\end{pmatrix}\otimes\begin{pmatrix}1\\1\end{pmatrix}. \tag{6.23}$$
Proof. For any $y\otimes w \in V_1\otimes V_2$, we can write $y = \sum_{i=1}^n c_ix_i$ and $w = \sum_{j=1}^m d_jv_j$. Then
$$y\otimes w = \sum_{i=1}^n\sum_{j=1}^m c_id_j\,x_i\otimes v_j.$$
For a linear combination $\sum_{r=1}^k a_ry_r\otimes w_r$ we can write each term as a linear combination of $\{x_i\otimes v_j : 1\leq i\leq n,\ 1\leq j\leq m\}$, and thus this linear combination also lies in $\operatorname{Span}\{x_i\otimes v_j : 1\leq i\leq n,\ 1\leq j\leq m\}$. This shows that $\{x_i\otimes v_j : 1\leq i\leq n,\ 1\leq j\leq m\}$ spans $V_1\otimes V_2$.
To show that $\{x_i\otimes v_j : 1\leq i\leq n,\ 1\leq j\leq m\}$ is linearly independent, suppose that $\sum_{i=1}^n\sum_{j=1}^m a_{ij}x_i\otimes v_j = 0$. Performing the procedure in the proof of Proposition 6.5.2 with B and C as above we obtain that
$$ST^T = \begin{pmatrix}a_{11}&\cdots&a_{1m}\\\vdots&&\vdots\\a_{n1}&\cdots&a_{nm}\end{pmatrix}.$$
Thus $\sum_{i=1}^n\sum_{j=1}^m a_{ij}x_i\otimes v_j = 0$ holds if and only if $a_{ij} = 0$ for all i and j. This proves linear independence.
For example
$$\begin{pmatrix}1\\2\\3\end{pmatrix}\otimes\begin{pmatrix}4\\5\end{pmatrix} \in \mathbb{R}^3\otimes\mathbb{R}^2 \leftrightarrow \begin{pmatrix}1\begin{pmatrix}4\\5\end{pmatrix}\\2\begin{pmatrix}4\\5\end{pmatrix}\\3\begin{pmatrix}4\\5\end{pmatrix}\end{pmatrix} = \begin{pmatrix}4\\5\\8\\10\\12\\15\end{pmatrix}\in\mathbb{R}^6.$$
In other words, if we define
$$\Phi : \mathbb{F}^n\otimes\mathbb{F}^m\to\mathbb{F}^{nm} \ \text{ by } \ \Phi(e_i\otimes e_j) = e_{(i-1)m+j},$$
or equivalently, by
$$\Phi(x\otimes v) = \begin{pmatrix}x_1v\\\vdots\\x_nv\end{pmatrix},$$
and extend it to the full space by linear extension
$$\Phi\left(\sum_{j=1}^m c_j(x_j\otimes v_j)\right) = \sum_{j=1}^m c_j\Phi(x_j\otimes v_j),$$
When V1 and V2 have inner products, the tensor product space V1 ⊗ V2 has a
natural associated inner product, as follows.
Proposition 6.5.6 Let V1 and V2 have inner products h·, ·i1 and h·, ·i2 ,
respectively. Define h·, ·i on V1 ⊗ V2 via
and extend h·, ·i via the rules of a Hermitian form to all of V1 ⊗ V2 . Then
h·, ·i is an inner product.
By the extension via the rules of a Hermitian form, we mean that we set
$$\left\langle\sum_{i=1}^n c_ix_i\otimes v_i,\ \sum_{j=1}^m d_jy_j\otimes w_j\right\rangle = \sum_{i=1}^n\sum_{j=1}^m \bar{c}_id_j\langle x_i\otimes v_i, y_j\otimes w_j\rangle = \sum_{i=1}^n\sum_{j=1}^m \bar{c}_id_j\langle x_i, y_j\rangle_1\langle v_i, w_j\rangle_2.$$
Proof. The only tricky part is to check that if $f = \sum_{i=1}^n c_ix_i\otimes v_i$ satisfies $\langle f, f\rangle = 0$, then $f = 0$. For this, we choose an orthonormal basis $\{z_1,\ldots,z_k\}$ for $\operatorname{Span}\{v_1,\ldots,v_n\}$, and rewrite f as
$$f = \sum_{j=1}^k d_jy_j\otimes z_j,$$
yielding for each i that $\langle d_iy_i, d_iy_i\rangle_1 = 0$, and thus $d_iy_i = 0$. This gives that f = 0.
It is straightforward to check that h·, ·i satisfies all the other rules of an inner
product, and we will leave this to the reader.
where the infimum is taken over all possible ways of writing f as $f = \sum_{j=1}^k c_jx_j\otimes v_j$. We will not further pursue this here.
Since
(A ⊗ B)[(x + y) ⊗ v] = (A ⊗ B)(x ⊗ v + y ⊗ v) (6.25)
(A ⊗ B)[x ⊗ (v + w)] = (A ⊗ B)(x ⊗ v + x ⊗ w), (6.26)
and
Then
$$(A\otimes B)(e_1\otimes e_1) = a_{11}b_{11}e_1\otimes e_1 + a_{11}b_{21}e_1\otimes e_2 + a_{21}b_{11}e_2\otimes e_1 + a_{21}b_{21}e_2\otimes e_2,$$
$$(A\otimes B)(e_1\otimes e_2) = a_{11}b_{12}e_1\otimes e_1 + a_{11}b_{22}e_1\otimes e_2 + a_{21}b_{12}e_2\otimes e_1 + a_{21}b_{22}e_2\otimes e_2,$$
$$(A\otimes B)(e_2\otimes e_1) = a_{12}b_{11}e_1\otimes e_1 + a_{12}b_{21}e_1\otimes e_2 + a_{22}b_{11}e_2\otimes e_1 + a_{22}b_{21}e_2\otimes e_2,$$
$$(A\otimes B)(e_2\otimes e_2) = a_{12}b_{12}e_1\otimes e_1 + a_{12}b_{22}e_1\otimes e_2 + a_{22}b_{12}e_2\otimes e_1 + a_{22}b_{22}e_2\otimes e_2.$$
Thus, if we take the canonical basis $E = \{e_1\otimes e_1, e_1\otimes e_2, e_2\otimes e_1, e_2\otimes e_2\}$, we obtain that
$$[A\otimes B]_{E\leftarrow E} = \begin{pmatrix}a_{11}b_{11}&a_{11}b_{12}&a_{12}b_{11}&a_{12}b_{12}\\a_{11}b_{21}&a_{11}b_{22}&a_{12}b_{21}&a_{12}b_{22}\\a_{21}b_{11}&a_{21}b_{12}&a_{22}b_{11}&a_{22}b_{12}\\a_{21}b_{21}&a_{21}b_{22}&a_{22}b_{21}&a_{22}b_{22}\end{pmatrix}.$$
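This 4 × 4 matrix is precisely MATLAB's Kronecker product kron(A,B); a minimal sketch with made-up entries:

A = [1 2; 3 4]; B = [0 1; 1 0];
kron(A, B)             % matches [A ⊗ B] in the basis E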
E = {xj ⊗ vl : j = 1, . . . , n1 , l = 1, . . . , n2 },
F = {yj ⊗ wl : j = 1, . . . , m1 , l = 1, . . . , m2 },
$$A = (a_{jl})_{j=1,l=1}^{m_1,n_1} = [T]_{C_1\leftarrow B_1}, \qquad B = [S]_{C_2\leftarrow B_2},$$
Proof of Proposition 6.5.8. Writing $B = (b_{ij})_{i=1,j=1}^{m_2,n_2}$, we have that
$$(T\otimes S)(x_j\otimes v_l) = \sum_{r=1}^{m_1}\sum_{s=1}^{m_2} a_{rj}b_{sl}\,y_r\otimes w_s, \quad j = 1,\ldots,n_1,\ l = 1,\ldots,n_2.$$
Lemma 6.5.10 Let $T : V_1\to W_1$, $\hat{T} : W_1\to Z_1$, $S : V_2\to W_2$, $\hat{S} : W_2\to Z_2$ be linear maps. Then
$$(\hat{T}\otimes\hat{S})(T\otimes S)(x\otimes v) = (\hat{T}\otimes\hat{S})(Tx\otimes Sv) = (\hat{T}Tx)\otimes(\hat{S}Sv) = (\hat{T}T)\otimes(\hat{S}S)(x\otimes v).$$
But then (T̂ ⊗ Ŝ)(T ⊗ S) and (T̂ T ) ⊗ (ŜS) also act the same on linear
combinations of simple tensors. Thus the lemma follows.
For the remaining parts, the vector spaces are assumed to be inner product
spaces (and thus necessarily, F = R or C), and the inner product on the
tensor product is given via the construction in Proposition 6.5.6.
(iii) (T ⊗ S)? = T ? ⊗ S ? .
(iv) If T and S are isometries, then so is T ⊗ S.
(v) If T and S are unitary, then so is T ⊗ S.
(vi) If T and S are normal, then so is T ⊗ S.
(vii) If T and S are Hermitian, then so is T ⊗ S.
(viii) If T and S are positive (semi-)definite, then so is T ⊗ S.
and
proving (i).
The remaining details of the proof are left to the reader. For part (viii) use
that T is positive semidefinite if and only if T = CC ∗ for some C, which can
be chosen to be invertible when T is positive definite.
The theory we developed in this section for two vector spaces, can also be
extended to a tensor product V1 ⊗ · · · ⊗ Vk of k vector spaces. In that case
V1 ⊗ · · · ⊗ Vk is generated by elements
v1 ⊗ · · · ⊗ vk ,
In $\mathbb{F}^3$, we have
$$\begin{pmatrix}1\\0\\0\end{pmatrix}\wedge\begin{pmatrix}0\\1\\0\end{pmatrix} \leftrightarrow \begin{pmatrix}0\\1\\0\\-1\\0\\0\\0\\0\\0\end{pmatrix}, \quad \begin{pmatrix}1\\0\\0\end{pmatrix}\wedge\begin{pmatrix}0\\0\\1\end{pmatrix} \leftrightarrow \begin{pmatrix}0\\0\\1\\0\\0\\0\\-1\\0\\0\end{pmatrix}, \quad \begin{pmatrix}0\\1\\0\end{pmatrix}\wedge\begin{pmatrix}0\\0\\1\end{pmatrix} \leftrightarrow \begin{pmatrix}0\\0\\0\\0\\0\\1\\0\\-1\\0\end{pmatrix}.$$
Lemma 6.6.2 The anti-symmetric tensor is linear in each of its parts; that is,
$$v_1\wedge\cdots\wedge(cv_i+d\hat{v}_i)\wedge\cdots\wedge v_k = c(v_1\wedge\cdots\wedge v_i\wedge\cdots\wedge v_k) + d(v_1\wedge\cdots\wedge\hat{v}_i\wedge\cdots\wedge v_k).$$
$$v_1\wedge\cdots\wedge v_i\wedge\cdots\wedge v_j\wedge\cdots\wedge v_k = -v_1\wedge\cdots\wedge v_j\wedge\cdots\wedge v_i\wedge\cdots\wedge v_k.$$
$$-\sum_{\sigma\in S_k}\operatorname{sign}(\sigma\tau)\,v_{\sigma(\tau(1))}\otimes\cdots\otimes v_{\sigma(\tau(k))} = -\sum_{\hat{\sigma}\in S_k}\operatorname{sign}\hat{\sigma}\,v_{\hat{\sigma}(1)}\otimes\cdots\otimes v_{\hat{\sigma}(k)} = -v_1\wedge\cdots\wedge v_i\wedge\cdots\wedge v_j\wedge\cdots\wedge v_k,$$
where we used that if τ runs through all of Sk , then so does σ̂ = στ .
$$v_1\wedge\cdots\wedge v_i\wedge\cdots\wedge v_i\wedge\cdots\wedge v_k = 0.$$
$$v_1\wedge\cdots\wedge v_i\wedge\cdots\wedge v_i\wedge\cdots\wedge v_k = \sum_{\sigma\in E_k}\left(\operatorname{sign}\sigma\,v_{\sigma(1)}\otimes\cdots\otimes v_{\sigma(k)} + \operatorname{sign}(\sigma\tau)\,v_{\sigma(\tau(1))}\otimes\cdots\otimes v_{\sigma(\tau(k))}\right).$$
We define
$$\wedge^kV := \operatorname{Span}\{v_1\wedge\cdots\wedge v_k : v_j \in V,\ j = 1,\ldots,k\}.$$
Then $\wedge^kV$ is a subspace of $\otimes^kV$.
For linear independence, suppose that $\sum_{1\leq i_1<\cdots<i_k\leq n} c_{i_1,\ldots,i_k}v_{i_1}\wedge\cdots\wedge v_{i_k} = 0$ for some scalars $c_{i_1,\ldots,i_k}\in\mathbb{F}$, $1\leq i_1<\cdots<i_k\leq n$. Putting in the definition of $v_{i_1}\wedge\cdots\wedge v_{i_k}$, we arrive at the equality
$$\sum_{l_1,\ldots,l_k=1}^n a_{l_1,\ldots,l_k}\,v_{l_1}\otimes\cdots\otimes v_{l_k} = 0, \tag{6.31}$$
where either $a_{l_1,\ldots,l_k} = 0$ (when $l_r = l_s$ for some $r \neq s$) or where $a_{l_1,\ldots,l_k}$ equals one of the numbers $\pm c_{i_1,\ldots,i_k}$. As the tensors $\{v_{l_1}\otimes\cdots\otimes v_{l_k} : l_j = 1,\ldots,n,\ j = 1,\ldots,k\}$ form a basis of $\otimes^kV$, we have that (6.31) implies that the $a_{l_1,\ldots,l_k}$ are all equal to 0. This implies that all $c_{i_1,\ldots,i_k}$ are equal to 0. This shows that E is linearly independent.
Remark 6.6.6 Note that when V has an inner product, and when $\{v_1,\ldots,v_n\}$ is chosen to be an orthonormal basis of V, the basis E is not orthonormal. It is however an orthogonal basis, and thus all that needs to be done is to make the elements of E of unit length. As all have length $\sqrt{k!}$, one obtains an orthonormal basis for $\wedge^kV$ by taking
$$E_{on} = \left\{\frac{1}{\sqrt{k!}}(v_{i_1}\wedge\cdots\wedge v_{i_k}) : 1\leq i_1<\cdots<i_k\leq n\right\}.$$
(T v1 ) ∧ · · · ∧ (T vk ) ∈ ∧k W.
(iii) (∧k T )? = ∧k T ? .
(iv) If T is an isometry, then so is ∧k T .
(v) If T is unitary, then so is ∧k T .
(vi) If T is normal, then so is ∧k T .
(vii) If T is Hermitian, then so is ∧k T .
(viii) If T is positive (semi-)definite, then so is ∧k T .
Thus, the difference with the anti-symmetric tensor product is the absence
of the factor signσ.
In $\mathbb{F}^2$ with k = 3, we have
$$\begin{pmatrix}1\\0\end{pmatrix}\vee\begin{pmatrix}1\\0\end{pmatrix}\vee\begin{pmatrix}1\\0\end{pmatrix} \leftrightarrow \begin{pmatrix}6\\0\\0\\0\\0\\0\\0\\0\end{pmatrix}\in\mathbb{F}^8, \qquad \begin{pmatrix}1\\0\end{pmatrix}\vee\begin{pmatrix}1\\0\end{pmatrix}\vee\begin{pmatrix}0\\1\end{pmatrix} \leftrightarrow \begin{pmatrix}0\\2\\2\\0\\2\\0\\0\\0\end{pmatrix}\in\mathbb{F}^8.$$
Lemma 6.6.12 The symmetric tensor is linear in each of its parts; that is,
$$v_1\vee\cdots\vee(cv_i+d\hat{v}_i)\vee\cdots\vee v_k = c(v_1\vee\cdots\vee v_i\vee\cdots\vee v_k) + d(v_1\vee\cdots\vee\hat{v}_i\vee\cdots\vee v_k).$$
$$v_1\vee\cdots\vee v_i\vee\cdots\vee v_j\vee\cdots\vee v_k = v_1\vee\cdots\vee v_j\vee\cdots\vee v_i\vee\cdots\vee v_k.$$
$$\sum_{\sigma\in S_k} v_{\sigma(\tau(1))}\otimes\cdots\otimes v_{\sigma(\tau(k))} = \sum_{\hat{\sigma}\in S_k} v_{\hat{\sigma}(1)}\otimes\cdots\otimes v_{\hat{\sigma}(k)} = v_1\vee\cdots\vee v_i\vee\cdots\vee v_j\vee\cdots\vee v_k,$$
where we used that if τ runs through all of $S_k$, then so does $\hat{\sigma} = \sigma\tau$.
We define
$$\vee^kV := \operatorname{Span}\{v_1\vee\cdots\vee v_k : v_j \in V,\ j = 1,\ldots,k\}.$$
Then $\vee^kV$ is a subspace of $\otimes^kV$.
For linear independence, suppose that $\sum_{1\leq i_1\leq\cdots\leq i_k\leq n} c_{i_1,\ldots,i_k}v_{i_1}\vee\cdots\vee v_{i_k} = 0$ for some scalars $c_{i_1,\ldots,i_k}\in\mathbb{F}$, $1\leq i_1\leq\cdots\leq i_k\leq n$. Putting in the definition of $v_{i_1}\vee\cdots\vee v_{i_k}$, we arrive at the equality
$$\sum_{l_1,\ldots,l_k=1}^n a_{l_1,\ldots,l_k}\,v_{l_1}\otimes\cdots\otimes v_{l_k} = 0, \tag{6.33}$$
where $a_{l_1,\ldots,l_k}$ equals one of the numbers $c_{i_1,\ldots,i_k}$ or a positive integer multiple of it. As the tensors $\{v_{l_1}\otimes\cdots\otimes v_{l_k} : l_j = 1,\ldots,n,\ j = 1,\ldots,k\}$ form a basis of $\otimes^kV$, we have that (6.33) implies that the $a_{l_1,\ldots,l_k}$ are all equal to 0. This implies that all $c_{i_1,\ldots,i_k}$ are equal to 0. This shows that F is linearly independent.
Remark 6.6.15 Note that when V has an inner product, and when $\{v_1,\ldots,v_n\}$ is chosen to be an orthonormal basis of V, the basis F is not orthonormal. It is however an orthogonal basis, and thus all that needs to be done is to make the elements of F of unit length. The elements of F have different lengths, so some care needs to be taken in doing this.
∨k T (v1 ∨ · · · ∨ vk ) = (T v1 ) ∨ · · · ∨ (T vk ).
This is almost the same expression as for the determinant except that sign σ does not appear. Thus all terms have a +. For instance,
$$\operatorname{per}\begin{pmatrix}b_{11}&b_{12}\\b_{21}&b_{22}\end{pmatrix} = b_{11}b_{22} + b_{21}b_{12}.$$
$$[\vee^2T]_{F\leftarrow F} = \begin{pmatrix}a_{11}^2&a_{11}a_{12}&a_{12}^2\\2a_{11}a_{21}&a_{11}a_{22}+a_{21}a_{12}&2a_{12}a_{22}\\a_{21}^2&a_{21}a_{22}&a_{22}^2\end{pmatrix}. \tag{6.35}$$
Notice that the (2, 2) element is equal to per A[{1, 2}, {1, 2}].
(iii) (∨k T )? = ∨k T ? .
(iv) If T is an isometry, then so is ∨k T .
(v) If T is unitary, then so is ∨k T .
(vi) If T is normal, then so is ∨k T .
(vii) If T is Hermitian, then so is ∨k T .
(viii) If T is positive (semi-)definite, then so is ∨k T .
which now is Hermitian when A is. The same remark holds for the other
properties in Proposition 6.6.19 (iii)–(viii).
6.7 Exercises
Exercise 6.7.1 The purpose of this exercise is to show (the vector form of) Minkowski's inequality, which says that for complex numbers $x_i, y_i$, $i = 1,\ldots,n$, and $p \geq 1$, we have
$$\left(\sum_{i=1}^n |x_i+y_i|^p\right)^{\frac1p} \leq \left(\sum_{i=1}^n |x_i|^p\right)^{\frac1p} + \left(\sum_{i=1}^n |y_i|^p\right)^{\frac1p}. \tag{6.37}$$
(a) Show that f (x) = − log x is a convex function on (0, ∞). (One can do
this by showing that f 00 (x) ≥ 0.)
(b) Use (a) to show that for $a, b > 0$ and $p, q \geq 1$, with $\frac1p + \frac1q = 1$, we have $ab \leq \frac{a^p}{p} + \frac{b^q}{q}$. This inequality is called Young's inequality.
(c) Show that
$$\sum_{i=1}^n a_ib_i \leq \left(\sum_{i=1}^n a_i^p\right)^{\frac1p}\left(\sum_{i=1}^n b_i^q\right)^{\frac1q}.$$
(Hint: Let $\lambda = \left(\sum_{i=1}^n a_i^p\right)^{\frac1p}$ and $\mu = \left(\sum_{i=1}^n b_i^q\right)^{\frac1q}$, and divide on both sides $a_i$ by λ and $b_i$ by µ. Use this to argue that it is enough to prove the inequality when λ = µ = 1. Next use (b).)
(d) Use (c) to prove (6.37) in the case when $x_i, y_i \geq 0$. (Hint: Write $(x_i+y_i)^p = x_i(x_i+y_i)^{p-1} + y_i(x_i+y_i)^{p-1}$, take the sum on both sides, and now apply Hölder's inequality to each of the terms on the right-hand side. Rework the resulting inequality, and use that p + q = pq.)
defines a norm on V1 × · · · × Vk .
(a) Let $A = (a_{ij})_{i=1,j=1}^{n,m}$, $A_k = (a_{ij}^{(k)})_{i=1,j=1}^{n,m}$, $k = 1, 2, \ldots$, be matrices in $\mathbb{F}^{n\times m}$. Show that $\lim_{k\to\infty}\|A_k - A\| = 0$ if and only if $\lim_{k\to\infty}|a_{ij}^{(k)} - a_{ij}| = 0$ for every $i = 1,\ldots,n$ and $j = 1,\ldots,m$.
(b) Let n = m. Show that limk→∞ kAk − Ak = 0 and limk→∞ kBk − Bk = 0
imply that limk→∞ kAk Bk − ABk = 0.
Exercise 6.7.4 Given A ∈ Cn×n , we define its similarity orbit to be the set
of matrices
O(A) = {SAS −1 : S ∈ Cn×n is invertible}.
Thus the similarity orbit of a matrix A consists of all matrices that are
similar to A.
(a) Show that if A is diagonalizable, then its similarity orbit O(A) is closed.
(Hint: notice that due to A being diagonalizable, we have that B ∈ O(A)
if and only if mA (B) = 0.)
(b) Show that if A is not diagonalizable, then its similarity orbit is not
closed.
Exercise 6.7.6 Describe the linear functionals on Cn [X] that form the dual
basis of {1, X, . . . , X n }.
(c) Determine the annihilator of Span{1 + 2X, X + X 2 } ⊆ R3 [X].
Remark. The statement is also true when {v1 , . . . , vk } are not linearly
independent, but in that case the proof is more involved. The corresponding
result is the Farkas–Minkowski Theorem, which plays an important role in
linear programming.
(c) Let M = {1, . . . , m} and P = {1, . . . , p}. For $A \in \mathbb{F}^{p\times m}$ and $B \in \mathbb{F}^{m\times p}$, show that
$$\det AB = \sum_{S\subseteq M,\,|S|=p}\det(A[P,S])\det(B[S,P]). \tag{6.39}$$
(a) Show that when $a \neq 0$, the Jordan canonical form of $J_s(a)\otimes J_t(0)$ is given by $\oplus_{i=1}^s J_t(0)$.
(b) Show that when $b \neq 0$, the Jordan canonical form of $J_s(0)\otimes J_t(b)$ is given by $\oplus_{i=1}^t J_s(0)$.
(c) Show that when $t \geq s$, the Jordan canonical form of $J_s(0)\otimes J_t(0)$ is given by
$$[\oplus_{i=1}^{t-s+1} J_s(0)] \oplus [\oplus_{i=1}^{s-1}(J_{s-i}(0)\oplus J_{s-i}(0))].$$
Using the above information one can now find the Jordan canonical form of
A ⊗ B, when one is given the Jordan canonical forms of A and B.
7
How to Use Linear Algebra
CONTENTS
7.1 Matrices you can’t write down, but would still like to use . . . . . . 196
7.2 Algorithms based on matrix vector products . . . . . . . . . . . . . . . . . . . . 198
7.3 Why use matrices when computing roots of polynomials? . . . . . . 203
7.4 How to find functions with linear algebra? . . . . . . . . . . . . . . . . . . . . . . 209
7.5 How to deal with incomplete matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
7.6 Solving millennium prize problems with linear algebra . . . . . . . . . 222
7.6.1 The Riemann hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
7.6.2 P vs. NP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
7.7 How secure is RSA encryption? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
7.8 Quantum computation and positive maps . . . . . . . . . . . . . . . . . . . . . . . 232
7.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
Bibliography for Chapter 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
In this chapter we would like to give you an idea how creative thinking led
to some very useful ideas to exploit the power of linear algebra. The hope is
that it will inspire you to think of new ways to use linear algebra in areas of
your interest. It would be great if one day we would be remiss by not
including your ideas in this chapter. So, go for it!
This chapter has a somewhat different flavor than the other chapters. As
applications use mathematics from different fields, we will be mentioning
and use some results from other areas of mathematics without proofs. In
addition, not everything will have a complete theory. Some of the algorithms
described may be based on heuristic arguments and do not necessarily have
a full theoretical justification. It is natural that these things happen:
mathematics is a discipline with several different, often useful, aspects and
continues to develop as a discipline. There will always be mathematical
research continuing to improve on existing results.
195
196 Advanced Linear Algebra
7.1 Matrices you can’t write down, but would still like
to use
Here are two examples to begin with, both used in search engines:
• A matrix P where there is a row and column for every existing web page,
and the (i, j)th entry pij represents the probability that you go from web
page i to web page j. Currently (October 2015), there are about 4.76
billion indexed web pages, so this matrix is huge. However, if you have a
way of looking at a page i and determining all the probabilities pij , then
determining a row of this matrix is not a big deal.
• A matrix M where there is a row for every web page, and a column for
every search word. The (i, j)th entry mij of this matrix is set to be 1 if
search word j appears on page i, and 0 otherwise. Again, this matrix is
huge, but determining row i is easily done by looking at this particular
page.
One big difference between these two matrices is obvious: P is square and M
is not. Thus P has eigenvectors, and M does not. In fact, it is the eigenvector
of P T at the eigenvalue 1 that is of interest. Notice that for these matrices it
may not be convenient to use numbers 1, 2, . . . as indices for the rows and
columns, as we usually do. Rather one may just use the name of the web
page or the actual search word as the index. So, for instance, we would write
1
pwww.linear− algebra.edu,www.woerdeman.edu = ,
10
mwww.linear− algebra.edu,Hugo = 1.
Notice that this means that the rows and columns are not ordered in a
natural way (although we can order them if we have to), and thus anything
meaningful that we should be looking for should not depend on any
particular order. In the case of P , though, the rows and columns are indexed
by the same index set, so if any ordering is chosen for the rows we should use
How to Use Linear Algebra 197
the same for the columns. Let us also observe that any vector x for which we
would to consider the product P x, needs to be indexed by the same index
set as is used for the columns of P . Thus x would have entries like
• A matrix K where the columns represent the products you sell and the
rows represent your customers. The entry Kij is the rating customer i
gives to product j. So for instance
Notice that the ratings, one through five stars, do not form a field (we
can make it a matrix with entries in Z5 , but it would not be meaningful
in this context). Still, as it turns out, it is useful to consider this as a
matrix over R. Why the real numbers? Because the real numbers have a
natural ordering, and the ratings are ordered as well. In fact, the
ordering of the ratings is the only thing we care about! The main
problem with this matrix is that you will never know all of its entries
(unless you are running a really small business), and the ones you think
you know may not be accurate.
• A matrix C where both the rows and columns represent all the people
(in the worlds, in a country, in a community), and the entries cij are 1 or
0 depending whether person i knows person j. If we believe the “six
degrees of separation” theory, the matrix C + C 2 + · · · + C 6 will only
have positive entries. (The matrix C is an adjacency matrix; see Exercise
7.9.16 for the definition.)
• A matrix H where each row represents the genetic data of each known
(DNA) virus. Does this even make sense? Can anything be done with
this? The entries would be letters (A, C, G, T ), without any (obvious)
addition and multiplication to make it into a meaningful field.
• (Make up your own.)
There are at least two types of techniques that can help in dealing with
these types of matrices:
2. If we can assume that the matrix is low rank, then knowing and/or storing
just part of the matrix gives us enough to work with the whole matrix.
If we are in a situation where it is hard to deal with the whole square matrix
A, but we are able to compute a vector product Av, are we still able to
compute eigenvalues of A, or solve an equation Ax = b. Examples of such a
situation include
• A sparse matrix A; that is, a matrix with relatively few nonzero entries.
While the matrix may be huge, computing a product Av may be doable.
• A situation where the matrix A represents the action of some system in
which we can give inputs and measure outputs. If the input is u and the
output is y, then by giving the system the input u and by measuring the
output y we would in effect be computing the product y = Au. In this
situation we would not know the (complete) inner workings of this
system, but assume (or just guess as a first try) that the system can be
modeled/approximated by a simple matrix multiplication.
• The matrices M and P from Section 7.1.
Here is a first algorithm that computes the eigenvalue of the largest modulus
in case it has geometric multiplicity 1.
1 v∗ Avk
v0 := v, , vk+1 = Avk , µk := k∗ , k = 1, 2, . . . ,
kAvk k vk vk
Example 7.2.2 For illustration, let us see how the algorithm works on the
How to Use Linear Algebra 199
matrix
3 0 0
A = 0 2 0
0 0 1
with initial vector
1
v0 = 1 .
1
Then
3 9
1 1
v1 = √ 2 , v2 = √ 4 ,
32 + 2 2 + 1 2 1 34 + 24 + 14 1
k
3
1 2k , k ∈ N.
vk = √
32k + 22k + 12k 1
Notice that
3k 1
√ =q → 1,
32k + 22k + 12k 1 + ( 3 ) + ( 13 )2k
2 2k
2k ( 23 )k
√ =q → 0,
32k + 22k + 12k 1 + ( 23 )2k + ( 13 )2k
1k ( 13 )k
√ =q → 0,
32k + 22k + 12k 1 + ( 23 )2k + ( 13 )2k
so that vk → e1 as k → ∞. In addition,
m
m
Proof. Let 1 ≤ i ≤ j ≤ k. Then the (i, j)th entry of Jk (µ)m equals j−i µ .
Notice that
n m(m − 1) · · · (m − (j − i) + 1)
=
j−i (j − i)!
200 Advanced Linear Algebra
where we used that |µ| < 1. For details on the last step, please see Exercise
7.9.1.
Proof. Let ρ(A) < 1. Then A = SJS −1 with J = ⊕sj=1 Jnj (λj ), where
|λj | < 1, j = 1, . . . , s. Then, by Lemma 7.2.3, J m = ⊕sj=1 (Jnj (λj ))m → 0 as
m → ∞. But then Am = SJ m S −1 → 0 as m → ∞.
Next, suppose that ρ(A) ≥ 1. Then A has an eigenvalue λ with |λ| ≥ 0. Let
x be a corresponding eigenvector. Then Am x = λm x 6→ 0 as m → ∞. But
then it follows that Am 6→ 0 as m → ∞.
where we use that λ1 has only one Jordan block of size 1. Denote
c1
−1 ..
S v = . .
cn
Qn
Since v 6∈ Ker j=2 (A − λj ), we have that c1 =
6 0. Put
1 0 ··· 0
0 1
J(λ 2)
k
··· 0 c1
1 λk1
.
wk = k Ak v = S
. , k = 0, 1, . . . .
λ1 .. .. ..
.
.. .
. . .
cn
1
0 0 ··· λk
J(λm )k
1
How to Use Linear Algebra 201
wk
Then vk = kwk k . Also, using Lemma 7.2.3, we see that for j > 1, we have
that
1 λj
J(λj )k = diag(λr−1
1 )nr=1 J( )k diag(λ−r+1
1 )nr=1 → 0 when k → ∞.
λk1 λ1
Thus
1 0 ··· 0 c1 c1
0 0 ··· 0 . 0
wk → S .
.
. =S . =: x when k → ∞.
.. . . ..
.. . . . . ..
..c
0 0 ··· 0 n 0
Notice that x, a multiple of the first column of S, is an eigenvector of A at
wk x
λ1 . We now get that vk = kw kk
→ kxk =: w is a unit eigenvector of A at λ1 ,
and
vk∗ Avk wk∗ Awk x∗ Ax
µk = = → = w∗ Aw = λ1 w∗ w = λ1 when k → ∞.
vk∗ vk wk∗ wk x∗ x
If one is interested in more than just one eigenvalue of the matrix, one can
introduce so-called Krylov spaces:
Span{v, Av, A2 v, . . . , Ak v}.
Typically one finds an orthonormal basis for this space, and then studies
how powers of the matrix A act on this space. In this way one can
approximate more than one eigenvalue of A.
where the ∗’s indicate the few nonzero entries in the desired solution x. It is
important to realize that the location of the nonzero entries in x are not
known; otherwise one can simply remove all the columns in A that
correspond to a 0 in x and solve the much smaller system.
To solve the above problem one needs to use some non-linear operations.
One possibility is to use the hard thresholding operator Hs : Cn → Cn , which
keeps the s largest (in magnitude) entries of a vector x and sets the other
entries equal to zero. For instance
1 0
3+i 0 5 0
2 − 8i 2 − 8i −20 −20
H2 = ,H =
.
2−i 0 3
2 0
10 10 11 11
−7 −7
Notice that these hard thresholding operators are not linear; for instance
3 −2 3 −2 1 0 3 −2
H1 + H1 = + = 6= = H1 ( + ).
1 1 0 0 0 2 1 1
Notice that Hs is actually not well-defined on vectors where the sth largest
element and the (s + 1)th largest element have the same magnitude. For
instance, is
3+i 3+i 3+i 0
3 − i 0 3 − i 3 − i
H2
2 − i = 0 or H2 2 − i = 0 ?
10 10 10 10
When the algorithm below is used, this scenario either does not show up, or
the choice one makes does not affect the outcome, so this detail is usually
ignored. Of course, it may cause a serious problem in some future
application, at which point one needs to rethink the algorithm. There are
other thresholding functions where some of the values are diminished, but
not quite set to 0. The fact that one completely annihilates some elements
(by setting them to 0, thus completely ignoring their value) gives it the term
“hard” in hard thresholding.
1. Let x0 = 0.
2. Put xn+1 = Hs (xn + A∗ (b − Axn )).
How to Use Linear Algebra 203
Example 7.3.1 Let p(t) = t3 − 6t2 + 11t − 6 (= (t − 1)(t − 2)(t − 3)). Its
companion matrix is
0 0 6
A = 1 0 −11 .
0 1 6
Computing its QR factorization, we find
0 0 1 1 0 −11
A = QR = 1 0 0 0 1 6 .
0 1 0 0 0 6
204 Advanced Linear Algebra
If we now let A1 = RQ = Q−1 QRQ = Q−1 AQ, then A1 has the same
eigenvalues as A. We find
1 0 −11 0 0 1 0 −11 1
A1 = 0 1 6 1 0 0 = 1 6 0 .
0 0 6 0 1 0 0 6 0
Notice that the entries below the diagonal are relatively small. In addition,
the diagonal entries are not too far off from the eigenvalues of the matrix:
1,2,3. Let us do another 20 iterations. We find
3.0000 −10.9697 7.5609
A30 = 0.0000 2.0000 −1.8708 .
0 0.0000 1.0000
As A30 is upper triangular we obtain that its diagonal entries 3,2,1 are the
eigenvalues of A30 , and therefore they are also the eigenvalues of A.
A1 = A, Ai = Qi Ri , Ai+1 = Ri Qi , i = 1, 2, . . . ,
with Qi Q∗i = In and Ri upper triangular with positive diagonal entries, gives
that
lim Ak = Λ.
k→∞
How to Use Linear Algebra 205
Ak = Q1 Q2 · · · Qk Rk · · · R2 R1 .
In addition,
Ak = V Λk V ∗ = V Λk LU.
Notice that we may choose for L to have diagonal elements equal to 1.
Combining we obtain
Λk L = (V ∗ Q1 Q2 · · · Qk )(Rk · · · R2 R1 U −1 ),
and thus
Write L = (lij )ni,j=1 with lii = 1, i = 1, . . . , n, and lij = 0 for i < j. We now
have that Λk LΛ−k is lower triangular with a unit diagonal, and with (i, j)th
entry lij ( λλji )k , i < j, in the lower triangular part. As | λλji | < 1, i > j, we have
that limk→∞ lij ( λλji )k = 0, and thus limk→∞ Λk LΛ−k = In . Let
λi n uii n
∆ = diag( ) , E = diag( ) ,
|λi | i=1 |uii | i=1
E ∗ ∆−(k−1) Wk−1
∗
Wk ∆k E∆−k E ∗ Uk ΛUk−1
∗
E∆k−1 − Λ (7.5)
While Theorem 7.3.2 only addresses the case of Hermitian matrices, the
convergence result goes well beyond this case. In particular, it works for
large classes of companion matrices. Due to the structure of companion
matrices, one can set up the algorithm quite efficiently, so that one can
actually compute roots of polynomials of very high degree accurately. In
Figure 7.1, we give an example of degree 10,000.
Concerns with large matrices (say, 104 × 104 = 108 entries) are (i) how do
you update them quickly? (ii) how do you store them? As it happens,
companion matrices have a lot of structure that can be maintained
throughout the QR algorithm. First observe that a companion matrix has
zeros in the lower triangular part under the subdiagonal. The terminology is
as follows. We say that A = (aij )ni,j=1 is upper Hessenberg if aij = 0 when
i > j + 1. The upper Hessenberg structure is maintained throughout the QR
algorithm, as we will see now.
0.5
0
-1 -0.5 0 0.5 1
-0.5
-1
P10,000
Figure 7.1: These are the roots of the polynomial pk (10, 000)xk , where pk (n)
k=1
is the number of partitions of n in k parts, which is the number of ways n can be
written as the sum of k positive integers.
Aside from the upper Hessenberg property, a companion matrix has more
structure: it is the sum of a unitary matrix and a rank 1 matrix. Indeed, the
companion matrix
0 0 ··· 0 −a0
1 0 · · · 0 −a1
.. .. .. ,
C = . . .
0 · · · 1 0 −an−2
0 · · · 0 1 −an−1
can be written as
C = Z + xy∗ ,
208 Advanced Linear Algebra
where
0 eiθ −a0 − eiθ
0 0 ···
1 0
··· 0 0
−a1
Z = ... .. .. , x = ..
, y = en .
. .
.
0 · · · 1 0 0 −an−2
0 ··· 0 1 0 −an−1
Here Z is unitary and xy∗ has rank 1. Notice that θ can be chosen to be any
real number. The property of being the sum of a unitary and a rank 1 is
maintained throughout the QR algorithm, as we prove next.
Let us observe that in finding roots of non-linear systems one often still relies
on linear algebra. Indeed, Newton’s method is based on the idea that if we
would like to find a root of a function f , we start at a first guess, and if this
is not a root, we pretend that the graph at this point is a line (the tangent
line) and find the root of that line. This is our next guess for our root of f . If
the guess is right, we stop. If not, we continue as before by computing the
root of the tangent line there, and repeat this process iteratively.
There are many iterative linear schemes that solve a nonlinear problem. One
such example is an image enhancement scheme that was used in law
enforcement. Such methods need to be defendable in court, convincing a jury
that the information extracted was there to begin with rather than that the
program “invented” information. In the riots in Los Angeles in 1992 one of
the convictions was based on the enhancement of video images taken from a
helicopter. Indeed, after enhancement of these images a tattoo became
recognizable leading to the identity of one of the rioters.
How to Use Linear Algebra 209
are linear, linear algebra plays a very useful role here. In both cases we will
restrict the discussion to collecting a finite number of data points. For more
general data collection, one would need some tools from functional analysis
to set up a robust theory.
E : FX → Fn
sound signals, cosines and sines are great functions to work with. In this case
one could take
W = Span{1, cos x, sin x, cos 2x, sin 2x, . . . , cos N x, sin N x}.
The number k in cos kx, sin kx is referred to as the frequency, and our ear
hears a higher tone when the frequency is higher. In addition, the range that
the human ear can hear is between 20 Hz and 20,000 Hz (with Hz
corresponding to 1 cycle per second). Thus, when it is about sounds the
human ear can hear, it makes perfectly sense to use a finite-dimensional
subspace. As
eix = cos x + i sin x, e−ix = cos x − i sin x
one can also deal with the subspace
W = Span{e−iN x , ei(N −1)x , . . . , ei(N −1)x , eiN x },
often simplifying the calculations (which may sound counterintuitive when
you are still getting used to complex numbers, but for instance, simple rules
like ea eb = ea+b are easier to work with than the formulas for cos(a + b) and
sin(a + b)). In some cases it is better to work with functions that are nonzero
only on a finite interval (which is not true for cos and sin), and so-called
wavelet functions were invented to have this property while still keeping some
advantages that cos and sin have. In Figure 7.2 is an example of a wavelet.
Once we have settled on a finite dimensional subspace of functions, we can
start to use linear algebra. We begin the exposition using polynomials.
p(xn )
How to Use Linear Algebra 211
Then
xn−1
1 x1 ··· 1
1 x2 ··· x2n−1
[E]E←B = . .. =: V (x1 , . . . , xn ), (7.6)
..
.. . .
1 xn ··· xn−1
n
where E is the standard basis of Fn . The matrix V (x1 , . . . , xn ) is called the
Vandermonde matrix. Thus interpolation with polynomials leads to a system
of equations with a Vandermonde matrix.
6 xj when i 6= j.
In particular, V (x1 , . . . , xn ) is invertible as soon as xi =
.
.. .. ..
. . .
0 xn − x1 ··· xn−1
n − xn−1
1
So we find that
n
Y
det V (x1 , . . . , xn ) = [ (xj − x1 )] det V (x2 , . . . , xn ),
j=2
Aside from having an easily computable inverse, the Fourier matrix (when n
is a power of 2) also has the advantage that it factors in simpler matrices.
This makes multiplication with the Fourier matrix easy (and fast!) to
compute. We just illustrate the idea for n = 4 and n = 8:
1 1 1 1 1 1 0 0 1 0 1 0
1 i −1 −i 0 0 1 i 0 1 0 1
F4 =
1 −1 1 −1 = 1 −1 0 0 1 0 −1 0 ,
1 −i −1 i 0 0 1 −i 0 1 0 −1
1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0
0 0 1 α 0 0 0 0 0 1 0 1 0 0 0 0
0 0 0 0 1 α2 0 0 0 0 0 0 1 0 α 2
0
0 0 0 0 0 0 1 α3 0 0 0 0 0 1 0 α2
F8 = ×
1 α4 0 0 0 0 0 0 1 0 α4 0 0 0 0 0
0 0 1 α5 0 0 0 0 0 1 0 α4 0 0 0 0
0 0 0 0 1 α6 0 0 0 0 0 0 1 0 α6 0
0 0 0 0 0 0 1 α7 0 0 0 0 0 1 0 α6
1 0 0 0 1 0 0 0
0 1 0 0 0 1 0 0
0 0 1 0 0 0 1 0
0 0 0 1 0 0 0 1
, where α = e 2πi
8 .
1 0 0 0 −1 0 0 0
0 1 0 0 0 −1 0 0
0 0 1 0 0 0 −1 0
0 0 0 1 0 0 0 −1
How to Use Linear Algebra 213
Notice that when multiplying with one of these simpler matrices, for each
entry one only needs to do one multiplication and one addition. Thus
multiplying with F8 requires 24 multiplications and 24 additions. In general,
we have that multiplying with Fn requires n log2 n multiplications and
n log2 n additions. This is a lot better than n2 multiplications and (n − 1)n
additions, which one has with a regular n × n matrix vector multiplication
(this number can be reduced somewhat, but still for a general matrix it is of
the order n2 ).
Interpolation techniques are also useful over finite fields. An example where
this is used is secret sharing.
a(x) = a0 + a1 x + a2 x2 ,
where a0 = m = 1432 and a1 and a2 are some other numbers in Zp \ {0}. For
instance, a1 = 132 and a2 = 547. Now we generate interpolation data, for
instance:
x f (x)
1 2111
2 1575
3 2133
4 1476
5 1913
6 1135
7 1451
8 552
9 747
10 2036
If now three people get together, one will be able to reconstruct the
polynomial a(x), and thus the secret a(0). With only two people (thus, with
only two interpolation points), one will not be able to reconstruct the secret
code. For instance, with the data (2, 1575), (5, 1913), (9, 747), one finds the
214 Advanced Linear Algebra
secret by computing
−1
1 2 4 1575
a0 = 1 0 0 1 5 25 1913 , (7.8)
1 9 81 747
where one is working over the field Z2309 . The calculation (7.8) can be
programmed so that those holding the interpolation points do not need to
know the prime number p. When the three data points are known, but the
prime p is unknown, one still will not be able to reconstruct the secret,
providing some protection when someone listening in is able to get 3
interpolation points. This secret sharing scheme was introduced by Adi
Shamir.
Pn U = Span{u1 , . . . , un },
In addition, we can take a finite-dimensional subspace
and seek a solution f in this subspace, thus f = j=1 aj uj for some scalars
a1 , . . . , an . Now we obtain the system of equations
Xn n
X
hΦ(f ), wi i = hΦ( aj uj ), wi i = aj hΦ(uj ), wi i = hg, wi i, i = 1, . . . , n.
j=1 j=1
∂2f ∂2f
2
+ 2 = g,
∂x ∂y
and let us add the zero boundary condition
f = 0 on ∂Ω.
Thus our vector space X consists of functions that are differentiable twice
with respect to each of the variables x and y, and that are zero on the
boundary ∂Ω. We introduce the Hermitian form
Z Z
hk, hi := k(x, y)h(x, y) dx dy,
Ω
∂ 2 uj ∂ 2 uj ∂ 2 uj ∂ 2 uj
Z Z
bij = h 2 + , ui i = ( + )ui dx dy.
∂x ∂y 2 Ω ∂x2 ∂y 2
Performing partial integration, and using the zero boundary condition, we
arrive at
Z Z
∂uj ∂ui ∂uj ∂ui ∂uj ∂ui ∂uj ∂ui
bij = + dx dy = h , i+h , i.
Ω ∂x ∂x ∂y ∂y ∂x ∂x ∂y ∂y
Note that B is symmetric, and when ui , i = 1, . . . , n are chosen so that
{ ∂u ∂un ∂u1 ∂un
∂x , . . . , ∂x } or { ∂y , . . . , ∂y } is linearly independent, we have that B is
1
positive definite. This guarantees that one can solve for a1 , . . . , an in (7.9)
and thus construct a solution f .
Figure 7.4: The original image (of size 3000 × 4000 × 3).
A partial matrix over F is a matrix with some entries in F given and others
unknown. For instance
1 0 ?
A= (7.10)
? 1 ?
is a 2 × 3 partial matrix with entries (1, 3), (2, 1) and (2, 3) unknown. When
convenient, we indicate the unknowns by variables:
1 0 x13
A= .
x21 1 x23
We view the unknown as variables xij that take value in the field F. The set
of locations J ⊆ {1, . . . , n} × {1, . . . , m} of known entries is called the
pattern of the partial matrix. For instance, for the partial matrix (7.10) the
pattern is {(1, 1), (1, 2), (2, 2)}. A completion of a partial matrix is obtained
by choosing values in F for the unknowns. For instance, if F = R, then
!
1 0 1
1 0 qπ
A= ,A = 5
10 1 −7 e2 1 17
are completions of the partial matrix (7.10). We will denote partial matrices
by A, B, etc., and their completions by A, B, etc.
For instance,
1 0 ? 1 1 ?
min rank = 2, min rank = 1.
? 1 ? ? 1 ?
How to Use Linear Algebra 219
Indeed, independent of the choice for x13 , x21 and x23 , we have that
1 0 x13
rank = 2,
x21 1 x23
1 1 ?
while any completion of B = has rank at least 1, and
? 1 ?
1 1 1
1 1 1
With the partial ranking data, one obtains a large matrix (say, of size
1, 000, 000, 000 × 100, 000) where only a small percentage of the values are
known. It turned out that looking for (an approximation of) a minimal rank
completion was a good move. Apparently, a model where our individual
movie rankings are a linear combination of the ranking of a relatively few
number of people provides a reasonable way to predict a person’s movie
rankings. Of course, a minimal rank completion of a partial matrix that has
entries in the set {1, 2, 3, 4, 5} will not necessarily have its entries in this set,
so additional steps need to be taken to get ranking predictions.
So, how does one find a minimal rank completion? Here we discuss one
algorithm, which assumes that F = R or C, based on an initial guess of an
upper bound of the minimal rank. For
σ1 0 · · · 0 · · · 0
0 σ2 · · · 0 · · · 0
.. .. . . .. ..
. . . . . , σ1 ≥ · · · ≥ σm ,
Σ= (7.11)
0
0 · · · σm · · · 0
. .. .. .
.. . . .
0. ..
0 0 ··· 0 ··· 0
thus just keeping the k largest singular values. Notice that the operation Hk
is like the hard thresholding operator introduced in Section 7.2.
220 Advanced Linear Algebra
1. Choose a completion A0 of A.
T T
Example 7.5.1 Let v1 = 0 1 1 , v2 = 1 −1 1 , and
T
v3 = 0 0 2 . Then the corresponding distance matrix is given by
0 5 2
5 0 3 .
2 3 0
equal to 1. In that case, the minimal dimension n for which there exists
vectors v1 , . . . , vk ∈ Rn such that dij = kvi − vj k2 , i, j = 1, . . . , k, is given
by the rank of the matrix
−1
S = B22 − B21 B11 B12 , (7.13)
where
0 1
B11 = (bij )2i,j=1 = T
, B12 = B21 = (bij )2i=1,j=3
k+1
, B22 = (bij )k+1
i,j=3 .
1 0
kv2 − v3 k2 · · · kv2 − vk k2
0
kv3 − v2 k2 0 · · · kv3 − vk k2
.. .. .. ..
. . . .
kvk − v2 k2 kvk − v3 k2 ··· 0
kv2 k2
1
1 kv3 k2
0 1
1 1 ··· 1
− . ,
.. 1 kv2 k2 kv3 k2 kvk k2
.. . 0 ···
1 kvk k2
222 Advanced Linear Algebra
The negative even integers are considered to be the trivial zeros of ζ(s), so
the Riemann hypothesis can also be stated asthe non-trivial zeros of the
Riemann zeta function have a real part 21 . There is a lot to say about the
Riemann hypothesis as the vast literature on the subject shows. A good
place to start to read up on it would be the website of the Clay Mathematics
Institute. In this subsection we would just like to introduce a linear algebra
problem, the solution of which would imply the Riemann hypothesis.
and
Cn = (e2 + · · · + en )T e1 .
Let An = Dn + Cn , which is called the Redheffer matrix, after its inventor.
So, for instance
1 1 1 1 1 1
1 1 0 1 0 1
1 0 1 0 0 1
A6 = 1 0 0 1 0 0 .
1 0 0 0 1 0
1 0 0 0 0 1
In Figure 7.5 one can see what A500 looks like.
The Riemann hypothesis holds if and only if for every > 0 there exist
1
M, N > 0 so that | det An | ≤ M n 2 + for all n ≥ N .
If you are familiar with big O notation, then you will recognize that the last
1
statement can be written as | det An | = O(n 2 + ) as n → ∞. The proof of
this result requires material beyond the scope of this book; please see
[Redheffer, 1977] for more information. While this formulation may be an
interesting way to familiarize oneself with the Riemann hypothesis, the
machinery to solve this problem will most likely tap into many fields of
mathematics. Certainly the solution has been elusive to many
How to Use Linear Algebra 225
7.6.2 P vs. NP
graph above and the choice V1 = {1, 3, 6}, V2 = {2, 4, 5}, the size equals
s(V1 , V2 ) = 5. A maximum cut of a graph G is a cut whose size is at least the
size of any other cut of G.
A graph G with n vertices has 2n−1 cuts, and thus to find a maximum cut
one may simply check the size of each cut and pick one for which the size is
maximal. There is one major problem with this approach: there are simply
too many cuts to check. For instance, if n = 100 and one can check 100,000
cuts per second, it will take more than 1017 years to finish the search. The
main problem is that the time it takes is proportional to 2n−1 , which is an
exponential function of n. We would rather have an algorithm taking time
that is proportional to a polynomial p(n) in n. We call such an algorithm a
polynomial time algorithm. As an example of a problem that can be solved in
polynomial time, putting n numbers in order from smallest to largest can be
done in a time proportional to n2 , for instance by using the Quicksort
algorithm. The MaxCut problem is one of many for which no polynomial
time has been established. We will describe now a polynomial time
algorithm that finds a cut with a size of at least 0.878 times the maximal cut
size. The development of a polynomial time algorithm that would bring this
16
number up to 17 ≈ 0.941 would show that P=NP and thus solve one of the
millennium prize problems.
Notice that
(
0 if i, j ∈ V1 or i, j ∈ V2 ,
1 − yi yj =
2 if i ∈ V1 , j ∈ V2 or i ∈ V2 , j ∈ V1 .
1
P
Thus 2 i<j wij (1 − yi yj ) indeed corresponds to the size of the cut
How to Use Linear Algebra 227
The following proposition gives an estimate for the expected size of a cut
obtained by the above algorithm.
Proposition 7.6.2 Using the above algorithm, the expected size of a cut is
≥ 0.87856 mcr(G).
1X arccos(viT vj ) 1X arccos(yij )
wij = wij =
2 i,j π 2 i,j π
1X 2 arccos(yij ) 1X 2 arccos(t)
wij (1 − yij ) ≥ wij (1 − yij ) min =
4 i,j π 1 − yij 4 i,j −1≤t≤1 π 1−t
2 θ
mcr(G) min ≥ 0.87856 mcr(G),
0≤θ≤π π 1 − cos θ
where we used the substitution t = cos θ, and where the last step is the result
θ
of a calculus exercise (of determining the minimum of 1−cos θ on [0, π]).
Thus the outcome of the polynomial time algorithm provides an answer that
bounds the value of the NP hard problem MaxCut within 12.2% accuracy. It
has been proven that if the approximation ratio can be made better than
16
17 ≈ 0.941, then a polynomial time algorithm for MaxCut can be obtained.
Thus if a polynomial time algorithm can be found achieving this
approximation ratio of 0.941 (instead of 0.87856), one obtains that P=NP
(and a million US dollars).
214032465024074496126442307283933356300861471514475501779775492088
141802344714013664334551909580467961099285187247091458768739626192
155736304745477052080511905649310668769159001975940569345745223058
9325976697471681738069364894699871578494975937497937
Example 7.7.1 Let the numbers 3675, 7865, 165, 231, 7007 be given. Can we
find a square among the different products of these numbers? For this we
first do a prime factorization of each of them:
3675 = 3·52 ·72 , 7865 = 5·112 ·13, 165 = 3·5·11, 231 = 3·7·11, 7007 = 72 ·11·13.
Notice that these are all products of the primes 3, 5, 7, 11, and 13. For each
of these numbers we make a column consisting of the power of these primes
modulo 2. For instance, the column corresponding to 3675 is
T
1 0 0 0 0 ; as only the power of 3 in the prime factorization of 3675
is odd, we only get a 1 in the first position (the row corresponding to the
prime 3). Doing it for all 5 numbers we get the matrix
1 0 1 1 0
0 1 1 0 0
A= 0 0 0 1 0 .
0 0 1 1 1
0 1 0 0 1
T
Taking the vector x = 1 1 1 0 1 , we get that Ax = 0. This now
gives that the product
3675 · 7865 · 165 · 7007 = 32 · 54 · 74 · 114 · 132
is a square.
How to Use Linear Algebra 231
1. Randomly choose 0 6= y ∈ Zm
2 . If u = Ay = 0 we are done.
2. Randomly choose 0 6= v ∈ Zm
2 .
3. Compute si = vT Ai u, i = 0, 1, . . . ..
4. Compute a nonzero polynomial m(X) = m0 + m1 X + · · · + md X d so that
Pd
j=0 sk+j mj = 0 for all k. Equivalently, find a nontrivial solution of the
equation
m0
s0 s1 s2 · · · sd−1 sd m1
s1 s2 s3 · · · sd sd+1 m2
s2 s3 s4 · · · sd+1 sd+2 .. = 0. (7.19)
.
.. .. .. .. ..
. . . . . md−1
md
m(X)
5. Let j be so that m0 = · · · = mj−1 = 0 and mj =
6 0. Put p(X) = mj X j .
6. Let now x = p(A)y, and check whether Al−1 x 6= 0 and Al x = 0, for some
l = 1, . . . , j. If so, Al−1 x 6= 0 is the desired vector. If not, start over with
another random vector y.
It should be noted that Step 4 in the above algorithm can actually be used
to find the minimal polynomial of a matrix. Let us illustrate this on a small
example over R.
3 0 0 0
0 2 1 0
Example 7.7.2 Let A =
0 0 2 0. If we take
0 0 0 2
T
v = u = 1 1 1 1 , one easily calulates that
m3 1
For example,
is a cone in Hk . If the vector space V has a norm k · k, then we say that the
cone C is closed if
hA, Bi = tr(AB),
(n)
Proof. Let An = (aij )ki,j=1 ∈ PSDk , n = 1, 2, . . ., and A = (aij )ni,j=1 ∈ Hk be
(n)
so that limn→∞ kAn − Ak = 0. Then limn→∞ aij = aij for all 1 ≤ i, j ≤ n. If
we let x ∈ Ck , then hAn x, xi ≥ 0. Also limn→∞ hAn x, xi = hAx, xi, and thus
hAx, xi ≥ 0. As this is true for every x ∈ Ck , we obtain that A ∈ PSDk .
In Hnm we define
When M Γ ∈ PSDnm , we say that M “passes the Peres test.” Asher Peres
discovered Proposition 7.8.2 in 1996. In addition, the map M 7→ M Γ is
referred to as taking the partial transpose.
is not positive semidefinite. Thus M does not pass the Peres test.
Proposition 7.8.2 relies on the fact that taking the transpose maps PSDk
into PSDk . For other maps that have this property the same test can be
applied as well. We call a linear map Φ : Cm×m → Cl×l positive if
Φ(PSDm ) ⊆ PSDl . Thus, Φ is positive if it maps positive semidefinite
matrices to positive semidefinite matrices.
k
X
(idCn×n ⊗ Φ)(M ) = Ar ⊗ Φ(Br ) ∈ SEPn,l ⊆ PSDnl .
r=1
There are positive maps Φ for which idCk×k ⊗ Φ is positive for every k ∈ N.
We call such maps completely positive. These completely positive maps are
useful in several contexts, however they are unable to identify M ∈ PSDmn
that are not separable, as (idCn×n ⊗ Φ)(M ) will in this case always be
positive semidefinite. The completely positive maps are characterized in the
following result, due to Man-Duen Choi.
When one (and thus all) of (i)–(iv) hold, then s in (iv) can be chosen to be
at most ml.
Proof. (i) → (ii) is trivial, as when idCk×k ⊗ Φ is positive for all k ∈ N, then
it is certainly positive for k = m.
2
×m2
(ii) → (iii): The matrix H = (Eij )m
i,j=1 ∈ C
m
is easily seen to be
How to Use Linear Algebra 237
vectors vr as
vr1
vr = ... , where vr1 , . . . , vrm ∈ Cl .
vrm
We now have that
ml
X
∗
Mij = Φ(Eij ) = vri vrj .
r=1
∗
= Sr Eij Sr∗ . Thus
If we introduce Sr = vr1 · · · vrm , we have that vri vrj
Plm ∗ ml
= r=1 Sr Eij Sr∗ . As any X is a
P
we find that Mij = Φ(Eij ) = r=1 vri vrj
linear combination of the basis elements Eij , we thus find that
Pml
Φ(X) = i=1 Si XSi∗ .
Ps
(iv) → (i): When Φ(X) = r=1 Sr XSr∗ , we may write for M = (Mij )ki,j=1
with Mij ∈ Cm×m ,
s
X n
X
(idCk×k ⊗ Φ)(M ) = ( Sr Mij Sr∗ )m
i,j=1 = (Ik ⊗ Sr )M (Ik ⊗ Sr∗ ).
r=1 r=1
Thus completely positive maps are well-characterized in the sense that there
is a simple way to check that a linear map is completely positive, as well as
that it is easy to generate all completely positive maps. The set of positive
maps (which actually forms a cone) is not that well understood. First of all,
it is typically not so easy to check whether a map is positive, and secondly
there is not a way to generate all positive maps. Let us end this section with
a positive map that is not completely positive, also due to Man-Duen Choi.
Then
1 0 0 0 −1 0 0 0 −1
0 0 0 0 0 0 0 0 0
0 0 2 0 0 0 0 0 0
0 0 0 2 0 0 0 0 0
3
−1
(Φ(Eij )i,j=1 ) = 0 0 0 1 0 0 0 −1
,
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 2 0
−1 0 0 0 −1 0 0 0 1
7.9 Exercises
Exercise 7.9.3 Let k · k be a norm on Cn×n , and let A ∈ Cn×n . Show that
1
ρ(A) = lim kAk k k , (7.22)
k→∞
where ρ(·) is the spectral radius. (Hint: use that for any > 0 the spectral
1
radius of ρ(A)+ A is less than one, and apply Corollary 7.2.4.)
Exercise 7.9.4 Let A = (aij )ni,j=1 , B = (bij )ni,j=1 ∈ Cn×n so that |aij | ≤ bij
that ρ(A) ≤ ρ(B). (Hint: use (7.22) with the
for i, j = 1, . . . , n. Show q
Pn 2
Frobenius norm kM k = i,j=1 |mij | .)
Exercise 7.9.6 Show that if A has the property that every 2s columns are
linearly independent, then the equation Ax = b can have at most one
solution x with at most s nonzero entries.
Exercise 7.9.7 Let A = (aij )ni,j=1 . Show that for all permutations σ on
{1, . . . , , n} we have a1,σ(1) a2,σ(2) · · · an,σ(n) = 0 if and only if there exist r
(1 ≤ r ≤ n) rows and n + 1 − r columns in A so that the entries they have in
common are all 0.
1/6 1/2 1/3
Exercise 7.9.10 Write the matrix 7/12 0 5/12 as a convex
1/4 1/2 1/4
combination of permutation matrices.
How to Use Linear Algebra 241
1 0
(a) Let U = ∈ Cn×n , with U1 ∈ C(n−1)×(n−1) a unitary matrix
0 U1
chosen so that
a21 σ v
a31 0 u n
uX
U1 . = . , σ = t |aj1 |2 .
.. .. j=2
an1 0
(b) Show that there exists a unitary V so that V AV∗ is upper Hessenberg.
1 0
(Hint: after part (a), find a unitary U2 = so that U2 A1 U2∗ has
0 ∗
∗ ∗
the form , and observe that
σ2 e1 A2
1 0 1 0 1 0 1 0
 = A
0 U2 0 U1 0 U1∗ 0 U2∗
has now zeros in positions (2, 1), . . . , (n, 1), (3, 2), . . . , (n, 2). Continue the
process.)
Remark. If one puts a matrix in upper Hessenberg form before starting the
QR algorithm, it (in general) speeds up the convergence of the QR
algorithm, so this is standard practice when numerically finding eigenvalues.
How to Use Linear Algebra 243
(c) For a real number x let bxc denote the largest integer ≤ x. For instance,
bπc = 3, b−πc = −4, b5c = 5. Let α = λmax (AG ) be the largest
eigenvalue of the adjacency matrix of G. Show that G must have a
vertex of degree at most bαc. (Hint: use Exercise 5.7.21(b).)
(d) Show that
χ(G) ≤ bλmax (AG )c + 1, (7.24)
which is a result due to Herbert S. Wilf. (Hint: use induction and
Exercise 5.7.21(c).)
where
1√ 1√
1 0 2 0 2 1
x1 = ⊗ ⊗ 21 √ , x2 = ⊗ 12 √ ⊗ ,
0 1 2 2 1 2 2 0
1√ 1√ 1√ 1√
x3 = 2 √2 ⊗ 1 ⊗ 0 , x = 2 √2 2 2
⊗ 21 √ ⊗ 21 √ .
1 4 1
2 2 0 1 − 2 2 − 2 2 − 2 2
Show that R is not 2 × 2 × 2 separable.
Hint: Let
1 −1 −1 1 −1 1 1 −1
−1 4 1 0 1 3 −1 1
−1 1 4 3 1 −1 0 1
1 0 3 4 −1 1 1 −1
Z=
−1 1
,
1 −1 4 0 3 1
1
3 −1 1 0 4 1 −1
1 −1 0 1 3 1 4 −1
−1 1 1 −1 1 −1 −1 1
How to Use Linear Algebra 245
(v ⊗ w ⊗ z)∗ Z(v ⊗ w ⊗ z) ≥ 0,
It is beyond the scope of this book to provide complete references for the
topics discussed in this chapter. Rather, we provide just a few references,
which can be a starting point for further reading on these topics. With the
references in the papers and books below as well as the sources that refer to
them (see the chapter “How to start your own research project” on how to
look for these), we hope that you will be able to familiarize yourself in more
depth with the topics of your interest.
• K. Kaplan, Cognitech thinks it’s got a better forensic tool: The firm uses
complex math in video image-enhancing technology that helps in finding
suspects, Los Angeles Times, September 5, 1994;
http://articles.latimes.com/1994-09-05/business/fi-35101 1 image-
enhancement.
• A. K. Lenstra and M. S. Manasse, Factoring with two large primes,
Math. Comp. 63 (1994), no. 208, 785–798.
• P. J. Olver, Orthogonal bases and the QR algorithm, University of
Minnesota, http://www.math.umn.edu/∼olver/aims /qr.pdf.
• L. Page, S. Brin, R. Motwani, T. Winograd, The PageRank citation
ranking: Bringing order to the web (1999),
http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf.
• R. Redheffer, Eine explizit lősbare Optimierungsaufgabe. (German)
Numerische Methoden bei Optimierungsaufgaben, Band 3 (Tagung,
Math. Forschungsinst., Oberwolfach, (1976), pp. 213–216. Internat. Ser.
Numer. Math., Vol. 36, Birkhäuser, Basel, 1977.
• P. W. Shor, Polynomial-time algorithms for prime factorization and
discrete logarithms on a quantum computer, SIAM J. Comput. 26
(1977), no. 5, 1484–1509.
• H. J. Woerdeman, Minimal rank completions for block matrices. Linear
Algebra Appl. 121 (1989), 105–122.
• T. Zhang, J. M. Pauly, S. S. Vasanawala and M. Lustig, Coil
compression for accelerated imaging with Cartesian sampling. Magnetic
Resonance in Medicine, 69 (2013), 571–582.
How to Start Your Own Research
Project
Of course, you can also search terms in any search engine. Some search
engines, when you give them titles of papers, will indicate what other papers
cite this paper. I find this a very useful feature. Again, it gives a sense of
how that particular line of research is developing and how much interest
there is for it.
If you want to get a sense of how hot a topic is, you can see if government
agencies or private industry give grants for this line of research. For instance,
in the United States the National Science Foundation (NSF) gives grants for
basic research. On the NSF web page (www.nsf.gov) you can go to “Search
Awards,” and type terms like eigenvalue, singular value decomposition, etc.,
and see which funded grants have that term in the title or abstract. Again, it
gives you an idea of what types of questions people are interested in, enough
to put US tax dollars toward the research. Of course, many countries have
government agencies that support research, for instance in the Netherlands
it is the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO,
247
248 How to Start Your Own Research Project
Another source for hot topics is to see what areas of mathematics receive the
major prizes. The Fields medal and the Abel Prize are two well-known
prestigious prizes for mathematical research, but there are many others. In
addition, some of the prize winners and other well-known mathematicians
started their own blogs, which are also a source for exciting ideas.
Of course, you should also leverage all your contacts. Your professor would
be a good person to talk about this, or other professors at your school. In
addition, don’t be afraid to contact a person you do not know. It has been
my experience that when you put some thought in an email message to a
mathematician, a good number of them will take the effort to write back.
For instance, if you would write to me and say something along the lines “I
looked at your paper X, and I thought of changing the problem to Y. Would
that be of interest? Has anyone looked at this?”, you would probably get an
answer from me. And if you don’t within a few weeks, maybe just send the
message again as it may just have ended up in a SPAM filter or it somehow
fell off my radar screen.
Finally, let me mention that in my research I found it often useful to try out
ideas numerically using MATLAB
R
, MapleTM or Mathematica
R
. In some
cases I discovered patterns this way that turned out to be essential. In
addition, try to write things up along the way as it will help you document
what you have done, and it will lower the bar to eventually write a paper.
Typically mathematical texts (such as this book) are written up using the
program LaTeX (or TeX), so it is definitely useful to getting used to this
freely available program. For instance, you can write up your homework
using LaTeX, which will surely score some points with your professor.
Chapter 1
Exercise 1.5.1 The set of integers Z with the usual addition and multiplication is not a
field. Which of the field axioms does Z satisfy, and which one(s) are not satisfied?
Answer: The only axiom that is not satisfied is number 10, involving the existence of a
multiplicative inverse. For instance, 2 does not have a multiplicative inverse in Z.
Exercise 1.5.2 Write down the addition and multiplication tables for Z2 and Z5 . How is
commutativity reflected in the tables?
+ 0 1 2 3 4 . 0 1 2 3 4
0 0 1 2 3 4 0 0 0 0 0 0
1 1 2 3 4 0 , 1 0 1 2 3 4
2 2 3 4 0 1 2 0 2 4 1 3
3 3 4 0 1 2 3 0 3 1 4 2
4 4 0 1 2 3 4 0 4 3 2 1
The symmetry in the tables is due to commutativity.
Exercise 1.5.3 The addition and multiplication defined in (1.4) also works when p is not
prime. Write down the addition and multiplication tables for Z4 . How can you tell from
the tables that Z4 is not a field?
249
250 Answers to Exercises
Exercise 1.5.4 Solve Bezout’s identity for the following choices of a and b:
(i) a = 25 and b = 7;
Answer: 25 − 3 · 7 = 4, 7 − 1 · 4 = 3, 4 − 1 · 3 = 1, thus 1 = gcd(25, 7), and we get
1 = 4 − 1 · 3 = 4 − (7 − 1 · 4) = −7 + 2 · 4 = −7 + 2(25 − 3 · 7) = 2 · 25 − 7 · 7.
Thus m = 2 and n = −7 is a solution to (1.5).
(ii) a = −50 and b = 3.
Answer: −50 + 17 · 3 = 1, thus 1 = gcd(−50, 3) and m = 1 and n = 17 is a solution
to (1.5).
(i) 2 + 2 + 2 =
Answer: 0
(ii) 2(2 + 2)−1 =
Answer: 2
(iii) Solve for x in 2x + 1 = 2.
Answer: 2
1 2
(iv) Find det .
1 0
Answer: 1
1 2 1 1
(v) Compute .
0 2 2 1
2 0
Answer:
1 2
−1
2 0
(vi) Find .
1 1
2 0
Answer:
1 1
(i) 4 + 3 + 2 =
Answer: 4
(ii) 4(1 + 2)−1 =
Answer: 3
(iii) Solve for x in 3x + 1 = 3.
Answer: 4
4 2
(iv) Find det .
1 0
Answer: 3
1 2 0 1
(v) Compute .
3 4 2 1
4 3
Answer: .
3 2
Answers to Exercises 251
−1
22
(vi) Find .
34
1 1
Answer: .
2 4
Exercise 1.5.7 In this exercise we are working in the field C. Make sure you write the
1+i
final answers in the form a + bi, with a, b ∈ R. For instance, 2−i should not be left as a
final answer, but be reworked as
1+i 1+i 2+i 2 + i + 2i + i2 1 + 3i 1 3i
=( )( )= = = + .
2−i 2−i 2+i 22 + 12 5 5 5
Notice that in order to get rid of i in the denominator, we decided to multiply both
numerator and denominator with the complex conjugate of the denominator.
1 0 2
1 1 0
Exercise 1.5.9 Let F = Z3 . Compute the product 1 2 1 . Answer:
2 1 1
2 0 1
2 2 0
.
2 2 0
2−i 2+i 5+i 6−i
Exercise 1.5.10 Let F = C. Compute the product .
2−i −10 1−i 2+i
14 − 4i 14 − 4i
Answer: .
1 + 7i −9 − 18i
Answer: Multiply the first row with 3−1 = 2 and row reduce:
1 2 3 1 2 3
0 2 4 → 0 1 0 ,
0 3 0 0 0 4
where we subsequently switched rows 2 and 3, and multiplied (the new) row 2 with 3−1 .
Then
3 1 4 1 2 3
det 2 1 0 = −3 · 3 det 0 1 0 = 4.
2 2 1 0 0 4
Exercise 1.5.12 Let F = Z3 . Find the set of all solutions to the system of linear
equations
2x1 + x2 =1
.
2x1 + 2x2 + x3 = 0
2 1 0 1 1 2 0 2 1 0 1 1
Answer: → → , so all solutions
2 2 1 0 0 1 1 2 0 1 1 2
are
x1 1 2
x2 = 2 + x3 2 , x3 ∈ Z3 ,
x3 0 1
or, equivalently,
1 0 2
2 , 1 , 0 .
0 1 2
Answer: Row reducing the augmented matrix yields the row echelon form
1 0 −1 0
0 1 1 0
.
0 0 7 1
0 0 0 0
No pivot in the augmented column, thus b is a linear combination of a1 , a2 , a3 ; in fact
1 1 1
b= a1 − a2 + a3 .
7 7 7
2 3 1
Exercise 1.5.14 Let F = Z5 . Compute the inverse of 1 4 1 in two different ways
1 1 2
(row reduction and by applying (1.11)).
4 0 3
Answer: 3 1 3.
4 2 0
Answers to Exercises 253
Exercise 1.5.15 Let F = C. Find bases of the column space, row space and null space of
the matrix
1 1+i 2
A = 1 + i 2i 3 + i .
1−i 2 3 + 5i
1 2
Answer: Basis for ColA is 1 + i , 3 + i .
1−i 3 + 5i
Basis for RowA is 1 1+i 2 , 0 0 1 .
1+i
Basis for NulA is −1 .
0
3 5 0
Exercise 1.5.16 Let F = Z7 . Find a basis for the eigenspace of A = 4 6 5
2 2 4
corresponding to the eigenvalue λ = 1.
1
Answer: 1 .
1
Exercise 1.5.17 Let F = Z3 . Use Cramer’s rule to find the solution to the system of
linear equations
2x1 + 2x2 = 1
.
x1 + 2x2 = 1
Answer: x1 = 0, x2 = 2.
Answer: α = −1 − i.
Answer:
−(1 − t)(4 + t2 ) (2 − t)(4 + t2 ) 2 − 8t + 4t2 − t3
adj(A) =
1−t t−2 − 1t + 1 + 4−2t
1+t .
8+2t2 4+2t2
1+t
− 3t − 4t − t + 2 + t2 3− 1+t
254 Answers to Exercises
Exercise 1.5.20 Recall that the trace of a square matrix Pisn defined to be the sum of its
diagonal entries. Thus tr[(aij )n
i,j=1 ] = a11 + · · · + ann = j=1 ajj .
Similarly,
m
X m X
X n
tr(BA) = (BA)jj = ( bjk akj ).
j=1 j=1 k=1
As akj bjk = bjk akj for all j and k, the equality tr(AB) = tr(BA) follows.
(b) Show that if A ∈ Fn×m , B ∈ Fm×k , and C ∈ Fk×n , then
tr(ABC) = tr(CAB) = tr(BCA).
Answer: By the previous part, we have that tr((AB)C) = tr(C(AB)) and also
tr(A(BC)) = tr((BC)A). Thus tr(BCA) = tr(ABC) = tr(CAB) follows.
(c) Give an example of matrices A, B, C ∈ Fn×n so that tr(ABC) 6= tr(BAC).
0 1 0 0 0 0
Answer: For instance A = ,B = , and C = . Then
0 0 1 0 0 1
tr(ABC) = 0 6= 1 = tr(BAC).
Chapter 2
Exercise 2.6.1 For the proof of Lemma 2.1.1 provide a reason why each equality holds.
For instance, the equality 0 = 0u + v is due to Axiom 5 in the definition of a vector space
and v being the additive inverse of 0u.
Answer:
0 = (Axiom 5) = 0u + v = (Field Axiom 4) = (0 + 0)u + v =
= (Axiom 8) = (0u + 0u) + v = (Axiom 2) = 0u + (0u + v) =
= (Axiom 5) = 0u + 0 = (Axiom 4) = 0u.
As p(x) = q(x) for all x ∈ F, we get that Pp(j) (x) = q (j) (x) for all j.
PIn particular,
p(j) (0) = q (j) (0) for all j. When p(X) = n j
j=0 pj X and q(X) =
n j
j=0 qj X , then
1 (j) 1 (j)
pj = j! p (0) = j! q (0) = qj , for all j. This proves that p(X) = q(X).
When we took derivatives we used that we are working over F = R or F = C. For the other
fields F we are considering in this chapter, derivatives of functions are not defined.
Exercise 2.6.3 When the underlying field is Zp , why does closure under addition
automatically imply closure under scalar multiplication?
Answer: To show that cx lies in the subspace, one simply needs to observe that
cx = x + · · · + x, where in the right-hand side there are c terms. When the subspace is
closed under addition, x + · · · + x will be in the subspace, and thus cx lies in the subspace.
(a) W = {f : R → R : f is continuous}.
(b) W = {f : R → R : f is differentiable}.
Answer: (a). The constant zero function is continuous. As was shown in calculus, when f
and g are continuous, then so are f + g and cf . This gives that W is a subspace.
(b). The constant zero function is differentiable. As was shown in calculus, when f and g
are differentiable, then so are f + g and cf . This gives that W is a subspace.
256 Answers to Exercises
Answer: This is a subspace: the zero matrix lies in W (choose a = b = c = 0), the sum
of two matrices in W is again of the same type, and a scalar multiple of a matrix in
W is again of the same type. In fact,
1 0 0 0 1 0 0 0 1
W = Span 0 1 0 , 0 0 1 , 0 0 0 .
0 0 1 0 0 0 0 0 0
Exercise 2.6.6 For the following vector spaces (V over F) and vectors, determine
whether the vectors are linearly independent or linearly independent.
3 3 0 0
(d) Let F = R, V = {f | f : R → R is a continuous function}, and consider the vectors
cos 2x, sin 2x, cos2 x, sin2 x.
Answer: The equality cos 2x = cos2 x − sin2 x holds for all x ∈ R. Thus
cos 2x + 0(sin 2x) − cos2 x + sin2 x = 0(x) for all x ∈ R, thus the vectors are linearly
dependent.
(e) Let F = C, V = C2×2 , and consider the vectors
i 1 1 1 −1 i
, , .
−1 −i i −i −i 1
Answer: Suppose
i 1 1 1 −1 i 0 0
a +b +c = .
−1 −i i −i −i 1 0 0
Rewriting we get
i 1 −1
1 a 0
1 i
b = 0 .
−1 i −i
c 0
−i −i 1
258 Answers to Exercises
Row reducing this matrix gives no pivot in column three. We find that
a −i
b = c 0
c 1
is the general solution. Indeed,
i 1 1 1 −1 i 0 0
−i +0 + = ,
−1 −i i −i −i 1 0 0
and thus these vectors are linearly dependent.
(f) Let F = R, V = C2×2 , and consider the vectors
i 1 1 1 −1 i
, , .
−1 −i i −i −i 1
Answer: Suppose
i 1 1 1 −1 i 0 0
a +b +c = ,
−1 −i i −i −i 1 0 0
with now a, b, c ∈ R. As before we find that this implies that a = −ic and b = 0. As
a, b, c ∈ R, this implies that a = b = c = 0, and thus these vectors are linearly
independent over R.
(g) Let F = Z5 , V = F3×2 , and consider the vectors
3 4 1 1 1 2
1 0 , 4 2 , 3 1 .
1 0 1 2 1 2
Answer: Suppose
3 4 1 1 1 2 0 0
a 1 0 + b 4
2 + c 3
1 = 0 0 .
1 0 1 2 1 2 0 0
Rewriting, we get
3 1 1 0
4 1 2 0
a
1 4 3 0
b = .
0 2 1 0
1 c
1 1 0
0 2 2 0
Row reducing this matrix gives pivots in all columns, thus a = b = c = 0 is the only
solution. Thus the vectors are linearly independent.
(h) Let F = R, V = {f | f : R → R is a continuous function}, and consider the vectors
1, et , e2t .
Answer: Suppose a + bet + ce2t ≡ 0. As this equality holds for all t, we can choose for
instance t = 0, t = ln 2 and t = ln 3, giving the system
1 1 1 a 0
1 2 4 b = 0 .
1 3 9 c 0
Row reducing this matrix gives pivots in all columns, thus a = b = c = 0 is the only
solution. Thus the vectors 1, et , e2t are linearly independent.
Exercise 2.6.8 (a) Show that if the set {v1 , . . . , vk } is linearly independent, and vk+1
is not in Span{v1 , . . . , vk }, then the set {v1 , . . . , vk , vk+1 } is linearly independent.
(b) Let W be a subspace of an n-dimensional vector space V , and let {v1 , . . . , vp } be a
basis for W . Show that there exist vectors vp+1 , . . . , vn ∈ V so that
{v1 , . . . , vp , vp+1 , . . . , vn } is a basis for V .
(Hint: once v1 , . . . , vk are found and k < n, observe that one can choose
vk+1 ∈ V \ (Span{v1 , . . . , vk }). Argue that this process stops when k = n, and that
at that point a basis for V is found.)
Exercise 2.6.10 For the following choices of subspaces U and W in V , find bases for
U + W and U ∩ W .
Answer: (a) A general element in U is of the form a(X + 1) + b(X 2 − 1). For this to be in
W , we need a(2 + 1) + b(4 − 1) = 0. Thus 3a + 3b = 0, yielding a = −b. Thus a general
element in U ∩ W is of the form a(X + 1 − (X 2 − 1)) = a(2 + X − X 2 ). A basis for U ∩ W
is {2 + X − X 2 }.
1
is a basis for U ∩ W .
1 0 0 1
From the calculations above, we see that the first three columns are pivot columns. Thus
3 2 1
0 1 2
{
2 , 0 , 1}
1 0 0
is a basis for U + W .
(a) {v1 + v2 + v3 + v4 , v1 − v2 + v3 − v4 , v1 − v2 − v3 − v4 }
(b) {v1 + v2 , v2 + v3 , v3 + v4 , v4 + v5 , v5 + v2 }
(c) {v1 + v3 , v4 − v2 , v5 + v1 , v4 − v2 , v5 + v3 , v1 + v2 }.
When you did this exercise, did you make any assumptions on the underlying field?
(c) Here we have six vectors in the five-dimensional space Span{v1 , v2 , v3 , v4 , v5 }. Thus
these vectors are linearly dependent. This works for all fields.
Exercise 2.6.12
Let {v1 , v2 , v3 , v4 } be a basis for a vector space V over Z3 . Determine whether the
following are also bases for V .
(a) {v1 + v2 + v3 + v4 , v1 − v2 + v3 − v4 , v1 − v2 − v3 − v4 .}
(b) {v1 , v2 + v3 + v4 , v1 − v2 + v3 − v4 , v1 − v2 − v3 − v4 .}
(c) {v1 + v2 + v3 + v4 , v1 − v2 + v3 − v4 , v1 − v2 − v3 − v4 , v2 + v4 , v1 + v3 .}
Answer: (a) These three vectors can never span the four-dimensional space V , so this is
not a basis.
(c) Here we have five vectors in a four-dimensional vector space, thus not a basis.
Answers to Exercises 263
Exercise 2.6.13
For the following choices of vector spaces V over the field F, bases B and vectors v,
determine [v]B .
t3 +3t2 +5
(b) Let F = R, B = {t, t2 , 1t }, V = SpanB and v = t
.
(c) Let F = C, V = C2×2 ,
0 1 1 1 i 0 i 1 −2 + i 3 − 2i
B={ , , }, v = .
−1 −i i −i −1 −i −1 −i −5 − i 10
2
2
Answer: (a) [v]B =
1 .
2
3
(b) [v]B = 1 .
5
4−i
2 + 7i
(c) [v]B =
−3 + 12i .
−3 − 8i
−1
0
0
1
(d) [v]B = .
0
1
−1
0
1
(e) [v]B = 1 .
1
264 Answers to Exercises
(b) We observe that (A + B)∗ = A∗ + B ∗ and (cA)∗ = cA∗ , when c ∈ R. Observe that the
zero matrix is in Hn . Next, if A, B ∈ Hn , then (A + B)∗ = A∗ + B ∗ = A + B, thus
A + B ∈ Hn . Finally, if c ∈ R and A ∈ Hn , then (cA)∗ = cA∗ = cA, thus cA ∈ Hn . This
shows that Hn is a subspace over R.
Exercise 2.6.15 (a) Show that for finite-dimensional subspaces U and W of V we have
that dim(U + W ) = dim U + dim W − dim(U ∩ W ).
(Hint: Start with a basis {v1 , . . . , vp } for U ∩ W . Next, find u1 , . . . , uk so that
{v1 , . . . , vp , u1 , . . . , uk } is a basis for U . Similarly, find w1 , . . . , wl so that
{v1 , . . . , vp , w1 , . . . , wl } is a basis for W . Finally, argue that
{v1 , . . . , vp , u1 , . . . , uk , w1 , . . . , wl } is a basis for U + W .)
(b) Show that for a direct sum U1 +̇ · · · +̇Uk of finite-dimensional subspaces U1 , . . . , Uk ,
we have that
dim(U1 +̇ · · · +̇Uk ) = dim U1 + · · · + dim Uk .
suppose that
p
X k
X l
X
ai vi + bi ui + ci wi = 0.
i=1 i=1 i=1
Then
p
X k
X l
X
ai vi + bi ui = − ci wi ∈ U ∩ W.
i=1 i=1 i=1
As {v1 , . . . , vp } is a basis for U ∩ W , there exist di so that
l
X p
X
− ci wi = di vi .
i=1 i=1
(b) We show this by induction. It is trivial for k = 1. Suppose we have proven the
statement for k − 1, giving dim(U1 +̇ · · · +̇Uk−1 ) = dim U1 + · · · + dim Uk−1 . Then, using
(a) we get
dim[(U1 +̇ · · · +̇Uk−1 )+̇Uk ] = dim(U1 +̇ · · · +̇Uk−1 ) + dim Uk −
dim[(U1 +̇ · · · +̇Uk−1 ) ∩ Uk ] = dim(U1 +̇ · · · +̇Uk−1 ) + dim Uk − 0,
where we used that (U1 +̇ · · · +̇Uk−1 ) ∩ Uk = {0}. Now using the induction assumption, we
get
dim[(U1 +̇ · · · +̇Uk−1 )+̇Uk ] = dim(U1 +̇ · · · +̇Uk−1 ) + dim Uk =
dim U1 + · · · + dim Uk−1 + dim Uk .
This proves the statement.
266 Answers to Exercises
Chapter 3
Answer: (S ◦ T )(v + w) = S(T (v + w)) = S(T (v) + T (w)) = S(T (v)) + S(T (w)) =
(S ◦ T )(v) + (S ◦ T )(w), and
(S ◦ T )(cv) = S(T (cv)) = S(cT (v)) = cS(T (v)) = c(S ◦ T )(v), proving linearity.
(a) V = R3 , W = R4 ,
x1 − 5x3
x1 7x2 + 5
T x2 =
3x1 − 6x2 .
x3
8x3
(b) V = Z35 , W = Z25 ,
x1
x1 − 2x3
T x2 = .
3x2 x3
x3
(c) V = W = C2×2 (over F = C), T (A) = A − AT .
(d) V = W = C2×2 (over F = C), T (A) = A − A∗ .
(e) V = W = C2×2 (over F = R), T (A) = A − A∗ .
(f) V = {f : R → R : f is differentiable}, W = RR ,
(T (f ))(x) = f 0 (x)(x2 + 5).
(g) V = {f : R → R : f is continuous}, W = R,
Z 10
T (f ) = f (x)dx.
−5
0
0 5
Answer: (a) T 0 =
6= 0, thus T is not linear.
0
0
0
0 0
−2 −4
(b) 2T 1 = 2
6= = T 2, so T is not linear.
3 12
1 2
Exercise 3.4.3 Show that if T : V → W is linear and the set {T (v1 ), . . . , T (vk )} is
linearly independent, then the set {v1 , . . . , vk } is linearly independent.
Exercise 3.4.4 Show that if T : V → W is linear and onto, and {v1 . . . , vk } is a basis for
V , then the set {T (v1 ), . . . , T (vk )} spans W . When is {T (v1 ), . . . , T (vk )} a basis for W ?
When T is not one-to-one, then {T (v1 ), . . . , T (vk )} is linearly independent, and therefore
a basis. Indeed, suppose that c1 T (v1 ) + · · · + ck T (vk ) = 0. Then
T (c1 v1 + · · · + ck vk ) = T (0). When T is one-to-one, this implies c1 v1 + · · · + ck vk = 0.
As {v1 . . . , vk } is linearly independent, this yields c1 = · · · = ck = 0.
Answer: (a) First observe that 0 ∈ U and T (0) = 0 gives that 0 ∈ T [U ]. Next, let w,
ŵ ∈ T [U ] and c ∈ F. Then there exist u, û ∈ U so that T (u) = w and T (û) = ŵ. Then
w + ŵ = T (u + û) ∈ T [U ] and cw = T (cu) ∈ T [U ]. Thus, by Proposition 2.3.1, T [U ] is a
subspace of W .
268 Answers to Exercises
(b) Let {v1 , . . . , vp } be a basis for U . We claim that T [U ] = Span{T (v1 ), . . . , T (vp )},
from which it then follows that dim T [U ] ≤ dim U .
Thus T [U ] ⊆ Span{T (v1 ), . . . , T (vp )}. We have shown both inclusions, and consequently
T [U ] = Span{T (v1 ), . . . , T (vp )} follows.
0 0 0 1
1 0 0 0
Answer: (a)
.
0 1 0 0
0 0 1 0
From
this we deduce
that with respect to
thebasis
{v1 , v2 , v3 , v4 } we have that
−1 −1 1 0
0 −1 0 2
{
1 , 0 } is a basis for Ker T , { 1 , 2} is a basis for Ran T . In other
0 1 −1 0
words, {−v1 + v3 , −v1 − v2 + v4 } is a basis for Ker T , {v1 + v3 − v4 , 2v2 + 2v3 } is a
basis for Ran T .
p(1)
Exercise 3.4.7 Consider the linear map T : R2 [X] → R2 given by T (p(X)) = .
p(3)
4
1
Answer: (a) {
}.
4
1
1 0 1 1 0 1
(b) { , , }.
0 1 0 0 1 0
Exercise 3.4.9 For the following T : V → W with bases B and C, respectively, determine
the matrix representation for T with respect to the bases B and C. In addition, find bases
for the range and kernel of T .
2
d d
(a) B = C = {sin t, cos t, sin 2t, cos 2t}, V = W = Span B, and T = dt 2 + dt .
1 1 p(3)
(b) B = {1, t, t2 , t3 }, C = { , }, V = C3 [X], and W = C2 , and T (p) = .
0 −1 p(5)
d
(c) B = C = {et cos t, et sin t, e3t , te3t }, V = W = Span B, and T = dt .
R 1
1 1 0 p(t)dt .
(d) B = {1, t, t2 }, C = { , }, V = C2 [X], and W = C2 , and T (p) =
1 0 p(1)
−1 −1 0 0
1 −1 0 0 , Ker T = {0}, B is a basis for Ran T .
Answer: (a) 0 0 −4 −2
0 0 2 −4
270 Answers to Exercises
15 120
2 8 34 152 −8 −49
(b) 1 0 } is a basis for Ker T , {e1 , e2 } is a
, { ,
−1 −5 −25 −125
0 1
basis
for Ran T.
1 1 0 0
−1 1 0 0
(c) , Ker T = {0}, {e1 , e2 , e3 , e4 } is a basis for Ran T .
0 0 3 1
0 0 0 3
1 1 1 1 4 2
(d) 2 , { 3 − 3 X + X } is a basis for Ker T , {e1 , e2 } is a basis for
0 − 21 −3
Ran T .
1
Exercise 3.4.10 Let V = Cn×n . Define L : V → V via L(A) = 2
(A + AT ).
(a) Let
1 0 0 1 0 0 0 0
B={ , , , }.
0 0 0 0 1 0 0 1
Determine the matrix representation of L with respect to the basis B.
(b) Determine the dimensions of the subspaces
W = {A ∈ V : L(A) = A} and Ker L = {A ∈ V : L(A) = 0}.
1 0 0 0
0 1 1
2 2
0
Answer: (a) C := [L]B←B =
1 1
.
0 2 2
0
0 0 0 1
− 21 1
0 0 0 0 0 0
2
0 − 12 1
2
0
0 0 0 0
(b) Row reduce C − I = 1 → , so dim W = 3.
0
2
− 12 0 0 0 0 0
0 0 0 0 0 0 0 0
1 0 0 0 1 0 0 0
0 1 1 1 1
2 2
0 → 0
2 2
0
Row reduce C =
1 1 0
, so dim Ker L = 1.
0 2 2
0 0 0 1
0 0 0 1 0 0 0 0
(a) Find the matrix representation of A with respect to the bases B and C.
(b) Find bases for Ran A and Ker A.
Answers to Exercises 271
0 4 0 ··· 0
.. ..
0
−3 8 . .
..
2 −6 . 0
Answer: (a) .
..
.
4 4n
..
. −3n
0 2n
{2t2 − 3t + 4, 2t3 − 3t2 + 4t, 2t4 − 3t3 + 4t2 , · · · , 2tn+1 − 3tn + 4tn−1 } is a basis for Ran A.
272 Answers to Exercises
Chapter 4
Exercise 4.10.2 For the following matrices A (and B) determine its Jordan canonical
form J and a similarity matrix P , so that P −1 AP = J.
(a)
−1 1 0 0
−1 0 1 0
A=
.
−1 0 0 1
−1 0 0 1
This matrix is nilpotent.
Answer:
1 0 0 0 0 1 0 0
1 1 0 0
, J = 0 0 1 0
P =
1
.
1 1 0 0 0 0 1
1 1 1 1 0 0 0 0
(b)
10 −1 1 −4 −6
9 −1 1 −3 −6
A=4 −1 1 −3 −1 .
9 −1 1 −4 −5
10 −1 1 −4 −6
This matrix is nilpotent.
Answer:
1 3 1 −1 2 0 1 0 0 0
1 2 2 −1 0 0 0 0 0 0
P = 1 3 2 4 5 , J = 0 0 0 1 0
1 3 1 0 2 0 0 0 0 1
1 3 1 −1 −3 0 0 0 0 0
(c)
0 1 0
A = −1 0 0 .
1 1 1
Answer: Eigenvalues are 1, i − i.
0 −1 −1 1 0 0
P = 0 −i i , J = 0 i 0
1 i −i 0 0 −i
Answers to Exercises 273
(d)
2 0 −1 1
0 1 0 0
A=
.
1 0 0 0
0 0 0 1
Answer: The only eigenvalue is 1. We have
0 0 0 1
0 0 0 0
(A − I)2 = 3
0 0 0 1 , and (A − I) = 0.
0 0 0 0
So we get
1
1 1 0
J = .
0 1 1
0 0 1
For P we can choose
0 1 1 0
1 0 0 0
.
0 1 0 0
0 0 0 1
(e)
1 −5 0 −3
1 1 −1 0
B=
.
0 −3 1 −2
−2 0 2 1
(Hint: 1 is an eigenvalue)
Answer: We find that 1 is the only eigenvalue.
1 0 −1 0 0 −2 0 −1
0 −2 0 −1 0 0 0 0
(B − I)2 = 3
, and (A − I) = .
1 0 −1 0 0 −2 0 1
0 4 0 2 0 0 0 0
So we get
1 1 0 0
0 1 1 0
J =
0
.
0 1 1
0 0 0 1
For P we can choose
−1 0 −3 0
0 −1 0 0
P =
−1
.
0 −2 0
0 2 0 1
(f) For the matrix B, compute B 100 , by using the decomposition B = P JP −1 .
Answer: As
2 0 −3 0 1 100 4950 161700
−1
0 −1 0 0 , J 100 = 0
1 100 49500
P = ,
−1 0 1 0 0 1 100
0 2 0 1 0 2 0 1
we find that
4951 −323900 −4950 −162000
100 −9899 −100 −4950
B 100 = P J 100 P −1 = .
4950 −323700 −4949 −161900
−200 19800 200 9901
274 Answers to Exercises
(b) {e1 , e2 , e4 , e5 , e6 , e7 }.
(c) {e1 , e2 , e3 , e4 , e5 , e6 , e7 }.
(a) Prove that M and N are similar if and only if they have the same rank.
(b) Give a counterexample to show that the statement is false if 6 is replaced by 7.
(c) Compute the minimal and characteristic polynomials of the following matrix. Is it
diagonalizable?
5 −2 0 0
6 −2 0 0
0 0 0 6
0 0 1 −1
Answer: (a) both M and N have only 0 as the eigenvalue, and at least one Jordan block
at 0 is of size 3 × 3. So the possible Jordan forms are
0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
, , .
0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Knowing the rank uniquely identifies the Jordan canonical form.
0 1 0 0 0 0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
(b) M = 0 0 0 0 1 0 0 and N = 0 0 0 0 1 0 0 have the
0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0
same rank and same minimal polynomial x3 , but are not similar.
Answers to Exercises 275
(c) pA (x) = (x − 2)2 (x − 1)(x + 3) and mA (x) = (x − 2)(x − 1)(x + 3). As all roots of
mA (x) have multiplicity 1, the matrix A is diagonalizable.
Exercise 4.10.5 (a) Let A be a 7 × 7 matrix of rank 4 and with minimal polynomial
equal to qA (λ) = λ2 (λ + 1). Give all possible Jordan canonical forms of A.
(b) Let A ∈ Cn . Show that if there exists a vector v so that v, Av, . . . , An−1 v are linearly
independent, then the characteristic polynomial of A equals the minimal polynomial
of A. (Hint: use the basis B = {v, Av, . . . , An−1 v}.)
0 1 0 0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 1 0 0 0
Answer: (a) 0 0 0 0 0 0 0 , 0 0 0 0 0 0 0 ,
0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 −1 0
0 0 0 0 0 0 −1 0 0 0 0 0 0 −1
0 1 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0 .
0 0 0 0 −1 0 0
0 0 0 0 0 −1 0
0 0 0 0 0 0 −1
Exercise 4.10.6 Let A ∈ Fn×n and AT denote its transpose. Show that
wk (A, λ) = wk (AT , λ), for all λ ∈ F and k ∈ N. Conclude that A and AT have the same
Jordan canonical form, and are therefore similar.
Answer: In general we have that for any matrix rank B = rankB T . If B is square of size
n × n, we therefore have that
dim Ker B = n − rank B = n − rankB T = dim Ker B T .
dim Ker(A − λI)k = dim Ker(AT − λI)k . Then it follows that wk (A, λ) = wk (AT , λ), for
all λ ∈ F and k ∈ N. Thus A and AT have the same Jordan canonical form, and are
therefore similar.
Answer: (a) Let m(t) = t2 + 1 = (t − i)(t + i). Then m(A) = 0, so the minimal polynomial
of A divides m(A). Thus the only possible eigenvalues of A are i and −i.
(b) As the minimal polynomial of A only has roots of multiplicity 1, the Jordan canonical
form will only have 1 × 1 Jordan blocks. Thus the Jordan canonical form of A is a
diagonal matrix with i and/or −i appearing on the diagonal.
Exercise 4.10.8 Let p(x) = (x − 2)2 (x − 3)2 . Determine a matrix A for which p(A) = 0
6 0 for all nonzero polynomials q of degree ≤ 3. Explain why q(A) 6= 0
and for which q(A) =
for such q.
2 1 0 0
0 2 0 0
Answer: Let A = . Then p(x) is the minimal polynomial for A, and thus
0 0 3 1
0 0 0 3
p(A) = 0, and for any nonzero polynomial q(x) with degree less than deg p = 4 we have
6 0.
q(A) =
Answer: (a)
1 1 0 0 0 0 1 1 0 0 0 0
0 1 0 0 0 0 0 1 0 0 0 0
0 0 2 0 0 0 0 0 2 0 0 0
or , where a, b ∈ {1, 2, 3}.
0 0 0 3 0 0 0 0 0 3 0 0
0 0 0 0 1 1 0 0 0 0 a 0
0 0 0 0 0 1 0 0 0 0 0 b
(b)
1 1 0 0 0 0
0 1 0 0 0 0
0 0 2 0 0 0
.
0 0 0 3 0 0
0 0 0 0 1 0
0 0 0 0 0 1
Answer: Let m(t) = t2 + t = t(t + 1). Then m(A) = 0, and thus the minimal polynomial
mP (t) of P divides m(t). Thus there are three possibilities mA (t) = t, mA (t) = t + 1 or
mA (t) = t(t + 1). The only possible roots of A are therefore 0 or −1. Next, since the
minimal polynomial has roots of multiplicity 1 only, the Jordan blocks are all of size 1 × 1.
Thus the Jordan canonical of A is a diagonal matrix with 0 and/or −1 on the diagonal.
Exercise 4.10.11 Let A ∈ Cn×n . For the following answer True or False. Provide an
explanation.
Answer: (a) and (b). Let m(t) = t2 − t = t(t − 1). Then m(P ) = 0, and thus the minimal
polynomial mP (t) of P divides m(t). Thus there are three possibilities mP (t) = t,
mP (t) = t − 1 or mP (t) = t(t − 1). The only possible roots of P are therefore 0 or 1. Next,
since the minimal polynomial has roots of multiplicity 1 only, the Jordan blocks are all of
size 1 × 1. Thus the Jordan canonical J of P is a diagonal matrix with zeros and/or ones
on the diagonal. The rank of J is equal to the sum of its diagonal entries, and as rank and
trace do not change when applying a similarity, we get rank P = trace P.
Answer: Let v ∈ Ran A. Then there exists an x so that v = Ax = P (JP −1 x). Then
v ∈ P [Ran J] follows. Conversely, let v ∈ P [Ran J]. Then there exists an x so that
v = P (Jx). Then v = P Jx = A(P −1 x). Thus v ∈ Ran A.
By Exercise 3.4.5, it follows that dim Ran A = dim P [Ran J] ≤ dim Ran J. As
Ran J = P −1 [Ran J], it also follows dim Ran J = dim P −1 [Ran A] ≤ dim Ran A. Thus
dim Ran A = dim Ran J. Similarly, dim Ker A = dim Ker J.
Exercise 4.10.15 Show that matrices A and B are similar if and only if they have the
same Jordan canonical form.
Answer: Suppose A and B have the same Jordan canonical form J. Then there exist
invertible P and S so that A = P JP −1 and B = SJS −1 . But then
A = P (S −1 JS)P −1 = (P S −1 )J(P S −1 )−1 , and thus A and B are similar.
Next suppose that A and B are similar. Thus there exists an invertible P so that
A = P BP −1 . Then A − λIn = P BP −1 − λP P −1 = P (B − λIn )P −1 , and thus A − λIn
and B − λIn are similar for all λ ∈ F. Also,
(A − λIn )k = (P (B − λIn )P −1 )k = P (B − λIn )k P −1 ,
and thus (A − λIn )k and (B − λIn )k are similar for all λ ∈ F and k ∈ N. By Exercise
4.10.14 it follows that dim Ker(A − λIn )k = dim Ker(B − λIn )k for all λ ∈ F and k ∈ N.
Thus wk (A, λ) = wk (B, λ) for all λ ∈ F. Consequently, A and B have the same Jordan
canonical form.
Exercise 4.10.16 Show that if A and B are square matrices of the same size, with A
invertible, then AB and BA have the same Jordan canonical form.
Answer: A−1 (AB)A = BA, so AB and BA are similar, and thus have the same Jordan
canonical form.
Exercise 4.10.18 Let A, B ∈ Cn×n be such that (AB)n = 0. Prove that (BA)n = 0.
Answer: As (AB)n = 0, we have that 0 is the only eigenvalue of AB. By Exercise 4.10.17
this means that 0 is also the only eigenvalue of BA. Thus BA is nilpotent, and thus
(BA)n = 0.
3 1 3 1
0 3 0 3
3 1 3
0 3 3
Answer: (a) , .
i i
i i
−i −i
−i −i
Exercise 4.10.21 For the following pairs of matrices A and B, find a polynomial p(t) so
that p(A) = B, or show that it is impossible.
1 1 0 1 2 3
(a) A = 0 1 1 , B = 0 2 3 .
0 0 1 0 0 3
Answer: AB 6 BA and A is
= nonderogatory, so no polynomial p exists.
Answers to Exercises 281
1 1 0 1 2 3
(b) A = 0 1 1 , B = 0 1 2 .
0 0 1 0 0 1
2t 1 2 2t
−1
te2t
1 −1 1 e 2
t e 1 −1 1 1
Answer: x(t) = 0 1 −1 0 e2t te2t 0 1 −1 −1 =
0 1 0 0 0 e2t 0 1 0 0
1 2
( 2 t − t + 1)e2t
(t − 1)e2t .
te2t
(a) 0
x1 (t) = 3x1 (t) − x2 (t) x1 (0) = 1
,
x02 (t) = x1 (t) + x2 (t) x2 (0) = 2
Answer: x1 (t) = e2t − te2t , x2 (t) = 2e2t − te2t .
(b) 0
x1 (t) = 3x1 (t) + x2 (t) + x3 (t) x1 (0) = 1
x0 (t) = 2x1 (t) + 4x2 (t) + 2x3 (t) , x2 (0) = −1
20
x3 (t) = −x1 (t) − x2 (t) + x3 (t) x3 (0) = 1
1 4t
Answer: x1 (t) = 2
e + 12 e2t , x2 (t) = e4t − 2e2t , x3 (t) = − 12 e4t + 32 e2t .
(c)
x01 (t) = −x2 (t) x1 (0) = 1
,
x02 (t) = x1 (t) x2 (0) = 2
Answer: x1 (t) = cos t − 2 sin t, x2 (t) = sin t + 2 cos t.
(d)
x00 (t) − 6x0 (t) + 9x(t) = 0, x(0) = 2, x0 (0) = 1.
Answer: 2e3t − 5te3t .
(e)
x00 (t) − 4x0 (t) + 4x(t) = 0, x(0) = 6, x0 (0) = −1.
Answer: 6e2t − 13te2t .
Exercise 4.10.24 For the following matrices we determined their Jordan canonical form
in Exercise 4.10.2.
282 Answers to Exercises
(b) When AB = BA, then eA eB = eA+B . Prove this statement when A is nonderogatory.
Answer: When A is nonderogatory, and AB = BA, we have by Theorem 4.6.2 that
B = p(A) for some polynomial. We can now introduce the functions
Exercise 4.10.26 Compute the matrices P20 , P21 , P22 from Example 4.8.5.
Answer:
− 12 1 1
1 2
0 1 2
0 0 1 0 1 0
− 12 3 1
0 0 1
P20 = 2 2 ,
0 −1 2 0 1 1
1
− 12 −1
0 0 0
2 2
0 − 12 1
2
0 0 1
2
1
− 12 − 12
0 2
0 0
1
−1
2
− 32 0 −2 −21
0 − 12 1
0 0 1
P21 = 2 2 ,
1
− 32 5
2
0 2 3
2
−1 1 −2 0 −2 −1
1 −1 2 0 2 1
−1 1 −2 0 −2 −1
1 −1 2 0 2 1
1 1 −1 2 0 2 1
P22 = .
2 1 −1 2 0 2 1
0 0 0 0 0 0
0 0 0 0 0 0
π
2
1 −1
π π
Exercise 4.10.28 Let A = 0 2
−4 .
π
0 0 4
Then cos A = cos SJS −1 = S cos (J) S −1 , and because we have a 2 × 2 Jordan block,
0 −1
0
cos J = 0 0 √
0
2
0 0 2
− 21 sin λ − 16 cos λ
sin λ cos λ
1
sin λ cos λ − 2 sin λ S −1 ,
sin A = S
sin λ cos λ
sin λ
− 21 cos λ 1
cos λ − sin λ 6
sin λ
1
cos λ − sin λ − 2 cos λ S −1 ,
cos A = S
cos λ − sin λ
cos λ
and
− 68 cos 2λ
sin 2λ 2 cos 2λ −4 sin 2λ
sin 2λ 2 cos 2λ −2 sin 2λ S −1 .
sin 2A = S
sin 2λ 2 cos 2λ
sin 2λ
Using now double angle formulas such as
sin 2λ = 2 sin λ cos λ, cos 2λ = cos2 λ − sin2 λ,
one checks that sin 2A = 2 sin A cos A in this case. The more general case where
A = SJS −1 , with J a direct sum of single Jordan blocks, now also follows easily.
Answers to Exercises 285
Answer:
c1 e3t + c2 e−3t + 51 e−2t
x1 (t)
= 3t −3t 7 −2t .
x2 (t) c1 e − 2c2 e − 10 e
Answer: First note that by Cauchy’s integral formula (4.34) with f (z) ≡ 1, we have that
Z Z
1 1 1 1
1= dz, 0 = dz, k ≥ 1.
2πi γ z − λj 2πi γ (z − λj )k+1
Using now (4.33) we get that
Z m nXj −1 Z m
1 X k! 1 X
R(z)dz = dzP lk = Pj0 = I.
2πi γ l=1 k=0
2πi γ (z − λj )k+1 j=0
Using the definitions of Pjk as in Theorem 4.8.4, one sees that this equals A, as desired.
R(λ)−R(µ)
(a) λ−µ
= −R(λ)R(µ).
dR(λ)
(b) dλ
= −R(λ)2 .
j
d R(λ)
(c) dλj
= (−1)j j!R(λ)j+1 , j = 1, 2, . . . .
dj R(λ)
Let us now prove dλj = (−1)j j!R(λ)j+1 by induction on j. The j = 1 case was covered
in part (b). Assume now that it has been proven that
dj−1 R(λ)
= (−1)j−1 (j − 1)!R(λ)j .
dλj−1
Then
dj R(λ) d R(λ + h)j − R(λ)j
j
= (−1)j−1 (j − 1)!R(λ)j = (−1)j−1 (j − 1)! lim .
dλ dλ h→0 h
Write now
j−1
X
R(λ + h)j − R(λ)j = (R(λ + h)j−k R(λ)k − R(λ + h)j−k−1 R(λ)k+1 ).
k=0
Using observation (4.28), we have that
R(λ + h)j−k R(λ)k − R(λ + h)j−k−1 R(λ)k+1
lim =
h→0 h
j−k+k+1 j+1
−R(λ) = −R(λ) .
And thus
j−1
R(λ + h)j − R(λ)j X
lim = −R(λ)j+1 = −jR(λ)j+1 .
h→0 h k=0
Consequently,
dj R(λ)
= (−1)j−1 (j − 1)!(−j)R(λ)j+1 = (−1)j j!R(λ)j+1 ,
dλj
as desired.
Answer: First note that by Cauchy’s integral formula (4.34) with f (z) = z, we have that
Z Z
1 z 1 z
λj = dz, 0 = dz when j =6 l,
2πi γj z − λj 2πi γl z − λj
Z
1 z
1= dz,
2πi γj (z − λj )2
Z Z
1 z 1 z
0= dz when j 6
= l, 0 = dz, k ≥ 2.
2πi γl (z − λj )2 2πi γl (z − λj )k+1
Using now (4.33) we get that
Z m nXj −1 Z
1 X 1 z
zR(z)dz = k![ dz]Plk = λj Pj0 + Pj1 .
2πi γj l=1 k=0
2πi γj (z − λj )k+1
Using Theorem 4.8.4 ones sees that
−1
APj0 = S(⊕m
l=1 J(λl )S Pj0 = S(0 ⊕ · · · ⊕ J(λj )Inj ⊕ · · · ⊕ 0)S −1 = λj Pj0 + Pj1 .
Answers to Exercises 287
Chapter 5
Exercise 5.7.1 For the following, check whether h·, ·i is an inner product.
(a) V = R2 , F = R,
x y
h 1 , 1 i = 3x1 y1 + x1 y2 + x2 y1 + 2x2 y2 .
x2 y2
(b) V = C2 , F = C,
x y
h 1 , 1 i = 3x1 y1 + x1 y2 + x2 y1 + 2x2 y2 .
x2 y2
x y
Answer: (a) Write h 1 , 1 i = (x1 + x2 )(y1 + y2 ) + 2x1 y1 + x2 y2 and realizing that
x2 y2
everything is over R, it is easy to see that this defines an inner product.
(c) Let f (t) = t(t − 1)(t − 2), then f 6= 0, but hf, f i = 0. Thus this is not an inner product.
(d) Nonnegativity, linearity and symmetry are easy to check. Next suppose that hf, f i = 0.
Then we get that f (0) = f (1) = f (2) = 0. As f ∈ R2 [X], this implies that f = 0 (as a
degree ≤ 2 polynomial with three roots is the zero polynomial).
(e) Nonnegativity, linearity and (complex conjugate) symmetry are easy to check. Next
suppose that hf, f i = 0. This implies that int10 |f (x)|2 (x2 + 1)dx = 0. Since the integrand
is continuous and nonnegative, we must have that |f (x)|2 (x2 + 1) = 0 for x ∈ [0, 1]. Thus
f (x) = 0 for x ∈ [0, 1]. Thus f = 0. This shows that this is an inner product.
(a) V = C2 , F = C,
x1
k k = x21 + x22 .
x2
(b) V = C2 , F = C,
x1
k k = |x1 | + 2|x2 |.
x2
288 Answers to Exercises
i
Answer: (a) Not a norm. For instance, k k = −1 6≥ 0.
0
(b) Clearly this quantity is always nonnegative, and when it equals 0 we need that
|x1 | = 0 = |x2 |, yielding that x = 0. Thus the first property of a norm is satisfied.
Finally,
kx + yk = |x1 + y1 | + 2|x2 + y2 | ≤ |x1 | + |y1 | + 2(|x2 | + |y2 |) = kxk + kyk,
yielding that k · k is a norm.
(d) Notice that 1 − x ≥ 0 when 0 ≤ x ≤ 1, thus kf k ≥ 0 for all f ∈ V . Next, suppose that
kf k = 0. As |f (x)|(1 − x) ≥ 0 on [0, 1], the only way the integral can be zero is when
|f (x)|(1 − x) = 0 for x ∈ [0, 1]. Thus f (x) = 0 for x ∈ (0, 1], and then, by continuity, it also
follows that f (0) = limx→0+ f (x) = 0. Thus f is the zero function. This takes care of the
first condition of a norm.
For properties (ii) and (iii), observe that |cf (x)| = |c||f (x)| and
|(f + g)(x)| = |f (x) + g(x)| ≤ |f (x)| + |g(x)|. Using this, it is easy to see that
kcf k = |c|kf k and kf + gk ≤ kf k + kgk, giving that k · k is a norm.
2 0
Answers to Exercises 289
for W ⊥ .
Exercise 5.7.5 Let h·, ·i be the Euclidean inner product on Fn , and k · k the associated
norm.
(a) Let F = C. Show that A ∈ Cn×n is the zero matrix if and only if hAx, xi = 0 for all
x ∈ Cn . (Hint: for x, y ∈ C, use that hA(x + y), x + yi = 0 = hA(x + iy), x + iyi.)
(b) Show that when F = R, there exists nonzero matrices A ∈ Rn×n , n > 1, so that
hAx, xi = 0 for all x ∈ Rn .
(c) For A ∈ Cn×n define
w(A) = max |hAx, xi|. (5.29)
x∈Cn ,kxk=1
Show that w(·) is a norm on Cn×n . This norm is called the numerical radius of A.
(d) Explain why maxx∈Rn ,kxk=1 |hAx, xi| does not define a norm.
For the converse, assume that hAx, xi = 0 for all x ∈ Cn . Let now x, y ∈ C. Then
0 = hA(x + y), x + yi = hAx, xi + hAy, xi + hAx, yi + hAy, yi = hAy, xi + hAx, yi (5.30)
and, similarly,
0 = hA(x + iy), x + iyi = ihAy, xi − ihAx, yi. (5.31)
Combining (5.30) and (5.31), we obtain that hAx, yi = 0 for all x, y ∈ C. Applying this
with x = ej and y = ek , we obtain that the (k, j)th entry of A equals zero. As this holds
for all k, j = 1, . . . , n, we obtain that A = 0.
290 Answers to Exercises
0 −1
(b) When n = 2 one may choose A = . For larger n one can add zero rows and
1 0
columns to this matrix.
(c) Clearly, w(A) ≥ 0. Next, suppose that w(A) = 0. Then for all kxk = 1, we have that
hAx, xi = 0. This implies that for all x ∈ Cn we have that hAx, xi = 0. By (a), this
implies that A = 0. Next, for kxk = 1, we have
|h(A + B)x, xi| ≤ |hAx, xi| + |hBx, xi| ≤ w(A) + w(B),
and thus w(A + B) ≤ w(A) + w(B). Finally, when c ∈ C, one has that
|h(cA)x, xi| = |c||hAx, xi|, and thus w(cA) = |c|w(A) follows easily.
(d) With A as in part (b), we have that maxx∈Rn ,kxk=1 |hAx, xi| = 0, and thus the first
property in the definition of a norm fails.
Exercise 5.7.7
R1 2
Answer: (a) hL(p), qi = −1 (t + 1)p(t)q(t)dt = hp, L(q)i. Thus L is self-adjoint.
R1
(b) Let p(t) ≡ 1 and q(t) = t. Then hL(p), qi = 0, and hp, L(q)i = −1 1dt = 2. Thus L is
not self-adjoint.
R1 R −1
(c) hL(p), qi = −1 −p(−t)q(t)dt = 1 −p(s)q(−s)(−ds) = hp, L(q)i. Thus L is
self-adjoint.
Answers to Exercises 291
Exercise 5.7.8 Let V = R[t] over the field R. Define the inner product
Z 2
hp, qi := p(t)q(t)dt.
0
For the following linear maps on V determine whether they are unitary.
32
Answer: (a) Let p(t) = q(t) = t. Then hp, qi = 2, while hL(p), L(q)i = 5
. Thus L is not
unitary.
Answer: (a) By (5.1) we have that |hx, U xi| ≤ kxkkU xk = kxk2 , where in the last step we
used that kxk = kU xk as U is unitary.
(b) Let x be a unit vector. As |hx, U xi| = kxk2 , we have by (the last part of) Theorem
5.1.10 that U x = αx for some α. As kxk = kU xk we must have |α| = 1. If we are in a
one-dimensional vector space, we are done. If not, let v be a unit vector orthogonal to v.
As above, we get U v = βv for some |β| = 1. In addition, we get that U (x + v) = µ(x + v)
with |µ| = 1. Now, we get that
µ = hµx, xi = hµ(x + v), xi = hU (x + v), xi = hαx + βv), xi = α.
Similarly, we prove µ = β. Thus α = β. Thus, show that U y = αy for all y ⊥ x and also
for y = x. But then the same holds for linear combinations of x and y, and we obtain that
U = αI.
1 2 1 0
(a) Let W = span{ , }. Find an orthonormal basis for W .
0 1 2 1
(b) Find a basis for W ⊥ := {B ∈ V : B ⊥ C for all C ∈ W }.
Exercise 5.7.11 Let A ∈ Cn×n . Show that if A is normal and Ak = 0 for some k ∈ N,
then A = 0.
Exercise 5.7.12 Let A ∈ Cn×n and a ∈ C. Show that A is normal if and only if A − aI
is normal.
Exercise 5.7.13 Show that the sum of two Hermitian matrices is Hermitian. How about
the product?
A product
oftwo Hermitian
matrices is not necessarily Hermitian. For example,
2 1 0 1
A= ,B = are Hermitian but AB is not.
−i 2 1 0
Exercise 5.7.14 Show that the product of two unitary matrices is unitary. How about
the sum?
The sum of two unitary matrices is in general not unitary. For example, U = I is unitary,
but U + U = 2I is not.
Exercise 5.7.15 Is the product of two normal matrices normal? How about the sum?
2 i 0 1
Answer: No, e.g., A = ,B = are normal, but neither AB nor A + B is
−i 2 i 0
normal.
Answers to Exercises 293
Exercise 5.7.17 For the following matrices A, find the spectral decomposition U DU ∗ of
A.
i2
(a) A = .
2−i
√
2 3
(b) A = √ .
3 4
3 1 1
(c) A = 1 3 1.
1 1 3
0 1 0
(d) A = 0 0 1.
1 0 0
1 1
!
√ √
2 2 1 0
Answer: (a) U = ,D = .
√i − √i 0 3
2 2
√ !
3 1
− 2 √2 1 0
(b) U = ,D = .
1 3 0 5
2 2
1 1 1
√ √
16 2 3 2 0 0
(c) U = √ − 12 √1
= 0 2 0 .
3,D
√6
− 36 0 1
√ 0 0 5
3
294 Answers to Exercises
1 1 1 1 0 0
2πi 4πi 2πi
1
(d) U = √ 1 e 3 e 3 , D = 0 e 0 .
3
3 2πi 2πi 4πi
1 e 3 e 3 0 0 e 3
3 2i
Exercise 5.7.18 Let A = .
−2i 3
√1 1
!
√ √
1 0
(b) Let U = 2 2 ,D = . Then A = U DU ∗ . Now let B = U DU ∗ ,
√i − √i 0 5
2 2
√ 1
where D= √0 . Then B is positive semidefinite, and B 2 = A.
0 5
Exercise 5.7.19 Let A ∈ Cn×n be positive semidefinite, and let k ∈ N. Show that there
exists a unique positive semidefinite B so that B k = A. We call B the kth root of A and
1
denote B = A k .
Next, suppose that C is positive semidefinite with C k = A. For uniqueness of the kth
root, we need to show that C = B. As C is positive semidefinite, we may write C = V ΛV ∗
with V unitary, and Λ = diag(λi )n k k ∗
i=1 with λ1 ≥ · · · ≥ λn (≥ 0). Then C = V Λ V = A,
and as the eigenvalues of C k are λk1 ≥ · · · ≥ λkn and the eigenvalues of A are
d1 ≥ · · · ≥ dn , we must have that λki = di , i = 1, . . . , n. And thus, since λi ≥ 0 for all i, we
1
have λi = dik , i = 1, . . . , n. From the equalities V Λk V ∗ = U DU ∗ and Λk = D, we obtain
that (U ∗ V )D = D(U ∗ V ). Let W = U ∗ V and write W = (wij )n i,j=1 . Then W D = DW
implies that wij dj = di wij for all i, j = 1, . . . , n. When dj 6= di we thus get that wij = 0
1 1
(since wij (dj − di ) = 0). But then it follows that wij djk = dik wij for all i, j = 1, . . . , n
(indeed, when wij = 0 this is trivial, and when di = dj this also follows from
1 1 1 1
wij dj = di wij ). Now we obtain that U ∗ V D k = W D k = D k W = D k U ∗ V , and thus
1 1
C = V D k V ∗ = U D k U ∗ = B.
Exercise 5.7.23 (a) Let A be positive definite. Show that A + A−1 − 2I is positive
semidefinite.
296 Answers to Exercises
(b) Show that A is normal if and only if A∗ = AU for some unitary matrix U .
Answer: (a) Clearly, since A is Hermitian, we have that A + A−1 − 2I is Hermitian. Next,
every eigenvalue of A + A−1 − 2I is of the form λ + λ−1 − 2 = (λ − 1)2 /λ, where λ is an
eigenvalue of A. As λ > 0, we have that (λ − 1)2 /λ ≥ 0.
Conversely, let A be normal. Then there exists a diagonal D (with diagonal entries |djj |)
and unitary V so that A = V DV ∗ . Let W be a unitary diagonal matrix so that DW = D∗
djj
(by taking wjj = djj
, when djj 6= 0, and wjj = 1 when djj = 0). Then U = V W V ∗ is
unitary and AU = A(V W V ∗ ) = V DV ∗ V W V ∗ = V DW V ∗ = V D∗ V ∗ = A∗ .
1 1 0
Exercise 5.7.24 Find a QR factorization of 1 0 1 .
0 1 1
Answer:
√2 1
√ √ √
√ − √1 2 −√ 22 − 2
√22 6 3 2
− √1 1 6 1
Q=
2 √ ,R = 0
2
√ .
√ 6 3 √6
6 1 2 3
0 3
√ 0 0 3
3
Answer:
√ √ √
− 2 2
2 2 2 2
√ 0
2 2
T = 0 4 2 2 , U = 0√ 1 0√ .
0 0 −2 − 22 0 − 2
2
Answer: Let
I 0
S= ,
−B ∗ A−1 I
and observe that
−A−1 B
I 0 A B I A 0
SM S ∗ = = .
−B ∗ A−1 I B∗ C 0 I 0 C− B ∗ A−1 B
Theorem 5.5.5 now yields that
A 0
In M = In ∗ −1 = In A + In(C − B ∗ A−1 B). (5.34)
0 C−B A B
Exercise 5.7.28 Determine the singular value decomposition of the following matrices.
√
1 1 2√2i
(a) A = √−1 −1
√ 2 2i.
2i − 2i 0
−2 4 5
6 0 −3
(b) A = .
6 0 −3
−2 4 5
√
2i
− 21 1
4 0 0 0 0 1
√2 2
Answer: (a) V = 22i 1
−√12 , Σ = 0 2 0 , W = −1 0 0 .
2
√
0 − 22i − 22i 0 0 2 0 1 0
1
−2 − 21 − 21 − 12
12 0 0 2
− 23 − 13
1 − 12 1
− 12 0 6 0 3
(b) V = 21
2
1 ,Σ = 0
, W = − 1 − 23 2
.
− 12 − 21 0 0 3 3
2 2 − 23 − 13 − 23
− 21 − 21 1
2
1
2
0 0 0
Exercise 5.7.29 Let A be a 4 × 4 matrix with spectrum σ(A) = {−2i, 2i, 3 + i, 3 + 4i}
and singular values σ1 ≥ σ2 ≥ σ3 ≥ σ4 .
P Q
Exercise 5.7.30 Let A = ∈ C(k+l)×(m+n) , where P is of size k × m. Show that
R S
σ1 (P ) ≤ σ1 (A).
Conclude that σ1 (Q) ≤ σ1 (A), σ1 (R) ≤ σ1 (A), σ1 (S) ≤ σ1 (A) as well.
Answer: By (5.17),
x
σ1 (P ) = max kP xk ≤ max kA k≤
kxk=1,x∈Cm kxk=1,x∈Cm 0
The same type of reasoning can be applied to obtain the other inequalities. Alternatively,
one can use that permuting block rows and/or block columns
does
notchangethe singular
P Q Q P
values of a matrix. For instance, the singular values of and are the
R S S R
0 Im
same, as multiplying on the left with the unitary matrix J = does not change
In 0
the singular values (it only changes the singular value decomposition from V ΣW ∗ to
V ΣW ∗ J = V Σ(J ∗ W )∗ ).
Answer:
Answers to Exercises 299
Figure 5.7: The original image (of size 672 × 524 × 3).
(a) Using 10 singular val- (b) Using 30 singular val- (c) Using 50 singular val-
ues. ues. ues.
Answer: Notice that for any invertible matrix A, κ (A) = σ1 (A) σ1 A−1 . So by (5.18),
σ (A) σ (B)
σ1 (AB) σ1 B −1 A−1 ≤ σ1 (A) σ1 A−1 σ1 (B) σ1 B −1 = σ 1 (A) σ 1 (B) = κ (A) κ (B).
n n
Exercise 5.7.33 Prove that if X and Y are positive definite n × n matrices such that
Y − X is positive semidefinite, then det X ≤ det Y . Moreover, det X = det Y if and only
if X = Y .
1 1 1 1
Answer: Notice that Y − 2 (Y − X)Y − 2 = I − Y − 2 XY − 2 is positive semidefinite. Thus
1
−2 −1
the eigenvalues µ1 , . . . , µn of Y XY 2 satisfy 0 ≤ µj ≤ 1, j = 1, . . . , n. But then
det X −1 −2 1 Qn
det Y
= det(Y 2 XY ) = j=1 µj ≤ 1. Next, det X = det Y if and only if
300 Answers to Exercises
1 1
µ1 = · · · = µn = 1, which in turn holds if and only if Y − 2 XY − 2 = In . The latter holds if
and only if X = Y .
Exercise 5.7.34 (Least squares solution) When the equation Ax = b does not have a
solution, one may be interested in finding an x so that kAx − bk is minimal. Such an x is
called a least squares solution to Ax = b. In this exercise we will show that if A = QR,
with R invertible, then the least squares solution is given by x = R−1 Q∗ b. Let A ∈ Fn×m
with rank A = m.
Answer: (a) Since rank A = m, the columns of A are linearly independent. This gives that
the m × m matrix R is invertible. Thus Ran A = Ran Q follows.
(b) Clearly, QQ∗ b ∈ Ran Q. Let v ∈ Ran Q. Thus there exists a w so that v = Qw.
Then, since (I − QQ∗ )b ⊥ x for every x ∈ Ran Q (use that Q∗ Q = I),
kv − bk2 = kQw − QQ∗ b + QQ∗ b − bk2 = kQw − QQ∗ bk2 + kQQ∗ b − bk2 .
Thus kv − bk ≥ kQQ∗ b − bk, and equality only holds when v = QQ∗ b.
(a) Show that X has no eigenvalues on the unit circle T = {z ∈ C : |z| = 1}.
(b) Show that A is positive definite if and only if X has all eigenvalues in
D = {z ∈ C : |z| < 1}. (Hint: When X has allPeigenvalues in D, we have that X n → 0
as n → ∞. Use this to show that A = H + ∞ k=1 X ∗k HX k .)
Answer: (a) Suppose that x is an eigenvector of X with eigenvalue λ. Then (5.35) yields
that
0 < x∗ Hx = (1 − |λ|2 )x∗ Ax,
and thus |λ| 6= 1.
Chapter 6
Exercise 6.7.1 The purpose of this exercise is to show (the vector form of) Minkowski’s
inequality, which says that for complex numbers xi , yi , i = 1, . . . , n, and p ≥ 1, we have
n
!1 n
!1 n
!1
X p X p X p
|xi + yi | ≤ |yi | + |yi | . (6.37)
i=1 i=1 i=1
Recall that a real-valued function f defined on an interval in R is called convex if for all
c, d in the domain of f , we have that f (tc + (1 − t)d) ≤ tf (c) + (1 − t)f (d), 0 ≤ t ≤ 1.
(a) Show that f (x) = − log x is a convex function on (0, ∞). (One can do this by showing
that f 00 (x) ≥ 0.)
Answer: f 00 (x) = x12 > 0.
p q
(b) Use (a) to show that for a, b > 0 and p, q ≥ 1, with p1 + 1q = 1, we have ab ≤ ap + bq .
This inequality is called Young’s inequality.
Answer: Taking c = ap and d = bq , t = p1 (and thus 1 − t = 1q ), we obtain from the
convexity of − log that
1 1 1 1
− log( ap + bq ) ≤ − log ap − log bq .
p q p q
Multiplying by −1 and applying s 7→ es on both sides gives
1 p 1 1 1
a + bq ≥ (ap ) p (bq ) q = ab.
p q
(c) Show Hőlder’s inequality: when ai , bi ≥ 0, i = 1, . . . , n, then
n n
!1 n
!1
p q
p q
X X X
ai bi ≤ ai bi .
i=1 i=1 i=1
1 1
Pn p p Pn q q
(Hint: Let λ = ( i=1 ai ) and µ = ( i=1 bi ) , and divide on both sides ai by λ
and bi by µ. Use this to argue that it is enough to prove the inequality when
λ = µ = 1. Next use (b)).
Answer: If λ or µ equals 0, the inequality is trivial, so let us assume λ, µ > 0. Put
1
p p p
αi = aλi and βi = bµi , i = 1, . . . , n. Then ( n
P Pn
i=1 αi ) = 1, and thus i=1 αi = 1.
Pn q P n
Similarly, i=1 βi = 1. We need to prove that i=1 αi βi ≤ 1. By (b) we have that
αi βi ≤ p1 αpi + 1q βiq , ı = 1, . . . , n. Taking the sum, we obtain
n n n
X 1X p 1X q 1 1
αi βi ≤ αi + β = + = 1,
i=1
p i=1 q i=1 i p q
and we are done.
(d) Use (c) to prove (6.37) in the case when xi , yi ≥ 0. (Hint: Write
(xi + yi )p = xi (xi + yi )p−1 + yi ((xi + yi )p−1 , take the sum on both sides, and now
apply Hőlder’s inequality to each of the terms on the right-hand side. Rework the
resulting inequality, and use that p + q = pq.)
Answer: Using (c) we have that
n
X n
X n
X
(xi + yi )p = xi (xi + yi )p−1 + yi (xi + yi )p−1 ≤
i=1 i=1 i=1
Answers to Exercises 303
n n n n
1 X 1 1 X 1
xpi ) p ( yip ) p (
X X
( (xi + yi )(p−1)q ) q + ( (xi + yi )(p−1)q ) q =
i=1 i=1 i=1 i=1
n n n
1 1 1
xpi ) p yip ) p ](
X X X
[( +( (xi + yi )p ) q ,
i=1 i=1 i=1
where in the last step we used that (p − 1)q = p. Dividing both sides by
1
( n p q
P
i=1 (xi + yi ) ) , we obtain
n n n
1− 1 1 1
xpi ) p + ( yip ) p ,
X X X
( (xi + yi )p ) q ≤ (
i=1 i=1 i=1
1 1
and using that 1 − q
= p
, we are done.
(e) Prove Minkowski’s inequality (6.37).
Answer: We just need to observe Pthat for complex numbers xi and yi we have that
|xi + yi | ≤ |xi | + |yi |, and thus n p n p
P
i=1 |xi + yi | ≤ i=1 (|xi | + |yi |) . Using (d) we
obtain
n n n
X 1 X 1 X 1
( (|xi | + |yi |)p ) p ≤ ( |xi |p ) p + ( |yi |p ) p ,
i=1 i=1 i=1
and we are done.
(f) Show that when Vi has a norm k · ki , i = 1, . . . , k, then for p ≥ 1 we have that
v1
k
! p1
. X p
k . kp := kvi ki
.
i=1
vk
defines a norm on V1 × · · · × Vk .
Answer: The only part that is not trivial is the triangle inequality. For this we need to
observe that kvi + wP i ki ≤ kvi ki + kwi ki , and thus
P n p n p
i=1 kvi + wi ki ≤ i=1 (kvi ki + kwi ki ) . Now we can apply (d) with xi = kvi ki and
yi = kwi ki , and obtain
n n n n
1 1 1 1
kvi kpi ) p + ( kwi kpi ) p ,
X X X X
( (kvi + wi ki )p ) p ≤ ( (kvi ki + kwi ki )p ) p ≤ (
i=1 i=1 i=1 i=1
proving the triangle inequality.
Exercise 6.7.2 Let V and Z be vector spaces over F and T : V → Z be linear. Suppose
W ⊆ Ker T . Show there exists a linear transformation S : V /W → Ran T such that
S(v + W ) = T v for v ∈ V . Show that S is surjective and that Ker S is isomorphic to
(Ker T )/W .
Finally, let us define φ : (Ker T )/W → Ker S via φ(v + W ) = v + W , where v ∈ KerT .
We claim that φ is an isomorphism. First note that S(v + W ) = T v = 0, as v ∈ KerT .
Clearly, φ is linear and one-to-one, so it remains to check that φ is surjective. When
v + W ∈ KerS, then we must have that T v = 0. Thus v ∈ KerT , yielding that
v + W ∈ (Ker T )/W . Clearly, φ(v + W ) = v + W , and thus v + W ∈ Ranφ.
304 Answers to Exercises
Exercise 6.7.3 Consider the vector space Fn×m , where F = R or F = C, and let k · k be
norm on Fn×m .
(k)
(a) Let A = (aij )n m n m
i=1,j=1 , Ak = (aij )i=1,j=1 , k = 1, 2, . . . , be matrices in F
n×m . Show
(k)
that limk→∞ kAk − Ak = 0 if and only if limk→∞ |aij − aij | = 0 for every
i = 1, . . . , n and j = 1, . . . , m.
(b) Let n = m. Show that limk→∞ kAk − Ak = 0 and limk→∞ kBk − Bk = 0 imply that
limk→∞ kAk Bk − ABk = 0.
Answer: (a) Notice that if ckAk − Aka ≤ kAk − Akb ≤ CkAk − Aka , for some c, C > 0,
and limk→∞ kAk − Aka = 0, then limk→∞ kAk − Akn = 0. Thus, by Theorem 5.1.25,
when we have limk→∞ kAk − Ak = 0 in one norm on Fn×m , we automatically have it for
every norm on Fn×m . Let us use the norm
kM k∞ = k(mij )n m
i=1,j=1 k := max |mij |.
i=1,...,n;j=1...,m
(k)
Next, let limk→∞ |aij − aij | = 0 for every i and j. Let > 0. Then for every i and j,
(k)
there exists a Kij ∈ N so that for k > Kij we have |aij − aij | < . Let now
K = maxi=1,...,n;j=1...,m Kij . Then for every k > K we have that
(k)
kAk − Ak∞ = maxi=1,...,n;j=1...,m |aij − aij | < . Thus, by definition of a limit, we have
limk→∞ kAk − Ak∞ = 0.
(b) For scalars we have that limk→∞ |ak − a| = 0 = limk→∞ |bk − b| implies
limk→∞ |(ak + bk ) − (a + b)| and limk→∞ |ak bk − ab| = 0 (which you can prove by using
inequalities like |ak bk − ab| = |ak bk − ak b + ak b − ab| ≤ |ak ||bk − b| + |ak − a||b|).
Equivalently, limk→∞ ak = a and limk→∞ bk = b implies limk→∞ ak bk = ab.
Suppose now that limk→∞ kAk − Ak = 0 = limk→∞ kBk − Bk = 0. Then, using (a),
(k) (k)
limk→∞ aij = aij and limk→∞ bij = bij for all i, j = 1, . . . , n. Now, for the (r, s)
element of the product Ak Bk we obtain
n n
(k) (k) (k) (k)
X X
lim (Ak Bk )rs = lim arj bjs = ( lim arj )( lim bjs ) =
k→∞ k→∞ k→∞ k→∞
j=1 j=1
n
X
arj bjs = (AB)rs , r, s = 1, . . . , n.
j=1
Again using (a), we may conclude limk→∞ kAk Bk − ABk = 0.
Exercise 6.7.4 Given A ∈ Cn×n , we define its similarity orbit to be the set of matrices
O(A) = {SAS −1 : S ∈ Cn×n is invertible}.
Thus the similarity orbit of a matrix A consists of all matrices that are similar to A.
(a) Show that if A is diagonalizable, then its similarity orbit O(A) is closed. (Hint: notice
that due to A being diagonalizable, we have that B ∈ O(A) if and only if mA (B) = 0.)
Answers to Exercises 305
(b) Show that if A is not diagonalizable, then its similarity orbit is not closed.
Answer: (a) Suppose that Bk ∈ O(A), k ∈ N, and that limn→∞ kBk − Bk = 0. We need to
show that B ∈ O(A), or equivalently, mA (B) = 0. Write
mA (t) = an tn + an−1 tn−1 + · · · + a0 (where an = 1). By exercise 6.7.3(b) we have that
limn→∞ kBk − Bk = 0 imples that limn→∞ kBkj − B j k = 0 for all j ∈ N. But then
n
|aj |kBkj − B j k = 0.
X
lim kmA (Bk ) − mA (B)k ≤ lim
k→∞ k→∞
j=0
As mA (Bk ) = 0 for every k, we thus also have that mA (B) = 0. Thus B ∈ O(A) follows.
(b) First let A = Jk (λ), k ≥ 2, be a Jordan block. For > 0 put D = diag(j )k−1
j=0 . Then
λ 0 ··· 0
0 λ ··· 0
A := D−1 Jk (λ)D = .. .. .. .. ∈ O(A). (6.38)
. . . .
0 0 ··· λ
0 0 ··· 0 λ
Notice that limm→∞ A 1 = λIk 6∈ O(A), and thus O(A) is not closed.
m
Using the reasoning above, one can show that if A = SJS −1 with J = ⊕sl=1 Jnl (λl ) and
some nl > 1, then S(⊕sj=1 λl Inl )S −1 6∈ O(A) is the limit of elements in O(A). This gives
that O(A) is not closed.
Next, let f ∈ V 0 be defined byPf (vj ) = 1 for all j ∈ J. In other words, if v = sr=1 cr vjr
P
s
is a vector in V , then f (v) = r=1 cr . Clearly, f is a linear functional on V . In addition,
not a finite linear combination of elements in {fj : j ∈ J}. Indeed, suppose that
f is P
f = sr=1 cr fjr . Choose now a j ∈ J \ {j1 , . . . , jr }, which can always Pbe done since J is
infinite. Then f (vj ) = 1, while fjr (vj ) = 0 as j 6= jr . Thus f (vj ) 6= sr=1 cr fjr (vj ),
giving that f 6∈ Span{fj : j ∈ J}.
Exercise 6.7.6 Describe the linear functionals on Cn [X] that form the dual basis of
{1, X, . . . , X n }.
1 dj p
Φj (p(X)) = (0), j = 1, . . . , n.
j! dX j
306 Answers to Exercises
Exercise 6.7.7 Let a0 , . . . , an be different complex numbers, and define Ej ∈ (Cn [X])0 ,
j = 0, . . . , n, via Ej (p(X)) = p(aj ). Find a basis of Cn [X] for which {E0 , . . . , En } is the
dual basis.
Answer: If we let {q0 (X), . . . , qn (X)} be the basis of Cn [X] we are looking for, then we
need that Ej (qk (X)) = 1 if j = k, and Ej (qk (X)) = 0 if j = 6 k. Thus, we need to find a
polynomial qk (X) so that qk (ak ) = 1, while a0 , . . . , ak−1 , ak+1 , . . . , an are roots of qk (X).
Thus
qk (X) = c(X − a0 ) · · · (X − ak−1 )(X − ak+1 ) · · · (X − an ),
with c chosen so that qk (ak ) = 1. Thus we find
Y X − ar
qk (X) = ,
r=0,...,n;r6=k
ak − ar
(a) Show how given f ∈ W 0 and g ∈ X 0 , one can define h ∈ V 0 so that h(w) = f (w) for
w ∈ W and h(x) = g(x) for x ∈ X.
(b) Using the construction in part (a), show that V 0 = W 0 +̇X 0 . Here it is understood
that we view W 0 as a subspace of V 0 , by letting f ∈ W 0 be defined on all of V by
putting f (w + x) = f (w), when w ∈ W and x ∈ X. Similarly, we view X 0 as a
subspace of V 0 , by letting g ∈ W 0 be defined on all of V by putting g(w + x) = g(x),
when w ∈ W and x ∈ X.
(b) We first show that W 0 ∩ X 0 = {0}. Indeed, let f ∈ W 0 ∩ X 0 . By the way of viewing
f ∈ W 0 as a function on all of V , we have that f (x) = 0 for all x ∈ X. Similarly, by the
way of viewing f ∈ X 0 as a function on all of V , we have that f (w) = 0 for all w ∈ W .
But then for a general v ∈ V , which can always be written as v = w + x for some w ∈ W
and x ∈ X, we have that f (v) = f (w + x) = f w) + f (x) = 0 + 0 = 0. Thus f is the zero
functional, yielding W 0 ∩ X 0 = {0}.
Answer: (a) Let f, g ∈ Wann and c be a scalar. Then for w ∈ W we have that
(f + g)(w = f (w) + g(w) = 0 + 0 = 0 and (cf )(w = cf (w) = c0 = 0. This shows that
f + g, cf ∈ Wann , and thus Wann is a subspace.
1 −1 2 −2
(b) This amounts to finding the null space of , which in row-reduced
1 0 1 0
1 0 1 0
echelon form is . The null space is spanned by
0 1 −1 2
−1 0
1 −2
v1 = , v2 =
. Thus Wann = Span{f1 , f2 }, where (using the Euclidean inner
1 0
0 1
product) fi (v) = hv, v1 i, i = 1, 2.
1 2 0 0
(c) This amounts to finding the null space of , which is spanned by
0 1 1 0
2 0
−1
, v2 = 0 . Now define f1 (p0 + p1 X + p2 X 2 + p3 X 3 ) = 2p0 − p1 + p2 and
v1 = 1 0
0 1
f2 (p0 + p1 X + p2 X 2 + p3 X 3 ) = p3 . Then Wann = Span{f1 , f2 }.
Exercise 6.7.10 Let V be a finite-dimensional vector space over R, and let {v1 , . . . , vk }
be linearly independent. We define
k
X
C = {v ∈ V : there exist c1 , . . . , ck ≥ 0 so that v = ci vi }.
i=1
Show that v ∈ C if and only if for all f ∈ V 0 with f (vj ) ≥ 0, j = 1, . . . , k, we have that
f (v) ≥ 0.
Remark. The statement is also true when {v1 , . . . , vk } are not linearly independent, but
in that case the proof is more involved. The corresponding result is the Farkas–Minkowski
Theorem, which plays an important role in linear programming.
Conversely, suppose that v ∈ V has the property that for all f ∈ V 0 with f (vj ) ≥ 0,
j = 1, . . . , k, we have that f (v) ≥ 0. First, we show that v ∈ Span{v1 , . . . , vk }. If not, we
can find a linear functional so that on the (k + 1)-dimensional space Span{v, v1 , . . . , vk }
we have f (v) = −1 and f (vj ) = 0, j = 1, . . . , k. But this contradicts that v ∈ V has the
property that for all f ∈ V 0 with f (vj ) ≥ 0, j = 1, . . . , k, we have that f (v) ≥ 0.
statement. Next, suppose that for all f ∈ (RanA)ann we have that f (w) = 0. If
w 6∈ RanA, then letting {w1 , . . . , wk } be a basis of RanA, we can find a linear functional
so that f (wj ) = 0, j = 1, . . . , k, and f (w) = 1. Then f ∈ (RanA)ann , but f (w) 6= 0, giving
a contradiction. Thus we must have that w ∈ RanA, yielding the existence of a v so that
Av = w.
Answer: (a) and (b) are direct computations. For (c), let x × y = 0, and assume that one
of the entries of x and y is nonzero (otherwise, we are done). Without loss of
generalization, we assume that x1 6= 0. Then, reworking the equations one obtains from
x × y = 0, one sees that y = xy1 x.
1
Answer:
−i 0 −1 + i 0 −2 + i 0
−2i 5i −2 + 2i 5 − 5i −4 + 2i 10 − 5i
i 3i 1−i 3 − 3i 2−i 6 − 3i
A⊗B = ,
−1 − i 0 2 0 3−i 0
−2 − 2i 5 + 5i 4 −10 6 − 2i −15 + 5i
1+i 3 + 3i −2 −6 −3 + i −9 + 3i
−i −1 + i −2 + i 0 0 0
−1 − i 2 3−i 0 0 0
−2i −2 + 2i −4 + 2i 5i 5 − 5i 10 − 5i
B⊗A= .
−2 − 2i 4 6 − 2i 5 + 5i −10 −15 + 5i
i 1−i 2−i 3i 3 − 3i 6 − 3i
1+i −2 −3 + i 3 + 3i −6 −9 + 3i
We have that A ⊗ B = P (B ⊗ A)P T , where
1 0 0 0 0 0
0 0 0 1 0 0
0 1 0 0 0 0
P = .
0 0 0 0 1 0
0 0 1 0 0 0
0 0 0 0 0 1
(b) From the Gaussian elimination algorithm we know that we can write A = SET and
B = Ŝ Ê T̂ , where S, T , Ŝ and T̂ are invertible, and
Ik 0 Il 0
E= , Ê = ,
0 0 0 0
where k = rankA and l = rankB. Then
A ⊗ B = (S ⊗ Ŝ)(E ⊗ Ê)(T ⊗ T̂ ).
Notice that E ⊗ Ê has rank kl (as there are exactly kl entries equal to 1 in different rows
and in different columns, and all the other entries are 0). Since (S ⊗ Ŝ) and (T ⊗ T̂ ) are
invertible, we get that
rank(A ⊗ B) = rank(E ⊗ Ê) = kl = (rank A)(rank B).
Exercise 6.7.15 Given Schur triangularization decompositions for A and B, find a Schur
triangularization decomposition for A ⊗ B. Conclude that if λ1 , . . . , λn are the eigenvalues
for A and µ1 , . . . , µm are the eigenvalues for B, then λi µj , i = 1, . . . , n, j = 1, . . . , m, are
the nm eigenvalues of A ⊗ B.
Exercise 6.7.16 Given singular value decompositions for A and B, find a singular value
decomposition for A ⊗ B. Conclude that if σ1 , . . . , σk are the nonzero singular values for A
and σ̂1 , . . . , σ̂l are the nonzero singular values for B, then σi σ̂j , i = 1, . . . , k, j = 1, . . . , l,
are the kl nonzero singular values of A ⊗ B.
where the permutation matrix P is chosen to that P (Σ ⊗ Σ̂)P T has the nonzero singular
values σi σ̂j , i = 1, . . . , k, j = 1, . . . , l in nonincreasing order in the entries
(1, 1), (2, 2), . . . , (kl, kl) and zeros everywhere else.
310 Answers to Exercises
Exercise 6.7.17 Show that det(I ⊗ A + A ⊗ I) = (−1)n det pA (−A), where A ∈ Cn×n .
Exercise 6.7.19 For a diagonal matrix A = diag(λi )n i=1 , find matrix representations for
A ∧ A and A ∨ A using the canonical (lexicographically ordered) bases for Fn ∧ Fn and
Fn ∨ Fn , respectively.
Answer: Applying the definition of the anti-symmetric wedge product and using the
linearity of the inner product we have that
hv1 ∧ · · · ∧ vk , w1 ∧ · · · ∧ wk i =
X X
(−1) (−1)τ hvσ(1) ⊗ · · · ⊗ vσ(k) , wτ (1) ⊗ · · · ⊗ wτ (k) i.
σ
σ∈Sk τ ∈Sk
Using the definition of the inner product on the tensor space, we obtain that the above
equals X X
(−1)σ (−1)τ hvσ(1) , wτ (1) i · · · hvσ(k) , wτ (k) i.
σ∈Sk τ ∈Sk
Answers to Exercises 311
Qk Qk
Since i=1 hvσ(i) , wτ (i) i = i=1 hvi , wτ ◦σ −1 (i) i, we obtain that
k
X X −1 Y X
[(−1)τ ◦σ hvi , wτ ◦σ−1 (i) i] = det(hvi , wj i)ki,j=1 =
σ∈Sk τ ∈Sk i=1 σ∈Sk
k! det(hvi , wj i)ki,j=1 .
Answer: { 21 e1 ∨ e1 , 1
√ e1 ∨ e2 , 12 e2 ∨ e2 , √1
e1 ∨ e3 , √1 e2 ∨ e3 , 12 e3 ∨ e3 }.
2 2 2
(c) Let M = {1, . . . , m} and P = {1, . . . , p}. For A ∈ Fp×m and B ∈ Fm×p , show that
X
det AB = det(A[P, S]) det(B[S, P ]). (6.43)
S⊆M,|S|=p
Answer: (a)
A∧A = a11 a22 − a12 a21 a11 a23 − a13 a21 ··· a11 a2m − a1m a21
a12 a23 − a13 a22 ···
a12 a2m − a1m a22 ··· ··· a1,m−1 a2m − a2,m−1 a1m .
(b) This follows immediately from using (a) and multiplying A ∧ A with B ∧ B.
(c) To show (6.43) one needs to use that (∧p A)(∧p B) = ∧p (AB) = det(AB), where in the
last step we used that AB is of size p × p. The 1 × m
p
matrix ∧p A is given by
∧p A = (A[P, S])S⊆M,|S|=p .
m
Similarly, the p
× 1 matrix ∧p B is given by
∧p B = (B[S, P ])S⊆M,|S|=p .
Equation (6.43) now immediately follows from (∧p A)(∧p B) = ∧p (AB).
Exercise 6.7.23 For x, y ∈ R3 , let the cross product x × y be defined as in (6.17). Show,
using (6.42) (with B = AT ), that
kx × yk2 = kxk2 kyk2 − (hx, yi)2 . (6.44)
kxk2
x1 x2 x3 hx, yi
Answer: Let A = = B T . Then AB = 2 , thus
y1 y2 y3 hy, xi kyk
det AB = kxk2 kyk2 − (hx, yi)2 . Next B ∧ B = x × y = (A ∧ A)T . And thus
(A ∧ A)(B ∧ B) = kx × yk2 . As this equals AB ∧ AB = det AB, we obtain (6.44). Since
kx × yk2 ≥ 0, equation (6.44) implies the Cauchy–Schwarz inequality for the Euclidean
inner product on R3 .
Answers to Exercises 313
Chapter 7
(iv) Show that A cannot have Jordan blocks at 1 of size greater than 1. (Hint: Use that
when k > 1 some of the entries of Jk (1)m do not stay bounded as m → ∞. With this
observation, find a contradiction with the previous part.)
Answer: First notice that when k > 1 the (1, 2) entry of Jk (1)m equals m. Suppose
now that the Jordan canonical decomposition A = SJS −1 of A has Jk (1) in the
upper left corner of J for some k > 1. Put x = Se2 and y = (S T )−1 e1 . Then
yT Am x = m → ∞ as m → ∞. This is in contradiction with the previous part.
(v) Show that if xA = λx, for some x 6= 0, then |λ| ≤ 1.
Answer: If x = x1 · · · xn , let k be so that |xk | = maxj=1,...,n |xj |. Note that
|xk | > 0. Then the kth component of xA satisfies
n
X n
X
|λxk | = |(xA)k | = | aij xi | ≤ aij |xk | = |xk |,
i=1 i=1
Answer: We have
n
X n
X
|xj | = |λxj | = |(xA)j | = | xi aij | ≤ |xi |aij = (|x|A)j , j = 1, . . . , n.
i=1 i=1
For the remainder of this exercise, assume that A only has positive entries; thus
aij > 0 for all i, j = 1, . . . , n.
(vii) Show that y = 0. (Hint: Put z = |x|A, and show that y = 6 P
0 implies that zA − z has
all positive entries. The latter can be shown to contradict n i=1 aij = 1,
j = 1, . . . , n.)
Answer: Suppose that y = 6 0. Then yA = |x|A2 − |x|A has all positive entries (as at
least one entry of y is positive and the others are nonnegative). Put z = |x|A. Then
zA − z has P all positive entries.
Pn If we let zk = maxj=1,...,n zj . then
(zA)k = n i=1 zi aik ≤ zk i=1 aik = zk , which contradicts that zA − z has all
positive entries. Thus we must have y = 0.
(viii) Show that if xA = λx with |λ| = 1, then x is a multiple of e and λ = 1. (Hint: first
show that all entries of x have the same modulus.)
Answer: Let k be so that |xk | = maxj=1,...,n |xj |. Suppose that |xr | < |xk | for some
r = 1, . . . , n. Then
n
X n
X n
X
|xk | = |λxk | = (|xA|)k = | xi aik | ≤ |xi |aik < |xk | aik = |xk |,
i=1 i=1 i=1
Exercise 7.9.3 Let k · k be a norm on Cn×n , and let A ∈ Cn×n . Show that
1
ρ(A) = lim kAk k k , (7.45)
k→∞
where ρ(·) is the spectral radius. (Hint: use that for any > 0 the spectral radius of
1
ρ(A)+
A is less than one, and apply Corollary 7.2.4.)
1
Answer: As limk→∞ C k = 1 for all C > 0, it follows from Theorem 5.1.25 that the limit
in (7.45) is independent of the chosen norm. Let us choose k · k = σ1 (·).
Answers to Exercises 315
and thus 1
|λ| ≤ (σ1 (Ak )) k .
This also holds for the eigenvalue of maximal modulus, and thus
1
ρ(A) ≤ (σ1 (Ak )) k . (7.46)
1
Next, let > 0. Then the spectral radius of B = ρ(A)+
A is less than one. Thus, by
Corollary 7.2.4, we have that B k → 0 as k → ∞. In particular, there exists a K so that for
k > K we have that σ1 (B k ) ≤ 1. Then σ1 (Ak ) ≤ (ρ(A) + )k , which gives that
1
(σ1 (Ak )) k ≤ ρ(A) + .
1
Together with (7.46), this now gives that limk→∞ (σ1 (A)) k = ρ(A).
(k) (k)
Answer: If we denote Ak = (aij )n k n
i,j=1 , B = (bij )i,j=1 , then is is easy to check that
(k) (k)
|aij |
≤ for all i, j, k. Using the Frobenius norm this implies that kAk k ≤ kB k k for all
bij
k ∈ N. But then
1 1
ρ(A) = lim kAk k k ≤ lim kB k k k = ρ(B)
k→∞ k→∞
follows.
Exercise 7.9.5 Show that if {u1 , . . . , um } and {v1 , . . . , vm } are orthonormal sets, then
the coherence µ := maxi,j |hui , vj i|, satisfies √1m ≤ µ ≤ 1.
Exercise 7.9.6 Show that if A has the property that every 2s columns are linearly
independent, then the equation Ax = b can have at most one solution x with at most s
nonzero entries.
Answer: Suppose that Ax1 = b = Ax2 , where both x1 and x2 have at most s nonzero
entries. Then A(x1 − x2 ) = 0, and x1 − x2 has at most 2s nonzero entries. If x1 − x2 =
6 0
we obtain that the columns of A that hit a nonzero entry in x1 − x2 are linearly
independent. This contradicts the assumption that every 2s columns in A are linearly
independent. Thus x1 = x2 .
Exercise 7.9.7 Let A = (aij )n i,j=1 . Show that for all permutation σ on {1, . . . , , n} we
have a1,σ(1) a2,σ(2) · · · an,σ(n) = 0 if and only if there exist r (1 ≤ r ≤ n) rows and
n + 1 − r columns in A so that the entries they have in common are all 0.
316 Answers to Exercises
For the converse, we use induction on the size of the matrix n. When n = 1, the statement
is trivial, so suppose that the result holds for matrices of size up to n − 1. Let now
A = (aij )n i,j=1 and suppose that a1,σ(1) a2,σ(2) · · · an,σ(n) = 0 for all σ. If A = 0, we are
done. Next, let A have a nonzero entry, say ai0 ,j0 = 6 0. Deleting the row and column of
this nonzero entry, we must have that the resulting (n − 1) × (n − 1) submatrix has a zero
in every of its generalized diagonals {(1, τ (1)), . . . , (n − 1, τ (n − 1))} with τ a permutation
on {1, . . . , n − 1}. By the induction assumption, we can identify rows
j1 , . . . , jr ∈ {1, . . . , n} \ {i0 } and columns k1 , . . . , kn−r ∈ {1, . . . , n} \ {j0 }, so that the
entries they have in common are all 0. By permuting rows and columns of A, we may
assume {j1 , . . . , jr } = {1, . . . , r} and {k1 , . . . , kn−r } = {r + 1, . . . , n}. Thus we have that
A11 0
A= ,
A12 A22
where A11 is r × r and A22 is (n − r) × (n − r). Due to the assumption on A, we must
have that either A11 or A22 also has the property that each of its generalized diagonals
has a zero element. By applying the induction assumption on A11 or A22 , we obtain that
one of these matrices has (possibly after a permutation of rows and columns) an upper
triangular zero block which includes a diagonal entry. But then A has an upper triangular
zero block which includes a diagonal zero entry, and thus we obtain the desired s rows and
n − s + 1 columns.
(iv) Prove
Theorem 7.9.9 (Birkhoff ) Let A be doubly stochastic. Then there exist a k ∈ N,
permutation matrices P1 , . . . , Pk and positive numbers α1 , . . . , αk so that
k
X
A = α1 P 1 + · · · + αk P k , αj = 1.
j=1
k̂
X
A = αPσ + (1 − α)Â = αPσ + (1 − α)βj Pj
j=1
1/6 1/2 1/3
Exercise 7.9.10 Write the matrix 7/12 0 5/12 as a convex combination of
1/4 1/2 1/4
permutation matrices.
Answer:
1 0 0 0 1 0 0 1 0 0 0 1
1 1 1 1
0 0 1 + 1 0 0 + 0 0 1 + 1 0 0 .
6 0 1 0 4 0 0 1 4 1 0 0 3 0 1 0
A = .. ..
. .
An1 · · · Ann
has minimal min rank A equal to rank
Ai1 · · · Aii Ai+1,1 ... Ai+1,i
n n−1
X . . X . .. .
rank . .. − rank . (7.47)
. . .
i=1 i=1
An1 . . . Ani An1 ··· Ani
318 Answers to Exercises
Answer: We prove (b), as it will imply (a). For a matrix M , we let coli (M ) denote the ith
scalar column of the matrix M . For p = 1, . . . , n we let Jp ⊆ {1, . . . , µp } be a smallest
possible set such that the columns
App
coli .. , i ∈ Jp , (7.48)
.
Anp
satisfy
App Ap1 ··· Ap,p−1
Span coli .. : i ∈ Jp + Ran .. ..
.
. .
Anp An1 · · · An,p−1
= Ran .. .. .
. .
An1 ··· Anp
Note that the number of elements in Jp equals
Ap1 · · · App Ap1 · · · Ap,p−1
rank .. .. − rank .. ..
.
. . . .
An1 · · · Anp An1 · · · An,p−1
Thus n
P
p=1 cardJp equals the right-hand side of (7.47). It is clear that regardless of the
choice for Aij , i < j, the collection of columns
A1p
coli .. , i ∈ Jp , p = 1, . . . , n, (7.49)
.
Anp
will be linearly independent. This gives that the minimal rank is greater than or equal to
the right-hand side of (7.47). On the other hand, when one has identified the columns
(7.48) one can freely choose entries above these columns. Once such a choice is made,
every other column of the matrix can be written as a linear combination of the columns
(7.49), and thus a so constructed completion has rank equal to the right-hand side of
(7.47). This yields (7.47).
x1 x2 x4
1 0
Answer: Let 1 0 x3 be a completion. As is a submatrix, the ranks is at
0 1
0 1 1
least 2. For the rank to equal 2, we need that the determinant is 0. This leads to
x4 = x1 x3 + x2 .
Answers to Exercises 319
Answer: Let k ∈ {1, . . . , n}, and compute sin(kjθ − jθ) + sin(kjθ + jθ) =
sin(kjθ) cos(jθ) − cos(kjθ) sin(jθ) + sin(kjθ) cos(jθ) + cos(kjθ) sin(jθ) =
2 sin(kjθ) cos(jθ).
Thus
− sin((k − 1)jθ) + 2 sin(kjθ) − sin((k + 1)jθ) = (2 − 2 cos(jθ)) sin(kjθ).
Using this, and the observation that for k = 1 we have sin((k − 1)jθ) = 0, and for k = n
we have sin((k + 1)jθ) = 0 (here is where the definition of θ is used), it follows that
An vj = (2 − 2 cos(jθ))vj , j = 1, . . . , n.
1 0
(a) Let U = ∈ Cn×n , with U1 ∈ C(n−1)×(n−1) a unitary matrix chosen so that
0 U1
a21 σ
v
a31 0 u n
uX
U1 . = . , σ = t |aj1 |2 .
.. ..
j=2
an1 0
320 Answers to Exercises
Remark. If one puts a matrix in upper Hessenberg form before starting the QR algorithm,
it (in general) speeds up the convergence of the QR algorithm, so this is standard practice
when numerically finding eigenvalues.
a11 A12
Answer: (a) Writing A = , we have that
A21 A22
a11 σe1
U AU ∗ = ,
A21 U1∗ U1 A22 U1∗
and is thus of the required form.
(b) As U2 has the special form, the first column of  coincides with the first column of
U AU ∗ , and has therefore zeros in positions (3, 1), . . . , (n, 1). Next, the second column of Â
below the main diagonal corresponds to σ2 e1 . Thus  also has zeros in positions
(4, 2), . . . , (n, 2). Continuing this way, one can find Uk , k = 3, . . . , n − 2, making new zeros
in positions (k + 2, k), . . . , (n, k), while keeping the previously obtained zeros. Letting V
equal the product of the unitaries, we obtain the desired result.
(b) The degree di of a vertex i is the number of vertices it is adjacent to. For instance, for
the graph in Figure 7.6 we have that the degree of vertex 1 is 2, and the degree of
T
∈ Rn . Show that eT AG e = i∈V di .
P
vertex 6 is 1. Let e = 1 · · · 1
Answer: Notice that di is equal to the sum of the entries in the Pith row of AG . Next,
eT AG e is the sum of all the entries of AG , which thus equals i∈V di .
(c) For a real number x let bxc denote the largest integer ≤ x. For instance, bπc = 3,
b−πc = −4, b5c = 5. Let α = λmax (AG ) be the largest eigenvalue of the adjacency
matrix of G. Show that G must have a vertex of degree at most bαc. (Hint: use
Exercise 5.7.21(b).)
Answer: If we take y = √1n e, we get that by Exercise 5.7.21 and part (b) that
1 X
α = max xT Ax ≥ yT Ay = di . (7.50)
hx,xi n i∈V
P
If every vertex i has the property that di > α, then i∈V di > nα, which contradicts
(7.50). Thus, for some i we have di ≤ α. As di is an integer, this implies di ≤ bαc.
(d) Show that
χ(G) ≤ bλmax (AG )c + 1, (7.51)
which is a result due to Herbert S. Wilf. (Hint: use induction and Exercise 5.7.21(c).)
Answer: Denote α = λmax (AG ). We use induction. When the graph has one vertex,
we have that AG = (0) and χ(G) = 1 (there is only one vertex to color), and thus
inequality (7.51) holds.
Let us assume that (7.51) holds for all graphs with at most n − 1 vertices, and let
G = (V, E) have n vertices. By part (c) there is a vertex i so that di ≤ bαc. Let us
remove vertex i (and the edges with endpoint i) from the graph G, to give us a graph
Ĝ = (V̂ , Ê). Notice that AĜ is obtained from AG by removing row and column i. By
Exercise 5.7.21(c) we have that λmax (AĜ ) ≤ λmax (AG ) = α. Using the induction
assumption on Ĝ (which has n − 1 vertices), we obtain that
Thus Ĝ has a (bαc + 1)-coloring. As the vertex i in G has degree ≤ bαc, there is at
least one color left for the vertex i, and thus we find that G also has a
(bαc + 1)-coloring.
(b) Let
1 0 0 0 −1 0 0 0 −1
0 0 0 0 0 0 0 0 0
0 0 2 0 0 0 0 0 0
0 0 0 2 0 0 0 0 0
Z = −1 0 0 0 1 0 0 0 −1 .
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 2 0
−1 0 0 0 −1 0 0 0 1
Show that for x, y ∈ C3 we have that (x ⊗ y)∗ Z(x ⊗ y) ≥ 0.
1
(c) Show that tr(ρα Z) = 7
(3 − α), and conclude that ρα is not 3 × 3 separable for
3 < α ≤ 5.
T
(b) Note that x ⊗ y = x1 y1 x1 y2 x1 y3 x2 y1 x2 y2 x2 y3 x3 y1 x3 y2 x3 y3 .
If we assume that |y2 | ≥ |y1 |, we write (x ⊗ y)∗ Z(x ⊗ y) as
|x1 y1 − x2 y2 + x3 y3 |2 + 2|x1 ȳ3 − x3 ȳ1 |2 + 2|x2 y1 |2 + 2|x3 |2 (|y2 |2 − |y1 |2 ),
which is nonnegative. The case |y1 | ≥ |y2 | can be dealt with in a similar manner.
semidefinite. As each positive semidefinite can be written as lj=1 vj vj∗ , with vj vectors,
P
we can actually write the separable ρα as
Xs
ρα = xj x∗j ⊗ yj yj∗ ,
j=1
323
324 Bibliography