
Matrix Analysis (second edition)

Solutions to Exercises

David Malison

November 2, 2017

Chapter 1

Section 1.0

1.0.P1

Define f (x) = xT Ax, g(x) = xT x and B = {x ∈ Rn : g(x) = 1}. Since f is a continuous function and
B is closed and bounded, Weierstrass’ Theorem implies f (x) achieves its extrema on B. Since f and g
are continuously differentiable and ∇g(x) = 2x ≠ 0 on B, the Lagrange multiplier rule implies every extremum x∗ must satisfy the necessary condition ∇f(x∗) = λ∇g(x∗) for some λ ∈ R, i.e. Ax∗ − λx∗ = 0 (using that A is symmetric, so that ∇f(x) = 2Ax).
Therefore every extremum is an eigenvector of the matrix A.

1.0.P2

Define λ∗ = max{xT Ax : x ∈ Rn , xT x = 1}. In 1.0.P1, we showed that there exists some x∗ satisfying
x∗T Ax∗ = λ∗ and x∗T x∗ = 1. We also know that x∗ must satisfy the necessary condition Ax∗ − λx∗ = 0 for
some λ ∈ R, so λ is an eigenvalue of A. But then λ∗ = x∗T Ax∗ = λx∗T x∗ = λ, so λ∗ is an eigenvalue of A.
Now suppose Ax = λ0 x for some non-zero x ∈ Rn and λ0 ∈ R. Define x̃ = x/‖x‖. Then x̃T x̃ = 1, so
x̃T Ax̃ = λ0 x̃T x̃ = λ0. But λ0 = x̃T Ax̃ ≤ λ∗ by definition of λ∗. Thus λ∗ is an eigenvalue of A that is at least as large as every other eigenvalue, i.e. λ∗ is the largest eigenvalue of A.
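As a quick numerical sanity check (an illustrative sketch, not part of the original solution), one can verify with NumPy that the maximum of xT Ax over the unit sphere equals the largest eigenvalue of a symmetric A; the matrix below is an arbitrary example.

    import numpy as np

    # Illustrative check of 1.0.P2: for symmetric A,
    # max{x^T A x : x^T x = 1} equals the largest eigenvalue.
    rng = np.random.default_rng(0)
    n = 5
    A = rng.standard_normal((n, n))
    A = (A + A.T) / 2                      # make A symmetric

    eigenvalues, eigenvectors = np.linalg.eigh(A)
    lam_star = eigenvalues[-1]             # largest eigenvalue

    # Sample many unit vectors; their Rayleigh quotients never exceed lam_star.
    X = rng.standard_normal((n, 10000))
    X /= np.linalg.norm(X, axis=0)
    rayleigh = np.einsum('ij,ij->j', X, A @ X)
    assert rayleigh.max() <= lam_star + 1e-12

    # The maximum is attained at the corresponding unit eigenvector.
    v = eigenvectors[:, -1]
    assert np.isclose(v @ A @ v, lam_star)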

Section 1.1

Exercises

1. Dei = di ei , so ei is an eigenvector of D with eigenvalue di for i = 1, · · · , n.

2. If Ax = λx, then for any non-zero scalar α we have A(αx) = αAx = αλx = λ(αx).

3. Notice that Ax = λx if and only if Āx̄ = λ̄x̄. Therefore λ ∈ σ(A) if and only if λ̄ ∈ σ(Ā). If A ∈ Mn (R),
then A = Ā so that λ ∈ σ(A) if and only if λ̄ ∈ σ(A).

4. The eigenvectors associated with the eigenvalue 5 reside in the null space of the matrix

   [ −2  2 ]
   [ −4  4 ]

   This null space is spanned by the vector x = (1, 1)′.

5. e = (1, 1)′ and x1 = (−1, 1)′ are eigenvectors associated with eigenvalues 2 and 0, respectively:

   [ 1  1 ] [ 1 ]   [ 2 ]       [ 1 ]
   [ 1  1 ] [ 1 ] = [ 2 ]  =  2 [ 1 ]

   [ 1  1 ] [ −1 ]   [ 0 ]       [ −1 ]
   [ 1  1 ] [  1 ] = [ 0 ]  =  0 [  1 ]

   To see that e and x1 are linearly independent, observe that the only solution to the system

   [ 1   1 ] [ a1 ]   [ 0 ]
   [ 1  −1 ] [ a2 ] = [ 0 ]

   is a1 = a2 = 0.

For the general case, observe that

Jn x = ⟨x, e⟩ e

This equation tells us that e is always an eigenvector associated with the eigenvalue n (since Jn e = ⟨e, e⟩e = ne). Moreover, it
tells us that any x orthogonal to e is an eigenvector associated with the eigenvalue 0.
Since ⟨e, xk⟩ = ⟨e, e⟩ − n⟨e, ek⟩ = n − n = 0, each xk is such an eigenvector. Finally, it is easy to verify
using Gaussian elimination that the only solution to the system

[ e  x1  · · ·  xn−1 ] (a1, a2, · · · , an)′ = 0,

where the coefficient matrix has every entry equal to 1 except for a single entry 1 − n in each of its last n − 1 columns,
is a1 = a2 = · · · = an = 0. Thus {e, x1, · · · , xn−1} is linearly independent.
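A short numerical illustration of this eigenstructure (an assumption here is xk = e − n ek, which is consistent with ⟨e, xk⟩ = n − n = 0 used above):

    import numpy as np

    # Illustrative check of Exercise 5: J_n has eigenvalue n (eigenvector e)
    # and eigenvalue 0 on the orthogonal complement of e.
    n = 6
    J = np.ones((n, n))
    e = np.ones(n)

    assert np.allclose(J @ e, n * e)                     # J e = n e

    X = [e - n * np.eye(n)[:, k] for k in range(1, n)]   # x_1, ..., x_{n-1}
    for x in X:
        assert np.allclose(J @ x, 0)                     # J x_k = <x_k, e> e = 0

    # {e, x_1, ..., x_{n-1}} is linearly independent.
    M = np.column_stack([e] + X)
    assert np.linalg.matrix_rank(M) == n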

6. In the previous exercise, we showed that J3 e = 3e and J3 x1 = J3 x2 = 0. Thus Ae = 4e − J3 e = (4 − 3)e = e, so 1 is an eigenvalue associated with the eigenvector e. We also have Axi = 4xi − J3 xi = 4xi
   for i = 1, 2, so 4 is an eigenvalue associated with the eigenvectors x1 and x2.

7. Suppose σ(A) = {−1, 1}. Theorem 1.1.6 implies that λ2 is an eigenvalue of A2 for each λ ∈ σ(A), so
we must have 1 ∈ σ(A2 ). Theorem 1.1.6 also implies that if µ ∈ σ(A2 ), then there exists λ ∈ σ(A)
such that µ = λ2 . But since 12 = (−1)2 = 1, we must have that µ = 1. Thus σ(A2 ) = {1}.

8. Observe that A2 = 0. Thus Ae1 = 0 = A2 e1, so e1 is an eigenvector associated with the eigenvalue λ = 0 for both A and A2. Notice that A2 e2 = 0 but Ae2 = e1, so e2 is an eigenvector of A2 but not of A.

Notice that the eigenvalue equation for A simplifies to

Ax = [ x2 ]  =  [ λx1 ]
     [ 0  ]     [ λx2 ]

If λ ≠ 0, these equations imply x1 = x2 = 0. But then (x1, x2)′ = 0, which is not an eigenvector.
Thus the only eigenvalue of A is λ = 0. When λ = 0, the first equation implies x2 = 0. Thus only
scalar multiples of e1 are eigenvectors of A, with eigenvalue λ = 0.

The reason that an eigenvector of p(A) is not necessarily an eigenvector of A is that, writing p(A) − µI = (A − β1 I) · · · (A − βk I),

nullspace(p(A) − µI) = nullspace((A − β1 I) · · · (A − βk I)) ⊇ nullspace(A − β1 I) ∪ · · · ∪ nullspace(A − βk I)

where the set inclusion can be strict. In the above example, p(A) = A2 and µ = 0. We can then factor
p(A) − µI = (A − 0I)(A − 0I), but notice that nullspace(p(A) − µI) = C2 while nullspace(A − 0I) =
span({e1}).

1.1.P1

If A is non-singular and Ax = λx for x ≠ 0, then λ ≠ 0 (otherwise Ax = 0 with x ≠ 0 would make A singular), and A−1 x = λ−1 A−1 (λx) = λ−1 A−1 Ax = λ−1 x, so λ−1 ∈ σ(A−1).

1.1.P2
(a) ∑_{j=1}^n Aij = 1 for all i ⇐⇒ Ae = e ⇐⇒ e is an eigenvector of A associated with eigenvalue 1.

(b) Since A is nonsingular, λ−1 = 1 is an eigenvalue of A−1 associated with eigenvector e (1.1.P1). But this
implies ∑_{j=1}^n (A−1)ij = 1 for every i. Moreover, by Theorem 1.1.6, p(1) is an eigenvalue of p(A) associated with e. But
this implies the sum of the elements in each row of p(A) equals p(1).
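A quick numerical sketch of part (b) (illustrative only; the matrix and polynomial below are arbitrary examples with unit row sums, not taken from the text):

    import numpy as np

    # Sketch for 1.1.P2: if every row of A sums to 1 and A is nonsingular, then
    # every row of A^{-1} sums to 1, and every row of p(A) sums to p(1).
    A = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.4, 0.4, 0.2]])
    assert np.allclose(A.sum(axis=1), 1.0)

    A_inv = np.linalg.inv(A)
    print(A_inv.sum(axis=1))          # each row sums to 1

    # Example polynomial p(t) = 2t^2 - 3t + 5, so p(1) = 4.
    p = lambda M: 2 * np.linalg.matrix_power(M, 2) - 3 * M + 5 * np.eye(3)
    print(p(A).sum(axis=1))           # each row sums to p(1) = 4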

1.1.P3

Write x = u + iv with u, v ∈ Rn, and suppose A ∈ Mn(R) and λ ∈ R. Consider the eigenvalue equation

Ax = λx
Au + iAv = λu + iλv

Since two complex vectors are equal if and only if their real and imaginary parts are equal, we must have
Au = λu and Av = λv. If both u and v are zero, then x = 0 and therefore cannot be an eigenvector. Thus
at least one of u, v is a real eigenvector, so any real eigenvalue of a real matrix has a real eigenvector associated with it.
Conversely, if A is real and x is a real eigenvector (v = 0), then λx = Ax is real and x ≠ 0, so the associated eigenvalue λ must be real.
This can fail when A is not real: if A = diag(i, 1), then the real vector e1 is an eigenvector associated with the non-real eigenvalue i.

1.1.P4

Partition x = (x1′, x2′)′ so that it is conformable with the blocks in A. Then we have

Ax = [ A11 x1 ]
     [ A22 x2 ]

If Ax = λx, then A11 x1 = λx1 and A22 x2 = λx2. If x is non-zero, this then implies λ is either an eigenvalue
of A11 or of A22. If A11 x1 = λx1 and x1 ≠ 0, then (x1′, 0′)′ is an eigenvector of A associated with λ. Likewise, if
A22 x2 = λx2 and x2 ≠ 0, then (0′, x2′)′ is an eigenvector of A associated with λ.

1.1.P5

If Ax = λx and x 6= 0, then λ2 x = A2 x = Ax = λx. Since x 6= 0, there exists at least one nonzero entry, say
xk , of x. The previous equation then implies λ2 xk = λxk , and since xk 6= 0 we must have λ2 = λ. This then
implies λ either equals 0 or 1.

If A is non-singular, then there exist y1, · · · , yn such that Ayj = ej for j = 1, · · · , n. But ej = Ayj =
A2 yj = Aej = A·,j, where A·,j is the j-th column of A. Thus every column of A equals the corresponding column of I, so A = I.

1.1.P6

Let k denote the index of nilpotence for the matrix A. If Ax = λx and x 6= 0, then 0 = Ak x = λk x. But
since some entry of x is nonzero, this equation implies λk = 0. Thus we must have λ = 0.

The matrix

A = [ 0  1 ]
    [ 0  0 ]

is nilpotent of index 2.

If A is idempotent and nilpotent, then A = Ak = 0.

1.1.P7

Suppose Ax = λx for some x ≠ 0 and A = A∗. We then know that Āx̄ = λ̄x̄, which means that x∗A∗ = λ̄x∗, i.e. x∗A = λ̄x∗. Multiplying
by x on the right and dividing by x∗x ≠ 0, we obtain

λ̄ = (x∗Ax)/(x∗x) = λ(x∗x)/(x∗x) = λ.

Since λ̄ = λ, λ must be real.

1.1.P8

The proof of Theorem 1.1.9 hinges on the ability to factor a polynomial equation. However, many polynomials
have complex roots and therefore can only be factored using complex numbers. It is therefore not possible
to guarantee that the eigenvalue found in Theorem 1.1.6 will be real.

1.1.P9

Notice that

Ax = [ x2  ]
     [ −x1 ]
Suppose x is an eigenvector of A associated with eigenvalue λ. We must then have λx1 = x2 and λx2 = −x1 .
If λ = 0, these equations imply x = 0. This contradicts the assumption that x is an eigenvector, so λ 6= 0.
If x1 = 0, then the first equation implies x2 = 0. This again contradicts the assumption that x is an
eigenvector, so x1 6= 0. Combining the equations gives us λ2 x1 = −x1 . Since x1 6= 0, this equation implies
λ2 = −1. Thus λ = i or λ = −i.

1.1.P10

Let x = {x1 , x2 , · · · } and y = {y1 , y2 , · · · } be two elements of V . Then

S(ax + by) = S({ax1 + by1 , ax2 + by2 , · · · })


= {0, ax1 + by1 , ax2 + by2 , · · · }
= a{0, x1 , x2 , · · · } + b{0, y1 , y2 , · · · }
= aS(x) + bS(y)

Therefore S is a linear transformation on V .

If S(x) = λx, then {0, x1 , x2 , · · · } = {λx1 , λx2 , · · · }. If λ = 0, the n-th term in this sequence equality implies
xn = 0. If λ 6= 0, the first term in the sequence equality implies x1 = 0. But if xn−1 = 0, then the n-th term
in the sequence equality implies xn = 0. Thus x = 0 by induction.

1.1.P11

The equation (A − λI) adj(A − λI) = 0 implies that any nonzero column of adj(A − λI) is an eigenvector
associated with the eigenvalue λ.

If rank(A − λI) = n − 1, then some (n − 1) × (n − 1) minor of A − λI is non-zero, so at least one element of adj(A − λI) is non-zero. Thus rank(adj(A − λI)) ≥ 1.
On the other hand,

adj(A − λI)(A − λI) = det(A − λI)I = 0

so nullspace(adj(A − λI)) contains the range of A − λI, which has dimension n − 1. The rank-nullity theorem then implies rank(adj(A − λI)) ≤ 1, and hence rank(adj(A − λI)) = 1. The result then follows because any rank 1 matrix can be decomposed into the product xy∗ of two nonzero
vectors.

If rank(A − λI) ≤ n − 2, then every (n − 1) × (n − 1) submatrix of A − λI has a determinant of zero. This implies
adj(A − λI) = 0, so choose y∗ = 0. Thus the result is really only useful when rank(A − λI) = n − 1.

1.1.P12

For the matrix A, we have

adj(A − λI) = [ d − λ   −b    ]
              [ −c      a − λ ]

According to 1.1.P11, rank(adj(A − λI)) ≤ 1. This implies one of the columns must be a scalar multiple of
the other. We also showed in 1.1.P11 that any nonzero column of adj(A − λI) must be an eigenvector of A
associated with the eigenvalue λ. For the matrix in 1.1.4a, we have

adj(A − 3I) = [ −2  2 ]        adj(A − 5I) = [ −4  2 ]
              [ −4  4 ]                      [ −4  2 ]

Therefore (2, 4)′ is an eigenvector associated with the eigenvalue 3 and (2, 2)′ is an eigenvector associated
with the eigenvalue 5.
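A numerical check of this claim; the matrix used below is my reconstruction of Example 1.1.4a from the adjugates displayed above (A = [[7, −2], [4, 1]]), so treat it as an assumption:

    import numpy as np

    # Check of 1.1.P11/1.1.P12: nonzero columns of adj(A - lambda*I) are
    # eigenvectors of A for the eigenvalue lambda.
    A = np.array([[7.0, -2.0],
                  [4.0,  1.0]])          # assumed reconstruction of 1.1.4a

    def adj2(M):
        # adjugate of [[a, b], [c, d]] is [[d, -b], [-c, a]]
        return np.array([[M[1, 1], -M[0, 1]],
                         [-M[1, 0], M[0, 0]]])

    for lam in (3.0, 5.0):
        B = adj2(A - lam * np.eye(2))
        v = B[:, 1]                      # a nonzero column, (2, 4)' or (2, 2)'
        assert np.allclose(A @ v, lam * v)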

1.1.P13

Suppose λ ≠ 0. Then we have

adj(A)x = adj(A)(1/λ)(λx) = (1/λ) adj(A)Ax = (det A/λ) x

Thus x is an eigenvector of adj A associated with eigenvalue det A/λ. If λ = 0, then rank(A) ≤ n − 1. If
rank(A) ≤ n − 2, adj A = 0, so all vectors in Cn are eigenvectors of adj A associated with the eigenvalue 0.
If rank(A) = n − 1, then the result of 1.1.P11 implies that adj(A) = zy∗ where z is in the null space of A.
But since dim(nullspace(A)) = 1, any non-zero vector satisfying Ax = 0 must equal αz for some scalar α.
Thus we have

adj(A)x = zy∗(αz) = ⟨z, y⟩(αz) = ⟨z, y⟩x

so x is an eigenvector of adj(A) associated with the eigenvalue ⟨z, y⟩.

Section 1.2

Exercises

1. Using properties of the determinant, we have

det(A − tI) = det((−1)(tI − A)) = (−1)^n det(tI − A) = (−1)^n (t^n + · · · )

This expression implies det(A − tI) = 0 ⇐⇒ det(tI − A) = 0.

2. From the definition of the characteristic polynomial, we have

   pA(t) = det [ t − a   −b    ]  = (t − a)(t − d) − bc = t^2 − (a + d)t + ad − bc
               [ −c      t − d ]

λ is an eigenvalue of A if and only if pA(λ) = 0. From the quadratic formula, we know that the zeros
of pA(t) are given by

λ1 = (a + d + √((a + d)^2 − 4(ad − bc)))/2 = (a + d + √r)/2
λ2 = (a + d − √((a + d)^2 − 4(ad − bc)))/2 = (a + d − √r)/2

where r = (a + d)^2 − 4(ad − bc) = (a − d)^2 + 4bc. Notice that

λ1 + λ2 = a + d = tr A
λ1 λ2 = ((a + d)^2 − r)/4 = ad − bc = det A

Also observe that λ1 − λ2 = √r, so λ1 − λ2 = 0 if and only if r = 0. If A ∈ M2(R), then

(a) a + d is a real number, so λ1 and λ2 are non-real ⇐⇒ √r is non-real ⇐⇒ r < 0.

(b) If bc ≥ 0, then r = (a − d)^2 + 4bc ≥ 0. By part (a), we must have that λ1 and λ2 are real.

(c) If r < 0, then √r = i√(−r). Thus λ2 = (a + d − i√(−r))/2, which is the complex conjugate of (a + d + i√(−r))/2 = (a + d + √r)/2 = λ1, so λ2 = λ̄1.
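A numerical sanity check of these formulas for a random 2 × 2 matrix (illustrative sketch, not part of the printed solution):

    import numpy as np

    # Check of Exercise 2: eigenvalues of a 2x2 matrix via the quadratic
    # formula, together with the trace and determinant identities.
    rng = np.random.default_rng(1)
    a, b, c, d = rng.standard_normal(4)
    A = np.array([[a, b], [c, d]])

    r = (a + d) ** 2 - 4 * (a * d - b * c)           # discriminant
    sqrt_r = np.sqrt(complex(r))
    lam1 = (a + d + sqrt_r) / 2
    lam2 = (a + d - sqrt_r) / 2

    evals = np.sort_complex(np.linalg.eigvals(A))
    ours = np.sort_complex(np.array([lam1, lam2]))
    assert np.allclose(evals, ours)
    assert np.isclose(lam1 + lam2, np.trace(A))
    assert np.isclose(lam1 * lam2, np.linalg.det(A))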

3. (a) The coefficients of pA(t) are real because they can be written as sums of products of the (real) elements of A
   using Leibniz's formula for the determinant.

(b) Suppose pA(λ) = 0. Since the coefficients of pA are real, pA(λ̄) equals the complex conjugate of pA(λ), which is 0. Thus λ̄ is also an eigenvalue of A. We can
therefore write pA(t) = (t − λ)(t − λ̄)g(t) where g(t) is a polynomial of degree n − 2 with real coefficients. If λ is a
root of g(t), λ̄ is a root of g and so we can again factor t − λ and t − λ̄ out of g(t). We can
continue in this way until the polynomial is completely factored or λ is no longer a root of the
quotient polynomial. But if λ is no longer a root of the quotient polynomial, then λ̄ cannot be
either. Thus λ and λ̄ must have the same algebraic multiplicities.

4. For each λ ∈ σ(A), we have |λ| ≤ ρ(A). Thus λ ∈ {z ∈ C : |z| ≤ ρ(A)}.

5. The smallest non-zero eigenvalue must have positive length, so min{|λ| : λ ∈ σ(A), λ 6= 0} > 0.

6. If σ(A) were empty, then min{|λ| : λ ∈ σ(A), λ 6= 0} and ρ(A) are not well-defined. If σ(A) were
infinite, then min{|λ| : λ ∈ σ(A), λ 6= 0} and max{|λ| : λ ∈ σ(A)} might not exist.

7. Observe that

   tI − T = [ t − t11   ···   −t1n    ]
            [           ⋱     ⋮       ]
            [ 0               t − tnn ]

   Repeatedly expanding on the first column gives us pT(t) = det(tI − T) = (t − t11) · · · (t − tnn). If T is
   lower triangular, then T′ is upper triangular, so we have pT(t) = det(tI − T) = det((tI − T)′) =
   det(tI − T′) = (t − t11) · · · (t − tnn) by the previous result. If T is diagonal, then it is upper triangular
   and so we can apply the previous result.

8. Observe that

   tIn − A = [ tI1 − A11   ···   ⋆         ]
             [              ⋱    ⋮         ]
             [ 0                 tIk − Akk ]

   where Ajj ∈ Mnj and ∑_{j=1}^k nj = n. Thus we have

   pA(t) = det(tIn − A)
         = det(tI1 − A11) · · · det(tIk − Akk)
         = pA11(t) · · · pAkk(t)
         = (t − λ1,1) · · · (t − λn1,1) · · · (t − λ1,k) · · · (t − λnk,k)

   where λ1,j, · · · , λnj,j denote the eigenvalues of Ajj.

9. From the definition, we have

   S1(λ1, · · · , λn) = ∑_{i=1}^n λi        Sn(λ1, · · · , λn) = ∏_{i=1}^n λi

   The summation in the definition of Sk is over the (n choose k) combinations of k elements of the list
   {λ1, · · · , λn}. Reordering or rearranging the elements of the list does not change the combinations
   which one can form.

1.2.P1

Let λ1, · · · , λn denote the eigenvalues of A. We therefore know that

∏_{i=1}^n λi = Sn(A) = En(A) = det A

Thus det A = 0 if and only if λi = 0 for some i. But this means A is singular if and only if 0 ∈ σ(A).

1.2.P2

From the definition of trace, we have

tr AB = ∑_{i=1}^m Ai· B·i = ∑_{i=1}^m ∑_{j=1}^n aij bji = ∑_{j=1}^n ∑_{i=1}^m bji aij = ∑_{j=1}^n Bj· A·j = tr BA

where Ai· and A·j denote the i-th row and j-th column of the matrix A, respectively, and where aij denotes the
i, j-th element of the matrix A (Bi·, B·j, and bij are analogously defined).

This result implies

tr A = tr AI = tr(ASS −1 ) = tr(S −1 AS)

We also have

det A = det A det I = det A det(SS −1 ) = det A det S det S −1 = det S −1 det A det S = det(S −1 AS)

1.2.P3

Let D = diag(d1 , · · · , dn ). Then we have

pD (t) = det(tI − diag(d1 , · · · , dn )) = det(diag(t − d1 , · · · , t − dn )) = (t − d1 ) · · · (t − dn )

Evaluating this polynomial at the matrix D yields

pD(D) = (D − d1 I) · · · (D − dn I) = diag(0, d2 − d1, · · · , dn − d1) · · · diag(d1 − dn, · · · , dn−1 − dn, 0) = 0

since the j-th factor has a zero in its j-th diagonal position, so every diagonal entry of the product vanishes.

1.2.P4
From 1.1.P5, we know that σ(A) ⊆ {0, 1}. Thus ∏_{j=1}^k λij ∈ {0, 1} for any {i1, · · · , ik} ⊆ {1, · · · , n}. But
this implies Sk(A) is an integer for all k. Since (−1)^k Sk(A) is the coefficient on t^{n−k} in pA(t), all the coefficients
in pA(t) must be integers.

1.2.P5
From 1.1.P6, we have that σ(A) = {0}. But this implies tr A = ∑_{i=1}^n λi = 0. We also have that Sk(A) = 0
for all k, so 1.2.15 implies that pA(t) = t^n.

1.2.P6

Since det(A − λI) = 0, λ must be an eigenvalue of A. However, λ does not necessarily have multiplicity 1.
For example, consider the matrix

A = [ 2  0  0 ]
    [ 0  1  1 ]
    [ 0  0  1 ]

The characteristic polynomial of A is given by pA(t) = (t − 2)(t − 1)^2, so λ = 1 has a multiplicity of 2.
However, the matrix

A − I = [ 1  0  0 ]
        [ 0  0  1 ]
        [ 0  0  0 ]

has rank 2 = 3 − 1.

1.2.P7

The characteristic polynomial can be written as

p(t) = t5 − E1 t4 + E2 t3 − E3 t2 + E4 t − E5

where Ek, k = 1, · · · , 5, is the sum of the principal minors of size k. In this case, we have

E1 = 5
E2 = 6
E3 = −2
E4 = −3
E5 = 0

1.2.P8

From the definition of the characteristic polynomial, we have that

pA+λI (t) = det(tI − (A + λI)) = det((t − λ)I − A) = pA (t − λ) = (t − (λ + λ1 )) · · · (t − (λ + λn ))

Thus λ1 + λ, · · · , λn + λ are the zeros of pA+λI(t) and are therefore the eigenvalues of A + λI.

1.2.P9

S2 (λ1 , · · · , λ6 ) = λ1 λ2 + λ1 λ3 + λ1 λ4 + λ1 λ5 + λ1 λ6 +
λ2 λ3 + λ2 λ4 + λ2 λ5 + λ2 λ6 + λ3 λ4 + λ3 λ5 +
λ3 λ6 + λ4 λ5 + λ4 λ6 + λ5 λ6

1.2.P10

In an exercise, we showed that if λ is an eigenvalue of A, then λ̄ is an eigenvalue of Ā. Thus if A ∈ Mn(R),
the non-real eigenvalues must occur in conjugate pairs. If n is odd, then there must be at least one eigenvalue which
satisfies λ = λ̄, i.e. A must have at least one real eigenvalue.

1.2.P11

Let B = {v1, · · · , vn} be a basis for V. For any v ∈ V, let pB(v) ∈ Cn denote the coordinates of v with
respect to the basis B. Define the matrix A ∈ Mn(C) as

A = [ pB(Tv1)  · · ·  pB(Tvn) ]

and recall that pB(Tv) = ApB(v) for all v ∈ V. By Theorem 1.1.9, A must have an eigenvalue λ. Let x
denote an eigenvector associated with λ and define v = pB^{−1}(x). Then

Tv = pB^{−1}(ApB(v)) = pB^{−1}(Ax) = pB^{−1}(λx) = λ pB^{−1}(x) = λv

Thus λ is an eigenvalue of T.

Also observe that if λ is an eigenvalue of T , we must have

λpB (v) = pB (λv) = pB (T v) = ApB (v)

so that λ is an eigenvalue of the matrix A. Thus all the eigenvalues of T are also eigenvalues of the matrix A.
Notice that it is easy to construct a real matrix which has only non-real eigenvalues. For example, the eigenvalues
of

A = [ 0  −1 ]
    [ 1   0 ]

are i and −i. If we take V = R2, B = {e1, e2}, F = R, and let Te1 = e2 and Te2 = −e1, then there does not
exist λ ∈ F which satisfies Tv = λv for some nonzero v ∈ V. This implies T does not have any eigenvalues.

See 1.1.P10 for an example of a linear transformation on an infinite dimensional complex vector space that
has no eigenvalues.

1.2.P12

Using Cauchy's expansion, we have

pA(t) = det [ tIn   −x    ]  = (t − a) det(tIn) − y∗(adj tIn)x = (t − a)t^n − t^{n−1} y∗x = t^{n−1}(t^2 − at − y∗x)
            [ −y∗   t − a ]

Alternatively, observe that the span of the first n column vectors of A has dimension at most 1, so the rank of A
can be at most 2. This implies all minors of A of size 3 or greater must be zero, so E3(A) = E4(A) = · · · =
En+1(A) = 0. Let Aij, i < j, be the principal submatrix defined by

Aij = [ aii  aij ]
      [ aji  ajj ]

where aij is the i, j-th element of A. Then Aij = 0 unless j = n + 1. Since det Ai,n+1 = −ȳi xi, we have that
E2(A) = ∑_{1≤i<j≤n+1} det Aij = ∑_{i=1}^n det Ai,n+1 = −∑_{i=1}^n ȳi xi = −y∗x. Thus we have

pA(t) = t^{n+1} − E1(A)t^n + E2(A)t^{n−1} − · · · + (−1)^{n+1} En+1(A)
      = t^{n+1} − at^n − y∗x t^{n−1}
      = t^{n−1}(t^2 − at − y∗x)

Using the quadratic formula, we can factor the quadratic term in pA(t) to obtain

pA(t) = t^{n−1} (t − (1/2)(a + √(a^2 + 4y∗x))) (t − (1/2)(a − √(a^2 + 4y∗x)))
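A numerical check of this characteristic polynomial (an illustrative sketch; the bordered matrix is built directly from random x and y):

    import numpy as np

    # Check of 1.2.P12: the (n+1)x(n+1) bordered matrix [[0, x], [y*, a]] has
    # characteristic polynomial t^{n-1} (t^2 - a t - y*x).
    rng = np.random.default_rng(2)
    n = 5
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    a = 1.7

    A = np.zeros((n + 1, n + 1), dtype=complex)
    A[:n, n] = x               # last column is x
    A[n, :n] = y.conj()        # last row is y*
    A[n, n] = a

    ystar_x = y.conj() @ x
    # Coefficients of t^{n+1} - a t^n - (y*x) t^{n-1}, highest degree first.
    expected = np.concatenate(([1, -a, -ystar_x], np.zeros(n - 1)))
    assert np.allclose(np.poly(A), expected)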

1.2.P13

(a) Using Cauchy's expansion, we obtain

pA(t) = det [ tIn − B   −x    ]
            [ −y∗       t − a ]
      = (t − a) det(tIn − B) − y∗(adj(tIn − B))x
      = (t − a)pB(t) − y∗(adj(tIn − B))x

(b) If B = λIn, then pB(t) = (t − λ)^n and adj(tIn − B) = (t − λ)^{n−1} In. Plugging into the above formula,
we obtain

pA(t) = (t − a)(t − λ)^n − (t − λ)^{n−1} y∗x = (t − λ)^{n−1}(t^2 − (a + λ)t + aλ − y∗x)

Using the quadratic formula, we can factor the quadratic term in pA(t) to obtain the other eigenvalues:

(1/2)(a + λ ± √((a + λ)^2 − 4(aλ − y∗x))) = (1/2)(a + λ ± √((a − λ)^2 + 4y∗x))

1.2.P14

pA(t) = det [ t − λ   ⋆       ⋆         ]
            [ 0       t − µ   0         ]
            [ 0       ⋆       tIn−2 − B ]

      = (t − λ) det [ t − µ   0         ]
                    [ ⋆       tIn−2 − B ]

      = (t − λ)(t − µ) det(tIn−2 − B)

      = (t − λ)(t − µ) pB(t)

1.2.P15

W 0 (t) = tr((adj X(t))X 0 (t))


= tr((adj X(t))A(t)X(t))
= tr(A(t)X(t)(adj X(t)))
= det(X(t)) tr(A(t))
= W (t) tr(A(t))

where the first equality follows from 0.8.10.1, the second equality from the assumption about X′(t), the third
equality from the fact that tr(AB) = tr(BA) when A ∈ Mm,n and B ∈ Mn,m, the fourth equality from 0.8.2.1, and the
final equality from the definition of W(t).

The solution of this differential equation is given by

W(t) = W(t0) exp(∫_{t0}^{t} tr A(s) ds)

If x1 (t0 ), · · · , xn (t0 ) are linearly independent, then W (t0 ) = det X(t0 ) 6= 0. The above expression for W (t)
then implies that W (t) = det X(t) 6= 0 for all t, so x1 (t), · · · , xn (t) are linearly independent for all t.

1.2.P16

From 0.8.5.11, we have

f (t) = det(A + txy T ) = det A + (y T (adj A)x) · t

Thus for t1 ≠ t2, we have

(t2 f(t1) − t1 f(t2))/(t2 − t1) = (t2(det A + (y^T(adj A)x)·t1) − t1(det A + (y^T(adj A)x)·t2))/(t2 − t1)
                               = ((t2 − t1) det A)/(t2 − t1)
                               = det A

If x = y = e, we have that

f(−b) = det(A − bee^T)
      = det [ d1 − b   0        ···      0      ]
            [ c − b    d2 − b            ⋮      ]
            [ ⋮                  ⋱       0      ]
            [ c − b    ···       c − b   dn − b ]
      = (d1 − b) · · · (dn − b)
      = q(b)

Likewise, it is easy to show that f(−c) = q(c). Plugging into our previous formula, we have

det A = (−c f(−b) + b f(−c))/(−c + b)
      = (b q(c) − c q(b))/(b − c)
      = q(b) − b (q(b) − q(c))/(b − c)

if b ≠ c. Since q is a differentiable function, taking the limit of the above expression as c approaches b yields

det A = q(b) − b q′(b)

If d1 = · · · = dn = 0, define q(s) = (t − s)^n. Using the above expressions, we can write

pA(t) = det(tI − A)
      = det [ t      −b    ···   −b ]
            [ −c     t           ⋮  ]
            [ ⋮             ⋱    −b ]
            [ −c     ···   −c    t  ]
      = (−b q(−c) + c q(−b))/(−b + c)
      = (c(t + b)^n − b(t + c)^n)/(c − b)

if b ≠ c, and

pA(t) = q(−b) + b q′(−b)
      = (t + b)^n − nb(t + b)^{n−1}
      = (t + b)^{n−1}(t − (n − 1)b)

if b = c.

1.2.P17

Using the partitioned determinant formula, we have

pC(t) = det [ tIn   −A  ]
            [ −B    tIn ]
      = det(tIn) det(tIn − t^{−1}BA)
      = det(t^2 In − BA)
      = pBA(t^2)

We can also write

pC(t) = det [ 0   In ] det [ tIn   −B  ] det [ 0   In ]
            [ In  0  ]     [ −A    tIn ]     [ In  0  ]
      = det(tIn) det(tIn − t^{−1}AB)
      = det(t^2 In − AB)
      = pAB(t^2)

Therefore pBA(t^2) = pC(t) = pAB(t^2) for every t, so pAB and pBA are the same polynomial; in particular BA and AB have the same eigenvalues. Since their characteristic polynomials are the same, the constant terms and the coefficients on t^{n−1}
are the same. But this means that tr AB = tr BA and det AB = det BA. If we replace A with −A in the
above expressions, we have

det(I + AB) = p−AB(1) = pC(1) = p−BA(1) = det(I + BA)
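An illustrative numerical check of these identities for random square A and B (not part of the original solution):

    import numpy as np

    # Check of 1.2.P17: AB and BA have the same characteristic polynomial,
    # hence the same trace and determinant, and det(I + AB) = det(I + BA).
    rng = np.random.default_rng(3)
    n = 4
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n))

    assert np.allclose(np.poly(A @ B), np.poly(B @ A))      # same char. poly
    assert np.isclose(np.trace(A @ B), np.trace(B @ A))
    assert np.isclose(np.linalg.det(A @ B), np.linalg.det(B @ A))
    assert np.isclose(np.linalg.det(np.eye(n) + A @ B),
                      np.linalg.det(np.eye(n) + B @ A))

    # The block matrix C = [[0, A], [B, 0]] satisfies p_C(t) = p_{AB}(t^2).
    C = np.block([[np.zeros((n, n)), A], [B, np.zeros((n, n))]])
    t = 0.37
    assert np.isclose(np.polyval(np.poly(C), t),
                      np.polyval(np.poly(A @ B), t ** 2))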

1.2.P18

When A ∈ M3, we have that

E2(A) = det A[(1, 2)] + det A[(1, 3)] + det A[(2, 3)]
      = det [ a11  a12 ] + det [ a11  a13 ] + det [ a22  a23 ]
            [ a21  a22 ]       [ a31  a33 ]       [ a32  a33 ]
      = tr adj A

The result then follows from 1.2.13.

1.2.P19

We know that det A = λ1 · · · λn. If λi is positive for all i, then det A must also be positive. Thus we have

n ≥ tr A = λ1 + · · · + λn = n · (1/n)(λ1 + · · · + λn) ≥ n(λ1 · · · λn)^{1/n} = n(det A)^{1/n} ≥ n

where the first inequality follows because all the entries in A are either 0 or 1 (so each diagonal entry is at most 1), the
next equality follows because tr A is the sum of the eigenvalues, the middle inequality follows from the AM–GM inequality, the next equality follows
because the determinant is equal to the product of the eigenvalues, and the last inequality follows because det A is
a positive integer. This expression implies that tr A = n, which means that aii = 1 for all i because all the elements
of A are either 0 or 1. The expression also implies n(det A)^{1/n} = n, which means that det A = 1. Finally,
because (1/n)(λ1 + · · · + λn) = (λ1 · · · λn)^{1/n} if and only if λ1 = · · · = λn (AM–GM inequality), we must have
λi = 1 for all i.

1.2.P20

From 1.2.13, we have

det(I + A) = p−A(1) = 1 − E1(−A) + E2(−A) − · · · + (−1)^n En(−A)

Now Ek(−A) = (−1)^k Ek(A), so

det(I + A) = 1 + E1(A) + E2(A) + · · · + En(A)

1.2.P21

Observe that
(tI − cA)x = (t − cλ)x

which implies that

det(tI − cA)x = adj(tI − cA)(tI − cA)x = (t − cλ) adj(tI − cA)x (1)

We then have

(t − cλ)pA(c) (t) = (t − cλ) det(tI − cA − (1 − c)λxv ∗ )


= (t − cλ) det(tI − cA) − (t − cλ)(1 − c)λv ∗ adj(tI − cA)x
= (t − cλ) det(tI − cA) − (1 − c)λ det(tI − cA)v ∗ x
= (t − λ) det(tI − cA)
= (t − λ)pcA (t)

where the first equality follows from the definition of the characteristic polynomial, the second equality from
0.8.5.11, the third equality from equation (1), the fourth equality from the assumption that v ∗ x = 1, and
the final equality from the definition of the characteristic polynomial. The zeros of the polynomial on the
left-hand side are the n eigenvalues of A(c) along with cλ. The zeros of the polynomial on the right-hand
side are the n eigenvalues of cA along with λ. If c = 0, pcA(t) = t^n, so that pA(c)(t) = t^{n−1}(t − λ). If c ≠ 0,
then pcA(t) = c^n pA(t/c), whose zeros occur at cλ1 = cλ, cλ2, · · · , cλn. Thus in either case the zeros
of pA(c)(t) occur at λ, cλ2, · · · , cλn.

1.2.P22

We have that

pCn(ε)(t) = det [ t    −1                  ]
                [      t    −1             ]
                [           ⋱    ⋱        ]
                [                t    −1   ]
                [ −ε                  t    ]
          = t^n + (−1)^{n+1}(−ε)(−1)^{n−1}
          = t^n − ε

Observe that for any k ∈ {0, 1, · · · , n − 1}, we have

pCn(ε)(ε^{1/n} e^{2πik/n}) = ε e^{2πik} − ε = ε(cos(2πk) + i sin(2πk) − 1) = 0

Since there are n distinct elements in the set {ε^{1/n} e^{2πik/n} : k ∈ {0, 1, · · · , n − 1}}, they must be the eigenvalues of Cn(ε). By Observation 1.1.8, we also must have that

σ(I + Cn(ε)) = {1 + ε^{1/n} e^{2πik/n} : k ∈ {0, 1, · · · , n − 1}}

Define λk = 1 + ε^{1/n} e^{2πik/n}. Then

|λk| = |1 + ε^{1/n} cos(2πk/n) + i ε^{1/n} sin(2πk/n)|
     = √((1 + ε^{1/n} cos(2πk/n))^2 + (ε^{1/n} sin(2πk/n))^2)
     = √(1 + 2ε^{1/n} cos(2πk/n) + ε^{2/n})
     ≤ √(1 + 2ε^{1/n} + ε^{2/n})
     = √((1 + ε^{1/n})^2)
     = 1 + ε^{1/n}

where the inequality can be replaced with an equality for k = 0. This then implies ρ(I + Cn(ε)) = 1 + ε^{1/n}.
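A numerical check of this spectrum (I take Cn(ε) to be the matrix with 1's on the superdiagonal and ε in the bottom-left corner, consistent with the determinant displayed above, and ε > 0):

    import numpy as np

    # Check of 1.2.P22: C_n(eps) has eigenvalues eps^{1/n} e^{2 pi i k / n},
    # and rho(I + C_n(eps)) = 1 + eps^{1/n}.
    n, eps = 7, 0.01
    C = np.diag(np.ones(n - 1), k=1)
    C[n - 1, 0] = eps

    evals = np.linalg.eigvals(C)
    assert np.allclose(np.poly(C), [1] + [0] * (n - 1) + [-eps])  # t^n - eps
    assert np.allclose(evals ** n, eps)                           # n-th roots of eps

    rho = np.abs(np.linalg.eigvals(np.eye(n) + C)).max()
    assert np.isclose(rho, 1 + eps ** (1 / n))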

1.2.P23

From Theorem 1.2.16, we know that Sn−1(A) = En−1(A). Since A is singular, 0 is an eigenvalue of A. Since
all the eigenvalues of A are distinct, the rest of the eigenvalues are non-zero. Let λ2, · · · , λn denote the
nonzero eigenvalues of A. Then

Sn−1(0, λ2, · · · , λn) = ∏_{j=2}^n λj ≠ 0

But this implies En−1(A) ≠ 0, so A must have at least one non-zero principal minor (of size n − 1).

Section 1.3

Exercises

1. (a) If A and B are similar, they have the same characteristic polynomial. This implies that the roots
of their characteristic polynomials are the same, so they share the same eigenvalues.

(b) This follows because A and B share the same eigenvalues and the eigenvalues of a diagonal matrix
are the elements on the main diagonal.

(c) If B = S −1 AS = 0, then A = SBS −1 = 0.

(d) If B = S −1 AS = I, then A = SBS −1 = I.

2. Assume that B = S −1 AS and suppose B k = S −1 Ak S. Then B k+1 = S −1 ASB k = S −1 ASS −1 Ak S =


S −1 Ak+1 S. Thus it follows by induction that B k = S −1 Ak S for k = 1, 2, · · · . Now assume that

q(t) = a0 + a1 t + · · · + ak tk . Then we have

q(B) = a0 I + a1 B + · · · + ak B k
= a0 S −1 S + a1 S −1 AS + · · · + ak S −1 Ak S
= S −1 (a0 I + a1 A + · · · + ak Ak )S
= S −1 q(A)S

In particular, this result holds for q(t) = α + t. This implies that αI + A is similar to αI + B.

3. If A = S −1 BS and C = S −1 DS, then

A + C = S −1 BS + S −1 DS = S −1 (B + D)S
AC = S −1 BSS −1 DS = S −1 BDS

4. Since S −1 AS and A share the same eigenvalues, Sk (S −1 AS) = Sk (A) for k = 1, · · · , n. Theorem 1.2.16
then implies Ek (S −1 AS) = Ek (A) for k = 1, · · · , n.

5. By 0.4.6(b), left or right multiplication by a nonsingular matrix leaves the rank of a matrix unchanged.
Thus if B = S −1 AS, then rank B = rank A.

6. Notice that pA(t) = t^2. Thus if A were similar to a diagonal matrix D, the elements on the diagonal
   of D would have to be zeros because pD(t) = t^2. Thus D = 0. But Corollary 1.3.4(c) implies that the
   only matrix similar to 0 is 0.

   Notice that nullspace A is spanned by (1, 0)′. Thus there is only a single linearly independent eigen-
   vector associated with the sole eigenvalue 0.

7. By Theorem 1.1.6, all eigenvectors of A are also eigenvectors of p(A). Thus if A has n linearly
   independent eigenvectors, so does p(A). The converse of Theorem 1.1.6 does not apply to eigenvectors,
   so it is not true that if p(A) is diagonalizable then so is A. For example, if

   A = [ 0  1 ]
       [ 0  0 ]

   then A^2 = 0. In this case A^2 is diagonalizable but A is not.

8. There can be at most n − m linearly independent eigenvectors associated with eigenvalues other than λ
(see next exercise). If rank(A − λI) > n − m, the rank-nullity theorem implies that nullspace(A − λI)
has dimension less than m. But this means there are fewer than m linearly independent eigenvectors
associated with λ, so there must be fewer than n linearly independent eigenvectors of A.
9. From Theorem 1.3.7, we know that A is similar to a matrix

   [ Λ  C ]
   [ 0  D ]

   where Λ = diag(λ, · · · , λ) is a k × k diagonal matrix. Because matrices that are similar share the same characteristic polynomials, we
   have that pA(t) = pΛ(t)pD(t) = (t − λ)^k pD(t). This implies that the algebraic multiplicity of λ must
   be at least k.

10. I is a diagonalizable matrix in which all eigenvalues are equal to 1.

11. The i, j-th entry of PAP^T is given by Pi· A (Pj·)^T, where Pi· is the i-th row vector of P. Since each row
    is equal to some ek^T, the i, i-th entry of PAP^T is equal to ek^T A ek = Akk, where Akk is the k-th
    entry of A on its principal diagonal. To obtain the permutation (Aσ(1)σ(1), · · · , Aσ(n)σ(n)) of the elements
    on the diagonal, one only needs to take the permutation matrix P whose i-th row is eσ(i)^T and form the
    product PAP^T.

12. Suppose AB = BA. Then (S −1 AS)(S −1 BS) = S −1 ABS = S −1 BAS = (S −1 BS)(S −1 AS). Con-
versely, suppose (S −1 AS)(S −1 BS) = (S −1 BS)(S −1 AS). Then S −1 ABS = S −1 BAS, so AB = BA.

13. This follows from the previous exercise because S −1 AS and S −1 BS are diagonal and therefore com-
mute.

14. If D = S−1AS for some diagonal matrix D, then the diagonal entries of D are the eigenvalues of A, all of which equal λ, so D = λI and A = SDS−1 = S(λI)S−1 = λI.

15. Suppose W is a one-dimensional A-invariant subspace of Cn . Since dim W = 1, there exists z ∈ Cn


such that for all w ∈ W we have w = αz for some α ∈ C. Pick w ∈ W with w 6= 0. Then there exists
α 6= 0 such that w = αz. Since Aw ∈ W , we also must have Aw = α̃z for some α̃. But this implies
Aw = (α̃/α)w, so w is an eigenvector of A.

16. We can write

    I = S−1S = S−1[ S1  S2 ] = [ S−1S1   S−1S2 ]

    which implies S−1S1 = I[(1, · · · , n), (1, · · · , k)] and S−1S2 = I[(1, · · · , n), (k + 1, · · · , n)].

17. Suppose m > 1. Let {x1 , · · · , xm } be a basis of W . Consider the subspace V = span({x1 }). Since
V ⊂ W , for any x ∈ V and any A ∈ F there exists λ such that Ax = λx ∈ V . Thus V is a non-zero
F-invariant subspace. But then 1 = dim V ≥ dim W = m, a contradiction.

18. By Lemma 1.3.19, there exists an x ∈ Cn such that x is an eigenvector of all A ∈ F. The result then
    follows by choosing s2, · · · , sn such that S = [ x  s2  · · ·  sn ] is non-singular.

19. The columns of A are clearly linearly independent, so A is nonsingular. We have

    [ Im  −X ] [ Im  X  ]   [ Im  X − X ]
    [ 0    In ] [ 0   In ] = [ 0   In    ] = Im+n

    and

    [ Im  X  ] [ Im  −X ]   [ Im  −X + X ]
    [ 0   In ] [ 0    In ] = [ 0   In     ] = Im+n

20. We have

    L T(λ1, λ2, λ3) L−1 = [ L(d1, λ1)  0 ] [ T(λ1, λ2)  e2 ] [ L(λ1, d1)  0 ]
                          [ 0          1 ] [ 0          λ3 ] [ 0          1 ]

                        = [ L(d1, λ1) T(λ1, λ2) L(λ1, d1)   e2 ]
                          [ 0                               λ3 ]

                        = [ d1   ⋆                0  ]
                          [ ⋆    λ1 + λ2 − d1     1  ]
                          [ 0    0                λ3 ]

                        = [ d1   ⋆                     ]
                          [ ⋆    T(λ1 + λ2 − d1, λ3)   ]

    But λ1 + λ2 − d1 + λ3 = d2 + d3, so by the induction hypothesis we can find a nonsingular S such that
    the diagonal entries of S T(λ1 + λ2 − d1, λ3) S−1 are d2 and d3. But this implies

    [ 1  0 ] L T(λ1, λ2, λ3) L−1 [ 1  0   ] = [ 1  0 ] [ d1  ⋆                   ] [ 1  0   ] = [ d1  ⋆                         ]
    [ 0  S ]                     [ 0  S−1 ]   [ 0  S ] [ ⋆   T(λ1 + λ2 − d1, λ3) ] [ 0  S−1 ]   [ ⋆   S T(λ1 + λ2 − d1, λ3) S−1 ]

    has diagonal entries d1, d2, and d3. Thus we have found a matrix with the desired diagonal entries
    which is similar to a matrix that has eigenvalues λ1, λ2, and λ3.

1.3.P1

(a) By Theorem 1.3.21, there exists some nonsingular S such that S−1AS = diag(λ1, · · · , λn) and S−1BS =
Λ is diagonal. By Theorem 1.3.7, the elements along the diagonal of Λ are all the eigenvalues of B. Thus
we can write Λ = diag(µi1, · · · , µin) for some permutation i1, · · · , in of 1, · · · , n. Then we have

S −1 (A + B)S = diag(λ1 + µi1 , · · · , λn + µin )

By Theorem 1.3.7, λ1 + µi1 , · · · , λn + µin are the eigenvalues of A + B.

(b) By 1.1.P6, if B is nilpotent then µ1 = · · · = µn = 0. Thus the previous result implies that the eigenvalues
of A + B are the same as the eigenvalues of A.

(c) We know that


diag(λ1 µi1 , · · · , λn µin ) = S −1 ASS −1 BS = S −1 ABS

so the eigenvalues of AB are λ1 µi1 , · · · , λn µin .

1.3.P2

Observe that if A and B commute, we have

(A − aI)(B − bI) = AB − bA − aB + abI


= BA − aB − bA + baI
= (B − bI)(A − aI)

Let p(t) be a polynomial of degree k with zeros a1, · · · , ak, so that p(t) = c(t − a1) · · · (t − ak). Since p(A)p(B) can be written as c^2(A − a1 I) · · · (A − ak I)(B − a1 I) · · · (B − ak I), the result follows by commuting the multiplicands.

1.3.P3

Since A = S−1ΛS, it is easy to see via induction that Ak = S−1ΛkS. It is also apparent that Λk = diag(λ1^k, · · · , λn^k). These results imply p(A) = S−1p(Λ)S for any polynomial p.

1.3.P4

Let S be the non-singular matrix of eigenvectors associated with A. Then S −1 AS = diag(α1 , · · · , αn ) with
αi 6= αj for i 6= j. But since S −1 AS and S −1 BS commute, D = S −1 BS must be diagonal (see 0.7.7). Write
D = diag(d1 , · · · , dn ).

Consider the degree n − 1 polynomial

p(t) = ∑_{i=1}^n di ∏_{j≠i}(t − αj) / ∏_{j≠i}(αi − αj)

Then p(αi) = di for i = 1, · · · , n. By 1.3.P3, we then have

p(A) = S diag(p(α1), · · · , p(αn)) S−1 = S D S−1 = B
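A numerical sketch of this construction (illustrative only; A is a random diagonalizable matrix with distinct eigenvalues and B is built in the same eigenbasis so that it commutes with A):

    import numpy as np

    # Sketch of 1.3.P4: if A has distinct eigenvalues alpha_i and B commutes
    # with A, then B = p(A) for the Lagrange interpolating polynomial with
    # p(alpha_i) = d_i, the eigenvalues of B in the common eigenbasis.
    rng = np.random.default_rng(4)
    n = 4
    S = rng.standard_normal((n, n))
    alpha = np.array([1.0, 2.0, -0.5, 3.0])          # distinct eigenvalues of A
    d = np.array([4.0, -1.0, 0.5, 2.0])              # eigenvalues of B
    A = S @ np.diag(alpha) @ np.linalg.inv(S)
    B = S @ np.diag(d) @ np.linalg.inv(S)            # commutes with A

    def p(M):
        # Lagrange interpolation evaluated at the matrix M.
        result = np.zeros_like(M)
        for i in range(n):
            term = d[i] * np.eye(n)
            for j in range(n):
                if j != i:
                    term = term @ (M - alpha[j] * np.eye(n)) / (alpha[i] - alpha[j])
            result = result + term
        return result

    assert np.allclose(p(A), B)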

1.3.P5

Consider the matrix

A = [ 0  1 ]
    [ 0  0 ]

Then A commutes with itself, but A is not diagonalizable (see exercise 6 above). Thus A cannot be si-
multaneously diagonalizable. This does not contradict Theorem 1.3.12 because that theorem requires the
commuting matrices to be diagonalizable.

1.3.P6

(a) pΛ (t) = (t − λ1 ) · · · (t − λn ), so

pΛ(Λ) = (Λ − λ1 I) · · · (Λ − λn I) = diag(0, λ2 − λ1, · · · , λn − λ1) · · · diag(λ1 − λn, · · · , λn−1 − λn, 0) = 0

since the j-th factor has a zero in its j-th diagonal position.

(b) If A is similar to Λ, then they have the same characteristic polynomial by Theorem 1.3.3. 1.3.P3 then
implies that pA (A) = pΛ (A) = S −1 pΛ (Λ)S = 0.

1.3.P7

Suppose S−1AS = Λ, where Λ = diag(λ1, · · · , λn). Then we have

(S diag(λ1^{1/2}, · · · , λn^{1/2}) S−1)(S diag(λ1^{1/2}, · · · , λn^{1/2}) S−1) = S Λ S−1 = A

so S diag(λ1^{1/2}, · · · , λn^{1/2}) S−1 is a square root of A.

Suppose we could find a matrix such that

[ a11  a12 ] [ a11  a12 ]   [ a11^2 + a12 a21    a12(a11 + a22)  ]   [ 0  1 ]
[ a21  a22 ] [ a21  a22 ] = [ a21(a11 + a22)     a22^2 + a12 a21 ] = [ 0  0 ]

The off-diagonal equations imply that a21 = 0 and a11 + a22 ≠ 0. The diagonal equations then imply that
a11^2 = a22^2 = 0, so a11 = a22 = 0. But this is a contradiction, so no such matrix exists.
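A numerical sketch of the square-root construction above for a random diagonalizable matrix (illustrative; complex square roots of the eigenvalues are used so that a negative eigenvalue is allowed):

    import numpy as np

    # Sketch of 1.3.P7: if S^{-1} A S = Lambda is diagonal, then
    # S diag(sqrt(lambda_1), ..., sqrt(lambda_n)) S^{-1} is a square root of A.
    rng = np.random.default_rng(5)
    n = 4
    S = rng.standard_normal((n, n))
    lam = np.array([2.0, -1.0, 0.5, 3.0])            # eigenvalues (one negative)
    A = S @ np.diag(lam) @ np.linalg.inv(S)          # so S^{-1} A S = diag(lam)

    sqrt_lam = np.sqrt(lam.astype(complex))          # complex square roots
    R = S @ np.diag(sqrt_lam) @ np.linalg.inv(S)
    assert np.allclose(R @ R, A)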

1.3.P8

We showed in Exercise 12 that simultaneously diagonalizable matrices also commute. To show the converse,
suppose Bx = λx for some x 6= 0. Then BAx = ABx = A(λx) = λAx, so Ax is an eigenvector of B. But
because the eigenvalues are distinct, the subspace {x : Bx = λx} can only have dimension equal to 1. Thus
Ax = µx for some µ. But this implies that the matrix of eigenvectors of B can also be used to diagonalize
A.

1.3.P9

Since AB = 0 and BA = B, we have that pAB(t) = t^2 = pBA(t). Thus σ(AB) = σ(BA) = {0}.

To see that A and B are not similar, observe that the only eigenvalue of B is 0. But because A takes the
form in 1.3.7.1 with Λ = 1, Theorem 1.3.7 implies that A and B cannot be similar.

1.3.P10

Consider the linear relationship

a1^(1) x1^(1) + a2^(1) x2^(1) + · · · + an1^(1) xn1^(1) + · · · + a1^(k) x1^(k) + a2^(k) x2^(k) + · · · + ank^(k) xnk^(k) = 0

Define the expression

B1 = (A − λ2 I)(A − λ3 I) · · · (A − λk I)

Then

B1 xi^(1) = (λ1 − λ2)(λ1 − λ3) · · · (λ1 − λk) xi^(1)

for i = 1, · · · , n1, and B1 xi^(j) = 0 for j = 2, · · · , k and i = 1, · · · , nj. Thus if we apply B1 to the linear relationship, we have

0 = B1 (a1^(1) x1^(1) + a2^(1) x2^(1) + · · · + an1^(1) xn1^(1) + · · · + a1^(k) x1^(k) + a2^(k) x2^(k) + · · · + ank^(k) xnk^(k))
  = B1 (a1^(1) x1^(1) + a2^(1) x2^(1) + · · · + an1^(1) xn1^(1))
  = (λ1 − λ2)(λ1 − λ3) · · · (λ1 − λk)(a1^(1) x1^(1) + a2^(1) x2^(1) + · · · + an1^(1) xn1^(1))

But since λi ≠ λj for i ≠ j, this expression implies

0 = a1^(1) x1^(1) + a2^(1) x2^(1) + · · · + an1^(1) xn1^(1)

But x1^(1), x2^(1), · · · , xn1^(1) are linearly independent, so this implies a1^(1) = a2^(1) = · · · = an1^(1) = 0. A similar
argument can be used to show that the other coefficients are equal to zero.

1.3.P11

(a) Suppose A, B ∈ Mn commute and that x, λ is an eigenpair of A. Consider the sequence of vectors x, Bx,
B 2 x, · · · . Let k be the smallest positive integer such that B k x is a linear combination of its predecessors.
We know that k ≤ n since every collection of n+1 vectors in a space of dimension n is linearly dependent.
Define S = span{x, Bx, B 2 x, · · · , B k−1 x}. Pick y ∈ S, so that y = a1 x + a2 Bx + · · · + ak B k−1 x for some
a1 , · · · , ak . Then By = a1 Bx + a2 B 2 x + · · · + ak−1 B k−1 x + ak B k x. But since B k x = b1 x + b2 Bx + · · · +
bk B k−1 x for some b1 , · · · , bk , we must have By = ak b1 x + (a1 + ak b2 )Bx + · · · + (ak−1 + ak bk )B k−1 x.
Therefore By ∈ S, so S is B-invariant. By Observation 1.3.18, we know that S contains an eigenvector
of B. But also observe that

Ay = a1 Ax + a2 ABx + · · · + ak AB k−1 x
= a1 Ax + a2 BAx + · · · + ak B k−1 Ax
= a1 λx + a2 Bλx + · · · + ak B k−1 λx
= λ(a1 x + a2 Bx + · · · + ak B k−1 x)
= λy

so that every nonzero element of S is also an eigenvector of A. Thus A and B share an eigenvector.

(b) Suppose every finite commuting family of m − 1 matrices contains a common eigenvector and consider

the family F = {A1 , A2 , · · · , Am }. Let y 6= 0 be the common eigenvector of {A1 , A2 , · · · , Am−1 }. By
Theorem 1.1.9, there exists a polynomial g(t) of degree at most n − 1 such that g(Am )y is an eigenvector
of Am . Because Am commutes with Ak for k = 1, · · · , m − 1, we have that Ak g(Am )y = g(Am )Ak y =
g(Am )(λy) = λ(g(Am )y). Thus g(Am )y is also an eigenvector of Ak for k = 1, · · · , m − 1.

(c) Since dim Mn = n2 , no list of more than n2 matrices from F ⊂ Mn can be linearly independent.
Select a maximal linearly independent set A1 , · · · , Am . Notice that all the elements of F must be linear
combinations of this set. By part (b), we know that there exists a common eigenvector y for this set.
But since any A ∈ F can be written as A = a1 A1 + · · · + am Am for some a1 , · · · , am , we have

Ay = a1 A1 y + · · · + am Am y
= a1 λ1 y + · · · + am λm y
= (a1 λ1 + · · · + am λm )y

so y is an eigenvector of A.

1.3.P12

By Theorem 1.3.22, we know that AB and BA are similar. Thus if AB is diagonalizable, then so is BA.

Observe that

[ 0  1 ] [ 1  1 ]   [ 0  0 ]
[ 0  0 ] [ 0  0 ] = [ 0  0 ]

[ 1  1 ] [ 0  1 ]   [ 0  1 ]
[ 0  0 ] [ 0  0 ] = [ 0  0 ]

In this case AB is diagonalizable while BA is not, which is possible because both A and B are singular.

1.3.P13

Let A, B ∈ Mn be diagonalizable matrices. If A and B are similar, they have the same characteristic
polynomial by Theorem 1.3.3. Conversely, suppose A and B have the same characteristic polynomial. Since
A and B are diagonalizable, there exist Λ(A) = diag(λ1^(A), · · · , λn^(A)) and Λ(B) = diag(λ1^(B), · · · , λn^(B)) such
that Λ(A) = S−1AS and Λ(B) = R−1BR. But then

(t − λ1^(A)) · · · (t − λn^(A)) = pΛ(A)(t) = pA(t) = pB(t) = pΛ(B)(t) = (t − λ1^(B)) · · · (t − λn^(B))

But this implies Λ(A) = P^T Λ(B) P for some permutation matrix P. Since A ∼ Λ(A), Λ(A) ∼ Λ(B), and Λ(B) ∼ B, we
must have that A ∼ B.

Observe that both

[ 0  1 ]        [ 0  0 ]
[ 0  0 ]  and   [ 0  0 ]

have the same characteristic polynomial, but are not similar. Thus
if one of the matrices is not diagonalizable, the converse of Theorem 1.3.3 does not follow.

1.3.P14

(a) Multiplication by a nonsingular matrix does not change rank. Thus rank(Λ) = rank(A) if Λ = S −1 AS
for Λ = diag(λ1 , · · · , λn ). The elements on the diagonal of Λ are the eigenvalues of A and the rank of Λ
is equal to the number of nonzero elements on its diagonal. Thus the rank of A is equal to the number
of its non-zero eigenvalues.

(b) Observe that Ak = SΛk S −1 . Since Λk = diag(λk1 , · · · , λkn ), we have rank(Ak ) = rank(Λk ) = rank(Λ) =
rank(A).

(c) If A = 0, it is obvious that A is nilpotent. Suppose Ak = 0 for some k ≥ 1. Since rank(A) = rank(Ak ),
we must have rank(A) = 0. This implies A = 0.

(d) If rank A = 1, then there only exists one j ∈ {1, · · · , n} such that λj 6= 0. But if λ1 + · · · + λn = 0, then
we must have λj = 0. But this implies rank A = 0, a contradiction.

(e) There are no nonzero eigenvalues of B, but the rank of B is 1. This means (a) does not hold, so B is
not diagonalizable.

The rank of B 2 is zero but the rank of B is one. This means (b) does not hold, so B is not diagonalizable.

B is nilpotent but B 6= 0. This means (c) does not hold, so B is not diagonalizable.

tr B = 0 but rank B = 1. This means (d) does not hold, so B is not diagonalizable.

1.3.P15

Suppose A = S−1ΛS, where Λ = diag(λ1, · · · , λn). Then Ak = S−1ΛkS for any k = 1, 2, · · · . If p(t) =
α0 + α1 t + α2 t^2 + · · · + αk t^k, then we have

p(A) = α0 I + α1 A + α2 A^2 + · · · + αk A^k
     = α0 S−1S + α1 S−1ΛS + α2 S−1Λ^2S + · · · + αk S−1Λ^kS
     = S−1(α0 I + α1 Λ + α2 Λ^2 + · · · + αk Λ^k)S
     = S−1 diag(p(λ1), · · · , p(λn)) S

Thus p(A) is diagonalizable.


The converse is not true. Suppose

A = [ 0  1 ]
    [ 0  0 ]

and p(t) = t^2. Then p(A) = A^2 = 0, which is trivially diagonalizable. However A is not diagonalizable.

1.3.P16

We know that

pA(t) = t^{n−r} pB(t) = t^{n−r}(t^r − E1(B)t^{r−1} + · · · + (−1)^{r−1}Er−1(B)t + (−1)^r Er(B))

But Er(B) = det B ≠ 0 and Er(B) = Er(A), so there must exist a principal submatrix of size r with nonzero
determinant.

Consider the matrix

A = [ 1   1  ]
    [ −1  −1 ]

Then A is obviously rank principal. However, the only eigenvalue of A is 0 and the null space of A is spanned
by (−1, 1)′. By Theorem 1.3.7, we know that A is similar to a matrix of the form

b ⊕ 0 = [ b  0 ]
        [ 0  0 ]

if and only if b is an eigenvalue of A. But this implies b = 0, which means that b ⊕ 0 = 0. But the only matrix
similar to 0 is 0, so A cannot be similar to a matrix of the form b ⊕ 0.

1.3.P17

Suppose A = T BT −1 for some nonsingular T ∈ Mn (R). Then

Ā = T BT −1 = T̄ B̄ T̄ −1 = T B̄T −1

Conversely, suppose there exists S ∈ Mn such that A = SBS −1 and Ā = S B̄S −1 . Make the following
definitions:

A = C (A) + iD(A)
B = C (B) + iD(B)
S = C (S) + iD(S)

where C (j) , D(j) ∈ Mn (R) for j ∈ {A, B, S}. Then we have

AS = (C (A) + iD(A) )(C (S) + iD(S) ) = C (A) C (S) − D(A) D(S) + i(C (A) D(S) + D(A) C (S) )
SB = (C (S) + iD(S) )(C (B) + iD(B) ) = C (S) C (B) − D(S) D(B) + i(C (S) D(B) + D(S) C (B) )
ĀS = (C (A) − iD(A) )(C (S) + iD(S) ) = C (A) C (S) + D(A) D(S) + i(C (A) D(S) − D(A) C (S) )
S B̄ = (C (S) − iD(S) )(C (B) + iD(B) ) = C (S) C (B) + D(S) D(B) + i(D(S) C (B) − C (S) D(B) )

Equating the real parts of the expressions AS = SB and ĀS = S B̄, we have

C (A) C (S) − D(A) D(S) = C (S) C (B) − D(S) D(B)


C (A) C (S) + D(A) D(S) = C (S) C (B) + D(S) D(B)

Summing these expressions, we see that C (A) C (S) = C (S) C (B) . This then implies that D(A) D(S) =
D(S) D(B) . Now equating the imaginary parts of the expressions AS = SB and ĀS = S B̄ gives us

C (A) D(S) + D(A) C (S) = C (S) D(B) + D(S) C (B)


C (A) D(S) − D(A) C (S) = D(S) C (B) − C (S) D(B)

Summing these expressions, we have that C (A) D(S) = D(S) C (B) . This then implies that D(A) C (S) =

C (S) D(B) . We know from Lemma 1.3.28 that there is a real number τ such that T = C (S) + τ D(S) is
nonsingular. But then

AT = (C (A) + iD(A) )(C (S) + τ D(S) )


= C (A) C (S) + τ C (A) D(S) + i(D(A) C (S) + τ D(A) D(S) )
= C (S) C (B) + τ D(S) C (B) + i(C (S) D(B) + τ D(S) D(B) )
= (C (S) + τ D(S) )(C (B) + iD(B) )
= TB

so A = T BT −1 .

1.3.P18

Suppose A = SBS −1 and AĀ = B B̄ = I. Then

S B̄S −1 = ĀAS B̄S −1


= ĀSB B̄S −1
= ĀSS −1
= Ā

By 1.3.P17, there exists T ∈ Mn (R) such that A = T BT −1 .

The converse follows immediately from the fact that Mn (R) ⊂ Mn (C).

1.3.P19

(a) Partition A so that it is conformable with K2n:

A = [ A11  A12 ]
    [ A21  A22 ]

where Aij ∈ Mn, (i, j) ∈ {1, 2}^2. Then we have

K2n A = [ A21  A22 ]        A K2n = [ A12  A11 ]
        [ A11  A12 ]                [ A22  A21 ]

If K2n A = AK2n , then we must have A12 = A21 and A11 = A22 . By defining B = A11 = A22 and
C = A12 = A21 , we see that A has the desired form.

Conversely, suppose A11 = A22 and A12 = A21 . Then we see that K2n A = AK2n .

If A is nonsingular, then we can left and right multiply both sides of K2n A = AK2n by A−1 to obtain
A−1 K2n = K2n A−1 . But this then implies A−1 is centrosymmetric.

If A and B are centrosymmetric, we can right multiply both sides of K2n A = AK2n by B to obtain
K2n AB = AK2n B. But since K2n B = BK2n , this equation implies K2n AB = ABK2n . But this then
means that AB is centrosymmetric.

(b)

Q−1AQ = (1/2) [ In  In  ] [ B  C ] [ In  In  ]
              [ In  −In ] [ C  B ] [ In  −In ]
      = (1/2) [ B + C   C + B ] [ In  In  ]
              [ B − C   C − B ] [ In  −In ]
      = [ B + C   0     ]
        [ 0       B − C ]
      = (B + C) ⊕ (B − C)

(c)

det A = det(Q−1 AQ)


= det(B + C) det(B − C)
= det(B 2 + CB − BC − C 2 )

rank(A) = rank(Q−1 AQ)


= rank(B + C) + rank(B − C)

(d) We know from part (b) that

A = [ 0  C ]      is similar to      Ã = [ C  0  ]
    [ C  0 ]                             [ 0  −C ]

via the similarity matrix Q. This implies

pA(t) = pÃ(t) = pC(t) p−C(t)

If pC(λ) = det(λI − C) = 0, then p−C(−λ) = det(−λI + C) = (−1)^n det(λI − C) = 0. Thus the
eigenvalues occur in ± pairs. If C is real, any non-real eigenvalues occur in conjugate pairs. Thus if
there is one root of the form a + bi, then a − bi, −a − bi and −a + bi are also roots.

1.3.P20

(a) Observe that

A + B = A1 + iA2 + B1 + iB2
= (A1 + B1 ) + i(A2 + B2 )

Thus we have
!
A1 + B1 A2 + B 2
R1 (A + B) =
−(A2 + B2 ) A1 + B1
! !
A1 A2 B1 B2
= +
−A2 A1 −B2 B1
= R1 (A) + R1 (B)

Next, observe that

AB = (A1 + iA2 )(B1 + iB2 )


= A1 B1 + iA1 B2 + iA2 B1 − A2 B2
= A1 B1 − A2 B2 + i(A1 B2 + A2 B1 )

Thus we have that


!
A1 B 1 − A2 B 2 A1 B2 + A2 B1
R1 (AB) =
−(A1 B2 + A2 B1 ) A1 B1 − A2 B2
! !
A1 A2 B1 B2
=
−A2 A1 −B2 B1
= R1 (A)R1 (B)

Finally, it is easy to see that


!
In 0
R1 (In ) = = I2n
0 In

(b) Observe that if A is invertible, then we have

I2n = R1 (In )
= R1 (AA−1 )
= R1 (A)R1 (A−1 )
I2n = R1 (In )
= R1 (A−1 A)
= R1 (A−1 )R1 (A)

where we have applied the results from part (a). Thus R1 (A) is invertible with inverse R1 (A−1 ) and is
therefore nonsingular.

(c) By parts (a) and (b), we have

R1 (SAS −1 ) = R1 (S)R1 (A)R1 (S −1 ) = R1 (S)R1 (A)R1 (S)−1

(d) Suppose B = SAS −1 . Then

R1 (B) = R1 (SAS −1 )
= R1 (S)R1 (A)R1 (S)−1

so R1 (A) ∼ R1 (B) by the similarity matrix R1 (S).

(e)
! ! !
In iIn In −iIn In 0
=
0 In 0 In 0 In
! ! !
In −iIn In iIn In 0
=
0 In 0 In 0 In

! ! !
1 In −iIn In iIn 1 2In 0
= = I2n
2 −iIn In iIn In 2 0 2In
! ! !
1 In iIn In −iIn 1 2In 0
= = I2n
2 iIn In −iIn In 2 0 2In

(f)

S −1 R1 (A)S = S̄R1 (A)S


! ! !
In −iIn A1 A2 In iIn
=
0 In −A2 A1 0 In
! !
A1 + iA2 A2 − iA1 In iIn
=
−A2 A1 0 In
!
A iA1 − A2 + A2 − iA1
=
−A2 A1 − iA2
!
A 0
=
−A2 Ā

U −1 R1 (A)U = Ū R1 (A)U
! ! !
1 In −iIn A1 A2 In iIn
=
2 −iIn In −A2 A1 iIn In
! !
1 A1 + iA2 A2 − iA1 In iIn
=
2 −A2 − iA1 A1 − iA2 iIn In
!
1 A + A1 + iA2 −A2 + iA1 + A2 − iA1
=
2 −A2 − iA1 + A2 + iA1 A1 − iA2 + A1 − iA2
!
A 0
=
0 Ā

(g) By part (f), we know that R1 (A) is similar to A⊕Ā. The eigenvalues of A⊕Ā are given by λ1 , · · · , λn , λ̄1 , · · · , λ̄n .

(h) We know that det R1(A) is equal to the product of all the eigenvalues of R1(A). From part (g), we see
that det R1(A) = ∏_{i=1}^n λi λ̄i = ∏_{i=1}^n |λi|^2 = |∏_{i=1}^n λi|^2 = |det A|^2.

We can also see from part (f) that rank R1 (A) = rank A + rank Ā = 2 rank A.

(i) If R1 (A) is nonsingular, then det R1 (A) 6= 0. But this implies from part (h) that λi 6= 0 for i = 1, · · · , n,
which means that A is non-singular.

(j) The characteristic polynomials of iIn and −iIn are given by (t − i)^n and (t + i)^n, respectively. Since the
two matrices have different characteristic polynomials, they cannot be similar. However, observe that
!
0 In
R1 (iIn ) =
−In 0
! ! !
0 In 0 −In 0 In
=
In 0 In 0 In 0
−1
= K2n R1 (−iIn )K2n

Therefore R1(iIn) and R1(−iIn) are similar via the similarity matrix K2n.

(k) From part (f), we have

pR1 (A) (t) = pU −1 R1 (A)U


!
tIn − A 0
= det
0 tIn − Ā
= det(tIn − A) det(tIn − Ā)
= pA (t)pĀ (t)

(l)
!
∗ AT1 −AT2
R1 (A ) =
AT2 AT1
!T
A1 A2
=
−A2 A1
= R1 (A)T

If A∗ = A, then this expression implies R1(A) = R1(A)^T. Thus R1(A) is real symmetric if A is
Hermitian. Conversely, if R1(A) = R1(A)^T, this expression implies R1(A∗) = R1(A). But this implies
that A1 = A1^T and A2 = −A2^T, so that A∗ = A1^T − iA2^T = A1 + iA2 = A. Thus A is Hermitian if and only if R1(A)
is real symmetric.

If AA∗ = A∗ A = In , we have

R1 (A)R1 (A)T = R1 (A)R1 (A∗ ) = R1 (AA∗ ) = R1 (In ) = I2n


R1 (A)T R1 (A) = R1 (A∗ )R1 (A) = R1 (A∗ A) = R1 (In ) = I2n

so R1 (A) is real orthogonal. Conversely, if R1 (A)R1 (A)T = R1 (A)T R1 (A), we have that R1 (AA∗ ) =
R1 (A∗ A) = I2n . But this implies AA∗ = A∗ A = I, so A is unitary.

(m) If AA∗ = A∗ A, then R1 (A)R1 (A)T = R1 (A)R1 (A∗ ) = R1 (AA∗ ) = R1 (A∗ A) = R1 (A∗ )R1 (A) =
R1 (A)T R1 (A). If R1 (A)R1 (A)T = R1 (A)T R1 (A), then R1 (AA∗ ) = R1 (A∗ A). But this implies AA∗ =
A∗ A.

(n) Partition A so that it is conformable with S2n . We then have


!
A21 A22
S2n A =
−A11 −A12
!
−A12 A11
AS2n =
−A22 A21

It is easy to see that S2n A = AS2n if and only if A11 = A22 and A21 = −A12 .

Suppose A is invertible. Then left and right multiplying both sides of S2n A = AS2n by A−1 yields
A−1 S2n = S2n A−1 . This implies A−1 is a matrix of complex type.

If A and B are both matrices of complex type, then S2n AB = AS2n B = ABS2n . Thus AB is also a
matrix of complex type.

1.3.P21

(a)
! !
1∗ −iIn −iIn iIn In
VV =
2 In −In iIn −In
!
1 2In 0
=
2 0 2In
= I2n
! !
∗ 1 iIn In −iIn −iIn
V V =
2 iIn −In In −In
!
1 2In 0
=
2 0 2In
= I2n

! !
In 0 In 0
R2 (In )R2 (In ) =
0 −In 0 −In
!
In 0
=
0 In

! !
0 In 0 In
R2 (iIn )R2 (iIn ) =
In 0 In 0
!
In 0
=
0 In

V −1 R2 (In )V = V ∗ R2 (In )V
! ! !
1 iIn In In 0 −iIn −iIn
=
2 iIn −In 0 −In In −In
! !
1 iIn −In −iIn −iIn
=
2 iIn In In −In
!
1 0 2In
=
2 2In 0
= R2 (iIn )

(b)

A = B ⇐⇒ A1 = B1 and A2 = B2 ⇐⇒ R2 (A) = R2 (B)

R2(A + B) = [ A1 + B1    A2 + B2    ]
            [ A2 + B2    −(A1 + B1) ]
          = [ A1  A2  ] + [ B1  B2  ]
            [ A2  −A1 ]   [ B2  −B1 ]
          = R2(A) + R2(B)

(c)
! ! ! !
0 Ā −1 1 −iIn −iIn 0 Ā iIn In
V V =
A 0 2 In −In A 0 iIn −In
! !
1 −iA −iĀ iIn In
=
2 −A Ā iIn −In
!
1 A + Ā i(Ā − A)
=
2 i(Ā − A) −(A + Ā)
!
1 2A1 2A2
=
2 2A2 −2A1
= R2 (A)

(d)
!
0 Ā
det R2 (A) = det
A 0
= det(−AĀ)
= (−1)n det A det Ā
= (−1)n det A(det A)
= (−1)n | det A|2

(e)

det R2 (A) = 0 ⇐⇒ | det A|2 = 0 ⇐⇒ det A = 0

(f)
!
tI −Ā
pR2 (A) (t) = det
−A
tI
 
= det(tIn ) det tI − 1t AĀ
 
= det t2 I − AĀ

= pAĀ (t2 )

(g)

R2 (AB) = R2 (A1 B1 − A2 B2 + i(A2 B1 + A1 B2 ))


!
A1 B 1 − A2 B 2 A2 B 1 + A1 B 2
=
A2 B 1 + A1 B 2 A2 B 2 − A1 B 1
! !
A1 A2 B1 B2
=
A2 −A1 −B2 B1
! ! !
A1 A2 In 0 B1 B2
=
A2 −A1 0 −In B2 −B1
= R2 (A)R2 (In )R2 (B)

(h)
! ! !
In 0 A1 A2 In 0
R2 (In )R2 (A)R2 (In ) =
0 −In A2 −A1 0 −In
! !
A1 A2 In 0
=
−A2 A1 0 −In
!
A1 −A2
=
−A2 −A1
= R2 (Ā)

R2 (AB̄C) = R2 (A)R2 (In )R2 (B̄)R2 (In )R2 (C)


= R2 (A)R2 (B)R2 (C)

(i)
!
−A1 −A2
−R2 (A) =
−A2 A1
= R2 (−A)
= R2 (iIn · A · iIn )
= R2 (iIn )R2 (Ā)R2 (iIn )
= R2 (iIn )R2 (In )R2 (A)R2 (In )R2 (iIn )
= R2 (iIn )R2 (In )R2 (A)(R2 (iIn )R2 (In ))−1

(j)

V (ĀB ⊕ AB̄)V −1 = V (ĀB ⊕ AB̄)V ∗


! ! !
1 −iIn −iIn ĀB 0 iIn In
=
2 In −In 0 AB̄ iIn −In
! !
1 −iĀB −iAB̄ iIn In
=
2 ĀB −AB̄ iIn −In
!
1 ĀB + AB̄ i(AB̄ − ĀB)
=
2 i(ĀB − AB̄) ĀB + AB̄
!
A1 B1 + A2 B2 A1 B2 − A2 B1
=
A2 B1 − A1 B2 A1 B1 + A2 B2
! !
A1 A2 B1 B2
=
A2 −A1 B2 −B1
= R2 (A)R2 (B)

(k)
 
R2 (A)R2 (Ā−1 ) = V ĀĀ−1 ⊕ AĀ−1 V −1

= V (In ⊕ In ) V −1
= V V −1
= I2n
 
R2 (Ā−1 )R2 (A) = V Ā−1 A ⊕ Ā−1 Ā V −1

= V (In ⊕ In ) V −1
= V V −1
= I2n

(l)
! !
2 A1 A2 A1 A2
R2 (A) =
A2 −A1 A2 −A1
!
A21 + A22 A1 A2 − A2 A1
=
−(A1 A2 − A2 A1 ) A21 + A22
= R1 (A21 + A22 + i(A1 A2 − A2 A1 ))
= R1 ((A1 − iA2 )(A1 + iA2 ))
= R1 (ĀA)
R2 (AĀ)R2 (In ) = R2 (A)R2 (In )R2 (Ā)R2 (In )
= R2 (A)R2 (A)
= R2 (A)2

(m)

R2 (SAS̄ −1 ) = R2 (S)R2 (Ā)R2 (S̄ −1 )


= R2 (S)R2 (In )R2 (A)R2 (In )R2 (S̄ −1 )
= R2 (S)R2 (In )R2 (A)(R2 (S)R2 (In ))−1

(n)
!
T AT1 AT2
R2 (A ) =
AT2 −AT1
= R2 (A)T

(o)

R2 (AA∗ ) = R2 (A)R2 (In )R2 (A∗ )


= R2 (A)R2 (In )R2 (Ā)T
= R2 (A)R2 (In )(R2 (In )R2 (A)R2 (In ))T
= R2 (A)R2 (A)T R2 (In )
R2 (A∗ A) = R2 (A∗ )R2 (In )R2 (A)
= R2 (Ā)T R2 (In )R2 (A)
= (R2 (In )R2 (A)R2 (In ))T R2 (In )R2 (A)
= R2 (In )R2 (A)T R2 (A)

If A∗ A = AA∗ = In , then R2 (In ) = R2 (A)R2 (A)T R2 (In ) = R2 (A)T R2 (A)R2 (In ). But this implies I2n =
R2 (A)R2 (A)T = R2 (A)T R2 (A), so R2 (A) is orthogonal. Conversely, if I2n = R2 (A)R2 (A)T = R2 (A)T R2 (A),

then R2 (AA∗ ) = R2 (A∗ A) = R2 (In ). This implies AA∗ = A∗ A = In , so A is unitary.

1.3.P22

Suppose A = XY and B = Y X. If X is nonsingular, then Y = X −1 A so that B = X −1 AX. If Y


is nonsingular, then X = Y −1 B so that A = Y −1 BY . Thus A and B are similar if either X or Y is
nonsingular.

If A = SBS−1 where S is nonsingular, then define X = S and Y = BS−1. Then A = XY and B = YX by
construction.

1.3.P23
Suppose C = BX for some X ∈ Mn,m. Consider the matrix

S = [ In     −X ]
    [ 0m,n   Im ]

Then S is invertible and

S−1 = [ In     X  ]
      [ 0m,n   Im ]

Then we have

S−1AS = [ In     X  ] [ B      BX ] [ In     −X ]
        [ 0m,n   Im ] [ 0m,n   0m ] [ 0m,n   Im ]
      = [ B      BX ] [ In     −X ]
        [ 0m,n   0m ] [ 0m,n   Im ]
      = [ B      0n,m ]
        [ 0m,n   0m   ]
      = B ⊕ 0m

so A is similar to B ⊕ 0m.

Conversely suppose A = S(B ⊕0m )S −1 for some nonsingular S ∈ Mn+m . It is apparent that rank(B ⊕0m ) =
rank B and rank(A) = rank((B C)). Since multiplication by a nonsingular matrix does not change rank, we
have that rank(B) = rank(B ⊕ 0m ) = rank(A) = rank((B C)).

1.3.P24

Ajk = cos(jθ + kθ)
    = cos((j + k)·2π/n)
    = (1/2)(cos((j + k)·2π/n) + i sin((j + k)·2π/n)) + (1/2)(cos((j + k)·2π/n) − i sin((j + k)·2π/n))
    = (1/2)(e^{i(j+k)2π/n} + e^{−i(j+k)2π/n})
    = (1/2)(e^{i2πj/n} e^{i2πk/n} + e^{−i2πj/n} e^{−i2πk/n})
    = (1/2)(α^j α^k + α^{−j} α^{−k})
    = (1/2)(xj xk + yj yk)

where α = e^{i2π/n}, xj = α^j, and yj = α^{−j}. Let

B = (1/2)(x y)^T(x y) = (1/2) [ ∑_{j=1}^n α^{2j}    n                 ]
                              [ n                   ∑_{j=1}^n α^{−2j} ]

Observe that

∑_{j=1}^n α^{2j} = (α^2/(1 − α^2))(1 − α^{2n}) = (α^2/(1 − α^2))(1 − e^{i4π}) = (α^2/(1 − α^2))(1 − cos(4π) − i sin(4π)) = 0
∑_{j=1}^n α^{−2j} = (α^{−2}/(1 − α^{−2}))(1 − α^{−2n}) = (α^{−2}/(1 − α^{−2}))(1 − e^{−i4π}) = (α^{−2}/(1 − α^{−2}))(1 − cos(−4π) − i sin(−4π)) = 0

Thus

B = [ 0     n/2 ]
    [ n/2   0   ]

so that pB(t) = t^2 − n^2/4 = (t − n/2)(t + n/2). From Theorem 1.3.22, we have

pA(t) = t^{n−2} pB(t) = t^{n−2}(t − n/2)(t + n/2)

1.3.P25

(a)

(I + xy∗)(I − cxy∗) = I − cxy∗ + xy∗ − c xy∗xy∗
                    = I + (1 − c − c y∗x)xy∗
                    = I + (1 − c(1 + y∗x))xy∗
                    = I

(b) If y ∗ x = 0, then c = 1. This implies Λ is similar to A by the similarity matrix I + xy ∗ , so they share the
same eigenvalues.

1.3.P26

(a) Index the rows by (1, 1), · · · , (1, m), (2, 1), · · · , (2, m), · · · , (n, 1), · · · , (n, m).

Index the columns by (1, 1), · · · , (1, n), (2, 1), · · · , (2, n), · · · , (m, 1), · · · , (m, n).

Then column (i, j) is zero everywhere except for a one in row (j, i).

(b) Write the rows of A as

(1) (1,1) (1,1) (1,1) (1,2) (1,2) (1,2) (1,m) (1,m)


A1· = a11 , a12 , · · · , a1n , a11 , a12 , · · · , a1n , · · · , a11 , · · · , a1n
(1) (1,1) (1,1) (1,1) (1,2) (1,2) (1,2) (1,m) (1,m)
A2· = a21 , a22 , · · · , a2n , a21 , a22 , · · · , a2n , · · · , a21 , · · · , a2n
..
.
(1) (1,1) (1,1) (1,2) (1,2) (1,m)
An· = an1 , an2 , · · · , a(1,1) (1,2)
nn , an1 , an2 , · · · , ann , · · · , an1 , · · · , a(1,m)
nn
(2) (2,1) (2,1) (2,1) (2,2) (2,2) (2,2) (2,m) (2,m)
A1· = a11 , a12 , · · · , a1n , a11 , a12 , · · · , a1n , · · · , a11 , · · · , a1n
(2) (2,1) (2,1) (2,1) (2,2) (2,2) (2,2) (2,m) (2,m)
A2· = a21 , a22 , · · · , a2n , a21 , a22 , · · · , a2n , · · · , a21 , · · · , a2n
..
.
(2) (2,1) (2,1) (2,2) (2,2) (2,m)
An· = an1 , an2 , · · · , a(2,1) (2,2)
nn , an1 , an2 , · · · , ann , · · · , an1 , · · · , a(2,m)
nn
..
.
(m) (m,1) (m,1) (m,1) (m,2) (m,2) (m,2) (m,m) (m,m)
A1· = a11 , a12 , · · · , a1n , a11 , a12 , · · · , a1n , · · · , a11 , · · · , a1n
(m) (m,1) (m,1) (m,1) (m,2) (m,2) (m,2) (m,m) (m,m)
A2· = a21 , a22 , · · · , a2n , a21 , a22 , · · · , a2n , · · · , a21 , · · · , a2n
..
.
(m) (m,1) (m,1) (m,2) (m,2) (m,m)
An· = an1 , an2 , · · · , a(m,1)
nn , an1 , an2 , · · · , a(m,2)
nn , · · · , an1
(m,m)
, · · · , ann

Then the rows of P A can be written as

(1) (1,1) (1,1) (1,1) (1,2) (1,2) (1,2) (1,m) (1,m)


A1· = a11 , a12 , · · · , a1n , a11 , a12 , · · · , a1n , · · · , a11 , · · · , a1n
(2) (2,1) (2,1) (2,1) (2,2) (2,2) (2,2) (2,m) (2,m)
A1· = a11 , a12 , · · · , a1n , a11 , a12 , · · · , a1n , · · · , a11 , · · · , a1n
..
.
(m) (m,1) (m,1) (m,1) (m,2) (m,2) (m,2) (m,m) (m,m)
A1· = a11 , a12 , · · · , a1n , a11 , a12 , · · · , a1n , · · · , a11 , · · · , a1n
(1) (1,1) (1,1) (1,1) (1,2) (1,2) (1,2) (1,m) (1,m)
A2· = a21 , a22 , · · · , a2n , a21 , a22 , · · · , a2n , · · · , a21 , · · · , a2n
(2) (2,1) (2,1) (2,1) (2,2) (2,2) (2,2) (2,m) (2,m)
A2· = a21 , a22 , · · · , a2n , a21 , a22 , · · · , a2n , · · · , a21 , · · · , a2n
..
.
(m) (m,1) (m,1) (m,1) (m,2) (m,2) (m,2) (m,m) (m,m)
A2· = a21 , a22 , · · · , a2n , a21 , a22 , · · · , a2n , · · · , a21 , · · · , a2n
..
.
(1) (1,1) (1,1) (1,2) (1,2) (1,m)
An· = an1 , an2 , · · · , a(1,1) (1,2)
nn , an1 , an2 , · · · , ann , · · · , an1 , · · · , a(1,m)
nn
(2) (2,1) (2,1) (2,2) (2,2) (2,m)
An· = an1 , an2 , · · · , a(2,1) (2,2)
nn , an1 , an2 , · · · , ann , · · · , an1 , · · · , a(2,m)
nn
..
.
(m) (m,1) (m,1) (m,2) (m,2) (m,m)
An· = an1 , an2 , · · · , a(m,1)
nn , an1 , an2 , · · · , a(m,2)
nn , · · · , an1
(m,m)
, · · · , ann

Thus P groups the first rows of each block together, the second rows of each block together, and so forth.
Right-multiplying by P T likewise groups the first columns of each block together, the second columns
of each block together, and so forth. We have
 (1,1) (1,2) (1,m) (1,1) (1,2) (1,m) (1,1) (1,2) (1,m) 
a11 a11 ··· a11 a12 a12 ··· a12 ··· a1n a1n ··· a1n
(2,1) (2,2) (2,m) (2,1) (2,2) (2,m) (2,1) (2,2) (2,m)
a11 a11 ··· a11 a12 a12 ··· a12 ··· a1n a1n ··· a1n
 
 
.. .. .. .. .. .. .. .. ..
 
 

 . . . . . . . . . 

(m,1) (m,2) (m,m) (m,1) (m,2) (m,m) (m,1) (m,2) (m,m)

 a11 a11 ··· a11 a12 a12 ··· a12 ··· a1n a1n ··· a1n 

 (1,1) (1,2) (1,m) (1,1) (1,2) (1,m) (1,1) (1,2) (1,m) 

 a21 a21 ··· a21 a22 a22 ··· a22 ··· a2n a2n ··· a2n 

(2,1) (2,2) (2,m) (2,1) (2,2) (2,m) (2,1) (2,2) (2,m)

 a21 a21 ··· a21 a22 a22 ··· a22 ··· a2n a2n ··· a2n 

T
 .. .. .. .. .. .. .. .. .. 
P AP = . . . . . . . . .
 

(m,1) (m,2) (m,m) (m,1) (m,2) (m,m) (m,1) (m,2) (m,m)
 

 a21 a21 ··· a21 a22 a22 ··· a22 ··· a2n a2n ··· a2n 

 .. .. .. .. .. .. .. .. .. 

 . . . . . . . . . 

 (1,1) (1,2) (1,m) (1,1) (1,2) (1,m) (1,1) (1,2) (1,m) 

 an1 an1 ··· an1 an2 an2 ··· an2 ··· ann ann ··· ann 

(2,1) (2,2) (2,m) (1,1) (2,2) (2,m) (2,1) (2,2) (2,m)

 an1 an1 ··· an1 an2 an2 ··· an2 ··· ann ann ··· ann 

 .. .. .. .. .. .. .. .. .. 
. . . . . . . . .
 
 
(m,1) (m,2) (m,m) (m,1) (m,2) (m,m) (m,1) (m,2) (m,m)
an1 an1 ··· an1 an2 an2 ··· an2 ··· ann ann ··· ann

(c) (i) A_{kl} is upper triangular for all k, l ⇐⇒ a^{(k,l)}_{ij} = 0 if i > j for all k, l ⇐⇒ Ã_{ij} = 0 if i > j ⇐⇒ Ã is block upper triangular.

(ii) A_{kl} is upper Hessenberg for all k, l ⇐⇒ a^{(k,l)}_{ij} = 0 if i > j + 1 for all k, l ⇐⇒ Ã_{ij} = 0 if i > j + 1 ⇐⇒ Ã is block upper Hessenberg.

(iii) A_{kl} is diagonal for all k, l ⇐⇒ a^{(k,l)}_{ij} = 0 if i ≠ j for all k, l ⇐⇒ Ã_{ij} = 0 if i ≠ j ⇐⇒ Ã is block diagonal.

(iv) A is block upper triangular and all of its blocks are upper triangular ⇐⇒ A_{kl} is upper triangular if k ≤ l and A_{kl} = 0 if k > l ⇐⇒ a^{(k,l)}_{ij} = 0 if i > j and k ≤ l, and a^{(k,l)}_{ij} = 0 if k > l ⇐⇒ Ã_{ij} = 0 if i > j, and Ã_{ij} is upper triangular if i ≤ j ⇐⇒ Ã is block upper triangular and all of its blocks are upper triangular.

1.3.P27

We can use the permutation matrix in 1.3.P26 to show that A is similar to a block upper triangular matrix with the blocks Ã_{pp} along the diagonal. Since the characteristic polynomial of a block upper triangular matrix is the product of the characteristic polynomials of its diagonal blocks, A has the same eigenvalues as Ã_{11} ⊕ · · · ⊕ Ã_{nn}. Likewise, since the determinant of a block upper triangular matrix is the product of the determinants of its diagonal blocks, det A = (det Ã_{11}) · · · (det Ã_{nn}).

If the diagonal entries of each block A_{ij} are constant, then all the blocks along the diagonal of Ã are the same. Thus det A = (det Ã_{11})^n and p_A(t) = (p_{Ã_{11}}(t))^n.

1.3.P28

Assume without loss of generality that m ≤ n. We have

p_{BA}(−1) = det(−I_n − BA) = (−1)^n det(I_n + BA)
p_{AB}(−1) = det(−I_m − AB) = (−1)^m det(I_m + AB)

From Theorem 1.3.22, we know that p_{BA}(−1) = (−1)^{n−m} p_{AB}(−1). Thus (−1)^n det(I_n + BA) = (−1)^{n−m}(−1)^m det(I_m + AB), which gives det(I_n + BA) = det(I_m + AB).
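A quick numerical sanity check of this identity (not part of the argument itself) can be run with NumPy; the sizes and random matrices below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 5
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))
lhs = np.linalg.det(np.eye(n) + B @ A)       # det(I_n + BA)
rhs = np.linalg.det(np.eye(m) + A @ B)       # det(I_m + AB)
print(lhs, rhs, np.isclose(lhs, rhs))        # the two determinants agree
```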

1.3.P29

det A is an integer because all the elements of A are integers.

To change aij from −1 to +1, add 2ei eTj to A. From Cauchy’s identity, we see that

det(A + 2e_i e_j^T) = det A + 2e_j^T (adj A) e_i = det A ± 2 det A_{ij}

where A_{ij} is the matrix obtained by deleting row i and column j of A (the sign is the cofactor sign (−1)^{i+j}). Since ±2 det A_{ij} is even, the parity of det(A + 2e_i e_j^T) is the same as the parity of det A.

This result shows that we can change all the −1 entries to +1 without changing the parity of the determinant.
Once we make these changes, we obtain the matrix eeT − I. Again using Cauchy’s identity, we have

det(−I + eeT ) = (−1)n (det(I) − eT adj(I)e) = (−1)n (1 − n)

But this implies eeT − I has parity opposite to the parity of n. Thus when n is even, det A is odd and
therefore cannot be 0.
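The final determinant formula is easy to check numerically; the sketch below (assuming NumPy, with illustrative sizes) confirms that det(ee^T − I) = (−1)^n(1 − n), which is odd exactly when n is even.

```python
import numpy as np

for n in range(2, 8):
    M = np.ones((n, n)) - np.eye(n)          # the matrix ee^T - I
    d = round(np.linalg.det(M))
    assert d == (-1) ** n * (1 - n)
    print(n, d, "odd" if d % 2 else "even")
```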

1.3.P30

By Theorem 1.3.27, if A = SΛS^{−1} = TΛT^{−1}, then T = S(R_1 ⊕ · · · ⊕ R_d) with each R_i nonsingular, so S = T(R_1^{−1} ⊕ · · · ⊕ R_d^{−1}). This implies

Sf(Λ)S^{−1} = T(R_1^{−1} ⊕ · · · ⊕ R_d^{−1}) f(Λ) (T(R_1^{−1} ⊕ · · · ⊕ R_d^{−1}))^{−1}
            = T(R_1^{−1} ⊕ · · · ⊕ R_d^{−1}) f(Λ) (R_1 ⊕ · · · ⊕ R_d) T^{−1}
            = T ( R_1^{−1} f(μ_1)I_{n_1} R_1 ⊕ R_2^{−1} f(μ_2)I_{n_2} R_2 ⊕ · · · ⊕ R_d^{−1} f(μ_d)I_{n_d} R_d ) T^{−1}
            = T ( f(μ_1)I_{n_1} ⊕ f(μ_2)I_{n_2} ⊕ · · · ⊕ f(μ_d)I_{n_d} ) T^{−1}
            = T f(Λ) T^{−1}

If A has real eigenvalues, then

cos²A + sin²A = S cos²(Λ)S^{−1} + S sin²(Λ)S^{−1}
             = S(cos²(Λ) + sin²(Λ))S^{−1}
             = S ( (cos²(μ_1) + sin²(μ_1))I_{n_1} ⊕ (cos²(μ_2) + sin²(μ_2))I_{n_2} ⊕ · · · ⊕ (cos²(μ_d) + sin²(μ_d))I_{n_d} ) S^{−1}
             = S I_n S^{−1}
             = I_n
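A numerical illustration of this identity, assuming SciPy's matrix cosine and sine (scipy.linalg.cosm, sinm) are available; a real symmetric A is used so that it is diagonalizable with real eigenvalues, and the particular matrix is illustrative.

```python
import numpy as np
from scipy.linalg import cosm, sinm

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 4))
A = X + X.T                                  # symmetric: diagonalizable, real eigenvalues
C, S = cosm(A), sinm(A)
print(np.allclose(C @ C + S @ S, np.eye(4))) # True up to rounding error
```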

1.3.P31

We know that det A = a² + b² and tr A = 2a. The characteristic polynomial is then

p_A(t) = t² − 2at + a² + b²

Using the quadratic formula, we have

λ = ( 2a ± √((2a)² − 4(a² + b²)) ) / 2 = a ± √(−b²) = a ± ib

1.3.P32

Suppose u and v are linearly independent and suppose a1 x + a2 x̄ = 0. Then a1 (u + iv) + a2 (u − iv) =
(a1 + a2 )u + i(a1 − a2 )v = 0. Since u and v are linearly independent, this implies a1 + a2 = 0 and
i(a1 − a2 ) = 0. But this means that a1 = a2 = 0.

Conversely, suppose x and x̄ are linearly independent and suppose a₁u + a₂v = 0. Then a₁((x + x̄)/2) + a₂(i(x̄ − x)/2) = ½(a₁ − ia₂)x + ½(a₁ + ia₂)x̄ = 0. Since x and x̄ are linearly independent, this implies a₁ − ia₂ = 0 and a₁ + ia₂ = 0. But this means that a₁ = a₂ = 0.

1.3.P33

(a) Since Ax = λx, we have Āx̄ = λ̄x̄. But since A = Ā, this implies that λ̄ is an eigenvalue of A with associated eigenvector x̄.

(b) Since λ is non-real, we must have λ 6= λ̄. Then x and x̄ are associated with distinct eigenvalues and are
therefore independent by Lemma 1.3.8. u and v are then independent by 1.3.P32.

(c) We know that

Ax = Au + iAv = λu + iλv = λx
Ax̄ = Au − iAv = λ̄u − iλ̄v = λ̄x̄

Solving for Au and Av and using the fact that λ + λ̄ = 2a and λ − λ̄ = 2ib, we can obtain Au = au − bv
and Av = bu + av. This implies
A [u  v] = [u  v] B,   where B = [a  b; −b  a]

(d) Since I = S^{−1}S = [S^{−1}u  S^{−1}v  S^{−1}S₁], we must have S^{−1}[u  v] = [e₁  e₂] = [I₂; 0]. Then

S^{−1}AS = S^{−1}A[u  v  S₁]
         = S^{−1}[ A[u  v]   AS₁ ]
         = S^{−1}[ [u  v]B   AS₁ ]
         = [ S^{−1}[u  v]B   S^{−1}AS₁ ]
         = [ [I₂; 0]B   S^{−1}AS₁ ]
         = [ [B; 0]   S^{−1}AS₁ ]
         = [B  F; 0  A₁]

(e) From the previous exercise, we see that pA (t) = pB (t)pA1 (t). But from 1.3.P31, we know that pB (t) =

(t−λ)(t− λ̄). Thus the multiplicity of λ and λ̄ as an eigenvalue of A1 must be one less than its multiplicity
as an eigenvalue of A.

1.3.P34

Suppose that A = S −1 BS and that A and B are nonsingular. Then A−1 = S −1 B −1 S and det A = det B, so

adj A = (det A)A−1 = (det A)S −1 B −1 S = S −1 (det B)B −1 S = S −1 adj(B)S

Now suppose A = S^{−1}BS, but allow for the possibility that A or B is singular. Let {εₙ}_{n=1}^∞ be a sequence of nonzero scalars converging to 0. Define Aₙ = A + εₙI and Bₙ = B + εₙI. Then S^{−1}BₙS = S^{−1}(B + εₙI)S = S^{−1}BS + εₙI = A + εₙI = Aₙ. Thus Aₙ and Bₙ are similar. Now from 1.2.17, there exists N large enough such that Aₙ and Bₙ are nonsingular for all n ≥ N. Thus adj Aₙ = S^{−1} adj(Bₙ)S for all n ≥ N. Since determinants are continuous functions and each element of the adjugate is a determinant, we must have adj Aₙ → adj A and adj Bₙ → adj B. This implies adj A = limₙ→∞ adj Aₙ = limₙ→∞ S^{−1} adj(Bₙ)S = S^{−1} adj(B)S.

1.3.P35

(a) If A ⊂ Mn is reducible, then there exists a non-trivial subspace W ⊂ Cn such that W is A-invariant.
Now define k = dim W and note that by assumption 1 ≤ k ≤ n − 1. Let w1 , · · · , wk denote a basis of W
and let w1 , · · · , wk , v1 , · · · , vn−k denote a basis for Cn . Consider the matrix
 
A0 = w1 ··· wk v1 ··· vn−k

If A0 ∈ A, then W is A0 -invariant. Take z = (x1 , · · · , xk , y1 , · · · , yn−k )0 ∈ W and observe that

A0 z = x1 w1 + · · · + xk wk + y1 v1 + · · · yn−k vn−k

If A0 z ∈ W , then we also must have

A0 z = a1 w1 + · · · + ak wk

for some a₁, · · · , a_k. But this implies aᵢ = xᵢ for i = 1, · · · , k and y_j = 0 for j = 1, · · · , n − k, for otherwise the columns of A₀ would be linearly dependent. But this means that all vectors in W must
have zeros in entries k + 1 through n. Now consider the matrix
 
A1 = v 1 w2 ··· wk w1 v2 · · · vn−k

Using the same argument, we can show that if A1 ∈ A then all vectors in W must have a zero in the first
entry. But this implies that if both A₀ and A₁ are in A, then dim W ≤ k − 1. This is a contradiction, so A ≠ Mₙ.

(b) Any non-trivial subspace W ⊂ Cn is 0-invariant. Thus {0} is reducible.

(c) Pick A0 ∈ A and w ∈ {Ax : A ∈ A} = Ax. Since w = A1 x for some A1 ∈ A, we have that A0 w = A0 A1 x.
But A0 A1 ∈ A because A is an algebra. This means that A0 w ∈ Ax.

Observe that 0 ∈ A because A is a subspace. Thus 0 = 0x ∈ Ax. Suppose that w0 and w1 are in Ax.
Then w0 = A0 x and w1 = A1 x for some A0 , A1 ∈ A. Pick scalars α0 and α1 . Then α0 w0 + α1 w1 =
α0 A0 x + α1 A1 x = (α0 A0 + α1 A1 )x. But α0 A0 + α1 A1 ∈ A because A is a subspace. Therefore
α0 w0 + α1 w1 ∈ Ax.

(d) Since Ax is an A-invariant subspace, if A is irreducible then Ax must be either {0} or Cⁿ. Since A ≠ {0}, we must have Ax ≠ {0}. But this means Ax = Cⁿ.

(e) Observe that 0 ∈ A because A is a subspace. Thus 0 = 0∗ x ∈ A∗ x. Suppose that w0 and w1 are in A∗ x.
Then w0 = A∗0 x and w1 = A∗1 x for some A0 , A1 ∈ A. Pick scalars α0 and α1 . Then α0 w0 + α1 w1 =
α0 A∗0 x + α1 A∗1 x = (α0 A∗0 + α1 A∗1 )x = (ᾱ0 A0 + ᾱ1 A1 )∗ x. But ᾱ0 A0 + ᾱ1 A1 ∈ A because A is a subspace.
Therefore α0 w0 + α1 w1 ∈ A∗ x.

(f) Fix x ∈ Cn and suppose there exists y ∈ Cn which is not in A∗ x. Let w1 , · · · , wk denote an orthonormal
basis for the subspace A∗ x and define z = y − hw1 , yiw1 − · · · − hwk , yiwk . From part (e), we know we
can find an A ∈ A such that x = Az. Define w = A∗ x and observe that hz, wi = 0 by construction. But
hz, wi = z ∗ A∗ Az, which implies Az = x = 0, a contradiction.

(g) Define d = min{rank A : A ∈ A, A 6= 0}. Suppose d > 1 and pick Ad with rank Ad = d. Since
rank Ad > 1, there exists i and j such that Ad ei and Ad ej are linearly independent. That is, Ad ei 6= 0
and Ad ei 6= λAd ej for all λ ∈ C. Now from the previous results, we can find B ∈ A such that
B(Ad ej ) = ei . But this implies Ad BAd ej = Ad ei 6= λAd ej for any λ ∈ C. Observe that for any
x ∈ Cn we have Ad B(Ad x) = (Ad B)Ad x. This means that the column space of Ad is Ad B-invariant,
so by 1.3.18 we can find an eigenvector of Ad B in the column space of Ad . That is, there exists
x ∈ Cn and λ0 ∈ C such that Ad x 6= 0 and Ad BAd x = λ0 Ad x. Since Ad BAd ej 6= λ0 Ad ej , we know
that Ad BAd − λ0 Ad 6= 0. Because A is an algebra, we also know that Ad BAd − λ0 Ad ∈ A. Also
observe that if Ad z = 0, then (Ad BAd − λ0 Ad )z = 0. Thus nullspace(Ad ) ⊆ nullspace(Ad BAd − λ0 Ad ).
We also know that A_d x ≠ 0 but (A_d BA_d − λ₀A_d)x = 0, so the inclusion is strict. But this implies
rank(Ad BAd − λ0 Ad ) = n − dim(nullspace(Ad BAd − λ0 Ad )) < n − dim(nullspace(Ad )) = n − (n − d) = d.
This contradicts the definition of d.

(h) Any rank 1 matrix can be written as uv ∗ , where u and v are not equal to zero. By the previous results,
we know we can find A ∈ A and B ∈ A such that Ay = u and B ∗ z = v. But then uv ∗ = Ayz ∗ B. Since
A, B, and yz ∗ are in A, uv ∗ ∈ A.

(i) Since every matrix can be written as the sum of rank 1 matrices and A is closed under sums, the result
follows.

1.3.P36

(a) The claim is false. Suppose


   
A = diag(1, 2, 3, 4),   B = [0  −1; 1  0] ⊕ [0  −1; 1  0]

Since A and B are both block upper triangular, all matrices in A(A, B) are block upper triangular.
Thus A(A, B) 6= M4 . However all eigenvectors of A are real while all eigenvectors of B have a nonzero
imaginary part, so A and B do not share a common eigenvector.

(b) The eigenvectors of A all take the form (x1 , 0)0 , x1 6= 0, while the eigenvectors of B all take the form
(0, x2 )0 , x2 6= 0. Thus the two sets of eigenvectors are disjoint.
Observe that AB = [1  0; 0  0] and BA = [0  0; 0  1]. Then A, B, AB, and BA form a basis of M₂.

1.3.P37

n = 2m:

Q^{−1}AQ = (1/2) [I_m  −K_m; I_m  K_m] [B  K_mCK_m; C  K_mBK_m] [I_m  I_m; −K_m  K_m]
         = (1/2) [B − K_mC   K_mCK_m − BK_m; B + K_mC   K_mCK_m + BK_m] [I_m  I_m; −K_m  K_m]
         = (1/2) [B − K_mC − (K_mC − B)   B − K_mC + K_mC − B; B + K_mC − (K_mC + B)   B + K_mC + K_mC + B]
         = [B − K_mC   0; 0   B + K_mC]
         = (B − K_mC) ⊕ (B + K_mC)

n = 2m + 1:

   
Q^{−1}AQ = (1/2) [I_m  0  −K_m; 0  √2  0; I_m  0  K_m] [B  K_my  K_mCK_m; x^T  α  x^TK_m; C  y  K_mBK_m] [I_m  0  I_m; 0  √2  0; −K_m  0  K_m]
         = (1/2) [B − K_mC   K_my − K_my   K_mCK_m − BK_m; √2 x^T   √2 α   √2 x^TK_m; B + K_mC   K_my + K_my   K_mCK_m + BK_m] [I_m  0  I_m; 0  √2  0; −K_m  0  K_m]
         = (1/2) [B − K_mC − (K_mC − B)   0   B − K_mC + K_mC − B; √2 x^T − √2 x^T   2α   √2 x^T + √2 x^T; B + K_mC − (K_mC + B)   2√2 K_my   B + K_mC + K_mC + B]
         = [B − K_mC   0   0; 0   α   √2 x^T; 0   √2 K_my   B + K_mC]
         = (B − K_mC) ⊕ [α   √2 x^T; √2 K_my   B + K_mC]
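The n = 2m block-diagonalization above can be checked numerically; the sketch below assumes NumPy, takes K_m to be the reversal matrix, and uses arbitrary illustrative B and C.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 3
B = rng.standard_normal((m, m))
C = rng.standard_normal((m, m))
K = np.fliplr(np.eye(m))                     # K_m: ones on the anti-diagonal
A = np.block([[B, K @ C @ K], [C, K @ B @ K]])
Q = np.block([[np.eye(m), np.eye(m)], [-K, K]]) / np.sqrt(2)

target = np.block([[B - K @ C, np.zeros((m, m))],
                   [np.zeros((m, m)), B + K @ C]])
print(np.allclose(np.linalg.inv(Q) @ A @ Q, target))   # True
```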

1.3.P38

(a)
 
B(t) = [ 1   t   ···   t
         t   1   ···   t
         ⋮        ⋱    ⋮
         t   ···   t   1 ]

From Theorem 1.3.22, the eigenvalues of tJₙ = tee^T (where e is the all-ones vector) are the eigenvalue of te^Te = nt together with n − 1 zeros. Observation 1.1.8 then implies that the eigenvalues of B(t) = (1 − t)Iₙ + tJₙ are nt + (1 − t) = (n − 1)t + 1 and 1 − t with multiplicity n − 1.

(b) If t 6= 1 and t 6= −(n − 1)−1 , then the eigenvalues of B(t) are all nonzero. By Observation 1.1.7, this
implies B(t) is nonsingular.

B(t)B(t)−1 = ((1 − t)In + tJn )(1 − t)−1 (In − t(1 + (n − 1)t)−1 Jn )


= In − t(1 + (n − 1)t)−1 Jn + t(1 − t)−1 Jn − t2 (1 − t)−1 (1 + (n − 1)t)−1 Jn2
= In + t(1 − t)−1 (1 + (n − 1)t)−1 (−(1 − t) + 1 + (n − 1)t − tn) Jn
= In
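The closed-form inverse of B(t) = (1 − t)Iₙ + tJₙ derived above can be verified numerically; the values of n and t below are illustrative (any t ≠ 1 and t ≠ −1/(n − 1) works).

```python
import numpy as np

n, t = 5, 0.3
J = np.ones((n, n))
B = (1 - t) * np.eye(n) + t * J
B_inv = (np.eye(n) - t / (1 + (n - 1) * t) * J) / (1 - t)
print(np.allclose(B @ B_inv, np.eye(n)))     # True
```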

1.3.P39
Pn
If tr A = 0, then i=1 λi = 0. If A is also diagonalizable, there exists n linearly independent eigenvectors of
A (Theorem 1.3.7). If all the eigenvalues are 0, then there are n linearly independent vectors in the nullspace
of A. But this implies rank A = n − dim(nullspace(A)) = 0.
Pn
Now suppose λi 6= 0 for some i. If λj = 0 for all j 6= i, then 0 = i=1 λi = λi 6= 0 which is a contradiction.
Thus there must be at least one other non-zero eigenvalue λj . But this implies there are at least two linearly
independent eigenvectors xi and xj associated with non-zero eigenvalues λi and λj , respectively. Since
Axi = λi xi 6= 0 and Axi = λi xi 6= 0 and λi xi and λj xj are linearly independent, there are at least two
linearly independent vectors in the column space of A. But this implies rank A ≥ 2.
If rank A = 1 and tr A = 0, then A must not be diagonalizable. Thus [0  1; 0  0] is not diagonalizable.

1.3.P40

(a) The set of diagonal matrices commute and contains infinitely many matrices.

(b) If I ∈ F, then for any A ∈ F with A 6= I we have AI + IA = A + A = 0. But this implies A = 0, so that
A2 = 0. This contradicts the assumption that no A anticommutes with itself. Thus if I ∈ F we must
have F = {I}.

Let A1 , · · · , Ak be a finite collection of matrices from F. Consider the linear relationship a1 A1 + · · · +
ak Ak = 0. Then we must have a1 A21 +a2 A1 A2 +· · ·+ak A1 Ak = 0 and a1 A21 +a2 A2 A1 +· · ·+ak Ak A1 = 0.
Adding these equations gives us a1 (A21 + A21 ) + a2 (A1 A2 + A2 A1 ) + · · · + ak (A1 Ak + Ak A1 ) = 0. Using
the assumption that ]A1 , Aj [= 0 for j 6= 1, this equation implies that 2a1 A21 = 0. Since A21 6= 0 by
assumption, this then implies a1 = 0. Repeating this argument for A2 , · · · , Ak , it readily follows that
a2 = a3 = · · · = ak = 0.

Since any finite collection of matrices from F is linearly independent, there can be at most n2 matrices
in F.

Since either F = {I} or F does not contain I, we conclude that F is finite in either case.

(c) In part (b) we showed that any collection of distinct pairwise anticommuting matrices is finite. If
Ai = Si Λi Si−1 with Λ diagonal, then A2i = Si Λi Si−1 Si Λi Si−1 = Si Λ2i Si−1 . Since Λ2i is also diagonal, we
can conclude that A2i is diagonalizable. Finally, observe that for i 6= j we have

A2i A2j = Ai (Ai Aj )Aj


= Ai (−Aj Ai )Aj
= −(Ai Aj )(Ai Aj )
= −(Aj Ai )(Aj Ai )
= −Aj (Ai Aj )Ai
= Aj Aj Ai Ai
= A2j A2i

where we repeatedly used the assumption that Ai Aj = −Aj Ai . Thus A2i and A2j commute.

1.3.P41

(a) Since D2 is nonsingular, the matrix D2 D1 A is similar to D1 AD2 by the similarity matrix D2 . Thus the
two matrices have the same eigenvalues.

(b) All matrices diagonally equivalent to A are also strictly triangular and therefore only have the single
eigenvalue 0.

(c) The discriminant of p_{A_z}(t) is

(tr A)2 − 4 det A = (a + zd)2 − 4z(ad − bc)


= a2 + 2azd + z 2 d2 − 4zad + 4bcz
= d2 z 2 + (2ad − 4(ad − bc))z + a2

The discriminant must equal zero for every nonzero z in order for the two eigenvalues of A_z to coincide for every such z. But this implies d² = 0, 2ad − 4(ad − bc) = 0, and a² = 0, which means that a = d = bc = 0. Since the diagonal entries are zero and at least one off-diagonal entry is zero, A must be strictly (upper or lower) triangular.

(d) Suppose A is singular and every principal minor of A of size 1 is 0. Then a = d = 0 and ad − bc = −bc = 0.
But this means A is strictly triangular and therefore by part (b) A is not diagonally equivalent to a

matrix with distinct eigenvalues.

Conversely suppose A is not diagonally equivalent to a matrix with distinct eigenvalues. Then part (c)
shows us that a = d = bc = 0, so A must be strictly triangular. But this implies det A = 0 and every
principal minor of A of size 1 is 0.

Section 1.4

Exercises

1. Consider the set V = {x : Ax = λx}. Observe that 0 ∈ V . For any scalars α and β, we have
A(αx + βy) = αAx + βAy = αλx + βλy = λ(αx + βy), so αx + βy ∈ V .

Observe that all nonzero elements of V are eigenvectors of A associated with the eigenvalue λ.

2. By the rank-nullity theorem, we must have rank(A − λI) + dim(nullspace(A − λI)) = n.

3. Suppose x ∈ V = {x ∈ Cn : Ax = λx}. Then A(Ax) = A(λx) = λ(Ax), so Ax ∈ V . Thus V is


A-invariant.

Observe that V = Cn is always A-invariant, but V is only an eigenspace when A = λI for some λ.

4. Suppose V is an A-invariant subspace and dim V ≥ 2. We know from 1.3.18 that some vector x ∈ V
is an eigenvector of A. Consider the space W = {αx : α ∈ C}. Since dim W = 1, W is strictly
contained in V . Also observe that for any y ∈ W we have Ay = A(αx) = α(Ax) = αλx, implying that
Ay ∈ W . Thus W is an A-invariant subspace. We can conclude that any A-invariant subspace with
dimension greater than 1 contains an A-invariant subspace of strictly lower dimension.

5. Theorem 1.2.18 says that rank(A − λI) ≥ n − k if λ has algebraic multiplicity k. But this means
dim(nullspace(A − λI)) = n − rank(A − λI) ≤ k, so the algebraic multiplicity is at least the geometric
multiplicity. If k = 1, then rank(A − λI) = n − 1. This implies dim(nullspace(A − λI)) = 1.

6. Theorem 1.3.7 says that if there are k linearly independent eigenvectors associated with λ, then A is similar to a matrix of the form [Λ  C; 0  D], where Λ = diag(λ, · · · , λ) ∈ M_k. But the characteristic polynomial of this matrix is (t − λ)^k p_D(t), implying that λ is an eigenvalue with algebraic multiplicity of at least k. Since similar matrices have the same eigenvalues, this implies that the algebraic multiplicity of λ for the matrix A is at least k. Since the geometric multiplicity must be at least 1, if the algebraic multiplicity of λ is 1 then the geometric multiplicity of λ must also be 1.

7. (a) pA1 (t) = (t − 1)(t − 2). The algebraic multiplicity is 1 for both eigenvalues, implying that the
geometric multiplicity is also 1 for both eigenvalues.

(b) pA2 (t) = (t − 1)2 . The algebraic multiplicity of λ = 1 is two. Both e1 and e2 are eigenvectors
associated with λ = 1, so the geometric multiplicity is also 2.

(c) p_{A₃}(t) = (t − 1)². The algebraic multiplicity of λ = 1 is two. However, the rank of A₃ − I = [0  1; 0  0] is 1, so dim(nullspace(A₃ − I)) = 1 and the geometric multiplicity of λ = 1 is one.

8. A1 is non-defective since the geometric and algebraic multiplicities are the same for both eigenvalues.
A2 is non-defective because the geometric and algebraic multiplicities of the single eigenvalue are the
same; A2 is derogatory because it has an eigenvalue with geometric multiplicity 2. A3 is defective
because the geometric multiplicity of its only eigenvalue is less than the algebraic multiplicity of this
eigenvalue; A3 is non-derogatory because the only eigenvalue has a geometric multiplicity of 1.

9. Suppose y ∗ A = λy ∗ for some y 6= 0. Then A∗ y = λ∗ y = λ̄y. We also have A∗ y = AT ȳ = λȳ.

10. Since S −1 AS = Λ, we must have S −1 A = ΛS −1 . We also know that S −1 = (S −∗ )∗ , so that


   
(S^{−∗})^∗ A = [y₁^∗; ⋮; yₙ^∗] A = Λ [y₁^∗; ⋮; yₙ^∗] = Λ(S^{−∗})^∗

But this implies yj∗ A = λj yj∗ for j = 1, · · · , n, so each yj is a left eigenvector of A associated with the
eigenvalue λj . We also know that
   
S^{−1}S = (S^{−∗})^∗ S = [y₁^∗; ⋮; yₙ^∗] [x₁ ··· xₙ] = [y₁^∗x₁  ···  y₁^∗xₙ; ⋮  ⋱  ⋮; yₙ^∗x₁  ···  yₙ^∗xₙ] = I

which implies y_j^∗x_j = 1 for j = 1, · · · , n, and y_i^∗x_j = 0 for i ≠ j.

1.4.P1

(a)

Ax = xy ∗ x = (y ∗ x)x = λx

so λ is an eigenvalue of A.

(b) In part (a), we showed that x is a right eigenvector associated with λ. Similarly, we have

y ∗ A = y ∗ xy ∗ = λy ∗

so y is a left eigenvector of A associated with λ.

(c)

pA (t) = det(tI − xy ∗ ) = det(tI) − y ∗ adj(tI)x = tn − tn−1 λ = tn−1 (t − λ)

If y ∗ z = 0, then Az = xy ∗ z = 0.

Since rank(A) = 1, dim(nullspace(A)) = n − 1. Thus the geometric multiplicity of 0 is n − 1.

A is diagonalizable if and only if the geometric and algebraic multiplicities of each eigenvalue are equal.
If λ = 0, the algebraic multiplicity of 0 is n while its geometric multiplicity is n − 1. This implies A is

not diagonalizable. If λ 6= 0, the algebraic multiplicity of 0 is n − 1 and its geometric multiplicity is n − 1.
Because the algebraic multiplicity of λ also matches its geometric multiplicity, A is diagonalizable.

1.4.P2

If A = −AT , we have

pA(t) = det(tI − A) = det((tI − A)^T) = det(tI − A^T) = det(tI + A) = (−1)^n det(−tI − A) = (−1)^n pA(−t)

Suppose pA (t) = (t − λ)k q(t), where q(λ) 6= 0. Then the above expression implies pA (t) = (−1)n (−t −
λ)k q(−t) = (t − (−λ))k (−1)n−k q(−t) = (t − (−λ))k q̃(t) where q̃(−λ) 6= 0. Thus −λ is an eigenvalue of A
with multiplicity k.

If n is odd, then at least one eigenvalue λi must satisfy λi = −λi . But this implies λi = 0, so A is singular.

Observe that any principal submatrix of a skew-symmetric matrix is itself skew-symmetric. Thus the previous
result implies that any principal submatrix of odd size must be singular.

Because A is rank principal, there must exist a nonsingular principal submatrix with rank equal to rank A.
Since all principal submatrices of odd size are singular, a nonsingular principal submatrix of even size must
have rank equal to rank A. But this implies rank A must be even.

1.4.P3

(a) Since T is upper triangular and e_n^∗T is simply the last row of T, we have e_n^∗T = t_{nn}e_n^∗. If t_{ii} ≠ t_{nn} for i = 1, · · · , n − 1, then the geometric multiplicity of t_{nn} must equal 1. By Theorem 1.4.12(b), we must have e_n^∗x ≠ 0. But this means the last entry of x cannot equal 0.

(b) Partition T = [T₁  T₂; 0  T₃], where T₁ ∈ M_k, T₂ ∈ M_{k,n−k}, and T₃ ∈ M_{n−k}. Let x₁ be an eigenvector of T₁ associated with t_{kk}. Consider the vector x = (x₁^T, 0^T)^T. Then Tx = (T₁x₁; 0) = (t_{kk}x₁; 0) = t_{kk}x, so x is an eigenvector of T associated with the eigenvalue t_{kk}. If t_{ii} ≠ t_{kk} for i = 1, · · · , k − 1, then by part (a) we know that the last entry in x₁ cannot be zero. But this implies that the k-th entry in x is nonzero.

1.4.P4

First observe that S −1 = S. Left-multiplying A by S changes the signs of elements in the odd rows, while
right-multiplying A by S changes the signs of elements in the odd columns. Thus these transformations
switch the signs of elements in even rows and odd columns or even columns and odd rows. Since all elements
in A are either in an even row and odd column or an even column and an odd row, S −1 AS equals −A.

Since S is non-singular, A and −A are similar. But notice that

p−A (t) = det(tI + A) = (−1)n det(−tI − A) = (−1)n pA (−t)

In 1.4.P2 we encountered another situation when pA (t) = (−1)n pA (−t). There we showed if λ is an eigenvalue
of A with multiplicity k, so is −λ. We also showed that if n is odd, at least one eigenvalue is 0.

1.4.P5

A (x; 0) = (A₁₁x; 0) = (λx; 0) = λ(x; 0)
(0  y^∗) A = (0  y^∗A₂₂) = (0  μy^∗) = μ(0  y^∗)

This result implies that (σ(A11 ) ∪ σ(A22 )) ⊂ σ(A). To show the reverse inclusion, suppose λ is an eigenvalue
of A associated with eigenvector z. Partition z = (z1T z2T )T so that it is conformable with A. Then we have
Az = (A₁₁z₁ + A₁₂z₂; A₂₂z₂) = (λz₁; λz₂) = λz

If z2 6= 0, then this equation implies that λ is an eigenvalue of A22 associated with z2 . If z2 = 0, then z1 6= 0
and this equations implies that λ is an eigenvalue of A11 associated with z1 .

1.4.P6

(a) Suppose y ∗ A = λy ∗ and Ax = λx with y > 0 and x > 0. From Theorem 1.4.7(a), we must have z ∗ x = 0
for any left eigenvector of A not associated with λ. But if z ≥ 0, then z ∗ x > 0. Likewise, Theorem
1.4.7(a) implies we must have y ∗ z = 0 for any right eigenvector of A not associated with λ. But if z ≥ 0,
then y^∗z > 0.

(b) Since y ∗ x > 0, if λ has geometric multiplicity of 1 then it must also have algebraic multiplicity of 1 by
Theorem 1.4.12.

1.4.P7

Since all the eigenvalues of A are distinct, there exist n linearly independent eigenvectors y₁, · · · , yₙ of A. Define S = [y₁ · · · yₙ] ∈ Mₙ. Since the columns of S form a basis of Cⁿ, we can write x^(0) = Sz for some z ∈ Cⁿ. Suppose that the last entry in z is zero, so that x^(0) = z₁y₁ + · · · + z_{n−1}y_{n−1}. Let y denote a left eigenvector of A associated with λₙ. By Theorem 1.4.7, we know that y^∗yᵢ = 0 for i = 1, · · · , n − 1. Thus y^∗x^(0) = z₁y^∗y₁ + · · · + z_{n−1}y^∗y_{n−1} = 0. We can conclude that zₙ ≠ 0 if x^(0) is not orthogonal to some left eigenvector associated with λₙ.

We know from Theorem 1.3.7 that A = SΛS^{−1}, where Λ = diag(λ₁, · · · , λₙ) contains the eigenvalues of A along its main diagonal. Observe that

x^(1) = (1/||x^(0)||) Ax^(0) = (1/||Sz||) SΛS^{−1}Sz = (1/||Sz||) SΛz.

Now suppose x^(k) = c_k SΛ^k z, where c_k = 1/||SΛ^{k−1}z||. Then

x^(k+1) = (1/||x^(k)||) Ax^(k) = (c_k/||x^(k)||) SΛS^{−1}SΛ^k z = (c_k/||c_k SΛ^k z||) SΛ^{k+1}z = (1/||SΛ^k z||) SΛ^{k+1}z.

By induction, x^(k) = (1/||SΛ^{k−1}z||) SΛ^k z for k = 1, 2, · · · .

Observe that SΛ^k z = λ₁^k z₁y₁ + · · · + λₙ^k zₙyₙ. Assume without loss of generality that λᵢ ≠ 0 for all i (if λᵢ = 0, the i-th term simply drops out of this equation). We have

x^(k) = SΛ^k z / ||SΛ^{k−1}z||
      = (λ₁^k / ||SΛ^{k−1}z||) z₁y₁ + · · · + (λₙ^k / ||SΛ^{k−1}z||) zₙyₙ.

Notice that (1/λᵢ^k) SΛ^{k−1}z = (λ₁/λᵢ)^k (z₁y₁/λ₁) + · · · + (λₙ/λᵢ)^k (zₙyₙ/λₙ). For i = 1, · · · , n − 1 the last term becomes arbitrarily large in magnitude as k → ∞ (because |λₙ| > |λᵢ|), and therefore (λᵢ^k / ||SΛ^{k−1}z||) zᵢyᵢ → 0 as k → ∞. For i = n, (1/λₙ^k) SΛ^{k−1}z → zₙyₙ/λₙ as k → ∞, so ||SΛ^{k−1}z|| / |λₙ|^{k−1} → |zₙ| ||yₙ||. If λₙ is positive and real, this implies x^(k) → λₙ (zₙ/|zₙ|)(yₙ/||yₙ||). Thus x^(k) converges to a nonzero multiple of yₙ and therefore converges to an eigenvector of A associated with λₙ. If λₙ^k/|λₙ|^k does not converge (i.e., λₙ is negative or non-real), then x^(k) does not converge. However, each term in the sequence x^(k) becomes arbitrarily close to a nonzero multiple of yₙ, and therefore arbitrarily close to an eigenvector associated with λₙ.
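This is the power iteration. A minimal sketch of it (assuming NumPy; the matrix, starting vector, and iteration count are illustrative, and the dominant eigenvalue is taken to be real and positive so the iterates converge) follows.

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.diag([5.0, 2.0, -1.0]) + 0.1 * rng.standard_normal((3, 3))
x = rng.standard_normal(3)
for _ in range(200):
    x = A @ x / np.linalg.norm(A @ x)        # x^(k+1) = A x^(k) / ||A x^(k)||
rayleigh = x @ A @ x                         # approximates the dominant eigenvalue
print(rayleigh, max(np.linalg.eigvals(A), key=abs))
```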

1.4.P8

 
AS = [Ay^{(n)}   AS₋₁]
   = [λₙy^{(n)}   AS₋₁]
   = [y^{(n)}   S₋₁] [ λₙe₁   S^{−1}AS₋₁ ]
   = S [λₙ  ∗; 0  B]

where S₋₁ ∈ M_{n,n−1} is the matrix composed of the last n − 1 columns of S and S^{−1}AS₋₁ = [∗; B]. Since A and [λₙ  ∗; 0  B] are similar, they have the same eigenvalues. The eigenvalues of A are λ₁, · · · , λₙ, and the eigenvalues of [λₙ  ∗; 0  B] are λₙ together with the eigenvalues of B. This implies that the eigenvalues of B are λ₁, · · · , λ_{n−1}.

1.4.P9

(a) Because the last row of A is a linear combination of the first n − 1 rows, we know that
(y; α) = (B^T; x^T) z

for some z ∈ C^{n−1}. But this implies y = B^Tz and α = x^Tz. We also have that

(z^T  −1) [B  x; y^T  α] = 0,

so (z^T, −1)^T is a left eigenvector of A associated with the eigenvalue 0.
(b) Define S = [I_{n−1}  0; −z^T  1] and observe that S^{−1} = [I_{n−1}  0; z^T  1]. Then

SAS^{−1} = [I_{n−1}  0; −z^T  1] [B  x; y^T  α] [I_{n−1}  0; z^T  1]
         = [B  x; 0  0] [I_{n−1}  0; z^T  1]
         = [B + xz^T  x; 0  0]

The eigenvalues of SAS^{−1} and A are the same, which implies that the eigenvalues of B + xz^T are λ₁, · · · , λ_{n−1}.

(c) If λ is an eigenvalue, then A − λI is singular. Thus one row of A − λI must equal a linear combination
of the other rows. This row can be made the last row of the matrix by an appropriate permutation.

1.4.P10

Let Λ = diag(λ1 , · · · , λn ) where λi is the eigenvalue associated with the ith column of T . Then we have

T^∗A = ΛT^∗  ⇒  A T^{−∗} = T^{−∗}Λ

1.4.P11

If A is an upper Hessenberg matrix, then


 
tI − A = [ t − a₁₁      F        ···          F
           −a₂₁      t − a₂₂     ···          F
             0        −a₃₂        ⋱           ⋮
             ⋮          ⋱          ⋱           F
             0         ···    −a_{n,n−1}   t − aₙₙ ]

If A is unreduced, then the subdiagonal entries are all non-zero. This implies that the first n − 1 columns
are linearly independent for any value of t. Thus if λ is an eigenvalue, we must have rank(A − λI) = n − 1
so that dim(nullspace(A − λI)) = 1. Therefore every eigenvalue has geometric multiplicity 1.

1.4.P12

(a) Let x be an eigenvector of A associated with λ. If one entry of x is zero, then the non-zero entries of
x provide a non-trivial linear combination of n − 1 columns of A − λI which equals zero. Thus at least
one list of n − 1 columns of A − λI is linearly dependent.

Let xi denote column i of A − λI. Suppose {x1 , · · · , xk−1 , xk+1 , · · · , xn } are linearly dependent. Since
there exists a non-trivial linear combination satisfying a1 x1 +· · ·+ak−1 xk−1 +ak+1 xk+1 +· · ·+an xn = 0,
the vector z = (a1 , · · · , ak−1 , 0, ak+1 , · · · , an )T is an eigenvector of A associated with eigenvalue λ.

(b) If no eigenvector of A associated with λ has a zero entry, then from part (a) we know that rank(A−λI) =
n − 1. But this implies dim(nullspace(A − λI)) = 1, so λ has geometric multiplicity 1.

1.4.P13

(a)

γy ∗ x = tr(γy ∗ x)
= tr(γxy ∗ )
= tr adj(λI − A)
= det(λIn−1 − A[−1, −1]) + det(λIn−1 − A[−2, −2]) + · · · + det(λIn−1 − A[−n, −n])
= En−1 (λI − A)
= Sn−1 (λI − A)
= (λ − λ2 ) · · · (λ − λn ) + (λ − λ)(λ − λ3 ) · · · (λ − λn ) + · · · + (λ − λ)(λ − λ2 ) · · · (λ − λn−1 )
= (λ − λ2 ) · · · (λ − λn )

(b) y ∗ x 6= 0 ⇐⇒ (λ−λ2 ) · · · (λ−λn ) 6= 0 ⇐⇒ λ 6= λi , i = 2, · · · , n ⇐⇒ the algebraic multiplicity of λ is 1


Qn
i=2 (λ−λi )
(c) If λ is simple, then part (a) implies that γ = y∗ x .

(d) Suppose no element of x or y is zero. In 1.4.P12 we showed that because no element of x equals zero,
every list of n − 1 columns of A − λI is linearly independent. Using a similar argument, one can show
that because no element of y equals zero, every list of n − 1 rows of A − λI is linearly independent. But
using 0.7.6.1, this implies that every principal submatrix of λI − A of size n − 1 has full rank. Thus
every principal minor of size n − 1 must be non-zero.

Now suppose every principal minor of size n − 1 is non-zero. Since the entries along the main
diagonal of adj(λI − A) are the principal minors of size n − 1 of λI − A, every main
diagonal entry of adj(λI − A) is non-zero.

Now suppose every main diagonal entry of adj(λI − A) is non-zero. Since adj(λI − A) = γxy^∗, this
implies γxᵢȳᵢ ≠ 0 for i = 1, · · · , n, where xᵢ and ȳᵢ denote the i-th entries of x and ȳ, respectively. But
this means that all the entries of x and y are non-zero, and hence all the entries of adj(λI − A) = γxy^∗ are nonzero.

1.4.P14

From 0.8.2.1, we have that

pA (t)I = det(A − tI)I = (A − tI) adj(A − tI) = adj(A − tI)(A − tI)

(a) If λ is an eigenvalue, we know that pA (λ) = 0. But this implies 0 = (A − λI) adj(A − λI), so that each
column x of adj(A − λI) satisfies 0 = (A − λI)x. But this implies that each non-zero column of adj(A − λI) is an eigenvector
associated with λ.

(b) If λ is an eigenvalue, we know that pA (λ) = 0. But this implies 0 = adj(A − λI)(A − λI), so that each
row of adj(A − λI) satisfies z(A − λI) = 0. But this implies that if z is non-zero, z̄ T is a left eigenvector
of A.

(c) If dim(nullspace(A − λI)) > 1, then rank(A − λI) < n − 1. This means that every determinant of size
n − 1 is zero, so adj(A − λI) = 0.

If dim(nullspace(A − λI)) = 1, then from 0.8.2 we know that rank(adj(A − λI)) = 1. But this means
that adj(A − λI) 6= 0.
!
d−λ −b
(d) Since adj(A − λI) = , this follows from parts (a) and (b).
−c a−λ

1.4.P15

det(A − λI + κzw∗ ) = det(A − λI) + κw∗ adj(A − λI)z


= κw∗ (γxy ∗ )z
= κγ(w∗ x)(y ∗ z)
6= 0

where the first equality follows from 0.8.5.10 and the second equality from Theorem 1.4.11. Theorem 1.4.12(b)
implies that y ∗ x 6= 0, so we can choose z = x.

1.4.P16

If x is an eigenvector of A associated with the eigenvalue λ, then

Ax = ( ax₁ + bx₂
       cx₁ + ax₂ + bx₃
       ⋮
       cx_{n−2} + ax_{n−1} + bxₙ
       cx_{n−1} + axₙ )  =  ( λx₁, λx₂, · · · , λx_{n−1}, λxₙ )^T  =  λx

This implies

bx₂ + (a − λ)x₁ = 0
bx₃ + (a − λ)x₂ + cx₁ = 0
⋮
bxₙ + (a − λ)x_{n−1} + cx_{n−2} = 0
(a − λ)xₙ + cx_{n−1} = 0

These equations can be written as a second-order difference equation:

bxk+1 + (a − λ)xk + cxk−1 = 0, k = 1, · · · , n, x0 = 0, xn+1 = 0

There are two possible cases:

• (a − λ)2 = 4bc : The solution is given by

xk = A1 rk + A2 krk

where r = −(a − λ)/(2b). Evaluating this expression at k = 0 gives A₁ = 0. Evaluating it at
k = n + 1 gives A₂ = 0. But this means that x = 0, which contradicts the assumption that x is an
eigenvector. Thus this case can be ignored.

• (a − λ)² ≠ 4bc: The solution is given by

x_k = A₁r₁^k + A₂r₂^k

where

r₁ = ( −(a − λ) + √((a − λ)² − 4bc) ) / (2b),   r₂ = ( −(a − λ) − √((a − λ)² − 4bc) ) / (2b)

Evaluating this expression at k = 0 gives A₁ + A₂ = 0. Evaluating it at k = n + 1 gives

0 = A₁r₁^{n+1} − A₁r₂^{n+1}  ⇒  (r₁/r₂)^{n+1} = 1  ⇒  r₁/r₂ = e^{2πiκ/(n+1)} for some κ ∈ {1, · · · , n}

Since r₁r₂ = c/b, we must have, for some κ ∈ {1, · · · , n},

r₁ = √(c/b) e^{πiκ/(n+1)},   r₂ = √(c/b) e^{−πiκ/(n+1)}

or

r₁ = −√(c/b) e^{πiκ/(n+1)},   r₂ = −√(c/b) e^{−πiκ/(n+1)}

We also know that r₁ + r₂ = (λ − a)/b, so λ = a + b(r₁ + r₂). Thus the eigenvalues of A can be written as

σ(A) = { a + √(bc) (e^{πiκ/(n+1)} + e^{−πiκ/(n+1)}) : κ ∈ {1, · · · , n} }
     = { a + 2√(bc) cos(πκ/(n + 1)) : κ ∈ {1, · · · , n} }
     = { a − 2√(bc) cos(πκ/(n + 1)) : κ ∈ {1, · · · , n} }

(the last two sets coincide because cos(π(n + 1 − κ)/(n + 1)) = −cos(πκ/(n + 1))).
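The eigenvalue formula just derived is easy to confirm numerically; the sketch below assumes NumPy and uses illustrative values of n, a, b, c with bc > 0 so that √(bc) is real.

```python
import numpy as np

n, a, b, c = 7, 1.0, 2.0, 0.5
A = a * np.eye(n) + b * np.eye(n, k=1) + c * np.eye(n, k=-1)
formula = [a + 2 * np.sqrt(b * c) * np.cos(np.pi * k / (n + 1)) for k in range(1, n + 1)]
print(np.allclose(np.sort(np.linalg.eigvals(A).real), np.sort(formula)))   # True
```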

1.4.P17

Substituting into the previous expression, we have


   
σ(A) = { 2 − 2 cos(πκ/(n + 1)) : κ ∈ {1, · · · , n} }

Using the identities cos 2x = cos²x − sin²x and sin²x + cos²x = 1, we have

2 − 2 cos(πκ/(n + 1)) = 2( sin²(πκ/(2(n + 1))) + cos²(πκ/(2(n + 1))) ) − 2( cos²(πκ/(2(n + 1))) − sin²(πκ/(2(n + 1))) )
                      = 4 sin²(πκ/(2(n + 1)))

Chapter 2

Section 2.1

Exercises

1.

x∗i xj = (yi∗ yi )−1/2 yi∗ yj (yj∗ yj )−1/2 = 0, i 6= j


x∗i xi = (yi∗ yi )−1 yi∗ yi = 1

2. Suppose {y1 , · · · , yn } is an orthogonal list of nonzero vectors in Cn and suppose that α1 y1 +· · ·+αn yn =
0. Define βi = αi (yi∗ yi )1/2 and xi = yi (yi∗ yi )−1/2 for i = 1, · · · , n. The linear relation can then be
written as β1 x1 + · · · + βn xn = 0, where {x1 , · · · , xn } is orthonormal. But Theorem 2.1.2 then implies
that β1 = · · · = βn = 0. Since all yi are nonzero, this means that α1 = · · · = αn = 0.

3. Suppose x1 , · · · , xk are orthogonal and order the vectors so that x1 , · · · , xm , m ≤ k, are nonzero. The
previous exercise implies that x1 , · · · , xm are linearly independent, which means that m ≤ n. If k > n,
we must then have k − m ≥ k − n ≥ 1, i.e. there are at least k − n zero vectors.

4. Let V be a subspace of Cn and let y1 , · · · , yk be a basis of V . Define x1 = y1 and z1 = (x∗1 x1 )−1/2 x1 .


Define the vectors z2 , · · · , zk recursively according to the following algorithm:


x_j = y_j − (z_{j−1}^∗y_j)z_{j−1} − · · · − (z₁^∗y_j)z₁
z_j = (x_j^∗x_j)^{−1/2} x_j

By construction, zj∗ zj = 1 for j = 1, · · · , k. Notice that the list {z1 } is orthonormal. Suppose that the
list of vectors {z1 , · · · , zj } is orthonormal. Pick i < j + 1. Then we have


z_{j+1}^∗z_i = (x_{j+1}^∗x_{j+1})^{−1/2} ( y_{j+1}^∗z_i − (y_{j+1}^∗z_j)z_j^∗z_i − · · · − (y_{j+1}^∗z₁)z₁^∗z_i )
            = (x_{j+1}^∗x_{j+1})^{−1/2} ( y_{j+1}^∗z_i − y_{j+1}^∗z_i ) = 0

But this implies that the list {z1 , · · · , zj , zj+1 } is orthonormal. By induction, it follows that {z1 , · · · , zk }
is orthonormal.
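A direct transcription of this Gram–Schmidt recursion (a minimal sketch assuming NumPy; the input basis is illustrative) produces an orthonormal list with the same span.

```python
import numpy as np

def gram_schmidt(Y):
    """Columns of Y are a basis; return Z with orthonormal columns and the same span."""
    Z = []
    for y in Y.T:
        x = y.astype(complex)
        for z in Z:
            x = x - np.vdot(z, y) * z        # subtract (z_i^* y) z_i
        Z.append(x / np.sqrt(np.vdot(x, x).real))
    return np.column_stack(Z)

Y = np.random.default_rng(5).standard_normal((5, 3))
Z = gram_schmidt(Y)
print(np.allclose(Z.conj().T @ Z, np.eye(3)))   # True: columns are orthonormal
```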

5. Observe that

(U ⊕ V )∗ (U ⊕ V ) = (U ∗ ⊕ V ∗ )(U ⊕ V )
= (U ∗ U ) ⊕ (V ∗ V )

Thus (U ⊕ V )∗ (U ⊕ V ) = In ⊕ Im ⇐⇒ U ∗ U = In and V ∗ V = Im .

6.
Q^∗Q = (1/2) [Iₙ  Iₙ; Iₙ  −Iₙ] [Iₙ  Iₙ; Iₙ  −Iₙ] = (1/2) [2Iₙ  0; 0  2Iₙ] = I_{2n}

See 1.3.P20(e) and 1.3.P21(a) for U and V , respectively.

7. (a) Observe that


Uθ^∗Uθ = [cos²θ + sin²θ   0; 0   sin²θ + cos²θ] = I₂

so Uθ is real orthogonal for any θ ∈ R. Conversely, let A = [a₁₁  a₁₂; a₂₁  a₂₂] be real orthogonal. This
implies that the following relationships hold:

a211 + a212 = a221 + a222 = a211 + a221 = a212 + a222 = 1


a11 a21 + a12 a22 = a11 a12 + a21 a22 = 0

We can use these equations to observe that

(a11 a22 − a12 a21 )2 = a211 a222 − 2a11 a12 a21 a22 + a212 a221
= a211 a222 + 2a211 a212 + a212 a221
= a211 (a212 + a222 ) + a212 (a211 + a221 )
= a211 + a212
=1

But this then implies that det A = a11 a22 − a12 a21 = ±1. If det A = 1, we have that

a211 a222 − a12 a21 a11 a22 = a11 a22 ⇒ a22 (a11 − a22 ) = 0
a12 a21 a11 a22 − a212 a221 = a12 a21 ⇒ a12 (a12 + a21 ) = 0

These equations then imply that a11 = a22 and a12 = −a21 . (If a22 = 0, then a12 = 1 and a11 = 0.
Likewise if a12 = 0, then a22 = 1, a21 = 0 and a11 = 1.) Similarly, if det A = −1 we can show that
a12 = a21 and a11 = −a22 .

Now suppose we define θ according to

θ = arctan(a₂₁/a₁₁)          if a₁₁ > 0
    arctan(a₂₁/a₁₁) + π      if a₁₁ < 0
    π/2                      if a₁₁ = 0 and a₂₁ > 0
    −π/2                     if a₁₁ = 0 and a₂₁ < 0

Then cos(θ) = a₁₁ and sin(θ) = a₂₁. In this case, Uθ equals

Uθ = [a₁₁  −a₂₁; a₂₁  a₁₁]

Now if det A = 1, we can use the fact that a₁₁ = a₂₂ and a₁₂ = −a₂₁ to conclude that A = Uθ. If det A = −1, then we can use the fact that a₁₂ = a₂₁ and a₁₁ = −a₂₂ to see that

Uθ = [a₁₁  −a₁₂; a₂₁  −a₂₂] = A [1  0; 0  −1]

This implies A = Uθ [1  0; 0  −1].
(b) Define φ as

φ = θ          if det A = 1
    π/2 − θ    if det A = −1

where θ is defined in part (a). Notice that if det A = 1, then Uφ = Uθ = A. If det A = −1, then cos(φ) = sin(θ) = a₂₁ and sin(φ) = cos(θ) = a₁₁. In this case Uφ equals

Uφ = [a₂₁  −a₁₁; a₁₁  a₂₁] = [a₂₁  a₂₂; a₁₁  a₁₂] = [0  1; 1  0] A

so that A = [0  1; 1  0] Uφ.
The geometric interpretation of these two transformations can be best understood using Figure 1. Let
a1 and a2 denote the column vectors of A. θ defined in part (a) measures the angle of rotation of
a1 counterclockwise from the positive x-axis. If det A = 1, then θ also equals the angle of rotation
of a2 so no further transformations are required (see Figure 1a). If det A = −1, we must perform
two transformations to recover the linear transformation defined by A. In part (a), we recover A by
first rotating clockwise by θ starting from the positive x-axis (the dashed lines in Figure 1b) and then
reflecting over the line spanned by a1 (the green dashed line in Figure 1b). In part (b), we recover A
by first rotating counterclockwise by θ starting from the positive y-axis (the dotted line in Figure 1b)
and then reflecting over the line y = x (the red dotted line in Figure 1b).

8. (U V )∗ U V = V ∗ U ∗ U V = V ∗ V = I

9. Let O(n, Cn ) denote the set of unitary matrices in Mn . The previous exercise illustrates that O(n, Cn )
is closed under the associative operation of matrix multiplication. I is the identity for matrix multi-
plication and since I ∗ I = I, we know that I ∈ O(n, Cn ). We also know from Theorem 2.1.4 that for
any U ∈ O(n, Cn ), U −1 = U ∗ ∈ O(n, Cn ).

10. Observe that



Uₖ = I₂              if k is even
     [0  1; 1  0]    if k is odd

Thus I₂ and [0  1; 1  0] are possible limits of subsequences of Uₖ.
11. By the same argument, the set of real orthogonal matrices in Mn (R) is closed and bounded. Thus this
set is a compact subset of Mₙ(R) (identified with R^{n²}), so any infinite sequence of real orthogonal matrices has a
convergent subsequence. Since the set is closed, the limit of the convergent subsequence must be a
real orthogonal matrix.

[Figure 1: Real orthogonal matrices in R² — panel (a) shows the column vectors a₁, a₂ of A when det A = 1; panel (b) shows them when det A = −1.]

12. Suppose U is upper triangular. Fix U [{i}, {j}], where j > i. Construct the partition
!
U [{1, · · · , i}] U [{1, · · · , i}, {i + 1, · · · , n}]
U=
U [{i + 1, · · · , n}, {1, · · · , i}] U [{i + 1, · · · , n}]

Now since U is upper triangular, U [{i + 1, · · · , n}, {1, · · · , i}] = 0. Lemma 2.1.10 then implies that
U [{1, · · · , i}, {i + 1, · · · , n}] = 0. But this implies U [{i}, {j}] = 0. Thus any element of U above the
main diagonal is zero, so U is lower triangular. Since U is both upper and lower triangular, U must
be diagonal.

13.
 
U^∗U agrees with Iₙ outside rows and columns i and j (it contains the diagonal blocks I_{i−1}, I_{j−i−1}, and I_{n−j}), while in rows and columns i and j its entries are

(U^∗U)_{ii} = cos²θ + sin²θ = 1,   (U^∗U)_{ij} = −cos θ sin θ + sin θ cos θ = 0,
(U^∗U)_{ji} = −cos θ sin θ + cos θ sin θ = 0,   (U^∗U)_{jj} = sin²θ + cos²θ = 1,

so U^∗U = Iₙ.

14. Since U (θ; i, j) is unitary, U (θ; i, j)−1 = U (θ; i, j)∗ . But since cos(−θ) = cos(θ) and sin(−θ) = − sin(θ),
we have that
 
U(θ; i, j)^{−1} = U(θ; i, j)^∗, which agrees with Iₙ except in rows and columns i and j, where its entries are cos(θ) in positions (i, i) and (j, j), sin(θ) in position (i, j), and −sin(θ) in position (j, i). These are exactly cos(−θ), −sin(−θ), and sin(−θ) in the corresponding positions, so

U(θ; i, j)^{−1} = U(−θ; i, j)

15.

Uw∗ = (I − 2(w∗ w)−1 ww∗ )∗ = I ∗ − 2(w∗ w)−1 (ww∗ )∗ = I − 2(w∗ w)−1 ww∗ = Uw

Uw∗ Uw = (I − 2(w∗ w)−1 ww∗ )(I − 2(w∗ w)−1 ww∗ )


= I − 4(w∗ w)−1 ww∗ + 4(w∗ w)−2 ww∗ ww∗
= I − 4(w∗ w)−1 ww∗ + 4(w∗ w)−1 ww∗
=I

16. Suppose Uw ∈ Mn (R). Since Uw = Uw∗ = (Ūw )T = UwT , Uw is symmetric. Since I = Uw∗ Uw = UwT Uw ,
Uw is real orthogonal.

Suppose x is non-zero and Uw x = λx. Then x = Uw∗ Uw x = Uw Uw x = λUw x = λ2 x. Since x is nonzero,


this implies λ2 = 1.

17. Suppose w∗ x = 0. Then Uw x = (I−2(w∗ w)−1 ww∗ )x = x. We also have Uw w = (I−2(w∗ w)−1 ww∗ )w =
w − 2w = −w.

18. From (0.8.5.11), we have that

det(Uw ) = det(I − 2(w∗ w)−1 ww∗ )


= det(I) − 2(w∗ w)−1 w∗ w
= −1

19. From 1.2.8, we have that

(t − 1)pUw (t) = (t − (1 − 2))(t − 1)n


⇒ pUw (t) = (t + 1)(t − 1)n−1

Thus the eigenvalues of Uw are −1, 1, · · · , 1. Since det Uw is the product of all the eigenvalues of Uw ,
we must have det Uw = −1.

20. If x = y, w ⊥ x. Thus by a previous exercise we know that Uw x = x = y. If x 6= y, notice that


(x − y)T (x + y) = xT x − y T y = 0. Using the previous results, we have

Uw w = −w ⇒ Uw x − Uw y = y − x
Uw (x + y) = x + y ⇒ Uw x + Uw y = x + y

Adding these two relations gives us Uw x = y. Notice that this argument works as long as ||x||₂ = ||y||₂.
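For real vectors of equal Euclidean length, the construction above is easy to check numerically; the sketch below assumes NumPy and uses illustrative random vectors (rescaled so that ||x||₂ = ||y||₂).

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.standard_normal(4)
y = rng.standard_normal(4)
y *= np.linalg.norm(x) / np.linalg.norm(y)   # rescale so ||x||_2 = ||y||_2
w = x - y
U = np.eye(4) - 2.0 * np.outer(w, w) / (w @ w)
print(np.allclose(U @ x, y))                 # True: U_w maps x to y
```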

21. Since the only eigenvalues of Uw are 1 and −1, there cannot exist a Uw such that Uw e1 = ie1 . Thus e1
cannot be transformed by a Householder matrix into ie1 , even though ||e1 ||2 = ||ie1 ||2 .

22. If y = eiθ e1 for some θ ∈ R, then U (y, e1 ) = eiθ I. But then the first column of U (y, e1 ) equals eiθ e1
which is equal to y.

If y ≠ e^{iθ}e₁ for all θ ∈ R, then define φ ∈ [0, 2π) such that y₁ = e₁^∗y = e^{iφ}|e₁^∗y| = e^{iφ}|y₁|. (If y₁ = 0, define φ = 0.) Define w = e^{iφ}e₁ − y = ( ((1 − |y₁|)/|y₁|) y₁, −y₂, · · · , −yₙ )^T. (If y₁ = 0, then w = (1, −y₂, · · · , −yₙ)^T.) Then w^∗w = 2(1 − |y₁|). Define U(y, e₁) = e^{iφ}Uw, where Uw = I − 2(w^∗w)^{−1}ww^∗. The first column of U(y, e₁) is given by

e^{iφ}e₁ − 2 · ( (1 − |y₁|)e^{iφ}ȳ₁ / (2(1 − |y₁|)|y₁|) ) w = e^{iφ}e₁ − w = y

23. Since x and ||x||2 e1 have the same length, Theorem 2.1.13 applies. This means that that U (||x||2 e1 , x)

is essentially Hermitian, unitary, and U (||x||2 e1 , x)x = ||x||2 e1 .

24. From Theorem 2.1.14(d), we know there exists unitary matrix Q and upper triangular matrix R with
non-negative diagonal entries such that A∗ = QR. Then A∗ A = R∗ Q∗ QR = R∗ R = LL∗ where
L = R∗ is lower triangular. If A is nonsingular, then Theorem 2.1.14(b) ensures that R (and thus L)
are unique.

2.1.P1

1 = det I = det U ∗ U = det U ∗ det U = | det U |2 ⇒ | det U | = 1

2.1.P2

(a)

U x = λx, x 6= 0 ⇒ x∗ x = x∗ U ∗ U x = (U x)∗ U x = λλ̄x∗ x = |λ|2 x∗ x, x∗ x > 0


⇒ |λ|2 = 1

(b)

U x = λx ⇒ λ̄U ∗ U x = λ̄λU ∗ x ⇒ U ∗ x = λ̄x ⇒ x∗ U = λx∗

2.1.P3

U ∗ U = diag(e−iθ1 , e−iθ2 , · · · , e−iθn ) diag(eiθ1 , eiθ2 , · · · , eiθn )


= diag(e−iθ1 eiθ1 , e−iθ2 eiθ2 , · · · , e−iθn eiθn )
= diag(1, 1, · · · , 1)
= In

Suppose U = diag(u1 , · · · , un ) and I = U ∗ U = diag(ū1 u1 , · · · , ūn un ) = diag(|u1 |2 , · · · , |un |2 ). This implies


|ui |2 = 1 for i = 1, · · · , n, which means that ui = eiθi for some θi ∈ [0, 2π).

2.1.P4

From 2.1.P3, we know that each element on the diagonal must have length equal to 1. Since the only two
real numbers with length equal to 1 are 1 and -1, the diagonal real orthogonal matrices only have 1s and
−1s on the diagonal.

2.1.P5

Since P T = P −1 , we know that P is real orthogonal. Suppose P and Q are both permutation matrices.
Then P Q has the same columns (rows) as P (Q) rearranged in a different order. But this means each column
and each row of P Q has exactly 1 non-zero element equal to 1. By definition, P Q must be a permutation
matrix. Since each row and column of P contains exactly one nonzero element equal to 1, the same is true
of P T . But this means P −1 = P T is also a permutation matrix. Finally, observe that each row and column
of the identity matrix contains exactly one non-zero element equal to 1. Thus I is a permutation matrix.
We can conclude that the set of permutation matrices is a group.

There are n! permutation matrices in Mn (the number of ways to rearrange the n columns of the identity
matrix).

2.1.P6

Let U ∈ M₃(R) be real orthogonal and denote the (i, j) entry of U by u_{ij}. Define the matrix U_{θ,φ,ψ} as
 
cos(θ) cos(φ) − sin(θ) cos(φ) − sin(φ)
Uθ,φ,ψ = sin(θ) cos(ψ) − cos(θ) sin(φ) sin(ψ) cos(θ) cos(ψ) + sin(θ) sin(φ) sin(ψ) − cos(φ) sin(ψ)
 

sin(θ) sin(ψ) + cos(θ) sin(φ) cos(ψ) cos(θ) sin(ψ) − sin(θ) sin(φ) cos(ψ) cos(φ) cos(ψ)

Define the angles θ, φ, and ψ according to


  
θ = arctan(−u₁₂/u₁₁)          if u₁₁ > 0
    arctan(−u₁₂/u₁₁) + π      if u₁₁ < 0
    −π/2                      if u₁₁ = 0 and u₁₂ ≥ 0
    π/2                       if u₁₁ = 0 and u₁₂ < 0

φ = arctan(−u₁₃/√(1 − u₁₃²))  if u₁₃² ≠ 1
    −π/2                      if u₁₃ = 1
    π/2                       if u₁₃ = −1

ψ = arctan(−u₂₃/(u₁₁u₂₂ − u₁₂u₂₁))        if u₁₁u₂₂ > u₁₂u₂₁
    arctan(−u₂₃/(u₁₁u₂₂ − u₁₂u₂₁)) + π    if u₁₁u₂₂ < u₁₂u₂₁
    −π/2                                  if u₁₁u₂₂ = u₁₂u₂₁ and u₂₃ ≥ 0
    π/2                                   if u₁₁u₂₂ = u₁₂u₂₁ and u₂₃ < 0

With these angles, we have that

cos(θ) = u₁₁/√(1 − u₁₃²),   sin(θ) = −u₁₂/√(1 − u₁₃²)
cos(φ) = √(1 − u₁₃²),        sin(φ) = −u₁₃
cos(ψ) = (u₁₁u₂₂ − u₁₂u₂₁)/√(1 − u₁₃²),   sin(ψ) = −u₂₃/√(1 − u₁₃²)

Then it is possible to show that

U = diag(1, 1, det U) U_{θ,φ,ψ}

2.1.P7

Since AB = I, for any y ∈ Cn we have ABy = y. But this means that for any y ∈ Cn , there exists an
x = By ∈ Cn such that Ax = y. Thus the dimension of the column space of A is n, so the dimension of the
null space of A is 0 by the rank-nullity theorem. Now observe that

A(AB − BA) = A(I − BA)        (AB = I by assumption)
           = A − ABA
           = A − A            (AB = I by assumption)
           = 0

Thus each column of AB − BA is in the nullspace of A. But this means that each column of AB − BA is
zero, so we must have AB − BA = 0. This implies that BA = AB = I

2.1.P8

(a) Suppose that U is a real orthogonal matrix. Then I = U T U = U ∗ U , so U is unitary.

Conversely, suppose U is a unitary, complex orthogonal matrix. Since U T U = I, we must have Ū T Ū = I


which means that Ū Ū T = I. We also know that U ∗ U = Ū T U = I, so multiplying both sides by Ū gives
us U = Ū .

(b) For any t ∈ R we have

A(t)T A(t) = cosh2 (t)I + (i sinh(t) cosh(t))(S T + S) − sinh2 (t)S T S


= (cosh2 (t) − sinh2 (t))I
=I

By part (a), a complex orthogonal matrix is unitary if and only if it is real. Thus A(t) is unitary if and
only if sinh t = 0, which occurs if and only if t = 0.

(c) We can make the absolute value of each element of A(t) arbitrary large by taking t to +∞ or −∞. Since
each A(t) is a complex orthogonal matrix, the set of these matrices is not bounded.

(d) Suppose A and B are complex orthogonal. Then (AB)T AB = B T AT AB = B T B = I, so AB is complex


orthogonal. Since I T I = I, I is complex orthgonal. If AT A = I, then AAT = (AT )T AT = I. Thus
A−1 = AT is complex orthogonal.

(e) Since 1 = det AT A = det AT det A = (det A)2 , we must have | det A| = 1.

The characteristic polynomial of A(t) is given by

pA(t) (u) = det(uI − A(t))


= det((u − cosh(t))I − i sinh(t)S)
= (u − cosh(t))2 − sinh(t)2
= u2 − 2 cosh(t)u + 1

which has roots λ1 = cosh(t) + i sinh(t) and λ2 = cosh(t) − i sinh(t). But |λ1 | = |λ2 | = cosh2 (t) +
sinh2 (t) 6= 1 unless t = 0.

(f) Assume that AT A = I. Then ĀT Ā = I, so Ā is complex orthogonal. We showed in part (d) that AT is
also complex orthogonal. Combining these results, we have that A∗ = ĀT is complex orthogonal.

The columns and rows of A are orthogonal if and only if A is unitary, which we know from part (a)
occurs if and only if A is real.

(g) Suppose U = diag(u1 , · · · , un ) and U T U = diag(u21 , · · · , u2n ) = I. Then u2i = 1 for i = 1, · · · , n, which
implies that ui ∈ {−1, 1} for i = 1, · · · , n. Comparing with 2.1.P4, we see that the set of diagonal
complex orthogonal matrices is the same as the set of real orthogonal matrices.

(h) By part (a), we know that if A is complex orthogonal and unitary if and only if A is real orthogonal.

2.1.P9

Suppose U is unitary. Then Ū ∗ Ū = U T Ū = (Ū T U )T = (U ∗ U )T = I T = I, so Ū is unitary. We also have


(U T )∗ U T = Ū U T = (U Ū T )T = (U U ∗ )T = I T = I, so U T is unitary. Since U T is unitary, U ∗ = Ū T is
unitary.

2.1.P10

If U is orthogonal, then x∗ y = x∗ U ∗ U y = (U x)∗ U y. Thus x∗ y = 0 if and only if (U x)∗ U y = 0.

2.1.P11

A−1 = −AT if and only if I = −AT A = (iA)T (iA). Similarly, A−1 = eiθ AT if and only if I = eiθ AT A =
(eiθ/2 A)T eiθ/2 A. If θ = 0, this expression says A−1 = AT if and only if A is orthogonal. If θ = π, this
expression says A−1 = −AT if and only if A is skew-orthogonal.

2.1.P12

Suppose A = SU S −1 where U is unitary. Then A−1 = SU −1 S −1 = SU ∗ S −1 = SS ∗ A∗ S −∗ S −1 , so A−1 is


similar to A∗ by the similarity matrix SS ∗ .

2.1.P13

From 2.1.P12, we know that the set of matrices which are similar to unitary matrices is contained in the set
of matrices A for which A−1 is similar to A∗ .

Consider the matrix A = diag(2, 1/2). The eigenvalues of A are 2 and 1/2, which implies that A cannot
be similar to a unitary matrix (from 2.1.P2, we know that the eigenvalues of a unitary matrix must have
absolute value equal to 1). But observe that
A^{−1} = diag(1/2, 2) = [0  1; 1  0] diag(2, 1/2) [0  1; 1  0] = SA^∗S^{−1}

2.1.P14

This follows immediately from 2.1.P8(h).

2.1.P15

Define P = I[∅c , {α, αc }] as the permutation matrix which orders the columns of In according to the list
{α, αc }. For example, if n = 3 and α = {1, 3} then
 
P = [1  0  0; 0  0  1; 0  1  0]

Using this definition, we have that


!
U [α] U [α, αc ]
= PT UP
U [αc , α] U [αc , αc ]

Since U and P are unitary, P T U P is unitary. The result then follows directly from Lemma 2.1.10.

2.1.P16

(a) We have that

Uw Ux = (I − 2(wT w)−1 wwT )(I − 2xxT )


= I − 2(wT w)−1 wwT − 2xxT + 4(wT x)(wT w)−1 wxT

Observe that

wxT = (x + y)xT = xxT + yxT


wT x = (xT + y T )x = 1 + y T x
wT w = (xT + y T )(x + y) = 2(1 + y T x)

Plugging these expressions into the previous gives us

Uw Ux = I − 2(wT w)−1 wwT − 2xxT + 2(xxT + yxT ) = I − 2(wT w)−1 wwT + 2yxT = Px,y

(b)
det Px,y = det Uw Ux = det Uw det Ux = (−1)(−1) = 1

(c)

Px,y x = x − 2(wT x)(wT w)−1 w + 2y


= x − w + 2y
=y
Px,y y = y − 2(wT y)(wT w)−1 w + 2(xT y)y
= y − w + 2(xT y)y
= −x + 2(xT y)y

(d) Suppose z ⊥ x and z ⊥ y. Then z ⊥ w so that

Px,y z = z − 2(wT z)(wT w)−1 w + 2(xT z)y


=z

(e) If z ∈ (span{x, y})⊥ , then z ⊥ x and z ⊥ y. From part (d), we know that Px,y z = z, so Px,y functions
as the identity on (span{x, y})⊥ .

In part (c), we showed that P_{x,y}x = y. The matrix representation of the linear transformation T : span{x, y} → span{x, y} defined by Tz = P_{x,y}z, with respect to the orthonormal basis { x, (y − (x^Ty)x)/√(1 − (x^Ty)²) }, is given by

[ x^Ty   −√(1 − (x^Ty)²);   √(1 − (x^Ty)²)   x^Ty ]
This matrix is real orthogonal with determinant equal to 1, implying that the transformation T is a
proper rotation.

(f) Observe that

Px,y x × Px,y y = y × (−x + 2(xT y)y)


= −y × −x + 2(xT y)y × y
=x×y

Thus Px,y does not change the cross-product of x and y.

Now suppose A is a proper rotation matrix satisfying Ax = y and Ax × Ay = x × y. Using the fact that
M x × M y = M (x × y) for any proper rotation matrix, we have that

x × y = Ax × Ay = A(x × y)
Define z = (y − (x^Ty)x)/√(1 − (x^Ty)²) and note that z ⊥ x and z ⊥ (x × y). Because A is orthogonal, we know that Az ⊥ A(x × y) and Az ⊥ Ax. This implies that Az ⊥ (x × y) and Az ⊥ y. But this means that Az = c(x − (x^Ty)y) for some c ∈ R. We also know that ||Az||₂ = ||z||₂ = 1, which implies that c = (1 − (x^Ty)²)^{−1/2}. Define z̃ = (x − (x^Ty)y)/√(1 − (x^Ty)²). We then have the system of equations

A [x  z  x × y] = [y  z̃  x × y]   ⇒   A = [y  z̃  x × y] [x  z  x × y]^{−1}

which uniquely determines A in terms of x and y. We already know that the matrix P_{x,y} is a solution to this system and therefore must be the unique solution.

(g) Using the matrix determinant lemma, we have that

p_{P_{x,y}}(t) = det( (t − 1)I + 2(w^Tw)^{−1}ww^T − 2yx^T )
             = det( (t − 1)I + [2(w^Tw)^{−1}w   −2y] [w^T; x^T] )
             = (t − 1)^n det( I₂ + (2/(t − 1)) [w^T; x^T] [(w^Tw)^{−1}w   −y] )
             = (t − 1)^n det [ 1 + 2/(t − 1)                 −(2/(t − 1)) w^Ty
                               (2/(t − 1)) x^Tw/(w^Tw)        1 − (2/(t − 1)) x^Ty ]
             = (t − 1)^n det [ 1 + 2/(t − 1)       −(2/(t − 1))(1 + x^Ty)
                               1/(t − 1)            1 − (2/(t − 1)) x^Ty ]
             = (t − 1)^n ( 1 + (2/(t − 1))(1 − x^Ty)(1 + 1/(t − 1)) )
             = (t − 1)^{n−2} ( (t − 1)² + 2(1 − x^Ty)t )
             = (t − 1)^{n−2} ( t² − 2(x^Ty)t + 1 )
             = (t − 1)^{n−2} ( t − (x^Ty + i√(1 − (x^Ty)²)) ) ( t − (x^Ty − i√(1 − (x^Ty)²)) )

If we define cos θ = x^Ty, then √(1 − (x^Ty)²) = √(1 − cos²θ) = ±sin θ. Thus x^Ty ± i√(1 − (x^Ty)²) = cos θ ± i sin θ = e^{±iθ}.

2.1.P17

Let a1 , · · · , am denote the column vectors of A. Since rank A = m, these column vectors are linearly
independent. Define x1 = a1 and q1 = (x∗1 x1 )−1/2 x1 and then define the vectors q2 , · · · , qm recursively
according to the following algorithm:


x_j = a_j − (q_{j−1}^∗a_j)q_{j−1} − · · · − (q₁^∗a_j)q₁
q_j = (x_j^∗x_j)^{−1/2} x_j

It was shown in exercise 4 that q1 , · · · , qm are orthonormal. We can write these equations as

a₁ = (x₁^∗x₁)^{1/2} q₁
a_j = (x_j^∗x_j)^{1/2} q_j + (q_{j−1}^∗a_j)q_{j−1} + · · · + (q₁^∗a_j)q₁

Define Q ∈ Mn,m as the matrix with columns q1 , · · · , qm . Then

A = QR

where

R = [ (x₁^∗x₁)^{1/2}   q₁^∗a₂          ···        q₁^∗a_m
            0        (x₂^∗x₂)^{1/2}    ···        q₂^∗a_m
            ⋮                           ⋱        q_{m−1}^∗a_m
            0            ···            0       (x_m^∗x_m)^{1/2} ]
We know from Theorem 2.1.14 that the QR decomposition is unique when rank A = m. Therefore this
factorization will be the same as the one obtained using Householder matrices.

2.1.P18

(a) By construction, {q1 , · · · , qn } is orthonormal. This implies {q1 , · · · , qk } ⊆ {q1 , · · · , qn } is orthonormal.


Because A = QR and R is upper triangular, ak can be written as a linear combination of {q1 , · · · , qk }.
This implies each vector in {a1 , · · · , ak } can be written as a linear combination of {q1 , · · · , qk }, allowing
us to conclude that {q1 , · · · , qk } is a basis of span{a1 , · · · , ak }.

(b) Let V = span{a1 , · · · , ak−1 }. Define bk = r1k q1 + · · · + rk−1,k qk−1 . Then ak = bk + rkk qk with bk ∈ V
and qk ∈ V ⊥ . For any v ∈ V , the distance between ak and v is given by
||a_k − v|| = ||a_k − b_k + b_k − v|| = ||r_{kk}q_k + b_k − v|| = √( r_{kk}² + ||b_k − v||² )

This distance is minimized when bk = v, in which case the distance between ak and v equals rkk .

2.1.P19

(a) We know that


Y = QR−∗ = QRR−1 R−∗ = XR−1 R−∗

which implies that X = Y R∗ R. Since each column in X can be written as a linear combination of the
columns of Y , the columns of Y span the column space of X. Since the column space of X has dimension
m, the m columns of Y must be a basis of the column space of X.

We also have that


Y ∗ X = R−1 Q∗ QR = R−1 R = I

(b) Suppose that Z = XA and Z ∗ X = I. Then Z ∗ X = A∗ X ∗ X = I which implies that A∗ = (X ∗ X)−1 =


(R∗ Q∗ QR)−1 = (R∗ R)−1 = R−1 R−∗ (rank X ∗ X = rank X = m, so X ∗ X is invertible). But this implies
Z = XR−1 R−∗ = QRR−1 R−∗ = QR−∗ = Y .

(c) We know that X^∗Y = (Y^∗X)^∗ = I_m^∗ = I_m and that each column of X is a linear combination of the
columns of Y . But because dual bases are unique, this means the columns of X must be the dual basis
of the columns of Y .

(d) If n = m, then X is invertible and (X −∗ )∗ X = X −1 X = I. Since the columns of X span Cn , the


columns of X −∗ are in the column space of X. Thus the columns of X −∗ are a basis of Cn dual to the
columns of X.

2.1.P20

From 0.8.2. and Theorem 2.1.4(b), we have that adj U = (det U )U −1 = (det U )U ∗ . Thus (adj U )∗ adj U =
| det U |2 U U ∗ = I since | det U | = 1 by 2.1.P1.

2.1.P21

If U is complex orthogonal, then U −1 = U T . From the law of complementary nullities, we have that
T T
nullity U11 = nullity U22 and nullity U12 = nullity U21 . The rank-nullity theorem then implies that k −
T T
rank U11 = n − k − rank U22 and k − rank U12 = k − rank U21 . Using the fact that rank A = rank AT for any
A ∈ Mm,n , we have rank U11 = rank U22 − n + 2k and rank U12 = rank U21 .

Using the same argument as in exercise 12, we can conclude that a complex orthogonal matrix is upper
triangular if and only if it is diagonal. From 2.1.P8(g), we know that diagonal complex orthogonal matrices
have only 1’s and -1’s on the diagonal.

2.1.P22

X and Y have the same column space if and only if X = Y U for some U ∈ Mm . If X = Y U , then
Im = X ∗ X = U ∗ Y ∗ Y U = U ∗ U . Thus X and Y have the same column space if and only if X = Y U for
some unitary U ∈ Mm .

2.1.P23

We have
| det A| = | det(QR)| = | det Q det R| = | det Q|| det R| = det R = r11 · · · rnn

since | det Q| = 1 by 2.1.P1 and rii ≥ 0 by Theorem 2.1.14(a). We also have that

A∗ A = R∗ Q∗ QR = R∗ R
But this implies ||ai||_2^2 = ||ri||_2^2, where ri denotes the i-th column of R (these are the i-th diagonal
entries of A^∗A and R^∗R). Hence ||ai||_2 = ||ri||_2 = \sqrt{|r_{1i}|^2 + · · · + |r_{ii}|^2} ≥ \sqrt{|r_{ii}|^2} = r_{ii}.
Equality only holds if r_{1i} = · · · = r_{i−1,i} = 0, which implies that ai = rii qi. We can conclude that
|det A| = \prod_{i=1}^n r_{ii} ≤ \prod_{i=1}^n ||ai||_2, with equality if and only if either ai = 0 for some i
(in which case both sides are zero) or ai = rii qi for all i. In the second case we have a_i^∗ a_j = rii rjj q_i^∗ q_j = 0
for i ≠ j, so the columns of A are orthogonal.
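
Both statements are easy to check numerically (our illustration, assuming NumPy):

    import numpy as np

    # Hadamard's inequality from 2.1.P23: |det A| <= product of column norms,
    # with equality when the columns are orthogonal.
    A = np.random.rand(4, 4)
    print(abs(np.linalg.det(A)) <= np.prod(np.linalg.norm(A, axis=0)) + 1e-12)

    Q, _ = np.linalg.qr(np.random.rand(4, 4))        # orthonormal columns
    D = Q @ np.diag([1.0, 2.0, 3.0, 4.0])            # orthogonal (not orthonormal) columns
    print(np.isclose(abs(np.linalg.det(D)), np.prod(np.linalg.norm(D, axis=0))))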

2.1.P24

(a)

\operatorname{per} E = \sum_{\sigma} \prod_{i=1}^{3} e_{i\sigma(i)} = \sum_{\sigma} 1 = 3! = 6

(b)

|\det B| \le \prod_{i=1}^{3} \|b_i\|_2 = 3^{3/2} < 6 = \operatorname{per} E
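
Both the permanent and the bound are easy to verify numerically (our illustration; the helper permanent is ours, not from the text):

    import itertools
    import numpy as np

    def permanent(M):
        # Permanent from the definition: sum over permutations of products of entries.
        n = M.shape[0]
        return sum(np.prod([M[i, s[i]] for i in range(n)])
                   for s in itertools.permutations(range(n)))

    E = np.ones((3, 3))
    print(permanent(E))        # 6 = 3!
    print(3 ** 1.5)            # ~5.196, the Hadamard bound on |det B| from part (b)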

2.1.P25

From 0.8.1, we have that I_{\binom{n}{r}} = C_r(I_n) = C_r(U^∗U) = C_r(U^∗)C_r(U) = C_r(U)^∗C_r(U) for any r ≤ n,
so the r-th compound matrix C_r(U) is unitary.
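
A small numerical illustration (ours; the helper compound builds C_r(A) directly from its definition as the matrix of r-by-r minors):

    import itertools
    import numpy as np

    def compound(A, r):
        # r-th compound matrix: determinants of all r-by-r submatrices, with the
        # r-element row and column index sets taken in lexicographic order.
        n = A.shape[0]
        idx = list(itertools.combinations(range(n), r))
        return np.array([[np.linalg.det(A[np.ix_(rows, cols)]) for cols in idx]
                         for rows in idx])

    U, _ = np.linalg.qr(np.random.rand(4, 4) + 1j * np.random.rand(4, 4))  # unitary U
    C = compound(U, 2)
    print(np.allclose(C.conj().T @ C, np.eye(len(C))))   # C_2(U) is unitary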

2.1.P26

(a) Let a1 ∈ Cn be the first column of A. Define s1 = ||a1 ||2 and θ1 = Arg a11 . If s1 = 0, take w1 = e1 .
If a1 = eiθ1 e1 , take w1 = a1 . Otherwise, take w1 = e−iθ1 a1 − s1 e1 . Define H1 = In − 2(w1∗ w1 )−1 w1 w1∗ .
Then it is easily shown that H_1 a_1 = r_1 e_1, where

r_1 = \begin{cases} 0 & \text{if } s_1 = 0 \\ -e^{i\theta_1} & \text{if } a_1 = e^{i\theta_1} e_1 \\ e^{i\theta_1} s_1 & \text{otherwise} \end{cases}

Note that unlike the construction used for the QR decomposition, r1 is not guaranteed to be real or
positive.

Next, let (b_2, a_2^T)^T denote the second column of H_1 A, where b_2 ∈ C and a_2 ∈ C^{n−1}. Define s_2 = ||a_2||_2
and θ_2 = Arg (a_2)_1. If s_2 = 0, take w_2 = e_2. If a_2 = e^{iθ_2} e_1, then take w_2 = (0, a_2^T)^T. Otherwise, take
w_2 = (0, (e^{−iθ_2} a_2 − s_2 e_1)^T)^T, where e_1 ∈ C^{n−1}. Define H_2 = I_n − 2(w_2^∗ w_2)^{−1} w_2 w_2^∗. Then we have

H_2 H_1 A = \begin{bmatrix} r_1 & \star & \star \\ 0 & r_2 & \star \\ 0 & 0 & A_3 \end{bmatrix}

where

r_2 = \begin{cases} 0 & \text{if } s_2 = 0 \\ -e^{i\theta_2} & \text{if } a_2 = e^{i\theta_2} e_1 \\ e^{i\theta_2} s_2 & \text{otherwise} \end{cases}

Continuing in this fashion, we can construct Householder matrices H1 , · · · , Hn−1 such that

Hn−1 · · · H1 A = R

where R is upper triangular. Since each Householder matrix is unitary and Hermitian, we have

A = H1 H2 · · · Hn−1 R
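
Here is a minimal NumPy sketch of the construction in part (a) (ours, not from the text; the degenerate case s_j = 0 is simply skipped rather than handled with w_j = e_j, which yields the same triangular form):

    import numpy as np

    def householder_qr(A):
        # Householder triangularization as in 2.1.P26(a): A = H_1 ... H_{n-1} R with
        # each H_j Hermitian and unitary.  The diagonal of R is not forced to be
        # real or positive.  Sketch only, for square A.
        R = np.array(A, dtype=complex)
        n = R.shape[0]
        Q = np.eye(n, dtype=complex)                 # accumulates H_1 H_2 ... H_{n-1}
        for j in range(n - 1):
            x = R[j:, j].copy()
            s = np.linalg.norm(x)
            if s == 0:
                continue                             # column already zero below the diagonal
            e1 = np.zeros(n - j, dtype=complex); e1[0] = 1.0
            theta = np.angle(x[0])
            w = np.exp(-1j * theta) * x - s * e1     # w_j = e^{-i theta_j} a_j - s_j e_1
            if np.linalg.norm(w) < 1e-14 * s:        # a_j is already a multiple of e_1
                w = x
            H = np.eye(n - j) - 2 * np.outer(w, w.conj()) / (w.conj() @ w)
            R[j:, :] = H @ R[j:, :]
            Q[:, j:] = Q[:, j:] @ H                  # H is Hermitian, so no conjugation needed
        return Q, R

    A = np.random.rand(4, 4) + 1j * np.random.rand(4, 4)
    Q, R = householder_qr(A)
    print(np.allclose(Q @ R, A), np.allclose(np.tril(R, -1), 0))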

(b) Using the construction in part (a), we can write

Hn−1 · · · H1 U = R

where R is upper triangular. Since each Householder matrix is unitary, the left-hand side must be a
unitary matrix. But this implies R is unitary. From Exercise 12, we know that an upper triangular
matrix which is unitary must be diagonal.

(c) Let q1 be the first column of Q and define



w_1 = \begin{cases} e_n & \text{if } q_1 = e_1 \\ q_1 - e_1 & \text{otherwise} \end{cases}

Let H_1 = I_n − 2(w_1^T w_1)^{−1} w_1 w_1^T. Then

H_1 Q = \begin{bmatrix} 1 & 0 \\ 0 & Q_2 \end{bmatrix}

This follows from Lemma 2.1.10 and the fact that ||q1 ||2 = 1. But since the second column of H1 Q must
have norm equal to 1, we know that ||q2 || = 1 where q2 ∈ Rn−1 is the first column of Q2 . Define

w_2 = \begin{cases} (0, e_{n-1}^T)^T & \text{if } q_2 = e_1 \\ (0, (q_2 - e_1)^T)^T & \text{otherwise} \end{cases}

where e_1, e_{n−1} ∈ R^{n−1}, and let H_2 = I_n − 2(w_2^T w_2)^{−1} w_2 w_2^T. Then

H_2 H_1 Q = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & Q_3 \end{bmatrix}

Continuing in this fashion, we obtain the decomposition

H_{n-1} \cdots H_1 Q = \begin{bmatrix} 1 & & & \\ & \ddots & & \\ & & 1 & \\ & & & c \end{bmatrix}

To determine the value of c, take determinants of both sides:

c = (−1)n−1 det Q = ±1

2.1.P27

If x_n = x_{n−1} = 0, choose θ_1 = 0; then U(θ_1; n − 1, n) = I_n, so x^{(1)} = x and x^{(1)}_n = x^{(1)}_{n−1} = 0.
Otherwise, pick θ_1 such that

\cos\theta_1 = \frac{x_{n-1}}{\sqrt{x_n^2 + x_{n-1}^2}}, \qquad \sin\theta_1 = -\frac{x_n}{\sqrt{x_n^2 + x_{n-1}^2}}

Then we have

x^{(1)} = U(\theta_1; n-1, n)\, x
= \begin{bmatrix}
    I_{n-2} & & \\
    & \frac{x_{n-1}}{\sqrt{x_n^2 + x_{n-1}^2}} & \frac{x_n}{\sqrt{x_n^2 + x_{n-1}^2}} \\
    & -\frac{x_n}{\sqrt{x_n^2 + x_{n-1}^2}} & \frac{x_{n-1}}{\sqrt{x_n^2 + x_{n-1}^2}}
  \end{bmatrix}
  \begin{bmatrix} x_1 \\ \vdots \\ x_{n-2} \\ x_{n-1} \\ x_n \end{bmatrix}
= \begin{bmatrix} x_1 \\ \vdots \\ x_{n-2} \\ \frac{x_{n-1}^2 + x_n^2}{\sqrt{x_n^2 + x_{n-1}^2}} \\ \frac{-x_n x_{n-1} + x_{n-1} x_n}{\sqrt{x_n^2 + x_{n-1}^2}} \end{bmatrix}
= \begin{bmatrix} x_1 \\ \vdots \\ x_{n-2} \\ \sqrt{x_n^2 + x_{n-1}^2} \\ 0 \end{bmatrix}

If x^{(1)}_{n−1} = x^{(1)}_{n−2} = 0, choose θ_2 = 0. Otherwise, pick θ_2 such that

\cos\theta_2 = \frac{x^{(1)}_{n-2}}{\sqrt{(x^{(1)}_{n-2})^2 + (x^{(1)}_{n-1})^2}}, \qquad \sin\theta_2 = -\frac{x^{(1)}_{n-1}}{\sqrt{(x^{(1)}_{n-2})^2 + (x^{(1)}_{n-1})^2}}

If x^{(1)}_{n−1} = x^{(1)}_{n−2} = 0, then U(θ_2; n − 2, n − 1) = I_n. Thus x^{(2)} = x^{(1)}, so x^{(2)}_{n−1} = x^{(2)}_{n−2} = 0. Otherwise, we have

x^{(2)} = U(\theta_2; n-2, n-1)\, x^{(1)}
= \begin{bmatrix}
    I_{n-3} & & & \\
    & \frac{x^{(1)}_{n-2}}{\sqrt{(x^{(1)}_{n-2})^2 + (x^{(1)}_{n-1})^2}} & \frac{x^{(1)}_{n-1}}{\sqrt{(x^{(1)}_{n-2})^2 + (x^{(1)}_{n-1})^2}} & \\
    & -\frac{x^{(1)}_{n-1}}{\sqrt{(x^{(1)}_{n-2})^2 + (x^{(1)}_{n-1})^2}} & \frac{x^{(1)}_{n-2}}{\sqrt{(x^{(1)}_{n-2})^2 + (x^{(1)}_{n-1})^2}} & \\
    & & & 1
  \end{bmatrix}
  \begin{bmatrix} x_1 \\ \vdots \\ x_{n-3} \\ x^{(1)}_{n-2} \\ x^{(1)}_{n-1} \\ 0 \end{bmatrix}
= \begin{bmatrix} x_1 \\ \vdots \\ x_{n-3} \\ \sqrt{(x^{(1)}_{n-2})^2 + (x^{(1)}_{n-1})^2} \\ 0 \\ 0 \end{bmatrix}

Continue recursively, defining x^{(k)} = U(θ_k; n − k, n − k + 1) x^{(k−1)}, where θ_k = 0 if x^{(k−1)}_{n−k+1} = x^{(k−1)}_{n−k} = 0 and

\cos\theta_k = \frac{x^{(k-1)}_{n-k}}{\sqrt{(x^{(k-1)}_{n-k})^2 + (x^{(k-1)}_{n-k+1})^2}}, \qquad \sin\theta_k = -\frac{x^{(k-1)}_{n-k+1}}{\sqrt{(x^{(k-1)}_{n-k})^2 + (x^{(k-1)}_{n-k+1})^2}}

otherwise.

By Theorem 2.1.4(g), multiplication of a vector by a unitary matrix preserves its Euclidean norm. Thus x(k)
and x must have the same Euclidean norm.
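
The recursion above is easy to run numerically; here is a small NumPy sketch (ours) that applies the rotations in place and ends with (||x||_2, 0, · · · , 0)^T:

    import numpy as np

    def rotate_out_last_entries(x):
        # Apply the plane rotations U(theta_k; n-k, n-k+1) of 2.1.P27 to a real
        # vector x, zeroing the last n-1 entries one at a time (sketch only).
        x = np.array(x, dtype=float)
        n = len(x)
        for k in range(1, n):
            i, j = n - k - 1, n - k                  # 0-based positions n-k, n-k+1
            r = np.hypot(x[i], x[j])
            if r == 0:
                continue                             # theta_k = 0: the rotation is the identity
            c, s = x[i] / r, -x[j] / r               # cos(theta_k), sin(theta_k) as in the text
            x[i], x[j] = c * x[i] - s * x[j], s * x[i] + c * x[j]
        return x

    x = np.array([3.0, -1.0, 2.0, 2.0])
    y = rotate_out_last_entries(x)
    print(y)                                          # approximately [||x||_2, 0, 0, 0]
    print(np.isclose(y[0], np.linalg.norm(x)))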

2.1.P28

(a) Let a_1 ∈ R^n denote the first column of A. Using the algorithm in 2.1.P27, construct a sequence of
plane rotations such that U^{(1)}_{n−1} · · · U^{(1)}_1 a_1 = (b_{11}, 0, · · · , 0)^T, where b_{11} ≥ 0. Define A_2 = U^{(1)}_{n−1} · · · U^{(1)}_1 A
and let a_2 denote the second column of A_2. Again using the algorithm in 2.1.P27, construct a sequence
of plane rotations such that U^{(2)}_{n−2} · · · U^{(2)}_1 a_2 = (b_{12}, b_{22}, 0, · · · , 0)^T, where b_{22} ≥ 0. Notice that the first
column of U^{(2)}_k equals e_1 for k = 1, · · · , n − 2, so U^{(2)}_{n−2} · · · U^{(2)}_1 (b_{11}, 0, · · · , 0)^T = (b_{11}, 0, · · · , 0)^T. Thus
the first two columns of A_3 = U^{(2)}_{n−2} · · · U^{(2)}_1 A_2 are in upper triangular form with non-negative elements
on the diagonal. Continuing in this fashion for the remaining columns, we can construct a sequence of
plane rotations U_1, · · · , U_N such that

U_N \cdots U_1 A = \begin{bmatrix} B \\ 0 \end{bmatrix}

where B is upper triangular with b_{11}, · · · , b_{m−1,m−1} non-negative.
(b) The algorithm given in part (a) requires \sum_{k=1}^{m} (n − k) = mn − m(m+1)/2 rotations. The construction in
2.1.P27 illustrates how some of the rotations might be the identity.
(c) Using the construction in part (a), we can find N = n(n−1)/2 rotation matrices such that U_N · · · U_1 A = R,
where R is upper triangular and r_{11}, · · · , r_{n−1,n−1} are non-negative. This implies A = U_1^T · · · U_N^T R. But
since U_k^T is a rotation matrix for k = 1, · · · , N, this is the desired decomposition.
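
A compact NumPy sketch of parts (a) and (c) (ours, not from the text; it zeroes each column from the bottom up with plane rotations and accumulates their product):

    import numpy as np

    def givens_qr(A):
        # Plane-rotation (Givens) triangularization as in 2.1.P28: returns U and
        # R = U @ A with R upper triangular and A = U.T @ R.  Sketch only.
        R = np.array(A, dtype=float)
        n, m = R.shape
        U = np.eye(n)
        for j in range(m):
            for i in range(n - 1, j, -1):            # zero R[i, j] against R[i-1, j]
                r = np.hypot(R[i - 1, j], R[i, j])
                if r == 0:
                    continue                          # this rotation would be the identity
                c, s = R[i - 1, j] / r, R[i, j] / r
                G = np.eye(n)                         # plane rotation in coordinates i-1, i
                G[i - 1, i - 1] = G[i, i] = c
                G[i - 1, i], G[i, i - 1] = s, -s
                R = G @ R
                U = G @ U
        return U, R

    A = np.random.rand(5, 3)
    U, R = givens_qr(A)
    print(np.allclose(U.T @ R, A), np.allclose(np.tril(R, -1), 0))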

2.1.P29

Using the result from 2.1.P28, we can construct a sequence of N = n(n−1)/2 rotation matrices such that

UN · · · U1 Q = D

where D is upper triangular and d11 , · · · , dn−1,n−1 are non-negative. Since the left-hand side is real orthog-
onal, the right-hand side must be as well. But then by Exercise 12, D must be diagonal. By 2.1.P4, we
know that D must only have 1s and -1s on its diagonal. But this implies d11 = · · · = dn−1,n−1 = 1. To
determine dnn , take determinants of both sides to obtain det Q = dnn = ±1.
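
Continuing the previous sketch (this reuses givens_qr and NumPy from 2.1.P28 above, so it is again only an illustration): applied to a generic real orthogonal Q, the triangular factor comes out diagonal, with leading diagonal entries 1 and last entry det Q = ±1, up to rounding.

    Q, _ = np.linalg.qr(np.random.rand(5, 5))                 # a real orthogonal matrix
    _, D = givens_qr(Q)
    print(np.allclose(D, np.diag(np.diag(D)), atol=1e-12))    # D is (numerically) diagonal
    print(np.allclose(np.diag(D)[:-1], 1.0))                  # d_11 = ... = d_{n-1,n-1} = 1
    print(np.isclose(D[-1, -1], np.linalg.det(Q)))            # d_nn = det Q = +/-1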

Section 2.2

Exercises

1.

2.2.P1
