A Second Course
in Linear Algebra
WILLIAM C. BROWN
Michigan State University
East Lansing, Michigan
A Wiley-Interscience Publication
JOHN WILEY & SONS
New York • Chichester • Brisbane • Toronto • Singapore
Copyright © 1988 by John Wiley & Sons, Inc.
All rights reserved. Published simultaneously in Canada.
Reproduction or translation of any part of this work
beyond that permitted by Section 107 or 108 of the
1976 United States Copyright Act without the permission
of the copyright owner is unlawful. Requests for
permission or further information should be addressed to
the Permissions Department, John Wiley & Sons, Inc.
Preface

For the past two years, I have been teaching a first-year graduate-level course in
linear algebra and analysis. My basic aim in this course has been to prepare
students for graduate-level work. This book consists mainly of the linear algebra
in my lectures. The topics presented here are those that I feel are most important
for students intending to do advanced work in such areas as algebra, analysis,
topology, and applied mathematics.
Normally, a student interested in mathematics, engineering, or the physical
sciences will take a one-term course in linear algebra, usually at the junior level.
In such a course, a student will first be exposed to the theory of matrices, vector
spaces, determinants, and linear transformations. Often, this is the first place
where a student is required to do a mathematical proof. It has been my
experience that students who have had only one such linear algebra course in
their undergraduate training are ill prepared to do advanced-level work. I have
written this book specifically for those students who will need more linear
algebra than is normally covered in a one-term junior-level course.
This text is aimed at seniors and beginning graduate students who have had
at least one course in linear algebra. The text has been designed for a one-
quarter or semester course at the senior or first-year graduate level. It is assumed
that the reader is familiar with such animals as functions, matrices, determinants, and elementary set theory. The presentation of the material in this text is
deliberately formal, consisting mainly of theorems and proofs, very much in the
spirit of a graduate-level course.
The reader will note that many familiar ideas are discussed in Chapter I.
I urge the reader not to skip this chapter. The topics are familiar, but my
approach, as well as the notation I use, is more sophisticated than that of a junior-level course.
WILLIAM C. BROWN
East Lansing, Michigan
September 1987
Chapter I
Linear Algebra
In this book, the symbol F will denote an arbitrary field. A field is defined as
follows:
Definition 1.1: A nonempty set F together with two functions (x, y) → x + y and (x, y) → xy from F × F to F is called a field if the following nine axioms are satisfied:

F1. x + y = y + x for all x, y ∈ F.
F2. x + (y + z) = (x + y) + z for all x, y, z ∈ F.
F3. There exists a unique element 0 ∈ F such that x + 0 = x for all x ∈ F.
F4. For every x ∈ F, there exists a unique element −x ∈ F such that x + (−x) = 0.
F5. xy = yx for all x, y ∈ F.
F6. x(yz) = (xy)z for all x, y, z ∈ F.
F7. There exists a unique element 1 ≠ 0 in F such that x1 = x for all x ∈ F.
F8. For every x ≠ 0 in F, there exists a unique y ∈ F such that xy = 1.
F9. x(y + z) = xy + xz for all x, y, z ∈ F.
We shall let the single symbol F denote both the set and the two maps satisfying axioms F1–F9. Although this procedure is somewhat ambiguous, it causes no confusion in concrete situations.
In our first example below, we introduce some notation that we shall use
throughout the rest of this book.
Example 1.2: We shall let Q denote the set of rational numbers, R, the set of real numbers, and C, the set of complex numbers. With the usual addition and multiplication, Q, R, and C are all fields with Q ⊆ R ⊆ C. □
The fields in Example 1.2 are all infinite in the sense that the cardinal number attached to the underlying set in question is infinite. Finite fields are very important in linear algebra as well. Much of coding theory is done over finite algebraic extensions of the field described in Example 1.3 below.
Example 1.3: Let Z denote the set of integers with the usual addition x + y and multiplication xy inherited from Q. Let p be a positive prime in Z and set F_p = {0, 1, ..., p − 1}. F_p becomes a (finite) field if we define addition ⊕ and multiplication ⊙ modulo p. Thus, for elements x, y ∈ F_p, there exist unique integers k, z ∈ Z such that x + y = kp + z with 0 ≤ z < p. We define x ⊕ y to be z. Similarly, x ⊙ y = w, where xy = k′p + w and 0 ≤ w < p.

The reader can easily check that (F_p, ⊕, ⊙) satisfies axioms F1–F9. Thus, F_p is a finite field of cardinality p. □
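The arithmetic of F_p is easy to experiment with on a machine. The following Python sketch is a quick, hedged illustration of Example 1.3 for one particular prime (p = 7 is an arbitrary choice), spot-checking axioms F8 and F9; it is not part of the formal development.

    # Arithmetic in F_p = {0, 1, ..., p-1}: addition and multiplication mod p.
    p = 7  # any positive prime (primality is needed for F8)

    def add(x, y):      # x (+) y: the remainder z with x + y = kp + z
        return (x + y) % p

    def mul(x, y):      # x (.) y: the remainder w with xy = k'p + w
        return (x * y) % p

    # F8: every nonzero x has a unique y with x (.) y = 1.
    for x in range(1, p):
        assert len([y for y in range(p) if mul(x, y) == 1]) == 1

    # F9: distributivity holds for all triples.
    for x in range(p):
        for y in range(p):
            for z in range(p):
                assert mul(x, add(y, z)) == add(mul(x, y), mul(x, z))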
Except for some results in Section 7, the definitions and theorems in Chapter
I are completely independent of the field F. Hence, we shall assume that F is an
arbitrary field and study vector spaces over F.
Definition 1.4: A vector space V over F is a nonempty set together with two functions, (α, β) → α + β from V × V to V (called addition) and (x, α) → xα from F × V to V (called scalar multiplication), which satisfy the following axioms:

V1. α + β = β + α for all α, β ∈ V.
V2. α + (β + γ) = (α + β) + γ for all α, β, γ ∈ V.
V3. There exists an element 0 ∈ V such that 0 + α = α for all α ∈ V.
V4. For every α ∈ V, there exists a β ∈ V such that α + β = 0.
V5. (xy)α = x(yα) for all x, y ∈ F and α ∈ V.
V6. x(α + β) = xα + xβ for all x ∈ F and α, β ∈ V.
V7. (x + y)α = xα + yα for all x, y ∈ F and α ∈ V.
V8. 1α = α for all α ∈ V.
As with fields, we should make the comment that a vector space over F is really a triple (V, (α, β) → α + β, (x, α) → xα) consisting of a nonempty set V together with two functions from V × V to V and F × V to V satisfying axioms V1–V8. There may be many different ways to endow a given set V with the structure of a vector space over F.
Example 1.5: Let N = {1, 2, 3, ...} denote the set of natural numbers. For each n ∈ N, we have the vector space Fⁿ = {(x₁, ..., xₙ) | xᵢ ∈ F} consisting of all n-tuples of elements from F. Vector addition and scalar multiplication are defined componentwise by (x₁, ..., xₙ) + (y₁, ..., yₙ) = (x₁ + y₁, ..., xₙ + yₙ) and x(x₁, ..., xₙ) = (xx₁, ..., xxₙ). In particular, when n = 1, we see F itself is a vector space over F. □
If A and B are two sets, let us denote the set of functions from A to B by B^A.
Example 1.6: Let V be a vector space over F and A an arbitrary set. Then the set V^A consisting of all functions from A to V becomes a vector space over F when we define addition and scalar multiplication pointwise. Thus, if f, g ∈ V^A, f + g is the function from A to V defined by (f + g)(a) = f(a) + g(a) for all a ∈ A. For x ∈ F and f ∈ V^A, xf is defined by (xf)(a) = x(f(a)). □
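In code, the vector space structure of V^A is nothing more than operating on functions pointwise. A minimal Python sketch (an illustration only, taking A = R and V = R):

    # Pointwise operations in V^A: (f + g)(a) = f(a) + g(a), (xf)(a) = x(f(a)).
    def add(f, g):
        return lambda a: f(a) + g(a)

    def scale(x, f):
        return lambda a: x * f(a)

    f = lambda a: a ** 2
    g = lambda a: 3.0 * a
    h = add(f, scale(2.0, g))               # h = f + 2g
    assert h(1.5) == f(1.5) + 2.0 * g(1.5)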
Note that our choice of notation implies that Fⁿ and M_{1×n}(F) are the same vector space. Although we now have two different notations for the same vector space, this redundancy is useful and will cause no confusion in the sequel.
Example 1.8: We shall let F[X] denote the set of all polynomials in an indeterminate X over F. Thus, a typical element in F[X] is a finite sum of the form aₙXⁿ + aₙ₋₁Xⁿ⁻¹ + ··· + a₀. Here n ∈ N ∪ {0}, and a₀, ..., aₙ ∈ F. The
constant, which the reader is familiar with from the elementary calculus, make
sense over any field F. These operations give F[X] the structure of a vector
space over F. □
Many interesting examples of vector spaces come from analysis. Here are
some typical examples.
Example 1.9: Let I be an interval (closed, open, or half open) in R. We shall let C(I) denote the set of all continuous, real valued functions on I. If k ∈ N, we shall let Cᵏ(I) denote those f ∈ C(I) that are k-times differentiable on the interior of I. Then C(I) ⊇ C¹(I) ⊇ C²(I) ⊇ ···. These sets are all vector spaces over R when endowed with the usual pointwise addition (f + g)(x) = f(x) + g(x), x ∈ I, and scalar multiplication (yf)(x) = y(f(x)). □
Example 1.10: Let A = [a₁, b₁] × ··· × [aₙ, bₙ] ⊆ Rⁿ be a closed rectangle. We shall let ℛ(A) denote the set of all real valued functions on A that are Riemann integrable. Clearly ℛ(A) is a vector space over R when addition and scalar multiplication are defined as in Example 1.9. □
We conclude our list of examples with a vector space, which we shall study
carefully in Chapter III.
Now suppose V is a vector space over F. One rich source of vector spaces
associated with V is the set of subspaces of V. Recall the following definition:
Theorem 1.14: Let V be a vector space over an infinite field F. Then V cannot be
the union of a finite number of proper subspaces.
(a) For S ∈ 𝒫(V), L(S) is the subspace of V consisting of all finite linear combinations of vectors from S. Thus,

    L(S) = {x₁α₁ + ··· + xₙαₙ | xᵢ ∈ F, αᵢ ∈ S, n ≥ 1}

(b) If S₁ ⊆ S₂, then L(S₁) ⊆ L(S₂).
(c) If α ∈ L(S), then there exists a finite subset S′ ⊆ S such that α ∈ L(S′).
(d) S ⊆ L(S) for all S ∈ 𝒫(V).
(e) For every S ∈ 𝒫(V), L(L(S)) = L(S).
(f) If β ∈ L(S ∪ {α}) and β ∉ L(S), then α ∈ L(S ∪ {β}). Here α, β ∈ V and S ∈ 𝒫(V).
Proof: Properties (a)–(e) follow directly from the definition of the linear span. We prove (f). If β ∈ L(S ∪ {α}) − L(S), then β is a finite linear combination of vectors from S ∪ {α}. Furthermore, α must occur with a nonzero coefficient in any such linear combination; otherwise, β ∈ L(S). Thus, there exist vectors α₁, ..., αₙ ∈ S and nonzero scalars x, x₁, ..., xₙ ∈ F such that β = xα + x₁α₁ + ··· + xₙαₙ. Since x ≠ 0, we can write α as a linear combination of β and α₁, ..., αₙ. Namely, α = x⁻¹β − x⁻¹x₁α₁ − ··· − x⁻¹xₙαₙ ∈ L(S ∪ {β}). □
EXERCISES FOR SECTION 1

(1) Complete the details in Example 1.3 and argue that (F_p, ⊕, ⊙) is a field.

(2) Let R(X) = {f(X)/g(X) | f, g ∈ R[X], g ≠ 0} denote the set of rational functions on R. Show that R(X) is a field under the usual definitions of addition f/g + h/k = (kf + gh)/(gk) and multiplication (f/g)(h/k) = fh/(gk). R(X) is called the field of rational functions over R. Does F(X) make sense for any field F?
(12) Let V = R³. Show that ε = (1, 0, 0) is not in the linear span of α, β, and γ, where α = (1, 1, 1), β = (0, 1, −1), and γ = (1, 0, 2).

(13) If S₁ and S₂ are subsets of a vector space V, show that L(S₁ ∪ S₂) = L(S₁) + L(S₂).

(14) Let S be any subset of R[X] ⊆ R^R. Show that eˣ ∉ L(S).

(15) Let αᵢ = (aᵢ₁, aᵢ₂) ∈ F² for i = 1, 2. Show that F² = L({α₁, α₂}) if and only if the determinant of the 2 × 2 matrix M = (aᵢⱼ) is nonzero. Generalize this result to Fⁿ.

(16) Generalize Example 1.8 to n + 1 variables X₀, ..., Xₙ. The resulting vector space over F is called the ring of polynomials in n + 1 variables (over F). It is denoted F[X₀, ..., Xₙ]. Show that this vector space is spanned by all monomials X₀^{m₀} ··· Xₙ^{mₙ} as (m₀, ..., mₙ) ranges over all tuples of nonnegative integers.
Before proceeding with the main results of this section, let us recall a few facts from set theory. If A is any set, we shall denote the cardinality of A by |A|. Thus, A is a finite set if and only if |A| < ∞. If A is not finite, we shall write |A| = ∞. The only fact from cardinal arithmetic that we shall need in this section is the following:
2.1: Let A and B be sets, and suppose |A| = ∞. If for each x ∈ A we have some finite set Bₓ ⊆ B, and B = ⋃_{x∈A} Bₓ, then |B| ≤ |A|.
A proof of 2.1 can be found in any standard text in set theory (e.g., [1]), and,
consequently, we omit it.
A relation R on a set A is any subset of the Cartesian product A × A. Suppose R is a relation on a set A. If x, y ∈ A and (x, y) ∈ R, then we shall say x relates to y and write x ≤ y. Thus, x ≤ y ⇔ (x, y) ∈ R. We shall use the notation (A, ≤) to indicate the composite notion of a set A and a relation R ⊆ A × A. This notation is a bit ambiguous since the symbol ≤ has no reference to R in it. However, the use of ≤ will always be clear from the context. In fact, the only relation R we shall systematically exploit in this section is the inclusion relation ⊆ among subsets of some vector space V over a field F.

A set A is said to be partially ordered if A has a relation R ⊆ A × A such that (1) x ≤ x for all x ∈ A, (2) if x ≤ y and y ≤ x, then x = y, and (3) if x ≤ y and y ≤ z, then x ≤ z. A typical example of a partially ordered set is 𝒫(V) together with the relation A ≤ B if and only if A ⊆ B. If (A, ≤) is a partially ordered set, and A₁ ⊆ A, then we say A₁ is totally ordered if for any two elements x, y ∈ A₁, we have at least one of the relations x ≤ y or y ≤ x. If (A, ≤) is a partially ordered set, and A₁ ⊆ A, then an element x ∈ A is called an upper bound for A₁ if y ≤ x for all y ∈ A₁. Finally, an element x ∈ (A, ≤) is a maximal element of A if x ≤ y implies x = y.
We shall not give a proof of Zorn's lemma here. The interested reader may
consult [3, p. 33] for more details.
Now suppose V is an arbitrary vector space over a field F. Let S be a subset of V. S is said to be linearly dependent (over F) if there exist distinct vectors α₁, ..., αₙ ∈ S and nonzero scalars x₁, ..., xₙ ∈ F such that x₁α₁ + ··· + xₙαₙ = 0. If S is not linearly dependent, S is linearly independent. Every vector space has a basis. In fact, any linearly independent subset of V can be expanded to a basis:

Theorem 2.6: Let V be a vector space over F, and let S be a linearly independent subset of V. Then V has a basis B containing S.
Proof: Let 𝒮 denote the set of all linearly independent subsets of V that contain S. Thus, 𝒮 = {A ∈ 𝒫(V) | A ⊇ S and A is linearly independent over F}. We note that 𝒮 ≠ ∅ since S ∈ 𝒮. We partially order 𝒮 by inclusion. Thus, for A₁, A₂ ∈ 𝒮, A₁ ≤ A₂ if and only if A₁ ⊆ A₂. The fact that (𝒮, ⊆) is a partially ordered set is clear.

Suppose 𝒯 = {Aᵢ | i ∈ Λ} is an indexed collection of elements from 𝒮 that form a totally ordered subset of 𝒮. We show 𝒯 has an upper bound. Set A = ⋃_{i∈Λ} Aᵢ. Clearly, A ∈ 𝒫(V), S ⊆ A, and Aᵢ ⊆ A for every i ∈ Λ. If A fails to be linearly independent, then there exists a finite subset {α₁, ..., αₙ} ⊆ A and nonzero scalars x₁, ..., xₙ ∈ F such that x₁α₁ + ··· + xₙαₙ = 0. Since 𝒯 is totally ordered, there exists an index i₀ ∈ Λ such that {α₁, ..., αₙ} ⊆ A_{i₀}. But then A_{i₀} is dependent, which is impossible since A_{i₀} ∈ 𝒮. Thus, A ∈ 𝒮, and A is an upper bound for 𝒯 in 𝒮.

Since 𝒯 was arbitrary, we can now conclude that (𝒮, ⊆) is an inductive set. Applying 2.2, we see that 𝒮 has a maximal element B. Since B ∈ 𝒮, B ⊇ S and B is linearly independent. We claim that B is in fact a basis of V. To prove this assertion, we need only argue L(B) = V. Suppose L(B) ≠ V. Then there exists a vector α ∈ V − L(B). Since α ∉ L(B), the set B ∪ {α} is clearly linearly independent. But then B ∪ {α} ∈ 𝒮, and B ∪ {α} is strictly larger than B. This is contrary to the maximality of B in 𝒮. Thus, L(B) = V, and B is a basis of V containing S. □
Example 2.7: The empty set ∅ is a basis for the zero subspace (0) of any vector space V. If we regard a field F as a vector space over itself, then any nonzero element x of F forms a basis of F. □
Example 2.9: Let V = M_{m×n}(F). For any i = 1, ..., m and j = 1, ..., n, let Eᵢⱼ denote the m × n matrix whose entries are all zero except for a 1 in the (i, j)th position. Since any A = (aᵢⱼ) ∈ V can be written uniquely as A = Σᵢ,ⱼ aᵢⱼEᵢⱼ, we see B = {Eᵢⱼ | 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis of V. □
A specific basis for the vector space Cᵏ(I) in Example 1.9 is hard to write down. However, since R[X] ⊆ Cᵏ(I), Theorem 2.6 guarantees that some basis of Cᵏ(I) contains the monomials 1, X, X², ....
Theorem 2.6 says that any linearly independent subset of V can be expanded
to a basis of V. There is a companion result, which we shall need in Section 3.
Namely, if some subset S of V spans V, then S contains a basis of V.
Theorem 2.11: Let V be a vector space over F, and suppose V = L(S). Then S
contains a basis of V.
Proof: Let 𝒮 denote the set of all linearly independent subsets of S, partially ordered by inclusion. Arguing as in the proof of Theorem 2.6, the union of any totally ordered subset of 𝒮 again lies in 𝒮. Thus, (𝒮, ⊆) is inductive. Applying 2.2, we see that 𝒮 has a maximal element B.

We claim B is a basis for V. Since B ∈ 𝒮, B ⊆ S and B is linearly independent over F. If L(B) = V, then B is a basis of V, and the proof is complete. Suppose L(B) ≠ V. Then S ⊄ L(B), for otherwise V = L(S) ⊆ L(L(B)) = L(B). Hence there exists a vector β ∈ S − L(B). Clearly, B ∪ {β} is linearly independent over F. Thus, B ∪ {β} ∈ 𝒮. But β ∉ L(B) implies β ∉ B. Hence, B ∪ {β} is strictly larger than B in 𝒮. Since B is maximal, this is a contradiction. Therefore, L(B) = V and our proof is complete. □
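When V = Rᵐ and the spanning set is finite, a basis inside S can be extracted mechanically in the spirit of the proof: keep each vector that is independent of the vectors already kept. A small numpy sketch (an illustration under these assumptions, not the general argument, which needs Zorn's lemma):

    import numpy as np

    def basis_from_spanning_set(vectors):
        # Keep each vector that raises the rank; the kept vectors are
        # linearly independent and have the same span as the input.
        kept = []
        for v in vectors:
            candidate = kept + [v]
            if np.linalg.matrix_rank(np.column_stack(candidate)) == len(candidate):
                kept.append(v)
        return kept

    S = [np.array([1., 0., 1.]), np.array([2., 0., 2.]),
         np.array([0., 1., 0.]), np.array([1., 1., 1.])]
    B = basis_from_spanning_set(S)
    assert len(B) == np.linalg.matrix_rank(np.column_stack(S))  # here, 2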
Theorem 2.12: Let V be a vector space over F, and suppose B₁ and B₂ are two bases of V. Then |B₁| = |B₂|.
Now suppose |B₁| > n (|B₁| could be infinite here). By an argument similar to that given above, we can exchange n vectors of B₁ with α₁, ..., αₙ.
Proof: It follows from Theorem 2.6 that any basis of a subspace W of V can be enlarged to a basis of V. This immediately proves (a) and (b). Suppose W is a subspace of V. Let B be a basis of W. By Theorem 2.6, there exists a basis C of V such that B ⊆ C. Let W′ = L(C − B). Since C = B ∪ (C − B), V = L(C) = L(B) + L(C − B) = W + W′. Since C is linearly independent and B ∩ (C − B) = ∅, L(B) ∩ L(C − B) = (0). Thus, W ∩ W′ = (0), and the proof of (c) is complete.
To prove (d), let B₀ = {α₁, ..., αₙ} be a basis of W₁ ∩ W₂. If W₁ ∩ W₂ = (0), then we take B₀ to be the empty set ∅. We can enlarge B₀ to a basis B₁ = {α₁, ..., αₙ, β₁, ..., βₘ} of W₁. We can also enlarge B₀ to a basis B₂ = {α₁, ..., αₙ, γ₁, ..., γₚ} of W₂. Thus, dim(W₁ ∩ W₂) = n, dim W₁ = n + m, and dim W₂ = n + p. We claim that B = {α₁, ..., αₙ, β₁, ..., βₘ, γ₁, ..., γₚ} is a basis of W₁ + W₂. Clearly L(B) = W₁ + W₂. We need only argue B is linearly independent. Suppose x₁α₁ + ··· + xₙαₙ + y₁β₁ + ··· + yₘβₘ + z₁γ₁ + ··· + zₚγₚ = 0 for some xᵢ, yⱼ, zₖ ∈ F. Then z₁γ₁ + ··· + zₚγₚ ∈ W₁ ∩ W₂ = L({α₁, ..., αₙ}). Thus, z₁γ₁ + ··· + zₚγₚ = c₁α₁ + ··· + cₙαₙ for some cᵢ ∈ F. Since B₂ is a basis of W₂, we conclude that z₁ = ··· = zₚ = 0. Since B₁ is a basis of W₁, x₁ = ··· = xₙ = y₁ = ··· = yₘ = 0. In particular, B is linearly independent. Thus, dim(W₁ + W₂) = |B| = n + m + p, and the proof of (d) follows. □
A few comments about Theorem 2.13 are in order here. Part (d) is true whether V is finite dimensional or not. The proof is the same as that given above when dim(W₁ + W₂) < ∞. If dim(W₁ + W₂) = ∞, then either W₁ or W₂ is an infinite-dimensional subspace with the same dimension as W₁ + W₂. Thus, the result is still true but rather uninteresting.
If V is not finite dimensional, then (b) is false in general. A simple example
illustrates this point.
Example 2.14: Let V = F[X], and let W be the subspace of V consisting of all even polynomials. Thus, W = {Σᵢ aᵢX²ⁱ | aᵢ ∈ F}. A basis of W is clearly all even powers of X. Thus, dim V = dim W, but W ≠ V. □
(b) dim(W₁ + ··· + W_k) ≤ dim(W₁) + ··· + dim(W_k).
(c) dim(W₁ ∩ ··· ∩ W_k) = dim(W₁) + ··· + dim(W_k) − (k − 1)dim V if and only if for all i = 1, ..., k − 1, (W₁ ∩ ··· ∩ Wᵢ) + Wᵢ₊₁ = V.
Proof: Part (a) follows from Theorem 2.13(d) by induction. Parts (b) and (c) are easy consequences of (a). We leave the technical details for an exercise at the end of this section. □
Before closing this section, let us develop some useful notation concerning bases. Suppose V is a finite-dimensional vector space over F. If α = {α₁, ..., αₙ} is a basis of V, then we have a natural function [·]_α: V → M_{n×1}(F) defined as follows.
If β = x₁α₁ + ··· + xₙαₙ, we set [β]_α = (x₁, ..., xₙ)ᵗ ∈ M_{n×1}(F). If δ = {δ₁, ..., δₙ} is a second basis of V, we define

2.17:
    M(δ, α) = the n × n matrix whose ith column is [δᵢ]_α

Theorem 2.18: M(δ, α)[β]_δ = [β]_α for every β ∈ V.

Proof: Let us denote the ith column of any matrix M by Colᵢ(M). Then for each i = 1, ..., n, we have M(δ, α)[δᵢ]_δ = M(δ, α)(0, ..., 1, ..., 0)ᵗ = Colᵢ(M(δ, α)) = [δᵢ]_α. Thus, the theorem is correct for β ∈ δ. Now we have already noted that [·]_δ and [·]_α preserve vector addition and scalar multiplication. So does multiplication by M(δ, α) as a map on M_{n×1}(F). Since any β ∈ V is a linear combination of the vectors in δ, we conclude that M(δ, α)[β]_δ = [β]_α for every β ∈ V. □
The matrix M(δ, α) defined in 2.17 is called the change of basis matrix (between δ and α). It is often convenient to think of Theorem 2.18 in terms of the commutative diagram given in 2.19.

2.19: [commutative diagram: the coordinate isomorphisms [·]_δ and [·]_α from V to M_{n×1}(F), connected by multiplication by M(δ, α)]
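Concretely, M(δ, α) is the matrix whose ith column is [δᵢ]_α, and Theorem 2.18 can be verified numerically. A numpy sketch with V = R² and two illustrative bases (the bases are arbitrary choices, not taken from the text):

    import numpy as np

    alpha = np.array([[1., 1.],
                      [0., 1.]])    # basis alpha: columns alpha_1, alpha_2
    delta = np.array([[1., 2.],
                      [1., 3.]])    # basis delta: columns delta_1, delta_2

    # [v]_alpha solves alpha @ c = v, so column i of M(delta, alpha) is [delta_i]_alpha.
    M = np.linalg.solve(alpha, delta)

    beta = np.array([4., 7.])
    assert np.allclose(M @ np.linalg.solve(delta, beta),   # M(delta, alpha)[beta]_delta
                       np.linalg.solve(alpha, beta))       # equals [beta]_alpha (2.18)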
EXERCISES FOR SECTION 2
(7) Find the dimension of the subspace V = L({α, β, γ, δ}) ⊆ R⁴, where α = (1, 2, 1, 0), β = (−1, 1, −4, 3), γ = (2, 3, 3, −1), and δ = (0, 1, −1, 1).
(8) Compute the following dimensions:
(a) dim_R(C).
(b) dim_Q(R).
(c) dim_Q(F), where F is the field given in Exercise 3 of Section 1.
(9) Suppose V is an n-dimensional vector space over the finite field F_p. Argue that V is a finite set and find |V|.
(10) Suppose V is a vector space over a field F for which |V| > 2. Show that V has more than one basis.
(11) Let F be a subfield of the field F'. This means that the operations of
addition and multiplication on F' when restricted to F make F a field.
(a) Show that F' is a vector space over F.
(b) Suppose dimF(F') = n. Let V be an m-dimensional vector space over F'.
Show that V is an mn-dimensional vector space over F.
(12) Show that dim(Vⁿ) = n dim(V).
(13) Return to the space in Exercise 1. Let pᵢ ∈ F[X] be polynomials of degree at most n for i = 1, ..., r, and let A ∈ M_{(n+1)×r}(F) be the matrix whose ith column consists of the coefficients of pᵢ. Show that the dimension of L({p₁, ..., p_r}) is precisely the rank of A.
(14) Show that the dimension of the subspace of homogeneous polynomials of degree d in F[X₀, ..., Xₙ] is the binomial coefficient C(n + d, d).
(15) Find the dimensions of the vector spaces in Exercises 18 and 19 of Section 1.
(16) Let A ∈ M_{m×n}(F). Set CS(A) = {AX | X ∈ M_{n×1}(F)}. CS(A) is called the column space of A. Set NS(A) = {X ∈ M_{n×1}(F) | AX = 0}. NS(A) is called the null space of A. Show that CS(A) is a subspace of M_{m×1}(F), and NS(A) is a subspace of M_{n×1}(F). Show that dim(CS(A)) + dim(NS(A)) = n.
(17) With the same notation as in Exercise 16, show the linear system AX = B has a solution if and only if dim(CS(A)) = dim(CS(A | B)). Here B ∈ M_{m×1}(F), and (A | B) is the m × (n + 1) augmented matrix obtained from A by adjoining the column B.
(18) Suppose V and W are two vector spaces over a field F such that |V| = |W|. Is dim V = dim W?
(19) Consider the set W of 2 × 2 matrices of the form

    ( x  −x )
    ( y   z )

and the set Y of 2 × 2 matrices of the form

    ( x  y )
    ( y  z )

Show that W and Y are subspaces of M_{2×2}(F) and compute the numbers dim(W), dim(Y), dim(W + Y), and dim(W ∩ Y).
3. LINEAR TRANSFORMATIONS
Example 3.2: The map that sends every vector in V to 0 ∈ W is clearly a linear map. We shall call this map the zero map and denote it by 0. If T: V → W and S: W → Z are linear transformations, then clearly the composite map ST: V → Z is a linear transformation. □
Example 3.4: Taking the transpose, A → Aᵗ, is clearly a linear map from M_{m×n}(F) to M_{n×m}(F). □

Example 3.5: Let A ∈ M_{n×n}(F), and set V = M_{n×1}(F). Then multiplication by A (necessarily on the left) induces a linear transformation T_A: V → V given by T_A(B) = AB for all B ∈ V. □
Examples 3.3 and 3.5 show that the commutative diagram in 2.19 consists of linear transformations.
Definition 3.9: Let V and W be vector spaces over F. The set of all linear
transformations from V to W will be denoted by HomF(V, W).
When the base field F is clear from the context, we shall often write Hom(V, W) instead of Hom_F(V, W). Thus, Hom(V, W) is the subset of the vector space W^V (Example 1.6) consisting of all linear transformations from V to W. If T, S ∈ Hom(V, W) and x, y ∈ F, then the function xT + yS ∈ W^V is in fact a linear transformation. For if a, b ∈ F and α, β ∈ V, then (xT + yS)(aα + bβ) = xT(aα + bβ) + yS(aα + bβ) = xaT(α) + xbT(β) + yaS(α) + ybS(β) = a(xT(α) + yS(α)) + b(xT(β) + yS(β)) = a(xT + yS)(α) + b(xT + yS)(β). Therefore, xT + yS ∈ Hom(V, W). We have proved the following theorem:
We can now state the following theorem, whose proof is given in Example 3.13:
We now have two isomorphisms [·]_α: V → M_{n×1}(F) and V → Fⁿ for every choice of basis α of a (finite-dimensional) vector space V. We shall be careful to distinguish between these two maps although they differ only by an isomorphism from M_{n×1}(F) to Fⁿ. Notationally, Fⁿ is easier to write than M_{n×1}(F), and so most of our subsequent theorems will be written using the map into Fⁿ. With this in mind, let us reinterpret the commutative diagram given in 2.19.

If A is any n × n matrix with coefficients in F, then A induces a linear transformation S_A: Fⁿ → Fⁿ given by the following equation:
3.16:
    S_A(ξ) = (Aξᵗ)ᵗ for all ξ ∈ Fⁿ

Using the notation in Example 3.5, we see S_A is the linear transformation that makes the following diagram commutative:

3.17: [commutative square: T_A across the top from M_{n×1}(F) to M_{n×1}(F), S_A across the bottom from Fⁿ to Fⁿ, with the transpose maps (·)ᵗ as the vertical arrows]

Taking A to be the change of basis matrix M(δ, α) of 2.17, the diagram 2.19 can be rewritten as the commutative diagram:

3.18: [commutative square: multiplication by M(δ, α) on M_{n×1}(F) on top, S_{M(δ,α)} on Fⁿ on the bottom, the vertical maps again being (·)ᵗ]

3.20: [commutative triangle: the two coordinate isomorphisms V → Fⁿ determined by δ and α, connected by S_{M(δ,α)}]
Suppose A is any finite set with |A| = n. We can without any loss of generality assume A = {1, ..., n}. Then V^A ≅ V × ··· × V (n factors). Indeed, there is a natural isomorphism T: V × ··· × V → V^A given by T(α₁, ..., αₙ) = f, where f(i) = αᵢ for all i = 1, ..., n. The fact that T is an isomorphism is an easy exercise, which we leave to the reader.
Theorem 3.22: Let V and W be vector spaces over F, and suppose V is finite
dimensional. Let dim V = n.
Hence, the map is an isomorphism. □
3.23: Let V and W be vector spaces over F, and suppose B = {αᵢ | i ∈ A} is a basis of V. If {βᵢ | i ∈ A} is any subset of W, then there exists a unique T ∈ Hom(V, W) such that T(αᵢ) = βᵢ for all i ∈ A. □
Now suppose V and W are both finite-dimensional vector spaces over F. Let dim V = n and dim W = m. If α = {α₁, ..., αₙ} is a basis of V and β = {β₁, ..., βₘ} is a basis of W, then each T ∈ Hom(V, W) is represented by a matrix Γ(α, β)(T) ∈ M_{m×n}(F) given by the following equation:

3.24:
    Γ(α, β)(T) = (aᵢⱼ), where T(αⱼ) = Σᵢ₌₁ᵐ aᵢⱼβᵢ for j = 1, ..., n
Theorem 3.25: The map Γ(α, β): Hom_F(V, W) → M_{m×n}(F) is an isomorphism of vector spaces, and for each T ∈ Hom(V, W) the following diagram is commutative:

3.26: [commutative square: T: V → W across the top, S_{Γ(α,β)(T)}: Fⁿ → Fᵐ across the bottom, with the coordinate isomorphisms as the vertical arrows]
Proof: We need only argue that the diagram in 3.26 is commutative. Using the coordinate maps and 3.17, we can expand 3.26 into the following diagram:

3.27: [two stacked commutative squares: T: V → W on top; multiplication by Γ(α, β)(T) from M_{n×1}(F) to M_{m×1}(F) in the middle; S_{Γ(α,β)(T)}: Fⁿ → Fᵐ on the bottom]
Since all the maps in 3.27 are linear and the bottom square commutes, we need only check [T(·)]_β = Γ(α, β)(T)[·]_α on a basis of V. Then the top square of 3.27 is commutative, and the commutativity of 3.26 follows. For any αᵢ ∈ α, we have [T(αᵢ)]_β = Colᵢ(Γ(α, β)(T)) = Γ(α, β)(T)(0, ..., 1, ..., 0)ᵗ = Γ(α, β)(T)[αᵢ]_α. □
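In down-to-earth terms, the jth column of Γ(α, β)(T) is [T(αⱼ)]_β, and the commutativity of 3.26 says that applying T and then coordinatizing agrees with coordinatizing and then multiplying by the matrix. A numpy sketch with illustrative data (T is given here by a fixed matrix in standard coordinates; all choices are assumptions for the example):

    import numpy as np

    A_std = np.array([[1., 2.],
                      [0., 1.],
                      [3., 0.]])        # T: R^2 -> R^3 in standard coordinates
    T = lambda v: A_std @ v

    alpha = np.array([[1., 1.],
                      [0., 1.]])        # basis of R^2 (columns)
    beta = np.eye(3)                    # basis of R^3 (columns)

    # Column j of Gamma(alpha, beta)(T) is [T(alpha_j)]_beta.
    Gamma = np.column_stack([np.linalg.solve(beta, T(alpha[:, j])) for j in range(2)])

    v = np.array([2., 5.])
    assert np.allclose(np.linalg.solve(beta, T(v)),          # [T(v)]_beta
                       Gamma @ np.linalg.solve(alpha, v))    # Gamma [v]_alpha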
3.29:
    Γ(α′, β′)(T) = M(β, β′)Γ(α, β)(T)M(α, α′)⁻¹

Proof: Before proving equation 3.29, we note that M(β, β′) (and M(α, α′)) is the m × m (and n × n) change of basis matrix given in equation 2.17. We have already noted that change of basis matrices are invertible, and consequently all the terms in equation 3.29 make sense.
To see that 3.29 is in fact a valid equation, we merely combine the
commutative diagrams furnished by Theorems 2.18 and 3.25 into one large diagram:

3.30: [commutative diagram with four parts ①–④: the squares given by Theorem 3.25 for Γ(α, β)(T) and Γ(α′, β′)(T), and the triangles given by Theorem 2.18 for M(α, α′) and M(β, β′)]
The diagram 3.30 is made up of four parts, which we have labeled ①, ②, ③, and ④. By Theorem 2.18, diagrams ① and ③ are commutative. By Theorem 3.25, diagrams ② and ④ are commutative. It follows that the entire diagram 3.30 is commutative. In particular, M(β, β′)Γ(α, β)(T) = Γ(α′, β′)(T)M(α, α′). Solving this equation for Γ(α′, β′)(T) gives 3.29. □
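Equation 3.29 is likewise easy to test numerically. A numpy sketch with random bases (invertible with probability 1; all data here are illustrative assumptions):

    import numpy as np
    rng = np.random.default_rng(0)

    n, m = 3, 2
    A_std = rng.standard_normal((m, n))              # T in standard coordinates
    alpha, alpha_p = rng.standard_normal((n, n)), rng.standard_normal((n, n))
    beta, beta_p = rng.standard_normal((m, m)), rng.standard_normal((m, m))

    gamma = lambda b1, b2: np.linalg.solve(b2, A_std @ b1)   # Gamma(b1, b2)(T)
    M = lambda b1, b2: np.linalg.solve(b2, b1)               # M(b1, b2) as in 2.17

    lhs = gamma(alpha_p, beta_p)
    rhs = M(beta, beta_p) @ gamma(alpha, beta) @ np.linalg.inv(M(alpha, alpha_p))
    assert np.allclose(lhs, rhs)                             # equation 3.29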
PAQ = ( Iₛ  0 )
      ( 0   0 )

Here our notation

    ( Iₛ  0 )
    ( 0   0 )

means PAQ will have the s × s identity matrix Iₛ in its upper left-hand corner and zeros everywhere else.
3.32:
    Γ(α′, β′)(T) = ( Iₛ  0 )
                   ( 0   0 )
There is another representation problem that naturally arises when considering Theorem 3.28. Suppose V = W. If α is a basis of V, then any T ∈ Hom(V, V) is represented in terms of α by an n × n matrix A = Γ(α, α)(T). If we change to a new basis α′ of V, then the representation of T changes to B = Γ(α′, α′)(T). Equation 3.29 implies that B = PAP⁻¹, where P = M(α, α′). Recall that two n × n matrices A and B are similar if there exists an invertible n × n matrix P such that B = PAP⁻¹. Thus, different representations of the same T ∈ Hom(V, V) with respect to different bases of V are similar matrices.

Now we can ask, What is the simplest representation of T? If we choose any basis α of V and set A = Γ(α, α)(T), then our question becomes, What is the simplest matrix B similar to A? That question is not so easy to answer as the previous equivalence problem. We shall present some solutions to this question in Chapter III of this book.
Theorem 3.25 implies that dim Hom(V, W) = (dim V)(dim W) when V and W are finite dimensional. In our next theorem, we gather together some miscellaneous facts about linear transformations and the dimension function.
3.35:
    C: ··· → Vᵢ₊₁ --dᵢ₊₁--> Vᵢ --dᵢ--> Vᵢ₋₁ → ···

If a chain complex C has only finitely many nonzero terms, then we can change notation and write C as

3.36:
    C: 0 → Vₙ --dₙ--> Vₙ₋₁ → ··· → V₁ --d₁--> V₀ → 0

It is understood here that all other vector spaces and maps not explicitly appearing in 3.36 are zero.
Example 3.38: Let V and W be vector spaces over F, and let T ∈ Hom(V, W). Then

    0 → ker T → V → Im T → 0,

with the inclusion ker T → V and the map V → Im T induced by T, is a short exact sequence. □
Theorem 3.39: Suppose

3.40:
    C: 0 → Vₙ --dₙ--> Vₙ₋₁ → ··· → V₁ --d₁--> V₀ → 0

is an exact chain complex in which each Vᵢ is finite dimensional. Then Σᵢ₌₀ⁿ (−1)ⁱ dim Vᵢ = 0.
Proof: The chain complex C can be decomposed into the following short exact sequences:

    C₁: 0 → ker d₁ → V₁ --d₁--> V₀ → 0
    C₂: 0 → ker d₂ → V₂ --d₂--> ker d₁ → 0
    ⋮
    Cₙ: 0 → ker dₙ → Vₙ --dₙ--> ker dₙ₋₁ → 0
If we now apply Theorem 3.33(c) to each Cᵢ and add the results, we get Σᵢ₌₀ⁿ (−1)ⁱ dim Vᵢ = 0. □
(10) Let T ∈ Hom(V, V) be an involution, that is, T² = 1_V. Show that there exist two subspaces M and N of V such that
(a) M + N = V.
(b) M ∩ N = (0).
(c) T(α) = α for every α ∈ M.
(d) T(α) = −α for every α ∈ N.
In Exercise 10, we assume 2 ≠ 0 in F. If F = F₂, are there subspaces M and N satisfying (a)–(d)?
(11) Let T ∈ Hom_F(V, V). If f(X) = aₙXⁿ + ··· + a₁X + a₀ ∈ F[X], then f(T) = aₙTⁿ + ··· + a₁T + a₀1_V ∈ Hom(V, V). Show that dim_F V = m < ∞ implies there exists a nonzero polynomial f(X) ∈ F[X] such that f(T) = 0.
(12) If S, T ∈ Hom_F(V, F) are such that S(α) = 0 implies T(α) = 0, show that T = xS for some x ∈ F.
(13) Let W be a subspace of V with m = dim W ≤ dim V = n < ∞. Let Z = {T ∈ Hom(V, V) | T(α) = 0 for all α ∈ W}. Show that Z is a subspace of Hom(V, V) and compute its dimension.
(14) Suppose V is a finite-dimensional vector space over F, and let S, T ∈ Hom_F(V, V). If ST = 1_V, show there exists a polynomial f(X) ∈ F[X] such that S = f(T).
(15) Use two appropriate diagrams as in 3.27 to prove the following theorem: Let V, W, Z be finite-dimensional vector spaces of dimensions n, m, and p, respectively. Let α, β, and γ be bases of V, W, and Z. If T ∈ Hom(V, W) and S ∈ Hom(W, Z), then Γ(α, γ)(ST) = Γ(β, γ)(S)Γ(α, β)(T).
(16) Suppose

    ··· → V₁ --d₁--> V₀ → 0

and

    ··· → V₁′ --d₁′--> V₀′ → 0
4. PRODUCTS AND DIRECT SUMS

Definition 4.1: Π_{i∈A} Vᵢ = {f: A → ⋃_{i∈A} Vᵢ | f is a function with f(i) ∈ Vᵢ for all i ∈ A}.

We can give the set Π_{i∈A} Vᵢ the structure of a vector space (over F) by defining addition and scalar multiplication pointwise. Thus, if f, g ∈ Π_{i∈A} Vᵢ, f + g is defined by (f + g)(i) = f(i) + g(i). If f ∈ Π_{i∈A} Vᵢ and x ∈ F, then xf is defined by (xf)(i) = x(f(i)). The fact that Π_{i∈A} Vᵢ is a vector space with these operations is straightforward. Henceforth, the symbol Π_{i∈A} Vᵢ will denote the vector space whose underlying set is given in 4.1 and whose vector operations are pointwise addition and scalar multiplication.
Suppose V = Π_{i∈A} Vᵢ is a product. It is sometimes convenient to identify a given vector f ∈ V with its set of values {f(i) | i ∈ A}. f(i) is called the ith coordinate of f, and we think of f as the "A-tuple" (f(i))_{i∈A}. Addition and scalar multiplication in V are given in terms of A-tuples as follows: (f(i))_{i∈A} + (g(i))_{i∈A} = (f(i) + g(i))_{i∈A} and x(f(i))_{i∈A} = (xf(i))_{i∈A}. This particular viewpoint is especially fruitful when |A| = n < ∞. In this case, we can assume A = {1, 2, ..., n}. Each f ∈ V is then identified with the n-tuple (f(1), ..., f(n)). When |A| = n, we shall use the notation V₁ × ··· × Vₙ for the product Π_{i∈A} Vᵢ.

The qth injection θ_q: V_q → V sends a vector α ∈ V_q to the A-tuple whose ith coordinate is

    α  if i = q
    0  if i ≠ q

4.4: [diagram: a map T: W → Π_{i∈A} Vᵢ together with its coordinate maps π_pT]
Proof: (a), (b), and (c) follow immediately from the definitions. π_p is surjective and θ_p is injective since π_pθ_p = 1_{V_p}. Thus, (d) is clear. As for (e), we need only argue that T is linear provided π_pT is linear for all p ∈ A. Let α, β ∈ W and x, y ∈ F.
The map π_p: V → V_p in Definition 4.2(a) is called the pth projection or pth coordinate map of V. The map θ_q: V_q → V is often called the qth injection of V_q into V. These maps can be used to analyze linear transformations to and from products. We begin first with the case where |A| < ∞.
4.10: For any indexing set A, Hom(W, Π_{i∈A} Vᵢ) ≅ Π_{i∈A} Hom(W, Vᵢ).
Definition 4.11: ⊕_{i∈A} Vᵢ = {f ∈ Π_{i∈A} Vᵢ | f(i) = 0 except possibly for finitely many i ∈ A}.
Example 4.12: Let F = R, A = N, and Vᵢ = R for all i ∈ N. Then the N-tuple (1)_{i∈N}, that is, the function f: N → R given by f(i) = 1 for all i ∈ N, is a vector in V = Π_{i∈N} Vᵢ but not in ⊕_{i∈N} Vᵢ. □
The vector space ⊕_{i∈A} Vᵢ is called the direct sum of the Vᵢ. It is also called the subdirect product of the Vᵢ and written ∐_{i∈A} Vᵢ. In this text, we shall consistently use the notation ⊕_{i∈A} Vᵢ to indicate the direct sum of the Vᵢ. If |A| = n < ∞, then we can assume A = {1, 2, ..., n}. In this case we shall write V₁ ⊕ ··· ⊕ Vₙ for ⊕_{i∈A} Vᵢ. Thus, V₁ ⊕ ··· ⊕ Vₙ, V₁ × ··· × Vₙ, Π_{i∈A} Vᵢ, and ⊕_{i∈A} Vᵢ are all the same space when |A| = n < ∞.

Since ⊕_{i∈A} Vᵢ = Π_{i∈A} Vᵢ when |A| < ∞, our comments after 4.10 imply the following theorem:
Theorem 4.13: Suppose V = ⊕_{i∈A} Vᵢ is the direct sum of vector spaces Vᵢ. Let Bᵢ be a basis of Vᵢ. Then B = ⋃_{i∈A} θᵢ(Bᵢ) is a basis of V. □
Note Im S = Σ_{i∈A} Vᵢ. Thus, the subspaces Vᵢ, i ∈ A, are independent if and only if ⊕_{i∈A} Vᵢ ≅ Σ_{i∈A} Vᵢ via S. A simple example of independent subspaces is provided by Theorem 2.13(c).
Example 4.15: Let V be a vector space over F and W a subspace of V. Let W′ be any complement of W. Then W, W′ are independent. The direct sum W ⊕ W′ is just the product W × W′, and S: W × W′ → W + W′ is given by S((α, β)) = α + β. If (α, β) ∈ ker S, then α + β = 0. But W ∩ W′ = (0). Therefore, α = −β ∈ W ∩ W′ implies α = β = 0. Thus, S is injective, and W, W′ are independent. □
Proof: In statements (b) and (b′), Σ_{i∈A} αᵢ means αᵢ = 0 for all but possibly finitely many i ∈ A. It is obvious that (b) and (b′) are equivalent. So, we argue (a) ⇔ (b′) and (b′) ⇒ (c).

Suppose the Vᵢ are independent. If Σ_{i∈A} αᵢ = 0 with αᵢ ∈ Vᵢ for all i ∈ A, then S((αᵢ)_{i∈A}) = Σ_{i∈A} αᵢ = 0. Since S is injective, we conclude that αᵢ = 0 for all i ∈ A. Thus, (a) implies (b′). Similarly, (b′) implies (a).

Suppose we assume (b′). Fix j ∈ A. Let α ∈ Vⱼ ∩ (Σ_{i≠j} Vᵢ). Then α = Σ_{i≠j} αᵢ for some αᵢ ∈ Vᵢ. As usual, all the αᵢ here are zero except possibly for finitely many indices i ≠ j. Thus, 0 = (−α) + Σ_{i≠j} αᵢ with −α ∈ Vⱼ, and (b′) implies α = 0. Hence Vⱼ ∩ (Σ_{i≠j} Vᵢ) = (0), which is (c).
Theorem 4.20: Let V be a vector space over F, and suppose {P₁, ..., Pₙ} is a set of pairwise orthogonal idempotents in Hom(V, V) such that P₁ + ··· + Pₙ = 1. Let Vᵢ = Im Pᵢ. Then V = V₁ ⊕ ··· ⊕ Vₙ.
EXERCISES FOR SECTION 4
(1) Let B = {αᵢ | i ∈ A} be a basis of V. Show that V is the internal direct sum of {Fαᵢ | i ∈ A}.
(2) Show Hom_F(⊕_{i∈A} Vᵢ, W) ≅ Π_{i∈A} Hom_F(Vᵢ, W).
(3) Give a careful proof of Theorem 4.3(f).
(4) Let V = V₁ × ··· × Vₙ, and for each i = 1, ..., n, set Tᵢ = θᵢπᵢ. Show that {T₁, ..., Tₙ} is a set of pairwise orthogonal idempotents in Hom(V, V) whose sum is 1.
(5) Let V = V₁ × ··· × Vₙ. Show that V has a collection of subspaces {V₁′, ..., Vₙ′} such that V = V₁′ ⊕ ··· ⊕ Vₙ′ with Vᵢ′ ≅ Vᵢ for each i.

(6) Prove a combined version of Corollaries 4.6 and 4.8 by showing directly that

    φ: Hom_F(V₁ × ··· × Vₙ, W₁ × ··· × Wₘ) → Πᵢ,ⱼ Hom_F(Vᵢ, Wⱼ)

given by φ(T) = (πⱼTθᵢ), 1 ≤ i ≤ n, 1 ≤ j ≤ m, is an isomorphism.
(7) Suppose V = V₁ ⊕ ··· ⊕ Vₙ. Let T ∈ Hom(V, V) such that T(Vᵢ) ⊆ Vᵢ for all i = 1, ..., n. Find a basis α of V such that Γ(α, α)(T) is the block diagonal matrix

    diag(M₁, ..., Mₙ),

where Mᵢ describes the action of T on Vᵢ.
(8) If X, Y, Z are subspaces of V such that X ⊕ Y = X ⊕ Z = V, is Y = Z? Is Y ≅ Z?
(9) Find three subspaces V₁, V₂, and V₃ of a vector space V with the following properties: Vᵢ ∩ Vⱼ = (0) whenever i ≠ j, but V₁, V₂, and V₃ are not independent.
5. QUOTIENT SPACES AND THE ISOMORPHISM THEOREMS

5.3: (a) x ∈ x̄.
(b) x̄ = ȳ if and only if x ~ y.
(c) For any x, y ∈ A, either x̄ = ȳ or x̄ ∩ ȳ = ∅.
(d) A = ⋃_{x∈A} x̄.
The proofs of the statements in 5.3 are all easy consequences of the definitions. If we examine Example 5.2 again, we see Z is the disjoint union of the p equivalence classes 0̄, 1̄, ..., (p − 1)‾. It follows from 5.3(c) and 5.3(d) that any equivalence relation on a set A divides A into a disjoint union of equivalence classes. The reader probably has noted that the equivalence classes {0̄, 1̄, ..., (p − 1)‾} of Z inherit an addition and multiplication from Z and form the field F_p discussed in Example 1.3. This is a common phenomenon in algebra. The set of equivalence classes on a set A often inherits some algebraic operations from A itself. This type of inheritance of algebraic structure is particularly fruitful in the study of vector spaces.
Let V be a vector space over a field F, and suppose W is a subspace of V. The subspace W determines an equivalence relation ~ on V defined as follows:

5.4:
    α ~ β if α − β ∈ W
Let us check that the relation defined in 5.4 is reflexive, symmetric, and transitive. Clearly, α ~ α. If α ~ β, then α − β ∈ W. Since W is a subspace, β − α ∈ W. Therefore, β ~ α. Suppose α ~ β and β ~ γ. Then α − β, β − γ ∈ W. Again, since W is a subspace, α − γ = (α − β) + (β − γ) ∈ W, and, thus, α ~ γ. So, ~ indeed is an equivalence relation on V. The reader should realize that the equivalence relation ~ depends on the subspace W. We have deliberately suppressed any reference to W in the symbol ~ to simplify notation. This will cause no confusion in the sequel.
Thus, ᾱ = {β ∈ V | β ~ α} and V/W = {ᾱ | α ∈ V}. Note that the elements in V/W are subsets of V. Hence V/W consists of a collection of elements from 𝒫(V).
Definition 5.7: The set of all affine subspaces of V will be denoted 𝒜(V).

Theorem 5.8: Let V be a vector space over F, and let 𝒜(V) denote the set of all affine subspaces of V.
Definition 5.9: Let V and V′ be two vector spaces over a field F. A function f: V → V′ is called an affine transformation if f = T + α (i.e., f(ξ) = T(ξ) + α for all ξ ∈ V) for some T ∈ Hom_F(V, V′) and some α ∈ V′. The set of all affine transformations from V to V′ will be denoted Aff_F(V, V′).
5.11:
    ᾱ + β̄ = (α + β)‾

In equation 5.11, α and β are vectors in V, and ᾱ and β̄ are their corresponding equivalence classes. ᾱ + β̄ is defined to be the equivalence class that contains α + β. We note that our definition of ᾱ + β̄ depends only on the equivalence classes ᾱ and β̄ and not on the particular elements α ∈ ᾱ and β ∈ β̄ (used to form the right-hand side of 5.11). To see this, suppose α₁ ∈ ᾱ and β₁ ∈ β̄. Then α₁ − α, β₁ − β ∈ W, so (α₁ + β₁) − (α + β) ∈ W and (α₁ + β₁)‾ = (α + β)‾.
5.12:
    xᾱ = (xα)‾
In equation 5.12, x ∈ F and ᾱ ∈ V/W. Again we observe that if α₁ ∈ ᾱ, then (xα₁)‾ = (xα)‾. Thus (x, ᾱ) → xᾱ is a well-defined function from F × V/W to V/W. The reader can easily check that scalar multiplication satisfies axioms V5–V8 in Definition 1.4. Thus, (V/W, (ᾱ, β̄) → ᾱ + β̄, (x, ᾱ) → xᾱ) is a vector space over F. We shall refer to this vector space in the future as simply V/W.
Equations 5.11 and 5.12 imply that the natural map Π: V → V/W given by Π(α) = ᾱ is a linear transformation. Clearly, Π is surjective and has kernel W. Thus, if i: W → V denotes the inclusion of W into V, then we have the following short exact sequence:

    0 → W --i--> V --Π--> V/W → 0
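When V = Rⁿ, one concrete way to compute with V/W is to pick the representative of each coset ᾱ lying in the orthogonal complement of W: two vectors are in the same coset exactly when these representatives coincide. A hedged numpy sketch (this identification is a computational device, not the definition in the text):

    import numpy as np

    W = np.array([[1., 0., 1.]]).T      # W = span{(1, 0, 1)} inside R^3

    def coset_rep(v):
        # Orthogonal projection of v onto the complement of W picks a
        # canonical representative of the coset v + W.
        Q, _ = np.linalg.qr(W, mode="complete")
        C = Q[:, W.shape[1]:]           # orthonormal basis of the complement
        return C @ (C.T @ v)

    a = np.array([2., 3., 5.])
    b = a + 4.0 * W[:, 0]               # a ~ b since a - b lies in W
    assert np.allclose(coset_rep(a), coset_rep(b))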
We shall finish this section on quotients with three theorems that are
collectively known as the isomorphism theorems. These theorems appear in
various forms all over mathematics and are very useful.
Theorem 5.15 (First Isomorphism Theorem): Let T ∈ Hom_F(V, V′), and suppose W is a subspace of V for which T(W) = 0. Let Π: V → V/W be the natural map. Then there exists a unique T̄ ∈ Hom_F(V/W, V′) such that the following diagram commutes:

5.16: [commutative triangle: T = T̄Π, with Π: V → V/W and T̄: V/W → V′]
Proof: We define T̄ by T̄(ᾱ) = T(α). Again, we remind the reader that ᾱ is a subset of V containing α. To ensure that our definition of T̄ makes sense, we must argue that T(α₁) = T(α) for any α₁ ∈ ᾱ. If α₁ ∈ ᾱ, then α₁ − α ∈ W. Since T is zero on W, we get T(α₁) = T(α). Thus, our definition of T̄(ᾱ) depends only on the coset ᾱ and not on any particular representative of ᾱ. Since T̄(xᾱ + yβ̄) = T̄((xα + yβ)‾) = T(xα + yβ) = xT(α) + yT(β) = xT̄(ᾱ) + yT̄(β̄), we see T̄ ∈ Hom(V/W, V′). T̄Π(α) = T̄(ᾱ) = T(α), and so 5.16 commutes. Only the uniqueness of T̄ remains to be proved.

If T′ ∈ Hom(V/W, V′) is another map for which T′Π = T, then T̄ = T′ on Im Π. But Π is surjective. Therefore, T̄ = T′. □
Corollary 5.17: Suppose T ∈ Hom_F(V, V′). Then Im T ≅ V/ker T.

Proof: We can view T as a surjective, linear transformation from V to Im T. Applying Theorem 5.15, we get a unique linear transformation T̄: V/ker T → Im T for which the following diagram is commutative:

    [commutative triangle: T = T̄Π, with Π: V → V/ker T]
Proof: Let Π: V → V/W and Π′: V/W → (V/W)/Π(W′) be the natural projections. Set T = Π′Π: V → (V/W)/Π(W′). Since Π and Π′ are both surjective, T is a surjective, linear transformation. Clearly, W′ ⊆ ker T. Let α ∈ ker T. Then 0 = Π′Π(α). Thus, ᾱ = Π(α) ∈ Π(W′). Let β ∈ W′ such that Π(β) = Π(α). Then Π(β − α) = 0. Thus, β − α ∈ ker Π = W ⊆ W′. In particular, α ∈ W′. We have now proved that ker T = W′. Applying Corollary 5.17, we have (V/W)/Π(W′) = Im T ≅ V/ker T = V/W′. □
Theorem 5.20 (Third Isomorphism Theorem): Suppose W and W′ are subspaces of V. Then (W + W′)/W ≅ W′/(W ∩ W′).
EXERCISES FOR SECTION 5

(2) Let T ∈ Hom(V, V) and suppose T(α) = α for all α ∈ W, a subspace of V.
(a) Show that T induces a map S ∈ Hom(V/W, V/W).
(b) If S is the identity map on V/W, show that R = T − 1_V has the property that R² = 0.
(c) Conversely, suppose T = 1_V + R with R ∈ Hom(V, V) and R² = 0. Show that there exists a subspace W of V such that T is the identity on W and the induced map S is the identity on V/W.
(3) A subspace W of V is said to have finite codimension n if dim V/W = n. If W has finite codimension, we write codim W < ∞. Show that if W₁ and W₂ have finite codimension in V, then so does W₁ ∩ W₂. Show codim(W₁ ∩ W₂) ≤ codim W₁ + codim W₂.
(4) In Exercise 3, suppose V is finite dimensional and codim W1 = codim W2.
Show that dim(W1/W1 n W2) = dim(W2/W1 n W2).
(5) Let T e Hom(V, V'), and suppose T is surjective. Set K = ker T. Show there
exists a one-to-one, inclusion-preserving correspondence between the
subspaces of V' and the subspaces of V containing K.
(6) Let Te Hom(V, V'), and let K = ker T. Show that all vectors of V that have
the same image under T belong to the same coset of V/K.
(7) Suppose W is a finite-dimensional subspace of V such that V/W is finite
dimensional. Show V must be finite dimensional.
(11) Suppose C = {(Vᵢ, dᵢ) | i ∈ Z} is a chain complex. For each i ∈ Z, set Hᵢ(C) = ker dᵢ/Im dᵢ₊₁. Hᵢ(C) is called the ith homology of C.
(a) Show that C is exact if and only if Hᵢ(C) = 0 for all i ∈ Z.
(b) Let C = {(Vᵢ, dᵢ) | i ∈ Z} and C′ = {(Vᵢ′, dᵢ′) | i ∈ Z} be two chain complexes. Show that any chain map T: C → C′ induces a linear transformation Tᵢ: Hᵢ(C) → Hᵢ(C′) such that Tᵢ(α + Im dᵢ₊₁) = Tᵢ(α) + Im dᵢ₊₁′.

(12) Suppose

    C: 0 → Vₙ → ··· → V₁ → V₀ → 0

is a finite chain complex. Show that Σᵢ (−1)ⁱ dim Hᵢ(C) = Σᵢ (−1)ⁱ dim Vᵢ. Here each Vᵢ is assumed finite dimensional.
(13) Use Exercise 12 to prove the following assertion: Let S₁ = α₁ + W₁ and S₂ = α₂ + W₂ be two cosets of dimension k [i.e., dim(Wᵢ) = k]. Show that S₁ and S₂ are parallel (i.e., W₁ = W₂) if and only if S₁ and S₂ are contained in a coset of dimension k + 1 and have empty intersection.
(14) In R³, show that the intersection of two nonparallel planes (i.e., cosets of dimension 2) is a line (i.e., a coset of dimension 1). The same problem makes sense in any three-dimensional vector space V.
(15) Let S₁, S₂, and S₃ be planes in R³ such that S₁ ∩ S₂ ∩ S₃ = ∅, but no two of the planes are parallel. Show that the lines S₁ ∩ S₂, S₂ ∩ S₃, and S₁ ∩ S₃ are parallel.
(16) Fix f ∈ R[X] of degree n, and let W = {pf | p ∈ R[X]}. Show that W is a subspace of R[X]. Show that dim(R[X]/W) = n. (Hint: Use the division algorithm in R[X].)
(17) In Theorem 5.15, if T is surjective and W = ker T, then T̄ is an isomorphism [prove!]. In particular, S = (T̄)⁻¹ is a well-defined map from V′ to V/W. Show that the process of indefinite integration is an example of such a map S.
6.3:
    αᵢ*(x₁α₁ + ··· + xₙαₙ) = xᵢ
Example 6.4: Let V = ⊕_{i∈N} F, that is, V is the direct sum of the vector spaces {Vᵢ = F | i ∈ N}. It follows from Exercise 2 of Section 4 that V* = Hom_F(⊕_{i∈N} F, F) ≅ Π_{i∈N} Hom_F(F, F) ≅ Π_{i∈N} F. From Theorem 4.13, we know that dim V = |N|. A simple counting exercise will convince the reader that dim V* = dim(Π_{i∈N} F) is strictly larger than |N|. □
Definition 6.5: Let V, V′, and W be vector spaces over F, and let ω: V × V′ → W be a function. We call ω a bilinear map if for all α ∈ V, ω(α, ·) ∈ Hom_F(V′, W) and for all β ∈ V′, ω(·, β) ∈ Hom_F(V, W).
6.6:
    ω(α, T) = T(α)
In equation 6.6, α ∈ V and T ∈ V*. The fact that ω is a bilinear map is obvious. ω determines a natural, injective, linear transformation ψ: V → V** in the following way. If α ∈ V, set ψ(α) = ω(α, ·). Thus, for any T ∈ V*, ψ(α)(T) = ω(α, T) = T(α). If x, y ∈ F and α, β ∈ V, then ψ(xα + yβ)(T) = ω(xα + yβ, T) = xω(α, T) + yω(β, T) = (xψ(α) + yψ(β))(T). Consequently, ψ ∈ Hom_F(V, V**). To see that ψ is injective, we need to generalize equation 6.3. Suppose α = {αᵢ | i ∈ A} is a basis of V (finite or infinite). Then for every i ∈ A, we can define a dual transformation αᵢ* ∈ V* as follows: For each nonzero vector ξ ∈ V, there exists a unique finite subset {α_{i₁}, ..., α_{iₙ}} ⊆ α and unique nonzero scalars x₁, ..., xₙ ∈ F such that ξ = x₁α_{i₁} + ··· + xₙα_{iₙ}. Set αᵢ*(ξ) = xⱼ if i = iⱼ for some j, and αᵢ*(ξ) = 0 otherwise. In particular,

    αᵢ*(αⱼ) = 1 if i = j
    αᵢ*(αⱼ) = 0 if i ≠ j

Now if α ∈ ker ψ, then T(α) = 0 for all T ∈ V*. In particular, αᵢ*(α) = 0 for all i ∈ A. This clearly implies α = 0, and, thus, ψ is injective.
We note in passing that the set α* = {αᵢ* | i ∈ A} ⊆ V*, which we have just constructed above, is clearly linearly independent over F. If dim V < ∞, α* is just the dual basis of V* coming from α. If dim V = ∞, then α* does not span V* and, therefore, cannot be called a dual basis. At any rate, we have proved the first part of the following theorem:
Theorem 6.7: Let V be a vector space over F, and suppose ω: V × V* → F is the bilinear map given in equation 6.6. Then the map ψ: V → V** given by ψ(α) = ω(α, ·) is an injective linear transformation. If dim V < ∞, then ψ is a natural isomorphism.
Definition 6.8: If A is any subset of V, let A⊥ = {β ∈ V* | ω(α, β) = 0 for all α ∈ A}.

Thus, A⊥ is precisely the set of all vectors in V* that vanish on A. It is easy to see that A⊥ is in fact a subspace of V*. We have a similar definition for subsets of V*:

Definition 6.9: If A is a subset of V*, let A⊥ = {α ∈ V | ω(α, β) = 0 for all β ∈ A}.

Thus, if A ⊆ V*, then A⊥ is the set of all vectors in V that are zero under the maps in A. Clearly, A⊥ is a subspace of V for any A ⊆ V*.
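When V = Fⁿ and each functional in A is written as the row vector of its values on the standard basis, A⊥ in the sense of Definition 6.9 is exactly the null space of the matrix whose rows are those row vectors. A small sympy sketch (the functionals are illustrative choices):

    import sympy as sp

    # Functionals f1, f2 on Q^3, written as rows: f(v) = (row) * v.
    A = sp.Matrix([[1, 0, -1],
                   [0, 1,  1]])

    perp = A.nullspace()                 # basis of A-perp
    assert len(perp) == 1                # dim A-perp = 3 - rank(A) = 1
    for v in perp:
        assert A * v == sp.zeros(2, 1)   # every f in A vanishes on A-perp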
If T ∈ Hom_F(V, W), then T determines a linear transformation T* ∈ Hom_F(W*, V*), which we call the adjoint of T.

Definition 6.11: Let T ∈ Hom_F(V, W). Then T* ∈ Hom_F(W*, V*) is the linear transformation defined by T*(f) = fT for all f ∈ W*.
Since for every f ∈ W* the composite of the linear transformations f and T is again a linear map from V to F, we see T*(f) ∈ V*. If x, y ∈ F and f₁, f₂ ∈ W*, then T*(xf₁ + yf₂) = (xf₁ + yf₂)T = x(f₁T) + y(f₂T) = xT*(f₁) + yT*(f₂). Thus, T* is a linear transformation from W* to V*.
Theorem 6.12: The map T → T* from Hom_F(V, W) to Hom_F(W*, V*) is an injective linear transformation. If V and W are finite dimensional, then this map is an isomorphism.
Proof: Let χ: Hom(V, W) → Hom(W*, V*) be defined by χ(T) = T*. Our comments above imply χ is a well-defined function. Suppose x, y ∈ F, T₁, T₂ ∈ Hom(V, W), and f ∈ W*. Then χ(xT₁ + yT₂)(f) = (xT₁ + yT₂)*(f) = f(xT₁ + yT₂) = x(fT₁) + y(fT₂) = xT₁*(f) + yT₂*(f) = (xT₁* + yT₂*)(f) = (xχ(T₁) + yχ(T₂))(f). Thus, χ(xT₁ + yT₂) = xχ(T₁) + yχ(T₂), and χ is a linear transformation.

Suppose T ∈ ker χ. Then for every f ∈ W*, 0 = χ(T)(f) = T*(f) = fT. Now if we follow the same argument given in the proof of Theorem 6.7, we know that if β is a nonzero vector in W, then there exists an f ∈ W* such that f(β) ≠ 0. Thus, fT = 0 for all f ∈ W* implies Im T = (0). Therefore, T = 0, and χ is injective.

Now suppose V and W are finite dimensional. Then Theorems 6.2 and 3.25 imply dim{Hom_F(V, W)} = dim{Hom_F(W*, V*)}. Since χ is injective, Theorem 3.33(b) implies χ is an isomorphism. □
6.13:
(ST)* = T*S*
Theorem 6.14: Let T ∈ Hom_F(V, W). Then (a) ker T = (Im T*)⊥, and (b) ker T* = (Im T)⊥.

Proof: (a) Let α ∈ (Im T*)⊥, and suppose ω: V × V* → F is the bilinear map defined in equation 6.6. Then ω(α, Im T*) = 0. Thus, for all f ∈ W*, 0 = ω(α, T*(f)) = ω(α, fT) = fT(α) = f(T(α)). But we have seen that f(T(α)) = 0 for all f ∈ W* implies T(α) = 0. Thus, α ∈ ker T. Conversely, if α ∈ ker T, then 0 = f(T(α)) = ω(α, T*(f)), and α ∈ (Im T*)⊥.

(b) Suppose f ∈ ker T*. Then 0 = T*(f) = fT. In particular, f(T(α)) = 0 for all α ∈ V. Therefore, 0 = ω(T(α), f) and f ∈ (Im T)⊥. Thus, ker T* ⊆ (Im T)⊥. The steps in this proof are easily reversed, and so ker T* = (Im T)⊥. □
Theorem 6.14 has an interesting corollary. If T ∈ Hom_F(V, W), let us define the rank of T, rk{T}, to be dim(Im T). Then we have the following:

Corollary 6.15: Let V and W be finite-dimensional vector spaces over F, and let T ∈ Hom_F(V, W). Then rk{T} = rk{T*}.
6.17:
    Γ(β*, α*)(T*) = (Γ(α, β)(T))ᵗ

Set A = (a_pq) = Γ(α, β)(T) ∈ M_{m×n}(F).
The transpose of A is the n × m matrix Aᵗ = (b_pq), where b_pq = a_qp for all p = 1, ..., n and q = 1, ..., m. It follows from 3.24 that Γ(β*, α*)(T*) = Aᵗ provided that the following equation is true:

6.18:
    T*(β_q*) = Σᵣ₌₁ⁿ a_qr αᵣ*  for all q = 1, ..., m

Fix q = 1, ..., m. To show that T*(β_q*) and Σᵣ₌₁ⁿ a_qr αᵣ* are the same vector in V*, it suffices to show that these two maps agree on the basis α of V. For any r = 1, ..., n, (T*(β_q*))(αᵣ) = β_q*(T(αᵣ)) = β_q*(Σᵢ₌₁ᵐ aᵢᵣβᵢ) = Σᵢ₌₁ᵐ aᵢᵣβ_q*(βᵢ) = a_qr. On the other hand, (Σₛ₌₁ⁿ a_qs αₛ*)(αᵣ) = Σₛ a_qs αₛ*(αᵣ) = a_qr. Thus, equation 6.18 is established, and the proof of Theorem 6.16 is complete. □
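Numerically, Theorem 6.16 says that if A represents T with respect to α and β, then Aᵗ represents T* with respect to the dual bases; equivalently, f(T(v)) can be computed on either side of the transpose. A numpy sketch with the standard bases and illustrative data:

    import numpy as np

    A = np.array([[1., 2., 0.],
                  [3., 1., 4.]])     # Gamma(alpha, beta)(T) for standard bases

    w = np.array([2., -1.])          # coordinates of f in beta*: f(y) = w . y
    v = np.array([1., 5., 2.])

    lhs = w @ (A @ v)                # f(T(v))
    rhs = (A.T @ w) @ v              # (T*(f))(v), with T* represented by A^t
    assert np.isclose(lhs, rhs)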
(2) Let V and W be finite-dimensional vector spaces over F with bases α and β, respectively. Suppose T ∈ Hom_F(V, W). Show that rk{T} = rk{Γ(α, β)(T)}.
(3) Let 0 ≠ β ∈ V and f ∈ V* − (0). Define T: V → V by T(α) = f(α)β. A function defined in this way is called a dyad.
(a) Show T ∈ Hom(V, V) such that dim(Im T) = 1.
(b) If S ∈ Hom(V, V) such that dim(Im S) = 1, show that S is a dyad.
(c) If T is a dyad on V, show that T* is a dyad on V*.
(4) Let V and W be finite-dimensional vector spaces over F. Let ψ_V: V → V** and ψ_W: W → W** be the isomorphisms given in Theorem 6.7. Show that for every T ∈ Hom(V, W) the following diagram is commutative:

    [commutative square: T** : V** → W** on top, T: V → W on the bottom, with ψ_V and ψ_W as the vertical arrows]
(5) Let A = {f₁, ..., fₙ} ⊆ V* and suppose g ∈ V* such that g vanishes on A⊥. Show g ∈ L(A). [Hint: First assume dim(V) < ∞; then use Exercise 3 of Section 5 for the general case.]
(7) Let V and W be finite-dimensional vector spaces over F, and let ω: V × W → F be an arbitrary bilinear map. Let T: V → W* and S: W → V* be defined from ω as follows: T(α)(β) = ω(α, β) and S(β)(α) = ω(α, β). Show that S = T* if we identify W with W** via ψ_W.
(8) Show that (W⊥)⊥ = W when V is finite dimensional.
(9) Let V be a finite-dimensional vector space over F. Let W = V ⊕ V*. Show that the map sending (α, f) ∈ W to the functional (β, g) → f(β) + g(α) is an isomorphism between W and W*.
(10) If

    0 → V --S--> W --T--> Z → 0

is a short exact sequence of vector spaces over F, show that

    0 → Z* --T*--> W* --S*--> V* → 0

is exact.
(11) Let {Wᵢ | i ∈ Z} be a sequence of vector spaces over F. Suppose for each i ∈ Z, we have a linear transformation eᵢ ∈ Hom_F(Wᵢ, Wᵢ₊₁). Then D = {(Wᵢ, eᵢ) | i ∈ Z} is called a cochain complex if eᵢ₊₁eᵢ = 0 for all i ∈ Z. D is said to be exact if Im eᵢ = ker eᵢ₊₁ for all i ∈ Z.
(a) If C = {(Cᵢ, dᵢ) | i ∈ Z} is a chain complex, show that C* = {(Cᵢ*, eᵢ = dᵢ₊₁*) | i ∈ Z} is a cochain complex.
(b) If C is exact, show that C* is also exact.
(12) Prove 6.13.

(13) Let V be a finite-dimensional vector space over F with basis α = {α₁, ..., αₙ}. Define T ∈ Hom_F(V, V) by ... . Show that T*(f) = ... for all f ∈ V*. Here you will need to identify V with V** in a natural way.
(14) Let {zₖ}ₖ₌₀^∞ be a sequence of complex numbers. Define a map T: C[X] → C by T(Σₖ₌₀ⁿ aₖXᵏ) = Σₖ₌₀ⁿ aₖzₖ. Show that T ∈ (C[X])*. Show that every T ∈ (C[X])* is given by such a sequence.
(15) Let V = R[X]. Which of the following functions on V are elements in V*:
(a) T(p) = ∫₀¹ p(X) dX.
(b) T(p) = ∫₀¹ p(X)² dX.
7. SYMMETRIC BILINEAR FORMS
Example 7.2: The standard example to keep in mind here is the form ω((x₁, ..., xₙ), (y₁, ..., yₙ)) = Σᵢ₌₁ⁿ xᵢyᵢ. Clearly, ω is a symmetric, bilinear form on Fⁿ. □
Suppose ω is a bilinear form on a finite-dimensional vector space V. Then for every basis α = {α₁, ..., αₙ} of V, we can define an n × n matrix M(ω, α) ∈ M_{n×n}(F) whose (i, j)th entry is given by {M(ω, α)}ᵢⱼ = ω(αᵢ, αⱼ). In terms of the usual coordinate map [·]_α: V → M_{n×1}(F), ω is then given by the following equation:

7.3:
    ω(β, δ) = [β]_αᵗ M(ω, α) [δ]_α
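Equation 7.3 reduces the evaluation of ω to a matrix sandwich once a basis is fixed. A numpy sketch, assuming V = R³ with the standard form of Example 7.2 and an illustrative basis:

    import numpy as np

    alpha = np.array([[1., 1., 0.],
                      [0., 1., 1.],
                      [0., 0., 1.]])   # basis vectors are the columns

    omega = lambda u, v: u @ v         # the symmetric form of Example 7.2

    # M(omega, alpha)_{ij} = omega(alpha_i, alpha_j)
    M = np.array([[omega(alpha[:, i], alpha[:, j]) for j in range(3)]
                  for i in range(3)])

    b, d = np.array([1., 2., 3.]), np.array([-1., 0., 2.])
    cb = np.linalg.solve(alpha, b)     # [b]_alpha
    cd = np.linalg.solve(alpha, d)     # [d]_alpha
    assert np.isclose(omega(b, d), cb @ M @ cd)   # equation 7.3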
Thus, we have established the result for all vector spaces of dimension 1 over R.
Suppose n > 1, and we have proved the theorem for any vector space over R of dimension less than n. Since ω is symmetric, we have

7.7:
    q(α + β) = q(α) + 2ω(α, β) + q(β) for all α, β ∈ V
    q(αᵢ) =  1 for i = 1, ..., p
    q(αᵢ) = −1 for i = p + 1, ..., p + m
    q(αᵢ) =  0 for i = p + m + 1, ..., p + m + r

The vector space V then decomposes into the direct sum V = V₁ ⊕ V₀ ⊕ V₋₁, where V₁ = L({α₁, ..., α_p}), V₀ = L({α_{p+m+1}, ..., α_{p+m+r}}), and V₋₁ = L({α_{p+1}, ..., α_{p+m}}).
Our quadratic form q is positive on V₁ − (0), zero on V₀, and negative on V₋₁ − (0). For example, suppose β ∈ V₋₁ − (0). Then β = x₁α_{p+1} + ··· + xₘα_{p+m} for some x₁, ..., xₘ ∈ R. Thus, q(β) = x₁²q(α_{p+1}) + ··· + xₘ²q(α_{p+m}). Since β ≠ 0, some xᵢ is nonzero. Since q(α_{p+i}) = −1 for all i = 1, ..., m, we see q(β) < 0.
The subspaces V₁, V₀, and V₋₁ are pairwise ω-orthogonal in the sense that ω(Vᵢ, Vⱼ) = 0 whenever i, j ∈ {−1, 0, 1} and i ≠ j. Thus, any ω-orthonormal basis α of V decomposes V into a direct sum V = V₋₁ ⊕ V₀ ⊕ V₁ of pairwise ω-orthogonal subspaces. The sign of the associated quadratic form q is constant on each Vᵢ − (0). An important fact here is that the dimensions of these three subspaces, p, m, and r, depend only on ω and not on the particular ω-orthonormal basis α chosen.
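For a real symmetric matrix M(ω, α), the invariants p, m, and r can also be read off from the signs of the eigenvalues, since an eigenbasis can be rescaled to an ω-orthonormal basis. A hedged numpy sketch (the matrix is an arbitrary example):

    import numpy as np

    M = np.array([[2.,  1., 0.],
                  [1., -3., 0.],
                  [0.,  0., 0.]])     # matrix of a symmetric bilinear form on R^3

    eigenvalues = np.linalg.eigvalsh(M)
    p = int(np.sum(eigenvalues >  1e-9))    # dim V_1
    m = int(np.sum(eigenvalues < -1e-9))    # dim V_{-1}
    r = M.shape[0] - p - m                  # dim V_0
    assert (p, m, r) == (1, 1, 1)           # independent of the basis chosen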
Proof: W₋₁ is the subspace of V spanned by those βᵢ for which q(βᵢ) = −1. Let α ∈ W₋₁ ∩ (V₀ + V₁). If α ≠ 0, then q(α) < 0 since α ∈ W₋₁. But α ∈ V₀ + V₁ implies q(α) ≥ 0, which is impossible. Thus, α = 0. So, W₋₁ ∩ (V₀ + V₁) = (0). By expanding the basis of W₋₁ if need be, we can then construct a subspace P of V such that W₋₁ ⊆ P and P ⊕ (V₀ + V₁) = V. Thus, from Theorem 4.9, we have dim(W₋₁) ≤ dim P = dim V − dim V₀ − dim V₁ = dim(V₋₁). Therefore, dim(W₋₁) ≤ dim(V₋₁). Reversing the roles of the Wᵢ and Vᵢ in this proof gives dim(V₋₁) ≤ dim(W₋₁). Thus, dim(W₋₁) = dim(V₋₁). A similar proof shows dim(W₁) = dim(V₁). Then dim(W₀) = dim(V₀) by Theorem 4.9. This completes the proof of Lemma 7.9. □
Example 7.15: Let V = C([a, b]). Define ω(f, g) = ∫ₐᵇ f(x)g(x) dx. Clearly, ω is an inner product on V. □
EXERCISES FOR SECTION 7

(1) In our proof of Lemma 7.9, we used the following fact: If W and W′ are subspaces of V such that W ∩ W′ = (0), then there exists a complement of W′ that contains W. Give a proof of this fact.
(2) Let V = M_{m×n}(F), and let C ∈ M_{m×m}(F). Define a map ω: V × V → F by the formula ω(A, B) = Tr(AᵗCB). Show that ω is a bilinear form. Is ω symmetric?
(3) Let V = M_{n×n}(F). Define a map ω: V × V → F by ω(A, B) = n Tr(AB) − Tr(A) Tr(B). Show that ω is a bilinear form. Is ω symmetric?
(4) Exhibit a bilinear form on Rⁿ that is not symmetric.
(5) Find a symmetric bilinear form on Cⁿ whose associated quadratic form is positive definite.
(6) Describe explicitly all symmetric bilinear forms on R².

(7) Describe explicitly all skew-symmetric bilinear forms on R³. A bilinear form ω is skew-symmetric if ω(α, β) = −ω(β, α).
(8) Let ω: V × V → F be a bilinear form on a finite-dimensional vector space V. Show that the following conditions are equivalent:
(a) {α ∈ V | ω(α, β) = 0 for all β ∈ V} = (0).
(b) {α ∈ V | ω(β, α) = 0 for all β ∈ V} = (0).
(c) M(ω, α) is nonsingular for any basis α of V.
We say ω is nondegenerate if ω satisfies the conditions listed above.
(9) Suppose ω: V × V → F is a nondegenerate, bilinear form on a finite-dimensional vector space V. Let W be a subspace of V. Set W⊥ = {α ∈ V | ω(α, β) = 0 for all β ∈ W}. Show that V = W ⊕ W⊥.
(10) With the same hypotheses as in Exercise 9, suppose f ∈ V*. Prove that there exists an α ∈ V such that f(β) = ω(α, β) for all β ∈ V.
(11) Suppose ω: V × V → F is a bilinear form on V. Let W₁ and W₂ be subspaces of V. Show that (W₁ + W₂)⊥ = W₁⊥ ∩ W₂⊥. If ω is nondegenerate, prove that (W₁ ∩ W₂)⊥ = W₁⊥ + W₂⊥.
(12) Let ω be a nondegenerate, bilinear form on a finite-dimensional vector space V. Let ω′ be any bilinear form on V. Show there exists a unique T ∈ Hom_F(V, V) such that ω′(α, β) = ω(T(α), β) for all α, β ∈ V. Show that ω′ is nondegenerate if and only if T is bijective.
(13) With the same hypotheses as in Exercise 12, show that for every T ∈ Hom_F(V, V) there exists a unique T′ ∈ Hom_F(V, V) such that ω(T(α), β) = ω(α, T′(β)) for all α, β ∈ V.
(14) Let Bil(V) denote the set of all bilinear forms on the vector space V. Define addition in Bil(V) by (ω + ω′)(α, β) = ω(α, β) + ω′(α, β), and scalar multiplication by (xω)(α, β) = xω(α, β). Prove that Bil(V) is a vector space over F with these definitions. What is the dimension of Bil(V) when V is finite dimensional?
(15) Find an ω-orthonormal basis for R² when ω is given by ω((x₁, y₁), (x₂, y₂)) = x₁y₂ + x₂y₁.
(18) Let V be the subspace of C([−π, π]) spanned by the functions 1, sin(x), cos(x), sin(2x), cos(2x), ..., sin(nx), cos(nx). Find an ω-orthonormal basis of V where ω is the inner product given in Exercise 16.
Chapter II
Multilinear Algebra
an n-tuple (α₁, ..., αₙ) with αᵢ ∈ Vᵢ. Thus, we can think of φ as a function of the n variable vectors α₁, ..., αₙ.
We can use this idea along with Example 1.5 above to give a few familiar examples from analysis. Consider, for instance, the function D that sends a given f to its derivative f′. Thus, D(f) = f′. Clearly, D is linear.

Example 1.7: Let [a, b] be a closed interval in R, and consider C([a, b]). Clearly, C([a, b]) is an R-algebra under the same pointwise operations given in Example 1.6. We can define a multilinear, real valued function ψ: C([a, b])ⁿ → R by ψ(f₁, ..., fₙ) = ∫ₐᵇ f₁(x) ··· fₙ(x) dx.
1.9: Let V₁, ..., Vₙ be vector spaces over F. Is there a vector space V (over F) and a multilinear mapping φ: V₁ × ··· × Vₙ → V such that if ψ: V₁ × ··· × Vₙ → W is any multilinear mapping on V₁ × ··· × Vₙ, then there exists a unique linear transformation T ∈ Hom_F(V, W) with Tφ = ψ?
1.10: [commutative triangle: φ: V₁ × ··· × Vₙ → V, ψ: V₁ × ··· × Vₙ → W, and T: V → W with Tφ = ψ]

Notice that a solution to 1.9 consists of a vector space V and a multilinear map φ: V₁ × ··· × Vₙ → V. The pair (V, φ) must satisfy the following property: If W is any vector space over F and ψ: V₁ × ··· × Vₙ → W any multilinear map, then there must exist a unique linear transformation T: V → W such that 1.10 commutes.
Before constructing a pair (V, 4)) satisfying the properties in 1.9, let us make
the observation that any such pair is essentially unique up to isomorphism. To
be more precise, we have the following lemma:
Lemma 1.11: Suppose (V, φ) and (V′, φ′) are two solutions to 1.9. Then there exist two isomorphisms T₁ ∈ Hom_F(V, V′) and T₂ ∈ Hom_F(V′, V) such that the following diagrams commute:

1.12: [two commutative triangles: T₁φ = φ′ and T₂φ′ = φ]
Proof: Since (V, φ) is a solution to 1.9 and φ′: V₁ × ··· × Vₙ → V′ is a multilinear map, there exists a unique T₁ ∈ Hom_F(V, V′) such that T₁φ = φ′. Similarly, there exists a unique T₂ ∈ Hom_F(V′, V) such that T₂φ′ = φ. Putting these equations together, we see that the diagram

1.13: [commutative triangle: T₂T₁φ = φ]
is commutative. Now in diagram 1.13, we can replace T₂T₁ with 1_V, the identity map on V. Clearly, the diagram stays commutative. Since (V, φ) satisfies 1.9, there can be only one linear transformation from V to V making 1.13 commutative. We conclude that T₂T₁ = 1_V. Similarly, T₁T₂ = 1_{V′}, and the proof of the lemma is complete. □
Thus, if we find any solution to 1.9, then up to isomorphism we have found
them all. We now turn to the matter of constructing a solution. We need to recall
a few facts about direct sums.
Suppose A is a nonempty set. Then we can construct a vector space U over F and a bijective map ψ: A → U such that ψ(A) is a basis of U. To see this, set U = ⊕_{i∈A} F. Thus, U is the direct sum of |A| copies of F. For each i ∈ A, let δᵢ be the vector in U defined by δᵢ(j) = 0 if j ≠ i, and δᵢ(i) = 1. It follows from Theorem 4.13 in Chapter I that B = {δᵢ | i ∈ A} is a basis of U. The map ψ: A → B given by ψ(i) = δᵢ is clearly bijective.

Now suppose A itself is a vector space over F. Then in U = ⊕_{i∈A} F, we have vectors of the form δ_{i+j} − δᵢ − δⱼ and δ_{xi} − xδᵢ, where i, j ∈ A and x ∈ F. We shall employ these ideas in the construction of a solution to 1.9.
Let V₁, ..., Vₙ be vector spaces over F, and, for notational convenience, set Z = V₁ × ··· × Vₙ. A typical element in the set Z is an n-tuple of the form (α₁, ..., αₙ) with αᵢ ∈ Vᵢ. Set U = ⊕_{(α₁,...,αₙ)∈Z} F. Thus, U is the direct sum of |Z| copies of F. As we observed above, U has a basis of the form {δ(α₁, ..., αₙ) | (α₁, ..., αₙ) ∈ V₁ × ··· × Vₙ}. Let U₀ be the subspace of U spanned by all possible vectors of the following two types:

1.14:
    δ(α₁, ..., αᵢ + βᵢ, ..., αₙ) − δ(α₁, ..., αᵢ, ..., αₙ) − δ(α₁, ..., βᵢ, ..., αₙ)

and

    δ(α₁, ..., xαᵢ, ..., αₙ) − xδ(α₁, ..., αᵢ, ..., αₙ)

Here i = 1, ..., n, αᵢ, βᵢ ∈ Vᵢ, and x ∈ F. Passing to the quotient modulo U₀, we then have

    δ(α₁, ..., αᵢ + βᵢ, ..., αₙ) + U₀ = (δ(α₁, ..., αᵢ, ..., αₙ) + U₀) + (δ(α₁, ..., βᵢ, ..., αₙ) + U₀)

Also,

    δ(α₁, ..., xαᵢ, ..., αₙ) + U₀ = xδ(α₁, ..., αᵢ, ..., αₙ) + U₀ = x(δ(α₁, ..., αᵢ, ..., αₙ) + U₀)
Definition 1.16: The vector space U/U₀ is called the tensor product of V₁, ..., Vₙ (over F) and will henceforth be denoted V₁ ⊗_F ··· ⊗_F Vₙ.

When the field F is clear from the context, we shall drop it from our notation and simply write V₁ ⊗ ··· ⊗ Vₙ for the tensor product of V₁, ..., Vₙ.
Theorem 1.20: Let V1,..., Vn be vector spaces over F, and suppose B1,..., Bn are bases of V1,..., Vn, respectively. Then B = {β1 ⊗ ··· ⊗ βn | βi ∈ Bi} is a basis of V1 ⊗ ··· ⊗ Vn.
Proof: We prove this theorem by using Lemma 1.11. Consider the set B1 × ··· × Bn = {(β1,..., βn) | βi ∈ Bi}. Let V' = ⊕_{(β1,...,βn)∈B1×···×Bn} F. We have seen from our previous discussion that V' is a vector space over F with basis {δ(β1,..., βn) | (β1,..., βn) ∈ B1 × ··· × Bn}. We define a function φ0: B1 × ··· × Bn → V' by φ0(β1,..., βn) = δ(β1,..., βn). Now B1 × ··· × Bn ⊆ V1 × ··· × Vn, and each Vi is the linear span of Bi. It follows that there exists a unique multilinear function φ': V1 × ··· × Vn → V' such that φ'(β1,..., βn) = φ0(β1,..., βn) for all (β1,..., βn) ∈ B1 × ··· × Bn.
We claim that (V', φ') satisfies 1.9. To see this, let ψ: V1 × ··· × Vn → W be an arbitrary multilinear mapping. Since {δ(β1,..., βn) | (β1,..., βn) ∈ B1 × ··· × Bn} is a basis of V', it follows from 3.23 of Chapter I that there exists a unique linear transformation T: V' → W such that T(δ(β1,..., βn)) = ψ(β1,..., βn) for all (β1,..., βn) ∈ B1 × ··· × Bn. Then Tφ' = ψ, and clearly T is the unique linear transformation for which this happens.
We now have two pairs (V', φ') and (V1 ⊗ ··· ⊗ Vn, φ) satisfying 1.9. Hence, Lemma 1.11 implies there exists an isomorphism S ∈ HomF(V', V1 ⊗ ··· ⊗ Vn) such that the diagram

    V1 × ··· × Vn --φ'--> V'
              φ ↘         ↓ S
                  V1 ⊗ ··· ⊗ Vn

commutes.
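For finite-dimensional coordinate spaces, Theorem 1.20 can be checked numerically. Identifying F^p ⊗ F^q with F^{pq}, the pure tensor of two coordinate vectors becomes their Kronecker product, and the pq vectors ei ⊗ fj are linearly independent. A small numpy sketch (our own illustration, not the book's notation):

```python
import numpy as np

p, q = 3, 4
e = [np.eye(p)[i] for i in range(p)]   # basis of F^p
f = [np.eye(q)[j] for j in range(q)]   # basis of F^q

# The pure tensors e_i (x) f_j, realized as Kronecker products in F^{pq}
basis = [np.kron(ei, fj) for ei in e for fj in f]

# They form a basis: the matrix with these vectors as rows has full rank pq
M = np.array(basis)
print(M.shape, np.linalg.matrix_rank(M))   # (12, 12) 12
```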
Example 1.24: Let V be any vector space over F, and consider the two algebras A1 = F[X] and A2 = L(V). Then every T ∈ L(V) determines an algebra homomorphism A1 → A2 defined as follows:
(1) Complete the details of Example 1.4, that is, argue that the map given there is a multilinear mapping.

(2) In the proof of Theorem 1.20, we used the following fact: if Bi is a basis of Vi and φ0: B1 × ··· × Bn → V' is a set map, then φ0 has a unique extension to a multilinear map φ': V1 × ··· × Vn → V'. Prove this.
    F                           if n = 0
    V                           if n = 1
    V ⊗ ··· ⊗ V (n factors)     if n ≥ 2
(14) Let F be a field. We regard F as an algebra over F. Show that any algebra homomorphism φ: F → F is either zero or an isomorphism.

(15) Suppose A ∈ M_{n×n}(F) is nonsingular. Define a map φ on M_{n×n}(F) by the equation φ(B) = A⁻¹BA. Show that φ is an algebra homomorphism from M_{n×n}(F) to M_{n×n}(F) that is bijective.
(16) With the same notation as in Exercise 15, suppose the map φ: M_{n×n}(F) → M_{n×n}(F) given by φ(B) = AB is an algebra homomorphism. What can you say about A in this case?
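Exercise 15 lends itself to a quick numerical experiment; the sketch below (with a randomly generated A that is assumed, as is generically true, to be nonsingular) checks that conjugation preserves products and the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))          # generically nonsingular
Ainv = np.linalg.inv(A)
phi = lambda B: Ainv @ B @ A             # phi(B) = A^{-1} B A

B, C = rng.standard_normal((n, n)), rng.standard_normal((n, n))
# Multiplicative: phi(BC) = phi(B) phi(C)
print(np.allclose(phi(B @ C), phi(B) @ phi(C)))   # True
# Unital: phi(I) = I
print(np.allclose(phi(np.eye(n)), np.eye(n)))     # True
```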
Theorem 2.1: Let V1,..., Vn and W1,..., Wm be vector spaces over F. Then there exists an isomorphism T: (V1 ⊗ ··· ⊗ Vn) ⊗ (W1 ⊗ ··· ⊗ Wm) → V1 ⊗ ··· ⊗ Vn ⊗ W1 ⊗ ··· ⊗ Wm such that T((α1 ⊗ ··· ⊗ αn) ⊗ (β1 ⊗ ··· ⊗ βm)) = α1 ⊗ ··· ⊗ αn ⊗ β1 ⊗ ··· ⊗ βm. Here αi ∈ Vi for i = 1,..., n and βj ∈ Wj for j = 1,..., m.
The proofs of the theorems in this section can usually be done in two different
ways. We can appeal to Lemma 1.11 or use Theorem 1.20. We shall present a
mixture of both types of proof here. Since we are dealing with vector spaces, we
could prove every theorem in this section by using Theorem 1.20. The advantage
to proceeding via Lemma 1.11 (i.e., a basis-free proof) is that this type of proof is
valid in more general situations (e.g., modules over commutative rings).
To prove Theorem 2.1 via Theorem 1.20, choose bases B1,..., Bn of V1,..., Vn and C1,..., Cm of W1,..., Wm. Theorem 1.20 then implies:

(a) F1 = {α1 ⊗ ··· ⊗ αn | (α1,..., αn) ∈ B1 × ··· × Bn} is a basis of V1 ⊗ ··· ⊗ Vn.
(b) F2 = {β1 ⊗ ··· ⊗ βm | (β1,..., βm) ∈ C1 × ··· × Cm} is a basis of W1 ⊗ ··· ⊗ Wm.
Let us say a word about the proof of Theorem 2.1 via Lemma 1.11. There is a natural multilinear mapping ψ: V1 × ··· × Vn × W1 × ··· × Wm → (V1 ⊗ ··· ⊗ Vn) ⊗ (W1 ⊗ ··· ⊗ Wm) given by ψ(α1,..., αn, β1,..., βm) = (α1 ⊗ ··· ⊗ αn) ⊗ (β1 ⊗ ··· ⊗ βm). We could then argue that the pair ((V1 ⊗ ··· ⊗ Vn) ⊗ (W1 ⊗ ··· ⊗ Wm), ψ) satisfies the universal mapping property given in 1.9. Lemma 1.11 would then imply (V1 ⊗ ··· ⊗ Vn) ⊗ (W1 ⊗ ··· ⊗ Wm) ≅ V1 ⊗ ··· ⊗ Vn ⊗ W1 ⊗ ··· ⊗ Wm via a linear transformation T for which T((α1 ⊗ ··· ⊗ αn) ⊗ (β1 ⊗ ··· ⊗ βm)) = α1 ⊗ ··· ⊗ αn ⊗ β1 ⊗ ··· ⊗ βm. We ask the reader to provide the details of this proof in the exercises at the end of this section.
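In coordinates, the isomorphism of Theorem 2.1 is mirrored by the associativity of the Kronecker product. A minimal numpy check (our illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, c = (rng.standard_normal(k) for k in (2, 3, 4))

lhs = np.kron(np.kron(a, b), c)   # (a (x) b) (x) c
rhs = np.kron(a, np.kron(b, c))   # a (x) (b (x) c)
print(np.allclose(lhs, rhs))      # True: both live in F^{24}
```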
There is a special case of Theorem 2.1 that is worth noting explicitly.
Proof: Parts (c)-(h) are all straightforward and are left to the reader. We prove (a). Suppose some finite linear combination α of the vectors in B is zero:

2.7: α = Σ_{(k1,...,kn)∈A1×···×An} c_{k1···kn} (α_{k1} ⊗ ··· ⊗ α_{kn}) = 0

In equation 2.7, every c_{k1···kn} ∈ F and all but possibly finitely many of these scalars are zero. If we now apply T1 ⊗ ··· ⊗ Tn to α and use the fact that B is linearly independent over F, we see c_{k1···kn} = 0 for every (k1,..., kn) ∈ A1 × ··· × An. ∎
2.8: Vi'' --Si--> Vi --Ti--> Vi' --> 0,  i = 1,..., n

Given (α1',..., αn') ∈ V1' × ··· × Vn', choose αi ∈ Vi with Ti(αi) = αi'; that such αi exist follows from the fact that each Ti is surjective. We then define ψ(α1',..., αn') to be the following coset:

2.10: ψ(α1',..., αn') = (α1 ⊗ ··· ⊗ αn) + W

Now it is not obvious that ψ is well defined. We must check that if (β1,..., βn) is a second vector in V1 × ··· × Vn with the property that Ti(βi) = αi' for all i = 1,..., n, then (β1 ⊗ ··· ⊗ βn) + W = (α1 ⊗ ··· ⊗ αn) + W. Since Ti(βi) = αi' = Ti(αi) for i = 1,..., n and each sequence in 2.8 is exact, there exists a γi ∈ Vi'' such that Si(γi) = αi − βi. In particular, we have the following relations in V1 ⊗ ··· ⊗ Vn:
It follows that T1 ⊗ ··· ⊗ Tn is surjective, and the proof is complete. ∎
Theorem 2.13: Suppose 0 → V'' --S--> V --T--> V' → 0 is a short exact sequence of vector spaces over F. Then for any vector space W,

2.14: 0 → V'' ⊗ W --S⊗I_W--> V ⊗ W --T⊗I_W--> V' ⊗ W → 0

is a short exact sequence. Arguing as above, we see that ker(T ⊗ I_W) = Im(S ⊗ I_W). Thus 2.14 is exact and the proof of the theorem is complete. ∎
The next natural question to ask about tensor products is how they behave
with respect to direct sums. We answer this question in our next theorem, but
leave most of the technical details as exercises at the end of this section.
Theorem 2.15: Suppose {Vi | i ∈ A} is a collection of vector spaces over F. Then for any vector space V we have

(⊕_{i∈A} Vi) ⊗ V ≅ ⊕_{i∈A} (Vi ⊗ V)

Perhaps we should make a few comments about (d). If α ∈ ⊕_{i∈A} Vi, then α is an A-tuple with at most finitely many nonzero components. Thus, Σ_{i∈A} θiπi(α) is a finite sum whose value is clearly α. This is what the statement Σ_{i∈A} θiπi = I means in (d).
Now for each j ∈ A, we can consider the linear transformation θj ⊗ I_V: Vj ⊗ V → (⊕_{i∈A} Vi) ⊗ V. An easy computation shows that (⊕_{i∈A} Vi) ⊗ V is the internal direct sum of the subspaces {Im(θi ⊗ I_V) | i ∈ A}. Thus, (⊕_{i∈A} Vi) ⊗ V = ⊕_{i∈A} Im(θi ⊗ I_V). Since each θi is injective, Theorem 2.6 implies θi ⊗ I_V is injective. Hence Vi ⊗ V ≅ Im(θi ⊗ I_V). It now follows that ⊕_{i∈A} Im(θi ⊗ I_V) ≅ ⊕_{i∈A} (Vi ⊗ V). ∎
We next study a construction using tensor products that is very useful in
linear algebra. Suppose V is a vector space over F, and let K be a second field
containing F. For example, F = R and K = C. We have seen in Chapter 1 that K
is a vector space (even an algebra) over F. Thus, we can form the tensor product
V ®F K of the vector spaces V and K over F.
V ⊗F K is a vector space over F. We want to point out that there is a natural K-vector space structure on V ⊗F K as well. Vector addition in V ⊗F K as a K-vector space is the same as before. Namely, if ξ = Σ (αi ⊗F xi) and η = Σ (βj ⊗F yj) are two vectors in V ⊗F K (thus αi, βj ∈ V and xi, yj ∈ K), then ξ + η is formed in V ⊗F K as usual, and scalar multiplication by k ∈ K is determined by k(α ⊗F x) = α ⊗F kx (equation 2.16).
In Equation 2.18, the x_{ij} are scalars in F and each sum on the right-hand side is finite. Thus, for each i = 1,..., n, x_{ij} = 0 except possibly for finitely many j ∈ A. We now have

Σ_{i=1}^{n} {αi ⊗F (Σ_{j∈A} x_{ij} zj)} = Σ_{i=1}^{n} Σ_{j∈A} x_{ij} zj (αi ⊗F 1) ∈ L_K(B ⊗F 1)
There are two important corollaries to Theorem 2.17 that are worth noting.

Corollary 2.19: If B is a basis of V over F, then B ⊗F 1 is a basis of V ⊗F K over K; in particular, dim_K(V ⊗F K) = dim_F(V).

Proof: B ⊗F 1 is a basis over K by Theorem 2.17. ∎
Corollary 2.20: Suppose V is a finite-dimensional vector space over F and K is a
field containing F. Then HomF(V, V) ®F K HomK(V ®F K, V ®F K) as
vector spaces over K.
Proof: Define χ: HomF(V, V) × K → HomK(V ⊗F K, V ⊗F K) by χ(T, k) = k(T ⊗F I_K). Here T ∈ HomF(V, V), and k ∈ K. From the discussion preceding Definition 2.5, we know that T ⊗F I_K ∈ HomF(V ⊗F K, V ⊗F K). We claim T ⊗F I_K is in fact a K-linear map on V ⊗F K. To see this, we use equation 2.16. We have (T ⊗F I_K)(k(α ⊗F k')) = (T ⊗F I_K)(α ⊗F kk') = T(α) ⊗F kk' = k[T(α) ⊗F k'] = k[(T ⊗F I_K)(α ⊗F k')]. Thus, T ⊗F I_K ∈ HomK(V ⊗F K, V ⊗F K). Again by 2.16, k(T ⊗F I_K) is the K-linear transformation on V ⊗F K given by [k(T ⊗F I_K)](α ⊗F k') = k(T(α) ⊗F k') = T(α) ⊗F kk'. In particular, Im χ ⊆ HomK(V ⊗F K, V ⊗F K). χ is clearly an F-bilinear mapping, and, thus, factors through the tensor product HomF(V, V) ⊗F K. So, we have the following commutative diagram:

2.21:
    HomF(V, V) × K --φ--> HomF(V, V) ⊗F K
              χ ↘         ↓ ψ
                  HomK(V ⊗F K, V ⊗F K)

In 2.21, φ(T, k) = T ⊗F k, and ψ is the unique F-linear transformation making 2.21 commute. Thus, ψ(T ⊗F k) = ψφ(T, k) = χ(T, k) = k(T ⊗F I_K). Using equation 2.16, we can verify that ψ is in fact a K-linear transformation. We have ψ(k2(T ⊗F k1)) = ψ(T ⊗F k1k2) = k2k1(T ⊗F I_K) = k2ψ(T ⊗F k1). Thus, ψ is K-linear.
We shall shorten our notation here and let V^C denote the complexification of V. Thus, V^C = V ⊗R C = {Σ (αi ⊗R zi) | αi ∈ V, zi ∈ C}. Our previous discussion implies that V^C is a vector space over C with scalar multiplication given by z'(α ⊗R z) = α ⊗R z'z. If B is an R-basis of V, then B ⊗ 1 = {α ⊗ 1 | α ∈ B} is a basis of V^C over C.
There is an important map on V^C that comes from complex conjugation on C. Recall that if z = x + iy (x, y ∈ R, i = √−1) is a complex number, then z̄ = x − iy is called the conjugate of z. Clearly the map σ: C → C given by σ(z) = z̄ is an R-linear transformation. Thus, I_V ⊗R σ ∈ HomR(V^C, V^C). Recall that I_V ⊗R σ is given by the following equation:

2.23: (I_V ⊗R σ)(Σ_k αk ⊗R zk) = Σ_k αk ⊗R z̄k
Definition 2.24: Let V be a vector space over R, and let T ∈ HomR(V, V). The extension T^C = T ⊗R I_C will be called the complexification of T and written T^C.

Thus, T^C is the C-linear transformation on V^C given by

2.25: T^C(Σ_k αk ⊗R zk) = Σ_k T(αk) ⊗R zk
Here ψ: HomR(V, V) ⊗R C → HomC(V^C, V^C) is the C-linear isomorphism given in 2.21. If α = {α1,..., αn} is a basis of V, then Γ(α, α)(T) = Γ(α ⊗ 1, α ⊗ 1)(T^C). Thus, the matrix representation of the complexification of T is the same as that of T (provided we make these statements relative to α and α ⊗ 1).
2.27: S(β̄) = (S(β))‾ for all β ∈ V^C

If S = T^C for some T ∈ HomR(V, V), then S(Σ_k αk ⊗R z̄k) = Σ_k T(αk) ⊗R z̄k, which is the conjugate of S(Σ_k αk ⊗R zk). Thus, S satisfies equation 2.27.
Conversely, suppose S ∈ HomC(V^C, V^C) and satisfies equation 2.27. The discussion after Corollary 2.20 implies that S = Σ_{j=1}^{m} (Tj ⊗R wj), where Tj ∈ HomR(V, V) and wj ∈ C. To be more precise, S = ψ(Σ (Tj ⊗R wj)), where ψ is the isomorphism in 2.21, F = R and K = C. We shall suppress ψ here and write S = Σ (Tj ⊗R wj). Thus, S is given by the following equation:

2.28: S(Σ_{k=1}^{n} αk ⊗R zk) = Σ_{k=1}^{n} Σ_{j=1}^{m} Tj(αk) ⊗R wjzk

Writing wj = xj + iyj with xj, yj ∈ R, we have

S = Σ_{j=1}^{m} (Tj ⊗R wj) = Σ_{j=1}^{m} (Tj ⊗R xj) + Σ_{j=1}^{m} (Tj ⊗R iyj)
  = (Σ_{j=1}^{m} xjTj) ⊗R 1 + (Σ_{j=1}^{m} yjTj) ⊗R i
(2) Give a basis-free proof of Theorem 2.1 by showing that the pair ((V1 ⊗ ··· ⊗ Vn) ⊗ (W1 ⊗ ··· ⊗ Wm), ψ) satisfies 1.9. Recall ψ: V1 × ··· × Vn × W1 × ··· × Wm → (V1 ⊗ ··· ⊗ Vn) ⊗ (W1 ⊗ ··· ⊗ Wm) is the map given by ψ(α1,..., αn, β1,..., βm) = (α1 ⊗ ··· ⊗ αn) ⊗ (β1 ⊗ ··· ⊗ βm).
(3) Generalize Theorem 2.13 as follows: Suppose C: ··· → V_{i+1} --d_{i+1}--> V_i --d_i--> V_{i−1} → ··· is an exact chain complex of vector spaces over F. If V is any vector space (over F), show that C ⊗F V is an exact chain complex.
(4) Show by example that if 0 → V1'' --S1--> V1 --T1--> V1' → 0 and 0 → V2'' --S2--> V2 --T2--> V2' → 0 are two short exact sequences of vector spaces, then 0 → V1'' ⊗ V2'' --S1⊗S2--> V1 ⊗ V2 --T1⊗T2--> V1' ⊗ V2' → 0 need not be a short exact sequence.
(14) Suppose V and W are finite-dimensional vector spaces over a field F. Let T ∈ HomF(V, V) and S ∈ HomF(W, W). Suppose A and B are matrix representations of T and S, respectively. Show that A ⊗ B is a matrix representation of T ⊗ S on V ⊗F W.

(15) If A ∈ M_{n×n}(F) and B ∈ M_{m×m}(F), show that rk(A ⊗ B) = rk(A)rk(B).
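The identity in Exercise 15 is easy to test numerically; a sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 5))                  # generically rk(A) = 4
B = np.zeros((3, 3))
B[:2, :] = rng.standard_normal((2, 3))           # rk(B) = 2

print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(B))   # 4 2
print(np.linalg.matrix_rank(np.kron(A, B)))                 # 8 = 4 * 2
```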
ALTERNATING MAPS AND EXTERIOR POWERS
In this section, we study a special class of multilinear maps that are called
alternating. Before we can present the main definitions, we need to discuss
permutations. Suppose A = {1,..., n}. A permutation of A is a bijective map of
A onto itself. Suppose σ is a permutation of A. If σ(1) = j1, σ(2) = j2,..., and σ(n) = jn, then A = {j1,..., jn}. We can represent the action of σ on A by the following 2 × n array:

[ 1   2  ···  n ]
[ j1  j2 ···  jn]
For example, let

σ = [1 2 3 4 5]     τ = [1 2 3 4 5]
    [2 3 4 1 5]         [5 4 2 3 1]

Then

στ = [1 2 3 4 5]     τσ = [1 2 3 4 5]
     [5 1 3 4 2]          [4 2 3 5 1]
In (b) of 3.4, 1 is just the identity map on A. Any set S together with a binary operation (σ, τ) → στ from S × S to S that satisfies the three conditions in 3.4 is called a group. For this reason, the set Sn is often called the permutation group on n letters. With this notation, some of our previous theorems can be worded more succinctly. For example, Theorem 2.3 becomes: For all σ ∈ Sn,
Example 3.5: Let n = 5. Then

σ1 = [1 2 3 4 5]
     [5 3 4 1 2]

is a 5-cycle.

σ2 = [1 2 3 4 5]
     [2 3 1 4 5]

is a 3-cycle.

σ = [1 2 3 4 5]
    [2 3 1 5 4]

is not a cycle, but is a product of two cycles:

σ = [1 2 3 4 5] [1 2 3 4 5]
    [2 3 1 4 5] [1 2 3 5 4]   ∎
When dealing with an r-cycle, σ, which permutes i1,..., ir and leaves fixed all other elements of A, we can shorten our representation of σ and write σ = (i1,..., ir). Thus, in Example 3.5, σ1 = (1, 5, 2, 3, 4), σ2 = (1, 2, 3), and σ = (1, 2, 3)(4, 5).
We say two cycles (of Sn) are disjoint if they have no common symbol in their
representations. Thus, in Example 3.5, a1 and a2 are not disjoint, but (1,2, 3) and
(4, 5) are disjoint. It is convenient to extend the definition of cycles to the case
r = 1. We adopt the convention that for any i cA, the 1-cycle (i) is the identity
map. Then it should be clear that any σ ∈ Sn is a product of disjoint cycles.
Example 3.6: Let n = 9 and

σ = [1 2 3 4 5 6 7 8 9]
    [2 3 4 1 6 5 8 9 7]

Then σ = (1, 2, 3, 4)(5, 6)(7, 8, 9), a product of disjoint cycles. ∎
Example 3.7: Let n = 4. Then (1, 2, 4, 3) = (1, 3)(1, 4)(1, 2). Also (1, 2, 4, 3) = (4, 3, 1, 2) = (4, 2)(4, 1)(4, 3). ∎
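Cycle manipulations such as those in Examples 3.6 and 3.7 can be reproduced with sympy's permutation class. Note that sympy indexes the permuted symbols from 0, so the book's (1, 2, 4, 3) becomes (0, 1, 3, 2) in this sketch:

```python
from sympy.combinatorics import Permutation

# Example 3.7 shifted down by one: the 4-cycle (1,2,4,3) becomes (0,1,3,2)
s = Permutation([[0, 1, 3, 2]], size=4)
print(s.cyclic_form)        # [[0, 1, 3, 2]]
# Decomposition into transpositions, cf. (1,2,4,3) = (1,3)(1,4)(1,2)
print(s.transpositions())   # a list of 2-cycles whose product is s
```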
We can now return to our study of multilinear mappings and introduce the
concept of an alternating map. Suppose V and W are vector spaces over a field
F. Recall that Vⁿ = {(α1,..., αn) | αi ∈ V}. We shall keep n fixed throughout this
discussion.
construct an alternating map Alt(φ) from φ with the following definition:

Alt(φ)(α1,..., αn) = Σ_{σ∈Sn} sgn(σ) φ(α_{σ(1)},..., α_{σ(n)})
3.14: ψ(α1,..., αi,..., αj,..., αn) = −ψ(α1,..., αj,..., αi,..., αn)

Thus, when we interchange two terms in the sequence α1,..., αn, the sign of ψ(α1,..., αn) changes. Since every permutation is a product of transpositions, equation 3.14 immediately implies the following theorem:
3.17: Let V be a vector space over F, and fix n ∈ N. Is there a vector space Z over F and an alternating multilinear map η: Vⁿ → Z such that the pair (Z, η) has the following property? If W is any vector space over F, and ψ ∈ Alt_F(Vⁿ, W), then there exists a unique T ∈ HomF(Z, W) such that Tη = ψ.
The question posed in 3.17 is called the universal mapping problem for
alternating multilinear maps. This problem has an obvious solution, which we
shall construct shortly. First, let us point out that any solution to 3.17 is
essentially unique.
Lemma 3.18: Suppose (Z, η) and (Z', η') are two solutions to 3.17. Then there exist isomorphisms T1: Z → Z' and T2: Z' → Z such that T1η = η' and T2η' = η.

Proof: The argument is the same as that given for Lemma 1.11. ∎
The next order of business is to construct a solution to 3.17. This is easy using what we already know about tensor products. Consider the n-fold tensor product ⊗ⁿV = V ⊗F ··· ⊗F V. Let U be the subspace of ⊗ⁿV generated by all vectors of the form α1 ⊗ ··· ⊗ αn, where the n-tuple (α1,..., αn) contains a repetition. Set Z = ⊗ⁿV/U. Let φ0 be the canonical multilinear map from Vⁿ to ⊗ⁿV given by φ0(α1,..., αn) = α1 ⊗ ··· ⊗ αn. Then define η: Vⁿ → Z by η(α1,..., αn) = (α1 ⊗ ··· ⊗ αn) + U. Thus, η is just the composite of φ0 with the natural map of ⊗ⁿV onto its quotient space ⊗ⁿV/U. The definition of U immediately implies that η is an alternating multilinear map.
Lemma 3.19: (Z, η) is a solution to 3.17.

Proof: Suppose ψ: Vⁿ → W is any alternating multilinear map. Then because ψ is multilinear, there exists a unique linear transformation T0: ⊗ⁿV → W such that

3.20: T0(α1 ⊗ ··· ⊗ αn) = ψ(α1,..., αn)

is commutative, that is, T0φ0 = ψ. If (α1,..., αn) ∈ Vⁿ contains a repetition, then ψ(α1,..., αn) = 0 since ψ is alternating. Thus, 3.20 implies T0(α1 ⊗ ··· ⊗ αn) = 0. Since U is generated by all vectors α1 ⊗ ··· ⊗ αn with (α1,..., αn) containing a repetition, we conclude that T0(U) = 0. It now follows from the first isomorphism theorem (Theorem 5.15 of Chapter I) that T0 induces a linear transformation T: Z → W given by T((α1 ⊗ ··· ⊗ αn) + U) = T0(α1 ⊗ ··· ⊗ αn). If (α1,..., αn) ∈ Vⁿ, then Tη(α1,..., αn) = T((α1 ⊗ ··· ⊗ αn) + U) = T0(α1 ⊗ ··· ⊗ αn) = ψ(α1,..., αn). Thus,

3.21: Tη = ψ

is commutative. Finally, we must argue that T is unique. Suppose T' ∈ HomF(Z, W) makes 3.21 commute. Then T = T' on Im η. But Z = L(Im η). Thus, T = T', and the proof of Lemma 3.19 is complete. ∎
Definition 3.22: The vector space Z = ⊗ⁿV/U is called the nth exterior power of V (over F) and will henceforth be denoted by Λⁿ_F(V).

When the base field F is clear from the context, we shall simplify our notation and write Λⁿ(V). Note that Λ¹(V) = V. We define Λ⁰(V) = F.
Theorem 3.25: Let V and W be vector spaces over F and suppose ψ: Vⁿ → W is an alternating multilinear map. Then there exists a unique linear transformation T ∈ HomF(Λⁿ(V), W) such that for all (α1,..., αn) ∈ Vⁿ, ψ(α1,..., αn) = T(α1 ∧ ··· ∧ αn). ∎
Having constructed the nth exterior power Λⁿ(V) of V, the next order of business is to find a basis for this space. Suppose B is a basis of V. As usual, set Bⁿ = {(α1,..., αn) | αi ∈ B}. Let B(n) denote the subset of Bⁿ consisting of those n-tuples which have distinct entries. Thus, B(n) = {(α1,..., αn) ∈ Bⁿ | αi ≠ αj whenever i ≠ j}. It is possible that B(n) = ∅. In this case, n > |B|. But then every wedge product α1 ∧ ··· ∧ αn in Λⁿ(V) is zero. Thus, Λⁿ(V) = 0, and the empty set is a basis of Λⁿ(V). So, we can assume with no loss of generality that n ≤ |B|.
We define an equivalence relation ~ on the set B(n) by the following rule: two n-tuples in B(n) are equivalent if some permutation of the entries in the first n-tuple gives the second n-tuple. The fact that ~ is indeed an equivalence relation, that is, that ~ satisfies the axioms in 5.1 of Chapter I, is obvious. We shall let B̄(n) denote the set of equivalence classes of B(n). Recall that the elements of B̄(n) are subsets of B(n), and B(n) is the disjoint union of the distinct elements of B̄(n). If (α1,..., αn) ∈ B(n), we shall let ⟨α1,..., αn⟩ denote the equivalence class in B̄(n) which contains (α1,..., αn).
Now for each element x ∈ B̄(n), we can choose an n-tuple (α1,..., αn) ∈ B(n) such that ⟨α1,..., αn⟩ = x. For a given x, there may be many such n-tuples, but they are all representatives of the same equivalence class x. For each x ∈ B̄(n), pick a representative of x, say λx, in B(n). Thus, λx is an n-tuple (α1,..., αn) ∈ B(n) such that ⟨α1,..., αn⟩ = x. We have now defined a set mapping x → λx of B̄(n) to B(n). There are of course many choices for such a map. Choose any such map and set C(n) = {λx | x ∈ B̄(n)}. Then C(n) is a collection of n-tuples in B(n), one n-tuple for every equivalence class x ∈ B̄(n). We can now state the following theorem:
Proof: The set C(n) consists of one representative in B(n) for each equivalence class x ∈ B̄(n). Thus, |C(n)| = |B̄(n)|. In particular, if A = {α1 ∧ ··· ∧ αn | (α1,..., αn) ∈ C(n)}, then |A| = |C(n)| = |B̄(n)|. So, we need only argue A is a basis of Λⁿ(V). The proof we give here is analogous to that of Theorem 1.20.
The cardinality of A is clearly the number of ways of picking an n-element subset from the set {1, 2,..., N}. Therefore, |A| = (N choose n), and dim_F(Λⁿ(V)) = N!/(n!(N − n)!).

Suppose n > N. Then we had noted previously that B(n) = ∅, and Λⁿ(V) = 0. Thus, dim Λⁿ(V) = 0. ∎
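These dimension counts are binomial coefficients, so they are trivial to tabulate; a sketch with dim V = 5:

```python
from math import comb

N = 5                       # dim V
for n in range(8):
    print(n, comb(N, n))    # dim of the nth exterior power; 0 once n > N
```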
There is another corollary that can be derived from Theorem 3.27. Suppose V is an n-dimensional vector space over F. Let α = {α1,..., αn} be a basis of V. Then we have a nonzero, alternating map φ: Vⁿ → F given by φ(β1,..., βn) = det(a_{ij}), where βj = Σ_i a_{ij}αi. Our next corollary says that φ is essentially the only alternating map from Vⁿ to F.

Corollary 3.29: dim_F(Alt_F(Vⁿ, F)) = 1.

Proof: The map φ constructed above is alternating. Hence φ ∈ Alt_F(Vⁿ, F). Suppose ψ ∈ Alt_F(Vⁿ, F). If η denotes the canonical map given in Lemma 3.19, then there exists a unique linear transformation T ∈ HomF(Λⁿ(V), F) such that Tη = ψ. Similarly, there exists a T1 ∈ HomF(Λⁿ(V), F) such that T1η = φ. Now Corollary 3.28 implies dim_F(Λⁿ(V)) = 1. Consequently, dim_F(HomF(Λⁿ(V), F)) = 1. Since φ ≠ 0, we conclude T1 ≠ 0. In particular, {T1} is a basis for HomF(Λⁿ(V), F). Therefore T = xT1 for some x ∈ F. We then have ψ = Tη = xT1η = xφ. Thus, {φ} is a basis of Alt_F(Vⁿ, F), and the proof of Corollary 3.29 is complete. ∎
Definition 3.30: The unique linear transformation S for which Sη = η'Tⁿ will henceforth be denoted Λⁿ(T).

Thus, Λⁿ(T) ∈ HomF(Λⁿ(V), Λⁿ(W)) and

Λⁿ(T)(α1 ∧ ··· ∧ αn) = T(α1) ∧ ··· ∧ T(αn).

Clearly, Λⁿ(T1T2) = Λⁿ(T1)Λⁿ(T2) for T2 ∈ HomF(V, W) and T1 ∈ HomF(W, Z). We also have the important analogs of Theorem 2.6.
Theorem 3.31: Let V and W be vector spaces over F, and let T ∈ HomF(V, W). Then the following assertions are true:

(a) If T is injective, so is Λⁿ(T).
(b) If T is surjective, so is Λⁿ(T).
(c) If T is an isomorphism, so is Λⁿ(T).

Proof: Consider bases of V and W and apply Theorem 3.27. ∎

Λⁿ(T) is usually called the nth exterior power of T.
(9) Suppose V is a vector space over F, and let K be a field containing F. Show Λⁿ_F(V) ⊗F K ≅ Λⁿ_K(V ⊗F K) as K-vector spaces.

S(φ)(α1,..., αn) = Σ_{σ∈Sn} φ(α_{σ(1)},..., α_{σ(n)})
4.4: Let V be a vector space over F and n ∈ N. Is there a vector space Z over F and a symmetric multilinear map φ: Vⁿ → Z such that the pair (Z, φ) has the following property: If W is any vector space over F, and ψ ∈ Sym_F(Vⁿ, W), then there exists a unique T ∈ HomF(Z, W) such that Tφ = ψ?

As with alternating maps, it is an easy matter to argue that any solution to 4.4 is essentially unique.

Lemma 4.5: Suppose (Z, φ) and (Z', φ') are two solutions to 4.4. Then there exist isomorphisms T1: Z → Z' and T2: Z' → Z such that T1φ = φ' and T2φ' = φ.
Theorem 4.8: Let V be a vector space over F, and suppose ψ ∈ Sym_F(Vⁿ, W). Then there exists a unique linear transformation T ∈ HomF(Z, W) such that ψ(α1,..., αn) = T([α1] ··· [αn]) for all (α1,..., αn) ∈ Vⁿ. ∎

From our construction of the nth symmetric power, we see that the set {[α1] ··· [αn] | (α1,..., αn) ∈ Vⁿ} spans it as a vector space over F. The following relations are all obvious:

4.9: (a) [α1] ··· [αi + αi'] ··· [αn] = [α1] ··· [αi] ··· [αn] + [α1] ··· [αi'] ··· [αn]
     (b) [α1] ··· [xαi] ··· [αn] = x([α1] ··· [αn])
     (c) [α_{σ(1)}] ··· [α_{σ(n)}] = [α1] ··· [αn] for all σ ∈ Sn.
(1) Complete the details of Example 4.3, that is, argue S(φ) is indeed a symmetric multilinear mapping.

(2) Suppose V is a vector space over F. Show that the following complex is a short exact sequence:

0 → ··· → 0.
1. PRELIMINARIES ON FIELDS
algebra F[X] to the multiplicative identity I_V in L(V). Another important point to note here is that any identity f(X) = g(X) in F[X] is mapped by φ into the corresponding identity f(T) = g(T) in L(V).
We shall constantly use this map p to study the behavior of T on V. In order
to facilitate such a study, we need to know some basic algebraic facts about the
polynomial algebra F[X]. We present these facts in the rest of this section.
The algebra F[X] is often called the ring of polynomials in the indeterminate
X over F. We have seen that F[X] is an infinite-dimensional vector space over F
with basis the monomials {1 = X⁰, X, X²,...}. In particular, two polynomials f(X) = a_nXⁿ + ··· + a_1X + a_0 and g(X) = b_mX^m + ··· + b_1X + b_0 in F[X] with a_n ≠ 0 ≠ b_m are equal if and only if n = m and a_n = b_n,..., a_1 = b_1, a_0 = b_0. If f(X) is a nonzero polynomial in F[X], then f(X) can be written uniquely in the following form: f(X) = a_nXⁿ + ··· + a_1X + a_0 with n ≥ 0, a_0,..., a_n ∈ F, and a_n ≠ 0. The integer n here is called the degree of f.

We shall use the notation ∂(f) to indicate the degree of f. Thus, ∂(·) is a function from F[X] − {0} to N ∪ {0}. Notice that we do not give a degree to the zero polynomial 0. The degree function ∂(·) has all the same familiar properties that the reader is acquainted with from studying polynomials with coefficients in R. Thus, we have the following facts:
1.2: Let f(X), g(X) ∈ F[X] with g ≠ 0. Then there exist unique polynomials h(X) and r(X) in F[X] such that f(X) = h(X)g(X) + r(X), where r(X) = 0 or ∂(r) < ∂(g).

The proof of 1.2 is nothing more than the long division process you learned in grade school. We leave it as an exercise at the end of this section.
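Computer algebra systems implement the division 1.2 directly; for instance, with sympy (a sketch over Q):

```python
from sympy import symbols, div

X = symbols('X')
f = X**5 - 6*X + 1
g = X**3 - 6*X**2 + X + 4
h, r = div(f, g, X)        # f = h*g + r with r = 0 or deg(r) < deg(g)
print(h)                   # the quotient h(X)
print(r)                   # the remainder r(X), degree at most 2 here
print((h * g + r - f).expand())   # 0, confirming the identity
```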
Let f, g ∈ F[X]. We say f divides g if there exists an h ∈ F[X] such that fh = g. If f divides g, we shall write f | g. We say f and g are associates if f | g and g | f. It follows easily from 1.1(c) that f and g are associates if and only if f = cg for some nonzero constant c ∈ F. For example, 2X + 2 and X + 1 are associates in Q[X], whereas X + 1 and X are not associates.
The notion of a greatest common divisor of a set of polynomials f1,..., fn is the familiar one: d is a greatest common divisor of f1,..., fn if d | fi for each i, and any polynomial dividing every fi also divides d.

Lemma 1.4: Let f1,..., fn ∈ F[X]. Then f1,..., fn have a greatest common divisor d. Furthermore, d = a1f1 + ··· + anfn for some a1,..., an ∈ F[X]. ∎
We shall let g.c.d.(f1,..., fn) denote a greatest common divisor of f1,..., fn. Although a g.c.d.(f1,..., fn) is unique only up to some nonzero constant in F, this will cause no confusion in the sequel. Note that g.c.d.(f1,..., fn) = 1 whenever some fi is a nonzero constant in F.
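For two polynomials, the representation d = a1f1 + a2f2 of Lemma 1.4 is produced by the extended Euclidean algorithm; sympy's gcdex returns the triple (a sketch):

```python
from sympy import symbols, gcdex

X = symbols('X')
f = (X - 1)*(X**2 + 1)
g = (X - 1)*(X + 2)
a, b, d = gcdex(f, g, X)            # a*f + b*g = d = g.c.d.(f, g)
print(d)                            # X - 1 (the monic g.c.d.)
print(((a*f + b*g) - d).expand())   # 0
```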
In the sequel, we shall also need the dual concept of a least common multiple
l.c.m.(f1,..., of f1,.. ., We shall discuss this notion in the exercises at the
end of this section.
A polynomial f(X) ∈ F[X] is said to be constant if f = 0 or ∂(f) is zero. Thus, f is a constant if and only if f ∈ F. We say a polynomial f(X) in F[X] is irreducible (over F) if f is not constant, and whenever f = gh with g, h ∈ F[X], then one of g or h is a constant. This notion of irreducibility definitely depends on the field F.
1.8 follows easily from Lemma 1.4 and the observation that if g is irreducible, then g.c.d.(g, p) = 1 or g for any p ∈ F[X]. Once we have 1.8, the essential uniqueness of f = g1 ··· gm is clear. ∎

1.9: f(X) = g1(X)^{e1} ··· gm(X)^{em}

In Equation 1.9, g1,..., gm are irreducible, e1,..., em are positive integers, and gi and gj are not associates whenever i ≠ j. Sometimes it is convenient to allow an exponent ei in 1.9 to be zero. For example, if f1,..., fn are nonconstant polynomials, then there exists a set of irreducible polynomials {g1,..., gm} and nonnegative integers e_{ij} such that gi and gj are not associates for i ≠ j and

1.10: fi = g1^{e_{i1}} ··· gm^{e_{im}} for i = 1,..., n
1.11: f1,..., fn are relatively prime if and only if a1f1 + ··· + anfn = 1 for some a1,..., an ∈ F[X].
We shall use this remark frequently throughout the rest of this chapter.
Before closing this section, we need to say a few words about algebraically
closed fields.
1.14: f(X) = c Π_{i=1}^{r} (X − ci)^{ni}
Theorem 1.15: Let F be any field. Then there exists a field F containing F with
the following properties:
An easy exercise shows ψ must be injective in (b). It follows that any two fields containing F and satisfying (a) and (b) are isomorphic as F-algebras. For this reason, a field F̄ satisfying (a) and (b) is called the algebraic closure of F (any
other algebraic closure of F being isomorphic to F). The proof of Theorem 1.15
is beyond the level of this book. The interested reader can consult [6; Thm. 32, p.
106].
If F is any field, we shall let F̄ denote the algebraic closure of F. Our interest in F̄ comes from equation 1.14. If f(X) is a nonconstant polynomial in F[X], then f may not factor into linear polynomials in F[X]. Since F ⊆ F̄, F[X] ⊆ F̄[X]. Since F̄ is algebraically closed, f has a unique factorization as in 1.14 in F̄[X]. If f(X) = c Π_{i=1}^{r} (X − ci)^{ni} ∈ F̄[X], we shall again call R(f) = {c1,..., cr} the roots of f. Thus, the roots of a polynomial f(X) in F[X] are those elements c ∈ F̄, the algebraic closure of F, such that f(c) = 0.

Example 1.16: Suppose f(X) = (X² + 1)(X² + 2) ∈ R[X]. Since X² + 1 is irreducible in R[X], f has no factorization as in 1.14 over R. It is not hard to see that C is the algebraic closure of R. In C[X], we have f(X) = (X + i)(X − i)(X − i√2)(X + i√2). The roots of f are then given by R(f) = {±i, ±i√2}. ∎
The fact that any field F has an algebraic closure F is often used when
studying linear transformations.
Suppose T ∈ HomF(V, V). Then we have the F-algebra homomorphism φ: F[X] → L(V) given by φ(f) = f(T). Suppose we wish to study the action of f(T) on V for some interesting f(X) ∈ F[X]. One way to do this is to extend the scalars to F̄ and study the natural extension of f(T) to V^{F̄}. Often information obtained about f(T) on V^{F̄} gives us useful information about f(T) on V. Recall that extending scalars to F̄ means we form the F̄-vector space V^{F̄} = V ⊗F F̄. By the natural extension of T to V^{F̄}, we mean the linear map T ⊗F I_{F̄} ∈ Hom_{F̄}(V^{F̄}, V^{F̄}). The vector space V is imbedded in V^{F̄} as the F-subspace V ⊗F 1. If we identify a vector α ∈ V with its image α ⊗F 1 in V ⊗F 1, then we have (T ⊗F I_{F̄})(α) = (T ⊗F I_{F̄})(α ⊗F 1) = T(α) ⊗F 1 = T(α). Thus, T ⊗F I_{F̄} is just T on V. In the sequel, we shall identify α ⊗F 1 with α and set T ⊗F I_{F̄} = T^{F̄}. When we do this, a typical vector in V^{F̄} is a finite sum of the form Σ ziαi with z1,..., zn ∈ F̄ and α1,..., αn ∈ V. The action of T^{F̄} on such a vector is given by T^{F̄}(Σ ziαi) = Σ ziT(αi). In particular, the reader can easily check that (f(T))^{F̄} = f(T^{F̄}) for any f(X) ∈ F[X]. Now since F̄ is algebraically closed, f can be factored in F̄[X] as in equation 1.14. Thus, f(T)^{F̄} has a particularly simple form: f(T)^{F̄} = c Π_{i=1}^{r} (T^{F̄} − ci)^{ni}. We shall see how these ideas can be usefully employed in Section 5 when discussing the real Jordan canonical form of T.
(4) Complete the proof of Lemma 1.4 by showing that the polynomial d constructed there is g.c.d.(f1,..., fn).
(5) Give a proof of 1.8 and then complete the details in Theorem 1.7.
(6) Determine what all irreducible polynomials of degree less than or equal to
three look like in F2[X].
(7) Show that f(X)g(X) = 0 in F[X] if and only if f = 0 or g = 0.
(8) Let K and F be fields and suppose K ⊇ F. Show that any F-algebra homomorphism ψ: F → K with ψ(1) = 1 must be injective. If ψ(1) ≠ 1, is this statement true?
(9) Find d = g.c.d.(X3 — 6X2 + X + 4, X5 — 6X + 1) in Q[X]. Exhibit
two polynomials A and B such that d = A(X3 — 6X2 + X + 4)
+ B(X5 — 6X + 1).
(10) Suppose F c K are fields. Let g1,..., F[X]. Show that if g1,..., are
relatively prime in F[X], then g1,.. ., are relatively prime in K[X]. Is
the converse true here?
(11) In Section 2, we shall need a least common multiple, l.c.m.(f1,..., fn), of a set of polynomials f1,..., fn ∈ F[X]. We define a l.c.m.(f1,..., fn) to be a polynomial (unique up to associates) e(X) such that (a) fi | e for all i = 1,..., n, and (b) if fi | g for i = 1,..., n, then e | g. Prove that any set of polynomials f1,..., fn ∈ F[X] has a least common multiple.
(12) In Exercise 11, suppose each f1 is factored as in equation 1.10. For each
(15) Prove that every polynomial f(X) c R[X] factors into linear and quadratic
polynomials.
(16) Let f(X)c R[X], and let f'(X) denote the derivative of f. Show that f and f'
are relatively prime in R[X] if and only if f has no multiple roots.
(17) Prove that f(X) = 1 + X + X3 + X4 is not irreducible over any field.
(18) Show that p(X) = X4 + 2X + 2 e Q[X] is irreducible.
(19) Let f(X) be a nonconstant polynomial in F[X]. Set (f) = {f(X)g(X) | g(X) ∈ F[X]}. Show that the vector space F[X]/(f) is a finite-dimensional algebra over F when multiplication is defined as follows: (h + (f))(g + (f)) = hg + (f).
(20) Suppose in Exercise 19 that f(X) is irreducible. Prove that F[X]/(f) is a field.
MINIMAL AND CHARACTERISTIC POLYNOMIALS
Theorem 2.1: Let A be an algebra over F, and assume dim_F(A) = m < ∞. Then for every x ∈ A, there exists an f(X) ∈ F[X] such that 1 ≤ ∂(f) ≤ m and f(x) = 0.

Proof: Consider the set Δ = {1, x,..., x^m} ⊆ A. Since dim_F A = m, Δ is linearly dependent over F. Hence, there exist constants c0,..., cm ∈ F, not all zero, with c0(1) + c1x + ··· + cmx^m = 0. Set f(X) = c0 + c1X + ··· + cmX^m. Clearly, 1 ≤ ∂(f) ≤ m, and f(x) = 0. ∎
Although the proof of Theorem 2.1 is trivial, the theorem has many interesting ramifications for a T ∈ L(V). Since dim_F(V) = n < ∞, dim_F L(V) = n². Thus, we can apply Theorem 2.1 to the algebra L(V). We conclude that there exists a nonconstant polynomial f(X) ∈ F[X] such that ∂(f) ≤ n² and f(T) = 0. Another way to say this is that ker φ = {f(X) ∈ F[X] | f(T) = 0}, the kernel of φ: F[X] → L(V), is not zero. Among all such nonconstant f ∈ ker φ, we can select a polynomial g(X) of smallest degree. Then 1 ≤ ∂(g) ≤ n², g(T) = 0, and ∂(g) ≤ ∂(f) for any nonconstant polynomial f ∈ ker φ. Suppose ∂(g) = m. Then g(X) = cmX^m + ··· + c0 with cm ≠ 0. Then cm⁻¹g(X) = X^m + (c_{m−1}/cm)X^{m−1} + ··· + (c0/cm) is a monic polynomial of lowest degree in ker φ. A polynomial arX^r + ··· + a0 ∈ F[X] is said to be monic if r ≥ 1 and ar = 1.
Now we claim that cm⁻¹g(X) is unique. To see this, suppose f(X) is a second, nonconstant, monic polynomial in ker φ of smallest possible degree. Set cm⁻¹g(X) = h(X). Applying the division algorithm 1.2, we have f(X) = A(X)h(X) + r(X) with r = 0 or ∂(r) < ∂(h). If we apply φ to this equation, we get 0 = f(T) = A(T)h(T) + r(T) = r(T). If r ≠ 0, then r(X) is a nonconstant polynomial in ker φ of degree smaller than ∂(g). This is impossible. We conclude that r = 0 and f = Ah. Now ∂(f) = ∂(h) by definition. Thus, 1.1(c) implies ∂(A) = 0. So, A is a nonzero constant. Since f and h are both monic, we conclude that A = 1 and f = h. We have now shown that there is a unique, monic polynomial of smallest positive degree in ker φ. This polynomial gets a special name, which we introduce below:
Definition 2.2: The unique, monic polynomial f(X) ∈ F[X] of smallest positive degree such that f(T) = 0 is called the minimal polynomial of T. We shall henceforth denote the minimal polynomial of T by mT(X) or just mT.
Our discussion before Definition 2.2 implies that mT(X) exists and indeed is
unique. We also know the following facts about mT(X):
(a) follows from Theorem 2.1. (b) is the same argument that was used above
for the uniqueness of mT.
We can define the minimal polynomial of an n x n matrix in a similar fashion
to 2.2.
Example 2.6: Let V = R², and let δ = {δ1 = (1, 0), δ2 = (0, 1)} be the canonical basis of R². Let T be defined by T(δ1) = δ2 and T(δ2) = −δ1. Thus, T is the familiar rotation of R² through a 90° angle:

Γ(δ, δ)(T) = [0 −1] = A
             [1  0]

Clearly, A² + I = 0. Thus, X² + 1 ∈ ker φ. Since T takes no nonzero α into a multiple of itself, no linear polynomial lies in ker φ. Thus, mT(X) = mA(X) = X² + 1. ∎
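The computations in Example 2.6 are easy to confirm with sympy (a sketch):

```python
from sympy import Matrix, symbols, eye

X = symbols('X')
A = Matrix([[0, -1],
            [1,  0]])              # rotation through 90 degrees
print(A**2 + eye(2))               # the zero matrix, so X**2 + 1 kills A
print(A.charpoly(X).as_expr())     # X**2 + 1
```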
Suppose we consider the same example over C instead of R.

Example 2.7: Let V = C², δ = {δ1 = (1, 0), δ2 = (0, 1)}, and suppose T(δ1) = δ2, T(δ2) = −δ1. Then again we have Γ(δ, δ)(T) = A, and hence X² + 1 ∈ ker φ. But now T((1, −i)) = i(1, −i). So, the same reasoning used in Example 2.6 no longer applies. If ∂(mT) = 1, then T = zI_V for some z ∈ C − {0}. But then δ2 = T(δ1) = zδ1, which is impossible. Hence mT(X) = X² + 1 as before. ∎
Example 2.7 shows what can happen: the minimal polynomial stays the same but is no longer irreducible, since X² + 1 = (X − i)(X + i) in C[X]. These examples suggest the following question: How does the minimal polynomial change under an extension of the scalars? The answer is that the minimal polynomial stays the same under all extensions of the scalars. However, as the field F gets larger, we may be able to factor mT(X) more easily, as the two examples above show.
In order to argue mT(X) remains invariant under extensions of the base field,
we need to examine the kernel of p more closely. From 2.3(b) we know that any
polynomial in ker (p is a multiple of mT(X).
Proof: The curved arrow between (mT) and F[X] indicates that the map between (mT) and F[X] is the inclusion map. We have seen that ker φ = (mT) by 2.3(b). The result now follows from Lemma 2.8 and Corollary 5.17 of Chapter I. ∎

The image of φ is just the linear span of the set {T^i | i = 0, 1,...}. Corollary 2.9 says this is a subspace of L(V) of dimension ∂(mT).
Now suppose K is any field containing F. We can extend our scalars to K by forming V^K = V ⊗F K. Let T^K = T ⊗F I_K. Then we have two minimal polynomials mT ∈ F[X] and m_{T^K} ∈ K[X]. The statement that the minimal polynomial of T remains invariant under an extension of the base field means more precisely that mT(X) = m_{T^K}(X) in K[X].

Theorem 2.10: Let K be a field containing F, and let T ∈ HomF(V, V). Then mT(X) = m_{T^K}(X) in K[X].
Proof: Write mT(X) = X^r + a_{r−1}X^{r−1} + ··· + a_0. Then

mT(T^K) = mT(T ⊗F I_K) = (T ⊗F I_K)^r + a_{r−1}(T ⊗F I_K)^{r−1} + ··· + a_0
        = (T^r ⊗F I_K) + a_{r−1}(T^{r−1} ⊗F I_K) + ··· + a_0
        = mT(T) ⊗F I_K
        = 0

Thus, 2.3(b) implies m_{T^K}(X) | mT(X) in K[X]. Since both of these polynomials are monic, our theorem will follow if we can show that ∂(mT) = ∂(m_{T^K}).

We know from Corollary 2.9 that

2.12: 0 → (mT) → F[X] → Im φ → 0

is an exact sequence of vector spaces over F. The reader can easily check that F[X] ⊗F K ≅ K[X] (as K-algebras) under the map sending f(X) ⊗F k to kf(X). Under this isomorphism (mT) ⊗F K is sent to all multiples of mT in K[X]. Let us call this image (mT)K[X]. Then the short exact sequence in 2.12 becomes
Corollary 2.14: Let A ∈ M_{n×n}(F), and let K be any field containing F. Then the minimal polynomial, m_A^F(X) ∈ F[X], of A viewed as an n × n matrix with coefficients in F is the same as the minimal polynomial, m_A^K(X) ∈ K[X], of A viewed as an n × n matrix with coefficients in K.
There is one final property that mT(X) possesses that we shall discuss at this point.

Lemma 2.16: T is invertible if and only if mT(X) has a nonzero constant term.

Proof: Write mT(X) = X^r + a_{r−1}X^{r−1} + ··· + a_1X + a_0, and suppose a_0 ≠ 0. Set

g(X) = −(1/a_0)(X^{r−1} + a_{r−1}X^{r−2} + ··· + a_1)

Then Tg(T) = g(T)T = I_V, and T is invertible. ∎

Note that the proof of Lemma 2.16 implies that if T is invertible, then T⁻¹ is a polynomial in T. We, of course, have a similar statement about matrices.
Corollary 2.17: If T ∈ L(V) is invertible, then T⁻¹ = f(T) for some f(X) ∈ F[X]. Similarly, if A ∈ M_{n×n}(F) is invertible, then A⁻¹ = g(A) for some polynomial g(X) ∈ F[X]. ∎
We now turn our attention to the second polynomial of this section. We need to consider matrices with polynomial entries:

2.20: A = (f_{ij}(X)), with f_{ij}(X) ∈ F[X]

Such an A is an n × n matrix whose entries are polynomials. Its determinant is given by the usual formula:

2.22: det(A) = Σ_{σ∈Sn} sgn(σ) f_{σ(1)1}(X) ··· f_{σ(n)n}(X)

The proof of 2.22 is the same as in the field case.
We can now introduce the characteristic polynomial of an n x n matrix with
coefficients in F.
We had seen in Theorem 3.28 of Chapter I that any two matrix representations of T are similar. Hence, the definition of cT(X) does not depend on the basis chosen. We shall call cT(X) the characteristic polynomial of T.

Example 2.25: Let T be the linear transformation given in Example 2.6. Then

cT(X) = cA(X) = det[ X  1] = X² + 1
                   [−1  X]
2.27: B(X) = B0 + B1X + ··· + B_{n−1}X^{n−1}

In equation 2.27, we should really write Bj(X^jI_n) instead of BjX^j, but the meaning of the symbol is clear. Now from equation 2.21, we have

2.29: (XI_n − A)B(X) = (−B0A) + (B0 − B1A)X + ··· + (B_{n−2} − B_{n−1}A)X^{n−1} + B_{n−1}Xⁿ

We now compare the results in 2.28 and 2.29. We have two polynomials in X with coefficients in M_{n×n}(F) that are equal. An easy argument shows that the matrices corresponding to the same powers of X in both equations must be equal. Thus, we get the following equations:

2.30: −B0A = c0I_n
      B0 − B1A = c1I_n
      ⋮
      B_{n−2} − B_{n−1}A = c_{n−1}I_n
      B_{n−1} = I_n

Multiplying the equations in 2.30 on the right by I, A, A²,..., Aⁿ, respectively (2.31), and adding the vertical columns in 2.31 gives us 0 = cA(A). ∎
Corollary 2.32: Let T ∈ L(V). Then cT(T) = 0. In particular, ∂(mT) ≤ ∂(cT) = dim V.
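Corollary 2.32 is, of course, the Cayley-Hamilton theorem. A quick sympy spot check, evaluating cA at A by Horner's rule on the 3 × 3 example that follows (a sketch):

```python
from sympy import Matrix, symbols, eye, zeros

X = symbols('X')
A = Matrix([[-1, 7,  0],
            [ 0, 2,  0],
            [ 0, 3, -1]])              # the example matrix below
coeffs = A.charpoly(X).all_coeffs()    # leading coefficient first
result = zeros(3, 3)
for c in coeffs:                       # Horner evaluation of c_A at A
    result = result * A + c * eye(3)
print(result)                          # the zero matrix, as predicted
```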
We have noted in the proof of Corollary 2.32 that mT | cT. In general, these two polynomials are not equal, as the following trivial example shows:

A = [−1 7  0]
    [ 0 2  0]  ∈ M_{3×3}(Q)
    [ 0 3 −1]

Here cA(X) = (X + 1)²(X − 2), while mA(X) = (X + 1)(X − 2).

The examples above suggest that even when mT ≠ cT, they always have the same irreducible factors in F[X]. This is indeed the case.
If we now take the determinant of both sides of equation 2.37, we get cA(X) det C = (mA(X))ⁿ. Consequently, cA(X) | (mA(X))ⁿ. ∎
Corollary 2.38: Let T ∈ L(V). Then cT(X) and mT(X) have the same set of irreducible factors in F[X].

Proof: Theorem 2.26 implies mT(X) | cT(X). Theorem 2.35 implies that cT(X) | (mT(X))ⁿ. The result follows from 1.7. ∎
Let us rephrase Corollary 2.38 in terms of the language used in Theorem 1.7. Suppose mT(X) = f1^{e1} ··· fk^{ek} is the (essentially) unique factorization of the minimal polynomial into irreducible factors.
Corollary 2.40: Let T ∈ L(V). Then the minimal polynomial and characteristic polynomial of T have the same roots in F̄. ∎
Equation 2.42 gives us an immediate proof of the first half of the following
theorem:
Theorem 2.43: Let T ∈ L(V) and suppose V = V1 ⊕ ··· ⊕ Vr is an internal direct sum of T-invariant subspaces V1,..., Vr. Let Ti denote the restriction of T to Vi. Then (i) cT(X) = c_{T1}(X) ··· c_{Tr}(X), and (ii) mT(X) = l.c.m.(m_{T1}(X),..., m_{Tr}(X)).

Thus g(T) = 0, and mT | g by 2.15(b). This proves (ii) and completes the proof of the theorem. ∎
(1) Let F ⊆ K be fields. Show that the map ψ: F[X] ⊗F K → K[X] given by ψ(f(X) ⊗ k) = kf(X) is an isomorphism of K-algebras.
(i) a_{n−1} = −Tr(A).
(ii) a0 = (−1)ⁿ det(A).
[0 0 0 ··· 0 0]
[1 0 0 ··· 0 0]
[0 1 0 ··· 0 0]
[⋮   ⋱  ⋱    ⋮]
[0 0 0 ··· 1 0]
(9) Let T: R⁴ → R⁴ be given by T(x1, x2, x3, x4) = (x1 − x4, x1, −2x2 − x3 − 4x4, 4x2 + x3).
(a) Compute cT(X).
(b) Compute m14X).
(c) Show that R4 is an internal direct sum of two proper 1-invariant
subspaces.
(10) Find the minimal polynomial of
—1 0 0 0 0
1 —1 0 0 0
0 0 —1 0 0
0 0 0 1 0
0 0 0 1 1
(21) Prove Corollary 2.14 directly without using tensor products. [Hint: Use a basis of K over F and write the coefficients of m_A^K(X) in terms of this basis.]
we shall use the term "eigenvalue" exclusively. The complete set of eigenvalues for T in F will be denoted Sp_F(T) and called the spectrum of T (in F). Thus, c ∈ Sp_F(T) if and only if there exists a vector α ∈ V such that α ≠ 0 and T(α) = cα. The spectrum of T depends on the field F.
Example 3.2: Let us return to Examples 2.6 and 2.7 of Section 2. Since T: R² → R² represents a rotation through 90°, no nonzero vector is taken by T into a multiple of itself. Consequently, Sp_R(T) = ∅. If we extend T to T^C: C² → C², then both i and −i are eigenvalues of T^C. We have T^C((1, −i)) = i(1, −i) and T^C((−1, −i)) = −i(−1, −i). We shall soon see that T^C can have at most two distinct eigenvalues. Therefore, Sp_C(T^C) = {i, −i}.
Let us gather together some of the more obvious facts about eigenvalues.
Proof: (a) Suppose c ∈ Sp_F(T). Then there exists a nonzero vector α ∈ V such that (T − c)(α) = 0. Then (T^K − c)(α ⊗F 1) = (T − c)(α) ⊗F 1 = 0 ⊗F 1 = 0. Thus, c ∈ Sp_K(T^K).

(b) Recall that R(cT) is the set of roots (in F̄) of the characteristic polynomial cT(X) of T. Let α be a basis of V, and set Γ(α, α)(T) = A. Now c ∈ Sp_F(T) if and only if ker(T − c) ≠ 0. From Theorem 3.33(b) of Chapter I, we know ker(T − c) ≠ 0 if and only if T − c is not an isomorphism on V. Since Γ: L(V) → M_{n×n}(F) is an isomorphism of F-algebras, T − c is not an isomorphism if and only if A − c = A − cI_n is not invertible in M_{n×n}(F). This last statement is in turn equivalent to det(A − cI_n) = 0. Now cT(c) = cA(c) = det(cI_n − A) = (−1)ⁿ det(A − cI_n). Thus, c ∈ Sp_F(T) if and only if c is a root of cT(X) in F. Hence, Sp_F(T) = R(cT) ∩ F.

(c) We have seen in Corollary 2.40 that R(mT) = R(cT). Hence, (c) follows from (b).

(d) From (c), we know Sp_F(T) ⊆ R(mT). We have seen in Exercise 14 of Section 1 that |R(mT)| ≤ ∂(mT). The result now follows from Corollary 2.32. ∎
Let us make a few comments about Theorem 3.3. The inclusion in (a) could very well be strict, as Example 3.2 shows. As we extend scalars, the extended linear transformation T^K may pick up more eigenvalues because the characteristic polynomial cT(X) may have more linear factors in K[X] than it had in F[X]. This is precisely what is happening in Example 3.2. Over R, cT(X) = X² + 1. Since X² + 1 is irreducible in R[X], R(cT) ∩ R = ∅. Thus Sp_R(T) = ∅. Over C, c_{T^C} = X² + 1 = (X + i)(X − i). Hence, R(c_{T^C}) ∩ C = {i, −i}. Therefore, Sp_C(T^C) = {±i}.
Theorem 3.3(b) tells us exactly how to compute the eigenvalues of T in F. Choose any matrix representation A of T, and compute the characteristic polynomial cA(X) = det(XI − A). Then find the roots of cA(X) that lie in F. These roots are precisely the eigenvalues of T lying in F. Of course, finding the roots of cA(X) that lie in F may be very difficult. If F = R, for instance, we can use well-known techniques from numerical analysis to at least approximate the real roots of cA(X).
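This recipe (choose a matrix representation A, form cA(X) = det(XI − A), and find its roots) is exactly what a computer algebra system does. A sketch using the rotation of Example 2.6:

```python
from sympy import Matrix, symbols, roots

X = symbols('X')
A = Matrix([[0, -1],
            [1,  0]])                    # rotation through 90 degrees
cA = (X * Matrix.eye(2) - A).det()
print(cA.expand())                       # X**2 + 1
print(roots(cA, X))                      # {I: 1, -I: 1}: no real eigenvalues
```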
Let us consider one more example before continuing.
Example 3.4: Let T: R⁴ → R⁴ be given by T(δ1) = δ1 − δ2, T(δ2) = 2δ2, T(δ3) = δ4, and T(δ4) = δ4 − δ3. Here δ = {δ1, δ2, δ3, δ4} is the canonical basis of R⁴. The matrix representation of T is given by
3.11: a1α1 + a2α2 + ··· + asαs = 0

In equation 3.11, s ≤ r, and each ai is a nonzero scalar in F. Multiplying 3.11 by c1 gives 3.12; applying T to 3.11 gives 3.13. If we now subtract 3.12 from 3.13, we produce a nontrivial relation among the αi that has fewer terms in it than 3.11. This is a contradiction. Thus, A is linearly independent over F. ∎
There are several interesting corollaries to Theorem 3.10. In the first place, |Sp_F(T)| ≤ dim V since eigenvectors belonging to distinct eigenvalues must be linearly independent. We do not list this fact as a corollary since Theorem 3.3(d) is an even sharper result. Theorem 3.10 gives us sufficient conditions for representing T as a diagonal matrix.
Corollary 3.14: Suppose T ∈ L(V) is such that |Sp_F(T)| = dim V. Then there exists a basis α of V such that Γ(α, α)(T) is a diagonal matrix.

Proof of 3.14: Let Sp_F(T) = {c1,..., cn}, where n = dim V. For each i = 1,..., n, let αi be an eigenvector belonging to ci. Then α = {α1,..., αn} is a basis of V by Theorem 3.10. Clearly, Γ(α, α)(T) = diag(c1,..., cn). ∎
Here and throughout the rest of the text, we shall let diag(c1,..., cn) denote the n × n diagonal matrix

3.15: [c1      0]
      [   ⋱     ]
      [0      cn]
We note in passing that the converse of Corollary 3.14 is false. Namely, if some matrix representation of T is diagonal, we cannot conclude that T has n distinct eigenvalues in F. For example, T = I_V is represented by the identity matrix relative to any basis of V, but T has only one eigenvalue, 1.
A slightly different version of Corollary 3.14 is worth recording here.
Corollary 3.16: Let F be an algebraically closed field and let A ∈ M_{n×n}(F). Suppose cA(X) has no repeated roots. Then A is similar to a diagonal matrix.

Proof: Since F is algebraically closed, cA(X) = Π_{i=1}^{r} (X − ci)^{ni} in F[X]. Here c1,..., cr are the roots of cA(X), and we must have n1 + ··· + nr = n. Now the statement that cA(X) has no repeated roots means each ni = 1. In particular, r = n, and R(cA) = {c1,..., cn}. Thus, Theorem 3.3(b) implies |Sp_F(A)| = n, and Corollary 3.14 applies. ∎
Example 3.17: Consider the linear transformation T^C: C⁴ → C⁴ derived from T in Example 3.4. Since Sp_C(T^C) = {1, 2, (1 ± i√3)/2}, we conclude from Corollary 3.14 that there exists a basis α of C⁴ such that

3.18: Γ(α, α)(T^C) = diag(1, 2, (1 + i√3)/2, (1 − i√3)/2)

Here

A = Γ(δ, δ)(T) = [ 1 0 0  0]
                 [−1 2 0  0]
                 [ 0 0 0 −1]
                 [ 0 0 1  1]
Nk = [0 0 ··· 0 0]
     [1 0 ··· 0 0]
     [0 1 ··· 0 0]
     [⋮   ⋱     ⋮]
     [0 0 ··· 1 0]

An easy computation shows Nk is nilpotent of index k. ∎
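A quick numerical check of this claim, with k = 4 (a sketch):

```python
import numpy as np

k = 4
N = np.zeros((k, k))
N[np.arange(1, k), np.arange(k - 1)] = 1.0   # ones on the subdiagonal

P = np.eye(k)
for j in range(1, k + 1):
    P = P @ N
    print(j, np.allclose(P, 0))   # False for j < k, True at j = k
```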
Suppose T ∈ L(V) is nilpotent of index k. It follows readily from Exercise 6 of Section 2 that the minimal polynomial mT(X) must be a power of X alone. Since T is nilpotent of index k, mT(X) = X^k. In particular, Theorem 3.3(c) implies Sp_F(T) = R(mT) = {0}. Thus, a nilpotent transformation or matrix has only one eigenvalue, 0. Note that mT(X) = X^k implies k ≤ n. The maximum index of nilpotency of any nilpotent transformation T cannot exceed n = dim(V). If T is nilpotent and not zero, then T cannot be diagonalized. For suppose Γ(α, α)(T) = diag(a1,..., an) for some basis α of V. Then 0 = Γ(α, α)(T^k) = [Γ(α, α)(T)]^k = [diag(a1,..., an)]^k = diag(a1^k,..., an^k). But then a1 = ··· = an = 0 and T = 0, which is impossible. Similar reasoning shows that a nonzero nilpotent matrix cannot be similar to a diagonal matrix.
Nilpotent linear transformations are the fundamental ingredients in the Jordan canonical form. We shall finish this section with a representation theorem for nilpotent transformations.

Theorem 3.21: Let T ∈ L(V) be nilpotent of index k ≥ 2. Then there exists a basis α of V such that

Γ(α, α)(T) = [N_{k1}         0]
             [       ⋱        ]
             [0         N_{kr}]

where k = k1 ≥ k2 ≥ ··· ≥ kr.

Let us say a few words about Theorem 3.21 before proceeding with its proof. The supposition that k ≥ 2 is solely to avoid the trivial case T = 0. If T is nilpotent and not zero, then clearly the index of nilpotency of T is some positive integer between 2 and n. The notation for the N_{ki} is given in Example 3.20. Thus, each N_{ki} is a ki × ki matrix having ones on its subdiagonal and zeros everywhere else. Note then that Γ(α, α)(T) is the next best thing to being diagonal: Γ(α, α)(T) is a subdiagonal matrix with only zeros and ones appearing on its subdiagonal.
Proof of 3.21: Set k1 = k. Since T is nilpotent of index k1, we know T^{k1} = 0, and T^{k1−1} ≠ 0. Hence, there exists a vector α ∈ V such that T^{k1−1}(α) ≠ 0. We first claim that A1 = {α, T(α),..., T^{k1−1}(α)} is linearly independent over F. Suppose these vectors are linearly dependent over F. Then we have

3.22: c1α + c2T(α) + ··· + c_{k1}T^{k1−1}(α) = 0

In this equation, c1,..., c_{k1} ∈ F and are not all zero. Suppose cs is the first nonzero scalar in 3.22. Since T^{k1−1}(α) ≠ 0, s < k1. We can then rewrite 3.22 as

3.23: (cs + c_{s+1}T + ··· + c_{k1}T^{k1−s})(T^{s−1}(α)) = 0

Applying T^{k1−s} to 3.23 yields csT^{k1−1}(α) = 0, which is impossible. Thus A1 is linearly independent. Set Z1 = L(A1), a T-invariant subspace of V. Suppose for the moment that Z1 has a T-invariant complement W, and that W has a basis A' with

3.24: Γ(A', A')(T') = [N_{k2}         0]
                      [       ⋱        ]
                      [0         N_{kr}]

with k2 ≥ ··· ≥ kr and k2 + ··· + kr = dim W (here T' denotes the restriction of T to W). It then follows from 2.42 that α = A1 ∪ A' is the required basis for V. Hence the proof of Theorem 3.21 will be complete when we argue Z1 has a T-invariant complement.
Before constructing a complement of Z1, we need the following technical result:

3.25: If β ∈ Z1 is such that T^s(β) = 0 where 0 < s ≤ k1, then β = T^{k1−s}(β0) for some β0 ∈ Z1.

Now let 𝒮 = {W' | W' is a subspace of V, T(W') ⊆ W', and Z1 ∩ W' = (0)}. Since the subspace (0) ∈ 𝒮, 𝒮 ≠ ∅. We can partially order the subspaces in 𝒮 by inclusion. If 𝒮0 is a totally ordered subset of 𝒮, then clearly 𝒮0 has an upper bound W in 𝒮, namely, W = ∪_{W'∈𝒮0} W'. Hence, (𝒮, ⊆) is an inductive set. We can apply Zorn's lemma (Z2 of Chapter I) and choose a maximal subspace W in 𝒮. W is clearly T-invariant, and W ∩ Z1 = (0). We must argue Z1 + W = V.
We shall suppose Z1 + W ≠ V and derive a contradiction. If Z1 + W ≠ V, then there exists a β ∈ V − (Z1 + W). Since T^{k1} = 0, there exists an integer u such that 0 < u ≤ k1, T^u(β) ∈ Z1 + W, and T^i(β) ∉ Z1 + W for any i < u. Let us write T^u(β) = γ + δ with γ ∈ Z1 and δ ∈ W. Then 0 = T^{k1}(β) = T^{k1−u}(γ) + T^{k1−u}(δ). These two vectors are in Z1 and W, respectively, since Z1 and W are T-invariant. Since Z1 ∩ W = (0), we conclude that T^{k1−u}(γ) = 0 and T^{k1−u}(δ) = 0. We can now apply 3.25 to the vector γ. We get γ = T^u(γ0) for some γ0 ∈ Z1. Therefore, T^u(β) = γ + δ = T^u(γ0) + δ. Set β1 = β − γ0. Then T^u(β1) = T^u(β) − T^u(γ0) = δ ∈ W. Since W is T-invariant, we can now conclude that T^m(β1) ∈ W for any m ≥ u. On the other hand, if i < u, then T^i(β1) = T^i(β) − T^i(γ0) ∉ Z1 + W. [γ0 ∈ Z1 implies T^i(γ0) ∈ Z1. If T^i(β) − T^i(γ0) ∈ Z1 + W, then T^i(β) ∈ Z1 + W. Since i < u this is impossible.] We have now proved the following relations concerning β1:

3.27: T^i(β1) ∉ Z1 + W for i < u, and T^m(β1) ∈ W for m ≥ u.
(a) T^k(Z) = 0.
(b) If T1 denotes the restriction of T to Z, then Γ(A, A)(T1) = Nk.
(c)
Suppose A is similar to diag(N_{k1},..., N_{kr}) and B is similar to diag(N_{l1},..., N_{ls}). Similarity is clearly an equivalence relation on M_{n×n}(F). If A and B have the same invariants, then we have

diag(N_{k1},..., N_{kr}) = diag(N_{l1},..., N_{ls})
Thus, A and B are similar. Conversely, suppose A and B are similar. Then diag(N_{k1},..., N_{kr}) and diag(N_{l1},..., N_{ls}) are similar. These two matrices then describe the same linear transformation T: Fⁿ → Fⁿ relative to two different bases, say α and α' of Fⁿ (see Theorem 3.28 of Chapter I). If

Γ(α, α)(T) = diag(N_{k1},..., N_{kr})

and

Γ(α', α')(T) = diag(N_{l1},..., N_{ls})

then both matrices display the invariants of the same nilpotent transformation T, and these invariants are unique.

3.35: PAP⁻¹ = diag(N_{k1},..., N_{kr})

Recall that M(δ, α) is the change of basis matrix between the canonical basis δ and the basis α. Hence, if P = M(δ, α), then 3.36 becomes 3.35. Since M(δ, α)⁻¹ = M(α, δ) is the easier matrix to compute, to compute P, construct and invert M(α, δ).
For example, let

A = [0 0 0]
    [1 0 0]  ∈ M_{3×3}(Q)
    [1 1 0]

Then

A² = [0 0 0]
     [0 0 0]
     [1 0 0]

and A³ = 0, so A is nilpotent of index 3 with subdiagonal form

N3 = [0 0 0]
     [1 0 0]
     [0 1 0]

Taking α1 = δ1, the basis α = {α1, Aα1, A²α1} gives

M(α, δ) = [1 0 0]                        [1  0 0]
          [0 1 0]   and   P = M(δ, α) =  [0  1 0]
          [0 1 1]                        [0 −1 1]
(1) Find all the eigenvectors of the map Tc given in Example 3.4.
(2) Suppose A is a lower triangular matrix of the form

A = [a11          0]
    [a21  a22      ]
    [ ⋮        ⋱   ]
Let

A = [18  −9 −6]
    [17  −9 −5]  ∈ M_{3×3}(Q)
    [25 −12 −9]

Show that A is nilpotent. Find P such that PAP⁻¹ has the subdiagonal form given in Theorem 3.21.
(5) Let A ∈ M_{n×n}(F) be such that R(cA(X)) ⊆ F. Show that A is similar to a lower triangular matrix.

(6) Let T ∈ L(V). Suppose R(cT) = {0}. Show that T must be nilpotent.

(7) Find up to similarity all possible nilpotent matrices in M_{6×6}(F).

(8) Find all eigenvalues and eigenvectors for

A = [2 1]
    [1 2]

(9) Let T ∈ L(V), and suppose cT(X) = Π_{i=1}^{r} (X − ci)^{ni} in F[X]. Show that V has a basis of eigenvectors of T if and only if dim(ker(T − ci)) ≥ ni for all i = 1,..., r.

(10) Let A ∈ M_{n×n}(F) be a diagonal matrix. Suppose cA(X) = Π_{i=1}^{r} (X − ci)^{ni} in F[X]. Set Wi = ker(A − ci). Show that dim(Wi) = ni.

(19) Let S, T ∈ L(V). Show that Sp_F(ST) = Sp_F(TS). Do TS and ST have the same eigenvectors? (Note: Exercise 18 gives an easy proof of this problem if S and T are represented by symmetric matrices.)
THE JORDAN CANONICAL FORM
In this section, we shall use Theorem 3.21 to present a canonical form for those T ∈ L(V) = HomF(V, V) that have the property that the roots of cT(X) all lie in F. If R(cT) ⊆ F, then we know from Section 1 that cT factors in F[X] as follows:

4.1: cT(X) = Π_{i=1}^{r} (X − ci)^{ni}
Proof: (a) Sp_F(T) = {c1,..., cr} by Theorem 3.3. Thus, for each ci there exists a nonzero vector αi such that T(αi) = ciαi. In particular, αi ∈ ker(T − ci)^{ni} = Vi. This shows Vi ≠ (0). Since T commutes with any polynomial in T, T(T − ci)^{ni} = (T − ci)^{ni}T. In particular, if β ∈ Vi, then (T − ci)^{ni}T(β) = T(T − ci)^{ni}(β) = 0. Hence T(β) ∈ Vi. This proves (a).

(b) For each i = 1,..., r, set hi(X) = cT(X)/(X − ci)^{ni}. Then h1(X),..., hr(X) have no common factor in F[X]. In particular, g.c.d.(h1,..., hr) = 1. It now follows from 1.11 that there exist a1(X),..., ar(X) ∈ F[X] with a1h1 + ··· + arhr = 1.

4.5: Γ(αi, αi)(Ti − ciI_{Vi}) = [N_{ki1}           0]  = Mi
                                [        ⋱           ]
                                [0          N_{kiq(i)}]

In equation 4.5, ki = ki1 ≥ ··· ≥ kiq(i) are the (unique) invariants of Ti − ciI_{Vi}. We now have Γ(αi, αi)(Ti) = Γ(αi, αi)(ciI_{Vi} + (Ti − ciI_{Vi})) = ciΓ(αi, αi)(I_{Vi}) + Γ(αi, αi)(Ti − ciI_{Vi}) = ciI_{ni} + Mi. This gives us equation 4.3 and completes the proof of Theorem 4.2. ∎
We have already proved our next theorem, but let us introduce a definition first. The k × k matrix

cI_k + N_k = [c            0]
             [1  c           ]
             [   ⋱  ⋱        ]
             [0      1      c]

is called a Jordan block of size k belonging to c. A square matrix J is called a Jordan matrix if J has the form J = diag(J1,..., Jm), where J1,..., Jm are Jordan blocks.

The computations after equation 4.5 show that there is a basis αi of Vi such that

4.7: Γ(αi, αi)(Ti) = [B(ki1)            0]
                     [        ⋱           ]
                     [0          B(kiq(i))]

where B(kij) is the Jordan block of size kij belonging to ci. If we set α = α1 ∪ ··· ∪ αr, then

4.8: Γ(α, α)(T) = [J1      0]
                  [    ⋱     ]
                  [0      Jr]

The representation of T given in equation 4.8 is called a Jordan canonical form of T. We shall see shortly that J is unique up to a permutation of its blocks J1,..., Jr. We have now proved the following theorem:
Theorem 4.9: Let T ∈ L(V), and suppose the roots of the characteristic polynomial of T all lie in F. Write cT(X) as in equation 4.1. Then there exists a basis α of V such that

Γ(α, α)(T) = [J1      0]
             [    ⋱     ]
             [0      Jr]

where each

Ji = [B(ki1)            0]
     [        ⋱           ]
     [0          B(kiq(i))]

For each i = 1,..., r, the integers ki1 ≥ ··· ≥ kiq(i) are the invariants of the nilpotent transformation T − ci on Vi = ker(T − ci)^{ni}; B(ki1),..., B(kiq(i)) are Jordan blocks of sizes ki1,..., kiq(i), respectively, belonging to ci. ∎
Suppose

J = [J1      0]
    [    ⋱     ]
    [0      Jr]

with

Ji = [B(ki1)            0]
     [        ⋱           ]
     [0          B(kiq(i))]

is a Jordan canonical form of A, as in Corollary 4.10. The constants ci, ni, and kij that appear in Corollary 4.10 are computed from the natural linear transformation T: Fⁿ → Fⁿ associated with A. Namely, if δ is the canonical basis of Fⁿ, define T by T(x) = Ax. Then Γ(δ, δ)(T) = A, and ki1 ≥ ··· ≥ kiq(i) are the invariants of T − ci on ker(T − ci)^{ni}.
Example 4.12: Let

A = [19 −6  −9]
    [17 −4  −9]  ∈ M_{3×3}(Q)
    [25 −9 −11]

Then

4.13: (a) Γ(δ, δ)(T − 1) = A − I = [18 −6  −9]
                                   [17 −5  −9]
                                   [25 −9 −12]

      (b) (A − I)² = [−3 3 0]
                     [−4 4 0]
                     [−3 3 0]

      (c) Γ(δ, δ)(T − 2) = A − 2I = [17 −6  −9]
                                    [17 −6  −9]
                                    [25 −9 −13]

Now 4.13(b) implies that V1 = ker(T − 1)² is a cyclic (T − 1)-subspace with basis α1 = {β1 = (1, 1, 1), β2 = (T − 1)(β1) = (3, 3, 4)}. If T1 denotes the restriction of T to V1, then

Γ(α1, α1)(T1) = Γ(α1, α1)(I) + Γ(α1, α1)(T1 − I) = [1 0] + [0 0] = [1 0]
                                                   [0 1]   [1 0]   [1 1]
4.14: Γ(α, α)(T) = [1 0 0]
                   [1 1 0]  = J
                   [0 0 2]

J is clearly a Jordan canonical form of T (or A). If we set

M(α, δ) = [1 3 3]
          [1 3 4]
          [1 4 3]

then

4.15: J = M(δ, α) A M(α, δ)⁻¹... more precisely, J = PAP⁻¹, where

M(δ, α)⁻¹ = M(α, δ) = [1 3 3]
                      [1 3 4]
                      [1 4 3]

and so

P = M(δ, α) = [ 7 −3 −3]
              [−1  0  1]
              [−1  1  0]
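The computations of Example 4.12 can be reproduced with sympy (a sketch; note that sympy writes Jordan blocks with the ones on the superdiagonal, whereas this text places them on the subdiagonal):

```python
from sympy import Matrix

A = Matrix([[19, -6,  -9],
            [17, -4,  -9],
            [25, -9, -11]])
P, J = A.jordan_form()        # A = P * J * P**-1
print(J)                      # a 2x2 block for eigenvalue 1, a 1x1 for 2
print(P * J * P.inv() == A)   # True
```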
Thus, for a 2 × 2 matrix A with R(cA) ⊆ F, the possible Jordan canonical forms are

[a1  0]        [a  0]
[ 0 a2]  and   [1  a]
4.18: Γ(α, α)(T) = J = [J1      0]
                       [    ⋱     ]
                       [0      Jr]

The first thing to note here is that ki1 ≥ ··· ≥ kiq(i) are the unique invariants of the nilpotent transformation Ti − ci on Vi. Since dim Vi = ni, Theorem 3.21 implies ki1 + ··· + kiq(i) = ni. But ki1 + ··· + kiq(i) is the size of the square matrix Ji. Thus, Ji is an ni × ni matrix having the eigenvalue ci running down its diagonal. Now, ni is the multiplicity of the root ci in cT(X). Thus, an eigenvalue ci of T appears as many times on the principal diagonal of J as its multiplicity in the characteristic polynomial of T. These remarks together with Theorem 3.31 readily imply that the Jordan canonical form J of T is unique up to a permutation of its blocks J1,..., Jr. The corresponding matrix statement is the following theorem whose proof we leave as an exercise at the end of this section:

Next note that ki1 is the index of nilpotency of Ti − ci on Vi. Therefore, the minimal polynomial of Ti on Vi is given by m_{Ti}(X) = (X − ci)^{ki1}. Theorem 2.43(c)
138 CANONICAL FORMS OF MATRICES
4.20: mT(X) = Π_{i=1}^{r} (X − ci)^{ki1}
Theorem 4.21: Let T ∈ L(V), and assume cT(X) = Π_{i=1}^{r} (X − ci)^{ni} ∈ F[X]. Then T can be represented by a diagonal matrix if and only if mT(X) = Π_{i=1}^{r} (X − ci), that is, every eigenvalue of T has multiplicity one in the minimal polynomial of T.

Proof: We have seen that the Jordan canonical form of T given in equation 4.18 is unique up to a permutation of the blocks J1,..., Jr. If T is represented by a diagonal matrix diag(b1,..., bn) = B, then B is a Jordan canonical form of T. Hence, B is J up to some permutation of the blocks B(kij). In particular, every block Ji is diagonal. Thus, ki1 = 1 for every i = 1,..., r. Then equation 4.20 implies mT(X) = Π_{i=1}^{r} (X − ci).

Conversely, if mT(X) = Π_{i=1}^{r} (X − ci), then equation 4.20 implies ki1 = 1 for each i. But ki1 ≥ ki2 ≥ ··· ≥ kiq(i). Therefore, 1 = ki1 = ··· = kiq(i) for all i = 1,..., r, and J is diagonal. ∎
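Theorem 4.21 is an effective test for diagonalizability, and computer algebra systems implement it; a sketch contrasting a matrix whose minimal polynomial has distinct linear factors with one that repeats a factor:

```python
from sympy import Matrix

A = Matrix([[2, 0, 0],
            [0, 2, 0],
            [0, 0, 3]])          # m_A(X) = (X - 2)(X - 3)
B = Matrix([[2, 0, 0],
            [1, 2, 0],
            [0, 0, 3]])          # m_B(X) = (X - 2)**2 (X - 3)
print(A.is_diagonalizable())     # True
print(B.is_diagonalizable())     # False
```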
Let us rephrase Theorem 4.21 slightly.
(c) Im Pi = Vi = ker(T − ci)^{ni} for i = 1,..., r.
(d) T = Σ_{i=1}^{r} (ciPi + Ni).
(e) PiPj = Pi if i = j, and PiPj = 0 if i ≠ j.
(f) Ni is nilpotent of index at most ni for each i = 1,..., r.
4.24: (ii) a1(X)h1(X) + ··· + ar(X)hr(X) = 1 for some a1(X),..., ar(X) ∈ F[X]
      (iii) Im(hi(T)) ⊆ Vi for each i = 1,..., r
(1) Prove Theorem 4.19. [Hint: Show that dim ker(A − ci)^j = dim ker(B − ci)^j for every eigenvalue ci.]

(2) Show that the Jordan canonical form for a transformation T (if it exists) is unique up to a permutation of its blocks J1,..., Jr.

(3) Find all Jordan forms for
    (a) All 8 × 8 matrices having X²(X − 1)³ as minimal polynomial.
    (b) All 6 × 6 matrices having (X + 2)⁴(X − 5)² as characteristic polynomial.
(4) Find the Jordan canonical form J of each of the following matrices:

[ 3  0 −14]     [0 −1 0 0]     [1  2  0 1]     [ 0  0 0 1]
[ 1  …   …]     […  0 0 0]     [3 −1  … 0]     [ 0  0 1 1]
[−1 −3   …]     […  … … …]     [2  1 −2 0]     [−3  2 1 1]
                […  … … …]     […  0  1 0]     [ 3 −6 1 4]
Suppose A, B ∈ M_{n×n}(F). Show that if

[A 0]
[0 A]

is similar to

[B 0]
[0 B]

then A is similar to B.
THE REAL JORDAN CANONICAL FORM

In this section and the next, we take up the question of what canonical form for T ∈ HomF(V, V) is available when the characteristic polynomial of T does not have all of its roots in F. When F = R, we are able to construct a form surprisingly close to the Jordan canonical form of Section 4.
For the time being, let us assume F is an arbitrary field. Let V be a finite-
dimensional vector space of dimension n over F. If T e HomF(V, V), then we
have seen that the characteristic polynomial cT(X) is a monic polynomial of
degree n in F[X]. Using Theorem 1.7, we know cT(X) has an essentially unique
5.1: cT(X) = q1(X)^{e1} ··· qs(X)^{es}

Proof: The complete factorization of cT(X) given in 5.1 has (X − c)^m as one of its terms qi(X)^{ei}. We may assume q1(X)^{e1} = (X − c)^m. The result now follows from Theorem 5.3 with W = ker(q2(T)^{e2}) ⊕ ··· ⊕ ker(qs(T)^{es}). ∎
We can now take up the question of what canonical form is available for T when R(cT(X)) ⊄ F. The first thing one might try is to pass to the algebraic closure F̄ of F. Thus, consider the extended map T^{F̄} on V^{F̄} = V ⊗F F̄. As we have seen in Section 2, the characteristic polynomial cT(X) is the same as c_{T^{F̄}}(X). Since F̄ is algebraically closed, cT(X) decomposes into linear factors in F̄[X], and Theorem 4.2 implies T^{F̄} has a Jordan canonical form J ∈ M_{n×n}(F̄). Of course, the entries in J may not all lie in F, but we can hope that if the relationship between F and F̄ is special enough, we may be able to use J to produce some reasonable canonical form for T in M_{n×n}(F). This is precisely what happens when F = R. Then F̄ = C. By using complex conjugation σ ∈ HomR(C, C) (see Section 2, Chapter II), we can convert a Jordan canonical form for T^C to a reasonable form for T itself over R.
Let us set up the notation we shall use for the rest of this section. V will denote a vector space over R of dimension n. V^C = V ⊗R C is the complexification of V. We shall identify a vector α ∈ V with its image α ⊗R 1 in V^C. Then V^C = {Σ zkαk | zk ∈ C, αk ∈ V}. Note then that any vector ξ ∈ V^C can be written uniquely in the form ξ = μ + iλ, where μ, λ ∈ V and i = √−1. We had also seen in Chapter II that σ extends to an R-isomorphism I_V ⊗R σ: V^C → V^C. Recall that the value of I_V ⊗R σ on a typical vector β = Σ αk ⊗R zk ∈ V^C is given by (I_V ⊗R σ)(Σ αk ⊗R zk) = Σ αk ⊗R z̄k. We shall shorten our notation here and write (I_V ⊗R σ)(β) = β̄ for any β ∈ V^C. Thus, if β = Σ zkαk, then β̄ = Σ z̄kαk. Equivalently, if β = μ + iλ with μ, λ ∈ V, then β̄ = μ − iλ. It is
144 CANONICAL FORMS OF MATRICES
Proof If z is an eigenvalue of TC, then TC(cx) = zcx for some nonzero eigenvector
& is also nonzero and equation 5.5 implies TC(&) 2&. Thus, if cx is an
eigenvector of TC with corresponding eigenvalue z, then & is an eigenvector of TC
with corresponding eigenvalue 2. El
In equation 5.7, c1 . . . , denote the (distinct) real roots of cT(X), and z1,
2k,. .. , denote the (distinct) complex roots which are not real. We have
r 0, t 0, and r + 2t n = 3(cT).
If = 0, then cT(X) has only real roots. In this case, T has a Jordan canonical
t
(X — —
cT(X) [1 (X —
=
Since the coefficients of cT(X)( = c-rc(X)) are all real, conjugating c1.(X) merely
THE REAL JORDAN CANONICAL FORM 145
Thus, the multiplicity of z, and 2, in cT(X) is the same. We can now rewrite
equation 5.8 as follows:
5.9:
cT(X) = fi (X — fi (X — —
If we now combine Theorem 4.9 with Lemma 5.10, we get the following
important corollary:
Corollary 5.11: Let fi, {fJ,,..., fl,,,,} be a basis of ker(TC — z,)" such that if TF
denotes the restriction of Tc to ker(TC — then
[B(k,1) 0 1
r(fl,,fl,)(TF)=I
L 0 B(k(q(l))j
[B(k,1) 0 1
L0 B(klq(1))j
= kerfP — j = 1,...
Ur+i = ker(TC — z,)" $ ker(TC — 1= 1,..., t
0 1
= I=
[0
As usual, in equation 5.13, denotes the restriction of T to
are the invariants of — on and B(yjq) = CJIy•q + Nyjq for all
q = 1, . . , s(j). By Theorem 2.17 of Chapter 2,
. is a C4asis of The
representation of Tf (the restriction of TC to with respect to is identical to
5.13.
[B(k,1) 0
5.14: fl,)(TF) = I
[0 B(klq(1))
and
if we now equate the real and imaginary parts in equation 5.15, we get the
following equations;
= a,p12 — b,1,2 +
T(2,2) = b1p12 + a,2,2 + 213
= b,p,k + a,A,k
Thus, the subspace spanned by the first 2k vectors Ru' 211 . . 'mi' of A, form a
T-invariant subspace of +,. The representation of T on this subspace is given
by the 2k x 2k matrix
D
12 D 0
'2
0
12 D
148 CANONICAL FORMS OF MATRICES
where
a1 b1
\—b, a,
[H(k,1) 0
A,)(t,) =
[ 0 H(klq(l))
cT(X) = [I (X — fi (X — — C[X]
Here c1, ..., c1 are the distinct real roots of cT and z1, 2k,. z1, are the
nonreal roots. Let TC denote the complexification of T. Then
[JI I
Jr+ i
0
sr-ft
THE REAL JORDAN CANONICAL FORM 149
Forj=1,..,r,
0
Ji =
[0
I
B(yJS(J))
[B(k,1) 0
=
[0
Here k,1 ? kjq(j), k11 + + klq(:) = Pi' and B(kjm) = ZIIkim + Nkzm
form=1,...,q(l).Forl=1,..,t,
0
=
L 0 B(klq(j))
( a1 b,
a1
J1
0
5.20: F(A, A)(T)
= 0 K,
K,
[i-i(k11) 0 1
K,=I
[0 H(klq(I))j
150 CANONICAL FORMS OF MATR1CES
1(6, b)(T)
= —
1(fl, fl)(TC) =
(i 0)
(i 0
k%0 —i
is the Jordan canonical form of Tc with J1 = (i) and Ji = (— i). The real and
imaginary parts of i are 0 and 1. Therefore,
D=(?
= + i21 for p1 = (1, 0), and = (0, —1). Thus, A = A1} is basis for
R2 and JT(A, A)(T) = D is a real Jordan form of T. El
[14 —3 —91
A=I 15 —3
[13 —2 —9j
THE REALJORDAN CANONICAL FORM 151
[2 0
0 —ij
0 01
1°: 0 ii
[o:—i oj
We note in passing that the analog of Theorem 4.19 is true for real Jordan
canonical forms. Let A, BE Then A and B are similar if and only if A
and B have the same real Jordan canonical form, that is, there is an ordering of
the eigenvalues of A (and B) such that the resulting real Jordan canonical forms
are the same. We leave this remark as an exercise at the end of this section. Since
a real Jordan canonical form of T (or A) is unique up to similarity, authors often
refer to "the" real Jordan canonical form of T (or A).
In the remainder of this section, we discuss one of the more important
applications of the real Jordan canonical form, that is, solving systems of linear
differential equations. Suppose I is some open interval containing 0 in It Let
x1(t),..., C'(I). We are interesting in solving a system of linear differential
equations of the following type:
dx1
5.23:
= + +
—a-
Now the solution procedure in 5.24 is to replace A with the simplest matrix
similar to A that we can find. Suppose J = PAP' for some invertible matrix
Pe Set y = Px. Then y' = Px', and y(O) = PC. Also,
Jy = PAP '(Px) = PAx = Px' = y'. Thus, to find a solution to 5.24 we need
only solve the following equation:
[B, 0
5.26: J= I
[0 BK
[D 0
0
5.27: . or
[-0 12 D
0 1 c
(ab
a
Y2 = (ye, + a,..., ÷J, etc. Thus, to find a solution to equation 5.25, it suffices
to know how to solve equations of the following two types:
c 0 0 x1
I
and
x'1 D 0 0 x1
'2 0
5.29: . =0 ' withx1(0)=c1, i=1,...,2n,
6 12 D
and D=(
Before proceeding with a solution to these two types of equations, we need to
recall a few facts about exponentials of matrices. If A e Mn n(R), then is the
n x n matrix defined by the following eqpation:
5.30: eA
= k=O
The reader can easily argue that the partial sums = = of the series
in 5.30 converge to a well-defined n x n matrix we denote by eA. We shall need
the following facts:
A=( a b
a
then
a( cosb sinb
eA e\\.b cosb
(e) If c e 92R(A), then ec e
(f) d(et"9/dt = AetA.
Proof? All six of these assertions are easy computations, which we leave to the
exercises. El
Theorem 5.3 1(f) provides us with a unique solution to equation 5.24, namely
x= For, x' = d(etAC)/dt = AetAC = Ax, and x(0) = = C. The fact
that etAC is the only solution to 5.24 is a simple computation. We want to see
what form this solution takes in our two special cases 5.28 and 5.29.
154 CANONICAL FORMS OF MATRICES
0 0 0
I
and N=
1 c 0 1 0
0T
1 0 0 o
00
00
(tN)k
5.32: etN
= k! = r2 10
(n—2)! (n—3)! (n — 4)!
tn—1 tn—2
t 1
(n—I)! (n—2)! (n — 3)!
5.25:
J2
0
00'
0
00
0 z1 z'1
The equations in 5.35 are solved in the same manner as equation 5.28. Thus,
tk
5.36: (fl— p1 V
j—k' 3 — i,..., n
— L. ,
k=O IL
THE REAL JORDAN CANONICAL FORM 155
Now recall e1tt = eat _bti = eat(cos bt — i sin bt). Substituting this expression into
equation 5.36 and letting = C2j -' + gives us our final solution:
5.37:
j—i &
= eat [c2(J_k)_1 cos bt + c2(J_k) sin bt]
j—t tk
= eat [C2(J_k) cos bt — c2(J_k)_I sin bt], j = 1,..., n
k=O
Theorem 5.38: Let Ae 11(R), and let x(t) be a solution to the differential
equation Ax = x'. Then each coordinate x of
of the form tkeat cos bt, tleat sin bt, where a + bi runs through the
eigenvalues of A. El
5.40:
= 2x1 + x2 + x3
= x2 + x3
= —x2 + x3
= x1 + x2 + 2x4 with x(O) = (c1; c2, c3, c4)t
2 1 1 0
0
A—°
_0 —1
1 1
1 0
1 1 0 2
The first order of business is to find the real Jordan canonical form of A and the
matrix P for which PAP' = J.
The characteristic polynomial of A is given by
2000
1200
5.42
0 0 0
0 0 0
2 0 0 0
2 0 0
543
0
0 1
5.44:
0—2 0 0 1—i 1 1 0
(A—2) 2 0 0—2 0
and A—z1= 0 —i 1 0
= 0 2 0 0 0 —1 —i 0
0 0 2 0 1 1 01—i
Equation 5.44 readily implies {(l, 0, 0, Ø)t, (0, 0, 0, 1)t} is a basis of ker(A — 2)2,
and {(1, i, — 1, —i)} is a basis of ker(A — z1)) Since (1, i, — 1, —i) =
(1, 0, — 1, 0) + i(0, 1, 0, — 1), we conclude that
1 0 1 0
A=
0 1 0 —1
is a basis for M4 4(R) giving the real canonical form J of A. It now follows from
equation 3.36 that
1 0 1 0 1 0 1 0
5.45:P= 0 1 0 1
and P
-' 0 0 0 1
0 0 —1 0 = 0 0 —1 0
0 1 0 0 0 1 0—1
Our solutions to equations 5.28 and 5.29 imply that the system Jy = y' with
EXERCISES FOR SECTION 5 157
[2 0 01
PAr'=lO 0 ii
[o —1 oj
0 0 0 —8
1 0 0 16
A—
— 0 1 0 —14
0 0 1 6
1 —2 —1 1
2 —3 0
A—
— 0 3 2 0
—1 —1 2 1
= —8x4
= x1 + 16x4
= x2 — 14x4
= x3 + 6x4
= 2x2 — 3x3
= 3x2 — 2x3
= x1 — 2x2 + 2x4
= 2x1
= x1 — 4x2 + x3 + 2x4
THE RATIONAL CANONICAL FORM 159
76 —3 —2
4 —1 —2
10 —5 —3
Inthis section, we continue our theme from the last section. T E HomF(V, V), and
we want to discuss what canonical form may be available for T. We have seen in
Section 5 that if F = R, then the Jordan canonical form for Tc can be used to
construct a real Jordan canonical form for T. For a general field F and its
algebraic closure F, no such special relations as those used in Section 5 exist.
Thus, the Jordan canonical form of in does not give us any
particular form for T in
In this section, F is an arbitrary field, and we stay in JF). We work with
the minimal polynomial of T and construct a canonical form for T based on mT.
We shall assume as always that V is a finite-dimensional vector space over F
with dimFV = n. Let T e V). We shall factor the minimal polynomial
mT(X) of T as in equation 5.2. Then the primary decomposition theorem implies
000 0 c0
100 0 c1
1 0 0 c2
64: r(A,A)(t)=
o o a a
o o 0 1 cm_i
mt(X)=Xm_cm_iXm_i — —Co
000 0 c0
100 0 c1
010 9
o Ô 6 6
0 0 0 1 cm_i
Thus, the matrix of 1' appearing in equation 6.4 is just the companion matrix
of the minimal polynomial of t. We can restate Lemma 6.3 as follows:
Theorem 6.9: Let T E HomF(V, V), and suppose mT(X) = q(X)e, where q is a
(monic) irreducible polynomial over F. Then V = Z1 ® e where each Z1
is a T-cyclic subspace of V.
We get two important facts from equation 6.11. First, mt(X)I mT(X). Second, if
W is a T-cyclic subspace of V generated by a vector and q(X)' is the minimal
polynomial of T on W, then q(X)' must be a multiple of the minimal polynomial
of I on the 1-cyclic subspace of V/Z1 generated by fi + Z1.
Since m1{X) I mT(X), mi<X) = q(X)e1 with e1 C e. Also, dim Z1 1 implies
dim{V/Z1} <n. Hence, we may apply our induction hypothesis to 1 on V/Z1
and conclude that V/Z1 = 22 ®... Each is a 1-cyclic subspace of V/Z1.
The minimal polynomial of I on has the form and we may assume
for i = 2,..., p.
6.12: (a) Each Z1 is isomorphic to
(b) The minimal polynomial of T on Z1 is the same as the minimal
polynomial q(X)e of I on
exists a polynomial h(X)E F[X] such that q(X)eh(X) = q(X)c - cif(x). Clearly,
f(X) = h(X)q(X)ei. Set e Z1. Then q(T)e(; + = q(T)ei(x1) +
= — h(T)(oc1)
q(T)e(/11) = f(T)(cz1) — q(T)eh(T)(x1) = f(T)(a1)
— f(T)oc1 = 0.
Now let Z1 be the T-cyclic subspace of V generated by + Since
+ + Z1 = + Z1 generates the 1-cyclic subspace Z3 of V/Z1, and the
minimal polynomial oft on is q(X)e, our remarks after equation 6.11 imply
the minimal polynomial of T on is a multiple of q(X)e. But q(T)%x3 + = 0.
Thus, q(X)e is the minimal polynomial of T on Z1. It now follows from Lemma
6.3 that Z1 and Z3 have the same dimension e1d. Since the natural map
y —+ y+ Z1 induces a surjective map from Z1 to Z1, we conclude from Theorem
3.33 of Chapter I that the natural map y —' y + Z1 is an isomorphism of Z1 onto
We have now proven (a) and (b) of 6.12.
Let us denote the natural map from V to V/Z1 by ic. Thus, m(y) = y + Z1.
We have seen in the previous paragraph that ic restricted to each Z1, i = 2,
., p, is an isomorphism of onto Z1. This fact easily implies
Z1 + + = Z1 E13" $ To see this, we need only show that if
Yi + + = 0 with then Yi = = = 0 (Theorem 4.16 of Chapter
I). If then in
We should point out here that Theorem 6.9 gives us a different proof of
Theorem 3.21. If T is nilpotent, then mT(X) = Xk. The companion matrix of Xk is
just the matrix Nk defined in 3.20.
We can now prove our main result in this section.
(b) = 5(q1).
(c) There exists a basis A= then
[R1 0
[(A, A)(T) = I
[0 Rn
164 CANONICAL FORMS OF MATRICES
where
[C(qpi) 0
R1 =
[0
foralli=1,...,r.
Proof We have virtually proved everything here already. The primary decom-
position theorem implies V = V1 $$ with = Theorem
2.43 implies the minimal polynomial of T restricted to V1 is given by q1(X)m.
Hence by Theorem 6.9, each V1 = Z11 $ where is a T-cyclic
subspace of V. We had seen in the proof of 6.9 that the can be chosen such
that the restriction of T to Z11 has minimal polynomial q1(X)m', and the
restriction of T to the remaining has minimal polynomial q1(X)eui where
m1 = e1, e12 ;3 Thus, we have established (a). (b) follows from
Lemma 6.3. Lemma 6.3 also implies there exists a basis of such that
= Here denotes the restriction of T to (c) now
follows from equation 2.42. E
The matrix constructed in 6.13(c) is called a rational canonical form of T. As
we shall soon see, it is unique up to a permutation of the R1. Let us consider
some examples.
F(Ô, ÔXT)
= C 1)
cT(X) = X2 + 1, which is irreducible in C[X]. Therefore, mT(X) = X2 + 1. The
rational canonical form of T is just the companion matrix
c(x2+1)=(1
[—1 7 01
A1 0 2 OIeM3x3(Q)
[o 3 —ij
THE RATIONAL CANONICAL FORM 165
A=[1
-i -!
A simple calculation shows cA(X) = (X2 + 2)(X2 + 1). Thus, the rational canon-
ical form R of A is given by
Before proving Theorem 6.17, we note that Theorem 2.43 implies that any
decomposition of V = U1 e ® UN into T-cyclic subspaces such that the
restriction of T to U1 has minimal polynomial a multiple of some must involve
all factors q1,..., of mT. Thus, for every i = 1,..., r there must exist some
j = 1,..., N such that the minimal polynomial of T on is a power of q1.
Hence, Theorem 6.17 is the natural sort of uniqueness statement one would
expect.
6.19: dim{q(T)fm(V)}
= E —
p(i)
® ®
i=1 j=1
6.22: Let A e
Definition For each i = 1,..., n, let g1 be a greatest
common divisor of all i-rowed minors of — A. The polynomials g1,
.., are called the invariant factors of — A (or the invariant
factors of A).
Theorem 6.23: Let A, BE Then A and B are similar if and only if the
invariant factors. of XI — A and XI — B are the same.
Proof? The statement that the invariant factors are the same is of course up to
units, i.e. nonzero constants in F. If A and B are similar, then so are XI — A and
XI — B. It then easily follows that the invariant factors of XI — A and XI — B
are the same. Let us suppose A and B have the same set of invariant factors.
In we can perform the same elementary row (and column)
operations the reader is familiar with in We can interchange two rows
(or columns) of a matrix C e We can multiply one row (or column)
of C by a polynomial h(X) and add the result to another row (or column). Both
of these operations are performed by multiplying C on the left (or right) by
suitable invertible matrices in We can also multiply a row (or
column) of C by a nonzero constant from F.
Now by applying suitable row and column operations to XI — A, it is not
difficult to see that XI — A is equivalent to a diagonal matrix of the form
- g1). Hence, there exist invertible matrices
R, SE such that R(XI — A)S = .., We ask the
reader to provide a proof of this fact in the exercises at the end of this section.
Being equivalent is clearly an equivalence relation on Thus, if
XI — A and XI — B have the same invariant factors, then XI — A and XI — B
are equivalent. Hence, there exist invertible matrices P, Q e such
that P(XI — A)Q = XI — B.
We claim there exist invertible matrices p, q E such that
p(XI — A)q = XI — B. To see this, we first note that P and Q can be viewed as
polynomials in X with coefficients in the algebra We can then divide
XI — B into P and Q (in a process entirely analogous to 1.2) and write
(1) Find an invertible matrix P EM3 3(0) such that PAP' is the rational
canonical form of A when A is the matrix given in Example 6.15.
(2) Find a P EM4 4(0) such that PAP' = R for the matrix A in Example
6.16.
(3) In the proof of Theorem 6.17, we claimed that dim{q(T)fm(WJ} =
— for i m. Give a proof of this statement.
(4) Prove Theorem 6.21.
(5) Find the rational canonical form R of
EXERCISES FOR SECTION 6 169
(sin U —cos-O
sinO
(9) Show that the rational canonical form of a diagonal matrix is itself.
(10) A matrix Ae is said to be indecomposable if A is not similar to any
matrix of the form diag(A1, A2) for some smaller square matrices A1 and
A2. A is nonderogatory if m4X) = cA(X). Show that A is indecomposable if
and only if A is nonderogatory and mA(X) = q(X)c with q(X) irreducible in
F[X].
(11) Let T e t(V), and let T* e t(V*) denote the dual of T. Show that
cT(X) = cr(X) and mT(X) = mT4X).
(12) Let T e 1(V), and suppose dim(V) = 2. Show that V is T-cyclic or T =
for some xeF.
(13) Let A be the companion matrix of g(X) = Xm — cm_ 1Xtm_1 — — C0.
Show directly that cA(X) = g(X).
(14) Let p1(X),..., pjX) be a set of monic, primary polynomials, all of degree at
least one in F[X]. Show there exists a matrix A whose nontrivial
elementary divisors are p p is a
power of an irreducible polynomial].
(15) Let
[1 3 31
A=I 3 1
31
[—3 —3
'k
170 CANONICAL FORMS OF MATRICES
(18) Let T e t(V). Show that V is T-cyclic if and only if every Se t(V) that
Commutes with T is a polynomial in T.
(19) An endomorphism T e 1(V) is said to be semisimple if every T-invariant
subspace of V has a T-invariant complement. If mT(X) is irreducible in
F[X], prove that T is semisimple.
(20) In the proof of Corollary 6.24, we use the fact that if g = g.c.d.(f1,. .., in
F[X], then g = g.c.d.(f1, ..., in K[X]. Give a proof of this fact.
(21) In theorem 6.23, argue that XI — A is equivalent to . . ,gj.
Chapter IV
In this chapter and the next, we shall take a brief look at some of the more
important functions which are often present on a vector space V. A great deal of
what we have to say in the present chapter can be done over any suitably
ordered field F. However, we shall simplify our discussion arid assume
throughout that F = R, the field of real numbers. Hence, V will denote a real
vector space in this chapter. We do not assume V is finite dimensional over
Let us begin with the definition of a norm on V.
Definition 1.1: A norm on V is a function f: V —÷ 01 such that
(a) f(cx)>OiftxeV—{O}.
(b) f(xtx) = IxIf@) for all txeV and xell.
(c) f(cx + fi) f(cx) + f(fJ) for all cx, /1eV.
In Definition 1.1, the notation lxi means the absolute value of the real number
x. In previous portions of this text, we have used the notation iAi to denote the
cardinality of the set A. The use of the symbol will always be clear from the
context and will cause no confusion in the sequel. Note that 1.1(b) implies
f(O) = 0.
The most familiar example of a norm on a real vector space V is V = R, and
f(cx) = cxi. Weshall give more interesting examples in a moment. 1ff is a norm on
V, then we shall adopt the standard notation of the subject matter and write
f(tx) = icxli for all cx eV. Thus, the symbol 1111 indicates a real valued function on
V that satisfies (a), (b), and (c) of 1.1.
171
172 NORMED LINEAR VECTOR SPACES
Definition 1.2: A normed linear vector space is a real vector space V together
with some fixed norm III: V —. It
Obviously a given vector space V can be viewed in different ways as a normed
linear vector space by specifying different norms on V. Thus, a normed linear
vector space is actually an ordered pair (V, 1) consisting of a real vector space 1
V and a real valued function liii: V —÷ 11 satisfying the axioms of 1.1. When it is
not important to specify the exact nature of we shall drop this part of the
notation and simply refer to V itself as a normed linear space. Let us consider
some nontrivial examples.
Example 1.3: Let n be a positive integer, and set V = R". V has at least three
important norms whose definitions are given by the following equations:
inequality in Chapter V]. The norm liii given in 1.4(b) is just the usual Euclidean
norm on I?. The norm 1 given in 1.4(c) is called the unVorm norm on I". It is
1
the easiest of the three norms to use when making computations. Note that
when n = 1, all three of these norms reduce to the absolute value on R. E
Example 13: Let V = C([a, b]) denote the set of continuous, real valued
functions on a closed interval [a, bJ. V also has at least three important norms
that are used in analysis:
1.6: (a) 11f11 =
(b) If II =
(c) = b]}
Once again the reader can check that 1.6 defines norms on C([a, b]). The norm
is usually called the uniform norm on C([a, b]). fl
Example 1.7: Let V = It A typical vector in V is an infinite sequence
1
Since any two vectors in V sit in for some n sufficiently large, it is clear that
the equations in 1.8 define norms on V. C
Example 1.9: Let V = {(x1, x2,. . . x? < oo}. We had seen in
Exercise 6, Section 1 of Chapter I that V is a subspace of We can define a
norm on V as follows:
1/2
X2,.. . )II
=
V with the norm defined in equation 1.10 is a well-known Hilbert space, usually
denoted z2. fl
All these definitions depend on the particular norm being used in V. The
function d: V x V —÷ l1 defined in 1.11(a) is called the distance function (relative
to the norm iii). The reader can readily verify that d satisfies the following
properties:
The set Br(cc) introduced in 1.11(b) is called the ball of radius r around cc. Its
exact shape, of course, depends on the particular norm being used.
Example 1.13: Let V = and consider the three norms given in equation 1.4.
If cc = 0 and r = 1, then the ball B1(O) is the following set:
1)
(-1, 1):::; 1)
(—1 0) 0)
..:.zfz.
(—1, —1) —1)
The reader can easily check that Br(cc) is an open set in V. In fact, it is clear
from the definitions that a set A in V is open if and only if for every cc e A there
exists an r > 0 such that Br(tX) A. The collection of subsets 0/1 = {A A is open
in V} forms a topology on V. This means that q5 and V e 0/1, and finite
intersections and arbitrary unions of sets from 0/j are again sets in 0/1.
We can also introduce the familiar concepts of limits, continuity, and so on by
using the distance function d. In what follows, it is assumed that V and W are
normed linear vector spaces. We shall use the same symbol II to denote the 1
norm in both spaces V and W. It will always be clear from the context when the
symbol liii is being used to represent the norm on V and when it is being used to
represent the norm on W.
Definition 1.14: Let V and W be normed linear vector spaces and suppose A is a
subset of V. Let f: A —÷ W be a function and suppose cc e V.
(a) = fi if for every r > 0 there exists an s > 0 such that for every
<r.
(b) If cc e A, and = f(cc), then we say f is continuous at cc.
(c) f is continuous on A if f is continuous at every point of A.
(d) f is Lipschitz on A if there exists a positive constant c such that
— f(n)1I — for all
1.15: (a) f(5) = fi if for every open set U in W containing fi, there exists an
open set U' In V containing cc such that f(U') c U.
BASIC DEFINITIONS AND EXAMPLES 175
Definitions 1.14(a)—1.14(c) describe ideas that are familiar from the calculus.
The notion of a function being Lipschitz is more peculiar to analysis coming
from normed linear spaces. Note that any f: A -÷ W that is Lipschitz on A is
certainly continuous on A. Our most important example of a Lipschitz function
is the norm itself.
Lemma 1.16: Let (V, III) be any normed linear vector space. Then the norm Ill
is a Lipschitz function from (V, liii) to (R, I). I
Since Lipschitz functions are continuous, we conclude from Lemma 1.16 that
the norm II his a continuous, real valued function on V. We shall use this fact
often in the sequel.
Now suppose T: V —÷ W is a linear transformation. It follows easily from the
definition that T is Lipschitz on V if and only if there exists a c > 0 such that
hi dl for all e V. Linear transformations that are Lipschitz on V are
1
very important in analysis and algebra alike. These types of transformations are
called bounded linear operators. Let us formally introduce the notation we shall
use for the set of bounded linear operators.
Definition 1.17: A bounded linear operator T: V -÷ W is a linear transformation
that is Lipschitz on V. The set of all bounded linear operators from V to W will
be denoted êB(V, W).
Thus, T e GB(V, W) if and only if T e W), and there exists a positive
constant c such that II for all e V. Let us consider a few examples
before continuing.
Example 1.18: Let V = I? with n 2, and consider the norm Ill1 given in
Example 1.3. Let and denote the canonical injections and projections of
Let l1 have the standard norm Ix II = lxi. Then e Homk(R, V). Since
= hh(O,...,x,...,0)ll1 = lxi = hlxlh,eachO1isaboundedlinearoperator.
1161(x)111
Therefore, e lfl.
Each projection ; is a linear transformation from I?' to It If (x1, . . , xj e .
The reader can easily verify that and are also bounded linear operators
with respect to the Euclidean norm 1111 and the uniform norm 1111 on
176 NORMED LINEAR VECTOR SPACES
Example 1.19: Let V = C([a, bJ) with norm Fl given in equation 1.6. For
every ft V, set T(f) = f(t) dt. Then T e HomR(V, Ilk). As usual, let the norm on R
be the absolute value I. Set m = 1Ff II = sup{If(t)I t e [a, bJ}. Then IT(f)I =
I
Example 1.21: Set V = W = Ilk, and again let T be the identity map from
V to W. Norm V with liii and W with liii! (notation as in 1.8). If T were a
bounded linear operator, then there would exist a C > 0 such that Ihx C
for all txeV. For = (1,..., 1, 0. . .), this inequality implies n C c. This is clearly
impossible for all n. El
The reader will note that the last two examples were both infinite dimen-
sional. If both V and W are finite dimensional over Ilk, then every linear
transformation from V to W is a bounded linear operator. Thus,
W) = Hom(V, W) whenever dim(V), dim(W) c cc. This fact is not obvious.
We shall prove it in Section 3 of this chapter.
Now suppose V and W are arbitrary normed linear vector spaces, and let
T e Hom(V, W). The relation between being bounded and being continuous is
easily stated.
Theorem 1.22: Let V and W be normed linear vector spaces and suppose
T E Hom(V, W). Then the following are equivalent:
Proof Lipschitz maps are continuous. Thus, the implications (c) (b) (a) are
all obvious. Hence, it suffices to show (a) (c). Suppose T is continuous at cc e V.
Then there exists an s>0 such that — ccli — T(cc)lI < 1.
BASIC DEFINITIONS AND EXAMPLES 177
Now any vector /1eV can be written in the form /3 = — a (set = /1 + a).
Thus, II /311 <s IIT(P)ll <1. If /3 is any nonzero vector in V, then II 0, and
llsIl/211 fill II = s/2 c s. Therefore, IIT(sfi/211 P11)11 c 1. This last inequality is
equivalent to IIT(fi)ll <(2/s)ll fill. We conclude that T is bounded. El
Definition 1.23: Let Te W). Then lIT II = inf{clc is a bound for T}.
There are a couple of alternative definitions for the norm of T that are worth
recording here.
The proofs of (a) and (b) in 1.24 are straightforward. We leave them for
exercises at the end of this section. Let us consider an example.
Proof We have already noted that W) is a vector space under the usual
operations xS + yT in Hom(V, W). It remains to verify that axioms (a), (b), and
(c) of Definition 1.1 are satisfied.
(a) Let W). Lemma 1.24(b) implies 11Th 0. Suppose 11Th = 0. Then
1.24(a) implies 1IT(a)ll = 0 for all a e V. But then T(a) =0 for all a e V and,
consequently, T = 0.
(b) Let T e GJ(V, W) and x e R. Using 1.24(b), we have
(1) Show that equations 1.4(a) and 1.4(c) and 1.6(a) and 1.6(c) define norms on
R" and C([a, b]), respectively.
(2) Let W denote a real vector space. A function f: W -÷ R is called a seminorm
on W if f satisfies the following properties:
(a) f(a)?OforallaeW.
(b) f(xa) = lxlf(a) for all xeR and aeW.
(c) all a,fleW.
EXERCISES FOR SECTION 1 179
(10) Give a detailed proof of the assertion that 1.15(a) and 1.15(b) are equivalent
to 1.14(a) and 1.14(c), respectively.
(11) Show that and it1 are bounded linear operators when is replaced by
either III or III (notation as in 1.4) in
(12) Let Show for all
e V, and consider the map W) -÷ W given by EéT) =
Show that is a bounded linear operator.
(14) Let e V. Show that there exists an fe R) such that II = 1 and
=
(15) Use Exercise 14 to show that = in Exercise 13.
(16) If define IAII 1,...,n}. Show
that this equation defines a norm on for which PABI! DAD IBIP
for all matrices A and B.
180 NORM ED LINEAR VECTOR SPACES
(19) Prove that every ball Br(cx) in V is a convex set. (A set S in V is convex if /3,
+ (1 — x)c5eS for all xe[0, 1].)
(20) Consider (W, and let cx = (x1,..., xj e Define a map T: -÷
by T(y1,..., = Show that l1), and compute liii.
(21) Use Exercise 9 to show that the sum of any two bounded sets is bounded.
(22) Formulate the appropriate notion of a function f: V -+ W being Lipschitz at
a point cx e V. Consider the function f: R —* R given by f(x) = xft'2. Show
that f is Lipschitz at 1, but is not Lipschitz at 0.
Two normed linear vector spaces V and W are said to be norm isomorphic if
there exists an isomorphism T: V W such that T and T1 are bounded linear
operators, that is, T e V). For example, we have seen in
Exercise 17 of Section 1 that l?1) and are norm isomorphic when
W is given the uniform norm If two normed linear vector spaces are norm
hi
isomorphic, they are for all practical purposes identical. Thus, in the sequel, we
shall identify spaces that are norm isomorphic whenever it is convenient to do
so.
For two different norms on the same space V, we have the following
important definition:
Definition 2.1: Two norms ii ii and on the same real vector space V are said
II
to be equivalent if there exist positive constants a and b such that ihcxhh ahlcxhh',
and jlcxhh' c blhcxhh for all cxeV.
Thus, two norms and Ifi' on V are equivalent if the identity map is a norm
isomorphism from (V, liii) to (V, h'). We have already seen in Example 1.20 or
1.21 that two norms need not be equivalent. We should also point out the trivial
case of R itself. Definition 1.1(b) implies that any norm on R is equivalent to the
absolute value I I. We shall prove in Section 3 that if dim(V) c ccc, then any two
norms on V are equivalent.
The equivalence of norms is an important idea in analysis and topology alike.
If two norms are equivalent on V, then they generate the same topology. By this
we mean the open sets relative to the two norms are the same collection of sets.
More precisely, we have the following theorem:
PRODUCT NORMS AND EQUIVALENCE 181
open set in (V, Ifl). This completes the proof of the theorem. fl
Theorem 2.2 implies that all topological notions remain the same when
switching between equivalent norms. This being the case, one should try to
choose an equivalent norm in which the given problem becomes easier
computationally. For example, in II? the Euclidean norm is equivalent to the
uniform norm (see Exercise 1 at the end of this section). It often happens in
specific problems that is easier to handle than the Euclidean norm when
doing arithmetic. Thus we often switch from to Ill when dealing with
problems in W1. Since these norms are equivalent, there is no loss in generality
from a topological point of view in making this switch.
An immediate corollary to Theorem 2.2 is the following remark, whose proof
we leave to the reader.
Corollary 2.3: Let V and W be normed linear vector spaces. Then the set
iM(V, W) in HomR(V, W) remains the same if either the norm in V or W is
replaced by an equivalent norm. Changing norms in V and W to equivalent
norms results in equivalent uniform norms on iJ(V, W). El
V x W. For example, the norm ii given in equation 1.4(a) is clearly the sum
norm on
The sum norm on V x W has the property that the canonical injections
and projections it1 associated with V x W are all bounded linear operators.
Thus,
2.4: V V
x
182 NORMED LINEAR VECTOR SPACES
Definition 2.5: Let V and W be normed linear vector spaces. A norm on the
product V x W is called a product norm if the canonical maps in Diagram 2.4
are all bounded linear operators.
2.6: (1)
(2)
forallczeV fleW
(3) P)li
(4) 1n2(x, /3)112 dIl(x,
Theorem 2.7: Let (V, II 1) and (W, 1112) be normed linear vector spaces. Then
any product norm on V x W is equivalent to the sum norm
lKcx, = lalli +
Proof? Let II be any product norm on V x W. Since
1
and it1 are bounded,
there exists constants a, b, c, and d satisfying the inequalities in 2.6. Let
e = max{a, b}. Then fl)li = lKcx, 0) + (0, + 11(0, lOll =
+ 1102(13)11 + bO /3l12 c eflhx01 + = ell(x, Thus, Il@'
We can generalize Theorem 2.7 to n factors in the obvious way. We state the
result and leave the proof for the reader.
norm IL given in (a) below. The formulas in (b) and (c) also define product
norms on V1 x x
Lemma 2.9: Let V be a normed linear vector space. The operation of addition is
a bounded linear operator from V x V to V. The operation of scalar multiplica-
tion is a continuous function from l1 x V to V. El
Definition 2.10: Let (V, be a normed linear vector space with subspaces
V1,..., Vi,. We say that V is a norm direct sum of the V1 if
norm isomorphism.
Theorem 2.11: Suppose V is a normed linear vector space and an internal direct
sum V = V1 e of subspaces V1 ,.. .,
V onto V1. Then V is a norm direct sum of the V1 if and only if each P1 is a
bounded linear operator.
Proof From our discussion above, we know V is a norm direct sum of the V1 if
and only if 5 is a bounded linear operator. 51 is bounded if there exists a
c > 0 such that cIlxl! for all xeV. This last inequality means
Now suppose V is the norm direct sum of the V1. Let oe V and write
with for j=l,...,n. For any fixed i, we have
1P1(x)jl = jx1fl = Thus, P1 is a bounded linear
operator.
Conversely, suppose each P1 is bounded. Then there exists a k1 > 0 such
that 11P1@)tl for all xeV. Let c=Er1k1. Then
>J1_i = 11P1@)ll cljxjl. Thus, 51 is bounded. Consequently, V is a
norm direct sum of the V1. U
(4) Let V and W denote normed linear vector spaces, and consider the
projection it1: V X W -÷ V. Show that relative to any product norm on
V X W, it1 is an open map [i.e., U open in V x W n1(U) open in V].
(5) Prove Theorem 2.8.
(6) Let V be a normed linear vector space and define f: R x V V by
f(x, tx) = Show that f is continuous relative to any product norm on
(7) Let V1, i = 2, 3, be normed linear vector spaces. Show that the map
1,
V2) x V3) -÷ eJ(V1, V3) given by X(S, T) = TS is a con-
tinuous map. Here the norm on is the uniform norm given
in equation 1.23, and the norm on the product is any product norm.
(8) Give an example of a normed linear space V and two subspaces V1 and V2
of V such that V = V1 $ V2, but V is not the norm direct sum of V1 and
V2.
(14) Suppose V and W are normed linear vector spaces, and T e W). Let N
be a closed subspace of V contained in ker(T). Let T be the induced map on
V/N given by T(cz + N) = T(x). If we norm V/N as in Exercise 13, show I is
a bounded linear map from V/N to W such that 1111 =
(15) Let V and W be normed linear vector spaces, and let T e W). T is
called an isometry if = for all e V. Examine the bounded maps
in this section and decide which are isometries.
In this section, we shall prove that any two norms on a finite-dimensional vector
space V are equivalent. In order to do this, we need to develop a certain number
of topological facts that are true in any metric space. However, we shall state all
the results we need in the language of normed linear vector spaces. Throughout
this section, V will denote a normed linear vector space with some fixed norm
3.2: —÷ /3 if and only if for every r> 0 there exists an me N such that
k
Proof We first show that + flj —* oc + /3. Let r > 0. Then there exists
natural numbers m1 and m2 such that k m1 lock — cxli <r/2, and
k ) m2 /1k — /311 <r/2. Let m = max{m1, m2}. If k m, then l(ock + /3k)
+ <r/2+r/2=r. Thus,
SEQUENTIAL COMPACTNESS AND THE EQUIVALENCE OF NORMS 187
Lemma 3.4 says that the smallest closed set in V containing A is precisely the
set of all limits from sequences in A. For instance, the closure ofB1(0) in the three
norms given in Example 1.13 are the following sets:
11111 liii
The boundary, (B1(0))3 = B1(0) — B1(0), of B1(O) in the three norms are the
three curves pictured below:
3.6:
ii
Lemma 3.7: Let V and W be normed linear vector spaces, and let A be a subset
of V. Suppose f: A -÷ W is a function. Let cx eA. Then f is continuous at cc if and
only if for every sequence in A converging to cx, —'
Corollary 3.8: Two norms on V are equivalent if and only if the set of
convergent sequences in V relative to one of the norms is precisely the same as
the set of convergent sequences relative to the other norm. 0
and and
a norm on V x W. We first note that the sum norm
= clearly satisfies property (c??'). If ill is a product norm on
+ II
We can now introduce the central ideas of this section. In order to discuss
sequential compactness, we need the notion of a subsequence of }.
If we set f(k) = nk in Definition 3.10, then n1 <n2 <n3 <...,and 13k = aflk
for all k e NJ. For this reason, we often use the notation {aflk} to indicate a
subsequence of We shall also use the notation { /J,j c÷ to indicate that
{ is a subsequence of
There are a few elementary remarks concerning subsequences that we shall
use implicitly throughout the rest of this section. We gather these remarks
together in the following lemma:
Corollary 3.8 implies that if two norms are equivalent, then a set A is
sequentially compact with respect to one norm if and only if A is sequentially
compact with respect to the other norm. Let us consider some simple examples
in R before continuing.
Theorem 3.14: Let A be a subset of a normed linear vector space (V, 1111). If A is
sequentially compact, then A is closed and bounded.
At this point, it may be a good idea to say a few words about a possible
converse of Theorem 3.14. We shall show later on in this section that if
dimR(V) < oo, then the converse of 3.14 is true. For the time being, we merely
note that neither hypothesis, closed nor bounded, alone implies sequential
compactness. 1 is a closed subset (but not bounded) of R, and / is not
sequentially compact. C = { 1/n n e Ni} is a bounded subset (but not closed) of
and C is not sequentially compact.
SEQUENTIAL COMPACTNESS AND THE EQUIVALENCE OF NORMS
Theorem 3.15: Let V and W be normed linear vector spaces, and let A be a
subset of V. Suppose f: A W is a continuous function. If A is a sequentially
compact subset of V, then f(A) is a sequentially compact subset of W.
Proof? Let be a sequence in f(A). Then for every n, there exists an oc,, e A with
= Since A is sequentially compact, has a convergent subsequence
—÷ fleA. Since f is continuous, -. f(fl) by Lemma 3.7. Since fleA,
f(fl) e f(A). Thus, the sequence has a convergent subsequence {f(/Jj} — f(fl)
in f(A). We conclude that f(A) is sequentially compact. El
Proof? Before proving the corollary, let us discuss its meaning. We say f is
bounded on A if there exists a positive number b such that If@)I <b for all a eA.
To say that f assumes a maximum value on A means there exists an e A such
that f(c5) f(a) for all eA. Similarly, f assumes a minimum value on A if there
exists a fleA such that f(5) f(fl) for all c5eA.
Now by Theorem 3.15, f(A) is a sequentially compact subset of Ut Thus, by
Theorem 3.14, is a bounded subset of R. This of course means f is bounded
on A.
Let x = sup f(A) = sup{f(c5) eA}. Let y = inff(A). Since f(A) is a bounded
subset of R, both x and y exist. Theorem 3.14 also implies that f(A) is a closed
subset of R. The reader can easily check that any closed (and bounded) subset of
R contains both its infimum and supremum. In particular, x, y e f(A). If eA
such that f(ct) = x, then clearly f assumes a maximum value x on A at Similarly,
if fleA such that Q/J) = y, then f assumes a minimum value y on A at /3. J
We now turn our attention to R. Since any norm on is equivalent to the
absolute value we can, with no loss in generality, state all our results relative
to We first remind the reader of some familiar definitions from the calculus.
not an upper bound of the set In e Hence, there exists an m such that
xm > x — r. Since is increasing, x — r < xm C x for all n ? m. In
particular, n ? m — <r. Thus,
xj x.
If is decreasing and y = J
ne then a similar proof shows
El
We can combine Lemmas 3.18 and 3.19 into the following important
theorem:
We can generalize Theorem 3.20 to I?' and any product norm as follows:
Proof From Theorem 2.8 and Corollary 3.8, we may assume that is the sum
norm II(x1,..., = dx1I. We proceed by induction on n, the case n = 1
having been proved in 3.20.
Suppose is a bounded sequence in or. Then there exists a constant c > 0
such that Ixk C c for all k. Let us write ak = (xlk,.. ., xflk) for all ke NI, and set
= x
13k
SEQUENTIAL COMPACTNESS AND THE EQUIVALENCE OF NORMS 193
Then ak =03k, xflk). Since liakill = II13k1I1 + IXnkI ? IIfJkIIl, IXnkI, both {/3k} and
{ are bounded sequences in R'1 -' and R, respectively. By our induc-
tion hypothesis, {/3k} contains a convergent subsequence {fJk}. Suppose
{/Jk}
Now consider the corresponding subsequence {xflk.} of {xflk}. Since {xflk} is
bounded, {xflk} is bounded. Thus, Theorem 3.20 implies {xflk} has a convergent
subsequence { }. Recall that this means there exists a strictly increasing
function f: RJ —* {k1, k2, k3, . . } such that
. = for all j e Suppose
—+ y. For eachj e rSJ, set = Then is a subsequence of {/3k.}. At this
point, a diagram of the sequences we have constructed may be helpful.
One important corollary to Theorem 3.21 is the converse of Theorem 3.14 for
product norms on
Theorem 3.23: Let Ill be any product norm on l?1. A set A in (lr, II II) is
sequentially compact if and only if A is closed and bounded.
Set S = {ae if?' = 1}. The reader can easily check that S is a closed and
bounded subset of (I?', 1111 J. In particular, Theorem 3.23 implies that S is a
sequentially compact subset of (R", 11111).
Since II II is a continuous function on I?', certainly II II is a continuous function
on S. We can now apply Corollary 3.16 to the continuous map II ll:S —* l1. We
conclude that II assumes a minimum value m on S. Thus, there exists a y e S
such that [I'll = m, and lall m for all aeS. Note that m >0. For if m C 0,
then liv II C 0. Since liii is a norm on R", we would then conclude that y = 0. This
is impossible since 0 is not in S.
We have now constructed a positive constant m, such that hal ? m for all
aeS. We can rewrite this last inequality as hail mhIaII1 for all aeS. Let
fieif?' — {0}. Then Consequently, Thus,
II fill mhl This last inequality also holds when /3 = 0. Thus, setting
b = 1/m, we have shown hail1 C bilall for all ac R". Since we had previously
argued that hail C aIlaIl1, we conclude that 1111 is equivalent to El
In the rest of this section, we shall develop the important corollaries that
come from Theorem 3.24. We have already mentioned our first corollary.
Notice then that any norm on if?' being equivalent to 11111 is automatically a
product norm. Hence, we can drop the adjective "product" when dealing with
norms on Any norm on I?' is a product norm.
Corollary 3.26: Let V be a finite-dimensional vector space over R. Then any two
norms on V are equivalent.
Corollary 3.27: Let (V, III) be a finite-dimensional, normed linear vector space.
If = n, then (V, III) is norm isomorphic to (IR", 11111).
Proof Suppose = {cz1,.. ., is a basis of V over R. Then we have an
isomorphism 5: V R" given by S((5) = (x1,..., where i = 5.We can
define a new norm fl' on V by the equation 11(511' = II S((5) fly. It is a simple matter
to check that S is now a norm isomorphism between (V, III') and (P.", 1111). By
Corollary 3.26, liii and fl' are equivalent. This means that the identity map
from (V, liii) to (V, III') is a norm isomorphism. Composing these two norm
isomorphisms, we get (V, iii) and (P.", 11111) are norm isomorphic. U
Thus, for finite-dimensional, normed linear vector spaces, the theory is
particularly easy. We can always assume that our space is (P.", II II
up to norm
isomorphism.
Returning to a remark made earlier in this section, we can now prove the
following generalization of Theorem 3.23:
Corollary 3.28: Let (V, III) be a finite-dimensional, normed linear vector space.
Then a subset A of V is sequentially compact if and only if A is closed and
bounded.
Proof (V, liii) is norm isomorphic to (P.", liii 1) for some n. So the result follows
from Theorem 3.23. U
We should point out here that Corollary 3.28 is really a theorem about P.". It
is not true in general. If(V, is an infinite-dimensional, normed linear vector
space, and we set B = {czeVl C 1}, then B is closed and bounded in V.
However, B is never sequentially compact. We ask the reader to provide a proof
of this assertion in Exercise 10 at the end of this section.
We can also use Corollary 3.26 to show that all linear transformations on
finite-dimensional spaces are bounded.
Corollary 3.29: Let V and W denote finite-dimensional, normed linear vector
spaces. Then eJ(V, W) = HomR(V, W).
Proof Let T e HomR(V, W). Since any two norms on V (as well as W) are
equivalent, it suffices to argue that T is bounded with respect to a specific choice
of norms on V and W.
Suppose dim(V) = n and dim(W) = m. Let = .
., z,,} be a basis of V
and /3 = { 13m} a basis of W. Then we have the following commutative
diagram:
T
3.30:
>prn
196 NORMED LINEAR VECTOR SPACES
S((x1,..., xj) = (E
Now let us norm W with the usual sum norm 11111. Then Dali = ilf(a)111 defines
a norm on V. Let us norm with the uniform norm (notation as in
Example 1.3). Then 11/311' = lg(fJ)li is a norm on W. We now have the following
commutative diagram of normed linear vector spaces:
T
3.31: (V, liii) >('W, liii')
p
S
Ii 1k) (Rm, II lI)
It suffices to argue that T is bounded with respect to the norms 1111 and 1111'.
Set For any vector y=(x1,...,
e we have the following inequalities:
3.32:
= jti
The conclusion we can draw from Corollaries 3.27 and 3.29 is that when
dealing with finite-dimensional, normed linear vector spaces V and W (and the
linear transformations between them), we can make the following assumptions
up to equivalence:
(a) V = and W = Rm.
(b) The norms on V and W can be any we choose for computational
convenience.
SEQUENTIAL COMPACTNESS AND THE EQUIVALENCE OF NORMS 197
Our last topic in this section is another corollary that has important
applications in least-squares problems.
Corollary 3.34: Let (V, II) be a normed linear vector space, and suppose W is a
finite-dimensional subspace of V. Then for any vector fi e V, there exists an a e W
such that d(fi, W) = Va — fill.
Example 3.35: Let V = {fe C([O, 1]) f(O) = O}. Clearly, V is a subspace of
C([O, 1]). We norm V with the uniform norm 1111 given in equation 1.6(c). Let
198 NORMED LINEAR VECTOR SPACES
3.36:
(1) If a sequence in V converges to two vectors /3 and /3', show that /3 = /3'.
(2) Show that the sum norm II(a, /3)11, = haIl + 11/3112 satisfies property in
Corollary 3.9.
(3) Let A be a closed and bounded subset of It Show that inf(A) and sup(A) are
elements in A.
(4) Let V and W be normed linear vector spaces, and suppose f: A —p W is a
continuous function on a sequentially compact subset A of V. 1ff is a
bijection from A to f(A), show f': f(A) —÷ A is continuous.
(5) Construct a sequence in [0, 1] such that every ye [0, 1] is the limit of
some subsequence of
(6) If A and B are sequentially compact subsets of a normed linear vector space
V, show A + B is also sequentially compact.
(7) Unlike sequential compactness, the sum of two closed sets in V need not be
closed. Exhibit an example of this fact.
EXERCISES FOR SECTION 3 199
(12) In Exercise 11, suppose we assume dim(V) < CX). Show there exists a vector
/1eV such that 111111 = d(/1, W) = 1.
(13) Suppose V and W are normed linear vector spaces, and let T e W).
Show that ker(T) is a closed subspace of V. Is Im(T) a closed subspace of
W?
(14) Consider the function f: R2 —, R defined as follows: f(x, y) = xy/(x2 + y2) if
(x, y) (0, 0), and f(0, 0) = 0. Use Lemma 3.7 to prove the following
assertions:
(a) For all xc R, f(x, ): R R is continuous at 0.
(b) For all ye R, f( , y): R -. R is continuous at 0.
(c) f is not continuous at (0, 0).
(15) Suppose V and W are normed linear vector spaces, and let A and B be
sequentially compact subsets of V and W, respectively. Prove that A X B is
a sequentially compact subset of V X W relative to any product norm on
VXW.
(16) Let V be a normed linear vector space. A subset A of V is said to be dense in
V if A = V. Give an example of a proper subset of I?' that is dense in
Suppose W is a second normed linear vector space and f and g two
continuous functions from V to W. Suppose f = g on some dense subset of
V. Prove f = g.
(17) Let V and W be normed linear vector spaces. Let f: A —÷ W be a function
from a subset A of V to W. We say f is unjformly continuous on A if
for every r > 0, there exists an s > 0 such that for all cx, /1 e A,
iicx—/1ii <r.
(a) If f is uniformly continuous on A, prove that f is continuous on A.
Show that the converse is false.
(b) If A is sequentially compact, and f continuous on A, prove that f is
uniformly continuous.
200 NORMED LINEAR VECTOR SPACES
4. BANACH SPACES
In this section, we take a brief look at Banach spaces. Our goal is to introduce
the terminology most frequently used in the literature and prove that any
normed linear vector space is contained in some Banach space. As usual, (V, liii)
will denote some normed linear vector space over R. We first remind the read
about a definition familiar from the calculus.
Definition 4.1: A sequence {ocj in V is said to be Cauchy (or is called a Cauchy
sequence) if for every r> 0, there exists an m e 1%J such that
<r.
There are several easy facts about Cauchy sequences that we shall need in the
sequel. We gather these facts together in our first lemma. We leave the proof for
the exercises at the end of this section.
Lemma 4.2: (a) If — then is Cauchy.
(b) Any Cauchy sequence is bounded.
(c) If is Cauchy and contains a subsequence converging to say
then —÷
We note that 4.2(f) is not true in general for continuous functions. See
Exercise 2 at the end of this section. We can now introduce the central definition
of this section.
Again we remind the reader that this definition depends on the particular
norm on V. It is more precise to say (V, liii) is complete. If liii and 1' are two
equivalent norms on V, then clearly a sequence {oçj is Cauchy in the Il-norm if
and only if is Cauchy in the II I'-norm. In particular, (V, Ill) is complete if
and only if (V, liii') is complete.
A complete, normed linear vector space is called a Banach space in honor of
the great analyst Stefan Banach. One of the most important examples of a
Banach space is R itself.
BANACH SPACES 201
Lemma 4.8: Let (V, 1111) be a normed linear vector space, and suppose A is a
subset of V such that A = V. If every Cauchy sequence in A has a limit in V, then
V is complete.
Proof Let {oçj be a Cauchy sequence in V. We must argue {oçj has a limit in V.
Since A = V, each In particular, for each n e FkJ, there exists a such
that <1/n. Then C +
202 NORMED LINEAR VECTOR SPACES
c 1/n + 1/rn + — 2m11. Since {cxn} is Cauchy, we conclude frorn these in-
equalities that { is Cauchy. Our hypotheses now imply that there exists a
fJeV such that But then C +
II — fill implies {ocj —÷ fi.
Thus, V is complete. fl
We can now state our main result.
Theorem 4.9: Let (V, II II) be a norrned linear vector space. Then there exists a
Banach space (V', 1111') and a rnonornorphism 0: V —* V' such that the following
properties are satisfied:
Let us say a few words concerning Theorem 4.9 before giving its proof. A
Banach space (V', liii') satisfying the conditions (a) and (b) in 4.9 is called a
cornpletion of (V, II II). The theorem guarantees that every normed linear vector
space has a completion. The second half of the theorem says that any
completion of V is unique up to norm isomorphisrn. Hence, we may refer to the
completion of (V, III).
A linear transformation between norrned linear vector spaces which preserves
distances is called an isornetry. Thus, 4.9(a) says V is isometrically imbedded in
its completion. 4.9(b) says via 0, V sits in its completion as a dense subset.
Proof of 4.9: Consider the vector space given in Example 1.6 of Chapter 1.
VN is nothing but the set of all sequences in V with addition and scalar
multiplication defined pointwise. Let S = EVN is Cauchy}. If
and { are Cauchy sequences in V, and x, y e then clearly + is a
Cauchy sequence in V. Thus, S is a subspace of VN.
Let eS. The inequality IIocJI — INxmII C — implies that is
a Cauchy sequence in R. Since R is complete, the sequence has a limit in
R. In particular, it makes sense to talk about the lirnit, of the sequence
for any vector eS. We can then define a function p: 5 —÷ R by
p({oçj) = The reader can easily verify that the function p satisfies the
BANACH SPACES 203
following properties:
These inequalities hold for all { eS, and all x e R. In the language of
Exercise 2 of Section 1, p is a seminorm on S. Note that p({oçj) = 0 does not
mean = 0. So, p is not a norm on S.
We now follow the ideas laid out in Exercise 10 of Section 2. Set
N= €5 = 0}. N is precisely those Cauchy sequences in V whose
norms "cxJ have limit zero. By Exercise 10, N is a subspace of 5,
and S/N is a normed linear vector space with norm given by
+ Nil' = p({cxj) = Set V' = S/N. We claim (V', III') is a
Banach space satisfying (a) and (b).
We first define a map 0: V —* V' by the equation 0(cx) = + N, where {cx,j
is the constant sequence cxi, = cx for all n e 1%J. We shall need some special
notation for constant sequences in this proof. We shall let {cx} denote the
constant sequence in V every term of which is cx. Clearly, any constant sequence
{cx} is Cauchy, and, thus, {cx}eS. The map 0 is now given by 0(2) = {cx} + N.
Clearly, 0 is a linear transformation from V to V'.
Suppose 0(2) = 0 for some cx V. Then {cx} + N = N in V'. Thus, {cx} eN. But
then iicxfl = lim flcxfl = 0. Therefore, cx = 0 since is a norm on V. We conclude
1
(V', lii')
4.13: For every €5, and for every r > 0, there exists a vector e V such that
P(0o@7) — c r.
204 NORMED LINEAR VECTOR SPACES
4.14: Every Cauchy sequence in 00(V) has a limit in 5, that is, there
exists a vector {13k} eS such that — = 0.
(2) Give an example in R showing 4.2(f) is not true in general for continuous
functions.
(3) Fill out the details in the proof of Corollary 4.5.
(4) Show that C([0, 1]) is a Banach space with respect to the uniform norm
= 1]}.
(5) Suppose (V1, 1)' ...' (Va, are a finite number of Banach spaces.
Show that the product V1 x x is a Banach space relative to any
product norm.
EXERCISES FOR SECTION 4 205
(6) Suppose V is a normed linear vector space and W is a Banach space. Show
that W) is a Banach space with respect to the uniform norm (see
Definition 1.23).
(7) In the proof of Theorem 4.9, show that assertion 4.13 indeed implies the
closure of 0(V) in V' is all of V'.
(8) Complete the proof of Theorem by showing that the completion
(V', liii')V is unique up to a norm isomorphism satisfying 4.10.
of
(9) Suppose (V, Ill) is a Banach space, and V itself is also an algebra over R.
We say (V, liii) is a Banach algebra if the following two properties are
satisfied:
(i) lcxfihl C 1kV II /311 for all cx, /3eV.
(ii) 11111 = 1.
If V is a Banach space, show that V) is a Banach algebra with
respect to the uniform norm.
(10) Suppose (V, 1111) is a Banach algebra.
(a) IfoceV has < 1, then show 1 — is invertible in V. More precisely,
cx
(12) Show that (C([O, 1]), liii) is not a Banach space. as in 1.6(b)).
(13) Let (V, 1111) be a Banach space. Suppose W is a closed subspace of V. Prove
that (W, Ill) is a Banach space.
(14) Suppose (V, liii) is a normed linear vector space. If V is sequentially
compact, prove that (V, 1) is a Banach space. Is the converse true?
(15) Let (V, II 1) be a normed linear vector space. Let {cxn} be a sequence in V.
We say the infinite series converges (to say /3 e V) if —, /1. Here
{
is the usual sequence of partial sums given by = We say
is absolutely convergent if converges. If (V, III) is a
Banach space, prove that every absolutely convergent series converges.
(16) Prove the converse of Exercise 15: If every absolutely convergent series in V
is convergent, then V is a Banach space.
(17) Use Exercise 16 to show that if N is a closed subspace of a Banach space V,
then V/N is a Banach space. The norm on V/N is given in Exercise 13 of
Section 2.
Chapter V
Conditions (a)—(d) in 1.1 are to hold for all vectors cx, /3, y in V and for all x,
ye R. Note that (a) and (d) imply that (cx, cx> 0 with equality if and only if
cx =0.
206
REAL INNER PRODUCT SPACES 201
Example 1.2: Let V = R'1, and set <cx, /3> = where cx = (x1,...,
and /3 = (y1,..., yj. It is a simple matter to check that < , > satisfies
conditions (a)—(d) in 1.1. We shall refer to this particular inner product as the
standard inner product on fl
Example 1.3: Let V = e r=1 R. Define an inner product on V by setting
<cx, /3> = Here cx = (x1, x2,...) and /3 = (y1, y2,...). Since any vector
in V has only finitely many nonzero components, '( , ) is clearly an inner
product on V. Thus, V is an example of an infinite-dimensional, inner product
space. El
Example 1.4: Let V = C([a, b]). Set <f, g) = f(x)g(x)dx. An easy computation
shows < , > is an inner product on V. J
Let (V, ⟨ , ⟩) be an inner product space. If T: W → V is an injective linear transformation, then we can define an inner product on W by setting ⟨α, β⟩′ = ⟨T(α), T(β)⟩ for all α, β ∈ W. In this way, we can produce many new examples from the examples we already have. A special case of this procedure is that in which W is a subspace of V. If we restrict ⟨ , ⟩ to W × W, then (W, ⟨ , ⟩) becomes an inner product space in its own right. For instance, ℝⁿ is a subspace of V in Example 1.3. When we restrict ⟨ , ⟩ to ℝⁿ × ℝⁿ, we get the standard inner product on ℝⁿ.
Our first general fact about an inner product space (V, ⟨ , ⟩) is that V has a natural normed linear vector space structure that is intimately related to the inner product ⟨ , ⟩. To see this, we need an inequality known as Schwarz's inequality.
Lemma 1.6: Let V be an inner product space. Then |⟨α, β⟩| ≤ ⟨α, α⟩^{1/2}⟨β, β⟩^{1/2} for all α, β ∈ V.
Proof: Fix α and β in V. For each real number t, let p(t) = ⟨α − tβ, α − tβ⟩. Then 1.1 implies that p(t) is a quadratic function on ℝ such that p(t) ≥ 0 for all t ∈ ℝ. It follows that the discriminant, 4⟨α, β⟩² − 4⟨α, α⟩⟨β, β⟩, of p(t) must be negative or zero. Thus, ⟨α, β⟩² ≤ ⟨α, α⟩⟨β, β⟩. Taking square roots gives us the desired conclusion. □
(Σᵢ₌₁ⁿ xᵢyᵢ)² ≤ (Σᵢ₌₁ⁿ xᵢ²)(Σᵢ₌₁ⁿ yᵢ²)
Corollary 1.9: Let (V, ⟨ , ⟩) be an inner product space. Then ‖α‖ = ⟨α, α⟩^{1/2} is a norm on V.
Proof: We must verify that ‖·‖ satisfies the conditions in Definition 1.1 of Chapter IV. Of these conditions, only the triangle inequality, ‖α + β‖ ≤ ‖α‖ + ‖β‖, requires any proof. Fix α and β in V. Using Schwarz's inequality, we have ‖α + β‖² = ⟨α + β, α + β⟩ = ⟨α, α⟩ + 2⟨α, β⟩ + ⟨β, β⟩ ≤ ‖α‖² + 2|⟨α, β⟩| + ‖β‖² ≤ ‖α‖² + 2‖α‖ ‖β‖ + ‖β‖² = (‖α‖ + ‖β‖)². If we now take the square root of both sides of this inequality, we get ‖α + β‖ ≤ ‖α‖ + ‖β‖. □
Corollary 1.9 implies that every inner product space (V, ⟨ , ⟩) is a normed linear vector space via ‖α‖ = ⟨α, α⟩^{1/2}. We shall call the norm given in 1.9 the norm associated with the inner product ⟨ , ⟩. In Example 1.2, for instance, the norm associated with the standard inner product is just the ordinary Euclidean norm given in equation 1.4(b) of Chapter IV. In Example 1.3, the norm associated with ⟨ , ⟩ is the natural extension of the Euclidean norm to ⊕ᵢ₌₁^∞ ℝ [1.8(b) of Chapter IV]. In Example 1.4, the norm associated with the inner product there is given by 1.6(b) of Chapter IV.
Since any inner product space (V, ⟨ , ⟩) is a normed linear vector space with respect to the norm associated with ⟨ , ⟩, we have all the topological machinery from Chapter IV at our disposal. In particular, we can talk about the distance between two vectors, open and closed sets, continuity, limits, completeness, and so on. It will be understood that these notions are all relative to the norm associated with the inner product ⟨ , ⟩. Thus, when we speak of the distance between two vectors α and β, for instance, we mean ‖α − β‖.
When dealing with inner product spaces, we shall always use the symbol ‖·‖ to represent the norm associated with the inner product, that is, ‖α‖ = ⟨α, α⟩^{1/2}. In terms of the associated norm, Schwarz's inequality can be rewritten as follows:

|⟨α, β⟩| ≤ ‖α‖ ‖β‖ for all α, β ∈ V
follow from Corollary 3.9 of Chapter IV. Since {αₙ} converges, we know the sequence {‖αₙ‖} is bounded. Suppose ‖αₙ‖ ≤ c for all n ∈ ℕ. Applying Schwarz's inequality, we have

|⟨αₙ, βₙ⟩ − ⟨α, β⟩| ≤ |⟨αₙ, βₙ⟩ − ⟨αₙ, β⟩| + |⟨αₙ, β⟩ − ⟨α, β⟩|
≤ ‖αₙ‖ ‖βₙ − β‖ + ‖αₙ − α‖ ‖β‖
≤ c‖βₙ − β‖ + ‖αₙ − α‖ ‖β‖
Definition 1.12: A normed linear vector space (V, ‖·‖) is called a pre-Hilbert space if there exists an inner product ⟨ , ⟩ on V such that ‖α‖ = ⟨α, α⟩^{1/2} for all α ∈ V.
Not every normed linear vector space is a pre-Hilbert space. Consider V = ℝⁿ with the sum norm ‖·‖₁. Let δ = {δ₁, ..., δₙ} denote the canonical basis of ℝⁿ, and let S denote the boundary of the unit ball in (V, ‖·‖₁). Then S = {α = (x₁, ..., xₙ) | ‖α‖₁ = 1} = {α | Σᵢ₌₁ⁿ |xᵢ| = 1}. On the other hand, if ‖·‖₁ were the norm associated with some inner product ⟨ , ⟩ on ℝⁿ, then ‖α‖₁² = ⟨α, α⟩ = Σᵢ,ⱼ xᵢxⱼ⟨δᵢ, δⱼ⟩. Thus, S would be the set of zeros in ℝⁿ of the quadratic polynomial Σᵢ,ⱼ ⟨δᵢ, δⱼ⟩XᵢXⱼ − 1. This is clearly impossible. S has too many corners to be the set of zeros of any polynomial. Thus, the sum norm ‖·‖₁ cannot be the associated norm of any inner product on ℝⁿ. In particular, the normed linear vector space (V, ‖·‖₁) is not a pre-Hilbert space.
Definition 1.13: A pre-Hilbert space (V, ‖·‖) is called a Hilbert space if V is complete with respect to ‖·‖.
Thus, a Hilbert space is a Banach space whose norm is given by an inner product. For example, (ℝⁿ, ‖·‖) is a Hilbert space. More generally, Corollaries 3.27 and 4.5 of Chapter IV imply that any finite-dimensional pre-Hilbert space is in fact a Hilbert space. For an example of an infinite-dimensional Hilbert space, we return to Example 1.5. We ask the reader to confirm that ℓ² is a Hilbert space, infinite dimensional over ℝ. (See Exercise 2 at the end of this section.)
An important result about pre-Hilbert spaces is the analog of Theorem 4.9 of Chapter IV.
Theorem 1.14: Let (V, ‖·‖) be a pre-Hilbert space. Then there exists a Hilbert space (V′, ‖·‖′) and an isometry θ: V → V′ such that the closure of θ(V) in V′ is all of V′. Furthermore, if (V″, ‖·‖″) is a second Hilbert space admitting an isometry ψ: V → V″ such that ψ(V) is dense in V″, then there exists a norm isomorphism χ: V′ → V″ such that χθ = ψ.
Proof: We shall not use this theorem in the rest of this text. Hence, we define V′ and leave the rest of the details to the exercises. Let (V′, ‖·‖′) denote the completion of V constructed in Theorem 4.9 of Chapter IV. We define an inner product on V′ = S/N by the following formula:

⟨{αₙ} + N, {βₙ} + N⟩ = limₙ ⟨αₙ, βₙ⟩
The reader can easily argue that this formula is well defined and gives an inner product on V′ whose associated norm is ‖·‖′. □
Throughout the rest of this section, (V, ‖·‖) will denote a pre-Hilbert space. Let us recall a few familiar definitions from calculus.
Note that A⊥ is a subspace of V such that A⊥ ∩ L(A) = 0. In fact, we even have A⊥ ∩ L̄(A) = 0, where L̄(A) denotes the closure of L(A) in V. For suppose α ∈ L̄(A), and β ∈ A⊥. By Lemma 3.4 of Chapter IV, there exists a sequence {αₙ} in L(A) such that {αₙ} → α. Using the continuity of the inner product, we have {⟨αₙ, β⟩} → ⟨α, β⟩. But ⟨αₙ, β⟩ = 0 for all n ∈ ℕ. Hence, ⟨α, β⟩ = 0. In particular, if α ∈ A⊥ ∩ L̄(A), then ⟨α, α⟩ = 0. Thus α = 0.
Vectors that are orthogonal behave nicely with respect to length formulas.
normed linear vector space (V = {f ∈ C([0, 1]) | f(0) = 0}, ‖·‖). Unfortunately, (V, ‖·‖) is not a pre-Hilbert space. The reader can easily argue that ‖·‖ is not the norm associated with any inner product on V. To produce an example that fits our present context, we can return to Example 1.5. If we set W = {α ∈ ℓ² | xₙ = 0 for all n sufficiently large}, then W is a subspace of ℓ². Let β = {1/n}. Then β ∈ ℓ² − W. The reader can easily check that d(β, W) = 0. (In fact, W̄ = ℓ².) Thus, d(β, W) = 0, but 0 ≠ ‖α − β‖ for any α ∈ W. We ask the reader to verify these remarks in Exercise 7 at the end of this section.
Thus, in a pre-Hilbert space V, a given subspace W may contain no vector α that is closest to β in the sense that d(β, W) = ‖α − β‖. However, if W does contain a vector α such that ‖α − β‖ = d(β, W), then we can give a nice geometric characterization of α.
Proof: Suppose (α − β) ⊥ W. Let δ ∈ W − {α}. Then ‖δ − β‖² = ‖(δ − α) + (α − β)‖² = ‖δ − α‖² + ‖α − β‖². Since α − β is orthogonal to W, this last equality comes from 1.17(b). Taking square roots, we see ‖δ − β‖ > ‖α − β‖. In particular, d(β, W) = inf{‖γ − β‖ : γ ∈ W} = ‖α − β‖.
Conversely, suppose ‖α − β‖ = d(β, W). Fix a vector δ ∈ W − {0}. Then for any real number t, we have α + tδ ∈ W. Thus, ‖α − β‖² ≤ ‖α + tδ − β‖² = ⟨α − β + tδ, α − β + tδ⟩ = ‖α − β‖² + 2t⟨α − β, δ⟩ + t²‖δ‖². Thus, the quadratic form q(t) = 2t⟨α − β, δ⟩ + t²‖δ‖² is nonnegative on ℝ. This can only happen if the discriminant of q(t) is not positive. Thus, 4⟨α − β, δ⟩² − 4‖δ‖²(0) ≤ 0. Hence, ⟨α − β, δ⟩ = 0. If δ = 0, then clearly ⟨α − β, δ⟩ = 0. We conclude that α − β is orthogonal to W. □
may contain several vectors that are closest to a given vector β. In pre-Hilbert spaces, if W contains a vector α closest to β in the ‖·‖-norm, then α is unique. We want to give a special name to the vector α when it exists.
We caution the reader that Proj_W(β) does not always exist. By Theorem 1.19, Proj_W(β) [when it exists] is the unique vector in W closest to β in the ‖·‖-norm. We have seen an example (Exercise 7 at the end of this section) that shows that in general there is no vector in W closest to β. If Proj_W(β) does exist, then Proj_W(β) − β is orthogonal to W. Notice also that Proj_W(β) = β if and only if β ∈ W.
There is one important case in which Proj_W(β) always exists.
Theorem 1.21: Let W be a subspace of the pre-Hilbert space (V, ‖·‖). Suppose (W, ‖·‖) is a Banach space. Then Proj_W(β) exists for every β ∈ V. In this case, V = W ⊕ W⊥.
Proof: Let β ∈ V. If β ∈ W, then Proj_W(β) = β, and there is nothing to prove. Suppose β is not in W. Set d = d(β, W). Then there exists a sequence {αₙ} in W such that {‖αₙ − β‖} → d. We claim that {αₙ} is a Cauchy sequence in W. To see this, we first apply the Parallelogram Law. We have ‖αₙ − αₘ‖² = 2‖αₙ − β‖² + 2‖αₘ − β‖² − 4‖(αₙ + αₘ)/2 − β‖² ≤ 2‖αₙ − β‖² + 2‖αₘ − β‖² − 4d², since (αₙ + αₘ)/2 ∈ W. As n, m → ∞, the right-hand side tends to 2d² + 2d² − 4d² = 0. Thus, {αₙ} is a Cauchy sequence in W. Since W is complete, there exists a vector α ∈ W such that {αₙ} → α. Then continuity of the norm implies {‖αₙ − β‖} → ‖α − β‖. Thus, d = ‖α − β‖. Theorem 1.19 now implies that Proj_W(β) = α.
As for the second assertion, we always have W ∩ W⊥ = 0. We need to argue that V = W + W⊥. Let β ∈ V. From the first part of this argument, we know α = Proj_W(β) exists. The vector α − β is an element of W⊥. Thus, β − α ∈ W⊥. Since β = α + (β − α) ∈ W + W⊥, we conclude that V = W ⊕ W⊥. □
Note that Theorem 1.21 is a generalization of Corollary 3.34 of Chapter IV when (V, ‖·‖) is a pre-Hilbert space. For if W is a finite-dimensional subspace of V, then (W, ‖·‖) is norm isomorphic to (ℝⁿ, ‖·‖) for some n. Thus, W is complete by Corollary 4.5 of Chapter IV. Some of the most important applications of Theorem 1.21 arise when V itself is finite dimensional. Then every subspace W of V is a Banach space, and, consequently, Proj_W(β) exists for every vector β ∈ V.
Let us discuss a well-known example of the above results. Suppose V is the Hilbert space (ℝⁿ, ‖·‖) given in Example 1.2. Let W be a subspace of V, and let β ∈ V. In this discussion, it will be convenient to identify ℝⁿ with the space of all n × 1 matrices Mₙₓ₁(ℝ). The standard inner product on ℝⁿ is then given by the following formula: ⟨α, β⟩ = αᵗβ. Here αᵗ is the transpose of the n × 1 matrix α, and αᵗβ is the matrix product of αᵗ with β.

Suppose {α₁, ..., αₘ} is a basis of W. Let us write each column vector αⱼ as αⱼ = (a₁ⱼ, ..., aₙⱼ)ᵗ and form the n × m matrix A = (aᵢⱼ) = (α₁ | ⋯ | αₘ). Then W is just the column space of A.
Let β = (b₁, ..., bₙ)ᵗ. Then finding the orthogonal projection, Proj_W(β), of β onto W is equivalent to determining the least-squares solution to the following system of linear equations:

1.22: AX = β

In 1.22, X = (x₁, ..., xₘ)ᵗ is a column vector in Mₘₓ₁(ℝ). If β ∈ W, then the linear system in 1.22 is consistent. In this case, equation 1.22 has a unique solution Z since rk(A) = m. If β is not in W, then the linear system in 1.22 is inconsistent. In either case, the words "least-squares solution" mean a Z in Mₘₓ₁(ℝ) for which ‖AZ − β‖ is as small as possible.
Now inf{‖AX − β‖ : X ∈ Mₘₓ₁(ℝ)} = inf{‖γ − β‖ : γ ∈ W} = d(β, W). Thus, by Theorem 1.19, the least-squares solution to 1.22 is a vector Z ∈ Mₘₓ₁(ℝ) such that AZ = Proj_W(β). Since W is finite dimensional, we know from Theorem 1.21 that Proj_W(β) exists. Since the rank of A is m, there exists a unique Z ∈ Mₘₓ₁(ℝ) such that AZ = Proj_W(β). It is an easy matter to find a formula for Z. Since AZ − β is orthogonal to W, we must have (AX)ᵗ(AZ − β) = 0 for every X ∈ Mₘₓ₁(ℝ). Thus, Xᵗ(AᵗAZ − Aᵗβ) = 0 for every X. This implies that AᵗAZ − Aᵗβ = 0. At this point, we need the following fact:

1.23: If rk(A) = m, then the m × m matrix AᵗA is nonsingular.

A proof of 1.23 is easy. We leave this as an exercise at the end of this section. Returning to our computation, we see Z = (AᵗA)⁻¹Aᵗβ and Proj_W(β) = A(AᵗA)⁻¹Aᵗβ. Let us summarize our results in the following theorem:
1.25: Z = (AᵗA)⁻¹Aᵗβ
If we look back at our discussion preceding Theorem 1.24, we see that the hypothesis rk(A) = m was used in 1.23 to conclude that AᵗA was invertible. If A is an arbitrary n × m matrix, then the linear system AX = β still has a least-squares solution Z in Mₘₓ₁(ℝ). Z is not necessarily unique, but the same analysis as before shows that Z must satisfy the following equation:

AZ = Proj_W(β)
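The normal equations above translate directly into a few lines of numpy. The following is a minimal sketch (the matrix A and vector b are illustrative, not from the text): it computes Z as in 1.25, forms Proj_W(β) = AZ, and checks the residual against the orthogonality condition (AX)ᵗ(AZ − β) = 0.

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])          # n = 3, m = 2, rk(A) = m
b = np.array([6.0, 0.0, 0.0])

# 1.25: Z = (A^t A)^{-1} A^t b, computed by solving the normal equations
Z = np.linalg.solve(A.T @ A, A.T @ b)

# Proj_W(b) = A Z, the orthogonal projection of b onto the column space W
proj = A @ Z

# The residual b - AZ is orthogonal to W, i.e., A^t (b - AZ) = 0
assert np.allclose(A.T @ (b - proj), 0.0)

# np.linalg.lstsq minimizes ||AX - b|| directly and agrees with Z
Z_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(Z, Z_lstsq)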
Proof: The projection of β onto ℝα is xα for some x ∈ ℝ. We also know that ⟨xα − β, α⟩ = 0. Solving this equation for x gives the desired result. □
The scalar ⟨β, α⟩/⟨α, α⟩ appearing in 1.32 is called the Fourier coefficient of β with respect to α. We can combine the last two lemmas in the following theorem:
Proof: Since W = ℝα₁ ⊕ ⋯ ⊕ ℝαₙ, we can apply the last two lemmas and get the result. □
The formula given in 1.34 makes it clear that orthogonal bases of W are very useful. Let us introduce the following definition:
Proof: The finite sequence case is included in the infinite argument. Hence, we assume {αₖ} is an infinite sequence of linearly independent vectors in V. Set φ₁ = α₁/‖α₁‖, and, for k > 1, inductively set

γₖ = αₖ − Σᵢ₌₁^{k−1} ⟨αₖ, φᵢ⟩φᵢ and φₖ = γₖ/‖γₖ‖.

(Each γₖ is nonzero because the αᵢ are linearly independent.) One checks easily that {φₖ} is an orthonormal sequence with L({α₁, ..., αⱼ}) = L({φ₁, ..., φⱼ}) for all j. □
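The recursion in the proof is easy to carry out numerically. The following is a minimal sketch for the standard inner product on ℝⁿ (the function name gram_schmidt and the sample vectors are ours, not the text's):

import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors in R^n.

    Subtract from each alpha_k its Fourier expansion relative to the
    phi_i already constructed, then normalize, as in Theorem 1.36.
    """
    basis = []
    for alpha in vectors:
        gamma = alpha - sum(np.dot(alpha, phi) * phi for phi in basis)
        basis.append(gamma / np.linalg.norm(gamma))
    return basis

# Usage: orthonormalize three independent vectors in R^3.
phis = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                     np.array([1.0, 0.0, 1.0]),
                     np.array([0.0, 1.0, 1.0])])
G = np.array([[np.dot(p, q) for q in phis] for p in phis])
assert np.allclose(G, np.eye(3))   # <phi_i, phi_j> = delta_ij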
We shall finish this section with a brief description of what analysts call a
"basis" when dealing with Hilbert spaces. We need a formal name for the type of
sequence constructed in Theorem 1.36.
Thus, if every vector δ in V is equal to its Fourier series (relative to {φᵢ}), then the set {φᵢ | i ∈ ℕ} is called a "basis" of V. The reader will note that the word "basis" is included in quotation marks here. This is because an orthonormal sequence that is a "basis" of V is not in general a vector space basis in the sense of Chapter I. Consider the following example:
Example 1.42: We return to the Hilbert space ℓ² given in Example 1.5. For each i ∈ ℕ, let φᵢ denote the sequence that is zero except in the ith position where it is one. Thus, φᵢ = (0, ..., 0, 1, 0, 0, ...). Clearly, the set {φᵢ | i ∈ ℕ} is linearly independent over ℝ. This set is not a basis of ℓ² since, for instance, η = {1/n} is not a finite linear combination of the vectors in {φᵢ | i ∈ ℕ}.

The sequence {φᵢ} is clearly an orthonormal sequence in ℓ². If δ = {xₙ} ∈ ℓ², then the Fourier coefficient of δ relative to φᵢ is ⟨δ, φᵢ⟩ = xᵢ. Thus, the Fourier series of δ is Σᵢ₌₁^∞ xᵢφᵢ. The nth partial sum, Σᵢ₌₁ⁿ xᵢφᵢ, of this series is clearly (x₁, x₂, ..., xₙ, 0, 0, ...). Thus, limₙ ‖δ − Σᵢ₌₁ⁿ xᵢφᵢ‖ = 0. Therefore, δ = Σᵢ₌₁^∞ xᵢφᵢ. Since δ is arbitrary, we conclude that {φᵢ | i ∈ ℕ} is a "basis" of ℓ².
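The convergence in Example 1.42 is easy to see numerically. The following sketch truncates δ = {1/n} at N terms as a stand-in for ℓ² and watches the tail norms ‖δ − Σᵢ₌₁ⁿ xᵢφᵢ‖ shrink:

import numpy as np

# ||delta - nth partial sum||^2 = sum_{i>n} 1/i^2 -> 0 as n -> infinity.
N = 100_000
delta = 1.0 / np.arange(1, N + 1)

for n in [10, 100, 1000]:
    tail = np.linalg.norm(delta[n:])   # ||delta - nth partial sum||
    print(n, tail)                     # decreases toward 0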
(1) Verify that the definition given in 1.1 is the same as that given in 7.12 of
Chapter I.
(2) Show that ℓ² is a Hilbert space.
(3) Show that the example given in 1.4 is a pre-Hilbert space but not a Hilbert
space.
(4) Occasionally it is convenient to relax condition (d) in Definition 1.1. A function f: V × V → ℝ is called a semiscalar product if f satisfies the following conditions:
(a) f is bilinear.
(b) f(α, β) = f(β, α) for all α, β ∈ V.
(c) f(α, α) ≥ 0 for all α ∈ V.
Give an example of a semiscalar product that is not an inner product on V. Show that Schwarz's inequality remains valid for any semiscalar product f.
(5) Let V = {f ∈ C¹([a, b]) | f′ is continuous}. Define ⟨f, g⟩ = f(a)g(a) + ∫ₐᵇ f′(t)g′(t) dt. Show that ⟨ , ⟩ is an inner product on V.
(6) Provide the technical details for the proof of Theorem 1.14.
(7) Let V = {f ∈ C([0, 1]) | f(0) = 0}. Let ‖·‖ be the norm defined in equation 1.6 of Chapter IV. Let W = {α ∈ ℓ² | xₙ = 0 for all n sufficiently large}.
(a) Show that the normed linear vector space (V, ‖·‖) is not a pre-Hilbert space.
(b) Let β = {1/n}. Show that β ∈ ℓ², d(β, W) = 0, and 0 ≠ ‖α − β‖ for any α ∈ W. Thus, W contains no vector closest to β.
(8) Give an example of a normed linear vector space (V, ‖·‖), a subspace W, and a vector β such that W contains more than one vector α with ‖α − β‖ = d(β, W). There is such an example in Chapter IV.
(12) Let (V, ⟨ , ⟩) be a finite-dimensional inner product space. Let {α₁, ..., αₙ} be a basis of V. Let c₁, ..., cₙ ∈ ℝ. Show there exists a unique vector α ∈ V such that ⟨α, αᵢ⟩ = cᵢ for all i = 1, ..., n.
(13) Let V = Mₙₓₙ(ℝ). If A, B ∈ V, set ⟨A, B⟩ = Tr(ABᵗ).
(a) Show that (V, ⟨ , ⟩) is an inner product space.
(b) Find the orthogonal complement of the subspace of all diagonal matrices in V.
(14) Let W be a finite-dimensional subspace of an inner product space V. Show that ‖Proj_W(β)‖ ≤ ‖β‖ for all β ∈ V.
(15) In Exercise 14, suppose T ∈ B(V, V) is an idempotent map with Im(T) = W. If ‖T(β)‖ ≤ ‖β‖ for all β ∈ V, prove T = Proj_W( ).
2. SELF-ADJOINT TRANSFORMATIONS
As in Section 1, we suppose (V, ⟨ , ⟩) is a real inner product space. The reader will recall from Section 6 of Chapter I that the dual V* of V is the vector space Hom_ℝ(V, ℝ). If T: V → W is a linear transformation, then the adjoint of T is the linear transformation T*: W* → V* given by T*(f) = fT.
If V is an infinite-dimensional pre-Hilbert space, then V* is too large to be of any interest. Recall that dim_ℝ(V) = ∞ implies that dim_ℝ(V*) > dim_ℝ(V). We confine our attention to the bounded linear maps, B(V, ℝ), in V*. Recall that T ∈ B(V, ℝ) if and only if there exists a positive constant c such that |T(α)| ≤ c‖α‖ for all α ∈ V. If V is finite dimensional, then we had seen in Chapter IV that B(V, ℝ) = V*. If V is any Hilbert space, finite or infinite dimensional, we shall see that B(V, ℝ) is isomorphic to V in a natural way.
Any pre-Hilbert space (V, ‖·‖) admits a linear transformation θ: V → B(V, ℝ), which we formally define as follows:
Definition 2.1: Let (V, ‖·‖) be a pre-Hilbert space with inner product ⟨ , ⟩. Let θ: V → V* denote the linear transformation defined by θ(β)(α) = ⟨α, β⟩.
Thus, for any β ∈ V, θ(β) is the real valued function on V whose value at α is ⟨α, β⟩. We can easily check that θ(β) is a linear transformation from V to ℝ. If α, α′ ∈ V, and x, y ∈ ℝ, then we have θ(β)(xα + yα′) = ⟨xα + yα′, β⟩ = x⟨α, β⟩ + y⟨α′, β⟩ = xθ(β)(α) + yθ(β)(α′). Thus, θ is a well-defined map from V to V*. We next note that θ is a linear transformation from V to V*. We have θ(xβ + yβ′)(α) = ⟨α, xβ + yβ′⟩ = x⟨α, β⟩ + y⟨α, β′⟩ = xθ(β)(α) + yθ(β′)(α) = [xθ(β) + yθ(β′)](α). Since α is arbitrary, we conclude that θ(xβ + yβ′) = xθ(β) + yθ(β′). Hence, θ is a linear map from V to V*. Let us also note that θ is an injective linear transformation. For suppose θ(β) = 0. Then, in particular, 0 = θ(β)(β) = ⟨β, β⟩. Thus, β = 0.
The linear transformation θ gives an imbedding of V into V*. We claim that the image of θ actually lies in B(V, ℝ). As usual for statements of this kind, we regard ℝ as a normed linear vector space via the absolute value | |, and V as a normed linear vector space via the norm ‖·‖ associated with ⟨ , ⟩. We claim θ(β) ∈ B(V, ℝ) for every β ∈ V. This follows from Schwarz's inequality. If α ∈ V, then |θ(β)(α)| = |⟨α, β⟩| ≤ ‖β‖ ‖α‖. Thus, θ(β) is a bounded linear operator on V. A bound for θ(β) is ‖β‖. We have now shown that θ is an injective linear map from V to B(V, ℝ).
Now recall that B(V, ℝ) is a normed linear vector space relative to the uniform norm ‖T‖ = inf{c | c is a bound of T}. Thus, in the last paragraph, we showed that ‖θ(β)‖ ≤ ‖β‖ for all β ∈ V. On the other hand, if β ≠ 0, then α = β/‖β‖ has length one, and |θ(β)(α)| = |⟨β/‖β‖, β⟩| = ‖β‖. In particular, ‖θ(β)‖ ≥ ‖β‖ by Lemma 1.24(b) of Chapter IV. We conclude that ‖θ(β)‖ = ‖β‖ for every β ∈ V. The reader will recall that a bounded linear map between normed linear vector spaces that preserves lengths is called an isometry. Hence, θ is an isometry of V into B(V, ℝ). We have now proved the first part of the following theorem:
Theorem 2.2: Let (V, ‖·‖) be a pre-Hilbert space with inner product ⟨ , ⟩. The map θ given by θ(β)(α) = ⟨α, β⟩ is an isometry of V into the Banach space B(V, ℝ).
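For (ℝⁿ, standard inner product), Theorem 2.2 can be checked numerically: the uniform norm of θ(β) is ‖β‖. A minimal sketch (the random data here is ours, purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
beta = rng.standard_normal(5)

# |theta(beta)(alpha)| <= ||beta|| on random unit vectors alpha ...
alphas = rng.standard_normal((1000, 5))
alphas /= np.linalg.norm(alphas, axis=1, keepdims=True)
assert np.all(np.abs(alphas @ beta) <= np.linalg.norm(beta) + 1e-12)

# ... and the bound is attained at alpha = beta/||beta||.
alpha_star = beta / np.linalg.norm(beta)
assert np.isclose(np.dot(alpha_star, beta), np.linalg.norm(beta))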
Theorem 2.3: Let (V, ‖·‖) be a pre-Hilbert space. The isometry θ: V → B(V, ℝ) is surjective if and only if (V, ‖·‖) is a Hilbert space.
2.4:
      V*   ←──T*──   V*
      ↑ i                      ↑ i
  B(V, ℝ) ←──T*|── B(V, ℝ)
      ↑ θ                      ↑ θ
      V    ←──S──    V

In diagram 2.4, the map T*| denotes the restriction of T* to the subspace B(V, ℝ). The map i is the inclusion of B(V, ℝ) into V*. If (V, ⟨ , ⟩) is a Hilbert space, the map θ in 2.4 is an isomorphism by Theorem 2.3. In this case, the composite map S = θ⁻¹(T*|)θ is a bounded linear operator from V to V. There is a simple relationship between S and T:

2.5: ⟨Tα, β⟩ = ⟨α, Sβ⟩ for all α, β ∈ V
In equation 2.5 and much of the sequel, we take the parentheses off the symbols T(α) and S(β) to simplify the notation. To prove 2.5, we compare both sides. On the left, we have ⟨Tα, β⟩ = θ(β)(Tα). On the right, we have ⟨α, Sβ⟩ = θ(Sβ)(α) = [θ(θ⁻¹T*θ)(β)](α) = [T*θ(β)](α) = θ(β)(Tα). Thus, ⟨Tα, β⟩ = ⟨α, Sβ⟩.
We also note that S is the only bounded linear operator on V that satisfies equation 2.5. For suppose S′ ∈ B(V, V), and ⟨Tα, β⟩ = ⟨α, S′β⟩ for all α, β ∈ V. Then ⟨α, (S′ − S)β⟩ = ⟨α, S′β⟩ − ⟨α, Sβ⟩ = ⟨Tα, β⟩ − ⟨Tα, β⟩ = 0. In particular, ‖(S′ − S)β‖² = ⟨(S′ − S)β, (S′ − S)β⟩ = 0 for all β ∈ V. We conclude that S′ = S.
We have now proved the following theorem:
Theorem 2.6: Let (V, ⟨ , ⟩) be a Hilbert space. For every bounded linear operator T ∈ B(V, V), there exists a unique bounded linear operator S ∈ B(V, V) such that ⟨Tα, β⟩ = ⟨α, Sβ⟩ for all α, β ∈ V. □
The reader is warned that we have changed our definition of the adjoint when dealing with bounded linear operators on Hilbert spaces. When we need to refer back to our old usage of the word "adjoint" (Section 6 of Chapter I), we shall use the words "algebra adjoint." Thus, if V is a Hilbert space and T ∈ B(V, V), then the "algebra adjoint" is the induced map on V* given by f → fT. The adjoint of T is the operator T* ∈ B(V, V) that satisfies equation 2.7 (⟨Tα, β⟩ = ⟨α, T*β⟩). When dim_ℝ(V) < ∞, then B(V, ℝ) = V* by Corollary 3.29 of Chapter IV. Thus, in this case, the adjoint of T only differs from the "algebra adjoint" of T by the isometry θ.
Now let (V, ⟨ , ⟩) be an arbitrary Hilbert space. There are two types of bounded linear operators on V that we want to study in the remainder of this section.
Definition 2.8: Let (V, ⟨ , ⟩) be a Hilbert space. Let T ∈ B(V, V).

2.8′: (a) T is self-adjoint if and only if ⟨Tα, β⟩ = ⟨α, Tβ⟩ for all α, β ∈ V.
(b) T is orthogonal if and only if ⟨Tα, Tβ⟩ = ⟨α, β⟩ for all α, β ∈ V.
particular, Γ(φ, φ)(T) = M(γ, φ) is an orthogonal matrix. Thus, a change of basis matrix between orthonormal bases of ℝⁿ is orthogonal. □
Lemma 2.11: Let V be a Hilbert space, and T ∈ B(V, V). Suppose T is self-adjoint and nonnegative. Then

2.12: |⟨Tα, β⟩| = |[α, β]| ≤ [α, α]^{1/2}[β, β]^{1/2} = ⟨Tα, α⟩^{1/2}⟨Tβ, β⟩^{1/2}
Proof: The first order of business is to argue that T has an eigenvector. To this end, consider the function f(ξ) = ⟨Tξ, ξ⟩. Clearly, f gives us a real valued function on V. Since T is continuous on V and ⟨ , ⟩ is continuous on V × V, we conclude that f is continuous on V.
Set S = {ξ ∈ V | ‖ξ‖ = 1}. We had seen in Chapter IV that S is a closed and bounded subset of V. Hence, S is sequentially compact by Corollary 3.28 of Chapter IV. Note that f is a bounded function on S. For if ξ ∈ S, then |f(ξ)| = |⟨Tξ, ξ⟩| ≤ ‖Tξ‖ ‖ξ‖ ≤ ‖T‖ ‖ξ‖² = ‖T‖.
Set m = sup{f(ξ) | ξ ∈ S}. It follows from Corollary 3.16 of Chapter IV that there exists a vector α ∈ S such that f(α) = m. Now set T₁ = m·1_V − T. The map T₁ is the difference of two self-adjoint operators, and, consequently, is self-adjoint. If δ ∈ S, then ⟨T₁δ, δ⟩ = ⟨mδ − Tδ, δ⟩ = m − f(δ) ≥ 0. If δ ∈ V − {0}, then δ/‖δ‖ ∈ S. Therefore, ⟨T₁(δ/‖δ‖), δ/‖δ‖⟩ ≥ 0. Thus, ⟨T₁δ, δ⟩ = ‖δ‖² ⟨T₁(δ/‖δ‖), δ/‖δ‖⟩ ≥ 0. We can now conclude that ⟨T₁δ, δ⟩ ≥ 0 for all δ ∈ V. In particular, T₁ is a nonnegative, self-adjoint operator on V.
Now ⟨T₁α, α⟩ = ⟨mα − Tα, α⟩ = m‖α‖² − f(α) = m − m = 0. But then, Lemma 2.11 implies that T₁(α) = 0. Thus, T(α) = mα, and we have found a unit eigenvector α ∈ V with eigenvalue m.
If dim_ℝ(V) = 1, then {α} is an orthonormal basis of V, and we are done.

2.18: xᵢ′ = aᵢ₁x₁ + ⋯ + aᵢₙxₙ, i = 1, ..., n, with aᵢⱼ ∈ ℝ

Then the coefficient matrix A = (aᵢⱼ)
is symmetric. The characteristic polynomial of A is given by c_A(X) = X² + 4X + 3. Thus, the eigenvalues of A are −1 and −3. An orthonormal basis of ℝ² consisting of eigenvectors of A is easily seen to be {α₁, α₂}, where α₁ = (1/√2)(1, 1)ᵗ and α₂ = (1/√2)(1, −1)ᵗ.
The computations in Example 2.17 can be carried out for any matrix A that is
similar to a diagonal matrix. The main point here is that symmetric matrices are
always diagonalizable, and they are easy to recognize.
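In numpy, the diagonalization of a symmetric matrix is a one-liner. The sketch below uses the 2 × 2 symmetric matrix consistent with the data of Example 2.17 (eigenvalues −1 and −3, eigenvectors (1/√2)(1, ±1)ᵗ); np.linalg.eigh is specialized to symmetric matrices and returns an orthonormal eigenbasis, as Theorem 2.14 guarantees:

import numpy as np

A = np.array([[-2.0, 1.0],
              [1.0, -2.0]])

eigenvalues, P = np.linalg.eigh(A)      # columns of P are the alpha_i
assert np.allclose(P.T @ P, np.eye(2))  # P is orthogonal
assert np.allclose(P.T @ A @ P, np.diag(eigenvalues))
print(eigenvalues)                      # [-3., -1.]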
There is a third corollary to Theorem 2.14 that is worth mentioning here.
Corollary 2.22: Let (V, ⟨ , ⟩) be a finite-dimensional Hilbert space. Suppose T ∈ B(V, V) is an isomorphism. Then T = RS, where R is orthogonal and S is a positive, self-adjoint operator.
Proof: A self-adjoint operator S on V is said to be positive if all the eigenvalues of S are positive. To prove the corollary, consider the adjoint T* of T. Since (T*T)* = T*T** = T*T, we see that T*T is self-adjoint. By Theorem 2.14, V has an orthonormal basis α = {α₁, ..., αₙ} consisting of eigenvectors of T*T. Suppose T*T(αᵢ) = mᵢαᵢ for i = 1, ..., n. Since T is an isomorphism, T(αᵢ) ≠ 0. Therefore, 0 < ‖T(αᵢ)‖² = ⟨Tαᵢ, Tαᵢ⟩ = ⟨αᵢ, T*Tαᵢ⟩ = ⟨αᵢ, mᵢαᵢ⟩ = mᵢ‖αᵢ‖² = mᵢ. Thus, each mᵢ is a positive real number. Set sᵢ = √mᵢ. We can define a linear transformation S: V → V by setting S(αᵢ) = sᵢαᵢ for all i = 1, ..., n. Then S² = T*T. The reader can easily check that S is self-adjoint. Hence, S is a positive self-adjoint operator whose square is T*T.
Set P = ST⁻¹. Since ⟨Pα, Pβ⟩ = ⟨ST⁻¹α, ST⁻¹β⟩ = ⟨T⁻¹α, S²T⁻¹β⟩ = ⟨T⁻¹α, T*TT⁻¹β⟩ = ⟨T⁻¹α, T*β⟩ = ⟨TT⁻¹α, β⟩ = ⟨α, β⟩, P is orthogonal. In particular, P⁻¹ is also orthogonal. Set R = P⁻¹. Then T = RS with R orthogonal, and S a positive, self-adjoint operator. □
If we combine Corollaries 2.22 and 2.16, we get what is known as the UDV decomposition of a nonsingular matrix.
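Numerically, both decompositions fall out of the singular value decomposition. A minimal sketch (the matrix A is an arbitrary nonsingular example, not from the text):

import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# UDV: A = U @ diag(d) @ Vt with U, Vt orthogonal and d > 0
U, d, Vt = np.linalg.svd(A)
assert np.allclose(A, U @ np.diag(d) @ Vt)

# Polar form: R = U Vt is orthogonal, S = Vt^t diag(d) Vt is symmetric
# with positive eigenvalues, and A = R S as in Corollary 2.22.
R = U @ Vt
S = Vt.T @ np.diag(d) @ Vt
assert np.allclose(A, R @ S)
assert np.allclose(R.T @ R, np.eye(2))      # R orthogonal
assert np.allclose(S, S.T)                  # S self-adjoint
assert np.all(np.linalg.eigvalsh(S) > 0)    # S positive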
Definition 2.24: Let (V, ⟨ , ⟩) be a pre-Hilbert space, and suppose T ∈ B(V, V). We say T is self-adjoint if ⟨Tα, β⟩ = ⟨α, Tβ⟩ for all α, β ∈ V.
Obviously, our new definition agrees with the old one when V is a Hilbert space. We had argued this point in 2.8′(a). Note that Lemma 2.11 is still valid for any pre-Hilbert space V and any nonnegative, self-adjoint operator T. The proof is precisely the same. The definition of a compact operator is as follows:
Definition 2.25: Let (V, ‖·‖) be a pre-Hilbert space, and T ∈ B(V, V). Set S = {ξ ∈ V | ‖ξ‖ = 1}. We say T is a compact operator if the closure of T(S) in V is sequentially compact.
Before we give the proof of Theorem 2.26, let us discuss why this theorem is a
generalization of Theorem 2.14. Suppose V is finite dimensional. Then T is just a
self-adjoint, linear transformation from V to V. We need the following lemma:
Proof: We first show ker(T) ∩ Im(T) = (0). Let α ∈ ker(T) ∩ Im(T). Then α = T(γ) for some γ ∈ V. Since α ∈ ker(T), we have 0 = T(α) = T²(γ). But then, 0 = ⟨T²γ, γ⟩ = ⟨Tγ, Tγ⟩ = ‖Tγ‖². We conclude that α = T(γ) = 0.
To show ker(T) + Im(T) = V, we can use the first isomorphism theorem and count dimensions. We have V/ker(T) ≅ Im(T). Therefore, dim(ker(T)) + dim(Im(T)) = dim(V). Since ker(T) ∩ Im(T) = (0), the union of a basis from ker(T) with a basis from Im(T) is a set of linearly independent vectors in V. Since the dimensions add up right, the union of bases from ker(T) and Im(T) is in fact a basis of V. Therefore, ker(T) + Im(T) = V.
We have now shown that V = ker(T) ⊕ Im(T). It remains to show that these two subspaces are orthogonal. To see this, let α ∈ ker(T), and β ∈ Im(T). Then β = T(γ) for some γ ∈ V. We have ⟨α, β⟩ = ⟨α, Tγ⟩ = ⟨Tα, γ⟩ = ⟨0, γ⟩ = 0. Thus, ker(T) and Im(T) are orthogonal, and the proof of the lemma is complete. □
We can now apply Lemma 2.27 to our discussion. The subspace ker(T) has an orthonormal basis α by the Gram–Schmidt theorem. The vectors in α are all eigenvectors of T with eigenvalue 0. If α′ is an orthonormal basis of Im(T), then Lemma 2.27 implies that the union α ∪ α′ is an orthonormal basis of V. Hence, V has an orthonormal basis consisting of eigenvectors of T if and only if Im(T) has an orthonormal basis consisting of eigenvectors of T. In particular, Theorem 2.26 implies Theorem 2.14.
β = Σᵢ₌₁^∞ bᵢφᵢ for every β ∈ W. So, let β ∈ W. Then β = T(α) for some α ∈ V. Set bᵢ = ⟨β, φᵢ⟩ and cᵢ = ⟨α, φᵢ⟩ for every i. Then bᵢ = ⟨β, φᵢ⟩ = ⟨Tα, φᵢ⟩ = ⟨α, Tφᵢ⟩ = ⟨α, aᵢφᵢ⟩ = cᵢaᵢ. In particular, T(cᵢφᵢ) = cᵢT(φᵢ) = cᵢaᵢφᵢ = bᵢφᵢ. Thus, T(α − Σᵢ₌₁ⁿ cᵢφᵢ) = β − Σᵢ₌₁ⁿ bᵢφᵢ. Since the cᵢ are the Fourier coefficients of α (with respect to {φᵢ}), α − Σᵢ₌₁ⁿ cᵢφᵢ is a vector in L({φ₁, ..., φₙ})⊥ by Theorem 1.33. The norm of T on L({φ₁, ..., φₙ})⊥ is at most |aₙ₊₁|. Thus, we have ‖T(α − Σᵢ₌₁ⁿ cᵢφᵢ)‖ ≤ |aₙ₊₁| ‖α − Σᵢ₌₁ⁿ cᵢφᵢ‖. Since α − Σᵢ₌₁ⁿ cᵢφᵢ is orthogonal to Σᵢ₌₁ⁿ cᵢφᵢ, we have ‖α‖² = ‖α − Σᵢ₌₁ⁿ cᵢφᵢ‖² + ‖Σᵢ₌₁ⁿ cᵢφᵢ‖². In particular, ‖α − Σᵢ₌₁ⁿ cᵢφᵢ‖ ≤ ‖α‖. Putting this all together, we have ‖β − Σᵢ₌₁ⁿ bᵢφᵢ‖ = ‖T(α − Σᵢ₌₁ⁿ cᵢφᵢ)‖ ≤ |aₙ₊₁| ‖α‖. Since {aₙ} → 0, we conclude that {Σᵢ₌₁ⁿ bᵢφᵢ} → β. Thus, β = Σᵢ₌₁^∞ bᵢφᵢ. This completes the proof of Theorem 2.26. □
A = |  4 −1  1 |
    | −1  4 −1 |
    |  1 −1  4 |
To motivate the definition we shall use, consider the complex vector space ℂⁿ. The analog of the standard inner product (on ℝⁿ) for ℂⁿ is the bilinear map ⟨α, β⟩ = Σₖ₌₁ⁿ zₖwₖ. Here α = (z₁, ..., zₙ), β = (w₁, ..., wₙ), and zₖ, wₖ ∈ ℂ for all k = 1, ..., n. This bilinear form does not work well as a candidate for an inner product on ℂⁿ. If α = (1, i, 0, ..., 0), for example, then α ≠ 0, but ⟨α, α⟩ = 0. We can fix this problem by defining ⟨α, β⟩ = Σₖ₌₁ⁿ zₖw̄ₖ. Here w̄ₖ denotes the complex conjugate of wₖ. Now if α = (z₁, ..., zₙ) ∈ ℂⁿ, then ⟨α, α⟩ = Σₖ₌₁ⁿ zₖz̄ₖ = Σₖ₌₁ⁿ |zₖ|². Here the notation |z| indicates the modulus of the complex number z. The reader will recall that the modulus of a complex number z = a + bi [a, b ∈ ℝ] is defined to be the positive square root of a² + b². Thus, |z| = (a² + b²)^{1/2}. The modulus is a function from ℂ to the nonnegative real numbers. It agrees with the ordinary absolute value on ℝ. Hence, we have chosen the same notation | | for the absolute value on ℝ and the modulus on ℂ. The reader can easily check that the modulus | |: ℂ → ℝ is a norm on the real vector space ℂ. We also have |zz′| = |z| |z′| and zz̄ = |z|² for all z ∈ ℂ. In particular, ⟨α, α⟩ is a nonnegative real number for every α ∈ ℂⁿ. Also, it is clear that ⟨α, α⟩ = 0 if and only if α = 0.
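The computation that motivates the conjugate is easy to reproduce in numpy; note that np.vdot conjugates its first argument, so it implements a Hermitian form of this kind. A minimal sketch:

import numpy as np

# The bilinear form sum z_k w_k vanishes on alpha = (1, i), while the
# Hermitian form sum z_k conj(w_k) gives |1|^2 + |i|^2.
alpha = np.array([1.0 + 0.0j, 0.0 + 1.0j])

print(np.sum(alpha * alpha))    # 0j      -- <alpha, alpha> = 0, no good
print(np.vdot(alpha, alpha))    # (2+0j)  -- sum |z_k|^2, a genuine norm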
We give up something here with this new definition of ⟨ , ⟩. The function ⟨ , ⟩ is no longer bilinear; instead, it satisfies the conditions listed in 3.1.
The reader can easily verify that these equations hold for all α, β, and γ in ℂⁿ and all z, z′ in ℂ. This function ⟨ , ⟩ has the desired length properties and furthermore reduces to the standard inner product on ℝⁿ when restricted to ℝⁿ × ℝⁿ. Finally, we note that the conditions listed in 3.1 make sense for any complex vector space V, and any function f: V × V → ℂ. Hence, we adopt these conditions as our definition of a complex inner product.
Definition 3.2: Let V be a vector space over ℂ. By a complex inner product on V, we shall mean a complex valued function ⟨ , ⟩: V × V → ℂ that satisfies the following conditions:
(a) ⟨ , ⟩ is linear in its first variable.
(b) ⟨α, β⟩ is the complex conjugate of ⟨β, α⟩ for all α, β ∈ V.
(c) ⟨α, α⟩ is a nonnegative real number for all α ∈ V, and ⟨α, α⟩ = 0 if and only if α = 0.

Example 3.3: Let V = ℂⁿ, and set ⟨α, β⟩ = Σₖ₌₁ⁿ zₖw̄ₖ, where α = (z₁, ..., zₙ) and β = (w₁, ..., wₙ). Then (ℂⁿ, ⟨ , ⟩) is a complex inner product space. We shall refer to this particular inner product on ℂⁿ as the standard inner product. □
Notice that when n = 1 in Example 3.3, ⟨z, z⟩ = zz̄ = |z|². Hence, the modulus function | |: ℂ → ℝ is the norm given by the standard inner product on ℂ. We had mentioned that a given V might support more than one inner product. Here is a second inner product on ℂ².
Example 3.4: V = ℂ². Define ⟨ , ⟩′ by the following formula: ⟨(z₁, z₂), (w₁, w₂)⟩′ = 2z₁w̄₁ + z₁w̄₂ + z₂w̄₁ + z₂w̄₂. The reader can easily verify that ⟨ , ⟩′ satisfies conditions (a)–(c) in Definition 3.2. Hence, (ℂ², ⟨ , ⟩′) is a complex inner product space. □
Example 3.5: Let V be the set of all continuous, complex valued functions on the closed interval [a, b] ⊂ ℝ. Clearly, V is a complex vector space via pointwise addition and scalar multiplication. V becomes a complex inner product space when we define ⟨f, g⟩ = ∫ₐᵇ f(t)ḡ(t) dt. □
Example 3.6: Let V = ⊕ₖ₌₁^∞ ℂ. Define ⟨α, β⟩ = Σₖ₌₁^∞ zₖw̄ₖ, where α = (z₁, z₂, ...) and β = (w₁, w₂, ...). Since α and β have at most finitely many nonzero components, the formula for ⟨α, β⟩ makes perfectly good sense. □
Example 3.7: Let V = {{zₖ} ∈ ℂ^ℕ | Σₖ₌₁^∞ |zₖ|² < ∞}. V is a complex vector space via componentwise addition and scalar multiplication. We can define an inner product on V by setting ⟨α, β⟩ = Σₖ₌₁^∞ zₖw̄ₖ. Here α = {zₖ}, and β = {wₖ}. This space is the complex analog of ℓ² in Example 1.5 of Section 1. In the literature, it is also called ℓ². □
The reader will note that several of these examples are the same as the corresponding real inner product spaces. We have simply enlarged the base field from ℝ to ℂ. There is a standard way to produce complex inner product spaces from real ones. Namely, pass to the complexification.
Let (W, ⟨ , ⟩) be a real inner product space. Consider the complexification W^C = W ⊗_ℝ ℂ of W (see Section 2 of Chapter II). Recall that if we identify a vector ξ ∈ W with its image ξ ⊗ 1 in W^C, then W^C is spanned by W. Thus, every vector in W^C can be written in the form z₁ξ₁ + ⋯ + zₙξₙ, where ξ₁, ..., ξₙ ∈ W and z₁, ..., zₙ ∈ ℂ. Any basis α of W over ℝ is also a basis of W^C over ℂ. Also, any vector in W^C can be written uniquely in the form α + iβ for vectors α and β ∈ W. In terms of this representation, addition and scalar multiplication in W^C are given by the following formulas:

(α + iβ) + (μ + iλ) = (α + μ) + i(β + λ)
(a + bi)(α + iβ) = (aα − bβ) + i(aβ + bα)
We can define a complex inner product on W^C as follows:

3.9: ⟨ , ⟩₁: W^C × W^C → ℂ

3.10: ⟨μ₁ + iλ₁, μ₂ + iλ₂⟩₁ = ⟨μ₁, μ₂⟩ − i⟨μ₁, λ₂⟩ + i⟨λ₁, μ₂⟩ + ⟨λ₁, λ₂⟩
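Equation 3.10 can be implemented verbatim on top of any real inner product. A minimal sketch (the helper complexified_ip is ours), which also checks that on ℝⁿ the construction recovers the standard complex inner product of Example 3.3:

import numpy as np

def complexified_ip(a, b, real_ip=np.dot):
    """Equation 3.10: the inner product on W^C built from a real one.

    a and b are complex arrays standing for mu + i*lambda with mu,
    lambda in W; real_ip is the inner product on W (here the standard
    inner product on R^n).
    """
    m1, l1 = a.real, a.imag
    m2, l2 = b.real, b.imag
    return (real_ip(m1, m2) + real_ip(l1, l2)
            + 1j * (real_ip(l1, m2) - real_ip(m1, l2)))

# On R^n this rebuilds the standard complex inner product of Example 3.3
# (np.vdot(b, a) computes sum a_k * conj(b_k)).
a = np.array([1.0 + 2.0j, 0.0 - 1.0j])
b = np.array([3.0 + 0.0j, 1.0 + 1.0j])
assert np.isclose(complexified_ip(a, b), np.vdot(b, a))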
Lemma 3.12: Let (V, ⟨ , ⟩) be a complex inner product space. Then |⟨α, β⟩| ≤ ⟨α, α⟩^{1/2}⟨β, β⟩^{1/2} for all α, β ∈ V.
Before proving Lemma 3.12, let us make a couple of comments about the quantities appearing in the inequality. If α and β are vectors in V, then ⟨α, β⟩ is a complex number. Thus, the left side of 3.12 is the modulus of the complex number ⟨α, β⟩. By 3.2(c), ⟨α, α⟩ and ⟨β, β⟩ are nonnegative real numbers. Thus, the right side of 3.12 is the product of the (nonnegative) square roots of these quantities. In the lemma, we compare these three real numbers.
Proof of 3.12: Fix α and β in V. If ⟨α, β⟩ is a real number, then the proof of the inequality is the same as in Lemma 1.6.

So, we assume z = ⟨α, β⟩ ∈ ℂ − ℝ. Then z ≠ 0, and z⁻¹α ∈ V. Since ⟨z⁻¹α, β⟩ = z⁻¹⟨α, β⟩ = 1, a real number, Schwarz's inequality is true for the vectors z⁻¹α and β. Thus, 1 ≤ ⟨z⁻¹α, z⁻¹α⟩^{1/2}⟨β, β⟩^{1/2}. But ⟨z⁻¹α, z⁻¹α⟩ = z⁻¹z̄⁻¹⟨α, α⟩ = |z|⁻²⟨α, α⟩. Therefore, ⟨z⁻¹α, z⁻¹α⟩^{1/2} = |z|⁻¹⟨α, α⟩^{1/2}. Thus, |z| ≤ ⟨α, α⟩^{1/2}⟨β, β⟩^{1/2}, and the proof is complete. □
Corollary 3.13: Let (V, ⟨ , ⟩) be a complex inner product space. Then ‖α‖ = ⟨α, α⟩^{1/2} defines a real valued function on V that satisfies the following conditions:
(a) ‖α‖ > 0 for all α ∈ V − {0}.
(b) ‖zα‖ = |z| ‖α‖ for all z ∈ ℂ and α ∈ V.
(c) ‖α + β‖ ≤ ‖α‖ + ‖β‖ for all α, β ∈ V.
Proof: (a) Definition 3.2(c) implies that ‖α‖ > 0 for every α ∈ V − {0}. Note also that ‖α‖ = 0 if and only if α = 0.
(b) ‖zα‖ = ⟨zα, zα⟩^{1/2} = (zz̄⟨α, α⟩)^{1/2} = (|z|²⟨α, α⟩)^{1/2} = |z| ‖α‖.
(c) In order to prove the triangle inequality, we need to recall the definition of the real part, Re(z), of a complex number z. If z = a + bi, then Re(z) = a. Note that z + z̄ = 2 Re(z) and Re(z) ≤ |z|.
Now suppose α and β are vectors in V. Then ‖α + β‖² = ⟨α + β, α + β⟩ = ‖α‖² + 2 Re⟨α, β⟩ + ‖β‖² ≤ ‖α‖² + 2|⟨α, β⟩| + ‖β‖² ≤ ‖α‖² + 2‖α‖ ‖β‖ + ‖β‖² = (‖α‖ + ‖β‖)². Taking square roots gives us (c). □
Any complex inner product space is of course a vector space over ℝ since ℝ ⊂ ℂ. Corollary 3.13 says that any complex inner product space is a normed linear vector space over ℝ. Actually, 3.13(b) is a stronger statement than what is required in Definition 1.1(b) of Chapter IV. A complex vector space V together with a real valued function ‖·‖: V → ℝ satisfying conditions (a)–(c) in 3.13 is called a complex, normed linear vector space. Thus, Corollary 3.13 says that any complex inner product space (V, ⟨ , ⟩) is a complex, normed linear vector space relative to ‖α‖ = ⟨α, α⟩^{1/2}.
As usual, we shall call the norm ‖α‖ = ⟨α, α⟩^{1/2} defined by the inner product on V the norm associated with ⟨ , ⟩. Topological statements about a complex inner product space will always be relative to the associated norm on V. We can rewrite Schwarz's inequality as follows:

3.14: |⟨α, β⟩| ≤ ‖α‖ ‖β‖ for all α, β ∈ V
We now introduce the same definition discussed in Section 1 for real inner
product spaces.
Definition 3.15: Let (V, ‖·‖) be a complex, normed linear vector space. We say (V, ‖·‖) is a pre-Hilbert space if there exists a complex inner product on V such that ‖α‖ = ⟨α, α⟩^{1/2} for all α ∈ V. If the pre-Hilbert space (V, ‖·‖) is complete, then (V, ‖·‖) is called a Hilbert space.
The inner product spaces ℂⁿ and ℓ² are (complex) Hilbert spaces. The space ⊕ₖ₌₁^∞ ℂ is a pre-Hilbert space, but not a Hilbert space.
It is not our intention at this point to give the complex analog of every theorem in Section 1 for complex pre-Hilbert spaces. We shall say just a few words about some of these results. Let (V, ⟨ , ⟩) be a complex inner product space. Two vectors α and β in V are said to be orthogonal if ⟨α, β⟩ = 0. If α and β are orthogonal, we shall write α ⊥ β. Since ⟨α, β⟩ and ⟨β, α⟩ are complex conjugates, we see α ⊥ β if and only if β ⊥ α. We shall use the same terminology introduced in Definition 1.16 for complex inner product spaces.
The parallelogram law, Corollary 1.18, the Gram–Schmidt theorem, Bessel's inequality, and so on are all true in any complex inner product space (with only minor changes in their statements and proofs). For example, Bessel's inequality becomes the following statement in a complex inner product space: Let {φₖ} be an orthonormal sequence in V, and set zₖ = ⟨α, φₖ⟩ for all k ∈ ℕ. Then Σₖ₌₁^∞ |zₖ|² ≤ ‖α‖². We shall cover these results in the exercises at the end of this section.
We shall be interested in finite-dimensional, complex inner product spaces in
the next section. For these spaces, the complex analog of Theorem 1.21 can be
proved purely algebraically.
Theorem 3.16: Let (V, ⟨ , ⟩) be a finite-dimensional, complex inner product space. Let W be a subspace of V. Then V = W ⊕ W⊥.
Proof: Applying the Gram–Schmidt theorem to W, we can find an orthonormal basis {α₁, ..., αᵣ} of W. If α ∈ V, then β = Σₖ₌₁ʳ ⟨α, αₖ⟩αₖ is a vector in W such that α − β is orthogonal to W. Thus, α = β + (α − β) ∈ W + W⊥. Therefore, V = W + W⊥. Since we always have W ∩ W⊥ = (0), V = W ⊕ W⊥. □
It easily follows from Theorem 3.16 that W⊥⊥ = W for any subspace W of V. We leave this as an exercise at the end of this section.
EXERCISES FOR SECTION 3
(1) Verify that the standard inner product on ℂⁿ indeed satisfies the conditions listed in 3.1.
(2) Show that the map ⟨ , ⟩′ given in Example 3.4 is a complex inner product on ℂ².
(3) Do the same for the map given in Example 3.7.
(4) Show that ⟨ , ⟩₁ given in equation 3.10 is a complex inner product on W^C.
(5) Show that Examples 3.3 and 3.6 are the complexifications of Examples 1.2
and 1.3, respectively.
(6) Let (V, ⟨ , ⟩) be a complex inner product space. Prove the Gram–Schmidt theorem in this setting. Thus, if {αₖ} is a finite or infinite sequence of linearly independent vectors in V, show that there exists an orthonormal sequence {φₖ} in V such that L({α₁, ..., αⱼ}) = L({φ₁, ..., φⱼ}) for all j.
(7) Show that the parallelogram law is valid in any complex inner product space V. Thus, ‖α + β‖² + ‖α − β‖² = 2(‖α‖² + ‖β‖²).
(8) The complex analog of Lemma 1.17(b) is not true. If α and β are orthogonal, show ‖α + β‖² = ‖α‖² + ‖β‖². Give an example that shows the converse is false.
(16) Find an orthonormal basis of the complex inner product space given in
Exercise 15.
(17) What do all complex inner products on C look like?
(18) Let ⟨ , ⟩ denote the standard inner product on ℂ². Show there is no nonzero matrix A such that ⟨α, αA⟩ = 0 for all α ∈ ℂ².
4. NORMAL OPERATORS
space. Then for every T ∈ Hom_ℂ(V, V), and every β ∈ V, there exists a unique vector γ ∈ V such that ⟨Tα, β⟩ = ⟨α, γ⟩ for all α ∈ V.
Proof: Let T ∈ Hom_ℂ(V, V) and β ∈ V. Then the map f(α) = ⟨Tα, β⟩ is clearly a linear transformation from V to ℂ. Thus, f ∈ V*. By Lemma 4.1, there exists a unique vector γ ∈ V such that f(α) = ⟨α, γ⟩ for all α ∈ V. Thus, ⟨Tα, β⟩ = ⟨α, γ⟩ for all α. □
+ i⟨Tλ₁, α₂⟩ = ⟨Tμ₁, μ₂⟩ − i⟨Tμ₁, λ₂⟩ + i⟨Tλ₁, μ₂⟩ + ⟨Tλ₁, λ₂⟩ = ⟨μ₁, T*μ₂⟩ − i⟨μ₁, T*λ₂⟩ + i⟨λ₁, T*μ₂⟩ + ⟨λ₁, T*λ₂⟩ = ⟨μ₁ + iλ₁, T*μ₂ + iT*λ₂⟩₁ = ⟨α₁, (T*)^C(α₂)⟩₁.
(a) (T*)* = T.
(b) (S + T)* = S* + T*.
(c) (zT)* = z̄T*.
(d) (ST)* = T*S*.
(e) If α is an orthonormal basis of V, then Γ(α, α)(T*) is the conjugate transpose of Γ(α, α)(T).
Proof: All the above assertions follow immediately from the functional equation ⟨Tα, β⟩ = ⟨α, T*β⟩, α, β ∈ V.
We prove (e) and leave the rest as exercises at the end of this section. Suppose α = {α₁, ..., αₙ} is an orthonormal basis of V. Set Γ(α, α)(T) = (zₖⱼ) ∈ Mₙₓₙ(ℂ). Then T(αⱼ) = Σₖ₌₁ⁿ zₖⱼαₖ for all j = 1, ..., n. If T*(αⱼ) = Σₖ₌₁ⁿ wₖⱼαₖ, then we have wₖⱼ = ⟨T*αⱼ, αₖ⟩ = ⟨αⱼ, Tαₖ⟩ = z̄ⱼₖ. Thus, the conjugate transpose of Γ(α, α)(T) is Γ(α, α)(T*). □
Definition 4.9: Let (V, ⟨ , ⟩) be a finite-dimensional, complex inner product space. Let T ∈ Hom_ℂ(V, V).
(a) T is Hermitian if T* = T.
(b) T is unitary if T*T = 1_V.
(c) T is normal if T*T = TT*.
Obviously, a Hermitian operator is normal. Since dim_ℂ(V) < ∞, T*T = 1_V if and only if TT* = 1_V. Thus, a unitary operator is normal also. Hermitian and unitary operators are the complex analogs of self-adjoint and orthogonal operators on real inner product spaces. In fact, we have the following theorem:
Proof: These results follow immediately from Theorem 4.5 and the fact that the complexification of a product of two endomorphisms is the product of their complexifications. Thus, if T is self-adjoint, then T = T*. Hence, (T^C)* = (T*)^C = T^C. Therefore, T^C is Hermitian. If T is orthogonal, then T*T = 1_W. Let I denote the identity map on the complexification W^C. Then I = (1_W)^C = (T*T)^C = (T*)^C T^C = (T^C)* T^C. Therefore, T^C is unitary. □
In terms of the complex inner product on V, the definitions of Hermitian and unitary can be rewritten as follows:

4.11: (a) T is Hermitian if and only if ⟨Tα, β⟩ = ⟨α, Tβ⟩ for all α, β ∈ V.
(b) T is unitary if and only if ⟨Tα, Tβ⟩ = ⟨α, β⟩ for all α, β ∈ V.
In the last part of this section, we discuss normal operators. Since Hermitian and unitary operators are both normal, whatever we say applies to both types of operators.

c_T(X) = Πₖ₌₁ʳ (X − zₖ)^{nₖ}
Thus, the eigenvalues of a Hermitian operator are always real. Our next
lemma says that nilpotent Hermitian operators are zero.
Proof: We can assume that T^(2^m)(α) = 0 for some m ≥ 1. Set S = T^(2^(m−1)). Then S is Hermitian by Lemma 4.7(d). Since SS* = S² = T^(2^m), we see that SS*(α) = 0. But then, 0 = ⟨SS*α, α⟩ = ⟨S*α, S*α⟩ = ⟨Sα, Sα⟩ = ‖Sα‖². We conclude that S(α) = 0. We can now repeat this argument. We finally get T²(α) = 0. Then 0 = ⟨T²α, α⟩ = ⟨Tα, Tα⟩ = ‖Tα‖². Therefore, T(α) = 0. □
Proof: ‖T(α)‖² = ⟨Tα, Tα⟩ = ⟨α, T*Tα⟩ = ⟨α, TT*α⟩ = ⟨T*α, T*α⟩ = ‖T*(α)‖². Taking square roots gives us the desired result. □
One immediate application of Lemma 4.19 is the fact that ker(T) = ker(T*)
for any normal operator T. An important special case of this equality is the
following corollary:
Corollary 4.21 of course says that a unitary operator has all of its eigenvalues
lying on the unit circle in the complex plane C.
We need one more set of ideas before presenting the spectral theorem for normal operators. The reader will recall that an endomorphism T ∈ Hom_ℂ(V, V) is called idempotent if T² = T. A typical example of an idempotent map is the projection Proj_W( ) of V onto a subspace W. If T is idempotent, then V = ker(T) ⊕ Im(T). To see this, first note that any vector α ∈ V can be written in the form α = T(α) + (α − T(α)). Clearly, T(α) ∈ Im(T). Since T(α − T(α)) = T(α) − T²(α) = T(α) − T(α) = 0, we see α − T(α) ∈ ker(T). Therefore, V = ker(T) + Im(T). If α ∈ ker(T) ∩ Im(T), then α = T(β) for some β ∈ V. Then α = T(β) = T²(β) = T(α) = 0. This last equality comes from the fact that α ∈ ker(T). Thus, ker(T) ∩ Im(T) = (0), and V is the direct sum of ker(T) and Im(T).
Now, in general, when T is idempotent, the subspaces ker(T) and Im(T) are not orthogonal. Thus, V is not in general an orthogonal direct sum of ker(T) and Im(T). Those idempotent T for which ker(T) ⊥ Im(T) are called orthogonal projections. We can construct examples of orthogonal projections by following the same procedure as in Section 2. Suppose W is a subspace of V. Let {α₁, ..., αᵣ} be an orthonormal basis of W. We can define an operator Proj_W( ) on V by setting Proj_W(ξ) = Σᵢ₌₁ʳ ⟨ξ, αᵢ⟩αᵢ. The reader can easily check that Proj_W(Proj_W(ξ)) = Proj_W(ξ). Thus, the map Proj_W( ) is an idempotent operator on V. The image of Proj_W( ) is clearly W. One can also check that ξ − Proj_W(ξ) is orthogonal to W. It easily follows from this fact that ker(Proj_W( )) is orthogonal to Im(Proj_W( )). Thus, Proj_W( ) is an orthogonal projection of V onto W.
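In matrix terms, if Q has the orthonormal basis vectors α₁, ..., αᵣ as columns, the construction above is the matrix P = QQ*. A minimal sketch (the basis chosen here is illustrative):

import numpy as np

Q = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])        # orthonormal basis of a plane W in C^3
P = Q @ Q.conj().T                # P(xi) = sum <xi, alpha_i> alpha_i

assert np.allclose(P @ P, P)                        # idempotent
assert np.allclose(P, P.conj().T)                   # Hermitian
xi = np.array([1.0 + 1j, 2.0, 3.0 - 1j])
assert np.isclose(np.vdot(P @ xi, xi - P @ xi), 0)  # xi - P(xi) is orthogonal to W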
We shall need the following result about orthogonal projections:
Lemma 4.22: Suppose T is idempotent and normal. Then T is a Hermitian,
orthogonal projection.
Proof: Since T is idempotent, so is T*: (T*)² = (T²)* = T*. Our comments before this lemma now imply that V = ker(T) ⊕ Im(T) = ker(T*) ⊕ Im(T*).
Since T is normal, Lemma 4.19 implies that ker(T) = ker(T*). We claim that Im(T) = Im(T*). Since T commutes with T*, we have T*(Im(T)) ⊆ Im(T). Thus, Im(T*) = T*(V) = T*(ker(T) + Im(T)) = T*(ker(T*) + Im(T)) = T*(Im(T)) ⊆ Im(T). Reversing the roles of T and T* gives us the other inclusion Im(T) ⊆ Im(T*). Thus, Im(T) = Im(T*).
We now claim that T = T*. Since both maps are idempotent, they are both the identity map on Im(T) = Im(T*). Let α ∈ V. We can write α as α = α₁ + α₂, where α₁ ∈ ker(T) = ker(T*), and α₂ ∈ Im(T) = Im(T*). Then T(α) = T(α₁) + T(α₂) = T(α₂) = T*(α₂) = T*(α₁) + T*(α₂) = T*(α). We have now established that T is Hermitian.
For any Hermitian operator T on V, we have ker(T) = Im(T)⊥. The argument is exactly the same as in the self-adjoint case. If α ∈ ker(T), and β ∈ Im(T), then β = T(γ) for some γ ∈ V. In particular, we have ⟨α, β⟩ = ⟨α, Tγ⟩ = ⟨Tα, γ⟩ = 0. Therefore, ker(T) ⊆ Im(T)⊥. If α ∈ Im(T)⊥, and β is arbitrary, then 0 = ⟨α, Tβ⟩ = ⟨Tα, β⟩. We conclude that T(α) = 0. Thus, Im(T)⊥ ⊆ ker(T). Since ker(T) = Im(T)⊥, in particular, ker(T) ⊥ Im(T). Hence, T is an orthogonal projection. □
We can now state the spectral theorem for normal operators. With the
lemmas we have proved in this section, the proof of the Spectral Theorem is a
simple consequence of Theorem 4.23 of Chapter III. The reader is advised to
review Theorem 4.23 before proceeding further.
Theorem 4.23 (Spectral Theorem): Let (V, ⟨ , ⟩) be a finite-dimensional, complex inner product space. Let T ∈ Hom_ℂ(V, V) be a normal operator. Suppose the characteristic polynomial of T is c_T(X) = Πₖ₌₁ʳ (X − zₖ)^{nₖ}. Then there exists a set of pairwise orthogonal idempotents {P₁, ..., Pᵣ} ⊆ Hom_ℂ(V, V) having the following properties:
(a) P₁ + ⋯ + Pᵣ = 1_V.
(b) Σₖ₌₁ʳ zₖPₖ = T.
(c) For each k = 1, ..., r, Im(Pₖ) = {α ∈ V | T(α) = zₖα}.
If we set Vₖ = Im(Pₖ) for each k = 1, ..., r, then we also have
(d) dim_ℂ(Vₖ) = nₖ.
Σₖ₌₁ʳ zₖPₖ = T. We had also proved in Theorem 4.23 that P₁, ..., Pᵣ are pairwise orthogonal idempotents whose sum is 1_V. Hence, we have established (a) and (b).
In Theorem 4.23 of Chapter III, we had also established that Vₖ = Im(Pₖ) = ker((T − zₖ)^{nₖ}) for each k = 1, ..., r. Since T − zₖ is normal, Corollary 4.18 implies ker((T − zₖ)^{nₖ}) = ker(T − zₖ). This proves (c). The assertions in (d) and (e) were also established in 4.23 of Chapter III. The assertion in (g) follows from Corollary 4.22 of Chapter III. The only thing that remains to be proved is the statement in (f).
Each Pₖ is a polynomial in T. Thus, each Pₖ is a normal operator on V. Lemma 4.22 implies each Pₖ is a Hermitian, orthogonal projection of V onto Vₖ. Suppose k ≠ j, and let α ∈ Vₖ and β ∈ Vⱼ. Then α = Pₖ(α′) and β = Pⱼ(β′) for some α′, β′ ∈ V. Since PₖPⱼ = 0, we have ⟨α, β⟩ = ⟨Pₖ(α′), Pⱼ(β′)⟩ = ⟨α′, PₖPⱼ(β′)⟩ = ⟨α′, 0⟩ = 0. Thus, Vₖ ⊥ Vⱼ, and (f) is proved. □
The spectral theorem says that V decomposes into an orthogonal direct sum
of the eigenspaces Vk. If we choose an orthonormal basis (by Gram—Schmidt) of
each Vk and take their union, we get an orthonormal basis of V consisting
entirely of eigenvectors of T. Hence, we can restate Theorem 4.23 as follows:
Corollary 4.24: Let T be a normal operator on a finite-dimensional, complex
inner product space V. Then V has an orthonormal basis consisting of
eigenvectors of T. □
The matrix version of Theorem 4.23 is easy to state. The Hermitian adjoint A* of a complex matrix A is its conjugate transpose. Thus, A* = (Ā)ᵗ. A matrix U is unitary if U*U = I. A matrix A is Hermitian if A* = A. A change of orthonormal bases in ℂⁿ is given by a unitary matrix. Thus, the matrix version of 4.23 is as follows:
Corollary 4.25: Let A ∈ Mₙₓₙ(ℂ) be a normal matrix, that is, AA* = A*A. Then there exists a unitary matrix U such that UAU⁻¹ is diagonal. □
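Corollary 4.25 is easy to test numerically. The sketch below uses a rotation matrix, which is normal but not symmetric; since its eigenvalues ±i are distinct, the unit eigenvectors returned by np.linalg.eig are automatically orthogonal, so they assemble into the required unitary U:

import numpy as np

A = np.array([[0.0, -1.0],
              [1.0, 0.0]])
assert np.allclose(A @ A.conj().T, A.conj().T @ A)   # A is normal

eigenvalues, V = np.linalg.eig(A.astype(complex))
assert np.allclose(V.conj().T @ V, np.eye(2))        # V is unitary

# With U = V^{-1} = V*, U A U^{-1} is diagonal, as Corollary 4.25 asserts.
U = V.conj().T
assert np.allclose(U @ A @ np.linalg.inv(U), np.diag(eigenvalues))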
(10) Return to the example you gave in Exercise 3, and find an orthonormal basis of V consisting of eigenvectors for your normal operator.
(11) Suppose T is a nonzero, skew-Hermitian operator on V. Show the
eigenvalues of T are purely imaginary.
(12) Suppose T is a normal operator on V. Show that the Hermitian adjoint T*
of T is a polynomial in T.
(13) Let

A = | 1 0 0 0 |
    | 0 0 1 0 |
    | 0 1 0 0 |
    | 0 0 0 1 |

Find a unitary matrix U such that UAU⁻¹ is diagonal.
(14) Let A ∈ Mₙₓₙ(ℂ). Show there exists a unitary matrix U such that UAU⁻¹ is upper triangular.
Glossary of Notation

F an arbitrary field, 1
ℚ the field of rational numbers, 2
ℂ the field of complex numbers, 2
ℝ the field of real numbers, 2
the field of p elements, 2
ℤ the integers, 2
V an arbitrary vector space, 2
ℕ the natural numbers, 3
Fⁿ n-tuples of elements from F, 3
B^A all functions from A to B, 3
Vⁿ all functions from {1, ..., n} to V, 3
(aᵢⱼ) a matrix, 3
Mₘₓₙ(F) the set of m × n matrices over F, 3
F[X] polynomials in X over F, 3
C(I) the set of continuous functions on I ⊆ ℝ, 4
Cᵏ(I) k-times differentiable functions on I, 4
Riemann integrable functions on A, 4
Aᵗ the transpose of a matrix A, 4
Σᵢ Wᵢ the sum of the Wᵢ, 5
L(S) the linear span of S, 6
the set of all subsets of V, 6
⊕ᵢ∈S Vᵢ the direct sum of the Vᵢ, 33
T* the adjoint of T, 49
rk(T) the rank of T, 50
Tr(A) the trace of A, 53
a quadratic form, 54
φ a multilinear mapping, 59
C^∞(I) infinitely differentiable functions on I, 60
Mul_F(V₁ × ⋯ × Vₙ, V) the set of multilinear maps from V₁ × ⋯ × Vₙ to V, 61
V₁ ⊗_F ⋯ ⊗_F Vₙ the tensor product of V₁, ..., Vₙ, 64
V^C = V ⊗_ℝ ℂ, 103
T^C = T ⊗_ℝ 1_ℂ, 103
l.c.m.(f₁, ..., fₙ) a least common multiple of f₁, ..., fₙ, 104
m_T(X) the minimal polynomial of T, 105
the inclusion map, 107
V^K = V ⊗_F K, 107
T^K = T ⊗_F 1_K, 107
Mₘₓₙ(F[X]) m × n matrices with coefficients in F[X], 110
adj(A) the adjoint of A, 110
c_A(X) the characteristic polynomial of A, 111
Sp_F(T) the eigenvalues of T in F, 118
diag(a₁, ..., aₙ) n × n diagonal matrix, 121
a k × k subdiagonal matrix, 123
β̄ the conjugate of β in V^C, 143
e^A the exponential of A, 153
C(g(X)) the companion matrix of g(X), 161
|x| the absolute value of x, 171
‖·‖ a norm, 171
(V, ‖·‖) a normed linear vector space, 172
ℓ² the Hilbert space of square summable sequences, 173
d(α, β) the distance between α and β, 173
B_r(α) a ball of radius r about α, 173
d(β, A) the distance between β and A, 173
A° the interior of A, 173
Aᶜ the complement of A, 173
lim f
Subject Index
  quotient, 40
  real, 54
Span, linear, 6
Spectral theorem, 227, 250
Spectrum, 118
Standard basis, 10
Subfield, 16
Subsequence, 189
Subspace, 4
  cyclic, 126, 159
  independent, 34
  invariant, 114
Sum, 5
  external, 34
  internal, 35
Symmetric:
  matrix, 8
  power, 95
  relation, 38
Tensor product, 64
Topology, 174
Trace, 53
Transformation(s):
  adjoint, 49
  algebra of, 36
  bijective, 18
  bounded, 175
  Hermitian, 245
  identity, 28
Transitive, 38
Transpose, 4
Transposition, 85
Triangular matrix, 117
Uniform:
  continuity, 199
  norm, 177
Unique factorization, 101
Unitary map, 245
Upper triangular, 117
Values, characteristic, 117
Vectors:
  characteristic, 120
  independent, 9
Vector space(s):
  Banach, 200
  complete, 200
  complex, 236
  dual, 46
  Hilbert, 210
  inner product, 207
  normed linear, 172
  pre-Hilbert, 209
  quotient, 40
  real, 54
Wedge product, 87