Вы находитесь на странице: 1из 292

The Theory of Matrices

Second Edition
with Applications
I
!I Peter Lancaster
D,fHI' ' ' ' ' of Ma,hI""";,,
I University of Calgary
Calgary. Alberta. Canoda

I Miron Tismenetsky
IBM Scumtijic Cenlt!T
Ii Technion City
Haifa. Israel

This is a volume in
COMPUTER SClBNCR ANDAPPLIED MAmEMATICS
A Series ojMollOgraphs andTextboolcs

This series has been reru.uned


COMPUI'BR SCIENCR ANDSCIENTIFIC COMPUTING

Editor: WBR.NBR RHElNBOLT @


Academic Press
lo~gO New Yolk Boston
A completelist of titles in this series is available from the publisherupoII request. i Sydney Thkyo Toronto

L
I
This is an Academic Pressreprintreproduced directly from thepagesof a
I
titlefor which type,plates,or film no longerexist. Although not up to the
standards of the original, this method of reproduction makes it possible to
!
provide copiesof bookwhichwouldotherwise be outof print. I
I
I To our wives,
FindUs on the Web! http://www.apnet.com
I Edna and Fania

I
No ran..Qf lbiapabiic:atio-. ...., be n:poduccd or
~.la.,fOllllllfby., ....... decuoaic
ClI'IP~~~jlOGORlla&CII'
Illy .lafQllUliClllItOlalCaaII n=trinIl SJIIGD. ftbout
permiuioa ill wriliDa fnJm lhc pubIiIbM.

ACAI;)EMIC PRESS
A l>MIi",. ofHtueOllTlBrtJ,C'cI ComptJlJy
S2S B Stmlt, Suite 1900. SIB Diego, CA 92101-4495

Ub....., of<:anaress Catalo&lq .. PubllcadoR Data

LanRJll:r. Peter. Dare.


The ~ oflUllica.
(Con!Puter science aad appliecllllWhemalic:a)
~iDda.
L MMca. L TIIIIIaIllUky.M. (Minla) D. Tille.
m. Series.
QA188.t36 1984 512.9'43483-15175
ISBN ().12-435S60-9(lIIk. p8pu).

Pri~ in ~ UuitedStalesof America


9899mT98
CONTENTS CONTENTS
ix

3.3 Linear Combinations 78 6 The Jordan Canonical Form: A Geometric Approach


3.4 Linear Dependence and Independence 80
3.5 The Notion ofa Basis 6.1 Annihilating Polynomials 221
83
6.2 Minimal Polynomials 224
3.6 Sum and Direct Sum of Subspaces 87
6.3 Generalized Eigenspaces 229
3.7 Matrix Representation and Rank 91
3.8 Some Properties of Matrices Related to Rank 95 6.4 The Structure of Generalized Eigenspaces 232
6.5 The Jordan Theorem 236
3.9 Change of Basis and Transition Matrices 98
6.6 Parameters of a Jordan Matrix 239
3.10 Solution of Equations 100
6.7 The Real Jordan Form 242
3.11 Unitary and Euclidean Spaces 104
6.8 Miscellaneous Exercises 244
3.12 Orthogonal Systems 107
3.13 Orthogonal Subspaces III
3.14 Miscellaneous Exercises 113
7 Matrix Polynomials and Normal Forms
7.1 The Notion of a Matrix Polynomial 246
4 Linear Transformations and Matrices
7.2 Division of Matrix Polynomials 248
4.1 Linear Transformations 117 7.3 Elementary Operations and Equivalence 253
4.2 Matrix Representation of Linear Transformations 122 7.4 A Canonical Form for a Matrix Polynomial 256
4.3 Matrix Representations, Equivalence, and Similarity 127 7.5 Invariant Polynomials and the Smith Canonical Form 259
4.4 Some Properties of Similar Matrices 131 7.6 Similarity and the First Normal Form 262
4.5 Image and Kernel of a Linear Transformation 133 7.7 Elementary Divisors 265
4.6 InvertibleTransformations 138 7.8 The Second Normal Form and the Jordan Normal Form 269
4.7 Restrictions, InvariantSubspaces, and Direct Sums of Transformations 142 7.9 The Characteristic and Minimal Polynomials 271
4.8 Direct Sums and Matrices 145 7.10 The Smith Form: Differential and Difference Equations 274
4.9 Eigenvalues and Eigenvectors of a Transformation 147 7.11 Miscellaneous Exercises 278
4.10 Eigenvalues and Eigenvectors of a Matrix 152
4.11 1he Characteristic Polynomial 155
4.12 The Multiplicities of an Eigenvalue 159 8 The Variational Method
4.13 First Applications to Differential Equations 161 8.1 Field of Values. Extremal Eigenvalues of a HermitianMatrix 283
4.14 Miscellaneous Exercises 164 8.2 Courant-Fischer Theory and the RayleighQuotient 286
8.3 The Stationary Property of the Rayleigh Quotient 289
5 Linear Transformations in Unitary Spaces and Simple Matril;es 8.4 Problems with Constraints 290
8.5 The Rayleigh Theoremand Definite Matrices 294
5.1 Adjoint Transformations 168 8.6 The Jacobi-Gundelfinger-Frobenius Method
174 296
5.2 Normal Transformations and Matrices 8.7 An Application of the Courant-Fischer Theory 300
5.3 Hermitian, Skew-Hermitian, and Definite Matrices 178 8.8 Applications to the Theory of Small Vibrations
302
5.4 Square Root of a Definite Matrix and Singular Values 180
5.5 Congruence and the Inertia of a Matrix 184
5.6 Unitary Matrices 188 9 Functions of Matrices
5.7 Polar and Singular-Value Decompositions 190
5.8 IdempotentMatrices (Projectors) 194 9.1 Functions Defined on the Spectrumof a Matrix
9.2 305
5.9 Matrices over the Field of Real Numbers 200 Interpolatory Polynomials 306
5.10 Bilinear, Quadratic, and HermitianForms 202 9.3 Definition of a Function of a Matrix
9.4 308
5.11 Finding the Canonical Forms 205 Properties of Functions of Matrices 310
5.12 The Theory of Small Oscillations 208 1 9.5
9.6
Spectral Resolution of f(A) 314
5.13
5.14
Admissible Pairs of Matrices
Miscellaneous Exercises
212
217
I 9.7
Component Matrices and Invariant Subspaces
FurtherProperties of Functions of Matrices
320
322
CONTENTS CONTENTS
xi

9.8 Sequences and Series of Matrices 325 13.3 The Bezoutian and the Resultant 454
9.9 The Resolvent and the Cauchy Theorem for Matrices 329 13.4 The Hermite and the Routh-Hurwitz Theorems 461
9.10 Applications to Differential Equations 334 13.5 The Schur-Cohn Theorem 466
9.11 Observable and Controllable Systems 340 13.6 Perturbations of a Real Polynomial 468
9.12 Miscellaneous Exercises 345 13.7 The Lienard-Chlpan Criterion 470
13.8 The Markov Criterion 474
13.9 A Determinantal Version of the Routh-Hurwitz Theorem 478
10 Norms and Bounds for Eigenvalues 13.10 The Cauchy Index and Its Applications 482
10.1 The Notion of a Norm 350
10.2 A Vector Norm as a Metric: Convergence 354
14 Matrix Polynomials
10.3 Matrix Norms 358
10.4 Induced Matrix Norms 362 14.1 Linearization of a Matrix Polynomial 490
10.5 Absolute Vector Norms and Lower Bounds of a Matrix 367 14.2 Standard Triples and Pairs 493
10.6 The Gedgorin Theorem 371 14.3 The Structure of Jordan Triples 500
10.7 Gersgorin Disks and Irreducible Matrices 374 14.4 Applications to Differential Equations 506
10.8 The Schur Theorem 377 14.5 General Solutions of Differential Equations 509
10.9 Miscellaneous Exercises 380 14.6 Difference Equations 512
14.7 A Representation Theorem 516
14.8 Multiples and Divisors 518
11 Perturbation Theory 14.9 Solvents of Monic Matrix Polynomials 520
11.1 Perturbations in the Solution of Linear Equations 383
11.2 Perturbations of the Eigenvalues of a Simple Matrix 387
11.3 Analytic Perturbations 391 IS Nonnegativ~ Matrices
11.4 Perturbation of the Component Matrices 393 15.1 Irreducible Matrices
395 528
11.5 Perturbation of an Unrepeated Eigenvalue 15.2 Nonnegative Matrices and Nonnegative Inverses
397 530
11.6 Evaluation of the Perturbation Coefficients 15.3 The Perron-Frobenius Theorem (I)
399 532
11.7 Perturbation of a Multiple Eigenvalue 15.4 The Perron-Frobenius Theorem (II) 538
15.5 Reducible Matrices 543
12 Linear Matrix Equations and Generalized Inverses 15.6 Primitive and Imprimitive Matrices 544
15.7 Stochastic Matrices 547
12.1 The Notion of a Kronecker Product 406 15.8 Markov Chains 550
12.2 Eigenvalues of Kronecker Products and Composite Matrices 411
12.3 Applications of the Kronecker Product to Matrix Equations 413
12.4 Commuting Matrices 416 Appendix 1: A Survey of Scalar Polynomials 553
12.5 Solutions of AX + XB = C 421 Appendix 2: Some Theorems and Notions from Analysis 557
12.6 One-Sided Inverses 424 Appendix 3: Suggestions for Further Reading
12.7 Generalized Inverses 428 560
12.8 The Moore-Penrose Inverse 432
12.9 The Best Approximate Solution of the Equation Ax = b 435 Index
563
12.10 Miscellaneous Exercises 438

13 Stability Problems
13.1 The Lyapunov Stability Theory and Its Extensions 441
13.2 Stability with Respect to the Unit Circle 451

L
Preface

In this book the authors try to bridge the gap between the treatments of matrix
theory and linear algebra to be found in current textbooks and the mastery of
these topics required to use and apply our subject matter in several important
areas of application, as well as in mathematics itself. At the same time we
present a treatment that is as self-contained as is reasonably possible, beginning
with the most fundamental ideas and definitions. In order to accomplish this
double purpose, the first few chapters include a complete treatment of material to
be found in standard courses on matrices and linear algebra. This part includes
development of a computational algebraic development (in the spirit of the first
edition) and also development of the abstract methods of finite-dimensional
linear spaces. Indeed, a balance is maintained through the book between the
two powerful techniques of matrix algebra and the theory of linear spaces and
transformations.
The later chapters of the book are devoted to the development of material that
is widely useful in a great variety of applications. Much of this has become a part
of the language and methodology commonly used in modern science and en-
gineering. This material includes variational methods, perturbation theory,
generalized inverses, stability theory, and so on, and has innumerable applica-
tions in engineering, physics, economics, and statistics, to mention a few.
Beginning in Chapter 4 a few areas of application are developed in some
detail. First and foremost we refer to the solution of constant-coefficient systems
of differential and difference equations. There are also careful developments of
the first steps in the theory of vibrating systems, Markov processes, and systems
theory, for example.
The book will be useful for readers in two broad categories. One consists of
those interested in a thorough reference work on matrices and linear algebra for
Use in their scientific work, whether in diverse applications or in mathematics
xiii

i
i

1_
PREFACE
PREFACE xv
xrv
and in others hints for solution are provided. These are seen as an integral part of
itself. The other category consists of undergraduate or graduate students in a
the book and the serious reader is expected to absorb the information in them as
variety of possible programs where this subject matter is required. For example,
well as that in the text.
foundations for courses offered in mathematics, computer science, or engin~r In comparison with the 1969 edition of ' 'The Theory of Matrices" by the first
ing programs may be found here. We address the latt~r audience in.more detail. author, this volume is more comprehensive. First, the treatment of material in the
The first seven chapters are essentially self-contame~ and require no form~1 first seven chapters (four chapters in the 1969 edition) is completely rewritten
prerequisites beyond college algebra. However, expenen~e sug~ests that. this and includes a more thorough development of the theory of linear spaces and
material is most appropriately used as a second course in matnces or linear transformations, as well as the theory of determinants.
algebra at the sophomore or a more senior level. . . Chapters 8-11 and 15 (on variational methods, functions of matrices, norms,
There are possibilities for several different courses depending on specific perturbation theory, and nonnegative matrices) retain the character and form of
needs and specializations. In general, it would not be nec~ssary to .work sys~em chapters of the first edition, with improvements in exposition and some addition-
atically through the first two chapters. They serve to establish notatl~n, termtno~ al material. Chapters 12-14 are essentially extra material and include some quite
ogy, and elementary results, as well as some deeper results concernmg determi- recent ideas and developments in the theory of matrices. A treatment of linear
nants, which can be developed or quoted when required. Indeed, the first two equations in matrices and generalized inverses that is sufficiently detailed for
chapters are written almost as compendia of primitive definitions, res~lts, and most applications is the subject of Chapter 12. It includes a complete description
exercises. Material for a traditional course in linear algebra, but WIth more of commuting matrices. Chapter 13 is a thorough treatment of stability questions
emphasis on matrices, is then contained in Chapters 3-6, with the possibility of for matrices and scalar polynomials. The classical polynomial criteria of the
replacing Chapter 6 by Chapter 7 for a more algebraic development of the Jordan nineteenth century are developed in a systematic and self-contained way from the
normal form including the theory of elementary divisors. more recent inertia theory of matrices. Chapter 14 contains an introduction to the
More advanced courses can be based on selected material from subsequent recently developed spectral theory of matrix polynomials in sufficient depth for
chapters. The logical connections between these chapters are indicated below to many applications, as well as providing access to the more general theory of
assist in the process of course design. It is assumed that in order to absorb any of matrix polynomials.
these chapters the reader has a reasonable grasp of the first seven, as well as The greater part of this book was written while the second author was a
some knowledge of calculus. In this sketch the stronger connections are denoted Research Fellow in the Department of Mathematics and Statistics at the Universi-
by heavier lines. ty of Calgary. Both authors are pleased to acknowledge support during this
period from the University of Calgary. Many useful comments on the first edition
are embodied in the second, and we are grateful to many colleagues and readers
for providing them. Much of our work has been influenced by the enthusiasms of
co-workers I. Gohberg, L. Rodman, and L. Lerer, and it is a pleasure to
acknowledge our continued indebtedness to them. We would like to thank H. K.
8 Wimmer for several constructive suggestions on an early draft of the second
edition, as well as other colleagues, too numerous to mention by name, who
made helpful comments.
The secretarial staff of the Department of Mathematics and Statistics at the
University of Calgary has been consistently helpful and skillful in preparing the
typescript for this second edition. However, Pat Dalgetty bore the brunt of this
work, and we are especially grateful to her. During the period of production we
have also benefitted from the skills and patience demonstrated by the staff of
Academic Press. It has been a pleasure to work with them in this enterprise.
Prerequisite structure. by chapters.

There are many exercises and examples throughout the book. These range
from computational exercises to assist the reader in fixing ideas, to extensions of P. Lancaster M. Tismenetsky
the theory not developed in the text. In some cases complete solutions are given, Calgary Haifa
CHAPTER 1

Matrix Algebra

An ordered array of mn elements alj (i = 1,2, ... , m; j = 1,2, ... , n)


written in the form

all a12 ... atn]


a21 a22 .,. a2n
A = .
.,, ...
[ ..
ami am 2 a mn

is said to be a rectangular m x n matrix. These elements can be taken from


an arbitrary field IF. However, for the purposes of this book, IF will always
be the set of all real or all complex numbers, denoted by Rand C, respectively.
A matrix A may be written more briefly in terms of its elements as

or

where ali (l ::;; i ::;; m, 1 ::;; j ::;; n) denotes the element of the matrix lying on
the intersection of the ith row and the jth column of A.
Two matrices having the same number of rows (m) and columns (n) are
matrices of the same size. Matrices of the same size

and

are equal if and only if all the corresponding elements are identical, that is,
ai j = blj for 1 ::;; i ::;; m and 1 ::;; j ::;; n.
The set of all m x n matrices with real elements will be denoted by Rm x n.
Similarly, em x n is the set of all m x n matrices with complex elements.
1.1 031'"UAL I YI'Ii.S OF MATRICES
3
I MATRIX ALGEBRA

1.1 Special Types of Matrices and in the case a = 0, a square zero-matrix

If the number of rows of a matrix is equal to the number of columns, that


is, m"'" n, then the matrix is square or of ordern:

is obtained. A rectangular matrix with all its elements zero is also referred
A = [aiJi,i= I = [::: ::; ::: :::]. to as a zero-matrix.
A square matrix A is said to be a Hermitian (or self-adjoint) matrix if the
a,,1 a,,2 ." a"" elements on the main diagonal are real and whenever two elements are
The elements al " a22' . , , , a"" of a square matrix form its main diagonal, positioned symmetrically with respect to the main diagonal, they are mutually
whereas the elements al", a2.,,-I, ... , a"l generate the secondary diagonal of complex conjugate. In other words, Hermitian matrices are of the form
the matrix A.
Square matrices whose elements above (respectively, below) the main al l al2 .. ,al"l
diagonal are zeros, a~2 a~2 ,.. a~",
[ ., .
A - [:::
1- :
a~2
. .
~]
0'
A
2-:
_la~l ::: '.:~
'. SO that
aln a2n ann
aji = (iij for i = 1, 2, ... , n, j = 1, 2, ... , n, and aii denotes the com-
a~1 .: an.~~l ann 0 0
plex conjugate of the number ai)'
If all the elements located symmetrically with respect to the main diagonal
are called lower- (respectively, upper-) triangular matrices. are equal, then a square matrix is said to be symmetric:
Diagonal matrices are a particular case of triangular matrices, for which
all the elements lying outside the main diagonal are equal to zero: all al2 aln1
a12 an a2n
.. ..

[ a~l: a~2 ~0] = dlag[au,


.
.., ..
. . [
A= a22'' an"J. a2n al n a nll
It is clear that, in the case of a real matrix (i,e., consisting of real numbers),
o ... 0 an" the notions of Hermitian and symmetric matrices coincide.
If all = a22 = ... = ann = a, then the diagonal matrix A is called a scalar Returning to rectangular matrices, note particularly those matrices having
matrix; only one column (column-matrix) or one row (row-matrix) of length, or

[~
size, n:

A = ; :o<a= ] = diag[a, a, ... , aJ.


o ...
In particular, if a = 1, the matrix A becomes the unit, or identity matrix

[i ~ ~l
The reason for the T symbol, denoting a row-matrix, will be made clear in
Section 1.5.
Such n x 1 and 1 x n matrices are also referred to as vectors or ordered
1- 0 n-tuples, and in the cases n = 1, 2, 3 they have an obvious geometrical

1__--
1 MATRIX t\.U.,iHIJIU\ I . ..! HIE UI'''KAnUNS UI' ADDITION AND ;~CALAII. MULTI!'LlCATlON J

v z z R(x1 +X 2'Yl +Y2. Z1 +Z2 1


I
I
Xo P(xo I ,
,
I

,, ,
o. po .. X __ ~~~_X

,
P(X 1.Vl' Z,l

Fig. 1.1 Coordinatesand positionvectors. /"'"=----------.V


x
meaning as the coordinates of a point P (or as components of the vector OF) Fig.I.1 The parallelogramlaw.
in one-,two-, or three-dimensional space with respect to the coordinate axes
(Fig. 1.1).
For example, a point P in the three-dimensional Euclidean space, having For ordered n-tuples written in the form of row- or column-matrices this
---+ operation is naturally extended to '
Cartesian coordinates (xo, Yo, %0), and the vector OP, are associated with
the 1 x 3 row-matrix [xo Yo %0]' The location of the point P, as well as [Xl X2 .. , X n] + [Yl Y2 .. Yn]
---+
of the vector OP, is described completely by this (position) vector. ~ [x, + Yl X2+ Yz .. , X n + YnJ
Borrowing some geometrical language, the length of a vector (or position or
vector) is defined by the natural generalization of Euclidean geometry: for

~ ~ ;:].
a vector 6 with elements b h b2 , , bn the length is

16\ ~ (Ib l l2 + Ib2 12 + ... + Ibn I2) 1/2. : ,: ] + [::;:] [::


[
Note that, throughout this book, the symbol ~ is employed when a x, )n Xn + Yn
relation is used as a definition. That is, the elements (or components, or coordinates) of the resulting vector
are merely the sums of the corresponding elements of the vectors. Note that
only vectors of the same size may be added.
Now the following definition of the sum of two matrices A = [ai;J7:j':: I
1.2 The Operations of Addition and Scalar Multiplication and B = [bj.il~J':: I of the same size is natural:
A +B ~ [alj + bjj J7:j':: r-
Since vectors are special cases of matrices, the operations on matrices will The properties of the real and complex numbers (which we refer to as
be defined in such a way that, in the particular cases of column matrices and scalars) lead obviously to the commutative and associative laws of matrix
addition.
of row matrices, they correspond to the familiar operations on position
vectors. Recall that, in three-dimensional Euclidean space, the sum of two Exercise 1. Show that, for matrices of the same size,
position vectors is introduced as
A + B = B + A,
[Xl Yl %1] + [X2 Y2 %2] ~ [Xl + X2 Yl + Y2 %1 + %2]. (A + B) + C = A + (B + C). 0
This definition yields the parallelogram law of vector addition, illustrated T~ese rules allow easy definition and computation of the sum of several
in Fig. 1.2. matfices of the same size. In particular, it is clear that the sum of any
o 1 MATRIX ALGEBRA 1.3 MATRIX MULTIPLICATION 7

number of n x n upper- (respectively, lower-) triangular matrices is an multiplication and addition of matrices follow immediately from the defini-
upper- (respectively, lower-) triangular matrix. Note also that the sum of tions and from the corresponding properties of numbers.
several diagonal matrices ofthe same order is adiagonal matrix. Exercise 2. Check that for any two matrices of the same size and any
The operation of subtraction on matrices is defined as for numbers.
scalars rx, P
Namely, the difference of two matrices A and B of the same size, written
A - B, is a matrix X that satisfies OA = 0,
X+B=A.
(IX + P)A = + PA,
r:t.A
IX(A + B) = rxA + rxB,
Obviously, IX(PA) = (ap)A. 0
Note that by writing (-l)B ~ -B we may, alternatively, define the
where difference A - B as the sum of the matrices A and - B.

It is clear that the zero-matrix plays the role of the zero in numbers: a
matrix does not change if the zero-matrix is added to it or subtracted from it. 1.3 Matrix Multiplication
Before introducing the operation of multiplication of a matrix by a scalar,
recall the corresponding definition for (position) vectors in three-dimensional
Euclidean space: If aT = [al a2 a3] and oc denotes a real number. then Like other operations on matrices, the notion of the product of two
the vector (UI T is defined by matrices can be motivated by some concepts of vector algebra. Recall that
in three-dimensional Euclidean space, the scalar (or dot, or inner) product
rxaT !! [lXal rxa2 oca3]'
of the (position) vectors a and b is an important and useful concept. It is
Thus. in the product of a vector with a scalar. each element of the vector is defined by
multiplied by this scalar.
a' b~; /allblcos IX, (1)
This operation has a simple geometrical meaning for real vectors and
scalars (see Fig. 1.3).That is. the length ofthe vectores" is loci times the length where ex denotes the angle between the given vectors. Clearly, for nonzero
of the vector aT. and its orientation does not change if rx > 0 and it reverses vectors a and b. the scalar product is equal to zero if and only if II and bare
if IX < O. orthogonal.
Passing from (position) vectors to the general case, the product of the . If the coordinates al. a2. a3 and blJ b 2 , b 3 of the vectors a and b, respec-
matrix A = [au] with a scalar rx is the matrix C with elements cl j = IXQj}. tively, are known, then the computation of the scalar product is simple;
that is, C A [rxa j } ] . We also write C = rxA. The following properties of scalar
a' b = alb} + a2b2 + a3b3' (2)
The crux of the definition of matrix multiplication lies in the convention
of defining the product of row vector aT with column vector b (in this order)
to be the sum on the right of Eq. (2). Thus,

[a, a, a,J[::] a,6, + a,b, + a,6,. (3)


l I
Fig. 1.3 Scalar multiplication.
.T6 _ f>,

and this definition will now supersede the" dot" notation of Eqs, (l) or (2).
Not unnaturally, it is described as "row-into-column" multiplication.
8 1 MATRIX ALGEBRA 1.3 MATRIX MULTIPLICATION 9

In contrast to Eq, (I), the definition in (3) lends itself immediately to rows of the second. When this is the case, A and B are said to be conformable
broad generalizations that will be very useful indeed. First, the elements of (with respect to matrix multiplication).
a andb may be from any field (i,e., they are not necessarily real numbers), Exercise 2. If a is a column matrix of size m and bT is a row matrix of size
and second, the vectors may equally well have n elements each, for any n show that aliT is an m x n matrix. (This matrix is sometimes referred to
positive integer n. as an outer product of a and b.) 0
Thus, if a, b are each vectors with n elements (the elements can be either
real or complex numbers), define The above definition of matrix multiplication is standard; it focusses on
the rows of the first factor and the columns of the second. There is another
representation of the product AB that is frequently useful and, in contrast,
aTb = [al a2 . . . aJlJI:::] A t arbr
. r-l
(4) is written in terms of the columns of the first factor and the rows of the second.
First note carefully the conclusion of Exercise 2. Then iet A be m x I and
bn B be I x n (so that A and B are conformable for matrix multiplication).
Suppose that the columns of A are a10 a2, ... , a, and the rows of Bare
This idea will now be exploited further in the full definition of matrix multi- 61.. bI, ... , bT. Then it is claimed that
plication.
I
Let A be any matrix consisting of m rows aT (1 ~ i ~ m) of length
l(soAism x l)andletBdenoteamatrixcontainingncolumnsbJ(l ~j ~ n)
AB = L akbr,
k=l
(6)
of the same length I (so that B is I x n). Then the product of A and B is
defined to be the m x n matrix C = [C'J]' where
and of course each product Ilk br is an m x n matrix.
Exercise 3. Establish the formula in (6). 0
CiJ = aTbJ (1 ~ i ~ m, 1 ~ j ~ n).
Thus Matrix multiplication preserves some properties of multiplication of
scalars.

AD = l1}b' b, bJ Ex~rcise 4. Check that for any matrices of appropriate orders


AI = IA = A,
A(B + C) = AB + AC, (B + C)D = BD + CD (distributive laws),
a1b alb2
1 albn] A(BC) = (AB)C (associative law). 0
.c.. IIIb l alb 2 alb n = [ "'b ].... ,n ~owever, the matrix product does not retain some important properties
- :: : - a. J J=I
( ., . enjoyed by multiplication of scalars, as the next exercises demonstrate.
lI~bl a~b2 a~bn
Exereise 5. Show, setting
Using (4) we get the multiplication formula

AB
I ]",.n
L aikbkJ.I.J=1
= [ k-l (5) and B= [ 012]
-1 0 0
Exercise 1. Show that or

[~-: ~m !l-[~ -:] 0


that AB :f. BA.
A
0
= [~ ~] and B= [ 10]
-1 0'

It is important to note that the product of two matrices is defined if and


only if the number of columns of the first factor is equal to the number of Thus, matrix multiplication is, in general, not commutative.
AV

Exercise 6. Show that AB = 0 does not imply that either A or B is a zero- Answers.
matrix. (In other words, construct matrices A :p. 0, B :p. 0 for which AB = 0).
d- C 0] a b C]
Exercise 7. Check that if AB = AC, A ~ 0, it does not always follow that (a) [ c d ' (b)
[
0 a b,
B=C. 0 o 0 a
A particular case of matrix multiplication is that of multiplication. of a where a, b, c, d denote arbitrary scalars. 0
matrix by itself. Obviously, such multiplication is possible if and only If the
It follows from Exercise 1.3.4 that the identity matrix commutes with any
matrix is square. So let A denote a square matrix and let p be any positive
square matrix. It turns out that there exist other matrices having this property.
integer. The matrix
Ageneral description of such matrices is given in the next exercise. Following
A"A AAA that, other special cases of commuting matrices are illustrated.
" J

,limes Exercise 2. Show that a matrix commutes with any square conformable
is said to be the pth power of the matrix A, and A 0 is defined to be the identity matrix if and only if it is a scalar matrix.
matrix with the size of A. Note also that A 1 A A.
Exercise 3. Prove that diagonal matrices of the same size commute.
Exercise 8. Verify that
Exercise 4. Show that (nonnegative integer) powers of the same square
matrix commute.

Exercise 5. Prove that if a matrix A commutes with a diagonal matrix


Exercise 9. Prove the exponent laws; diag[a .. a2, ... , an], where ai aj, i i, then A is diagonal. 0
A"A' = A"+', Another important class of matrices is formed by those that satisfy the
(A")' = A"', equation

where p, q are any nonnegative integers. 0 A2 = A. (1)


Such a matrix A is said to be idempotent.
Exercise 6. Check that the matrices

1.4 Special Kinds of Matrices Related to Multiplication


and A2
1 0-1
= [0
2 2]
o 0 1
Some types of matrices having special properties with regard to matrix are idempotent.
multiplication are presented in this section. As indicated in Exercise 1.3.5,
two matrices do not generally commute: AB :p. BA. But when equality Exercise 7. Show that if A is idempotent, and if p is a positive integer, then
does hold, we say that the matrix A commutes with B; a situation to be A"=A.
studied in some depth in section 12.4.
Exercise 1. Find all matrices that commute with the matrices Exercise 8. Describe the class of 2 x 2 idempotent matrices. 0

A square matrix A is called nilpotent if there is a positive integer p such


1 OJ
(a) [_~ ~J (b)
Io0
0 1 .
000
that

A" =0. (2)


I.J 88V""_ - - - - ----- - ---- -

Obviously, it follows from (2) that any integer kth power of A with k ~ P with a'1. + be = 1, or
is also a zero matrix. Hence the notion of the least integer Po satisfying(2) is
reasonable and is referred to as the degree of ntlpoteney of the matrix A.
withB=1.
Exercise 9. Check that the matrix
Exercise 16. Show that if A is an involutory matrix, then the matrix

o10]
B = !(l + A) is idempotent. 0
1
o 0
is nilpotent with degree of nilpotency equal to 3. 1.5 Transpose and Conjugate Transpose
Exercise 10. Prove that every upper-triangular matrix having zeros on the
main diagonal is nilpotent. In this section some possibilities for forming new matrices from a given
Exercise 11. Find all nilpotent 2 x 2 matrices having degree of nilpotency one are discussed.
Given an m x n matrix A = [a'j], the n x m matrix obtained by inter-
equal to two.
changing the rows and columns of A is called the transpose of the matrix A
Answer. and is denoted by AT. In more detail, if

where a'l. + be = 0 and not all of a, b, c are zero. 0


A = [:;: :;: ::: :::]. then AT A [:;: :i: ::: ~:].
a m1 am2 a mn al n a2n a mn
The last type of matrix to be considered in this section is the class of
involutory matrices. They are those square matrices satisfying the condition Note that the rows of A become the columns of AT and the columns of A
are the rows of its transpose. Furthermore, in view of this definition our
A'l. = I. (3) notation liT for a row matrix in the previous sections just means that it is the
Exercise 12. Check that for any angle 8, transpose of the corresponding column matrix II.
Exercise 1. The matrices
= [cos 8
is an involutory matrix.
A sin 8]
sin 8 -cos 8
A=[_; ~~] and s-u -~]
Exercise 13. Show that any matrix having 1 or -1 in all positions on the are mutually transposed matrices: A = BT and B = AT. 0
secondary diagonal and zeros elsewhere is involutory.
The following properties of the transpose are simple consequences of the
Exercise 14. Show that a matrix A is involutory if and only if definition.
(I - A)(l + A) = o. Exercise 2. Let A and B be any matrices of an appropriate order and let
e fF. Prove that
Exercise 15. Find all 2 x 2 involutory matrices. (AT)T == A.

Answer. (exA)T = AT,


(A + B)T = AT + BT,
(AB)T = BT AT. 0

),
..... \
.. ,."
)
l4 1 MATRIX ALGllBllA 1.5 'TRANSPOSE AND CONJUGATETRANSPOSE 15

Using the notion of transpose, a brief definition of a symmetric matrix The operations of transposition and conjugation are combined in the
(equivalent to that in Section 1.1) can be formulated. important concept of the conjugate transpose A* of a matrix A. It is defined
by A* A AT. In more detail, for A = [aij]i:i':. l'
Exercise 3. Show that a square matrix A is a symmetric matrix if and only if
AT == A. 0 (1)
all a21 .., ami]
Some additional properties of symmetric matrices follow easily from (1). A* A ~~2 ~~2 ... ~~2.
[
Exercise 4. Prove that if A and B are symmetric matrices, then so is A + B. al n a2n a mn

Exercise S. Show that for any square matrix A the matrices AAT, ATA and It is clear that for a real matrix A the notions of transpose and conjugate
T ' ,
A + A are symmetric. 0 transpose are equivalent.

A matrix A is called skew-symmetric jf Exercise 9. Check that the conjugate transposes of the matrices

Exercise 6. Check that the matrix


AT = -A. (2) . [2 -3]
A = 0 1 and B= [~ -3 ]
2i + 1
1 0
0-2
[ -3]
2 0-2
are the matrices

[-i 0]
[_~ ~ ~] =
A =
A* = AT and B* _
320 - -3 -2i +1'
is skew-symmetric. 0 respectively.
Obviously the elements of an n x n skew-symmetric matrix A = [a .] Exercise 10. (Compare with Exercise 2.) Show that (A*)* = A, (exA)* =
satisfy the conditions I)
aA*, (A + B)* = A* + B*, (AB)* = B*A*. 0
a/j=-aj/> t s t.l s;, Applying the notion of the conjugate transpose, a nice characterization of
and in particular, the elements of A on the main diagonal are zeros. a Hermitian matrix can be formulated.

Exercise 7. Check that for any square matrix A the matrix A - AT is Exercise 11. Prove that a square matrix A is Hermitian (as defined in
skew-symmetric. 0 Section 1.1) if and only if A = A*.
The assertion of Exercise 7 together with Exercise 5 admits the following Exercise 12. Show that for any matrix A, the matrices AA* and A*A are
decomposition: Hermitian.

Exercise B. Show that any square matrix A can be represented in the form Exercise 13. Check that, for any square matrix A, the matrices A + Alii and
i(A - A*) are Hermitian. 0
A = Al + A 2 ,
Observe that the 1 x 1 complex matrices (i.e., those with only one scalar
where Al is a symmetric matrix and A 2 a skew-symmetric matrix. 0 element) correspond precisely to the set of all complex numbers. Further-
~onside~ a ma~rix A = [aij]~:/:' 1 with elements from C. The complex more, in this correspondence the 1 x 1 Hermitian matrices correspond to
~onJu9a.te A of ~ IS defined as the matrix obtained from A by changing all the real numbers. More generally, it is useful to consider the embedding of
Its entries to their complex conjugates. t In other words, A A [a/j]7'i':. l' It is the Hermitian matrices in c:;n)( n as a generalization of the natural embedding
clear that a matrix A is real if and only if A == A. ' ~f iii in C. In particular, note the analogy between the following representa-
tion of a complex square matrix and the familiar Cartesian decomposition
t Recall that if a = c + di; c. d," R. then the complex conjugate is ii A c - di. of complex numbers.

J:
!
16 1 MATRIX ALQEBIlA. 1.6 SUBMATRICES AND PARTITIONS OF A MATRIX 17

Bxercise 14. Show that every square matrix A can be written uniquely in can be partitioned as
the form A = Al + iA2 , where At and A:z are Hermitian matrices. 0
A skew-Hermitian matrix A is defined in terms of the conjugate transpose
by A'" == -A.
A [~ }!_~_~ __~]
= __ or A = [~t"';--=:-j-~]
Exercise 15. Check that the matrix 2 -1: 0 0 2:-10:0
, ,
and in several other ways. In the first case, the matrix is partitioned into
submatrices (or blocks) labeled

is skew-Hermitian. A = [All Au]


A 2 1 A22 '
(1)
Exercise16. Show that the diagonal elements of a skew-Hermitian matrix
are pure imaginary. 0 where

All = [~ ~]. Au = [-24] 1 0' A 2 1=[2 -1], A 2 2 = [0 0].

1.6 Submatrices and Partitions of a Matrix In the second partition A can be written in the form

Given a matrix A = [aiJ]~'j~ b if a number of complete rows or columns


A= [All
A 21
Au
An A 2 3
Au],
of A are deleted, or if some complete rows and complete columns are deleted,
where
the new matrix that is obtained is called a submatrix of A.
Bxampld. If
All = [2], Au = [3 -2], A l3 = [4],
2 3-2 4] [~]. [_~ ~l [~l
[
A=O 1 1 0 ,
2 -1 0 0
Au = A22 =
Note that some blocks may be 1 x 1 matrices, that is, scalars.
A2 3 =
0
then the matrices The submatrices of a partitioned matrix are referred to as the elements of

[1], [2 3 -2 4],
a block-matrix. For example, the submatrices All'
be considered as the elements of the 2 x 2 block-matrix A.
Au,
A 2 1, A 2 2 of (1) may

The notions of a square block-matrix, a diagonal block-matrix, and so on,


are some of the submatrices of A. 0 are defined as for matrices with scalar elements (see Section 1.1).
There is a special way of dividing a matrix into submatrices by inserting Exampl,3. The matrices
dividing lines between specified rows and between specified columns (each
dividing line running the full width or height of the matrix array). Such a
division is referred to as a partition of the matrix.
A= [Auo AAu], B= [B0ll B0] 22
22

Exampl, 2. The matrix are upper-triangular and diagonal block-matrices, respectively. 0

A =
2 3
0 1
-2 4]
1 0
Square matrices admit some special partitions. If the diagonal blocks
All of a square matrix A are square matrices for each i, then such a partition
[ 2 -1 o 0 of A is said to be symmetric.
1 MATRIX ALGEBRA 1./ I"ULrNUllUAUi IN A MAnUJl

Example 4. The matrix Exercise 7. Prove that if each pair of blocks AI} and H,} of the matrices A
and B, respectively, are of the same size, then
A + B = [Ai} + BIj]ri"=I, A - B = [Ai} - B,j]r.'.J"=I'
Exercise B. Prove. that if the m x I matrix A has blocks A,,, of sizes
m, x I" where 1 SiS rand 1 S k S , and the I x n matrix B consists of
is a symmetrically partitioned matrix. blocks B,,} of sizes I" x nJ' where 1 S; j S; s, then [see Eq. (1.3.5)]
Example 5. Let the matrices A and B be partitioned as follows.
AH = L
II
AilcB"J
Jr.. 0
[
"=1 i,}=1
A = [Au Au]. B = [B ll
B u].
A2 1 A22 Bu B22 Note that two "conformable" conditions are required for performing
block-matrix multiplication. The number of blocks in a row of A must be
and be of the same size. Recalling the definition of matrix addition. it is
the same as the number of blocks ina column of B. Also, the number of
easily seen that if each pair of blocks Ai} and Bi} (1 S; i. j S; 2) is of the same columns in All< must be the same as the number of rows in B,,} for 1 S; k S;
size. then and all possible i and j.
A B = [All B l l Au B u ].
A 2 1 B2 1 A 2 2 B u
Clearly. if the partition of B differs from that of A. say. 1.7 Polynomials in a Matrix

B = [BB u Bu B 13] . Let A denote a square matrix and let


u Bu B23
then A and H cannot be added block by block. 0 p(.t) = ao + al.t + ... + a,.t' (a, : 0)
Block-matrix multiplication can also be performed in some cases. be a. pol~'nomial of ~egree I with scalar coefficients. The operations on
matnces mtroduced m Sections 1.2 and 1.3 can be combined to define a
Exercise 6. Let the conformable matrices A and B be partitioned in the matrix
following way:
p(A) A aDI + alA + .. , + a,A'
2 : 13] . ,"
A = [ 0 \ -1 2 A [Au Au], This is said to be a polynomial in A.
Exercise 1. Check that if p(.t) = 2 - .t + .t2 and

B=
024
['~;"~i"~':] [:::1 A A=010,
101
[
0 1 OJ
Check by a direct computation that then

AB = [Au A12 l[B =


ll] AuB u + A12B2 1 0 = 21 -
2 0 0]
+ A 2 = 0 2 O. 0
B2 1 p(A) A
[0 1 2
The above examples show that in certain cases,the rules of matrix addition
and multiplication carryover to block-matrices. The next exercises give the Si~ce every square matrix commutes with itself, all polynomials in a given
general results. JDatnx commute with one another. This fact is required for generalizing the
1 MATRIX ALOIlIlRA 1.8 MISCELLANEOUS EXERCISES 21
following familiar properties of scalar polynomials to polynomials in a E1&"cise 4. Let p(,\) - ,\3 + ,\2 + 2,\ + 2. Show that if
matrix.
A 0 0 000
Exercise 2. Show that if p, q are scalar polynomials and j.t 1 o 000
p(,t) + q(A) = h(A), p(A)q(A) = t(,t),
A=
o 0 j.t 0 0 0
then, for any square matrix A;
o 0 0 v 1 o'
o 0 0 0 v 1
peA} + q(A) = h(A), p(A)q(A) = teA). o 0 0 0 0 v
Verify also that if then
p(,t) = q(A) d(A) + r(A),
p(,t) 0 0 0 0 0
then, for any square matrix A;
0 p(j.t) p'{j.t) 0 0 0
PeA) = q(A) deAl + rCA). 0 P(A)=
0 0 p{j.t) 0 0 0
In the expression p(,t) = q(,t) d(,t) + r(,t), if r(,t) is the zero polynomial or 0 0 0 p(v) p'(v) !p"(v) ,
has degree less than the degree of d(A), then q(A) and rCA) are, respectively, 0 0 0 0 p(v) p'(v)
the quotient and remainder obtained on division of p(,t) by d(,t). In the case 0 0 0 0 0 p(v)
rCA) == 0, deAl is a divisor of peA). Exercise 2 shows that such a representation
carries over to polynomials in a matrix and, in particular, implies that if where p' and pIt denote the first and second derivatives of p, respectively.
deAl is a divisor of p(A), then deAl is a divisor of P(A) in the sense that PeA) = Exercise S. Let A be a symmetrically partitioned matrix of the form
Q deAl for some square matrix Q of the same order as A.
However, the well-known and important fact that a scalar polynomial of
degree I has exactly I complex zeros does not generalize nicely for a poly-
A= [A~l A;2}
nomial in a matrix. The next example illustrates this. Prove that for any positive integer n,
Example 3. Let
P(A) = (A - l)(A + 2) An = [A~l
and let A be an arbitrary 2 x 2 matrix. Then in view of Exercise 2, where Pn(A) = (An - l)/(A - 1). 0
PeA) = (A - IXA + 21) = A + A - 21.
2
(1)
In contrast to the scalar polynomial p(,t),the polynomial peA) has more than
two zeros, that is, matrices A for which P(A) = O. Indeed, since the product
of two matrices A-I and A + 21 may be the zero matrix even in the case 1.8 Miscellaneous Exercises
of nonzero factors (see Exercise 1.3.6),the polynomial (I), in addition to the
zeros 1 and -21,' may have many others. In fact, it is not difficult to check
that the matrix 1. Let a square matrix A e IIln >< n with nonnegative elements be such that the
sum of all elements of each of its rows (each row sum) is equal to one.
A= [-~ ~] Such a matrix is referred to as a stochastic matrix. Prove that the product
of.two stochastic matrices is a stochastic matrix.
is also a zero of P(A) for any number a. 1. A matrix A e IIln >< n is said to be doubly stochastic if both A and AT are
f These zeros of P(A) are scalar matrices. Obviously P(A) has no other scalar matrices as stochastic matrices. Show that the product of two doubly stochastic
zeros. matrices is a doubly stochastic matrix.
1 MATRIX ALGEBRA

3. The trace of a matrix A e e" x ", written tr A, is defined as the sum of all
the elements lying on the main diagonal of A. Prove the following.
(a) tr(<<A + PB) = IX tr A m+x Ptr B for any A, B e cn x n; , pee. CHAPTER 2
(b) If A e c"x m and Be e " then tr AB = tr BA.
(c) If A = [al}]~.J=1 e C"x", then tr AA = tr AA = D.J=llaijlz.
(d) If A, B e C"xn and A is idempotent, then tr(AB) = tr(ABA).
Determinants, Inverse Matrices,
4. Prove that if n ~ 2, there are no n x n matrices A and B such that
AB - BA = I".
I and Rank
Hint. Use Exercise 3.
S. Use induction to prove that if X, Y e !IF" x ", then for r = I, 2, ... , I
,-I
X' - Y' = L yJ(X - y)xr- 1 - J.
J=O
I
I
! The important concepts of the determinant of a matrix and the rank of a
matrix are introduced and examined in this chapter. Some applications of
these notions to the solution of systems of linear equations are also con-
sidered.

2.1 . Definition of the Determinant

In this section we define the most important scalar-valued function


associated with the elements of a square matrix-the determinant. Recall
that the notion of the determinant of 2 x 2 and 3 x 3 matrices has its
origin in solving linear systems like
allx + a12Y = bl>
(1)
a21 x + (l22Y = bz
for X,Y and
a llx + al2Y + a13z = bl>

aZlx + a22Y + aZ3z = bz , (2)


a31 x + a32Y + a33z = b3
for x, Y, z,

23
2 DETEIlMINANl'S, INVERSE MATRICES. AND RANK 2.1 DEFINITION OF THE DI!TI!RMlNANT 25

It is easily seen that in the case auazz - aUaZI :F 0 the solution (x, y) and is assumed to be a nonzero number. The matrices A;, A)', Az in (5) are
of (1) can be written in the form obtained by interchanging the appropriate column of A with the column
[b1 b2 b3]Tof right-hand terms from (2).
2a 1a To define the notion of determinant for an arbitrary n x n square matrix
x= b1 Z2 - b2 u , y= b 11 - b l1 , (3)
a a
a11a22 - aUa21 a11a22 - a12 a21 so that Cramer's rule remains true, observe that for a 3 x 3 matrix the
determinant consists of summands of the form a1ita2ha3j], wherei1,jz,i3
where the denominator in the above expressions is said to be the determinant
are the numbers 1, 2, 3 written in any possible order, that is, UUi2,i3) is a
of the coefficient matrix permutation ofthe numbers 1,2,3. In general, ifit, iI, ... ,in are the numbers
1,2, ... , n written in any order, then UUi2"" ,in) is said to be a permutation
of 1,2, ... , n.
Let A = [aij]r,J= 1 denote an arbitrary square matrix. Keeping in mind the
of system (1). Thus, by definition structure of determinants of small-sized matrices, we define a diagonal of A
as a sequence of n elements of the matrix containing one and only one element
det[a 11 au] A a11 a22 - a12 a21' from each row of A and one and only one element from each column of A.
a21 a22 Adiagonal of A is always assumed to be ordered according to the row indices;
Note also that the numerators in (3) can be written as determinants of the therefore it can be written in the form
matrices (7)
and whereU1,i2' ... ,in) is a permutation ofthenumbers 1, 2, ... , n. In particular,
if UUi2' ... ,in) = (1,2, ... , n), we obtain the main diagonal of A. In the
respectively. case UUi2' ... , jn) = (n, n - 1, "', 1), the secondary diagonal of A is ob-
Hence, if det A :F 0, the solution of (1) is tained (see Section 1.1). Clearly, an n x n matrix has exactly n! distinct
diagonals.
det A; det A)' (4)
x= detA' y= detA' Exercise.l. Check that the determinants of 2 x 2 and 3 x 3 matrices are
algebraic sums of products of the elements from each diagonal. 0
where the matrices A; and A" are obtained from the coefficient matrix of 1
the system by putting the column [b 1 b 2]T in the place of the appropnate For defining the signs ofthe products [in (6), for example], we need some
column in A. The formulas in (4) are known as Cramer's rule for finding the facts about permutations. We say that a pair of numbers it and i" in the
solution of the linear system (1). permutation U1,j2' ... ,in) form an inversion if it> l while k < p, that is,
Similarly, the solution of (2) can be found by Cramer's rule extended to if a larger number in the permutation precedes a smaller one. Each permuta-
the three-variable case: tionj == Uhi2' ... ,jn) has a certain number of inversions associated with it,
denoted briefly by t{j). This number is uniquely defined if the inversions are
det A; det A)' (5)
y= detA' counted successively, in the way illustrated in the next example. The permu-
x= detA'
tation is called odd or even according to whether the number t{j) is odd or
where the determinant of A is defined by even.This property is known as the parity of the permutation.
Exercise 2. Find t{j), where j = (2, 4, 3, 1, 5).
SoLUTION. The number 2 occurs before 1 and therefore forms an inversion.
~e number 4 is in inversion with both 3 and 1, while 3 forms an inversion
+ a13al1an - a13a22a31 - a12a21a33 - a11a23 a32 With 1. Thus, the total number of inversions t(]) in the given permutation is
(6) 4, and j is an even permutation.
i
2 D1rrERMINANTS. INVERSE MATRICIlS, AND RANK 2,2 PROPIlRTIIlS OF DETERMINANTS
27
Exercise 3. Show that interchanging two neighbouring elements in an it thus follows that
odd (respectively, even) permutation produces an even (respectively, odd)
one. [det AI = a12aZ4a33a41aSS'

Exercise 4. Prove that interchanging any two elements in a permutation ~onsider the corresponding permutation j = (2, 4, 3, I, 5). The number
changes it from odd to even or vice versa. to) IS equal to 4 (see Exercise 2) and hence the permutation is even. Thus,
Hint. Show that if there are k numbers between the given elements a and det A = a12 aZ4 a33 a4I aSS'
b, then 2k + 1 interchanges of neighbouring elements are required to inter-
change a and b. Exercise B. Check that
Exercise 5. Show that the parity of a permutation i can be decided by det diag[au. a2Z' .. , an,,] = allan'" a"".
counting the number of interchanges of pairs of elements required to trans-
form i to the sequence io in natural order, that is, io = (1,2, ... , n). (In Exercise 9. Verify that the determinant of a square matrix containing a
general, this transformation can be completed in many ways, but the number zero row (or column) is equal to zero.
of interchanges required will always be even or odd as t(]) is even or odd.) 0 Exercise 10. ~heck that the determinant of a triangular matrix is equal to
Now we are in position to give a definition of the determinant of an n x n the product of Its elements on the main diagonal.
matrix. Let A be a square matrix of order n. The determinant of A, denoted Exercise 11. Let the matrix A of order n have only zero elements above
det A, is defined by (or below) the secondary diagonal. Show that
det A ~ L (-l)!(J)alj,a2.h a"in' (8) detA = (_l)",,,-II/Zalnaz.n_l "anl . 0
i
where t(}) is the number of inversions in the permutation j = (jl,iz, ... ,in) Some general methods for computing determinants will be suggested later
and i varies over all n! permutations of I, 2, ... , n. after a study of the properties of determinants in the next section.
In other words, det A is a sum of n! products. Each product involves n
elements of A belonging to the same diagonal. The product is multiplied by
+ lor -1 according to whether the permutation U1>iz, ... ,in) that defines
the diagonal is even or odd, respectively. ' 2.2 Properties of Determinants
Exercise 6. Verify that the above definition of determinant coincides with
r.
those for 2 x 2 and 3 x 3 matrices.
In what follows A denotes an arbitrary n x n matrix. First we note some
Exercise 7. Evaluate det A, where properties of determinants that are immediate consequences ofthe definition.
0 au 0 0 0 Exercise 1. Prove that if Adenotes a matrix obtained from a square matrix
0 0 0 aZ4 0 A by muitiplying one of its rows (or columns) by a scalar k, then
A= 0 0 a33 a34 0 det A = k det A. 0
a41 a4Z 0 0 0
0 0 0 aS4 ass
Recall that a function f is homogeneous over a field IF iff(kx) = kf(x) for
~very k E IF. The assertion of Exercise 1 thus shows that the determinant
SOLUTION. Obviously, if a diagonal of a matrix contains a zero, then the IS a h~mog~neous function of the ith row (or column) of the matrix A.
corresponding term for the determinant (8) is zero. Since the given matrix ApplYing t~IS property n times, it is easily seen that for any scalar k and
has only one diagonal of possibly nonzero elements, namely, n)( n matnx A

det(kA) = k" det A. (1)

L~
"JIll'$JijM 'h
:l.2 I'ROPIiRTlES 01' iJllTlIRaflNANTll

A function f is additive if f(a + b) = f(a) + f(b) for every a, b in the Then the sign to be attached to the term (4) in det AT is (_I)'(k l, where k
domain of f. The following proposition shows that det A is an additive denotes the permutation
function of arow (or column) of the matrix A. k = (k h k 2 , , kn) . (5)
Exerei" 2. Prove that if each element of the ith row of an n x n .matrix A
To establishProperty 2 it remains to show only that permutations (3) and
can be representedin the form
(5) are both odd or both even. Consider pairs of indices
j = 1,2, . . . ,n, (t,jl), (2'}2)' ... , (n,}n) (6)
then corresponding to a term of det A and pairs
det A =: det B + det C, (k l,1), (k 2 , 2), ... , (kn , n) (7)
whereall rows of B and C except the ith one are identical to those of A, vnd . obtainedfrom (6) by a reordering according to the second entries. Observe
the ith rows are givenby that each interchange of pairs in (6) yields a simultaneous interchange of
numbers in the permutations
C, = [Cl C2 , cn] .
10 = (1,2, ... , n) and 1 = Ult12' ... ,in)'
Note that a similar result holds concerning the columns of A. 0
Thus, a sequence of interchanges applied to} to bring it to the natural order
Combining the results of Exercises 1 and 2 yields the following. io (referto Exercise 2.1.5) willsimultaneously transformio to k. But then the
samesequence of interchanges applied in the reverse order will transform k
Property 1. The determinant of a square matrix is an additive and hO'1(1.
to}.Thus,usingExercise 2.1.5, it is found that t{]) =: t(k).This completesthe
geneous !unction of the itk row(or column) of A, 1 SiS; n.
proof.
We now continue to establish several other important properties of tlhe
Note that this result establishes an equivalence between the rows and
determinant function.
columns concerning properties of determinants. This allows the derivation
Property Z. The determinant of A is equalto the determinant ofits transp,,!, -; of properties of determinants involving columns from those involving rows
ofthe transposed matrix and viceversa.Taking advantage of this,subsequent
det A =: det AT. proofs willbe only of properties concerning the rows of a matrix.
PROOF. Recalling the definition of the transpose, it is not difficult to !: ~e Property 3. If the n x n matrix B is obtained by interchanging two rows
that a diagonal

    {a_{1j_1}, a_{2j_2}, ..., a_{nj_n}}                                          (2)

of A, ordered, as usual, according to the row indices and corresponding to the permutation

    (j_1, j_2, ..., j_n)                                                          (3)

of 1, 2, ..., n, is also a diagonal of A^T. Since the element a_{ij} (1 <= i, j <= n) of A
is in the (j, i)th position in A^T, the second indices in (2) determine the row of A^T in
which each element lies. To arrive at a term of det A^T, the elements must be permuted so
that the second indices are in the natural order. Suppose that this reordering of the
elements in (2) produces the diagonal

    {a_{i_1 1}, a_{i_2 2}, ..., a_{i_n n}}.                                       (4)

The permutation (i_1, i_2, ..., i_n) is the inverse of the permutation (3) and therefore has
the same number of inversions, so each term of det A^T coincides, with the same sign, with a
term of det A. Hence det A^T = det A.

Property 3.  If the matrix B is obtained by interchanging two rows (or columns) of A, then
det B = -det A.

PROOF.  Observe that the terms of det A and det B consist of the same factors, taking one
and only one from each row and each column. It suffices, therefore, to show that the signs
of each term are changed.

We first prove this result when B is obtained by interchanging the first two rows of A.
We have

    b_{1j} = a_{2j},    b_{2j} = a_{1j},    b_{kj} = a_{kj}    (k = 3, 4, ..., n),

and so the expansion for det B becomes

    det B = Σ_j (-1)^{t(j)} b_{1j_1} b_{2j_2} ··· b_{nj_n}
          = Σ_j (-1)^{t(j)} a_{2j_1} a_{1j_2} a_{3j_3} ··· a_{nj_n}.

Let j' be the permutation (j_2, j_1, j_3, ..., j_n) and let j denote the permutation
(j_1, j_2, ..., j_n). By Exercise 2.1.4, t(j') = t(j) ± 1 and therefore

    det B = - Σ_j (-1)^{t(j')} a_{1j_2} a_{2j_1} a_{3j_3} ··· a_{nj_n}.

But summing over all permutations j' gives the same n! terms as summing over all
permutations j. Thus det B = -det A, as required.

There is nothing new in principle in proving this result when any two neighbouring rows are
interchanged: the indices 1 and 2 above are merely replaced by the numbers of the rows.

Now let the rows be in general position, with row numbers r, s, and r < s, say. Then with
s - r interchanges of neighbouring rows, rows r, r + 1, ..., s - 1, s are brought into the
positions r + 1, r + 2, ..., s, r. A further s - r - 1 interchanges of neighbouring rows
produces the required order s, r + 1, r + 2, ..., s - 1, r. Thus, a total of 2(s - r) - 1
interchanges of neighbouring rows has the net effect of interchanging rows r and s. Since
the number of such interchanges is always odd, the effect on the determinant is, once more,
simply to change the sign.

The next property follows immediately from Property 3.

Property 4.  If the n x n matrix A has two rows (or columns) alike, then det A = 0.

PROOF.  The interchange of identical rows (or columns) obviously does not change the matrix.
On the other hand, Property 3 implies that its determinant must change sign. Thus,
det A = -det A, hence det A = 0.

Exercise 3.  Prove that if a row (or column) of A is a multiple of another row (or column)
of the matrix, then det A = 0.

Hint.  Use Exercise 1 and Property 4.  □

Combining the result in Exercise 3 with that of Exercise 2, we obtain the following.

Property 5.  Let B be the matrix obtained from A by adding to the elements of its jth row
(or column) the corresponding elements of its ith row (or column) multiplied by a scalar
α (j ≠ i). Then

    det B = det A.

When the operation described in Property 5 is applied several times, the evaluation of the
determinant can be reduced to that of a triangular matrix. This is the essence of Gauss's
method for computing determinants. It is illustrated in the next example.

Example 4.  To compute the determinant of the matrix

        [  4  -5  10   3 ]
    A = [  1  -1   3   1 ]
        [  2  -4   5   2 ]
        [ -3   2  -7  -1 ],

interchange the first two rows. Next add -4 (respectively, -2 and 3) times the first row of
the new matrix to the second (respectively, third and fourth) rows to derive, using
Properties 3 and 5,

                  [ 1  -1   3   1 ]
    det A = -det  [ 0  -1  -2  -1 ]
                  [ 0  -2  -1   0 ]
                  [ 0  -1   2   2 ].

Now add -2 (respectively, -1) times the second row to the third (respectively, fourth) to
obtain

                  [ 1  -1   3   1 ]
    det A = -det  [ 0  -1  -2  -1 ]
                  [ 0   0   3   2 ]
                  [ 0   0   4   3 ].

It remains now to add the third row multiplied by -4/3 to the last row to get the desired
(upper-) triangular form:

                  [ 1  -1   3   1 ]
    det A = -det  [ 0  -1  -2  -1 ]
                  [ 0   0   3   2 ]
                  [ 0   0   0  1/3 ].

Thus, by Exercise 2.1.10, det A = 1.  □

Exercise 5.  If A is an n x n matrix and Ā denotes its complex conjugate, prove that det Ā
is the complex conjugate of det A.

Exercise 6.  Prove that the determinant of any Hermitian matrix is a real number.

Exercise 7.  Show that for any matrix A, det(A*A) is a nonnegative number.

Exercise 8.  Show that the determinant of any skew-symmetric matrix of odd order is equal
to zero.  □
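The elimination procedure of Example 4 is easily mechanized. The following minimal sketch is an illustrative addition (it is not part of the original text); it is written in Python, assumes the numpy package, and the helper name is ad hoc. It reduces the matrix to upper-triangular form, tracking row interchanges by a sign as in Property 3, and multiplies the diagonal entries as in Exercise 2.1.10.

    import numpy as np

    def det_by_elimination(A):
        # Reduce A to upper-triangular form: each row interchange flips the sign
        # (Property 3); adding a multiple of one row to another leaves the
        # determinant unchanged (Property 5).
        U = np.array(A, dtype=float)
        n = U.shape[0]
        sign = 1.0
        for k in range(n):
            p = k + int(np.argmax(np.abs(U[k:, k])))   # partial pivoting
            if np.isclose(U[p, k], 0.0):
                return 0.0                             # no pivot in this column: det A = 0
            if p != k:
                U[[k, p]] = U[[p, k]]
                sign = -sign
            for i in range(k + 1, n):
                U[i, k:] -= (U[i, k] / U[k, k]) * U[k, k:]
        return sign * float(np.prod(np.diag(U)))

    A = [[4, -5, 10, 3], [1, -1, 3, 1], [2, -4, 5, 2], [-3, 2, -7, -1]]
    print(det_by_elimination(A))    # 1.0 (up to rounding), as in Example 4
    print(np.linalg.det(A))         # cross-check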


Exercise 9.  Show that the determinant of the n x n companion matrix

          [   0     1     0   ...     0     ]
          [   0     0     1   ...     0     ]
    C_a = [   .     .     .           .     ]
          [   0     0     0   ...     1     ]
          [ -a_0  -a_1  -a_2  ...  -a_{n-1} ],

where a_0, a_1, ..., a_{n-1} ∈ C, is equal to (-1)^n a_0.

Hint.  Multiply the ith row (i = 1, 2, ..., n - 1) by a_i and add it to the last one.

Exercise 10.  If A = [a_{ij}] is an m x n matrix, m <= n, and B = [b_{ij}] is an n x m
matrix, check that

    det [   A    0 ]  =  det [   A    AB ]
        [ -I_n   B ]         [ -I_n    0 ],

where I_n stands for the n x n identity matrix.

Hint.  Use Property 5 and add b_{jk} times the jth column (1 <= j <= n) of the matrix on
the left to the (n + k)th column, k = 1, 2, ..., m.  □

2.3  Cofactor Expansions

Another method for computation of determinants is based on their reduction to determinants
of matrices of smaller sizes. This is in contrast to the method of Gauss (see Example
2.2.4). The following notion is important for this approach.

Let A be an n x n matrix. A minor of order n - 1 of A is defined to be the determinant of a
submatrix of A obtained by striking out one row and one column from A. The minor obtained
by striking out the ith row and jth column is written M_{ij} (1 <= i, j <= n).

Exercise 1.  Concerning the companion matrix of order 4 in Exercise 2.2.9,

               [   0     1     0  ]                       [ 1  0  0 ]
    M_11 = det [   0     0     1  ] = -a_1,    M_41 = det [ 0  1  0 ] = 1.   □
               [ -a_1  -a_2  -a_3 ]                       [ 0  0  1 ]

Minors can play an important role in computing the determinant, in view of the following
result.

Theorem 1 (Cofactor expansion).  Let A be an arbitrary n x n matrix. Then for any i, j
(1 <= i, j <= n),

    det A = a_{i1}A_{i1} + a_{i2}A_{i2} + ... + a_{in}A_{in},                     (1)

or, similarly,

    det A = a_{1j}A_{1j} + a_{2j}A_{2j} + ... + a_{nj}A_{nj},                     (2)

where A_{pq} = (-1)^{p+q} M_{pq}.

Before proceeding with the details of the proof, note that the numbers A_{pq}
(1 <= p, q <= n) are called the cofactors of the elements a_{pq} and, therefore, formulas
(1) and (2) are referred to as, respectively, row and column cofactor expansions of det A.
Thus, the cofactor A_{pq} is just the minor M_{pq} multiplied by +1 or -1, according to
whether the sum of the subscripts is even or odd.

PROOF.  The proof begins with the observation that every term in Eq. (2.1.8) for det A
contains an element of the ith row. Therefore, collecting together the terms containing
a_{ij} (j = 1, 2, ..., n), we obtain

    det A = a_{i1}Ã_{i1} + a_{i2}Ã_{i2} + ... + a_{in}Ã_{in}    (1 <= i <= n),    (3)

where Ã_{ij} denotes the sum, with signs, of the products multiplying a_{ij}. Hence, to
establish (1), it suffices to show that

    Ã_{ij} = A_{ij}                                                               (4)

for all i, j.

Consider first the case i = j = 1; we show that Ã_{11} = A_{11}. Indeed, from (2.1.8) and
the construction of Ã_{11} in (3), it follows that

    Ã_{11} = Σ (-1)^{t(j)} a_{2j_2} a_{3j_3} ··· a_{nj_n},                        (5)

where t(j) denotes the number of inversions of the permutation (1, j_2, j_3, ..., j_n) or,
equivalently, of the permutation (j_2, j_3, ..., j_n) of the numbers 2, 3, ..., n. Using the
definition of the determinant for the (n - 1) x (n - 1) matrix obtained by deleting the
first row and the first column of A, it is easily deduced that the expression in (5) is that
determinant, that is, M_{11}. Thus Ã_{11} = M_{11} = (-1)^{1+1}M_{11} = A_{11}.

To prove (4) in the general case, we first shift a_{ij} to the (1, 1) position by means of
i - 1 successive interchanges of adjacent rows followed by j - 1 interchanges of adjacent
columns. Call the rearranged matrix B. The minor associated with a_{ij} is the same in both
A and B because the relative positions
of rows and columns of the submatrix corresponding to this minor are unchanged. Hence,
using the special case already proved,

    det B = a_{ij}M_{ij} + (terms not involving a_{ij}).

But Property 3 of determinants implies that

    det B = (-1)^{i-1}(-1)^{j-1} det A = (-1)^{i+j} det A.

Hence

    det A = (-1)^{i+j} a_{ij}M_{ij} + (terms not involving a_{ij}).

The result now follows on comparison with (3). Equation (2) can now be obtained using
Eq. (1) and the fact that det A = det A^T.

Exercise 2.  The determinant of a matrix whose third row contains only one nonzero element
can be easily evaluated by use of the cofactor expansion along the third row. If, for
instance, that element is a_{32} = 3 and the corresponding minor is M_{32} = 46, then, in
view of (1),

    det A = 3A_{32} = 3(-1)^{3+2}M_{32} = (-3)(46) = -138.

The next exercise shows that if in expression (1) the cofactors A_{ij} (1 <= i <= n,
j = 1, 2, ..., n) are replaced by those corresponding to elements of another row, then the
expression obtained is equal to zero. A similar result holds for columns.

Exercise 3.  Show that if i ≠ r, then

    a_{i1}A_{r1} + a_{i2}A_{r2} + ... + a_{in}A_{rn} = 0,                         (6)

and if j ≠ s, then

    a_{1j}A_{1s} + a_{2j}A_{2s} + ... + a_{nj}A_{ns} = 0.                         (7)

Hint.  Consider the matrix obtained from A by replacing row r by row i (i ≠ r) and use
Property 4.  □

Exercise 4.  Check that det A, where A is the n x n Fibonacci matrix

        [  1   1   0  ...   0 ]
        [ -1   1   1  ...   0 ]
    A = [  0  -1   1  ...   0 ],
        [  .   .   .        1 ]
        [  0   0  ...  -1   1 ]

is exactly the nth term in the Fibonacci sequence

    1, 2, 3, 5, 8, 13, ... = {a_n},    where a_n = a_{n-1} + a_{n-2}  (n >= 3).

Exercise 5.  (The Fibonacci matrix is an example of a "tridiagonal" matrix. This exercise
concerns general tridiagonal matrices.)  Let J_n denote the Jacobi matrix of order n,

          [ a_1  b_1   0    ...      0    ]
          [ c_1  a_2  b_2            .    ]
    J_n = [  0   c_2  a_3   b_3      0    ].
          [  .          .     .   b_{n-1} ]
          [  0   ...    0  c_{n-1}   a_n  ]

Show that

    |J_n| = a_n|J_{n-1}| - b_{n-1}c_{n-1}|J_{n-2}|    (n >= 3),

where the abbreviation |A|, as well as det A, denotes the determinant of A.

Exercise 6.  Confirm that

        [     1          1      ...      1      ]
        [    x_1        x_2     ...     x_n     ]
    det [   x_1^2      x_2^2    ...    x_n^2    ]  =  ∏_{1 <= j < i <= n} (x_i - x_j).   (8)
        [     .          .               .      ]
        [ x_1^{n-1}  x_2^{n-1}  ...  x_n^{n-1}  ]

The matrix in (8), written V_n, is called the Vandermonde matrix of order n. The relation in
(8) states that its determinant is equal to the product of all differences of the form
x_i - x_j (1 <= j < i <= n).

Hint.  Use Property 5 of Section 2.2 and add the ith row multiplied by -x_1 to the row i + 1
(i = n - 1, n - 2, ..., 1). Apply a column cofactor expansion formula for the reduction of
det V_n to det V_{n-1}.
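As a computational illustration of Theorem 1 (this sketch is an addition, not part of the original text; it uses plain Python only and the function name is ad hoc), the row cofactor expansion (1) translates into the following recursive procedure. It is exponential in n and is meant only to exhibit the formula, not to compete with Gauss's method.

    def det_cofactor(A):
        # det A by cofactor expansion along the first row, formula (1) of Theorem 2.3.1.
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            minor = [row[:j] + row[j + 1:] for row in A[1:]]   # strike out row 1, column j+1
            cofactor = (-1) ** j * det_cofactor(minor)         # (-1)^{1+(j+1)} M_{1,j+1}
            total += A[0][j] * cofactor
        return total

    # The 4 x 4 companion matrix of Exercise 2.2.9 with a_0 = 5, a_1 = 7, a_2 = 2, a_3 = 3:
    # by that exercise its determinant should be (-1)^4 a_0 = 5.
    C = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [-5, -7, -2, -3]]
    print(det_cofactor(C))    # 5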
Exercise 1. Show that if Ca denotes the companion matrix (see Exercise matrix of a square matrix A resulting from the deletion of the rows and
2.2.9) associated with the (monic) polynomial columns listed in (I). We denote it by

a(A.) = ao + alA. +..,+ a 111 - 1 + A.n> n_1 AC> ~2 ~,,)~


1 12 }p
then
The complementary cofactor to (2) is then defined by
det(AI - Cal = a(A).
Deduce that det(AoI - CII ) = 0 if and only if A.o is a zero of a(f)' 0 ACC:1 ~2 ~")A(-IYAC:1 ~2 ~,,)C,
1 12 }" 1 12 J"
where s = (i) + i 2 + ... + i,,) + 01 + i2 + ." + j ). Obviously, in the case
p = I, this notion coincides with the previously"defined cofactor of an
2.4 laplace's Theorem element. Note that only the position of the index "c" indicates whether we
are considering a complementary minor (on the right) or complementary
cofactor (on the left).
The row and column cofactor expansions of a determinant can be
generalized in a natural way, using the idea of a minor of order p (1 S; P s; n). Example 1. Consider a 5 x 5 matrix A = [aij]!.J=)' Then, for instance,
Recall that we have already defined minors of order n - 1 of an n x n
determinant." More generally, if A is an m x n matrix, then the determinant A (21 235)4 Q21
a2 2 a24
a32 a34'
of a p x p submatrix of A (l s; p s; min(m. n,
obtained from A by striking
= a31
aS I aS2 aS4
out m - p rows and n - p columns, is called a minor of order p of A.
In more detail, if the rows and columns retained are given by subscripts
1 ~ i l -c i2 -c ... -c i" ~ rn, 1 Sil <i2 -c ... <I, S; n, (1)
AG ~ ~r AG :) = = I::: :::1,
respectively, then the corresponding p x p minor is denoted by AC(21 3
2 4
5)=(-I)"lau alsl= -Iau
a43 a4s
a1sI,
a43 a4S
A/~l ~2 ~p) ~.det[alkJk]'''l' (2) where s = (2 + 3 + 5) + (1 + 2 + 4) = 17. 0
\it 12 Jp
Theorem 1 (Laplace's theorem). Let A denote an arbitrary n x n matrix
For example, the minor M 111 of order n - 1 of an n x n matrix A is, in and let any p rows (orcolumns) of A be chosen. Then det A is equal to the sum
this notation, of the products of all Cn . " minorst lying in theserowswith theircorresponding
2 3 ... n) complementary co/actors:
M 111 = A ( 1 2 .. , n - 1 ' det A= LJ AC> ~2 ... ~P)ACC:l ~2 ... ~,,), (3)
1 12 . .. J" 1 12 .. J"
and the minors of order one are merely the elements of the matrix.
~ht;"e the summation extends over all Cn p distinct sets of (column) indices
The minors for which i" = ik(k = 1,2, ... , p)arecalled the principal minors
ltd2' ... .i, (1 S
i, <. iz -c ... <. j" S n). Or, equivalently, using columns,
of A of order p, and the minors with i" = A = k (k = 1,2, ... , p) are referred
to as the leading principal minors of A.
Before proceeding to a generalization of Eqs. (2.3.1) and (2.3.2), we also det A= L A(~)
I 11 12
~2 ... ... ~")AC(~I
J" 11
i2 .. . i,,)
i2 ... i, .
(4)
need the notion of complementary minors, that is, the determinant ofthe sub-
Where 1 :s;; i 1 < i 2 -c ... -c i" S n.
t That is, the determinant of an II x II matrix. t The symbol e.... denotes the binomial coefficient n!/(p!)(11 _ p)!.
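Laplace's theorem can be checked numerically on small matrices. The sketch below is an illustrative addition (not from the original text); it is written in Python, assumes numpy and itertools, uses an ad hoc function name and an arbitrarily chosen test matrix, and expands det A along a fixed set of rows according to formula (3), enumerating the C_{n,p} sets of column indices.

    import itertools
    import numpy as np

    def laplace_expansion(A, rows):
        # det A expanded along the chosen rows (0-based), following formula (3).
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        rows = sorted(rows)
        comp_rows = [i for i in range(n) if i not in rows]
        total = 0.0
        for cols in itertools.combinations(range(n), len(rows)):
            comp_cols = [j for j in range(n) if j not in cols]
            minor = np.linalg.det(A[np.ix_(rows, cols)])
            comp_minor = np.linalg.det(A[np.ix_(comp_rows, comp_cols)])
            s = sum(i + 1 for i in rows) + sum(j + 1 for j in cols)   # 1-based index sums
            total += minor * (-1) ** s * comp_minor
        return total

    A = np.array([[1., 2, 0, 3], [4, 5, 6, 0], [0, 7, 8, 9], [1, 0, 2, 3]])
    print(laplace_expansion(A, rows=[0, 1]))   # agrees with the value below
    print(np.linalg.det(A))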

PROOF. The proof of this theorem can be arranged similarly to that of the Exercise 3. Show that if a square matrix A is of a block-triangular form.
cofactor expansion formula. However, we prefer here the following slightly with square blocks AI (i = 1.2, ... , k) on the main (block) diagonal, then
different approach. First we prove that the product of the p x pleading
principal minor with its complementary minor provides p!(n - p)! elements del A = ndet Ai'
k

i=1
of det A. Indeed, if
Exercise 4. Prove that if the n x n matrix A is of the form
( - I)t(J')al)1az12 ... aPIp. and ( -1)t(j")a . a . . . . anI".
[~1 ~2],
. p+ I,Jp+l p+2,Jp+2

are any terms of the given minor and its cofactor, then the product A =
" 3
( - 1)tU') +t(j")al Jl. azJz .. , an}n where Az and A 3 are p x p and (n - p) x (n - p) matrices, respectively,
p < n, then
is a term of det A, because ik (1 S k S p) is less than j, (p + 1 S r S n). det A = (_1)(11+ l)P(det A 2)(det A 3 ) .
Therefore the number tU') + tUff)is the number of inversions in the permu-
=
tation ] Ut>iz,'" ,in), so it is equal to tU). Exercise 5. Compute the determinant of the n x n matrix [- DB],
Since a p x p minor has p! terms and the (n - p) x (n - p) corresponding where B is an n x m matrix (m < n) and D is obtained from the n x n
complementary cofactor has (n - p)! terms, their product provides p !(n - p)! identity matrix In by striking out the columns
elements of det A. il,iz, .. , .t; (1 5. i, < i2 < ... < i; 5. n).
Now consider an arbitrary minor

AC: 11~z
1
.. . ~p) (5)
Answer. det[ -D B] = (_lYB(il
1
h.
2
i",),
m
1 ... Jp
lying in the chosen rows of A. Starting with its i 1th row, we successively
interchange the rows of A with the preceding ones so that eventually the
where t = (n - m) + (t + tik) - "~mk.
k=1
k
"=1 "=1
(6)
minor in (5) lies in the first p rows of A. Note that D", 1 i" - D .. 1 k inter-
Hint. Observe that the submatrix - D possesses only one nonzero minor
changes of rows are performed. Similarly, the shifting of (5) to the position
of order n - m, and use (4).
of the p x pleading submatrix of A requires 1 it - L:=
1 k interchanges L:..
of columns. Observe that the complementary minor to (5) did not change Exercise 6. Show that if all minors of order k (1 5. k < min(m, n of an
and that,' using the part of the theorem already proved and Property 3 m x n matrix A are equal to zero, then any minor of A of order greater than
of determinants, its product with the minor provides p!(n - p)! terms of k is also zero.
(-l)sdet A, wheres = D=1 i" + i".D=t Hint. Use Laplace's theorem for finding a larger minor by expansion along
It suffices now to replace the complementary minor by the corresponding any k rows (or columns). 0
complementary cofactor. Thus, each of Cn, p distinct products of minors lying
in the chosen p rows with their complementary cofactors provides p l(n - p)!
distinct element of det A. Consequently, the sum of these products gives
exactly 2.5 The Binet-Cauchy Formula
CII,ppl(n - p)! = nl
distinct terms of det A. As an important application of Laplace's theorem, we prove the following
The proof of (3) is completed. The formula in (4) follows from (3) by use result concerning the determinant of a product of two matrices.
of Property 2 of determinants.
Theorem 1 (Binet-Cauchy formula). Let A and B be m x n and n x m
Exercise 2. HAl and A4 are square matrices, show that matrices, respectively. If m 5. n and C = AB, then

det[~: :J = (det A 1)(det A 4 ) det C = L


ISi!<h<"'<)m Sn
A/~ ~
Vt 12 J",
r:z)B(il i2
1 2
im).
m
(1)
That is, the determinant of the product AB is equal to the sumof the products Substituting (4) in (3) and comparing with (2), we complete the proof of
of all possible minors of (the maximal) order mof A withcorresponding minors the theorem.
of B of the same order. t
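Before turning to the proof, the formula can be verified numerically for small sizes. The following sketch is an illustrative addition (not from the original text); it assumes numpy and itertools, the function name and the 2 x 3 and 3 x 2 test matrices are chosen here, and it compares det(AB) with the sum of products of corresponding maximal-order minors.

    import itertools
    import numpy as np

    def binet_cauchy_rhs(A, B):
        # Right-hand side of the Binet-Cauchy formula for det(AB), A m x n, B n x m, m <= n.
        m, n = A.shape
        total = 0.0
        for cols in itertools.combinations(range(n), m):
            total += np.linalg.det(A[:, cols]) * np.linalg.det(B[list(cols), :])
        return total

    A = np.array([[1., 2, 3], [4, 5, 6]])        # 2 x 3
    B = np.array([[1., 0], [2, 1], [0, 3]])      # 3 x 2
    print(np.linalg.det(A @ B), binet_cauchy_rhs(A, B))   # both equal -39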
By putting m = n in (1), the following useful fact is obtained:
PROOF. To prove the theorem, we first construct a block-triangular
(n + m) x (n + m) matrix
Corollary 1. The determinant of the product of two square matrices A and
B is the product of the determinants of the factors:

c= [_~ ~l det AB = (det A)(det B). (5)


Example1. Let
where 1" denotes the n x n identity matrix. Now use Exercise 2210 to

[-1-1]
deduce

det C = de{ _1. ~] = de{ _1. ~] and


B = -: ~.
and hence, by Exercise 2.4.4, Then obviously
det C = (_1)(m+"+1)m(det ex-I)" = (_l)(m+")(m+l) det C. (2)
Now we evaluate det C in a different way. Since
C = AB = [~ ~J
CGl ~ ::: ~) = 0
and det C = 2. The same result is obtained by using the Binet-Cauchy
formula:

for at least one i. > n (1 ~ r ~ m), it follows, using Laplace's theorem, that
detC= 2: A l 2) (. .)
G i2
B ll 12

det C=
l
L A . . . .. '
2 ... m)CeGl"2...... jm). (3)
1,sj,<hS3 '1 1 2
1 sj, <)1"" <)",$." G1 12 Jm 112m
= AG ~)BG ~) + AG ~)BG ~) + AG ~)B(: D
Recalling the definition of C and the result of Exercise 2.4.5, it is not difficult
to see that 1 1-1/ 11 2ll-1 -11 10 211- 2 1
=1 -1 011-
~) = ( .... 1)'( _l)'Blil h '" im), 1 -2 0 + -1 1 1 1 + 11 11
where s =
'D'=it
cell 2
12 Jm \ 1 2 ... m
(4)

1 k + D'= 1 i" and t is given by Eq. (2.4.6).* A little calculation


= 2.

Exercise 2. Check that for any k x n matrix A, k ~ n,


gives

" I (1it 122 ...... k)12


m

S + t = n(m + 1) + 2
k= 1
Lj,., det AA = I... A.. . ~ O.
I:sJt<j,<"'<)kS Il h
and so
m Exercise 3. Applying the Binet-Cauchy formula, prove that
(s + t) - (m + n)(m + 1) = 2 Lit - m(m + 1),
"=1
which is always even.

t For m > " see Proposition 3.8.4.


r Note that in the case " = 1ft, we have t = O.

and deduce the Cauchy identity: then

(t alcl)( t bld (.t


1=1 1=1
l) -
1=1
aldl)( t
1-1
blel) I-~ ~I -I~ ~I 1-121" 1
0
T

= L (ajb" - a"b)(c)d" - c"d).


1 ~i<"~"- .
adj A = _I -11-113 121-113 _1 21-111 ~ [ ~ -27 H 0
-2 3
Hint. Use the matrices

A = [a 1 a2 . . . a,,],
b1 b2 ... bIt
B = [c1
dl
C2
s,
I ~ -~I -I~ -~I I~ ~I
Some properties of adjoint matrices follow immediately from the
Exercise 4. Prove the Cauchy-Schwarz inequality, definition.
Exercise 2. Check that for any n x n matrix A and any k E IF,
adj(A T) = (adj A)T,
in real numbers. t adj(A*) = (adj A)*,
Hint. Use Exercise 3. adj I = I,
Exercise 5. Check that the determinant of the product of any (finite) adj(kA) = k"- I adj A. 0
number of square matrices of the same order is equal to the product of the
determinants of the factors. 0 The following striking result, and its consequences, are the main reason
for interest in the adjoint of a matrix.
Theorem 1. For any n x n matrix A,
A(adj A) = (det A)I. (1)
2.6 Adjoint and Inverse Matrices
PROOF. Computing the product on the left by use of the row expansion
formula (2.3.1) and the result of Exercise 2.3.3, we obtain a matrix with the
Let A = [alj]7.): 1 be any n x n matrix and, as defined in Section 2.4, let number det A in each position on its main diagonal and zeros elsewhere
that is, (det A)I. '
Ai) = ACC) be the cofactor of ail (1 ~ i, j s n). The adjoint of A, written Exercise 3. Prove that
adj A, is defined to be the transposed matrix of cofactors of A. Thus (adj A)A = (det A)I. 0 (2)
adj A A ([AijJ7,i: I)T. Theorem 1 has some important consequences. For example, if A is n x n
and det A i= 0, then
Exercise 1. Check that if
det(adj A) = (det A)It-I.
2 1-1] (3)

A =
[ 0 2
1 -1
0,
3
In fact, using the property det AB = (det A)(det B) with Eq. (1) and the
known determinant of the diagonal matrix, the result follows.
The vanishing or nonvanishing of det A turns out to be a vitally important
prop~rty of a square matrix and warrants a special definition: A square
t For a straightforward proof of this inequali\y (in a more general context), see Theorem Illatrtx A is said to be singular or nonsinqular according to whether det A is
3.11.1. zero or nonzero.
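The relation A(adj A) = (det A)I of Theorem 1 above, and the inverse formula it yields when det A ≠ 0, can be illustrated with a short computation. The following sketch is an illustrative addition (not from the original text); it assumes numpy, and the function name and the 3 x 3 example matrix are chosen here.

    import numpy as np

    def adjoint(A):
        # Transposed matrix of cofactors of A, i.e. adj A.
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        C = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
                C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
        return C.T

    A = np.array([[2., 1, -1], [0, 1, 0], [1, -1, 3]])
    adjA = adjoint(A)
    d = np.linalg.det(A)
    print(np.allclose(A @ adjA, d * np.eye(3)))      # True:  A (adj A) = (det A) I
    print(np.allclose(np.linalg.inv(A), adjA / d))   # True:  A^{-1} = (1/det A) adj A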
Exerci!e 4. Confirm that the matrices Exercise 9. Check that if

~ ~]
1
0 0 1]
and A=OOO,
B = [0 [
o-4 000
are, respectively, singular and nonsingular. 0 then there is no 3 x 3 matrix B such that
An n x n matrix B is referred to as an inverse of the n x n matrix A if AB = 1. 0
AB = BA = 1, (4) It should be emphasized that, in view ofthe lack of commutativity of matrix
multiplication, if B is an inverse of A the definition requires that it must
and when this is the case, we write B = A-I. Also, when A has an inverse,
A is said to be invertible. We shall see that a square matrix is invertible if and
simultaneously be a left inverse (BA = 1) and a right inverse (AB = I).
However, it will be seen shortly that a one-sided inverse for a square matrix
only if it is nonsingular (Theorem 2 of this section). is necessarily two-sided.
Exercise 5. Check that the matrices Furthermore, it should be noted that the definition (4) provides trivially a
necessary condition for a matrix to be invertible. Namely, if a square matrix
1 00] 1 00] A has even a one-sided inverse, then A is nonsingular, that is, det A :f: O.
A
[
= t -t 0
o -t t
and
[
B = 1 -2 0
1 -2 2
Indeed, if for instance AB = I, then

det AB = (det A)(det B) = 1


are mutually inverse matrices. and, obviously, det A :f: O. It turns out that the condition det A :f: 0 is also
Sufficient for A to have an inverse.
Exercise 6. Show that a square upper- (lower-) triangular matrix is
invertible if and only if there are no zeros on the main diagonal. If this is the 1'beorem 1. A square matrix is invertible if andonly if it is nonsingular.
case, verify that the inverse is a matrix ofthe same kind.
PROOF. If det A :f: 0, then formulas (1) and (2) show that the matrix
Exercise 7. Check that the involutory matrices, and only they, are their
own inverses. B=_I_ adj A
detA (5)
Exercise 8. Check that if A - I exists, then
satisfiesconditions (4) and, therefore, is an inverse of A.
(A- 1)T = (A T)-I. 0
The existence of an inverse in the case det A :f: 0 easily implies its unique-
The notion of an inverse matrix has its origin in the familiar field of the ness.To see this, first check that the right inverse is unique, that is, if AB I = I
real (or complex) numbers. Indeed, if a :f: 0, we may consider t~e quotient and AB2 = 1, then B I = B2 Indeed, AB I = AB 2 implies A(B I - B 2 ) = 0,
b = l/a as the inverse (or reciprocal) of the number a, and b satisfies ab =
and since A has a right inverse, then det A :f: O. Applying Theorem 2, we
deduce the existence of A -1. Multiplying by A - I from the left we have
ba = I, as in Eq. (4). We also write b = a-I, of course. However, the non-
commutativity of matrix multiplication and, especially, the lack of some .of A-IA(B l - B2 ) = 0,
the properties of numbers, as mentioned in Section 1.3, yiel~ substantial
differences between the algebraic properties of inverse matrices and re- and the required equality B I == B 2 is obtained.
ciprocals of numbers. . Similarly, the uniqueness of the left inverse of a nonsingular matrix is
For instance, as the next exercise shows, not all nonzero matrices have established. Hence if det A :f: 0, then both one-sided inverses are unique.
an inverse. Furthermore, since they are both given by Eq, (5), they are equal.


Exercise 10. Check, using Eq. (5), that (b) If n = 2, all orthogonal matrices have one of the two forms,

2 21-1]0 -1 [60 -27 O.


2] COS 6 6]6 '
-sin [cos 6 6]sin
[o1 -1 3
=-h
-2 3 4
0 [ sin (J cos sin 6 - cos (J ,
for some angle 8.
Numerical calculations with Eq, (5) for large matrices are, in general,
notoriously expensive in computing time. Other methods for calculating
Exercise 17. A matrix U E en><n is said to be unitary if U*U = 1. Show
the inverse will be presented in Exercise 2.7.4
and provide a better basis
that, in this case, U is invertible, Idet U I = 1 and U - 1 = U*. 0
for general-purpose algorithms.
Exercise 11. Check that if A is nonsingular, then
2.7 Elementary Operations on Matrices
det(A- l ) = (det A)-I.
Exercise 12. Confirm that the product of any number of square matrices
of the same order is invertible if and only if each factor is invertible. Properties 1 through 5 of determinants indicate a close relationship
between the determinant of a square matrix A and matrices obtained from
Exercise 13. Prove that A by the following operations:
(AB)-l = B-IA- l ,
(1) interchanging two rows (or columns) in A;
provided A and B are invertible matrices of the same size. (2) multiplying all elements of a row (or column) of A by some nonzero
number;
Exercise 14. Show that if A and Bare nonsingular matrices of the same
(3) adding to any row (or column) of A any other row (or column) of
size and det A = det B, then there exists a matrix C with determinant 1
such that A = BC. A multiplied by a nonzero number.

Exercise15. Check that if A is a nonsingular matrix and D is square The operations 1 through 3 listed above are referred to as elementary row
(A and D may be of different sizes), then (respectively, column) operations of types 1,2, and 3 respectively.
Now observe that these manipulations of the rows and columns of a
det[ ~ : ] = det(D - CA -1 B)det A. (6)
matrix A can be achieved by pre- and postmultiplication (respectively) of A
by appropriate matrices. In particular, the interchange of rows i 1 and i 2
(il < i 2) of A can be performed by multiplying A from the left by the n x n
Hint. Use the relation
matrix

[C~-l ~][~ D _ ~A-IB][~ A-/B] = [~~l (7)

as well as Exercises 2.4.2 and 2.5.5. (It is interesting to observe the analogy 1
between Eq. (6) and the formula for the determinant of a 2 x 2 matrix. Note (il) 0 1
also that formulas similar to Eq. (7) in the cases when B, C, or D are invertible
1
can be easily derived.) The matrix D - CA- 1B is known as the Schur
E(l) = (I)
complement of A in [~ ~l 1
(i2) 1 0
Exercise 16. A matrix A e Rft><ft is said to be orthogonal if ATA = l:Show
that: 1

(a) An orthogonal matrix A is invertible, det A = ±1, and A^{-1} = A^T.


obtained by interchanging rows i 1 and i 2 of the identity matrix. Thus, the Bxercise 2. Check that the inverses of the elementary matrices are also
matrix E(1)A has the same rows as A except that rows i1 and i2 are inter- elementary of the same type. 0
changed. The application ofelementary operations admits the reduction ofa general
Furthermore, the effect of multiplying the ith row of A by a nonzero m x n matrix to a simpler one, which contains, in particular, more zeros
number k can be achieved by forming the product E(2)A, where than the original matrix. This approach is known as the Gauss reduction

1 I (or elimination) process and is illustrated below. In fact, it was used in Example
2.2.4 for computing a determinant.

1
\ Example 3. Consider the matrix
i
E(2) = (i) k
1
(2) i
\
1o 2-1]
0 6.
-2 -4 -10
1 Multiplying the second row by -!, we obtain
Finally, adding k times row i 2 to row i 1 of A is equivalent to multiplication
of A from the left by the matrix
1
A1 = E~2)A = [.~ ~ ~ =~l,
4 -2 -4 -10
1
where E~21 is a 3 x 3 matrix of the type 2 with k = -t, i = 2.
1 '" k Multiplying the second row by -4 and adding it to the third, we deduce
1 1
..
or , (3) 0 1 2-1]
1
1
A2 = E~3'A1 = J 0
o -2[ -4
0 -3 ,
2
1 where E~) is a 3 x 3 elementary matrix of the third type with k = -4,
1 i1 = 3, i2 = 2.
The interchange of the first two rows provides
depending on whether i1 < i2 or i 1 > i2 Again note that the operation to
be performed on A is applied to I in order to produce <3). 1 0 0-3]
Similarly, the multiplication of A on the right by the appropriate matrices
E(1I, E121, or E(31 leads to analogous changes in columns.
Aa = E!jIA2 = 0 [o 1 2 -1 ,
-2 -4 2
The matrices E(1I, E(2 1, and E(3) are called elementary matricesof types 1,2,
and 3, respectively. and the first column of the resulting matrix consists only of zeros except one
Properties 1,3, and 5 of determinants may now be rewritten as follows: element equal to 1.
Employing a 3 x 3 matrix oftype 3, where k = 2, i 1 = 3, i2 = 2, we obtain
= det AE(1) = -det A,
det E(1IA
det E(2IA = det AE(2) = k det A, 100-3]
det E(3)A = det AE(3) = det A.
(4)
A4 = EltlA 3 = [0 1 2-1
o 0 0 0
(5)

Exercise 1. Check that and it is clear that A 4 is the simplest form of the given matrix A that can be
det E(1) = -I, det E(2) = k, det E(3) = 1. achieved by row elementary operations. This is the "reduced row-echelon


form" for A, which is widely used in solving linear systems (see Section 2.9). The following matrices are all in reduced row echelon form:
Performing appropriate elementary column operations, it is not difficult
to see that the matrix A can be further reduced to the diagonal form:
[o10-3]
1 2' [o121]
0 0' [o1-2 0] 0 l'

Observe that these matrices cannot be reduced to simpler forms by the


application of elementary row operations. If, however, elementary column
operations are also permitted, then further reductions may be possible, as
where Etl), E~3), E~3) denote the following matrices of type (3): the next theorem shows. Again, a detailed proof is omitted.
Theorem 2. Any nonzero m x n matrix can be reduced by the application of
1 0 0 3] [ 1 0 0 0] elementary rowandcolumn operations to an m x n matrix in one of thefollow-
E(3) = 0 1 0 0 E(3) = 0 1 -2 0
ing forms:
5
[ 0010' 6 0010'
0001 0001
u, OJ, If OJ (8)
Summarizing all of these reductions, we have
[ o 0'

(E~)E\')~)E\'')A(E~)E''')E\,,> = U ~ ~] ! (6)
Recalling the connection of elementary operations with elementary
matrices, we derive the next theorem from Theorem 2 (see also Eqs. (6) and
(7.
or Theorem 3. Let A be an m x n matrix. There exists a finite sequence of
elementary matrices E lt E2 , , Eu s such that
1000]
[
PAQ = 0 1 0 0,
o 0 0 0
(7) EkEk- J
is one of the matrices in (8).
'" EIAEuIEu2'" Eu s (9)

where the matrices P and Q are products of elementary matrices and are, . Since elementary matrices are nonsingular (see Exercise I), their product
therefore, nonsingular. 0 IS also and, therefore, the matrix in Eq. (9) can be written in the form P AQ,
An application of the elimination procedure just used to the case of a ~he.re P and Q are nonsingular. This point of view is important enough to
generic rectangular m x n matrix leads to the following important result, JustIfy a formal statement.
which is presented without a formal proof. Coronary 1. For any m x n matrix A there exist a nonsingular m x m
Theorem 1. Any nonzero m x n matrix can be reduced by elementary row matrix P anda nonsingular n x n matrix Qsuchthat PAQ isoneof the matrices
in(8).
operations to reduced row echelon form, which has the following defining
properties: . Consider now the set of all nonzero m x n matrices with fixed m and n.
(1) All zero rowsare in the bottom position(s). SI~ee the number of matrices of the types in (8) is strictly limited, there must
(2) The first nonzero element(reading from the left) in a nonzerorow is a exist many different matrices having the same matrix from (8) as its reduced
form.
one (called a "leading one"),
(3) Forj = 2,3, ... , m the leading one in row j (if any)appears to the tight Consider two such matrices A and B. In view of Theorem 3, there are two
of the leading one in rowj - 1. sequences of elementary matrices reducing A and B to the same matrix.
lienee .
(4) Any column containing a leading one has all other elements equal to
zero.


I and since the inverse of an elementary matrix is an elementary matrix of the


same kind (Exercise 2), we can conclude that
We conclude this section with a few supplementary exercises. The first
provides a method for finding the inverse of an invertible matrix via elemen-
tary row operations.
A;:; (E11E;1 .. , E;IE,.. E1)B(El+l .,. E,+,E;;s ' E;;I)'
I Hence A can be obtained from B bya number of elementary operations.
In the language of Corollary 1, we have shown that
Exercise 4. Show that if A is an n x n matrix, then the matrices
and [In B] (12)
A = PBQ (10) of the same size are row equivalent (that is, each can be reduced to the
other by elementary row operations) if and only if A is invertible and B ==
for some nonsingular matrices P and Q of appropriate sizes. The next A- t
theorem summarizes these results.
Hint. If the matrices in (12) are row equivalent, [A In] = PUn B] =
Theorem 4. If A and B are matrices of the same size, the following state- [P PB]. Thus, if the Gauss reduction process applied to [A In] gives the
ments are equivalent: matrix [In B], then necessarily B = A-I. If such a reduction is impossible
(1) A and B have the same reduced form (8). then A fails to be invertible.
(2) Eitherof thesematrices can be obtained from the otherby elementary Exercise 5. Show that if two square matrices are equivalent, then they are
row and column operations. either both nonsingular or both singular.
(3) There exist nonsingular matrices P and Q such that A = PBQ.
Exercise 6. Check that an n x n matrix is nonsingular if and only if it is
t Each of the above statements can be used as a definition of the following
notion of equivalence of two matrices. For instance, the matrices A and B
row (or column) equivalent to the identity matrix In. Thus, the class of
mutually equivalent matrices containing an identity matrix consists of all
are said to be equivalent, written A "" B, if statement 3 applies. nonsingular matrices of the appropriate size.
Obviously, the relation A "" B is an equivalence relation in the sense that
Exercise 7. Show that any nonsingular matrix can be decomposed into a
(1) A -- A (a matrix is equivalent to itself). This is the property of product of elementary matrices.
reflexivity.
(2) If A "" B, then B ,..., A (symmetry). Exercise 8. Show that the reduced row-echelon form of a given matrix is
unique. 0
(3) If A -- Band B "'" C, then A -- C (transitivity).
Thus, the set of all m x n matrices with fixed m and n is split into non-
intersecting classes of mutually equivalent matrices (equivalence classes).
Each m x n matrix A belongs to one and only one class, which can be
described as the set of all matrices P AQ where P and Q vary over all non- 2.8 Rank of a Matrix
singular m x m and n x n matrices, respectively.
Each equivalence class is represented and determined uniquely by a .
matrix of the form (8), called the canonical (simplest) form of the matrices of Let A be an m x n matrix. The size r (1 S; r S; min(m, n of the identity
this class. For example, if r < m < n, then a matrix matrix in the reduced (canonical) form for A of Theorem 2.7.2 is referred to
as the rank of A, written r = rank A. Recalling the possible canonical forms
(11) of A given by (2.7.8), we have

is a representative of a class; it is the canonical form of the matrices of its rankUm 0] = m,


equivalence class. Obviously, an equivalence class is identified if the number
rin (11) is known. This fact justifies a detailed investigation of the number r,
as carried out in the next section. rank[~ ~] = r, rank In = n.
Clearly, all matrices belonging to the same class of equivalent matrices is equal to 1, because there is a nonzero minor of order 1 (i,e., an element of
have the same rank. In other words, mutually equivalent matrices are of the the matrix) and each minor of order 2 or 3 is equal to zero. 0
same rank.
Some properties of the rank are indicated in the following exercises:
Exercise1. Check that the matrix in Exercise 2.7.3 has rank 2.
Exercise 4. Show that for any rectangular matrix A and any nonsingular
Exercise 2. Show that the rank of an n x n matrix A is equal to n if and matrices P, Q of appropriate sizes,
only if A is nonsingular. 0
ra.nk(PAQ) = rank A.
The last exercise shows that in the case det A :I: 0, the problem of finding
the rank is solved: it is equal to the size of the (square) matrix. Exercise 5. Prove that for any rectangular matrix A,
In the general case of a rectangular or singular square matrix, we need a rank A = rank AT = rank A*.
method of finding the rank that is more convenient than its reduction to
canonical form. Note that the problem is important, since the rank deter- Exercise 6. Prove that for a block-diagonal matrix,
mines uniquely the whole class of mutually equivalent matrices of the given k
size. rank diag[A u , An' ... AkJ = L rank Aji. 0
We start to develop a method for the direct evaluation of rank by i=J.
examining the matrix A itself with an obvious but suggestive remark. Con- For block-triangular matrices we can claim only the following:
sider the representative (canonical form)
Exercise 7. Confirm that
[~ ~]
to the order of the largest non vanishing minor. The same fact clearly holds
(1)
of a class of equivalent matrices of rank r and observe that its rank r is equal

for other types of canonical forms in (2.7.8). It turns out that this property of
rank[A r ~::
. ,
o ... 0
AAu
k-J,k
A kk
1~ 'irankA/i.
1= J
0

the rank carries over to an arbitrary matrix. A case in which inequality occurs in Exercise 7 is illustrated in the next
TlJec)rem 1. The rank of an m x n matrix A is equal to the order of its exercise.
largest nonzero minor. Exercise 8. The rank of the matrix
PROOF. Theorem 2.7.3 states, in fact, that the matrix A can be obtained
from its canonical form (2.7.8) by multiplication by elementary matrices 1 OiO 0
(inverses to those in (27.9. Then, in view of the remark preceding Theorem o .
0;0 1
1, it suffices to show that multiplying a matrix by an elementary matrix ----'-1-------
does not change the order of its largest nonzero minor.
Indeed. the effect of multiplication by an elementary matrix is expressed
o .
0:0 0
in Eqs. 2.7.4: the elementary operations only transform nonzero minors to
o 0: 1 0
is 3, although
nonzero minors and zero minors to zero minors. Obviously, the same situa-
tion holds if repeated multiplication by elementary matrices is per-
formed. rank[~ ~] + rank[~ ~] = 2. 0
Example 3. The rank of the matrix
The rank of a matrix has now been investigated from two points of view:
3 2 that of reduction by elementary operations and that of determinants. The
A == [ 6 4 concept of rank will be studied once more in Section 3.7, from a third point
-3 -2 of view.
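In numerical work the rank is obtained from a row-reduced form of the matrix (or, more robustly, from its singular values) rather than from an exhaustive search for nonzero minors. The sketch below is an illustrative addition (not from the original text); it assumes numpy, the function name is ad hoc, and it counts the pivots produced by forward elimination, applied here to a matrix with proportional rows, whose rank is 1.

    import numpy as np

    def rank_by_elimination(A, tol=1e-10):
        # Rank = number of nonzero pivots found during forward elimination.
        U = np.array(A, dtype=float)
        m, n = U.shape
        rank, col = 0, 0
        while rank < m and col < n:
            p = rank + int(np.argmax(np.abs(U[rank:, col])))
            if abs(U[p, col]) < tol:
                col += 1                      # no pivot in this column
                continue
            U[[rank, p]] = U[[p, rank]]
            for i in range(rank + 1, m):
                U[i, col:] -= (U[i, col] / U[rank, col]) * U[rank, col:]
            rank += 1
            col += 1
        return rank

    A = np.array([[3., 2], [6, 4], [-3, -2]])
    print(rank_by_elimination(A), np.linalg.matrix_rank(A))   # 1 1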

Exercise 9. Let A be an invertible n x n matrix and D be square. Confirm and using the definition of the matrix product, we can rewrite (1) in the
that abbreviated (matrix) form

rank[~ ~] = n Ax = b. (4)
Obviously, solving (1) is equivalent to finding all vectors (if any) satisfying
if and only if D = CA -1 B; that is, if and only if the Schur complement of A (4). Such vectors are referred to as the solution vectors (or just solutions) of
is the zero matrix. the (matrix) equation (4), and together they form the solution set.
Note also that the system (1) is determined uniquely by its so-called
Hint. See Eq. (2.6.7). 0
augmented matrix
[A b]
of order m x (n + 1).
2.9 Systems of Linear Equations and Matrices Recall now that the basic elimination (Gauss) method of solving linear
systems relies on the replacement of the given system by a simpler one
having the same set of solutions, that is, on the transition to an equivalent
Consider the set of m linear algebraic equations (a linear system) system. It is easily seen that the following operations on (I) transform the
aUXl + a12xZ + ... + alnxn = b1, system to an equivalent one:
aUxl + azzxz + .,. + aznxn = bz, (1) (1) Interchanging equations in the system;
(2) Multiplying an equation by a nonzero constant;
amlxl + am2x2 + ... -I- amnxn = bm' (3) Adding one equation, multiplied by a number, to another.

in n unknowns Xl' X2' . , Xn Recalling the effect of multiplication of a matrix from the left by ele-
An n-tuple(x~, x~, ... , x~}issaid to be a solution of(1} if, upon substituting mentary matrices, we see that the operations I through 3 correspond to left
x?instead of XI (I = 1,2, ... , n) in (1), equalities are obtained. We also say multiplication of the augmented matrix [A b] by elementary matrices
in this case that x? (i = 1, 2, ... , n) satisfy each of the equations in (1). If E(I I, E(2 1, E(3 1, respectively, in fact, by multiplication of both sides of (4) by
the system (1) has no solutions, then it is inconsistent, while the system is such matrices on the left. In other words, for any elementary matrix Em of
consistent if it possesses at least one solution. It is necessary in many applica- order m, the systems (4) and
tions to solve such a system, that is, to find all its solutions or to show that EmAx = Emb
it is inconsistent. In this section we first present the Gauss reduction method
for solving linear systems and then some other methods, using matrix ~re equivalent. Moreover,since any nonsingular matrix E can be decomposed
techniques developed earlier. A further investigation of linear systems and mto a product of elementary matrices (see Exercise 2.7.8),the original system
their solutions appears in Section 3.10. (4) is equivalent to the system EAx = Eb for any nonsingular m x m
Returning to the system (1), observe that the coefficients on the left-hand matrix E.
side form an m x n matrix Referring to Theorem 2.7.1, any matrix can be reduced by elementary row
operations to reduced row-echelon form. Therefore the system (4) can be
A = [alj]~:/':' l' (2)
t~ansformed by multiplication on the left by a nonsingular matrix B to a
called the coefficient matrix of (1). Furthermore, the n unknowns and the m Simpler equivalent form, say,
numbers on the right can be written as (column) vectors:
~x=~ ~
Where [EA Eb] has reduced row-echelon form.
(3) . This is the reduced row-echelon form of the system, which admits calcula-
tion ofthe solutions very easily.The argument above establishes the following
theorem.
Theorem I, Anysystem of mlinear equations innunknowns has anequivalent Theorem 2. If the coefficient matrix A of a linear system Ax == b is non-
system in which theaugmented matrix has reduced ro~;,echelonform. singular, then the system has a unique solution given by x = A -lb.
In practice, the reduction of the system to its reduced row-echelon form is It should be noted that in spite of the obvious importance of Theorem 2,
carried out by use of the Gauss elimination process illustrated in Example it is rarely used in computational practice since calculation of the inverse
2.7.3. is usually more difficult than direct solution of the system.
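The same reduction of the augmented matrix is readily programmed. The following sketch is an illustrative addition (not from the original text); it is written in Python, assumes numpy, uses an ad hoc function name, and, for simplicity, assumes the coefficient matrix is square and nonsingular. It is demonstrated on the square system of Example 2 below, whose solution is x1 = 1, x2 = -2, x3 = -1.

    import numpy as np

    def solve_by_elimination(A, b):
        # Reduce the augmented matrix [A b] to the form [I x] (A square and nonsingular).
        M = np.hstack([np.array(A, dtype=float), np.array(b, dtype=float).reshape(-1, 1)])
        n = M.shape[0]
        for k in range(n):
            p = k + int(np.argmax(np.abs(M[k:, k])))   # partial pivoting
            M[[k, p]] = M[[p, k]]
            M[k] /= M[k, k]                            # leading one in row k
            for i in range(n):
                if i != k:
                    M[i] -= M[i, k] * M[k]             # clear the rest of the column
        return M[:, -1]

    A = [[2, 1, -1], [0, 2, 0], [1, -1, 3]]
    b = [1, -4, 0]
    print(solve_by_elimination(A, b))                          # [ 1. -2. -1.]
    print(np.linalg.solve(np.array(A, float), np.array(b, float)))   # cross-check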
Exercise 1. Let us solve the system Example 2. Consider the system

X2 + 2X3 = -I, 2x l + X2 - X3 = 1,
-2x. = 6, 2X2 = -4,
4x. - 2x 2 - 4X3 = -10, Xl - X2 + 3X3 = O.
having the augmented matrix The matrix

0 1 2 ":"1]
-2 0 0 6 A = 02 21 -013]
[ [ 1 -1
4 -2 -4 -10
considered in Example 2.7.3. It was shown there that its reduced row- is invertible with the inverse (see Exercise 2.6.10)

-~ ~].
echelon form is

[~o ~ 0
~0 =~],0
Then the solution of the given system is
3 4

(see Eq. 2.7.5) which means that the system


lx. + 0'X2 + 0'X3 = -3,
O Xl + 1 X2 + 2X3 = -1
is equivalent to the original one. Hence the relations and

Xl = - 3, X2 = -1 - 2t, X3 = t, x.=I, x2=-2, x 3=-1. 0


where t may take any value, describe all of the (infinitely many) solutions of Another approach to solving the linear system Ax = b with nonsinguJar
the given system. 0 A relies on the formula (2.6.5) for finding the inverse A-I. Thus

Proceeding to the case of a system of n linear equations with n unknowns, x = A-lb = _1_(adj A)b.
we recall that if the coefficient matrix A of the system is nonsingular, then det A
A is row equivalent to the n x n identity matrix (Exercise 2.7.6). Hence there
In more detail, if Ai) denotes the cofactor of aij in A, then
is a nonsingular matrix B such that BA = In, and (4) can be rewritten in an
equivalent form,
x = BAx = Bb.
Obviously, B = A-I and then the solution of (4) is given by x = A-lb
and is unique, since the inverse is unique.


or, what is equivalent, It will be seen n Section 3.10 that a complete description of the solution
set of Ax = b for a general m x n matrix A can be made to depend on the
1 n
Xj=-d- LbJA j j (i= 1,2, ... ,n). solution set of the homogeneous equation Ax = 0 (which will always include
et A }=1 the trivial solution x = 0). We now conclude with two exercises concerning
homogeneous linear systems.
Observe now that this summation is just the cofactor expansion along the
ith column of the determinant of the matrix A(i), where A(l) is obtained from Exercise 4. Show that if A is nonsingular, then the homogeneous system
A by replacing its ith column by b = [b 1 b z , bnY. Thus we have the Ax = 0 has onl}' the trivial solution. 0
formulas
Note also the following important result concerning homogeneous
detA(I) systems with m < n.
Xj = det A (i = 1, 2, ... , n). (6)
Exercise 5. Check that any homogeneous linear system having more
This is Cramer's rule for solving linear systems of n equations with n un- unknowns than equations always has infinitely many solutions.
knowns in the case of a nonsingular coefficient matrix. Note that formulas Hint. Consider the reduced row-echelon form of the system. 0
(6) for n = 2,3 were indicated in Section 2.1.
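Formula (6) is straightforward to program. The sketch below is an illustrative addition (not from the original text); it assumes numpy, the function name is ad hoc, it replaces the ith column of A by b and divides the two determinants, and it is intended only to illustrate the rule, not as an efficient solver. The system used is that of Example 3 below.

    import numpy as np

    def cramer(A, b):
        # Solve Ax = b (A square, det A != 0) by Cramer's rule, formula (6).
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        d = np.linalg.det(A)
        x = np.empty(A.shape[0])
        for i in range(A.shape[0]):
            Ai = A.copy()
            Ai[:, i] = b                     # A^(i): the ith column replaced by b
            x[i] = np.linalg.det(Ai) / d
        return x

    A = [[2, -1, -2], [4, 1, 2], [8, -1, 1]]
    b = [5, 1, 5]
    print(cramer(A, b))                      # [ 1.  1. -2.], in agreement with Example 3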
Example 3. Let US use Cramer's rule to solve the linear system
2x1 - X2 - 2X3 = 5, 2.10 The LU Decomposition
4X1 + X2 + 2X3 = 1,
8X1 - X2 + X3 = 5. Consider Theorem 2.7.3 once more, and suppose that the matrix A is
We have square of size n x n and, in addition, is such that the reduction to canonical

2-1 -2] form can be completed without elementary operations of type 1. Thus, no
row or column interchanges are required. In this case, the canonical form is
[
det 4
8 -1
1 2 = 18 :p 0,
1 D =
o
[I0 0]0
r

and therefore Cramer's rule can be applied. Successively replacing the


columns of the matrix by the column [5 1 5]T and computing the deter- ifr < n, and we interpret Do as In if r = n. Thus, Theorem 2.7.3 gives
minants, we deduce (E"E"-1 ... E1)A(Ek+lE"+2'" Ek+,) = Do

5-1
[
det 1
-2]1 2 = 18, det 4
25-2]
[8 5 1 1
[2
2 = 18, det 4
-1 5]
1 1 = - 36.
and E1> E z, ... , E" are lower-triangular elementary matrices and Ek+l'
Ek+Z,"" E Ah are upper-triangular elementary matrices. Since the inverses
5 -1 1 8 -1 5 of elementary matrices are elementary matrices of the same type, it follows
that we may write .
Thus A = LU, (1)
X3 = -H=-2 where L = EllE;l ... E; 1 and is a nonsingular lower-triangular matrix.
is the required solution of the given system. 0 1
~' U = DoE;';, E;+ 2E;;I' an upper-triangular matrix, which is non-
It should be mentioned that although Cramer's rule is an important Singular if Do, and hence A, is nonsingular. A factorization of A into tri-
theoretical result and a useful one when the size of A is not too big, it is angular factors, as in Eq. (1), is known as an LU decomposition of A.
not recommended as a general-purpose algorithm for the numerical solution ~en such a factorization is known, solving the linear system Ax = b is
of large sets of equations. Algorithms based on reduction by elementary re~atlvely quick and simple. The system Ly = b is solved first. Recall that
operations are generally more efficient.
L IS nonsingular so, by Theorem 2.9.2, there is a unique solution vector y.
Also, because L is lower triangular, y is easily calculated if the components where C and d are vectors in C,,-I to be determined, Form the product
Yu Y2' ... , y" are found successively.
Then the equation Ux = y is solved to obtain the sOi.ution(s) of Ax = b. LU [L";1 U"-1 L"_lC],
=
For Ux = yand Ly = b clearly imply LUx = Ax = 6, .~ut once more, the d U"-1 a".. ,
triangular form of U makes the derivation of x from y ])lrticularly easy. In and compare the result with Eq. (3). It is clear that if e and d are chosen so
this case, the components of x are found in the order X",,,,,-lt x 2 Xl' that
The practical implementation and analysis of elimim.sion algorithms for
(5)
linear systemsdepend heavily on the possibility of trianJ.,lar factorizations,
and our next result is the fundamental theorem of this kind. The discussion then the matrices of (4) will give the required factorization. But Eqs, (5) are
to this point has required the unsatisfactory hypothesis that row and column uniquely solvablefor C and d because L,,_I and U,,-I are nonsingular, Thus,
interchanges would not be required in the reduction of A 1'0 canonical Iorrs,
The theorem replaces this hypothesis by a condition on the, leading principal C = L;;.!I"I and II? = aIU;_ll'
minors of A (see Section 2.4 for the definition). and subsitution in (4) determines the unique triangular factors of A required by
thetheorem.
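The factors guaranteed by the theorem can be computed by the elimination process itself. The following minimal sketch is an illustrative addition (not from the original text); it assumes numpy, the helper names and the test matrix are chosen here, it performs the elimination without row interchanges (so it presupposes nonzero leading principal minors), and it then solves Ax = b by the forward and backward substitutions described above.

    import numpy as np

    def lu_no_pivoting(A):
        # A = LU with L unit lower triangular; assumes all leading principal minors nonzero.
        A = np.array(A, dtype=float)
        n = A.shape[0]
        L, U = np.eye(n), A.copy()
        for k in range(n):
            for i in range(k + 1, n):
                L[i, k] = U[i, k] / U[k, k]
                U[i, k:] -= L[i, k] * U[k, k:]
        return L, np.triu(U)

    def lu_solve(L, U, b):
        # Solve LUx = b: forward solve Ly = b, then back substitution Ux = y.
        n = len(b)
        y = np.zeros(n)
        for i in range(n):
            y[i] = b[i] - L[i, :i] @ y[:i]
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):
            x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
        return x

    A = np.array([[2., 1, -1], [0, 2, 0], [1, -1, 3]])   # leading minors 2, 4, 14: all nonzero
    L, U = lu_no_pivoting(A)
    print(np.allclose(L @ U, A),
          np.isclose(np.prod(np.diag(U)), np.linalg.det(A)))   # True True: det A = det U
    print(lu_solve(L, U, np.array([1., -4, 0])))                # [ 1. -2. -1.]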
11Ieorem 1. Let A e C" lC" and assumethat the leading principal minors
For the determinantal property, recall Exercise 2.1.10 and Corollary 2.5.1
to deduce that det L = 1 and det A = (det L)(det U). Hence det A =
AG)' AG ~). .... AG ~ ::: : T D (2) det U.
are all nonzero. Then there is a unique lower triangular L with diagonal
elements all equal to one and a unique upper triangular ""lit/'ix II such that Exercise 1. Under the hypothesesof Theorem 1,show that there are unique
A=LU.
lower- and upper-triangular matrices Land U with aU diagonal elements
equal to one (for ~oth matrices) and a unique diagonal matrix D such that
Furthermore, det A = det U = U11U22" u"".
A = LDU.
PROOF. The proof is by induction on n; the size of matris A. Clearly. if
n = 1. then all = 1 . all provides the unique factorizatibn the theorem. Exercise 2. In addition to the hypotheses of Theorem I, assume that A is
Now assume the theorem true for square matricesofsize(n _. 1) x (n - 1). Hermitian. Show that there is a unique L (as in the theorem) and a unique
and partition A as real diagonal matrix D such that A = LDL.
A = [A";I
tl2
"1].
a""
(3) Hint. Use Exercise 1 and especially the uniquenessproperty. (The factoriza-
tion of this exercise is often known as the Cholesky-factorization.i 0
11
where "1. "2
e C"- 1. Then the theorem can be applied to 4"- 1. yieldingthe
unique triangular factorization

2.11 Miscellaneous Exercises


where L"-l has diagonal elements all equal to one. Since

(12...
detA"_l=A 1 2 ... n- 1 r#:O.
1'1 - 1) 1. Check that if ex = vi. fJ = JR. then
A +2
2
exA. fJA 1
L"_I and U ,,-I are nonsingular. det exA A,2 + 2 0 = (A,2 + 1)3.
Now consider n x n partitioned triangular matrices [ {JA. 0 A,z + i

and (4) Z. IfAisasquarematrixandA 2 + 2A + 1 = O,showthat A isnonsingular


and that A-I = -(A + 21).
3. Given that D is a diagonal matrix and nonsingular, prove that if D = 8. Let the elements of the n x n matrix A = [aj j ] be differentiable func-
(I + A)-lA, then A is diagonal. . tions of x. If A = [al a2 an], prove that
Hint. Establish first that D = A(l - D).
4. Let matrices 1, A, B be n x n. Check that if 1 + AB is invertible, then
d: det A=de{:x al a2 ... an] + de{al :x a2 a3 .,. an] +
1 + BA is invertible and
... + de{a 1 a2 . .. :x an)'
(1 + BA)-l =I - B(I + AB)-lA.
where the derivative of a vector is the vector of the derivatives of its
s. Verify the Sherman-Morrisonformuia: elements.
1 TA- 1
(
A + T)-l _ A-I _ (A- u)(v ) Hint. Use the definition of det A and Property L
uv - l+v TA- 1u'
9. Let A be an n x n matrix and x, y be vectors of order n. Confirm the
provided the matrices involved exist. relation
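The Sherman-Morrison formula is easy to check numerically. The following sketch is an illustrative addition (not from the original text); it assumes numpy, takes u and v as column vectors, uses randomly generated data with a fixed seed, and compares the right-hand side with a direct inversion of A + uv^T.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4
    A = rng.standard_normal((n, n)) + n * np.eye(n)    # comfortably nonsingular
    u = rng.standard_normal((n, 1))
    v = rng.standard_normal((n, 1))

    Ainv = np.linalg.inv(A)
    denom = 1.0 + (v.T @ Ainv @ u).item()              # assumed nonzero
    sm = Ainv - (Ainv @ u) @ (v.T @ Ainv) / denom      # Sherman-Morrison right-hand side
    print(np.allclose(sm, np.linalg.inv(A + u @ v.T)))   # True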

6. Let det~~ :] = 0( det A - yT(adj A)x


0
I for any complex number ex. Then deduce
I.
o i1 de{;~A ~;] = (det A)(O( - ;T A~)
be an n x n matrix. Show that for any vector ~ of order n.
Hint. Expand by the last column and then expand the cofactors by the
(A2 - l)P(A - 1) if n = 2p + 1 last row.
dn(A) l! det(A./n - Pn) = { (A2 _ l)P
if n = 2p
10. Let an n x n matrix A = [alj] be represented in the form A = [A 1 A 2] ,
Hint. Show first that dn(A;) = (A? - l)d n- 2 (n ~ 3). where Al and A 2 are n x k and n x (n - k) matrices (1 ~ k ~ n - 1).
Prove that
7. Consider the n x n matrix
[det A 12 s (det AfA1)(det A!A 2 ) .
X +A x
Hint. Use Laplace's expansion formula along the first k columns, the
A = x X+A Cauchy-Schwartz inequality (Exercise 2.5.4), and Exercise 2.5.2.
[
x x ... x n. Establish Hadamard's inequality for any square matrix A = [alj]L= 1:

(a) Prove that det A = An-I(nx + A). [det A 12 ~ n L lajjl2.


n

j=
n

1 i= 1
(b) Prove that if A - I exists, it is of the same form as A.
Hint. Apply the result of Exercise 10, where Al i~ a column of A.
Hint. For part (a), add all columns to the first and then subtract the
first row from each of the others. For part (b), observe that a matrix A 12. Check, using Hadamard's inequality, that
is of the above form if and only if PApT = A for any n x n permutation [det A I s Mnnn/2
matrix. (An n x n matrix P is a permutation matrix if it is obtained from
In by interchanges of rows or columns.) provided A = [alj]L=l and laol ~ M (1 s i,j s n).

13. Evaluate det H n , where H; denotes the following Hankel matrix: 16. Show that the inverses of the n x n triangular matrices
So 81 82 8n-1 -I o ... 0 1 0 o
81 82 s, -f 1
0 -I
Hn = S2 == [SI+1-2]7. 1 = 1o AI = and A2 = 0 -I
1 ":'-1
1 0
$2n-2 0 0 1 0 o -I 1
and where the elements have the form
are matrices with zeros below (respectively, above) the main diagonal
(0 'S k ~ 2n - 2)
and ones elsewhere.
17. Check that the inverses of the n x n matrices
for some numbers XI' X 2, , X n E !J:. (The numbers So' SI' , S2n-2
are called Newton sums of XI> X2,"" xn .) 1 0 0

Answer. detHn = nl:S;i<lsn(X; - XJ)2. A=


-2
1 -2
1
_ -1
2-I 2 -I .. :
0]
Hint. H; =
2.3.6).
v" V~, where v" is the Vandermonde matrix (see Exercise 0
and B-.
[o
:
-1
.
". -I
1
0 1 -2 1
14. Compute the determinant of the circulant matrix
are the matrices
Co CI Cn-I
1 o o
Cn- I Co CI Cn-2 2
c= 3
CI A- I =
CI Cn- I Co
n - I I 0
n n- 1 2 1
Answer. det C = 07=lf(IlI) , wheref(x) = D;J c;x l
and 810 62, ... ,8n
denote the distinct nth roots of unity. and
Hint. Cv" = v" diag[f(81), f(82), ... , f(ll n)] , where v" is the Vander-
1 1
monde matrix constructed from the III (i = 1,2, ... , n).
2 2
15. Prove that the determinant of a generalized circulant matrix C, B- I =
n- 1 n- 1 '
2 n-I n
respectively, where row j of B- I is [I 2 . .. j ... j).
Hint. UseA = AtB = A IA 2,whereA I,A 2aredefined in Exercise 16.
18. An n x n matrix A = [alj], where
is equal to n7= I f(Il;), wheref(x) = L7.:-J CiXI and " 1o "2, , " n are all
the nth roots of lX. aij = ai : j (0 :s; i :s; n - 1, 0 :s; j :s; n - I),
that is, 20. Consider the n x n triangular Toeplitz matrix
1 0 0

A = a, (a, of:. 0, / < n)


is said to be a Toeplitz matrix, and it is said to be generated by the quasi- o
polynomial o
p(A.) = a,,_lA.,,-l + ... + alA. + ao + a_ IA.- 1 + ." + a_,,+ 1A.- +1. 0'" 0 a, ... a1 I
n

(For example, matrices A and B of Exercise 17 are Toeplitz matrices.) generated by the polynomial p(,'l.) = 1 + 'Lf= 1 al,'l./. Show that if
Show that an invertible matrix A = [aij]~.j= 1 is Toeplitz if and only if k I

there are companion matrices of the forms P(il) = n(l -


j= 1
Arj) n(1 + alil + b1A
1= I
2
) (k + 2t = n,t
r-p I -P,,-l ... -pol then A can be decomposed as

c<:' = l 'f 0
... 0

k rj
o
I al
1 o

and
1 0 A= n
j= I .=1
n b.
Cl l )
p
=
0. :

1 ."
:'.
~
:
0
-Po
-P"~2
1 21. Pro~e
0

that the compani?n matrix Cm associated with the monic poly-


o ... b. al 1

o ... 1 -Pn-1 nomial a(A) = L~=o alA' (where a(A) has distinct zeros XI' Xl"'" Xn)
can be represented in the form
such that C" = v" diag[xlt X2,"" XJV;I,
C~)A = AC~). where v" denotes the Vandermonde matrix of XI' Xl, . , XII'
1
Hint. -[Pn-1 P,,-2 ... Po] = [a-1 a-l ." a_JA- 22. Prove that if Xi (1 :s i :S s) is a zero of a(,'l.) with multiplicity k,
for any a_II'
19. Let the polynomial P(l) = 1 + aA + hAl (b of:. 0) with real coefficients
have a pair of complex conjugate roots ret l'P. Confirmthe decomposition .
(. = n),
,=1
kj
then
1 0
a
b a 1
1

c,~ P"diag ,
XI 1
~
..... . ,
1[Xli]~, ..... ...
[XI
I 11
~ v;'
[[
o b a 1
where
sin(k - 2)qJ t Ob~erve that this is the factorization of the polynomial q(A) = A'p(l/A) into linear and
c - - d" -_- sinkqJ (k = 23 )
, " .. ,n. qUadratIC factors irreducible in R,
,,- r sin(k - l)qJ' r sin(k - l)qJ
where Vn denotes the generalized Vandermonde matrix
v.. = [WI W1 ... WJ.
Here Wi = [Xil XPl . . . X1 k
l] is an n x k, matrix (1 ~ i ~ s), and
CHAPTER 3
m
1-d- [1
Xlml =m!dx'{' X Xi2 Xn]T
Linear, Euclidean,
I I

23. Let complex matrices A, B, C, D be of sizes n x n, n x I, m x n, and


m x I, respectively. The system [1': and Unitary Spaces
AX = Ax + BU} x E en, U E e/, C E em,
y = Cx + Du '
where A is a complex parameter, plays a fundamental role in "systems
theory."
(a) Show that
y = [C(lnA - A)-IB + D]u.
The matrix in square parentheses is the transfer function for system fI
(and givesthe "output" y in terms of the "input" or "control vector" s), The notion of an abstract linear space has its origin in the familiar
(b) Suppose, in addition, that m = 1 and D = 1m (the m x m three-dimensional space with its well-known geometrical properties.
identity). Show that Consequently, we start the discussion of linear spaces by keeping in mind
properties of this particular and important space.
-C(lnA - A + BC)-IB + 1m = [C(lnA - A)-IB + Imrt,
and find a system whose transfer function is the inverse of that for fI'.
3.1 Definition of a Linear Space

Before making a formal definition of a linear space, we first collect some


important properties of three-dimensional vectors and discuss them from
a more general point of view.
Recall that a vector in three-dimensional space is referred to as a directed
line segment. Thus, it is defined by its length, direction, and orientation. t
So, the initial point of the vector is not essential: vectors having the same
length, direction, and orientation are equivalent and thus describe the
same vector.
This definition of a (three-dimensional) vector allows us to consider all
vectorsin the space as having the same initial point. If the space is equipped
with a Cartesian coordinate system, then, putting this common initial
point at the origin, we can deduce that a vector i is completely described

t Direction can be defined as a set of parallel straight lines.


by three numbers, namely, the coordinates (a1, a2, a3) of the terminal point of the corresponding vector with initial point at the origin. We write

    ā = (a1, a2, a3)

and call this a position vector. Thus, there is a one-to-one correspondence between all position vectors in three-dimensional space and all three-tuples of real numbers.

In the set of all position vectors, the operation of addition is defined as in Section 1.2. Thus, to any two position vectors there corresponds another position vector according to a certain rule. This specific correspondence is a particular case of a binary operation, for which we develop a formal definition.

Let 𝒜, ℬ, and 𝒮 be arbitrary sets, and let 𝒜 × ℬ denote the set of all ordered pairs of elements (a, b), where a and b are members of 𝒜 and ℬ, respectively. A binary operation from 𝒜 × ℬ to 𝒮 is a rule that associates a unique member of 𝒮 with each ordered pair of 𝒜 × ℬ.

Example 1. (a) If 𝒜, ℬ, and 𝒮 each refer to the set of all real numbers, then addition, subtraction, and multiplication of real numbers are binary operations from 𝒜 × 𝒜 to 𝒜.
(b) If we take 𝒜₁ to be the set of all real numbers excluding zero, then division is a binary operation from 𝒜 × 𝒜₁ to 𝒜.
(c) Denoting the set of all position vectors (with real coordinates) by ℝ^3, we conclude that vector addition is a binary operation from ℝ^3 × ℝ^3 to ℝ^3.
(d) In the above notation, scalar multiplication of position vectors by a real number is a binary operation from ℝ × ℝ^3 to ℝ^3.  □

A binary operation defined on 𝒜 × 𝒜 is said to be closed (on 𝒜) if the operation is from 𝒜 × 𝒜 to 𝒜. Thus, vector addition is closed on ℝ^3.

Let us collect the basic properties of position vectors regarding addition and scalar multiplication for our further consideration. For any position vectors ā, b̄, c̄ and any real numbers α, β, we have the following properties:

A1  ā + b̄ is a unique vector of the set to which ā, b̄ belong,
A2  ā + b̄ = b̄ + ā,
A3  (ā + b̄) + c̄ = ā + (b̄ + c̄),
A4  There exists a vector 0̄ such that ā + 0̄ = ā,
A5  For every ā there exists a vector −ā such that ā + (−ā) = 0̄,
S1  αā is a unique vector of the set to which ā belongs,
S2  α(βā) = (αβ)ā,
S3  α(ā + b̄) = αā + αb̄,
S4  (α + β)ā = αā + βā,
S5  1ā = ā.

It is not difficult to see that these properties yield other fundamental properties of vector addition and scalar multiplication of position vectors. For example, α0̄ = 0̄ for any real number α. To see this, first use A5 to obtain

    α0̄ + (−α0̄) = 0̄.                                        (1)

Then by A4,

    α0̄ + [α0̄ + (−α0̄)] = α0̄.

Hence, applying A3,

    [α0̄ + α0̄] + (−α0̄) = α0̄

and, using S3 and A4,

    α(0̄ + 0̄) + (−α0̄) = α0̄,
    α0̄ + (−α0̄) = α0̄.

Comparing with (1) we get the desired result, 0̄ = α0̄.

Exercise 2. Deduce the following facts from A1–A5 and S1–S5 (α, β real):

(a) 0ā = 0̄.
(b) If αā = 0̄, then either α = 0 or ā = 0̄, or both.
(c) α(−ā) = −αā.
(d) (−α)ā = −αā.
(e) (α − β)ā = αā − βā.
(f) α(ā − b̄) = αā − αb̄.  □

Thus, the properties A1–A5 and S1–S5 are basic in the sense that they are sufficient to obtain computational rules on the set of position vectors which extend those of arithmetic in a natural way.

It turns out that there are many sets with operations defined on them like vector addition and scalar multiplication and for which these conditions make sense. This justifies the important unifying concept called a "linear space." We now make a formal definition, which will be followed by a few examples.

Let 𝒮 be a set on which a closed binary operation (+) (like vector addition) is defined. Let ℱ be a field and let a binary operation (scalar multiplication) be defined from ℱ × 𝒮 to 𝒮. If, in this context, the axioms A1–A5 and S1–S5 are satisfied for any a, b, c ∈ 𝒮 and any α, β ∈ ℱ, then 𝒮 is said to be a linear space over ℱ.

In this definition we use the notion of a "field." The definition of a field is an issue we shall avoid. The reader who is not familiar with the concept need only understand that, in this book, wherever a general field ℱ is introduced, ℝ (the set of real numbers) or ℂ (the set of complex numbers) may be substituted for ℱ.

Before proceeding to examples of linear spaces, let us introduce some notational devices. The set of all m × n matrices with elements from the field ℱ is denoted by ℱ^(m×n). In particular, ℝ^(m×n) (respectively, ℂ^(m×n)) stands for the set of real (respectively, complex) m × n matrices. Exploiting this notation, ℱ^(1×n) (respectively, ℱ^(m×1)) denotes the set of all row- (respectively, column-) matrices of length n (respectively, m). For the sake of convenience, the set ℱ^(m×1) will often be denoted by ℱ^m. In contrast, the symbol ℱ_m stands for the set of all ordered m-tuples of numbers from the field ℱ. Borrowing some geometrical language, the elements [a1  a2  ...  am]^T of ℱ^m and (a1, a2, ..., am) from ℱ_m are both referred to as vectors of order m, as is [a1  a2  ...  am] ∈ ℱ^(1×m).

For solving the following exercises, it suffices to verify the conditions A1–A5 and S1–S5 (called the linear space axioms) for the elements of a set.

Exercise 3. Check that the following sets are linear spaces (with regard to the familiar operations defined on them) over the appropriate field:

(a) ℝ^n, ℂ^n, ℱ^n, ℱ_n;
(b) ℝ^(m×n), ℂ^(m×n), ℱ^(m×n);
(c) the sets of all upper triangular matrices and all symmetric matrices with real elements;
(d) the set of all polynomials of degree not exceeding n with real coefficients (including the identically zero function);
(e) the set of all real-valued continuous functions defined on [0, 1];
(f) the set of all convergent sequences of real numbers.  □

Thus, the concept of a linear space admits the study of structures such as those in Exercise 3 from a unified viewpoint and the development of a general approach for the investigation of such systems. This investigation will be performed in subsequent sections.

Before leaving this section, we make two remarks. The first concerns sets and operations that fail to generate linear spaces; the second concerns the geometrical approach to n-tuples.

Exercise 4. Confirm that the following sets fail to be linear spaces under the familiar operations defined on them over the appropriate field:

(a) the set of all position vectors of unit length in three-dimensional space;
(b) the set of all pairs of real numbers of the form (1, a);
(c) the set of all real matrices of the form ... with a fixed b0 ≠ 0;
(d) the set of all polynomials of a fixed degree n;
(e) the set of all Hermitian matrices over ℂ;
(f) the set of all rotations of a rigid body about a fixed point over ℝ.

In each case pick out at least one axiom that fails to hold.  □

It should be noted that a set can be a linear space under certain operations of addition and scalar multiplication but may fail to generate a linear space (over the same field) under other operations. For instance, the set ℝ^2 is not a linear space over ℝ under the operations defined by

    (a1, a2) + (b1, b2) = (a1 + b1 + 1, a2 + b2),
    α(a1, a2) = (αa1, αa2)

for any (a1, a2), (b1, b2) ∈ ℝ^2 and any α ∈ ℝ.
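The displayed pair of operations is partly garbled in this copy; purely as an illustration (and not as part of the original text), the sketch below assumes Python with numpy and uses the shifted addition read from the display above to show how such a failure can be exhibited mechanically. With this addition and the ordinary scalar multiplication, the distributive axioms S3 and S4 do not hold.

    import numpy as np

    def add(x, y):
        # nonstandard "addition" on R^2: the first coordinate is shifted by 1
        return np.array([x[0] + y[0] + 1.0, x[1] + y[1]])

    def smul(alpha, x):
        # ordinary scalar multiplication
        return alpha * x

    a = np.array([2.0, -1.0])
    b = np.array([0.5,  3.0])
    alpha, beta = 2.0, -3.0

    # S3: alpha(a + b) should equal alpha*a + alpha*b (it does not here)
    print(smul(alpha, add(a, b)), "vs", add(smul(alpha, a), smul(alpha, b)))

    # S4: (alpha + beta)a should equal alpha*a + beta*a (it does not here)
    print(smul(alpha + beta, a), "vs", add(smul(alpha, a), smul(beta, a)))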

To avoid any such confusion, we shall always assume that the operations on the sets of n-tuples and matrices are defined as in Chapter 1.

In conclusion, note also that a triple (a1, a2, a3), which was referred to as a vector with initial point at the origin and terminal point at the point P with Cartesian coordinates (a1, a2, a3) (this approach was fruitful, for instance, for the definition of operations on triples), can be considered as a point in three-dimensional space. Similarly, an n-tuple (a1, a2, ..., an) (ai ∈ ℱ, i = 1, 2, ..., n) may be referred to as a "point" in the "n-dimensional" linear space ℱ_n.


3.2 Subspaces

Let 𝒮 denote a linear space over a field ℱ and consider a subset 𝒮₀ of elements from 𝒮. The operations of addition and scalar multiplication are defined for all elements of 𝒮 and, in particular, for those belonging to 𝒮₀. The results of these operations on the elements of 𝒮₀ are elements of 𝒮. It may happen, however, that these operations are closed in 𝒮₀, that is, when both operations are applied to arbitrary elements of 𝒮₀, the resulting elements also belong to 𝒮₀. In this case we say that 𝒮₀ is a subspace of 𝒮.
Thus, a nonempty subset 𝒮₀ of a linear space 𝒮 over ℱ is a subspace of 𝒮 if, for every a, b ∈ 𝒮₀ and any α ∈ ℱ,

(1) a + b ∈ 𝒮₀,
(2) αa ∈ 𝒮₀.

(Observe that here, and in general, boldface letters are used for members of a linear space if some other notation is not already established.)

It is not difficult to see that 𝒮₀ is itself a linear space under the operations defined in 𝒮, and hence the name "subspace."

If 𝒮 is a linear space, it is easily seen that if 0 is the zero element of 𝒮, then the singleton {0} and the whole space 𝒮 are subspaces of 𝒮. These are known as the trivial subspaces and any other subspace is said to be nontrivial. It is also important to note that the zero element of 𝒮 is necessarily the zero element of any subspace 𝒮₀ of 𝒮. (This follows from Exercise 3.1.2(a).)

It is easy to verify that criteria (1) and (2) can be condensed into a single statement that characterizes a subspace. Thus, a nonempty subset 𝒮₀ of a linear space 𝒮 over ℱ is a subspace of 𝒮 if and only if for every a, b ∈ 𝒮₀ and every α, β ∈ ℱ, αa + βb ∈ 𝒮₀.

Example 1. (a) Any straight line passing through the origin, that is, the set of triples

    {x = (x1, x2, x3) : x1 = at, x2 = bt, x3 = ct, (−∞ < t < ∞)},

is a subspace of ℝ^3.
(b) Any plane in ℝ^3 passing through the origin, that is, the set of triples

    {x = (x1, x2, x3) : ax1 + bx2 + cx3 = 0, a^2 + b^2 + c^2 > 0},

for fixed real numbers a, b, c, is a subspace of ℝ^3.
(c) The set of all n × n upper- (lower-) triangular matrices with elements from ℝ is a subspace of ℝ^(n×n).
(d) The set of all real n × n matrices having zero main diagonal is a subspace of ℝ^(n×n).
(e) The set of all symmetric real n × n matrices is a subspace of ℝ^(n×n).  □

Note again that any subspace is itself a linear space and therefore must contain the zero element. For this reason, any plane in ℝ^3 that does not pass through the origin fails to be a subspace of ℝ^3.

Exercise 1. Prove that if 𝒮₁ and 𝒮₂ are subspaces of a linear space 𝒮, then the set 𝒮₁ ∩ 𝒮₂ is also a subspace of 𝒮. Give an example to show that 𝒮₁ ∪ 𝒮₂ is not necessarily a subspace of 𝒮.

Exercise 3. Let A ∈ ℱ^(m×n). Prove that the set of all solutions of the homogeneous equation Ax = 0 forms a subspace of ℱ^n.  □

The set of all vectors x for which Ax = 0 is the nullspace or kernel of the matrix A and is written either N(A) or Ker A. Observe that Exercise 3 simply states that Ker A is a subspace.

Exercise 4. Find Ker A (in ℝ^3) if

    A = [1  −1  0]
        [1   1  1].

SOLUTION. A vector [x1  x2  x3]^T belongs to Ker A if and only if

    [1  −1  0] [x1]   [0]
    [1   1  1] [x2] = [0]
               [x3]

or, what is equivalent,

    x1 − x2 = 0,
    x1 + x2 + x3 = 0.

Hence x1 = x2, x3 = −2x2, and every vector from Ker A is of the form

    α[1  1  −2]^T

for some (real) α. As in Example 1(a), this is a subspace of ℝ^3.  □

Geometrically, a vector x from ℝ^3 belongs to the kernel of A = [aij] (i = 1, 2, ..., m; j = 1, 2, 3) if and only if it is orthogonal to the row-vectors of A. Thus, for i = 1, 2, ..., m, the vector x must be perpendicular to [ai1  ai2  ai3]. If m = 2, as in the previous example, then any x ∈ Ker A must be orthogonal to both of the vectors [1  −1  0] and [1  1  1], that is, x must be orthogonal to the plane containing these vectors and passing through the origin. It is now obvious that Ker A represents a straight line through the origin orthogonal to this plane.

Now we introduce another subspace associated with a matrix, and this should be seen as a dual, or complementary, concept to that of the kernel. If A is an m × n matrix, the set

    R(A) = Im A ≜ {y ∈ ℱ^m : y = Ax for some x ∈ ℱ^n}

is said to be the range (or image) of A.
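The kernel and range just discussed are easy to compute numerically. The sketch below is not part of the original text; it assumes Python with numpy and scipy (scipy.linalg.null_space returns an orthonormal basis of Ker A, and the rank gives the dimension of Im A), using the 2 × 3 matrix of Exercise 4 as data.

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.0, -1.0, 0.0],
                  [1.0,  1.0, 1.0]])     # the matrix of Exercise 4

    N = null_space(A)                    # orthonormal basis of Ker A (columns)
    print("dim Ker A =", N.shape[1])     # expect 1: the kernel is a line
    print("a kernel vector:", N[:, 0])   # proportional to [1, 1, -2]^T
    print("A @ kernel basis ~ 0:", np.allclose(A @ N, 0))

    r = np.linalg.matrix_rank(A)
    print("dim Im A = rank A =", r)      # expect 2: here the range is all of R^2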
Exercise 5. The set ... is the range of the matrix A = ... .  □

Exercise 6. Check that the range of any matrix A ∈ ℱ^(m×n) is a subspace of ℱ^m.

Exercise 7. Show that if the matrix product AB is defined, then

    Ker(AB) ⊃ Ker B,

with equality when A is invertible, and

    Im(AB) ⊂ Im A,

with equality when B is invertible.  □

Note that the symbol ⊂ (or ⊃) is used consistently to denote either strict inclusion or equality.


3.3 Linear Combinations

We start with the following problem: Let a1, a2, ..., an be elements of a linear space 𝒮 over the field ℱ. We want to find the minimal subspace 𝒮₀ of 𝒮 containing the given elements. The minimality of 𝒮₀ is understood in the sense that if 𝒮₁ is a subspace of 𝒮 containing the elements a1, a2, ..., an, then 𝒮₁ ⊃ 𝒮₀.

To solve the problem we first note that, according to the definition of a subspace, 𝒮₀ must contain, along with ai (i = 1, 2, ..., n), all the elements of the form αi ai (αi ∈ ℱ, i = 1, 2, ..., n). Furthermore, the sums of elements belonging to 𝒮₀ must also be elements of 𝒮₀. Thus, any subspace containing the elements a1, a2, ..., an must also contain all elements of the form

    α1a1 + α2a2 + ... + αnan                                   (1)

for any αi ∈ ℱ (i = 1, 2, ..., n). An expression of the form (1) is referred to as a linear combination of the elements a1, a2, ..., an over the field ℱ.

Example 1. (a) Since

    a^T = [2  −3  −4] = 2[1  0  1] − 3[0  1  2],

the vector a^T ∈ ℝ^(1×3) is a linear combination of the vectors a1^T = [1  0  1] and a2^T = [0  1  2].
(b) Any vector in ℱ^n is a linear combination of the unit vectors

    e1 = [1  0  ...  0]^T,   e2 = [0  1  0  ...  0]^T,   ...,   en = [0  ...  0  1]^T.

Indeed, any a = [α1  α2  ...  αn]^T ∈ ℱ^n can be represented in the form a = α1e1 + α2e2 + ... + αnen.
(c) In the linear space ℱ^(m×n) of m × n matrices over ℱ, any matrix A is a linear combination of the mn matrices Eij (1 ≤ i ≤ m, 1 ≤ j ≤ n) with 1 in the i, jth position and zeros elsewhere.
(d) In the linear space of n × n Toeplitz matrices A = [a_(i−j)] (i, j = 0, 1, ..., n − 1) over ℱ, any matrix is a linear combination of the 2n − 1 matrices A_k = [δ_(i−j,k)] (i, j = 0, 1, ..., n − 1; 1 − n ≤ k ≤ n − 1), where the Kronecker delta δij is defined by

    δij = 1 when i = j,   δij = 0 when i ≠ j.

(e) Any polynomial in x of degree not exceeding n can be represented as a linear combination of the monomials 1, x, x^2, ..., x^n.  □
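These examples raise an obvious computational question: given vectors a1, ..., an and a target a, how can one find coefficients α1, ..., αn with a = α1a1 + ... + αnan, or decide that none exist? The sketch below is not part of the original text; it assumes Python with numpy and handles Example 1(a) by solving a least-squares problem and checking that the residual vanishes.

    import numpy as np

    a1 = np.array([1.0, 0.0, 1.0])
    a2 = np.array([0.0, 1.0, 2.0])
    a  = np.array([2.0, -3.0, -4.0])          # the target vector of Example 1(a)

    M = np.column_stack([a1, a2])             # columns are the spanning vectors
    coeffs, res, rank, _ = np.linalg.lstsq(M, a, rcond=None)

    in_span = np.allclose(M @ coeffs, a)      # zero residual <=> a lies in span{a1, a2}
    print("coefficients:", coeffs)            # expect [2, -3]
    print("a in span{a1, a2}:", in_span)      # expect True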
Returning to the problem formulated above and making use of the new terminology, we conclude that any subspace containing the given elements a1, ..., an must contain all their linear combinations.

It is easily verified that the set of all linear combinations (over ℱ) of the elements a1, a2, ..., an belonging to a linear space 𝒮 generates a subspace 𝒮₀ of 𝒮. Obviously, this subspace solves the proposed problem: it contains the given elements themselves and is contained in any other subspace 𝒮₁ of 𝒮 such that a1, a2, ..., an ∈ 𝒮₁.

The minimal subspace 𝒮₀ generated by the elements a1, a2, ..., an ∈ 𝒮 is referred to as the linear hull (or span) of the ai (i = 1, 2, ..., n) over ℱ and is written span{a1, a2, ..., an}. Thus by definition,

    span{a1, a2, ..., an} ≜ {a ∈ 𝒮 : a = α1a1 + α2a2 + ... + αnan,  α1, α2, ..., αn ∈ ℱ}.

Note that a linear space span{a1, a2, ..., an} is determined uniquely by the spanning elements ai (i = 1, 2, ..., n). We have proved the following theorem.
Theorem 1. If ~ isalinear space and"lo"z'''',"n E ~,then span {a h "n} ~o= span{a2' aa'' an} This means that the spaces consisting of linear
is theminimal (in the sense 01inclusion) subspace containing ai' az, ... , an' combinations of the n elements a lo a:z, .... an and n - I elements a ,
2
Example 2. (a) If al and a2 stand for two nonparallel position vectors 'a,;, An are identical. In other words, for any fixed linear combination of
in IRa, then their linear hull isthe set of all vectorslyingin the plane containing "10 a2'-, an there exists a linear combination of the last n - 1 elements
"I and "2 and passing through the origin. This is obviously the minimal such that they coincide.
subspace containing the given vectors. For the investigation of this situation we take an element" ofthe subspace
9'0 expressed as:
(b) Let al and az be parallel position vectors in IRa. Then the. linear hull
of both of them, or of only one of them. is the same,namely,the line parallel
to each of III and ll2 and passing through the origin (in the language of IRa).
o
where CX], CX:z .... , CXnE fF, and 0(1 =F O. Since90 isalso the span of a2. "3 ..... a".
Example 2 shows that in some cases a subspace span{"l' ":z ... , lln}
may be spanned by a proper subset of the given elements ah a2' ... ' lln'
then necessarily a = L7
=2 Pial. where P2. P3 . p"e fF. Hence

The problem of determining the minimal number of elementsthat span the " n

subspace willbeconsidered in the next section. L O(llll = 1=2


1=1
L Pilll
Now we note that the notion we introduced in the previous section of the
image of a matrix A e fFM" n can also bedefined using linear combinations. or, what is equivalent,
Let x = [Xl X2 Xn]T be an arbitrary vector from fFn and let
A...j U = 1,2... n) denote the columns of A considered as vectors from (1)
P. Observethat the vector Ax e P can be written as the linearcombination. with Gel =F O.
Ax = A ... 1 X l + A ... 2 X 2 + ... + A ...nxn (2) Thus, in viewof (1), a necessarycondition that a set of n elementsfrom 9'
and a subset of n - 1 elements from them span the same subspace is that
of columns of A. Representation (2) shows that every vector y in 1m A,
the image of A. is a linear combination of the columns of A, and conversely. (2)
Thus Im A can bedefined as the linear hull of the columns of A-.
where not all of l'1o l'2... ' l'" are equal to zero. In general. elements
Note that the span of the columns of A written as vectors from fFM is
also referred asthe column space of A. written 'IA' The row space fit A C fF 1 "ft
lllo "2.. a" satisfying (2) with at least one l'1 =F 0 (1 :s;; t :s;; n) are called
linearly dependent. Thus. we conclude that n elements alo a2..... a" and
of A is defined similarly. Thus a proper subset of those elements span the same subspace only if "1. a 2
UtA = {yT:yT = llTA for some" E fFm} = {yT: y E Im AT}. . . ~ , anare linearly dependent.
It turns out that this condition is also sufficient: if a 10 a2 ..... 4" are
Although the column and row spaces of the matrix A are generally linearly dependent, then the subspace span{a l, a2 4 n} can be spanned
completely different (observe that 'tile fFM and fit II C fF 1 x "), a character-
by some n - 1 elements of the set {ai' a:z ... , a"}. Indeed, let there exist
istic common to them both will emerge in Section 3.7.
~alars l'l> l'2... l'n such that Eq, (2) holds and at least one of the l'1> say l'l'
IS not zero. Then, dividing Eq. (2) by l'j, we obtain

3.4 Linear Dependence and Independence "j = - -l'14 1 - ... -


n-, l'j+1 l'n
- - " J - l - - - " j + l - .. - - "n,
~ ~ ~ ~
and consequently, "j is a linear combination of the remaining elements of
Let the elements a 10 a 2 , , an from a linear space f/ over fF span the {Ill> a 2 , , aft}.
subspace ~o = span{al' a:z, , an}, and suppose that 90 can be spanned Furthermore.let a E span{" 10 a 2' ... ~a"}. so that" is a linear combination
by a proper subset of the given elements. For example, suppose that of the spanning elements al> a2... 4". Since for somej, the element 4j is
a linear combination of 411> , a i- I> a i+ 10 , an, then so is a. Hence (b) the set of matrices in R 3 )( 3
a E span {a I' , aj-I' aJ+ 10 , an}, and so
12 0]
o 1 0, 0 0 0]
SPan{aI> "2'' an} c span{"I> ... , "i-I> "i+I' .. ' an}.
Since the reverse inclusion is obvious, the two subspaces must agree.
[000 [
o 0 0 ;
o 1 0
We have now established another characterization of a linearly dependent
set of elements. (c) the set of m x n matrices Eli (1 ::S; i ::S; m, 1 ::S; j ::S; n) with 1 in the
(i, j) position and zeros elsewhere;
Theorem I. The subspace span {II 10 412 . . . , an} is the span ofa proper subset (d) the set of polynomials
of {al}l n= I if and only if the elements 4110 42' , lin are linearly dependent.
PI(X) = 1 + 2x, P2(X) = x 2, P3(X) = 3 + 5x + 2x 2 ;
A linearly independent set is now defined as one that is not linearly (e) the set of polynomials 1, x, x 2 , , x";
dependent. Thus, if g is a linear space, the set of elements 4 1, a2' ... , an e g
(0 any subset of a set of linearly independent elements. 0
is linearly independent if the equation
n
L ')Ilal =
;=1
0,
3.6 The Notion of a Basis
for ')11'1'2, ... , 1'n E ~ implies that
1'1 = 1'2 = ... = 1'n = O.
We see that, in this case, it is no longer possible to delete an element of the Let the elements ai' 412,"" am (not all zero) belong to a linear space g
over the field ~ and let
set {III> a2' ... , an} in the construction of span{4h " 2 , ... , 4n } .
It turns out, therefore, that in this case n is the smallest number of spanning ~ = span {a 10 a 2 , . , am}.
elements of the linear hull being considered. This and other related problems
will be discussed in more detail in the next section, while the rest of this If the elements 41 10 42, ... , am are linearly independent, that is, if each of

section will be devoted to some illustrations. them fails to be a linear combination of others, then (Theorem 3.4.1) all
a/ (i = 1.,2, ... , m) are necessary for spanning go. In the case of linearly
Exercise 1. Confirm the linear dependence of the following sets: dependent elements aI' 42, .. , 4 m we may, according to the theorem,
(a) any two vectors in R 2 lying on the same line through the origin delete all vectors of the set that are linear combinations of the others so
(collinear vectors); . that the span of the remaining (linearly independent) n elements is ~.
(b) any three vectors in R 3 lying in the Same plane passing through the Thus we can always consider ~ as the linear hull of n linearly independent
elements where 1 ::S; n ::S; m.
origin (coplanar vectors);
Let g denote an arbitrary linear space over !F. A finite set of elements
(c) the set {4lt "2' a3} of elements from R 3 such that aT = [2 -3 0],
al = [1 0 -1],41 = [3 -6 1]; (1)
(d) The set of polynomials PI(X) = 4 + 2x - 7x 2, P2(X) = 2 - x 2 ;
P3(X) = 1 - x + 2x 2 ; is said to be a (finite) basis of.9 if they are linearly independent and every
(e) any set containing the zero element 0.;. " element 4 E.9 is a linear combination of the elements in (1):
(f) any set containing an element that IS a linear combination of the
others; IX; E iF, i = 1, 2, ... , n. (2)
(g) any set having a subset of linearly dependent elements.
Exercise 2. Confirm the linear independence of the following sets: In other words, the elements in (1) form a basis of.9 if

(a) the set of unit vectors eIt e 2, . , en in iFn ; (3)


and no &, (1 S; i S; n) in Eq. (3) can be discarded. In this case the elements holds is said to be the representation of II with respect to the basis {1l,H=I'
in (1) are referred to as basiselements of f/. The scalars IXl,1X2' .. , IXn are referred to as the coordinates (or components)
Recalling our discussion of linear hulls in the first paragraph of this of a with respect to the basis. Proposition 1 thus means that the representa-
section. we conclude that any space generated by a set of m elements, not tion of an element with respect to a fixed basis is unique.
all zero, has a basis consisting of n elements, where 1 S n S m. Someexamples
of bases are presented in the next exercise; the results of Example 3.3.1 and Example 2. If IX = [oc l OC1 . .. OCn]T E ,n, then its representation with
Exercise 3.4.2 are used. respect to the standard basis is again [OCI OC1 ... OI:n]T. 0

Exercise 1. Verify the following statements: The following example illustrates the fact that, in general, a linear space
does not have a unique basis.
(a) The set of n unit vectors el , e1' ... , en E fF" defined in Example
3.3.1(b) is a basis for (Fn. This is known as the standard basis of the space. Exercise 3. Check that the vectors
(b) The set {I, X, x 1 , ,xn} is a basis for the space of polynomials
(including zero) with degree less than or equal to n.
(c) The 1M matrices Ei) (1 S; i S; m, 1 S; j S; n) defined in Example
3.3.1(c) generate a basis Cor the space,mlCn. 0
form a basis for 1R 3, as do the unit vectors el> e1, e3'
Note that a basis necessarily consists of only finitely many vectors.
Accordingly, the linear spaces they generate are said to befinite dimensional; Hint. Observe that if II = [OCI OC1 0I:3]T e 1R 3, then its representation with
the properties of such spaces pervade the analysis of most of this book. If a respect to the basis {4 1, 4 1 , II]} is
linear space (containing nonzero vectors) has no basis, it is said to be of
Hocl + OC1 - IX] -a l + a1 + at] IXl - a1 + lX]]T
infinite dimension. These spaces are also of great importance in analysis and
applications but are not generally within the scope of this book. and differs from the representation [cx l CX1 CXl]T of tl with respect to the
standard basis. 0
Proposition 1. Any element of a (finite-dimensional) space f/ is expressed
uniquely as a linear combination of the elements ofa fixed basis. In view of Exercise 3, the question of whether or not all bases for fJ' have
the same number of elements naturally arises. The answer is affirmative.
In other words, given 4 E f/ and a basis {ai, a1' .. ' an} for f/, there is one
and only one ordered set of scalars OCl' OC1' , OCn
sentation (2) is valid.
E'
such that the repre- 1'beorem 1. All bases of a finite-dimensional linear space f/ have the same
number of elements.
PROOF. If n n PROOF. Let {b h b2 , .... bn } and {Ch C1' ... , cm } be two bases in f/. Our
a = 1=LI oc,a, = 1=1
L Pia" aim is to show that n = m. Let us assume, on the contrary, that n < m.
First, since {b 1, b 2 , , bn } constitutes a basis for fJ' this system consists
then of linearly independent elements. Second, every element of f/ and, in
partiCUlar, each c) (l S; j S; m) is a linear combination of the b, (i = 1,2,
n
L (OC, -
1=1
PI)Il, = O. . .. , n), say,
n
The linear independence of the basis elements implies oc, = P, for i = I, 2,
c) = L Pub" 1 S; j S; m. (4)
.. , n, and the assertion follows. 1= I

Let Il belong to a linear space f/ over' having a basis {Ill> 1l1' . ' lin}' Y".e shall show that Eq.(4) yields a contradiction of the assumed linear
The column matrix ex = [IXI OC1 .. . OCJTe,n such that the decomposition andependence of the set {Ch , , cm}. In fact. ~~t
"m
LY)c) = o. (5)
)=a

'I"f
3 LINEAR.EUCLIDEAN, AND UNITARY SPACES 3.6 SUM AND DIRECT SUM OF SUBSPACES
87
Substituting from Eq. (4), we obtain Exercise 6. Prove that
dim A(9"o) S dim 9"0'
~
i=l
}'i(r fJl}b;) = "( f }'JPIi)b; = O.
1=1 1=1 i=l
with equality when A is invertible.

The linear independence of the system {6i> 6 2 , 6n} now implies that
Hint. Use Exercise 3.2.7 and Exercise 5 in this section. 0
m . Let 9'.
be an n-dimensional linear space. It turns out that any system of r
r. }'jfJij = 0,
J=l
i = 1, 2.... , n, (6) linearly 1D~ependent elements from 9' (1 S r S n - 1) can be extended to
form a baSIS for the space.
and, in view of Exercise 2.9.5, our assumption m > n yields the existence P~opos~tion 2. If t~e eleme?ts {Ill}~= I (1 S r S n - 1) from a space 9' of
of a nontrivial solution (')11,12"'" 1m) of (6). This means [see Eq. (5)] d,mensIOn n are linearly independent, then there exist n - r elements
that the elements (:10 C2' Cm are linearly dependent, and a contradiction a, + It II, +2, .. , an such that {ai}7=1 constitutes a basis in f/. "
with the assumption m > n is derived.
PROOF. Since dim 9" = n > r, the subspace 9i = span{1l1t a2, " ' , II,} is
If m < n, exchanging roles of the bases again leads to a contradiction.
a proper subspace of f/. Hence there is an element ",+ I e 9' that does not
Thus m = n and all bases of 9' possess the same number of elements.
~Iong to 9'1: The elements {a;H~: are consequently linearly independent,
The number of basis elements of a finite-dimensional space is, therefore, since otherwise a'H would be a linear combination of {a t H: l and would
a characteristic of the space that is invariant under different choices of basis. belong to 9"1' If r + 1 = n, then in view of Exercise 4 the proof is completed.
We formally define the number of basis elements of a (finite-dimensional) If r + 1 < n the argument can be applied repeatedly, beginning with
space 9' to be in the dimension ofthe space, and write dim 9' for this number. span{alt 112', a,H}, until n linearly independent elements are con-
In other words, if dim 9' = n (in which case 9' is called an n-dimensional structed.
space), then 9' has n linearly independent elements and each set of n + 1 Let lilt az.. II, (l S r S n - 1) belong to .9, with dim 9" = n. Denote
elements of 9' is necessarily linearly dependent. Note that if 9' = {OJ then, [/1 = span{llb "2' ... , a,}. Proposition 2 asserts the existence of basis
by definition, dim 9' = o. ele~ents 11,+ h. a,+ z, ... II. spanning a subspace 9; in 9' such that the
The terminology "finite-dimensional space" can now be clarified as union of base~ m 9'1 and 9"2 gives a basis in f/. In view of Bq, (2), this provides
meaning a space having finite dimension. The ..dimension" of an infinite- a representation of any element a e 9' as a sum of the elements
dimensional space is infinite in the sense that the span of any finite number ,
of elements from the space is only a proper subspace of the space. a(l) = L
1"'1
IX;II; (e 9i) and a(2) = r.
n

1",,+1
IXI"t (e 9;).
Exercise 4. Show that any n linearly independent elements "10 "2 ' "n We write
in an n-dimensional linear space 9' constitutes a basis for 9'. a = a(l) + a(2).
SoLUTION. If, on the contrary, span{"l' b2 , , "n} is not the whole space, where a(l) e 9i and a(2) e 9;. We will study this idea in more detail in the
"n
then there is an II e f/' such that II, "1' 62, ... are linearly independent. next section.
Hence dim 9' ~ n + 1 and a contradiction is achieved.
Exercise 7. Check that, with the notation of the last paragraph f/', ,..., f/.
Exercise 5. Check that if 9'0 is a nontrivial subspace of the linear space 9: = {OJ. 0 1 2
then
dim 9'0 < dim 9'. 0
3.6 Sum and Direct Sum of Subspaces
Let f/'o be a subspace of IFn and let A e px n. The subspace

{Ax: x e 9'0} Let 9' be a finite-dimensional linear space and let 9'1 and f/'2 be subspaces
of 9". The sum 9i + 9; of the subspaces 9"1 and 9'2 is defined to be the
in P is known as the image of 9'0 under A and is written A(9'o). set consisting of all sums of the form III + "2, where III e 9i and 112 e 9;.
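Just below, the sum 𝒮₁ + 𝒮₂ of two subspaces is studied further, and Proposition 1 relates its dimension to dim 𝒮₁, dim 𝒮₂, and dim(𝒮₁ ∩ 𝒮₂). When the subspaces are column spans, this relation can be checked numerically. The following sketch is not part of the original text; it assumes Python with numpy/scipy, the matrices B1 and B2 are illustrative choices rather than data from the book, and it uses the standard fact that, when B1 and B2 have full column rank, the null space of [B1  −B2] has the same dimension as 𝒮₁ ∩ 𝒮₂.

    import numpy as np
    from scipy.linalg import null_space

    rng = np.random.default_rng(1)

    B1 = rng.standard_normal((5, 3))          # S1 = span of the columns of B1
    B2 = rng.standard_normal((5, 2))
    B2[:, 0] = B1[:, 0] + B1[:, 1]            # force S1 and S2 to share one direction

    dim_S1 = np.linalg.matrix_rank(B1)
    dim_S2 = np.linalg.matrix_rank(B2)
    dim_sum = np.linalg.matrix_rank(np.hstack([B1, B2]))      # dim(S1 + S2)
    dim_cap = null_space(np.hstack([B1, -B2])).shape[1]       # dim(S1 ∩ S2) for full column rank B1, B2

    print("dim S1 =", dim_S1, " dim S2 =", dim_S2,
          " dim(S1+S2) =", dim_sum, " dim(S1∩S2) =", dim_cap)
    print("identity holds:", dim_sum + dim_cap == dim_S1 + dim_S2)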
It is easily seen that!l't + !l'2 is also a subspace in !I' and that the operation PROOF. Assume first that 9't (j 9'2 = {OJ. If simultaneously
of addition of subspaces satisfies the axioms AI-A4 of a linear space (see
Section 3.1), but does not satisfy axiom AS. " = "I + "2 and " = "i + "21
Note also that the notion of the sum of several subspaces can be defined where ".'"1 E 9'.. 42'"Z E 9'2' then
similarly.
"1 - 41 = "2 - 4 z
Exercise 1. Show that if "1 = [l 0 I]T, "2 = [0 2 _I]T, b1 =
[2 2 I]T, b 2 = [0 0 I]T, and Since "/ - "I E 9i (i = 1, 2), the elements "1 - "'. and "2 - "2 belong to
9'1 ( j 9'2 = {OJ. Hence "1 = "~, "2 ="2 and the representation (2) is
!l'1 = span{"t, "2}, !l'2 = span{bb b2}, unique.
Conversely, suppose the representation (2) is unique but also 9'1 ( j 9'2
then !l'1 + !l'2 = 1R 3
Observe that dim 9'1 = dim !l'2 = 2, while dim 1R3
'# {O}. Let C E 9'1 ( j 9'2' C '# O. The second decomposition
=3. 0
A general relationship between the dimension of a sum of subspaces " = ("t + c) + ("2 - c)
and that of its summands is given below. leads to a contradiction.
Proposition 1. For arbitrary subspaces !l't and !l'2 of a finite-dimensional This theorem reveals an important property of subspaces 9'0 = 9'1 + 9'2
space, for which 9'1 ("\9'2 = {OJ. In this case 9'0 is called the direct sum of the
dim(!I'1 + !l'2) + dim(!I'1 n !l'2) = dim!l'l + dim !l'2' (I) subspaces 9'1 and 9'2, and is written 9'0 = 9'1 + 9'2,.
In view of Theorem I, the direct sum !l'o = !l'1 + 9'2 can also be defined
The proof of (I) is easily derived by extending a basis of !l'1 ( j 9'2 (see (when it exists) as a subspace 9'0 such that for each" E 9'0 there is a unique
Proposition 3.5.2) to bases for 9'1 and 9'2' decomposition" = "t + "2' with"t E 91, 42 E 9'2' This remark admits an
Obviously, Eq. (I) is valid for the subspaces considered in Exercise 1. obvious generalization of the notion of direct sum to several subspaces.
More generally, the following statement is a consequence of Eq. (I). Thus, let 9'0'9'.. 9'2' ... ,.9'J. be subspaces of a (finite-dimensional) linear
Exercise 2. Show that for any positive integer k, s~ace 9'. If any element" of 9'0 has a unique decomposition 4 = D=1 4/,
t
With 4/ e 9j for i = 1, 2, ... , k, we say that 9'0 is the direct sum of the sub-
dim(9't + !l'2 + ...... 9',,)::; L dim 91 0 spaces 9'1,9'2""'.9'J. and write 9'0 = 9'1 + 9'2 + ... +.9'J. or, briefly,
'''I Vo = D=t '91. Note that the operation of taking the direct sum also
satisfiesthe axioms AI-A4 of a linear space.
Note that by definition of 9'1 + 9'2 every element a of 9'1 + !l'2 can be
represented in the form Exercise 4. Confirm that 1R 3 = 9'1 + 9'2" where 9'1 = span{el> e 2}, 9'2
" = "t + "2' (2) = span{e3 }, and el> '2' e3 are the unit vectors in 1R3 0
where "1 e9i, "2 e 9'2' although in general, the decomposition in Eq. (2) The rest of this section is devoted to the investigation of properties of
is not unique. direct sums.
Let the elements bl> b2 , ... , b, and e.. C2"'" Cm be bases for the
Exercise 3. Let 9i, 9'2' "I' "2' bt , and 6 2 be defined as in Exercise 1.
Confirm the following representations of the element" = [1 0 O]T: I sUbspaces 9'1 and 9'2, in 9"; respectively. The set
{bl> b 2 , b" CI> C2"'" Cm}
b 2 = -t4 2 + fbi - b2 0
" = "I - I'
It turns out that nonuniqueness in Eq. (2) is a result of the fact that is referred to as the union of these bases. The union of several bases is defined
in a similar way. .
9'1 ( j 9'2 '# { O } ' I
Theorem 1. Let!l't and !l'2 be subspaces of a finite-dimensional space and J Tlieorem 2. The subspace 90 is the direct sum of subspaces 91, 9'2' ... , ~
let 4 denote an element from 9't + !l'2' Then the decomposition (2) is unique i of 4 finite-dimensional space 9' if and only if the union of bases for 9't, V 2
if and only if 9'1 n [/2 = {OJ. ... ,91 constitutes a basisfor 9'0'

PROOF. We shall give details of the proof only for the case k = 2, from Now we can give a description ofthe direct sum of k subspaces comparable
which the general statement is easily derived by induction. with the original one used in the case k = 2.
Suppose, preserving the previous notation, Exercise 6. Show that 9'0 = L~= 1 9j if and only if for any i (l ~ i ~ k),
tt = {b 1 , b 2 , , b" C1> C2"'" c",} 9j /"'I (9'1 + '" + 9"-1 + 9",+ t + ... + 91) = {O} (5)
is a basis of subspace 9"0' Then for every a E 9"0' or, what is equivalent,
r '"
a = L /l.b, + L "IJc) = al + a2' (3)
1=1 Jool

where a 1 e 9'1 and 42 e 9'2' Thus, 9"0 c 9"t + 9'2' It is easily seen that also
for i = 2, 3, ... , k.
9", + 9i c 90 and hence 9"0 = 9'1 + 9'2' Since 9"1 /"'I 9"2 = {O} (see
1
Exercise 7. Confirm that 9"0 = L'=
19", if and only if for any nonzero
Exercises 3.5.7), we have 9"0 = 9"1 + 9'2' a, E ~ (i = 1,2, ... , k), the set {at, 42, ... , 4k} is linearly independent.
Conversely, let 9"0 be a direct sum of 9'1 and 9"2' That is, for any a e 9'0 Hint. Observe that linear dependence of {a to , at} would conflict with
there is a unique representation (5). 0
4=al+ 0'2 ,
Let 9"1 be a subspace in the finite-dimensional space 9". The remark
where 41 e 9"1> 41 e 9"1' Let tt be a union of base~ {ba~~ 1 for 9"t and preceding Exercise 3.5.7 provides the existence of a subspace 9"1 of 9' such
[e }"'_ for 9"2' Our aim is to show that the system 81S a baSIS for 9'0'
~~d~ any element II from 9'0 is a linear combination (3) of elements
+
that 9"1 9"2 = 9". The subspace 9"2 is said to be a complementary subspace
of 9i with respect to 9".
from tt. On the other hand these r + m elements of 8 are linearlyindependent. It is worth mentioning that a complementary subspace of 9"t with respect
To see this, we first recail that the assumption 9"0 = 9"1 9"2 implies (in + to 9" is not unique and depends on the (nonunique) extension of the basis
fact, is equivalent to) the unique decomposition of 9"t to one for 9" (see the proof of Proposition 3.5.2).
0=0+0 ~ Exercise 8. Let 9"1 be defined as in Exercise 1. Check that both of the
of the zero element, since 0 e 9'0 /"'I 9'. /"'I 9"2' Thus, if subspaces
r
L lX.b. + L ~)Cj = 0, '" 9"2 = span{ed and 9'2 = span{e2}
3
1= 1 )=1 are complementary to 9"1 with respect to 1R
where obviously Li=tlX,b,e 9"t and D.. t ~)c)e 9"1' then it follows from Exercise 9. Prove Proposition I. 0
Eq. (4) that
r '"
L IX,b, = L ~)Cj = O.
,= 1 j= 1 3.7 Matrix Representation and Rank
The linear independence of the basis elements for 9't and 9'2 now yields
IX, = 0 (i = 1,2, ... , r) and ~J = 0 (j = 1,2, ... , m) and, cons~quentl~, the
linear independence of the elements of 8. Thus the system 8 IS a basts for Let 9' denote an m-dimensional linear space over :F with a fixed basis
9'0' . {a ha 2 , , a",}. An important relationship between a set of elements from 9'
In the next three exercises it is assumed that 9'0 IS the sum of subspaces and a matrix associated with the set is revealed in this section. This will
CP UJ ('I and necessary and sufficient conditions are formulated for allow us to apply matrix methods to the investigation of the set, especially
07 1, 07 1 , ... ,071<
the sum to be direct. for determining the dimension of its linear hull.
Exercise 5. Show that 9'0 = Lt=
1 9j if ~LDd only if . Recallthat if be 9" and b = Li"= 1 ala;. then the vector [IX1 IX2 ... a",]T
I< Was referred to in Section 3.5 as the representation of b with respect to the
dim 9'0 =L dim 9".. 0 basis {a,}r..l' Now we generalize this notion for a system of several elements.
'=1

Let b h b2 , , bn belong to fI'. From the representations ofthese elements deduce the relation (3). Hence the systems {bJ}f"' l and {rl.J}f"I are both
in the basis {1I1}T=h we construct the m x n matrix linearly dependent or both linear independent. Rearranging the order of
Aa = [OCi}]i,j'':' h (1) elements, this conclusion is easily obtained for any p elements from
where {bl' b2 , , b,,} and their representations (1 ::::;; p ::::;; n).
Hence. we have proved the following.
1::::;; j s n. (2)
Proposition 1. The number of linearly independent elements in the system
The matrix (1) is referred to as the (matrix) representation of the ordered {bto b2 , b,,} is equal to the dimension of the column space of its (matrix)
set {bit b2, ... , bIt} with respect to the basis {Ill' "2"", II",}. Obviously, the representation with respect to a fixed basis.
(matrix) representation of a set of vectors with respect to a basis is unique, In other words. if <eA is the column space of a matrix representation A
while representations of the same set in different bases are generally distinct. of the system {b 1 , b2 , , b,,} with respect to some basis, then
Exercise 1. Check (using Exercises 3.5.2and 3.5.3)that the matrices dim(span{b l b2 , , bIt}) = dim rcA = dim(Im A). (5)

A.- U!J and A.- U-!] The significance of this result liesin the fact that the problem of determining
thedimension of the linear hull of elements from an arbitrary finite-dimen-
sional space is reduced to that of finding the dimension of the column space
ofa matrix.
are representations ofthe system b l = [l 2 _l]T, s, = [0 1 2]T with Note also that formula (5) is valid for a representation ofthe system with
respect to the bases {'I' '2' '3} and {"I' a2' "3}' respectively. where respect to any basis for the space fI'. Consequently. all representations of the
III = [1 1 O]T,1I2 = [0 1 I]T. 113 = [1 0 I]T. 0
given system have column spaces of the same dimension.
As in Exercise 1 we see that any matrix can be referred to as the representa- The next proposition asserts a striking equality between the dimension
tion of its column vectors with respect to the standard basis. It turns out that ofthe column space <eA of a matrix A and its rank (as defined by the theory of
when A is a representation of b 1 b2 , , 6n with respect to some basis, there determinants in Section 2.8).
is a strong relationship between the dimension of the column (or row) Theorem 1. For any matrix A,
space of A, that is,dim(ImA)(see Section 3.3),and that of span {bit b2 .. . . b,,}.
We first observe that elements bit ... , bp (1 ::::;; p ::::;; n) are linearly depen- dim <eA ::;: rank A.
dent if and only if their representations rI.} = [IX1} IX2} IXlII,aT, hOOF. We need to show that the dimension ofthe column space is equal to
U::;: 1,2, ... , p) are linearly dependent. Indeed, the equality . the order ofthe largest nonvanishing minor of A e ,,,,x,, (seeTheorem 2.8.1).
For convenience,let us assume that such a minor is the determinant ofthe
(3)
submatrix of A lyingin the upper left corner of A. (Otherwise, by rearranging
implies. in view of (2), columns and rows of A we proceed to an equivalent matrix having this
property and the same rank.) Let the order of this submatrix be r. Consider
L PJ 1=LI OC/JIII
p
J= I
('" ) ::;: '"
1= I
('I
L L= I ocuPJ III = ) 0,
the (r + 1) x (r + 1) submatrix
all aIr I all

where {aili'= I is the underlying basis. Consequently, the linear independence a21 a2r I a2. I
of {a}T"I yields D"'1 IXI}P} ::;: 0 for i = 1.2... m. or. what is equivalent. I
I
ar r I ar
(4) I
----------,---
au atr I at.
Thus, the linear dependence of bit b 2 , b p implies that of rl. 1, rl.2 ,rl.p '
Conversely, starting with Eq. (4) and repeating the argument. we easily and observe that det A ::;: 0 for all possible choices of k and s,

Indeed, if k ~ r or s S; r, then the minor is zero since there is a repeated Theorem 2. For any b 1 b 2 , b; E.9,
row or column. If k > r or s > r, then the minor is zero since the rank of A dim(span{bi> b 2 , , bn}) = rank A, (8)
is r and any minor with order exceeding r is zero.
Now let Alj U = 1. 2, ... , r or j = s) denote the cofactors of the elements where A is a representation of b.. b 2 , , bn with respect to a basis in f/l.
of the last row of A. Expanding det Aby its last row, we obtain Exercise 2. Show that the representation of any system of m linearly
a1l1 AU + au Au + ... + at,A u + auAt = 0, 1 S; k ~ In. (6) independent elements from an m-dimensional space fI' with respect to
another basis for the space is an m x m nonsingular matrix.
Now observe that the numbers A"J (1 :S;;.i ~ r or j = s) are independent of
thechoiceofk,sowewriteAtj == AJ Therefore, writing (6) for k = 1.2... m. Hint. Use the result of Exercise 2.8.2. 0
we obtain It follows from Eq, (8) that the ranks of all matrices that are representations
of a fixed set {bi> ... , bn} with respect to different bases are equal. Thus (see
A. 1A 1 + .4. 2 A2 + ... + A.,A, + .4 A. = O. (7) Section 2.8), all these matrices are equivalent. This will be discussed in
where A"'J (1 S j S r or j = s) denotes the jth column of A considered as a Section 3.9 after some necessary preparations, and after a few important
vector from :F"'. But A. is a nonvanishing minor of A. Therefore Eq. (7) consequences of Theorem 1 are indicated in the next section.
shows the linear dependence of the columns A. 1 , A... 2 , , A." A and,
consequently, of all the columns of A.
Now observe that the columns A.J (; = 1,2, .. ,r) are linearly inde-
3.8 Some Properties of Matrices Related to Rank
pendent. Indeed. from
A.l
(X1 + (X2A"'2 + ... + ex,A"" = 0, exJ e:F, j = 1,2, ... r, Let A be a matrix from !/Fm><n. It follows from Exercise 2.8.2 and Corollary
it. follows lhat AOI = ~ where 01 = [OCI OC2 ... (X,]T and A = [alj]tJ 3 1' 3.7.1 that, if m = n and A is nonsingular, then rank A = n, Ker A = {O},
Since det A = A. 0, It follows by Theorem 2.9.2 that 01 = 0 and the linear and hence, when det A of. 0,
independence of {A", .. . . . , A",,} is established. Hence
rank A + dim(Ker A) = n.
dim(span{A", .. A",2"'" A",n}) = r = rank A This relation is, in fact, true for any m x n matrix.
and the assertion follows. Proposition 1. For any A E F" x n,
Rephrasing the proof of Theorem 1 with respect to the rows of A, it is rank A + dim(Ker A) = n. (1)
easily seen that hOOF. Let rank A = r and, without loss of generality, let it be assumed
dim !Jt A = rank A, that the r x r submatrix All in the left upper corner of A is nonsingular:!

where!JtA denotes the row space of A (see Section 3.3).


Observing that Ax = 0 for x :;: 0 is equivalent to the linear dependence . A = [~:: ~::l
of the columns of A and using Theorem 2.9.2, we obtain a corollary to where All E!/Fr><r and det All O. By virtue of Theorem 2.7.4, the given
Theorem I. matrix A can be reduced to the form
Corollary 1. If A E :Fn >< n, the equation Ax = 0 has a nontrivial solution if A = [~ ~] = P- 1 AQ- l, (2)
and only if A is singular.
where P and Q are invertible m x m andn x n matrices, respectively.
In conclusion, combining the assertions of Proposition 1 and Theorem 1,
we immediately obtain the following result, which was the first objective f Recall (see Section 2.8) that the rank is not changed under, in particular, the elementary

of this section. Operationsof interchanging columns or rows.



Note that r = rank A = rank A and observe that Ker A (being the linear Proposition 3. Any m x n matrix A from !F"'''" of rank r (r ~ 1) can be
hull of the unit vectors e,+I' e,+2, ... , en) has dimension n - r. written as a product
It remains now to show that the dimension of Ker.4, or dim(Ker p-I AQ -I)
(3)
is equal to the dimension of Ker A. In fact, if x e Ker A,then Q-I x e Ker A
since AQ-l =PA. Similarly, Q-1xeKerA implies xeKerA. Using th~ where PI efFmK', Ql e IF'''", and both matrices have rank r.
terminology introduced just before Exercise 3.5.6, we have Ker A = Q-l PROOF. Let two nonsingular matrices P and Q be defined as in Eq. (2). Writing
(Ker A) and the desired result follows from Exercise 3.5.6.
We say that A e,mK" is a matrix of jUli rank if rank A = min(m, n).
In view of Theorem 3.7.1 and the remark thereafter, it is obvious that A
and
Q =[~:J.
is of full rank if and only if its column vectors are linearly independent in'"' where P1e!F",Kr, P 2e!F",K(m-,), QlefF,K", Q2e!F("-')"", it is easy to
in the case m ~ n, while for m ~ n, its row vectors must be linearly inde- see that the blocks P2 and Q2 do not appear in the final result:
pendent in '". In particular, a square n x n matrix is of full rank if and only
if both systems of column and row vectors are linearly independent in '".
Appealing to Theorem 3.7.1 and Exercise 2.8.2, it is easily found that the
A = p[d ~]Q = P 11,Ql = PIQ1'
following statement is valid: The desired properties of PI and Ql are provided by Exercise 1.

Exercise 1. Let A be an n x n nonsingular matrix. Show that any submatrix Note that Eq. (3) is known as a rank decomposition of A.
of A consisting of r complete columns (or rows) of A (l :s; r :s; n) is of full Exercise 4. Show that any matrix of rank r can be written as a sum of r
rank. matrices of rank 1.

Exud se 2. Let rank A = r. Show that at least one r x r submatrix lying 1 Exercise S. If A and B are two matrices such that the matrix product AB
in the "strip" of r linearly independent columns (rows) of A is nonsingular, I is defined, prove that
rank(AB) :s; min(rank A, rank B).
Exercise 3. Use the Binet-Cauchy formula to show that, if A e em K" and
SoLUTION. Each column of AB is a .linear .combination of columns of A.
m :s; n, then det(AA*) > 0 if and only if A has full rank. (Compare with Hence rt AB c:; 'f/ A and the inequality rank (AB) :s; rank A follows from that
Exercise 2.5.2.) 0
of Exercise 3.5.5 and Theorem 3.7.1. Passing to the transpose and using
Exercise 2.8.5, the second inequality is easily established.
Proposition Z. For any A, Be !F"'''",
Exercise 6. Let A be a nonsingular matrix from fF""". Prove that for any
rank(A + B) :s; rank A + rank B. Be!F"K" and C e!F1 " " ,

PROOF. Consider the subspace .9'0 spanned by all the column vectors of both
rank(AB) = rank B, rank(CA) = rank C.
A and B. Clearly, dim go :s: rank A + rank B. On the other hand, the Hint. Use Exercise 3.5.6 and Theorem 3.7.1. 0
column space of A + B is contained in 9'0 and hence, in view of Theorem
3.7.1, The next statement provides an extension of the Binet-Cauchy theorem
(Theorem 2.5.1) for m > n.
rank(A + B) :s; dim 9'0' Proposition 4. Let A and B be m x nand n x m matrices, respectively.
11m> n, then det AB = O.
The assertion follows.
hOOF. It follows from Exercise 5 that rank (AB) :s; rank A. Since A has
The next proposition shows that any nonzero matrix can be represented only n columns, it is clear that rank A :s; n and therefore rank(AB) :s; n < m.
as a product of two matrices of full rank. But AB is an m x m matrix, hence Exercise 2.8.2 gives the required result.


3.9 Change of Basis and Transition Matrices

Let!/' denote an m-dimensional space and let b E 9'. In this section we will
study the relationship between the representations pand P' of b with respect Therefore
to the bases 8 = {II.. "2' 11 m} and 8' = {II'I. a2... ' II;"} for!/'. For this
purpose. we first consider the representation P E [Fm x m of the system 8
b= f Pial = f (~ a.IJP1)aj
1=1 1=1 1=1
with respect to the basis 8'. Recall that by definition. Since the representation of an element with respect to a basis is unique
P A [lXlj]f.}= I (proposition 3.5.1), it follows that
m
provided Pl = L IXIJP}.
}=I
i = 1.2, ... , m.
m
IIj =L IXljai.
1=1
j = l.2... m. (1) Rewriting the last equality in matrix form, we finally obtain the following:
Theorem 1. Let b denote an elementfrom the m-dimensionallinearspace !/'
and that (by Exercise 3.7.2) the matrix P is nonsingular. over !F. If Pand p' (e [Fm) are the representations of b with respect to bases
The matrix P is referred to as the transition matrix from the basis 8 to I and 8' for 9', respectively. then
the basis 8'. Transition matrices are always square and nonsingular, The
uniqueness of P (with respect to the ordered pair of bases) follows from
P' = Pp, (2)
Proposition 3.5.1. where P is the transition matrtx from 8 to 8'.
Exercise 2. Check the relation (2) for the representations of the vector
Exercise 1. Check that the matrix
,,= [l 2 -1] with respect to the bases {e.. e2,e3} and {a.. a2,a3} defined
o1 0
1] in Exercise 3.7.1.
Exercise 3. Let the m x n matrices A and A' denote the representations
1 1. of the system {bl' b2 , , b,,} with respect to the bases 8 and 8' for !/'
is the transition matrix from the basis {al;"2, "3} defined in Exercise 3.7.1 to (dim!/' = m). respectively. Show that
the standard one. while its inverse. A' = PA, (3)

r.=+:_: -:]. where P is the transition matrix from 8 to 8'.


Hint. Apply Eq. (2) for each element of the system {b" b 2 , , b,,}. 0
In Exercise 3 we again see the equivalence of the representations of the
is the transition matrix from the standard basis to {al' az, a3}' 0 same.system with respect to different bases. (See the remark after Theorem
3.7.2.)
Now we are in a position to obtain the desired relation between the repre-
sentations of a vector in two different bases. Preserving the previous nota- Exercise 4. Check Eq. (3) for the matrices found in Exercise 1 in this
tion, the representations jJ and p' of b E 9' are written section and Exercise 3.7.1 0
The relation (2) is characteristic for the transition matrix in the following
sense:
where Proposition 1. Let Eq. (2) hold for the representations p and II' of every
m m element b /rom f/' (dim 9' = m) with respect to the fixed bases" and 8',
b= LI pjal = LI fJiaj.
j= }=
respectively. Then P is the transition matrixfrom 8 to 8'.

hOOF. Let A' denote the representation of the elements of I with respect~* TIaeorem 1. The system (1) is solvable if andonly if
to I'. Since the representation of these elements with respect to I is the iden~;
rank[A b] = rank A.
tity matrix 1", and (2) for any b E 9' yields (3), it follows that A' = P. It re.", (2)
mains to note that the representation A' of the elements of I with respect, PROOF. If there is a solution x = [XI X;l " Xn]T E en of Eq. (1),
to I' coincides with the transition matrix from I to I'. '! then L7= 1 A.,x, = b, where A*l, ... , A. n stand for the columns of A. Hence
Note that the phrase ..every element b from 9''' in the proposition can be, II is a linear combination of the columns of A and, therefore, by virtue of
replaced by "each element of a basis I." Theorem 3.7.1, rank[A b] = rank A.
We have seen in Exercise 1 that the exchange of roles of bases yields the.! Conversely, if Eq, (2) holds, then b is a linear combination of the columns
transition from the matrix P to its inverse. More generally, consider three~ of A and the coefficients of this combination provide a solution of (1). This
bases {a,ll"=l' {aa~h and {aj}l"=l for9'. Denote by P (respectively, P/)th~: completes the proof.
transition matrix from the first basis to the second (respectively, from the'i Exercise 1. Observe that the matrix equation
second to the third).
~,
Proposition 1. If P" denotes the transition matrix from the first basis to:'
the thirdthen. in the previous notation, , Ax = [~-~ ~][::l [~121
1 4 -2 -7 X3
= = b

P" =p'P.
has no solution since rank A = 2 but rank[A b] = 3.
PROOF. In fact, for the representations I', Il', and If' of an arbitrarily cho.'
Exercise 2. Confirm the existence of solutions of the linear system
element b from 9' with respect to the above bases, we have (by using Eq. (2);
twice) .~ Xl + 2X;l = 1,

II" = P'P' = P/(PII) = pip II. ~' 2x1 - 3X;l + 7X3 = 2,


Hence, applying Proposition 1, the assertion follows. .- Xl + 4x z - 2X3 = 1. 0
CoroUary 1. If P is the transition matrixfrom the basis I to the basis Ii Note that Theorem 1 can be interpreted as saying that the system (1) is
consistent if and only if b belongs to the column space (range) of A.
then P- 1 is the transition matrixfrom lito I. .
. ~e next result allows us to describe the set of all solutions of (1) by com-
bining a particular solution with the solutions of the corresponding homo-
geneous matrix equation Ax = O.
Theorem 1. Let Xo be a particular solution of the matrix equation Ax = b.
3.10 Solution of Equations Then
(a) Ifx' E Ker A, the vector X o + x' is a solution of (1).
(b) For every solution x of(1) there exists a vector x' such that Ax ' =0
Consider the following system of m linear equations with n unkno
written in matrix form:
and" = "0 + x:
In other words, the solution set of (1) consists of vectors of the form
Ax = b, AeP"", "0 + x', where Xo is a particular solution of (1) and x' is some solution of
where, as before, :F may be read as e or R. Investigation of such a syst the corresponding homogeneous equation.
began in Section 2.9 and will continue here. Note that Eq. (1) is also refe hooF. For (a) we have
to as a matrix equation. .
We start with the question of consistency (or solvability) for the glv A(xo + x') = Axo + Ax' = b + 0 = b,
equations. In other words, we are interested in conditions under whi ';:d hence Xo + x' is a solution of (1). For part (b), if Ax = band Axo = b,
Eq, (1) is sensible, meaning that at least one solution exists. t en A(x - "0) = 0 and x' = x - Xo e Ker A. Thus, x = Xo + x',

Exercise 3. The vector x₀ = [−1 1 1]ᵀ is a solution vector of the system considered in Exercise 2. A solution vector of the homogeneous equation is x′ = [2 −1 −1]ᵀ. Now it is easy to observe that any solution of the nonhomogeneous system is of the form x₀ + αx′ (α ∈ ℝ). □

Corollary 1. If A is a square matrix, then the matrix equation Ax = b has a unique solution if and only if A is nonsingular.

PROOF. If Eq. (1) has a unique solution, then by Theorem 2 we have x′ = 0 and Ax = 0 has only the trivial solution. Hence, by Corollary 3.7.1, the matrix A is nonsingular. The converse statement follows from Theorem 2.9.2. ■

There are some other important consequences of Theorem 2. When solutions exist, the system Ax = b has as many solutions as the corresponding homogeneous system, and the number of solutions is either 1 or ∞ in both cases. Since any solution vector of Ax = 0 belongs to Ker A, the dimension of Ker A provides the maximal number of linearly independent solutions (as vectors of ℱⁿ) of this homogeneous equation. The set of solutions of a nonhomogeneous equation is not a linear space. The next theorem suggests that it should be regarded as a subspace (Ker A) shifted bodily from the origin through x₀ (a particular solution).

Example 5. Describe the solution set of the system Ax = b given in the matrix form

    [ 1  -2   1   1 ] [x₁]   [  0 ]
    [ 2   0   3  -1 ] [x₂] = [ -1 ]
    [ 3  -2   4   0 ] [x₃]   [ -1 ]
    [-1  -2  -2   2 ] [x₄]   [  1 ].

SOLUTION. It is easily seen that the last two rows of A are linear combinations of the first two (linearly independent) rows. Hence rank A = 2. Since b is a linear combination of the first and the third columns of A, then rank [A b] = 2 also, and the given equation is solvable by Theorem 1. Observe, furthermore, that x₀ = [1 0 −1 0]ᵀ is a particular solution of the system and that Ax′ = 0 has n − r = 2 linearly independent solutions. To find them, first observe that the 2 × 2 minor in the left upper corner of A is not zero, and so we consider the first two equations of Ax′ = 0:

    [1  -2] [x₁′]   [1   1] [x₃′]   [0]
    [2   0] [x₂′] + [3  -1] [x₄′] = [0],

or

    [1  -2] [x₁′]   [ -x₃′ - x₄′ ]
    [2   0] [x₂′] = [ -3x₃′ + x₄′ ].

Hence, inverting the 2 × 2 matrix on the left, we have

    [x₁′]   1 [ -6x₃′ + 2x₄′ ]
    [x₂′] = - [  -x₃′ + 3x₄′ ].
            4

Write x₃′ = u, x₄′ = v (to suggest that they will be free parameters); then we have x₁′ = (1/4)(−6u + 2v), x₂′ = (1/4)(−u + 3v), x₃′ = u, x₄′ = v.

Setting first v = 0 and u = 1, and then u = 0 and v = 1, it is easily concluded that [−3/2 −1/4 1 0]ᵀ and [1/2 3/4 0 1]ᵀ are linearly independent solution vectors of Ax = 0. Thus, the general solution of the inhomogeneous system can be described by

    x = x₀ + x′ = [1 0 −1 0]ᵀ + u[−3/2 −1/4 1 0]ᵀ + v[1/2 3/4 0 1]ᵀ

for any scalars u and v. Note that the number of free (mutually independent) parameters in the general solution is (as indicated by Theorem 3) just n − r = 2.

Exercise 6. Use Proposition 3.8.1 to show that if A ∈ ℱ^(m×n), then Ax = 0 implies x = 0 if and only if m ≥ n and A has full rank.

Exercise 7. If A ∈ ℱ^(m×n), show that the system Ax = b has a unique solution if and only if m ≥ n, A has full rank, and the equation is solvable.

Exercise 8. Consider the equation Ax = b, where A ∈ ℱ^(m×n) and b ∈ ℱ^m, and let M ∈ ℱ^(p×m) be a matrix for which rank(MA) = rank A. Show that the solution sets of Ax = b and MAx = Mb coincide.

Hint. Show first that Ker A = Ker MA. □
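The computations in Example 5 are easy to confirm mechanically. The sketch below is an illustration only (it assumes numpy, which is of course not part of the text); it checks the rank count, the particular solution x₀, and the two homogeneous solutions obtained above.

```python
import numpy as np

A = np.array([[ 1., -2.,  1.,  1.],
              [ 2.,  0.,  3., -1.],
              [ 3., -2.,  4.,  0.],
              [-1., -2., -2.,  2.]])
b = np.array([0., -1., -1., 1.])

print(np.linalg.matrix_rank(A))                          # 2
print(np.linalg.matrix_rank(np.column_stack([A, b])))    # 2, so the system is solvable

x0 = np.array([1., 0., -1., 0.])                         # particular solution from Example 5
h1 = np.array([-1.5, -0.25, 1., 0.])                     # the two homogeneous solutions
h2 = np.array([ 0.5,  0.75, 0., 1.])

print(np.allclose(A @ x0, b))                            # True
print(np.allclose(A @ h1, 0), np.allclose(A @ h2, 0))    # True True

u, v = 2.0, -1.0                                         # any scalars: x0 + u*h1 + v*h2 solves Ax = b
print(np.allclose(A @ (x0 + u*h1 + v*h2), b))            # True
```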

3.11 Unitary and Euclidean Spaces

We started Chapter 3 by formalizing some familiar concepts in ℝ³ and then extending them to define a linear space. Now we go a step further in this direction and generalize the important concept of the scalar product in ℝ³ (see Section 1.3) to include a wide class of linear spaces. This allows us to generalize the familiar notions of length, angle, and so on.

Let 𝒮 be a (finite-dimensional) linear space over the field ℱ of real or complex numbers and let x, y ∈ 𝒮. A binary operation (x, y) from 𝒮 × 𝒮 to ℱ is said to be an inner (or scalar) product on 𝒮 if the following properties are satisfied for all x, y, z ∈ 𝒮 and α, β ∈ ℱ:

I1  (x, x) ≥ 0, and (x, x) = 0 if and only if x = 0;
I2  (αx + βy, z) = α(x, z) + β(y, z);
I3  (x, y) = \overline{(y, x)} (where the bar denotes the complex conjugate).

These three properties may be described as positivity, linearity in the first argument, and antisymmetry, respectively. It is easily verified that they are all satisfied by the familiar scalar product in ℝ³. The next example is the natural extension of that case to ℂⁿ.

Exercise 1. Confirm that the binary operation

    (x, y) = x₁ȳ₁ + x₂ȳ₂ + ⋯ + xₙȳₙ = y*x,                        (1)

defined for any pair of elements x = [x₁ x₂ ⋯ xₙ]ᵀ, y = [y₁ y₂ ⋯ yₙ]ᵀ from ℂⁿ, is a scalar product on ℂⁿ. □

The inner product given by Eq. (1) will be called the standard inner product for ℂⁿ; it will be the one applied if no other is specified. The standard inner product for ℝⁿ also is defined by Eq. (1), where obviously the bars may be omitted. Thus, in this case the inner product is bilinear, that is, linear in both arguments.

The next exercise shows how an inner product can be defined on any finite-dimensional space 𝒮 over a field ℱ in which complex conjugates are defined.

Exercise 2. (a) Let {a₁, a₂, ..., aₙ} be a basis for 𝒮 over ℱ. Check that the binary operation

    (x, y) = Σᵢ₌₁ⁿ xᵢȳᵢ                                            (2)

is an inner product on 𝒮, where x = Σᵢ₌₁ⁿ xᵢaᵢ and y = Σᵢ₌₁ⁿ yᵢaᵢ.

(b) Let h₁, h₂, ..., hₙ be positive numbers and define another binary operation on 𝒮 by

    (x, y) = Σᵢ₌₁ⁿ hᵢxᵢȳᵢ.                                         (3)

Show that this is also an inner product on 𝒮. □

The inner product possesses some additional properties that are simple consequences of the axioms I1–I3.

Exercise 3. Check that

(a) (x, αy + βz) = ᾱ(x, y) + β̄(x, z),
(b) (x, 0) = (0, x) = 0,

for any x, y, z ∈ 𝒮 and α, β ∈ ℱ. □

Linear spaces with inner products are important enough to justify their own name: a complex (respectively, real) linear space 𝒮 together with an inner product from 𝒮 × 𝒮 to ℂ (respectively, ℝ) is referred to as a unitary (respectively, Euclidean) space.

To distinguish between unitary spaces generated by the same 𝒮 but different inner products (x, y)₁ and (x, y)₂, we shall write 𝒰₁ = 𝒮( , )₁ and 𝒰₂ = 𝒮( , )₂, respectively. Recall that any n-dimensional space over the field of complex (or real) numbers can be considered a unitary (or Euclidean) space by applying an inner product of the type in Eq. (2), or of the type in Eq. (3), or of some other type.

Exercise 4. Let 𝒰 = 𝒮( , ) be a unitary space and let 𝒮₀ be a subspace of 𝒮. Show that 𝒰₀ = 𝒮₀( , ) is also a unitary space. (When discussing subspaces of unitary spaces, this property is generally tacitly understood.) □

Let 𝒰 denote the unitary space 𝒮( , ) and let x ∈ 𝒰. The inner product (x, y) on 𝒮 allows us to define the length (or norm) of the element x as the number √(x, x), written ‖x‖. Axiom I1 of the inner product says that ‖x‖ ≥ 0 and only the zero element has zero length. It should be noted that the length of the element x depends on the chosen inner product on 𝒮, and that the length in another unitary space 𝒮( , )₁ would be given by ‖x‖₁ = √(x, x)₁.

It is clear that if ( , ) denotes the familiar scalar product in ℝ³, then the length of x ∈ ℝ³( , ) defined above coincides with the classical Euclidean length of the vector. The next exercise can be seen as a generalization of this statement.
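Before turning to that exercise, note that axioms I1–I3 and the weighted inner product of Eq. (3) are easy to probe numerically. The following sketch is ours, not the book's: it assumes numpy, the helper name weighted_ip is invented for the illustration, and the last line previews the Cauchy–Schwarz inequality (5) established later in this section.

```python
import numpy as np

def weighted_ip(x, y, h):
    """The inner product of Eq. (3): (x, y) = sum_i h_i x_i conj(y_i), with all h_i > 0."""
    return np.sum(h * x * np.conj(y))

rng = np.random.default_rng(0)
h = np.array([1.0, 2.0, 0.5])                                   # positive weights
x, y, z = (rng.standard_normal(3) + 1j * rng.standard_normal(3) for _ in range(3))
alpha, beta = 2 - 1j, 0.5 + 3j

print(weighted_ip(x, x, h).real > 0)                            # I1: positivity
print(np.isclose(weighted_ip(alpha * x + beta * y, z, h),       # I2: linearity in the
                 alpha * weighted_ip(x, z, h) + beta * weighted_ip(y, z, h)))
print(np.isclose(weighted_ip(x, y, h), np.conj(weighted_ip(y, x, h))))   # I3: antisymmetry

norm = lambda v: np.sqrt(weighted_ip(v, v, h).real)
print(abs(weighted_ip(x, y, h)) <= norm(x) * norm(y) + 1e-12)   # Cauchy-Schwarz, inequality (5)
```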

Exercise 5. Check that the length of the element x e !/ having the repre- it is defined uniquely since, in particular, cos 0 is not changed if x, yare
sentation [IXI a2 .. , an]T with respect to a basis {at> ... , an} for !/ is replaced by lXX, fJy for any positive IX, fJ.
given by
Exercise 7. Check that the angle 0 between the vectors [0 3 - 4 O]T
n
and (1 I 1 I]T from 1R4 is equal to 1t - arccos(O.I) (with respect to the
2
Jlxl/ = L rxjiik("j, "k) (4) standard inner product). 0
1.k= 1
in the standard inner product on!/. 0 The notion of angle is not usually considered in unitary spaces for which
the expression on the right side of (6) is complex-valued. Nevertheless,
An element x e dft with length (norm) 1 is referred to as a normalized extending some geometrical language from the Euclidean situation, we
element. It is evident that any nonzero element x can be normalized by say that nonzero elements x and y from a unitary (or Euclidean) space are
transition to the element AX, where A 1/lIxll. = orthogonal if and only if (x, y) = O. In particular, using the standard scalar
The following inequality, concerning the norms and inner product of product in en of Exercise 1, nonzero vectors x, y in en are orthogonal if and
any two elements in a unitary space, is known as the (generalized) Cauchy- only if y*x = x*y = O.
Schwarz inequalityand contains Exercise 2.5.4 as a special case (in IRn) .
Exercise 8. Check that any two unit vectors eᵢ and eⱼ (1 ≤ i < j ≤ n) in ℂⁿ
Theorem I. If x, yare members ofa unitaryspace, then (or IR") are orthogonal in the standard inner product.
I(x, y)1 s IIxllllYll. (5) Exercise 9. (Pythagoras's Theorem) Prove that if x and y are orthogonal
=
PROOF. Let rx = -(y, X), fJ = (X, X), and % rxx + fJy. To prove the equiv- members of a unitary space, then
alent inequality lal2 S P(y, y), we compute, using 12, (7)
(%, %) :::: rx(x, rxx + fJy) + /1(y, rxx + py). Note that if x, y denote position vectors in 1R , then Ilx - yll is the length
3

Appealing to 11 and Exercise 3(a), we obtain of the third side of the triangle determined by x and y. Thus, the customary
Pythagoras's theorem follows from Eq. (7). The next result is sometimes
+ ap(x, y) + ii./1(y, x) + I/W(y, y).
Os IrxI 2 (x, x) described as Appollonius's theorem or as the parallelogram theorem.
Recalling the definitions of rx and P. this reduces to Exercise 10 Prove that for any x, y E dft,
o s P(_lcxI 2 + PCy,y. IIx + Yll2 + llx _ yll2 = 2(lIx1l 2 + IIYIl2)
If P = 0, then x = 0 and (5) is trivially true. If P > 0, then Icxl 2
S P(y, y). and interpret this result for 1R 3
Exereise 6. Check that equality holds in (5) for nonzero x and y if and only Exercise 11 Let f/ be the complex linear space em"n (see Exercise 3.l.3(b.
= .1.y for some .1. E C. 0
if x Show that the binary operation defined on 9' by
Proceeding to the definition of the angle f) between two nonzero elements (X, Y) = tr(XY*),
x and y ofa Euclidean space 8, we first recall that for 8 = 1R 3 with the familiar
scalar product, for all X, Y E .9; defines an inner product on [I' (see Exercise 1.8.3). 0

(x,y)
cos 0 = IIxIlIlYII' os 0S 1t. (6)
3.12 Orthogonal Systems
Generalizing this, we say that the angle (J between nonzero x and y fr?JU
the Euclidean space 8 = 9'(, ) is given by Eq. (6). Note that the expression Any set ofnonzero elements from a unitary space is said to be an orthogonal
on the right in (6) is, because of the Cauchy-Schwarz inequality, less than set, or system, if any two members of the set are orthogonal, or if the set
or equal to I in absolute value and therefore the cosine exists. Furthermore, COnsists of only one element.

Exercise 1. Show that an orthogonal set in a unitary space is necessarily as is the inner product of any two elements. The next exercises give the
linearly independent. 0 details.
An orthogonal system in which each member is normalized is described Exercise 3. Let {",," 2 , , "n} be an orthonormal basis for the unitary
as an orthonormal system. Clearly, any subset of the set of unit vectors in space <t and let X = L7=1 OCi"i be the decomposition of an X E <t in this basis.
en (or IRn) with the standard inner product constitutes the prime example of Showthat the expansion coefficients OCi are given byoc, = (x, ",), i = 1,2, ... , n.
an orthonormal system.
Exercise 4. Let {ai, "2' ... , an} be a basis for the unitary space <t and let
Consider the possibility of generating a converse statement to Exercise 1.
n
Given a linearly independent set {X1o X2' , xr } in a unitary space, can we
construct an orthogonal set from them in the subspace span{xl' ... , x r } ? X = L oc,ai'
i= I
The answer is affirmative and, once such a system is obtained, an ortho-
normal system can be found simply by normalizing each element of the Prove that (x, y) = L1=, (Xi Pi for any x, ,)' E <t ifand only ifthe basis {ai}7= 1
is orthonormal. 0
orthogonal set. One procedure for completing this process is known as the
Gram-Schmidt orthoqonalizauon process, which we now describe. Recall that any system of linearly independent elements in a space can be
Let {Xl' . , xr } be a linearly independent set in a unitary space <t. We extended to a basis for that space (Proposition 3.5.2) that can then be
shall construct an orthogonal set {y.. Y2' .. , Yr} with the property that for orthogonalized and normalized. Hence the result of the next exercise is to
k = 1,2, ... , r, be expected.
span{y" ... , Yt} = span{x" ... , Xt}. (1) Exercise S. Prove that any orthonormal system of elements from a unitary
First assign,)' 1 = x, and seek an element Y2 E span {x" X 2} that is orthogonal space <t can be extended to an orthonormal basis for <t.
to ,)'1' Write Y2 = OCl,)', + X2 and determine a, in such a way that (,)'10 ,)'2) = o. Exercise 6. (Bessel's inequality) Let {al> a2' ... , ar } be an orthonormal
It is easily seen that OCl = -(,)'10 X2)/(,)'1o ,)'1)' Note that the linear indepen- basis for ~ 0 = S"o( , ) and let x belong to <t = g( , ). Prove that if
dence of Xl and X2 ensures that,)'2 #: 0; Now the process is complete ifr = 2. 9'0 c s; then
More generally, an induction argument establishes the result, as indicated r

in the next exercise. L 1(x, "i)1 2 s


i=l
(x, x), (2)
Exercise 2. Let x" X2' , X r be linearly independent elements of a unitary
space <t = g( , ). Prove that a set {Y" ... , ,)'r} is orthogonal and satisfies with equality (Parseval's equality) if and only if x E go'
Eq. (1) if Y, = x, and, for p = 2,3, ... , r. Hint. Use Exercises 3-5, and recall the comment on Exercise 3.11.4. 0
p-l (YJ' xp ) It is instructive to consider another approach to the orthogonalization
yp = x p - L -(--) Yi' 0
j=l Yj' Yj problem. Once again a linearly independent set {x" X2' , x,} is given
It is quite clear that the procedure we have just described can be used to and we try to construct an orthogonal set {Y1o ... ' ,)'r} with property (1)
generate a basis of orthonormal vectors for a unitary space, that is, an We again take j , = x" and for p = 2,3, ... , r we write -
orthonormal basis. In fact, we have the following important result. p-l

Theorem 1. Every finite-dimensional unitary space has anorthonormal basis.


L (Xjpxi>
s, = x p + j=l
PROOF. Starting with any basis for the space <t, an orthogonal set of n I Where (Xlp' ' (Xp-l.p are coefficients to be determined. We then impose
the orthogonality conditions in the form (x/, Yp ) = 0 for i = 1,2, ... , p - 1,
elements where n = dim <t can be obtained by the Gram-Schmidt process. \
Normalizing the set and using Exercises 1 and 3.5.4, an orthonormal basis . . . and note that these conditions also ensure the orthogonality of yp with
for tJ is constructed. . 1 ":',,J
)'h ,'p _
1. The result is a set of equations for the (;l'S tha; can be written
as one matrix equation:
Compared to other bases, orthonormal bases have the great advantage ' il

that the representation of an arbitrary element from the space is easily found, ,i,r (3)
:;~~, i
":~':',l

11

.L
=.'k .
~;:

wheree, = [O(lp CX2p CXP_I,p]T, p P= [(XI' xp) Exereise 7. Let A E em" n with m ~ n and let rank A = n. Confirm the
the coefficient matrix is representation ofA = QR, where R is an n x n nonsingular upper triangular
matrix, Q E em"", and Q*Q = In (i,e., the columns of Q form an ortho-
G= [(Xl' Xjm.j~I' (4) normal system).
Thus, the orthogonalization process can be continued with the calculation
of unique elements 12' ... , 1r if and only if eq. (3) has a unique solution Hint. If %p = L~= 1 t}PIA*j (p = 1, 2, ... , n) are the orthonormal vectors
constructed from the columns A",j of A, show that Q = [%1 %2 , z,,]
vector Cltp for each p. We know from Corollary 3.10.1 that this is the case if
and R = ([tjp]j.p= 1)-1, where tjp = t}PI, satisfy the required conditions. 0
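The factorization A = QR of Exercise 7 can be produced by exactly the Gram–Schmidt process of this section applied to the columns of A. The sketch below is illustrative only: it assumes numpy, uses the classical (unstabilized) recursion for clarity, and the function name gram_schmidt_qr is ours.

```python
import numpy as np

def gram_schmidt_qr(A):
    """A = QR for A (m x n, rank n): orthonormal columns in Q, nonsingular upper triangular R."""
    m, n = A.shape
    Q = np.zeros((m, n), dtype=complex)
    R = np.zeros((n, n), dtype=complex)
    for p in range(n):
        v = A[:, p].astype(complex)
        for j in range(p):
            R[j, p] = np.vdot(Q[:, j], A[:, p])   # projection coefficient q_j* a_p
            v -= R[j, p] * Q[:, j]                # subtract the component along q_j
        R[p, p] = np.linalg.norm(v)
        Q[:, p] = v / R[p, p]
    return Q, R

A = np.array([[1., 1.],
              [1., 0.],
              [0., 1.]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q.conj().T @ Q, np.eye(2)))     # Q*Q = I_n
print(np.allclose(Q @ R, A))                      # A = QR
# As noted in the text, Ax = b then reduces to the triangular system Rx = Q*b.
```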
and only if matrix G is nonsingular. We are about to show that, because of
the special form of G illustrated in Eq, (4), this follows immediately from the Observe that when the QR decomposition of a square matrix A is known,
linear independence of the x's. the equation Ax = b is then equivalent to the equation Rx = Q*b in which
We tum now to the special structureofG. More generally, let x.. X2,'" .s; the coefficient matrix is triangular. Thus, in some circumstances the QR
be any set of elements from a unitary space lft = [/( , ). The r x r matrix decomposition is the basis of a useful equation-solving algorithm.
G = [(Xj!Xk)]J.k= I Exercise B. Let {Ul' ... , un} be a basis for a unitary space lft and let" E 'fl.
is said to be the Gram matrix of {Xl>"" x r} and its determinant, written Show that "is uniquely determined by the n numbers (e, u),j = 1,2, ... , n. 0
det G = g(x .. . . . , x r ) is the Gramian or Gram determinant of {XI' . , x r } .
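For concrete vectors in ℂⁿ with the standard inner product, the Gram matrix and the Gramian just defined take one line each to compute. A small sketch (assuming numpy; the helper name gram is ours) that anticipates Gram's criterion proved below:

```python
import numpy as np

def gram(vectors):
    """Gram matrix G = [(x_j, x_k)] for rows x_1, ..., x_r of C^n (standard inner product)."""
    X = np.array(vectors, dtype=complex)
    return X @ X.conj().T

print(np.linalg.det(gram([[1, 0, 1], [0, 1, 1]])).real)      # 3.0 > 0: linearly independent
print(abs(np.linalg.det(gram([[1, 0, 1], [2, 0, 2]]))))      # 0.0: linearly dependent
```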
Our experience with the Gram-Schmidt process suggests that the Gramian
is nonzero provided the x's are linearly independent. This is the case, but 3.13 Orthogonal Subspaces
more is true.

TIt.eorem 2. (Gram's criterion) Elements x, from a unitary space


Xl> ,
Let [/1 and 9i be two subsets of elements from the unitary space 'fI = [/
are linearly independent, or linearly dependent, according as their Gramian (, ). We say that [/1 and [/2 are (mutually) orthogonal, written [/1 .1. [/2'
g(x It . x r) is positive or zero. if and only if (x, y) = 0 for every x E 9';, y E 9i.

PROOF. Observe first that the Gram matrix is Hermitian and, consequently, Exercise 1. Prove that two nontrivial subspaces [/1 and [/2 of 'fI are
the Gramian is real (Exercise 2.2.6). orthogonal if. and only if each basis element of one of the subspaces is
Taking advantage ofTheorem Llet Y.. 12' ... , Yn be an orthonormal basis orthogonal to all basis elements of the second.
for the space and for j = 1,2, ... , r. let Xj = L:=I
O(jk1t. Define the r x n
Bxercise 2. Show that (x, y) = 0 for all y E 'fI implies x = O.
matrix A = [CXjk] and note that A has full rank if and only if XI' . , x, are
linearly independent (see Theorem 3.7.2). Bxercise 3. Let [/0 be a subspace of the unitary space lft. Show that the
Now observe that set of all elements of lft orthogonal to [/0 is a subspace of lft. 0
n n

(Xj,X'k) = ".~1 O(j"liko(y",y,,) = "~I CXj"lit " " ,\


The subspace defined in Exercise 3 is said to be the complementary ortho-
gonal subspace of [/0 in 'fI and is written [/~. Note that Exercise 2 shows that
where we have used the fact that (y", y,,) = t5"", the Kronecker delta. Now it '. 1 1ft! = {O} and {O}.1 = fill (in e:rt).
is easily verified 'that the Gram matrix G = [(Xj, xJ] = AA* and, using <I
Exercise 3.8.3. the result follows. ,J Exercise 4. Confirm the following properties ofcomplementary orthogonal
subspaces in lft:
The Gram-Schmidt process leads to the following decomposition of >1'
matrices offull rank, which should be compared with the rank decomposition :. (a) ([/~).1 = [/0'
of proposition 3.8.3. This result is of considerable importance in numerical ,l;! (b) If f/1 c: [/2' then [/t ::> f/t,
work. Surprisingly, it forms the first step in a useful technique for computing ':1 (e) (f/1 + f/2 l = f/t ('\ f/r,
(d) (f/1 ('\ f/2 ).1 = f/t + f/r. 0
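In ℝⁿ or ℂⁿ with the standard inner product, the complementary orthogonal subspace of Exercise 3 reduces to a null-space computation. A sketch, assuming numpy and scipy and using a real subspace (for complex data the rows would be conjugated first):

```python
import numpy as np
from scipy.linalg import null_space

S0 = np.array([[1., 0., 1., 0.],
               [0., 1., 0., 1.]])            # rows span a subspace S0 of R^4

S0_perp = null_space(S0)                     # columns: an orthonormal basis of the complement

print(S0.shape[0] + S0_perp.shape[1])        # dim S0 + dim S0-perp = 4 = dim of the space
print(np.allclose(S0 @ S0_perp, 0))          # every element of S0-perp is orthogonal to S0
# (S0-perp)-perp is S0 again (property (a)): the two subspaces coincide.
back = null_space(S0_perp.T).T
print(np.linalg.matrix_rank(np.vstack([back, S0])) == 2)
```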

The sum of a subspace f'o in '" and its complementary orthogonal sub- Two systems of elements {XltX2 ..... X"} and {YI.YZ,""x,,} in a
space f'i is of special interest. Note first that this sum is direct: f'o n f'i == unitary space are said to be btorthoqonai if (XI. YJ) = Olj (1 s: i, j s: k).
+
to}, so that f'o f't is a direct sum of mutually orthogonal subspaces, Exercise 9. Prove that if {xjH=1 and {Y}'=l generate a biorthogonal
that is, an orthogonalsum. More generally, the direct sum of k linear subspaces system, then each system consists of linearly independent elements.
f'1 (i = 1,2, ... , k) of <I is said to be the orthogonal sum of the subspaces
if f'1 .l f'J for i #0 j (1 s: i, j s: k) and is indicated by f'1 ~ f'z Ea . Ea f'" Exercise 10. Show that if {xil7=1 and {Yj}j= 1 are biorthogonal bases for
or, briefly, byL'= 1 Ea f'j' Note the contrast with f'1 + +... +
f'z f'" used 'fl. then (span{xlt .... x,,}).!. =
span{YHlt YHZ.... , Y.} for any positive
for a customary (nonorthogonal) direct sum. integer k < n.

Exercise S. Let til be a unitary space of dimension n. Show that for a sub- Exercise 11. For biorthogonal systems {x;H= I and {Yj}j=l in 'I, show that
space f'o of til,
(x. x) = L" (x, XI)(Y;. x)
til = f'o Ea f'i (1) 1=.1

and in particular, dim f'o + dim f'i = n. 0 for any x E 'I if and only if the system {xili= I consistutes a basis in 'I.
Deduce the Parseval equality (Eq. (3.12.2 from this.
In view of Eq. (1), each element x of 'I can be written uniquely in the form
Exercise 12. Let <I be a unitary space.
x=x1+X Z , (2)
(a) (cf. Ex. 3.11.9). Let x, Y E 'I. Show that
where x I E f'o and x I .L Xz. The representation (2) reminds one of the de-
composition of a vector in R' into the sum of two mutually orthogonal IIx + yli Z = IIxll z + lIyII 2 (3)
vectors. Hence, the element XI in Eq. (2) is called the orthogonal projection if and only if 9"t.e(x. y) = O.
of x onto f'o. (b) In contrast. let [!", qy be subspaces of 'I, and show that these sub-
Exercise 6. Show that if'" is a unitary space, spaces are orthogonal if and only if Eq. (3) holdsforall x E [!" and Y E dJ/.O

til = L"
j=1
Ea 91,
where 9j = span{aEl, 1 s: is: n, and {"I' "z, ... , "II}' is an orthonormal 3.14 Miscellaneous Exercises
basis for til.
Exercise 7. Let 1. Find a basis and the dimension of the subspace f'l spanned by a l =
[1 0 2 -1]T'"2 = [0 -1 2 O]T, and a, = [2-1 6 -2]T.
Answer. The set {al' a z} is a basis for f'I' and dim f'1 = 2.
and let x = D= I XI' where XI E 91 (1 s: i s: k). Prove that 2. Let
6 1 = (l -1 = [l
= L"
4 _1]T. 6z 0 0 I]T, 6, = [-1 -2 2 I]T.
(x, x) (Xj,XI);
j=1 and let f'z = span{6 1 6 z 6 3 } . If f'1 is defined as in Exercise 1, show that
compare this with Parseval's equality, Eq. (3.12.2). 0 dim(f'1 n f'z) = 2, dim(f'1 + f'z) = 3.
3. Let f'1' f'2 be subspaces ofthe linear space f' of dimension n. Show that
A converse statement to Exercise 7 is also true in the following sense.
. if dim f'1 + dim f'l > n. then f'1 n f'z #0 to}.
Exercise 8. Check that if (L~= 1Xl> D= I XI) = D=
dXi> x;) for any
4. Two linear spaces f' and f'1 over :IF are isomorphic if and only if there is
elements Xl E 91, 1 s: i s; n, then the sum D=
I 91 is orthogonal. 0
a one-to-one correspondence x +-+ XI between the elements x e f' and


XI E .91 such that if X+-+Xl and yHYl> then X+Y+-+Xl +Yl and 10. Prove the Fredholm theorem: The equation Ax = b (A E em"") is
(Xx +-+ (Xx 1 (y E.?, Y 1 e 9'1' oc e F). Prove that two finite-dimensional solvable if and only if the vector b E em is orthogonal to all solutions of
spaces are isomorphic if and only if they are of the same dimension. the homogeneous equation A *y = O.
(The correspondence, or mapping, defining isomorphic linear spaces
is called an isomorphism.) Hint. Use the result of Exercise 8.
U. Check that the rank of a symmetric (or skew-symmetric) matrix is
Hint. Consider first a mapping of linearly independent elements from equal to the order of its largest nonzero principal minor. In particular,
9' onto a system of linearly independent elements in fIJI'
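The Fredholm theorem of Exercise 10 can be checked mechanically: b lies in the column space of A exactly when it is orthogonal to every solution of A*y = 0. A sketch assuming numpy and scipy, with an arbitrary matrix chosen for the illustration:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 2.],
              [2., 4.],
              [0., 1.]])                      # rank 2; Im A is a plane in C^3 (here real)

N = null_space(A.conj().T)                    # basis of the solutions of A*y = 0
b_in  = A @ np.array([3., -1.])               # lies in Im A
b_out = b_in + N[:, 0]                        # has a component orthogonal to Im A

for b in (b_in, b_out):
    orthogonal = np.allclose(N.T @ b, 0)      # is b orthogonal to every solution of A*y = 0?
    solvable = (np.linalg.matrix_rank(A) ==
                np.linalg.matrix_rank(np.column_stack([A, b])))
    print(orthogonal, solvable)               # the two answers agree: True True / False False
```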
show that the rank of a skew-symmetric matrix cannot beodd.
5. Show that if all the row sums of a matrix A e e" x m are zeros, then A is 12. Show that the transition matrix P n from one orthonormal basis to
singular. another is unitary, that is, p:p" = In.
Hint. Observe that Ax = 0 for x = [l 1 . . . IJT. Hint. Observe that the columns of P viewedas elements in r generate
6. Check that, for any n x n matrices A, B, an orthonormal system (with respect to the standard inner product).

rank(AB) ~ rank A +.rank B - n. 0


In Exercises 7 and 8, let (, ) denote the standard inner product on en.
7. If A e e"X", prove that if (x, Ay) = 0 for all x, y e en, then A = O.
8. (a) If A e en,,", show that (x, Ay) = (A*x, y) for all x, y e en.
(b) If A is an n x n matrix and (x, Ay) = (Bx, y) for all x, y e en,
then B = A*.
9. Let 9' denote the space of all real polynomials of degree not exceeding
n, with an inner product on 9' defined by

    (P, Q) ≜ ∫₋₁¹ P(x)Q(x) dx,    P, Q ∈ 𝒮.

(a) Verify that this is indeed an inner product on .9:


(b) Check that the Cauchy-Schwartz inequality, Eq. (3.11.5), be-
comes

    | ∫₋₁¹ P(x)Q(x) dx | ≤ [ ∫₋₁¹ P²(x) dx ]^(1/2) [ ∫₋₁¹ Q²(x) dx ]^(1/2).

(c) Check that the Gram-Schmidt orthogonalizatio.n process applied


to the basis 1, x, x 2 , X' of 9' leads to the polynomials oc"L,,(x) (k =
0, 1, ... , n), where
    Lₖ(x) = (1 / (2ᵏ k!)) dᵏ[(x² − 1)ᵏ]/dxᵏ        (the Legendre polynomials)

and oc" denote some scalars.
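The orthogonalization asked for in part (c) can also be carried out symbolically. The sketch below assumes the sympy library (not part of the text) and simply runs the Gram–Schmidt recursion of Section 3.12 with the inner product of this exercise; the resulting polynomials are scalar multiples of the Legendre polynomials, as claimed.

```python
import sympy as sp

x = sp.symbols('x')
ip = lambda p, q: sp.integrate(p * q, (x, -1, 1))    # the inner product of this exercise

ortho = []
for p in [1, x, x**2, x**3]:                         # Gram-Schmidt applied to 1, x, x^2, x^3
    for q in ortho:
        p = p - ip(q, p) / ip(q, q) * q
    ortho.append(sp.expand(p))

print(ortho)                                          # [1, x, x**2 - 1/3, x**3 - 3*x/5]
print([sp.expand(sp.legendre(k, x)) for k in range(4)])
# [1, x, 3*x**2/2 - 1/2, 5*x**3/2 - 3*x/2]  -- the same polynomials up to scalar factors
```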


4.1 Linear Transformations

CHAPTER 4
Let and 9'2 denote linear spaces over a field :F. A transformation
9'1
T: 9'1 is said to be a linear transformation if, for any two elements
-. 9'2
Xl and X2 from 9'1 and any scalar IX e /F,
Linear Transformations T(XI + X2) = T(Xl) + T(X2)'
(1)
and Matrices T(OCX1) = OCT(Xl),
that is, if T is additive and homogeneous.
Note that by assuming 9'1 to be a linear space over /F we guarantee
that the elements Xl + X2 and IXXI also belong to the space 9'1 on which
T is defined. Furthermore, since 9'2 is a linear space over fF., the ima$es
T(Xl + X2) and T(lXxt) are elements of 9'2 provided T(Xl) and T(X2) are.
An isomorphism between two linear spaces of the same dimension (see
Exercise 3.14.4) is an example of a linear transformation of a special kind.
Other examples are presented below.
Example 1. Consider the two-dimensional space R2 of position vectors
and let 9'1 = 9'2 = R2 If T denotes a transformation of this space into itself
such that each vector x is mapped (see Fig. 4.1) into a collinear vector ocox,
The notion of a function of a real variable as a map of one set of real where OCo e /F is fixed (oco =I:- 0),
numbers into another can be extended in a natural way to functions defined
on arbitrary sets.Thus, if F 1 and F 2 denote any two nonempty sets ofelements,
T(x) = lXoX,
a rule T that assigns to each element x e F 1 some unique element y e F 2 . then it is easily seen that T is a linear transformation of R2 onto itself.
is called a t~ansformation (or operator, or mapping) of F 1 into F 2' and we Indeed,
write T: F l ~ F 2 or just T: F l -. F 2. The set F 1 is the domain'of T. If T Ttx, + x~) = IXO(XI + X2) = IXOXI + lXoX2 = T(Xt) + T(X2),
assigns y e F 2 to x e F l' then y is said to be the image of x under T and is
denoted y == T(x). The totality of all images of the elements of F 1 under T and for alloc e R,
is written
T(ocx) = OCoCXX = cx(ocox) = ocT(x).
T(F 1 ) = {yeF 2 : y == T(x) for some xeF 1 } Obviously, for any y E 1R 2 there is an x, namely, x = (1/IXo)Y, such that
and is referred to as the image of the set F 1 under the transformation T or T(x) = y and hence T:1R 2 ~ 1R 2
as the range of T.
Generally, T(F 1) is a subset of F 2' In the case T(F 1) = F 2, we say that
T maps (or transforms) F 1 onto F 2; we shall indicate that by writing
T: F 1 ~ F2 Obviously, T: F 1 ~ T(F 1 ) . In the important case F 1 =

I
F 2 == F we also say that the transformation T acts on F.
In this book our main concern is transformations with the property of
"linearity," which will be discussed next. Many properties of matrices
become vivid and clear if they are considered from the viewpoint of linear
transformations. Fig. 4.1 Collinear mapping.


finite-dimensional spaces can be represented by a matrix together with


matrix-vector multiplication. Before presenting one more example, note that
the transformation of Example 1 is determined by the scalar matrix A = 01
in the sense of Example 3.
Example 4. Let T be the transformation mapping the linear space P" of
polynomials over 91' with degree not exceeding n into pn-. according to
Fig.4.1 Rotation. the rule
T(P,,(x = p~(x).
Example 2. Let Tbe the transformation of anticlockwise rotation of vectors That is, T maps a polynomial in P" onto its derivative (a polynomial in
(see Fig. 4.2) from R 2 through a fixed angle y (0 < y < 2ft). If x = (x .. X2), pn-I).
then it is easily seen that the resulting vector is r = (X'i> X2), where Recalling the appropriate properties of derivatives, it is easily seen that
Xl = (cos y)x. - (sin Y)X2' the conditions in (1) hold, so the transformation is linear. Identifying the
polynomial p,,(x) = I:1=o a/xi with the vector p = [ao a. ... a,,]T, we
Xz = (sin y)x 1 + (cos Y)X2' can think of the transformation as a mapping T from fF"+ onto fF", given
Using matrix notation, we can write by the rule

[c~s y -sin y] [Xl],


Xl]
[ Xz
=
SID Y cos Y X2 T(p) = [~0 ~.. .~. ::'.0 ~]p =
n
Pl>
or, representing x and x' as column vectors,
T(x) = x' = Ax, where the matrix is n x (n + 1) and p. = [a. 2a2 . na,,]T. 0
where Let us denote by ftJ(9'.. 9'2) the set of all linear transformations mapping
A = [c~s Y
sin Y
-sin "J.
cos',y
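Examples 2 and 4 are both transformations given by a matrix acting on coordinate vectors, so they can be experimented with directly. A sketch assuming numpy (the angle, the polynomial, and the dimension n below are arbitrary choices of ours):

```python
import numpy as np

# Example 2: rotation of R^2 through an angle g is x -> Ax.
g = 0.7
A = np.array([[np.cos(g), -np.sin(g)],
              [np.sin(g),  np.cos(g)]])
x1, x2, a = np.array([1., 2.]), np.array([-3., 0.5]), 4.0
print(np.allclose(A @ (x1 + x2), A @ x1 + A @ x2))    # additivity
print(np.allclose(A @ (a * x1), a * (A @ x1)))        # homogeneity

# Example 4: differentiation maps P^n into P^(n-1); on coefficient vectors
# p = [a_0 a_1 ... a_n]^T it acts as the n x (n+1) matrix D below.
n = 3
D = np.zeros((n, n + 1))
D[np.arange(n), np.arange(1, n + 1)] = np.arange(1, n + 1)
p = np.array([5., 1., -2., 4.])                       # 5 + x - 2x^2 + 4x^3
print(D @ p)                                          # [ 1. -4. 12.]  i.e. 1 - 4x + 12x^2
```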
the linear space 9'. into the linear space 9'2' Both spaces are assumed
throughout the book to be finite-dimensional and over the same field !F.
When 9'. = 9'2 = .9, we write ftJ(9') instead of ftJ(9; 9').
N ow the linearity of the transformation T easily follows from the appropriate Note that, combining the conditions in (I), a transformation T e ftJ(9'l> 9'2)
properties of matrices (see Section 1.2): can be characterized by a single condition:
T(x. + X2) = A(x. + X2) = Ax. + AX2 = T(Xl) + T(X2), (2)
T(lXx) = A(lXx) = IX(Ax) = IXT(x).
for any elements Xl> X2 from 9i and IX, fJ E!F. This implies a more general
Example 3. Let TA denote a transformation of the n-dimensional space F" relation:
into 91'"' given by the rule
T...(x) = Ax,
where A is a fixed m x n matrix:' Obviously
x e 91"',
1A is linear (see Section 1.2 and
T( i i=1
IXiXi)
= 1=. /Xi T(XJ,
Example 2). 0 Where X/E 9; OCi E fF (i = 1,2, ... , k). In particular,
Thus, any transformation from 91''' into 1 m determined by an m x n T(O) = 0,
matrix, together with matrix-vector multiplication, is linear. It is remark~ble
T( -x) = - T(x),
that, in an appropriate sense, any linear transformation can be described
in this way. It will be shown later that any linear transformation between any T(x. - X2) = T(xt) - T(X2)'

Exercise 5. Let T E ~(9'h 9'2)' Show that if the elements Xi E 9'1 (i = Consider three linear spaces 9'1' 9'2' and 9'3 over the field" and let
I, 2, ... , k) are linearly dependent, so aretheir images T(Xi) (i = 1,2, .. , k). TI e ~(9'I' 9'2), T2 e ~(9'2' 9'3)' If x e 9'10 then T1 maps x onto an element
o ye 9'2 and, subsequently, T2 maps y onto an element Z E 9;. Hence a suc-
Exercise 5 shows that a linear transformation carries a linearly dependent cessive application of T1 and T2 performs ~ transformation of 9'1 into 9' .
system into a linear dependent one. However, a linear transformation may This transformation T is referred to as the composition of T2 and Tit a:d
also transform linearly independent elements into a linearly dependent set. we write T = T2 TI The formal definition is .
For example, the transformation mapping every element of 9'1 onto the
zero element of 9'2' (3)

Example 6. The transformation T E ~(1R4, 1R3) defined by for all x e .9'1-

T ([ ~m~ =~ : ::][:
maps the linearly independent vectors [ -1 0 I. IYand [0 0 0 I]T
Exercise 9. Show that T given by Eq, (3) is a linear transformation from
9'1 to 9'3' that is, T e !R(9'it 9'3)' 0

Similarly, a composition of several linear transformations between ap-


propriate spaces over the same field, T = 44-1'" T1 , is defined by the
rule
into the (linearly dependent) vectors 0 and [0 1 O]T, respectively. 0
The rest of this section will focus on the algebraic properties of the set
T(x) 4('lk-I(' .. (Tz(T1(x .. for all x e 9'.

!JI(f/I' f/2)' It turns out that !JI(f/l> f/2) is itself a linear space over !F if In particular, if T E !R(9'), T" denotes the composition ofthe transformation
addition and scalar multiplication of linear transformations are defined as T with itself k times.
follows:
For any Tit T2 e ~(9'h 9'2) and any x e 9'10 Exercise 10. Check the following properties of composition of linear
transformations:
(T1 + T2)(x) A T 1(x) + T2{x).
(a) T1(Tz T3 ) = (T1T2.)T3 ;
For any Te ft'(9'1o 9'2)' (l e" and x e 9'1' (b) a(TI T2 ) = (aT1)T2 = T 1(aT2.);
(αT)(x) ≜ αT(x). (c) T₁(T₂ + T₃) = T₁T₂ + T₁T₃;
(d) (T1 + T2.)T3 = T 1 T3 + T2. T3
Note that two transformations TI and T2 from ~(9'I' 9'2) are equal or
= T2(x)for every x e 9'1' Obviously, the zero transformation
identical if T1(x) Hint. Consider the image of an element under the transformations on both
o such that O(x) = 0 for all x e 9'1 plays the role of the zero element in sides of the equations,
!R(9'I' 9'2)'
Exercise 7. Check that, with respect to the operations introduced, the set Exercise 11. Show that if transformations are defined by matrices as in
!R(9'I' 9'2) is a linear space over fF.
Example 3, then the properties of compositions of transformations in
Exercise10 are equivalent to the corresponding properties of matrices under
Exercise B. Let {X1o X2' ... , x n} and {Yh Y2' ... , Ym} denote bases in ~ matrix multiplication. 0
and 9'2' respectively. The transformations T;j E ~(9'1' 9'2), i = I, 2, ... , n
and j = I, 2, .. , In, are defined by the rule Consider the linear space !R(9'). The operation of composition defined
above, with properties indicated in Exercise 10, is an additional operation
TiJCXI) = Yj and TiJ{x,,) = 0 if k :F i, define<l on the elements of !R(9'). A linear space with such an operation is
then Til) is defined for any x e 9'1 by linearity. Show that the 1i/s constitute r~ferred to as an algebra. Thus, we may describe ~(9') as an algebra of
a basis in !JI(f/I f/2 ). l n particular, dim !/(f/I, f/2) = mn. 0 linear transformations acting on 9'. The identity transformation 1 defined


by lex) A x for all x e.9 has the property that IT = TI = T for all Exercise 1. Find the matrix representation of the transformation

:J
T e !t'(.9). T e Z(9'3, 9'4) such that
It was shown in Exercise 3.1.3(b) that jr;;:nx" is a linear space. The space

(Em -r::; ~
:J1;'''x" together with the operation of matrix multiplication is another
example of an algebra. T x [x, x, x,y e ',
Polynomials in a transformation are defined as for those in square matrices
(see Section 1.7).
with respect to the pair of standard bases.
Exercise 12. Show that for any T e !t'(.9) there is a nonzero scalar poly- SOLUTION. Computing T(e),j = 1,2,3, where tS = {el> ez, e3} denotes the
nomial peA) = Ll= 0 PIAl with coefficientsfrom such that p(T) A :D=o
Pi T i standard basis in 3, we obtain the system of elements

n ~ Ul m
is the zero transformation.

SoLUTION. Since (by Exercise 8) the space fR(.9) is finite-dimensional, there


is a sufficiently large positive integer I such that the transformations 1, T(,,) - T(,,) T(,,) -
T, T 2, ... , T I viewed as elements of fR(.9) are linearly dependent. Hence
there are scalars Ph P2, ... , P, from , not all zero, such that LI=o
PiT I = O.
For Y = [YI Y2 Y3 Y4F and the standard basis tD = {e'h e~, e3, e4} in
o 9'4, we have Y = 2:t=l Yie;. Hence the representation of Y is the vector
[Yt Y2 Y3 Y4]T itself, and the required representation of T is the matrix

4.2 Matrix Representation of Linear Transformations 01 1


10 11
A = AI,,, = 1 0 0 . 0
[
1 -1 0
Let T e !t'(.91, .9z), where dim.91 == n and dim.92 = m. We have seen
The representation of a linear transformation T e fR(9''', :J1;'m) with respect
in Example 4.1.3 that, in the particular case .91 = n and .92 = m, anr
to the standard bases of the two spaces is referred to as the standard (matrix)
m x n matrix defines a transformation T e !t'(9'n, m). Now we show that representationof T.
any transformation T e !t'(.9I , .9z ) is associated with a set of m x n matrices. In viewof Exercise 3.7.1, it is clear that the transition from one basis in.92
Let ~ = {xl,xz,''''x,,} and tD = {YI'Yz,""Y"'} be any bases in the to another changes the representation of the system {T(x) }j= I ' There is
spaces !Ii. and 92 over , respectively. Evaluating the images T(xj)
also a change of the representation when a basis in .91 is replaced by another.
U = 1, 2, ... , n) and finding the representation of the system {T(xt),
T(X2), ... , T(x,,)} with respect to the basis tD (see Section 3.7), we obtain Exercise 2. For the same Te !t'(:J1;'3, :J1;'4) defined in Exercise 1, verify that
the representation with respect to the bases

T(xj) =
'" aijYI'
L
1=1
j = 1,2, ... , n.

The matrix A,." = [aij]~j~ I e '" x" is defined by this relation and is said
(1)
x,-[J x,-LH x,-(!]
to be the (matrix) representation of T with respect to the bases (~, tD). Note for :J1;'3, and
that the columns of the representation A, " of T are just the coordinates of
T(Xl), T(X2), ... , T(x,,) with respect to ihe basis tD and that~ as indicated
in Section 3.7, the representation of the system {T(xj)}j=t with respect to
the basis tD is unique. Hence, also fixing the basis ~, the (matrix) representa-
tion of T with respect to the pair (~, tD) is uniquely determined.


for j :::: 1,2, ... , nand k = 1, 2. Hence for any (X E fF,

!l -~ :].
III

(XT,,(Xj):::: ~ (Xal~)Yi>
A'= 1=1
III

o 1-1 (TI + T2)(xj) = ~ (alJl + aW)YI'


;;0;; 1

Hint. Calculate T(xj ) U = 1, 2, 3). The solution ofthejth system oftype (1) Thus, if TI - A I and T2 - A 2 , then, for (X E fF,
(withm = 4) gives thejth column of A'. 0
(2)
Thus, the same linear transformation generally has different matrix
representations in distinct pairs of bases. It is natural to ask which properties (3)
all of these matrices have in common; this will be our main concern in the
next section. Now we are focusing on a relationship between the set of linear and the isomorphism is established.
transformations ft'([I'I' [1'2) and the set of their representations with respect We consider now the operation of matrix multiplication and its connec-
to a fixed pair of bases (8, ~). We already know that any linear transforma- tion with the composition of transformations. Assume that TIE S!'([/.. 92)
tion produces a unique m x n matrix, which is its representation with respect and T2 E 2(92, [/3)' Let {ill = {z.}:= I denote a basis in [/3 and let 8 and
to (8, !'9). The converse is also true. 'I be defined as before. If A = [aiJ]~j~1 and B = [b.ilt:'''1 are representa-
tions of T1 and 12 with respectto the pairs (8, !'9) and (~, {ill), respectively, then
Exereise 3. Let [/1' .92 , 8 , and ro be defined as before. Given A E Fill"., III
confirm the existence of a unique T E ft'(.91, [/2) such that the representation
A.... of Twith respect to (8, ~) is A.
Tz T1(xj) = Tz(TI(xJ = L alJ Tz(YI)
1=1

Hint. Define a transformation T on the elements of 8 by Eq. (1) and then


extend T to [1'1 by linearity:
for allj = 1, 2, ... , n. Hence, the representation of 12 TI with respect to the
"
T(x) = ~ (XjT(xj), bases (16', fJI) is the matrix BA. Briefly, if TI - A, Tz -B, then
J"I
(4)
provided x = D=1 (XJxJ. 0
Note that the relation in (4) obviously holds for transformations TI and
Thus given the bases 8 and !'9 there is a one-to-one correspondence between
T2 both acting on [/'1 and their representations with respect to the pairs of
the sets, ft'([/I' [1'2) and Fill"" in the sense that each matnx
. A rrom
&. dZ'1II"
..,.
bases (8, ro) and (fS, al), where C,~, and IJt are all bases for [/1' This general
is a representation of a transformation T E !R([/I, [/2) with respect to the
approach simplifies considerably when we consider transformati~ns acti~g
bases C and ~, and vice versa. This correspondence T - A is an isomorphism
on a single space and their representationswith respec~ to o~e b~sls (that IS,
between the linear spaces ft'(//I' //2) and !Fill"". (See Exercise 3.14.4 for the from the general point of view, with respect to a pair of identical bases).
definition of an isomorphism). Indeed, if Al (respectively, A 2 ) is the repre-
Clearly, the correspondences in Eqs. (2), (3), and (4) are valid for tra!,sfor~a
sentation of T I (respectively, T2) from ft'(.91, //2) with respect to the bases
tions 1', and T acting on the linear space [/ and their representations With
C = {Xj}j=1 and ~ = {YI}7'= .. and if we write Ai = [a~')], k = 1,2, then (see I z .
respect to one basis C (= ro = gel for ~Using Exercise 3 an supposing
d .
Eq. (1
the' basis 8 to be fixed we say that there is a one-to-one correspondence
III
between algebraic ope~ations involving linear transformations i~ !.([/)
T,,(Xj) =L ~)YI and those involving square matrices. Such a correspondence, given by
1= I

Eqs. (2)-(4), is known as an isomorphism between algebras (9'(9') and


: 4 . Check the statement of Theorem 2 for . the transformation
Exercue
/F" X", in this case). T considered in Exercise 1 and its standard representation.

Theorem 1. (a) There is an isomorphism between the linear spaces


SOLUTION. Preserving the previous notation, if x = [Xl X2 X3]T
then the representation lX of x with respect to the s~andard basIs. IS
-r:
9'(9'1,9'2) and /Fm x" given by the correspondence T+-+A where ] T' Noting a similar result for Y .
E [F4 and ItSrepresentation
x II = [ Xl X1 X3 ' )
"", to a
Te 9'(9'1,9'2) and A"" e/Fm " is the representation of T with respect p, we have y = T(x) if and only if (see the solution of Exercise 1
fixed pair ofbases (8, ~).

[~:L
lyJ [~II 1~ll::l
(b) There is an isomorphism between the algebras 9'(9') and /F"l<"
given by the correspondence T+-+A" where Te9'(9') and A,e/F"l<" is

~1
the representation of T with respect to a fixed basis 8 for 9'.
-1
Along with the established isomorphism 9'(9'1' 9'2) +-+ /Fm x" between the
spaces 9'(9'10 9'2) and /Fm X", there are isomorphisms 9'1 +-+ /F" and 9'2 +-+ /F'"
Exercise S. Verify that the matrices
between pairs of linear spaces of the same dimension (see Exercise 3.14.4).
For instance, these isomorphisms are given by the correspondence x +-+ ex
(respectively, Y +-+ II), where IX (respectively, II) denotes the representation 2 0 0]
100 and
(1-1
o
I]
1 0
of x e 9i (respectively, Y e 92) with respect to a fixed basis 8 for 9'1 ( 111
011
(respectively, ~ for 92). It turns out that these isomorphic correspondences
3
carryover to the relations y =: T(x) and P == A,..,ex. are representations of the same transformation T e ..!l(1R ) with resp;ct to the
standard basis and the basis [1 1 O]T, [0 1 I]T, [1 0 1], respec-
Theorem 2. In the previous notation, let the bases 8, ~ be fixed and x +-+ ex, tively. Check the statement of Theorem 2 for these cases.
y+-+ ,I, T-A"fI T hen
y = T(x) (5)
Hint. Use Exercise 3.5.3. and calculate T(ej),j = 1,2,3. 0
if and only if
II = A,."tx. (6) Matrix Representations. Equivalence. and Similarity
4.3
PROOF. Let T E9'(9'109'2)' Assume" Eq, (5) holds and that x +-+ (I and
y +-+ ,I, where (I = [oc i OC2 OC,,]T and II = [PI P2 . . Pm]T. Then,
using Eq. (1), L t T !(f/. f/.) To describe the set JIlT of all matrices that are repre-
t
sent:tio:S of T Yndifferent pairs of bases, we shall use the r~sults f Section

ItI PIYI = Y = T(x) = jtl ocJT(xJ) = i~l (tl QijOCJ)YI' (7)


3.9 about how a change of bases inftuen'7s the representation 0 a system.
As before assume that dim 9'1 = n and dim 9'2 = m.
,
Let ~ and . ' basi r~ then the representa-
~. be two bases In 9'2' If 8 IS a aSIS In VI'
Therefore tions A1.'11 and A I ' of T are related by Exercise 3.9.3 as
(1)
PI = L" aljoci' j = 1,2, .. " m, (8) A"fI' = PA,."
J=l where Pe/FmlCm is the transition matrix from the basis ~ to the basi~.~'.
and Eq. (6) holds. " consider along With8
Now . ' ' 'CD' in VrDI an diet "Q denote the transition
a basis
Conversely, if (6) (or the equivalent (8 is given, then by defining T as matrix from 8 to 8'. Then we will show that
suggested in Exercise 3 and using (7), it is easily seen that (8) implies (2)
A"., = A",fI,Q
Eq. (5).

Indeed, if (I and (I' denote the representations of x e 9'1 with respect to the Hint. Given A' = a
PAQ - 1, use the matrices P and as transition matrices
bases 8 and 8', respectively, then, by Theorem 3.9.1, and construct a pair of bases with respect to which A' is a representation
ofT. 0
(I' = Q. (3)
Thus, any transformation T e .ft'(9'I' 9'2) (dim 9'1 = n, dim 9''1. = m) gen-
Let Y = T(x) and let IS denote the representation of y with respect to the basis erates a whole class d T of mutually equivalent m x n matrices. Furthermore,
~'. By Theorem 4.2.2 we have this is an equivalence class of matrices within :F",)(n (see Section 2.7). Given
T e .ft'(9'I' 9''1.)' it is now our objective to find the ..simplest" matrix in this
P= A..... (I and p= A.., ,(1'. (4)
class since, if it turns out to be simple enough, it should reveal at a glance
Hence, using Eq. (3), the second relation in (4) yields p = A.., , QcL Compar- some important features of the "parent" transformation.
ing this with the first relation in (4) and noting that x e [/1 is arbitrary, the Let T E fi'(~, 9'2), T :F O. The class of equivalent matrices generated
required equality (2) follows. by T is uniquely defined by the rank r of an arbitrary matrix belonging to
Combining Eqs. (1) and (2) and recalling (Exercise 3.7.2) that transition the class (see Section 2.8). Referring to Section 2,7, we see that the simplest
matrices are nonsingular, we arrive at the following theorem. matrix of this equivalence class is of one of the forms

Theorem 1. Let Te fi'(9'1 9'2}.lfA.... andA..,.". denote the representations or [I", 0], (6)
ofT with respect to the pairs ofbases (8, ~ and (8', ~'), respectively, then
A.., = PA ....Q-l, (5) depending on the relative sizes of r, m, and n.
where P and a denote the transition matrices from f to f' and from 8 to 8', A matrix in (6) is referred to as the canonicalform of the representations of
respectively. . T with respect to equivalence. This canonical form (also referred to as the
canonical form for any matrix in JIlT with respect to equivalence) instantly
Example 1. Consider the matrices A and A' obtained in Exercises 4.2.1 , gives the rank of all the matrices in the corresponding equivalence class.
and 4.2.2, respectively. We already know that they are equivalent: A' = Exercise J. Check that the canonical form with respect to equivalence
PAQ-l where P and Q are transition matrices from the standard bases ofthe representations of T E .ft'(:F3, :F 4) defined in Exercise 4.2.1 is
to the bases in Exercise 4.2.2. In fact, calculating these transition matrices
as in Exercise 3.9.1 (see also Corollary 3.9.1),

00]-1 10 0]
010
P =
1 1
-1 0 1 0 a- 1 10]
= [1 0 1 . 1
o [000
AO=OOlD

[ o 1 0 1
000 ' 1
o -1 0
Finding the bases with respect to which the representation of a trans-
Recalling the definition of equivalent matrices (Section 2.7), the last formation has the canonical form can be performed with the help oftransition
theorem can be restated in another way: matrices. In more detail, let T E .ft'(9"1' 9''1.) and let AI. denote the repre-
sentation of Twith respect to the bases 8.and f in 9'1 and 9''}., respectively.
Theorem 1'. Any two matrices that are representations of the same linear If A o is the canonical form of the equivalence class generated by T,
transformation with respect to different pairs of bases are equivalent. then there are nonsingular matrices P E j!'m)( '" and Q E j!'n)( n such that
It is important to note that the last proposition admits a converse state- (Theorem 1)
ment. A o = P- 1A .... Q,
Exercise 2. Show that if A is a representation of T e fi'(9'b 9'2), then a~Y Where P, Q are transition matrices from the bases 8, ~ to the bases with
matrix A' that is equivalent to A is also a representation of T (with respect . respect to which the representation of T is in the canonical form A o By
to some other pair of bases). use of Eq. (3.9.1), the required bases are easily found.

There are no essentially new features in Theorems 1 and 1', or in the 4.4 Some Properties of Similar Matrices
related discussion, when the special case of transformations from ~(.9')
is considered. That is, transformations from f/' into itself. The associated
matrix representations A". are square, of course. This means that the Throughout this section all matrices are assumed to be square.
possible canonical forms under equivalence are confined to the first and
last of those displayed in (6). Exercise 1. Let A ~ B; that is, let A and B be similar matrices. Show that
A new phenomenon arises, however, if we insist that t = Go As might (a) rank A = rank B;
be expected, the canonical forms of (6) cannot generally be achieved under (b) det A = det B;
this additional constraint. In the context of Theorem 4.2.I(b), we now seek a (c) A - AI ~ B - AI;
single basis 4 for f/' in which the representation A, of T is as simple as (d) Ak ~ Bk for k = 0, 1,2, ... ;
possible. The next theorem is a first step in the analysis of this problem. (e) P(A) ~ PCB) for any scalar polynomial pOL);
Keeping in mind the meaning of the matrices P and Q in Eq. (5), it is an (f) AT ~ BT
immediate consequence of Theorem 1. .
SoLUTIONS (b) If A = PBp-l, then det A = (det PXdet BXdet P-l) and
Theorem 2. Let Te ~([I'), If A, and A" denote the representations of T the result follows from Exercise 2.6.11.
with respect to the bases 4 and 4' in 9: respectively, then
(e) For P(A) = I:.t=o p,A' we have
A""" PA,P- 1,
where P is the transition matrixfrom the basis 8 to the basis 8'. peA) = rI

1=0
p,(PBP-l)'

The relation (7) between matrices A" and A, turns out to be so important
in the theory of matrices and linear transformations that a special name is = p( Pl BI) p - l = PP(B)P- 1 0
1=0
justified: Square matrices A and B are said to be similar, written A ~ B,
if there is a nonsingular matrix P such that When A and Bare similar the matrix P of the relation A = PBP- 1 is
A = PBP-l. said to be a transforming matrix (of B to A). It should be pointed out that this
transforming matrix P is not unique. For example, P may be replaced by PP 1
It is easy to verify that similarity determines an equivalence relation on where P 1 is any nonsingular matrix that commutes with B.
g;""'". The next result is then a consequence of Theorem 2.
Theorem 3. Let T E ~(/7) and dim /7 = n. Then the set of all matrix', Exercise 2. Show that
representations of T (with respect to different bases in /7) is an equivalence
classofsimilar matrices in g;""'".
Exercise 4. Check that the representations of the transformation Tdefined
in Exercise 4.2.5,with respect to the bases indicated there, are similar matrices. and that nonsingular matrices of the form
o
Denote the set of matrix representations of T (of the last theorem) by .rI~.
In contrast to the problem of finding the" simplest" matrix in the equivalence
class .rIT of equivalent matrices, the problem of determining the ..simplest" where PI commutes with Al and P z commutes with A z , are some of the
matrix in the equivalence class .rI!]. of similar matrices (that is, the canonical transforming matrices.
form of the representations of T with respect to similarity) is much more'
complicated. For the general case, this problem will be solved in Chapter 6, Exercise J. Confirm the following generalization of the previous result: A
while its solution for an especially simple but nonetheless important kind of square matrix A is similar to a matrix obtained from A by reflection in the
transformation is found in Section 4.8. secondary diagonal.
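The invariants listed in Exercise 1 (and the trace property of Exercise 6 below) are easy to confirm numerically for a randomly generated similar pair A = PBP⁻¹. A sketch assuming numpy; the matrices are arbitrary, not from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.integers(-3, 4, size=(4, 4)).astype(float)

# A unit upper triangular P is certainly nonsingular, so A = P B P^-1 is similar to B.
P = np.eye(4) + np.triu(rng.integers(-2, 3, size=(4, 4)).astype(float), 1)
A = P @ B @ np.linalg.inv(P)

print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B))              # (a) equal ranks
print(np.isclose(np.linalg.det(A), np.linalg.det(B)))                    # (b) equal determinants
print(np.isclose(np.trace(A), np.trace(B)))                              # Exercise 6: equal traces
print(np.allclose(np.linalg.matrix_power(A, 3),
                  P @ np.linalg.matrix_power(B, 3) @ np.linalg.inv(P)))  # (d) A^k similar to B^k
```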

Exercise 4. Prove that if (i1> i 1 , , ill) is a permutation of (1,2, ... , n), Exerci~e 7. Check that the matrices
then
A= [~~] and I = [~ ~]
for any A = [a.J]:.J-l' are n~t similar, although they have the same rank, determinant, and trace.
Hint. Observe that the matrix A = [allel",];',"-l is obtained from A by D
interchanging columns accompanied by a corresponding interchanging of Thelsimilarity relation preserves some other properties of matrices.
rows. Deduce from this the existence of a permutation matrix P such tha~
A = PAP- l 0 Exerc~se B. Show that if A 11:$ B, then
(a) I if B is idempotent, so is A;
Exercise 4 gives rise to the problem of determining the operations on
(b) ! if B is nilpotent of degree k, so is A.
elements of a matrix that transform it into a similar one. Recalling that
any nonsingular matrix is a product of elementary matrices (Exercise 27.7),. Exer~se 9. Show that the scalar matrices are similar to themselves using
we write the matrix P in Eq. (4.3.8) in the form P = EIlEIl- l '" E l . Hence' any ti,"ansforming matrix P. Conversely, prove that a matrix having this
the transformation of B into a similar matrix A can be interpreted as Ii property must be scalar.
result of successive applications of elementary matrices: Hint. I Consider first diagonal transforming matrices. D
A = EIl{EIl_l{E2(E1BEil)Eil) ... Ek'_\)Ek'1.
For any elementary matrix E, the result ofthe operation EBE- l can be easil~
described in terms of row and column operations depending on the type (s . 4.5 Image and Kernel of a Linear Transformation
Section 2.7) of elementary matrix. The following exercise is established in thiS
way.
Theo~em I. Let T e !([/.. [/1)' If [/ is a subspace in [/1' then T([/) is a
Exercise 5. Show that a matrix is transformed into a similar one if an subsp(JCe in [/2'
only if elementary operations of the following kind are performed: PRoor Let Y1> Y2 e T([/). Then there are elements Xu X1 e [/ such that
(a) The ith column (row) is interchanged with the jth column (row 11 = ,T(x~ (i = I, 2). Since T is linear, it follows that
and subsequently, the ith row (column) and jth row (column) are inter
changed; ocY1 + fJY2 =OCT{Xl) + fJT(xz) = T(axl + fJX2)'
(b) The jth column (row) is multiplied by a nonzero scalar oc and, su Hence for any a, fJ E ; the element aYl + fJY2 is the image ofocxt + fJX2 E 9;
sequently, thejth row (column) is multiplied by l/a; that [s, it belongs to T([/). This proves that the image T([/) is a linear
(c) The jth column (row), multiplied by IX, is added to the ith col subspace in [/2'
(row), while the ith column (row), multiplied by -IX, is added to the jt
column (row). D In particular, the image T([/l) of the whole space [/1' called the image of
the transformation T and usually denoted by 1m T, is a subspace in [/2'
Observing that the trace of a matrix is not influenced by the operation Its d~ension is called the rank of the transformation T and is written rank T.
Thu~,
of Exercise 5, the next exercise can be completed.
rank T ~ dim(Im T).
Exercise 6. Check that if A 11:$ B, then tr A = tr B. D iL

'iW~ have seen in Exercise 4.13 that an m x n matrix A over 9' determines
Note that the common characteristics we have found (rank, determinan ....flinar transformation T.. from :F" to 9''" in a natural way. Observe that
trace) for an equivalence class of similar matrices rail to provide comple ,'1['.. as defined here is just the subspace 1m A of 9''", as introduced in
information about the class. tion 3.2.

Exercise 1. Describe the image of the transformation T E 9'(R 3 (4) Theorem2. With the previous notation,
defined in Exercise 4.2.1 and determine its rank.
rank T = rank A
SOLUTION. Representing T as for any A EdT'
PROOF. Let A be the representation of Te 9'(9'10 9'2) with respect to the
Xl:Z
X + X3]
Xl
X:z
= Xl
[0] + [1] + [1]
1
1 X:z.
1
0
0
X3 0 '
bases tI = {YJl'=1 and C = {YilT"'1 for 9'i and 9'2. respectively. Since
rank T = dim(span{T(xj)j=I), it follows from Proposition 3.7.1 that
[ rank T = dim(span{Mi= I). where Pi (1 s j :s; n) is the representation of
XI-X:z 1 -1 0 the element T(xj) with respect to the basis ~. Recalling that A =
[PI P2 ... Pn] and using the result of Theorem 3.7.2, the proof is
itis not difficult to see that 1m T coincides with span{vt> V2' V3}, where
completed.
V3 = [0 1 1 l]T, V:z = [1 1 0 _1]T, 113 = [1 0 0 O]T. The vec-
tors Vi (i = 1, 2, 3) are linearly independent and, therefore, rank T = 3. Generalizing the definition of the kernel of a matrix, as introduced in
Section 3.2, we now define the kernel of any transformation T E ..2'(9'1> 9':z) by
Exercise 2. Show that. for any linear subspace 9' of 9';.
Ker T A {x E 9'1 : T(x) =O}.
dim T(9') s dim 9' (1)
Inthe notation of Exercise 4.1.3 we see that for any matrix A, Ker A =
and hence, that rank T :s; dim 9'1' Ker TA
Hint. Use Exercise 4.1.5. Compare this result with that of Exercise 3.5.6. Theorem 3. Let T E 9'(9'1' 9'2)' The set Ker T is a linear subspace in 9'1'
Exercise 3. Check that for any T1 T:z e 9'(9'1' 9':z). PROOF. For any xl> X2 E Ker T and any e, fJ ElF, the element lXXI + fJx:z
also belongs to the kernel of T:
Im(T1 + T:z) c: 1m TI + 1m T2
T(OIXI + fJx:z) = aT(x) + fJT(X2) = 010 + fJO = O.
Then prove the inequalities
This completes the proof.
Irank TI - rank T:z1 s rank(TI + T2 ) s rank TI + rank 'Ii.
The dimension of the subspace Ker T is referred to as the defect of the
Hint. Use Exercise 3.6.2; compare this with the assertion of Proposition transformation and is denoted by def T.
3.8.2.
Exercise 5. Describe the kernel of the transformation defined in Exercise
Exercise 4. Let TI E 9'(9'1.9'2) and T:z E 9'(9'2.9'3)' Show that 4.2.1 and determine its defect.
Im(T:z TI ) = T2(lm TI ) SoLUTION. To find the elements x E 9'1 such that T(x) = 0, we solve the
system
and that

[~: ~~:]
rank(T:z TI ) s min(rank TI rank T:z).
= 0,
Hint. Use Eq. (1). Compare this with the result of Exercise 3.8.5. 0
The notion of the rank of a transformation can be defined in a different XI - X:z
way. Recall (Section 4.3) that all matrices from the equivalence class .PIT which gives XI = X:z = X3 = 0 and Ker T = {OJ. Thus, def T = O.
have the same rank and so this can be viewed as a characteristic of the
transformation T itself. It is thus natural to consider this characteristic Exercise 6. Describe the kernel and determine the defect ofthe transforma-
as the rank of T. Note that both definitions are equivalent. tion considered in Exercise 4.1.6.


Answer. Ker T = span{[0 1 1 0]ᵀ, [1 1 0 −1]ᵀ}; def T = 2. □

Theorem 4. For any matrix representation A of a linear transformation T,

    dim(Ker T) = dim(Ker A).

PROOF. Let us use the notation introduced in the proof of Theorem 2. An element x belongs to Ker T if and only if its representation α = [α₁ α₂ ⋯ αₙ]ᵀ with respect to the basis {xⱼ}ⱼ₌₁ⁿ belongs to the kernel of A. Indeed, if x = ∑_{j=1}^{n} αⱼxⱼ, then x ∈ Ker T if and only if ∑_{j=1}^{n} αⱼT(xⱼ) = 0 or, what is equivalent,

    ∑_{j=1}^{n} αⱼ ∑_{i=1}^{m} aᵢⱼyᵢ = ∑_{i=1}^{m} (∑_{j=1}^{n} aᵢⱼαⱼ) yᵢ = 0.

The linear independence of the elements y₁, y₂, …, yₘ from the basis of 𝒮₂ implies that Aα = 0, where A = [aᵢⱼ]. It remains only to apply Proposition 3.7.1.

Using the notion of the defect for matrices, we say that def T = def A for any A that is a representation of T. It should be emphasized that, in general, the subspaces Im T and Im A, as well as Ker T and Ker A, lie in different spaces and are essentially different.

Theorems 2 and 4, combined with the relation (3.8.1), yield the following useful result:

Theorem 5. For any T ∈ L(𝒮₁, 𝒮₂),

    rank T + def T = dim 𝒮₁.     (2)

This result is in fact a particular case of the next theorem, which is inspired by the observation that Ker T consists of all elements of 𝒮₁ that are carried by T onto the zero element of 𝒮₂. Hence, it is reasonable to expect that Im T is spanned by images of the basis elements in a complement to Ker T.

Theorem 6. Let T ∈ L(𝒮₁, 𝒮₂) and let 𝒮₀ be a direct complement to Ker T in 𝒮₁. Then

    Im T = T(𝒮₀).     (3)

PROOF. Let {xᵢ}ᵢ₌₁ʳ be a basis for 𝒮₀. If ∑_{i=1}^{r} αᵢT(xᵢ) = 0, then the element ∑_{i=1}^{r} αᵢxᵢ is in (Ker T) ∩ 𝒮₀ and hence is the zero element of 𝒮₁. Now the linear independence of {xᵢ}ᵢ₌₁ʳ yields αᵢ = 0 (i = 1, 2, …, r) and the linear independence of T(x₁), …, T(xᵣ).

Furthermore, every element y ∈ Im T can be represented as a linear combination of T(x₁), …, T(xᵣ). To see this, we first write y = T(x) for some x ∈ 𝒮₁. Let {x̂ᵢ}ᵢ₌₁ᵏ be a basis for Ker T, where k = def T. If the basis representation of x is

    x = ∑_{i=1}^{k} βᵢx̂ᵢ + ∑_{i=1}^{r} γᵢxᵢ,

we obtain T(x) = ∑_{i=1}^{r} γᵢT(xᵢ), which completes the proof.

Exercise 7. Deduce Eq. (2) from Theorem 6. □

The assertion of Theorem 6 is seen from a different point of view in the next exercise.

Exercise 8. Let T ∈ L(𝒮₁, 𝒮₂) and let Im T = span{y₁, y₂, …, yᵣ}, where yᵢ = T(xᵢ) for i = 1, 2, …, r. (We say xᵢ is the preimage of yᵢ.) Confirm that 𝒮₀ = span{x₁, x₂, …, xᵣ} satisfies Eq. (3). □

Using the same technique as in the proof of Theorem 6, it is not difficult to prove the next exercise.

Exercise 9. If T ∈ L(𝒮₁, 𝒮₂) and 𝒮 is a linear subspace in 𝒮₁, show that

    dim T(𝒮) = dim 𝒮 − dim(𝒮 ∩ Ker T).     (4)

Hint. Extend a basis in 𝒮 ∩ Ker T to a basis in 𝒮. □

Note that Eq. (4) implies Eqs. (1) and (2) (for 𝒮 = 𝒮₁) and also the inequality

    dim T(𝒮) ≥ dim 𝒮 − def T.     (5)

The next exercises concern properties of sums and compositions of linear transformations.

Exercise 10. Let T₁, T₂ ∈ L(𝒮₁, 𝒮₂). Check that

    Ker T₁ ∩ Ker T₂ ⊂ Ker(T₁ + T₂).

Exercise 11. Let T₁ ∈ L(𝒮₁, 𝒮₂) and T₂ ∈ L(𝒮₂, 𝒮₃).
(a) Show that

    Ker T₁ ⊂ Ker(T₂T₁) = {x ∈ 𝒮₁ : T₁(x) ∈ Ker T₂};

(b) Show that T₁(Ker T₂T₁) ⊂ Ker T₂, with equality if Ker T₂ ⊂ Im T₁.
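The identity rank T + def T = dim 𝒮₁ of Theorem 5 is easy to test numerically. The following Python/NumPy sketch is an added illustration, not part of the original text; the particular 4 × 6 matrix is arbitrary. It extracts a basis of the kernel from the singular value decomposition and confirms the count.

    import numpy as np

    rng = np.random.default_rng(0)
    B = rng.integers(-3, 4, size=(4, 6)).astype(float)
    B[:, 4] = B[:, 0] + B[:, 1]           # make two columns dependent on the others,
    B[:, 5] = B[:, 2] - B[:, 3]           # so that Ker B is nontrivial

    U, s, Vt = np.linalg.svd(B)
    rank = int(np.sum(s > 1e-10))
    ker_basis = Vt[rank:].T               # columns span Ker B
    defect = ker_basis.shape[1]

    print(np.allclose(B @ ker_basis, 0))  # True: these vectors lie in Ker B
    print(rank + defect == B.shape[1])    # True: rank + defect = dim of the domain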

(c) Apply the inequality (5) with f/ = Ker(1211) and 1 = 1 1 to show Theorem 1. Let Te !i'([/I' 9'2) and let dim [/1 = n, dim 9'2 = m. The
that following statements are equivalent:
def(12 T.) s def T1 + def 12. (6) (a) T is left invertible;
(b) KerT={O};
Exercise 12. Check that for any Te .!l'(9'),
(c) T defines an isomorphism between 9'1 and 1m T;
Ker T c: Ker T 2 c: Ker T 3 c: ... , . (d) n S m and rank T = n;
1m t:::l 1m T 2 :::l 1m T 3 ::J (e) Every representation 0/ T is an m x n matrix with n S m and full
rank.
Exercise 13. Show that if
PROOF. If Eq. (1) holds and T(x) = 0, then x = 11 (x) = 11(T(x = 0;
(7) hence Ker T = {OJ and (a)=> (b). To deduce (e) from (b) it suffices, in view
of the linearity of T, to establish a one-to-one correspondence between
then Ker TJ = Ker TJ+ 1 for allj ~ k.
[/1 and Im T. Let y = T(Xl) = T(%2), where x,e [/1(i = 1,2). Then
SoLUTION. Suppose on the contrary that for some j > k, we have Ker TJ T(XI - X2) = 0 and the condition Ker T = {OJ yields XI = %2 and, con-
Ker TJ+ 1. In other words, TJ+ I(X) = 0 but TJ(x) 0 for some x E 9',.. sequently, the required uniqueness of x. To prove the implication (e) => (a),
=
Writing j = k + i; we obtain TH 1(T'(x 0 while Tt(T'(x O. This we observe that the existence of a one-to-one map (namely, T) from 9',.
contradicts the assumption that Ker r = Ker r+ I. 0 onto Im T permits the definition of the transformation 11 E .!l'(lm T, 91)
by the rule: If y = T(x) then T1(y) = x. Obviously T1(T(x = T1(y) = x
Since strict inclusion between subspaces, 9'0 c: 9'10 implies dim 90
for all x E [/1> or, what is equivalent, T1 T = 11' Thus, (e) =>(a). We have
< dim 9'., then, in a chain of subspaces of a finite-dimensional space [/
shown that (a), (b) and (e) are equivalent.
such that
Now we note that the equivalence (b) <::> (d) follows from Eq. (4.5.2), and
9'0 c: [/1 c: ." c: 9'" c: ... c: .9, Theorem 4.5.2 gives the equivalence (d) <::> (e). This completes the proof.
there exists a positive integer k ;s; dim 9' such that S'1. = S'1.+ 1 = ., '. Using Exercise 1. Check the equivalence of the following statements:
this observation and combining Exercises 12 and 13 will yield the con-
(a) Te .!l'([/I' 9'2) is left invertible.
clusion of the next exercise. (b) For any linear subspace 9' c: 9'1'
Exercise 14. Check that for any Te.!l'(9'), dim T([/) = dim .9,
Ker T c: Ker T 2 c: .. c: Ker T" = Ker TI<+ 1 = ... c: .9, (c) Any linearly independent system in 9'1 is carried by T into another
where k denotes a positive integer between 1 and n, n = dim.9, and the linearly independent system in 9'2' 0
inclusions are strict. 0 The notion of right invertibility of a transformation is defined similarly
by saying that T E .!l'([/l> 9'2) is right invertible if one can find a transforma-
tion T2 e .!l'(9'2' 9'1) such that
4.6 Invertible Transformations T12=~, ~
where 12 denotes the identity transformation in 9'2' The transformation
A transformation T e .!l'(9'I' [/2) is said to be left invertible ifthere exists 12 in Eq. (2) is referred to as a right inverse of T.
a transformation T1 e .!l'(9'2 , 9'1) such that Theorem 2. LetTe.!l'( 9'1' 9'2) and let dim 9'1 = n, dim 9'2 = m. The
T1 T = II., (1) following statementsare equivalent:
where II denotes the identity transformation in 9'1' If Eq. (1) holds, then (a) T is right invertible;
the transformation T1 is called a left inverse of T. (b) rank T = m;

(e) T defines an isomorphism between any direct complement to Ker T TIaeorem 3. Let T e 2(9'1 9'2) and let dim 9'1 = n, dim 9'2 = m. The
and 9'2; following statements.are equivalent:
(d) n ~ m and def T = n - m ; ,
(e) Everyrepresentation ofT is an m x n matrixwithn ~ m andfull rank.J (a) T is invertible;
(b) Ker T= {O}, rank T= m = n;
PROOF. (a) =>(b). Let Eq. (2) hold and let y E 9'2' Then y = T12(Y) ;; (c) T defines an isomorphism between 9'1 and 9'z;
T(T2 (y and the element .x = 12(Y) E 9i is a pre-image of y. Thus 9i = (d) Every representation of T is an n x n nonsinqular matrix.
1m T an~ part (b) follows from part (a). To prove the implication (b) => (c~ Using Theorem 4.5.5, we deduce three corollaries.
+
we consider a subspace 9'0 such that 9'1 = Ker T 9'0 and show tha
T provides a one-to-one correspondence between 9'0 and 9'2' Indeed, . Corollary 1. Let Te 2(9'1,9'2) be left invertible. Then T is invertible if
y = T(Xl) = T(.x2), where Xl e 9'0 (i = 1, 2), then Xl - .x2 = 90 n Ker and only if dim 9'1 = dim 9'2' In particular, a left invertible transformation
;; {O}. Hence Xl = X2 and the required isomorphism between 90 and acting on 9' is invertible and defines a one-to-one map of 9' onto itself.
follows from the linearity of T. Now observe that this isomorphism guarant
the existence of a transformation 12 e 2(9'2,9'0) defined by the rule: CoroUary 2. Let T E 2(9'1' 9'2) be right invertible. Then T is invertible
if and only if def T = O. In particular, a right invertible transformation acting
()n9' is invertible and produces a one-to-one mapof 9' onto itself.
If y = T(x) where X e 90, then 12(Y) = x.
.""Qo-oUary 3. Any representation ofaninvertible transformation isaninvertible
Clearly, T(T2(y = T(.x) = y for all y E 9'2 and the relation (2) hol
tMtt-ix.
Thus, the equivalence of (a), (b), and (c) is established. .
Note that the equivalences (b)-(d) and (d)-(e) are immediate cons ~Ltonsider an invertible transformation T e 2(9'10 9'z). By definition there
quences of Theorems 4.5.5 and 4.5.2, respectively. . ...~~ transformations Tl , Tz E 2(9'z, 9'1) such that simultaneously Eqs. (1)
and (2) hold. In this case, the equalities
The notions ofleft (or right) invertible matrices and their one-sided invers
are defined similarly to those for transformations. As pointed out in Sectio.
T1 = T1I2 = T1(TT2 ) = (T1T)Tz = I1Tz = T2
2.6, a one-sided inverse of a square matrix is necessarily an inverse matri wthat the one-sided inverses of T coincide. Hence there is a transforma-
that is, both a left and right inverse. T- l e !R(9'2' 9'1) satisfying the conditions

E.xercu,e 2. . Sho.wthat a transformation T E 2(9'10 9'2) is left (respectivel


T-1T = II and TT- l = 12 , (3)
right) invertible if and only if any representation A is a left (respectivel transformation T- 1 is called the inverse of T and is easily seen to be
right) invertible matrix. ue, It is clear, comparing Eq. (4.2.4) with Eq. (3), that if A is a represen-
n of an invertible transformation T e 2(9'h 9'2) with respect to the
Hint. Use the correspondence (4.2.4). 0 (8,~, then A -1 is the representation of T-l with respect to (f9, 4).
llrticular, if T, T- 1 E !R(S"), then their representations with respect to
Theorems 1 and 2 applied to matrices state, in particular, that a rectangu basis in 9' are correspondingly A and A-I.
m x n matrix of full rank is left invertible if m ~ n, and it is right inverti te that part (d) of Theorem 3 asserts that a transformation Tis invertible
if m s: n. The treatment of one-sided invertible matrices will be continued d only if det A#:O for every A edT' When T acts on a space~. then
the chapter on generalized inverses, while the invertibility properties xercise 4.4.1(b) all its representations have the same determinant. In
square matrices are discussed later in this section after some furt' case it is then reasonable to define the determinant ofthe transformation
preparations. . (9'), written det T, as the determinant of any of its representatiol,ls
Let Te 2(9'1,9'2)' The transformation T is said to be invertible if it ~. Hence we now conclude that a transformation Te 2(9') is invertible
both left and right invertible. We combine Theorems 1 and 2 and' .only if det T #: O. This result for matrices was established in Sectic:>n
corollary to Theorem 3.7.1 to derive the next theorem.
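Theorems 1 and 2 of this section say that a full-rank rectangular matrix has a one-sided inverse: a left inverse when m ≥ n and a right inverse when m ≤ n. The Python/NumPy sketch below is an added illustration (the matrices are arbitrary full-rank examples) that constructs one such inverse of each kind explicitly; when m ≠ n these one-sided inverses are not unique.

    import numpy as np

    # A tall matrix (m > n) of full column rank is left invertible (Theorem 1).
    A = np.array([[1., 0.],
                  [2., 1.],
                  [0., 3.]])
    L = np.linalg.inv(A.T @ A) @ A.T          # one particular left inverse
    print(np.allclose(L @ A, np.eye(2)))      # True

    # A wide matrix (m < n) of full row rank is right invertible (Theorem 2).
    B = A.T
    R = B.T @ np.linalg.inv(B @ B.T)          # one particular right inverse
    print(np.allclose(B @ R, np.eye(2)))      # True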

Exercise 3. Let 11: 91 ~ 91+ 1 (i == 1,2, ... , k) be invertible linear trans- Suppose now that 9't == .9'2' Thus we investigate transformations
formations. Show that T == 'lk 'lk-I ... T1 e 1(9'1' 9'H I) is also invertible T e 1(9'). The following concept, which is' important in its own right is
and that T-I == T;IT;I ... TAl. 0 helpful in this context. '
The result in Exercise 3 can be reversed in the following sense. Let Te !t'(9'). A subspace 9'0 c: 9' is called invariant under T (or T-
invariant), iffor any x E 9'Q, the image T(x) also belongs to 9'Q:
Exercise 4. Let 11 e 1(9'), i == 1, 2, ... , k. Check that all 11 are invertible
T(x) e.9'o for all x e .9'0' (2)
if and only if, for any permutation (j 10 h, ... , j,,) of (1, 2, ... , k), the trans-
formation 1JI1J2 ... 1J" is invertible. In other words, if 9'0 is T-invariant, then every element x e.9'o is trans-
formed by T into an element of the same subspace 9'0' Hence the restriction
Exercise S. Check that if D= 0 PiT i
== 0, Po :F 0, and T e 1(9'), then T Tlvo in this case is a transformation acting on 9'0 and therefore can beviewed
is invertible. 0 as a "part" of Tin 9'0'
Obviously, {O} and the wholespace are invariant under any transformation.
, These spaces are referred to as the trivial invariant subspaces and usually are
4.7 Restrictions. Invariant Subspaces. and excluded from further considerations.
Direct Sums of Transformations The next exercise shows that linear transformations that are not invertible
always have nontrivial invariant subspaces. '
Exercise 3. Let T e !t'(9'). Show that Ker T and 1m Tare T-invariant.
Let TE !t'(9't, 9'1) and let 9'0 be a linear subspace in 9't. The transforma-
Show also that any subspace containing 1m Tis T-invariant.
tion T : 9'0 -+ 92 defined by T(x) == T(x) for every x e 9'0 is referred to as the
restriction of Tto 9'0 and is denoted by TI.9'o' Obviously, TlvC],.E 1(9'0,9'2)' SOLUTION. For the first subspace, let x e Ker T. Then T(x) == 0 and, since
In contrast, a transformation Te !t'(9'I' 9'1) coinciding with '1' on 9'0 c: 9'1 oE Ker T, the condition (2) is valid. The proof of T-invariance of 1m T is
is called an extension of T. similar.
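For a concrete matrix, the invariance of Ker T and Im T asserted in Exercise 3 can be verified directly by treating the matrix as the transformation T_A. The Python/NumPy sketch below is an added illustration; the singular matrix A is chosen arbitrarily.

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [0., 0., 0.],
                  [2., 4., 6.]])           # a singular matrix of rank 1

    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > 1e-10))
    im_basis = U[:, :r]                    # columns span Im A

    # Im A is A-invariant: A maps the basis back into span(im_basis).
    augmented = np.hstack([im_basis, A @ im_basis])
    print(np.linalg.matrix_rank(augmented) == r)   # True

    # Ker A is A-invariant: every kernel vector is sent to 0, which is in Ker A.
    ker_basis = Vt[r:].T
    print(np.allclose(A @ ker_basis, 0))           # True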
Exercise 1. Let Te!t'(9'I' 9'1)' Check that for any linear subspace 9'0 c: 9'1' Exercise 4. Show that 9'0 is T-invariant if and only if T(xl) E 9'0 (i == 1.
2, ... , k), where {xt> X2"'" x.} is a basis for 90.
Im(Tlvo) == T(9'o), Ker(Tlvo) == 9'0 1"'\ Ker T: 0
Exercise 5. Check that if TI T2 = T2 T1, then Ker T2 and 1m T2 are T1-
The second relation of Exercise 1 shows that if 9'0 is any direct complement" invariant.
+
to Ker T in 9't (thus 9'0 Ker T == 9't), then T Ivo has the trivial sub-
Exercise 6. Show that sums and intersections of T-invariant subspaces are
space {O} as its kernel. In this case, we deduce from Theorem 4.5.6 that
1m T == T(9'Q). So the action of a linear transformation T is completely T-invariant. 0
determined by its action on a direct complement to Ker T, that is, by the The notion of an invariant subspace for a matrix A is obtained from that
restriction of T to such a subspace. for transformations by treating A as the transformation TA introduced in \
We now consider an arbitrary decomposition of the domain into a direct Exercise 4.1.3.
sum of subspaces (as defined in Section 3.6). Proposition 1. Let Te !t'(f/') and suppose that T has an invariant subspace
+
Exercise 2. Let T E !t'(9'h .9'2) and let 9't ee 9'0 !J. Write x == Xt + X1 in f/' of dimension k: Then there exists a matrix A E d~ such that one of
where Xt e 9'Q, X2 e .9, and show that, if Tt e 1(.90, 9'2) and T2 E 1(!J, .9'2}' its invariant subspaces is also ofdimension k.
the equality PROOF. Let the subspace 9'0 c: 9' be T-invariant of dimension k. Let
T(x) == Tt (Xl ) + T2(X2) 'obe a basis in 9'0 and let the basis I in .9' be the union of 1 0 and a basis
in a direct complement to 9'0 in f/'. If y == T(x), x E 9'0' then, because of the
holds for any x E 9i if and only if T1 == Tlvo and T2 == Tlv 0 invariance of 9'0 under T, the representation p of y with respect to I is of
A similar result can be easily formulated for a direct sum of sever the form P == [111 112 . . {J. 0 . . . O]T. The representation tl of x
subspaces as defined in Section 3.6. with respect to the same basis 8 is also of this form, and then Theorem 4.2.2
implies that the subspace of the representations of the elements of V with 4.8 Direct Sums and Matrices
o
respect to I is AI-invariant. Clearly. the dimension of this subspace is equal
tok.
_ Let Te .!l'(V) and let V o c: V be a T-invariant subspace. Denote by Let T E .!l'(V) and let d denote the equivalence class of similar matrices
+
9' a direct complement to V o in V: V =Vo !J. (For the existence of associated with T. In searching for the simplest matrices in this class, we
such a space ~ see Section 3.5.) Note that !J is not necessarily T-invariant. first study the representations of T in the case when the space V is decom-
posed into a direct sum of subspaces, at least one of which is T -invariant,
Example 7. Let Te .!l'(F2 ) be defined as follows:
+
Let V = VI V'J,. where VI is T-invariant. assuming for the moment
that such a decomposition is always possible. Now we construct a represen-
tation of T that is most "appropriate" for the given decomposition of the
space, namely. we build up the representation of T with respect to the basis
Obviously, V o A span{[1 O]T} = {roc O]T: oc e F} is T-invariant:
T([oc O]~ = roc O]T eVo' However. the subspace fj A span{[O I]T} {XI. X2 Xl' Xk+ h ... , XII} (1)
is a direct complement to V o, but it fails to be T-invariant: T([O I]T) = in .9, where k = dim 91 and the first k elements constitute a basis. in VI'
[l I]T !J. Observing that 9'0 is the only nontrivial T-invariant subspace Since VI is T -invariant, T("J) e VI (j = 1.2... k) and Eq. (4.2.1) gives
of fF2. we deduce that it is impossible to decompose fF2 into a direct sum of rc
T-invariant subspaces. 0 T(xj) = L ocljXj, j = 1.2, ... k. (2)
1=1
If the space V can be decomposed into a direct sum of k subspaces ~. all
of which are T-invariant. we may write. for every" e .9, Hence the matrix A = [oclj]7.J= I' where the oclj (1 S; i. j ~ n) are found
l
from the equation T(xj) = ~J.l oclJXt U = 1.2... , n), is of the form:
L "I/(xJ.
= [~1 ~:l
T(x) = (3)
1=1 A (3)
where x = D=I XI (XI e~, i = 1,2, ... , k) and T; = TI.Y'I for each i. (See
where Al eFrc"k and A2 e F (II- kl ,, (n- kl By definition. the matrix (3) is the
the remark immediately following Exercise 2.) Note that when k = 2 Eq. (3)
differs from Eq. (I) to the extent that we may now interpret T; as being in representation of T with respect to the basis (1). Thus, a decomposition of
the space into a direct sum of subspaces, one of which is T -invariant, admits
.!l'(.9;) for i = 1 and 2. Bearing Eq. (3) in mind. we give the following
definition. a representation of T in block-triangular form (3).
Let Te.!l'(9') and let V = L~=I' 9i. where each 9i is T-invariant. Then Exerdse1. Let Te.!l'(9'), where V = VI +
V 2 and VI isT-invariant.
the transformation T is referred to as the direct sum of the transformations Check that the representation of T with respect to a ba~is in which th.e basis
1j = TIS', (i = 1.2..... k) and is written
elements of VI are in the last places is a lower-block-triangular matnx, 0
l

T= L'1j
1=1
~ 11 + T2 +'" + 11. (4) +
If V = VI V 2 and both subspaces VI and V 2 are T:invariant. then a
simpler representation of T can be found. Namely, reasonmg as before and
Note that Eq. (4) now implies Eq. (3) for every X e .9,and vice versa. The constructing the representation of T with respect to the basis {x l' "2' ... , "Ii:'
significance of the representation (4) lies in the possibility of reducing the. "11:+1> , XII}' in which the first k elements constitute a basis in VI and
study of the transformation T to that of its "parts" 1j acting on smaller the others generate a basis in V 2, we obtain the matrix
subspaces. This is the basic strategy adopted in the next section for the
analysis of the class of matrix representations of T. A = [~1 ;J. (4)

Exercise 8. Let T e .!l'(V) and p(A) be a scalar polynomial. Show that ifa Where Al e,rcxlt and A 2 efF(n- ltlx(n- ltl. Hence, if Te.!l'(V) can be decom-
subspace is T-invariant. then it is p(T)-invariant. 0 posed into a direct sum of transformations T1 e .!l'(V1 ) and 12 E ~(V 2).
that is, T = 1i + 72, then there is a representation A of T in a block- Exercise 3. Show that if the nonsingular matrix A is simple, so are A-I and
diagonal form (4), where obviously Al and A 2 are representations of T and adj A.
12, respectively. Note that, conversely, if there exists a matrix A in .91~ of1 the
form (4), then there is a basis in 9' such that the Eq. (2) holds and

T(Xj) = L" aljXI j = k + 1, k + 2 ... , n.


,nExercise 4. Let A e !!J'n lCn and let there be a k-dime!,sional subspace of
that is A-invariant (1 ~ k < n). Show that the relation
1=A:+1
A = p[A I A 3 ] p-l (5)
In view of Exercise 4.7.4, the subspaces f/ 1 = span{xj }' =1 and f/ = o A2
+
span {Xj}j=k+l are T-invariant and, obviously, f/1 f/2 = f/. By analogy,
2
is valid for some A I e [FA: "A: and a nonsingular P e [Fn x n.
the matrix A in (4) is said to be a direct sum ofthe matrices Al and A and
2 Hint. Choose a basis bto b 2 bn for:F n by extending ~ basis b lo bA:
is written A = Al -I- A 2 A similar definition and notation, A = Al -I- for the A-invariant subspace. Then define P to be the matnx whose columns
A 2+... + A p or, briefly, A = Lf= I AI> is given for the direct sum of p are b to b 2 , bn
matrices. We summarize this discussion.
Theorem 1. A transformation T e !R(f/) has a representation A e.9l in
block-diagonalform, consisting ofp blocks, if and only if T can be decomposed
into a direct sum ofp linear transformations.
4.9 Eigenvalues and Eigenvectors of a Transformation
In symbols, if A e .91~, then A = Lf= l ' Ai if and only if T = Lf= l ' 1;.
In this case the matrix Ai is the representation of 11 (l ~ i ~ p).
Recall that Example 4.7.7 provides an example of a transformation T from Let T e !R(.9). We start the investigation of.one-dimensional T -invariant
+
!R(2) that cannot be written T = T1 '12. Hence, there are linear trans- subspaces with a description of their elements 10 terms of T.
formations having no block-diagonal representations (with more than one
block). Such transformations can be referred to as one-block transformations. Proposition I, Let T e!R(9') and let.9o c .9 be a T-invariant ~ubspace of
In contrast, a particularly important case arises when a transformation from dimension 1. Thenfor all nonzero x e 9'0 there is a scalar Ae!F (Independent
!R(.9) can be decomposed into a direct sum of transformations acting on of x), such that .
one-dimensional spaces. Such transformations are called simple. Note the
T(x) = Ax. (1)
following consequence from Theorem 1 regarding simple transformations.
Since.9o is one-dimensional, it is of the form
Theorem 2. Let.9l~ denote the equivalence class of matrix representations
PROOF.

of the transformation T e .!l'(.9). Any matrix A e .91 is similar to a diagonal 9'0 = {uo:O '" Xo E 9'0 and a e !F}.
matrix D if and only if T is simple.
The T-invariance of ~ means T(x) e 9Q for every nonzero x e 9Q an~,
By analogy, a matrix A e [F""" is simple if it is a representation of a simple therefore. representing x as oXo, we have T(<<oxo) = xo. Thus, the condi-
transformation T acting on an n-dimensional space. Using this notion, the tion (1) holds for A = ('1./0 ('1.0 '" 0). If Xl denotes another elemen~from
previous result can be partially restated. SPo. then Xl = YX o for some y E:F and T(x l) = yT(xo) = yAxo - Ax1,
Theorem 2'. Any matrix A e n" n is simple if and only if it is similar to a which shows the uniqueness of A in Eq. (1).
diagonal matrix D. Note that a one-dimensional T-invariant subspace exists provided Eq. (1)
The meaning of the diagonal elements in D, as well as the structure of the holds true for at least one nonzero element x.
matrix that transforms A into D, will be revealed later (Theorem 4.10.2), Exercise 1. Check that if for some nonzero x 0 e 9',
after the effect of a linear transformation on a one-dimensional space is
studied. T(xo) = AXo, (2)
Exercise 2. Check that any circulant is a simple matrix. then the subspace e", = span{.l'o} is T-invariant. 0
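Exercise 2 can be checked numerically: a circulant has n linearly independent eigenvectors (the discrete Fourier basis diagonalizes every circulant), so it is simple in the sense just defined. The Python/NumPy sketch below is an added illustration; the first row [1, 2, 3, 4] is arbitrary, and the helper circulant() is introduced here only for the example.

    import numpy as np

    def circulant(first_row):
        """Build the circulant matrix whose first row is given."""
        first_row = np.asarray(first_row)
        return np.array([np.roll(first_row, k) for k in range(len(first_row))])

    C = circulant([1., 2., 3., 4.])
    eigenvalues, P = np.linalg.eig(C)

    # C is simple: the eigenvector matrix P is nonsingular and P^{-1} C P
    # is the diagonal matrix of the eigenvalues (compare Theorem 2').
    D = np.linalg.inv(P) @ C @ P
    print(np.allclose(D, np.diag(eigenvalues)))   # True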

Thus, the nonzero elements x satisfying Eq. (2) play a crucial role in the problem of existence and construction of one-dimensional T-invariant subspaces. We need a formal definition. If T ∈ L(𝒮), a nonzero element x₀ ∈ 𝒮 satisfying Eq. (2) is said to be an eigenvector of T corresponding to the eigenvalue λ of T. Thus, the existence of T-invariant subspaces of dimension 1 is equivalent to the existence of eigenvectors of T.

If we now observe that Eq. (2) is equivalent to

    (T − λI)x₀ = 0,    x₀ ≠ 0,

we can characterize eigenvalues in another way.

Exercise 2. Show that λ ∈ ℱ is an eigenvalue of T if and only if the transformation T − λI fails to be invertible. □

The next proposition uses the concept of an algebraically closed field, which is introduced in Appendix 1. For our purposes, it is important to note that ℂ is algebraically closed and ℝ is not.

Proposition 2. Let 𝒮 be a finite-dimensional linear space over an algebraically closed field. Then any linear transformation T ∈ L(𝒮) has at least one eigenvalue.

PROOF. By virtue of Exercise 4.1.12, there is a monic polynomial p(λ) with coefficients from ℱ such that p(T) = O. Since ℱ is an algebraically closed field, there exists a decomposition of p(λ) into a product of linear factors:

    p(λ) = ∏_{i=1}^{l} (λ − λᵢ),

where λᵢ ∈ ℱ (i = 1, 2, …, l). Then p(T) = ∏_{i=1}^{l} (T − λᵢI) is the zero transformation, hence at least one of the factors T − λᵢI (1 ≤ i ≤ l) must be noninvertible (see Exercise 4.6.4). Now the required assertion follows from Exercise 2.

Obviously, if x is an eigenvector of T corresponding to λ, then it follows from T(αx) = λ(αx) for any α ∈ ℱ that any nonzero element from the one-dimensional space spanned by x is an eigenvector of T corresponding to the same eigenvalue λ. This also follows from the results of Proposition 1 and Exercise 2. Moreover, every nonzero linear combination of (linearly dependent or independent) eigenvectors corresponding to the same eigenvalue λ is an eigenvector associated with λ. Indeed, if T(xᵢ) = λxᵢ (xᵢ ≠ 0, i = 1, 2, …, k), then the linearity of T implies

    T(∑_{i=1}^{k} αᵢxᵢ) = ∑_{i=1}^{k} αᵢT(xᵢ) = λ(∑_{i=1}^{k} αᵢxᵢ),

and if ∑_{i=1}^{k} αᵢxᵢ ≠ 0, it is an eigenvector of T corresponding to λ.

Consider the linear hull 𝒮₀ of all eigenvectors of T corresponding to λ. In view of the last observation, every nonzero element of this subspace is an eigenvector of T associated with λ and hence it makes sense to call 𝒮₀ the eigenspace of T corresponding to the eigenvalue λ of the transformation.

Exercise 3. Let λ denote an eigenvalue of T ∈ L(𝒮). Show that the eigenspace associated with λ coincides with the subspace Ker(T − λI).

Exercise 4. Find eigenvectors, eigenspaces, and eigenvalues of the transformation T ∈ L(ℝ³) defined as follows:

    T([x₁ x₂ x₃]ᵀ) = [2x₁, x₁, x₂ + x₃]ᵀ.

SOLUTION. Solving the system T(x) = λx, which becomes in this case

    2x₁ = λx₁,    x₁ = λx₂,    x₂ + x₃ = λx₃,

we obtain

    λ = 2 and x₁ = 2x₂ = 2x₃,
    λ = 0 and x₁ = 0, x₂ = −x₃,
    λ = 1 and x₁ = 0, x₂ = 0.

Thus, the transformation has three one-dimensional eigenspaces spanned by the elements [2 1 1]ᵀ, [0 1 −1]ᵀ, [0 0 1]ᵀ and corresponding to the eigenvalues 2, 0, and 1, respectively. □

It is important to note that, in contrast to Exercise 4, a transformation acting on an n-dimensional space may have less than n distinct eigenvalues.

Exercise 5. Check that the transformation T ∈ L(ℱ²) defined in Example 4.7.7 has only one eigenvalue, λ = 1, associated with the eigenspace spanned by [1 0]ᵀ. □

Furthermore, it may happen that the transformation has two or more linearly independent eigenvectors corresponding to the same eigenvalue, as the next exercise shows.

Exercise 6. Check that the transformation T ∈ L(ℝ³) defined by

    T([x₁ x₂ x₃]ᵀ) = [2x₁ + x₃, 2x₂, 3x₃]ᵀ

has eigenvalues λ₁ = λ₂ = 2 and λ₃ = 3 associated with the eigenspaces span{[1 0 0]ᵀ, [0 1 0]ᵀ} and span{[1 0 1]ᵀ}, respectively. □

Thus, two eigenvectors of T corresponding to the same eigenvalue may . CoroDary 2. Let T E !f(g) and dim f/ = n. If T has n distinct eigenvalues,
be linearly dependent or independent. However, the situation is clear then there is a basis/or g consisting ofeigenvectors of T.
when the eigenvalues are distinct.
In this event we also say that the eigenvectors of T span the space f/ or that
Theorem1. Eigenvectors corresponding to distinct eigenvalues are linearly
independent. T has an eigenbasis in f/. Note that Exercise 6 shows that a transformation
with no < n distinct eigenvalues can still have an eigenbasis, On the other
PROOF. Let AI' A2 , , A. be the distinct eigenvalues of the transformation hand, not every transformation (see Exercise 5) has enough eigenvectors
to span the space. It turns out that the existence of an eigenbasis of T in 9' is
T e ftJ(f/) and let XI' X2 , , x. denote corresponding eigenvectors.
equivalent to saying that T is a simple transformation.
Suppose that there exist numbers IX.. IX2, , IX. such that
Theorem 2. A transformation T e ftJ(9') is simple if and only if it has an
L IX/Xi = O.
1= I
(3)
eigenbasis in f/.
To prove that IXI = 0 (i = 1, 2, ... , s), we first multiply both sides of Eq. (3) PROOF. If {Xj}j= I is an eigenbasis in 9; then, putting .5Il = span{x/} for
on the left by T - All, noting that (T - A1I)xI = 0 and T(xl) = AiX/> 1 s r s n, we obtain the decomposition T = L7=
I' 1j, where 1j acts on f/j
i = 1, 2, ... , s. Then and dim 9'j = 1(1 SiS n). Hence T is simple. Conversely, let f/ = Li=I' 9'/>
where subspaces f/, of dimension one are T-invariant, i = 1,2, ... , n.
L (AI - A,)tXjxj = O. (4) By Proposition 1, any nonzero element of f/I , say XI' is an eigenvector of 1;
1=2
1 SiS n. Clearly, the set of all such Xj (i = 1,2, ... , n) gives an eigenbasis
After premultiplication by T - A21, Eq. (4) becomes ofT in s:

L
/=3
(AI - A1)(A2 - Aj)tXjx/ ee O. In view of Theorem 2, Corollary 2 provides a sufficient (but not necessary)
condition for T to be simple (see Exercise 5).
After s - 1 such operations we finally have Exercise 7. Check that the transformation T e ftJ(9') is invertible if and
only if all eigenvalues of T are nonzero.
(AI - A.)(A 2 - A.).. (A'_I - A..)IX.X. = O.
Exercise 8. Prove that each T-invariant subspace f/o contains at least one
which implies IX. = O. But the ordering of the eigenvalues and eigenvectors eigenvector of T (the field is algebraically closed).
is arbitrary, so we can also prove 1%1 = IX2 = ... = 1%.-1 = O. Hence the
elements Xl> X2' , x. are linearly independent. Hint. Consider Tlvo and Proposition 2.

The spectrum of a transformation T e .5f(f/), written a(T), is defined as Exercise 9. Let T be a simple transformation from .5f(9'). Show that
the set of all distinct eigenvalues of T. Note that Proposition 2 can be re- (a) 1m T is spanned by the eigenvectors of T corresponding to the non-
stated in this terminology as follows: Any linear transformation acting on a zero eigenvalues,
finite-dimensional space over an algebraically closed field has a nonempty (b) 1m T n Ker T = {O}.
spectrum. Now Theorem 1 implies a limit to the size of the spectrum.
Exercise 10. Let TeftJ(f/) and a(T) = {AI' A2 , , A.}. Show that if 9'j
Corollary 1. The spectrum ofa linear transformation acting on an n-dimen- denotes the eigenspace of T associated with Aj , then
sional space consists ofat most n distinct eigenvalues.
(a) The sum D=If/)s direct;
(b) The transformation T is simple if and only if
PROOF. If T e ftJ(9') and T has k distinct eigenvalues, k > n = dim 9', then
the n-dimensional space f/ has more than n linearly independent elements
(eigenvectors of 1; in this case). This is a contradiction.
L .9j=~ 0
j= I


4.10 Eigenvalues and Eigenvectors of a Matrix If E in is a representation of x with respect to the same basis in 9' in which
A is a representation of T, then Theorem 4.2.2 and the isomorphism between
elements and their representations show that the relation (3) is equivalent
Let A E IF")( ", The matrix A can be viewed as a linear transformation to All = ,\(I, Note that e :f: 0 since x :f: 0 and, therefore, A E u(A).
acting in IF" (see Exercise 4.1.3). Hence we define a nonzero vector a E fF" to Observe that we have also found a relation between the eigenvectors of a
be an eigenvector of A corresponding to the eigenvalue A if matrix and the eigenvectors of the transformation which the matrix
All = ,w, II :f: O. (1) represents.

The set of all distinct eigenvalues of A is called the spectrum of A and is Proposition 3. An element x E f/J is an eigenvector of T E .!l'(9') associated
denoted by u(A). Observe that the spectrum is invariant under similarity with AE u(T) if and only if the representation e'" of x with respect to a
transformations, basis in f/J is an eigenvector of the matrix representation of T with respect to
thesame basis.
Proposition 1. Similar matrices havethe same spectrum.
By varying the bases in f/J, we obtain a set of vectors {P-':det P :f: O}
PROOF. Let Eq, (1) hold and suppose A = PBP-I. Then that are eigenvectors of the corresponding representations of T (see Propo-
sition 2). Note also that Theorem 1 and Proposition 3 provide an easy trans-
B(P-III) = P-' AIX = ..t(p- leI) ition from the eigenvalues and eigenvectors of a transformation to those of
and since e :f: 0, the vector P'" leI is an eigenvector of B corresponding to the matrices. In particular, they admit an analogous reformulation for matrices
eigenvalue .t Hence Ae a(B) and u(A) c:: u(B). The reverse inclusion is of the results concerning eigenvalues and eigenvectors of transformations.
shown similarly. For instance, recalling that the representations of linearly independent
elements also are linearly independent (see Exercise 3.7.2), we obtain an
Note that, in fact, we proved more. analog of Theorem 4.9.1.

Proposition 2. If is an eigenvector of A associated with the eigenvalue Proposition 4. Eigenvectors of a matrix corresponding to distinct eigen-
)., then ' = p-I is an eigenvector of the matrix B = p-, AP corresponding values are linearly independent.
to the sameeigenvalue .t We now use this together with Proposition 4.9.2 and Theorem 4.9.2.

Thus, all matrices from the equivalence class d~ of matrix representations Proposition 5. The spectrum u(A) of a matrix A E C "x" is not empty and
for a transformation T E .!l'(f/J) have the same eigenvalues associated with consists ofat most n distinct eigenvalues. If A has n distinct eigenvalues, then
eigenvectors ofthe form P-'x, where P is a transforming matrix. Hence, in it is simple.
particular, the spectrum of the matrices from JII~ is uniquely determined by We can now complement the statement of Theorem 4.8.2' by indicating
the transformation T; Consequently, a strong connection between the spec- the structure of the diagonal matrix D and the corresponding transforming
trum of T and the representations of T is to be expected. matrix P.
Theorem 1. Let T e .!l'(f/J) and let JII denote the corresponding equivalence Theorem 2. Let A E fF")( n bea simple matrixandlet D be the diagonal matrix
classofmatrix represenuaions for T. Then OCcurring in Theorem 4.8.2':
a(T) = a(A) (2) A = PDp-I, det P :f: O. (4)
for any A E JII~. Then D = diag[Aj]j~, with AjE a(A),j = 1,2, ... , n, and
PROOF. In view of Proposition 1; it suffices to show that Eq, (2) holds for P = [x, X1 x,,],
at least one matrix A from JII~. Let Ae u(T) and inwhich Xj (1 S j S n) is an eigenvector ofA corresponding to the eigenvalue
AJ(1SjSn).
T(x) = Ax, x :f: O.
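Theorem 2 of this section can be illustrated numerically: for a simple matrix, the columns of P in A = PDP⁻¹ are eigenvectors and the diagonal of D carries the corresponding eigenvalues. The Python/NumPy sketch below is an added illustration; the 3 × 3 matrix is the one used again in the differential-equation example of Section 4.13 and has the three distinct eigenvalues 2, 3, −1.

    import numpy as np

    A = np.array([[ 2.,  0.,  0.],
                  [11.,  4.,  5.],
                  [ 7., -1., -2.]])

    eigenvalues, P = np.linalg.eig(A)        # columns of P are eigenvectors
    D = np.diag(eigenvalues)

    print(np.round(np.sort(eigenvalues), 10))         # -1, 2, 3
    print(np.allclose(A, P @ D @ np.linalg.inv(P)))   # True: A = P D P^{-1}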


PROOF. Rewriting Eq. (4) in the form AP = PD and comparing the columns Bxercise 1. Consider a matrix A E IFn"n with eigenvectors X10 X2 . . Xn
of the matrices AP and PD, it is easily found that corresponding to the eigenvalues A10 A2 , , An; respectively. Show that any
subspace of fjin generated by a set of eigenvectors
AXj = AjXj' j = 1,2, ... , n, (5)
Xit' xh"'" xjk(1 :S;h <i, < ... <t. n)
where Xj (1 S; j S; n) is the jth column of P and Aj (1 S; j S; n) is the jth
diagonal element of D. Since P is nonsingular, none of its columns can be of A is A-invariant.
the zero element in IFn and hence xj in (5) is an eigenvector of A correspond- Bxercise 2. Show that the spectrum of the square matrix
ing to Aj

Theorem 2 can be reformulated in an illuminating way. Define the matrix A = [~I ~:J.
Q = (p-I)T and write Q = [y Y2 ... Yn]. Then QTp = p-Ip = I and,
comparing elements, we obtain where Ai and A 2 are square, is the union ofthe spectra of Al and A 2

yJx/< = t5 j l< Bxereise 3. Check that the matrix A e fjin" n is singular if and only if it
has a zero eigenvalue.
for I ~j, k ~ n. Note that in the sense of Section 3.13, the s~stems {xJlj=1
and {jij}j= I are biorthogonal with respect to the standard inner product
Bxercise 4. Let Ae a(A),A be nonsingular, and Ax = AX, X :;:: O. Prove that
A-I is an eigenvalue of A - I associated with eigenvector x. 0
in en.
Take transposes in Eq. (4) and we have AT = QDQ-l, so that

ATQ = QD.
4.11 The Characteristic Polynomial
This implies ATYj = AjY) for j = 1,2, ... , n. Thus, the columns of(~-I)T are
eigenvectors for AT. They are also known as left eigenvectors of A since they
satisfy yJA = A)yJ. We now bring the ideas of the theory of determinants to bear on the prob-
Now, to recast Theorem 2, observe that Eq. (4) implies, using Eq. (1.3.6), lem of finding the eigenvalues of an n x n matrix A e !Jli'n"n. As in the
situation for transformations (see Exercise 4.9.3). it is easily shown that
A = PDQT = [XI X2 ... x n]
AIYT
:T
j Aea(A) i f and only if the matrix AI - A is not invertible, or equivalently
[ (Theorem 2.6.2),
AnYn
n det().J - A) = O. (1)
L A)XjYJ.
j=1 Observe now that the expression on the left side of Eq, (1) is a monic poly-
nomial of degree n in A; .
Thus, A is expressed as a sum of matrices of rank one (or zero if Aj = ~).
and each of these is defined in terms of the spectral properties of A, that IS, det().J - A) = Co + CIA. + ... + ('n_1An-1 + cnA", Cn = 1. (2)
in terms of its eigenvalues, eigenvectors, and left eigenvectors. Consequently.
The polynomial c(A) A L;=o cIA; in Eq. (2) is referred to as the character-
the next result is called a spectral theorem. istic polynomial of the matrix A. These ideas are combined to give Theorem 1.
Theorem 3. Let A e IFn"n be a simple matrix with eigenvalues AI,"" A. Theorem 1. A scalar Ae!Jli' is an eigenvalue of A e !Jli'n" n if and only if Ais a
and associated eigenvectors XIt Xz, , X n Then there are left eigenvectors zero of the characteristic polynomial of A.
y₁, y₂, …, yₙ for which yⱼᵀxₖ = δⱼₖ, 1 ≤ j, k ≤ n, and

    A = ∑_{j=1}^{n} λⱼ xⱼ yⱼᵀ.     (7)

The significance of this result is that the evaluation of eigenvalues of a matrix is reduced to finding the zeros of a scalar polynomial and is separated from the problem of finding (simultaneously) its eigenvectors.
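The rank-one resolution A = ∑ⱼ λⱼxⱼyⱼᵀ of Theorem 3 can be verified directly: with the (right) eigenvectors as the columns of P, the rows of P⁻¹ supply left eigenvectors yⱼᵀ biorthogonal to the xⱼ. The Python/NumPy sketch below is an added illustration using an arbitrary simple 2 × 2 matrix.

    import numpy as np

    A = np.array([[1., 2.],
                  [3., 2.]])

    eigenvalues, P = np.linalg.eig(A)
    Q_T = np.linalg.inv(P)            # row j of Q_T is a left eigenvector y_j^T

    # Biorthogonality y_j^T x_k = delta_jk and the spectral resolution of A.
    print(np.allclose(Q_T @ P, np.eye(2)))                     # True
    resolution = sum(lam * np.outer(P[:, j], Q_T[j, :])
                     for j, lam in enumerate(eigenvalues))
    print(np.allclose(A, resolution))                          # True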
Exercise 1. Check that the eigenvalues of the matrix Theorem 2. Let A E fF""" and let c(A) = L~ = 0 CiA i denote the characteristic
polynomial of A. Then the scalar cr(O :$ r < n) is equal to the sum ofall prin-

A- H_~ J cipal minors oforder n - r of A multiplied by (-I)"-r.


PROOF. Let at, a 2..
in !F", we write
, a" be the columns of A. Then, using the unit vectors

are Al = 2, A2 = 3, A3 = -1. (-l)"c(A) = det(A - AI) = det[at - lei a 2 - le2 . .. a" - le,,].
Exercise 2. Show that the diagonal elements of a triangular matrix are Since the determinant is a homogeneous linear function of its columns, we
its eigenvalues. 0 can express the latter determinant as the sum of 2" determinants of matrices
Note that ifthe field fF is algebraically closed (for definition and discussion having columns of the form ai or -Aei' Let us select those determinants in-
see Appendix 1), then all the zeros of the scalar polynomial C(A) ~elong to volving just r columns of the form - Aej for some i. There are precisely C" r of
IF. Hence the spectrum of any A E fF" "" is nonempty and consists .of at these obtained by replacing r of the columns of A by (-A) x (a unit vector)
most n points; properties that are already familiar. They are estabbsh~d in every possible way. Thus, from the lemma, each of these determinants is
in Theorem 4.10.5 and, as indicated in Theorem 4.10.1, are also properties ofthe form ( - A)' times a principal minor of A of order n - r. Furthermore,
of the underlying linear transformation. every principal minor of order n - r appears in this summation. The result
follows on putting r = 0, 1,2, ... , n - 1.
Observe also that, if A = PBP-I, then
c...(A) = det(,u - A) = det(P(U - B)P- I) = det(U - B) = CB(A).
In particular, by setting r =n- 1 and r = 0 in Theorem 2, we obtain
"
-C n - 1 = L: ajj = tr A
Thus the characteristic polynomial is invariant under similarity ~n~ hence i=\
and Co = (-l)"det A,
all matrices in the equivalence class JlI~ have the same characteristic poly-
respectively. The last result can be obtained directly by putting A = 0 in the
nomial. In other words, this polynomial is a characteristic of the transform.a-
definition of the characteristic polynomial. A fundamental result ofthe theory
tion T itself and therefore can be referred to as the characteristic polynomUlI
of scalar polynomials tells us that (-l),,-rcr is also the sum of all products
ofthe transformation. . . .
of the n zeros of C(A) taken n - r ala time (0 ~ r < n). Thus, if ill' A2' .. , A"
Proceeding to an investigation of the coefficients of th~ charact~nstlc
are eigenvalues of A (not necessarily distinct) then Theorem 2 gives the im-
polynomial of a matrix A or, equivalently, of a transformation T havmg A portant special cases:
as a representation, we first formulate a lemma.
tr A = L" Ar and det A = Il" A r (3)
Lemma 1. If A E !F""" and distinct columns it, i 2 , , ir (l ~. r -:- n) of A r=1 r=1
are the unit vectors ei., ei2'... , eir , then det A is equal to the principal minor We have seen that any n x n matrix has a monic polynomial for its charac-
of A obtained by striking out rows and columns ii' i 2 , , i r teristic polynomial. It turns out that, conversely, if C(A) is a monic polynomial
over !F of degree n, then there exists a matrix A E !Fnl(" such that its charac-
PROOF. Expanding det A by column it, we obtain teristic polynomial is C(A).
1 2 i1 - 1 i1 +1 Exercise 3. Check that the n x n companion matrix
detA=(-l)'+I'A ( 12 i - l i
l 1+1
:) o 1 0 0
1 2 i1 - 1 i 1 + 1 ... n) o 0 1
=A 1 2
( i1 - 1 i 1 + 1 .n o
Now evaluate this minor by expansion on the column numbered i 2 It is
o o o 1
found that det A is the principal minor of A obtained by striking out ro~s -co -C\ -C2 -C"-1

and columns i 1 and i2 Continuing in this way with columns ;3' ;4"'" Ir has c(l) = A" + D;J CiA! for its characteristic polynomial. (See Exercise
the result is obtained. 2.2.9.) 0
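Exercise 3 can be explored numerically: the companion matrix of a monic polynomial has that polynomial as its characteristic polynomial, so its eigenvalues are exactly the polynomial's zeros, and the trace and determinant identities of Eq. (3) follow. The Python/NumPy sketch below is an added illustration for the arbitrary polynomial λ³ − 2λ² − 5λ + 6 = (λ − 1)(λ + 2)(λ − 3).

    import numpy as np

    # c(lambda) = lambda^3 + c2*lambda^2 + c1*lambda + c0 with c2, c1, c0 = -2, -5, 6.
    c0, c1, c2 = 6., -5., -2.
    C = np.array([[ 0.,   1.,   0.],
                  [ 0.,   0.,   1.],
                  [-c0,  -c1,  -c2]])

    eig = np.linalg.eigvals(C)
    print(np.round(np.sort(eig), 10))       # -2, 1, 3: the zeros of c(lambda)
    print(np.round(np.poly(C), 10))         # [1, -2, -5, 6]: coefficients of det(lambda*I - C)

    # Eq. (3): trace = sum of eigenvalues, determinant = product of eigenvalues.
    print(np.isclose(np.trace(C), np.sum(eig)))          # True
    print(np.isclose(np.linalg.det(C), np.prod(eig)))    # True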

The last result implies, in particular, that for any set 0'0 of k points in C Exercise 10. Show that the eigenvalues ofthe real symmetric matrix
(1 S k Sn), there is an n x n matrix A such that O'(A) = 0'0'
It was mentioned earlier that similar matrices have the same characteristic 13 4 -2]
polynomial. The converse is not true. 4 13-2
[
-2 -2 10
Exercise 4. Check that the matrices
are 9, 9, and 18.
and Exercise 11. Show that the following matrices are simple and find corre-
spondingmatrices D and P as described in Theorem 4.10.2:
are not similar and that they have the same characteristic polynomial,
c(A) = (A - 1)2. .
[~ -~ ~], 01 00 -11] ,
Exercise 5. Prove that two simple matrices are similar if and only if their
(i)
112
(ii)
[o 1 1 (iii)

characteristic polynomials coincide.


Hint. Show first that the characteristic polynomials are
Exercise 6. Show that
(a) The characteristic polynomial of the matrix
(1- 1)(A - 2)(A - 3), 0. - 1)(12 + 1), and (A - 1)(1 + 3)2,
respectively. 0
[~I ~:]
is equal to the product of the characteristic polynomials of Al and A 2
(b) If .9'0 is T-invariant, then the characteristic polynomial of TI.v.. 4.12 The Multiplicities of an Eigenvalue
divides the characteristic polynomial of T. 0
An important relation between the eigenvalues ofthe matrices A and p(A),
where pel) is a scalar polynomial, is considered next. Let A e :ff" l(" be a representation of the transformation T E !t'(f/), and
let iF be algebraically closed. Then the characteristic polynomial c(A) of A
Theorem 3. Let 10 be an eigenvalue of A e:ff" M" and let pel) be any scalar (or of T) can be factorized into a product of n linear factors:
polynomial. Then P(lo) is an eigenvalue ofp(A).
PROOF.
~~
The reader can easily verify, using the commutativity of powers of
.
c(A) = n"
1= 1
(A - AI)' (1)

peA) - p(lo)l = (A - 101)q(A) where AI e I1(A), i = 1, 2, ... , n. It may happen that some of these eigen-
values are equal, and we define the algebraic multiplicity of the eigenvalue
for some polynomial q(l). Thus det(A -101) = 0 yields det(p(A)- p(lo)I) = AI of A (or of T) to be the number of times the factor (A - AI) appears in
and the required result follows from Theorem 1. Eq.(1).In other words, the algebraic multiplicity of AI E I1(A) is its multiplicity
Exercise 7. Prove that the eigenvalues of an idempotent matrix are aI as a zero of the characteristic polynomial.
equal to zero or one. Using Theorems 4.5.6 anc14.9.2, show that an idem We have seen that Ao e I1(A) if and only if Ker(A - 101) #: {O}, and the
potent is a simple matrix, g~ometric multiplicity of the eigenvalue ,(0 of A (or of T) is defined as the
dunension of the subspace Ker(A - Aol). Note that by referring to the sub-
Exercise B. Prove that the eigenvalues of nilpotent matrices are all equ . sPace Ker(A - AI) as the eigenspace of the matrix A associated with the
to zero and that nonzero nilpotent matrices cannot be simple. eigenvalue ..l, we can restate for matrices all the results concerning eigen-
Exercise 9. Verify that the eigenvalues of A and AT coincide, while th s~aces for transformations. We also point out that scalars that fail to be
eigenvalues of A and A are complex conjugates. eIgenvalues of A can be viewed as having zero (algebraic and geometric)


multiplicities. However, the algebraic and geometric multiplicities are not


4.13 First Applications to Differential Equations
generally equal.
Example 1. Consider the matrix Consider a system of ordinary differential equations with constant co-
efficientsof the following form:
A = [~ ~l Xl = aUxl + Q12 X2 + + QlnXn,
1
The characteristic polynomial C(A) of A is c(A) = ..1. and therefore the x1 = a1l xl + a12 x1 + + aznxn,
algebraic multiplicity of the zero eigenvalue Ao = 0 is 2. On the other hand,
dim(Ker(A - ..1.0 1) = dim(Ker A) = 1 and the geometric multiplicity of
Ao=Oisl. 0 Xn = anlx l + an2 x Z + ... + annxn,
Theorem 1. The geometric multiplicityof an eigenvalue does not exceed its where Xl> , X n are functions of the independent variable t, the symbol
algebraic multiplicity. ~J denotes the derivative of Xj with respect to t; and the coefficients aIJ are
Independent of t. We can obviously abbreviate this set of equations to one
PROOF. Let A. be an eigenvalue of the n x n matrix A and let matrix equation:
m" = dim(Ker(A - ..tl). i(t) = Ax(t), (1)
If r denotes the rank of A - M then, by Eq. (4.5.2), we have r = n - m" and, w~ere x(t) = [Xl X z ." XJT and A = [alj]i',j=a' Note that we may
all the minors of A - M of order greater than n - m" are zero. Hence wnte x or i instead of x(t) or i(t) if their dependence on t is clear from the
if c(p) = L1=0 Clpi is the characteristic polynomial of A - M, Theorem context.
4.11.2 implies that Co = Cl = ... = Cm..-l = O. Therefore . If A has at least one eigenvalue, we can easily demonstrate the existence
of a solution of i = Ax. We look for a solution of the form x(t) = xoe AoI,
. where Xo and AO are independent of t. Then x = ..lox and Eq, (I) can be
i=mA. written (A - Ao l)xo = O. Thus, if we choose Xo to be a (right) eigenvector
and must have a zero of multiplicity greater than or equal to mAo It remains of A with eigenvalue ..1.0 , then x(t) is indeed a solution. Furthermore, if
to observe that, if c(p) is the characteristic polynomial of A, then ~ e C""n, such a solution always exists because every A then has at least one
eigenvalue with an associated (right) eigenvector.
c(p.) = c(p. - A).
Proposition 1. The vector-valued function x(t) = xoe Ao' is a solution of Eq.
Denote by La(respectively, L)the sum of the algebraic (respecti~eIY,
(I) if and onl)' if ..1.0 is an eigenvalue of A associated with an eigenvector x o'
geometric) multiplicities of the eigenvalues of an n x n complex matrix A.
Then Theorem 1 implies Hence the problem of determining the eigenvalues and eigenvectors of a
matrix is closely related to finding the solutions of the corresponding system'
L.s~=n. of differential equations. Can we characterize the solution set f/' of all solu-
In the caseL. = n, the relations (2) imply that the algebraic and geome . tions of Eq. (1)1 It is easily verified that f/' is a linear space and it will be
multiplicities of each eigenvalue of A must coincide. On the other hand, shown later that the dimension of this space is n (see Section 7.10). Here, the
we use Proposition 4.10.4, the equality L= n also means that the sp .' result will be proved for simple matrices only. The first exercise shows how
fFn can be spanned by nonzero vectors from the eigenspaces pf th.e ~atn ~subspace of f/' can always be generated.
that is, by their eigenvectors. Hence if r.. =
n, then th~ matrix A IS ~Impl
It is easily seen that the converse also holds and a criterion for a matnx to
I1xereise 1. Let A E en"" and let f/' denote the solution space of Eq. (1).
~f AI' A2' . , A, are the distinct eigenvalues of A associated with the eigen-
simple is obtained. Vectors XI> X 2, , x" respectively, show that the set
1beorem 1. Any matrix A with entries from an algebraically closed fie 9'Q = span{xle A1' , xzeA. ', , x,e A, ' }
is simple if and only if the algebraic and geometric multiplicities ofeacheig a linear subspace in f/' of dimension s.
valueofA coincide.

Hint. Observe that the linear independence of x₁, x₂, …, xₛ implies that of x₁e^{λ₁t}, x₂e^{λ₂t}, …, xₛe^{λₛt} and use Proposition 1. □

For a simple matrix A we can say rather more than the conclusion of Exercise 1, for in this case we have a basis for ℂⁿ consisting of eigenvectors of A. In fact, if x₁, …, xₙ is such a basis with associated eigenvalues λ₁, …, λₙ, respectively, it is easily verified that the n vector-valued functions xⱼe^{λⱼt} are solutions, j = 1, 2, …, n, and they generate an n-dimensional subspace of solutions, 𝒮₀. We are to show that 𝒮₀ is, in fact, all of 𝒮.

More generally, let us first consider the inhomogeneous system

    ẋ(t) = Ax(t) + f(t)     (2)

(where f(t) is a given integrable function). It will be shown that Eq. (2) can be reduced to a set of n independent scalar equations. To see this, let eigenvectors x₁, …, xₙ and left eigenvectors y₁, …, yₙ be defined for A as required by Theorem 4.10.3. If P = [x₁ ⋯ xₙ] and Q = [y₁ ⋯ yₙ], then the biorthogonality properties of the eigenvectors (see Eq. (4.10.6)) are summarized in the relation

    QᵀP = I,     (3)

and the representation for A itself (see Eq. (4.10.7)) is

    A = PDQᵀ,     (4)

where D = diag[λ₁, …, λₙ].

The reduction of the system (2) to a set of scalar equations is now accomplished if we substitute Eq. (4) in Eq. (2), multiply on the left by Qᵀ, and take advantage of Eq. (3) to obtain

    Qᵀẋ(t) = DQᵀx(t) + Qᵀf(t).

Introducing new variables zⱼ(t) = yⱼᵀx(t) for j = 1, 2, …, n and transformed prescribed functions φⱼ(t) = yⱼᵀf(t), the equations become ż(t) = Dz(t) + φ(t), where

    z(t) = [z₁(t) z₂(t) ⋯ zₙ(t)]ᵀ,    φ(t) = [φ₁(t) φ₂(t) ⋯ φₙ(t)]ᵀ.

Or, equivalently, we have n "uncoupled" scalar equations

    żⱼ(t) = λⱼzⱼ(t) + φⱼ(t),     (5)

for j = 1, 2, …, n. These can be solved by elementary methods and the solution of the original system (2) is recovered by observing that z(t) = Qᵀx(t) and, using Eq. (3), x(t) = Pz(t).

For the homogeneous Eq. (1) (when f(t) = 0 for all t in Eq. (2)), we simply have φⱼ(t) = 0 in Eq. (5) and the solution of (5) is zⱼ(t) = cⱼe^{λⱼt} for some constant cⱼ. Thus, in this case, z(t) = diag[e^{λ₁t}, …, e^{λₙt}]c and so every solution of Eq. (1) has the form

    x(t) = P diag[e^{λ₁t}, …, e^{λₙt}]c     (6)

for some c ∈ ℂⁿ, c = [c₁ c₂ ⋯ cₙ]ᵀ.

Observe that Eq. (6) just means that any solution of Eq. (1) is a linear combination of the primitive solutions xⱼe^{λⱼt}, j = 1, 2, …, n. Since these primitive solutions are linearly independent (see Exercise 1), it follows that if 𝒮 is the solution set of Eq. (1) and A is a simple n × n matrix, then dim 𝒮 = n.

In Chapter 9 we will return to the discussion of Eq. (2) and remove our hypothesis that A be simple; also we will consider the case when f(t) ≠ 0 more closely.

Exercise 2. Describe the solutions of the following system of differential equations with constant coefficients (ẋⱼ = ẋⱼ(t), j = 1, 2, 3):

    ẋ₁ = 2x₁,
    ẋ₂ = 11x₁ + 4x₂ + 5x₃,     (7)
    ẋ₃ = 7x₁ − x₂ − 2x₃.

SOLUTION. Rewrite the system in matrix form:

    ẋ = Ax,     (8)

where

    A = [ 2   0   0]
        [11   4   5]
        [ 7  −1  −2],

and recall (see Exercise 4.11.1) that A has eigenvalues 2, 3, and −1 corresponding to the eigenvectors x₁ = [3 −79 25]ᵀ, x₂ = [0 5 −1]ᵀ, and x₃ = [0 1 −1]ᵀ. Thus, by Eq. (6), any solution x = [x₁ x₂ x₃]ᵀ of Eq. (8) is of the form

    x = c₁x₁e^{2t} + c₂x₂e^{3t} + c₃x₃e^{−t} = [x₁ x₂ x₃] diag[e^{2t}, e^{3t}, e^{−t}] [c₁ c₂ c₃]ᵀ

or, what is equivalent, comparing the components of the vectors,

    x₁ = 3c₁e^{2t},
    x₂ = −79c₁e^{2t} + 5c₂e^{3t} + c₃e^{−t},
    x₃ = 25c₁e^{2t} − c₂e^{3t} − c₃e^{−t},

for some c₁, c₂, c₃ ∈ ℂ. □

4.14 Miscellaneous Exercises

1. Let 𝒮 = 𝒮1 ∔ 𝒮2. Define a transformation T on 𝒮 by

T(x) = x1,

where x = x1 + x2 and x1 ∈ 𝒮1, x2 ∈ 𝒮2. Prove that
(a) T ∈ ℒ(𝒮);
(b) 𝒮1 and 𝒮2 are T-invariant;
(c) T satisfies the condition T² = T (i.e., T is idempotent).
The transformation T is called a projector of 𝒮 on 𝒮1 parallel to 𝒮2. Observe that T|_𝒮1 is the identity transformation on 𝒮1.

2. Consider a simple linear transformation T ∈ ℒ(𝒮), where dim 𝒮 = n, λ1, λ2, ..., λn are its eigenvalues, and x1, x2, ..., xn are corresponding eigenvectors which span 𝒮. Define 𝒮i = span(xi) for i = 1, 2, ..., n. Show that T = λ1I1 ∔ λ2I2 ∔ ⋯ ∔ λnIn, where Ii is the identity transformation on 𝒮i (1 ≤ i ≤ n), and that 𝒮 = 𝒮1 ∔ 𝒮2 ∔ ⋯ ∔ 𝒮n; this is the spectral theorem for simple transformations.

3. Consider the spectral decomposition presented as Theorem 4.10.3. Let Gj = xj yj^T for j = 1, 2, ..., n, so that A = Σ_{j=1}^n λj Gj.
(a) Show that each Gj has rank one and that G1, ..., Gn are mutually orthogonal in the sense that Gk Gj = δ_{kj} Gk for 1 ≤ j, k ≤ n.
(b) If p(λ) is any scalar polynomial, show that p(A) is simple and p(A) = Σ_{j=1}^n p(λj) Gj.
The matrix Gj, 1 ≤ j ≤ n, is referred to as the constituent matrix of A associated with λj.

4. The matrix Rλ = (λI - A)^{-1}, called the resolvent of A, is defined at the points λ ∉ σ(A). Verify the following statements:
(a) Rλ - Rμ = (μ - λ) Rλ Rμ;
(b) If A is simple and G1, G2, ..., Gn are defined as in Exercise 3, then

Rλ = Σ_{i=1}^n (1/(λ - λi)) Gi,    λ ∉ σ(A).

Hint. First use Eqs. (4.13.3) and (4.13.4) to write Rλ = P(λI - D)^{-1}Q^T.

5. Prove that if dim 𝒮 = n, any transformation T ∈ ℒ(𝒮) has an invariant subspace of dimension n - 1.
Hint. Let λ ∈ σ(T) and use Eq. (4.5.2) for T - λI to confirm the existence of the k-dimensional subspace Im(T - λI) invariant under T, where k ≤ n - 1. Then observe that any subspace in 𝒮 containing Im(T - λI) is T-invariant.

6. Prove that for any T ∈ ℒ(𝒮) there is a set of T-invariant subspaces 𝒮i (i = 1, 2, ..., n = dim 𝒮) such that

𝒮1 ⊂ 𝒮2 ⊂ ⋯ ⊂ 𝒮_{n-1} ⊂ 𝒮n = 𝒮

and dim 𝒮i = i for i = 1, 2, ..., n.
Hint. Use an inductive argument based on the previous result.

7. Interpret the result of Exercise 6 for matrices. Show that the representation of T with respect to the basis ε = {x1, x2, ..., xn}, where xi ∈ 𝒮i (1 ≤ i ≤ n) (see Exercise 6), is an upper-triangular matrix, while with respect to ε′ = {xn, x_{n-1}, ..., x1} it is lower-triangular (Schur's theorem).

8. Prove that if A is a simple matrix with characteristic polynomial c(λ), then c(A) = 0. (This is a special case of the Cayley-Hamilton theorem, to be proved in full generality in Sections 6.2 and 7.2.)
Hint. Use Theorem 4.10.2.

9. Show that the inverse of a nonsingular simple matrix A ∈ ℂ^{n×n} is given by the formula

A^{-1} = -(1/c0)(A^{n-1} + c_{n-1}A^{n-2} + ⋯ + c1 I),

where c(λ) = λ^n + Σ_{i=0}^{n-1} ci λ^i is the characteristic polynomial of A. (A numerical check of this formula is sketched after Exercise 10.)
Hint. Observe that A^{-1}c(A) = 0.

10. Check the following statements:
(a) If A or B is nonsingular, then AB ~ BA, that is, AB and BA are similar. Consider an example of two singular matrices to confirm that, in the singular case, AB and BA may not be similar.
(b) For any A, B ∈ ℱ^{n×n}, the matrices AB and BA have the same characteristic polynomial and hence the same eigenvalues.
Hint. For part (b) use the following polynomial identity in λ:

det(λI - (A - μI)B) = det(λI - B(A - μI)),

valid for μ ∈ ℂ such that A - μI is nonsingular.
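A quick numerical check of Exercises 9 and 10(b) (Python with NumPy; the matrices, and the use of numpy.poly for the characteristic coefficients, are illustrative assumptions and not part of the text):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4
    A = rng.standard_normal((n, n))   # generic: nonsingular and simple with probability one
    B = rng.standard_normal((n, n))

    # Characteristic polynomial c(lambda) = lambda^n + c_{n-1} lambda^{n-1} + ... + c_1 lambda + c_0.
    coeffs = np.poly(A)               # [1, c_{n-1}, ..., c_1, c_0], highest power first
    c = coeffs[::-1]                  # c[k] is now the coefficient of lambda^k, with c[n] = 1

    # Exercise 9:  A^{-1} = -(1/c_0)(A^{n-1} + c_{n-1} A^{n-2} + ... + c_1 I).
    S = np.zeros_like(A)
    for k in range(1, n + 1):
        S += c[k] * np.linalg.matrix_power(A, k - 1)
    assert np.allclose(-S / c[0], np.linalg.inv(A))

    # Exercise 10(b): AB and BA have the same characteristic polynomial.
    assert np.allclose(np.poly(A @ B), np.poly(B @ A))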
11. Prove that if AB = BA, then A and B have at least one common eigenvector. Furthermore, if A and B are simple, then there exists a basis in the space consisting of their common eigenvectors.
Hint. Consider span{x, Bx, ..., B^s x, ...}, where Ax = λx, x ≠ 0, and use Exercise 4.9.8 to establish the first statement.

12. Let A ∈ ℂ^{n×n}. Show that, in comparison with Exercise 4.5.12,
(a) if the zero eigenvalue of A has the same algebraic and geometric multiplicities, then Ker A = Ker A²;
(b) if A is simple, then Im A² = Im A.

13. If A is an idempotent matrix of rank r, show that tr A = r.
Hint. Use Exercise 4.11.7, Theorem 4.10.3, and Eq. (4.11.3).

14. Let A ∈ ℂ^{n×n} and let A = [a a ⋯ a], where a ∈ ℂ^n. Show that at least n - 1 eigenvalues of A are zero.
Hint. Use Theorem 4.11.2.

15. (a) Let An be the n × n matrix

An = [ 2c   1                  ]
     [  1   2c   1             ]
     [       1   2c   ⋱        ]
     [            ⋱    ⋱    1  ]
     [                 1    2c ],

where c = cos θ. If Dn = det An, show that

Dn = sin(n + 1)θ / sin θ.

(b) Find the eigenvalues of An (see the numerical sketch following Exercise 17).
Note that Dn(c) is the Chebyshev polynomial of the second kind, Un(c). The matrix An with c = -1 arises in Rayleigh's finite-dimensional approximation to a vibrating string.
Hint. For part (b), observe that An - λI preserves the form of An (for small λ), to obtain the eigenvalues of An:

λk = 2(c - cos(kπ/(n + 1))),    k = 1, 2, ..., n.

16. Find the eigenvalues and eigenvectors of the n × n circulant matrix with the first row [a0 a1 ⋯ a_{n-1}] (see also the sketch following Exercise 17).
Hint. Use Exercise 2.11.14 to determine the eigenvalues λk = a(sk) and the eigenvectors [1 sk sk² ⋯ sk^{n-1}]^T, where

a(λ) = Σ_{i=0}^{n-1} ai λ^i

and

sk = cos(2πk/n) + i sin(2πk/n),    k = 0, 1, ..., n - 1.

17. Let V ∈ ℒ(𝒮1, 𝒮2), X ∈ ℒ(𝒮2, 𝒮1) and satisfy XV = I, the identity on 𝒮1. Show that 𝒮2 = Im V ∔ Ker X.
Hint. To show that 𝒮2 = Im V + Ker X, let x ∈ 𝒮2 and write x = v + w, where v = VXx. Then show that the sum is direct.
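The closed-form eigenvalues asserted in Exercises 15(b) and 16 are easy to confirm numerically. In the sketch below (Python with NumPy; the size, the angle θ, and the circulant coefficients are arbitrary illustrations) the formulas λk = 2(c - cos(kπ/(n+1))) and λk = a(sk) are compared with computed spectra:

    import numpy as np

    n = 6

    # Exercise 15(b): tridiagonal A_n with 2c on the diagonal and 1 on the off-diagonals.
    theta = 0.9
    c = np.cos(theta)
    A = 2 * c * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    k = np.arange(1, n + 1)
    lam_closed = 2 * (c - np.cos(k * np.pi / (n + 1)))
    assert np.allclose(np.sort(np.linalg.eigvalsh(A)), np.sort(lam_closed))

    # Exercise 16: circulant matrix with first row [a_0, a_1, ..., a_{n-1}].
    a = np.array([1.0, 2.0, 0.5, -1.0, 3.0, 0.0])
    C = np.array([[a[(j - i) % n] for j in range(n)] for i in range(n)])
    s = np.exp(2j * np.pi * np.arange(n) / n)                    # s_k = cos(2*pi*k/n) + i sin(2*pi*k/n)
    lam_circ = np.array([np.polyval(a[::-1], sk) for sk in s])   # a(s_k)
    ev = np.linalg.eigvals(C)
    assert all(np.isclose(ev, lam).any() for lam in lam_circ)

    # The vector [1, s_k, s_k^2, ..., s_k^{n-1}] is an eigenvector for a(s_k).
    v1 = s[1] ** np.arange(n)
    assert np.allclose(C @ v1, lam_circ[1] * v1)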
CHAPTER 5

Linear Transformations in Unitary Spaces and Simple Matrices

The results discussed in Chapter 4 are obviously applicable to linear transformations mapping one unitary space into another. Some additional properties of transformations resulting from the existence of the inner product are considered in this chapter. In particular, it will be shown that any linear transformation acting on a unitary space has a dual transformation, known as its "adjoint." Normal matrices and transformations are defined and studied in terms of adjoints prior to the analysis of the important subclasses of Hermitian matrices (and self-adjoint transformations) and unitary matrices and transformations. All of these special classes are subsets of the simple matrices and transformations introduced in Chapter 4. Other important ideas associated with simple matrices are developed, for example, projectors and the reduction of quadratic forms. The chapter concludes with an important application (Section 5.12) and some preparatory ideas for subsequent applications in systems theory (Section 5.13). Note that throughout this chapter the underlying field ℱ is assumed to be ℱ = ℂ or ℱ = ℝ.

5.1 Adjoint Transformations

Let 𝒰, 𝒱 be unitary spaces with inner products ( , )_1 and ( , )_2, respectively, and let T ∈ ℒ(𝒰, 𝒱). Observe first that any vector u ∈ 𝒱 is uniquely determined if (u, v)_2 is known for all v ∈ 𝒱. Indeed, as indicated in Exercise 3.12.8, it is sufficient to know these inner products for all vectors y in a basis for 𝒱. It follows that the action of T is determined by the values of all the numbers (Tx, y)_2, where x, y are any vectors from 𝒰 and 𝒱, respectively. We ask whether this same set of numbers is also attained by first forming the image of y under some T1 ∈ ℒ(𝒱, 𝒰), and then taking the ( , )_1 inner product with x. In other words, is there a T1 ∈ ℒ(𝒱, 𝒰) such that

(T(x), y)_2 = (x, T1(y))_1    (1)

for all x ∈ 𝒰 and y ∈ 𝒱? The answer is yes, and the "partner" for T, written T*, is known as the adjoint of T.

To define the adjoint we consider an orthonormal basis u1, ..., un for 𝒰. Then define a map T*: 𝒱 → 𝒰 in terms of T by

T*(y) = Σ_{j=1}^n (y, T(uj))_2 uj    (2)

for any y ∈ 𝒱. We check first that T* ∈ ℒ(𝒱, 𝒰). If y, z ∈ 𝒱 and α, β ∈ ℱ, then, using the linearity of the inner product (Section 3.11),

T*(αy + βz) = α Σ_{j=1}^n (y, T(uj))_2 uj + β Σ_{j=1}^n (z, T(uj))_2 uj = αT*(y) + βT*(z),

by definition (2) of T*. Thus T* ∈ ℒ(𝒱, 𝒰). Our first theorem will show that T*, so defined, does indeed have the property (1) that we seek. It will also show that our definition of T* is, in fact, independent of the choice of orthonormal basis for 𝒰.

First we should note the important special case of this discussion in which 𝒰 = 𝒱 and the inner products also coincide. In this case, the definition (2) reads

T*(x) = Σ_{j=1}^n (x, T(uj)) uj,    (3)

and T*, as well as T, belongs to ℒ(𝒰).

Theorem 1. Let T ∈ ℒ(𝒰, 𝒱), where 𝒰 is a unitary space with inner product ( , )_1 and 𝒱 is a unitary space with inner product ( , )_2. The adjoint transformation T* ∈ ℒ(𝒱, 𝒰) satisfies the condition

(T(x), y)_2 = (x, T*(y))_1    (4)

for every x ∈ 𝒰, y ∈ 𝒱 and is the unique member of ℒ(𝒱, 𝒰) for which Eq. (4) holds. Conversely, if Eq. (4) holds, then T* is the adjoint of T.

PROOF. It has been shown that T* e 9'("Y, 1ft). To prove the equality in (d) (Tl + 12)* = TT + T!;
Eq. (4) we use Exercise 3.12.3 to write x = Li.l (x, "1)1"1' Then, using the (e) (T:z T1) '" = TrT~ for every T E !f(OI1, "IY), T
l 2E !f(nJY, 1").
linearity of T,
SOLU~ION. We shall prove only the last equality, since all of them are
(T(x),1)2 = (t
1-1
=
(x, "1)1 T(u,),
:z 1= 1
1) t
(x, "IMT(ul)' y};t. (5)
establishedanalogously. Successive use ofEq. (4) gives
(x, (72 TI)"'(1 = (T2 T1(x), y) = (Tl(x), Tt(y = (x, TTTt(y
On the other hand, by Eq, (2) and the orthonormality of the basis,
for all x e 1ft, y E r:. Hence (see Exercise 3.13.2), (T2 TJl*(y) = T!Tr(1) for

(x, T*(yl = (tl (X,III)IIlI' Jl (Y, T(IIJ)hIlJ)1


allYE 1" and therefore part (e) holds.

Exercise J. Check that T e 9'(Ift) is invertible if and only if T* e !R(Ift) is


n
= L (x,IIIMy, T(u.)h. (6) invertibleand that in this case,
1= 1
(T"')-l = (T-l)'l<. (7)
Since (Y, T(uI)h = (T(u.), yh, Eq, (4) follows on comparing Bq. (6) with
Eq. (5). SoLUTION: Let T be invertible. If T*(x) = 0, then by Eq. (3),
For the uniqueness, let (T(x), Y)2 = (x, Tl(Yl for some T 1 e 9'("Y, 1ft) n
and all x e 1ft, y e r:. Then, on comparison with Eq. (4), L (x, T(u/ul = 0
1=1
(x, T1(Y1 = (x, T*(Yl
and consequently (x, T(II I = 0, i = 1,2, ... , n. The invertibility of T
for all x e 1ft and y e r:. But this clearly means that (x, (T1 - T*)(Y1 = 0 means that since all the elements T(uI) are images of the basis elements in
for all x e 1ft and 1 e "Y, and this implies (by Exercise 3.13.2) that Tl - T* = ft, they generate another basis in 1ft. Hence by Exercise 3.13.2; we must have
0, that is, Tl = T*. x =: 0, and by Theorem 4.6.3, the transformation T* is invertible. The
The proof of the converse statement uses similar ideas and is left as an converse follows by applying part (b) of Exercise 2. To establish Eq. (7), write
exercise. II
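Definition (2) and Theorem 1 lend themselves to a direct numerical check. In the sketch below (Python with NumPy; here 𝒰 = 𝒱 = ℂ^n with the standard inner product (x, y) = y*x, and the matrix of T is an arbitrary illustration) the adjoint is assembled from formula (2) for a non-standard orthonormal basis and is seen to satisfy Eq. (4) and to coincide with the conjugate transpose:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 4
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))   # matrix of T on C^n

    def inner(x, y):
        return np.vdot(y, x)            # standard inner product (x, y) = y* x

    # An orthonormal basis {u_1, ..., u_n}: the columns of a unitary matrix Q.
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

    def T_star(y):
        # Formula (2)/(3): T*(y) = sum_j (y, T(u_j)) u_j
        return sum(inner(y, A @ Q[:, j]) * Q[:, j] for j in range(n))

    # The matrix of T* (its action on the standard basis) is the conjugate transpose of A.
    A_star = np.column_stack([T_star(e) for e in np.eye(n)])
    assert np.allclose(A_star, A.conj().T)

    # Eq. (4): (T(x), y) = (x, T*(y)).
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    assert np.isclose(inner(A @ x, y), inner(x, T_star(y)))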
(x, (T-l)*(y = (T-l(x),y)
It is interesting to compare Eq. (4) with the result of Exercise 3.14.8. The
similarity between these statements will be explained in our next theorem. and use the invertibility of T* to find an element z e 1ft such that T*(z) = y.
Hence, continuing, we have by Exercise 2(b)
Exerci,e 1. Let T e 9'('3, ,4) be defined as in Exercise 4.2.1.Check that
1(x),
the transformation (T- y) = (T-l(x), T"'(z = (TT-l(X), z) = (x, z) = (x, (T*)-l(y
Y2 + Y3 + Y4] forany x, y e 1ft. Thus, reasoning as in the solution of Exercise 2 we deduce
Tl(y) = Yl + ~: - Y4 , the required relation (7). 0 '
[

for all y = [Yl Y2 Y3 Y4]T e /lF4, is the adjoint of T with respect to the Proceeding to the study of matrix representations for a transformation
standard inner products in /lF3 and /lF4. Confirm Eq, (4) for the given trans- T e !l'(Ift, 1"') and its adjoint, we discover the reason for the use of the" star"
formations. 0 notation for both the adjoint of a linear transformation and the conjugate
Some important properties of the adjoint are indicated next. transpose of a matrix.

Exercise 2. Prove the following statements: Theorem 2. Let T e 9'(Ift, 1"') and let T* denote the adjoint ofT. IfA e/IF m x n
(a) 1* = I, 0* = 0; is t,,! representation ofT with respect to the orthonormal bases (8, ~), then the
(b) (T*)* = T; coflJugate transpose matrix A * e IFn x m is the representation ofT* with respect
(c) (aT)'" = tiT* for any IXE /IF; to the pair ofbases (~, 8).
172 5 UNBAR. 'TRANSFORMATIONS IN UNITARY SPACIllI 5.1 ADJOINT 'TRANSFORMATIONS 173

PROOF. Let 8 = {U .. UZ, '''II} and f = {Ill> V2, , vm} denote ortho- Exercise 9. Prove that if T e !(O/I), the" set 8 = {u1, "2"'" ulI } is an
normal bases in'fl and 1'; respectively. Then the representation A = [alj]~j~ 1 orthogonal basis in 'Pt, and A is the representation of T with respect to
is found from the equations 8, then the representation A' of T* with respect to the same basis is

T(Uj)
m
=L aliVi, j = 1,2, ... , n,
A' = D- 1AD, (8)
i=l where D = diag{(uh UlHi_ r- 0
where ali = (T(NJ), IIJl' and i = 1,2, ... , m,j = 1,2, ... , n. Similarly, for In particular, the relation (8) shows that A' = A* provided the basis"
the element bJ, (1 S; i S; m, 1 S; j S; n) of the representation of T* with is orthonormal. However, it turns out that the assertion of Theorem 2 re-
respect to the bases f and 8, we have mains true if 0/1 = -r and the two sets of basis vectors are biorthogonal
in 'Pt (see Section 3.13).
II
T*(vj) = L bjj"j' i = 1,2, ... , m, Theorem 3. Let T e !(tfI) and let (8, 8') denote a pairofbiorthogonal bases
J= 1 in0/1. If A is the representation ofT withrespect to 8, then A is the representa-
where bJ, = (T*(Vi)' Uj)l' for 1 S; i S; m and 1 S; j S; n. Hence Theorem 1 tion of T" with respect to G'.
implies
~ooF. Let 8 = {UhU2""'UII},'" = {"'1>Ui, .. ,u~},andA = [aij]7.J=I'
bji = (llit T(uj)h = (T(uJ)' v;)z = al) lS;iS;m,lS;js;n, Since T(uJ) = L7= 1 aijul (1 S; j S; n) and (Uk'U;) = ~ki (l S; i, k S; n), it
follows that (T(u j),,,1> = aij' Computing the entries bij of the representation
and the assertion follows. B of T* with respect to 8' in a similar way, we obtain b'J = (T(uj), 141)
(1 S; i,j S; n). Hence for all possible i andj,
Exereise 4. Check the assertion of Theorem 2 for the transformations T
and T1 = T* considered in Exercise 1. hi) = (uj, T(uI = (--T""'(u-I)""'-u~--.) = a)i>
Exercise 5. Deduce from Theorem 2 properties of the conjugate transpose and therefore B = A *.
similar to those of the adjoint presented in Exercises 2 and 3. Exercise 10. Check that if T e !(tft) then ..1. e aCT) if and only if Xe aCT*).
Exercise 6. Check that for any Te !(O/I, -n, Hint. See Exercise 4.11.9.

rank T = rank T*, Exercise 11. Prove that the transformation T e !('Pt) is simple if and only
if T* is simple.
and that, if 0/1 = 1";then in addition
Hint. Use Theorem 3.
defT = defT.
Exercise 12. If T E !(0/1) is simple, show that there exist eigenbases of.
Exercise 7. Let Te !(O/I, -n and let T* denote the adjoint of T. Show that T and T in tfI that are biorthogonal.

0/1 = Ker T E9 Im T*, -r = Ker T$ Im T. Hint. Consider span{xhx2, ... ,Xj_hXj+lo""XII}, 1 s i s n; where
{xi}7= 1 is an eigenbasis of T in tfI, and use Exercise 8.
(See the remark at the end of Section 5.8.)
Exercise 13. If T e .5f(tfI) is simple and {Xlo Xl' ... , XII} is its eigenbasis in
Hint. Note that (x, T(y)h = 0 for aUy e'fl yields xeKer T*. ~, prove that a system {Ylo Yz, .. 'YII} in "1/ that is biorthogonal to {XiH=1
ISan eigenbasis of T* in "1/. Moreover, show that T(x,) = ..1.j x l yields T*CY,) "=
Exercise 8. Let T e !(tft). Check that if a subspace 0/10 c: 0/1 is T -invariant, AIY" 1 S; i S; n.
then the orthogonal complement to 'flo in 'fI is T*-invariant. 0
Hint: Use Eq. (4).
The next example indicates that the relation between the representations
of T and T with respect to a pair of nonorthonormal bases is more com- Exercise 14. Check that the eigenspaces of T and T* associated with non-
plicated than the one pointed out in-Theorem 2. complex-conjugate eigenvalues are orthogonal. 0

We conclude this section with an important result concerning the general In general, if the matrix B is similar to A and there exists a unitary trans-
linear equation T(x) = b, where T E !R(dIt, 1") and bE "Y are given andsolu- forming matrix U so that
tions x in dIt are to be found. This is a general result giving a condition for B = UAU = UAU- 1 ,
existence of solutions for any choice of bE "Y in terms of the adjoint operator
T*. It is known as the Fredholm alternative. (Compare with Exercise 3.l4.10.) then we say that B is unitarily similar to A, and observe that B and A are
simultaneously similar and congruent. t It is easily seen that unitary similarity
Theorem 4. Let T E !R(dIJ, "Y). Either the equation T(x) = b is solvable for is an equivalence relation. In other words, .91~ coincides with the equivalence
any b e "Y or the equation T"'(y) = 0 has nonzero solutions. class of matrix representations of Te 9'(<ft) with respect to orthonormal
PROOF. Since T'" is a linear transformation it is enough to show that bases in tft, and all matrices in this class are unitarily similar. Note that the
T(x) = b solvable for any b e "Y is equivalent to T*(y) = 0 implies y = O. siniplest(canonical) matrix in this class is a diagonal matrix ofthe eigenvalues
of T. We now summarize the discussion of this paragraph in a formal state-
Now the solvability condition clearly implies that Im T = 1': But, by
Exercise 7, ment.
r = Im TE9Kcr T*, Theorem I. An n x n matrix A is normal if and only if it is unitarily similar
to a diagonal matrix ofits eigenvalues;
consequently Ker T* = {O}, i.e., T*(y) = 0 implies y = O.
Clearly, the argument can be reversed to establish the converse statement.

5.2 Normal Transformations and Matrices


.' A = UDU*,
where D = diag[iliJ7", l> ille u(A), i = 1,2, ... , n, and U is unitary.
Note (see Theorem 4.10.2) that the columns of U viewed as vectors from
fF" give the (orthonormal) eigenvectors of A and that D in Eq. (1) is a canon-
icalform for matrices in .91~ with respect to unitary similarity.
(1)

It was shown in Section 4.9 that a simple transformation T acting on a Recall that in IR" x" the unitary matrices are known as orthogonal matrices
linear space f/ has an eigenbasis in f/. If the space is unitary, then the pos- (ref. Exercise 2.6.16) and satisfy the condition UTU = I. Naturally, real
sibility arises of an orthonormal eigenbasis of T, that is, a basis consisting ot matrices A and B from /R"Xft are said to be orthogonally similar when B =
mutually orthogonal normalized eigenvectors of the transformation: UAUT for some orthogonal matrix U E IR"Xft.
Obviously (see,Theorem 4.9.2) transformations T with this property are, Exercise 2. Let A denote a normal matrix. Prove that the matrix A P is
necessarily simple. Thus, a transformation T e !R(dIt), where 1ft is unitary,' normal for any positive integer p (and, if det A ~ 0, for any integer pl.
is said to be normal if there exists an orthonormal basis in tft consisting en-
tirely of eigenvectors of T. The normal matrices are defined similarly. Exercise 3. Show that the matrix A is normal if and only if A* is normal
and that the eigenspaces associated with mutually conjugate eigenvalues,
Exercise 1. Show that a transformation T e !R(tir) is normal if and only if
of A and A'" coincide.
the representation of T with respect to any orthonormal basis in tft isa
normal matrix. Exercise 4. Check that if A is normal then, for any polynomial p(A.) over C,
the matrix peA) is normal.
Hint. See Theorem 4.2.2 0
Exercise 5. Prove a spectral theorem for normal matrices (see Theorem
For a further investigation of normal matrices and transformations, wd
4.10.3): If A is ann x n normal matrix with eigenvalues A.h il2 , . , il,., then
need some preparations. First, let us agree to use the notation .r;I~ for the'
there is an associated set of orthonormal eigenvectors {Xl> X2' , x n} such
set of all matrices A e :Fft'X ft such that A is a representation of Te9'(IfI) wit4i
respect to an orthonormal basis in dIt. Furthermore, recall (Exercise 3.14.12J, that A == 'JJ= tAj xjxT- 0
that the transition matrix P e :Fft 'X ft from one orthonormal basis in dIt into" If a: matrix fails to be normal then by Theorem 1 it cannot be unitarily
another is a unitary matrix, that is, P satisfies the condition P"P = I. Hence similar to a diagonal matrix. However, it is interesting to ask how far an
Theorem 4.3.2 implies that any two representations A and A' from
are related by the equation A' = PAp, where P'is some unitary matrix~ tCongruence of matrices will be studied in Section 5.5.
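Theorem 1 of this section (a matrix is normal if and only if it is unitarily similar to a diagonal matrix of its eigenvalues) and the spectral theorem of Exercise 5 can be watched at work numerically. In this sketch (Python with NumPy) the normal matrix is an arbitrary illustration, manufactured as a unitary conjugation of a diagonal matrix:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 4

    # An illustrative normal matrix, built as V diag(d) V* with V unitary.
    V, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    d = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    A = V @ np.diag(d) @ V.conj().T
    assert np.allclose(A @ A.conj().T, A.conj().T @ A)        # normality

    # Theorem 1: A = U D U* with U unitary and D a diagonal matrix of the eigenvalues.
    lam, U = np.linalg.eig(A)
    # For a normal matrix (with distinct eigenvalues, as here) the computed eigenvectors
    # are mutually orthogonal, so U is unitary up to rounding error.
    assert np.allclose(U.conj().T @ U, np.eye(n), atol=1e-6)
    assert np.allclose(A, U @ np.diag(lam) @ U.conj().T)

    # Spectral theorem (Exercise 5): A = sum_j lam_j x_j x_j*.
    A_sum = sum(lam[j] * np.outer(U[:, j], U[:, j].conj()) for j in range(n))
    assert np.allclose(A, A_sum)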
arbitrary matrix can be reduced by means of a unitary similarity. The next Hence s IJ = 0 for j = 2, 3, ... , n. The 2;2 element of the equation SS'" = S*S
theorem, which is due to Schur! and Toeplitz~, casts some light on this yields
more general question. IS2212 + IS2312 + ... + IS2nl2 ... Isd 2 + IS2212,
TheoI'em 2. Any square matrix is unitarily similar to an upper triangular and since we have proved S12 = 0 it follows that S2J = 0 for j ... 3,4, ... , n.
matrix. =
Continuing in this way we find that sJk 0 whenever j =I: k and 1 ~ j, k ~ n,
Thus, S must be a diagonal matrix and A ... USU"'. Now use Theorem 1.
PROOF. Let A e F n )( n and let 1ft be the unitary space of iF" equipped with the
standard inner product. Then, in the natural way, A determines a transfor- Note that by virtue of Exercise 1 and the relation (4.2.4), a similar result
mation T e ~(Ift). By Exercise 4.14.6 there exists a chain of T-invariant holds for transformations.
subspaces Theorem 3'. A transformation T E ~(l!) is normal if and only if
{O} = l!o c: l!1 c: ... c: l!n = l! TT'" = T"'T. (3)
such that l! I has dimension i for each i. It follows that, using the Gram- Proceeding to special kinds of normal transformations, we first define
Schmidt procedure, an orthonormal basis {U lo " " un} can be constructed a transformation TE~(l!) to be self-adjoint if T = T*, and to be unitary
=
for l!with the properties thatu,El!1 and UIl!I-1 for i 1,2, ... , n, Then if TT'" = T"'T = I.
the T -invariance of the chain of subspaces implies that the representation of Exercise 6. Check that a transformation T e .sf(l!) is self-adjoint (re-
T with respect to this orthonormal basis must be an upper-triangular matrix, spectively, unitary) if and only if its representation with respect to any
B say (see Section 4.8). orthonormal basis is a Hermitian (respectively, unitary) matrix. 0
But the matrix A is simply the representation of T with respect to the
standard basis for Fn (which is also orthonormal), and so A and B must be The important results of Exercise 6 allow us to deduce properties of self-
unitarily similar. adjoint and unitary transformations from those of the corresponding matrices
studied in the subsequent sections. If not indicated otherwise, the inner
The Schur-Toeplitz theorem plays a crucial role in proving the next product in F n is assumed to be standard, that is, (x, y) ... y"'x for any
beautiful characterization of normal matrices; it is, in fact, often used as the ~ye?~ .
definition of a normal matrix.
Exercise 7. IfTe~(l!) is normal and {U"U2' ... ,un} is an orthonormal
Theorem 3. An n x n matrix A is normal if and only if eigenbasis in diI, prove that {uIH= 1 is also an (orthonormaljeigenbasis of T*
AA* ... A*A.
in 1ft. Furthermore, if T(u,) = A;Uh then T*(u,) = Al", for all i =
1, 2, ... , n.
Obviously, a similar result holds for normal matrices.
PROOF. If A is normal, then by Eq. (1) there is a unitary matrix U and a Hint. See Exercise 5.1.13.
diagonal matrix D for which A ... UDU"'. Then
Exercise B. Find a real unitary matrix U and a diagonal matrix D for which
AA'" ... (UDU*XUDU"') ... UDDU"' ... UDDU* ... (UDU"')(UDU"') = A*.4; A = UDU T if
~ ~ -:].
so that A satisfies Eq. (2).
Conversely, we suppose AA'" ... A"'A and apply Theorem 2. Thus, there' A = [
is a unitary matrix U and an upper triangular matrix S ... [SlJ]~J= 1 for whic~ -4 4 3
A ... USU"', It is easily seen that A.A.'" ... A."'A. if and only if SS .. SS: Exercise 9. If Te ~(l!) is normal and l!o is a T-invariant subspace in l!,
The 1,1 element of the last equation now reads show thai Tl lfto is a normal transformation.
IS1112 + IS1212 + ... + ISlnl2 ... IS 11 /2 , Exercise 10. Consider a simple transformation T e ~(f/J). Show that an
inner product ( , ) can be defined on. f/J such that the transformation
t Malh. Annalen 66 (1909), 488-510.
~ Malh. Zeilschrifl2 (1918), 187-197. !f e ~(diI) is normal, where l! ... f/J, ( , ). 0
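Both the Schur-Toeplitz theorem (Theorem 2) and the criterion AA* = A*A of Theorem 3 are easy to probe numerically. The sketch below assumes SciPy is available for the Schur decomposition; the matrices themselves are arbitrary illustrations:

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(4)
    n = 5
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))   # generic, not normal

    # Theorem 2 (Schur-Toeplitz): A = U S U* with U unitary and S upper triangular.
    S, U = schur(A, output='complex')
    assert np.allclose(U @ S @ U.conj().T, A)
    assert np.allclose(S, np.triu(S))
    assert not np.allclose(A @ A.conj().T, A.conj().T @ A)    # this A is not normal ...
    assert not np.allclose(S, np.diag(np.diag(S)))            # ... and its Schur form is not diagonal

    # For a normal matrix (here a Hermitian one, so certainly AA* = A*A) the Schur form is diagonal,
    # in agreement with Theorems 1 and 3.
    H = A + A.conj().T
    SH, UH = schur(H, output='complex')
    assert np.allclose(H @ H.conj().T, H.conj().T @ H)
    assert np.allclose(SH, np.diag(np.diag(SH)), atol=1e-8)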

5.3 Hermitian. Skew-Hermitian. and Definite Matrices where i
2
= -1 and HI and H 2 are Hermitian matrices defined by
1 1
The structure and properties of Hermitian matrices are particularly
u, = Bfe A A 2(A + A*). Hz =Jm A A 2i (A - A*). (2)
simple and elegant. Because such matrices arise frequently in applications, The representation (1) is called the Cartesian decomposition of A. Its
it is important that we fully understand them. In particular, A is Hermitian analog in the case n = 1 is the Cartesian decomposition of a complex number
if A* = A. Thus, using Theorem 5.2.3, it is apparent that a Hermitian matrix in the form z = a + ib, where a and b are real.
is normal and so they have all the properties of normal matrices established
in the previous section. Exercise 3. Check that a matrix A is normal if and only if its real part
(Bfe A) and imaginary part (Jm A) commute.
Theorem 1. The spectrum ofa Hermitian matrix is real.
Exercise 4. Prove that if A is normal and A. E cr(A), then A. = ..1. 1 + iA2.
PROOF. A Hermitian matrix Hisnormal and so, by Eq. (5.2.1),H = UDU*, where At E u(Bfe A) and A2 E cr(Jm A). 0
where U is unitary and D is a diagonal matrix of the eigenvalues of H. But
H* = H implies UlJU* = UDU* and hence Jj = D, from which the result Consider a Hermitian matrix H of order n. Since all eigenvalues of H
follows. are real, they can be ordered: ..1. 1 ~ A2 ~ .,. :s;; An' Those cases in which all
eigenvalues are of one sign are particularly important. A Hermitian matrix A
More can be said: the existence or a real spectrum for a normal matrix is said to be positive (respectively. negative) definite if all its eigenvalues are
A is a necessary and sufficient condition for A to be Hermitian. positive (respectively, negative). Obviously. a definite (i.e. either positive or
Theorem 2. A normal matrix A is Hermitian if and only lfits spectrum lies negative definite) matrix is non singular (see Exercise 4.10.3). We shall also
on the realline. need some refinements: A Hermitian matrix is called positive (respectively,
negative) semi~definite or nonnegative (respectively. nonpositive) definite. if
PROOF. Let A*A = AA* and suppose u(A) c: (-00, (0). Since A = UDU* all its eigenvalues are nonnegative (respectively, nonpositive). A semi-
and A* = UD*U* = UDU*, it follows that A = Alit and hence A is Herro. definite matrix is understood to be one of the two varieties defined above.
itian. We shall write H > 0 and H :::: 0 for positive definite and positive semi-
The second part of this theorem is Theorem 1. II definite matrices, respectively. Similarly, the notations H < 0 and H ~ 0
Thus Hermitian matrices, and only they, have an orthonormal eigenbasis will be used for negative definite and negative semi-definite matrices.
along with a real spectrum. Another important description of Hermitian- The following result-is very important and is frequently used in the defi-
matrices follows from their definition and Theorem :5.1.1. nition of definite matrices. It has the great advantage of being free of ideas
associated with the spectrum of A.
Exercise1. Prove that an n x n matrix A is Hermitian if and only if (Ax, y)
= (x, Ay) for all x, y E tn. 0 Theorem 3. Given a Hermitian matrix H E!Fnl<n, H :::: 0 (or H > 0) if and
only if(Hx, x) :::: 0 (or (Hx, x) > O)for all nonzero x E !F H
We know that for an arbitrary square matrix, the eigenvectors associated'
with distinct eigenvalues are necessarily linearly independent (Proposition ~OOF. The necessity of the condition will be shown in Corollary 5.4.1.
4.10.4). If the matrix is Hermitian. it is also normal and, by definition of a Let (Hx, x) :::: 0 for all x E n and let the eigenvectors U1. U2, , Un of H
normal transformation, it has an orthonormal eigenbasis. associated with the eigenvalues ..1.1> ..1.2 , An' respectively, form an ortho-
normal basis in /Fn Since
Exercise 2. Show (without using the fact that a Hermitian matrix is nor~al
that the eigenvectors associated with distinct eigenvalues of a Hermina o ~ (HUh u,) == A,(U" u,) = Ai
matrix are orthogonal. 0 for i == 1,2, ... , n, the matrix H is positive semi-definite.
It turns out that any square matrix A can be represented in terms of The positive definite case is proved similarly.
pair of Hermitian matrices. Indeed, it is easy to verify that for A E tn l<n,
Note that results similar to Theorem 3 hold for negative definite and semi-
A = HI + iH2 ( definite matrices.
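This Cartesian decomposition, together with the commutation test of Exercise 3 (A is normal if and only if its real and imaginary parts commute), takes only a few lines to verify. In the sketch below (Python with NumPy; the matrix is an arbitrary illustration) H1 and H2 are formed exactly as in Eq. (2):

    import numpy as np

    rng = np.random.default_rng(5)
    n = 4
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

    # Cartesian decomposition: A = H1 + i H2 with H1, H2 Hermitian (Eq. (2)).
    H1 = (A + A.conj().T) / 2
    H2 = (A - A.conj().T) / (2j)
    assert np.allclose(H1, H1.conj().T) and np.allclose(H2, H2.conj().T)
    assert np.allclose(A, H1 + 1j * H2)

    # Exercise 3: A is normal if and only if H1 and H2 commute (both sides fail for a generic A).
    def is_normal(M):
        return np.allclose(M @ M.conj().T, M.conj().T @ M)

    assert is_normal(A) == np.allclose(H1 @ H2, H2 @ H1)

    # A Hermitian matrix has a real spectrum (Theorem 1), and A*A is positive semi-definite.
    assert np.allclose(np.linalg.eigvals(H1).imag, 0)
    assert np.all(np.linalg.eigvalsh(A.conj().T @ A) >= -1e-12)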

Exercise 5. Show that for any (even rectangular) matrix A, the matrices Theorem I. A matrix H is positivedefinite(or semi-definite) ifand only ifit
A*A and AA* are positive semi-definite. has a positive definite (respectively, semi-definite) square root R o . Moreover,
rank H o = rank H.
Hint. Let A be m x n and consider (A *Ax, x) for x E :F".
PROOF. If H ~ 0 then, by definition, its eigenvalues {A,}?= 1 are nonnegative
Exercise 6. Confirm that for any A e:Fm><n,
and we can define a real matrix Do = diag[A, A , .. , AJ.The
Ker(A*A) = Ker A, Im(A* A) = 1m A*. matrix H is normal and hence there is a unitary matrix U such that H ;".
UDU*, where D = D~. The required square root of H is the matrix
Exercise 7. Deduce from Exercise 6 that the matrix A *A (where A e P x. n; = UDo U*, (1)
and m ~ n) is positive definite if and only if A is of full rank.

Exercise8. Check that if HI and H 2 are positive definite and , peR are since H~ = UDoU*UDoU* = UD~U* = H. Note that the representation
1
nonnegative real numbers, not both zero, then the matrices H 1 and HI + (1) shows that the eigenvalues of Hoare the (arithmetic) square roots of the
PH 2 are positive definite. eigenvalues of H. This proves that rank H o = rank H (Exercise 5.3.9), and
H ~ 0 yields H 0 ~ O.
Exercise 9. Show that if H is positive semi-definite and rank H = r, then H Conversely, if H = H~ and H 0 ~ 0, then the eigenvalues of H, being the
has exactly r positive eigenvalues. squares of those of H 0' are nonnegative. Hence H ~ O. This fact also implies
the equality of ranks.
Exercise 10. Show that if H is any positive definite matrix, then the function
The same argument proves the theorem for positive definite matrices.
( , ) defined on Cn x C by (x, y) = y*Hx is an inner product on en. 0
The following simple corollary gives a proof of the necessary part of
Skew-Hermitian matrices have a number of properties that can be seen
Theorem 5.3.3.
as analogs to those of Hermitian matrices. The next two exercises give
results of this kind. Corollary I. If H ~ 0 (or H > 0), then (Hx, x) ~ 0 (or (Rx, x) > 0) for
all xefF".
Exercise 11. Prove that the spectrum of a skew-Hermitian matrix is pure
imaginary. hOOF. Representing H = H~ and using the matrix version of Theorem
S.1.1, we have
Exercise 12. Show that a normal matrix is skew-Hermitian if and only if
its spectrum lies on the imaginary axis. (Hx, x) = (H~x, x) = (Hox, Hox) ~ 0 for all xe:Fn
Exercise 13. Let A be Hermitian and let H be a positive definite matrix Exercise 1. Prove that if H is positive semi-definite and (Hx, x) = 0 for
of the same size. Show that AH has only real eigenvalues. some xefFn, then Hx = O. 0
Exercise 14. Use Exercise 2.10.2 to show that, if H is positive definite, Note that the square root of a positive semi-definite matrix cannot be
there is a unique lower triangular matrix G with positive diagonal elements unique since any matrix of the form (1) with a diagonal matrix
such that H = GG*. (This is known as the Choleskyfactorization of H.) 0
D = diag[A, A,, A]
is a square root of H. However, the positive semi-definite square root 8 0 of
His unique.
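The construction of Theorem 1, H0 = UD0U* with D0 = diag(√λ1, ..., √λn), is immediate to code. The sketch below (Python with NumPy; the matrix is an arbitrary illustration, and the final lines use numpy's Cholesky routine for the factorization of Exercise 5.3.14) builds H ≥ 0, forms its positive semi-definite square root from an eigendecomposition, and confirms H0² = H and rank H0 = rank H:

    import numpy as np

    rng = np.random.default_rng(6)
    n, r = 5, 3
    B = rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r))
    H = B @ B.conj().T                         # positive semi-definite, rank r (Exercise 5.3.5)

    # Theorem 1: H0 = U D0 U*, D0 = diag(sqrt(lambda_i)), is a square root of H with H0 >= 0.
    lam, U = np.linalg.eigh(H)
    lam = np.where(lam > 1e-12, lam, 0.0)      # clear round-off in the zero eigenvalues
    H0 = U @ np.diag(np.sqrt(lam)) @ U.conj().T
    assert np.allclose(H0, H0.conj().T) and np.all(np.linalg.eigvalsh(H0) >= -1e-10)
    assert np.allclose(H0 @ H0, H)
    assert np.linalg.matrix_rank(H0, tol=1e-10) == np.linalg.matrix_rank(H, tol=1e-10) == r

    # Exercise 5.3.14 (Cholesky): a positive definite H has H = GG* with G lower triangular.
    Hpd = H + np.eye(n)                        # shifted to make it positive definite
    G = np.linalg.cholesky(Hpd)
    assert np.allclose(G @ G.conj().T, Hpd) and np.allclose(G, np.tril(G))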
5.4 Square Root of a Definite Matrix and Singular Values Proposition I. Let H be positive semi-definite. The positive semi-definite
squar~ rootH o ofH is unique and is given by Eq. (1).

An analogy between Hermitian matrices and real numbers can be seen, hOOF: LetH 1 satisfy H~ = H. By Theorem 4.11.3, the eigenvalues ofHI are
in the following result, which states the existence of a square root H o of a square roots of those oCH and, since H I ~ 0, they must be nonnegative. Thus,
positive semi-definite matrix H, that is, a matrix H o such that H~ = H. the eigenvalues of HI and Ho of Eq, (1) coincide. Furthermore, HI

is Hermitian and, therefore(Theorem 5.2.1), HI = VDoV* for some unit I'then AI e a(AA*) and is associatedwith the eigenvector AXj.1n particular,
V. Now H~ = H~ = H. so VD~V = UD~U and hence (UV)D~ A*A) c a(AA*). The opposite inclusion is obtained by exchanging the
Da(UV). It is easily seen that, because Do ~ 0, this implies (UV)D o oles of A and A*.
Do(UV) and, consequently. HI = H o, as required.
Thus the eigenvalues of AA and AA"', as wellas of (A"'A)1/2 and (AA"')1 12,
In the sequel the unique positive semi-definite (or definite) square roo iffer 'only by the geometric multiplicity of the zero eigenvalue, which is
ofa positive semi-definite (or definite)matrix H isdenoted by H 112 Summar - r for A*A and m - r for AA"', where r"" raok(A*A) .... rank(AA*).
izingthe above discussion. note that A. e O'(H 1/ 2 ) if and only if Xl. e O'(H). an ilso, for a square matrix A it follows immediately that the eigenvalues of
the corresponding eigenspaces of H11 2 and H coincide. The concept of *A and AA'" coincideand have the same multiplicities. [Compare this with
squareroot ofa positivesemi-definite matrixallowsus to introducea spectr 'e result of Exercise 4.14.10(b).]
characteristicfor rectangular matrices. Note that we proved more than was stated in Theorem 2.
Consideran arbitrary m x n matrix A. The n x n matrix AA is(generall
xercise 3. Show that if Xl> X2' ... , X k are orthonormal eigenvectors of
positive semi-definite (seeExercise 5.3.5). Thereforeby Theorem 1 the mat
*A corresponding to nonzero eigenvalues, then AXl> AX2"'" AXk are
AA has a positive semi-definite square root HI such that AA = H
hogonal eigenvectors of AA* corresponding to the same eigenvalues.
The eigenvalues AI' A2'... l" of the matrix HI = (A *A)1I2 are referredt converse statement is obtained by replacing A by A*.
as the singular values SI' S2 ... ' s" of the (generally rectangular) matrix
Thus, for i = 1,2, ... , n, ~ercise 4. Verify that the nonzero singular values of the matrices A and
siCA) A AjA*A)1/2). '"are the same. If,in addition, A is a square n x n matrix, show that siAl =
Obviously, the singular valuesof a matrix are nonnegative numbers. iA *) for i = 1, 2,... ,n. 0
j2, 82 = In the rest of this section only square matrices are considered.

n
Exercise 2. Check that 81 = 1 are singular valuesof the matrix
roposition 2. The singular values of a square matrix are invariant under
,itary transformaiion.
A-H 0 other words.for any unitary matrix U E CN x" and any A E C N N. X

Note that the singular valuesof A are sometimesdefinedas eigenvalues i = 1.2, ... , n. (2)
the matrix (AA*)1/2 of order m. It follows from the next fact that the ditfe Bydefinition,
ence in definitionis not highly significant.
sIPA) = AIA*U*UA)1/2) = AjA*A)1/2) = sj(A).
Theorem 2. The nonzeroeigenvalues of the matrices(A*A)11 2 and (AA)1
coincide. prove the second equality in Eq. (2). use Exercise 4 and the part of the
PROOF. First weobservethat it suffices to prove the assertion ofthe theor
oposition already proved.
for the matices AA and AA*. Furthermore, we select eigenvecto eorem 3. For an n x n normal matrix A efP'''x",
Xl' X2,'" X" of A*A corresponding to the eigenvalues At> A2"'" l" sue
i = 1,2, ... , n.
that {xt> X2,"" x,,} forms an orthonormal basis in SF". We have
(A Ax;, x) = Aj(Xi' xi) = A,t5ij, I ~ i, j :s; n. . OOF. Let Ai denote an eigenvalue of A associatedwith the eigenvector XI'
nee A"'A = AA*, it follows (see Exercise 5.2.7) that
On the other hand, (A*Axh Xi) = (Axj, AXi)' and comparison showsth
(AXh Ax/) = Aj, i = I, 2, ... ,n. Thus, Ax/ = 0 (1 S; i :s; n) if and only A*Axj = AiA*X/ = Aj l / x i = IA;l2 xi. (3)
~=Q~~\ nee x, is an eigenvector of A"'A corresponding to the eigenvalue 1~12.
AA*(Axl) = A(A*Ax/) = AI(Axl), 1 :s; i :s; n, definition, IAll is a singular value of A. Note that A has no other singular
the precedingremark shows that for AI .;:. 0, the vector AXj is an eigenvect ues, sincein varying i from 1 to n in Eq. (3),all the eigenvalues of A'"A are
of AA. Hence if a nonzero AI e o{A*A) is associated with the eigenvect tained,
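The definition si(A) = λi((A*A)^{1/2}), Theorem 2, and the invariance of singular values under multiplication by unitary matrices can all be confirmed in a few lines. In this sketch (Python with NumPy) the rectangular matrix is an arbitrary illustration, and numpy's svd serves only as an independent source of the singular values:

    import numpy as np

    rng = np.random.default_rng(7)
    m, n = 5, 3
    A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

    # Singular values: square roots of the eigenvalues of A*A, in decreasing order.
    s = np.sqrt(np.clip(np.linalg.eigvalsh(A.conj().T @ A), 0, None))[::-1]
    assert np.allclose(s, np.linalg.svd(A, compute_uv=False))

    # Theorem 2: the nonzero eigenvalues of A*A (n x n) and AA* (m x m) coincide.
    big = np.sort(np.linalg.eigvalsh(A @ A.conj().T))[-n:]      # drop the m - n zero eigenvalues
    assert np.allclose(np.sort(np.linalg.eigvalsh(A.conj().T @ A)), big)

    # Singular values are unchanged by multiplication with unitary matrices
    # (Proposition 2; see also the unitary equivalence of Section 5.7).
    U, _ = np.linalg.qr(rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    assert np.allclose(np.linalg.svd(U @ A @ V, compute_uv=False), s)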
Exercise 5. Checkthat for any n x n matrix A, Ordering the eigenvalues so that the first s scalars .t1> ..1.2 , , ..t. on the main
[det At = sl(A)S2(A) s,,(A). diagonal of D are positive and the next r - s numbers A 1 1
'
arenegative. .
wemay wnte D = U1D1DoDI U1, where .+1> ....+2'',.,
Hint. If H = A*A, where H ~ 0, then det H = [det A I.
2

Exercise 6. Confirm that a square matrix is unitary if and only if all its,
D1 = diag[A, A. ... ,.jf., JI As +1 I, ... JiU. 0, ... 0],
singular values are equal to one. ' '!
Do i~ given by Eq. (~), a~d 1 is a permutation (and therefore a unitary)
matnx. Hence. substituting into (3), we obtain H = PDoP*, where P =
Exercise 7. Let A e em x" and A = QR be the Q-R decomposition" of A
U UU1D1 The theorem is established. . '
described in Exercise 3.12.7. Show that A and R have the same singular
values. 0 Coronary 1. A Hermitian matrix He ,''x,, is positive definite if and only
if it is congruent to the identity matrix.
In. other words. H > 0 if and only if H = P p* for some nonsingular
matnxP.
5.5 Congruence and the Inertia of a Matrix
Coro~ 2. A n.e~m~tian matrix He'''X'' of rank r is positive semi-
defimte if and only if It JS congruent to a matrix ofthe form
The notion ofunitarysimilarity discussed in Section5.2can begeneralized
in the following way. Two square matrices A and B are said to be congruent I, 0 ]
[ o O,,-r .
if there existsa nonsingularmatrix P such that
A =PBP*. Thus H ~ 0 if and only if H = p.P for some P.

It is clear that for a unitary P the matrix A is unitarily similarto B. Also; ~oroUary 3. Two Hermitian matrices A. B e ,/ll(" are congruent if andonly
congruence is an equivalence relation on ,''x,,. It is also obviousfrom Eq':, (if r~~k A = rank B ~nd the number of positive eigenvalues (counting multi-
" plicltles)ofboth matrtces is the same.
(1) that if A is Hermitian then all matricesin the equivalence classof A are
Hermitian. In Theorem 1 it will be shown that there is a particularlysimple ,~,; We already know that a Hermitian matrix H can be reduced by a con-
canonicalmatrix in each such classof Hermitian matrices. :"gl'Uence transformation to a diagonal matrix. If several Hermitian matrices
are given, t~e~ each of them can be reduced to diagonal form by means of
Theorem 1. A Hermitian matrix He ,"X" is congruent to the matrix (generally distinct) congruence transformations.
:' The important problem of the simultaneous reduction of two Hermitian
I. 0 0 ]
Do = 0 -1,-. 0 , :', ttlatrices to diagonal form using the same congruence transformation will
[
o 0 0,,_, ifJ\, now ~e di~uss~: Thisturns out to be possibleprovidedoneofthe Hermitian
{;' tnatnces IS positive (or negative) definite. If this condition is relaxed the
where r = rank Hand s is the number of positive eigenvalues of H counted J7tQuestion of canonical reduction becomes significantly more complicated.
according to multiplicities.
:I'heorem 2. If H 1 and H 2 are two n x n Hermitian matrices and at least one
In other words, the matrix Do in Eq. (2) is the canonical form of H wit~' Ofthem, say H lo is positive definite, then there exists a nonsingular matrix Q
respect to congruence; it is the simplest representative of the equivalence SUch that
class of Hermitian matrices congruent to H. Q*H 1Q = I and Q*H 2Q = D, (4)
PRooP. By Theorem 5.2.1, il\lhere D == diag[A. It , ,t,,] and A. 1o A2 , , A" are the eigenvalues ofn;: H 2'
all ofwhich are real.
H= UDU, Furthermore, if tjj denotes the jth column of Q. then for j = 1, 2, ... , n,
where D is a diagonal matrix of the eigenvalues of H, and U is unitarr. (AJH 1 - H 2 )tjj = O. (5)

I

PROOF. Let H~/2 be the positive definite square root of HI and define where n(A) v(A) and 6(A) denote the number ofeigenvalues of A, counted
H~I/2 == (H1 /2)-1. Observe that H == H11/2H2Hl1/2 is Hermitian, and with their algebraic multiplicities, lying in the open right half-plane, in the
so there is a unitary matrix U such that U*HU == diag[Ah A2' ... , An] AD, open left half-plane, and on the imaginary axis, respectively. Obviously,
a real diagonal matrix, where AI> Az, ... , An are the eigenvalues of H. Since + v(A) + (j(A) = n
n(A)
H == H~12(Hl1 H 2)H I 112, the matrices Hand H 11H 2 are similar. It follows
that All A2' ... , An are also the eigenvalues of H 11H 2 and are real. and the matrix A is nonsingular if e5(A) = O.
NowdefineQ == H 11/2U;wehave In the particular case of a Hermitian matrix H, the number n(H) (respec-
tively, v(H merely denotes the number of positive (respectively, negative)
QH 1Q == U*H11/2HIH11/2U = U*U = I, eigenvalues counted with their multiplicities. Furthermore, e5(H) is equal to
Q*H 2Q = U*HI112HzHl1f2U = U*HU = D, (6) the number of zero eigenvalues of H, and hence, H is nonsingular if and only
if e5(H) = O. Note also that for a Hermitian matrix H,
as required for the first part of the theorem.
n(H) + v(H) = rank H.
It is clear from Eqs. (6) that
The difference n(H) -v(H), written sig H, is referred to as the signature of
(Q*H 1Q)D - Q*H 2Q == 0, H.
and this implies H 1QD - HzQ =0. Thejth column of this relation is Eq. The next result gives a useful sufficient condition for two matrices to have
(5). the same inertia. The corollary is a classical result that can be proved more
readily by a direct calculation. The interest of the theorem lies in the fact that
Exercise 1. Show that if the matrices HI and H 2 of Theorem 2 are real, matrix M of the hypothesis may be singular.
then there is a real transforming matrix Q with the required properties.
Theorem 3. Let A and B be n x n Hermitian matrices of the same rank r.
Exercise 2. Prove that the eigenvalues of H l 1H 2 in Theorem 2 are real by If A = MBM* for some matrix M, then In A == In B.
first defining an inner product (x, y) = y*H 1 X and then showing that
PROOF. By Theorem 1 there exist nonsingular matrices P and Q such that
H 11 H 2 is Hermitian in this inner product. []
PAP* = diag(I" -I,_" O]A Dl/)
We conclude our discussion on the simultaneous reduction of a pairof
and
Hermitian matrices to a diagonal form by noting that this is not always
possible. Q-IB(Q-l)* = diag[/., -1,-., 0] A Dlfl,
Exercise 3. Show that there is no nonsingular matrix Q such that both where t and s denote the number of positive eigenvalues of A and B, respec-
matrices tively; t == n(A) and s = x(B).
To prove the theorem it suffices to show that s = t. Observe that, since
and A == MBM*,
D~1) = PAP* = PM(QD~2)Q*)M*P*
are diagonal. 0
or, writing R = PMQ,
In the study of differential equations, the analysis of stability of solutions
requires information on the location of the eigenvalues of matrices in rela- D~1) = RD~2IR*. (7)
tion to the imaginary axis. Many important conclusions can be deduced by Now Suppose s < t and let us seek a contradiction. Let x e ~n be given
knowing only how many eigenvalues are located in the right and left half-
planes and on the imaginary axis itself. These numbers are summarized in by ~ = [~l], where Xl = [XI X2 ... X,]T efFl and Xl '1= O. Then
the notion of "inertia." The inertia of a square matrix A e fFn" n, written
In A, is the triple of integers ,
x*D~1)x = L Ixtl 2
> Q. (8)
In A = {neAl, veAl, e5(A)}, j-l

Partition R* in the form The converse of this statement is also true for normal matrices. (It is
an analog of Theorem 5.3.2 for Hermitian matrices).
R* = [R 1 R z], Theorem 2. A normal matrix A e :Fft)( ft is unitary if all its eigenvalues lie
R3 R4
ontheunit circle.
where Rj is s x t.Sinces < t,anxlcanbechosensothatxl :FOandRlxl =
PROOF. Let Ax, = A,x" where I~I = 1 and (x" xJ) = 61} (1 S; i,j S n).
O. Then define r= R3Xl e:Fft- a and we have R*x = [;]. Consequently, . For any vector xe:Fft, write x = L7-1
IX,Xi; then it is found that
using Eq. (7) and writing y = [y .. Y2' ... , Yft_JT ,
,-a A*Ax = A*(.t
.=1
IXiAX,) = A*{. IX,A,Xi)'
I-I
(1)
x*Dg'x = (R*x)*D\:'(R*x) = - L Iyl S; 0, From Exercise 5.2.7 we know that A*x, ;::: A,X, (i = 1,2, ... , n), so Eq, (1)
J-l
becomes
and a contradiction with Eq. (8) is achieved.
Similarly, interchanging the roles of D~' and D!r', it is found that t < s if ft ft ft
=
impossible. Hence s t and the theorem is proved.
A*Ax =L
IXIA,A*Xi = L
IXi1A,12x, = L
lXiXI = X.
1=1 I-I i=1

Corollary 1. (Sylvester's law of inertia). Congruent Hermitian matrices. Thus A *Ax ;::: x for every x e:Fft and therefore A *A = 1. The relation,
have the same inertia characteristics. . AA* = I now follows since a left inverse is also a right inverse (see Section
PROOF. If A = PBP* and P is nonsingular, then rank A = rank B and the 2.6). Hence the matrix A is unitary.
result follows. Thus, a normal matrix is unitary if and only if its eigenvalues lie on the unit
Exercise 4. Show that the inertia of the Hermitian matrix circle. Unitary matrices can also be described in terms of the standard inner
product. In particular, the next result shows that if U is unitary, then for any
O il ] li: the vectors x and Ux have the same Euclidean length or norm.
[ -ifft Oft
Theorem 3. A matrix U e :F ft )( ft is unitary if andonly if
is (n, n, 0). 0
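Sylvester's law of inertia (Corollary 1) and Exercise 4 are easily tested numerically. In this sketch (Python with NumPy; the matrices are arbitrary illustrations, and a small tolerance is used to classify eigenvalues as positive, negative, or zero) the inertia is computed directly from the spectrum:

    import numpy as np

    def inertia(H, tol=1e-10):
        # Return (pi, nu, delta): the numbers of positive, negative, and zero eigenvalues of H.
        w = np.linalg.eigvalsh(H)
        return (int(np.sum(w > tol)), int(np.sum(w < -tol)), int(np.sum(np.abs(w) <= tol)))

    rng = np.random.default_rng(9)
    n = 5
    C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    H = C + C.conj().T                                   # Hermitian, generically indefinite

    # Sylvester's law: congruent Hermitian matrices have the same inertia.
    P = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))   # nonsingular with probability one
    assert inertia(P @ H @ P.conj().T) == inertia(H)

    # Exercise 4: the inertia of [[0, iI], [-iI, 0]] is (n, n, 0).
    Z = np.zeros((n, n))
    K = np.block([[Z, 1j * np.eye(n)], [-1j * np.eye(n), Z]])
    assert inertia(K) == (n, n, 0)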
(Ux, U,) = (x,,) (2)
.for all x; y e :F ft.
6.6 Unitary Matrices PRooF. If UU = 1, then
(Ux, Uy) ;::: (U*Ux, y) = (x, y)
We now continue the study of unitary matrices. Recall that a sq and Eq. (2) holds. Conversely, from Eq. (2) it follows that (U*Ux, y) =;= (x, y)
matrix U is unitary if it satisfies the condition U*U = Uu* = 1. or, equivalently, U*U - l)x, y) ;::: 0 for all x, y e :Fft Referring to Exercise
lbeorem1. The spectrum ofa unitary matrix lieson the unitcircle. 3.13.2, we immediately obtain U*U = 1. Consequently, UU* = I also,
and U is a unitary matrix.
PROOF. If Ux ;::: AXand (x, x) = 1, we obtain
",)2 Observe that Eq, (2) holds for every x, y e,ft if and only if it is true for
(Ux, Ux) ;::: (U*Ux, x) = (x, x) = 1.
th~ vectors of a basis in :F ft. (Compare the statement of the next Corollary
On the other hand, \Vith Exercise 3.14.12.)
(Ux, Ux) = (Ax, AX) = I.W(x, x) = IAlz. , ollary 1. A matrix U e :Fft )( ft is unitary if and only if it transforms an
The comparison yields IliZ = 1 and the desired result. thonormal basisfor :F ft intoan orthonormal basis.

Now take the elements el (i = 1, 2, ... , n) of the standard basis in fF" ~ orthonormal eigenvectors of AA* corresponding to the eigenvalues
x and yin Eq.(2). 12 " " .1" respectively. Choosing an orthonormal basis {y,+ 10 ' Y.} in
AA.... we extend it to an orthonormal eigenbasis {y,H= 1 for AA*.
Corollary 2. A matrix U E fF)(" is unitary if and only if the columns 0/ Proceeding to a construction of the matrices Hand U in Eq. (1), we write
viewed as vectors 0/ g;", constitute an orthonormal basis in g;". (AA"')1/2 and note (see Section 5.4) that Hy, = (A,)l/2y, for i = 1,2, , n.
~so, we introduce an n x n (transition) matrix U by Ux, == YI' i = 1,2, , n.
Note, in conclusion, that some properties of unitary matrices are rem
f.[ote that Corollary 5.6.1 asserts that U is unitary. We have
iscent of those of the unimodular complex numbers ell,where 0 :s; ')' < 2n.
particular, it follows from Theorem 1 that all eigenvalues of a unitary mat . ~, HUx, = Hy, == AYI == Ax" i == 1,2, ... , r. (3)
are of this form. The polar form of more general complex numbers is th
!1Since (Axjo Ax,) = (A*Ax,. Xi) == Ai = 0 for i = r + I, r + 2, ... , n, it
starting point for our next topic. tdUows that Ax, = 0 (r + 1 :s; i:S; n). Furthermore, as observed in Section
Exercise 1. (Householder transformations). Let x be a nonzero vector' SA. AA*y, == 0 implies Hy, = O. Thus
e and define the n x n matrix HUXi = Hy, = 0 == Ax" i == r + 1, ... n. (4)
U==[-tXxx*, The equalities (3) and (4) for basis elements clearly give HUx = Ax for
proving Eq. (1).
where tX = 2(X*X)-1. Show that U is both Hermitian and unitary. Fi
the n eigenvalues of U and an associated orthonormal eigenbasis. 0 I Note that if A is nonsingular, so is (AA *)1/2 and in the polardecomposition
the matrix H == (AA*)1I2 is positive definite. Observe that in this case
unitary matrix U can be chosen to be H-l A and the representation (1) is
ue.
5.7 Polar and Singular-Value Decompositions dual polar decomposition can be established similarly.
,~rcise 1. Show that any A E fF" ><. can be represented in the form

The following result has its origin in the familiar polar form of a compl A = UIHt> (5)
number: A == ADe'l, whereAo ~ 0 and 0 :s; 'I' < 2n. ere HI == (A *A) 1/2 and U 1 is unitary.. 0

Theorem 1. Any matrix A E !iF"><. can be represented in the form or normal matrices both decompositions coincide in the following sense.
osition 1. A matrix A E fF" ><" is normal if and only if the matrices H and
A=HU,
Eq. (1) commute.
whereH ~ 0 and U is unitary. Moreover, the matrix H in Eq. (1) is uniquea F. If A == HU == UH, then
is given by H = (AA",)1 /2. .
A"'A == (HU"'XUH) = H2 = (HU)(U*H) == AA*
PROOF. Let
dA is normal.
AI ;;::: A.2 ~ ~ it, > 0 = A,+ 1 = ." = it" Conversely, let A be normal and adopt the notations used in the proof
iTheorem 1. Since AA'" = A"'A, the system {xIH=1 is an orthonormal
denote the eigenvalues of A*A (see Theorem 5.4.2) with correspondi basis for AA'" and, consequently, for H = (AA*)1/2. Hence, for
eigenvectors XI> X2' ... , x. that comprise an orthonormal basis in g;". 1,2, ... , n,
Then, by Exercise 5.4.3, the normalized elements UHx, == u; A (6)
hermore, from A == HU it follows that A,x, = A*Axl = U*H2UXI' and
i == 1.2... r, sequently H2UXI = A,UXI' Thus the vectors UXI are eigenvectors of
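Theorem 1 (A = HU with H = (AA*)^{1/2}) and the dual form of Exercise 1 can be reproduced compactly from a singular-value decomposition: if A = WΣV*, then H = WΣW* and U = WV*. The sketch below (Python with NumPy; the matrix is an arbitrary illustration) uses that route and checks the stated properties:

    import numpy as np

    rng = np.random.default_rng(11)
    n = 4
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

    W, sig, Vh = np.linalg.svd(A)                 # A = W diag(sig) Vh
    H = W @ np.diag(sig) @ W.conj().T             # H = (AA*)^{1/2}, positive semi-definite
    U = W @ Vh                                    # unitary factor

    assert np.allclose(A, H @ U)                                  # polar decomposition A = HU
    assert np.allclose(U.conj().T @ U, np.eye(n))                 # U is unitary
    assert np.allclose(H @ H, A @ A.conj().T)                     # H is a square root of AA*
    assert np.all(np.linalg.eigvalsh(H) >= -1e-12)                # H >= 0

    # Dual decomposition (Exercise 1): A = U1 H1 with H1 = (A*A)^{1/2}; here U1 = U works.
    H1 = Vh.conj().T @ np.diag(sig) @ Vh
    assert np.allclose(A, U @ H1)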

= 1, 2, .. , n). Then the matti


HZ corresponding to the eigenvalues .t, (i Note the structure ofthe matrices U and V in Eqs. (7) and (9): the columns
H has the same eigenvectors corresponding to the eigenvalues .jih re- ofU (respectively, V) viewed as vectors from /F'" (respectively, /F") constitute
spectively: an orthonormal eigenbasis of AA* in /F'" (respectively, of A*A in /F"). Thus,
a singular-value decomposition of A can be obtained by solving the eigen-
HUx, =..;r, UXh i =1,2, . . . ,n. value-eigenvector problem for the matrices AA * and A *A.
Comparing this with Eq. (6), it is deduced that UH = HU, as required. Example 4. Consider the matrix A defined in Exercise 5.4.2; it has singular
Exercise 2. Check that if A = AOe'7 (Ao > 0) is a nonzero eigenvalue of a values St = ../i
and Sz = 1. To find a singular-value decomposition of A,
normal matrix A, then Ao is an eigenvalue of H while e'7 is an eigenvalue we compute A*A and AA* and construct orthonormal eigenbases for these
U in the polar decomposition (1). matrices. The standard basis in IF z can be used for A*A and the system
Exercise 3. Check that the representation
{[IX 0 ct]T, [0 1 O]T, [IX 0 _ac]T}, where IX = 1/../i,
can be used for
AA*. Hence

AA [~ ~] = [:=: :=:][.: :l ac=-


1
../i' A = [~ ~l [~~ ~l [~ ~.][~ ~]
=
is a polar decomposition of A. Find its dual polar decomposition. 0 1 0 at 0 -at . 0 0
The procedure used in proving Theorem 1 can be developed to pro\! is a singular-value decomposition of A. 0
another important result.
The relation (7) gives rise to the general notion of unitary equivalence
Theorem 2. Let A denote an arbitrary matrixfrom /F"'"" and let {s,H=t b .' ~f two m x n matrices A and B. We say that A and B are unitarily equivalent
the nonzero singular values ofA. Then A canbe represented in theform itthere exist unitary matrices U and V such that A = UBV*. It is clear that
,;:,,!;.'.:, unitary equivalence is an equivalence relation and that Theorem 2 can be
A = UDV*, ;;idJnterpreted as asserting the existence of a canonical form (the matrix D) with
where U e /Fm II mand Ve /F" "" are unitary and the m x n matrix D has s, :ii',)espect to unitary equivalence in each equivalence class. Now the next result
the i, i position (1 ~ i ~ r) and zeroselsewhere. "i'ii to be expected.
The representation (7) is referred to as a singular-value decomposition ?\.l!rcposition2. Two m x n matrices are unitarily equivalent if and only if
the matrix A. . fliey have the same singular values.
PROOF. Let us agree to preserve the notation introduced in the proof , )\OOF. Assume that the singular values of the matrices A, Be,m ll "
Theorem 1. Note only that now the matrix A is generally a rectangular ma ~incide. Writing a singular-value decomposition of the form (7) for each
and the systems of eigenvectors {X'}?"l for A*A and {Yi}T.. t for AA* a matrix (With the same D), the unitary equivalence readily follows.
now orthonormal bases in IF" and IF"', respectively. We have [see Eq. (2 <,Conversely,let A and B be unitarily equivalent, and let D 1 and D z be the
Ax, = Ay" i = 1,2, ... , r, '~trices of singular values of A and B, respectively, that appear in the
~iJlgular-value decompositions of the matrices. Without loss of generality,
and, by definition of x, (see Exercise 5.3.6), Ax, = 0 (r + 1 ~ i ~ n). it may be assumed that the singular values on the diagonals of D l and D z
Now we construct matrices O>ii;;(are in nondecreasing order. It is easily found that the equivalence of A and
V = [Xt Xz ... x"] and U = [Yl Yz . . . y",] .8>implies that D 1 and Dz are also unitarily equivalent: Dz = U ID l vr.
ewing the last relation as the singular-value decomposition of D z and
and note that, according to Corollary 5.6.2, they are unitary. Since by d ing that the middle factor D1 in such a representation is uniquely deter-
=
ition s, .jI, (i = 1,2, ... , n), the relation (8) implies ed, we conclude that D 1 = D z.
AV = [SIYI szYz ... srYr 0 ... 0] = UD, IIaryI. Twom x n matrices A and B areunitarily equivalent if and only
where D is the matrix in Eq. (7). Thus, the representation (7) is established. 'lthe matrices A *A and B*B are similar.
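The proof of Theorem 2 is constructive: the columns of V form an orthonormal eigenbasis of A*A, the leading columns of U are Axi/si, and D carries the singular values. The sketch below (Python with NumPy; an arbitrary rectangular matrix) follows that recipe and checks A = UDV*; in practice one would simply call numpy.linalg.svd:

    import numpy as np

    rng = np.random.default_rng(12)
    m, n = 5, 3
    A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

    # Orthonormal eigenbasis of A*A, eigenvalues in decreasing order.
    lam, V = np.linalg.eigh(A.conj().T @ A)
    order = np.argsort(lam)[::-1]
    lam, V = lam[order], V[:, order]
    s = np.sqrt(np.clip(lam, 0, None))
    r = int(np.sum(s > 1e-12))                  # number of nonzero singular values

    # Columns of U: u_i = A v_i / s_i for i <= r (Exercise 5.4.3), completed to a unitary matrix.
    U_r = A @ V[:, :r] / s[:r]
    W = np.linalg.svd(U_r)[0]                   # left singular vectors of U_r
    U = np.hstack([U_r, W[:, r:]])              # append an orthonormal basis of the complement

    # The m x n matrix D has s_i in position (i, i) and zeros elsewhere.
    D = np.zeros((m, n))
    D[:r, :r] = np.diag(s[:r])

    assert np.allclose(U.conj().T @ U, np.eye(m))
    assert np.allclose(A, U @ D @ V.conj().T)
    assert np.allclose(s[:r], np.linalg.svd(A, compute_uv=False)[:r])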

PROOF. If A = UBV"', then Thus, each idempotent matrix P generates two uniquely defined mutually
complementary subspaces 9'1 = Ker P and 9'2 = Im P such that their
A"'A = VB"'U*UBV'" = VB"'BV'" = VB"'BV- 1
+
direct sum is the entire space. The converse is also true: If f/' 1 f/'2 = !F",
Conversely, the similarity of A"'A and B*B yields, in particular, that they there exists a unique idempotent P for which 9'1 = Ker P and 9'2 = Im P.
have the same spectrum. Hence A and B have the same singular values and, To see this.just define! Pon'''byPx = X2, where x == Xl + x2andxI E9'1o
X2 E9'2' Then it is clear that p = P and f/'1 = Ker P, 9'2 = Im P. Also, the
2
consequently, are unitarily equivalent.
uniqueness of P follows immediately from the definition. Thus, there is a
one-to-one correspondence between idempotent matrices in 'n x" and pairs
5.8 Idempotent Matrices (Projectors)
+
of subspaces f/'h f/'2 such that 9'1 f/'2 = /F".
In view of this result, we shall say that the idempotent matrix P performs
the projection 0/ the space IF" on the subspace 9'2 parallel to f/'l' or onto f/'2
. along ~. Hence the name "projector" as an alternative to "idempotent
By definition, an idempotent matrix P satisfies the condition p2 = P. matrix."
Another name for such 'a matrix is a projector. The name has geometrical We now go on to study the spectral properties of projectors.
origins that, among other things, will be developed in this section. Observe
that the definition can also be applied to transformations P E !t'(f/') for any Theorem 3. An idempotent matrix is simple.
linear space f/'. It is left as an exercise for the reader to formulate analogues PROOF. We have already seen in Exercise 4.11.1 that the eigenvalues of an
of Theorems 1-5 in this case. We start with the followingsimple proposition. idempotent matrix are all equal to 1or O. It foUows then that the eigenvectors
Theorem 1. If P is idempotent, then: of P all belong to either 1m P (A. = 1)or Ker P (A. = 0).In viewofTheorem 2,
the eigenvectors of P must span '" and hence P is a simple matrix.
(a) I - P is idempotent;
(b) Im (I - P) = Ker P; The spectral theorem (Theorem 4.10.3) can now be applied.
(c) Ker(I - P) = Im P. Corollary1. An n x n projector P admits the representation
PROOF. For part (a), observe that (1 - p)2 = I - 2P + p2 = I - 2P +,
= r1 xJyj,
r

P = I - P. Proceeding to part (b), we see that if, E Im(l - P), then, == P (1)
(1 - P)x Cor some x E /Fri. Hence P, = P(l - P)x = (P - p 2 )x = 0 and
j;

yeKer P.Conversely,ifPy =0, then (I -Ply =1 and thereforeyelm(I - P); where r = rank P and {Xl"'" x r}, {ylJ ... , Yr} are biorthogonal systems in
Thus Im(I - P) = Ker P. A similar argument givesthe proofof part (c). I( ,,".
Note that the idempotent matrix Q = I - P is called the complementary' Note that a direct computation of p 2 shows that, conversely, any matrix
projector to P. The importance of the last theorem becomesclearer when w~ of the form (1) is idempotent.
recognize that the kernel and image of P constitute a direct sum decomposl-. . An important case arises when P is both idempotent and Hermitian:
tion of the space /F" on which P acts. Note also that, for the complementary; }>2 = P and P'" = P. In this case the subspaces 9'. = Ker P and f/'2 = 1m P
~~~~Q, . are found to be orthogonal. For, if x e f/'. and y e f/'2' then
1m Q = Ker P, Ker Q = 1m P. (x, y) = (x, Py) = (P*x, y) = (Px,y) = 0
Theorem 2. If P is a projector, then Ker P + Im P = !F". and therefore 9'1 $9'2 = '''. A Hermitian idempotent P is called an
orthogonal projector: it carries out the projection of ''' onto ~ = 1m P
PROOF. For any x E IF", we can write x = Xl + X 2' where Xl = (I - p).t~
~Iong the orthogona/complement.9i = Ker P.
Ker P and X2 = Px elm P. Hence 'n = Ker P + 1m P. Finally, if x E Ke~
P f"'l 1m P, it must be the zero element. For, x e Ker P implies P X =
while X elm P implies by Theorem 1 that (1 - P) x = O. These two co t Strictly speaking, this construction first defines a linear transformation from 9'" to 9".
elusions mean that x = 0 and so the sum Ker P + Im P is direct. he matrix P is then ils representation in the standard basis.
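The basic facts about projectors collected in Theorems 1-3, and the orthogonal projector P = A(A*A)^{-1}A* constructed just below, are easy to verify numerically. In this sketch (Python with NumPy) the subspace is spanned by the columns of an arbitrary illustrative matrix:

    import numpy as np

    rng = np.random.default_rng(13)
    n, k = 6, 2
    A = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))   # basis of a subspace S, as columns

    # Orthogonal projector onto S = Im A (see the construction below): P = A (A*A)^{-1} A*.
    P = A @ np.linalg.solve(A.conj().T @ A, A.conj().T)
    assert np.allclose(P @ P, P)                        # idempotent
    assert np.allclose(P, P.conj().T)                   # Hermitian, hence an orthogonal projector
    assert np.linalg.matrix_rank(P) == k                # projects onto a k-dimensional subspace

    # Theorem 1: Q = I - P is the complementary projector, with Im Q = Ker P.
    Q = np.eye(n) - P
    assert np.allclose(Q @ Q, Q) and np.allclose(P @ Q, 0)

    # Theorem 3: a projector is simple, with eigenvalues 0 and 1 only.
    w = np.linalg.eigvalsh(P)
    assert np.allclose(np.sort(w), np.r_[np.zeros(n - k), np.ones(k)])

    # Theorem 2: every x splits as x = Qx + Px with Px in Im P and Qx in Ker P.
    x = rng.standard_normal(n)
    assert np.allclose(x, Q @ x + P @ x) and np.allclose(P @ (Q @ x), 0)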

The effect of an orthogonal projector can be described in terms of its eigenvectors. To see this, we choose an orthonormal set of (right) eigenvectors {x_1, x_2, ..., x_n} for P. Then the corresponding left eigenvectors are just the adjoints of the (right) eigenvectors and the representation (1) becomes (see Exercise 5.2.5)
    P = Σ_{j=1}^{r} x_j x_j^*,
and for any x ∈ F^n,
    Px = Σ_{j=1}^{r} x_j x_j^* x = Σ_{j=1}^{r} (x, x_j) x_j.
Now we see that each term of this summation is an orthogonal projection of x onto the span of a (right) eigenvector x_j. Hence P is an orthogonal projector onto the space spanned by x_1, x_2, ..., x_r, that is, onto Im P.

It is instructive to make a more direct approach to orthogonal projectors as follows. Suppose we are given a subspace S ⊂ F^n with a basis {u_1, u_2, ..., u_k} and we want to find an orthogonal projector onto S. Defining the n × k matrix A = [u_1, u_2, ..., u_k], a typical member of S is of the form Ax for some x ∈ F^k. Given an a ∈ F^n, we want to find a vector Ax ∈ S that can appropriately be called the projection of a onto S.

We determine Ax by demanding that a − Ax should be orthogonal to every member of S. This will be the case if and only if a − Ax is orthogonal to each of the basis vectors. Thus, in the standard inner product,
    (a − Ax, u_j) = u_j^*(a − Ax) = 0,    j = 1, 2, ..., k,
which is equivalent to the matrix equation
    A^*(a − Ax) = 0.
Solving for x, we obtain x = (A^*A)^{-1}A^*a, noting that the inverse exists because A^*A is the Gram matrix of linearly independent vectors (Theorem 3.12.2). Thus if P is to be an orthogonal projector on S, we must have
    Pa = Ax = A(A^*A)^{-1}A^*a,    a ∈ F^n,
that is,
    P = A(A^*A)^{-1}A^*.
In fact, it is easy to verify that P is idempotent and Hermitian. Thus, we have determined an orthogonal projector on S in terms of an arbitrary basis for S. What is of particular interest, however, is the fact that P is independent of the choice of basis vectors for S. To see this, observe that if X is the n × k matrix of another set of basis vectors (and k ≤ n), then there is a nonsingular k × k matrix M such that A = XM, whence
    P = XM(M^*X^*XM)^{-1}M^*X^* = XM(M^{-1}(X^*X)^{-1}(M^*)^{-1})M^*X^* = X(X^*X)^{-1}X^*.
Hence, we get the same projector P whatever the choice of basis vectors for S may be. In particular, if we choose an orthonormal basis for S then X^*X = I and
    P = XX^* = Σ_{j=1}^{k} x_j x_j^*,
and we have returned to the form for P that we had before.

Exercise 1. If P ∈ F^{n×n} is a projector on S_1 along S_2, show that:
(a) P^* is a projector on S_2^⊥ along S_1^⊥ (see Section 3.13);
(b) for any nonsingular H ∈ F^{n×n}, the matrix H^{-1}PH is a projector. Describe its image and kernel. □

The next results concern properties of invariant subspaces (see Section 4.7), which can be readily described with the help of projectors.

Theorem 4. Let A ∈ F^{n×n}. A subspace S of F^n is A-invariant if and only if PAP = AP for any projector P of F^n onto S.

PROOF. Let S be A-invariant, so that Ax ∈ S for all x ∈ S. If P is any projector onto S, then (I − P)Ax = Ax − Ax = 0 for all x ∈ S. Since for any y ∈ F^n the vector x = Py is in S, it follows that (I − P)APy = 0 for all y ∈ F^n, and we have PAP = AP, as required.

Conversely, if P is a projector onto S and PAP = AP, then for any x ∈ S we have Px = x and Ax = APx = PAPx. Thus Ax ∈ Im P = S and, since x is any member of S, the space S is A-invariant.

Note that the sufficient condition for S to be invariant is reduced, in fact, to the existence of at least one projector P onto S such that QAP = (I − P)AP = 0.

A particularly important role is played in the spectral theory of matrices by A-invariant subspaces that have a complementary subspace that is also A-invariant. Such a subspace is said to be A-reducing. Thus, a subspace S_1 ⊂ F^n is A-reducing if A(S_1) ⊂ S_1 and there is a subspace S_2 such that A(S_2) ⊂ S_2 and F^n = S_1 ∔ S_2. In this case, we may also say that the pair of subspaces S_1, S_2 is A-reducing. When a subspace is A-reducing, there is a more symmetric characterization than that of Theorem 4.
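The formula P = A(A^*A)^{-1}A^* and its independence of the chosen basis can be verified directly. A minimal NumPy sketch follows (an added illustration, not from the original text); the vectors and the nonsingular change-of-basis matrix M are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))                  # columns u1, u2: a basis of a subspace S of R^5
M = rng.standard_normal((2, 2)) + 3*np.eye(2)    # a nonsingular 2 x 2 matrix; columns of A @ M span the same S

def orth_proj(B):
    # P = B (B* B)^{-1} B*, the orthogonal projector onto the column space of B
    return B @ np.linalg.inv(B.conj().T @ B) @ B.conj().T

P1 = orth_proj(A)
P2 = orth_proj(A @ M)

print(np.allclose(P1, P1 @ P1))                  # idempotent
print(np.allclose(P1, P1.conj().T))              # Hermitian
print(np.allclose(P1, P2))                       # independent of the basis chosen for S

a = rng.standard_normal(5)
print(np.allclose(A.conj().T @ (a - P1 @ a), 0)) # a - Pa is orthogonal to every basis vector of S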

Theorem 5. Let A ∈ F^{n×n}. A pair of subspaces S_1, S_2 ⊂ F^n is A-reducing if and only if AP = PA, where P is the projector onto S_2 along S_1.

PROOF. Let F^n = S_1 ∔ S_2 and A(S_1) ⊂ S_1, A(S_2) ⊂ S_2. For any x ∈ F^n write x = x_1 + x_2, where x_1 ∈ S_1, x_2 ∈ S_2. Then APx = Ax_2 and, since Ax_1 ∈ S_1 = Ker P, PAx_1 = 0. But Ax_2 ∈ S_2 = Im P, so
    PAx = PAx_2 = Ax_2 = APx
for all x ∈ F^n. Thus PA = AP.

Now let P be a projector on S_2 along S_1 and suppose that AP = PA. We are to show that both S_1 and S_2 are A-invariant. If x_2 ∈ S_2 then Px_2 = x_2 and
    Ax_2 = APx_2 = PAx_2 ∈ S_2,
so that A(S_2) ⊂ S_2. Next, if x_1 ∈ S_1,
    Ax_1 = A(I − P)x_1 = (I − P)Ax_1 ∈ S_1,
so that A(S_1) ⊂ S_1 and the proof is complete.

Let us take this opportunity to bring together several points concerning the image and kernel of a matrix. For the sake of clarity and future use, we consider a matrix A ∈ C^{m×n} to be a linear transformation from C^n into C^m, although the discussion can also be put into the context of abstract linear transformations. Recall first that Im A ⊂ C^m and Ker A ⊂ C^n. Then, by Theorem 4.5.5,
    dim(Im A) + dim(Ker A) = n.                                       (3)
In particular, if m = n it is possible that this statement can be strengthened to yield
    Im A ∔ Ker A = C^n.                                               (4)
Theorem 2 says that this is true if A^2 = A, that is, if A is a projector, and it is easily seen to be true if A is normal (and especially, if A is Hermitian or unitary). In the latter case more is true: when A is normal the direct sum is orthogonal,
    Im A ⊕ Ker A = C^n.
However, it is important to bear in mind that Eq. (4) is not generally true for square matrices, as the example
    A = [0 1; 0 0]
quickly shows.

The general property (3) can be seen as one manifestation of the results in Exercise 5.1.7. These results are widely useful and are presented here as a formal proposition about matrices, including some of the above remarks.

Proposition 1. If A ∈ C^{m×n} then
    C^n = Ker A ⊕ Im A^*,    C^m = Ker A^* ⊕ Im A.

Exercise 2. Use Theorem 5.1.2 to prove Proposition 1.

Exercise 3. Verify that the matrix
    (1/3) [2 1 −1; 1 2 1; −1 1 2]
is an orthogonal projector and find bases for its image and kernel.

Exercise 4. Let P and Q be commuting projectors.
(a) Show that PQ is a projector and
    Im PQ = (Im P) ∩ (Im Q),    Ker PQ = (Ker P) + (Ker Q).
(b) Show that if we define P ⊞ Q = P + Q − PQ (the "Boolean sum" of P and Q), then P ⊞ Q is a projector and
    Im(P ⊞ Q) = (Im P) + (Im Q),    Ker(P ⊞ Q) = (Ker P) ∩ (Ker Q).
Hint. For part (b) observe that I − (P ⊞ Q) = (I − P)(I − Q) and use part (a).

Exercise 5. Let P and Q be projectors. Show that QP = Q if and only if Ker P ⊂ Ker Q.

Exercise 6. Let T ∈ L(U, V) (as in Theorem 5.1.4). Show that there is a unique solution x_0 ∈ Im T^* of T(x) = b if and only if b ∈ Im T.
Hint. Using Exercise 5.1.7 define P ∈ L(U) as the orthogonal projector onto Im T^* along Ker T. If x satisfies T(x) = b, define x_0 = Px. □
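Proposition 1 can be checked numerically by extracting orthonormal bases for Ker A and Im A^* from a singular value decomposition. The NumPy sketch below is an added illustration with a randomly chosen matrix; it is not part of the original text.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))            # a 3 x 5 matrix, viewed as a map from C^5 to C^3

U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))                 # rank of A
ker_A  = Vh[r:].conj().T                   # columns: an orthonormal basis of Ker A
im_Ast = Vh[:r].conj().T                   # columns: an orthonormal basis of Im A* (the row space of A)

print(np.allclose(A @ ker_A, 0))                      # these columns really lie in Ker A
B = np.hstack([ker_A, im_Ast])
print(B.shape, np.linalg.matrix_rank(B))              # 5 columns of rank 5: the whole space is spanned
print(np.allclose(ker_A.conj().T @ im_Ast, 0))        # and the two summands are orthogonal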

5.9 Matrices over the Field of Real Numbers

Let A ∈ R^{n×n}. Clearly, all statements made above for an arbitrary field F are valid in the particular case F = R. However, R fails to be algebraically closed and therefore matrices in R^{n×n} may lose some vital properties. For instance, a matrix with real elements may have no (real) eigenvalues. An example of such a matrix is
    [ μ  ν]
    [−ν  μ],    ν ∈ R,  ν ≠ 0.                                        (1)
(See also the matrix in Example 4.1.2, in which γ is not an integer multiple of π.) On the other hand, if the spectrum of a real matrix is entirely real, then there is no essential difference between the real and complex cases.

Now we discuss a process known as the method of complexification, which reduces the study of real matrices to that of matrices with complex entries. First we observe the familiar Cartesian decomposition z = x + iy, where z ∈ C^n and x, y ∈ R^n. Hence if {a_1, a_2, ..., a_n} is a basis in R^n,
    x = Σ_{j=1}^n α_j a_j    and    y = Σ_{j=1}^n β_j a_j,
then
    z = x + iy = Σ_{j=1}^n (α_j + iβ_j) a_j.
Thus C^n can be viewed as the set of all vectors having complex coordinates in this basis, while R^n is the set of all vectors with real coordinates with respect to the same basis. The last remark becomes obvious if the basis of R^n is standard. Note that the vectors x in R^n can be viewed as vectors of the form x + i0 in C^n.

Proceeding to a matrix A ∈ R^{n×n}, we extend its action to vectors from C^n by the rule
    Ãz = Ax + iAy                                                     (2)
for z = x + iy, x, y ∈ R^n. Note that the matrices A and Ã are in fact the same and therefore have the same characteristic polynomial with real coefficients. It should also be emphasized that only real numbers are admitted as eigenvalues of A, while only vectors with real entries (i.e., real vectors) are admitted as eigenvectors of A.

Let λ_0 ∈ σ(Ã) and Ãx_0 = λ_0 x_0, where x_0 ∈ R^n and x_0 ≠ 0. By Eq. (2), Ax_0 = Ãx_0 = λ_0 x_0 and the matrix A has the same eigenvalue and eigenvector. If λ_0 is a complex zero of the characteristic polynomial c(λ) of A, then so is λ̄_0, and λ_0, λ̄_0 ∈ σ(Ã). The eigenvectors of Ã corresponding to λ_0 and λ̄_0, respectively, are mutually complex conjugate, that is, of the form x + iy and x − iy. Indeed, if Ãz = λ_0 z, where z = x + iy, then it is easily checked that Ã(x − iy) = λ̄_0(x − iy). Note that since λ_0 ≠ λ̄_0, the eigenvectors x + iy and x − iy are linearly independent (Proposition 4.10.4). Consider the subspace S = span{x + iy, x − iy}. It has a real basis {x, y}. Referring to S as a subspace of R^n, that is, as the set of all linear combinations of x, y with real coefficients, it is clear from the relations
    Ax = μx − νy,    Ay = νx + μy,                                    (3)
where μ + iν = λ_0, that S is A-invariant. However, the matrix A has no (real) eigenvectors in S. (Compare with Exercise 4.9.8.) For example, if A is the coefficient matrix given in (1), the linear system
    A(αx + βy) = λ(αx + βy),    λ ∈ R,
implies that αx + βy = 0.

Thus, if the characteristic polynomial of a real matrix A has a complex zero, then it is associated with a two-dimensional A-invariant subspace S of R^n, although A restricted to S has no eigenvalues and, consequently, no eigenvectors in S.

We summarize this discussion.

Theorem 1. A real matrix A ∈ R^{n×n} has a one-dimensional (two-dimensional) invariant subspace in R^n associated with each real (complex) zero of the characteristic polynomial.

In the first case, the invariant subspace is spanned by a (real) eigenvector of A, while in the second case the matrix has no eigenvectors lying in this invariant subspace.

A matrix U ∈ R^{n×n} satisfying the condition U^T U = U U^T = I is an orthogonal matrix. The properties of orthogonal matrices viewed as matrices in C^{n×n} can be deduced from those of unitary ones. In particular, since a unitary matrix is simple, it is similar to a diagonal matrix of its eigenvalues. Some of the eigenvalues may be complex numbers and therefore are not eigenvalues of U viewed as a matrix in R^{n×n}. Hence the following result is of interest.

Corollary 1. Any orthogonal matrix U ∈ R^{n×n} is similar (over R) to a matrix of the form
    diag[1, ..., 1, −1, ..., −1, D_1, D_2, ..., D_k],
where D_i ∈ R^{2×2} (i = 1, 2, ..., k) are matrices of the form (1), in which μ = cos γ, ν = −sin γ, γ ≠ πt, t = 0, ±1, ±2, ....
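The relations (3) and the two-dimensional invariant subspace are easy to exhibit for the matrix (1). The following NumPy sketch is an added illustration (not part of the original text), with arbitrary values of μ and ν.

import numpy as np

mu, nu = 0.5, 2.0
A = np.array([[ mu, nu],
              [-nu, mu]])                  # the matrix (1): no real eigenvalues when nu != 0

print(np.linalg.eigvals(A))                # mu + i nu and mu - i nu

w, V = np.linalg.eig(A)
k = int(np.argmax(w.imag))                 # select the eigenvalue mu + i nu
z = V[:, k]                                # eigenvector z = x + iy of the complexified matrix
x, y = z.real, z.imag
print(np.allclose(A @ x, mu*x - nu*y))     # relations (3): A x = mu x - nu y
print(np.allclose(A @ y, nu*x + mu*y))     #                A y = nu x + mu y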

5.10 Bilinear, Quadratic, and Hermitian Forms

Let A ∈ F^{n×n} and x, y ∈ F^n. Consider the function f_A defined on F^n × F^n by
    f_A(x, y) ≜ (Ax, y).                                              (1)
Recalling the appropriate properties of the inner product, it is clear that, for a fixed A, the function f_A in Eq. (1) is additive with respect to both vector variables (the subscript A is omitted):
    f(x_1 + x_2, y) = f(x_1, y) + f(x_2, y),
    f(x, y_1 + y_2) = f(x, y_1) + f(x, y_2),                          (2)
and homogeneous with respect to the first vector variable:
    f(αx, y) = αf(x, y),    α ∈ F.                                    (3)
If F = R, the function is also homogeneous with respect to the second vector variable:
    f(x, βy) = βf(x, y),    β ∈ R,                                    (4)
while in the case F = C, the function f = f_A is conjugate-homogeneous with respect to the second argument:
    f(x, βy) = β̄ f(x, y),    β ∈ C.                                  (4')
Now we formally define a function f(x, y), defined on pairs of vectors from F^n, to be bilinear (respectively, conjugate bilinear) if it satisfies the conditions (2) through (4) (respectively, (2), (3), and (4')). Consider the conjugate bilinear function f_A(x, y) defined by Eq. (1) on C^n × C^n. If B = {x_1, x_2, ..., x_n} is a basis in C^n and x = Σ_{i=1}^n α_i x_i, y = Σ_{j=1}^n β_j x_j, then
    f_A(x, y) = (Ax, y) = Σ_{i,j=1}^n (Ax_i, x_j) α_i β̄_j = Σ_{i,j=1}^n γ_{ij} α_i β̄_j,     (5)
where
    γ_{ij} = (Ax_i, x_j),    i, j = 1, 2, ..., n.                     (6)
When expressed in coordinates as in Eq. (5), f_A is generally called a conjugate bilinear form or simply a bilinear form according as F = C or F = R, respectively. The form is said to be generated by the matrix A, and the matrix Γ = [γ_{ij}]_{i,j=1}^n is said to be the matrix of the form f_A with respect to the basis B. Our first objective is to investigate the connection between matrices of the bilinear form (1) with respect to different bases and, in particular, between them and the generating matrix A.

We first note that the matrix A is, in fact, the matrix of the form (1) with respect to the standard basis {e_1, e_2, ..., e_n} in C^n. Furthermore, if P denotes the transition matrix from the standard basis to a basis {x_1, x_2, ..., x_n}, then, by Eq. (3.9.3), x_i = Pe_i, i = 1, 2, ..., n. Hence, for i, j = 1, 2, ..., n,
    γ_{ij} = (Ax_i, x_j) = (APe_i, Pe_j) = (P^*APe_i, e_j).
Recalling that the standard basis is orthonormal in the standard inner product, we find that the γ_{ij} are just the elements of the matrix P^*AP. Thus matrices of the same conjugate bilinear form with respect to different bases must be congruent.

Theorem 1. Matrices of a conjugate bilinear form with respect to different bases in C^n generate an equivalence class of congruent matrices.

Now consider some special kinds of conjugate bilinear forms. A conjugate bilinear form f_A(x, y) = (Ax, y) is Hermitian if f_A(x, y) is the complex conjugate of f_A(y, x) for all x, y ∈ C^n.

Exercise 1. Check that a conjugate bilinear form is Hermitian if and only if the generating matrix A is Hermitian. Moreover, all matrices of such a form are Hermitian matrices having the same inertia characteristics.
Hint. Use Theorem 1 and Sylvester's law of inertia. □

Particularly interesting functions generated by conjugate bilinear forms are the quadratic forms obtained by substituting y = x in f_A(x, y), that is, the forms f_A(x) = (Ax, x) defined on F^n with values in F.

Exercise 2. By using the identity
    f(x, y) = (1/4)[f(x + y, x + y) + i f(x + iy, x + iy) − i f(x − iy, x − iy) − f(x − y, x − y)],
show that any conjugate bilinear form is uniquely determined by the corresponding quadratic form.

Exercise 3. Show that the conjugate bilinear form f_A(x, y) = (Ax, y) is Hermitian if and only if the corresponding quadratic form is a real-valued function. □

The rest of this section is devoted to the study of quadratic forms generated by a Hermitian matrix, therefore called Hermitian forms. Clearly, such a form is defined on C^n but now takes values in R. In this case
    f_A(x) = (Ax, x) = Σ_{i,j=1}^n γ_{ij} α_i ᾱ_j,                    (7)
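That the matrix of the form in a new basis is the congruent matrix P^*AP can be checked by evaluating the form in two ways. A short NumPy sketch follows (an added illustration, not part of the original text, with randomly chosen A, P and coordinate vectors).

import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)) + 1j*rng.standard_normal((n, n))   # generating matrix of the form
P = rng.standard_normal((n, n)) + 1j*rng.standard_normal((n, n))   # transition matrix (assumed nonsingular), x_i = P e_i

alpha = rng.standard_normal(n) + 1j*rng.standard_normal(n)         # coordinates of x in the new basis
beta  = rng.standard_normal(n) + 1j*rng.standard_normal(n)         # coordinates of y in the new basis
x, y = P @ alpha, P @ beta

f_direct = y.conj() @ (A @ x)                                      # f_A(x, y) = (Ax, y) with (u, v) = v* u
f_coords = beta.conj() @ (P.conj().T @ A @ P) @ alpha              # the same value computed with the congruent matrix P*AP

print(np.allclose(f_direct, f_coords))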

where γ_{ij} = γ̄_{ji} (i, j = 1, 2, ..., n) and the α_i (i = 1, 2, ..., n) are the coordinates of x with respect to some basis in C^n. In view of Theorem 1, the Hermitian matrices A = [γ_{ij}]_{i,j=1}^n defined by Eq. (7) are congruent. Consequently, Theorem 5.5.1 asserts the existence of a basis in C^n with respect to which the matrix of the form is A_0 = diag[I_s, −I_{r−s}, 0]. In this (canonical) basis, the Hermitian form is reduced to an (algebraic) sum of squares:
    (A_0 x, x) = Σ_{i,j=1}^n γ_{ij}^{(0)} α_i ᾱ_j = Σ_{i=1}^s |α_i|^2 − Σ_{j=s+1}^r |α_j|^2,     (8)
where [α_1 α_2 ... α_n]^T is the representation of x with respect to the canonical basis.

Theorem 2. For any Hermitian form (Ax, x), there exists a basis in C^n with respect to which the form is represented as the sum of squares in Eq. (8), where s = π(A) and r = rank A.

Corollary 1 (Sylvester's law of inertia†). The numbers of positive and negative squares in Eq. (8) are invariants of the Hermitian form f_A, denoted s = π(A), r − s = ν(A).

Note that the representation (8) is referred to as the first canonical form of (Ax, x), in contrast to the second canonical form:
    (Ax, x) = Σ_{i=1}^r λ_i |α_i|^2,                                  (9)
where {λ_i}_{i=1}^r are the nonzero eigenvalues of A and [α_1 α_2 ... α_n]^T is the representation of x with respect to a corresponding orthonormal eigenbasis. The representation (9) readily follows from Theorem 5.2.1. In addition, since A = UDU^*, where D = diag[λ_1, λ_2, ..., λ_r, 0, ..., 0] and U is unitary, the second canonical form (9) can be obtained by a unitary congruent transformation of bases. This is an advantage of this canonical form. (See the next section.)

Consider some particular cases of Hermitian forms. By analogy with matrices, define the Hermitian form f(x) to be positive (respectively, negative) definite if f(x) > 0 (respectively, f(x) < 0) for all nonzero x ∈ C^n. A similar analogy extends to the definition of semi-definite Hermitian forms also.

Exercise 4. Check that a Hermitian form is definite or semi-definite according as its generating matrix (or any other matrix of the form) is definite or semi-definite of the same kind.
Hint. See Theorem 5.3.3. □

It should also be noted that a parallel exists between Hermitian matrices and Hermitian forms in that the rank and the inertia characteristics of a matrix can be ascribed to the form that it generates. Also, the simultaneous reduction of two Hermitian matrices to diagonal matrices by the same congruent transformation, as discussed in Section 5.5, is equivalent to a simultaneous reduction of the corresponding Hermitian forms to sums of squares.

Exercise 5. Consider a Hermitian form f(x) defined on R^3. The set of points in R^3 (if any) that satisfies the equation f(x_1, x_2, x_3) = 1 is said to be a central quadric. Show that if the equation of the quadric is referred to suitable new axes, it is reduced to the form
    λ_1 z_1^2 + λ_2 z_2^2 + λ_3 z_3^2 = 1.                            (10)
The new coordinate axes are called the principal axes of the quadric. (Note that the nature of the quadric surface is determined by the relative magnitudes and signs of λ_1, λ_2, λ_3.)
Hint. Use f(x_1, x_2, x_3) = x^*Hx and Eq. (9).

Exercise 6. Verify that λ_1 ≤ λ_2 ≤ λ_3 in Eq. (10) are the eigenvalues of H and that, for H > 0, the quadric surface (10) is an ellipsoid with the point (λ_1^{-1/2}, 0, 0) farthest from the origin.

Exercise 7. Examine the nature of the central quadric whose equation is
    3x_1^2 + 3x_2^2 − 2x_2x_3 − 4x_1x_3 − 4x_1x_2 = 1.
Hint. Diagonalize the matrix of the Hermitian form on the left using Theorem 5.2.1. □

5.11 Finding the Canonical Forms

We shall describe in this section some techniques for computing the canonical forms of a given Hermitian quadratic form under congruence. If A is an n × n Hermitian matrix, we note that the second canonical form of (Ax, x) is easily found if all the eigenvalues {λ_i}_{i=1}^n and eigenvectors {x_i}_{i=1}^n are known. Indeed, proceeding from {x_i}_{i=1}^n to an orthonormal eigenbasis {x̂_i}_{i=1}^n for A in C^n and constructing the unitary matrix
    U = [x̂_1 x̂_2 ... x̂_n],
we deduce by use of Theorem 5.2.1 that U^*AU is a diagonal matrix D of the eigenvalues of A. Hence for x = Σ_{i=1}^n α_i x̂_i, we obtain the second canonical form
    (Ax, x) = Σ_{i=1}^r λ_i |α_i|^2,

† Philosophical Mag. 4 (1852), 138-142.
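The inertia of a Hermitian form, and hence the nature of a central quadric, can be read off from the eigenvalues of its matrix. The NumPy sketch below (an added illustration, not part of the original text) treats the quadric of Exercise 7; the matrix H written here is simply the symmetric matrix of that quadratic form.

import numpy as np

# Matrix of the real symmetric form 3x1^2 + 3x2^2 - 2x2x3 - 4x1x3 - 4x1x2
H = np.array([[ 3., -2., -2.],
              [-2.,  3., -1.],
              [-2., -1.,  0.]])

lam = np.linalg.eigvalsh(H)                # eigenvalues in increasing order
pos = int(np.sum(lam >  1e-12))            # number of positive squares, pi(H)
neg = int(np.sum(lam < -1e-12))            # number of negative squares, nu(H)
print(lam, "inertia:", (pos, neg, 3 - pos - neg))
# In the principal axes the quadric reads lam[0] z1^2 + lam[1] z2^2 + lam[2] z3^2 = 1;
# mixed signs among the eigenvalues mean the surface is a hyperboloid rather than an ellipsoid.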

where r = rank A and λ_1, ..., λ_r denote the nonzero eigenvalues of A corresponding (perhaps, after reordering) to the eigenvectors x̂_1, ..., x̂_r. However, the difficulties involved in finding the full spectral properties of a matrix encourage us to develop a simpler method for reducing a Hermitian form. A method suggested by Lagrange allows us to obtain the first canonical form, as well as the transforming matrix, by relatively simple calculations.

For convenience we confine our attention to real Hermitian forms. If x = [α_1 α_2 ... α_n]^T, we write
    f(x) = Σ_{i,j=1}^n γ_{ij} α_i α_j
         = γ_{11}α_1^2 + ... + γ_{nn}α_n^2 + 2γ_{12}α_1α_2 + ... + 2γ_{n−1,n}α_{n−1}α_n,          (1)
where all numbers involved are real and γ_{ij} = γ_{ji} for i, j = 1, 2, ..., n. Note that the expression (1) can be viewed as a real function in n real variables α_1, α_2, ..., α_n.

Case 1. Assume that not all of the γ_{ii} are zero. Then, assuming for instance that γ_{11} ≠ 0, we rewrite Eq. (1) in the form
    f(x) = γ_{11}(α_1^2 + 2(γ_{12}/γ_{11})α_1α_2 + ... + 2(γ_{1n}/γ_{11})α_1α_n) + f_1(x_1),
where x_1 = [α_2 α_3 ... α_n]^T. Obviously, f_1(x_1) is a Hermitian form. Following the process of "completing the square," we obtain
    f(x) = γ_{11}(α_1 + (γ_{12}/γ_{11})α_2 + ... + (γ_{1n}/γ_{11})α_n)^2 + f_2(x_1),
where f_2(x_1) is another Hermitian form in x_1. Thus, the nonsingular transformation of coordinates determined by
    β_1 = α_1 + (γ_{12}/γ_{11})α_2 + ... + (γ_{1n}/γ_{11})α_n,    β_2 = α_2, ..., β_n = α_n,
transforms the given form into the form
    γ_{11}β_1^2 + f_2(x_1),                                           (2)
and we can write x_1 = [β_2 β_3 ... β_n]^T.

Case 2. If γ_{11} = γ_{22} = ... = γ_{nn} = 0, then we consider a coefficient γ_{ij} ≠ 0, where i ≠ j, and assume, for instance, that γ_{12} ≠ 0. We now make a congruence transformation of f determined by the coordinate transformation β_1 = α_1, β_2 = α_2 − α_1, β_3 = α_3, ..., β_n = α_n. It is easily seen that the form determined by the transformed matrix has a term in β_1^2. We may now apply the technique described in Case 1 to reduce f to the form (2).

Thus, we have proved that any real Hermitian form can be reduced to the form (2). Applying the same process to f_2(x_1), and so on, the reduction is completed in at most n − 1 such steps.

Exercise 1. Find the rank and inertia of the form f(x) = 2α_1α_2 + 2α_2α_3 + α_3^2 and determine a transformation of the coordinates α_1, α_2, α_3 that reduces f to a sum of squares.

SOLUTION. Performing the transformation α_1 = β_3, α_2 = β_2, α_3 = β_1, we have
    f(x) = (β_1^2 + 2β_1β_2) + 2β_2β_3 = (β_1 + β_2)^2 − β_2^2 + 2β_2β_3.
Now put γ_1 = β_1 + β_2, γ_2 = β_2, γ_3 = β_3 and obtain
    f(x) = γ_1^2 − γ_2^2 + 2γ_2γ_3 = δ_1^2 − δ_2^2 + δ_3^2,
where δ_1 = γ_1, δ_2 = γ_2 − γ_3, δ_3 = γ_3. Thus the rank of the form is r = 3, the number of positive squares is π = 2, and the signature is π − ν = 2 − 1 = 1. Note that the transformation yielding the reduction is obtained by combining the preceding transformations:
    [δ_1]   [ 0  1  1] [α_1]
    [δ_2] = [−1  1  0] [α_2].
    [δ_3]   [ 1  0  0] [α_3]
Inverting the transforming matrix, it is found that substitutions which reduce f(x) to a sum of squares are given by
    [α_1]   [0  0  1] [δ_1]
    [α_2] = [0  1  1] [δ_2].
    [α_3]   [1 −1 −1] [δ_3]
Note also that, by using Lagrange's method, we have avoided the computation of eigenvalues of the matrix
    A = [0 1 0; 1 0 1; 0 1 1]
that generates the form. Such a computation would require the calculation of all the zeros of the characteristic equation λ^3 − λ^2 − 2λ + 1 = 0. As we already know, two of them must be positive and one negative. □

Another method (due to Jacobi) for finding the inertia of A or, equivalently, of the corresponding Hermitian form will be discussed in Section 8.6.
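The worked example can be checked by a direct congruence computation: with the substitution α = Sδ written out above, the matrix S^T A S should be diag(1, −1, 1). The NumPy sketch below is an added check (not part of the original text); the matrices restate the example as reconstructed above.

import numpy as np

A = np.array([[0., 1., 0.],        # matrix of f(x) = 2a1a2 + 2a2a3 + a3^2
              [1., 0., 1.],
              [0., 1., 1.]])
S = np.array([[0.,  0.,  1.],      # the substitution alpha = S delta found in the solution
              [0.,  1.,  1.],
              [1., -1., -1.]])

print(S.T @ A @ S)                 # diag(1, -1, 1): two positive squares, one negative
print(np.linalg.eigvals(A))        # the eigenvalue signs agree with this inertia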

Exercise 2. Find the rank and inertia of the following Hermitian forms and find transformations reducing them to sums of squares:
(a) 2α_1^2 − α_2^2 + α_1α_2 + α_1α_3;
(b) α_1^2 + 2(α_1α_2 + α_2α_3 + α_3α_4 + α_4α_5 + α_1α_5);
(c) 2α_1α_2 − α_1α_3 + α_1α_4 − α_2α_3 + α_2α_4 − 2α_3α_4.
Answers. (a) r = 3, π = 1, ν = 2; (b) r = 5, π = 3, ν = 2; (c) r = 3, π = 1, ν = 2. □

5.12 The Theory of Small Oscillations

In this section we introduce an important physical problem whose solution depends on the problem resolved in Theorem 5.5.2: the simultaneous reduction of two Hermitian matrices (or forms). We will describe the small oscillations of mechanical systems, although the resulting equations and their solution also arise in quite different circumstances; the oscillations of current and voltage in electrical circuits, for example.

Consider the motion of an elastic mechanical system whose displacement from some stable equilibrium position can be specified by n coordinates. Let these coordinates be p_1, p_2, ..., p_n. If the system performs small oscillations about the equilibrium position, its kinetic energy T is represented by a quadratic form in the time rates of change of the coordinates, ṗ_1, ..., ṗ_n, say. Thus, there exists a real symmetric matrix A such that
    T = (1/2) Σ_{j=1}^n Σ_{k=1}^n a_{jk} ṗ_j ṗ_k = (1/2) ṗ^T A ṗ,
where p = [p_1 p_2 ... p_n]^T and ṗ = [ṗ_1 ṗ_2 ... ṗ_n]^T. Furthermore, for any motion whatever, ṗ ∈ R^n and, for physical reasons, T is positive. Thus A is a real, positive definite matrix (see Exercise 5.10.4).

When the system is distorted from its equilibrium position, there is generally potential energy stored in the system. It can be shown that if the coordinates are defined so that p = 0 in the equilibrium position and the potential energy V is agreed to be zero in this position, then for small oscillations,
    V = (1/2) Σ_{j=1}^n Σ_{k=1}^n c_{jk} p_j p_k = (1/2) p^T C p,
where C is a real symmetric nonnegative (or positive) definite matrix, that is, V ≥ 0 for any p ∈ R^n.

We define the derivative of a matrix (or a vector) P to be the matrix whose elements are the derivatives of those of P. For any conformable matrices P and Q whose elements are differentiable functions of x, it is easily seen that
    d/dx (PQ) = (dP/dx) Q + P (dQ/dx).                                (1)
We now view p_1, p_2, ..., p_n, ṗ_1, ṗ_2, ..., ṗ_n as a set of 2n independent variables and we obtain
    ∂T/∂ṗ_i = (1/2)((∂ṗ^T/∂ṗ_i) A ṗ + ṗ^T A (∂ṗ/∂ṗ_i)) = (1/2)(e_i^T A ṗ + ṗ^T A e_i) = A_{i*} ṗ,
where A_{i*} denotes the ith row of A. Similarly,
    ∂V/∂p_i = C_{i*} p.
Equations of motion for the system (with no external forces acting) are obtained by Lagrange's method in the following form:
    d/dt (∂T/∂ṗ_i) − ∂T/∂p_i = −∂V/∂p_i,    i = 1, 2, ..., n.
From the above expressions for the derivatives we find that the equations of motion are
    A_{i*} p̈ + C_{i*} p = 0,    i = 1, 2, ..., n,    or    A p̈ + C p = 0.      (2)
Following the methods in Section 4.13, and anticipating sinusoidal solutions, we look for a solution p(t) of Eq. (2) of the form p(t) = q e^{iωt}, where q is independent of t and ω is real. Then p̈ = −ω^2 q e^{iωt} and, if we write λ = ω^2, Eq. (2) becomes
    (−λA + C)q = 0,                                                   (3)

or
    (λI − A^{-1}C)q = 0.                                              (4)
The admissible values of λ (defining natural frequencies of vibration) are the n eigenvalues of the matrix A^{-1}C. The modes of vibration are determined by corresponding right eigenvectors. However, we need to ensure that the eigenvalues are real and nonnegative before our solutions have the required form with a real value of ω.

The algebraic problem posed by Eq. (3) is just a special case of Eq. (5.5.8) and, using the argument of that section, the eigenvalues of A^{-1}C are just those of A^{-1/2}CA^{-1/2}. But the latter matrix is obviously nonnegative (or positive) definite along with C. Hence, all eigenvalues of Eq. (4) are real and nonnegative (or positive if C is positive definite). Write for the jth eigenvalue λ_j = ω_j^2 (ω_j ≥ 0), and set W = diag[ω_1, ..., ω_n].

The proof of Theorem 5.5.2 shows that if we define new coordinates in C^n by q = A^{-1/2}Uξ, where U^*U = I and
    U^*(A^{-1/2}CA^{-1/2})U = Λ = W^2,
then A and C can be simultaneously diagonalised. (In fact, U is here a (real) orthogonal matrix.) Reverting to the time-dependent equations, this suggests the definition of new generalised coordinates ξ(t) for the system, given by p(t) = A^{-1/2}Uξ(t), so that p̈ = A^{-1/2}Uξ̈. Substitution into Eq. (2) and premultiplication by U^*A^{-1/2} leads to the "decoupled" equation:
    ξ̈ + W^2 ξ = 0,
or
    ξ̈_i + ω_i^2 ξ_i = 0,    i = 1, 2, ..., n.
This implies that
    ξ_i(t) = α_i cos(ω_i t + ε_i)    if ω_i ≠ 0.
The α's and ε's in these equations are arbitrary constants that are usually determined by applying given initial conditions. When ξ_i varies in this fashion while all other ξ's are zero, we say that the system is oscillating in the ith normal mode. The coordinates ξ_1, ξ_2, ..., ξ_n are normal coordinates for the system. Notice that
    T = (1/2) ṗ^T A ṗ = (1/2) ξ̇^T ξ̇ = (1/2)(ξ̇_1^2 + ξ̇_2^2 + ... + ξ̇_n^2)
and
    V = (1/2) p^T C p = (1/2) ξ^T W^2 ξ = (1/2)(ω_1^2 ξ_1^2 + ... + ω_n^2 ξ_n^2).
Thus, when expressed in terms of the normal coordinates, both the kinetic and potential energies are expressed as sums of squares of the appropriate variables.

We can obtain some information about the normal modes by noting that when the system vibrates in a normal mode of frequency ω_i, we have ξ = e_i, and so the solution for the ith normal-mode vector q_i is
    q_i = (A^{-1/2}U)e_i = ith column of A^{-1/2}U.
Hence the matrix of normal-mode vectors, Q = [q_1 q_2 ... q_n], is given by Q = A^{-1/2}U. Furthermore,
    Q^T A Q = I,    Q^T C Q = W^2,                                    (5)
which may be described as the biorthogonality properties of the normal-mode vectors.

Now we briefly examine the implications of this analysis for a system to which external forces are applied. We suppose that, as in the above analysis, the elastic system makes only small displacements from equilibrium but that, in addition, there are prescribed forces f_1(t), ..., f_n(t) applied in the coordinates p_1, p_2, ..., p_n. Defining f(t) = [f_1(t) ... f_n(t)]^T, we now have the equations of motion:
    A p̈ + C p = f(t).                                                 (6)
Putting p = Qξ, premultiplying by Q^T, and using Eq. (5), Eq. (6) is immediately reduced to
    ξ̈ + W^2 ξ = Q^T f(t),
or
    ξ̈_j + ω_j^2 ξ_j = q_j^T f(t),    j = 1, 2, ..., n.
If ω_j ≠ 0, a particular integral of such an equation can generally be obtained in the form
    ξ_j(t) = (1/ω_j) ∫_0^t q_j^T f(τ) sin(ω_j(t − τ)) dτ,
and if ω_j = 0, then
    ξ_j(t) = ∫_0^t q_j^T f(τ)(t − τ) dτ.
In particular, if det W ≠ 0, we may write
    ξ(t) = W^{-1} ∫_0^t sin(W(t − τ)) Q^T f(τ) dτ,

where integration of the vector integrand is understood to mean term-wise integration, and the jth diagonal term of the diagonal matrix sin(W(t − τ)) is sin(ω_j(t − τ)).

An important case that arises in practice can be described by writing f(t) = f_0 e^{iωt}, where f_0 is independent of t. This is the case of a prescribed sinusoidal external (or exciting) force. It is left as an exercise to prove that if ω ≠ ω_j (j = 1, 2, ..., n), then
    p(t) = Qξ(t) = e^{iωt} Σ_{j=1}^n (q_j q_j^T f_0)/(ω_j^2 − ω^2)              (7)
is a solution of Eq. (6).

This result can be used to discuss the resonance phenomenon. If we imagine the applied frequency ω to be varied continuously in a neighbourhood of a natural frequency ω_j, Eq. (7) indicates that in general the magnitude of the solution vector (or response) tends to infinity as ω → ω_j. However, this singularity may be avoided if q_j^T f_0 = 0 for all q_j associated with ω_j.

Exercise 1. Establish Eq. (7).

Exercise 2. Show that if C is singular, that is, C is nonnegative (but not positive) definite, then there may be normal coordinates ξ_i(t) for which |ξ_i(t)| → ∞ as t → ∞.

Exercise 3. Consider variations in the "stiffness" matrix C of the form C + kI, where k is a real parameter. What is the effect on the natural frequencies of such a change when k > 0 and when k < 0? □

5.13 Admissible Pairs of Matrices

When an r × p matrix X is multiplied on the right by a p × p matrix T, the resulting matrix XT is again r × p. In fact, all matrices in the sequence X, XT, XT^2, ... are of size r × p. Such sequences appear frequently in some modern applications of the theory of matrices and particularly in systems theory. We give a brief introduction to some of their properties in this section.

First, any pair of matrices (X, T) of sizes r × p and p × p, respectively, with Ker X ≠ {0} is said to be an admissible pair of order p. Of course, r and p are arbitrary positive integers, although in applications it is frequently the case that r ≤ p, and if r < p, then certainly Ker X ≠ {0} (see Proposition 3.8.1). Indeed, the case r = 1 is of considerable interest. Let K_1 = Ker X and let K_2 be the set of all vectors in both Ker X and Ker XT. Obviously, K_2 ⊂ K_1, although we may have K_2 = K_1 (when p = 1 and T ≠ 0, for example). It is also clear that we may write
    K_2 = Ker [X; XT],
where the semicolon indicates that the blocks are stacked one above the other. More generally, for j = 1, 2, ..., define
    K_j = Ker [X; XT; XT^2; ...; XT^{j−1}],
and it is seen that we have a "nested" sequence of subspaces of C^p:
    K_1 ⊃ K_2 ⊃ K_3 ⊃ ....
A fundamental property of this sequence is contained in the next lemma.

Lemma 1. Let (X, T) be an admissible pair of order p and let the sequence K_1, K_2, ... of subspaces of C^p be defined as above. If K_j = K_{j+1}, then K_j = K_{j+1} = K_{j+2} = ....

PROOF. The matrices
    [X; XT; ...; XT^{j−1}]    and    [X; XT; ...; XT^j]               (1)
have sizes jr × p and (j + 1)r × p, respectively. If K_j = K_{j+1} then, using Proposition 3.8.1, these two matrices are seen to have the same rank. Thinking in terms of "row rank," this means that the rows of XT^j are linear combinations of the rows of the first matrix in (1). Thus, there is an r × jr matrix E such that
    XT^j = E [X; XT; ...; XT^{j−1}].                                  (2)

Now let x ∈ K_j = K_{j+1}. Then, using Eq. (2),
    XT^{j+1} x = (XT^j)Tx = E [X; XT; ...; XT^{j−1}] Tx = E [XT; XT^2; ...; XT^j] x = 0,
since x ∈ K_{j+1}. Consequently, x ∈ Ker(XT^{j+1}) and so x ∈ K_{j+2}. Then it follows inductively that for any i ≥ j, we have K_i = K_j.

The lemma now shows that the inclusions in the sequence K_1 ⊃ K_2 ⊃ K_3 ⊃ ... are proper until equality first holds, and thereafter all K_j's are equal. The least positive integer s such that K_s = K_{s+1} is a characteristic of the admissible pair (X, T) and is called the index (of stabilization) of the pair. Thus, if K_i has dimension k_i, then k_1 > k_2 > ... > k_s = k_{s+1} = .... Since the first s − 1 inclusions are proper it is clear that s ≤ p (see also Exercise 3 below). The subspace K_s is said to be residual.

A distinction is now made between the cases when the residual subspace K_s is just {0}, so that k_s = 0, and when K_s contains nonzero vectors, so that k_s ≥ 1. If s is the index of stabilization of the admissible pair (X, T) and k_s = 0, the pair (X, T) is said to be observable. Otherwise, the residual subspace K_s is said to be unobservable, a terminology originating in systems theory (see Section 9.11).

Exercise 1. Let (X, T) be an admissible pair of order p, and establish the following special cases.
(a) If r = 1, X ≠ 0, and T = I_p, then the index of (X, I_p) is one, and k_1 = k_2 = ... = p − 1.
(b) If r = 1, X ≠ 0, and (X, T) is observable, then the index is p.
(c) If r = p and det X ≠ 0, then for any p × p matrix T, the pair (X, T) is observable with index one.
(d) If x^T is a left eigenvector of T, then the pair (x^T, T) has index s = 1.

Exercise 2. Show that the unobservable subspace of an admissible pair (X, T) is T-invariant.

Exercise 3. If (X, T) is an admissible pair of order p and has index s, and if X has rank ρ ≠ 0, show that
    (p − k_s)/ρ ≤ s ≤ p + 1 − ρ − k_s.

Exercise 4. Let (X, T) be admissible of order p with X of size r × p. Show that for any matrix G of size p × r, the pairs
    (X, T)    and    (X, T + GX)
have the same index and unobservable subspace.

Exercise 5. Let (X, T) be as in Exercise 4 and let R, S be nonsingular matrices of sizes r × r and p × p, respectively. Show that the pairs
    (X, T)    and    (RXS, S^{-1}TS)
have the same index and unobservable subspaces of the same dimension (admitting zero dimension in the case of observable pairs). Observe that, using Theorems 4.3.1 and 4.3.2, this exercise implies that observability can be interpreted as a property of a pair of linear transformations which have X and T as matrix representations. □

The next result is the form in which the observability of (X, T) is usually presented. Note that the value of the index of the pair (X, T) does not appear explicitly in the statement.

Theorem 1. The admissible pair (X, T) of order p is observable if and only if
    rank [X; XT; ...; XT^{p−1}] = p.                                  (3)

PROOF. If condition (3) holds then, in the above notation, the kernel K_p = {0}. Thus, the unobservable subspace is trivial and (X, T) is an observable pair.
Conversely, if (X, T) is observable then, by Exercise 3, the index s cannot exceed p. Hence K_s = K_p = {0}, and this implies the rank condition (3).

The results described so far in this section have a dual formulation. The discussion could begin with consideration of the subspaces of C^p,
    R_1 = Im X^*,    R_2 = Im X^* + Im T^*X^*,
and the trivial observation that R_1 ⊂ R_2 ⊂ .... But there is a quicker way. Define R_j = Σ_{k=1}^j Im(T^{*(k−1)}X^*) and observe that
    R_j = Im [X*  T*X*  ...  T*^{j−1}X*].                             (4)
Then Proposition 5.8.1 shows that
    Ker [X; XT; ...; XT^{j−1}] ⊕ Im [X*  T*X*  ...  T*^{j−1}X*] = C^p.
Consequently, we have the relation
    K_j ⊕ R_j = C^p                                                   (5)
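The nested kernels K_j, the index of stabilization, and the rank criterion of Theorem 1 are all easy to compute. The NumPy sketch below is an added illustration (not part of the original text); the pair (X, T) is an arbitrary example chosen so that it is not observable.

import numpy as np

T = np.array([[2., 1., 0., 0.],
              [0., 2., 0., 0.],
              [0., 0., 3., 0.],
              [0., 0., 0., 3.]])
X = np.array([[1., 0., 1., 0.]])               # r = 1, p = 4; the fourth coordinate is never "seen"

p = T.shape[0]
blocks = [X @ np.linalg.matrix_power(T, j) for j in range(p)]
for j in range(1, p + 1):
    Oj = np.vstack(blocks[:j])                 # the stacked matrix [X; XT; ...; XT^{j-1}]
    k_j = p - np.linalg.matrix_rank(Oj)        # k_j = dim K_j
    print(j, k_j)                              # here the dimensions are 3, 2, 1, 1, so the index is s = 3

print("observable:", np.linalg.matrix_rank(np.vstack(blocks)) == p)   # False: the residual subspace is span{e4}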

for j = 1, 2, .... The properties of the sequence K_1 ⊃ K_2 ⊃ ... now tell us immediately that R_1 ⊂ R_2 ⊂ ... ⊂ C^p with proper inclusions up to R_{s−1} ⊂ R_s, where s is the index of (X, T), and thereafter R_s = R_{s+1} = .... Furthermore, R_s = C^p if the pair (X, T) is observable.

In this perspective, it is said that the pair (T^*, X^*) is controllable when (X, T) is observable, and in this case K_s = {0} and R_s = C^p. If R_s ≠ C^p, then R_s is called the controllable subspace and the pair (T^*, X^*) is uncontrollable.

Note that, using Eq. (4) and the fact that s ≤ p, we may write
    R_s = Σ_{k=0}^{s−1} Im(T^{*k}X^*) = Σ_{k=0}^{p−1} Im(T^{*k}X^*).
The criterion of Theorem 1 now gives a criterion for controllability which is included in the following summary of our discussion.

Corollary 1. Let A and B be matrices of sizes p × p and p × r, respectively. Then the following statements are equivalent:
(a) The pair (A, B) is controllable;
(b) The pair (B^*, A^*) is observable;
(c) rank [B  AB  ...  A^{p−1}B] = p;
(d) C^p = Σ_{k=0}^{p−1} Im(A^k B).

Exercise 6. Consider a pair (A, b), where A is p × p and b ∈ C^p (i.e., r = 1 in Corollary 1). Let A be simple with the spectral representation of Theorem 4.10.3, and let b = Σ_{j=1}^p β_j x_j be the representation of b in the eigenbasis for A (with n replaced by p). Show that if β_j = 0 for some j, then (A, b) is not controllable.

Exercise 7. Use Exercise 4 to show that if A, B, and F are matrices of sizes p × p, p × r, and r × p, respectively, then the pairs
    (A, B)    and    (A + BF, B)
have the same controllable subspace.

Exercise 8. Let A, B be as in Exercise 7 and let U, V be nonsingular matrices of sizes p × p and r × r, respectively. Use Exercise 5 to show that the pair (A, B) is controllable if and only if the pair (U^{-1}AU, U^{-1}BV) is controllable.

Exercise 9. Let A ∈ C^{n×n} and B ∈ C^{n×m}. Show that the following three statements are equivalent:
(a) The pair (A, B) is controllable;
(b) For any m × n matrix K the pair (A + BK, B) is controllable;
(c) For any complex number λ, the pair (A − λI, B) is controllable.
Hint. Prove that there are m × m matrices M_{jk}, k ≥ 2, j = 1, 2, ..., k − 1, such that
    [B  (A + BK)B  ...  (A + BK)^{n−1}B] = [B  AB  ...  A^{n−1}B] [I M_{12} ... M_{1n}; 0 I ... M_{2n}; ...; 0 ... 0 I],
where the block matrix on the right is upper triangular with identity diagonal blocks.

5.14 Miscellaneous Exercises

1. For which real values of α are the following matrices positive definite:
    (a) [1 α α; α 1 α; α α 1];    (b) [1 1 α; 1 1 1; α 1 1].
Answer. (a) −1/2 < α < 1; (b) none.

2. Let A ∈ C^{n×n} and suppose Ax = λx and A^*y = μy (x, y ≠ 0). Prove that if λ ≠ μ̄, then x ⊥ y, and that if x = y, then λ = μ̄.

3. Check that the matrix H ≥ 0 is positive definite if and only if det H ≠ 0.

4. Let A, B ∈ C^{n×n} and define
    R = [A  B;  B  −A].
Show that λ is an eigenvalue of R only if −λ is an eigenvalue of R.
Hint. Observe that PR + RP = 0, where
    P = [0  I;  −I  0].

5. Verify that a Hermitian unitary matrix may have only +1 and −1 as its eigenvalues.

6. If P is an orthogonal projector and α_1, α_2 are nonzero real numbers, prove that the matrix α_1^2 P + α_2^2(I − P) is positive definite.
Hint. Use Theorem 5.2.1.

7. Prove that an orthogonal projector that is not a zero matrix or a unit matrix is positive semi-definite.
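The rank criterion of Corollary 1(c) and the eigenbasis criterion of Exercise 6 can be illustrated together. The NumPy sketch below is an added example (not part of the original text); the diagonal matrix A and the two vectors b are arbitrary choices.

import numpy as np

A = np.diag([1., 2., 3.])                 # a simple matrix whose eigenvectors are e1, e2, e3
b_good = np.array([1., 1., 1.])           # nonzero component along every eigenvector
b_bad  = np.array([1., 0., 1.])           # beta_2 = 0, so (A, b_bad) should not be controllable

def controllable(A, b):
    p = A.shape[0]
    K = np.column_stack([np.linalg.matrix_power(A, k) @ b for k in range(p)])
    return np.linalg.matrix_rank(K) == p  # rank [b  Ab  ...  A^{p-1} b] = p

print(controllable(A, b_good))            # True
print(controllable(A, b_bad))             # False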
8. Prove that if A is Hermitian, then the matrix I + εA is positive definite for sufficiently small real numbers ε.

9. Show that a circulant matrix is normal.

10. Prove that an idempotent matrix is normal if and only if it is an orthogonal projector.

11. Prove that A is a normal matrix if and only if each eigenvector of A is an eigenvector of A^*.
Hint. See Exercise 5.2.3 and compare with Exercise 2 above.

12. Prove that if A^*B = 0, then the subspaces Im A and Im B are orthogonal.

13. Prove that any leading principal submatrix of a positive semi-definite matrix is positive semi-definite.
Hint. See Corollary 5.5.2.

14. If A, B ∈ C^{n×n} are positive semi-definite matrices, prove that
(a) the eigenvalues of AB are real and nonnegative;
(b) AB = 0 if and only if tr AB = 0.
Hint. For part (a) consider the characteristic polynomial of AB, where A = U diag[D_0, 0]U^*, D_0 ≥ 0.

15. Prove that the Gram matrix is a positive semi-definite matrix for any system of elements and is positive definite if the system is linearly independent.

16. Let x, y be elements of a unitary space U with a basis B = {x_1, x_2, ..., x_n}. Prove that, if α and β are the representations of x and y, respectively, with respect to B, then
    (x, y) = (α, Gβ),                                                 (1)
where G is the Gram matrix of B.
Conversely, show that for any G > 0, the rule in Eq. (1) gives an inner product in U and that G is the Gram matrix of the system B with respect to this inner product.

17. Let A, B, C ∈ C^{n×n} be positive definite.
(a) Prove that all 2n zeros λ_i of det(λ^2 A + λB + C) have negative real parts.
Hint. Consider the scalar polynomial (x, (λ_0^2 A + λ_0 B + C)x) for x ∈ Ker(λ_0^2 A + λ_0 B + C) and use Exercise 5.10.4.
(b) Show that all zeros of det(λ^2 A + λB − C) are real.
(c) Prove the conclusion of part (a) under the weaker hypothesis that A and C are positive definite and B has a positive definite real part.

18. Let A ∈ C^{n×n}. Prove that if there exists a positive definite n × n matrix H such that the matrix B ≜ A^*H + HA is positive definite, then In A = {n, 0, 0}.
Hint. Consider the form (Bx, x) evaluated at the eigenvectors of A.

19. Let A = H_1 + iH_2 be the Cartesian decomposition of A. Show that if H_1 < 0, then π(A) = 0.

20. Check that if π(A) = 0, then π(AH) = 0 for any H > 0.
Hint. Observe that H^{1/2}(AH)H^{-1/2} = H^{1/2}A(H^{1/2})^*.

21. Let H_1 = H_1^*, H_2 = H_2^*. We write H_1 ≥ H_2 if H_1 − H_2 is a positive semi-definite matrix. Prove the following statements:
(a) H_1 ≥ H_1 (reflexivity);
(b) if H_1 ≥ H_2 and H_2 ≥ H_3, then H_1 ≥ H_3 (transitivity);
(c) if H_1 > 0, H_2 > 0, and H_1 ≥ H_2, then H_2^{-1} ≥ H_1^{-1}.
Hint. For part (c) use Exercise 20.

22. Prove the Fredholm alternative for matrices: if A ∈ C^{m×n}, then either the system Ax = b is solvable for any b ∈ C^m or the homogeneous system A^*y = 0 has nonzero solutions.

23. Prove that if S is real and skew-symmetric, then I + S is nonsingular and the Cayley transform
    T = (I − S)(I + S)^{-1}
is an orthogonal matrix.

24. (a) If A is a real orthogonal matrix and I + A is nonsingular, prove that we can write
    A = (I − S)(I + S)^{-1},
where S is a real skew-symmetric matrix.
(b) If U is a unitary matrix and I + U is nonsingular, show that the matrix
    A = i(I − U)(I + U)^{-1}
is Hermitian.

25. Let B, C ∈ R^{n×n}, A = B + iC, and
    R = [B  −C;  C  B].
Prove that
(a) if A is normal, then so is the real matrix R;
(b) if A is Hermitian, then R is symmetric;
(c) if A is positive definite, then so is the real matrix R;
(d) if A is unitary, then R is orthogonal.

CHAPTER 6

The Jordan Canonical Form: A Geometric Approach

Let T denote a linear transformation acting on an n-dimensional linear space S. It has been seen in Section 4.3 that a matrix representation of T is determined by each choice of basis for S, that all such matrices are similar to one another, and that they form an equivalence class in the set of all n × n matrices. The simplest (canonical) matrix in the equivalence class of similar matrices (representations of T in different bases of S) will be found in this chapter. For simple transformations, this problem has already been solved in Section 4.8; its solution was based on the existence of a basis in S consisting of eigenvectors of the transformation. Furthermore, the canonical form was found to be a diagonal matrix of eigenvalues.

For a general transformation T ∈ L(S) there may be no eigenbasis of T in S. Therefore, additional elements from S must be joined to a set of linearly independent eigenvectors of T to construct a "canonical" basis for S with respect to which the representation of T is in a canonical form (although no longer diagonal). This procedure will be performed in Section 6.5, after the structure of T-invariant subspaces is better understood. This is called the geometric (or operator) approach, in contrast to that presented in the next chapter, which is based on algebraic properties of matrices of the form λI − A.

6.1 Annihilating Polynomials

We shall first consider the notion of annihilating polynomials for a linear transformation T: S → S. The significance of this for our geometric approach mentioned above is not immediately obvious, but it will be seen that the main result of the section, Theorem 4, does indeed present a decomposition of S as a direct sum of subspaces. This will be important later in the chapter.

Let S be a linear space over F, let T ∈ L(S), and let p(λ) = Σ_{i=0}^l p_i λ^i (p_l ≠ 0) be a polynomial over F. As with matrices (see Section 1.7), the transformation p(T) is defined by the equality p(T) ≜ Σ_{i=0}^l p_i T^i.

Exercise 1. Show that for any two scalar polynomials p_1(λ) and p_2(λ) over F and any T ∈ L(S), the transformations p_1(T) and p_2(T) commute. □

The result of the first exercise yields the possibility of substituting T instead of λ in any expression consisting of sums and products of polynomials.

Exercise 2. Show that the equality
    p(λ) = Σ_{j=1}^s Π_{i=1}^k p_{ij}(λ),
where the p_{ij}(λ) are polynomials in λ, implies
    p(T) = Σ_{j=1}^s Π_{i=1}^k p_{ij}(T). □

Exercise 3. Let T ∈ L(S) and let S_0 be a subspace of S. Show that if S_0 is T-invariant, then it is p(T)-invariant for any polynomial p(λ) over F. □

Exercise 4. Let T ∈ L(S) with S a linear space over F. If λ_0 ∈ σ(T) (λ_0 ∈ F) and p(λ) is any polynomial over F, show that p(λ_0) ∈ σ(p(T)). Moreover, if x is an eigenvector of T corresponding to λ_0, then x is an eigenvector of p(T) associated with p(λ_0).

SOLUTION. If T(x) = λ_0 x, then
    p(T)(x) = (Σ_{i=0}^l p_i T^i)(x) = Σ_{i=0}^l p_i λ_0^i x = p(λ_0)x. □

Theorem 1. Let p_1(λ), p_2(λ), ..., p_k(λ) be arbitrary nonzero polynomials over F and let p(λ) denote their least common multiple. Then for any T ∈ L(S),
    Ker p(T) = Σ_{i=1}^k Ker p_i(T).                                  (1)

PROOF. Let x ∈ Σ_{i=1}^k Ker p_i(T). Then x = Σ_{i=1}^k x_i, where p_i(T)(x_i) = 0, i = 1, 2, ..., k. Since p(λ) is a common multiple of p_1(λ), ..., p_k(λ), there exist polynomials q_i(λ) such that p(λ) = q_i(λ)p_i(λ), i = 1, 2, ..., k. Using Exercise 2, we obtain p(T) = q_i(T)p_i(T). Hence x_i ∈ Ker p_i(T) implies x_i ∈ Ker p(T) and consequently x = Σ_{i=1}^k x_i ∈ Ker p(T). Thus, Σ_{i=1}^k Ker p_i(T) ⊂ Ker p(T) for any common multiple p(λ) of p_1(λ), p_2(λ), ..., p_k(λ).

The opposite inclusion can be proved by induction on k. The crux of the argument is contained in the case k = 2. So first let k = 2 and assume p(T)(x) = 0. Then, as in the first part of the proof, p_i(T)q_i(T)(x) = 0 for i = 1, 2. If y_i = q_i(T)(x) then, obviously, y_i ∈ Ker p_i(T). Furthermore, p(λ) is the least common multiple of p_1(λ) and p_2(λ) and, therefore, the polynomials q_1(λ) and q_2(λ) are relatively prime. (See Exercise 8 in Appendix 1.) This is equivalent (see Eq. (3) in Appendix 1) to saying that c_1(λ)q_1(λ) + c_2(λ)q_2(λ) = 1 for some polynomials c_1(λ) and c_2(λ). Thus there are polynomials c_i(λ) (i = 1, 2) such that
    c_1(T)q_1(T) + c_2(T)q_2(T) = I.
Writing x_i = c_i(T)(y_i), and applying this representation of I to x, we obtain x = x_1 + x_2, where x_i ∈ Ker p_i(T) (i = 1, 2). Hence x ∈ Ker p_1(T) + Ker p_2(T), as required.

Now let the result hold for the polynomials p_1(λ), p_2(λ), ..., p_{n−1}(λ) (2 < n ≤ k). Thus if p̃(λ) denotes their least common multiple, then
    Ker p̃(T) = Σ_{i=1}^{n−1} Ker p_i(T).
But the least common multiple p(λ) of p_1(λ), ..., p_n(λ) coincides with the least common multiple of p̃(λ) and p_n(λ) (Exercise 9 in Appendix 1). Therefore, using the already proved part of the theorem and the induction hypothesis,
    Ker p(T) = Ker p̃(T) + Ker p_n(T) = Σ_{i=1}^n Ker p_i(T).
The proof is complete.

Theorem 2. If, in the notation of Theorem 1, each pair of polynomials selected from {p_1(λ), p_2(λ), ..., p_k(λ)} is relatively prime, then
    Ker p(T) = Ker p_1(T) ∔ Ker p_2(T) ∔ ... ∔ Ker p_k(T).

PROOF. Consider the case k = 2. To show that the given sum is direct, it suffices to show that Ker p_1(T) ∩ Ker p_2(T) = {0}. But if simultaneously p_1(T)(x) = 0 and p_2(T)(x) = 0 then, since p_1(λ) and p_2(λ) are relatively prime (see Theorem 3 in Appendix 1),
    I(x) = c_1(T)p_1(T)(x) + c_2(T)p_2(T)(x) = 0,
and consequently the required result: x = 0. An induction argument like that of Theorem 1 completes the proof.

Note that the hypothesis on the polynomials in Theorem 2 is stronger than the condition that the set {p_i(λ)}_{i=1}^k be relatively prime. (See Exercise 4 in Appendix 1 and recall that the polynomials being considered are assumed to be monic.)

Now we define a nonzero scalar polynomial p(λ) over F to be an annihilating polynomial of the transformation T if p(T) is the zero transformation: p(T) = O. It is assumed here, and elsewhere, that T ∈ L(S) and S is a linear space over F.

Exercise 5. Show that if p(λ) is an annihilating polynomial for T, then so is the polynomial p(λ)q(λ) for any q(λ) over F.

Exercise 6. Check that if p(λ) is an annihilating polynomial of T and λ_0 ∈ σ(T), λ_0 ∈ F, then p(λ_0) = 0. (See Exercise 4.) □

In other words, Exercise 6 shows that the spectrum of a transformation T is included in the set of zeros of any annihilating polynomial of T. Note that, in view of Exercise 5, the converse cannot be true.

Theorem 3. For any T ∈ L(S) there exists an annihilating polynomial of degree l ≤ n^2, where n = dim S.

PROOF. We first observe that the existence of a (monic) annihilating polynomial p(λ) for T of degree l is provided by Exercise 4.1.12 and is equivalent to the condition that T^l be a linear combination of the transformations T^i, i = 0, 1, ..., l − 1, viewed as elements in L(S). (We define T^0 ≜ I.) Thus, for some p_0, p_1, ..., p_{l−1} ∈ F,
    T^l = −p_0 I − p_1 T − ... − p_{l−1}T^{l−1}.                      (3)
To verify Eq. (3), we examine the sequence I, T, T^2, .... Since the dimension of the linear space L(S) is n^2 (see Exercise 4.1.8), at most n^2 elements of this sequence are linearly independent. Thus there exists a positive integer l (1 ≤ l ≤ n^2) such that I, T, ..., T^{l−1}, viewed as elements in L(S), are linearly independent, while I, T, ..., T^l form a linearly dependent system. Hence T^l is a linear combination of T^i, i = 0, 1, ..., l − 1. This proves the theorem.

The main result follows readily from Theorem 2, bearing in mind that if p(T) = 0 then Ker(p(T)) = S.

Theorem 4. Let T ∈ L(S) and let p(λ) denote an annihilating polynomial of T. If p(λ) = Π_{i=1}^k p_i(λ) for some polynomials p_1(λ), ..., p_k(λ), and if each pair of polynomials selected from {p_1(λ), p_2(λ), ..., p_k(λ)} is relatively prime, then
    S = Ker p_1(T) ∔ Ker p_2(T) ∔ ... ∔ Ker p_k(T).

Theorem 4 is the main conclusion of this section and, as will be seen, it allows us to study the action of T on S in terms of its action on the (T-invariant) subspaces Ker p_i(T).

Exercise 7. Suppose λ_1 ≠ λ_2 and the matrix
    A = [λ_1 1 0 0; 0 λ_1 0 0; 0 0 λ_2 1; 0 0 0 λ_2]
is viewed as a linear transformation on C^4. Show that
    p(λ) ≜ (λ − λ_1)^2 (λ − λ_2)^2
is an annihilating polynomial for A, and verify Theorem 4 with p_1(λ) = (λ − λ_1)^2 and p_2(λ) = (λ − λ_2)^2.
Show that the conclusion does not hold with the choice of factors q_1(λ) = q_2(λ) = (λ − λ_1)(λ − λ_2) for p(λ). □

6.2 Minimal Polynomials

It is clear that a fixed linear transformation T ∈ L(S) has many annihilating polynomials. It is also clear that there is a lower bound for the degrees of all annihilating polynomials for a fixed T. It is natural to seek a monic annihilating polynomial of the least possible degree, called a minimal polynomial for T and written m_T(λ), or just m(λ). It will be shown next that the set of all annihilating polynomials for T is "well ordered" by the degrees of the polynomials and, hence, that there is a unique minimal polynomial.

Theorem 1. A minimal polynomial of T ∈ L(S) divides any other annihilating polynomial of T.

PROOF. Let p(λ) be an annihilating polynomial of T and let m(λ) denote a minimal polynomial of the transformation. Using Theorem 1 of Appendix 1, we write
    p(λ) = m(λ)d(λ) + r(λ),                                           (1)
where either deg r(λ) < deg m(λ) or r(λ) ≡ 0. Assuming that m(λ) does not divide p(λ), that is, r(λ) ≢ 0, we arrive at a contradiction. Indeed, applying the result of Exercise 6.1.2 to Eq. (1), we find from p(T) = m(T) = 0 that r(T) = 0. Hence r(λ) is an annihilating polynomial of T of degree less than that of the minimal polynomial m(λ). This conflicts with the definition of m(λ).

Corollary 1. The minimal polynomial of T is unique.

PROOF. If m_1(λ) and m_2(λ) are minimal polynomials of the transformation T, then by Theorem 1 each of them divides the other. Since both of them are monic, Exercise 2 of Appendix 1 yields m_1(λ) = m_2(λ).

Corollary 2. The set of all annihilating polynomials of T consists of all polynomials divisible by the minimal polynomial of T.

PROOF. To see that every polynomial divisible by m(λ) is an annihilating polynomial of T, it suffices to recall the result of Exercise 6.1.5.

Theorem 2. The set of the distinct zeros of the minimal polynomial of T coincides with the spectrum of T.

PROOF. In view of Exercise 6.1.6, it suffices to show that any zero of the minimal polynomial m(λ) of T belongs to the spectrum σ(T) of T.
Let m(λ) = q(λ) Π_{i=1}^k (λ − λ_i), where q(λ) is irreducible over F. Assume, on the contrary, that λ_s ∉ σ(T) (1 ≤ s ≤ k); in other words, T − λ_s I is nonsingular. Since m(T) = 0, Exercise 6.1.2 implies
    q(T) Π_{i=1, i≠s}^k (T − λ_i I) = 0.
Hence m̃(λ) = m(λ)/(λ − λ_s) is an annihilating polynomial of T. This contradicts the assumption that m(λ) is minimal and shows that λ_1, λ_2, ..., λ_k ∈ σ(T), as required.

In this chapter the nature of the field F has not played an important role. Now, and for the rest of the chapter, it is necessary to assume that F is algebraically closed (see Appendix 1). In fact, we assume for simplicity that F = C, although any algebraically closed field could take the place of C.
Corollary 1. Let F = C, and let T ∈ L(S), with σ(T) = {λ_1, λ_2, ..., λ_s}. The minimal polynomial m(λ) of T is given by
    m(λ) = Π_{i=1}^s (λ − λ_i)^{m_i}                                  (2)
for some positive integers m_i (i = 1, 2, ..., s).

The exponent m_i associated with the eigenvalue λ_i ∈ σ(A) is clearly unique (see Corollary 1 of Theorem 1). It plays an important role in this and subsequent chapters and is known as the index of λ_i.

Applying Theorem 6.1.4 to m(λ) written in the form (2), we arrive at the following important decomposition of a linear transformation.

Theorem 3. Let T ∈ L(S), let σ(T) = {λ_1, λ_2, ..., λ_s}, and let λ_i have index m_i for each i. Then
    T = T_1 ∔ T_2 ∔ ... ∔ T_s,                                        (3)
where, for i = 1, 2, ..., s, T_i = T|_{G_i} and
    G_i = Ker((T − λ_i I)^{m_i})                                      (4)
is a T-invariant subspace.

PROOF. By Eq. (6.1.5) we have, on putting p_i(λ) = (λ − λ_i)^{m_i}, i = 1, 2, ..., s,
    S = G_1 ∔ G_2 ∔ ... ∔ G_s.                                        (5)
Therefore, recalling the definition of the direct sum of transformations, it suffices to show that each G_i is T-invariant. Indeed, if x ∈ G_i then (T − λ_i I)^{m_i}(x) = 0. Thus, by Exercise 6.1.1,
    (T − λ_i I)^{m_i} T(x) = T(T − λ_i I)^{m_i}(x) = 0
and T(x) ∈ G_i for 1 ≤ i ≤ s, showing that G_i is T-invariant.

Now it is to be shown that, for j = 1, 2, ..., s, the minimal polynomial of T_j = T|_{G_j} acting in G_j is (λ − λ_j)^{m_j}. Indeed, if I_j denotes the identity transformation in G_j, then for any x ∈ G_j we have
    (T_j − λ_j I_j)^{m_j}(x) = (T − λ_j I)^{m_j}(x) = 0
and, therefore, (T_j − λ_j I_j)^{m_j} = 0 (in G_j). Thus, (λ − λ_j)^{m_j} is an annihilating polynomial of T_j. Assume that this polynomial is not minimal, and let (λ − λ_j)^{r_j} (r_j < m_j) be the minimal polynomial of T_j. Write
    m_1(λ) = (λ − λ_j)^{r_j} Π_{i=1, i≠j}^s (λ − λ_i)^{m_i}.
In view of Eq. (5), we may represent any x ∈ S as x = Σ_{i=1}^s x_i, where x_i ∈ G_i for i = 1, 2, ..., s. Then, using the commutativity of polynomials in T and the conditions (T_i − λ_i I_i)^{m_i} x_i = 0 (i = 1, 2, ..., s, i ≠ j) and (T_j − λ_j I_j)^{r_j}(x_j) = 0, it is easily found that m_1(T) = 0. Since m_1(λ) is annihilating and divides the minimal polynomial m(λ), a contradiction is obtained.

Proposition 1. With the notation of Theorem 3, dim G_i ≥ m_i.

PROOF. It has been shown above that (λ − λ_i)^{m_i} is the minimal polynomial of T_i. Hence (T_i − λ_i I)^{m_i} = 0 and (T_i − λ_i I)^{m_i − 1} ≠ 0. From Exercise 4.5.14 we see that
    Ker(T_i − λ_i I) ⊂ Ker(T_i − λ_i I)^2 ⊂ ... ⊂ Ker(T_i − λ_i I)^{m_i},
each inclusion being strict, and dim(Ker(T_i − λ_i I)) ≥ 1. Consequently, dim G_i = dim(Ker(T_i − λ_i I)^{m_i}) ≥ m_i.

Exercise 1. Show that if S has dimension n and T ∈ L(S), then the degree of the minimal polynomial of T does not exceed n. □

Note that the spectrum of each T_i = T|_{G_i} in Eq. (3) consists of one point only and that σ(T_i) ∩ σ(T_j) = ∅ for i ≠ j. It turns out that the decomposition of a linear transformation into a direct sum of such transformations is unique.

Theorem 4. The representation (3) is unique (up to the order of summands).

PROOF. Indeed, let T = T_1' ∔ ... ∔ T_s', where σ(T_i') = {λ_i} (1 ≤ i ≤ s) and λ_i ≠ λ_j if i ≠ j. If (λ − λ_i)^{r_i} is the minimal polynomial of T_i' (by Theorem 2, only such polynomials can be minimal for T_i'), then the product Π_{i=1}^s (λ − λ_i)^{r_i} must coincide with the minimal polynomial m(λ) of T. This is easily checked by repeating the argument used in the proof of Theorem 3.

The fundamental result obtained in Theorem 3 will be developed further in Section 6.4 after the structure of the subspaces G_1, G_2, ..., G_s in Eq. (4) is studied.

Considering the representations of T and the T_i's together with the ideas of Section 4.8, we immediately obtain the following consequences of Theorems 3 and 4.

Corollary 1. Let T ∈ L(S) with F = C. There exists a basis in S in which the representation A of T is a complex block-diagonal matrix
    A = diag[A_1, A_2, ..., A_s],                                     (6)
and the spectrum of A_i (1 ≤ i ≤ s) consists of one point only. Moreover, these points are distinct and, if q_i is the size of A_i, 1 ≤ i ≤ s, then q_1, q_2, ..., q_s are uniquely determined by T.
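The indices m_i and the subspaces G_i = Ker((T − λ_i I)^{m_i}) can be explored numerically by watching the nested kernels stabilize. The NumPy sketch below is an added illustration (not part of the original text); the 3 × 3 matrix is built so that one eigenvalue has index 2 and the other has index 1.

import numpy as np

A = np.array([[4., 1., 0.],
              [0., 4., 0.],
              [0., 0., 2.]])
n = A.shape[0]

def ker_dim(M):
    return n - np.linalg.matrix_rank(M)

for lam in (4., 2.):
    B = A - lam*np.eye(n)
    dims = [ker_dim(np.linalg.matrix_power(B, r)) for r in range(1, n + 1)]
    print(lam, dims)          # the dimensions stop growing at dim G_i; the first such power r is the index m_i

# m(lambda) = (lambda - 4)^2 (lambda - 2) annihilates A, but the lower-degree product does not:
m_of_A  = np.linalg.matrix_power(A - 4*np.eye(n), 2) @ (A - 2*np.eye(n))
low_try = (A - 4*np.eye(n)) @ (A - 2*np.eye(n))
print(np.allclose(m_of_A, 0), np.allclose(low_try, 0))   # True  False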

It will be shown later (see Proposition 6.6.1) that the numbers q_1, q_2, ..., q_s
are, in fact, the algebraic multiplicities of the eigenvalues of T (or of A).
Note that each matrix A_i in Eq. (6) can also be described as a representation
of the transformation T_i = T|_{𝒢_i} with respect to an arbitrary basis in 𝒢_i.
Corollary 1 above can be restated in the following equivalent form.

Corollary 2. Any complex square matrix B is similar to a matrix A of the
form (6), where A_i ∈ ℂ^{q_i × q_i}, σ(A_i) = {λ_i}, 1 ≤ i ≤ s, and the numbers q_i are
uniquely determined by B.

Note that the positive integer s in Eq. (6) can be equal to 1, in which case
the spectrum of T consists of a single point.

Exercise 2. Let λ be an eigenvalue of two similar matrices A and B. Show
that the indices of λ with respect to A and to B are equal.  □

The final theorem of the section is known as the Cayley–Hamilton theorem
and plays an important part in matrix theory. Another proof will appear in
Section 7.2 in terms of matrices rather than linear transformations.

Theorem 5. The characteristic polynomial c(λ) of T ∈ ℒ(𝒮) is one of its
annihilating polynomials: c(T) = O.

PROOF. First recall that the characteristic polynomial of a linear transformation
is just that of any of its representations (see Section 4.11). Referring to
Eq. (6), we see that c(T) is just the product of the characteristic polynomials
of A_1, A_2, ..., A_s. For j = 1, 2, ..., s, the size of the square matrix A_j is
just q_j = dim 𝒢_j and σ(A_j) = {λ_j}. Consequently, the characteristic polynomial
of A_j is (λ − λ_j)^{q_j}. Then, by Proposition 1, q_j ≥ m_j. Finally, since

    c(T) = c(A) = ∏_{j=1}^{s} (λ − λ_j)^{q_j}

and

    m(λ) = ∏_{j=1}^{s} (λ − λ_j)^{m_j},

it is clear that c(λ) is divisible by m(λ). By Corollary 2 to Theorem 1 the
conclusion is obtained.

Exercise 3. Use the Cayley–Hamilton theorem to check that the matrix
of Exercise 4.11.1 satisfies the equation

    A^3 = 4A^2 − A + 6I.

Exercise 4. Check that m(λ) = (λ − 4)^2(λ − 2) is the minimal polynomial
of the matrix

    A = [  6  2  2 ]
        [ −2  2  0 ].   □
        [  0  0  2 ]


6.3 Generalized Eigenspaces

In the previous section it was shown that if T ∈ ℒ(𝒮), the action of T
on 𝒮 can be broken down to the study of the action of the more primitive
transformations T_i = T|_{𝒢_i} on the T-invariant subspaces 𝒢_i of Eq. (6.2.4).
To continue with the "decomposition" of T, we now study the structure of
these subspaces more closely. Also, a technique will be developed later
that generates a system of projectors onto the subspaces 𝒢_1, ..., 𝒢_s.

Let T ∈ ℒ(𝒮) with ℱ = ℂ, λ ∈ σ(T), and 𝒮_r = Ker(T − λI)^r, where r
ranges over the set of nonnegative integers. Exercise 4.5.14 asserts that

    {0} = 𝒮_0 ⊂ 𝒮_1 ⊂ 𝒮_2 ⊂ ··· ⊂ 𝒮_p = 𝒮_{p+1} = ··· ⊂ 𝒮              (1)

for some positive integer p. Note that 𝒮_1 is the eigenspace of T associated
with λ; the strict inclusion {0} ⊂ 𝒮_1 follows from the assumption λ ∈ σ(T).

Exercise 1. Check that x ∈ 𝒮_r if and only if (T − λI)(x) ∈ 𝒮_{r−1}.  □

The subspace 𝒮_r (1 ≤ r ≤ p) in (1) is referred to as a generalized eigenspace
of T of order r associated with the eigenvalue λ. Also, a nonzero element x
such that x ∈ 𝒮_r but x ∉ 𝒮_{r−1} is said to be a generalized eigenvector of T of
order r corresponding to the eigenvalue λ, that is,

    x ∈ Ker(T − λI)^r,   but   x ∉ Ker(T − λI)^{r−1}.

In particular, the customary eigenvectors of T can be viewed as generalized
eigenvectors of T of order 1 corresponding to the same eigenvalue. The same
can be said with regard to eigenspaces and generalized eigenspaces of order 1
of the transformation. Note that throughout this section, all generalized
eigenvectors and eigenspaces are assumed (if not indicated otherwise) to be
associated with the fixed eigenvalue λ of T.
It is important to note that the subspaces 𝒢_i of Eq. (6.2.4) are generalized
eigenspaces, and the basic decomposition of Theorem 6.2.3 depends on the
decomposition (6.2.5) of the whole space 𝒮 into a direct sum of generalized
eigenspaces.
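For illustration, the dimensions in the chain (1) can be computed numerically;
the following is a minimal sketch (an added illustration, not part of the original
text), assuming sympy is available and using the matrix of Exercise 4 above with
λ = 4. The dimensions stabilize at r = 2, in agreement with the index of the
eigenvalue 4 in the minimal polynomial (λ − 4)^2(λ − 2).

    # Illustration (not from the text): dim Ker(A - 4I)^r computed as n - rank((A - 4I)^r)
    from sympy import Matrix, eye

    A = Matrix([[6, 2, 2],
                [-2, 2, 0],
                [0, 0, 2]])
    n = 3
    N = A - 4 * eye(n)
    dims = [n - (N**r).rank() for r in range(5)]
    print(dims)   # [0, 1, 2, 2, 2]  -- the chain {0} = S_0 c S_1 c S_2 = S_3 = ...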

Exercise 2. Show that x_r is a generalized eigenvector of T of order r ≥ 2
(corresponding to the eigenvalue λ) if and only if the element x_{r−1} =
(T − λI)(x_r) is a generalized eigenvector of T of order r − 1, or, equivalently,

    T(x_r) = λx_r + x_{r−1},                                            (2)

where x_{r−1} ∈ 𝒮_{r−1}, x_{r−1} ∉ 𝒮_{r−2}.

Exercise 3. Check that x_r is a generalized eigenvector of T of order r ≥ 1
if and only if the vector (T − λI)^{r−1}(x_r) is an eigenvector of T or, equivalently,

    (T − λI)^k(x_r) = 0   for k ≥ r.   □

Using Exercise 2, observe that if x_r is a generalized eigenvector of T of
order r, then there are vectors x_{r−1}, ..., x_2, x_1 for which

    T(x_1) = λx_1,
    T(x_2) = λx_2 + x_1,
      ⋮                                                                (3)
    T(x_r) = λx_r + x_{r−1},

where x_j ∈ 𝒮_j for j = 1, 2, ..., r. Such a sequence x_1, x_2, ..., x_r is called
a Jordan chain of length r associated with the eigenvalue λ. The chain can
also be seen as being associated with x_1. From this point of view, an eigenvector
x_1 is selected and the vectors x_2, ..., x_r are generated by successively
solving the equations of (3) for as long as there exist solutions to the nonhomogeneous
equation (T − λI)x_j = x_{j−1} for j = 2, 3, .... Obviously, the
length of any Jordan chain of T is finite since r ≤ dim 𝒮_r ≤ dim 𝒮 for a
chain of length r. Furthermore, the members of a Jordan chain are linearly
independent, as the next exercise demonstrates.

Exercise 4. Show that any Jordan chain consists of linearly independent
elements.

SOLUTION. Let Σ_{i=1}^{r} α_i x_i = 0. Applying the transformation (T − λI)^{r−1} to
both sides of this equation and noting that (T − λI)^{r−1}(x_i) = 0 for
1 ≤ i ≤ r − 1 (Exercise 3), we obtain α_r(T − λI)^{r−1}(x_r) = 0. By Exercise 3
the element (T − λI)^{r−1}(x_r) is an eigenvector of T and hence nonzero. Thus
α_r = 0. Applying the transformation (T − λI)^{r−2} to the equation
Σ_{i=1}^{r−1} α_i x_i = 0, we similarly derive α_{r−1} = 0. Repeating this procedure, the
linear independence of the set of generalized eigenvectors of the Jordan
chain is established.  □

In particular, Exercise 4 shows that the Jordan subspace for T,

    𝒥 = span{x_1, x_2, ..., x_r},                                       (4)

generated by the elements of a Jordan chain of T of length r has dimension r.

Exercise 5. Check that any Jordan subspace for T is T-invariant.

Exercise 6. Prove that any Jordan subspace contains only one (linearly
independent) eigenvector.

SOLUTION. Suppose the Jordan subspace is generated by the chain x_1,
x_2, ..., x_r and has associated eigenvalue λ. If x = Σ_{i=1}^{r} α_i x_i is an eigenvector
of T associated with λ_0, then using Eq. (2) we have (with x_0 = 0),

    λ_0 Σ_{i=1}^{r} α_i x_i = T(Σ_{i=1}^{r} α_i x_i) = Σ_{i=1}^{r} α_i(λx_i + x_{i−1}).

Consequently, comparing the coefficients (see Exercise 4),

    λ_0 α_i = λα_i + α_{i+1},   1 ≤ i ≤ r − 1,
    λ_0 α_r = λα_r,

so that λ = λ_0 and α_2 = α_3 = ··· = α_r = 0. Thus x = α_1 x_1 with α_1 ≠ 0, as
required.  □

Let T ∈ ℒ(𝒮) and let the subspace 𝒮_0 ⊂ 𝒮 be T-invariant. If x ∈ 𝒮_0 and
there exists a basis in 𝒮_0 of the form

    {x, T(x), ..., T^{k−1}(x)},   k = dim 𝒮_0,

then 𝒮_0 is said to be a cyclic subspace of 𝒮 (with respect to T or T|_{𝒮_0}).

Proposition 1. A Jordan subspace is cyclic.

PROOF. Let x_1, x_2, ..., x_r be a Jordan chain and observe that Eqs. (3) imply

    x_{r−1} = (T − λI)x_r,
    x_{r−2} = (T − λI)^2 x_r,
      ⋮                                                                (5)
    x_2 = (T − λI)^{r−2} x_r,
    x_1 = (T − λI)^{r−1} x_r.

Using the binomial theorem, it is clear that the Jordan subspace generated
by x_1, x_2, ..., x_r is also generated by x_r, Tx_r, ..., T^{r−1} x_r. Since the Jordan
subspace is T-invariant (Exercise 5) and x_r is in this subspace, it follows that
the subspace is cyclic with respect to T.
Observe that Eqs. (5) show that if x_r is a generalized eigenvector of order r,
then (T − λI)^i x_r is a generalized eigenvector of order r − i, for i = 0, 1, ...,
r − 1.
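The construction of a Jordan chain by successively solving (T − λI)x_j = x_{j−1}
is easy to carry out for a concrete matrix. The following sketch (an added
illustration, not part of the original text; it assumes sympy) verifies the relations
(3) for a chain of length 2 of the 3 × 3 matrix used in the exercises of Section 6.2,
with λ = 4.

    # Illustration (not from the text): a Jordan chain x1, x2 for the eigenvalue 4 of A
    from sympy import Matrix, eye, zeros

    A = Matrix([[6, 2, 2],
                [-2, 2, 0],
                [0, 0, 2]])
    lam = 4
    x1 = Matrix([2, -2, 0])    # an eigenvector: (A - lam*I)x1 = 0
    x2 = Matrix([1, 0, 0])     # obtained by solving (A - lam*I)x2 = x1

    N = A - lam * eye(3)
    assert N * x1 == zeros(3, 1)    # x1 is a generalized eigenvector of order 1
    assert N * x2 == x1             # x2 is a generalized eigenvector of order 2
    assert A * x2 == lam * x2 + x1  # the chain relation T(x2) = lam*x2 + x1 of Eqs. (3)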

6.4 The Structure of Generalized Eigenspaces .


l,?ROOF. B"eanng In mm - d th iations
e re ' hiIpS (6.3_1) 1let x"(1) ,x,(2) , ._., x,,"
(I I

denote linearly independent elements in 9', such that

In the preceding section some basic ideas on generalized eigenspaces aD>


9'p-l + span{x~ll, X~2), _ , x~,,)} = 9'p. (2)

Jordan chains have been introduced. In this section we put them to won Qbviously, the vectors X~I), __ ., x:p) in Eq. (2) are generalized eigenvectors
to prove that each generalized eigenspace f, of Eq. (6.2.4) has a basis mad, ~fT of order p. It is claimed that the ptp vectors of the form
up of one or several Jordan chains. (T - Ut(X~I, (T - ~(x~, _.. , (T - ut(x~p, (3)
Let Te fLI(9') with = C and let 9'p = Ker(T - AI)" be the generalize
eigenspace of T of maximal order p associated with the eigenvalue A, that' . ~ = 0, 1, .. _, p - 1, are linearly independent. Indeed, if
9',_ 1 :;: 9'p = 9',+ 1- The subspace 9'" satisfying this condition is referred t
as the generalized eigenspace of T associated with A. Thus, "the generalize f r (Xlk(T -
P'l=OI=1 M)l(X~i) = 0, (4)
eigenspace" of an eigenvalue, without reference to order, implies that thi ':':
order is, in fact, maximal. .en, applying the transformation (T - My-I, we deduce by use of Exercise
.3 that
Theorem 1. There exists a unique decomposition
tp '''_1 '.
(T- U)p-l(r Ot:IOX~i)) = O.
1-1
9"p = L . J~I) + 1=L'1 ~~-1 +... + 1=L'1 ~\i).
1=1 ence Lr-llXlox~)e9'p-l and, recalling the choice of {X~)Htl in Eq. (2),
obtain L?=llXIOXg) = O. Consequently, (XiO = 0 (i = 1,2, ... , tp). Now
where ~~) (l S; j S; p, 1 S; i S; ti ) is a cyclic Jordan subspace ofdimenswlt.
,plying the transformation (T - uy-z to Eq, (4), it is shown similarly
In this decomposition of 9'p, there is the possibility that ti = 0 when .tlXn = 0 (i = 1,2, . _. , t p ) , and so on.
corresponding sum on the right is interpreted as the trivial subspace to}. hus, the cyclic Jordan subspaces ~g) (i = 1,2, ... , t p) constructed by
Theorem 1 implies that there exists a basis in Ker(T - MY consisting .rule
Jordan chains; each chain is a basis in a cyclic Jordan subspace I ~~) = span{(T - ml(x~}t:A, 1 S; i S; t",
(1 S; j S; p, 1 :s; i S; til. Namely, starting withj = p, we have the basis 'sCythe conditions dimV~ = p (1 S; i S; t p) and, if t :;: j, then
r,1)
1, x(1)
Z , ... , x~t.! h x(1) ~~) " ~Y1 = to}.
..P ,
.Now we will show that for any fixed k (0 s; k s; p - I),
x~p), x~,,), .., , X{I,,)
"-1' ,
X{I,,),
9'p-II-1 () span{(T - Mf(x~; 1 S; i S; t p} = to}. (5)
X(I,+l)
1 , X~p+l), .,., X(I"
p-l + 1),
is is obvious from Eq. (2) when k = O. More generally, suppose that
X(,,+I,-I) X(I,,+.,- al ~llXjk(T - Mf(x~ e 9',-11-1' Then
, x!'+'p-d,

r Ot:IkX~)
1 t
,-1 ,
(T - U Y- l ( = 0
X{I-'1+1) I'" 1
1 ,
d r:~ 1 alkx~) E 9'p_ h which contradicts (2) .again unless (Xu = (XZII = ...
xl')
1,
lX1pk = 0, so that Eq. (5) follows.
ow consider the elements x~~ 1 = (T - M)(x~i), i = 1,2, ... , t" from
where t = D= 1 ti and the elements of each row form a Jordan chain which are generalized eigenvectors of T of order p - 1 (Exercise 6.3.2). It
maximal possible length. Also, the elements ofcolumn r in (1) are generaU .,\Vs from above that x~~ 1 are linearly independent and that, by Eq. (5),
eigenvectors of T of order r associated with A. The basis (1) of Ker(T -
is called a Jordan basis ofthe generalized eigenspace. 9",_ z f"l span{x~l.! l' x~.! .. . . . ,x~~ 1} = to}.
Hence there is a set of linearly independent elements {x~l)_ l}~~"-1 in ~_ xercise 2. Check that the number k", of Jordan chains in (1) of order m is
such that
a> + span{x(l)p-l }"+'.-l _ a>
k", = 2/", - ''''-1 - ''''+1
o7 p- 2 1=1 - 07 p- l ' here I. = dim(Ker(T - ).J)a).
Applying the previous argument to Eq. (6) as we did with Eq, (2), we obtai
a system of elements of the form (3), where p is replaced by p - 1,and havin
similar properties. It remains to denote
=
xercise 3. Letthe subspace ~I Ker(T - .1.J)"'i be defined as in Theorem
.tt~~1 = span{(T - ).J)"(xg~i)}{;;~, 1 SiS t p - 1> .2.3.Show that for i = 1,2, ... , s,
to obtain cyclic Jordan subspaces of dimension p - 1. Ker(T - .1.;1)",,-1 c Ker(T - Allr == Ker(T - Allr+ r (9)
Continuing this process, we eventually arrive at the relationship
or any positive integer r, and the inclusion is strict.
9'1 +span{x~)}~"'i"-I+'''+12 = [/1., In other words, for A = A/, the number p of Theorem 1 coincides with m,
where the vectors x~l), with i == 1,2, ... , L!=%
t j , denote linearly independ d the generalized eigenspace of T associated with the eigenvalue AI is just
generalized eigenvectors of T of order 2. As before, it can be shown that t Ker(T - All)"",
elements x~) = (T - ll)(Il)xy) (k == 0, 1) are linearly independent membe ere ml is the index of Ai> lSi S s.
of [/1 and, together with a few other linearly independent elements fro
[/1> form a basis {x~)H= 1 (t = L!=
1 t p ) in 9'1' For i =
1,2, ... , t, we write. 'LuTION. Let PJ(A.) == (A. - AJ)"'J so that, as in Eq. (6.2.2),
.ttl) = span{x~'+ ...+ 12+1)}, 1 s; i s t 1 m(A.) == Pl(A.)P2(A) .. p.(A.)
The system of elements built up above constitute a basis in 9',. Indee the minimal polynomial for T and, by Theorem 6.1.4,
{x~}H= 1 (t = L~= 1 t J) is a basis in [/1> while Eq.(7) shows that the union'
{x~)}I=l and {x~)}I~,;'i,,-I+.+t2 generates a basis in [/2' and so on. N [/ == L .Ker ptT ). (10)
J=1
also (see Eqs. (2), (6), and (7 that for r = p, P - 1, ... , 1,
ow define Pi(A.) == (A. - Aifp!) and PtA) == ptA) if j :# i. Then by
tp + tp - 1 + ... + t; = dim ,Y,. - dim 'y"-1> iroUary 2 of Theorem 6.2.1 the polynomial .
and therefore the numbers t J (1 sis p) are uniquely defined by T and /fl(A.) == PI0.)fJ2(A) ", P.(A)
The proof is complete.
ihilates Tand Theorem 6.1.4 applies once more to give
Exercise 1. Find a Jordan basis in $'2 associated with the transformati a

Tdiscussed in Exercises 4.7.7 and 4.9.5. 9' = L . Ker PJ{T)


J=l

SOLUTION. The transformation T has only one eigenvector, Xl = [1 0\ mparing with (10) it is clear that Ker piT) == Ker MT), as required.
corresponding to the eigenvalue A == 1. Hence there is a generalized eigl early Ker(T - Aj l ) ".. - l c Ker PI(T) and if equality obtains then the
vector X2 such that span{xb X2} = jF2. Indeed, writing X2 [oc PJT, = lynomial q(A) ~ m(AXA - .1.j)-1 annihilates T. This is because
follows from T(X2) == X2 + Xl that
Ker q(T) == L . Ker PJ(T) = ~
[oc; P] == [;] + [~]. J= 1

I
h implies q(T) ::;:: O. But this contradicts the fact that meA) is the minimal
and consequently p = 1. Thus, any vector [oc IY
(oc E IF) is a general' omial for T. So the inclusion in (9) must be strict. 0
eigenvector of T and a required Jordan basis is, for instance, {[I .ecalling the result of Exercise 6.3.5, Theorem I allows us to develop the
[0 I]T}. It of Theorem 6.2.3 further with the following statement.

Theorem 2. Using notation of Theorem 6.2.3 and defining T_i = T|_{𝒢_i},

    T_i = Σ·_{j=1}^{t^{(i)}} T_j^{(i)},   1 ≤ i ≤ s,                     (11)

where t^{(i)} = dim(Ker(T − λ_i I)) and the transformations T_j^{(i)},
j = 1, 2, ..., t^{(i)}, act in cyclic Jordan subspaces.

The decomposition (6.2.3) together with Eq. (11) is said to be the Jordan
decomposition of the transformation T.


6.5 The Jordan Theorem

Consider a linear transformation T acting in a finite-dimensional space
𝒮. In view of Corollary 1 to Theorem 6.2.4, the problem of finding the
canonical representation of T is reduced to that for a transformation T_i
(1 ≤ i ≤ s) acting in the typical generalized eigenspace Ker(T − λ_i I)^{m_i}
(m_i being the index of λ_i). Then Theorem 6.4.2 states that, for this purpose, it
suffices to find a simple representation of the transformations T_j^{(i)} (1 ≤ j ≤ t^{(i)}),
acting on cyclic Jordan subspaces.
Let 𝒥 denote a cyclic Jordan subspace for T spanned by

    (T − λ_i I)^{r−1}(x), ..., (T − λ_i I)(x), x,                        (1)

where λ_i ∈ σ(T) and x is a generalized eigenvector of T of order r associated
with λ_i. Denote x_j = (T − λ_i I)^{r−j}(x) (j = 1, 2, ..., r) and observe [see
Exercise 6.3.2 and Eqs. (6.3.3)] that T(x_1) = λ_i x_1 and

    T(x_j) = λ_i x_j + x_{j−1},   j = 2, 3, ..., r.                      (2)

Hence, comparing Eq. (2) with the definition of the representation
J_i = [α_{jk}]_{j,k=1}^{r} ∈ ℂ^{r×r} of T|_{𝒥} with respect to the basis (1), that is,

    T(x_j) = Σ_{l=1}^{r} α_{lj} x_l,   j = 1, 2, ..., r,

we obtain

          [ λ_i   1                  ]
          [      λ_i   1             ]
    J_i = [            ⋱    ⋱       ]                                  (3)
          [                 λ_i   1  ]
          [                      λ_i ]

where all superdiagonal elements are ones and all unmarked elements are
zeros. The matrix in (3) is referred to as a Jordan block (or cell) of order r
corresponding to the eigenvalue λ_i. Thus, the Jordan block (3) is the representation
of T|_{𝒥} with respect to the basis (1), which is a Jordan chain
generating 𝒥.

Exercise 1. Show that the Jordan block in (3) cannot be similar to a
block-diagonal matrix diag[A_1, A_2, ..., A_k], where k ≥ 2.

SOLUTION. If, for some nonsingular P ∈ ℂ^{r×r},

    J = P diag[A_1, A_2] P^{−1},

then the dimension of the eigenspace for J is greater than or equal to 2 (one
for each diagonal block). This contradicts the result of Exercise 6.3.6.  □

Thus, a Jordan block cannot be "decomposed" into smaller blocks by any
similarity transformation.
Let T ∈ ℒ(𝒮) with ℱ = ℂ, and write σ(T) = {λ_1, λ_2, ..., λ_s}. By Theorem
6.2.3, T = Σ·_{i=1}^{s} T_i, where T_i = T|_{𝒢_i} and 𝒢_i = Ker(T − λ_i I)^{m_i} (1 ≤ i ≤ s).
Choosing in each 𝒢_i a Jordan basis of the form (6.4.1), we obtain a Jordan
basis for the whole space 𝒮 as a union of Jordan bases, one for each generalized
eigenspace. This is described as a T-Jordan basis for 𝒮. We now appeal
to Theorem 6.4.1.

Theorem 1. In the notation of the preceding paragraph, the representation of
T with respect to a T-Jordan basis is the matrix

    J = diag[J(λ_1), J(λ_2), ..., J(λ_s)],                               (4)

where for 1 ≤ i ≤ s,

    J(λ_i) = diag[J_p^{(i)}, ..., J_p^{(i)}, J_{p−1}^{(i)}, ..., J_{p−1}^{(i)}, ..., J_1^{(i)}, ..., J_1^{(i)}],   (5)

in which the j × j matrix J_j^{(i)} of the form (3) appears t_j^{(i)} times in Eq. (5). The
numbers t_j^{(i)} (i = 1, 2, ..., s; j = 1, 2, ..., p) are uniquely determined by the
transformation T.

It is possible that t_j^{(i)} = 0 for some j < m_i, in which case no blocks of size j
appear in Eq. (5).
Note that the decomposition of J in Eqs. (4) and (5) is the best possible in
the sense of Exercise 1. The representation J in Eqs. (4) and (5) is called a
Jordan matrix (or Jordan canonical form)† of the transformation T. We now
apply Corollary 2 of Theorem 6.2.4 to obtain the following.

Corollary 1. Any complex square matrix A is similar to a Jordan matrix
that is uniquely determined by A up to the order of the matrices J(λ_i) in Eq. (4)
and the Jordan blocks in Eq. (5).

† Jordan, C., Traité des substitutions et des équations algébriques, Paris, 1870 (p. 125).
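The structure described by Eqs. (3)–(5) is easy to generate mechanically. The
sketch below is an added illustration (not part of the original text, and the block
sizes are chosen here only as an example): it builds an r × r Jordan block and
assembles a block-diagonal Jordan matrix, assuming sympy is available.

    # Illustration (not from the text): assembling a Jordan matrix from Jordan blocks
    from sympy import Matrix, diag, eye

    def jordan_block(lam, r):
        # r x r block: lam on the diagonal, ones on the superdiagonal (Eq. (3))
        J = lam * eye(r)
        for k in range(r - 1):
            J[k, k + 1] = 1
        return J

    # e.g. one 1 x 1 block with eigenvalue 2 and one 2 x 2 block with eigenvalue 4
    J = diag(jordan_block(2, 1), jordan_block(4, 2))
    print(J)   # Matrix([[2, 0, 0], [0, 4, 1], [0, 0, 4]])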

Example 2. Consider the matrix

    A = [  6  2  2 ]
        [ −2  2  0 ].
        [  0  0  2 ]

Since det(A − λI) = (2 − λ)(λ − 4)^2, it follows that σ(A) = {2, 4}. It is
easily seen that the matrix A has one (linearly independent) eigenvector, say
x_1 = [0  −1  1]^T, corresponding to the eigenvalue λ_1 = 2 and one (linearly
independent) eigenvector, say x_2 = [2  −2  0]^T, associated with λ_2 = 4.
From the equation (A − 4I)x_3 = x_2 it is found that x_3 = [1  0  0]^T is a
generalized eigenvector of A of order 2 associated with λ_2 = 4. Hence a
Jordan basis for A is {x_1, x_2, x_3} and the matrix

    P = [x_1  x_2  x_3] = [  0   2  1 ]
                          [ −1  −2  0 ]
                          [  1   0  0 ]

transforms A into the Jordan form:

    A = P [ 2  0  0 ]
          [ 0  4  1 ] P^{−1}.   □
          [ 0  0  4 ]

The problem of determining the parameters of the Jordan matrix for
a given A ∈ ℂ^{n×n} without constructing the Jordan basis is considered in the
next section. We emphasize again that, in discussing the Jordan form of a
matrix A ∈ 𝔽^{n×n}, the field 𝔽 is assumed to be algebraically closed. Hence
n × n real matrices should be considered as members of ℂ^{n×n} in order to
guarantee the existence of the Jordan form presented here.

Exercise 3. Let A ∈ ℂ^{n×n} and define a set of Jordan vectors for A to be a set
of linearly independent vectors in ℂ^n made up of a union of Jordan chains
for A. Show that a subspace 𝒮_0 ⊂ ℂ^n is A-invariant if and only if 𝒮_0 has a
basis consisting of a set of Jordan vectors. (In other words, every A-invariant
subspace is a direct sum of subspaces that are cyclic with respect to A.)

Hints. Given that 𝒮_0 has a set of Jordan vectors as a basis, use
Exercise 6.3.5 to show that 𝒮_0 is A-invariant. For the converse, let
X = [x_1  x_2  ···  x_r] be an n × r matrix whose columns form an arbitrary
basis for 𝒮_0. Observe that AX = XG for some G ∈ ℂ^{r×r}. Write G = SJS^{−1},
where J is a Jordan matrix, and examine the columns of the matrix XS.

The following exercise shows that there are, in general, many similarities
which will reduce a given matrix to a fixed Jordan normal form, and also
shows how they are related to one another.

Exercise 4. Show that if A = SJS^{−1} and A = TJT^{−1}, then T = SU for
some nonsingular U commuting with J.
Conversely, if U is nonsingular and commutes with J, A = SJS^{−1}, and if
we define T = SU, then A = TJT^{−1}.   □


6.6 Parameters of a Jordan Matrix

Let A ∈ ℂ^{n×n} and let J, given by Eqs. (6.5.4) and (6.5.5), be a Jordan
canonical form of A, that is, A = PJP^{−1} for some invertible P. In this section
we shall indicate some relations between the multiplicities of an eigenvalue
and the parameters of the corresponding submatrices J(λ_i) of J.

Proposition 1. The dimension of the generalized eigenspace of A ∈ ℂ^{n×n}
associated with the eigenvalue λ_i ∈ σ(A) [that is, the order of the matrix
J(λ_i) in Eq. (6.5.4)] is equal to the algebraic multiplicity of λ_i.

PROOF. Let A = PJP^{−1}, where J is given by Eq. (6.5.4). The characteristic
polynomials of A and J coincide (Section 4.11) and hence

    c(λ) = det(λI − J) = ∏_{i=1}^{s} det(λI − J(λ_i)).

In view of Eqs. (6.5.5) and (6.5.3),

    c(λ) = ∏_{i=1}^{s} (λ − λ_i)^{q_i},                                  (1)

where q_i is the order of J(λ_i) (1 ≤ i ≤ s). It remains to recall that q_i in Eq. (1)
is, by definition, the algebraic multiplicity of λ_i.

Note that the number of Jordan blocks in J(λ_i) is equal to the number of
rows in (6.4.1), that is, to the geometric multiplicity of the eigenvalue λ_i,
so that it readily follows from (6.4.1) and Proposition 1 that the geometric
multiplicity of an eigenvalue does not exceed its algebraic multiplicity
(the total number of elements in (6.4.1)). However, these two characteristics
uniquely determine the Jordan submatrix J(λ_i) associated with λ_i only in
the cases considered next.

Proposition 2. The matrix A is simple if and only if the geometric multiplicity
of each of its eigenvalues is equal to the algebraic multiplicity.
Equivalently, A is simple if and only if the minimal polynomial of A has only
simple zeros.

PROOF. A matrix is simple if and only if it is similar to a diagonal matrix (see In cases different from those discussed above, the orders of the Jordan
Theorem 4.8.2'). Hence, in view of Exercise 6.5.1 and Theorem 6.5.1, the blocks are usually found by use of Eq. (6.4.8) and Exercise 6.4.2, which
corresponding Jordan matrix must be diagonal and therefore consists of require the determination of dim(Ker(A - U)") for certain positive integers s.
1 x 1 Jordan blocks. Thus, recalling Proposition 1 and the foregoing
discussion, the multiplicities of the matrix coincide. Proceeding to the second Exercise S. Determine the Jordan form of the matrix
statement, we note that" the multiplicity of an eigenvalue as a zero of the
minimal polynomial (the index) is equal to the maximal order of Jordan
blocks associated with the eigenvalue (remember the meaning of pin (6.4.1).
20 0 0]
-3 2 0 J
This observation completes the proof.
[
A= . 0 0 2 1 .
Note that in this case the matrix has no proper generalized eigenvectors.

Exercise 1. Show that if the geometric multiplicity of an eigenvalue AI is


equal to one, then the Jordan matrix has only one Jordan block associated
.'
with A/. The order of this block is equal to the algebraic multiplicity of the
o 0 0 2
SoLUTION. Obviously, the characteristic polynomial is c(A) = (..t - 2)4 and
dim(Ker(A - 21)) = 2. Hence the Jordan form consists of two blocks.
To determine their orders (2 and 2, or 3 and 1), we calculate that (A - 21)2
=0 and, therefore, the number k 2 in Exercise 6.4.2 is equal to 2. Hence
eigenvalue. 0
When Exercise I holds for all the eigenvalues, the minimal and the
k2 = 2/2 -
of order 2:
'I -
13 = 8 - 2 - 4 = 2, and consequently there are two blocks
characteristic polynomials coincide. The converse is also true. I

Exercise 2. Check that the minimal and the characteristic polynomials


of a matrix coincide if and only if its Jordan matrix has only one Jordan block
J = diag[[~ ~J. [~ ~]l 0
associated with each distinct eigenvalue of tbe matrix.
There is a concept that summarizes the parameters of the Jordan form.
Hint. See the proof of Proposition 2. 0 Let A E e"" n and let A., ,1.2' ... ,A. denote all its distinct eigenvalues of
geometric multiplicities Ph P2' ... ' Ps, respectively. Thus, there are Pi
A matrix A ElF"" n is said to be nonderogatory if its characteristic and Jordan blocks in the Jordan form J of A associated with the eigenvalue AJ
minimal polynomials coincide; otherwise it is derogatory. Thus, Exercise 2, (1 S j S s). Denote by IS}I' :<!!: cSj2) :<!!: ~ ISjPJI the orders of these blocks and
reveals the structure of nonderogatory matrices. write

Exercise 3. Check that a companion matrix is nonderogatory. {(cS\I', fJ\2), " ., fJY'I), (cS~I), fJ~2), ., cS!fz), ., . , (fJ~I), fJ~2), .. , fJ~P.)}. (2)

Exercise 4. Find a Jordan matrix J ofthe matrix A having the characteristic The form (2) is referred to as theSegre characteristic t of the matrix A.
polynomial c(A) = (A - 1)3(.;[ - 2t if the geometric rnultiplicitiesare also Obviously this characteristic, along with the eigenvalues of the matrix,
known: dim(Ker(A - I
= 2 and dim(Ker(A - 21)) = 3. uniquely defines its Jordan form. Note that the numbers qi = ,WI + fJ}2 1
+ '" + {)Y'JI, j = I, 2, ... , s, are the algebraic multiplicities of the eigen-
SOLUTION. Consider Jordan blocks of A associated with Al - 1. The' values A.. A2, ... , As, respectively.
geometric mUltiplicity 2 means that there are two blocks associated with'
this eigenvalue, while the algebraic multiplicity 3 implies that one of the' Exercise 6. Verify that the Segre characteristic of the matrices discussed
blocks must be of order 2. Reasoning in a similar way for the second eigen-; in Exercises 4 and 5 are, respectively,
value, we obtain !

{(2, 1), (2, I, I)} and {(2), (2)}. o


    J = diag[ [1 1; 0 1], [1], [2 1; 0 2], [2], [2] ].   □

† Segre, C., Atti Accad. naz. Lincei, Mem. III, 19 (1884), 127–148.
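The quantities used in Exercise 5 are easily computed: l_m = dim Ker(A − λI)^m
= n − rank(A − λI)^m and then k_m = 2l_m − l_{m−1} − l_{m+1}. The sketch below
(an added illustration, not part of the original text; it assumes sympy) applies
this to the 4 × 4 matrix of Exercise 5.

    # Illustration (not from the text): number of Jordan blocks of each order for
    # the eigenvalue 2 of the matrix in Exercise 5, via k_m = 2*l_m - l_{m-1} - l_{m+1}
    from sympy import Matrix, eye

    A = Matrix([[ 2, 0, 0, 0],
                [-3, 2, 0, 1],
                [ 0, 0, 2, 1],
                [ 0, 0, 0, 2]])
    lam, n = 2, 4
    l = [n - ((A - lam * eye(n))**m).rank() for m in range(n + 2)]  # l_0, ..., l_{n+1}
    k = [2 * l[m] - l[m - 1] - l[m + 1] for m in range(1, n + 1)]
    print(l)   # [0, 2, 4, 4, 4, 4]
    print(k)   # [0, 2, 0, 0] -- two Jordan blocks of order 2 and no others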

6.7 The Real Jordan Form We confirm our claim simply by verification. We confine attention to the
case r = 3; the modifications for other values of r are obvious. Define the
unitary matrix
Let A e Rft" ft. If A is viewed as a complex matrix, then it follows fro
Theorem 6.5.1 that A is similar to a Jordan matrix as defined in Eqs, (6.5~ 1 -i 0 o 0 0
and (6.5.5). However, this Jordan matrix J, and the matrix P for which 0 0 1 -i 0 0
r:' AP = J, will generally be complex. In this section we describe a normal
form for A that can be achieved by confining attention to real transforming U=~ I~ o 0 0
i 0 o
1 -i
0 0
matrices P, and hence a real canonical form. To achieve this, we take advan- 0 0 1 i 0 0
tage of the (complex) Jordan form already derived and the fundamental fact
- 0 o 0 0 1
(see Section 5.9) that the eigenvalues of A may be real, and if not, that they
appear in complex conjugate pairs. Recallingthat A = /l + iro, it is easily verified that
We shall use the construction of a (complex) Jordan basis for A from tbi
Jl. ro 10 0 0
preceding sections to obtain a real basis. The.vectors of this basis will Cor'
the columns oC our real transforming matrix P.
-(J)
Jl. 0 1 0 0
u*e~) ~1)]U = I
First, if A. is a real eigenvalue of A, then there is a real Jordan basis Cor t 0 0 /l ro 1 0
J,(1l, 1) = o
generalized eigenspace of Il, as described in (6.4.1); these vectors will form 0 -(J)
Jl. 0 1
part of our new real basis for Rft If A. e o(A) and is not real, then neither 0 0 0 0 /l ro
1 E a(A). Construct Jordan chains (necessarily non-real) to span the genen 0 0 0 o -ro /l
ized eigenspace of Aas in (6.4.1). Thus, each chain in this spanning set satisfi
relations of the form (6.3.3) (with T replaced by A). Taking complex con] Furthermore, with the Cartesian decompositions x j = Yj + iZj for j = 1,2.,
gates in these relations, the complex conjugate vectors form chains spannin .; . , r, we get
e-
the generalized eigenspace orl. ..
Q = [P p]U = j2[Yl %1 Y2 %2 Y3 Z3],
It will be' sufficient to show how the vectors for a single chain of length J
together with those of the conjugate chain, are combined to produce t . ich is real and of full rank since [P P] has full rank and U is unitary.
desired real basis for a 2r-dimensional real invariant subspace of A. ese vectors span a six-dimensional real invariant subspace of A associated
A. be the eigenvalue in question and write A. = Jl. + lro, where Jl., (J) are rei ith the non-real eigenvalue pair Il, 1 (see Theorem 5.9.1). This statement
.and ro :F O. Let P be the n x r complex matrix whose columns are t clear from the relation AQ = QJ,(,i, X). Our discussion leads to the fol-
vectors XI> .1'2' ' x r of a Jordan chain corresponding to A. If J(..1.) is t wingconclusion.
r x r Jordan block with eigenvalue A, then AP = PJ(A) and, taking conj
gates, AP = PJ(1). Thus, ,orem 1. If Ae Rn"ft, there is a nonsingular matrix X E RR"ft such that

A[P P] = [P
- [J(Il)
P] 0
0]
J(1)'
- 1 AX
'rms:
is a real block-diagonal matrix, each block of which has one of two

(a) the form of (6.5.3), in which case the size of the block is equal to the
and for any unitary matrix U of size 2r, ength of a complete Jordan chain of a set (6.4.1) corresponding to a real
1genvalue;
A[P p]U = [P p]UU*[J(A)
0 0] U.
J(A) (b) theform J,(A, 1) indicated above in which case the size of the block is
'ualto twice the length ofa (complex) complete Jordan chain ofa set (6.4.1)
We claim that there is a unitary matrix U such that the 2r columns of Q '.orresponding to a non-real eigenvalue A :;; Jl. + ko.
[P p]U are real and such that J,{1l, 1) A U* diag[J(A), J(1)]U is also.
The matrix J,(A.,1) then plays the same role in the real Jordan form for Describe the real Jordan form for a matrix that is real and
that diag[J(A), J(I)] plays in the complex form. i.W

6.8 Miscellaneous Exercises 9. If meA) is the minimal polynomial of A E C"K nand P(A) is a polynomial
over C, prove that P(A} is nonsingularif and only if meA) and peA) are
relatively prime.
1. Prove that two simple matrices are similar if and only if they have the
same characteristic polynomial.
10; If lXb <X2' <X3 E R and are not all zero, show that the characteristic and
minimal polynomials of the matrix
2. Show that any square matrix is similar to its transpose.
3. Check that the Jordan canonical form of an idempotent matrix (&:. <Xo OC1 OC2 <X 3]
projector) is a diagonal matrix with zeros and ones in the diagonal -<Xl <Xo -<X3 2
positions. [
-<X2 <X3 <Xo -OCI
4. Show that - OC3 - OC2 OC I OCo

(a) the minimal and the characteristic polynomials of a Jorda are


block of order r are both (A - Ao)' ;
0.2 - 2oco.1 + (lX~ + <Xi + oci + <XmJ
(b) the minimal polynomial of a Jordan matrix is the least commo
multiple of the minimal polynomials of the Jordan blocks. withj = 2 andj = I, respectively. I
5. Show that a Jordan block is similar to the companion matrix associat
with its characteristic (or minimal) polynomiaL
Hint. First reduce to the case <Xo =0 and show that in this case
Al = i(oci + oci + ocW/ 2 is an eigenvalue with eigenvectors
I
I

[~J [~l
Hint. For a geometrical solution: Show that the representation
TI.1 with respect to the basis {x, T(x), ... T,-l(X)} for a Jord
subspace .I is a companion matrix.
6. Prove that any matrix A E C" "It can be represented as a sum of a simp,
matrix and a nilpotent matrix.
and

Let A E en l(". Show that A = A-I if and only if there is an m ~ n and


l
Hint. Represent an r x r Jordan block in the form All + N, whe: a nonsingular 8 such that .
N = [c5J+1,lc]J,lc~1t and observe that N' = O.
7. Check that the matrices
A = 8[1o m
0
-1"_,,,
]8-1.
au ...
.,. ] 00
0
1 o ... 0 ; Hint. Use the Jordan form for A.

'.... '.
and 0
a,-l"
[I 0 00
0 ... 0
1
0
are similar, provided Ql,l+ 1 :F 0, i = 1, 2, ... , r - 1.
8. Check that for an r x r Jordan block J associated with an eigenval
Ao, and for any polynomial P(l.),
P(1.0} p'(l.o)/ll .... p(,-ll(l.o!/(r - I}l]
o '.'
p(J) =:
[
... ... p'(.1~}/1!
o ... 0 p().o}

and suppose that 1 and the coefficients of the polynomials ac,ll) are from a
field:F, so that when the elements of A(A) are evaluated for a particular value
CHAPTER 7 of l, say 1 = A.a, then A(lo)enxn. The matrix A(l) is known as a matrix
polynomial or a lambda matrix, or a A-matrix. Where the underlying field
needs to be stressed we shall call A(,t) a ,t-matrixover. Note that an example
of a ,t-matrix is provided by the matrix AI - A and that the matrices whose
elements are scalars (i.e., polynomials of degree zero, or just zeros) can be
Matrix Polynomials viewed as a particular case of ,t-matrices.
and Normal Forms A ,t-matrix A(,t) is said to be regular if the determinant det A(,t) is not
identically zero. Note that det A(l) is itself a polynomial with coefficients
in :F, so A(,t) is not regular if and only if each coefficient of det A(,t) is the
zero of fF. The degree of a A-matrix A(l) is defined to be the greatest degree
of the polynomials appearing as entries of A(,t) and is denoted by deg A(,t).
Thus, deg(M - A) = 1 and deg A = 0 for any nonzero A enx n.
Lambda matrices of the same order may be added and multiplied together
in the usual way, and in each case the result is another ,t-matrix. In particular,
a l-matrix A(l) is said to be invertible if there is a A-matrix B(A) such that
A(A}B(,t) = B(l}A(A} = 1.In other words, A(A} is invertible if B(A} = [A(A}]-1
The important theory of invariant polynomials and A-matrices developed existsand is also a A-matrix.
in this chapter allows us to obtain an independent proof of the possibility
Proposition I, A l-matrix A(A} is invertible if and only if det A(A) is a non-
of reducing a matrix to the Jordan canonical form, as well as a method of zeroconstant.
constructing that form directly from the elements of the matrix. Also, other
normal forms of a matrix are derived. This approach to the Jordan form can PROOF. If det A(A} = c - 0, then the entries of the inverse matrix are equal
be viewed as an algebraic one in contrast to the. geometrical approach' to theminors of A(l} of order n - 1 divided by c-O and hence are poly-
developed in Chapter 6. It requires less mathematical abstraction than the 'nomials in 1. Thus [A(l}] -1 is a A-matrix. Conversely, if A(A} is invertible,
geometrical approach and may be more accessible for some readers. In then the equality A(,t}B(A} = I yields
addition, the analysis of matrix polynomials is important in its own right.
Direct applications to differential and difference equations are developed in det A(A} det B(l} = 1.
this chapter, and foundations are laid for the spectral theory of matrix
polynomials to be studied in Chapter 14. Thus, the product of the polynomials det A(A} and det B(l} is a nonzero
constant. This is possible only if they are each nonzero constants.
A l-matrix with a nonzero constant determinant is also referred to as a
unimodular l-matrix. This notion permits the restatement of Proposition 1:
7.1 The Notion of a Matrix lPolyn~mial A A-matrix is invertible if and only if it is unimodular. Note that the degree of a
unimodular matrix is arbitrary.
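Whether a given λ-matrix is unimodular can thus be decided by a single determinant
computation. The following is a minimal sketch (an added illustration, not part of
the original text; the matrices are chosen here for the purpose and sympy is assumed).

    # Illustration (not from the text): a lambda-matrix is invertible as a lambda-matrix
    # exactly when det A(lam) is a nonzero constant (Proposition 1)
    from sympy import Matrix, symbols, simplify

    lam = symbols('lam')

    U = Matrix([[1, lam**2, lam],
                [0, 1,      lam**3],
                [0, 0,      1]])      # det U = 1: unimodular, so U**-1 is again a lambda-matrix
    V = Matrix([[lam, 1],
                [0,   1]])            # det V = lam: regular, but not unimodular

    print(simplify(U.det()), simplify(V.det()))   # 1  lam
    print(U.inv())                                # all entries are polynomials in lam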

Exercise 1. Check that the following A.-matrices are unimodular:

r -ll']
Consider an n x n matrix A(A) whose elements are polynomials in A:
r,,(l) .,,(1) ... .,,(1)]
A}
= Q21 (A.)
azz(l}
.. .. . Qzn(,t} (1) 1 [<1A -_ 1}2 1]i I'
A(
: : '" .: ' A 1(A} = 0 1 .1.4
001
,
AiA) = 2 0
On 1(A) Qnz(l} ... a",,(l}

Unimodular .il.-matrices play an important role in reducing an arbitrary Furthermore, the degree of the product
Il-matrix to a simpler form. Before going on to develop this idea in the next
two sections, we wish to point out a different viewpoint on ,t-matrices. A(Il)B(Il) = AoBo + A(A 1Bo + AoB I) + ... + IlI+mAIB",
Let A(Il) denote an n x nil-matrix of degree I. The (i, j)th element of A(Il) does not exceed I + m. Clearly, if either A(A) or B(A) is a matrix polynomial
can be written in the form with an invertible leading coefficient, then the degree of the product is
exactly / + m.
aIJ{Il) = a111 + aU11l + ... + aWl', Suppose that B(,t) is a matrix polynomial of degree m with an invertible
and there is at least one element for which aW oF 0 (1 $ i,j $ n). If we let A, leading coefficient and that there exist matrix: polynomials Q(,t), R(,t), with
be the matrix whose i, jth element is al7 (r = 0, 1, ... , I for each i, }), then R(,t) == 0 or the degree of R(Il) less than m, such that
we have A, oF 0 and
A(,t) = Q(,t)B(Il) + R(,t).
A(Il) = Ao + AA1 + '" + Il'-I A'_I + ,t'A,. if this is the case we call Q(.il.) a right quotient of A(,t) on division by B(.il.)
This formulation motivates the term "matrix polynomial" as an alternative and R(,t) is a right remainder of A(,t) on division by B(,t). Similarly, 0(,1,),
for" Il-matrix." . R.(Il) are a left quotient and left remainder of A(,t) on division by B(,t) if
Exercise 2. Check that A(,t) = B(Il)(l(.il.) + R(,t)

[1l 1 Il~ --3,1,II. +- 2]1 [10 - 2]1 + .Il[12 -1] + 1l 1].


2 . and R(.il.) == 0 or the degree of R(.il.) is less than m.
+ II. + = 2 [1 0 If the right remainder of A(,t) on division by B(,t) is zero, then Q(,t) is
2.1l - 3 0 1
said to be a right divisor of A(,t) on division by B(,t). A similar definition
The description of ,i-matrices as matrix polynomials suggests problems ~pplies for a left divisor of A(l) on division by B(l).
concerning divisibility, the solution of polynomial equations, and other, In order to make these formal definitions meaningful, we must show that,
generalizations of the theory of scalar polynomials. Of course, the lack of. . given this A(,t) and B(,t), there do exist quotients and remainders as defined.
commutativity of the coefficients of matrix polynomials causes some diffi: :When we have done this we shall prove that they are also unique. The proof
culties and modifications in this program. However, an important and usefu.~ ,be the next theorem is a generalization of the division algorithm for scalar
theory of matrix polynomials has recently been developed, and an introduc-, polynomials.
tion to this theory is presented in Chapter 14. '
In the next section we present some fundamental facts about divisibiliq! theorem 1. Let A(,t) = L!=o ,tIAi> B(,t) = Ll':-o .il.IBI be n x n matrix poly-
of matrix polynomials needed for use in this and subsequent chapters. nomials ofdegrees I and m, respectively, with det B", O. Then there exists
a right quotient and right remainder of A(,t) on division by B(,t) and similarly
Jar a left quotientand left remainder.

1.2 Division of Matrix Polynomials ~R.ooF. If I < m, we have only to put Q(Il) = 0 and R(Il) = A(,t) to obtain
the result.
If I ~ m, we first "divide by" the leading term of B(,t), namely, B",.il.'"
Let A(Il) = L~=o AlA, be an n x n matrix polynomial (i.e., Al E r"~ Pbserve that the term of highest degree of the ,t-matrix AlB; l,tI-"'B(,t) is
i degree I. If A, = 1, then the matrix polynomial A(A) i~
= 0, 1, ... ,I) of just A,.il.I. Hence ,I
1
said to be monic. If B(Il) = LI"=o AIBI denotes an n x n matrix polynomial
of degree m, then obviously A(,t) = A,B;l,t'-"'B(.Il) + A(1I(II.),
Where A(l l(lI.) is a matrix polynomial whose degree, 110 does not exceed
A(Il) + B(Il) = "
L lli(A I + BI) , 1- 1. Writing A(1I(,t) in decreasing powers, let
i=O

where k = max(l, m). Thus, the degree of A(II.) + B(Il) doex not exceed k. AUI(Il) = AI:IIl'I + ... + A~II, AUI 0, II < I.
'I \i

I: \~l
~.
Ii;.
\,,
If 11 ~ m we repeat the process, but on A(1)(.t) rather than A(.t) to obtain J.rxercise 1. Examine the right and left quotients and remainders of A(,t)
on division by B(l), where
A{l)(.Il) = AI:)B,;;I.Il It- mB(l) + A(2)(1),
where A(l) = [1 + 1+
4 2
A-I
3 2
A +.Il + 1 + 2]
A(2)(.Il) = A1~).Il'2 + ... + Alf), Al~) :f:. 0, 12 < 'I' 1
2
+ 1 1]
2,\3 - ,\ 2A2 + 2A '

In this manner we can construct a sequence of matrix polynomials A(A), B(A) = [ ,\ ).2 +). .
A(1)(A), A(2)(A), ... whose degrees are strictly decreasing, and after a finite
number of terms we arrive at a matrix polynomial A(r)(l) of degree l, < m,
SOLUTION. Note first of all that B(,1.) has an invertible leading coefficient.
with 1,_ 1 ~ m. Then, if we write A(.t) = A(O)(.t), It is found that
A(-l)(..t)=AI:~,1)B';;I.t,.,-,-mB(.t)+A()(..t), . s= 1,2, ... ,r.
Combining these equations, we have
.
2
A(A) = [A -
2..t
1 1]..t -
2
2
[..t +
..t
1 1
,1.2 +.11.
]
+
[2..t 2..t + 3]
-S..t -2;'
A(;') = (A,B,;;I..t,-m + AI:)B,;;I,1.It-m + ...
= Q(;')B(.Il) + R(;'),
+ Al~~,1)B';;I..t'r-I-m)B(..t) + A(r)(,1.).

[;,2..t+ 1 ..t 1] 1]
The matrix in parentheses can now be identified as a right quotient of A(.t)
on division by B(,1.), and A(r)(,1.) is the right remainder. ..
A(..t) = 2 +
[;'2
A .II. _ 1
.Il +
1 = B(.II.)(2(..t).
It is not difficult to provide the modifications to the proof needed to prove I
the existence of a left quotient and remainder. i
Thus, (2(A) is a left divisor of A(..t). 0 !
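The construction used in the proof of Theorem 1 is entirely mechanical and is easy to
program. The sketch below is an added illustration (not part of the original text); the
test matrices are chosen here, not those of Exercise 1, and sympy is assumed. It computes
a right quotient and right remainder when the divisor has an invertible leading coefficient.

    # Illustration (not from the text): right division A(lam) = Q(lam)B(lam) + R(lam), deg R < deg B
    from sympy import Matrix, symbols, zeros, eye, degree

    lam = symbols('lam')

    def poly_degree(M):
        ds = [degree(e, lam) for e in M if e != 0]
        return max(ds) if ds else -1

    def coeff_matrix(M, d):
        return M.applyfunc(lambda e: e.expand().coeff(lam, d))

    def right_divide(A, B):
        m = poly_degree(B)
        Bm_inv = coeff_matrix(B, m).inv()   # leading coefficient of B must be invertible
        Q, R = zeros(*A.shape), A.expand()
        while poly_degree(R) >= m:
            d = poly_degree(R)
            term = coeff_matrix(R, d) * Bm_inv * lam**(d - m)
            Q = Q + term
            R = (R - term * B).expand()     # the degree of R drops at each step
        return Q, R

    # a made-up example with B(lam) = lam*I - C
    B = lam * eye(2) - Matrix([[1, 2], [3, 4]])
    A = Matrix([[lam**3 + 1, lam], [2*lam**2, lam + 5]])
    Q, R = right_divide(A, B)
    assert (Q * B + R - A).expand() == zeros(2, 2)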
r ;I
:'tJ
Theorem2. With the hypotheses of Theorem 1, the right quotient, right, ~" Now we consider the important special case in which a divisor is linear
'j

remainder, left quotient, and left remainder are each unique. ., (i.e., a matrix polynomial of the first degree). But first note that when discus-
sing
,:0,
a scalar polynomial p(.II.), we may write
PROOF. Suppose that there exist matrix polynomials Q().), R(,1.) and Ql(.t), \

R 1(,1.) such that . p().) = a,A' + a,_I).'-1 + ... + ao = .Il'a, + ).'-l a'_1 + ... + ao.
A(,1.) = Q(,1.)B().) + R(.II.) For a matrix polynomial with a matrix argument, this is not generally pos-
and sible. If A(A) is a matrix polynomial over g; and BE~nl<n, we define the
tightvalue A(B) of A()') at B by
A(.II.) = QI(A.)B(;') + R 1(.Il)

where R(A),RIO.) each have degree less than m. Th.en A(B) = A,B' + A,_IBI-I + ... + A o

(Q(A) - Ql(AB()') = R 1( ).) - R()'). and 'the left value A(B) of A(l) at B by

If Q(.II.) :f:. QI(I), then the left-hand side of this equation is a matrix pol ,4(B) = B'A, + B'-IA,_ I + ... + Ao.
nomial whose degree is at least m. However, the right.haIl~ side is a mat
polynomial of degree less than m. Hence Ql(l) = Q().), whence R 1( ). ) = R(J . The reader should be familiar with the classical remainder theorem: On
also. 'ividing the scalar polynomial P(l) by I - b, the remainder is P(b). We now
A similar argument can be used to establish the uniqueness of the Ie 'rove a remarkable extension of this result to matrix polynomials. Note
quotient and remainder. rst of all that AI - B is monic.

Theorem 3. The right and left remainders of a matrix polynomial A(l) jis, by definition, an n x n matrix polynomial over F ofdegree n - 1
division by )J - Bare A(B) and A(B), respectively. that
(U - A)B(A) = B(A)(U - A)= c(,l)I. (1)
PROOF. The factorization
',w, cO.)I is a matrix polynomial ofdegree n that is divisible on both the
All - BJ = (AJ-II + AJ- 2B + ... + ,lBJ-2 + B)-I)()J - B) and the right.by U - A with zero remainder. It remains now to apply
,proUary 1 above.
can be verified by multiplying out the product on the right. Premultipl'
both sides of this equation by A J and sum the resulting equations' Irollary 1. Iff is any scalar polynomial over F and A e Fn II n, then there is
j = 1,2, .... I. The right-hand side of the equation obtained is of the fo lynomial p (depending on A) ofdegree less than nfor which f(A) = p(A).
C(AX.u - B), where C(,l) is a matrix polynomial. IF. Let f(A) = q(A)c(A) + r(A), where c(l) is the characteristic poly-
The left-hand side is
mial of A and r(A) is either zero or a polynomial of degree not exceeding
I I I , 1. Then f(A) = q(A)c(A) + r(A) and, by Theorem 4, f(A) = r(A).
L A)l) - L A)Bi = L A),V - L AJB) = A(A) - A(B).
)=1 j=1 )=0 j=O This result appears again later as a consequence of the definition of f(A)
more general classes of scalar functions f.
Thus,

A(A) = C(A)(,u - B) + A(B). ?


~i

The result now follows from the uniqueness of the right remainder on divisi 7.3 Elementary Operations and Equivalence
of A(A) by (U - B). ~;
The result for the left remainder is obtained by reversing the factors in .
initial factorization, multiplying on the right by A), and summing. . . this section our immediate objective is the reduction of a matrix
.ynomial to a simpler form by means .of equivalence transformations,
An n x n matrix X eFnlln such that A(X) = 0 (respectively,A(X) =) ichwe will now describe. This analysis is a natural extension of that
is referred to as a right (respectively, left) solvent of the matrix polyno: eloped in Sections 2.7 and 210 for matrices with entries from a field F
A()'). matrices whose entries are scalar polynomials and so no longer form a
.d. Mterwards. on confining our attention to the matrix polynomial
Corollary 1. The matrix polynomial A(A) is divisible on the right (resp, - A, the desired normal (canonical) forms for the (constant) matrix A
tively, left) by ),J - B with zero remainder if and only if B is a right (res
1be derived.
tively, left) solvent of A(,t). We start by defining the elementary row and column operations on a matrix
'Jynomial over F by an appropriate generalization of those (in fact,
ly of the third one) on a matrix with constant entries as described in
This result provides a second proof of the Cayley-Hamilton theore ion 2.7. The parentheses refer to the right elementary operations.
(see Theorem 6.2.5).
1) Multiply any row (column) by a nonzero c e F.
2) Interchange any two rows (columns).
Theorem 4. If A is a square ma~rix with characteristic polynomial C(,
then c(A) = O. In other words, A is a zero ofits characteristic polynomial.
3) Add to any row (column) any other row (column) multiplied by an
litrary polynomial b(A) over ,F.
PROOF. Let A e ' > I l l >I and let c(,t) denote the characteristic polynomial of It is easy to verify (comparing with Section 2.7) that performing an
Define the matrix B(l) = adj(U - A) and (see Section 26) observe entary row operation on an n x n matrix polynomial is equivalent to

premultiplication of the matrix polynomial by an n x n matrix of one of the As in Section 2.7, the matrices above are said to be elementary matrices
following three forms, respectively: of the first, second; or third kind. Observe that the elementary matrices can
be generated by performing the desired operation on the unit matrix, and
1 that postmultiplication by an appropriate elementary matrix produces an .
. elementary column operation on the matrix polynomial.
1 Exercise 1. Prove that elementary matrices are unimodular and that their
Etl) = I c inverses are also elementary matrices. 0
1 We can now make a formal definition of equivalence transformations: two
matrix polynomials A(l) and B(l), over ', are said to be equivalent, or to be
1 connected by an equivalence transformation, if B(l) can beobtained from A(l)
by a finite sequence of elementary operations. It follows from the equivalence
of elementary operations and operations with elementary matrices that
1 A(A) and B(l) are equivalent if and only if there are elementary matrices
E1(l), E 2(l), ... , E,,(l), Ek+ l(l), ... , Eil) such that
0 B(A) = E k( ).) ., E l(l)A(l)Ek+ leA) ... E.(A)
E(2) =
I or, equivalently,
1 B(l) = P(A)A(A)Q(A), (1)
0
where pel) = E,,(A) .. E 1(l) and Q(A) = EH 10.) .. E.(A) are matrix poly-
nomials over '. Exercise 1 implies that the matrices peA) and Q(A) are
unimodular and hence the following propositions.

1 Proposition 1. The matrix polynomials A(A) and B(l) are equivalent if and
only if there are unimodular matrices P(A) and Q(l) such that Eq. (1) is valid.
1 bel) The last fact readily implies that equivalence between matrix polynomials
Et13)(A) = is an equivalence relation.
1 Exercise 2. Denoting the fact that A(l) and B(l) are equivalent by A "'" B,
prove that
1 (a) A...., A;
(b) A "'" B implies B "'" A;
or (c) A "'" Band B "'" C implies A "'" C.
Exercise 3. Prove that any unimodular matrix is equivalent to the identity
matrix.

1 SOLUTION. If A(l) is unimodular, then A(A)[A(l)] -1 = I and hence relation


E~3)(l) = (1) holds with peA) = I and Q(l) = [A(A)] -1. 0
b(A) Exercise 2 shows that the set of all n x n matrix polynomials splits into
Subsets of mutually equivalent matrices. Exercise 3 gives one of these classes.
~amely, the class of unimodular matrices.
256 7 MATRIX PoLYNOMIALS AND NORMAL FOR A CANONICAL FORM FOIl A MATRIX PoLYNOMIAL 257

Exercise 4. Prove that a matrix polynomial is unimodular if and only i :onal matrix polynomial
it can be decomposed into a product of elementary matrices.
AoO~) = diag[at(}')' a2(}.), ... , anO.)], (1)
Hint. For the necessary part, use Exercise 3.
which aJ..A) is zero or a monic polynomial, i = 1,2, ... , n, and aJ{A) is
Exercise5. Show that the matrices ible by aj_l(Jl), j = 2, 3, .. , n. A matrix polynomial with these prop-

A(A) = [ .1.2 _ A
A A +_ 1
XZ 1] and B = [1 0]
0 0
s is called a canonical matrix polynomial. Note that if there are zeros
ong the polynomials aJ..A) then they must occupy the last places, since
nzero polynomials are not divisible by the zero polynomial. Furthermore,
are equivalent and find the transforming matrices P(A) and Q(A). ,me of the polynomials ap.) are (nonzero) scalars, then they must be
SoLUTION. To simplify A(A),add - (A - 1) times the first row to the secon al to 1 and be placed in the first positions of the canonical matrix, since
row, and then add -1 times the first column to the second. Finally, addin divisible only by 1. Thus, the canonical matrix is generally ofthe form
- Atimes the second column of the new matrix to the first and mterchangin
Ao(A) = diag[l, ... , 1, al(A), .. , a,,(A), 0, .... 0], (2)
columns, obtain the matrix B. Translating the operations performed intI
elementary matrices and finding their product, it is easily found that re aJ..A) is a monic polynomial of degree at least 1 and is divisible (for

Q(A) = [-1 1+
1 -,1.
A] P(A) = [1 0]
1- l 1
2, ... , k) by aj_l(A).'It is also possible that the diagonal of the matrix in
cOntains no zeros or ones.

and that Eq. (1) holds. Irem I. Any n x n matrix polynomial over , is equivalent to a
,1tical matrix polynomial.
Exercise 6. Show that the following pairs of matrices are equivalent: I

0 1 A] [1 00] 'F. We may assume that. A(A) .p. 0, for otherwise there is nothing to
(a)
[A2
A
- l
A
,t2 - 1 A? - 1
1 and 0 1 0 ;
0 0 0
ve, The proof is merely a description of a sequence of elementary trans-
ations needed to reduce successively the rows and columns of A(A)
"he required form.
(b)
.F
[ ,t _ 1
A + 1]
l1 an
d f10
L .1.4 -
02
.1. +1
]
. J. Let aIJ{,t) be a nonzero element of A(,t) of least degree; by inter-
ging rows and columns (elementary operations of type 2) bring this
Exercise 7. Show that ifE(A) is a unimodular matrix polynomial,t lent to the 1, 1 position and call it 411(.1.). For each element of the first
all minors of size two or more are independent of ,to 'and column of the resulting matrix, we find the quotient and remainder
SoLUTION. Observe that this is true for all elementary matrices. Exprl
ivision by 411(l):
E(A) as a product of elementary matrices and use the Binet-Cauchy form alJ..A) = al1(A)QiJ..l) + rlJ..A), j = 2,3, ... , n,
al1(A) = al1(,t)qI1(,t) + rH(,t), i = 2, 3, .. , n.
ow, for each i and j, subtract qlJ..A) times the first column from the ith
mn and subtract qu(,t) times the first row from the ith row (elementary
7.4 A Canonical Form for a Matrix Polynomial :rations oftype 3). Then the elements ad..}.), al1{A.) arereplaced by rd..A.)
ru(A), respectively (i, j = 2, 3, ... , n), all of which are either the zero
~ynomial or have degree less than that of al1(A). H the polynomials are
The objective of this section is to find the simplest matrix polynomial' all zero, we use an operation of type 2 to interchange al1(A) with an
each equivalence class of mutually equivalent matrix polynomials. In in, lent r iJ..A) or ru(A) ofleast degree. Now repeat the process of reducing
detail, it will be shown that any n x n matrix polynomial is equivalent degree of the off-diagonal elements of the first row and column to be less

l1li
.\

than that of the new a11(A). Clearly, since the degree of Qu(l) is strictI Note that a matrix polynomial is regular if and only if all the elements
decreasing at each step, we eventually reduce the A.-matrix to the form a1(1), ... , an(A.) in Eq, (1) are nonzero polynomials.

0 ... 0] Exercise 1~ Prove that, by using left (right) elementary operations only,

l
a ll(,l)
a matrix polynomial can be reduced to an upper- (lower-) triangular matrix
~ aZ~(A): : '. a2n~J.). polynomial B(..1.) with the property that if the degree of bJj.,l) is IJ U = 1,
2,... , n), then
o anz(A). . . a/Ul(A)
(a) I; = 0 implies that bk; = 0 (bJk = 0), k = 1,2, ... , j - 1, and
Step 2. In the form (3) there may now be nonzero elements al/)..), 2 S; i (b) IJ > 0 implies that the degree of nonzero elements bkl.A) (b;kfA is
j S; n, whose degree is less than that of a1&1.). If so, we repeat Step 1 agai less than I;, k = 1,2, ... ,j - 1. 0
and arrive at another matrix of the form (3) but with the degree of au{
The reduction of matrix polynomials described above takes on a simple
further reduced, Thus, by repeating Step 1 a sufficient number of times, WI.
can find a matrix of the form (3) that is equivalent to A(A) and for which form in the important special case in which A(A.) does not depend explicitly
on A at all, that is, when A(A) == A E ,n" n; This observation can be used to
all(A) is a nonzero element of least degree.
, Showthat Theorem 1 implies Theorem 2.7.2 (for square matrices).
Step 3. Having completed Step 2, we now ask whether there are nonzef The uniqueness of the polynomials alP.), .. , an{,l) appearing in Eq. (I)
elements that are not divisible by al &~). If there is one such, say a/I.A), .. is shown in the next section.
add columnj to column 1, find remainders and quotients of the new col
Ion division by a11(,l), and go on to repeat Steps 1 and 2, winding up wi
a form (3), again with a 11 (A) replaced by a polynomial of smaller degree.
Again, this process can continue only for a finite number of steps befor,
we arrive at a matrix of the form 7.5 Invariant Polynomials and the Smith Canonical Form
\
0 ... 0]
l
a 1(,l)
~ bz~(,l.) b2~(A) , We start this section by constructing a system of polynomials that is
niquely defined by a given matrix polynomial and that is invariant under
o bnz(,l). . . bnJ.A) uivalence transformations of the matrix. For this purpose, we first define
!he rank of a matrix polynomial to be the order of its largest minor that is
where, after an elementary operation of type 1 (if necessary), a1(A) is moni 80t equal to the zero polynomial. For example, the rank of A(,l.)defined in
and all the nonzero elements b/J{A) are divisible by al(A) without remaind' xercise 7.3.5 is equal to 1,while the rank ofthe canonical matrix polynomial
Step 4. If all bij.A) are zero, the theorem is proved. If not, the above mat 0(.1.) == diag[a1(,l), ... , a,.{A), 0, ... ,0] is equal to r. Note also that if
may be reduced to the form (A) is an n x n matrix polynomial, then it is regular if and only if its rank is n.
'oposition 1. The rank ofa matrix polynomial is invariantunderequivalence
a1(A) o o o 'ansformations.
o a2(A) o o
o o C33(..1.) c3n(A) t)OF. Let the matrix polynomials A{A) and B(A) be equivalent. Then by
oposition 7.3.1 there are unimodular matrices P(,l) and Q{A} such that
.~) == P(A)A(A)Q(A). Apply the Binet-Cauchy formula twice to this equa-
o o Cn 3(A) Cnn(A)
'~ to express a minor b(A) of order j of B(A) in terms of minors aa(..1.) of
where a2(A) is divisible by a1(A) and the elements Ci/",l), 3 S; i,j S; n, A) of the same order as follows (after a reordering):
divisible by a2(A). Continuing the process we arrive at the statement of b(A) == L p.(..1.)a.{,l)qa{A), (1)
theorem.

~'
!;
".
where p_α(λ) and q_α(λ) denote the appropriate minors of order j of the matrix
polynomials P(λ) and Q(λ), respectively. If now b(λ) is a nonzero minor of
B(λ) of the greatest order r (that is, the rank of B(λ) is r), then it follows from
Eq. (1) that at least one minor a_α(λ) (of order r) is a nonzero polynomial and
hence the rank of B(λ) does not exceed the rank of A(λ). However, applying
the same argument to the equation A(λ) = [P(λ)]^{-1} B(λ)[Q(λ)]^{-1}, we see
that rank A(λ) ≤ rank B(λ). Thus, the ranks of equivalent matrix polynomials
coincide.  ■

Suppose that the n × n matrix polynomial A(λ) over F has rank r and let
d_j(λ) be the greatest common divisor of all minors of A(λ) of order j, where
j = 1, 2, ..., r. Clearly, any minor of order j ≥ 2 may be expressed as a linear
combination of minors of order j − 1, so that d_{j-1}(λ) is necessarily a factor
of d_j(λ). Hence, if we define d_0(λ) ≡ 1, then in the sequence

    d_0(λ), d_1(λ), ..., d_r(λ),

d_j(λ) is divisible by d_{j-1}(λ), j = 1, 2, ..., r.

Exercise 1. Show that the polynomials d_j(λ) defined in the previous para-
graph for the canonical matrix polynomial A_0(λ) = diag[a_1(λ), a_2(λ), ...,
a_r(λ), 0, ..., 0] are, respectively,

    d_1(λ) = a_1(λ),  d_2(λ) = a_1(λ)a_2(λ),  ...,  d_r(λ) = ∏_{j=1}^{r} a_j(λ).  □   (2)

The polynomials d_0(λ), d_1(λ), ..., d_r(λ) have the important property of
invariance under equivalence transformations. To see this, let d_j(λ) and
δ_j(λ) denote the (monic) greatest common divisors of all minors of order j
of the matrix polynomials A(λ) and B(λ), respectively. Note that by Prop-
osition 1 the number of polynomials d_j(λ) and δ_j(λ) is the same provided
that A(λ) and B(λ) are equivalent.

Proposition 2. Let the matrix polynomials A(λ) and B(λ) of rank r be equiva-
lent. Then, with the notation of the previous paragraph, the polynomials d_j(λ)
and δ_j(λ) coincide (j = 1, 2, ..., r).

PROOF. Preserving the notation used in the proof of Proposition 1, it is
easily seen from Eq. (1) that any common divisor of the minors a_α(λ) of A(λ)
of order j (1 ≤ j ≤ r) is a divisor of b(λ). Hence δ_j(λ) is divisible by d_j(λ).
But again, the equation A(λ) = [P(λ)]^{-1} B(λ)[Q(λ)]^{-1} implies that d_j(λ)
is divisible by δ_j(λ) and, since both polynomials are assumed to be monic, we
obtain

    δ_j(λ) = d_j(λ),   j = 1, 2, ..., r.  ■

Now consider the quotients

    i_1(λ) = d_1(λ)/d_0(λ),   i_2(λ) = d_2(λ)/d_1(λ),   ...,   i_r(λ) = d_r(λ)/d_{r-1}(λ).

In view of the divisibility of d_j(λ) by d_{j-1}(λ), the quotients i_j(λ) (j = 1, 2, ..., r)
are polynomials. They are called the invariant polynomials of A(λ); the
invariance refers to equivalence transformations. Note that for j = 1, 2, ..., r,

    d_j(λ) = i_1(λ)i_2(λ) ... i_j(λ),                                   (3)

and for j = 2, 3, ..., r, i_j(λ) is divisible by i_{j-1}(λ).

Exercise 2. Find (from their definition) the invariant polynomials of the
matrix polynomials

    (a)  O
         Al il l;
         [ l2 - A   A2 - 1   12 - 1

    (b)  [A
         l3 A
         Z
         .Its 0
         O. 1
         0 0 2).

Answer. (a) i_1(λ) = 1, i_2(λ) = 1;  (b) i_1(λ) = λ, i_2(λ) = λ, i_3(λ) = λ^5 − λ^4.  □

Recalling the result of Exercise 1 and observing that the invariant poly-
nomials of the canonical matrix polynomial are merely a_j(λ), we now use
Proposition 2 to obtain the main result of the section.

Theorem 1. A matrix polynomial A(λ) of rank r is equivalent to the canonical
matrix polynomial diag[i_1(λ), ..., i_r(λ), 0, ..., 0], where i_1(λ), i_2(λ), ..., i_r(λ)
are the invariant polynomials of A(λ).

The canonical form

    diag[i_1(λ), ..., i_r(λ), 0, ..., 0]

is known as the Smith canonical form. H. J. S. Smith† obtained the form for
matrices of integers (which have much the same algebraic structure as matrix
polynomials) in 1861. Frobenius‡ obtained the result for matrix polynomials
in 1878. The matrix polynomial A(λ) and its canonical form are, of course,
over the same field F.

† Philos. Trans. Roy. Soc. London 151 (1861), 293-326.
‡ Jour. Reine Angew. Math. (Crelle) 86 (1878), 146-208.

Corollary 1. Two matrix polynomials are equivalent if and only if they have
the same invariant polynomials.

PROOF. The "only if" statement is just Proposition 2. If two matrix poly-
nomials have the same invariant polynomials, then Theorem 1 implies that
they have the same Smith canonical form. The transitive property of equiva-
lence relations (Exercise 7.3.2) then implies that they are equivalent.  ■
(Thus, combining Corollary 1 with Theorem 7.5.1, two matrices A and B are
similar if and only if λI − A and λI − B have the same invariant polynomials;
this observation is taken up in Section 7.6.)
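The determinantal divisors d_j(λ) and the invariant polynomials i_j(λ) =
d_j(λ)/d_{j-1}(λ) can also be computed mechanically from their definitions. The
following is a small computational sketch (not part of the original text) using
the Python library sympy; the 3 × 3 matrix polynomial A chosen below is a
made-up example, not one of the exercises above.

    from itertools import combinations
    from functools import reduce
    import sympy as sp

    lam = sp.symbols('lambda')
    # Made-up example: determinantal divisors 1, 1, lambda**3,
    # hence invariant polynomials 1, 1, lambda**3.
    A = sp.Matrix([[lam, 1,   0],
                   [0,   lam, 1],
                   [0,   0,   lam]])

    def determinantal_divisor(M, j):
        """Monic gcd of all j-by-j minors of M (returns 0 if every minor vanishes)."""
        rows, cols = M.shape
        minors = [M.extract(list(r), list(c)).det()
                  for r in combinations(range(rows), j)
                  for c in combinations(range(cols), j)]
        g = reduce(sp.gcd, minors)
        return sp.expand(g / sp.LC(g, lam)) if g != 0 else sp.Integer(0)

    d = [sp.Integer(1)] + [determinantal_divisor(A, j) for j in (1, 2, 3)]
    invariant = [sp.cancel(d[j] / d[j - 1]) for j in (1, 2, 3)]
    print(d[1:])        # [1, 1, lambda**3]
    print(invariant)    # [1, 1, lambda**3]

For larger matrices this brute-force enumeration of minors is expensive; it is
shown only because it mirrors the definitions of d_j(λ) and i_j(λ) given above.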

Exercise 3. Find the Smith canonical form of the matrices defined i J;PROOF. Suppose first that A andB are similar. Then there is a nonsingular
Exercise 2. "s E!I''''''' such that A
;; SBS- 1 ; whence
Answer. Al-A=S(Al-B)S-l,

10 0] [A0.;'0 00 J. which implies that AI - A and ),J - B are equivalent. The corollary to
Theorem 7.5.1 now implies that )J - A and AJ - B have the same invariant
(a)
[o 0 0
0 I 0 (b)
O. 0 ,V - ..1.4
0
polynomials.
Conversely, suppose that )J - A and AJ - B have the same invariant
Exercise 3 illustrates the fact that the Smith canonical form of a matri polynomials. Then they are equivalent and there exist unimodular matrix
polynomial can be found either by performing elementary operations \.polynomials P(A) and Q(A) such that
described in the proofofTheorem 7.4.1 or by finding its invariant polynomia
P(A)(Al - A)Q(..1.) = )J - B.
Exercise 4. Find the invariant polynomials of),J - ell where,as in Exerc'
Thus, definingM(A) ;; [P(A)]- I, we observe that M(A) is a matrix polynomial
2.2.9, C" denotes the I x I companion matrix associated with the polynom and .
a(A.) = A.'+ ~];A a,A'.
SOLUTION. Observe that there are nonzero minors of orders I, 2, ... , I .1
of Al - ell that are independent of A and therefore, in the previous notatt
we have d 1 ;; d 2 = ... = d'-1 ;; 1. Since, by definition d,(A) = det(Al-
then in view of Exercise 4.11.3, d,(A);; a(A). Hence i 1 =... == i'-1
i,(A) = a(A).
Exercise S. Show that if A elF""", then the 'sum of the degrees Qf
invariant polynomials of AI - A is n.
SOLUTION. Since Al - A is a matrix polynomial of rank n, there aref
Theorem I) unimodular matrix polynomials peA) and Q(A) such that . Al - A)S(A) + Mo)(U - B) = (Al - A)(R(A)('u - B) + Qo),
P(A)(AJ - A)Q(il) = diag[i 1{A.), i 2 (il ), ... , i,,(A)].
On taking determinants, the left-hand side becomes a scalar multiple of (Al - A){S(A) - R(..1.)}(U - B) = (,u + A)Qo - Mo('u -lJ).
characteristic polynomial and hence has degree n. The result now foil
t this is an identity between matrix polynomials, and since the degree of
immediately. 0
polynomial on the right is I, it follows that S(A) == R(A). (Otherwise the
ee of the matrix polynomial on the left is at least 2.) Hence,
7.6 Similarity and the First Normal Form Mo()J ...:. B) = (Al - A)Qo, (2)

This section is devoted to the development of the connection bet


equivalence and similarity transformations. The first theorem will esta'.' Mo = Qo, MoB = AQo, and MoB = AMo. (3)
an important link between the theory of matrix polynomials, as discus;
in this chapter, and the notion of similarity of matrices from'"'' ". 'rW we have only to prove that M o is nonsingular and we are finished.
:uppose that division of peA) on the left by AI - B yields
Theorem 1. Matrices A, B ∈ F^{n×n} are similar if and only if the matrix
polynomials λI − A and λI − B are equivalent.
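The "only if" half of Theorem 1 is easy to see concretely: if B = SAS^{-1},
then S(λI − A)S^{-1} = λI − B, so the two pencils are equivalent (here even via
constant unimodular matrices). A small sketch (not from the text, using the
Python library sympy and a made-up pair of matrices):

    import sympy as sp

    lam = sp.symbols('lambda')
    A = sp.Matrix([[2, 1],
                   [0, 3]])
    S = sp.Matrix([[1, 2],
                   [1, 3]])          # det S = 1, so S is invertible
    B = S * A * S.inv()              # B is similar to A by construction

    # S (lambda*I - A) S**-1 equals lambda*I - B identically in lambda.
    difference = S * (lam * sp.eye(2) - A) * S.inv() - (lam * sp.eye(2) - B)
    print(difference.expand())       # the zero matrix

The converse direction is the substantial part of the theorem: equivalence of
the pencils by arbitrary unimodular P(λ), Q(λ) already forces similarity, as
the proof shows.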
where Po is independent of..t Then using Eqs, (1) and (2) we have equivalence of AI - C 1 and A(,t}; hence 1, ... ,1, i..(A.}, ... inO.) are the
1= M(A)P(A) invariant polynomials of both M - C I and 11 - A.
= AI - A)S(A) + Mo)AI - B)U(A) + Po)
Check that the first natural normal forms of the matrices
= (,tI - A)S(,t)(U - B)U(,t) + (M - A)QoU(A)
+ (M - A)S(,t)Po + MoP o
= (M - A){Q(,t)U(A) + S(,t)Po} + MoP o. AI = 3 -1 0]
-1 3 O. A2 =
[6 22]
-2 2 0

Hence Q(,t)U(A) + S(,t)Po = 0 and MoP o = I. Thus, det M o '# 0 and the
1 1 -I 2 0 0 2
last equation in (3) implies that A and B are similar. are, respectively,
Now we are in a position to prove a theorem concerning a simple normal .
form to which a matrix from IF n x n can be reduced by similarity transforma-.
tions. It is important to observe that the normal form obtained is also in C~I) [~- "-~--i]. C~2) = [ .~ 1
= o 0]
I. 0
1F" x n and that fF is an arbitrary field. o -8 6 32 -32 10
Theorem 2. If A e 1F"x n and the invariant polynomials of AI - A ofnonzero
degree are i.(,t), i,+ I(,t), ... , ill(A), then the matrix A is similar to the block-
diagonal matrix
C 1 = diag[C lo' Clo+l' ... , C/J, 7.7 Elementary Divisors
where Clk denotes the companion matrix associated with the (invariant)
polynomial i,,(l), (s s; k S; n).
The problem ofreducing the companion matrices of invariant polynomials
The matrix C I in Eq, (4) is often referred to as the first natural normal form
of A. .
J>pearing in the first natural normal form to matrices of a smaller order
lepends on the field fF over which these polynomials are defined. Indeed, we !
PROOF. First we note that by Exercise 7.5.5 the matrix C I in Eq, (4) is', 'btain the further reduction by considering the factorization of each in- 1
n x n, and since the invariant polynomials are over ', we have C l eIF"x". ariant polynomial into irreducible polynomials over fF. For simplicity,
We now show that M - A and).J - C I have the same invariant polynomials d because this is the most important case in applications, we are going to
and hence the result will follow from Theorem 1. nfine our attention to matrices (and hence invariant polynomials) defined
Indeed, by Theorem 7.5.1 and Exercise 7.~.4, the matrix polynomi ,:ver the field of complex numbers, C (but see.also Section 6.7).
AI - Clk (s::; k s; n) is equivalent to the diagonal matrix D,,(l). ,We consider the problem of an n x n matrix polynomial A(,t), with
diag[l, 1,... ,1, i,,(A)]. Then it follows that M - CI is equivalent to f' ements defined over C, having rank r, (r S n) and invariant polynomials
matrix D(A) = diag[D,(l), D.+ l(l), ... , D,,(l)] and thus i(,i), i 2(,i), ... , i,(l). We define the zeros of det A(,i) to be the latent roots
.qr A(A), and since det A(l) is a polynomial over C we may write

I
AI - C I = P(l)D(l)Q(l) '!.

for some unimodular matrices peA), Q(A).


Now define ,
det A(A) = k l n&

1=1
(1 - l})"'J,
A
I
/.\(,t) = diag[l, ... , I, i..( l), ... , i,,(l)] 1here k l ' :F 0; AI' .12 ; "" l, are the distinct latent roots of A(,t); and m} ;::: 1

and observe that /.\(.1) is the Smith canonical form of AI - A. Since by ell
,PC eachj, From the Smith canonical form we deduce that Ii
~
mentary operations of the second kind the matrix polynomial D(l) can .
transformed into /.\(l), they are equivalent. The relation (5) then implies t det A(l) = k 2 n i/..l), ~
}=1
I

where k2 :F O. Since the invariant polynomials are monic, it follows thai
k 1 = k 2 and
SOLUTION. Since the order of the matrix polynomial is 5, there are five
invariant polynomials ik(..t), k = 1, 2, ... , 5. And since the rank is 4, is(A.) =
l
dsCA)/d4 (J.) == O. Now recall that the invariant polynomial itA) is divisible
n iJ{A) = n (A -
j"l ""1
AJnlk by ij - 1(..t) for j ~ 2. Hence, i4 (J.) must contain elementarydivisors associated
withall eigenvalues of the matrix polynomial and those elementary divisors
Moreover, since itA) is a divisor of iJ+t(A) for j = 1,2, ... , r - 1,it follow, must be of the highest degrees. Thus, i4 (J.) = J.3(A - 2)2(..t + 5). Applying
that there are integers Olj'" 1 S; j S; rand 1 S; k S; s, such that the same argument to i3(..t) and the remainingelementarydivisors, we obtain
i 1(A) = (A - Al)~I1(A - AZ)~12 .. (A - A.)"",
i3(.t) == .t(,t - 2) and then i1(.t) == 1 All elementary divisors are now
exhaustedand hence it(A) == 1. 0
i2(A) = (A - At)~ZI(l - A2)"zz .. (A - l.)~2',
The possibility of describing an equivalenceclass of matrix polynomials
in terms of the set of elementary divisors permits a reformulation of the
ir(it) = (A - J.t)~rl(A - A2)~rz ... (A - A;r", statement of Theorem 7.6.1.
and for k = 1, 2, ... , s; .Proposition 1. Two n x n matrices A and B with elementsfrom C are similar
if and only if H - A and i..l - B have the same elementary divisors.
o S; a1/< S; aZk S; . S; ark S; mk, t
)=1
a)k = mk'
Before proceeding to the general reduction of matrices in en"", there is
one more detail to be cleared up.
Each factor (A - AirJIe appearing in the factorizations (1) with a.jk > 0 .
called an elementary divisor of A(A). An elementary divisorfor which a.)k = Theorem Z. If A(A), B(A) are matrix polynomials over C, then the set of
is said to be linear; otherwise it is nonlinear. We may also referto the elemen' elementary divisors ofthe block-diagonalmatrix
tary divisors (A - Ak)"JIc as those associated with Ate, with the obvious mean,
ing, Note that (over C) all elementary divisors are powers of a linear poly~ C(J.) = [A(A) 0]
nomial. o B(A)

Exercise 1. Find the elementary divisors of the matrix polynomial defin .is the union ofthe sets of elementary divisors ofA(J.)and B(A). II
in Exercise 7.5.2(b). eROOF. Let Dt(A) and D2(..t) be the Smith forms of A(A) and B(A), respec-
tively. Then clearly
Answer. λ, λ, λ^4, λ − 1.  □
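The elementary divisors are obtained mechanically by factoring each invariant
polynomial. A brief sketch (not from the text) with the Python library sympy,
using the invariant polynomials found in Exercise 7.5.2(b); here every
irreducible factor happens to be linear, as is always the case over ℂ.

    import sympy as sp

    lam = sp.symbols('lambda')
    invariant_polys = [lam, lam, lam**5 - lam**4]     # from Exercise 7.5.2(b)

    elementary_divisors = []
    for p in invariant_polys:
        for base, mult in sp.factor_list(p)[1]:       # irreducible factors and multiplicities
            elementary_divisors.append(base**mult)
    print(elementary_divisors)   # lambda, lambda, lambda**4, lambda - 1 (up to ordering)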
Note that an elementary divisor of a matrix polynomialmay appear sever
times in the set of invariant polynomials of the matrix polynomial Al
C(A) = E(A)[Dt(J.)
o DZ(A)
0 ]F(J.)
I
the factorizations (1) show that the system of all elementarydivisors (alo
with the rank and order) of a matrix polynomial completely definesthe
tor some matrix polynomials E(A) and F(A) with constant nonzero deter-
ruinants. Let (..t - A.o)~', ... , (A - A.o)~p and (A - J.o)l\ ... , (1 - ..to)"'1 be
I
of its invariant polynomials and vice versa.
It therefore followsfrom Corollary 1 of Theorem 7.5.1 that the elementa
divisors are invariant under equivalence transformations of the ma
the elementary divisors of Dt(A) and D2(A), respectively, corresponding to
thesamecomplexrooUo. Arrangethe set ofexponentsOlt,
~n a nondecreasing order: {ollJ ... , aI" plJ ... , {J,,} == {vlJ
, ol p, {J t, ... , {Jq,
,'I'p+g}, where
I'.,

polynomial.
Theorem 1. Two complex matrix polynomials are equivalent if and only
they have the same elementary divisors.
.p< 'l't S; S; 'l'p+q' From the definition of invariant polynomials (Section
'.5), it is clear that in the Smith form D = diag(it(A), ... , ir(l), 0, ... ,0] of
'iag[Dt{J.), D2 (J.)], the invariant polynomial ir(A) is divisible by(..t - ..to)1p+'I
ut not by (A - Ao)1p+'I+ 1; and i,-t(A) is divisibleby (A - lO)1P+v' but not
I
'y (A - Ao)1 p+'I-1 +t; and so on. It follows that the elementary divisors of
Exercise 2. Determine the invariant polynomials of a matrix polynom
of order 5 having rank 4 and elementary divisors A, A., l3, J. - 2,(..t- 2
[
Dt(A)
o
0]
D 2(l)
A + S.

(and therefore also those of C(A corresponding to J.. o) are just (J.. - ,to)", or, equivalently,
.. , (J.. - ,to)'!'''', and the theorem is proved. (b) 11=1, Iz=21, ... , Ip=pl, Ip+i=r, i~l,
Exercise 3. Determine the elementary divisors of the matrix where p = [rfl], the integer part of rll.
A- 1 0 (c) The elementary divisors of U - A are

[
o
o
AZ + 1 AZ :
-1 AZ - 1
1] (A -
..
ocoy, ... , (A -
y
ao)P,
f l
(A - oco)P+ 1, , (A. - OCo)P+ 1,
.. '

, - q times q times
(a) over C; (b) over R. where the integers p and q satisfy the condition r = pi + q (0 ::; q ::; I - 1,
Answers. (a) A-I, A , A - i, A. + I; (b) A-I, A , J..z
Z Z
+ 1. P > 0).

Exercise 4. Let J be the Jordan block of order p associated with ,to: Hint. For part (c) use part (b) and Exercise 6.4.2 to obtain k l = '" =
kp-l = 0, k p = I - q, k p + t = q (in the notation of Exercise 6.4.2). Also,

J=
,to
~,to
1 O. ::'.
'.
?]o L observe that in view of Exercise 4, the number k", gives the number of ele-
mentary divisors (,t - J.. 01" of A.

[ : . '. ". I
il?xercise 9. Let I be an eigenvalue of A e Cftxn. Show that the index of A.
o '" 0 11.0 r. I

Show that the matrix }J - J has a unique elementary divisor (..t - ..loy.
is equal to the maximal degree ofthe elementary divisors oru - A associated
With..l. 0 I
Exercise 5. Check that unimodular matrices, and only they, have n
i>.
I
elementary divisors. \ f

Exercise 6. Let D e C" x n be a diagonal matrix. Check that the element


divisors of}J - D are linear. The Second Normal Form and the Jordan Normal Form
Exercise 7. Verify that (..t - 0)' is the unique elementary divisor of t
matrix polynomial }J - A, where Let A e cnxn and let '1(J..), lz(A), ... , Ip(A) denote the elementary divisors

'; 1]
the matrix }J - A.
0 OCI OC2 : :
o 0 OCI leorem 1 (The second natural normal form). With the notation of the
A = '. 2 'evious paragraph, the matrix A is similar to the block-diagonal matrix
: ' OC1
[
o ... 0 /Xo Cz = diag[C
",
C'l' ... , C 'p]' (1)

(A E C'X') and /XI :1= O. re Clk (1 ::; k :s; p) denotes the companion matrix associated with the
)nomiall,.{,t).
Hint. Observe that the minor of U - A of order r - 1 obtained by striki
out the first column and the last row is a nonzero number and, therefore, in t ,F. By Exercise 7.5.4 the only nonconstant invariant polynomial of the
notation of Section 7.S,d I = d 2 = ... = d'_1 = 1 and d,(A) = (A. - oco)" trix .u - C'I< is Ml). Since l/cO.) is an elementary divisor of .u - A
, the,field C, it is a power ofa linear polynomial in ..t, and hence the unique
Exercise 8. Let A eC'x, be defined by Eq. (2) and let OCI = 1X2 = entary divisor of A.l - C'I<' Applying Theorem 7.7.2, it follows that the
= 0c,_1 = O. IXI :1= 0 (1 ::; I ::; r - I). Verify that ;rix.u - C l , in which C z is defined in Eq. (1), has the same elementary
isors as .u - A. Proposition 7.7.1 now provides the similarity of A and
(a)  dim(Ker((α0·I − A)^s)) = ls  if ls ≤ r,   and   dim(Ker((α0·I − A)^s)) = r  if ls ≥ r;
Exercise 1. Show that the first and second natural normal forms of the PROOF. By Exercise 7.7.4,the only elementary divisor oUI _ J1is(A _ ~Y'.
matrix The same is true for the matrix )J - e,l' where I, refers to I;(it) = (it _ A,)'"

3-1 -5 1]
1 I -1 0
(see the proof of Theorem 1). Thus, the matrices J, and C , are similar
(by Proposition 7.7.1) as are the matrices J and C 2 in Eq. (1) (~ee Theorem

[oo 0 -2 -1
0 1 0
7.7.1). It remains now to apply Theorem 1 to obtain the desired result.

Exercise 2. Show that the Jordan normal forms of the matrix discussed in
Exercise 1 are
are, respectively, the matrices

0o 01 01 0]0 0
1
-1 -2 0 0
0 0] -1o 10 01
-1 0 0 [2 1 0 0]
0 2 0 0
[-40-40 03 1
and
[o 0 0 1. 0 [ oo 0 2 1
0 0 2
or 0 0 -1
0 0 0-1
n 0
2 o 0 -4 4
Corollary 1. A matrix A eC"X n is simple if and only if all the elementary
Note that in contrast to the first natural normal form, the second normal divisors of )J - A are linear.
form is defined only up to the order of companion matrices associated with
the elementary divisors. For instance, the matrix PROOF. First, if A is simple then (Theorem 4.8.2') A = PDp-I, where
det P :F 0 and D = diag[A h . ,,t,,]. Hence )J - A = P(Al - D)P-l and
01 0 0]
-4 4 0 0
the matrices AJ - A and ,t[ - D are equivalent. But the elementary divisors
Of D are linear (Exercise 7.7.6) and, therefore, so are those of AI - A.
[o 0 0 1
o 0 -1 -2
Conversely, if all the elementary divisors of)J - A are linear, then all the
Jordan blocks in the Jordan normal form are 1 x 1 matrices. Thus, A is
similar to a diagonal matrix and therefore is simple. . '
is also a second natural normal form of the matrix in Exercise 1.
In the statement of Theorem 1, each elementary divisor has the, forlll Exercise 3. Let A be a singular matrix. Show that rank A = rank A 2 if and
I,tA) = (A - Aj)"'i. Observe also that there may be repetitions in the eigen- !->nly if the zero eigenvalue of A has only linear elementary divisors. 0
values AI' A2,"" A" and that 1:f=1 OCi = n.
Now we apply Theorem 1 to meet one of the declared objectives of th
chapter, namely, the reduction of a matrix to the Jordan normal form.
c1
Theorem 2 (Jordan normal form). IfA e C" x" and)J - A hast elementar' 7.9 The Characteristic and Minimal Polynomials
divisors (A - AI)", i = 1,2, ... , t, then A is similar to the matrix
J = diag[Jt>J 2 , ,J,]eCnXn
, In this section we continue our investigation of the relationships between
where J i is the Pi x p, Jordan block corresponding to (A - AI)" for i = . . the characteristic and minimal polynomials. This was begun in Chapter 6.
2, ... , t; . Let A eFnxn and let B(A) = adj(U - A). As noted in Section 7.2, B(A)
o ... 0 ~a monic polynomial over F of order n and degree n - 1. Let a(A) denote
AI 1
the (monic) greatest common divisor of the elements of B(A). If c(..1.) and
0 Al :,1.) denote, respectively, the characteristic and the minimal polynomials
JI = I: 0 :A, then we already know (see Theorem 6.2.1) that C(A) is divisible by m(..1.).
1 turns out that the quotient c(l)/m(A) is simply the polynomial l:5(A) just
0 ... o Al
.:fined. .

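As a computational check (not part of the original text), the Jordan normal
form of a concrete matrix can be produced with the Python library sympy.
Below, A is the 4 × 4 matrix of Exercise 7.8.1 as read from the text; the result
consists of one 2 × 2 Jordan block for the eigenvalue 2 and one for −1, in
agreement with Exercise 7.8.2.

    import sympy as sp

    A = sp.Matrix([[3, -1, -5,  1],
                   [1,  1, -1,  0],
                   [0,  0, -2, -1],
                   [0,  0,  1,  0]])

    P, J = A.jordan_form()                    # A = P J P**-1
    print(J)                                  # J_2(-1) and J_2(2), up to the order of the blocks
    print(sp.simplify(P.inv() * A * P - J))   # the zero matrix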

Theorem 1. Wit!! the notationof the previous paragraph, in which dn- &t) is the (monic) greatest common divisor of the minors of
M - A of order n - lor, equivalently,the greatest common divisor of the
c(l) = cS(l)m(l). (1) elements of the adjoint matrix B(.t) of A(l). Thus, dn .:. 1(.t) = l5(l), where c5(1)
PROOF. First, we introduce the matrix polynomial is defined in the proof of Theorem 1. This yieldsi,,(l) = m(l) on comparison
of Eqs. (4) and (1).
1
C(l) = ,5(1) B(l), Now wecan givean independentproof ofthe resultobtainedin Proposition
6.6.2.
known as the reduced adjointof A, and observe that the elements of C(l) are
relatively prime. In view of Eq. (7.2.1) we have c(l)1 = cS(AX~J - A)C(A).. Corollary 1. A matrix A e en" nis simple if andonly if its minimal polynomial
This equation implies that c(ot) is divisibleby cS(ot) and, sinceboth of themare has only simple zeros.
monic,there exists a monic polynomialmeA) such that C(A) = 6(A)m(1). Thus PROOF. By Corollary 7.8.1, A is simple if and only if all the elementary
= (M - A)C(l). divisors of M - A are linear, that is, if and only if the invariant polynomial
m(A)1 (2)
Our aim is to show that mO.) = m(l), that is, that mel) is the minimal
in(l) = n:=
1 (A - ~), where A1' A2' ... , A.are the distincteigenvalues of A.
The result follows immediately from Theorem 2.
polynomial of A. Indeed, the relation (2) shows that m(,t)1 is divisible by
M - A and therefore, by the corollary to Theorem 7.23, meA) = O. Thus, CoroUary 2. A matrix A E Fn" n is nonderogatory if and only if the n - 1
rii(l)is an annihilating polynomial of A and it remains to checkthat it is an first invariant polynomials ofM - A are ofdegree zero,that is,they are identi-
annihilating polynomial for A of the least degree. To see this, suppose that . cally equal to 1.
rii(l) = 6(l)m(A). Since m(A) = 0, the same corollary yields PROOF. Let A be nonderogatory and let i 1(l ), i2(l ), ... , i n(l ) denote the
m(A)1 = (M - A)C(l) invariant polynomials of M - A. If c(l) is a characteristic polynomial of A
for some matrix polynomial C(A), and hence
then c(l) = n;=
1 il(l), and by assumption C(A) coincideswith the minimal
polynomial m(A) of A. But by Theorem 2, mel) = iJ,t) and therefore the
m(l)1 = (M - A)O(l)C(l). comparison gives i1(1) = ... = in - 1(1) = 1. Note that there are no zero
invariant polynomials of M - A sinceit is a regular matrix polynomial. .'.
Comparing this with Eq. (2) and recallingthe uniquenessof the leftquotient,
we deduce that C(l) = O(l)C(l). But this would mean (if 0(1) ;f. 1) that 6(1) Now we show that the second natural normal form of Theorem 7.8.1 (as
is a common divisor of the elements of C(l), which contradicts the definition wellas the Jordan formthat is a consequenceofit) is the best in the sensethat
of ,5(l). Thus 8(l) == 1 and mel) = m(l). no block C'k' k = 1; 2, ... , p, in Eq. (7.8.1) can be split into smaller blocks.
More precisely, it will beshown that C cannot be similarto a block-diagonal
This result implies another useful description of the minimal polynomial. 'k
matrix(of more than one block).To seethis,wefirstdefinea matrix A e ," x "
to be decomposable if it is similar to a block-diagonal matrix diag[A.. A 2 ] ,
'Iheorem 2. The minimal polynomial of a matrix A e Fn"" coincides with
where Al ecnlxlIl, A2 e C"z x lIz, and nl> n2 > 0, n1 + n2" n. Otherwise, the
the invariant polynomial of M - A of highestdegree.
matrix is said to be indecomposable.
PROOF. Note that M - A has rank n and that by Proposition 7.3.1 and
Proposition 1. A matrix A e en "II is indecomposable if and only if it is non-
1nbeorem 7.5.1, derogatory and its characteristic polynomial c(l) is of the form (1 - 10)' for
det(M - A) = i 1(1)i2(.t) i..(l), some 10 E C and positive integer p.
in which none of i 1(.t), ... , i n(l ) is identicallyzero. PROOF. Let A be indecomposable. If it is derogatory, then by Corollary 2
Also, IX = 1 since the invariant polynomials, as well as det(M - A), are, the matrix polynomial M - A has at least two nonconstant invariant poly-
monic. Recalling Eq, (7.5.3), we deduce from Eq. (3) that nomials, Now Theorem 7.6.2 shows that A is decomposable, which is a
    c(λ) = det(λI − A) = d_{n-1}(λ) i_n(λ),                              (4)
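Theorem 1 of this section gives a practical recipe for the minimal polynomial:
m(λ) = c(λ)/δ(λ), where δ(λ) is the monic greatest common divisor of the
entries of B(λ) = adj(λI − A). A sketch (not from the text) with the Python
library sympy and a made-up matrix A whose minimal polynomial is strictly
smaller than its characteristic polynomial:

    from functools import reduce
    import sympy as sp

    lam = sp.symbols('lambda')
    A = sp.Matrix([[2, 0, 0],
                   [0, 2, 0],
                   [0, 0, 3]])     # c(lam) = (lam-2)**2 (lam-3), m(lam) = (lam-2)(lam-3)

    B = (lam * sp.eye(3) - A).adjugate()          # B(lambda) = adj(lambda*I - A)
    delta = reduce(sp.gcd, list(B))               # gcd of all entries of B(lambda)
    delta = sp.expand(delta / sp.LC(delta, lam))  # normalize to a monic polynomial
    c = (lam * sp.eye(3) - A).det()
    m = sp.cancel(c / delta)
    print(sp.factor(m))                           # (lambda - 2)*(lambda - 3), the minimal polynomial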
zeros, then ,u - A has at least two elementary divisors. Theorem 7.8.1 then '*
for certain matrices L o, L 1, , L, E C)(", where L1 0 and the indices on
states that again A is decomposable. the vector x(t) denote componentwise derivatives.
Conversely, if A is nonderogatory and c(A) = (A - .to)"
then A must be Let ~(C") be the set of all continuous vector-valued functions yet) defined
indecomposable. Indeed, these conditions imply, repeating the previous for all real t with values in C". Thus y(t)::;; [Yl(t) . PII(t)]1' and
argument, that ,u - A has only one elementary divisor. Since a matrix B is 't(t), . , p,,(t) are continuous for all real t. With the natural definitions of
decomposable only if the matrix M - B has at least two elementary divisors, vector addition and scalar multiplication, ~(C") is obviously a (infinite-
the result follows. dimensional) linear space.
Here and in subsequent chapters we wiIJ investigate the nature of the
Corollary 1. The matrices C,,, (k = 1, 2,... , s) in Eq. (7.8.1) are indecom- solution space of Eq. (1), and our first observation is that every solution is a
posable.
member of ~(C"). Then it is easily verified that the solution set is, in fact, a
The proof relies on the fact that a companion matrix is nonderogatory linear subspace of ~(C"). In this section, we examine the dimension of this
(Exercise 6.6.3)and on the form of its characteristic polynomial (see Exercise subspace. .
4.11.3). The matrix polynomial L(A) = D=oAiL)is associated with Eq. (1), which
can be abbreviated to the form L(d/dt)x(t) = O. Thus,
Corollary 2. Jordan blocks are indecomposable matrices.
To prove this see Exercises 6.6.2 and 7.7.4. Thus, these forms cannot be d) L L J -dJx
I

reduced further and are the best in this sense. (


L -d x(t) ~
t i=O dt
l'

Exercise 1. The proof of Theorem 1 above [see especially Eq. (2)] shows .To illustrate, if
that the reduced adjoint matrix C(A) satisfies
A2 (t)]
(M - A)C(A) = m(A)I.
Use this relation to show that if Ao is an eigenvalue of A, then all nonzero
L(A) = [ A-I ~2] and x(t) = [
XI
X2(t) E ~(C
2
),

columns of C(Ao) are eigenvectors of A corresponding to .to.


Obtain an
and, in addition, Xt(t) and X2(t) are twice differentiable, then
analogous result for the adjoint matrix B(la). 0

(d) .= [X\ll(t)X?l(t)
L dt x(t)
+ 2X2(t) ]
_ xl(t) + X~21(t) .

7.10 The Smith Form: Differential and Difference Equation$ Let L.(A), L 2(A) be n x n matrix polynomials and

L(A) = L 1(1)L2(A).
In this section an important application of the Smith normal form is
made to the analysis of differential and difference equations. Consider first
a set of n homogeneous scalar differential equations with constant comple~
coefficients in n scalar variables Xl(t), ... , x,,(t). Definex(t) by . L(~)X(t) = L , (~) ( L (:t)X(t).
2 0
x(t) = [x.(t) .. , X,,(t)]T With these conventions and the Smith normal form, we can now establish I
1',.
and let I be the maximal order of derivatives in the n equations. Then the' the dimension of the solution space of Eq. (1) as a subspace of ~(C). This e~',
result is sometimes known as Chrystal's theorem. The proof is based on the ,~
equations can be written in the form

!.:;"
Well-known fact that a scalar differential equation of the form (1) with order l". j'

L,rll(t) + L ,_ 1X(I-1)(t) + ... + L 1x(1)(t) + Lox(t) = 0 I has a solution space of dimension I. U



Theorem 1 (G. Chrystall). If det L(l) = 0 then the solution space of The following special cases arise frequently: if det L, =F 0 and, in particular,
Eq. (1) has dimension equal to the degreeof det L(l). if L(..t) is monic, then the solution space of Eq. (1) has dimension In. This
follows immediately from the obvious fact that the leading coefficient of
PROOF. Since det L(l) 0, the Smith canonical form of L(,t) (see Section
7.5)is of the form D(..t) = diag[i 1 (..t), i2(,t), .. , i,;{,t)],where i1(1), i2(1),. , i,,(l), det L(l) is just det L,. One special case is so important that it justifies a
separate formulation:
are the nonzero invariant polynomials of L(.t). Thus, by Theorem 7.5.1,
L(l) = P(l)D(l)Q(l) (2) Corollary 1. If A e C""" then the solution space of x(t) = Ax(t) has
dimension n.
for some unimodular matrix polynomials pel) and Q('\). Applying the result
of Exercise I, we may rewrite Eq. (1) in the form
We turn now to the study of a set of n homogeneous scalar difference

P(~)D(~t)Q(~t)x(t) = O. equations with constant complex coefficients and order I. Such a system can
be written in matrix form as
Lety(t) = Q(d/dt)x(t) and multiply Eq. (3) on the left by P-t(d/dt) (recall
that P-1(l) is a polynomial) to obtain the system
L,xJ+I + L'-lXJ+I-l + ... + L 1xJ+1 + Lox) = 0 (6)
I
,I

[i (:t),,i"(:t)}(t) forj = 0, 1,2, ... , whereLo,L 1, , L,eC"x" (L, =F 0), and a solution is a
diag i
1 ( : ,). 2 = 0,

which is equivalent to Eq. (1). Clearly, the system in (4) splits into n indepen-
dent scalar equations
sequence of vectors (xo, x.. X2, ...) for which all the relations (6) hold with
j = 0, 1,2, .... In this case, we introduce the linear space 9"(C") consisting
of all infinite sequences of vectors from C", and with the natural component-
wisedefinitions of vector addition and scalar multiplication. Then it is easily
'I,

ik(~t)Y"(t) = 0,
verified that the solution set of (6) is a subspace of 9"(C").
k = 1,2, ... , n, It will be convenient to introduce a transformation E acting on 9"(C")
where Yl(t)' .. , y,;{t) are the components of yet). But the scalar differentia!
defined by 1"

equation a(d/dt)y(t) = 0 has exactly d = deg a(l) linearly independ E(uo, Ulo .] = ("10 "2"")
solutions. Apply this result to each of the equations in (5) and obse 1
fot any (uo, II.....) e 9"(C"). It is clear that E is linear. It is called a shift
that the number of linearly independent vector-valued functions yet) sa',
fying Eq. (4) is equal to the sum of the numbers of linearly independe
solutions of the system (5). Thus the dimension ofthe solution space ofEq. (:
is d t + d2 + ... + ~, where d" = deg i,,(l), 1 s: k s: n. On the other hanl
operator and will be seen to playa role for difference equations analogous to
that played by d/dt for differential equations.
For r = I, 2, 3, ... , the powers of E are defined recursively by
I!
Eq. (2) shows t h a t '
det L(l) = ex det D(l) = ex n" i,,(l),
F:u = E(F:- 1u), r = I, 2, ... , " e 9"(C"), ~j
where EO is the identity of 9"(C"). Consequently,
"=1
where ex = det P(l)Q(l). Since ex is a nonzero constant, it follows that F:(uo, "1') = ("" u r + h" .),
\
L" deg i,,(..t) = e""nand (uo, "1~ ...)e 9'(C"), we define
I
deg(det L(.i = d1 + d2 + + ... + d". Also, if A e
"=1 A("o, "h"') A (A"o, Alit, ...) e 9"(C").
The equivalence of the systems (4) and (1) in the sense that
'Then it is easily verified that, if we write x = (x 0' Xh ), Eq. (6) can be written
x(t) = Q-l(~~(t) in the form
now gives the required result. L(E)x A L,(E'x) + ... + L1(Ex) + LoX = 0, (7)
where the zero on the right side of Eq. (7) is the zero element of 𝒮(ℂ^n).

† Trans. Roy. Soc. Edin. 38 (1895), 163.
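Theorem 1 (Chrystal) reduces a question about the solution space of a system
of differential equations to a degree count. A small sketch (not from the text),
with the Python library sympy and a made-up 2 × 2 matrix polynomial L(λ):

    import sympy as sp

    lam = sp.symbols('lambda')
    L = sp.Matrix([[lam, 2],
                   [1,   lam**2]])
    det_L = sp.expand(L.det())
    print(det_L, '-> dimension', sp.degree(det_L, lam))
    # lambda**3 - 2 -> dimension 3: by Theorem 1 the system L(d/dt)x(t) = 0
    # has a three-dimensional solution space.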

Another illuminating way to see Eq. (6) is as an equation in infinite vectors
and matrices. For simplicity, consider the case I = 2:
(1 + flo (1 - fli)
2 ~z = 2 ' ~3 == 2 .
LoOt.,Lo LL La..
0",] [XO]
Xl
1
[0]0
2. If A(.t), B(A), D(A) are matrix polynomials of size n, and A(l), and SeA)
[o 0.. Lo L "
x = O'
..
l
.. 2
have invertible leading coefficients and degrees I and m, respectively, show

"
t that the degree of A(l)D(l)B(l) is not less than I + m.
Theorem 2. If det L(l) = 0, then the dimension of the solution space of Eq. 3. Two m x n matrix polynomials A(l) and B(A) are said to be equivalent if
(6) isequal to thedegree ofdet L(l). there are unimodular matrices peA) of order m and Q(l) of order n such
that B(l) == P(l)A(A)Q(l). Show that any m x n matrix polynomial is
PROOF. Using the decomposition (2), reduce Eq, (7) to the equivalent form
equivalent to a matrix polynomial of the form
D(E)y == 0, (8)
a.(,1.) 0 ... 0 0 .. 0
where y == Q(E)x and 0, x, y e 9'(C"). Observe that Eq. (8) splits into n
independent scalar equations. Since the assertion of the theorem holds true
o 02(A)
for scalar difference equations, it remains to apply the argument used in the
proof of Theorem 1. o
Coronary 2. If A e C""", then the solution space of the recurrence relation
o o am( ).) 0 o
ifm:::;; n, or
xi+1 == Ax}> j == 0, 1, 2,... ,
.hasdimension n. al(l) 0 ... 0
o
o
a,,(..1.)
7.11 Miscellaneous Exercises
o

1. Let A e C 10 " and suppose the invariant polynomials of 1I - A are!


10 o o
i&~) = (A.3 + 1)z.
== . == i,(l) == 1, is(l) == l + 1, i9(l) = l3 + I, ilO(l) ifm> n, where ajl) is divisible by aj_l(l),j ~ 2.
Check that the first and second natural normal forms and the Jordan
4. Show that if A e C""" is idempotentand A :F I, A :F0, then the minimal
normal form of A are, respectively,
polynomial of A is l2 - l and the characteristic polynomial has the form
010000 (). - 1)').5. Prove that A is similar to the matrix diag[I" 0] .
001000 5. Prove that

C l == diag -1, [ 01,


-1 o 0
~ 10]
000100
o 0 0 0 1 0 II, (a) The geometric multiplicity of an eigenvalue is equal to the number
of elementary divisors associated with it;
000001 (b) The algebraic multiplicity of an eigenvalue is equal to the sum of
_ -1 0 0 -2 0 0 degrees of all elementary divisors associated with it.

[_~ -~l ~2' _~~ 2~J, ~3' -~~ 2~Jl


6. Define the multiplicity of a latent root .1.0 of A()') as the number of times
C z == diag[ -1, -I, [ [ the factor ). - .1.0 appears in the factorization of det A(A) into linear
factors.

J .[
== diag -I, -I, [-1 -11] '~2>o[~2 1~2'~3'
0 J [~3 1]]
~3 0 '
Let A(A) be a monic n x n matrix polynomial of degree I. Show that
A(A) has In latent roots (counted according to their multiplicities), and
that if the In latent roots are distinct, then all the elementary divisors where AJ = A,-AJ' j = O. 1, ... , I - 1. Show that Ao is a latent root of
A(A) are linear. A(A) if and only if it is an eigenvalue of CA'
7. If Ao is a latent root of multiplicity tXof the n x n matrix polynomial A(A). 14. Let X lJ X 2.. , X, ECII" II be right solvents of the n x n matrix poly-
prove that the elementary divisors of A(A) associated with AO are all nomial A(A). Show that, with C" defined as in Exercise 13.
linear if and only if dim(Ker A(Ao = tX.
8. If m(A) is the minimal polynomial of A E CII"" and f(A) is a polynomial (a) C" V = V diag[X., X 2' ,X,J, where V denotes the generalized
Vandermonde matrix
over C. prove that f(A) is nonsingular if and only ifm(A) and f(A) are
relatively prime.
9. Let A(A) be a matrix polynomial and define a vector x =f.: 0 to be a latent. Xl X
III I" '"
...
III]
X,
vector of A(A) associated with A.o if A(Ao)x = o. V=.. 2 "
[
If A(A) = A o + A.A l + .PA:l (det A 2 :F 0) is an n x n matrix poly~,
nomial, prove that r is a latent vector of A(A) associated with Ao if and
L. .:
X 2 ... xL.
I

only if . (b) If V is invertible then the spectrum of A(A) (i.e., the set of all
its latent roots) is the union of the eigenvalues of X lJ X 2, , X, ;
(A.o [A
Al
2
0] + [0 -A2])[ XO]-0
A2 Ao 0 AoXo - .
(c) CA is similar to diag[X 1, X 2 , X,] if and only if the set of all
elementary divisors of C A coincides with the union of the elementary
Generalise this result for matrix polynomials with invertible leading divisors of X .. X 2 , . , X"
coefficientsof general degree.
10. Let A(A) be an n x n matrix polynomial with latent roots Ah A2. An
and suppose that there exist linearly independent latent vectors
XlJ X:l ... x" of A(A) associated with A1J A2.. An. respectively. Prove
that if X = [Xl X2 ... XII] and D = diag[Ab A2 ... AJ. then
S = XDX- I is a right solvent of A(A).
11. Let the matrix A E C""" have distinct eigenvalues AlJ 042 , ... , A. and
suppose that the maximal degree of the elementary divisors associated
with At (1 ::;; k ::;; s) is m" (the index of At). Prove that
B

e- = L' Ker(,V - A)"'k.


It .. 1

U. Prove that if X E C"" II is a solvent of the n x n matrix polynomial A(A).


then each eigenvalue of X is a latent root of A(,i).
Hint. Use the Corollary to Theorem 7.2.3.
13. Consider a matrix polynomial A(A) = D.. o AiAJ with det A, =f.: 0, and
define its companion matrix .
0 I" 0 ... 0
0 0 I"
C"A I :
...
0
0 0 0 III
-A o -AI .. , -A,- l

"'~ \"
I'
i\
'\'

I,
...

i
".\
J FIELD OF VALUIlS OF A HIlllMITIAN MATRIX 283

CHAPTER 8

The Variational Method

The technique developed in this chapter can be visualized as the generali-
zation of a geometrical approach to the eigenvalue problem for 3 × 3 real
symmetric matrices. We saw in Exercise 5.10.5 that a quadric surface can
be associated with such a matrix H by means of the equation x^T H x = 1, or
(Hx, x) = 1, where x ∈ ℝ^3. In particular (see Exercise 5.10.6), if H is positive
definite, then the corresponding surface is an ellipsoid. The vector x_0 ∈ ℝ^3
from the origin to the farthest point on the ellipsoid can then be described as
a vector at which (x, x) attains the maximal value subject to the condition
that x satisfies (Hx, x) = 1.

Then the problem can be reformulated and the side condition eliminated
by asking for the maximal value of the quotient (x, x)/(Hx, x), where x varies
over all nonzero vectors in ℝ^3. Or, what is essentially the same thing, we
seek the minimal value of the Rayleigh quotient:

    (Hx, x)/(x, x),      x ≠ 0.

Now, this quotient makes sense for any n × n Hermitian matrix H and vector
x ∈ ℂ^n and is also known in this more general context as the Rayleigh quotient
for H. The study of its properties in this context is one of the main subjects
of this chapter.


8.1 Field of Values; Extremal Eigenvalues of a Hermitian Matrix

The eigenvalues of a matrix A ∈ ℂ^{n×n} form a set of n (not necessarily
distinct) points in the complex plane. Some useful ideas concerning the
distribution of these points can be developed from the concept of the field
of values of A, defined as the set F(A) of complex numbers (Ax, x), where x
ranges over all vectors in ℂ^n that are normalized so that (x, x) = x*x = 1.†
Observe that the quadratic form

    f(x) = (Ax, x) = Σ_{i,j=1}^{n} a_{ij} x_j x̄_i,      x = [x_1  x_2  ...  x_n]^T,

is a continuous function in n variables on the unit sphere in ℂ^n, that is, the
set of vectors for which

    (x, x) = Σ_{i=1}^{n} |x_i|^2 = 1.

Hence it follows (see Theorem 2 in Appendix 2) that F(A) is a closed and
bounded set in the complex plane. It can also be proved that F(A) is a convex
set (see Appendix 2 for the definition), but we omit this proof.

Theorem 1. The field of values of a matrix A ∈ ℂ^{n×n} is invariant under
unitary similarity transformations. Thus, if U ∈ ℂ^{n×n} is unitary, then

    F(A) = F(UAU*).

PROOF. We have

    (UAU*x, x) = (AU*x, U*x) = (Ay, y),

where y = U*x. If (x, x) = 1, then (y, y) = (UU*x, x) = (x, x) = 1. Hence a
number α is equal to (UAU*x, x) for some x ∈ ℂ^n with (x, x) = 1 if and only
if α = (Ay, y) for some normalized y ∈ ℂ^n; that is, α belongs to F(UAU*) if
and only if α ∈ F(A).  ■

This result admits an elegant geometrical description of the field of values
of a normal matrix in terms of its eigenvalues. First note that all eigenvalues
of an arbitrary matrix A ∈ ℂ^{n×n} are in F(A). For, if λ ∈ σ(A), then there is an
eigenvector x of A corresponding to this eigenvalue for which Ax = λx and
(x, x) = 1. Hence

    (Ax, x) = (λx, x) = λ(x, x) = λ,

and so λ ∈ F(A). Thus, σ(A) ⊂ F(A).

† Throughout this chapter the inner product ( , ) is assumed to be the standard inner
product in ℂ^n.
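The inclusion σ(A) ⊂ F(A) just established is easy to observe numerically.
The following sketch (not part of the original text, using the Python library
NumPy and a made-up random matrix) checks that for a normalized eigenvector
x the value (Ax, x) is exactly the corresponding eigenvalue, and so lies in F(A).

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    eigvals, V = np.linalg.eig(A)

    for lam_k, v in zip(eigvals, V.T):
        x = v / np.linalg.norm(v)                       # a unit eigenvector
        print(np.allclose(np.vdot(x, A @ x), lam_k))    # True: (Ax, x) = lambda_k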

For normal matrices, the geometry of the field of values is now easily
characterized. The notion of the convex hull appearing in the next theorem
is defined and discussed in Appendix 2.

Theorem 2. The field of values of a normal matrix A ∈ ℂ^{n×n} coincides with
the convex hull of its eigenvalues.

PROOF. If A is normal and its eigenvalues are λ_1, λ_2, ..., λ_n, then by
Theorem 5.2.1 there is a unitary n × n matrix U such that A = UDU*, where
D = diag[λ_1, λ_2, ..., λ_n]. Theorem 1 now asserts that F(A) = F(D) and so it
suffices to show that the field of values of the diagonal matrix D coincides
with the convex hull of the entries on its main diagonal.

Indeed, F(D) is the set of numbers

    (Dx, x) = Σ_{i=1}^{n} λ_i |x_i|^2,

where (x, x) = Σ_{i=1}^{n} |x_i|^2 = 1.

On the other hand, the convex hull of the points λ_1, λ_2, ..., λ_n is, according
to Exercise 2 in Appendix 2, the set

    { Σ_{i=1}^{n} θ_i λ_i :  θ_i ≥ 0,  Σ_{i=1}^{n} θ_i = 1 }.

Putting θ_i = |x_i|^2 (i = 1, 2, ..., n) and noting that, as x runs over all vectors
satisfying (x, x) = 1, the θ_i run over all possible choices such that θ_i ≥ 0,
Σ_{i=1}^{n} θ_i = 1, we obtain the desired result.  ■

For a general n × n matrix A, the convex hull of the points of the spectrum
of A is a subset of F(A) (see Appendix 2). But for the special case of
Hermitian matrices, the geometry simplifies beautifully.

Theorem 3. The field of values of the matrix A ∈ ℂ^{n×n} is an interval of the
real line if and only if A is Hermitian.

PROOF. First, if A is Hermitian then A is normal and, by Theorem 2, F(A)
is the convex hull of the eigenvalues of A, which are all real (Theorem 5.3.1).
Since the only convex sets on the real line are intervals, the required result
follows.

Conversely, if A ∈ ℂ^{n×n} and F(A) is an interval of the real line, then,
writing

    B = Re A = (1/2)(A + A*),      C = Im A = (1/(2i))(A − A*),

we obtain

    (Ax, x) = ((B + iC)x, x) = (Bx, x) + i(Cx, x),

where (Bx, x) and (Cx, x) are real since B and C are Hermitian (see Section
5.10). But F(A) consists only of real numbers, therefore (Cx, x) = 0 for all x
such that (x, x) = 1. Hence C = 0 (see Exercise 3.13.2) and A = A*, that
is, A is Hermitian.  ■

Corollary 1. If H is Hermitian with the eigenvalues λ_1 ≤ λ_2 ≤ ... ≤ λ_n, then

    F(H) = [λ_1, λ_n].                                                  (1)

Conversely, if F(A) = [λ_1, λ_n], then A is Hermitian and λ_1, λ_n are the minimal
and maximal eigenvalues of A, respectively.

This result is an immediate consequence of Theorems 2 and 3.

Recalling the definition of F(H), we deduce from Eq. (1) the existence of
vectors x_1 and x_n on the unit sphere such that (Hx_1, x_1) = λ_1 and
(Hx_n, x_n) = λ_n. Obviously,

    λ_1 = (Hx_1, x_1) = min_{(x,x)=1} (Hx, x),                          (2)

and

    λ_n = (Hx_n, x_n) = max_{(x,x)=1} (Hx, x).                          (3)

Moreover, the vectors x_1 and x_n are eigenvectors of H corresponding to λ_1
and λ_n, respectively. In fact, it follows from Eq. (2) that if x is any vector with
(x, x) = 1, then

    (Hx, x) ≥ λ_1 = λ_1(x, x).

Hence

    ((H − λ_1 I)x, x) ≥ 0,

and this implies that H − λ_1 I is positive semidefinite. But Eq. (2) also implies
that

    ((H − λ_1 I)x_1, x_1) = 0,

and it follows (as in Exercise 5.4.1) that (H − λ_1 I)x_1 = 0. Since (x_1, x_1) = 1
it follows that x_1 ≠ 0 and x_1 is therefore an eigenvector of H corresponding
to λ_1.

Similarly, x_n is an eigenvector of H corresponding to the maximal eigen-
value λ_n. Thus, we have proved the following:

Theorem 4. If λ_1 and λ_n denote the minimal and maximal eigenvalues of a
Hermitian matrix H ∈ ℂ^{n×n}, respectively, then

    λ_1 = min_{(x,x)=1} (Hx, x),      λ_n = max_{(x,x)=1} (Hx, x).

Moreover, the extremal values of (Hx, x) are attained at corresponding eigen-
vectors of H.

This result is the first step in the development of a min-max theory of
eigenvalues of Hermitian matrices, one of the main topics in this chapter.
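Theorem 4 is easily checked numerically. The following sketch (not from the
text, using the Python library NumPy and a made-up random Hermitian matrix)
samples Rayleigh values (Hx, x) over unit vectors; every sample lies between
λ_1 and λ_n, and the extremes are attained at the eigenvectors returned by
numpy.linalg.eigh.

    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
    H = (M + M.conj().T) / 2                 # a Hermitian matrix
    w, V = np.linalg.eigh(H)                 # w[0] = lambda_1 <= ... <= w[-1] = lambda_n

    samples = []
    for _ in range(1000):
        x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
        x /= np.linalg.norm(x)
        samples.append(np.vdot(x, H @ x).real)

    print(w[0] <= min(samples) and max(samples) <= w[-1])            # True
    print(np.isclose(np.vdot(V[:, 0], H @ V[:, 0]).real, w[0]))      # minimum attained at x_1
    print(np.isclose(np.vdot(V[:, -1], H @ V[:, -1]).real, w[-1]))   # maximum attained at x_n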

Exercise 1. Let H = [hii]i,i= 1 denote a Hermitian matrix. Prove that, and the subspace CC"is orthogonal to the space generated by Xl' X2,"" xp - 1,
preserving the notation of this section. It has already been shown that

(a) Al ~ hjJ ~ An for j = 1,2, ... , n; Al = min R(x), (1)


(b) Ifcx = n- 1 L7,J=1 h jj , then Al s cx s An' o",e If,
and it will now be shown that the other eigenvalues of H can be characterized
Hint. For part (a) put xi = Bj' (a
unit vector in en) in (Hx, x) and use
in a way that generalizes statement (1).
Theorem 4. For part (b) use the vector n- 1/ Z (1 1 ... l]T. 0
Theorem 1. Let R(x) be the Rayleigh quotient defined by a Hermitian
matrix H, and let the subspaces ~ h , iln (associated with H) be as defined
in the previous paragraph. Then,jor i = 1,2,... , n,thenumbers
Al = min R(x) (2)
8.2 Courant-Fischer Theory and the Rayleigh Quotient 0 .. #& 911

are the eigenvalues ofH. Moreover, the minima in Eq. (2) areattainedon eigen-
vectors XI of H.
In this section, the variational description of extremal eigenvalues of
a Hermitian matrix achieved in the preceding section will be extended to PROOF. Observe first that the case i = 1 is just that of Theorem 8.1.4
all ofthe eigenvalues. For this and other purposes the notion ofthe Rayleigh =
written in the form (1). Let U [Xl X2 x n] be the matrix formed by
quotient for an n x n Hermitian matrix H is useful. We define it to be the set of orthonormal eigenvectors of H (as in Theorem 5.2.1), and recall
that CCj is spanned by Xi' Xi+ h " ., X n Then for any X E il, there are numbers
Rn<x) == R(x) ~ (Hx, x) x'# O. IX/> cx/+ l' , CXnsuch that X = D=I
cx)x}. Introduce the vector IZE en defined
(x, x) , by
Thus, R is a real-valued function defined by H, whose domain is the nonzero at = [0 . .. 0 a, aj+ 1 " an]T, (3)
vectors of en. SO that X = UIZ, and observe that, using Theorem 5.2.1,
It is clear that, replacing x by cxx, the Rayleigh quotient remains invariant. n
Hence, in particular, (Hx, x) = x*Hx = *U*HUrx = rx*Drx = L Aj 1IXj 12 (4)
i=i
min (Hx, x) = minR(x)
(.".,)= 1 .,'1'0 Also we have (x, x) = D=l lail 2
and so
max (Hx, x) = max R(x),
(".")=1 .,'1'0

and the result of Theorem 8.1.4 can be rewritten in the form


R(x) = Ct AjIIXjI2)/Ctl IIX)I
Z
) .
(5)

Since Al S; AI+ 1 ~ . ~ An' it is clear that R(x) ~ Al for all lXi' IXI+ l' ... , IXn
Al = min R(x), In= maxR(x). (not all zero, and hence for all nonzero X e rei) and thatR(x) = A/ when OCI '# 0
.,'1'0 ., .. 0 and IXI+ 1 = OCi+ 2 = ... = IXn = O. Thus relation (2) is established. Note that
and these extrema are attained on eigenvectors of H. the minimizing vector of<ifi is just CX/X,' an eigenvector associated with Ai'
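A numerical sketch (not from the text, using NumPy and a made-up random
example) of the characterization in Theorem 1 of this section: on the subspace
spanned by the eigenvectors x_i, ..., x_n, the Rayleigh quotient never drops
below λ_i, and the value λ_i is attained at x_i itself.

    import numpy as np

    rng = np.random.default_rng(2)
    M = rng.standard_normal((6, 6))
    H = (M + M.T) / 2
    w, V = np.linalg.eigh(H)                 # ascending eigenvalues, orthonormal eigenvectors

    i = 2                                    # examine lambda_3 (0-based index 2)
    basis = V[:, i:]                         # spans C_i = span{x_i, ..., x_n}

    ok = True
    for _ in range(2000):
        x = basis @ rng.standard_normal(basis.shape[1])   # a random vector of C_i
        ok &= (x @ H @ x) / (x @ x) >= w[i] - 1e-10
    print(ok)                                             # True: R(x) >= lambda_i on C_i
    print(np.isclose(V[:, i] @ H @ V[:, i], w[i]))        # the minimum is attained at x_i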
Now let Al ~ Az ~ ... ~ An be the eigenvalues of an n x n Hermitian The following dual result is proved similarly.
matrix H and suppose that Xl' X2"'" X n are associated eigenvectors fonn-
ing an orthonormal basis. Denote by ~p the subspace of en generated by the Exercise 1. In the notation of Theorem 1, prove that
vectors x p' X p+ l' , Xn (1 ~ p ~ n). Obviously, rep has dimension n - p + 1, AI = max R(x), i = 1,2, ... , n,
we have the chain of inclusions 0'1'",&91"_1. I

en = CC 1 :::> ~2:::> :::> ren , Where the maxima are attained on eigenvectors of H. 0
The result of Exercise I, like that of Theorem I, suffers from the disad- Theorem 2. (Courant-Fischer). In the previous notation,
vantage that A) (j > 1) is characterized with the aid of the eigenvectors
xn-J+ 10 , x n Thus in employing Theorem 1 we must compute the eigen- AJ == max min R(x) (8)
9'J O#u9'J
values in nondecreasing order together with orthonormal bases for their
eigenspaces, The next theorem is a remarkable result allowing us to charac- for j = 1,2, ... , n, or. in dual form,
terize each eigenvalue with no reference to other eigenvalues or eigenvectors. An - J+ 1 == min max R(x). (9)
This is the famous Courant-Fischer theorem, which originated with E. 9'J O#ZIl9'J
Fischer! in 1905 but was rediscovered by R. Couranf in 1920. It was Courant . . In other words, Eq. (8) says that by taking the minimum of R(x) over all
who saw, and made clear, the great importance of the result for problems in nonzero vectors from an (n - j + I)-dimensional subspace of en, and then
mathematical physics. We start with a preliminary result. the maximum of these minima over all subspaces of this dimension in en, we
Proposition 1. Let 9J (1 S j Sn) denote an arbitrary (n - l + 1)-dimen~ obtain the jth eigenvalue of H.
sional subspace o/Cn, and let H be a Hermitian matrixwith eigenvalues A1 ~ PROOF. To establish Eq. (8) it suffices,in view of the first statement of Eq.
A2 S ... S; An and Rayleigh quotient R(x). Then (6),to show the existence of an (n - j + l)-dimensional subspace such that
min R(x) S; A), max R(x)~ An- J +!. minR(x), where x varies over all nonzero vectors of this subspace, is equal
O,,"xe9'1 O#"e9'1 to A). Taking ilJ == span{xJ' x)+1t ... Xn }, we obtain such a subspace. In
fact, by (6),
PROOF. Let X .. X2' ' Xn be an orthonormal system of eigenvectors fo~
H corresponding to the eigenvalues A1 S; A2 S .. S; An of H, and let !Jj;:it min R(x):s AJ ,
span{x1' X2l. , x)} for j == 1,2, ... , n. Fix j and observe that, since diUlt O#xeWJ
9J + dim fl'J == (n - j + 1) + j > n, there exists a nonzero vector Xo e whileby Theorem I,
belonging to both of these subspaces (see Exercise 3.14.3). In partieul
since Xo e!Jj it follows that xo == 11= 1 tXt and hence min R(x) == AJ
O#xe'ttJ .
J J) J
(Hxo, xo) == ( r A.t"x", r tXt ==:' r At loctl 2 This means that the equality (8) holds. Equation (9) is established similarly.
j,


t= 1 t=l "=1

Thus obviously
J J
,xercise 2. Check that for H E IRa l< a, H > 0, and j == 2. the geometrical II
I: 11%11 :S (Hxo. xo) :S AJ 11%11
"=1
A,1
2

t=1
r 2 terpretation of Eq. (8) (as developed in Exercises 5.10.5 and 5.10.6) is the
owing: the length of the major axis of any central cross section of an
i\

2 lipsoid (which is an ellipse) is not less than the length of the second axis
and, dividing by ll=11,,1 - (xo, xo), we obtain
the ellipsoid, and there is a cross section with the major axis equal to the
A1 S; R(xo) :S AJ nd axis of the ellipsoid. 0
Recall now that Xo also belongs to fIJ and that our immediate pu
is to establish the first relation in (6). But this follows readily from (7) si
min R(x) S; R(xo) S; AJ. 8.3 The Stationary Property of the Rayleigh Quotient
'_x",9'J

A The second statement of the proposition is proved similarly on reple


9j by iln - The eigenvalues of a real symmetric matrix 'can be described in terms of
J+ 1
Rayleigh quotient in another way. Let A denote a real symmetric matrix
. ,g in IR". We may then view R(x) associated with A as a real-valued func-
† Monatshefte für Math. und Phys. 16 (1905): 234-409.
‡ Math. Zeitschrift 7 (1920): 1-57.


Theorem 1. If A is a realsymmetric matrix with an eigenvalue Ao andassoci- those nonzero members of C" that satisfy an equation (or constraint) of the
ated eigenvector xo, then R(x l' X2' , xn) has a stationary value withrespect form a*x = 0 where a* = [IX1 IX:z IX,,], or
to Xl' X2, ... , X. at Xo = [x1 x~ X~]T and R(xo) = A.o. IX1X 1 + IX2X2 + ... + IXnX" = 0,
PROOF. We have to show that for k = 1,2" .. , n, the derivatives oR/ox" where not all the (X} are zero. Clearly, if rtJ :1= 0 we can use the constraint to
vanish when evaluated at X o' Indeed, representing (Ax, x) as a quadratic express x} as a linear combination of the other x's, substitute it into R{x),
form, and writing x = [Xl X2 . .. xn]T, we have and obtain a new quotient in n - 1 variables whose stationary values
(and n - 1 associated eigenvalues) can be examined directly without
(Ax, x) = xTAx = r"
I,i= 1
aljxlxj' ,reference to constraints. As it happens, this obvious line of attack is not the
most profitable and if we already know the properties of the unconstrained
Since alj = aJIt it follows that system, we can say quite a lot about the constrained one. We shall call
o n
eigenvalues and eigenvectors of a problem with constraints constrained
-;-(Ax, x) = 2 L akJxJ eigenvalues and constrained eigenvectors.
uXa: J-1 More generally, consider the eigenvalue problem subject to r constraints:
or, equivalently, (x, "1) = "tx = 0, i = I, 2, ... , r, (I)
iJ
-;- (Ax, x)
ox,
= 2A".x, where" I' "2' ... , a, are linearly independent vectors in C".Given a Hermitian
matrix H E C""., we wish to find the constrained eigenvalues and constrained
I
1
eigenvectors of H, that is, the scalars A. E C and vectors x E Cn such that ij
il
where Ak* denotes the kth row of A.
In particular, setting A = 1 in Eq. (1), we obtain Hx =.tx, x:l= 0, and (x, "1) = "rx = 0, i = 1,2, ... .r. (2)

o Let JlIr be the subspace generated by" l''':Z' ... .e, appearing in Eq. (1) -\
-;- (x, x) = 2xk and let Ifn - , denote the orthogonal complement to JlIr in CD:
uXa: 1
JlI, EB ~"-r = C". I
and hence
If 61> h 2 , , h.- r is an orthonormal basis in Wn - , and
oR 0 (Ax, x) (x, x){2Ak*x) - (Ax, x)2x"
-=---=
aXA; OXa: (x, x) (x, x)
2 B = [6 1 62 6n ':' , ] ,
< I
I
then, obviously, BE cn (. - , ), B*B = 1"_,, and BB* is an orthogonal pro-
x
Evaluating this at Xo = [x1 X~ .. X~]T, where Axo A.oXo and thus' = jector onto' ~n-, (see Section 5.8). ~
=
A,,*xo A.oX~, we deduce that the numerator in Eq. (2) is equal to zero:' The vector x E C satisfies the constraints of Eq. (1) if and only if x E ~"-r' ~
(xo, xo)(2A,,*xo) - {Axo, xo)2x: = (xo, xo)2A.ox: - A.o{xo, xo)2x: = O. That is, BB*x = x and the vector y = B*x E cn-' satisfies x = By. So, for
x oF- 0,
This completes the proof.
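A quick numerical check (not from the text, using NumPy and a made-up real
symmetric matrix) of the stationarity just proved: a central-difference estimate
of the gradient of R(x) at an eigenvector is numerically zero, and the value of
R there is the corresponding eigenvalue.

    import numpy as np

    rng = np.random.default_rng(3)
    M = rng.standard_normal((4, 4))
    A = (M + M.T) / 2                        # a real symmetric matrix
    w, V = np.linalg.eigh(A)
    x0 = V[:, 1]                             # an eigenvector, eigenvalue w[1]

    def R(x):
        return (x @ A @ x) / (x @ x)

    eps = 1e-6
    grad = np.array([(R(x0 + eps * e) - R(x0 - eps * e)) / (2 * eps) for e in np.eye(4)])
    print(np.isclose(R(x0), w[1]))           # True: R(x0) equals the eigenvalue
    print(np.allclose(grad, 0, atol=1e-5))   # True: R is stationary at x0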
RH(x) = (Hx, x) = (HBy, By) = (B*HBy, y)
(x, x) (By, By) (B*By, y)

8.4 Problems with Constraints Since B*B = 1.-" we obtain


R(x) == R,rtx) = R B lIB (y ) (3)

Long before the formulation and proof of the Courant-Fischer theore111i and the problem in Eq, (2) is reduced to the standard form of finding the
mathematicians were faced' with eigenvalue computations that arose front stationary values of the Rayleigh quotient for the (n - r) x (n - r) Herm-
maximization problems subject to a constraint. In the variational approach; itian matrix B*HB (Section 8.3). We have seen that these occur only at the
\ ;'
this means that we seek the stationary values of Ru{x) where x varies over. eigenvalues of B*HB. Thus A. is a constrained eigenvalue of H with associated <\ ~.

eigenvector x if and only if λ is an eigenvalue of B*HB with eigenvector y, where x = By.

Let the constrained eigenvalues be λ₁ ≤ λ₂ ≤ ⋯ ≤ λ_{n−r} and let

y₁, y₂, ..., y_{n−r}

be associated orthonormal eigenvectors of B*HB. Then x₁ = By₁, ..., x_{n−r} = By_{n−r} are the corresponding constrained eigenvectors. They are also orthonormal since, for i, j = 1, 2, ..., n − r,

δ_{ij} = (x_i, x_j) = (By_i, By_j) = (B*By_i, y_j) = (y_i, y_j).

For i = 1, 2, ..., n − r, let 𝒳_i be the subspace in Cⁿ generated by the vectors x_i, x_{i+1}, ..., x_{n−r}. By first applying Theorem 8.2.1 to B*HB in C^{n−r} and then transforming the result to Cⁿ and using Eq. (3), we easily derive the following result.

Proposition 1.  If λ₁ ≤ λ₂ ≤ ⋯ ≤ λ_{n−r} are the eigenvalues of the constrained eigenvalue problem (2) with associated orthonormal eigenvectors x₁, x₂, ..., x_{n−r}, then for i = 1, 2, ..., n − r,

λ_i = min_{0 ≠ x ∈ 𝒳_i} R(x),

where 𝒳_i is the subspace generated by x_i, x_{i+1}, ..., x_{n−r}.

Similarly, using Exercise 8.2.1, we have another result.

Proposition 2.  With the notation of Proposition 1,

λ_i = max_{0 ≠ x ∈ 𝒳_{n−r−i+1}} R(x),   i = 1, 2, ..., n − r,

where 𝒳_{n−r−i+1} is the space generated by x_{n−r−i+1}, ..., x_{n−r}.

We consider now the relationships between the eigenvalues of a constrained eigenvalue problem and those of the initial unconstrained problem. The main result in this direction is sometimes known as Rayleigh's theorem,† although Rayleigh himself attributes it to Routh and it had been used even earlier by Cauchy. It is easily proved with the aid of the Courant-Fischer theorem.

Theorem 1.  If λ₁ ≤ λ₂ ≤ ⋯ ≤ λ_{n−r} are the eigenvalues of the eigenvalue problem (2) for a Hermitian H ∈ C^{n×n} with r constraints, then

μ_i ≤ λ_i ≤ μ_{i+r},   i = 1, 2, ..., n − r,

where μ₁ ≤ μ₂ ≤ ⋯ ≤ μ_n are the eigenvalues of the corresponding unconstrained problem Hx = μx, x ≠ 0.

† Rayleigh, J. W. S., The Theory of Sound, 2nd Ed., 1894. Revised, Dover, New York. Vol. I, Section 92(a).

PROOF.  First recall the definition of the space 𝒳_i in Proposition 1 and observe that dim 𝒳_i = n − r − i + 1. It follows from Theorem 8.2.2 that

μ_{i+r} = max_{𝒮_{i+r}} min_{0 ≠ x ∈ 𝒮_{i+r}} R(x) ≥ min_{0 ≠ x ∈ 𝒳_i} R(x).

By virtue of Proposition 1,

min_{0 ≠ x ∈ 𝒳_i} R(x) = λ_i,

and the first inequality is established.

For the other inequality we consider the eigenvalues ν₁ ≤ ν₂ ≤ ⋯ ≤ ν_n of the matrix −H. Obviously, ν_i = −μ_{n+1−i}. Similarly, let the eigenvalues of −H under the given r constraints in (1) be ω₁ ≤ ω₂ ≤ ⋯ ≤ ω_{n−r}, where ω_i = −λ_{n−r+1−i} for i = 1, 2, ..., n − r.

Applying the result just proved for H to the matrix −H, we obtain

ν_{i+r} ≥ ω_i

for i = 1, 2, ..., n − r, and hence

−μ_{n+1−i−r} ≥ −λ_{n−r+1−i}.

Writing j = n + 1 − r − i, we finally have μ_j ≤ λ_j for j = 1, 2, ..., n − r, and the theorem is proved. ■

Note the following particular case of this theorem in which the "interlacing" of the constrained and unconstrained eigenvalues is most obvious.

Corollary 1.  If λ₁ ≤ λ₂ ≤ ⋯ ≤ λ_{n−1} are the eigenvalues of H with one constraint and μ₁ ≤ μ₂ ≤ ⋯ ≤ μ_n are eigenvalues of H, then

μ₁ ≤ λ₁ ≤ μ₂ ≤ ⋯ ≤ μ_{n−1} ≤ λ_{n−1} ≤ μ_n.

Note also that the eigenvalue problem for H with linear constraints (1) can be interpreted as an unconstrained problem for a leading principal submatrix of a matrix congruent to H.

More precisely, let H ∈ C^{n×n} be a Hermitian matrix generating the Rayleigh quotient R(x) and let B denote the matrix introduced at the beginning of the section, that is, B = [b₁ b₂ ⋯ b_{n−r}], where {b_j}_{j=1}^{n−r} forms an orthonormal basis in 𝒲_{n−r}, the orthogonal complement of 𝒜_r in Cⁿ. Consider the matrix C = [B  A], where the columns of A form an orthonormal basis in 𝒜_r, and note that C is unitary and, therefore, the matrix C*HC has the same eigenvalues as H. Since

C*HC = [ B* ] H [B  A] = [ B*HB   B*HA ]
       [ A* ]            [ A*HB   A*HA ],
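The interlacing inequalities of Theorem 1 are easy to confirm numerically. The following Python sketch (an illustration only: the Hermitian matrix H and the constraint vectors a₁, ..., a_r are randomly generated, not taken from the text) builds an orthonormal basis B of the subspace orthogonal to the constraint vectors, computes the constrained eigenvalues as those of B*HB, and checks that μ_i ≤ λ_i ≤ μ_{i+r}.

import numpy as np

rng = np.random.default_rng(0)
n, r = 6, 2

# A random Hermitian H and r linearly independent constraint vectors a_1, ..., a_r
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (X + X.conj().T) / 2
A_constr = rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r))

# Orthonormal basis b_1, ..., b_{n-r} of the orthogonal complement of span{a_i}:
# the last n - r left singular vectors of A_constr span that complement.
U, s, _ = np.linalg.svd(A_constr)
B = U[:, r:]                                    # n x (n-r), B*B = I, B* a_i = 0

mu = np.linalg.eigvalsh(H)                      # unconstrained eigenvalues, ascending
lam = np.linalg.eigvalsh(B.conj().T @ H @ B)    # constrained eigenvalues

for i in range(n - r):
    assert mu[i] <= lam[i] + 1e-10 and lam[i] <= mu[i + r] + 1e-10
print("interlacing verified:", np.round(lam, 3))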
B*HB is the (n − r) × (n − r) leading principal submatrix of C*HC. Thus, the constrained eigenvalue problem for H described above is reduced to finding the eigenvalues of a leading principal submatrix of a matrix congruent to H. This last problem is interesting in its own right and will be considered in the next section.

Exercise 1.  Consider the matrix A of Exercise 5.2.8. Examine the constrained eigenvalues under one constraint for each of the following constraint equations and verify Corollary 1 in each case:

(a) x₁ = 0;
(b) x₁ − x₂ = 0;
(c) x₁ − x₃ = 0.  □


8.5 The Rayleigh Theorem and Definite Matrices


Let H be an n × n Hermitian matrix and let H_{n−r} be an (n − r) × (n − r) principal submatrix of H, where 1 ≤ r < n. The following result can be viewed as a special case of Theorem 8.4.1.

Theorem 1 (Rayleigh).  If λ₁ ≤ λ₂ ≤ ⋯ ≤ λ_{n−r} are the eigenvalues of H_{n−r} and μ₁ ≤ μ₂ ≤ ⋯ ≤ μ_n are the eigenvalues of H, then for i = 1, 2, ..., n − r,

μ_i ≤ λ_i ≤ μ_{i+r}.   (1)

PROOF.  If H_{n−r} is not the (n − r) × (n − r) leading principal submatrix of H, we apply a permutation matrix P so that it is in this position in the transformed matrix PᵀHP. Since P⁻¹ = Pᵀ, the matrix PᵀHP has the same spectrum as H, so without loss of generality we may assume that H_{n−r} is the (n − r) × (n − r) leading principal submatrix of H.

In view of the concluding remarks of the previous section, the eigenvalues of H_{n−r} are those of an eigenvalue problem for CHC* with r constraints, where C is some unitary matrix. This also follows from the easily checked relation B*(CHC*)B = H_{n−r}. Appealing to Theorem 8.4.1, we obtain

μ̃_i ≤ λ_i ≤ μ̃_{i+r},   i = 1, 2, ..., n − r,

where μ̃₁, μ̃₂, ..., μ̃_n are the eigenvalues of CHC*. It remains now to observe that, because C is unitary, μ̃_j = μ_j for j = 1, 2, ..., n, and this completes the proof. ■

Exercise 1.  All principal minors of a positive definite matrix are positive.

SOLUTION.  For the eigenvalues λ₁ ≤ λ₂ ≤ ⋯ ≤ λ_{n−r} of a principal submatrix H_{n−r} of H we have, using Eq. (1),

0 < μ₁ ≤ λ₁ ≤ λ₂ ≤ ⋯ ≤ λ_{n−r},

where μ₁ denotes the smallest eigenvalue of the positive definite matrix H. But then det H_{n−r} = λ₁λ₂ ⋯ λ_{n−r} is positive and the assertion follows.  □

We can now develop a different approach to definite matrices, one in which they are characterized by determinantal properties.

Theorem 2 (G. Frobenius†).  An n × n Hermitian matrix H is positive definite if and only if all of its leading principal minors,

H(1 | 1),  H(1 2 | 1 2),  ...,  H(1 2 ⋯ n | 1 2 ⋯ n),   (2)

are positive.

PROOF.  If H > 0 then, by Exercise 1, all of its principal minors and, in particular, the leading principal minors of H are positive.

Conversely, let the minors (2) be positive. Denote by H_k (1 ≤ k ≤ n) the k × k leading principal submatrix of H; we prove by induction on k that the matrix H_n = H is positive definite.

First, for k = 1 the 1 × 1 matrix H₁ is obviously positive definite. Then suppose that the matrix H_k (1 ≤ k ≤ n − 1) is positive definite and let us show that H_{k+1} is also positive definite. Note that the assumption H_k > 0 means the positivity of the eigenvalues λ₁^{(k)} ≤ λ₂^{(k)} ≤ ⋯ ≤ λ_k^{(k)} of H_k. If now λ₁^{(k+1)} ≤ λ₂^{(k+1)} ≤ ⋯ ≤ λ_{k+1}^{(k+1)} are the eigenvalues of H_{k+1}, then Theorem 1 yields

λ₁^{(k+1)} ≤ λ₁^{(k)} ≤ λ₂^{(k+1)} ≤ ⋯ ≤ λ_k^{(k+1)} ≤ λ_k^{(k)} ≤ λ_{k+1}^{(k+1)},

and the positivity of the eigenvalues λ_i^{(k)} (i = 1, 2, ..., k) implies positivity of the numbers λ₂^{(k+1)}, ..., λ_{k+1}^{(k+1)}. But also λ₁^{(k+1)} is positive, since the positivity of the minors in (2) implies

det H_{k+1} = λ₁^{(k+1)}λ₂^{(k+1)} ⋯ λ_{k+1}^{(k+1)} > 0.

Thus, all the eigenvalues of H_{k+1} are positive, that is, H_{k+1} is positive definite. ■

Note that, in fact, all the leading principal submatrices of a positive definite matrix are positive definite.

† Sitzungsber. der preuss. Akad. Wiss. (1894), 241-256 März, 407-431 Mai.
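Theorem 2 lends itself directly to computation. The sketch below (Python, with an arbitrarily constructed test matrix; it is an illustration, not part of the text) compares the leading-principal-minor test with positive definiteness read off from the eigenvalues.

import numpy as np

def leading_minors(H):
    # det H_k for k = 1, ..., n (H Hermitian)
    return [np.linalg.det(H[:k, :k]).real for k in range(1, H.shape[0] + 1)]

def is_pos_def_by_minors(H):
    # Theorem 2: H > 0 if and only if all leading principal minors are positive
    return all(d > 0 for d in leading_minors(H))

# A positive definite example: G*G + I is Hermitian with eigenvalues >= 1
rng = np.random.default_rng(1)
G = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = G.conj().T @ G + np.eye(4)

print(is_pos_def_by_minors(H), bool(np.all(np.linalg.eigvalsh(H) > 0)))    # True True
print(is_pos_def_by_minors(-H), bool(np.all(np.linalg.eigvalsh(-H) > 0)))  # False False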

Exercise 1. Confirm the following criterion for negative definiteness of of HIc+ 1 is positive. Furthermore. die = A~)A~) ... A~ll:). Consider the case
an n x n Hermitian matrix: H < 0 if and only if the leading principal minors dt > O. Then in the sequence 1. d u ... dt dH I the sign is changed the same
det Hll: (k = 1.2... n) of H satisfythe conditions det HI < 0 and number of times as in the sequence 1. d l die. and hence our aim is to
showthat HIc+ I has the same number of negative eigenvalues and one more
(det H.)(det HH 1) < 0 positiveeigenvalue than Hie'
fork=I.2..... n-1. 0 Indeed. let Hie have m (l S m S k) negative and k - m positive eigen-
values:
Thus. a Hermitian matrix H = [h,i]:.i= 1 is negative "definite if and only ).~) S ." S ).~) <0< A.::~1 S ... S ).~Ie).
if hl 1 < 0 and the signs of its leading principal minors alternate.
The tests for positive and negative definiteness presented in this section ByTheorem 8.5.1.~) S ~U) S ~~ 1 and two cases can occur:
are often used in practice since they are readily implemented from a com- ~;t' <0 or ~U) > O.
putational point of view. Some generalizations of these results will be con-
sidered in the next section. The first inequalityis impossiblein the case die > 0 and dH I > 0 presently
beingconsidered. In fact. if
A\Hl) S A\ll:) S ... S A~) S ~t:) < 0 < ~~I S ... S 11ll::l).

8.6 The Jacobi-Gundelfinger-Frobenius Method wehave


signdk+1 = sign(A.~+I)A.f+I) ... lUl) = (_l)m+1.
sign dll: = sign(l\ll:) l~) .. . l~) =: ( -1 Y".
The following result is a generalization of the Sylvester theorem. For a
a contradiction with the assumption sign dlc+ I = sign dll: = 1.Thus. A.~~ P> 0
general Hermitian matrix H, it gives a criterion for finding the inertia of H;
and HIc+ I has one more positive eigenvalue than Hll:.
as defined in Section 5.5.
Other cases are proved similarly to complete an inductive proof of the
Theorem 1 (G. Jacobil). Let HI. H 2 H; be the leading principal' theorem.
submatrices of a Hermitian matrix HE Cn K n of rank r, and assume thai
Consider the matrix
d_k = det H_k ≠ 0 for k = 1, 2, ..., r. Then the number of negative (respectively,
positive) eigenvalues of H is equal to the number of alterations (respectively; .
constancies) ofsign in the sequence

PROOF.
1, d₁, d₂, ..., d_r.   (1)
We prove this result by induction. If k = 1, then d 1 is the (unique)'
H= [~! ~0 J]
associated with the real Hermitian form discussed in Exercise 5.li.2(a).
Since
eigenvalueof HI and obviously d I is negative or positive according as theret
is an alteration or constancy of sign in the sequence 1, d1
Suppose that the statement holds for the leading principal submatris'
., d 1 = 2, d2 = det G~] = -t. d 3 = det H = t.
H., that is, the number of negative (respectively, positive) eigenvalues of the number of alterations of sign in the sequence 1.2, -!.! or. what is equiv-
H" is equal to the number of alterations (respectively. constancies) of sign alent.in the sequenceof signs +. +, -, + is equal to 2, while the number of .1
in the sequence 1,d1 d2 , dll:' The transition to Hll:+ I adds to this sequence l:Onstancies of sign is 1. Thus. n(H) = 1. v(H) = 2, and the first canonical
the nonzero number dk+l = det Hk+l' If dH I > O. then (see Eq. (4.11.3 form of the corresponding Hermitian form contains one positive and two
the product of all the eigenvalues . negative squares. 0
llf+1) S ).~+ I) S ... S lfU) It is clear that this method of finding the inertia (if it works) is simpler
than the Lagrange method described in Section 5.11. However, the latter
also supplies the transforming matrix.
† Journal für die Reine und Angew. Math. 53 (1857), 265-270.
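Jacobi's rule is equally easy to mechanize when all leading principal minors are nonzero. The sketch below (Python; the symmetric test matrix is randomly generated and is assumed to have nonzero leading minors, which holds generically) counts alterations and constancies of sign in the sequence 1, d₁, ..., d_n and compares them with ν(H) and π(H) obtained from the eigenvalues.

import numpy as np

def jacobi_inertia(H):
    # (#negative, #positive) eigenvalues from the signs of 1, d_1, ..., d_n,
    # assuming all leading principal minors d_k are nonzero (Jacobi's hypothesis).
    d = [1.0] + [np.linalg.det(H[:k, :k]) for k in range(1, H.shape[0] + 1)]
    alterations = sum(1 for a, b in zip(d, d[1:]) if a * b < 0)
    constancies = sum(1 for a, b in zip(d, d[1:]) if a * b > 0)
    return alterations, constancies          # = (nu(H), pi(H))

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 5))
H = (X + X.T) / 2                            # real symmetric test matrix

nu, pi = jacobi_inertia(H)
eigs = np.linalg.eigvalsh(H)
print(nu, pi, "vs", int(np.sum(eigs < 0)), int(np.sum(eigs > 0)))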
A disadvantage of the Jacobi method is the strong hypothesis that the associated with the quadratic form in Exercise 5.11.1. Compute the number
leading principal minors be nonzero. This condition is relaxed in the gener- of alterations and constancies of sign in the sequence
alizations due to Gundelfinger and Frobenius.
1, d1 = 0, d2 ::: -I, d3 = -1,
Theorem 1 (S. Gundelfingert). With the notation of Theorem I, suppose
that the sequence (1) contains no two successive zeros. If det H, ':F 0, then or equivalently, in the sign sequences
the assertion of Jacobi's theorem remains valid if any sign is assigned to
each(isolated) zero dl ::: 0 in (1). +,+,-,- or +,-,-,-
In other words, for computing the number v(H), single zeros in the sequence Using the Gundelfinger rule, we deduce v(H) = I, 1C(H) = 2. 0
(1) can be omitted.
Theorem 3 (G. Frobenius"). If, in the notation of Theorem 1, det H, ':F 0
PROOF. Assume inductively that the hypothesis holds for HI' H 2 , , H". and the sequence (1) contains no three successive zeros, then Jacobi's method
If det HIl+1 ':F 0, then the appropriate part of the proof of Jacobi's theorem can be applied by assigning the same sign to the zeros dl + 1 = diU = 0 if
can be repeated to show that the transition from 1, d l ' ,d" (with signs didi+ 3 < 0, and different signs if djdi+ 3 > O.
instead of zeros) to 1, d1 , , dll+ 1 adds an alteration or a constancy of sign
(between dIe and d1l+ 1 ) according as HIl+ 1 has one more negative or positive PROOF. We only sketch the proof, because it relies on the same argument
eigenvalue than H". as that of the previous theorem.
Suppose that dll+ 1 = det HH 1 = O. Then at least one eigenvalue A\1l+ 1) Let dH 1 = dk+ 2 = 0, d" ':F 0, dH 3 =1= O. As before, there is exactly one
(1 ;$; i ;$; k + 1) must be zero. Since dIe ':F 0 and dll+ 2 ':F 0 there is exactly zero eigenvalue of HHl' say, ~n). Similarly, it follows from ..111<+3) ':F 0
one zero eigenvalue of Hu t- Indeed, if AIH 1) = A~1l+ 1) = 0 (t < s), then by (i = 1,2, ... , k + 3) and Theorem 8.5.1 that there exists exactly one zero,
applying Theorem 8.5.1 there are zero eigenvalues of H" and HU 2 ' which A~+\2) = 0 (0 ~ S;$; k + I), of HH2' Using Theorem 8.5.1 to compare the
yields dIe = dll+ 2 = O. eigenvalues of HH 1 and HIl+ 2 ,it is easily seen that either s = m or s = m + 1.
=
Let A~t t) 0 (1 s m ~ k - 1). Then it follows from The distribution of eigenvalues in the case s ::: m looks as follows:

A~) < ~tf) = 0 < ..t~~1 A.~) < ~:l'= 0, A.~+11 < ~:f) = 0 = A.~::::),
and ~+2) s A~t~) < A~:::f) = 0 < A~n).
A~n) < A~t P = 0 < A.~ti) (2)
Hence the matrices H", HH10 H1<+2' and HH3 have, respectively, m, m, m,
that the number of negative eigenvalues of H" is m and the number of negative
and m + 1 negative eigenvalues.
eigenvalues of HIl+2 is m + 1. Hence sign dIe = (_l)m, sign dll+ 2 = (_1)111+1,
Thus, for s = m we have d"dk+ 3 < 0 and, in this case, H H 3 has one more
and therefore d"dll+ 2 < O.
negative eigenvalue than H. Assigning to the zeros d"+I' d"+2 the same sign,
If ..t~+I) = 0, then all eigenvalues of H" are positive and so sign d" = 1,
We obtain only one alteration of sign in the sequence d", dH 10 dH 2 , dH 3 ,
Also, we may put m = Oin(2)andfind that sign dll+ 2 = -1. Thusd"dll+ 2 < 0
which proves the theorem in this case.
once more. A similar argument gives the same inequality when ..t~":l = 0.
It is similarly found that if s = m + 1, then d"dH 3 > 0, and the assertion
It is now clear that, by assigning any sign to dll+ 1 = 0 or omitting it, we
of the theorem is valid.
obtain one alteration of sign from d" to dll+ z- This proves the theorem.
EXll1IIple 2. Consider the matrix It should be emphasized that, as illustrated in the next exercise, a further
improvement of Jacobi's theorem is impossible. In other words, if the

=
01 0I 0]I sequence (1) contains three or more successive zeros, then the signs of the
H
[o I I
leading principal minors do not uniquely determine the inertia character-
istics of the matrix.

† Journal für die Reine und Angew. Math. 91 (1881), 221-237.
† Sitzungsber. der preuss. Akad. Wiss. (1894), 241-256 März, 407-431 Mai.
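The Gundelfinger rule can be checked on Example 2. In the sketch below the matrix of that example is taken to be H = [[0,1,0],[1,1,1],[0,1,1]]; the middle row is not fully legible in this printing, so this is an assumed reconstruction, chosen to be consistent with the minors d₁ = 0, d₂ = −1, d₃ = −1 quoted above.

import numpy as np

# Assumed reconstruction of the matrix of Example 2
H = np.array([[0., 1., 0.],
              [1., 1., 1.],
              [0., 1., 1.]])

d = [np.linalg.det(H[:k, :k]) for k in (1, 2, 3)]
print([round(x, 6) for x in d])                      # [0.0, -1.0, -1.0]

# Gundelfinger: assign either sign to the isolated zero d_1; with signs
# +,+,-,- or +,-,-,- there is 1 alteration and 2 constancies either way,
# so nu(H) = 1 and pi(H) = 2, which the eigenvalues confirm:
eigs = np.linalg.eigvalsh(H)
print(int(np.sum(eigs < 0)), int(np.sum(eigs > 0)))  # 1 2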

Example 3. The leading principal minors of the Hermitian matrix Exercise 1. Check that HI ~ H 1 ~O if

0 0 0
o a 0 0 bJ HI =.[2 1]
1 2' H2 =
[2 I)
1 1 '
H= 00 a 0' a,bO,
[ while H~ - H~ fails to be a positive semi-definite matrix. 0
bOO 0
Applying the Courant-Fischer theory, a relation between the eigenvalues
are of matrices HI and H 2 satisfying the condition HI ;;:: H 2 can be derived,
This can also be viewed as a perturbation result: What can be said about the
d₁ = d₂ = d₃ = 0,   d₄ = −a²b² < 0.
On the other hand, the characteristic polynomial of H is (λ² − b²)(λ − a)² and hence σ(H) = {b, −b, a, a}. Thus for a > 0 we have π(H) = 3, ν(H) = 1, while for a < 0 we have π(H) = 1, ν(H) = 3, and the inertia of H is not uniquely determined by the signs of its leading principal minors. □
of the matrix H = HI - H 2 is r, then
Note also that if the rank r of H fails to be attained on the leading principal
(a) p.i ll ;;:: p.!2) for i = 1, 2, , n;
submatrix H, of H that is, dr = 0, then the Jacobi theorem cannot be applied,
(b) p.~11 ~ p.w,r for} = 1,2, , n - r.
Exercise 4. Observe that the Hermitian matrix
PROOF. Observe that the condition H ;;:: 0 implies, by Theorem 5.3.3, that

H =
1 1 1]
1 1 1
(Htx,x) = H 2 + H)x,x) = (H 2x,x) + (Hx,x);;:: (H 2x,x)
[1 1 a for any x E 'II. Thus, if f/ is any subspace in ''', then
with a 1 has rank 2, while d z = det Hz = O. Furthermore, the leading min(H 1x, x);;:: min(H 2x, x), (1)
principal minors d 1 = 1, d2 = d3 = 0 do not depend on the values assigned to Where the minima are over all vectors x E f/ such that (x, x) = 1. In particular,
a, whereas the inertia of H does: considering all the subspaces 9j of dimension n - i + 1, where 1 :S i :S n,
det(U - H) = A[lZ - (a + 2)A + (2a - 2)]. we deduce from (1) and Theorem 8.2.2 that

Since for the nonzero eigenvalues AI' 1z of H we have A11z = 2a - 2 and, p.i 11 = max min (H,x, x);;:: max min (Hzx, x) = p.i2 )
v. "sV, v. "sV,
11 + A.z = a + 2,itfoUowsthatn(H) = 2ifa > 1 and 7t{H) = Hora < 1. 0 ("."1= 1 ("."1= 1
for i = 1, 2, ... , n. This proves part (a) of the theorem.
For the second part, assume that H has positive eigenvalues Vi'..' Vr
8.7 An Application of the Courant-Fischer Theory and .eigenvalues Vr + t =... = vn = 0, together with an associated ortho-
normal basis of eigenvectors x.. X2' ' x". For any xee" write x =
Consider two n x n Hermitian matrices HI and Hz. We say that H2 1S
D.l jXj' Then we have [see also Eq. (5.10.9)]
r
not greater than HI' written HI;;:: Hz or Hz :S H h if the difference HI - H 2 (Hx, x) = L Vj la j l2.
is a positive semi-definite matrix. Similarly, we write HI> H 2 or H 2 < HI' j=1
if HI - H 2 > O. These relations preserve some properties of the nat
WriteYj = vJ/2x jfor} = 1,2, ... , rand use thefactthat aj = (x, xj)(Exercise
ordering of real numbers (see Exercise 5.14.21).However, there are properti
3.12.3) to obtain
of numerical inequalities that fail to be true for Hermitian matrices. F, r r
instance, from HI;;:: H 2 ;;:: 0 the inequality H~ ;;:: H~ does not follow in L vj ljl2 = L I(x, yjW,
general. j= 1 1'= 1

and note that 11' ... ,1, are obviously linearly independent. Since H 1 Notice also that if xJ is an eigenvector of B, then fJ =A -112 XJ is a normal-
Hz + H, we now have mode vector for the vibration problem and, as in Bqs, (5.12.5),
,
(H1x, X) = (H:lX,x) + L I(X,II)I:l, qTAt" = ~J'" qrCt" = coJ~Jk'
'=1 On applying each theorem of this chapter to B, it is easily seen that a general-
and weconsider the eigenvalue problems for H 1 and H1. both with constraint ization is obtained that is applicable to the zeros of det(AA - C). As an
(X,11) = 0, i = 1,2, ... ,r. Obviously, in view of Eq. (2), the two proble . example, we merely state the generalized form of the Courant-Fischer
coincide and have the same eigenvalues Al S Az S .. s A..-,. Applyi theorem of Section 8.2.
Theorem 8.4.1 to both problems, we obtain . Theorem 1. Given any linearly independent vectors aI' a 2, ... , a'-l in
p.~") S; AJ S; p.~"~" k = 1,2; j = 1, 2, ... , n - r, R", let.9, bethe subspace consisting of all vectors in Oil" that areorthogonal to
til"'" a'-l' Then
which implies the desired result:
p.}11 S; AJ S; p.}':J,. A, = max min (xTCx)
T
9', 1tE9', x Ax .
..... 0

The minimum is attained when aJ = tJ,j = 1,2, ... , i-I.


1-
8.8 Applications to the Theory of Small Vibrations !. The physical interpretations of Theorems 8.4.1 and 8.7.1 are more obvious
In this setting. The addition of a constraint may correspond to holding one
E,oint ofthe system so that there is no vibration at this point. The perturbation
We return now to the problem discussed in Section 5.12. In that sectil nsidered in Theorem 8.7.1 corresponds to an increase in the stiffness of
we reduced a physical vibration problem to the problem of solving the matt:' e system. The theorem tells us that this can only result in an increase in
differential equation Ap + Cp = 0, where A is positive definite and C )e natural frequencies.
either positive definite or semi-definite. This then reduced to the p'
algebraic problem of using a congruence transformation to reduce A and
simultaneously to diagonal matrices. This, in turn, was seen to depend
the reduction of the matrix B = A-lI1.CA -111. to diagonal form by means
an orthogonal congruence transformation. .' . Ii
The natural frequencies of vibration of the system are given by the n
negative square roots of the zeros of det(A..t - C), which are also the eigl
values of B. We know now, therefore, that the largest natural frequenl .,
co" = ..t~/:l,isgivenby~ = maxR 1(x),whereR1istheRayleighquotientf6 I
and x :f:: 0 ranges over Oil" (see Exercise 8.2.1).Theorems 8.2.1and 8.2.2im: ~
i
that all the natural frequencies can be defined in terms of extreme values
R 1 However, observe that if we write x = A 1/ 2q, then

R1(x) =
xTA - 1 /1.CA - 111.x ,lCq
xxT = qTAq' II
The Rayleigh quotient appropriate to this problem is therefore chosen to.
qTCq
R(q) ="'-A .
q 'I
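The computation described in this section — reducing the vibration problem to the eigenvalue problem for B = A^{−1/2}CA^{−1/2}, whose eigenvalues are the zeros of det(λA − C) — can be sketched as follows. The particular A and C below are arbitrary positive definite test matrices, and scipy.linalg.eigvalsh is used only as an independent check on the generalized problem.

import numpy as np
from scipy.linalg import eigvalsh

rng = np.random.default_rng(3)
X, Y = rng.standard_normal((2, 3, 3))
A = X @ X.T + 3 * np.eye(3)              # positive definite "inertia" matrix
C = Y @ Y.T + np.eye(3)                  # positive definite "stiffness" matrix

# A^{-1/2} via the spectral decomposition of A
w, V = np.linalg.eigh(A)
A_inv_half = V @ np.diag(1.0 / np.sqrt(w)) @ V.T

B = A_inv_half @ C @ A_inv_half          # eigenvalues of B are the zeros of det(lambda*A - C)
lam = np.linalg.eigvalsh(B)              # ascending
omega = np.sqrt(lam)                     # natural frequencies omega_i = lambda_i^{1/2}

print(np.allclose(lam, eigvalsh(C, A)), np.round(omega, 4))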

9.1 Functions Defined on the Spectrum of a Matrix

CHAPTER 9
Let A e;C"x" and suppose that AI> A2"'" A, are the distinct eigenvalues
of A, so that
m(λ) = (λ − λ₁)^{m₁}(λ − λ₂)^{m₂} ⋯ (λ − λ_s)^{m_s}   (1)
Functions of Matrices
is the minimal polynomial of A with degree m = ml + m2 + ... + mi'
Recallthat the multiplicity mle of Ale as a zero of the minimal polynomial is
referred to as the index oftheeigenvalue Ale (Section 6.2) and is equal to the
maximal degree of the elementary divisors associated with Ale (I S k S s),
Given a function f{A), we say that it is defined on the spectrum of A if the
numbers
f(λ_k), f′(λ_k), ..., f^{(m_k − 1)}(λ_k),   k = 1, 2, ..., s,   (2)
called the values off{A) on the spectrum ofA, exist. Clearly, every polynomial
over C is defined on the spectrum of any matrix from C"" ". More can be said
In this chapter we consider matrices A from en x" (including those
about minimal polynomials.
IR""" as a special case) and the possibility of giving a meaning to f(A:
where fell is a complex-valued function of a complex variable. We wouk Bxercise 1. Check that the values of the minimal polynomial of A on the
like the definition of f(A) to hold for as wide a class of functions f(l) spectrum of A are all zeros. 0
possible. We have seen that the question is easily resolved if f{l) = peA) is
f The next result provides a criterion for two scalar polynomials PI{A) and
polynomial over the complex numbers; thus,
P2{A) to produce the same matrix Pl{A) ::: P2{A). These matrices are defined
I I as described in Section 1.7.
peA) L PIAl if P(A) = L PIAl.
1=1 1-0 Proposition 1. If Pl(A.) and P2{A.) are polynomials over C and A e C""",
Moreover, if m(A) is the minimal polynomial for A, then for a polynomi then pl{A) = P2{A) if and only if the polynomials Pl{A) and P2{A) have the
peA) there are polynomials q{l) and rCA) such that ,same values on the spectrum of A.
peA) = m(..1.)q(..1.) + r(l), hOOF. If Pl{A) = P2{A) and Po{A) = p&1.)- P2{A), then obviously, Po{A)
::. 0 and hence Po{A) is an annihilating polynomial for A. Thus (Theorem
where rCA) is the zero polynomial or has degree less than that of meA). Thus: .2.1) Po{.il.) is divisibleby the minimal polynomial meA) of A given by (I) and
since meA) = 0, we have peA) = rCA). there exists a polynomial q{..1.) such that Po{..1.) = q{A)m{A). Computing the
The more general functions f(..1.) that we shall consider will retain .', values of Po{A) on the spectrum of A and using Exercise I, it is easily seen that
property; that is to say, given f(..1.) and A, there is a polynomial rCl) (wi
degree less than that of the minimal polynomial of A) such that f(A) = rCA); p\Jl{AIe) - p~){AJ = plf{AJ = 0 (3)
It is surprising but true that this constraint still leaves considerable freedo
in the nature of the functions f(l) that can be accommodated. ror j = 0, 1,... , m" - 1, 1 S k S s. Thus, the two polynomials pl{A) and
P2{A) have the same values on the spectrum of A provided Pl{A) = P2(A).
Conversely, if (3) holds, then pO(A) has a zero of multiplicity mle at the point
,'Ie for each k = 1, 2, ... , s, Hence p(){A.) must be divisible by m{..1.) in (O and
t follows that Po{A) = 0 (see Corollary 1 of Theorem 6.2.2)or, equivalently,
leA) = P2{A).

304
~I"
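Proposition 1 of Section 9.1 can be illustrated numerically: adding a polynomial multiple of the minimal polynomial to p(λ) changes neither its values (and relevant derivatives) on the spectrum of A nor the matrix p(A). The sketch below uses a single 3 × 3 Jordan block as A, an arbitrary choice made purely for illustration.

import numpy as np

lam0 = 2.0
J = lam0 * np.eye(3) + np.diag([1.0, 1.0], 1)        # minimal polynomial (x - 2)^3

def poly_at(coeffs, M):
    # Evaluate a polynomial (coefficients highest degree first) at a matrix by Horner's scheme
    out = np.zeros_like(M)
    for c in coeffs:
        out = out @ M + c * np.eye(M.shape[0])
    return out

minpoly = np.poly([lam0, lam0, lam0])                # coefficients of (x - 2)^3
p1 = np.array([1.0, -1.0, 0.0, 5.0])                 # p1(x) = x^3 - x^2 + 5
p2 = np.polyadd(p1, np.polymul(minpoly, [1.0, 1.0])) # p2 = p1 + (x - 2)^3 (x + 1)

# p1 and p2 agree (with first and second derivatives) at x = 2, hence p1(J) = p2(J)
print(np.allclose(poly_at(p1, J), poly_at(p2, J)))                   # True
print(np.allclose(np.polyval(p1, lam0), np.polyval(p2, lam0)))       # True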

It is this property of polynomials with matrix arguments that we now use for each 1 S k S s. By differentiation,
to extend the definition off(A) to more general functions f(A). Thus, we shall
demand that all functions that are defined on the spectrum of A and that
take on the same values there should yield the same matrixf(A). In particular,
pW)(),,.> = t (~)lt~)(A")IW-i)(A")
1=0 1

for any f(A) defined on the spectrum of A we shall be able to write f(A) ...' for 1 .:s; k .:s; s, 0 .:s; J .:s; m" - 1. Using Eqs. (3) and recalling the definition of
p(A), where p(A) is a polynomial with the same values on the spectrum of A. Ilt,,(l), we have for k = 1, 2, ... , s,j = 0, 1, ... , m" - 1,
The existence of such a polynomial p(A) with the prescribed properties
follows from the solution of the problem of general Hermite interpolation,
which we must now consider.
h.i = (~)iltx".,I/11-'-
1=0 1
il(l,.). (4)

Since 1/1,,(1,,) "" 0 for each fixed k, Eqs. (4) can now be solved successively
(beginning with j = 0) to find the coefficients ocle.o, .. , tXt. mk- 1 for which (3)
.holds. Thus, a polynomial peA) of the form given in (2) satisfies the required
9.2 Interpolatory Polynomials conditions.
Proposition 2. The polynomial peA) ofProposition I is unique.
Proposition 1. Given distinct numbers AI' A2''" ,A., positive integers ml The unique polynomial of degree less than m that satisfies conditions (1)
m2'.' m.with m =: D-l
mil' anda set ofnumbers is known as the Hermite interpolating polynomial.
. This is an important step in our argument but the proofis cumbersome and
h.o, h. h ' ' ' , h.mk-l> k = 1,2, ... , s, IS therefore omitted. It can be made to depend on the nonsingularity of a
.thereexists a polynomial p(A) ofdegree lessthan m such that

for k
p(l,.) = h.o,
= 1, 2, .. , s.
p'll(A,.) = h. h ... , plJllk-U(AJ = fi.. "'k- I
generalized Vandermonde matrix (see Exercise 2.11.22).
Exercise 1. If ml = m2 = ... = m. = 1, then conditions (1) reduce to the
imple"point wise" conditions ofinterpolation: p(lle) = h.o for k = 1,2, ... , s.
Show that these conditions are satisfied by the Lagrange interpolating
~olynomial :
I
1
PROOF. It is easily seen that the polynomial p,,(A) = txle(l)I/IIe(A),t whe i-;.,
l.:s;kSsand
txle(l) =: txle.o + txl.I(A - AJ + .., + ocl.mk-I(A -lle)"'k-l,
'.
pel) = L h.o,.
"=1
., I. . ". . , ,. ."
(1 - 11) '" (1 - 1,,-1)(1 - Ak+ I) ... (1 - 1.)
(5) -j
{

1/1,,(1) = n (A -
i=l.J1e
Aj)"'J,
xercise 2. Prove the result of Proposition 2 in the case described in
ercise 1, that is, when ml = m2 = ." = m. = 1.
~i

t
has degree less than m and satisfies the conditions into First show that a polynomial of degree less than m that vanishes at m
istinct points is necessarily the zero polynomial. This can be done using the
p,,(A,) = p~U(A,) = '" = p~ml-I)(;'l.i) =0 nsingularity of a certain Vandermonde matrix (see Exercise 2.3.6). 0

for i "" k and arbitrary ex".o, ex"...... txle.mk-I' Hence the polynomial A useful role is played by those polynomials for which numbers h. t in (1)
e all zero except one, and the nonzero number, say h.i is equal to 1. Thus
peA) = PI(A) + P2(A) + ... + p.(l) opositions 1 and 2 guarantee the existence of a unique polynomial fP"JA)
degree less than m (with 1 :s;; k S sand 0 :s;; j ::;; ml - 1) such that
satisfies conditions (1) if and only if
fPt](l,,) = bj ro r = 0, 1, ... , mA: - 1; (6)
p,,(A,,) = h.o, p~l)(A,.) = h.l"'" pL"'k-I)(A,.) = h.mk- I
d when i "" k,
t If s = I, then by definition "'1(1) = I. fPt](A,) = 0, r = 0, 1,... ,m, - 1. (7)
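For computation, the Hermite interpolating polynomial of Proposition 1 can also be obtained by solving the linear system that conditions (1) impose on its coefficients; the coefficient matrix is the (nonsingular) generalized Vandermonde matrix mentioned after Proposition 2. A minimal Python sketch, with arbitrarily chosen nodes and data:

import numpy as np
from math import factorial

def hermite_poly(nodes, values):
    # Coefficients (lowest degree first) of the unique p with deg p < m satisfying
    # p^(j)(lam_k) = values[k][j], where m is the total number of conditions.
    m = sum(len(v) for v in values)
    A, b = [], []
    for lam, vals in zip(nodes, values):
        for j, f_kj in enumerate(vals):
            # d^j/dx^j of x^i at lam equals i!/(i-j)! * lam^(i-j) for i >= j, else 0
            row = [factorial(i) / factorial(i - j) * lam ** (i - j) if i >= j else 0.0
                   for i in range(m)]
            A.append(row)
            b.append(f_kj)
    return np.linalg.solve(np.array(A), np.array(b))

# Interpolate f(2) = 1, f(4) = 3, f'(4) = -1  (m = 3, so deg p <= 2)
c = hermite_poly([2.0, 4.0], [[1.0], [3.0, -1.0]])
p = np.polynomial.Polynomial(c)
print(p(2.0), p(4.0), p.deriv()(4.0))     # 1.0 3.0 -1.0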
These m polynomials are called the jUndamental or cardinal polynomials SOLUTION. The eigenvalues of A are found to be Al = 3, A2 = 5, so that
of the interpolation problem. the minimal polynomial m(A) of A is (A - 3)(A - 5). The Lagrange inter-
Once these polynomials are known, the solution to every problem with the polatory polynomial is
general conditions (1) can be expressed as a linear combination of them. A - A2 A -AI
For, if P(A) satisfies (1) then, as is easily verified, P(A) = f(A 1) Al _ A + f(A 2 ) A - Al
2 2
, m,,-1 = -ie 6
(A - 5) + ie 10
(A - 3).
peA) = L L h.lp,,/..A). Hence, by definition,
10= 1 j=o
Exereise 3. Verify that the m polynomials tp,J..A) (1 S k ::;:; s and 0 ::;:; j ::;:; e2 A = -ie6(A - 51) + ie10 (A - 31)
m" - 1) just defined are linearly independent and hence form a basis for the 3e10 e6 _elO + e6 ]
_
space 9i'm-l of all polynomials of degree less than m. =i [ 3elO _ 3e6 _e 10 + 3e6 .
Exercise 4. Show that, with the hypothesis of Exercise 1, the fundamental Exercise 3. If
interpolatory polynomials are given by .

n, (1 - A I n' (1" - Aj) A=~ 0 1 1,


200]
tp"o(A) 55 tp,.(A) =
}= I.j#"
J)
}=1.}#"
[o 0 1 I
for k = 1,2, ... , s. 0 I'
i
4 4A = (00 01 0]O.
9.3 Definition of a Function of a Matrix
sin A = - A -
n
2"
n
2

0 0 1
Exercise 4. If f(A) and gel) are defined on the spectrum of A and h(A)
~ f(A) + Pg(A), where IX, p e C, and k(l) = f(A)g(l), show that h(l) and
I
'I
Using the concepts and results of the two preceding sections, a general k(l) are defined on the spectrum of A, and
definition of the matrix f(A) in terms of matrix A and the function I can be
made, provided I(A) is defined on the spectrum of A. If P(A)is any polynomiel'
heAl = qf(A) + Pg(A),
k(A) = f(A)g(A) = g(A)f(A).
I
that assumes the same values as f(A) on the spectrum of A, we simply defin~
f(A) A p(A). Proposition 9.21 assures us that such a polynomial exist~1
while Proposition 9.1.1 shows that, for the purpose of the formal definition,
the choice of p(A) is not important. Moreover, Eq. (9.2.8) indicates hoW
Exercise 5. If 11(1) = 1- 1 and det A :F 0, show that 11 (A) = A-I. Show
also that if f2(A) ;: (a - A)-I and a O'(A), then 12(A) = (al - A)-I. I
P(A) might be chosen with the least possible degree.
Hint: For the first part observe that if P(A) is an interpolatory poly-
nomial for ft()..), then the values of Ap(A) and lfl(A) = 1 coincide on the I ~
Exercise 1. If A is a simple matrix with distinct eigenvalues AI"'" A, an,
f(1) is any function that is well defined at the eigenvalues of A, show that
spectrum of A and hence AP(A) = Afl(A) = I.
.Exen:ise 6. Use the ideas of this section to determine a positive definite ,I
f(A) = f(Ak) [ n (A - liI)! Ii (Ak - AJ)]
.square root for a positive definite matrix (see Theorem 5.4.1). Observe that J
,,= 1 J= 1.J"'''
Exercise 2. Calculate I(A), where I(A) = e and 2A
J-l.}#"
. ,there are, in general, many square roots. 0
, We conclude this section with the important observation that if f(A) is a
! Ii
olynomial, then the matrix I(A) determined by the definition of Section 1.7

[6-1] .oincides with that obtained from the more general definition introduced
"

~ ;I~\.
A = 3 2' in this section.
;'~, ,

~!~ ...

,
\
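The result of Exercise 2 is easy to check numerically: for f(λ) = e^{2λ} the interpolatory polynomial gives f(A) = −½e⁶(A − 5I) + ½e^{10}(A − 3I), and this must agree with the matrix exponential computed independently (scipy.linalg.expm is used below only as a check).

import numpy as np
from scipy.linalg import expm

A = np.array([[6., -1.],
              [3.,  2.]])
I = np.eye(2)

# Lagrange interpolation of f(lambda) = exp(2*lambda) on the spectrum {3, 5} of A
f_A = np.exp(6) * (A - 5 * I) / (3 - 5) + np.exp(10) * (A - 3 * I) / (5 - 3)

print(np.allclose(f_A, expm(2 * A)))      # True
print(np.round(f_A, 2))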

9.4 Properties of Functions of Matrices Theorem 3. Let AeC" and let J =diag[JJ]~"l be the Jordan canonical
Kl

form ofA, where A = PJP-l and J} is thejth Jordan block ofJ. Then

In this section we obtain some important properties of f(A), which, hi f(A) = P diag[f(Jl),f(J 2 ) , , f(J,)]P- 1 (3)
particular. allow us to compute f(A) very easily. provided the Jordan
The last step in computing f(A) by use of the Jordan form of A consists,
decomposition of A is known. These properties will also establish important
therefore, of the following formula.
relationships between Jordan forms for A and f(A).
Theorem 1. If A e eft lCft is a block-diagonal matrix, Theorem 4. Let J 0 be a Jordan block ofsize I associated with ).0;
A = diag[A 1 , A 2 , , Arl,
and the function f().) is defined on the spectrum of A. then ).o 1 ]
Ao ".
f(A) = diag[f(A 1 ) , f(A 2 ) , . . , f(A,)]. J
o
= [. ". ~.
PROOF. First. it is clear that for any polynomial q().),
q(A) = diag[q(A 1) , q(A,2),' .. , q(A,). If f(A) is an (I -I)-times differentiable function in a neighborhood of Ao,
then
Hence, if p(,t) is the interpolatory polynomial for f(,t) on the spectrum of A,
we have f(A o) .!l! f'(A o) ... 1
(I-I)!
f('-1)(A)
0
f(A) = p(A) = diag[p(A 1) , P(A2 ). " p(A,).
0 f(A o)
Second, since the spectrum of A j (1 Sj s t) is obviously a subset of the f(Jo) = I I. (4)
spectrum of A. the function f(A) is defined on the spectrum of A J for eae .! f'(Ao)
j = 1, 2,... , t. (Note also that the index of an eigenvalue of A) canm l!
exceed the index of the same eigenvalue of A.) Furthermore, since f(: f(A o)
0 '" 0
and p().) assume the same values on the spectrum of A, they must a
have the same values on the spectrum of A} U = 1,2, ... , t). Hen,
'ROOF. The minimal polynomial ofJ 0 is (A - Ao)'(seeExercise 7.7.4)and the
f(A J) = p(Aj ) and we obtain Eq. (1).
alues of f(A) on the spectrum of J 0are therefore f(A.o),/,(,1.0)... , f(I-1 )(AO)'
The next result provides a relation between f(A) and f(B) when A and' e interpolatory polynomial p(,t), defined by the values of f(,t) on the
are similar m a t r i c e s . " ,ectrum{Ao}ofJo,isfoundbyputtings = l,mA: = 1').1 = Ao,andy,I(A) == 1
Theorem 2. If A, B, Peen K n, where B = PAP-I, and f(A) is defined on Eqs. (9.2.2) through (9.2.4). One obtains
spectrum ofA, then '-I 1
f(B) = Pf(A)P-l.
p(A) = L 7j f(l)(Ao)(jl - AO)i.
1=0 Z.

PROOF. Since A and B are similar. they have the same minimal polynoro 'he fact that the' polynomial p(A) solves the interpolation problem p(j)(A o)
(Section 6.2). Thus, if p().) is the interpolatory polynomial for f(A) on fU)().o), I ::; j ::; I - 1, can also be easily checked by a straightforward
spectrum of A. then it is also the interpolatory polynomial for f(A) on lculation,
spectrum of B. Thus we have f(A) = P(A), f(B) = P(B), p(B) = Pp(A)P .' We then have f(J 0) = p(J 0) and hence
and so the relation (2) follows.
1-1 1
In view of the Jordan theorem, Theorems I and 2 imply a related theari f(J o) = L 7jj!iI(A.o)(Jo - Ao1)'
i=O Z.
'~
about functions of matrices.

\j\i
~"

Computing the powers of J 0 - Aol, we obtain ~~e,cise 3. Show that, if all the eigenvalues of the matrix A e en"n lie in
..e open unit disk, then A + J is nonsingular,
, 'die 4. If the eigenvalues of A E en"n lie in the open unit disk, show
o 1 0 .. , .......0 1 0 o t1itit. Re A < 0 for any A e O(C), where C = (A + Ir l(A - I). 0
:~fti-H

,.:1c The elementary divisors associated with a particular eigenvalue are not
o o
(J o - loI)i = I I = 1
lecessarily preserved under the transformation determined by f(A), in
that an elementary divisor (A - Air of AI - A is not necessarily carried
'. 1 o ~ver to an elementary divisor (A - f(Ai' for AI - f(A). Exercise 9.3.3
iiJustrates this fact by showing that the nonlinear elementary divisor associ-
o o fed with the eigenvalue n/2 of A splits into two linear elementary divisors
.the eigenvalue f(1t/2) = 1 of f(A). A description of the elementary divisors
and Eq. (4) follows. if AI - f(A) in terms of those of AI - A is given in the following theorem.
Thus, given a Jordan decomposition of the matrix A, the matrix f(A) is ,orem7. Let A1 , A2, ... , A. be the distinct ei(Jenvalues of A e en"n, and ':

easily found by combining Theorems 3 and 4. the function f(A) be defined on the spectrum of A. Ii 11
'(a) If/,(A,,) 0, the elementary divisors of AI - f(A) associated with the
Theorem S. With the notation of Theorem 3, igenvalue f(A,,) are obtained by replacing each elementary divisor (A - A,.)r
f(A) = P diag[f(J I)' f(J 2), . , f(Jt)]P-l, .iU - A associated with A" by (A - f(~r.
'i,(b) Iff'{A,,) = /,,(A,.) = ... = jl'-ll(A,,) = O,butP/I(A,,) 0, where I ~ r,
where f(JI) (i = 1,2, ... , t) are upper-triangular matrices of the form give, an elementary divisor (A - AJ of AI - A splits into r linear elementary
in Eq. (4). isors 01AI - f(A) of the form A - f(A,,).
(e) III'(A,,) = /,,(A,,) = ... = jl'-l l(A,,) = 0 butP'I(~) 0, where 2 S I
Exercise 1. Verify (using the real natural logarithm) that r - 1, the elementary divisors 01 AI - f(A) associated with f(A,.) are \
btained by replacing each elementary divisor (A _A,,)r of AI - A associated 1

In
( [
-26 22 02]) = [ -10 -22 01] [In02 In0 4 i0] [ -1-2
0 2 0., 1].-. i.th A" by I elementary divisors (A - AJP (taken I - q times) and (A - AJ"+ 1
en q times), where the integers p and q satisfy the condition r = Ip + q -!
o 0 2 1 0 0 0 0 In4 1 0 o S q s I - I, p > 0).
ote that the cases (a) and (b) can be viewed as extreme cases of (c).
Hint. Use Example 6.S.2. 0
ClOF. In view of Theorems 5 and 7.7.1, since each elementary divisor
We observe that if J, is associated with the eigenvalue A." then the diagon - 1,,)' of AI - A generates a Jordan block J" of size r associated with
elements off(J,} are all equal to f(A,}, and since the eigenvalues of a triangul , it suffices to determine the elementary divisors of AI - f(J,.).

I
matrix are just its diagonal elements. it follows that the eigenvalues of f( If!'(A,,) "-'F 0 in Eq. (4), then by Exercise 7.7.7 the unique elementary divisor
possibly multiple, are f(Al)' f(A2)' ... ,f(A.). Compare the next result wi 'I(J,,) is (A - f(A,,r, which proves part (a). Part (b) is checked by observing
Theorem 4.11.3. at I(J,.) is in. this case a diagonal matrix and applying Exercise 7.7.6.
e result of Exercise 7.7.8 readily implies the required statement in (c). ~
Theorem 6. If AI' A2, ... , An are the eigenvalues of the matrix A e C' 1:
and f(A) is defined on the spectrum of A, then the eigenvalues of f(A) Q, ercise 5. If B is positive definite. show that there is a unique Hermitian ';

f(Al), f(A2), ,f(l,,) trix H such that B = e". (This matrix is, of course, a natural choice for
B.) Give an example of a non-Hermitian matrix A such that eA is positive
Exercise 2. Check that eA is nonsingular for any A e C"'n. efinite. 0

9.5 Spectral Resolution o~. f(A) Recall that, by definition, Z"o =


C'fJko(A).1f we apply the result of Exercise
4.14.3 to the polynomials tpA;o(A), k = 1, 2, ... , s,we see that ZkO is the sum
of the constituent matrices associated with the eigenvalue Ij; It is then easily
The following decomposition, or spectral resolution, of f(A) is important seen that ZA;O is idempotent, and the range of ZA;O is the (right) eigenspace
and is obviously closely related to the spectral theorem for simple matrices; associated with AA; (that is, the kernel of A" I - A). Thus, the result of Exercise
which we have already discussed in Theorem 4.10.3 and Exercise 4.14.3. 4.14.3 is seen to be a special case of Theorem 1. We deduce from it that for a
It gives a useful explicit representation of f(A) in terms of fundamental simple matrix A
properties of A.
Theorem 1. If A E C"l<n, f(A) is defined on the spectrum {Al' A2' ... , A,}
L ZkO
= I and ZA;OZjO = ~kjZA;O'
A;=1
of A, f".J is the value ofthe jth derivative off(A) at the eigenvalueA" (k = 1, 2, although the first of these results is easily seen to be true for all matrices
... , s,j = 0, 1, ... , m" - 1), and m" is the index ofAI" then there exist matrices AeCn lC " (see Exercise 1). We shall show in Theorem 2 that the second
Z"J independent ofI().) such that
result is also generally true.
Ihk-l
Finally, we note that, in view of Exercise 9.2.4,
f(A) = L L ft.JZ"j' (1)
"=1 J=O
Moreover, the matrices Z"J are linearly independent members of cnl<n and
z.; = tpko(A) = n.
j=; 1.j*k
(A - AjI) I n
j=1.j"
(AA; - .il).

commute with A and with each other. Exercise 1. If A e C" l< n prove that, in the previous notation,
PROOF. Recall that the function f().) and the polynomial peA) defined by
Eq. (9.2.8) take the same values on the spectrum of A. Hence f(A) = peA) (a) I = L: ZA;O and (b) A = L (AA;Z"o + Zu)
A;= I k=l
and Eq. (1) is obtained on putting Z"J = tp"tA). It also follows immediately
that A and all of the Z"J are mutually commutative.
Hint. Put f(A) = 1 in Eq. (1) for part (a) and f(A) = A. for part (b).
It remains only to prove that these matrices are linearly independent. Exen:ise 2. If A E C" l<" and A a(A), show that
Suppose that Ihk-I .,

Ihk-l RA(A) A (..11- A)-1 = L L r1 )~ ~J+IZr.j'


A;=1 j=O
L j=O
"=1
L C"jZ"j= 0 Hint. Use Exercise 9.3.5. 0
and define hOI.) = Lt.Jc"Jtp"t).); then heAl is a polynomial of the space Let us examine more closely the properties of the components of a general
Bl'm-l; that is, the degree of heAl does not exceed m - 1. Then Z"J = tptJ.A) matrix A E c"x ", Note particularly that part (c) of the next theorem shows
implies that heAl = O. But m is the degree of the minimal polynomial of A that the components ZI0,"" Z.o of any matrix are projectors and, by
and, since h annihilates A, it follows from Theorem 6.2.1 that h is the zero part (a), their sum is the identity. A formula like that in part (a) is sometimes
polynomial. Hence, since the tp,,/s are independent by Exercise 9.2.3, we known as a resolution ofthe identity.
have c"J = 0 for each k andj. Thus, the matrices Z"j are linearly independent.

We shall call the matrices Z"j the components of A; note that, because they
Theorem 2. The components ZA;j U = 0, 1, ... , mA; - I; k = 1, 2, ... , s) of
the matrix A e C" X" satisfy the conditions

are linearly independent, none of them can be a zero-matrix. (a) L: Z"O = I;
"=;1
Let us investigate the case of simple matrices more closely. We now have
ml = m2 = ... = m. = 1 in (1) and so, for a simple matrix A, (b) Z"pZI = Oifk #:: I;
ZrJ = Z"J ifand only ifj = 0;
L: ftZ"o,
(c)
f(A) = ft = ft.0' k = 1,2, ... ,8.
"=1 (d) Z"OZk, = Zkr' r = 0, 1, ... , mr. - 1.

PROOF. The relation (8) was established in Exercise 1. However, this and the function can be represented as in Eq. (1); only the coefficients f".j differ from
other properties readily follow from the structure of the components Z"j' one function to another.
which we discuss next. When faced with the problem of finding the matrices ZJ;j' the obvious
First observe that if A=PJP- I , where J=diag[J"J 2, ... ,J,] is a technique is first to find the interpolatory polynomials f{J"tl) and then to put
Jordan form for A, then, by Theorem 9.4.5, =
Z"J f{JJ;J(A). However, the computation of the polynomials f{J"tl) may be a
I very laborious task, and the following exercise suggests how this can be
Z"j = lp"tA) = P diag[lplcj(JI), ... , lp"j(J,)]P- ,
avoided.
where lp"tJi) (1 SiS t, 1 S k S s,O S j S m" - 1) is a matrix ofthe form.
~xample 3 (see Exercises 6.2.4, 6.5.2, and 9.4.1). We want to find the com-
(9.4.4). The structure of lp"tJ,) is now studied by considering two possi-
ponent matrices of the matrix
bilities.
Case 1. The Jordan block J, (of size ni) is associated with the eigenvalue =
6 2]
2
At. Since (see Eq. (9.2.6 qJt)(A,,) = OJ, (r,j = 0,1, ... , m" - 1), it follows'
from Eq. (9.4.4) that
A
[ -2 2 0 .
002
1 We also want to find In A using Theorem 1. We have seen that the minimal
lp"tJ ,) = 7j N~l' polynomial of A is (A - 2)(1 - 4)2 and so, for any / defined on the spectrum
J.
of A, Theorem 1 gives
where N ftc is the ni x ", Jordan block associated with the zero eigenvalue,
and it is assumed that for j = 0 the expression on the right in Eq, (3) is the' /(A) = /(2)Z,0 + /(4)Z20 + /(1)(4)Z21' (7)
", X ", identity matrix. Note also that ", S m" (see Exercises 7.7.4 and We substitute for /(A) three judiciously chosen linearly independent
7.7.9). polynomials and solve the resulting simultaneous equations for ZIO, Z20'
C"se 2. The Jordan block J, (of order n;) is associated with an eigenvalue and Z2l' Thus, putting /(1) = 1, /(1) = 1 - 4, and /(1) = (A- 4)2 in
1, =F 1". Then again n, S m, and [as in Eq, (9.2.7)] we observe that lp"tl' turn, we obtain
along with its first m, - 1 derivatives, is zero at 1 = A,. Using (9.4.4) agai ZIO + Z20 = 1,
we see that

[-~ ~],
2
lp",PJ = o. -2Z ro + Z" - A - 41 - -2
To simplify the notation, consider the case k = 1. Substituting Bqs, ( o -2
and (4) into (2) it is easily seen that
0 0
= FP diag[N~" N~2"'" N~., 0, ... , O]P- I
Zlj

for j = 0,1, ... , ml - 1, provided J to J 2 , , JIf are the only Jordan blo
4Z I0 = (A - 41)2 = 0
[o 0
0
-:1
These equations are easily solved to give
of J associated with 11 , More specifically, Eq. (5) implies
ZIO = P diag[I.. O]p-I, 0 0] [1 00] [2 2 2]
where r = nl + n2 + ... + nlf'
The nature of the results for general 1" should now be quite clear, and t
.
ZIO =
[0 0 1
0 -1, Z20 = 0
000
1 1" Z21 = -2 -2 -2 .
,000 ; ~
required statements follow.. By comparison, the straightforward calculation of a component matrix, 1ji
y Z20, using its definition, Z20 = f{J20(A), is more complicated. Seeking a l
The results of Theorems 1 and 2 should be regarded as generalizations ,lynomial lp20(1) of degree not exceeding 2 and such that i I,
Exercise 4.14.3. They may help in computing /(A) and especially in sim . Ii
taneously finding several functions of the same matrix. In the last case ea l f{J20(2) = 0, lp20(4) = I, lp20(4) = O. (8) \'
\ '~
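The component matrices found in Example 3 can be verified numerically, together with the resulting formula f(A) = f(2)Z₁₀ + f(4)Z₂₀ + f′(4)Z₂₁ applied to f(λ) = ln λ (scipy.linalg.logm serves only as an independent check).

import numpy as np
from scipy.linalg import logm

A = np.array([[ 6., 2., 2.],
              [-2., 2., 0.],
              [ 0., 0., 2.]])
I = np.eye(3)

# Solve the three equations of Example 3 for the component matrices:
Z10 = (A - 4 * I) @ (A - 4 * I) / 4       # from 4*Z10 = (A - 4I)^2
Z20 = I - Z10                             # from Z10 + Z20 = I
Z21 = (A - 4 * I) + 2 * Z10               # from -2*Z10 + Z21 = A - 4I

# f(A) = f(2) Z10 + f(4) Z20 + f'(4) Z21 with f = ln, so f'(4) = 1/4
lnA = np.log(2) * Z10 + np.log(4) * Z20 + 0.25 * Z21
print(np.allclose(lnA, logm(A)))          # True
print(Z10, Z20, Z21, sep="\n")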
we write (see the proof of Proposition 9.2.1)
For k = 1,2,... ,5 and j = 0, 1,... , m k - 1,
lI'zoO.) = [oco + OCl(A, - 4)](A - 2)
1
and observe that the conditions (8) give 0 = 1, OC I = - t. Hence ZkJ = :; (A - AkI'YZ"o
J.
(9)

Z20 = -iA 2
100] PROOF. Observe first that the result is trivial for j = O. Without loss of

000
[
+ 2A - 31 = 0 1 1, generality, we will again suppose k = 1 and use the associated notations
introduced above. Since A - All = P(J - A1I)P-I, it follows that
as required. , , } - All = P-I(A - AII)P = diag[Nn" ... , Nn. , J,,+I - All, ... , J, - All],
Note that formula (7) for f(A) = In Aprovides another method for finding' (10)
In A (compare with that used in Exercise 9.4.1):

In A = (In 2)ZIO + (In 4)Z20 + 1Z21 = ,[ - 1


In 4 +i 1 1]t , ',
In 4 -! In 2 -
where J I' J 2, ... , J, are the only Jordan blocks of J associated with AI'
Multiplying Eq. (10) on the right by P- IZ 1JP and using Eq. (5), we obtain
jlp-I(A - A,II)ZIJP = diag[N~i 1, ... , N~: 1,0, ... , OJ.
o 0 In 2 tIsing Eq. (5) again, we thus have
which coincides with the result of Exercise 9.4.1.
jIP-I(A - A,II)ZIJP = U + 1)!P- 1Z 1 J+IP
Exercise 4. Check formulas (3) through (6) and the statement of Theorem
for the matrix A considered in Exercise 3. 0 1
ZI.J+l = j + 1 (A - AII)ZIJ
Note that, in general, the results of Theorem 2 may help in solving tb
equations for the component matrices. for j = 0, I, ... , m" - 1. Applying this result repeatedly, we obtain Eq. (9)
tor k = 1.
ExerciseS. Show that the component matrices of
Theorem 3 should now be complemented with some description of the

=
-1 12 1]
0 -1 0-3
projectors Z"o' k = 1,2, ... ,5 (see also Section 9.6).

A
[o 0 0 I 1
0 0 0
Theorem 4. With the previous notation,
Im Z1:0 = Ker(A - A"I),"k.
"",<,.,

are (taking Al = 0, A2 = 1, A3 = -1) PROOF. We will again consider only the case k = 1. If J is the Jordan form
of A used in the above discussion, then (see Eq. (10

o0 -4J
00 00 -3 [0
0 00 01 01] (J - All)'"' = diag[O, ... , 0, (J,,+ 1 - 111)'"' , ... , (J, - 111)'"1] ,
ZIO = 0 0 0 -1' Z20 = ,0 0 1 1 ' ~lnce nl , n2 , , n" ~ mi' The blocks (Jr - 111)'"' are easily seen to be
[
0001 0000 ~onsingular for r = q + 1, ... , t, and it followsthat the kernel of (J - 1 11),"1
,~spanned by the unit vectors e., '2' . "" .. where r = nl + n2 + ... + nlj'
. 1 0-13]
0103 [00 010003] }.et 9' denote this space.
';; We also have (A - AII)'"I = P(J - AII)'"IP-I,aJiditfollowsimmediately
[0 0 0 0
Z30 = '0 0 0 0' Z31 = 0 0 0 O' 0 ~hat x e Ker(A - 1 11),"1if and only if P - IX e f/. '
0000 Using Eq. (6) we see that x e Im Z 10 ifand only if
The next result allows us to find all component matrices very easily on, p-Ixelm[lr O]P- I =.9,
the projectors Z1:0, k '= 1, 2, ... , 5, are known. d hence Ker(A - All),"1 = Im Z10'.

It turns out that the result of the theorem cannot be extended to give an. and,since 1m ZIO n 1mZjO = {O} when i ~ i,
equally simple description of 1m Z"J for j ;;:: 1. However, a partial result for
1m Z,.. ",,,-1 is given in Exercise 7.
Ker Z,.o = Im(l- Zto) = L .1m ZJo' (1)
J=o.J~I<
Exercise 6. Prove that (A - A"/Y""Z"o = O.
Thus, both 1m Zto and Ker Z"oare A-invariant and thefollowing resultholds.
Exerc:ise 7.. Prove that 1mZ"o :::l 1m Z"t :::l :::l 1m Z",,,,,,-1 and that
1mZ",111,,-1 is a subspace ~ of Ker(A - A-"l). Show that the dimension of Theorem 1. The component matrix ZtO associated with an eigenvalue At of
9'" is equal to the number of Jordan blocks associatedwith A" having order. As C"K n is a projector onto the generalized eigenspace of A" and along the
m". . " sum of generalized eigenspaces associated with all eigenvalues of A different
from AI<'
Exercise B. If x, , are, respectively, right and left eigenvectors (see Section
4.10) associated with A1' prove that Z10X = x and ."TZ10 = ,T. The discussion of Section 4.8 showsthat if wechoose bases in 1m Z"o and
Ker Z"o, they willdetermine a basis for C" in which A has a representation
Exercise 9. If A is Hermitian, prove that Z"o is an orthogonal projector.
onto the (right) eigenspaceof A".
Exercise 10. Consider the n x n matrix A with all diagonal elementS'
diag[A1, A z], wherethe sizes of A 1 and A z arejust the dimensions of 1m z"o
and Ker Z"o' respectively. In particular, if these bases are made up of Jordan
chainsfor A, as in Section 6.4, then Al and A z both have Jordan form.
l
t

I~
equal to a number a and all off-diagonal elementsequal to 1. Showthat the For any A e C"" II and any z j a(A), the function f().) = (z - ).)-1 is
eigenvalues of A are A1 = n + a - 1 and Az = a - 1 with Az repeated, defined on u(A); hence the resolvent of A, (zl - A)-t, exists. Holding A
n - 1 times. Show also that fixed, the resolventcan be considered to be a functionof z on any domain in C
not containing points of a(A), and this point of viewplays an important role
1 1 -1 -1 in matrix analysis. Indeed,weshalldevelop it further in Section 9.9 and make
Z10 = ~ 1. 1. . IJ
~,Zzo In - 1
= !. :-1
n - 1 -1 substantialuseofit in Chapter 11,in our introduction to perturbation theory.
n . .[ . . '. : n:
A representationfor the resolvent was obtained in Exercise 9.5.2:
1 1 ... 1 -1 -1 B 111,,-1 j'
(zl-A)-I=L L (_i}'+tZtJ'
,,= 1 J"O Z "
Using the orthogonality properties of the matrices ZtJ' it is easily deduced
I
\
that
"",-I .,
I
9.6 Component Matrices and Invariant Subspaces (zl - A)-t(l- Zto) = L j=O
L (z _J A,.\}+1 Zt)
I<=Z t,,
It was shown in Chapter 6 (see especially Theorem 6.2.3 and Section 6;
that, given an arbitrary matrix A e C"K", the space C" is a direct sum of
and, using Eq, (1) and Exercise 9.5.6, the resolvent has Im(1 - Z10) as its
range. Now as z .... )... elements of (zl - A)-1 will become.unbounded.
But,sincethe limit on the right of the equation exists, we seethat multiplica-
I
generalizedeigenspaces of A. Now,Theorem9.5.4statestbat eaehcompon

I
tion of the resolventby 1.- Z10 has the effect of eliminatingthe singular part
matrix Z"o is a projector onto the generalized eigenspace associated wi: ~;f (zl - A)-1 at z = AI' In other words, the restriction of the resolvent to
eigenvalue 1t , k = 1,2, ... , s, Furthermore, the result of Theorem 9.5.2(1 the subspace .
the resolution of the identity, shows that the complementary proj
s
1 - Z"ocan be expressedin the form
Im(1 - Z10) = L 1mZJO
s }=z
1- ZtO = L ZjO
J"'O,J~"
,as no singularityat z = ).1' ;~
l,
i~
Hi1

r
"

Define the matrix Theorem 1. Let P(u lt U~, , u,) be a scalar pol.vnomial in u u,; . u.
and let /10), /'1.(1), .. ,/,(1) be functtons defined on the spectrumofA e e".
"",,-1 i'
E1 = zlim
... A,
(zl
1
- Ar (l - Z10) = L L
,,;Z j;O
(l
1-
" \.1+1 Z"j' (2) If the function 1(1) = P(fl(A),lz(A), . ,/,(). assumes zero values on the
spectrum of A, then
The next proposition expresses 1 - Z10 = D;z Z"o in terms of E l and will
I(A) = P(fl(A),/z(A), ... , f,(A = O.
be used in Chapter 11.
Theorem 2. In the notationo/the previous paragraphs, PROOF. Let Pl(A), Pz(A), ... , p,(A) be interpolatory polynomials on the
spectrum of A for /l().), /z(A), ... ,f,(A), respectively. Then by definition of
1- ZlO = E l(1 l1 - A). the PI' we have II..A) = PI(A) for i = 1, 2, ... .t. Denote the polynomial
PROOF. The simple statement L! - A = (1 1 - z)1 + (zl - A) implies P(Pl(A), pz(A), ... , p,(A by p(A) and observe that since PI(A) and II..A)
have the same values on the spectrum of A (1 S; i S; t), so do p().) and I(A).
(zl - A)-l(lll - A) == (1 1 - z)(zl - A)-l + 1. Hence our assumption that f(A) is zero on the spectrum of A implies the
Multiply on the left by 1 - Z 10, take the limit as z .... 1 1, and take advantage same for P(A) and therefore f(A) = peA) = O.
of Eq. (2) to obtain the result. For the particular cases mentioned above, we first choose fl(A) =
Exercise 1. Let A e e and a(A) = {lIt Az, ... ,l.}. Define a function
x sin A, Iz(A) = cos l, and P(ul , u z) = u~ + "~ - 1 to prove that, for every
!tel) on the spectrum of A by setting !tel) == 1 in an arbitrarily small neigh- A e e", it is true that sin z A + cos z A = 1. Next, taking fl(A) = e",
borhood of J. and !t().) == 0 in an arbitrarily small neighborhood of each /z(A) = , and P(Ul' uz) = UlUZ - 1, we obtain eAe-A = 1. Thus, e-A
= (eA)-l.
lj E a(A) with lj #: 1". Show that !teA) = Z"o and use Exercise 9.3.4 to show
that Z"o is idempotent. Exercise 1. If A e e" and the functions/l(A) andlz()') are defined on the
Exercise 2. If). = 11 is an eigenvalue of A and 11, ,A. are the distinct spectrum of A, show that the functions F 1(A) =11 (A) + /z(A), F z(A) =
eigenvalues of A, show that, for any z a(A), . fl(A)/z(A) are also defined on the spectrum of A, and F leA) = fleA) + fz(A),
F z(A) = fl(A)/z(A).
"'.-1 'IZ
(zl - Ar 1 = L ( ]. '~{+1 + E(z),
J-O z-
Exercise 2. If p(A.) and q(A.) are relatively prime polynomials, det q(A) #: 0,
and r(A) denotes the rational function p(A)/q(..t), prove that
where E(z) is a matrix that is analytic on a neighborhood of Aand ZuE(z)
= E(Z)Z11 = O. Write E = E(A) and use this representation to obtain' rCA) = p(A)[q(A)]-l = [q(A)rlp(A).
Theorem 2. 0
Exercise 3. Prove that for any A e e",
elA =cosA + i sin A.
(Note that, in general, the cosine and sine functions must be considered as
9.7 Further Properties of Functions of Matrices defined on C.) 0
We can also discuss the possibility of taking pth roots of a matrix with the
In this section we obtain some additional properties of functions aid of Theorem 1. Suppose that,p is a positive integer and let flO,) = All",
matrices, which, in particular, allow us to extend some familiar identities fo! fz(A) ='A, and P(u!> uz)= uf - Uz. We would normally choose a single-
scalar functions to their matrix-valued counterparts. For example, we valued branch of the function All" for fl (A.), but this is not essential. If, how-
whether the identity sin z IX + cos 2 IX = 1 implies that sin 2 A + cos z A I = ever, the matrix A has a zero eigenvalue with index greater than 1, then,
~ince derivatives of fl (A) at the origin are needed, fl (l) would not be defined
and does e"'e-" = 1 imply eAe-A = 11 We answer these questions (and
on the spectrum of A. Thus, if A is nonsingular, or is singular with a zero
whole class of such questions) in the following theorems.

eigenvalue of index one, we define flO.) == A1/p (with a known convention Let us show that {g(Il,,), g'(IlJ, ... , g(m,,-l)(PJ} contains all the values of
concerning the choice of branches at the eigenvalues), and g(p.) on the spectrum of B. This fact is equivalent to saying that the index
"",-1
nIt of the eigenvalue JJk of B does not exceed mIt or that the polynomial
All" == fleA) == L L f'f(A,,)ZkJ' p(p.) == (Il - 1l1)m l(p. - JJ2)m2 (Il - Il,)m" annihilates B (l s k s s). To
I
1:=1 J=O
see this, observe that q(A) == p(l(A == (I(A) - j(Atm, .. (I(A) - f(A.m. 1
Then, by Theorem 1 applied to the functions fleA), f2(A), and p(u l , U2) has zeros AI: with multiplicity at least m. (k :;: 1,2,. ,s) and therefore
defined above, q(A) assigns zero values on the spectrum of A. Thus, by Theorem 1 with
(Al/p)" == A.
t == 1, pCB) == p(f(A == 0 and peA) is an annihilating polynomial for B. \
This discussion should be compared with the definition of the square . Now defining the polynomial r(p.) as satisfying the conditions (2), we obtain
root of a positive definite matrix made in Section 5.4. a polynomial with the same values as g(lJ) on the spectrum of B and the proof
It should benoted that our definition of A 1/2, for example, does not include is completed. Note that r(p.) is not necessarily the interpolatory polynomial
aU possible matrices B for which B2 == A. To illustrate this point, any real for (J(Il) on the spectrum of B of least possible degree.
symmetric orthogonal matrix B has the property that B 2 == I, but B will Exercise 4. Prove that for any A e C"x n,
not necessarily be generated by applying our definitions to find 11/2. A
particular example is the set of matrices defined by sin2A == 2 sin AcosA. 0 il

[c~s(J
1\
B(O) == sin (J 1 ~I
81n 0 -cos (J J 9.8 Sequences and Series of Matrices !
(see Exercise 26.16).
Theorem 2. Let h(λ) = g(f(λ)), where f(λ) is defined on the spectrum of A ∈ C^{n×n} and g(μ) is defined on the spectrum of B = f(A). Then h(λ) is defined on the spectrum of A and h(A) = g(f(A)).

PROOF. Let λ_1, λ_2, ..., λ_s be the distinct eigenvalues of A with indices m_1, m_2, ..., m_s, respectively. Denote f(λ_k) = μ_k for k = 1, 2, ..., s. Since

    h(λ_k) = g(μ_k),
    h′(λ_k) = g′(μ_k) f′(λ_k),
    ⋮
    h^{(m_k−1)}(λ_k) = g^{(m_k−1)}(μ_k)[f′(λ_k)]^{m_k−1} + ⋯ + g′(μ_k) f^{(m_k−1)}(λ_k),

In this section we show how other concepts of analysis can be applied to matrices and particularly to functions of matrices as we have defined them. We first consider sets of matrices defined on the nonnegative integers, that is, sequences of matrices.
Let A_1, A_2, ... be a sequence of matrices belonging to C^{m×n} and let a_{ij}^{(p)} be the i, jth element of A_p, p = 1, 2, .... The sequence A_1, A_2, ... is said to converge to the matrix A ∈ C^{m×n} if there exist numbers a_{ij} (the elements of A) such that a_{ij}^{(p)} → a_{ij} as p → ∞ for each pair of subscripts i, j. A sequence that does not converge is said to diverge. Thus the convergence of sequences of matrices is defined as elementwise convergence. The definition also includes the convergence of column and row vectors as special cases.
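By way of illustration (the entries below are chosen arbitrarily), the sequence

    A_p = \begin{bmatrix} 1 & 1/p \\ (-1)^p/p^2 & 2 - 1/p \end{bmatrix},  p = 1, 2, ...,

converges to diag[1, 2], since each of the four scalar sequences of elements converges; on the other hand, the sequence B_p = diag[(-1)^p, 0] diverges, because its (1, 1) elements do not converge.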
it follows that h(..1.) is defined on the spectrum of A and assigns values given
by the formulas in (1). Let r(p.) denote any polynomial that has the satn~ Exercise 1. Let {A"}:'~1> {B,,}:'_l be convergent sequences of matrices
values as (J(P) on the spectrum of B and consider the function r(f(..1. Ifrom cm x" and assume that A, -+ A, B p -+ B as p -+ 00. Prove that as p - 00,
the values of h(l) and r(f(..1. coincide on the spectrum of A, then h(A) ..D
r(f(..1. assigns zero values on the spectrum of A and, by Theorem 1, A, + Bp-A + B;
ApBp -+ AB and, in particular, IX"B, -+ txB for any sequence lX" -+ lX
heAl == r(f(A == reB) == g(B) == g(l(A,
QA,Q-l -+ QAQ-l;
and the result is established.
Now observe that in view of (1) the values of h(A) and r(f(J. coincii diag[A", BpJ -+ diag[A, B].
on the spectrum of A if into These results follow from fundamental properties of sequences of
:omplexnumbers. For part (b), if lX" -+ IX and PP -+ pas p -+ 00, write
\
,.<iI(pJ == g<iI(pJ
for i == 0, 1, . ; , mk - 1, k == 1, 2, ... ~ s. a.,p, - a.p == (IX, - a:)p + (P" - P)a. + (a., - tx)(P, - Pl. 0
We now consider the case in which the members of a sequence of matrices for A, ~t follows that m :S n :S n 2 and therefore (see Exercise 3.10.7) the
are defined as functions of A e c nxn. Thus, let the sequence of functions, system In (1) has a unique solution of the form
fl(A),f2(A), be defined on the spectrum of A, and define AI' A f,(A);
n
p = 1,2, For each term of the sequence At> A 2 , , Theorem 9.5.1 yielcls, )= ~ ." a(pl
f lil(A (4)
1'''
i,,==
oJ fl,
I
.. '

~here the. 1I,(i, r =1,2, ... , n) depend on j and k but not on p. Now if
lim A eXI~s as p -+ 00: then so does lim al:1 and it follows from Eq. (4)
r
. that lim f I' (A,,) also exists as p -+ 00 for each j and k. This completes the
Now the component matrices Z"i depend only on A and not on p, so tha
first part of the theorem.
each term of the sequence is defined .by the m numbers f~n().,,) for k .
1, 2, ... ,s and j = 0, 1, .. , mIt - 1. We might therefore expect that tll . If f~I(A,,) -+ fUl(A,,) as p -+ 00 for each j and k and for some f().), then
convergence of AI' A", .. , can be made to depend on the convergence oftb
Bq, (3) shows that AI' -+ f(A) as p .... 00. Conversely, the existence of lim A
m scalar sequences Pf().,,), f<f(AJ, . . rather than the n2 scalar sequences . as p -: 00 provides the existence of lim Jil(A,,) as p -+ 00 and hence th;
equality (2). Thus, we may choose for f(A) the interpolatory polynomial
the elements of A 1 , A:z
whose values on the spectrum of A are the numbers lim fUl( 1 )
Theorem 1. Let the functions fl,f2"" be defined on the spectrum 0 . . .k = 1, 2, ...', s, J = 0, 1, ... , mIt - 1. The proof is complete.
1'....ee I' A",

AeC"x n and let AI' =fp(A), p = 1,2, .... Then the sequence A I,A 2 , ... ,
!
AOj .. conve"/ies as p .... ~ if and only. the m scalar sequen~s f!f(Ak)
f 2 (A,,), ... ,fI' (AJ, .. , (k - 1,2, ... , s, J - 0, 1, ...', mIt - 1, m - ml + m
We now wish to develop a similar theorem for series of matrices. First
the series of matrices L;'==o AI' is said to converge to the matrix A (th~
+ ... + m.) converge as p .... 00. 'sum of the series) if ~he sequence of partial sums B. = L;== I A p converges
Moreover, if fifO.,,) fUl(A,,) as p .... 00 for each j and k and for so to A as v -+ 00. A senes that does not converge is said to diverge.
function f(A), then AI' f(A) as p -+ 00. Conversely, if lim AI' exists The following theorem is immediately obtained by applying Theorem 1
p -+ 00, then there is a junction f(A) defined on the spectrum of A such t" to a sequence of partial sums.
lim AI' = f(A) as p 00.
.1lleorem 2. 1/ the functions UIo U2, . .. are defined on the spectrum of
PROOF. If f!f().,,) fUl(A,,) as p .... 00 for each j and k and some f( A e C" x n, then L;'== I upeA) converges if and only if the series L;'= I U~'l(A,,)
then Eq. (1) and Exercise 1 imply that converge/or k = 1,2, ... , s and] = 0, 1, ... , m" - 1.
Moreover, if
ee
L ulf(AJ = flJ)(A,,)
1'==1
and reach j and k, then
",,,-I eo

lim AI' = L L fUl(AJZkJ = f(A). L up(A) = f(A),


1'==1
1'''''''' k=1 J-O
vice versa.
Conversely, suppose that lim AI' exists as p .... 00 and denote the elem
of AI' = f,,(A) by a~l, i, r = 1, 2, ... , n, p = 1, 2.... Viewing Eq. (1) ~e ~an now apply this theorem to obtain an important result that helps
set of n2 equations for the m = ml + m2 + ... + m. unknowns ffj'( JUShfy our development of the theory of functions of matrices. The next
where p is fixed, and recalling the linear independence (Theorem 9.5.1 .eorem Shows the generality of the definition made in Section 9.3, which
the component matrices Z"J as members occ"xn, we deduce that the n2 ' ay have seemed rather arbitrary at first reading. Indeed, the properties
coefficient matrix ofthis system has rank m.Since the degree m ofthe mini scribed in the next theorem are often used as a (more restrictive) definition
polynomial of A'does not exceed the degree of the characteristic polyno a function of a matrix.

Theorem 3. Let the function f(λ) have a Taylor series about λ_0 ∈ C:

    f(λ) = \sum_{p=0}^{\infty} α_p (λ - λ_0)^p,            (5)

with radius of convergence r. If A ∈ C^{n×n}, then f(A) is defined and is given by

    f(A) = \sum_{p=0}^{\infty} α_p (A - λ_0 I)^p            (6)

if and only if each of the distinct eigenvalues λ_1, λ_2, ..., λ_s of A satisfies one of the following conditions:

(a) |λ_k − λ_0| < r;
(b) |λ_k − λ_0| = r and the series for f^{(m_k−1)}(λ) (where m_k is the index of λ_k) is convergent at the point λ = λ_k, 1 ≤ k ≤ s.

…are the trigonometric and exponential functions. Thus, from the corresponding results for (complex) scalar functions, we deduce that for all A ∈ C^{n×n},

    \sin A = \sum_{p=0}^{\infty} \frac{(-1)^p}{(2p+1)!} A^{2p+1},    \cos A = \sum_{p=0}^{\infty} \frac{(-1)^p}{(2p)!} A^{2p},

    e^A = \sum_{p=0}^{\infty} \frac{A^p}{p!},    \sinh A = \sum_{p=0}^{\infty} \frac{A^{2p+1}}{(2p+1)!},    \cosh A = \sum_{p=0}^{\infty} \frac{A^{2p}}{(2p)!}.

If the eigenvalues λ_1, λ_2, ..., λ_n of A satisfy the conditions |λ_j| < 1 for j = 1, 2, ..., n, then we deduce from the binomial theorem and the logarithmic series that

    (I - A)^{-1} = \sum_{p=0}^{\infty} A^p,    \log(I + A) = \sum_{p=1}^{\infty} \frac{(-1)^{p-1}}{p} A^p.            (7)

Exercise 2. Prove that if A ∈ C^{n×n}, then e^{A(s+t)} = e^{As}e^{At} for any complex numbers s and t.

Exercise 3. Prove that if A and B commute, then e^{A+B} = e^A e^B, and if e^{(A+B)t} = e^{At}e^{Bt} for all t, then A and B commute.
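The commutativity hypothesis in Exercise 3 cannot be dropped. For instance (an illustrative pair of non-commuting matrices), let A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} and B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}. Then

    e^A = I + A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix},    e^B = I + B = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix},    e^A e^B = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix},

while (A + B)^2 = I gives e^{A+B} = (\cosh 1)I + (\sinh 1)(A + B), which is not equal to e^A e^B.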
Ii

PROOF. First we recall that f(A) exists if and only if fUl()"J exists for
j = 0, 1, ... , mIl - 1 and k = 1, 2, ... ,s. Furthermore, the function f(A) Exercise 4. Show that if
given by Eq. (5) has derivatives of all orders at any point inside the circle
of convergence and these derivatives can be obtained by differentiating
the series for f(A) term by term. Hence f(~) is obviously defined on the
A= [~ ~]. 1
spectrum of A if I~" - ~I < rfor k = 1,2, ... , s. Also, if I~" - ~I = r for then
some k and the series obtained by mIl - 1 differentiations of the series in .
(5) is convergent at the point A = A", then f(A,,), f'(AJ, ... , P"'k-ll(~J t]
et-t' e
'Bt = [10 1l-
1
(e
ellt
llt
- 1)] ' .j
exist and, consequently,f(A) is defined: Note that, starting with the existence
of f(A), the above argument can be reversed.
If we now write u,,(~) = CX,,(A - AD}" for p = 0, 1,2, .. , then Up(A) is certainly
Ct _ [ cos t
e -.
sin t]
.
-smt cost
defined on the spectrum of A and under the hypotheses (a) or (b), the series
I:;,= 1 u~I(A.,.), k = I, 2, ... ,s, j = 0, I, ... , mIl - I, all converge. It now Exercise 5. Show that the sequence A, A Z , A 3 , converges to the zero- "
follows from Theorem 2 that L;'=o up(A) converges and has sum f(A) . matrix if and only if PlI < Lfor every Ae a(A). 0
and vice versa.
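As a concrete illustration of Exercise 5 (the matrices are chosen only as examples), let A = \begin{bmatrix} 1/2 & 1 \\ 0 & 1/2 \end{bmatrix}. Then

    A^p = \begin{bmatrix} (1/2)^p & p(1/2)^{p-1} \\ 0 & (1/2)^p \end{bmatrix} → 0    as p → ∞,

in agreement with σ(A) = {1/2}. In contrast, for B = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} we have B^p = \begin{bmatrix} 1 & p \\ 0 & 1 \end{bmatrix}, which diverges, since the eigenvalue 1 does not satisfy |λ| < 1.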

Note that for functions of the form (5), a function f(A) of A can be defined
by formula (6). Clearly, this definition cannot be applied for functions having 9.9 The Resolvent and the Cauchy Theorem for Matrices
only a finite number of derivatives at the points A" e a(A), as assumed in
Section 9.3.
If the Taylor series for f(A) about the origin converges for all points in the' The theory of (scalar) functions of a complex variable provides a third
complex plane (the radius of convergence is infinite), then f(A) is said to approach to the definition of f(A) that is applicable when f().) is an analytic
be an entire function and the power series representation for f(A) will, function. This approach is also consistent with the general definition of
converge for all A e C""". Among the most important functions of this kind! Section 9.3.

Consider a matrix A(λ) whose elements are functions of a complex variable λ and define the derivative (or integral) of the matrix to be the matrix obtained by differentiating (or integrating) each element of A(λ). For r = 0, 1, 2, ..., we write

    \frac{d^r}{dλ^r} A(λ)    or    A^{(r)}(λ)

for the derivative of A(λ) of order r.

Exercise 1. Prove that

    \frac{d}{dt}(e^{At}) = A e^{At} = e^{At} A.

…λ that do not belong to the spectrum of A. Since the matrix A is fixed in this section, we use the notation R_λ = (λI − A)^{-1} for the resolvent of A.

Exercise 4. Verify that for λ ∉ σ(A),

    \frac{d}{dλ} R_λ = -R_λ^2,

and, more generally,

    \frac{d^j}{dλ^j} R_λ = (-1)^j j! R_λ^{j+1}    for j = 1, 2, ....

Hint. Use Exercise 4.14.4(a).  □
Theorem 1. The resolvent R). of A e en.." is a rational function of A with
Hint. Use the series representation of e". poles at the pointsof the spectrum ofA and Roo = O. Moreover, each A" E u(A)
is a poleof R). of orderm", where mil is the index(lfthe eigenvalue A".
Exerc:;se 2. Prove that (with the variable t omitted from the right-han
expressions) hOOF. Let m(A) ;; I::"-o lXjAi be the minimal polynomial of A. It is easily
seen that
;t (A(t2 = A(1)A + AA(1), meA) - m(p.) _ "';,
1
A,.J
A _ p. - i... ')IiJ ,.,-,
',J"'O
and construct an example to show that, in general,
for some V,} (1 S; i,j S; m - 1) and therefore, substituting A for p, we have
!!.
dt
(A(t2 :;. 2AA(l)
.
for A u(A)

Exercise 3. Prove that, when p is a positive integer and the matrices exist,
(m(A)! _ m(AR). ;; "'r: ("'~
i-O
\JA')AJ.
i-O

Since m(A) = 0, it follows that, for the values of A for which m(..1.) :F 0,
-d (A(t,,;; L" AJ-1A(1)AP-i, ....,..-

dt J-1 1 "'-1
R). = (') L m;(it)Ai, (1)
:t(A(t-"= _A-"d~" A-". 0
mIL }=o
Where m;(A) (0 S; j :s; m - 1) is a polynomial of degree not exceeding m - 1.
The notation The required results now follow from the representation (1). .
I A(A) dA
Thus, using the terminology of complex analysis, the spectrum of a
matrix can be described in terms of its resolvent. This suggests the application
of results from complex analysis to the theory of matrices and vice versa.
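For example (a single Jordan block, taken here only for illustration), if A = \begin{bmatrix} a & 1 \\ 0 & a \end{bmatrix}, then

    R_λ = (λI − A)^{-1} = \begin{bmatrix} (λ - a)^{-1} & (λ - a)^{-2} \\ 0 & (λ - a)^{-1} \end{bmatrix},

a rational function of λ with R_∞ = 0 and with a pole of order 2 at λ = a, that is, of order equal to the index of the eigenvalue.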
will be used for the integral of A(A) taken in the positive direction along, In the next theorem we use the classical Cauchy integral theorem to obtain
path L in the complex plane, which, for our purposes, wiUalways be a simpl its generalization for a matrix case, and this turns out to be an important and
piecewise smooth dosed contour, or a finite system of such contours, witho ~seful result in matrix theory.
points of intersection. , First, let us agree to can a function f(it) of the complex variable Aanalytic
Let A e C"" ". The matrix (Al - A) -1 is known as the resolvent of on an open set of points D of the complex plane iff(..1.) is continuously differ-
and it has already played an important part in our analysis. Viewed as ~~tiable at every point of D. Or, what is equivalent,f(A) is analytic if it has a
function of a complex variable A., the resolvent is defined only at those poin Convergent Taylor series expansion about each point of D. Iff(..1.) is analytic

within an open domain D bounded by a contour L and continuous on the The functions (1 - 1,)'(1- ~) - i-I with k:p p or with k == p and
closure D, then the Cauchy theorem asserts that j S r - 1 are analytic in the domain bounded by Lp ..Therefore the integrals
Ul _ E. r /(10) of these functions around L p are equal to zero. Now observe that for the
remaining cases we have. in view of Eq. (2) for f(l) == 1,
f (10 ) - 2ni JL (A -1oY+ 1 dl
10 eD and) == 0, 1, .... ~ ~==r,
for any
Theorem 2. If A e c has distinct eigenvalues 11 , ,1., the path L is a
nxn

Hence Eq. (5) l.mplies


iL,.
(A_l,Y-J-ld).=={2ni
. 0 If J > r.
closed contour having 1 1, ,1, in its interior, and f().) is continuous in and
on L and analytic within L, then
1 1
f(A) = -2. f /(l)(U - A)-l dl == 2 . f f().)R). dl.
i,.
(1 - l,,)'RAdl = rlZ p,(2ni),

nih . nih which is equivalent to (4).


PROOF. Use the representation of Exercise 9.5.2 for the resolvent and
multiply both sides by f().)(21ti)-1. Applying formula (2) with 10 == At: We remark 'that, for those familiar with Laurent expansions, Eq. (4)
(1 S k S s) and integrating along L, we o b t a i n . ' expresses Z"i as a coefficient of the Laurent expansion for the resolvent

-2'
1
n.
i f(l)(lI- A)-l dA.

=IrLl m,,-l
L IUl(~)Z"j'
R A about the point It.
. . Using the results of Theorems 9.5.2 and 9.5.4, and that of Theorem 3
L eJ=O tor j = 0, the following useful result is obtained (see also Theorem 9.6.1).
and the result follows from the spectral resolution of Theorem 9.5.1. ifheorem4. With the previous notation, the integral
Note that the relation (3) is an exact analog of (2) in the case} == O
1
(1-10)- 1 is simply replaced by (ll - A)-I. It should also be remark P, = ZkO = 2 . f RA. d)', 1 S k S S.
that (3) is frequently used as a definition of I(A) for analytic functions leA.', nI JL"

Corollary 1. With the notation of Theorem 2,

    I = \frac{1}{2πi} \oint_L R_λ \, dλ,        A = \frac{1}{2πi} \oint_L λ R_λ \, dλ.

The component matrices Z_{kj} of A can also be described in terms of the resolvent of A.
…called the Riesz projector, is a projector. The system {P_k}_{k=1}^{s} is an orthogonal system of projectors in the sense that P_k P_j = 0 for k ≠ j, giving a spectral resolution of the identity matrix (with respect to A):

    \sum_{k=1}^{s} P_k = I;

here m_k is the index of the eigenvalue λ_k of A.
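A small example may clarify these statements (the matrix is chosen arbitrarily). For A = \begin{bmatrix} 2 & 1 \\ 0 & 3 \end{bmatrix},

    R_λ = \frac{1}{(λ-2)(λ-3)} \begin{bmatrix} λ-3 & 1 \\ 0 & λ-2 \end{bmatrix},

and integrating around small contours about λ_1 = 2 and λ_2 = 3 (equivalently, taking residues) gives

    P_1 = \begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix},    P_2 = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}.

One checks directly that P_1^2 = P_1, P_2^2 = P_2, P_1 P_2 = P_2 P_1 = 0, P_1 + P_2 = I, and (1/2πi)\oint_L λ R_λ \, dλ = 2P_1 + 3P_2 = A, as Corollary 1 asserts.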
Theorem 3. Preserving the previous notation, let L k be a simple piecew
smooth closed contour that contains At e a(A) in its interior, and assume th :tercise S. Let A E cn x n and L o be a simple piecewise smooth closed
no otherpoints ofthe spectrum ofA are inside or on Lt. Then . ntour, Show that
f RAdA = ZkO
Z"j == jl~1I:i 1,,(1- ~YRA.dA JLo rr=t

. Lo contains in its interior only the points 11, A;Z' , A, from the spectrum
for j = 0, 1, ... , mIl - 1; k = 1, 2, ... s. A. and that the integral is the zero-matrix if there are no points of a(A)
PROOF. Multiplying both sides of the equation in Exercise 9;5.2 by (1 - ideLo.
(1 s p s s, 0 S r S m, - 1) and integrating around LII , we obtain

L L i Prove that, with the notation of Theorems 3 and 4,

i (1 - A,,)'RA dl =
L,. tel }=O
"",-1
jlZtJ
L,.
(A-A.y
(1 A.
-
)t+1 o:
I:
lkZko + Zu == AZkO == 2
nI
1.
f
JL "
ARA dl. 0

Exercise7showsthat, when Ak is aneigenvalue of A ofindex 1,the projecto ~ore generally, we look for a solution of Eq. (I) in the form
Z"o can easily be described in terms of the reduced adjoint C(A) of A and the
minimal polynomial m(A) of A. x(t) = xo(t)eAo' ,
Ao E C, (3)

Exercise 7. Show that, if At is an eigenvalue of A of index I, then where xo(t) is a vector of order n whose coordinates are polynomials in t.
For convenience we write xo(t) in the form
1
Z"o = m(I)(A,,) C(A,,). t"-l t
xo(t) = (k _ 1)1 Xl + ... + 11 X"-l + x", (4)
SoLUTION. Use Theorem 3 to write
where XiE en, i = 1,2, ... , k, and Xl is assumed to be nonzero.
Z,,~ = ~ f. (ll -
2m 4
A)-l dA.
It is easily seen that x(t) in the form (3) is a solution of (1) if and only if
xo(t) = (A - AD l)xo(t),
Then use Exercise 7.9.1 to obtain
here xo(t) is given byEq. (4). By differentiation, we then have
1 f 1
Z"o = 2'1ti JL/c meA) C(A) dA. xgl(t) =' (A - Aol)iXO(t), i = 1,2, ... , (5)

Since the index of A" is 1, we have m(A) = (A - AJ)ml(A), where ml(A~ J


d since xo(t) is a vector-valued polynomial of degree k - 1,it follows that
m(1)(AJ) ". O. Substitute this into the last displayed equation and use ::;:: (A - Ao l)"xo(t).
theorem of residues to obtain the result. Now we find from Eq. (4) that x~-1)(t) = x" and comparing this with
q. (5), Xl = (A - Aol)"-lxo(t). Multiplying on the left by (A - Aol) we
Exercise B. Let J be a matrix in Jordan normal form. Use Theorem 4 .
btain (A - Aol)xl = 0, and since Xl '# 0 by hypothesis, Xl is an eigenvector
show that the projectors Z"o for J have the form diag[du , du , ... , d .
f A corresponding to the eigenvalue Ao. .
where d"i is equal to 0 or 1 for 1 S k S sand! SiS n. 0
Now compare the (k - 2)nd derivative of .ro(t) calculated from Eqs, (4)
d (5) to get
tXl + Xz = (A - AoI)"-zxo(t).

ultiply on the left by (A - AoI), and use the results of the preceding para-
9.10' Applications to Differential Equations aph to get (A - Aol)xz = Xl' Thus, (see Eqs. (6.3.3 Xl' X2 form a Jordan
ain of length 2.
Proceeding in the same way, it is found that Xl' X2"'" x" form a Jordan
Consider again the systems of differential equations with constant c ain for A associated with Ao. A similar argument shows that if X l' s:2' .. ,X"
efficientsintroduced in Section 4.13. We wish to find functions oftime, x crate a Jordan chain for A corresponding to the eigenvalue Ao , then Eq. (3)
with values in en satisfying the homogeneous equation ng with Eq. (4) gives a solution of Eq, (1). Thus, we have proved the fol-
ing,
x(t) = Ax(t)
position 1. The junction x(t) defined by Bqs. (3) and (4) is a solution oj
or the nonhomogeneous equation
q. (1) if and only if Xl' X2' ,
X" constitute a Jordan chainjor A associated
.t(t) = Ax(t) + f(t), ith the eigenvalue Ao of A.
where A e C""n andf(t) is a prescribed function oftime. We willnow anal Now recall that, by Theorem 7.10.1, the dimension of the solution space
these problems with no hypotheses on A E en"n. .' of Eq. (1) is n and by the Jordan theorem, there is a basis in en consisting
It was seen in Section 4.13that primitive solutions ofthe form Xo eAo' , wh irelyofeigenvectors and generalizedeigenvectorsof A. Hencethe following
Ao is an eigenvalue and Xo an associated eigenvector, play an important p suit is clear.

Theorem 1. Let A e CIIXII and let ll' 12" " , l. be the distinct eigenvalues 0/ PROOF. By Theorem 1 any solution x(t) of (1) ii a linearcombin~dion of the
A of algebraic multiplicity Olio 012' 01 respectively.ljthe Jordan chainsjor n expressions xp(t)(p = 1. 2... , n) of the form (6);
Ai (1 ~ k s s) are n
X lll) XIII:) ... , x~~kl'
x(t) = L fJpxp(t).
11' 12' p=1
Xli)
21'
Xl")
22' X~!k" Observe that xp(O), p = 1. 2... , n, givesthe vectors of the Jordan basis of A
in C",and therefore the matrix B = [Xl(O) xi(O) ... xll(O)] is nonsingu-
XIII:) X(i) x(")
'k.l ,,,,2 rk."k.rl.; lar. Thus, the system
where D~ 1 Vii = 0111:, thenany solutionx(t) 0/(1) is a linear combination o/the
n vector-valued funaions L fJ"x,,(O) = Xo
,,=1

~~ 11:) t
II) ] or in matrix form Bp = Xo with P= [fll P2 ... fJlI]T, has a unique
[ U - I)! X~I + ... + Ii xb-I + x~J1) A,.t
e , solution for every Xo E C".
where j = 1, 2, ... , Vilt i = 1,2, ... r/l' k = 1, 2, ... , s, Exereise 2. Find the solution of the system considered in Exercise 1 that
satisfies the condition x(O) = [0 1 I]T.
In other words,the expressions in (6) form a basis in the solution space 9' of
(1) and therefore are referred to as thejUndamental solutions of (1). Note also. SOLUTION. It is easily seen that by putting t = 0 in Eq, (7) we obtain a
that AI: has geometricmultiplicity rl: and x\"l, x~l, ... , X~~I span Ker(A - ll:l). linear system
Exercise 1. Find the general form of the solutions (the general solution) of
the.system

XI] [6 2 2]["1] = Ax(t).


f(t) =~2'
["3 = - 02 02 02 "2
"3
withan invertible coefficient matrix. The vector [fJ 1 fJ2
[1 -1 2]T and therefore the required solution is
fJ3]T is found to be

SoLUTION. It wasfound in Exercise 6.5.2 that the matrix A has an eigenvalue


Al = 2 with an associated eigenvector [0 -1 I]T and an eigenvalue).2 = 4'
associated with the eigenvector [2 - 2 O]T and generalised eigenvector.
[1 0 O]T. Hence the general solution, according to (6), is givenby
Now we show how the solution of (8)can be found by use of the notion of a
function of A. Start by using the idea in the beginning of the section but
applying it directly to the matrix differential equation i =Ax. We look for
solutions in the form eAt% with % e CII. Exercise 9.9.1 clearly shows that the
vector-valuedfunction x(t) = eAt% is a solution of (1) for any % e C":
It is frequently the case that a particular solution of (1) is required that is
specified by additional conditions. The initiql-value problems are of thi. f(t) = ~ eA'% = AeA'% = Ax(t).
kind, in which a vector Xo e cn is given and a solution x(t) of(1)is to be foundl dt
where x(t) takes the value Xo at t = O. Obviously x(t) = eAt% satisfies the initial condition x(O) = Xo if and only if
Theorem 2. Let A ∈ C^{n×n}. There is a unique x(t) such that    z = x_0. Thus, using Theorem 2, we find the following.

i(t) = Ax(t) and x(O) = ;to Theorem3. The unique solution ofthe problem (8) canbe written in theform
for any given vector Xo E CII. x(t) = eAtxo' (9)
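For instance (an illustrative two-dimensional problem), take A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, for which e^{At} = \begin{bmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{bmatrix} (see Exercise 4 of Section 9.8). Formula (9) then gives, for x_0 = [1  0]^T,

    x(t) = e^{At} x_0 = \begin{bmatrix} \cos t \\ -\sin t \end{bmatrix},

and one verifies directly that ẋ(t) = Ax(t) and x(0) = x_0.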

Let x(t) be a solution of (1). Computing %(0), we see from Theorem 3 that where Xo = e.tIOI(to) = %(to)' This suggests that
x(t) = eAtx(O). Hence we have a further refinement.
Theorem 4. Any solution x(t) of(/) canberepresented intheformx(t) = eAr% x(t) = eA.t(e-A.1oxo + 1: e-A.'f(-r) dr:)

+ i'
for some% e en.
= eA(I-tolxo eA.(I-'!f(-r) dr:
It is a remarkable and noteworthy faet that all the complications of
10
Theorem 1 can be streamlined into the simple statement of Theorem 4. This
demonstrates the power of the "functional calculus" developed in this is the solution of x= Ax +f, where x(t o) = Xo' This is easily verified.
chapter. Theorem 5. If f(t) is a piecewise-continuous function of t with values in
Another representation for solutions of (1) is derived on puttingf(A) = eA' en, to e R, and A e en x n, then every solution ofthe differential equation
in Theorem 9.5.1 and using the spectral resolution for eAt. Hence, denoting
x(O) = xo, we use Theorem 3 to derive x(t) = A%(t) + f(t)
m,,-l
hastheform
x(t) = L L Z"ieAklxo = L (1"0 + lut + ... + 1".~_l~-l)eA"',
"=1 J=o k=l
where %"J = Z"JXO and is independent of t. Thus, we seeagain that the solution
x(t) = efl/-tolxo + i'
10
eA.II-'!f(-r) dr:, (10)
is a combination of polynomials in t with exponential functions. If A is
a simple matrix, the solution reduces to and Xo = x(t o)'

x(t) = L ZkOe.. t
'
Exercise 4. Find the solution of ;i = Ax
are as defined in Exercises 1 and 2 and
+f, with x(O) = xo, if A and Xo I
"=1
1
[!] ="W'"
and we see that each element of x(t) is a linear combination of exponential"
functions with no polynomial contributions (see Section 4.13).
1=
Exercise 3. Bymeans of component matrices, find the solution of the initial~
1
value problem discussed in Exercise 2. 0 SoLUTION. The vector eAtxo is known from Exercise 1. Using Eq. (9.5.1)
Finally, consider the inhomogeneous Eq. (2) in which the elements or to find the particular integral, we have
fare prescribed piecewise-continuous functionsfl(t), ... ,In(t) of t that are
not all identically zero. Let 9' be the space of all solutions of the associated
homogeneous problem x = Ax and let !I" be the set of all solutions of
,(t) = f.teAII-Jftr:) dr: =
o
r m
k=l 1=0
1 Z"J f'(t -
0
-r:y-leA~t-tlf(-r:) d-r:,
;i = Ax + f. If we are given just one solution, e!l", then (compare wit The integrals
Theorem 3.10.2) z e!l" if and only if z = , + w for some HI e 9'. The v .
cation of this result is left as an exercise. Thus. to obtain :T we need to kno
just one solution of the inhomogeneous equation, known as a particut ,
integral, and add to it all the solutions of the homogeneous equation, eacll. are easily found to be ate", !at 1e4 " and !ae1t(e2t - 1), respectively. Hence
of which we express in the.form eftx for some x e C.n
We look for a solution of .t = Ax +f in the form yet) = ate4rZ10el + !at2e4tZlle1 + tae 2/(e2t - I)Z20e l
= efIZ(t).
-tmeWH~] + [-m-[~~~~l
x(t)
If there is such a solution, then .t = Ax + eAtl and so eA.t: = f, hence t

z(t) = e-A.10xo + {e-A.'f('f) d-r, The solution is then given by x(t) = eA,xo + ,(t).

=
Exercise S. Let A be nonsingular and p A -1 b, where bee". Show that Theorem 1. The system (1) is observable if and onlY if Ker Q(e. A} = to}.
if x(t) is a solution of f(t) = Ax(t) - b, then that is, if and only if the pair (C, A) is observable.
x(t) - p= e4t(x(O) - Pl. Before proving the theorem we establish a technical lemma that will
(Here Pis the steady-state solution of the differential equation.) simplify subsequent arguments involving limiting processes. An entire
function is one that is analytic in the whole complex plane; the only possible
Exercise 6. Let A e C"II" and let a sequence of vectors 10.11> ... in CIt be singularity is at infinity.
given. Show that any solution of the difference equation
x,+ 1 = Ax, + I" r = 0, 1, 2, ,
Lemma 1. For any matrix A e c"xn there exist entire functions "'1(t).
: .. , "',,(t) such that
can be expressed in the formxo = % and, for j = 1, 2, ,
i-l eAt = I:" '"~t)Ai~.l. (2)
Xi+l = Ai% + I: Ai- k- l/k, i=l
for some % e Cn 0
k=O
PROOF. Let I(A) =
eAt; then I is an entire function of A that is defined on
the spectrum. of any n x n matrix A. Since IUJ(A) tJeAt, it follows from
=
Theorem 9.5.1 that
Iftk-l

9.11 Observable and Controllable Systems I(A) = eAt = L L tie),ktZkJ' (3)


k-l i=O

In this section we bring together several important results developed in Since Zki = tp"JCA) for each k, j and tpki is a polynomial whose degree does
this and earlier chapters. This will allow us to present some interesting not exceed n [see the discussion preceding Eqs, (9.2.6)and (9.2.7)],and because
ideas and results from the theory of systems and control. the tp"J depend only on A, the right-hand term in Eq. (3) can be rearranged
Let A e cn x" and C E C"x" and consider the system of equations as in Eq. (2). Furthermore, each coefficient '" JCt) is just 'a linear combination
of functions of the form tie),kt and is therefore entire.
t(t) = Ax(t). x(O) = xo,
yet) = Cx(t). PRooFOF1'HEoREM I. IfKerQ(C,A):F {O} then there is a nonzero xo such
=
that CAJxo 0 for j =
0, 1, ... , n - 1. It follows from Lemma 1 that
Thus x(t) has values in C" determined by the initial-value problem (supposing CeAtxo = 0 for any real t. But then Theorem 9.10.3 and Eqs. (1) yield
Xu E Cn is given) and is called the state vector. Then yet) e C" for each t ~ 0
and is the observed or output vector. It represents the information about the yet) = CeAtxo == 0
state of the system that can be recorded after "filtering" through the matrix
C. The system (1) is said to be observable if yet) == 0 occurs only when '.and the system is not observable. Thus, if the system is observable, the pair
X o = O. Thus, for the system to be observable all nonzero initial vectors (C, A) must be observable.
must give rise to a solution which can be observed, in the sense that the output Conversely, if the system is not observable, there is an Xo :F 0 such that
vector is not identically zero. yet) = Ce4txo == O. Differentiating repeatedly with respect to t, it is found
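For example (a two-dimensional illustration), let A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, so that e^{At} = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}. With C = [1  0], the output is y(t) = x_1(0) + t x_2(0), which vanishes identically only if x_0 = 0, and the system is observable. With C = [0  1], however, y(t) = x_2(0), so every nonzero x_0 = [α  0]^T produces y(t) ≡ 0 and the system is not observable; correspondingly, Q(C, A) = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} has a nontrivial kernel.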
The connection with observable pairs, as introduced in Section 5.13, is that
established by the first theorem. For convenience, write Q(C, A)e"t xo == 0

=[ f
and Ker Q(e, A) :F {O}. Consequently, observability ofthe pair (C, A) must
Q = Q(C, A) J. imply observability of the system.

CA n- l It has been seen in Section 5.13 that there is a notion of "controllability"


a matrix of size nr x n. of matrix pairs that is dual to that of observability. We now develop this
,~,

notion in the context of systems theoty. Let A e CIIKlland Be CII Km and Lemma 2. The matrix-valuedfunction wet) defined by
consider a differential equation with initial condition .
.i(t) = Ax(t) + Bu(t), .11(0) = x o' (4) W(t) = f~eABB*eA' ds (7)

Consistency demands that x(t) e CII and u(t) e C'" for each t. We adopt the has the property that,for any t > 0, Im Wet) = f(JA,B'
point of view that u(t) is a vector function, called the control, that can be
PROOF. Clearly, W(t) is Hermitian for each t > 0 and hence [see Eq.
used to guide, or control, the solution x(t) of the resulting initial-value' .
(5.8.5)]
problem to meet some objective. The objective that we consider is that of
controlling the solution x(t) in such a way that after some time t, the function 1m Wet) = (Ker W(t.L.
x(t) takes some preassigned vector value in CII.
Also (see Eq. 5.13.5), if Jt'". .,B denotes the unobservable subspace of (B"', A"')
To make these ideas precise we first define a class of admissible control
then
functions u(t). We define "'", to be the linear space of all Cm-valuedfunctions
that are defined and piecewise continuous on the domain t ~ O. and we f(JA,B = (Jt'"A,B)J..
assume that u(t)e"'",. Then observe that, usingEq. (9.10.10), when u(t)e"'",
Thus, the conclusion of the lemma is equivalent to the statement Ker Wet)
is chosen, the solution of our initial-value problem can be written explicitly
in the form
=Jt'"A,B and we shall prove it in this form. .
If x e Ker W(t) then
x(t; Xo, u) = eA'xo + f~e<'-)ABU(S) ds. x*W(t)x = S)B*eA'x U2'd S = 0;
The purpose of the complicated notation on the left is to emphasize the . here the Euclidean vector norm is used. Hence, for all s e [0, t],
dependence of the solution x on the control function a and the initial value
xo, as well as on the independent variable t. B*eAx = O.
A vector y e CII is said to be reachable from the initial vector Xo if there is Differentiate this. relation with respect to s repeatedly and take the limit as
a aCt) e "'", such that x(t; x o, u) = y for some positive time t. Under the s -+ 0+ to obtain .
same condition we may say that y is reachable from Xo in time t.
The system ofEq. (4) is said to be controllable (in contrast to controllability B*(A*)I-lx = 0 for j = 1,2, ... , n.
of a matrix pair) if every vector in en
is reachable from every other vector i~
en. Our next major objective is to show that the system (4) is controllable if
n Ker(B*(A*y-l) =
II

and only if the pair (A, B) is controllable, and this will follow easily from the' xe Jt'"A,B'
next result. J=1

Recall first that the controllable subspace of (A, B), which we will denote Thus Ker Wet) c Jt"A.B' It is easy to verify that, reversing the argument
f(JA.B' is given by and using Lemma I, the opposite inclusion is obtained. So Ker W(t) = Jt"A.B
n-l and the lemma is proved.
f(JA.B = L Im(A'B). This lemma yields another criterion for controllability of the pair (A, B).
,=0
(See Corollary l(d) of Theorem 5.13.1.) Corollary 1. The pair (A, B) is controllable if and only if the matrix
W(t) ofEq. (7) is positive definite for all t > O.
Theorem 2. The vector % e en is reachable from y e cn in time t if and only if
PROOF. It is clear from definition (7) that Wet) is at least positive semi-
%- eA'yef(JA.B' definite. However, for (A, B) to be controllable means that f(JA.B = en,
The proof will be broken down with the help of another lemma, whic and so Lemma 2 implies that 1m Wet) = en. Hence Wet) has rank n and is
provides a useful description of the controllable subspace of (A, B). . therefore positive definite.
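For example (an illustrative pair), let A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} and B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. Then Im B + Im(AB) = span{[0  1]^T, [1  0]^T} = C^2, so the pair (A, B) is controllable and, by Corollary 1, W(t) is positive definite for every t > 0. If instead B = [1  0]^T, then AB = 0, the controllable subspace is span{[1  0]^T} ≠ C^2, and the pair is not controllable.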

PROOF OF THEOREM 2. If % is reachable from , in time t, then, using Using definition (7) of W(t), this equation reduces to
Eq. (5), x(t; y, II) = eAty + % - eAty = %,
% - eA'y = f~el'-')AB"(S) ds as required.
Bxercise 1. Show that if z is reachable from y and y E flA. B' then % e flA. B'
for some" e t.p[m' Use Lemma 1 to write Bxercise 2. The system dual to that of Eqs. (4) is defined to be a system of
%- e"ty = rAJ- B i''''J..t - s)lI(s)ds,
J""1
1

0
the type in Eqs. (1), given in terms of A and,8 by
y(t) == -A*y(t),
and, using definition (6), it is seen immediately that % - e'" 'Y e ~A, B' %(t) = B*y(t).
For the converse, let % - e"'ye~A.B' We are to construct an input, or Show that the system (4) is controllable if and only if its dual system is
control, "(t) that will "steer" the system from y to %. Use Lemma 2 to write observable.
Exerdse 3. Let flAB have dimension k. Use the ideas of Section 4.8 to form
%- e"ty = W(t)w = {eA'BB'*eA'W ds
a matrix P such that, if A = p- 1 AP and B = p- 1B, then
for some WE cn. Put s = t - T to obtain and
%- e"'y = {eAII-.IBB*e"I'-.lw dT = f~e"ll-')B,,(t) dt, where AI is k x k, Bt is k x m, and the pair (AI' B t ) is controllable.
where we define 11(1) = B*eA"'(I-.lw e t.p[m' Comparing this with Eq. (5), Hint. Recallthat~A.BisAinvariantandlm B c: fl",B' Usethetransforma-
we see that % is reachable from y in time t. tion of variables x = p% to transform x(t) = Ax(t) + B(t) to the form
%(t) = Az(t) + B,,(t) and show that if
Coronary 1. The pair (A, B) is controllable if and only if the system = [%I(t)]
%()
.i = Ax + BII, x(O) = X 0, is controllable. t () ,
%2 I

hOOF. If the pair (A, B) is controllable then ~A.B = cn. Thus, for any whercz 1 (t) takes values in C, then %1(t) = A 1%1(t) + B 1,,(t) is a controllable
y, % E Cn, it follows trivially that z - eA , y E fl A. B' Then Theorem 2 system.
that the system is controllable.
Bxerdse 4. Establish the following analogue of Lemma 1. If A E Cn " n,
Conversely, if the system is controllable then, in particular, every vector
then there are functions qJl(l), .. , qJn(l) analytic in any neighborhood not
% E fl" is reachable from the zero vector and, by Theorem 2, flA. B == e. containing points of a(A) such that
Thus, the pair (A, B) is controllable.
n
When the system is controllable, it turns out that any % e n is reachable c 0.1- A)-1 = L tpJ..,t)AJ-l. 0
c
from any 'Y e n at any time t > O. To see this, we construct an explicit control J"'1
function ,,(s) that will take the solution from y to % in any prescribed time t.
Thus, in the controllable case, W(t) of Lemma 2 is positive definite and we
9.12 Miscellaneous Exercises
may define a function '
..(s) == B*e"ol' - ' IW (t)- l(z - e"',).
1, If f(l) is defined on the spectrum of A E cn lC II, prove that f(AT) =
Then Eq. (5) gives [f(A)]T.
Hint. Observe that the interpolatory polynomials for A and AT
x(t; y,,,) == e"'y + ({eAI1-.IBB*e"I'-.) ds)W(t)-I(% _ eA1y). coincide.
2. If A
n
show that A ==
eC"X , l
D=l
(A~ZkO + 2AkZ U + 2Zk2) by use of'. and hence that if all eigenvalues of A are in the half-plane given by
Theorems 9.5.1 and 9.5.2 and Exercise 9.5.1(b). {A E C: !JI,e l < tft,e Ao}, then
3. Prove that the series I + A + A l + ... converges if and only if all
the eigenvalues of A have modulus less than 1. (lAo - Ar 1 = Looe-(UO-Alr dt,

Hint. See Exercise 9.8.5.
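For instance (an illustrative case of Exercise 3), if A = \begin{bmatrix} 0 & 1/2 \\ 0 & 0 \end{bmatrix}, then both eigenvalues are zero, A^2 = 0, and

    I + A + A^2 + ⋯ = I + A = \begin{bmatrix} 1 & 1/2 \\ 0 & 1 \end{bmatrix} = (I − A)^{-1},

whereas for A = I the eigenvalue 1 has modulus 1 and the series I + I + I + ⋯ clearly diverges.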


Hint. Apply the result of Exercise 9.5.2.
4. Show that a unitary matrix U can be expressed in the form U == el1l,l
where H is Hermitian. Observe the analogy between this and the repre- 10. Prove that for any A e C"lC", .the component matrices ZkJ U= 0, I,
sentation u == e'" (cx real) of any point on the unit circle. ... , mk - I, k == I, 2,... , s) sattsfy

Hint. Show that U == V diag[e"'l, e"'z, ... , e""']V for some unitaryV~
and real scalars CXt, /Xz, . , tXn and define H == V diag[cxl' CXl' , cxJV ;,i ZkjZkr
o t)Zk.J+tJ
= \ +i 0 S t. t S mk - I,
5. If A e C"X", prove that there is a real skew-symmetric matrix S such that, with the convention that Zkl == 0 if 1 ~ mk'
A = e^S if and only if A is a real orthogonal matrix with det A = 1.
Hint. Use Eq. (9.5.9).
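For example (an illustrative case of Exercise 5), let S = θJ with J = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, a real skew-symmetric matrix. Since J^2 = −I,

    e^S = (\cos θ)I + (\sin θ)J = \begin{bmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{bmatrix},

which is real orthogonal with determinant 1, in analogy with the representation e^{iα} of points on the unit circle.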
6. Determine A"(A), where
ll. Let L(A) == 2:1=0 A'L, be an n x n A-matrix over C,let A e CftlC", and let
= [1 - A+ A 1- A]
l
A(A) , L(A) == 2:1=0 L,A'. Prove that .
A - Jlz A'
B m,,-1
for all integers n and all A by use ofEq. (9.5.1). (a) L(A) == 2: 2: L(J)(Ak)ZkJ'
k=1 j=O
7. If the elements of the square matrix D(Jl) are differentiable and A(A) =
det D(A), show that where ZkJ (1 S k S s, 0 SiS mk - 1) are the component matrices for
A '
1.
and if A(A) oF 0, (b) L(A) = 2
n,)L
r L(A)R,t dA,
where R,t = (AI - A) - 1, A u(A), and L is a contour like that described
in Theorem 9.9.2.
8. Suppose that A is a real variable. Let D(A),. defined in Exercise 7, be. Hint. For part (a) first find A' (i = 0, I, '" ,1) by Eq. (9.5.1); for
solution of the matrix differential equation X(l) = X(A)A(A), whe part (b) use Eq, (9.8.4) and part (a).
A(A) exists for all Jl. Prove that if A(A) == det D(A), and if there is a ,1.0 f
which A(Ao) :F 0, then
12. With the notation of Exercise 11, prove that

A(A) = A(Ao) exp L: tr(A(,.,. d~ L(A)Zkr ==


m,,-l
2: LlJ)(Ak)
(i + t)
. Zk,J+r
j=O }

and hence A(A) :F 0 for all A. (This relation is variously known as t fOr k == 1,2, ... , s and t == 0, 1, ... , mk - I, where Zk,j+t == 0 for
Jacobi identity, Liouville'sformula, or Abersformula.) j + t 2: mk'
9. If all the eigenvalues of A have negative real parts, prove that Hint. Use Exercises 10 and 11 and Theorem 9.5.2(b).

A-I = - L""e" dt, 3. Prove that Ker Z"o c Im(A - ~l) and that mk == 1 implies Ker ZkO
= Im(A - Akl) for k = I, 2, ... , s.

Hint. Represent CPkO(,1.) - 1 = (A - AJq(A), where CPIIl-A) are defined 21. Let I(A) be a function defined on the spectrum of a Hermitian matrix
in Section 9.2, and substitute A for .t For the second part use Exercise H with the property If(l)1 = 1 whenever l is real. Show that f(B) is
9.5.1(b). unitary.
14. If A, BE Cn x n and X = X(t) is an n x n matrix dependent on t, check 22. (a) Let Ao be an eigenvalue of A e C" X II with algebraic multiplicity Jl.
that a solution of Show that
x = AX + XB, X(O) = C, J.l = lim [(l - Ao) tr(M - A)-I].
.a-+""
is given by X = eAtceB'. Hint. Use Ex. 7 above with D(A) = AI - A.
IS. Consider the matrix differential equations
(b) If A has component matrices Zo, ZiI".' ZIII-l corresponding
(a) X = AX, (b)X = AX + XB, to Ao, show that
in which A,BeCn x lI are skew-Hermitian matrices and X = X(t J.l for j = 0
tr(Zj) ={ .
Show that in each case there is a solution matrix X(t) that is unitary fo o for J = I, 2, ... , m - 1.
each t.
(c) Check the above properties in Exs. 9.5.3 and 9.5.5.
Hint. Use Exercise 14 with C = I for part (b).
16. Let A(t), B(t) E Cn XII for each t E [til t 2 ] and suppose that any solutio
of X + AX + XBX = 0 is nonsingular for each t E [t l , t 2] . Show th
following.
(a) The matrix-valued function Y = X-I satisfies the linear equ
tion
y- YA = B. (1
(b) If A is independent of t, then the solution of Eq. (1) with Y(to
= C- t .
S to S t2 ) is given by
Y(t) = C- 1e4(.- lol + i'
10
B(r)e4('-'l d't'.

17. Find the solution of i = Ax +/, x(O) = xo, x(O) = %0'


t8. Verify that if A is nonsingular, then
x(t) = (cos A 1/2 t)XO + (A 1/2)-I(sin A I / 2t )XO
is the solution of f = - Ax, x(O) = xO' %(0) = .to' How can the res
be extended to include a singular matrix A?
19. If A e C" X " and j(t) = Ay(t) has a nonzero solution y(t) for whi
y(t +'t') = Ay(t) for all t and some fixed 't' > 0, prove that A =
for some J.l E a(A). Hence show that a periodic solution exists only'
is singular.
20. Show that the controllable subspace of the pair (A, B) is the small
A-invariant subspace containing 1m B.
Hint. Note Exercise 9.11.3.

~:l
:.1
\ .i.
;
,

It is helpful to bear in mind the analogy of a norm with the modulus of a


complex number or with the familiar length of a vector in R 3 The axioms
CHAPTER 10 1-3 are immediate generalizations for these ideas and therefore, in particular,
axiom 3 is called the triangle inequality. '
Note first of all that in a unitary space tfI (see Section 3.11) the inner
product generates a norm. Thus, one defines /lxU = (x, X)1/2 for any x e <fI,
Norms and Bounds and the norm axioms above follow readily from the inner product axioms.
It can be shown, however, that not all norm functions are generated by an
for Eigenvalues inner product and, since several such norms are widely useful, much of the
analysis of this chapter is completed without reference to inner products.
The next set of exercises is important. The first points out some immediate
and frequently used consequences ofthe norm axioms. The second introduces
the most important norms on the spaces an and en of n-tuples, The third
exerciseshows, in particular, that norms can always be defined on an abstract
finite-dimensional fineer space. A linear space together with a norm defined
on it is called a normed linear space.
Exercise 1. (a) For any x, y in a normed linear space 9', prove that
In many situations it is useful to be able to assign a single nonnegative.
number to a vector or a matrix as a measure of its magnitude. We already Illxll - lIy/ll S; IIx + yll s; IIxll + IIYII
have such a measure of magnitude for a complex number, namely, its (b) For any x e 9', show that II-x/l = [x], I
modulus; and for a vector x of en, the magnitude can be measured bytlte (c) For any x e V with x 0, show that there is a member of f/ of the
length (x, x)1/2, introduced in Section 3.11. We will take an axiomatic form /XX with norm equal to 1. (For example, take a = IIxll- 1 .)
I I
approach to the formulation of measures of magnitude or norms of elementS
of an arbitrary linear space and begin by setting down the properties we' Exereise2. Let x be a typical vector ofCn(or an),x = [Xl X2 ... xJ.T
demand of such a measure. We shall see that there are many possible choices Show that the functions II II"", /I 111, II 112, II lip defined on Cn (or an) as
for norms. follows are all norms on C" (or R"):
A study of norms of matrices and vectors is the first topic of this chapter,
while different results concerning bounds for eigenvalues (some of which are
(a) IIxll"" = max IXil (the infinity norm);
1 :sj:sn
expressed in terms of a norm of a matrix) are studied in the second part of n
the chapter. (b) /lxlll L Ixlli
= j=1
(c) IIxlb = CtllXjl2)1/2 = (x*x) 1/2 (the euclidean norm);
10.1 The Notion of a Norm
(d) IIx ll" = (tl IXJ!1')liP, p ~1 (the Holder norms).
Consider a linear space 9' over the field of complex or real numbers.!.;'
real-valued function defined on all elements x from 9' is called a norm (on 9");. Hint. For part (d) the Minkowski inequality should be assumed, that is,
usually written /I II, if it satisfies the following axioms: .for any p ~ 1 and any complex n-tuples (x I> x 2, .. , x n) and (.vI> Y2' ... , Yn),
(1) ‖x‖ ≥ 0 for all x ∈ 𝒮 and ‖x‖ = 0 if and only if x = 0;
(2) ‖αx‖ = |α| ‖x‖ for all x ∈ 𝒮 and α ∈ C;
(3) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ 𝒮.
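For example, for the (arbitrarily chosen) vector x = [3  -4  0]^T in C^3, the norms introduced in Exercise 2 give

    ‖x‖_∞ = 4,    ‖x‖_1 = 7,    ‖x‖_2 = (9 + 16)^{1/2} = 5,

and each of these functions satisfies the three axioms above.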

Exercise 3. Let {bl , bz , . ,b n} be a fixed basis for a linear space g, and for The remainder of this section is devoted to two important results con-
any x e 9', write x = D-l fl.j bj cerning norms in general. The first tells us that a norm is necessarily a
"smooth" function of the coordinates of a vector x e g with respect to a
(a) Showthat IIxll.., = maxi SjSn Ifl.jl definesa norm on 9'. fixed basisin 9'. Thus,"small" variations in the coordinates of x giverise to
(b) If p ~ 1,show that Ilxll, = (D=l lfl.jl,)l,,, defines a norm on 9'. 0'
smallvariationsin the sizeof the norm. Observefirst that in a normed linear
The euclidean norm of Exercise 2(c) is often denoted by II liE and is just space Ilx/l depends continuously on x itself-. This follows immediately from
one ofthe Holder norms of part (d).It is clearfrom these exercises that man Exercise l(a), which implies that, in any nonned linear space 9',
different normscan be defined on a givenfinite-dimensional linearspace,and. IlIxll - /lylll S; IIx - 111
a linear spacecombined with one norm produces a different "normed linelU"' . for all x, Y e EI'.
space" from the same linear space combined with a different norm. For:
example, en with II II.., is not the same normed linear space as en with II liz'
Theorem 1. Let {b l> b2., ... ,b,,} be a fixed basis in a linear space 9'. Then
For this reason it is sometimes useful to denote a normed linear space by tiny norm defined on 9' depends continuously on the coordinates ofmembers of
~with respect to b1, ... , bn In otherwords,ifx = D=1 xJbJ'Y = D=l yJbJ
pair, 9', II II.
denote arbitrary members of 9' and II /I is a norm on 9', thengiven S > 0 there
lt is also instructiveto consider the examplesof Exercise 2 as special
i~ a t5 > 0 depending on fl such that
of Exercise 3. This is done by introducing the basis of unit coordinate vecto
into en, or IIln, in the role ofthe general basis {b l ,. , bn} of Exercise 3. IlIxll - 111111 < S
The notation II II 00 usedfor the norm in Exercise 3(a)suggests that it migh~ whenever IXj - Yjl < 0 for j = 1,2, ... , n.
be interpretedas a norm of the type introduced in Exercise 2(d) with p == 00:
PROOF. Let M = maxlSJSn IIbjll. By the first norm axiom, M > 0. so we
This can bejustifiedin a limitingsensein the following way: Ifx == D"l {t)ll
and there are v components al (1 S; i S; n) of absolute value IIxll oo == Ii may choosea 0 satisfying 0 < 0 S; s/Mn.
= maxlSISn lall, then ' Then for any x,JIe9' for which Ix) - Jljl < Oforeachj we have.using the
.~ond and third norm axioms,
'/;-:j.

Ilxll" = (lall" + lfl.zl" + .,. + 1n1")I',, = p(v + sf + + S~_y)I''', n n


"x- JI/I S L Ix) -
j=l
Yjlllbj " S;M L Ix) - JlJI < Mno S 6.
where for some k dependingonj, Sj = la"l/p < 1 (j = 1,2, , n - v). No j= 1

it is easily seen that Uxll oo = P = lim, __ oo IIxll". The next result is a comparison theorem betweendifferent norms defined
i
onthesamespace. Ifanytwo norms, II lit and II liz, on 9' satisfyII X/ll S; /Ix liz
Bxercise 4. If II Udenotes a norm on a linear space 9' and k > 0, showtha 'forallx e 9', wesaythat II liz dominates II lit. Now,wehavenoted in Exercise
kU II is also a norm on 9'. 4 that a positivemultiple of a norm is a norm. The next result says that all
Exercise 5. Let 'ft be a unitary space with inner product ( , ), as defin normsare equivalent in the sense that for any two norms II III and II liz on
in Section 3.11. Show that, IIxll = (x, x)1/Z defines a norm on 'ft. (Thus, th 9', II 111 is dominated by a positive multiple of II liz and, at the same time.
natural definition of a norm on a unitary space makes it a normed line ~: liz is dominated by a positivemultiple of II 111' It should be noted that the
space.) proofdepends on the fact that 9' is finite-dimensional.
:1 First we make aformal definition: two norms II 111 and II liz defined on a
Exercise 6. Show that the euclidean vector norm on en is invariant unde linearspace9' are said to be equivalent if there existpositivenumbers r 1 and
unitary transformations. In other words, if x e en and U e en" nis a unit ril depending only on the choice of norms such that, for any nonzero x e 9', I
matrix, then IIUx/l z = /lx/lz. Formulate an abstract generalization of -. i
result concerning an arbitrary unitary space 'ft. rl S; IIxlll/llxllz S; rz. (1)
rem 2. Anytwonorms onafinite-dimensionallinear space areequivalent.
Exercise 7. If H is a positivedefinite matrix, show that
OOF. Let II III and /I liz be any two norms defined on EI'. We have
    ‖x‖ = (Hx, x)^{1/2} = (x*Hx)^{1/2}

is a norm on C^n. What is the generalization appropriate for an abstract unitary space?  □
…observed that the functions f_1 and f_2 defined on 𝒮 by f_1(x) = ‖x‖_1 and f_2(x) = ‖x‖_2 are continuous. Then it is easily verified that the sets

    C_1 = {x ∈ 𝒮 : ‖x‖_1 = 1},    C_2 = {x ∈ 𝒮 : ‖x‖_2 = 1}

are bounded and closed in 9' (see Appendix 2). It follows that the greatest They correspond to the axioms for a metric, that is, a function defined on
lower bounds pairs of elements of the set f/I that is a mathematically sensible measure of
distance.
'1'1 = inf IIxlb, 'I':z = inf
.....c.
Ilxlll The set of all vectors in a normed linear space whose distance from a fixed
..... Cl
pointxo is r > 0 is known as the sphere with center Xo and radius r. Thus,
are attained on members of C 1 and C 2 , respectively. Hence '1'1 = Ilxoll:z for written explicitly in terms' of the norm, this sphere is
some XoEC l and so ')11> O. Similarly, '1'2 > 0 and for any nonzero xe9',
it follows from (2) and the second norm axiom that {xe9':lIx - xoll = r}.
The set of x e 9' for which II x - .to" ~ r is the ball with centre Xo and radius
'I' ~ II x II - IIxU2 r.The sphere with centre 0 and radius r = 1 is called the unitsphere and plays
1 Oxlll 12 - UXUl an important part in the analysis of normed linear spaces, as it did in the
and proof of Theorem 10.1.2. The unit ball is, of course. {xe9':lIxU ~ I}.
It is instructive to examine the unit sphere inR:z with the norms II 11"", I
x II II xII 1 n 112, and II 111 introduced in Exercise 10.1.2 It is easy to verify that the
')12 s II Ilxlb ~1 = IIxlb portions of the unit spheres in the first quadrant for these norms (described
I
Taking 7l = '1'2 and 72 = l/Yl' inequalities (1) are obtained and the theorem by p = 00,2. 1, respectively) are as sketched in Fig. 10.1.It is also instructive I
is proved. to examine the nature of the unit spheres for Holder norms with intermediate
values of p and to try to visualize corresponding unit spheres in R 3
I,
Exercise B. Verify that equivalence of norms on 9' is an equivalence
relation on the set of all norms on 9'. 0 Exercise 1. Show that the function II 1/ defined on en by
. Theorem 2 can be useful since it can be applied to reduce the study IIxll = IXll + ... + IXn-ll + n- 1Ix
nl,
properties of a general norm to those of a relatively simple norm. The stron where x = [Xl X:z Xn]T, is a norm on en (or Rn). Sketch the unit
candidates for the choice of a "relatively simple" norm are geneI1!.lly spheres in R 2 and R 3 0
U11"",0 111' or most frequently, II liz of Exercises 2 and 3.

10.2 A Vector Norm as a Metric: Convergence

In Section 10.1 the emphasis placed ona norm was its usefulness as
measure of the size of members of a normed linear space. It is also ve
important as a measure of distance. Thus, if f/, 0 II is a normed linear sp
then for any x, y E f/, the quantity Ox - yU is thought of as a measure oft
distance between x and y. The special case of R 3 (or R:z) with the euclid
norm produces the most familiar physical measure of distance.
The following properties are easily verified from the norm axioms fl
any x, y, z in a normed linear space:
(a) 11% - yll ~ 0 with equality if and only if y = x; o x,
(b) IIx - YII= IIY - xII;
Fig. 10.1 Unit spheres in R. for p-norms.
(c) llx - YO ~ 11:1' - s] + liz - Yll

Norms can be defined on a linear space in a great variety of ways and give However,this process is cumbersome and appears to depend on the choice
rise to unit balls with varied geometries. A property that all unit balls have in of basis for .9' (although, in fact, it does not). Furthermore it turns out to
common, however, is that of c~nvexity. (See Appendix 2 for a general dis- be equivalent to a notion of convergence defined by a norm and, indeed, any
cussion of convexity.) norm, as we shan soon verify.
A sequence {Xt}k':.1 in anormed linear space f/, II II is said to be convergent
Theorem 1. The unit ball in a normed linear space is a convex set. in norm (or simply convergent) to Xo e.9' if IIx" - xoll -loO as k ... co. We will
write, informally, Xt -+ Xo as k -+ co.
PROOF. Let 9', II II be a normed linear space with unit ball B. If x; Ye B,
then for any t e [0, 1], the vector % = tx + (l - t)y is a member of the line ,Exercise 3. Show that, if Xt -+ XO, Yle -+ Yo,and ale -+ oeo as k -+ co, where exo
segmentjoining x and y. We have only to prove that % e B. But the second and and the ext are scalars, then IIxt - xoll ..... 0 and 0Vt + Yt -+ OCoXo + Yo as
third norm axioms immediately give k -+ co.

11%11 s tllxll + (1 - t)1I111, Exercise 4. Show that if {Xt}:'''''1 converges in one norm to Xo, then it
converges to Xo in any other norm.
and since IIxlI S 1 and lIyll S 1 we get 11%\1 S 1, so that %eB. Hint. Use Theorem 10.1.2. 0
Exerdse 1. Show that any ball in a normed linear space is a convex set. The next theorem showsthat coordinatewise convergenceand convergence
i. .ip. norm for infinite sequences (and hence series) in a finite-dimensional
A primitive open set in a normed linear space is a neighborhood. which iSt space are equivalent. However, the result is not true if the space is not of
simply the interior of a ball. More formally, if 9'. \I \I is a normed linear spa . finite dimension. Indeed, the possibility of different forms of convergence is
the neighborhood with centre Xo and radius r is the set ..a major issue in the study of infinite-dimensional linear spaces.
N(xo, r) = {xe9':lIx - xoll < r}. .Theol'em 2. Let {Xt}:'''''1 be a sequence in a./initedimensionallinear space[/'.
The sequence converges coordinatewise to Xo e f/ if and only if it converges
Neighborhoods are frequently useful in analysis. For example, if f/, II to Xo in (any) norm.
and 91, \I liz are two normed linear spaces and f is a function defined on
with values in 91, then f is continuous at Xo e 9' if given e > 0 there is a 6 > l,'ROOF. Let {b 1,b2 , ... , bn} be a basis in the linear space 9' and write
depending on s, such that x E N(xo, lJ) implies f(x) E N(f(xo), e). Note th .tie = D"", exCJtlbJ for k = 0, 1,2, .... Given coordinatewise convergence to
N(xo, 6) is a neighborhood in f/ defined via the norm II 111 and N(f(xo), sF *0 e f/ we have, for any norm II II defined on f/,
a neighborhood in Of defined using the norm" 112.
A basic concept of analysis isconvergence,and we bave already consider IIxle - xoll = II t (ex}t) - a}ObJ list
~
lex}t) - aT)llIbJlI -+ 0
the convergence of sequences and series in ell and IIln in Section 9.8. For J"" I J= 1
general finite-dimensional linear space 9', the approach adopted there ~s k -+ co,since n and the basis are fixed. Hence, we haveconvergencein norm.
be generalized by first introducing a basis into 9', say {b" b2 , .. , b" ... Conversely,given that Xt -+ Xo in any norm we can, using the equivalence
If {xaJ:'= I is a sequence in 9' then each member has a representation l'>fnorms (as in Exercise 4), replace the norm by II II.., of Exercise 10.1.3(a).
lI'hen

tx~O)I.
k = 1,2, ... ,
IIxl: - xoll.l:I= II t(a}t) - a}0bJ II = max la}t) -
. J=l .., hiJsn
and the sequence is said to converge coordinatewlse to an elem
IIx~ -
. I~

#:rhus, xoll -+ 0 implies IIx" - xolI.., -+ 0 and hence


Xo = D-, cx}O)bj if cxr)-+ cx~O) as k -. co for each of the n scalar sequen ~I,:\

obtained by putting J = 1, 2, ... ,n. Thus, the convergence of sequences max Icx~l:) - ~O) I -. 0
vectors considered in section 9.8 is just coordinatewise convergence in t ISJS.
~
linear space en with respect to its standard basis. k ... co. This clearly implies coordinatewise convergence. 1

'\~ ."
. 'i.:
.

, I
10.3 Matrix Norms

We start this section by confining our attention to norms on the linear space $\mathbb{C}^{n \times n}$, that is, to norms of square matrices. Obviously, all properties of a general norm discussed in Sections 1 and 2 remain valid for this particular case. For instance, we have the following.

Exercise 1. Verify that any norm on $\mathbb{C}^{n \times n}$ depends continuously on the matrix elements.

Hint. Consider the standard basis $\{E_{ij}\}_{i,j=1}^{n}$ for $\mathbb{C}^{n \times n}$ (see Exercise 3.5.1) and use Theorem 10.1.1. □

The possibility of multiplying any two matrices $A, B \in \mathbb{C}^{n \times n}$ gives rise to an important question about the relation between the norms $\|A\|$, $\|B\|$ and the norm $\|AB\|$ of their product. We say that the norm $\| \ \|$ on $\mathbb{C}^{n \times n}$ is a matrix norm, or is submultiplicative (in contrast to a norm of a matrix as a member of the normed space $\mathbb{C}^{n \times n}$), if
$$\|AB\| \le \|A\| \, \|B\| \tag{1}$$
for all $A, B \in \mathbb{C}^{n \times n}$. Before examining some matrix norms note first that not all norms on $\mathbb{C}^{n \times n}$ are matrix norms.

Exercise 2. Verify that the real-valued function $\max_{1 \le i,j \le n} |a_{ij}|$, where $a_{ij}$ is the $i,j$th element of $A \in \mathbb{C}^{n \times n}$, defines a norm on $\mathbb{C}^{n \times n}$ but is not a matrix norm.

Exercise 3. (The euclidean or Frobenius norm.) Prove that
$$\|A\|_E = \Big( \sum_{i,j=1}^{n} |a_{ij}|^2 \Big)^{1/2}$$
defines a matrix norm on $\mathbb{C}^{n \times n}$.

Hint. For the norm axioms the argument is that of Exercise 10.1.2(c), which requires the use of the Cauchy-Schwarz inequality (Exercise 2.5.4). For property (1) use the Cauchy-Schwarz inequality again.

Exercise 4. This generalizes Exercise 3. Show that the Hölder norm,
$$\|A\|_p = \Big( \sum_{i,j=1}^{n} |a_{ij}|^p \Big)^{1/p}, \qquad p \ge 1, \tag{2}$$
is a matrix norm on $\mathbb{C}^{n \times n}$ if and only if $1 \le p \le 2$.

Hint. Use the Hölder inequality:
$$\sum_{j=1}^{n} |x_j y_j| \le \Big( \sum_{j=1}^{n} |x_j|^p \Big)^{1/p} \Big( \sum_{j=1}^{n} |y_j|^q \Big)^{1/q}, \tag{3}$$
where $p \ge 1$, $q \ge 1$, and $p^{-1} + q^{-1} = 1$. □

Exercise 5. Prove that, for any matrix norm $\| \ \|$,
$$\|I\| \ge 1, \qquad \|A^n\| \le \|A\|^n, \qquad \|A^{-1}\| \ge \frac{1}{\|A\|} \quad (\det A \ne 0).$$

Exercise 6. If $S$ is nonsingular and $\| \ \|$ is a matrix norm, then the real-valued function $f$ defined by $f(A) = \|SAS^{-1}\|$ is a matrix norm.

Exercise 7. Check that the following are true.
(a) The function $\|A\| = n \max_{1 \le i,j \le n} |a_{ij}|$ is a matrix norm. (Compare with Exercises 2 and 10.1.4.)
(b) If $\| \ \|_E$ denotes the euclidean matrix norm (see Exercise 3) and $\| \ \|$ is defined as in part (a), then $\|A\|_E \le \|A\|$.
(c) $\|A\|_E^2 = \operatorname{tr}(A^*A)$.

Exercise 8. If $p \ge 2$ and $(1/p) + (1/q) = 1$, then in the notation of Eq. (2),
$$\|AB\|_p \le \min(\|A\|_p \|B\|_q, \ \|A\|_q \|B\|_p).$$

Hint. Use Eq. (3). □

The most useful matrix norms are often those that are easily defined in terms of the elements of the matrix argument. It is conceivable that a measure of magnitude for matrices could be based on the magnitudes of the eigenvalues, although this may turn out to be of no great practical utility, since the eigenvalues are often hard to compute. However, if $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$, then $\mu_A = \max_{1 \le j \le n} |\lambda_j|$ is such a measure of magnitude and is known as the spectral radius of $A$.

Consider the matrix $N$ of Exercise 6.8.6. It is easily seen that all of its eigenvalues are zero and hence the spectral radius of $N$ is zero. Since $N$ is not the zero-matrix, we see immediately that the spectral radius does not satisfy the first norm axiom. However, although the spectral radius is not a norm, it is intimately connected with the magnitude of matrix norms. Indeed, matrix norms are often used to obtain bounds for the spectral radius. The next theorem is a result of this kind.

Theorem 1. If $A \in \mathbb{C}^{n \times n}$ and $\mu_A$ is the spectral radius of $A$ then, for any matrix norm,
$$\mu_A \le \|A\|.$$
PROOF. Let $\lambda$ be an eigenvalue of $A$ such that $\mu_A = |\lambda|$. Then there is a vector $x \ne 0$ such that $Ax = \lambda x$. Define the $n \times n$ matrix
$$A_x = [x \ \ 0 \ \ 0 \ \cdots \ 0]$$
and observe that
$$A A_x = \lambda A_x.$$
Using the second norm axiom and property (1) of the matrix norm, we deduce that
$$|\lambda| \, \|A_x\| \le \|A\| \, \|A_x\|.$$
Since $x \ne 0$, it follows that $\|A_x\| \ne 0$ and hence
$$|\lambda| = \mu_A \le \|A\|. \qquad \blacksquare$$

We are very often faced with the problem of finding the norm of a vector that is given in the form $Ax$, where $A \in \mathbb{C}^{n \times n}$ and $x \in \mathbb{C}^n$. Given our experience with property (1) for a matrix norm, we might expect that there will be matrix norms $\| \ \|$ and vector norms $\| \ \|_v$ (that is, norms on $\mathbb{C}^n$) for which
$$\|Ax\|_v \le \|A\| \, \|x\|_v. \tag{4}$$
When Eq. (4) holds for all $x \in \mathbb{C}^n$ and all $A \in \mathbb{C}^{n \times n}$, then the vector norm $\| \ \|_v$ and the matrix norm $\| \ \|$ are said to be compatible.

Exercise 9. Prove that the euclidean matrix and vector norms are compatible.

SOLUTION. For arbitrary $x = [\alpha_1 \ \alpha_2 \ \cdots \ \alpha_n]^T \in \mathbb{C}^n$ and $A = [a_{ij}]_{i,j=1}^{n} \in \mathbb{C}^{n \times n}$, compute $Ax = [\beta_1 \ \beta_2 \ \cdots \ \beta_n]^T$, where $\beta_i = \sum_{j=1}^{n} a_{ij} \alpha_j$, $i = 1, 2, \ldots, n$. Hence, by Eq. (3) for $p = 2$, $q = 2$,
$$\|Ax\|_E^2 = \sum_{i=1}^{n} |\beta_i|^2 = \sum_{i=1}^{n} \Big| \sum_{j=1}^{n} a_{ij} \alpha_j \Big|^2 \le \sum_{i=1}^{n} \Big( \sum_{j=1}^{n} |a_{ij}|^2 \Big) \Big( \sum_{j=1}^{n} |\alpha_j|^2 \Big).$$
Finally, since $\|A\|_E^2 = \sum_{i,j=1}^{n} |a_{ij}|^2$ and $\|x\|_E^2 = \sum_{j=1}^{n} |\alpha_j|^2$, we have
$$\|Ax\|_E^2 \le \|A\|_E^2 \|x\|_E^2,$$
which is equivalent to Eq. (4). □

Exercise 10. Prove that the matrix norm defined in Exercise 7(a) is compatible with the Hölder norms on $\mathbb{C}^n$ for $p = 1, 2, \infty$.

Exercise 11. Let $\| \ \|_v$ be a norm on $\mathbb{C}^n$ and let $S \in \mathbb{C}^{n \times n}$ be nonsingular. Show that $\|x\|_S = \|Sx\|_v$ is also a norm on $\mathbb{C}^n$. Then show that $\| \ \|_S$ and the matrix norm of Exercise 6 are compatible if and only if $\| \ \|_v$ and $\| \ \|$ are compatible. □

As with column vectors and their norms, norms applied to matrices provide a useful vehicle for the study of convergence of sequences and series of matrices. Indeed, comparing the discussions of Sections 9.8 and 10.2, it is easily seen that the following proposition is just a special case of Theorem 10.2.2.

Proposition 1. Let $\{A_k\}_{k=1}^{\infty}$ be a sequence in $\mathbb{C}^{n \times n}$. The sequence converges coordinatewise to $A$ if and only if $\|A_k - A\| \to 0$ as $k \to \infty$ in any norm.

Here, the norm needs to have only the three basic properties of a vector norm, and the submultiplicative property plays no part.

The next exercises illustrate the use of norms in the analysis of an important numerical procedure.

Exercise 12. Suppose that $A \in \mathbb{C}^{n \times n}$ has an eigenvalue $\lambda_1$ with the properties that $|\lambda_1|$ exceeds the absolute value of all other eigenvalues of $A$ and the index of $\lambda_1$ is one. (Then $\lambda_1$ is called the dominant eigenvalue of $A$.) Show that, as $r \to \infty$,
$$\Big( \frac{1}{\lambda_1} A \Big)^{r} \to Z_{10}. \tag{5}$$

Hint. Let $f(z) = z^r$ and use Theorem 9.5.1. (The hypothesis that $\lambda_1$ has index one is made for convenience. Otherwise, the limit on the right of (5) is a little more complicated.)

Exercise 13. (The power method for calculating eigenvalues.) Let $A \in \mathbb{C}^{n \times n}$ have a dominant eigenvalue $\lambda_1$ of index one (as in Exercise 12). Let $x_0$ be any vector for which $x_0 \notin \operatorname{Ker} A^n$, $\|x_0\| = 1$ in the euclidean norm, and $Z_{10} x_0 \ne 0$. Define a sequence of vectors $x_0, x_1, x_2, \ldots$ of norm one by
$$x_r = \frac{1}{k_r} A x_{r-1}, \qquad r = 1, 2, \ldots,$$
where $k_r = \|A x_{r-1}\|$. Show that:
(a) The sequence is well-defined (i.e., $k_r \ne 0$ for any $r$).
(b) As $r \to \infty$, $k_r \to |\lambda_1|$.
(c) There is a sequence $\{\beta_r\}_{r=1}^{\infty}$ of complex numbers such that, as $r \to \infty$, $\beta_{r+1}/\beta_r \to |\lambda_1|/\lambda_1$ and
$$\beta_r x_r \to p_1, \tag{6}$$
where $p_1$ is an eigenvector of $A$ of norm one determined by the choice of $x_0$.
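Exercise 13 translates directly into a few lines of code. The following is a minimal NumPy sketch (not part of the original text; the matrix and starting vector are arbitrary illustrative choices) of the iteration $x_r = (1/k_r) A x_{r-1}$; the outline of the solution of the exercise follows below.

    import numpy as np

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    x = np.array([1.0, 1.0, 1.0])
    x = x / np.linalg.norm(x)            # ||x_0|| = 1 in the euclidean norm

    for r in range(1, 51):
        y = A @ x                        # A x_{r-1}
        k = np.linalg.norm(y)            # k_r = ||A x_{r-1}||
        x = y / k                        # x_r = (1/k_r) A x_{r-1}

    print(k)                                         # k_r -> |lambda_1|
    print(np.max(np.abs(np.linalg.eigvals(A))))      # spectral radius, for comparison

For a real symmetric choice of $A$ such as this one the iterates $x_r$ settle down directly; the role of the scalars $\beta_r$ of part (c) is discussed in the note that follows.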
(Note that by normalizing the vectors $x_r$ to have euclidean norm equal to one they are not defined to within a unimodular scalar multiple. Thus, the presence of $\beta_r$ in (6) is to be expected. In numerical practice a different normalization is likely to be used, with $\| \ \|_{\infty}$, for example. But, as in so many cases, the euclidean norm is the most convenient for analysis.)

OUTLINE OF THE SOLUTION. Assume inductively that $k_1, k_2, \ldots, k_{r-1}$ are nonzero. Then show that
$$A x_{r-1} = \frac{1}{k_{r-1} \cdots k_2 k_1} A^{r} x_0.$$
Since $\operatorname{Ker} A^r = \operatorname{Ker} A^n$ for all $r \ge n$ (ref. Ex. 4.5.14), $x_0 \notin \operatorname{Ker} A^r$, and so $A x_{r-1} \ne 0$. Hence $k_r \ne 0$ and (a) follows by induction.

Then observe
$$k_r = \|A^r x_0\| / \|A^{r-1} x_0\|. \tag{7}$$
Define $z_1 = \frac{1}{\alpha} Z_{10} x_0$, where $\alpha = \|Z_{10} x_0\| \ne 0$, and use the preceding exercise to show that $|\lambda_1|^{-r} \|A^r x_0\| \to \alpha$. Combine with (7) to obtain part (b). Use the fact that
$$x_r = \frac{1}{k_r \cdots k_2 k_1} A^r x_0$$
together with Theorem 9.5.1 to show that
$$\frac{k_r \cdots k_2 k_1}{\lambda_1^{r}} \, x_r \to \alpha z_1,$$
and define $\beta_r = (k_r \cdots k_2 k_1)/(\alpha \lambda_1^{r})$ to obtain (c). □

10.4 Induced Matrix Norms

We now investigate a method for obtaining a matrix norm from any given vector norm. If $\| \ \|_v$ is a vector norm in $\mathbb{C}^n$ and $A \in \mathbb{C}^{n \times n}$, we consider the quotient $\|Ax\|_v / \|x\|_v$ defined for each nonzero $x \in \mathbb{C}^n$. The quotient is obviously nonnegative; we consider the set of all numbers obtained when $x$ takes all possible nonzero values in $\mathbb{C}^n$ and define
$$f(A) = \sup_{x \in \mathbb{C}^n, \, x \ne 0} \frac{\|Ax\|_v}{\|x\|_v}. \tag{1}$$
We shall prove that, for a fixed matrix $A$, this is a well-defined nonnegative number, and the resulting function $f(A)$ is a matrix norm. We shall call it the matrix norm induced by the vector norm $\| \ \|_v$. Note that there is a close analogy with the Rayleigh quotient (for Hermitian matrices), as discussed in Chapter 8. Also, observe that the definition can be applied to the definition of a norm for any linear transformation acting on a normed linear space.

To begin our investigation of the quotient in (1), we first note some important properties:

Exercise 1. Show that, for the function $f(A)$ defined on $\mathbb{C}^{n \times n}$ by Eq. (1),
(a) $f(A) = \sup_{x \in \mathbb{C}^n, \, \|x\|_v = 1} \|Ax\|_v$;
(b) There is a vector $x_0 \in \mathbb{C}^n$, depending on $A$, such that $\|x_0\|_v = 1$ and $f(A) = \|Ax_0\|_v$. Thus
$$f(A) = \max_{\|x\|_v = 1} \|Ax\|_v, \qquad x \in \mathbb{C}^n. \tag{2}$$

Hint. Observe first that replacing $x$ by $\alpha x$ does not alter the quotient in Eq. (1). For part (b) take advantage of the technique used in the proof of Theorem 10.1.2. □

The importance of the induced norms is partly explained by part (c) of the following theorem.

Theorem 1. Let $\| \ \|_v$ denote any vector norm on $\mathbb{C}^n$.
(a) The function $f$ defined on $\mathbb{C}^{n \times n}$ by
$$f(A) = \sup_{x \in \mathbb{C}^n, \, x \ne 0} \frac{\|Ax\|_v}{\|x\|_v} = \max_{x \in \mathbb{C}^n, \, \|x\|_v = 1} \|Ax\|_v$$
is a matrix norm and is denoted by $\| \ \|$.
(b) The norms $\| \ \|$ and $\| \ \|_v$ are compatible.
(c) If $\| \ \|_1$ is any matrix norm compatible with $\| \ \|_v$ then, for all $A \in \mathbb{C}^{n \times n}$,
$$\|A\| \le \|A\|_1.$$

PROOF. We set out to establish the axioms for a matrix norm. For axiom 1 it is clear that $f(A) \ge 0$, and that $f(A) = 0$ if $A = 0$. If $A \ne 0$, then there is an $x \in \mathbb{C}^n$ such that $Ax \ne 0$ and $\|x\|_v = 1$. Hence $f(A) > 0$. Axiom 1 is therefore satisfied.

Using axiom 2 for the vector norm and Eq. (2), we obtain
$$f(\lambda A) = \max_{\|x\|_v = 1} \|\lambda Ax\|_v = |\lambda| \max_{\|x\|_v = 1} \|Ax\|_v = |\lambda| f(A),$$

and so axiom 2 for the matrix norm is also satisfied. Furthermore,
$$\|(A + B)x\|_v = \|Ax + Bx\|_v \le \|Ax\|_v + \|Bx\|_v,$$
hence
$$\max_{\|x\|_v = 1} \|(A + B)x\|_v \le \max_{\|x\|_v = 1} \|Ax\|_v + \max_{\|x\|_v = 1} \|Bx\|_v.$$
Thus $f(A + B) \le f(A) + f(B)$ and we have axiom 3 for the matrix norm.

Proceeding finally to the submultiplicative property (10.3.1) of a matrix norm, we infer from Exercise 1(b) the existence of an $x_0 \in \mathbb{C}^n$ with $\|x_0\|_v = 1$ such that $f(AB) = \|ABx_0\|_v$ for given $A, B \in \mathbb{C}^{n \times n}$. Then
$$f(AB) = \|ABx_0\|_v = \|A(Bx_0)\|_v \le f(A) \|Bx_0\|_v,$$
because it follows from Eq. (1) that
$$\|Ax\|_v \le f(A) \|x\|_v \tag{3}$$
for all nonzero $x \in \mathbb{C}^n$ and any $A \in \mathbb{C}^{n \times n}$. Using Eq. (3) once more (with respect to $B$), we have
$$f(AB) \le f(A) \|Bx_0\|_v \le f(A) f(B) \|x_0\|_v = f(A) f(B),$$
and the property (10.3.1) holds. Part (a) of the theorem is thus proved and we may write $f(A) = \|A\|$.

Observe that part (b) of the theorem is just the relation (3). For part (c) we have, with an element $x_0$ defined as in Exercise 1(b),
$$\|A\| = f(A) = \|Ax_0\|_v \le \|A\|_1 \|x_0\|_v = \|A\|_1$$
for any matrix norm compatible with $\| \ \|_v$. ■

Exercise 2. Verify that $\|I\| = 1$ for any induced matrix norm.

Exercise 3. Let $N \in \mathbb{C}^{n \times n}$ denote a matrix with ones in the $i, i+1$ positions ($1 \le i \le n - 1$) and zeros elsewhere. Check that there are induced matrix norms such that
(a) $\|N\| = n$;  (b) $\|N\| = 1$;  (c) $\|N\| < 1$.

Hint. Consider the vector norm of Exercise 10.2.1 for part (a) and the norms
$$\|x\|_v = \Big( \frac{1}{k} \Big)^{n-1} |x_1| + \Big( \frac{1}{k} \Big)^{n-2} |x_2| + \cdots + \frac{1}{k} |x_{n-1}| + |x_n|$$
with $k = 1$ and $k > 1$ for parts (b) and (c), respectively. □

Some of the most important and useful matrix norms are the induced norms discussed in the next exercises. In particular, the matrix norm induced by the euclidean vector norm is known as the spectral norm, and it is probably the most widely used norm in matrix analysis.

Exercise 4. If $A$ is a unitary matrix, show that the spectral norm of $A$ is 1.

Exercise 5. Let $A \in \mathbb{C}^{n \times n}$ and let $\lambda_A$ be the spectral radius of $A^*A$. If $\| \ \|_s$ denotes the spectral norm, show that $\|A\|_s = \lambda_A^{1/2}$. (In other words, $\|A\|_s$ is just the largest singular value of $A$.)

SOLUTION. The matrix $A^*A$ is positive semi-definite and therefore its eigenvalues are nonnegative numbers. Hence $\lambda_A$ is actually an eigenvalue of $A^*A$.

Let $x_1, x_2, \ldots, x_n$ be a set of orthonormal eigenvectors of $A^*A$ with associated eigenvalues $0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. Note that $\lambda_A = \lambda_n$ and if $x = \sum_{i=1}^{n} \alpha_i x_i$, then $A^*Ax = \sum_{i=1}^{n} \alpha_i \lambda_i x_i$. Let $\| \ \|_E$ denote the euclidean vector norm on $\mathbb{C}^n$. Then
$$\|Ax\|_E^2 = (Ax, Ax) = (x, A^*Ax) = \sum_{i=1}^{n} |\alpha_i|^2 \lambda_i.$$
For $x \in \mathbb{C}^n$ such that $\|x\|_E = 1$, that is, $\sum_{i=1}^{n} |\alpha_i|^2 = 1$, the expression above does not exceed $\lambda_n$ and attains this value if $\alpha_1 = \alpha_2 = \cdots = \alpha_{n-1} = 0$ and $\alpha_n = 1$. Thus
$$\|A\|_s = \max_{\|x\|_E = 1} \|Ax\|_E = \lambda_n^{1/2} = \lambda_A^{1/2}. \qquad \square$$

Exercise 6. If $A = \operatorname{diag}[d_1, d_2, \ldots, d_n]$ and $\| \ \|$ is a matrix norm induced by one of the Hölder norms of Exercise 10.1.2(d), show that
$$\|A\| = \max_{1 \le j \le n} |d_j|.$$

Exercise 7. If $A, U \in \mathbb{C}^{n \times n}$ and $U$ is unitary, prove that, for both the euclidean matrix and spectral norms, $\|A\| = \|UA\| = \|AU\|$ and hence that, for these norms, $\|A\|$ is invariant under unitary similarity transformations.

Hint. For the euclidean norm use Exercise 10.3.7(c); for the spectral norm use Exercises 4 and 10.1.6.

Exercise 8. If $A$ is a normal matrix in $\mathbb{C}^{n \times n}$ and $\| \ \|_s$ denotes the spectral norm, prove that $\|A\|_s = \mu_A$, the spectral radius of $A$. (Compare this result with Theorem 10.3.1.) If $f$ is a complex-valued function defined on the spectrum of $A$, prove that $\|f(A)\|_s$ is equal to the spectral radius of $f(A)$.

Hint. Use Theorem 5.2.1.

Exercise 9. Show that the matrix norm $\| \ \|_{\infty}$ induced by the vector norm $\| \ \|_{\infty}$ of Exercise 10.1.2 is given by
$$\|A\|_{\infty} = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|.$$
(Note that we form the row sums of the absolute values of the matrix elements and then take the greatest of these sums.)

SOLUTION. Let $x = [x_1 \ x_2 \ \cdots \ x_n]^T$ be a vector in $\mathbb{C}^n$ with $\|x\|_{\infty} = 1$. Then
$$\|Ax\|_{\infty} = \max_{1 \le i \le n} \Big| \sum_{j=1}^{n} a_{ij} x_j \Big| \le \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}| \, |x_j| \le \|x\|_{\infty} \|A\|_{\infty} = \|A\|_{\infty}.$$
Thus
$$\max_{\|x\|_{\infty} = 1} \|Ax\|_{\infty} \le \|A\|_{\infty},$$
and so the norm induced by $\| \ \|_{\infty}$ cannot exceed $\|A\|_{\infty}$. To prove that they are equal we have only to show that there is an $x_0 \in \mathbb{C}^n$ with $\|x_0\|_{\infty} = 1$ and $\|Ax_0\|_{\infty} \ge \|A\|_{\infty}$. It is easy to see that if $\|A\|_{\infty} = \sum_{j=1}^{n} |a_{kj}|$, then the vector $x_0 = [\alpha_1 \ \alpha_2 \ \cdots \ \alpha_n]^T$, where
$$\alpha_j = \begin{cases} |a_{kj}|/a_{kj} & \text{if } a_{kj} \ne 0, \\ 0 & \text{if } a_{kj} = 0, \end{cases} \qquad j = 1, 2, \ldots, n,$$
satisfies the above conditions. □

Exercise 10. Show that the matrix norm induced by the vector norm $\| \ \|_1$ of Exercise 10.1.2 is given by
$$\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}|.$$
(Note that we form the column sums of the absolute values of the matrix elements, and then take the greatest of these sums.)

Hint. Use the fact that $\|A\|_1 = \|A^*\|_{\infty}$. □

Exercise 11. If $\| \ \|_s$ denotes the spectral norm and $\| \ \|_{\infty}$ and $\| \ \|_1$ are defined in Exercises 9 and 10, prove that
$$\|A\|_s^2 \le \|A\|_{\infty} \|A\|_1.$$

SOLUTION. Using the result of Exercise 5, we have $\|A\|_s = \lambda_A^{1/2}$, where $\lambda_A$ is the maximum eigenvalue of $A^*A$. Since $A^*A$ is positive semidefinite, $\lambda_A$ is just the spectral radius of $A^*A$. By Theorem 10.3.1, $\lambda_A \le \|A^*A\|_1$. Thus
$$\|A\|_s^2 \le \|A^*A\|_1 \le \|A^*\|_1 \|A\|_1 = \|A\|_{\infty} \|A\|_1. \qquad \square$$

Before leaving the idea of inducing matrix norms from vector norms, we ask the converse question: Given a matrix norm $\| \ \|$, can we use it to induce a compatible vector norm? This can be done by defining $\| \ \|_v$ by
$$\|x\|_v = \|x a^T\| \tag{4}$$
for some fixed nonzero vector $a$.

Exercise 12. Let
$$A = \begin{bmatrix} 3/2 & -1/2 \\ -1/2 & 1 \end{bmatrix}.$$
Use Exercise 11 to show that $\|A\|_s \le 2$ and Exercise 8 to show that $\|A\|_s = (5 + \sqrt{5})/4 \approx 1.8$.

Exercise 13. Consider an $n \times n$ matrix with block diagonal form, $A = \operatorname{diag}[A_1 \ A_2]$, where $A_1$ and $A_2$ are square matrices. Show that, in the spectral norm,
$$\|A\| = \max(\|A_1\|, \|A_2\|).$$
Let $A \in \mathbb{C}^{n \times n}$ and consider a Hermitian matrix of the form
$$H = \begin{bmatrix} 0 & iA \\ -iA^* & 0 \end{bmatrix}.$$
Show that $\|H\| = \|A\|$, in the spectral norm. □

10.5 Absolute Vector Norms and Lower Bounds of a Matrix

Almost all the vector norms we have introduced in the exercises depend only on the absolute values of the elements of the vector argument. Such norms are called absolute vector norms. The general theory of norms that we have developed up to now does not assume that this is necessarily the case. We shall devote the first part of this section to proving two properties of absolute vector norms.

One of the properties of an absolute vector norm that we shall establish is called the monotonic property. Let us denote by $|x|$ the vector whose elements are the absolute values of the components of $x$, and if $x_1, x_2 \in \mathbb{R}^n$, we shall say that $x_1 \ge x_2$ (or $x_2 \le x_1$) if this relation holds for corresponding components of $x_1$ and $x_2$. A vector norm $\| \ \|_v$ on $\mathbb{C}^n$ is said to be monotonic if $|x_1| \le |x_2|$ implies $\|x_1\|_v \le \|x_2\|_v$ for any $x_1, x_2 \in \mathbb{C}^n$.
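Exercises 9-12 above are easy to check numerically. The following is a minimal NumPy sketch (not part of the original text) that evaluates the row-sum norm, the column-sum norm, and the spectral norm for the matrix of Exercise 12 and confirms the inequality of Exercise 11.

    import numpy as np

    A = np.array([[1.5, -0.5],
                  [-0.5, 1.0]])

    row_sum = np.max(np.sum(np.abs(A), axis=1))   # norm induced by the infinity vector norm
    col_sum = np.max(np.sum(np.abs(A), axis=0))   # norm induced by the 1 vector norm
    spectral = np.linalg.norm(A, 2)               # largest singular value of A

    print(row_sum, col_sum, spectral)             # 2.0, 2.0, (5 + sqrt(5))/4 = 1.809...
    print(spectral ** 2 <= row_sum * col_sum)     # Exercise 11: True

Since this $A$ is real symmetric (hence normal), Exercise 8 applies and the spectral norm coincides with the spectral radius, which is how the value $(5 + \sqrt{5})/4$ arises.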
The second property we will develop concerns the induced matrix norm. When a matrix norm is evaluated at a diagonal matrix
$$D = \operatorname{diag}[d_1, d_2, \ldots, d_n],$$
we think highly of the norm if it has the property $\|D\| = \max_j |d_j|$. We will show that $\| \ \|$ has this property if and only if it is a norm induced by an absolute vector norm.

Theorem 1 (F. L. Bauer, J. Stoer, and C. Witzgall†). If $\| \ \|_v$ is a vector norm in $\mathbb{C}^n$ and $\| \ \|$ denotes the matrix norm induced by $\| \ \|_v$, then the following conditions are equivalent:
(a) $\| \ \|_v$ is an absolute vector norm;
(b) $\| \ \|_v$ is monotonic;
(c) $\|D\| = \max_{1 \le j \le n} |d_j|$ for any diagonal matrix $D = \operatorname{diag}[d_1, d_2, \ldots, d_n]$ from $\mathbb{C}^{n \times n}$.

PROOF. (a) $\Rightarrow$ (b). We shall prove that if $\| \ \|_v$ is an absolute vector norm and $\|x\|_v = F(x_1, x_2, \ldots, x_n)$, where $x = [x_1 \ x_2 \ \cdots \ x_n]^T$, then $F$ is a nondecreasing function of the absolute values of $x_1, x_2, \ldots, x_n$. Let us focus attention on $F$ as a function of the first component, $x_1$. The same argument will apply to each of the $n$ variables. Thus, we suppose that $x_2, x_3, \ldots, x_n$ are fixed, define
$$f(x) = F(x, x_2, x_3, \ldots, x_n),$$
and consider the behaviour of $f(x)$ as $x$ varies through the real numbers. If the statement in part (b) is false, there exist nonnegative numbers $p, q$ for which $p < q$ and $f(p) > f(q)$. Now since $f(x) = f(|x|)$, we have $f(-q) = f(q)$, and so the vectors $x_1 = [-q \ x_2 \ \cdots \ x_n]^T$ and $x_2 = [q \ x_2 \ \cdots \ x_n]^T$ belong to the ball consisting of vectors $x$ for which $\|x\|_v \le f(q)$. This ball is convex (Exercise 10.2.2) and since the vector $x_0 = [p \ x_2 \ \cdots \ x_n]^T$ can be written in the form $x_0 = t x_1 + (1 - t) x_2$ for some $t \in [0, 1]$, then $x_0$ belongs to the ball. Thus $\|x_0\|_v \le f(q)$. But $\|x_0\|_v = f(p) > f(q)$, so we arrive at a contradiction and the implication (a) $\Rightarrow$ (b) is proved.

(b) $\Rightarrow$ (c). Suppose without loss of generality that $D \ne 0$ and define $d = \max_{1 \le j \le n} |d_j|$. Clearly, $|Dx| \le |dx|$ for all $x \in \mathbb{C}^n$ and so condition (b) implies that $\|Dx\|_v \le \|dx\|_v = d\|x\|_v$. Thus
$$\|D\| = \sup_{x \ne 0} \frac{\|Dx\|_v}{\|x\|_v} \le d.$$
If we can show that there is an $x_0 \in \mathbb{C}^n$ for which $\|Dx_0\|_v / \|x_0\|_v = d$, the proof will be complete. Suppose that $m$ is an integer for which $d = |d_m|$. Then
$$\frac{\|De_m\|_v}{\|e_m\|_v} = \frac{\|d_m e_m\|_v}{\|e_m\|_v} = |d_m| = d,$$
where $e_m$ is the $m$th unit vector in $\mathbb{C}^n$. Hence
$$\|D\| = \max_{x \ne 0} \frac{\|Dx\|_v}{\|x\|_v} = d,$$
and (c) follows from (b).

(c) $\Rightarrow$ (a). Suppose that the condition (c) holds; let $x = [x_1 \ x_2 \ \cdots \ x_n]^T$. Choosing $d_j = 1$ if $x_j = 0$ and $d_j = |x_j|/x_j$ if $x_j \ne 0$, we see that for any $x \in \mathbb{C}^n$ there is a diagonal matrix $D = \operatorname{diag}[d_1, d_2, \ldots, d_n]$ such that $|x| = Dx$. Hence property (c) yields $\|D\| = \|D^{-1}\| = 1$. We then have
$$\| \, |x| \, \|_v = \|Dx\|_v \le \|D\| \, \|x\|_v = \|x\|_v,$$
and since $x = D^{-1}|x|$, it follows that
$$\|x\|_v \le \|D^{-1}\| \, \| \, |x| \, \|_v = \| \, |x| \, \|_v.$$
Thus $\|x\|_v = \| \, |x| \, \|_v$ for any $x \in \mathbb{C}^n$, and so (c) implies (a). This completes the proof. ■

Exercise 1. If $\| \ \|$ is a matrix norm induced by an absolute vector norm and $|A|$ is the matrix with elements $|a_{ij}|$, show that $\|A\| \le \| \, |A| \, \|$.

Hint. Use Eq. (10.4.2). □

Proceeding to the second topic of this section, recall that our discussion of norms was started by looking for a measure of magnitude or a measure of the departure of an element from the zero element. Confining our attention to matrix norms, a measure of departure of a matrix from singularity (rather than the zero-matrix) would also be useful. Thus we seek a nonnegative function that is defined on all square matrices $A$ and that is zero if and only if $\det A = 0$. At first sight, it is surprising that such a measure should be obtained from a definition complementary to that of the induced norm given in Eq. (10.4.1). At second sight it is less surprising, because we know that there is an $x \ne 0$ with $Ax = 0$ if and only if $\det A = 0$. We define the lower bound of $A$, written $l(A)$, with respect to the vector norm $\| \ \|_v$, by
$$l(A) = \inf_{x \in \mathbb{C}^n, \, x \ne 0} \frac{\|Ax\|_v}{\|x\|_v}. \tag{1}$$
If we accept the idea that $\|A^{-1}\|$ increases indefinitely in magnitude as $A$ approaches a singular matrix, then the next theorem justifies our claim that the lower bound is a measure of departure from singularity.

† Numerische Math. 3 (1961), 257-264.
Theorem 2. If $\| \ \|$ is the matrix norm induced by the vector norm $\| \ \|_v$ and $l$ is the corresponding lower bound, then
$$l(A) = \begin{cases} 1/\|A^{-1}\| & \text{if } \det A \ne 0, \\ 0 & \text{if } \det A = 0. \end{cases}$$

PROOF. If $\det A \ne 0$, define a vector $y$ in terms of a vector $x$ by $y = Ax$. Then for any $x \ne 0$, $y \ne 0$,
$$l(A) = \inf_{x \ne 0} \frac{\|Ax\|_v}{\|x\|_v} = \inf_{y \ne 0} \frac{\|y\|_v}{\|A^{-1}y\|_v} = \Big( \sup_{y \ne 0} \frac{\|A^{-1}y\|_v}{\|y\|_v} \Big)^{-1} = \frac{1}{\|A^{-1}\|}.$$
If $\det A = 0$, there is an $x_0 \in \mathbb{C}^n$ such that $Ax_0 = 0$ and $\|x_0\|_v = 1$. Then obviously $l(A) = 0$ by the definition of $l(A)$. ■

Corollary 1. If $A, B \in \mathbb{C}^{n \times n}$, then
$$l(AB) \ge l(A)\,l(B).$$

Corollary 2. If $\| \ \|_v$ is an absolute norm and $D = \operatorname{diag}[d_1, d_2, \ldots, d_n] \in \mathbb{C}^{n \times n}$, then
$$l(D) = \min_{1 \le j \le n} |d_j|.$$

PROOF. If $D$ is nonsingular, then $d_j \ne 0$ for each $j$ and, using Theorems 2 and 1,
$$l(D) = \frac{1}{\|D^{-1}\|} = \Big( \max_{1 \le j \le n} |d_j^{-1}| \Big)^{-1} = \min_{1 \le j \le n} |d_j|.$$
If $\det D = 0$, then $l(D) = 0$ and obviously $\min_{1 \le j \le n} |d_j| = 0$ also. ■

Exercise 2. Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvalues of $A \in \mathbb{C}^{n \times n}$. Prove that, with respect to any vector norm $\| \ \|_v$,
$$l(A) \le \min_{1 \le j \le n} |\lambda_j|.$$
Prove also that equality obtains if $A$ is normal and $\| \ \|_v$ is the euclidean vector norm. □

Note that by combining the result of Exercise 2 with Theorem 10.3.1, we can define an annular region of the complex plane within which the eigenvalues of $A$ must lie.

Exercise 3. Show that if $A$ is nonsingular then, in the notation of Theorem 2,
$$\|Ax\|_v \ge l(A) \|x\|_v. \qquad \square$$

10.6 The Geršgorin Theorem

A description of regions of the complex plane containing the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of a matrix $A \in \mathbb{C}^{n \times n}$ is presented in this section. Recall that a first result of this kind was obtained in Theorem 10.3.1 (see also Exercise 10.5.2).

Before entering into further details, let us point out the essential fact that the eigenvalues of a matrix $A$ depend continuously on the elements of $A$. Since the eigenvalues are just the zeros of the characteristic polynomial, this result follows immediately once it is known that the zeros of a polynomial depend continuously on its coefficients. This is a result of the elementary theory of algebraic functions and will not be pursued here.

We now prove one of the most useful and easily applied theorems that give bounds for eigenvalues. This is known as Geršgorin's theorem and was first published as recently as 1931†.

Theorem 1. If $A \in \mathbb{C}^{n \times n}$, $a_{jk}$ denotes the elements of $A$, $j, k = 1, \ldots, n$, and
$$\rho_j = {\sum_{k}}' \, |a_{jk}|,$$
where $\sum'$ denotes the sum from $k = 1$ to $n$, $k \ne j$, then every eigenvalue of $A$ lies in at least one of the disks
$$\{z : |z - a_{jj}| \le \rho_j\}, \qquad j = 1, 2, \ldots, n, \tag{1}$$
in the complex $z$-plane.

Furthermore, a set of $m$ disks having no point in common with the remaining $n - m$ disks contains $m$ and only $m$ eigenvalues of $A$.

PROOF. Let $\lambda$ be an eigenvalue of $A$ with the associated eigenvector $x$. Then $Ax = \lambda x$ or, writing this relation out as $n$ scalar equations,
$$\sum_{k=1}^{n} a_{jk} x_k = \lambda x_j, \qquad j = 1, 2, \ldots, n.$$
Let $p$ be a subscript for which $|x_p| = \max_{1 \le j \le n} |x_j|$. Then
$$(\lambda - a_{pp}) x_p = {\sum_{k}}' \, a_{pk} x_k, \qquad \text{so that} \qquad |\lambda - a_{pp}| \, |x_p| \le {\sum_{k}}' \, |a_{pk}| \, |x_k| \le \rho_p |x_p|.$$
Now since $x \ne 0$ it must be that $|x_p| \ne 0$, and so we have $|\lambda - a_{pp}| \le \rho_p$. The first result is proved.

† Izv. Akad. Nauk SSSR Ser. Fiz.-Mat. 7 (1931), 749-754.
Suppose now that $A = D + C$, where $D = \operatorname{diag}[a_{11}, \ldots, a_{nn}]$, and let $B(t) = D + tC$. Then $B(0) = D$ and $B(1) = A$. We consider the behaviour of the eigenvalues of $B(t)$ as $t$ varies in the interval $[0, 1]$ and use the continuity of the eigenvalues as functions of $t$. Thus, for any $t$ in this interval, the eigenvalues of $B(t)$ lie in the disks with centers $a_{jj}$ and radii $t\rho_j$, $j = 1, 2, \ldots, n$.

Now suppose that the $j$th disk of $A = B(1)$ has no point in common with the remaining $n - 1$ disks. Then it is obviously true that the $j$th disk of $B(t)$ is isolated from the rest for all $t$ in $[0, 1]$. Now when $t = 0$, the eigenvalues of $B(0)$ are $a_{11}, \ldots, a_{nn}$ and, of these, $a_{jj}$ is the only one in the $j$th (degenerate) disk. Since the eigenvalues of $B(t)$ are continuous functions of $t$, and the $j$th disk is always isolated from the rest, it follows that there is one and only one eigenvalue of $B(t)$ in the $j$th disk for all $t \in [0, 1]$. In particular, this is the case when $t = 1$ and $B(1) = A$.

This proves the second part of the theorem for $m = 1$. The completion of the proof for any $m \le n$ is left as an exercise. ■

Note that by applying the theorem to $A^T$, a similar result is obtained using column sums of $A$, rather than row sums, to define the radii of the new disks. The $n$ disks in (1) are often referred to as the Geršgorin disks.

Exercise 1. Sketch the Geršgorin disks for the following matrix and for its transpose:
$$A = \begin{bmatrix} 0 & 1 & 0 & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ 0 & \tfrac{1}{2} & \tfrac{1}{2} & -2 \end{bmatrix}$$
(the entries marked $\cdot$ are illegible in this reproduction). If $S = \operatorname{diag}[1, 1, 1, 4]$, compute $SAS^{-1}$ and deduce that $A$ has an eigenvalue in the disk $|z + 2| \le 1$.

Exercise 2. If $A \in \mathbb{C}^{n \times n}$, show that the eigenvalues of $A$ lie in the disk $\{z \in \mathbb{C} : |z| \le \min(\rho, \nu)\}$, where
$$\rho = \max_{1 \le j \le n} \sum_{k=1}^{n} |a_{jk}|, \qquad \nu = \max_{1 \le k \le n} \sum_{j=1}^{n} |a_{jk}|. \tag{2}$$

SOLUTION. For any eigenvalue $\lambda_i$ of $A$ ($1 \le i \le n$), Theorem 1 gives, for some $j$,
$$|\lambda_i| - |a_{jj}| \le |\lambda_i - a_{jj}| \le \rho_j.$$
Therefore
$$|\lambda_i| \le \rho_j + |a_{jj}| = \sum_{k=1}^{n} |a_{jk}| \le \rho.$$
A similar proof can be performed for column sums, and the result follows. □

Note that $\rho$ in Eq. (2) can be viewed as the matrix norm induced by the vector norm $\|x\|_{\infty}$ (Exercise 10.4.9) and similarly for $\nu$ (Exercise 10.4.10). This shows that the result of Exercise 2 is a particular case of Theorem 10.3.1 (for a special norm).

Exercise 3. With the notation of Eq. (2), prove that
$$|\det A| \le \min(\rho^n, \nu^n)$$
for any $A \in \mathbb{C}^{n \times n}$. □

A matrix $A = [a_{jk}]_{j,k=1}^{n} \in \mathbb{C}^{n \times n}$ is said to be diagonally dominant if
$$|a_{jj}| > {\sum_{k}}' \, |a_{jk}| = \rho_j$$
for all $j = 1, 2, \ldots, n$. In other words, if
$$d_j = |a_{jj}| - \rho_j, \quad j = 1, 2, \ldots, n, \qquad d = \min_{1 \le j \le n} d_j, \tag{3}$$
then $A$ is diagonally dominant if and only if $d > 0$.

Exercise 4. Let $A$ be diagonally dominant and $\lambda_i$ be an eigenvalue of $A$. With the previous notation, show that $|\lambda_i| \ge d$, $i = 1, 2, \ldots, n$, and hence that $|\det A| \ge d^n$. In particular, confirm the nonsingularity of any diagonally dominant matrix, and illustrate with the matrix of Exercise 4.11.10. (The nonsingularity of a diagonally dominant matrix is sometimes referred to as the Lévy-Desplanques-Hadamard theorem.)

Hint. Consider the matrix $B = (1/d)A$ and apply Theorem 1 to show that $|\mu_j| \ge 1$ for any $\mu_j \in \sigma(B)$.

Exercise 5. Prove that an Hermitian matrix that is diagonally dominant and has all positive diagonal elements is positive definite.

Exercise 6. If $A = B + C$, where $B = \operatorname{diag}[1, 2, 2]$ and $|c_{jk}| \le \delta < \tfrac{1}{6}$ for $j, k = 1, 2, 3$, prove that there is an eigenvalue of $A$ in the disk
$$|z - 1 - c_{11}| \le 12\delta^2.$$

Exercise 7. This is a generalization of Exercise 6. If $A = B + C$, where $B = \operatorname{diag}[b_1, \ldots, b_n]$, $\rho_j = \min_{k \ne j} |b_j - b_k| > 0$, and $|c_{jk}| \le \delta < \rho_j/2n$ for $j, k = 1, 2, \ldots, n$, prove that there is an eigenvalue of $A$ in the disk
$$|z - b_j - c_{jj}| \le 2n(n - 1)\delta^2/\rho_j. \qquad \square$$
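The Geršgorin disks and the diagonal-dominance margin $d$ of Eq. (3) are both immediate to compute. The following is a minimal NumPy sketch (not part of the original text; the matrix is an arbitrary illustrative choice, not the matrix of Exercise 1).

    import numpy as np

    A = np.array([[ 4.0, 1.0, 0.5],
                  [ 0.5, 3.0, 1.0],
                  [ 1.0, 1.0, -5.0]])

    centres = np.diag(A)
    radii = np.sum(np.abs(A), axis=1) - np.abs(centres)    # rho_j, the off-diagonal row sums
    print(list(zip(centres, radii)))     # the disks {z : |z - a_jj| <= rho_j} of Theorem 1

    d = np.min(np.abs(centres) - radii)  # Eq. (3); d > 0 means A is diagonally dominant
    print(d)                             # here d = 1.5 > 0, so A is nonsingular (Exercise 4)
    print(np.linalg.eigvals(A))          # each eigenvalue lies in one of the disks above

Because the three disks centred at $4$, $3$, and $-5$ have radii $1.5$, $1.5$, and $2$, the disk about $-5$ is disjoint from the other two, so by the second part of Theorem 1 it contains exactly one eigenvalue.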
Exercise B. If fez) = ao + atZ + .,. + a,r, a, :F 0, and Zo is a zero of f, Theorem 1 (0. Tausekyl), If A = [ajlc]j,Il=1 is irreducible and
prove that
IZolsl + max Iall/a, I. 0 laJJI ~ t
II-t.II...}
IajA: I, j = 1,2, ... , n, (2)
II
The localization of eigenvalues of a simple matrix will be discussed in with strict inequality for at least onej, then A is nonsingular.
Section 11.2. PROOF. Suppose, on the contrary, that Ax = 0 for some
X = [Xl X2 XII]T:I= 0.
x
Clearly, there is a permutation matrix P E R" " such that Px = i, where
10.7 Gersgorin Disks and Irreducible Matrices i = [Xl X2 . X,,]T and either
IXll = Ix2 1 = .. , = Ixlll (3)
In this section two useful results are presented that can be seen as refine- or
ments of Gedgorin's theorem. However, they both involve the further hypo- = ... = Ix,1 > li,+ d ~ ... ~ Ixlli.
Ixd (4)
thesis that the matrix in question be "irreducible," This is a property that is
easily seen to hold in many practical problems. We make a formal definition Denote A = PApT = [Ii}Jj.i"l and observe that this simultaneous
permutation of rows and columns of A allows us to rewrite Eq. (2) in the form
here and return for further discussion of the concept in Chapter 15.
If n ~ 2. the matrix A e C"lC" is said to be reducible if there is an n x n
permutation matrix P such that
IIi}jl ~ L" IIijlll, j = 1,2, ... , n. (5)
""I.II#j

pTAP = [Au
o
Au],
A22
(1)
We also have Af = 0 and so ~J"l iijtXi
we obtain, in the case of Eq. (3),
== 0 for j = 1, 2, ... , n. Therefore
where Au and A 2 2 are square matrices of order less than n. If no such P
exists then A is irreducible.
liijJlllxll.., = IIijJllx}1 = It
11-1.11 "'}
(jjllXII\ s ( f.
II"I.II",j
IIi}III) IIxll"", 1 Sj S n.

If A is considered to be a representation of a linear transformation T oil Dividing by IIxll"" :F 0, We then have


an n-dimensional space 9', the fact that A is irreducible simply means that, /I

among the basis vectors for 9' chosen for this representation, there is no laill S L Icijt I,
II"l.i#j
1 Sj S n,
proper subset that spans an invariant subspace of A (see Section 4.8 and Eq.
(4.8.3) in particular). and comparison with (5) gives equalities throughout (5), a contradiction that
proves the theorem in this case.
Exercise 1. If A E R" X" is reducible, prove that for any positive integer p,
In the case of Eq, (4) we have, for 1 :s; j :s; r,
the matrix A" is reducible. Also, show that any polynomial in A is a reducible
matrix. liiJJlllxll .. = liijjlli}1
Exercise 2. Check that the square matrices A and AT are both reducible
or both irreducible. 0 = It
II"I.II"'j
ii}IIXi l fp
The first result concerns a test for a matrix A E C" x II to be nonsingular.
Clearly, Theorem 10.6.1 provides a test (or class of tests) for nonsingularity S( t
i=l,i"')
Itijtl)" X II.., + t
11",+ 1
lajll IIill I,
(see Exercise 10.6.4): A is certainly nonsingular if it is diagonally dominant
In other words, if la}}1 > Eo",}la}III for j = 1,2, ... , n, then the origin of the
complex plane is excluded from the union of the Gedgorin disks in which
f
s (II..t.II#) 1
ii j ll I) IIxII"" ' (6)
a{A) must lie. These inequalities can be weakened at the expense of the
irreducibility hypothesis. , Duke Math. J. 15 (1948). 1043-1044.
Dividing by IIxll oo :# 0, we see that and n - ~ strict inequalities. In view of Theorem 1. this conflicts again with
n
the definition of A unless m = n. Hence the assertion of the theorem.
lii))1 s L Iii)" I, 1 ~j ~ r, In the study of iterative processes it is often useful to have criteria (sharper
"-I.k"')
than that of Theorem 10.3.1) for deciding when the spectral radius of a
and comparing with Eq. (5), we obtain equalities in (5) for j = 1, 2, ... , r. matrix is less than 1 (see Exercise 9.8.5). Theorem 2 yields such a criterion as
Hence it follows from Eq. (6) that a corollary.

n CoroJIaiy 1. Let A = [aliJi. i=1 be irreducible and satisfy


L lalll = 0,
1-'+1
j = 1.2..... r,
for j = 1. 2... n. (5)
or. what is equivalent. the matrix A is lower block-triangular. Thus.
A = p T AP is reducible and a contradiction is obtained. This proves the with strict inequality for at leastone i. Then PA < 1.
theorem.
PROOF. Note first that the hypothesis (5) implies at once that all Gedgorin
Exercise 3. Check that the n x n matrices disks are in the unit disk. So Theorem 10.6.limmediately gives JlA ::s; 1.
The fact that there is strict inequality in (5) for at least one i means that
n 1 1 1 2 -1 0 0 there is at least one Gedgorin disk that is entirely in the interior of the unit
1 2 0 0 -1 2 -1 disk. Since points on the unit circle can only be on the circumference of the
1 0 3 0 0 Gedgorin disks, it follows from Theorem 2 that an eigenvalue of A on the
0 -1 unit circle would have to be a common boundary point ofall n disks and this
is impossible. Hence the result. '
1 0 0 n 0 o -1 2
1
are nonsingular. 0
I
The next result shows that either all eigenvalues of an irreducible matrix A
10.8 The Schur Theorem
lie in the interior of the union of Gersgorin's disks or an eigenvalue of A is a
common boundary point of all n disks.
.j
Theorem 2 (0. Taussky). Let A = [a)Ji.,,= 1 be a complex irreducib~
matrix and let A. be an eigenvalue of A lying on the boundary of the union
the Gersgorin disks. Then Ais on the boundary of eachof the n Gersgorin disks;
PROOF. Given the hypotheses ofthe theorem, let 1.1. - alii = PI for some i,
0;
We present next a rather different result that originates with 1. Schur."
Let us recall the decomposition of a matrix A e jR")<" into the sum of a
. Hermitian matrix B and a skew-Hermitian matrix C. We write B = t(A + A *),
~ = !<A - A *). and observe that A = B + C, B* = B. and C* == - C. Now
If A * = A, then C = 0 and all the eigenvalues ofA are real. This suggests that
l'!

1 ::s; i ::s; n. If A lies outside the remaining n - 1 Gerlgorin disks, then


1.1. - alii > p) for allj :# i. Therefore by Theorem 1 the matrix)J - A must
be nonsingular. This contradicts the assumption that it is an eigenvalue of A.
. .
a Dorm of C may give an upper bound for the magnitudes of the imaginary
parts ofthe eigenvalues of A. Similarly. if A* = -A then B = 0 and all the
e~genvalues of A have zero real parts, so a norm of B might be expected to
Thus, since it must lie on the boundary of the union of the disks, it must be a give an upper bound for the magnitudes of the real parts of the eigenvalues
point of intersection of m circumferences where 2 ::s; m ::s; n. Thus there are' of A. Schur's theorem contains results of this kind. In the following statement
m equalities in the system . .!Jte A and Jm A will denote the real and imaginary parts of the complex
, number it.
n
1.1. - aiil ~ L Ia)l I, 1 ~j ~ n.
t Math. Ann. 65 (1909), 488-510.
1<=I.k"')
Theorem 1. If A e c-x., II II denotes the euclidean matrix norm, and 2
PROOF Clearly, IIAII~ S; ~.s p2 =: n pl . The first inequality of Theorem 1
11, 11 , .. , A. are the eigenvalues of A, then then gives III :s; (L,lld 2 ) /2 :s; IIAII :s; np, which is the first result of the
corollary. The two remaining results follow similarly. .
L 1111 :os; IIA111 1
,
1=1 Note that t~e particular case 'I: =:=.0 provides another (high-powered)
proof t~at the eigenvalues of a Hermitian matrix are real. Similarly the case
L I~e 1112 s II BII 1
, f1 = 0 yields the result that the eigenvalues of a skew-Hermitian matrix have
1=1 their real parts zero.
Finally, we should note that np is the norm defined in Exercise 1O.3.7a.
1=1
L I Jm 1111 :os; IICII 1 , The result III :s; np is then a special case of Theorem 10.3.1.

where B =!<A + A*) and C =!<A - A*}. Equality in anyone of these Corollary 2 (I. Bendixson f). II'
'J
A E IR x. and 'I:
= 1.
Z
max r,s Iars - asr I,
thenfor any eigenvalue 1 of A,
relations implies equalityin all threeandoccurs if andonly if A is normal.
PROOF. Theorem 5.2.2 informs us that there is a unitary matrix U and an
IJm 11 s ,,,In(n - 1}/2.
upper-triangular matrix 11 such that A = U11 U*, and the eigenvalues of A are PROOF. When A is real, the third inequality of Theorem 1 implies that
the diagonal elements of 11. Now the euclidean norm is invariant under
unitary similarity transformations (Exercise 10.4.7) and so IIAI1 2 = 111111 2 i:
1=1
IJm 1111 :s; t
r.&=1,r5
lar. -2 aul2 s ",2 n(n - 1).
If the elements of 11 are written d" (r, s = 1, 2, ... , n), then
Si~ce A is real, all of its complex eigenvaiues arise in complex conjugate
pairs, so that every nonzero term in the sum on the left appears at least twice.
IIAll l
=
1=1
L 1111 + r<.
L Id
2
r.1
2
~ L 1111,
1=1
1
Hence 21 Jm 1,1 2 :s; ",2 n(n - 1)and the result follows.
and the first inequality is proved. Before leaving these results, we should remark that the bounds obtained
We also have B = !U(11 + 11*)U* so that, as before, in t~e corollaries are frequently less 'Sharp than the corresponding bounds
obtained from Gedgorin'stheorem. This is well illustrated on applying
IIBI1 2 = till +2 1 1 + r".L Id +2 dull
1=1
1
2
r
Corollary 1 to Exercise 10.6.1.
Exercise 1. Prove that [det AI S n/2 p., where p = max, . lar.l.
= t lbie
1= 1
1.1
2
2
+
1=1 rs;.
L Idr.l ~
Iale 111
l
t 1.
SoLUTION. First of all note the inequality of the arithmetic and geometric
means: namely, for any nonnegative numbers al' a2' .. ' a.,
A similar proof also yields the third inequality. .
It is clear that equality obtains in each of the three relations if and only if (a1al ' a.)1/. S a1 + a2 + ... + an.
n
dr. = 0 for all r = s, and r, S = 1,2, ... , n; that is, if and only if A is unitarily
similar to a diagonal matrix of its eigenvalues. The result now follows from If 11,12 , , A.. are the eigenvalues of A, we have, by Theorem 1,
Theorem 5.2.1. [det AI2 = /1 11 21121 2 '11.1 2
Corollary 1 (A. Hirsch t ). Ifwe define S (lld
Z
+ .~. + 11n1
2
)"

p = max lar.l,
r

and if 1 is any eigenvalue of A, then


III s np,
f1 = max

I bie 11 S na,
r
Ibr.l, 'I: = max Icr. I,
r

:s; (~ II Alli)" :s; (~nlp2


and the result follows on taking square roots.
r
, Acta Math. Z5(1902),367-370. , Acta Math. 25 (1902),359-365.
Exercise 2. If Alt AZ' .. , A,. are the eigenvalues of A and Ill> Ilz,... , Iln are 6. If II I/y is the vector norm of Exercise 10.1.7, and /I II and II II. denote the
those of A'"A, show that matrix norms induced by lilly and II liE' respectively, prove that for
any A eenxn,

IIAII = IIHI/zAH I /211 .,


with equality if and only if A is normal.
where Hl/2 denotes the positive definite square root of 0 H:
Exercise 3. With the notation of Exercise 2, show that A is normal if and If II IIv is a vector norm in en, then the function II lid defined by
only if, for some ordering of the eigenvalues, Pi = Illlz, i = 1,2, , n. 0
(1)

10.9 Miscellaneous Exercises is called the dual norm of II IIv.


7. Prove that
1. Prove that for any A E e nxn, (a) II lid is a vector norm;
(b) Iyxl S IIxll vllylid for all x, y E en (the generalized Cauchy-
IIAII~ = s~ + si + ... + s~, Schwarz inequality);
where SI, sz, ... , sn are the singular values of A. (c) If Ilxll, = (Li'=llxd,)I/, is a Holder vector norm, then the
corresponding dual norm of x is
Hint. Use Theorem 5.7.2 and Exercise 10.4.7.
2. Let Peen xn be an orthogonal projector on 9' c: en. Prove that
IIx - PxllE S Ilx - Qxll E
for any projector Q on 9' and any x E en. Interpret the result where (lfq) + (lip)
II xll ll = (
n
.L
= 1
ll
Ixd
)1/ 11
, .

= 1. By use of part (b) for these norms, obtain the


I
geometrically (in terms ofa distance) in the case of R3 and dim 9' = 1. Holder inequality.
3. LetA = HI + iH 2,Hf = H1>H1 = H 2 8. If A = lib'" and II II is the matrix norm induced by a vector norm II Ilv, I
prove that IIAII = lIall v llbll d, where II lid is defined in Eq. (1). :1
(a) show that IIAII~ = IIHII1~ + IIHzlli ;
(b) For any Hermitian matrix He e nxn, show that !J. If x :F 0, prove that there is aye en such that y"'x = IIxll YIIYlld = 1.
10. Prove that if II IIdd is the dual norm of II lid, then IIxlldd = IIxll v for all
I
IIA - HillE S IIA - HIIE x e en. The norm II lid is defined in Eq. (1).
and
U. Let II II be a norm in en x n (not necessarily a matrix norm). Define
IIA - iHzlI s S IIA - iHilE'
Interpret the result geometrically. N(A) = sup
o .. x"cn. n
IIAXII
IIXII
1
4. If A = HU is the polar decomposition of A, prove that IIAIIE = IIHlIs'
Indicate the corresponding property of complex numbers. and prove that -I
5. Define a vector norm II lIy by Eq. (10.4.4) and prove that IIxll y;;:: lilTxl I
(a) N is a matrix norm; -I
Also, if II III is the matrix norm induced by II Ilv' then IIAII I S IIAII for >I

(b) If II II is a matrix norm, then N(A) S IIAII for all A e C,x n; i


all A e e nxn, where 1/ 1/ is the matrix norm in Eq. (10.4.4).
(c) N(A) = IIAII for all Ae en l< n if and only if /I II is a matrix norm
Hint. Start with 1/ (,)'IIT)(mT) 1/ (y :F 0) to obtain the first result. and 11111 = 1.
u
"
12. If II II is a norm in C"'" (not necessarily a matrix norm), then there


exists a number e > 0 such that the function
CHAPTER 11
M(A) = ellAII,
is a matrix norm.
13. Use Exercise 1.8.5 to show that if, for any matrix norm IIXII s M and
II YII S M (where X, Ye CnlC "), then f?r r ..;. 1,2, ... , Perturbation Theory
IIxr - rll s rM"-lllX - YII
14. (a) Let.1 be a Jordan block of size p with eigenvalue l. Let Jl =~A)
and f(t) = t,-ltl". Using Theorem 9.4.4 show that there are positive
constants k l , k 2 , and to such that
kt!(t) S IleJ111. S k2!(t), t ~ to
(b) Let A e C"'"',1t = maxAsO'CA) 9f.e(l)and letp be the largest ~~ex
of any eigenvalue for which Jl = 9t.e(A). Show that there are positive
The greater part ofthis chapter is devoted to investigations of the behavior
constants KI , K2, and to such that
of matrix eigenvalues under. perturbations of the elements of the matrix.
Kt!(t) S """'". s Kd(t), t ~ to. Before starting on this subject it is instructive (and particularly important
(c) Use the result of part (b) to show that for the numerical analyst) to examine the effects of perturbations of A and
(i) ileA' II. -to 0 as t -to 00 if and only if II. < O. b on a solution x of the linear equation Ax = b.
(ii) If It S 0 and p = 1, then lleAfll. s; K 2 for aU t ~ to'
(iii) If Jl < 0 there are positive constants K and a such that
ileA' II. s Ke- ott for all t ~ O.
15. Let Fe C""" and 11.1 Perturbations in the Solution of Linear Equations

A= [;. ~l
Given A E C""", consider the equation Ax = b (x, be C") and assume that
Prove that A is positive definite if and only if IIFII.< 1. det A :F 0, so that x is the unique solution of the equation.
In general terms, a problem is said to be stable, or well conditioned, if
Hint. Use the factorization
"small" causes (perturbations in A or b) giverise to "small" effects(perturba-

[:. ~] = [;. ~][~ 1_0F.F][~ ~] tions in x). The degree of smallness of the quantities in question is usually
measured in relation to their magnitudes in the unperturbed state. Thus if
x + J1 is the.solution of the problem after the perturbation, we measure the
together with Exercise 10.4.5. magnitude ofthe perturbation in the solution vector by 1IJ1l1vl/lx/lv for some
16. LetA e C"xll,B e Orand U e 0"" with orthononnal columns, i.e. U*U Vectornorm /l /lv' We call this quotient the relativeperturbation of Jr.
=
=I,.. If A =UBU* show that 11A1I. IIBII. Before investigating the problem with a general coefficient matrix A, it
is very useful to examine the prototype problem of variations in the coeffi-
cients of the equation x = b. Thus, we are led to investigate solutions of
(1 + M)~ = b, where M is "small" compared to 1. The departure of ~
from Jr will obviously depend on the nature of (1 + M) -1, if this inverse

exists. In this connection we have one of the most useful results of linear analysis:

Theorem 1. If $\| \ \|$ denotes any matrix norm for which $\|I\| = 1$ and if $\|M\| < 1$, then $(I + M)^{-1}$ exists,
$$(I + M)^{-1} = I - M + M^2 - \cdots,$$
and
$$\|(I + M)^{-1}\| \le \frac{1}{1 - \|M\|}.$$

Note that in view of Theorem 10.3.1, the hypotheses of the theorem can be replaced by a condition on the spectral radius: $\mu_M < 1$.

PROOF. Theorem 10.3.1 implies that for any eigenvalue $\lambda_0$ of $M$, $|\lambda_0| \le \|M\| < 1$. Moreover, the function $f$ defined by $f(\lambda) = (1 + \lambda)^{-1}$ has a power series expansion about $\lambda = 0$ with radius of convergence equal to 1. Hence (see Theorem 9.8.3), $f(M) = (I + M)^{-1}$ exists and
$$(I + M)^{-1} = I - M + M^2 - \cdots.$$
If we write $S_p = I - M + M^2 - \cdots \pm M^{p-1}$ for the $p$th partial sum and note that $\|M^k\| \le \|M\|^k$, we obtain
$$\|S_p\| \le \sum_{k=0}^{p-1} \|M^k\| \le \sum_{k=0}^{p-1} \|M\|^k = \frac{1 - \|M\|^p}{1 - \|M\|} \le \frac{1}{1 - \|M\|}.$$
As this bound is independent of $p$, the result follows on taking the limit as $p \to \infty$. ■

Let us return now to our original problem. We have
$$Ax = b,$$
with $\det A \ne 0$, and we suppose that $b$ is changed to $b + k$, that $A$ is changed to $A + F$, and that $x + y$ satisfies
$$(A + F)(x + y) = b + k.$$
We will compute an upper bound for the relative perturbation $\|y\|_v / \|x\|_v$. Subtracting the first equation from the second, we have
$$(A + F)y + Fx = k,$$
or
$$A(I + A^{-1}F)y = k - Fx.$$
We assume now that $\delta = \|A^{-1}\| \, \|F\| < 1$ and that $\|I\| = 1$. Since $\|A^{-1}F\| \le \|A^{-1}\| \, \|F\| < 1$, Theorem 1 implies that $(I + A^{-1}F)^{-1}$ exists, and that $\|(I + A^{-1}F)^{-1}\| \le (1 - \delta)^{-1}$. Thus
$$y = (I + A^{-1}F)^{-1} A^{-1} k - (I + A^{-1}F)^{-1} A^{-1} F x,$$
and, if $\| \ \|_v$ is any vector norm compatible with $\| \ \|$,
$$\|y\|_v \le \frac{\|A^{-1}\|}{1 - \delta} \|k\|_v + \frac{\delta}{1 - \delta} \|x\|_v.$$
To obtain a convenient expression for our measure, $\|y\|_v / \|x\|_v$, of the magnitude of the perturbation, we use the fact that $Ax = b$ implies $\|b\|_v \le \|A\| \, \|x\|_v$, hence $1/\|x\|_v \le \|A\| / \|b\|_v$ if $b \ne 0$. Thus
$$\frac{\|y\|_v}{\|x\|_v} \le \frac{\|A\| \, \|A^{-1}\|}{1 - \delta} \, \frac{\|k\|_v}{\|b\|_v} + \frac{\delta}{1 - \delta}.$$
The first term on the right contains the relative perturbation of $b$, and the last term is independent of the right-hand side of the original equation. We now define $\kappa(A) = \|A\| \, \|A^{-1}\|$ to be the condition number of $A$ (with respect to $\| \ \|$) and note that
$$\delta = \|A^{-1}\| \, \|F\| = \kappa(A) \, \frac{\|F\|}{\|A\|}.$$
Thus, we obtain finally
$$\frac{\|y\|_v}{\|x\|_v} \le \frac{\kappa(A)}{1 - \kappa(A)\|F\|/\|A\|} \Big( \frac{\|k\|_v}{\|b\|_v} + \frac{\|F\|}{\|A\|} \Big). \tag{2}$$
This gives us an upper bound for the relative perturbation of $x$ in terms of the relative perturbations of $b$ and $A$ and the condition number $\kappa(A)$. In particular, the decisive part played by the condition number should be noted. This is the case whether perturbations occur in $b$ alone, in $A$ alone, or in $b$ and $A$ simultaneously. The condition number will also play an important part in the investigation of perturbations of eigenvalues (Section 2).

Exercise 1. Prove that if $\|A^{-1}\| \, \|F\| < 1$, then $\|F\| < \|A\|$.

Exercise 2. If $A = \operatorname{diag}[1, 2, 3, \ldots, 10]$ and $\|b\|_{\infty} = 1$, and it is known only that the elements of $F$ and $k$ are bounded in absolute value by $\varepsilon$, where $\varepsilon < 0.1$, then
$$\frac{\|y\|_{\infty}}{\|x\|_{\infty}} \le \frac{20\varepsilon}{1 - 10\varepsilon}.$$
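Inequality (2) is easy to explore numerically. The following is a minimal NumPy sketch (not part of the original text); it uses the diagonal matrix of Exercise 2 with random perturbations and compares the observed relative perturbation of the solution with the bound (2), everything being measured in the infinity norm.

    import numpy as np

    def norm(M):
        # induced infinity norm for matrices, max-coordinate norm for vectors
        return np.linalg.norm(M, np.inf)

    rng = np.random.default_rng(0)
    A = np.diag(np.arange(1.0, 11.0))                  # the matrix of Exercise 2
    b = rng.uniform(-1.0, 1.0, 10)
    b = b / np.max(np.abs(b))                          # ||b||_inf = 1
    eps = 0.01
    F = rng.uniform(-eps, eps, (10, 10))               # |f_ij| <= eps
    k = rng.uniform(-eps, eps, 10)                     # |k_i| <= eps

    x = np.linalg.solve(A, b)
    y = np.linalg.solve(A + F, b + k) - x              # the perturbation of the solution

    kappa = norm(A) * norm(np.linalg.inv(A))           # condition number, here 10
    bound = kappa / (1 - kappa * norm(F) / norm(A)) * (norm(k) / norm(b) + norm(F) / norm(A))
    print(norm(y) / norm(x), "<=", bound)              # the bound of Eq. (2)

For this matrix $\kappa_{\infty}(A) = 10$, and with $\|F\|_{\infty} \le 10\varepsilon$ and $\|k\|_{\infty} \le \varepsilon$ the right-hand side of (2) reduces to the bound $20\varepsilon/(1 - 10\varepsilon)$ quoted in Exercise 2.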
(This is a "worst case" in which the general error bound or (2) is actually
attained. Note the result of Ex. 10.4.8) 0

11.2 Perturbations of the Eigenvalues of a Simple Matrix

/lYIIe., 208 We now take up the question of the dependence of the eigenvalues of a
IIxll ao = 1 + 8' simplematrix A on the elementsofA. The resultsofthis section are essentially
due to F. L. Bauer and A. S. Householder."
Exercise 3. Prove that for any A e en IC nand any matrix norm, the condition
number 1C(A) satisfiesthe inequality I<:(A) ~ 1. Let A be a simple matrix with eigenvalues ..t l , l2"'" A". Then if D ==
diag[ll" .. ,lnJ, there is a nonsingular matrix P such that A == PDP-I.
Exercise 4. If I<:(A) is the spectralconditionnumber of A, that is,the condition In Theorem 10.5.1 we have been able to characterize a class of matrix norms
number obtained using the spectral norm, and Sl and Sn are the largest and with favorable properties when applied to diagonal matrices, and so we
smallest singular values of A, respectively, prove that r<:(A) = S Ilsn' are led to the idea of using IIDII to estimate IIAII for such a norm. Thus, if
Exercise S. If I<:(A) is the spectral condition number ofa normal nonsingular
n II is a matrix norm induced by an absolute vector norm, the equation A =
matrix A E C"xn, and II and ..tnare (in absolute value)the largestand smallest
eor: and Theorem 10.5.1 imply that
eigenvalues of A, respectively, prove that I<:(A) == l..tll/l A" I. /lAII S r<:(P) max Il;/,
1 sisn
Exercise 6. Prove that the spectral condition number 1C(A) is equal to 1
if and only if cxA is a unitary matrix for some IX O. where 1C(P) = IIPII lip-III is the condition number of P with respect to. II II,
as defined in Section 1. However, as Exercise 6.5.4 shows, the matrix P is
Exercise 7. If the linear equation Ax = b, det A 0, is replaced by the not uniquely definedby the equation A = PDP-I. In particular, A = QDQ-l
equivalent equation Bx = A*b,whereB == A*A, prove thatthismay be done for any matrix Q= PDt> where DJ is a nonsingular diagonal matrix. We
at the expense of the condition number; that is, that I<:(B) ~ I<:(A). therefore obtain the best possible bound,
Exercise 8. Use Theorem 1 to prove that M' -+ 0 as p -+ 00. More IIAII S v(A) max IlA,
precisely, prove that M' -+ 0 as p -+ 00 if and only if the spectral radius of 1 sis"
M is less than 1. if we define v(A) = infp r<:(P), the infimum being taken with respect to all
Exercise 9. If A E C"xn satisfies the condition IIAII' < k for some matrix
matrices P for which p- 1AP is diagonal. This result may be viewed as a
norm II II, all nonnegative integers p, and some constant k independent of p, lower bound for the spectral radius of A; coupled with Theorem 10.3.1, it
show that A also satisfies the resolvent condition: for all complex z with yieldsthe first half of the following result.
Iz I > 1, the matrix zl - A is nonsingular and Theorem 1. If A is a simple matrix, " II is a matrix norm induced by an
absolute vectornorm, and v(A) = infp 1C(P), where p- 1AP is diagonal, then
-1 k
lI(zl - A) /I S 1%1 _ I'
Exercise 10. Let A E e nIC n be an invertible normal matrix with eigenvalues
Am"" and ..t".ln of largest and smallest absolute values, respectively. Let and (1)
All = lmaxll, Au = ..t".lnU with 111111 == /lull == I in the euclidean norm. Write . leA) s min Iljl s v(A)I(A),
b = II, k = 8U, with 8 > O. Show that if Ax == band A(x + J1) = b +k, 1 sisn

[cf. Eq. (1)], then where 'eA) is defined in Eq. (10.5.1).


IlylI = BI<:(A)
t Numerische Math. 3 (1961), 241-246.
IIxll .

\,
PROOF. The first inequality in (1) is just the result of Exercise 10.S.2. For where x = P-Iy :1= O. Thus,
the second inequality, the equation A = PDp-1 and Corollaries 10.S.1 and (p.l - D)x = p-l BPx (2)
10.5.2 imply that
and hence
I(A) ~ I(P)I(P-I) min IAJI. 1
I~J~n
l(p.l - D) S; 1I(p.1 - D)xll v = II P - BPx llv S; IIP-IBPII S; rc(P) IIBII.
Theorem 10.5.2 then gives IIxllv IIxliv

I(A) ~ (rc(p-l min IAJI, Therefore we have


I~J~n
min Ip. - AJI S; IC(P) IIBII.
and the second inequality in(l) follows. _ IsiSn

For a general simple matrix A this theorem is of doubtful computational Sincethis is true for every P diagonalizing A, we have
valuebecausethe quantity lI(A)is so difficult to estimate.However, weshould min Ip. - Ajl S; v(A) IIBII = r,
observe that pp-l = I implies that rc(P) = IIPII lip-III ~ 1 (see also Exer- 1 ~J~n
cise 11.1.3), and hence thatv(A) ~ l.
If A is a normal matrix then we know that P may be chosento be a unitary
Hence, p.lies in at least one of the disks IZ - AJI S; r, j = 1, 2, ... , n.
matrix. If, in addition, we use the euclidean vector norm, and hence the Qxercise 1. Show that the eigenvalues in Exercise 10.6.1 lie in the disks
spectral matrix norm II II, then 11P1i = lip-III = 1. Thus, for a normal with radius 2 and centers at 3 .JW,
5i, and - 2. .
matrix and spectral norm we have lI(A) = 1 and the theorem reducesto the
known cases: Hint. Use Exercise lOAH to estimate IIBII. 0
If the perturbation matrix B of Theorem 2 is also known to be simple,
IIAII = max IAJI, I(A) = min IJ1J1. then the radius of the disks obtained in that theorem can be reformulated,
l~J~n l~J~n
as indicated in the next theorem.
Our next theorem demonstrates the significance of v(A) as a condition
number for A reflecting the stability of the eigenvalues under perturbation Theorem ~. If, in addition to the hypotheses of Theorem 2, the matrix Bis
of the matrix elements. . simple with eigenvalues P.h , 1J.n, then p.liesin at leastone ofthe disks
Theorem 1. Let A,B e cnlC n, with A simple. IfA has eigenvalues AI'A2 , . , A,. {z:lz - AJI S; v(A, B) max Ip.JI}, j = 1,2, ... , n,
andp. isaneigenvalue ofA + B, and iffor a matrixnorminduced by anabsolute J
vectornorm II IIv we have where
r = lIBlIlI(A), v(A, B) = inC K(P-IQ)
P.Q
then p. lies in at least one ofthe disks
andP and Q diagonalize A and B, respectively.
{z:lz - AJI S; r}, j = 1,2, ... , n,
PROOF. We may now write Eq. (2) in the form
ofthe complexz-plane.
(p.l - D)x = (P-IQ)D1(P-1Q)-lX,
PROOF. Since p. is an eigenvalue of A + B, there is a 1 :1= 0 for which
(A + B)y = p.y.
Where D1 = diag(p.h ... , p.n]. Hence
If A = PDP-I, where D = diag[A1 , , An], then l(p.l - D) S;K(P-lQ) IIDl " ,
(D + p- l BP)x = p.x, and the result follows from Theorem lO.5.1.
We can easily obtain a stronger form of this theorem. Observe that P will Exercise 4. LetA elR""",and let Ph"" J.I" betheeigenvaluesof-!(A + AT).
diagonalize A if and only if P win diagonalize al + A for any complex If k is the spectral radius of!<A - AT)and
number , and similarly for Q, B, and - a.I + B. Since
min Ipj - Pkl ~ 2k,
(al + A) + (-IX! + B) = A + B, jk

we can now apply Theorem 3 to al + A and - a.l + B. prove that the eigenvalues of A are real.
Exercise 5. Suppose that A, Be R/I"/I, AT = A, and Ibjkl S b for j, k =
Corollary 1. Under the hypotheses of Theorem 3, the eigenvalue P liesin at
1,2, ... , n. If PI' ... , P.. are the eigenvalues of A and
least oneof the disks
min Ill; - Ilkl > 2nb,
{z:lz - (IX + 4J)1 ~ v(A, B) max IPJ - IX!}, j = 1,2, ... , n, jk
ISjS/l
prove that the eigenvalues of A + B are real.
for anyfixed complex number IX.
Exercise 6. Let A and A + Eben x n matrices with eigenvalues
The advantage of this result lies in the fact that IX may now be chosen to AI, A;l' ... ,.iI.,. and Ill' J.l;l"'" /A.., respectively. Show that the eigenvalues
minimize maxJIPJ - atl and hence minimize the area of the disks containing can be ordered in such a way that
the eigenvalues.
The case in which A and B are normal is important enough to warrant IAJ - /AJ I S; /lE/I.,
a separate statement. for j = 1, 2, ... , n. 0

Corollary 2. If. in the hypotheses of Theorem 3, the matrices A and B(/J'e


normal, thenthe eigenvalues ofA + B arecontained in the disks
11.3 Analytic Perturbations
{z: Iz- (IX
.
+ 4J)1 S maxlpj - IX!},
J
j == 1,2, ... , n,
We now take quite a different approach to the perturbation problem.
for anyfixed complex number IX. Suppose that A e e""" is the unperturbed matrix and has eigenvalues
PROOF. Observe that P and Q may be supposed to be unitary. This implies AI' ... , 4/1' We then suppose that A(C)is an n x n matrix whose elements are
that P- 1 Q is unitary and hence, if we use the spectral matrix norm, v(A, B) = analytic functions of { in a neighborhood of (o and that A({o) = A. For
infp,a rc(P-lQ) = 1. The result now follows from the first corollary. . convenience, and without loss of generality, we may suppose that {o = 0,
so A(O) = A. We write 41({)""'.iI.,.(O for the eigenvalues of A(O and then
Exercise 2. If A is a normal matrix with eigenvalues AI' A;l,... ,.iI.,., and (Section 10.6), since the eigenvalues of A(O depend continuously on the
B has all its elements equal to 6, prove that the eigenvalues of A + B Iiein elements of A({), we may suppose that AtO ..... 4J as { -+ 0 for j = 1,2, ... , n.
the disks The following exercise illustrates the fact that, though the eigenvalues
depend continuously on the matrix elements, they may nevertheless vary
{Z+-(~+4J)IS~181}, j=I,2, ... ,n, quite rapidly in pathological circumstances.
Exercise 1. Consider the 10 x 10 matrix A({) defined as a function of the
Exercise J. If A is a normal matrix and B = diag[b 1, , b/ll is real, prove real variable Cby
that the eigenvalues of A + B lie in the disks
{z:lz - (ae + S 4J)1 Pl,j = 1,2, ... , n,
where AI' A;l,"" A/I are the eigenvalues of A and
2ac = max bJ + min bJ and 2P = max bJ - min bi'
1 sJ$/I 1 $J$/I 1 $J$/I 1 $jS/l
Prove that the eigenvalues at , - 0 and those at C= 10- 10 differin absolute Nevertheless, we shall approach the perturbation problem by analysis
value by 0.1. of the spaces generated by eigenvectors, insofar as these are determined by
Note that the difference between A(O) and A(tO-tO) is so small that they the component matrices of A(O (see Section 9.5). The reason for this is that,
may not be distinguishable on a computer with too sbort a word length.: 0 first, the resolvent R~(e) = (zI - A(O)-l is expressed directly in terms of
What can we say about the nature of the functions AJ<O for sufficiently A(O and. second, by means ofEq. (9.9.4)we can then examine the component
smalll'l? We shall see that in certain important special cases. the functions matrices of A(O.In particular, we shall beable to infer properties of vectors in
the range of a component matrix.
AJ<C) are also analytic in a neighborhood of C= O. That this is not generally
the case is illustrated by the next example.
Example 2. The eigenvalues of the matrix

    A(ζ) = [ 0   ζ ;  ζ²   0 ]

11.4  Perturbation of the Component Matrices

are the zeros of the polynomial


    det(λI − A(ζ)) = λ² − ζ³,
Suppose that an unperturbed eigenvalue A "splits" into distinct eigen-
values A1({) A,,(C) under the perturbation determined by A(O. We can
I
T"
1

that is, λ_1(ζ) = ζ^{3/2} (defined by a particular branch of the function ζ^{1/2}) and λ_2(ζ) = −λ_1(ζ). Thus, we see that the eigenvalues are certainly not analytic in a neighbourhood of ζ = 0.  □
prove that the sum of the first component matrices for each of these eigenvalues is analytic near ζ = 0. This theorem is the peg on which we hang all our subsequent analysis of the perturbation problem.
It should be noted that det(,u - A(C is a polynomial in A whose coeffi- Theorem 1. Let A() be analytic in a neighborhood of C= 0 and A(O) = A.
cients are analytic functions of {. It then follows from the theory of algebraic
functions that
Suppose that A1 m, ... , Ap<O are the distinct eigenvalues of A({) for which ,t
.A,,(C) -+ Aas C-+ 0,k = 1,2, ... , p, andlet Z"o(Obe the idempotent component
(a) If AJ is an unrepeated eigenvalue of A. then At.{) is analytic in a matrix of A(O associated ,with A,,(O. 1/ Z is the idempotent component matrix
e
neighbourhood of = 0; of A associated with A and we define
(b) If AJ has algebraic multiplicity m and A,,<O -+ AJ for k = IX...... ",j
e
then A,,(C) is an analytic function of t / I in a neighborhood of C = 0. where
I ~ m and e1/1 is one of the I branches of the function {lfl. Thus A,,<O hase
power-series expansion in C 1 / I . ,
thenthere is a neighborhood of{ = 0 in whichthe/ollowing developments hold:
We shall prove result (a) together with a result concerning the associatee!
eigenvectors. Note that. in result (b). we may have I = I and m > 1. in which + CC 1 + {2C 2 + "',
y(c) = z
case A",(C) could be analytic. As might be expected it is case (b) with m > ~.
A(C)Y(C) = AZ + CB 1 + eB 2 + "'.
that gives rise to difficulties in the analysis. The expansions in fraction~
powers of {described in (b) are known as Puiseux series. and their validity for matrices Cr. B, independent of(. r = 1. 2....
wi1l be assumed in the discussions of Section 1 1 . 7 . ,
It is when we try to follow the behavior of eigenvectors that the situatioit' PROOF. Consider a fixed C.;. 0 for which 'CI is small enough to admit the
becomes more complicated. The matrix construction of a simple closed contour L that encircles A, Atm... , A,,({)
and no other eigenvalues of A or of A(C}. Then writing R~ = (zI - A}-l . I

A({) = [~ ~] and R~({) = (zI - A(C- 1, we have

1
f R~dz,
serves to show that even though the eigenvalues may be analytic. the eigen- 1.
JfL R,,(C) dz.
Z= 2 Y(C) = 2 . (1)
vectors need not be continuous at { = o. 1tI JL 1tI

We note that the elements of Rz<O are rational expressions in functions Partial fraction expansions now give
of Cthat are analytic in a neighborhood of C:: 0 for z on the contour L.
Thus, there exist matrices Rfl, r == 1,2, ... , independent of Csuch that 1 [z _1{312 + Z +1{3/2
Rz(C) == R z + (R~l) + CZR~Z) + .... Rz(C) :: 2" . {1 /2 ~ (1/2

Using (1) we have only to integrate this series term by term around the
Z - C3/2 Z + {3 /2
contour to obtain the first result ofthe theorem. This is permissible provided Using Eq. (9.9.4) we obtain
only that each coefficient Rrl (r == 0, I, 2, ...) is integrable on L. Using
Exercise 9.9.3 we see that, for a fixed z, Z10(O
_[1
-! (1/2
C1
/ ]
1 2

and Z20(O
1 1/2
=! [ _(1/2 _C1 ] .
~ence Y({) = ZIO(C) + Zzo({) = I. This example emphasizes the fact that
and hence that, for each r, the matrix R~) is a linear combination of matrices although Y(C) may be analytic in C, the component matrices (of which it
Rz(A(P)R.)' for certain integers p, s with p S r. Now, AIPl(O) is independent is the sum) need not be analytic. Indeed, in this example the component
of z and Rz is certainly continuous in z on L. Hence R~l is continuous on L. . matri~es have a singularity at , = O. It should also be noted that although
Integrating the series for Rz(O around L, we now obtain . the eigenvalues are not analytic in a neighborhood containing C= 0,
~et they are "better behaved" than the component matrices.
Y(C) == Z + CC I + CZCz + ... , Exercise 2. If the matrix
where 0{ 0]
C
r
== _1_
2ni
i L
R(rldz
,
r = 1,2, ....
A(C) ==
[, 0 1
0 0 ,

has eigenvalues Al(C), Az(), and A3({) and Ai(C) -+ 0 as { -+ 0, j = 1, 2, show


Now both Y({) and A({) are analytic on a neighborhood of { = O. It that we may write
follows that their product A(OY(C) is also analYtic there and has a Tayll:it
expansion given by the product of the expansions for A({) and Y<o, so the liC) = (-1).1+ 1~3/2 - H 3 + .. " j = I, 2.
theorem is proved. If ..t = 0 is the unperturbed eigenvalue under investigation, find Z, ZlO({),
A similar theorem can be proved showing that the continuity of A(C) at. Zzo(C), and Y({), and verify that Y({) is analytic in a neighbourhood of C = O.
C= 0 implies that of Y(C) at ( == O.
Exercise 3. Consider eigenvectors elf e2' e3 (unit coordinate vectors) of
Now we show that the individual first component matrices do not neces-: A(O) in Exercise 2. Consider the possibility of extending these, as eigenvectors
sarily have the property of being analytic near C= O. of A({), into a neighbourhood of C= O. Show that e3 can be extended
Exercise 1. Find the matrices Z, Z10(O, Z20({)' and Y({) for the matrix analytically, el has two continuous extensions, and e2 has no such extension.
o
A(C) = [~ ~l
11.5 Perturbation of an Unrepeated Eigenvalue
SOLUTION. At C= 0, the matrix A is the zero-matrix and it is easily seen tha~
Z ~ I. The eigenvalues of A({) are ItI(C) ~,3/Z and Itz({) = -Itl({)' W~
then have " We are now in a position to prove a perturbation theorem of the greatest

R,,(C) = [
Z
_ ,2
_{]-1 ==
Z Z2 _
1
C3
(z {J
{2 Z
practical importance. The notation is that developed in the two preceding
sections.
, ,
11

Theorem 1. If .I. is an eigenvalue of A of multiplicity 1, then,for sufficiently


and since Y(C) -+ xyT as' -+ 0, it follows that (yTx(C))(yT(Q,r) -+ 1 as C-+ O.
Thus we may suppose that yTX<C) and yT({) are nonzero for small enough
small lei. there is an eigenvalue 1(C) ofA(O such that
lei. Now we can define x(O by normalizing in such a way that
    λ(ζ) = λ + ζλ^{(1)} + ζ²λ^{(2)} + ⋯ .    (1)
yTX(') = 1. (5)
Also,there areright and left eigenvectors X<C) and)'(0, respectively, associated Since Y({)x = X(C)(yT(C)X), we have yTy({)X = yT({)X. Hence Y(C) =
with 4({)for which X(OyT(O implies
1
and
xC,) = yTy(()X y(C)x.
    x(ζ) = x + ζx^{(1)} + ζ²x^{(2)} + ⋯ ,    (2)        Since Y(ζ) is analytic in a neighborhood of ζ = 0 and yᵀY(0)x = 1, it
    y(ζ) = y + ζy^{(1)} + ζ²y^{(2)} + ⋯ .    (3)        follows that x(ζ) is analytic in a neighborhood of ζ = 0.
follows that x<{) is analytic in a neighborhood of' = O.
Finally, Eq. (5) also implies xT(C)JI = 1, so the relation JI)xT(C) = yT(C)
PROOF. We suppose throughout that ICI is so small that
.t({) has multi- implies y") = yT(OJI, which is also analytic in a neighborhood of = o. e
plicity 1. It then follows that yeo
= Z10(O, a component matrix. Further-
more (see Theorem 4.10.3) we may write We have now proved the existence of power series expansions in efor an

(4) unrepeated eigenvalue and its eigenvectors. The next step in the analysis is
the discussion of methods for computing the coefficients of these expansions.
where
yTX = yT(C)x(C) = 1.

In this case we obviously have 11.6 Evaluation ()f the Perturbation Coefficients
A(C)Y(O = .I.(C)Y(C),
so premultiplying by yT and postmultiplying byx we obtain We shall firstdiscuss a particular problem that will arise repeatedly later
on. If A is an eigenvalue ofthe matrix A e en"" and II oF 0, we want to solve
    λ(ζ) = (yᵀA(ζ)Y(ζ)x) / (yᵀY(ζ)x) = (1 + ζb_1 + ζ²b_2 + ⋯) / (1 + ζc_1 + ζ²c_2 + ⋯)
(AI - A)x = b. (1)
after making use of Theorem 11.4.1. The coefficients on the right are givenby Given that a solution exists (i.e., b e Im(U - A, we aim to express a
r = 1,2, .... solution x of (1) in terms ofthe spectral properties of A. If A1 , 1 2 , , 1. are the
distinct eigenvalues of A and 1 = A1 we first define the matrix
It follows that 1(0 is analytic in a neighbourhood of' = 0, which is the first m,,-l 'f
part of the theorem.
It is clear that, in the statement of the theorem, x(e) is not definedto within
E = A~2 j?;o (A -\,)1+ 1. Zltj, (2)
a nonzero multiplicative constant (which may vary with We define x(O n just as in Eq. (9.6.2) but replacing .1. 1 anp E 1 by.l.and E. MUltiply Eq. (1) on the
more preciselyby noting first of all that there is no ambiguity in the definition left by E and use Theorem 9.6.2 to obtain
of y(o. We then choose the eigenvectors x, y with yTx = 1, and hence
(I - Z)x = Eb. (3)
xyT = Z.
Then Y({) = X(C)yT(O implies (With the convention of Sections 4 and 5 we write Z for Z10') Since ZE = 0

~
yTy(OX = (yTX(O)(yT({)X), it is easily seen that the vector x = Ell satisfies Eq. (3), and we would like

~
..

t,

to assert that it is also a solution of Eq. (1). However, we need an additional Thus, we complete the solution for the first-order perturbation coeffi-
hypothesis to guarantee this. cients as follows: .
Proposition 1. If A is an eigenvalue of A ofindex 1, then Eqs. (1) and (3) have A(1) = yTA(1)x, xl 1) = EA(1)x (8)
the same solution set 9'. Furthermore, x = Eb is a common solution (when
9' oF 0).
PROOF. In view of the argument above, we have only to show that the solu-
,r
The argument can easily be extended to admit the recursive calculation
of Air) and x lr) for any r. The coefficients of in Eq. (4) yield

tion sets coincide. But 1 - Z = E(M - A) and so, using Exercise 3.10.8,
Axlr) + A(1)xlr- 1) + ... + Alr)x = AX(r) + A(1)Xlr- 1) + ... + A(r)x,
we need to check that rank(1 - Z) = rank(M - A). To do this, observe or
that I - Z and M - A have the same kernel, namely, the eigenspace of A
associated with A.
~",. _ A)xlr) = (All) _ A(1)I)x(r-l) + ... + (Air, - A(r)l)x.

Suppose now that the hypotheses of Theorem 11.5.1 are fulfilled. Then It
If we nOW!!lune that A(1), . , Alr-l) and X(l), .. , X lr - 1) are known, then
multiplyiDI',Eln the left by yT gives
has multiplicity 1 and for small enough 1'1, we use the results ofthat theorem
to write .ttr) = yTAI.IX + y TA(r-lI xll ) + ... + yTA ll )X lr-l), (9)
(A e
+ {AU) + A(2) + ... )(x + (x(1) + (2x (2) + ...) for r = 1,2, ... , and x(O) = x. Proposition 1 then yields the result [
[\
= (A + 'AU) + .. )(x + 'x(1) + ...). (4) xlr, = E[(AI1) ._ A(1)I)Xlr- 1) + ... + (A(r-l) _ .t(r-l)l)x(1) + AI)x]. (10) ~
'II
We can equate the coefficients of obtained by multiplying these series in many applications the perturbations of A are linear in ,. Thus we have
I'
I:
for n = 0, 1,2, .. , . The coefficient of {O gives the equation for the unper-
turbed eigenvalue: Ax = h. The coefficient of' gives
    (λI − A)x^{(1)} = (A^{(1)} − λ^{(1)}I)x.
'
A:) = A + (B for a matrix B independent of ,. In this case we have A(1) =
B u:ld A(2) = A( 3 ) = .,. = O. Equations (9) and (10) now reduce to

11r) = yTBxlr-1), xlr) = E(Bx(r-l) _ r}'! AlJlxlr-il).


I
~

Since y (of Eq. 11.5.4) is a left eigenvector of A, we deduce that yT(All) ...., J"'l
A(1)1)x = O. Since yTx = 1, this condition yields for! = 1,2, .... Recall that in these formulas the superscripts do not denote
    λ^{(1)} = yᵀA^{(1)}x.

We also know that, with our definition of x<O, the relation (11.5.5) gives us E:x'ercise 1. Show that Eq. (9) in the case r = 2 may be written
yT(X + 'X(1) + ,2x(2) + ...) = 1, A(2 ) = tr(A(2)Z10) + tr(A(1)EAl1)Z10)' 0
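For a perturbation that is linear in ζ, say A(ζ) = A + ζB, the coefficient λ^{(1)} = yᵀBx can be checked against a finite difference of the computed eigenvalue. A minimal numpy sketch (the matrices A and B below are arbitrary test data, not taken from the text):

    import numpy as np
    rng = np.random.default_rng(0)

    n = 5
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n))

    # right and left eigenvectors of a (generically simple) eigenvalue of A,
    # normalized so that y^T x = 1
    w, V = np.linalg.eig(A)
    u, W = np.linalg.eig(A.T)
    k = 0
    x = V[:, k]
    y = W[:, np.argmin(np.abs(u - w[k]))]
    y = y / (y @ x)

    lam1 = y @ B @ x                      # predicted first-order coefficient
    eps = 1e-6
    wp = np.linalg.eigvals(A + eps * B)
    lam_eps = wp[np.argmin(np.abs(wp - w[k]))]
    print(lam1, (lam_eps - w[k]) / eps)   # the two values agree to several digits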

and this implies that


r = 1,2, ....
11.7 Perturbation of a Multiple Eigenvalue
To obtain an expression for x(1) from Eq. (5), we use Proposition 1. Thus,
we know that x(1) = E(A(1) - A(1)1)x is a solution. Since Z = xyT and
EZ "" 0, it follows that Ex = EZx = 0, so the expression for xU) simplifies We can now consider the problem of analytic perturbations of an un-
to give x(1) = EA(1)x. " repeated eigenvalue to be solved. In particular, the result (a) of Section 11.3
We should confirm that this choice of 'solution is consistent with (7), has been established. If .tis an iegenvalue of A of multiplicity m > I, then we
that is, that yTx(1) = O. Since yTZ = yT we have must return to the Puiseux series representation of a perturbed eigenvalue
yTx(1) = yTZEA(1)x. ..t(O that tends to A as , -+ 0 (result (b) of Section 11.3). In Section 11.4 we
Ii gained some further insight into the behavior of the component matrices of i
But then ZE = EZ = 0, so yTxl 1) = 0, as required. the perturbed eigenvalues and saw that although their sum Y(C) is analytic \
'I

for small enough 1'1 (Theorem 11.4.1), the separate component matrices may Thus, in the Puiseux expansion for AJCC> described in result (b) of Section 11.3,
have singularities at C== 0 (Exercise 11.4.1). e
. the coefficients of i ll are zero for j = 1,2, ... , I - 1. .
Suppose now that we impose the further condition that A(,) be Hermitian
and analytic on a neighborhood oN'" of { == {o in the complex plane. Let
aJ"m be an element of A(C) with j :F k, and let z == x + iy with x and y real.
into eigenvalues A1( { ) , ... , Ap(C) for I"
As in Section 11.4, the eigenvalue A or A = A(O) is supposed to "split"
sufficiently small. Also A({) ==
L..'"'=o C"A("), the matrix Z is the component matrix of A, and (C) is the sum
Then define airtx + iy) = u(x, y) + iv(x, y), where u and v are real-valued of component matrices Z"o(O of A(O for 1 ::s; k ::s; p. Thus, (C) .... Z as
functions. ICI-O.
With these definitions, the Cauchy-Riemann equations imply that Lemma I. With the conventions and hypotheses of the previous paragraphs,
au av au av thefunction
ax == ay and ay == - ax' B(O!! C-l(M - A(C))(O (1)
But a,,!.{) == aj,,(C) == u(x, y) - iv(x, y), which is also analytic on .H: Hence
hasa Taylorexpansion about C== 0 oftheform
au aV au av CD
ax == - oy and ay :;::: ax' B(O == - ZA(1)Z + L {"B(").
11=1 I
. ;
These equations together imply that ai"({) is a constant on oN'", and the
parameter dependence has been lost. This means that analytic dependence PROOF. Using Eq. (11.4.1) and Exercise 9.9.6, we may write
in the sense of complex variables is too strong a hypothesis, and for the
analysis of matrices that are both analytic and Hermitian we must confine
attention to a real parameter Cand assume that for eachj and k the real and
Y(C) == 2
1.
m
iL
R,,(C)dz, A(OY(O == 2
1.
m
i
L
zR.(C)dz,

imaginary parts of aJ" are real analytic functions of the real variable C. where, as in the proof of Theorem 11.4.1, the contour L is simple and closed
For this reason the final theorem of this chapter (concerning normal and contains;" A1(C), ... , Ap(C) and no other eigenvalues of A or A(C).
matrices) must be stated in terms of a real parameter C. When applied to Here, Cis fixed and 1'1 is assumed to be small enough to complete this (and
Hermitian matrices this means that the perturbed eigenvalues remain real subsequent) constructions. Using Eq. (11.4.2) it follows that
and the component matrices (as studied in Theorem 11.4.1) will be Hermitian.
It can also be shown that in this case the perturbed eigenvalues are analytic (M - A(C))Y(C) == - ~f (z - A)R({)dz
2m JL
in {, whatever the multiplicity of the unperturbed eigenvalue may be. The
perturbed component matrices are also analytic and hence the eigenvectors
of A(e) can be supposed to be analytic and orthonormal throughout a neigh-
== -
2m
t rR~)dz.
~iL (z -.t) r=O
borhood of { == O. These important results were first proved by F. Rellich
in the early 19308. Rellich's first proof depends primarily on the fact that the Now, YeO) == Z and, since A has index 1, (M - A)Z == O. Hence the coeffi-
perturbed eigenvalues are real. We shall prove a result (Theorem 2) that cient of Co on both sides of the last equation must vanish and we find that
contains these results by means of a more recent technique due to Kato. CD

(See the references in Appendix 3.) B(C) = ,-l(M - A(C(o == L rBlr),


With this preamble, we now return to a perturbation problem, making '=0
no assumption with regard to symmetry of the matrix A(C). Until we make Where, for r == 0, 1, 2, ... ,
some hypothesis concerning the symmetry of A(e>, we retain the assumption:
of dependence on a complex parameter C. The formulation of corresponding .
results for the case ola real variable' is left as an exercise.We first prove dlat
if Ais an eigenvalue of A of index 1 and Ate> is an eigenvalue of Am for which.
A(C) .... A as C.... 0, then, as , .... 0, and is independent of C. This confirms the existence of the Taylor expansion
X~{) == A + IXJC + OO{lI+(1/11). for B(C).
To examine the leading coefficient we first take advantage of Eq. (11.4.3) PROOF." It remains to prove only the statements concerning eigenvectors.
and Exercise 9.5.2 to write But the~e follow ~mmediately on writing down A(C)x;<C) = A;<OX~O and
..compa~mg coefficients of powers of CI/I. .
R~I) = (zI _A)-IAU)(zl - A ) - I "
.N~w it is important to observe that if aj is an eigenvalue of ZA(1)Z
= (z ~ A + E(Z)A(1)(z ~ A + E(Z). WIth I~dex I, th~n the process can be repeated, that is, we can apply the above
analysis to BC{) Instead of A({) and find that in expansions (3), the coefficients
Hence, using Cauchy's theorem of residues, PI' P2, ... , P/-I all vanish. This observation will be exploited in Exercise 3
below.
Bt) = - L
2~i {~~): + (function analytic near = A)} dz. Z Exercise 1. Find ZA(1)Z for the matrix A(C) given in Exercise 11.4.2 and
show that the result of Theorem 1 is consistent with the expansions for A (()
= -ZA(1)Z and A2(O given in that example. . 1

Let xj.O be a right eigenvector of A(C) associated with one of the eigen- Exercise 2. Verify the results of Theorem 1 for the matrix A({) of Exercise
values AJ{C), 1 ~ j S; p. Then Y(C)xj.') =xj.O and we see immediately from ". 11.4.1 and for the matrix .
Eq. (1) that
B(C)xj.') = C 1(A - Aj.CXj.'). (2) [C(C ~ 1) C~ 1J.
Thus, C- - Aj.\J) is an eigenvalue of B(O, say P(o. But, since B(O has a
1(A
Exercise 3. Prove that if aj is an eigenvalue of ZA(l)Z of index 1 then there
Taylor expansion at , = 0, P(C) also has an expansion in fractional powers is a number bJ such that '
of ,. Consequently, for small enough 1'1 and some integer I,
AiC) = A + aJe + bj C1 + O(l{l1+(I/II), (6)
,-I(A - Aj.\J) = Po + PieHl + p2 , Zfl + ... ,
and Po is an eigenvalue of the unperturbed matrix - ZA(l)Z. This implies and that hj is an eigenvalue of Z(A(I)EA(l) + A(2Z with right eigenvector
s: 0
AJ{C} = A - PoC - PI,1+(l(l) - PzC1+(Z(I) _.... (3)
Consider the matrix function B({) of Lemma 1 once more. Since Y(C)
In other words, isjust a function of AC{) (for a fixed C), Am and Y({) commute. Also Y(C)2 =
AP:) = A + aJe + O<lCl1+U/il) YC{) so we can write, instead ofEq. (I), "

as 1(1 .... 0, and aJ is an eigenvalue of ZA(1)Z. BCC) = C-IY(OCll - AC\J)YC{), (7)


We have proved the first part of the following theorem. and this representation has a clear symmetry. In particular, if A and A(l)
Theorem 1. Let A(e) be a matrix that is analytic in , on a neighborhood of happen to be Hermitian, then YCO) = Z and hence B(O) is also Hermitian.
, = 0, and supposeA(O) = A. Let Abe an eigenvalueofA ofindex 1 and multi- Thus, B(O) has all eigenvalues of index one and repetition of the process
plicity m, and let Aj.O be an eigenvalueofA(Ofor which Aj.O) = A. Then there described above is legitimate, thus guaranteeing dependence ofthe perturbed
is a numberaJand a positive integer I ~ m such that . eigenvalues on ( like that described by Eq. (6). Furthermore, it looks as
though we can repeat the process indefinitely, if all the derivatives of A({)
AJ{C) = A + aJC + O<lCl 1 + (1 (/) (4) ate =0 are Hermitian, and we conclude that Aie> has a Taylor expansion
as ,,, .... 0, and aj is an eigenvalueofZAU)Z. about' = O.
Moreover, to each Aj.C) correspondsat least one eigenvector xj.e) such that, But once we bring symmetry into the problem, the considerations of the
for small enough ICI " preamble to this section come into play. Thus, it will now be assumed that
2/1X2 ~ is a real parameter, and that A(C) is normal for every real C in some open
Xj(C) = x + C1flXI + e + ... , Interval containing ( = O. Now it follows that whenever ACe) is normal,
where x, XI' , X/-IE Ker(A - AI) and ajx = ZA(1)Zx. Y({>. and Z are Hermitian (i.e., they are orthogonal projectors), and B()
is also normal. In particular, B(O) = -ZA(1)Z is normal and all eigenvalues Suppose that D(C) has rank r in a deleted neighborhood .AI' of , = 0
of B(O) have index 1. So a second step of the reduction process can be taken (i.e., excluding the point' == 0 itselt). For simplicity, and without loss of
and, in fact, repeated indefinitely. This leads to the following conclusion. generality. assume that the rth-order minor
Theorem 2. Let A(e> be analytic andnormal throughout a realneighborhood D(1 2 rr)
of' = 0; then the eigenvalues of 4.(') are analytic in some real neighborhood 1 2 .
of' = O. is nonzero in .IV. Then expand the minor
Moreover, to each eigenvalue AllJ analytic on a real neighborhood of
, == 0 corresponds at least one eigenvector xJ{C) that is analytic on the same
neighborhood.
D(1 2 ... r r +
1 2 '" r r+l
J)
Exercise 4. Show that if Cis real and A(C) is normal, then B(O is normal. by its last row, and write II' ... /,+ 1 for the cofactors of d,+ 1 1> dr + 1 ,+ i-
Thus .

(1 22 ..... , 1) == J~/'+l.A == O.
Exercise 5. Let A({) be analytic and Hermitian throughout a real neigh-
borhood of C= 0, and suppose that all eigenvalues Al(O)' ... , l,,(O) of A(O) r + ,+ 1
are distinct. Show that there is a real neighborhood .K of' = 0, real functions D1 r +1
Al(C), .. , A,,(C) that are analytic on .AI', and a matrix-valued function U(C) Define ,'+ 2 = '" = /n == 0 (for any , in .IV) and then write the vector I ==
that is analytic on .AI' such that, for Ce .AI; [1 1

I"
'2 ... 'llY'
Obviously, I is a function of Cwith a power series expansion in e1/1 for
small enough. Furthermore, D(C)I(C) == 0, so that I(C) is an eigenvector
I
I
and with the required property. To see this, let Di be the jth row of D(C). Then

DJ.1 == t dill l == D(ll


1=1
2 ... r
2 rr+
j 1)'
(This result also holds without the hypothesis of distinct unperturbed \
eigenvalues.) 0 This is zero for each j because if 1 S j S r we have a determinant with.two i
equal rows, and ifj ~ r + 1 we have a minor of P of order r + 1. I
There are expansions of the form (5) for eigenvectors even when the un-
perturbed eigenvalue has index greater than 1. Indeed, Exercise 6 demon-
strates the existence of a continuous eigenvector. However, even this cannot
Exercise 7 [Rellich), Verify that the matrix A(x) defined for all real x by
~
I
A(O) == 0, A(x) = e-1/XZ fcos(2/x) Sin(2/X)]
be guaranteed if the hypothesis of analytic dependence on the parameter is
removed. This is iIJustrated in Exercise 7.due to F. Rellieh, in which the matrix
lsin(2/x) -eos(2/x)
is infinitely differentiable at C= 0 but not analytic, and it has no eigenvector is continuous (even infinitely differentiable) at x = 0 and that there is no
at C= 0 that can be extended in a continuous way to eigenvectors for non. eigenvector 1l{x) continuous near x = 0 with 1(0) r:/: O. 0
zero values of (.

Exercise 6. Adopt the notation and hypotheses of Theorem I, except thet


the eigenvalue Aof A may have any index m. Show that there is an eigenvector
x/J) of A(C) such that the expansion (5) holds for lei s.mallenough.
SoLUTION. Using statement (b) of Sectionl!.3, we may assume that there
is an eigenvalue AiC) of A(C) that has a power series expansion in powers of .
!
;
,1/1 for 1'1 small enough. Then the matrix D(O = AJ{OI - A(e) has elements
with power series expansions of the same kind. \

\

case in applications and yields beautiful results concerning the eigenvalues


of the product matrix in terms of the factors. The idea of the Kronecker
CHAPTER 12 product arises naturally in group theory and, as a result, has important
applications in particle.physics.
If A- [aI JJr.J=1 e,m""" B - [blj]r.J=1 e,n"n, then the right Kronecker
(or direct, or tensor) product of A and B, written A 0 B, is defined to be the
Linear Matrix Equations partitioned matrix

and Generalized Inverses

              [ a_11 B   a_12 B   ...   a_1m B ]
    A ⊗ B  =  [ a_21 B   a_22 B   ...   a_2m B ]
              [   ...       ...            ... ]
              [ a_m1 B   a_m2 B   ...   a_mm B ]
For example, if

then
In this chapter we consider the general linear matrix equation
a ll bll allbu aubu aUb 12 ]
A 1X B 1 + A 2 X B 2 + .,. + ApXB p - C, A B _ Q U b21 aU b22 a12 b21 a12 b22
for the unknown matrix X; we will also consider its particular cases, [ a;ubu a21b U a 22 b ll a2Z b12'
especially the equation of great theoretical and practical importance: ~1~1 ~lbn an~l ~2bn
AX+XB-C. It should be noted that the left Kronecker product of A and B, defined to
Our investigation of these matrix equations will depend on the idea of the be the matrix
"Kronecker product" of two matrices, a useful notion that has several

[
important applications. A~1 1 . A~ln]
In the second half of the chapter some generalizations of the notion of the . .,
inverse of a square matrix are developed. Beginning with "one-sided" Abn1 Ab nn
I
inverses, we go on to a widely useful definition of a generalized inverse that has similar properties to those of A 0 B, but to avoid unnecessary duplica- 'ii
l
is applicable to any matrix, whether square or rectangular. The tion, only the right Kronecker product, called merely the Kronecker product,
Moore-Penrose inverse appears as a special case of this analysis, and the will be discussed.
chapter concludes with the application of the Moore-Penrose inverse to the
solution of Ax - b, where A and b are an arbitrary (rectangular) matrix and Exercise 1. Verify that for any square matrix A - [al.ilT.J= l'
vector, respectively. (a) In 0 A - diag[A, A, ... , A],
a Ul n auln ... a1mlnj
(b) A 01" - : : :;
12.1 The Notion of a Kronecker Product [
amlIn am2In alll",I"
(C) t; 0 In - I"",. 0
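These block identities can be checked directly; numpy's kron implements the right Kronecker product used here. A small sketch with arbitrary 2 × 2 data:

    import numpy as np

    A = np.array([[1., 2.], [3., 4.]])
    B = np.array([[0., 5.], [6., 7.]])

    K = np.kron(A, B)                                   # 4 x 4 matrix of blocks a_ij * B
    print(np.allclose(K[:2, 2:], A[0, 1] * B))          # the (1,2) block is a_12 * B

    I2, Z = np.eye(2), np.zeros((2, 2))
    print(np.allclose(np.kron(I2, A), np.block([[A, Z], [Z, A]])))   # I_n (x) A = diag[A, A]
    print(np.allclose(np.kron(np.eye(3), np.eye(2)), np.eye(6)))     # I_m (x) I_n = I_mn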
In this section we introduce and examine a new operation that, in full
generality, is a binary operation from pm'" X pn"" to """"11<, although we The properties of the Kronecker product incorporated in Proposition 1
shall confine our attention to the case I =: m, k =: n. This is the most useful follow immediately from the definition.

Proposition 1. If the orders of the matrices involved are such that all the
operations below are defined, then
PROOF. F~rst observe that by a simultaneous interchange of rows and
colu~ns With the same number. the matrix A I. can be reduced to the
(a) If /l e', (pA) B == A (pB) == p(A l8l B); matrixJ, A. Thus, there exists a permutation matrix P such that
(b) (A + B) 0 C == (A e
C) + (B 0 C); !,T(A 1/,,)P == I" e A. In fact, if the permutation of 1. 2. 3, ... , mn, which
IS equa to
(e) A l8l (B + C) == (A B) + (A C);
(d) A (B0 C) == (A 0 B)0 C; 1, n + 1, 2n + 1,... (m - l)n + 1,2, n + 2, 2n + 2, ... , (m _ l)n + 2,
(e) (A 0 B)T == AT 0 BT . .. , n, 2n, 3n, .. , (m - l)n, mn,
We need a further property that is less obvious and will be found very
useful is denoted by i.. i 2 , " ' , imn, then the permutation P has the unit coordinate
vectors ell' e12, . , elm" for its columns. Now observe that the same matrix P
Proposition 2. If A, C e ,,,,x,,, and B, D e /Fnxn, then performs the transformation ro: B)P == B I",. Since ppT == I, then
by a repeated use of Corollary l(a), it is found that
(A l8l B)(C e D) = AC e BD. PT(A B)P == PT(A I")PPT(/,,, B)P
PROOF. Write A B == [aIjB]~j= l ' C D == [CIjD]~j= t- The i,jth block == (In A)(B I,..) == B A,
e
of (A B)(C l8lD) is
and the proposition is established.
'" ",

Flj == L (ail;B)(c"jD) == L ail;c"jBD, for i s i, j :s;; m. Exercise 2. If A e fF"'x"" B e /FI1XI1, prove that
~=1 t=l

On the other hand, by the definition ofthe Kronecker product, AC (81 BD = det(A B) == (det A)"(det B)"'.
[')IljBDJ4j" l , where ')IIJis the i,jth element of AC and is given by the formula SOLUTION. In view of Corollary l(a), it is sufficientto find the determinants
')Ilj =Lr= 1 aikC"j' Thus F Ii == ')IljBD for each possible pair (i,j), and the of~he matnce~ ~ (81/ and I", B. By Exercise l(a), det(/", e
11 B) == (det B)....
result follows. Usmg Proposition 3 and the property det P = 1, we have
Proposition 2 has some important consequences, which we list in the det(A In) == (det P'f)det(/11 A)(det P) F (det Ar,
following corollaries.
and the result folIows. 0
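Both Proposition 2 and the determinant formula of Exercise 2 are easy to confirm numerically. A short numpy sketch with random matrices of orders m = 3 and n = 2:

    import numpy as np
    rng = np.random.default_rng(1)

    m, n = 3, 2
    A, C = rng.standard_normal((2, m, m))
    B, D = rng.standard_normal((2, n, n))

    # (A (x) B)(C (x) D) = (AC) (x) (BD)
    print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))

    # det(A (x) B) = (det A)^n (det B)^m
    print(np.allclose(np.linalg.det(np.kron(A, B)),
                      np.linalg.det(A) ** n * np.linalg.det(B) ** m))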
Coronary 1. If A e fF"'x"" B e fF"xn, then
Now we define a vector-valued function associated with a matrix and
(a) A B == (A 1")(1,,, 0 B) = (1", B)(A IJ; closely related to the Kronecker product. For a matrix A e fF"'x" write
(b) (A 0 B)-1 = A-I B- 1, provided that A-I and B- 1 exist. A == [A. 1 A. 2 A.I1 ] , where A.j e fFIN,) == 1,2, ... , n. Then the vector
Note that the assertion of Corollary l(a) states the commutativity of
A l8l/n and I", l8l B. A.l]
At
CoroUary 2. If AI, A 2 , All e fF"'x", and B 1, B 2 , , BII e /Fnxn, then [A."
efF-

(AI (81 B 1)(A 2 l8l B 2 ) (A p l8l B p ) == (A 1A 2 All) l8l (B 1B2 B lI ) .


is said to be the oec-function of A and is written vec A. It is the vector formed
The Kronecker product is, in general, not commutative. However, the by "stacking" the columns of A into one long vector. Note that the vee-
next Proposition shows that A B and B A are intimately connected. function is'linear:
Proposition 3. If A e /F"'x"" Be ,"XII, then there exists a permutation Vec(ocA + PB) == oc vee A + P vec B,
matrix PER"' X 111ft such that
for any A.Be,lI/x" and oc,pe9&: Also, the matrices A.,A2 , ... ,A" from
pT(A B)P = B l8l A. C"'X" are linearly
.
independent as members of fFmx" if and only if vec A ..

vee A 2 , , vee At are linearly independent members of !F"". The next Exercise 6. Prove that, if All B 1 are similar to the matricesA2 , B2 , respec-
result indicates a close relationship between the vee-function and the tively, then A 1 B I is similar to A 2 B 2 In particular, if Al and B 1 are
Kronecker product. simple, then so is A I B r-

Proposition 4. If A E IF III " III, BE IF""", X E IFill"", then Exercise 7. If A E IFill "Ill, prove that, for any function f().) defined on the
spectrum of A,
vee(AXB) = (BT e A) vee X. (a) f(I" A) = I" f(4);
PROOF. For j = 1,2, ... , n, the j-column (AXB).} can be expressed (b) f(A I,,) = f(A) I". e 0

(AXB).} = AXB.} = L" bt}(AX).t = L'" (bt}A)X.t,


t=1 t=l
12.2 Eigenvalues of Kronecker Products and Composite
where bt ) is the element of B in the k,jth position. Consequently, Matrices
(AXB).} = [bl)A b2)A . . . bnjA] vec X,
and it remains to observe that the expression in brackets is thejth block-row One of the main reasons for interest in the Kronecker product is a beauti-
inBTA. fully simple connection between the eigenvalues of the matrices A and B and
A B. We obtain this relation from a more general result concerning com-
Coronary I, With the previous notation, posite matrices of Kronecker products. .
Consider a polynomial p in two variables, x and y, with complex co-
(a) vee(AX) = (I" A) vee X;
efficients. Thus, for certain complex numbers elj and a positive integer I,
e
(b) vec(XB) = (BT 1m> vee X;
(c) vec(AX + XB) = I" A) + (BT I... vec X. I
p(x, y) = L CI}X'yi.
Important applications of these results will be considered in Section 3. i.}=O
We conclude with a few more properties of the Kronecker product. , If A E CIII"'? and Bee""", we consider matrices in cm""III" of the form
I
Exercise 3. If A E IF'" XIII, B E IF""", prove that p(A; B) = L ciJAi Hi.
I,}=O
(a) tr(A B) = (tr A)(tr B);
(b) rank(A B) = (rank A)(rank B). For example, ifp(x,y) = 2x + xy3, we write 2x + xy3 = 2xl yo + X1y3 and
Hint. For part (a) use the representation A B = [a,}B]T,}=I' Use peA; B) = 2A e I" + A e B 3

Proposition 2 for part (b). The next theorem gives a connection between the eigenvalues of A and
B and those of p(A; B).
Exercise 4. Write A(1) = A and define Alu I) = A A[k), k = I, 2, ....
Prove that neorem 1 (C. Stephanos"). If ).1""').m are the eigenvalues ofA E elll" lII
and til' ... , /l" are the eigenvalues ofBee" " ", then the eigenvalues of peA; B)
(a) A[UI) = AllclAlll; are the mn numbers p()." /l.), wherer = 1,2, ... , m and s = 1, 2, ... , n.
(b) If A, B e IF" X", then (AB)[kl = AltIB[Ic).
PROOR Let matrices P E elll" III and Q e e""" be defined so that
Note the contrast with (AB)Ic.
PAP- 1 =J I , QBQ-l = J 2 ,
Exercise 5. Verify that where J 1 and J 2 are matrices in Jordan normal form (Section 6,5). It is clear
(a) The Kronecker product of diagonal matrices is a diagonal matrix; that J~ is an upper-triangular matrix having).L ... , ).~ as its main diagonal
(b) The Kronecker product of upper- (respectively, lower-) triangular matrices is an upper- (respectively, lower-) triangular matrix.

† Jour. Math. Pures Appl. 6 (1900), 73-128.

elements and similarly, "'i"'" "': are the main diagonal elements of the Exercise 3. If x = [tXl tXz tXm]T e em is an eigenvector of A e em"m
upper-triangular matrix J z. corresponding to the eigenvalue .t, and y e e" is an eigenvectorof Been"n
Furthermore,from Exercise121.5 it follows that J~ @ J~ is also an upper- corresponding to the eigenvalue p., show that an eigenvector % of A B
triangular matrix. It is easily checked that its diagonal elements are A:.~ associated with A", is
for r = 1, 2, ... ,m, s = 1, 2, ... , n. Hence the matrix p(J 1; J J is upper-
z= [tXtJlT tX2yT lXmyT]T. (1)
triangular and has diagonal elements pel"~ ",,,). But the diagonal elementsof
an upper-triangular matrix are its eigenvalues; hence p(J 1; J z) has eigen- Exercise 4. Show that if A and B are positivedefinitematrices,then A @B
values p(A.. ",J for r = 1, 2, ... , m, s = 1, 2, ... , n. is positivedefinite. 0
Now we have only to show that p(J 1; J z) and p(A; B) have the same
eigenvalues. On applying Corollary 2 to Proposition 12.1.2, we find that Define the (right) Kronecker product of two rectangular matrices
J~ @ J~ = PA p - l @ QWQ-l
A = [a,.J1:j~ I e:Fm" I and B E iFn " lc to be the matrix
'
= (P Q)(A @ Bi)(p-I @ Q-I), A 10>.
'01'
B -A [a IjB]m.1 e armn"/lc
I.j'" I " (2)
'
and by Corollary 1 to Proposition 12.1.2, p- 1 Q-l = (P Q)-I. Thus, Exereise 5. Establish the assertions of Propositions 12.1.1 and 12.1.2 for
rectangular matrices.
J~ e J~ = (P Q)(A ' Bi)(P Qr 1
,

and hence Exercise 6. Let x, y e em and II, 11 e cn. Show that the equation x @ II Ii

P(Jl; Jz) = (P Q)p(A; B)(P Q)-I. = y II implies that y = AX and 11 = AU for some Ae e. 0
This showsthat p(J I; J z) and p(A; B) are similar and so they must have the Observe that, using the general definition of a Kronecker product of
same eigenvalues (Section 4.10). rectangular matrices (see Eq. (2, we may write % in Eq. (1) as % = x y
and use the generalized Proposition 12.1.1 (see Exercise 5) for proving
The two following specialcases (for p(x, y) = xy and p(x, y) = x + y) are Exercise 3. Furthermore, if A and B are simple,so is A B (Exercise 12.1.6)
probably the most frequently used. and hence there is an eigenbasis for C"'n" mn with respect to A @ B consisting
Corollary1. The eigenvalues of A @ B are the mn numbers l,,,,.. r =1, entirely of the Kronecker products x, Ij, where {X,}i"=l, {Yj}j.l are the
eige~bases'in em and en for A and B, respectively. In the general case the
2,.. , m; s = 1,2, ... , n.
situation is more complicated.
Corollary 2.  The eigenvalues of (I_n ⊗ A) + (B ⊗ I_m) (or, what is equivalent in view of Exercise 4.11.9, of (I_n ⊗ A) + (Bᵀ ⊗ I_m)) are the mn numbers λ_r + μ_s, r = 1, 2, ..., m, s = 1, 2, ..., n.
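Both corollaries can be confirmed numerically: the spectrum of A ⊗ B consists of all products λ_r μ_s, and the spectrum of (I_n ⊗ A) + (B ⊗ I_m) of all sums λ_r + μ_s. A short numpy check with arbitrary data:

    import numpy as np
    rng = np.random.default_rng(3)

    m, n = 3, 4
    A = rng.standard_normal((m, m))
    B = rng.standard_normal((n, n))
    lam, mu = np.linalg.eigvals(A), np.linalg.eigvals(B)

    def same_spectrum(u, v):
        # compare two multisets of eigenvalues up to ordering
        return np.allclose(np.sort_complex(u), np.sort_complex(v))

    print(same_spectrum(np.linalg.eigvals(np.kron(A, B)),
                        np.outer(lam, mu).ravel()))
    print(same_spectrum(np.linalg.eigvals(np.kron(np.eye(n), A) + np.kron(B, np.eye(m))),
                        np.add.outer(lam, mu).ravel()))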
The matrix (I_n ⊗ A) + (B ⊗ I_m) is often called the Kronecker sum of A and B.


12.3  Applications of the Kronecker Product to Matrix Equations
Exercise 1. If C e fF""''' _ is the Kronecker sum of A E iF"''''"and Be iF" "n, .
prove that tf = eA ell and indicate a corresponding result in numbers.
We start by studying the general linear matrix equation
Hint. Use Exercise 12.1.7 for f(A) = eA..
Exercise 1. Let A e C"" n and let B e C""''' _ be the partitioned matrix with A 1KB1 + A2XB2 + ... + A,XB, = C, (1)
each of its mZ blooks equal to A. If A has eigenvalues Ill"'" Iln (not Where AJ'e em"m, BJ e C""n U = 1,2, ... , p), X, C e em"n, and its particular
necessarilydistinct), prove that the eigenvalues of Bare m"'I"'" m"'n and cases. The method we use relies on the assertion of Proposition 12.1.4 and
m(n - 1) zeros. reduces Eq. (1) to a matrix-vector equation of the form Gx = e, where
Hint. Express B as a Kronecker product. G e C""'''- and x, c e C-. , '

l'
Theorem 1. A matrix X e C",>c n is a solution of Eq. (1) if and only if the PROOF. First observe that the conditions of the theorem imply that A and,
vectorx =:;: vec X defined in Section 12.1 is a solution of the equation - B have no eigenvaluein common, and hence there is a unique solution X
ofEq.(3).
Gx == c, (2)
Consider a matrix-valued function Z(t) defined as the solution of the
with G =:;: Lf=1 (BJ AJ) and e = vec e. initial-valueproblem '
PROOF. By Proposition 12.1.4, vec(AJXBJ) = (BJ 0 A J) vee X for each dZ/dt = AZ + ZB, Z(O) = C. (5)
j == 1,2, ... , p. Since the function vec A is linear, then As indicated in Exercise9.12.14, the solution ofEq. (5) is
p
vec e = L' (BJ A J) vec X Z(t) = ed-tCe&.
J=1 Now observe that, integrating the differentialequation in (5) from t =0
and the result follows. to t = 00 under the hypothesis that the matrix (4) exists,we obtain
Having transformed the linear matrix equation to' an equation of the
form Gx = e, we may now apply the results developed in Section 3.10 to Z(oo) - Z(O) = A LooZ(t)dt + (LooZ(t)dt)B.
determine criteria for the existence and uniqueness of a solution (see also
Theorem 12.6.7 to follow). Henceassuming, in addition, that
Coronary 1. Equation (1) has a solution X if and only if rank[G c] == Z(oo) = Jim eAtCeBt = 0, (6)
t .... OO
rank G.
it is found that the matrix given by Eq. (4) is the (unique) solution of Eq. (3).
Corollary 2. Equation (1) has a unique solution if and only if the matrix G Thus it suffices to check that, for stable A and B, the assumptions we have
in Eq. (2) is nonsingular. had to make are valid. To this end, write the spectral resolution (Eq. (9.5.1
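Theorem 1 turns Eq. (1) into the ordinary linear system Gx = c, which can be assembled and solved directly. A minimal numpy sketch for the most important particular case AX + XB = C, where G = (I_n ⊗ A) + (Bᵀ ⊗ I_m) (random data, so that generically σ(A) ∩ σ(−B) = ∅ and G is nonsingular):

    import numpy as np
    rng = np.random.default_rng(4)

    m, n = 4, 3
    A = rng.standard_normal((m, m))
    B = rng.standard_normal((n, n))
    C = rng.standard_normal((m, n))

    vec = lambda M: M.reshape(-1, order='F')             # stack the columns
    G = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))  # mn x mn
    X = np.linalg.solve(G, vec(C)).reshape(m, n, order='F')

    print(np.allclose(A @ X + X @ B, C))                 # True whenever G is nonsingular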
We now consider an important particular case of Eq. (1); the matrix for the function eAt (0 < t < 00):
equation 5 mk-l

AX+XB=C. (3)
e" '= L L tJeA"tZ"j,
, "=1 j"'O
, (7)

Theorem 2. Equation (3) has a unique solution if and only if the matrices A where A,1> A,2 , , A,5 are the distinct eigenvalues of A with indices
and - B have no eigenvalues in common. m1, m2, ... , mIt respectively. Consider the Cartesian decomposition of the
eigenvalues, A" = IX" + iP", and write
PROOF. In the case beingconsidered, the matrix Gin Eq, (2) is (In C8l A) +
(BT C8l 1m> and its eigenvalues are, by Corollary 12.2.2, the numbers A,. + p" e
Akt
= e"ktelJJkt = e"kt(cos P"t + i sin p"t),
r = 1, 2, ... , m, s = 1, 2, ... , n. It remains now to apply Corollary 2 and for each k = 1,2, ... , s. Since IX" < 0 (k = 1,2, ... , s), it follows that, for
recall that a matrix is nonsingular if and only if all its eigenvalues are non- each k, exp(A"t)..., 0 as t -+ 00. Hence (see Eq, (7 lim,... oo e" = 0 and,
zero. ' similarly, lim,... oo eBt = O. Thus, the relation (6) as wellas the existenceofthe
There is a special case in which we can obtain an explicit form for the integral in (4) follow.
solution matrix X of Eq. (3). Now consider the case in which A and -B have eigenvalues in common.
Theorem 3. If all eigenvalues of the matrices A e e"''''" and Been" n have Then the matrix G = (In A) + (BT 1m) is singular and solutions exist,
negativereal parts (that is, A and B are stable), then the unique solution X of provided rank[G c] = rank G (Corollary 1). The number of linearly
Eq. (3) is givenby independent solutions is then determined by the dimension of the kernel of
G.The direct determination ofdim(Ker G)posesa relativelydifficult question
and can be found in the standard referencebook by MacDuffee (1946) where
the Frobenius proof is presented; this proof dates back to 1910.
However, we prefer at this moment to abandon the idea of using the Pro~eeding to the analysis of the . reduced Eq. (2) we write
Kronecker product in investigatirlg the solvability and the number of J = ~lag[Jl,.J2'"'' :',,], where J. = 4.1". + N,.. is a k. x k. Jordan block
linearly independent solutions of Bq, (3). Our line ofattack will restart with a associated With an eigenvalue A. (1 ~ s ~ pl. Hence, performing a corre-
study of the particular cases AX = XA in Section 124 and then AX + sponding partition of Yin Eq. (2),
XB = 0 in' Section 12.5, where the problems mentioned regarding the
general equation AX + XB = C will be resolved. Y== [Yrt Jf,''' l, Y"eC"'''''',
Eq. (2) is reduced to the following system:
Exercise 1. Let A be a stable matrix and suppose W;::: O. Show that the
Lyapunov equation AX + XA* = - W has a unique solution X and that J. y" = Y"J" s, t = 1, 2, ... , p,
X ~ O. (See also Section 13.1.) or, exploiting the structure of Jordan blocks,

Exercise 2. LetA e mxm, BeC"x",and CeCmx". If A and B have spectral.


c (A. - A,)Y" = Y"Nrc. - N". Y." S, t = 1,2, ... , p. (3)
radii IJ... and IJ.B' respectively, and IJ...IJ.B < 1, show that the equation We now consider two possible cases.
X=AXB+ C Case (1). A.:F A,. Multiplying Eq. (3) by A. - A, and using Eq. (3) again,
we obtain
has a unique solution X and that this solution is given by
GO
(A. - l,)2y" = (Y"N", - N,.. Y,,)Nrc. - N".(Y.,Nrc. - N". Yrt) .
X = LAJCBj. Simplifying the right-hand expression and multiplying successively by
J.. o A. - l" it is found that, for r = 1,2, ... ,
Exercise 3. Show that Eq. (8) has a unique solution if AIJ. :# 1 for all.
AE a(A) and /l e a(B). jt(-I)iC)Nty.,N~-J,
(A. - ArYY., = (4)
Exercise 4. Let A, C E C"" II and C'. = C. If the equation where it is assumed that Nf. == lie., N~ = I",.
X= A*XA + C Now recall that the matrices N,.. are nilpotent and therefore N': = 0 for
has a unique solution X, show that X is Hermitian. Also, if /l.. < 1, show that sufficientlylargem.HenceifrinEq.(4)islargeenough,then(A. - ;;)'y. = 0
(X - C) ;::: O. In particular, if C ~ 0, then X ;::: O. and the assumption A. :# l, yields y., = 0. II

Case (2) A. = l,. In this case Eq. (3) becomes


ExerciseS. If A> 0, C> 0, and X = AXA + C, show that/l... < 1. 0
Y"N". = N", Y." 1 ~ s, t s p, (5) I
12.4 Commuting Matrices
and if y., = ['I'lj]b~I' comparing the corresponding elements of the matrices
on the right and left in Eq. (5), it is not difficult to see that
11+ l,j = 'I".j- it i = 1,2, ... , k., j = 1,2, ... , k"
Where i~ is supposed that '1'/0 = 'I'".+l.J = O.
(6) l
An important special case of Eq, (12.3.3) arises when C = 0 and B == - Ai
AX = XA, A, X e e""". (1)
The equalities (6) show that the k. x k, matrix Y., (1 ~ S, t s p) is in one
ofthe following forms, depending on the relation between k. and k,:
I
Clearly, finding all the solutions X of Eq. (1) is equivalent to determining
all matrices that commute with the given matrix A. To this end, we firSt
observe that if A = PJP- 1 , where J is a Jordan normal form for A, the~
Eq. (1) is equivalent to the e q u a t i o n '
(a) if k; = kit then Y., is an upper-triangular Toeplitz matrix;
(b) if k. < k" then 1';, = [0 Y".];

(c) if k, > k, then Y" = [;] ,


II.

JY= YJ, I
I :i,
in which Ya:. and Yrc. denote (square) upper-triangular Toeplitz matrices of
where Y = P- 1XP. orders k. and k, respectively.
We have ~een that, quite obviously, any polynomial PeA} in A eO'''
Thus, we may state the fonowing.
commutes. WIth A; w~ now. ask ~bout conditionson A such that, conversely,
1lIeoreml. If AeC"lC and A .... PJP- 1, where J is a Jordan canonical
ft
each matrix commuting With A IS a polynomialin A.
form of A, then a matrix X commutes with A if and only if it is of the form
X = rrr:, where Y = [1';,]:,'''1 is the partition of Y consistent with the ~roposition~. Every matrix commuting with A e e" is a polynomial in A
partition of J into Jordan blocks, and where Y" = 0 for A.:!:,\, and r.,
is in
if and only if no two Jordan blocks in the Jordan canonical form of A are
associated with the same eigenvalue. Or, equivalently,the elementary divisors
one of the/orms (a), (b), or (c) if A. = At 0/ A are pairwise relatively prime.
Exerdse 1. Check that the set of matrices commutingwith the matrix
PROOF. First recall that in view of the Cayley-Hamilton theorem the

p((~ ~] +l~ ~ 11 + (~ :])P-'


number of linearly independent powers of A does not exceed n. Thus, if
AX = X A implies that X is a polynomial of A, then X can be rewritten as
A= X = peA), where I = deg p(A.) ~ n and all' powers 1, A, A 2, , A,-l are
!ine~ly independent. Hence the dimension of the solution set of Eq. (1) is
consists entirely of the matrices 10 this case = I ~ n. But Eq, (7) showsthat (X ~ n, therefore IX = n and the
Yo i0')'1 IXo IXl resultfollows from Eq. (7).
o i ')'0 0 0 IXo Conversely, if ..1. lt A2 , , A, are the distinct eigenvalues of ....with indices
...... ..__-l....._..----_ ...- ....- m1, m2' ... , m., respectively, then the Jordan form J of A consistsof exactly
X= P Po PI II s; ~1 02 s blocksJ,. Aswe haveseen,the solution X of AX = X A is then similarto a
o Po! 0 s; ~1 directsum of upper-triangularToeplitzmatrices:
o 0 i 0 0 s, (1) y(l) y(1) ]
o 1 m,-l
wherethe letters denote arbitrary complexnumbers. 'D [y yg) -, :
The number of arbitrary parameters in the general solution X of Eq. (1) Yj= ". y~) ,
is obviously greater than or equal to n. ')'~I)

Exercise 2. If IX" denotes the degreeof the greatest common divisor of t~e for i = 1,2, ... , s, In view of Eq, (9.2.4) it is clear that a polynomial P(A)
elementary divisors (A - A.Y' and (A - ,t,'f. (1 ~ s, t ~ p), prove that X 10 satisfying the conditions
Eq. (1) has
P(A.I) = y(O')' -!-
I! p'(A) - y(1) 1
(ml _ 1}1 P
(m'-11(2) -
"., - (I)

L" IX.,
I - 1, .. , ')'m,-lt
IX =
. '=1 i = 1,2, ... , s, gives the required result:
undefined elements.
X = P diag[Ylt Yi, , .]P-l
Hint. Observethat if A, = 1" then " = min(k" k,). 0
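The dimension counted in Exercise 2 is the nullity of the matrix (I_n ⊗ A) − (Aᵀ ⊗ I_n), since AX = XA is equivalent to ((I_n ⊗ A) − (Aᵀ ⊗ I_n)) vec X = 0, and it can be computed numerically. A small sketch: one matrix with distinct eigenvalues (dimension n) and one built from Jordan blocks J_2(0) and J_1(0) (dimension 2 + 1 + 1 + 1 = 5):

    import numpy as np

    def commutant_dim(A):
        n = A.shape[0]
        # AX = XA  <=>  ((I (x) A) - (A^T (x) I)) vec X = 0
        G = np.kron(np.eye(n), A) - np.kron(A.T, np.eye(n))
        return n * n - np.linalg.matrix_rank(G)    # nullity of G

    print(commutant_dim(np.diag([1., 2., 3.])))    # distinct eigenvalues: 3
    J = np.zeros((3, 3)); J[0, 1] = 1.             # Jordan blocks J_2(0) and J_1(0)
    print(commutant_dim(J))                        # 5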
= P diag[p(J 1)' , p(J,)]P- 1
Rephrasing the result of Exercise 2, the dimension of the linear. spaceof
= Pp(J)P-1 = p(PJP- 1) = peA).
solutionsX of AX = XA is the number IXgivenby Eq. (7). To seethis, deno~
the parameters of X by l'lt l'2"'" "I", and denote by X, the solution; Note that p(A} can be chosen with degreenot exceeding n - 1.
Eq. (I) obtained by putting ')', = 1, Yi = 0 U = 1, 2, ... , ,i :!: i). Cle,arly, t"~
matrices X, (i == 1,2, ... , IX) are linearlyindependent and any solution X:o~ We now confine our attention to matrices commuting with a simple
matrix. First, observethat Theorem 1 implies the result of Exercise 3.
Eq, (1) can be written
" I, . Exerdse 3. If A is simple and Ab A2 , , A, are the distinct eigenvalues of
X= L')'IX A, show that the linear space of solutions of AX = X A has dimension
1"1
Pi + Il~ + ... + P:, where Il, (1 ~ i ~ s) is the algebraicmultiplicity of A,.
Thus, X l' X 2' , X" constitute a basis in the solution space of AX =X

Exercise 4. Show that if A e en x n has all its eigenvalues distinct, then 12.5 Solutions of AX + XB == C
every matrix that commutes with A is simple. 0

Proposition 2. Simple matrices A, X e en x ncommute if and only if they have In the previous section we described the form of solutions of the equation
a set of n linearly independent eigenvectors in common. AX ::::: XA. Now observe that a matrix X E em Xn is a solution of the equation
AX + XB = 0, where A E c mx m, BE e n x n, if and only if the (m + n) x
In other words, if A and X are simple and AX = X A, then there exists a (m + n) matrices
basis in en consisting entirely of common eigenvectors of A and X.

PROOF. Let AI' A2 , , A. be the distinct eigenvalues of A with indices [


1m
o
X]
In' (1)
ml' m2' ... , m.. respectively, and let A = PJ r:', where the diagonal matrix
J is a Jordan canonical form of A. By Theorem I, the general form of matrices commute. Thus, applying Theorem 12.4.1 to these two matrices, the next
commuting with A is result readily follows.
Theorem!. Let A eCmx m, BEcn x n and let A = PJ1P-l, B = QJ 2Q-l,
X::;: pyp-l = P diag[Yl' 12, ... , .JP- 1 where
e
where lJ E mJ x mJ, j = I, 2, ... .s. Reducing each lJ to a Jordan canonical
form, lJ::::: Q}J}Qi 1, we obtain
and
X = (PQ)diag[J 1, J 2 , , JJ(PQr 1
,
J2 = diaglfllIrl + Nrl' ... , /lql r + Hr.]
where Q = diag[Q1> Q2'' QJ. are Jordan canonicalforms of A and B, respectively. Then any solution X of
Since X is simple so is diag[J I' J 2' , J J, and therefore the columns of
PQ give n linearly independent eigenvectors of X. The same is true for A, AX + XB =0 (2)
because is of the form X = PYQ-l, where Y= [Y"J::''..l is the general solutio~ of
J 1 Y + YJ 2 = 0, with Y" E et " s = 1,2, ... , p, t = 1,2, ... , q. Therefore
xr

Y has the properties S, = 0 if A.:F - /l" and S, is a matrix of the form


describedin Theorem 12.4.1for A. = - /l,.
Conversely, if A and X have a set of n linearly independent eigenvectors in
x
common, then A and are obviously simple, and the eigenvectors determine The argument used in Section 12.4 can. also be exploited here.
a matrix P for which A = PD 1P-l, X = PD 2P- 1 , and D 1 , D2 are diagonal
matrices. Then A and X commute, for Corollary I. The generalform of solutions of Eq. (2) is given by the formula
II
AX = PD 1D2P- 1 = PD 2D 1P- 1 =,:~:'A. X = L"IiX" (3)
1-1
Exercise S. If A is a normal matrix with all its eigenvalues distinct, prove where Xt>X 2 , , XII are linearly independent solutions of Eq. (2) and
that every matrix that commutes with A is normal. II q

Exercise 6. Prove that normal matrices commute if and only if they have a
IX = L LIX",
3",,1 '=1
(4)

where IX~ (1 S; s S; p, 1 S t S q) is the degree of the greatest common divisor


common set of orthonormal eigenvectors.

Exercise 7. If A is a simple matrix with s distinct eigenvalues, prove that: of the elementary divisors (A - A.Y' and (A + p.,Y'of AandB,respectively.
every matrix that commutes with A has at least s linearly independent Note that the number C( in Eq. (4) is equal to dim(Ker G), where G = I
eigenvectors in common with A. 0 A+~~. n
The next exercise indicates a second approach to equation (2) based on PROOF. If Eq. (5) has a solution X, it is easily checked that the matrices in
the observation that 1m X is A-invariant. (6) are similar, with the transforming matrix .
Exercise 1. Show that the equation AX = XB has a solution X of rank r
if and only if A and B have r eigenvalues in common with corresponding
partial multiplicities in A and B. Conversely.Jet
OUTLINE OF SOLUTION. Observe first that Im X is A-invariant. Then use
Ex. 6.5.3 to show that X has a factorization X = EX 0 where E is m x r, Xo
is r x n, both have full rank, and the columns of E are made up of Jordan
[: _~] = p-{: _~]p (7)

chains of A. Thus AE = EJ and J is an r x r matrix in Jordan normal form. for some nonsingular matrix P e e(m + II) lC (m+ II). Define linear transformations
Then show that XoB = JXo , and conclude that the eigenvalues of J (and 11,7; e ~(e(m+lI)lC(m+IIJ) by the rule
their partial multiplicities) are common to both A and B.
Conversely, define E and X o offull rank so that Tt(Z) = [ : _;]z -z[: _;}
[~ _~]z -z[: -;J.
AE = EJ, XoB = JX o,
and show that X = EX0 satisfies AX = X B. 0 7;(Z) =
Proceeding to the general equation where Z e c(m+lI) x 1m+II), and observe that it suffices to find a matrix in
AX+XB = C, Ker 7; of the form
where A e c mx m
B e e" XII, X, C e em XII, it is easily seen, after rewriting this,
, (8)
as a system of mn linear (scalar) equations, that Eq, (2) (also rewritten as
such a system) is the corresponding homogeneous equation. Thus we can . Indeed, in this case a straightforward calculation shows that the matrix U
combine the results of Section 3010 with the foregoing discussion. in (8) is a solution of (5).
Eq. (7) implies that
'Theorem 2. If Bq; (.5) is solvable, then either it has a unique solution (when i
o(A) " (I(-B) = 0) or it has infinitelymany solutions given by theforlnula I
and hence "i
X = x; + g,
Ker 12 = {PZ:Z e Clm+II)lC(m+lI) and Z e Ker 71}.
where X o is a fixed particular solution of Eq. (5) and g, being the general
solution of the homogeneous equation, Eq. (2), is of theform (3). Consequently,

Thus, to complete an investigation of Eq. (5), it remains to obtain a dim(Ker 71) = dim(Ker T2 ) . (9)
criterion for its solvability in terms of the matrices A, B, and C. This criterion Let us consider the set [/ consisting of all pairs (V, W), where Ve C"xm,
is due to W. E. Roth t and the proofpresented here is that of H. Flanders and; We C" lC ", and such that BV + VA = 0 and BW= WB. Then [/ is a linear
H. K. Wimmer'. space with respect to the natural operations
Theorem 3. Equation (5) has a solution if and only if the matrices a(V, W) = (ocV, ocW), IX e e,
and (Vt, Wj) + (Vz, fJ2) = (VI + Vz, ~ + JrVz).
are similar. Define the transformations q>, e ~(Ker 1;, 9'), i = I, 2, by the formula

t Proc. Amer. Math. Soc. 3 (1952), 392-396.


*SIAM Jour. Appl. Math. 32 (1977), 707-710. q>{[: ~]) = (V, W)
and observe that this section to a study of the notion of one-sided invertibility ofmatrices but
first. recall the corresponding ideas for linear transformations studied .
Ker If'1 = Ker ({J2 = {[: ~lAR = RA. AU + UB = o} (10) Section 4.6.
!'- 1-
m
matrix e ~: K" is sai~ to be left (respectively, right) invertible if there
Furtnermore.Im e, = ~Thisisbecause(V, W)e9'impliesBV + VA = 0 exists a matrix AL (respectively. Ai 1) from F" K m such that
and BW = WB. Hence lA
Ai = I" (respectively. AAi 1 = I m > . ( l )
[: ~]eKerl1. and If'l([: ~])=(V,W). . A ma~rix A.i (respectively, AR1~ satisfying Eq. (1) is called a left (respec-
1

tively, rrg~t) mverse of A'. CI~ar1y.If m n and A is nonsingular, then the


=
Thus. 9' elm If'l' Since the reverse inclusion is obvious. Im q>1 = ~ Now one-Sided inverses of A coincide WIth the regular inverse A - 1.
we have 1m If'2 c 9' = 1m If'1' But, on the other hand (see Eq. (4.5.2. . The fir.st results are obtained directly from Theorems 4.6.1 and 4.6.2 on
m
dim(Ker q>i) + dim(Im q>i) = dim(Ker Ti). i = 1.2. mterpretmg members of F " " as linear transformations from C" to cm.

and therefore Eqs. (9) and (10) imply that dim(Im If'l) = dim(Im q>2) and, Theorem1. Thefollowing statementsareequivalentfor any matrix A E :J& m "":
consequently. that (a) the matrix A is left invertible;
(b) m ~ n and rank A = n;
1m q>1 = 1m If'2 = ~
(c) The columns of A are linearly independent as members of Fm.
Now observe that (d) Ker A = {OJ. '

[~ _~] E Ker 11 and If'1([~ _~]) = (0, -1). Theorem2. Thefollowing statementsareequivalentforany matrix A e g;m K,,:
(a) the matrix A is right invertible;
Thus. in view of Eq. (11). there is a matrix (b) m S nandrank A = m;
(c) The rowsof A are linearly independent as members of g;n.
[~: ~] E Ker T2 (d) Im A = g;m.

such that Th~s, ~ x n ~atrices of fu~l ran~ (see Section 3.8) and only they are
0;De-sl~ed Invertible. They are Invertible from the left if m ~ n and from the )
...

(0. -1) = If'2([~: ~]). ngh~ If m S n. In the case m = n a matrix of full rank is (both-sided) in-
vertible. . l.
1
This means that Vo = 0 and Wo = -1. Thus we have found a matrix i~> Exercise 1. Let "t
Ker T2 of the desired form (8). This completes the proof. .
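The similarity of Theorem 3 can be exhibited explicitly: if X solves AX + XB = C, then S = [I X; 0 I] satisfies S⁻¹ diag(A, −B) S = [A C; 0 −B]. A short numpy sketch with random data (so that generically σ(A) ∩ σ(−B) = ∅ and a solution exists; X is obtained by the Kronecker method of Section 12.3):

    import numpy as np
    rng = np.random.default_rng(6)

    m, n = 3, 2
    A = rng.standard_normal((m, m))
    B = rng.standard_normal((n, n))
    C = rng.standard_normal((m, n))

    G = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))
    X = np.linalg.solve(G, C.reshape(-1, order='F')).reshape(m, n, order='F')

    S  = np.block([[np.eye(m), X], [np.zeros((n, m)), np.eye(n)]])
    M1 = np.block([[A, C], [np.zeros((n, m)), -B]])
    M2 = np.block([[A, np.zeros((m, n))], [np.zeros((n, m)), -B]])

    print(np.allclose(np.linalg.solve(S, M2 @ S), M1))   # S^{-1} M2 S = M1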

12.6 One-Sided Inverses


A=[~ ~-a B-H-!].
Ver~y that B = Ail and that A = Bi 1 Find other right inverses of A and
left Inverses of B. 0
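For a real matrix of full column rank one left inverse is always (AᵀA)⁻¹Aᵀ, and for full row rank one right inverse is Bᵀ(BBᵀ)⁻¹; Theorems 3 and 4 below describe all the others. A minimal numpy sketch (the rectangular matrices here are arbitrary, not those of Exercise 1):

    import numpy as np
    rng = np.random.default_rng(7)

    A = rng.standard_normal((4, 2))            # full column rank: left invertible
    AL = np.linalg.inv(A.T @ A) @ A.T          # one particular left inverse
    print(np.allclose(AL @ A, np.eye(2)))      # A_L^{-1} A = I_n

    B = rng.standard_normal((2, 4))            # full row rank: right invertible
    BR = B.T @ np.linalg.inv(B @ B.T)          # one particular right inverse
    print(np.allclose(B @ BR, np.eye(2)))      # B B_R^{-1} = I_m

    P = A @ AL                                 # A A_L^{-1} is idempotent (a projector onto Im A)
    print(np.allclose(P @ P, P))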
Let A ∈ F^{m×n}, m ≥ n, and rank A = n. Then A is left invertible (Theorem 1) and hence the equation

    A_L^{-1}A = I_n    (2)
is solvable. To describe all solutions of Eq. (2), that is, the set of all left Theorem 5. If A is left invertible, then the equation Ax = b is solvable if and
inverses of A, we first apply a permutation matrix P E !Fill x III such that the only if
Pi x n nonsingular submatrix Al in PA occupies the top n rows, that is, (J", - AAil)h = O. (6)
Furthermore, if Eq. (6) holds, then the solution of Eq. (5) is unique and is
given by theformula
x = Ailb.
It is easily verified that matrices of the form
PROOF. Let x be a solution of Bq, (5). Then the multiplication of Eq, (5) by
[All - BA 2A I I B)P (3) , AAi I from the left gives .
forany Be !F" X (IlI - " ) satisfy Bq. (2), that is, are left inverses of A. Conversely, AAilh == A(AiIA)x = Ax = h,
if Ai 1 is a left inverse of A, then, representing Ai 1 = [C B]P, where
C E ,,,x",B e ," X (IlI - I), and substituting it in Bq.(2), we obtain CAl + BA 2 and the condition (6) follows. Conversely, if Eq. (6) holds, then the vector
= I". Hence C = All - BA 2Ai l and the following result is proved. x = ALlh is a solution of Bq. (5). If XI is another solution of (5), then
A(x - XI) = 0 and the assertion (d) of Theorem 1 yields the requited result,
Theorem 3. If A e 'Ill X" is left invertible, then all left inverses of A are x=x I
given byformula (3), where B is an arbitraryn x (m - n) matrix.
Theorem 6. If A is right invertible, then Eq. (5) is solvable for any h e'llI,
A similar result is easily proved for right inverses. and if b 'F 0, everysolution x is ofthe form .
Theorem 4. Let A e !Fill X" be right invertible. Then all right inverses of A x = Ailb, (7)
are given by theformula
for someright inverse Ai 1 of A.
Ail = Q[A;I - ~;IA4Cl PROOF. The solvability of Bq. (5) for any vector he'''' is obvious since
A(Ai I b) = b, and X in Eq. (7) is a solution of Eq. (S). Suppose now that X is
for any (n - m) x m matrix C. Here Q e ,,,xn is a permutation matrix such a solution ofEq. (5). Decompose A as in Theorem 4: A= [A 3 A 4]QT, where
that AQ = [A 3 A4] and the m x m matrix A 3 is nonsiPlgular. det A 3 'F 0 and Q is a permutation matrix. Perform a consistent decomposi-
tion of x: X = Q[xI xD T Then Eq. (5) can be rewritten A aXI + A 4 X2 = b,
It should also be noted that the matrices AA LI and Ai I A are idempotents: hence XI = A;lb - A;IA4x2.NowcomputeAilbusingEq.(4)andusethe
(AAi1)(AAi I) A(AiIA)Ail AAi l.
= = fact that (since b 'F 0) a matrix C E F(n-m) x'" exists such that Cb = X2:

Similarly, Ai I A is also idempotent.


Exercise 2. HA is lef~ invertible, prove that
Ailb = Q(Ai
lb
-C~;IA4Cb] =QDJ = x,

and the assertion follows.


(a) AAi l is a projector onto 1m A;
(b) dim(Ker Ail) = m - dim(Im A). Theorems 5 and 6 are easily included in a single more general statement
that will also serve to introduce an idea developed in the next section.
Exercise 3. If A is right invertible, prove that Comparison should also be made with Theorem 3.10.2. Let A E'IlI"",and
(a) I - Ai I A is a projector onto Ker A; let X e '"" m be any matrix for which
(b) dim(Im Ail) = n - dim(Ker A). "0 (8)
AXA =A.
We now consider the equation If A is left (or right) invertible, we may choose for X a left (or right) inverse of
Ax = b, A e ,,,,x,,, x e ''', be 'Ill, A, of course, butas we shall see in the next section, such matrices X exist even
if X has no one-sided inverse.
with a one-sided invertible matrix A.
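The parametrization of the left inverses (Theorem 3) and the solvability criterion of Theorem 5 can be tried out numerically. A minimal sketch in Python/NumPy; the example matrix is arranged so that its top n × n block is already nonsingular, so the permutation P can be taken to be the identity (an assumption of this illustration):

import numpy as np
rng = np.random.default_rng(0)

# Left-invertible A: top n x n block A1 nonsingular, A2 holds the remaining rows
A1 = np.array([[2.0, 1.0],
               [0.0, 1.0]])
A2 = np.array([[1.0, 3.0]])
A = np.vstack([A1, A2])                    # m = 3, n = 2
m, n = A.shape

# Every n x (m - n) matrix B gives a left inverse [A1^{-1} - B A2 A1^{-1}   B]
B = rng.standard_normal((n, m - n))
X = np.hstack([np.linalg.inv(A1) - B @ A2 @ np.linalg.inv(A1), B])
print(np.allclose(X @ A, np.eye(n)))       # True: X is a left inverse of A

# Solvability of Ax = b for a left-invertible A (Theorem 5)
b = A @ np.array([1.0, -2.0])              # b lies in Im A by construction
print(np.allclose((np.eye(m) - A @ X) @ b, 0))   # criterion (I - A A_L^{-1})b = 0
print(X @ b)                               # the unique solution, here [1., -2.]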
'Theorem'. Let A e pM", be P, and let X be a solution of Eq. (8). Then ~us, ~ g~era1ized inverse ~~ays exists and is,.generally, not unique.
the equation A% = b has a solution if and only ifAXh = h, and then any Bearing in mind the decomposition (3), it is easy to deduce the following
solution hastheform criterion for uniqueness of the generalized inverse.
x = Xb + (1 - XA)y,
E%ereise 1. Prove that the generalizedinverse of A is unique if and only if
for some y e /F". A = 0 orA is a square and nonsingular matrix. 0
E%ercise 4. With the assumptions of Theorem 7, prove that A% = 0 has the The following properties of generalized inverses follow readily from the
general solution % =(I - XA)y for any, e '''. In other words, if % satisfies (8) defining relations (1) and (2).
=
then 1",(1- XA) KerA. Then prove Theorem 7. 0
Proposition 2. If A e pMIJ and AI denotes one of its generalized inverses,
then
12.7 Generalized Inverses
(a) AlA and AA I are idempotent matrices.
(b) rank AI = rank A.
In the preceding section we introduced the notion of a one-sided inverse (c) the matrix (AI). is oneof the generalized inverses of A.
for A e fF"'x", and we showed that such an inverse exists if and only if the PROOF. Parts (a) and (c) follow immediately from the defining properties
matrix A is of full rank. Our purpose is now to definea generalizationof the (1) and (2) as applied to AI. Part (b) follows on applyingthe result of Exercise
concept of a one-sided inverse that will apply to any m x n matrix A. To, 3;8.5 to the same relations.
this end, we first observe that a one-sidedinverse X of A (if it exists)satisfies IJ
the equations If A e !F x" is a nonsingular matrix, so'is A -I and obviously

AXA=A, 1m A + Ker A-I = /FIJ + {OJ = ,IJ,


XAX=X. Ker A +1m A-I = {O} +,IJ = fFIJ.
The next result generalizes these statements for an arbitrary matrix A and
ThereCore we define a generalized inverse of A e ,,,,x,, to be a ma~ its generalized inverses.
X e fF"M '" for which Eq. (1) and (2) hold, and we write X = A' for such ~
matrix. Proposition 3. If A E fF",xlJ and AI denotes one of the generalized inverses
of A, then
Proposition 1. A generalized inverse AI eXistsforany matrix A e ,,,,X". 1m A + Ker AI = /F'" (5)
PROOF. Observe first that the m x n zero matrix has the n x m zero matrix: Ker A + Im AI = /F IJ. (6)
as its generalizedinverse.Then let r :# 0 be the rank of A. Then (see Section PROOF. It has been noted in Proposition 2(a) that is a projector and
AAI
2.7) it is clear that Im(AA I) c: 1m A. We show that, in fact, AAI projects onto
1m A. If y = Ax e Im A, then since A = AAIA, y = (AAIXA%) E Im(AAI).
A= R[Io 0Jv.
r
0 '
So 1m A c: Im(AAI) and hence Im(AAI) = Im A.
It is deduced from this, and Proposition 2(b), that
for some nonsingular matrices R e ,,,,x"', V e ,''x''. It is easily che:;:k~ rank A = rank(AAI) = rank AI,
that matrices of the form >

and tl1enthat Ker AI,,", Ker(AAI). For Ker AI c: Ker(AA I), obviously. But
AI = V-I[lr BI ]R- I , also, by Theorem 4.5.5
B 2 B 2B I dim(Ker AI) = m - rank: AI = m - rank(AAI) = dim Ker(AA~.
Cor any B I e ,rx(,,-r), B2 e ,(",-r)Mr, satisfythe conditions (1) and (2). Consequently, Ker AI = Ker(AAJ).
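The construction just given — A = R[I_r 0; 0 0]V together with the family A^I = V^{-1}[I_r B_1; B_2 B_2B_1]R^{-1} — is easy to verify numerically. A minimal sketch in Python/NumPy (random nonsingular R, V and random B_1, B_2 are illustrative choices):

import numpy as np
rng = np.random.default_rng(1)

m, n, r = 4, 3, 2
R = rng.standard_normal((m, m))            # nonsingular with probability 1
V = rng.standard_normal((n, n))
D = np.zeros((m, n)); D[:r, :r] = np.eye(r)
A = R @ D @ V                              # an m x n matrix of rank r

# One generalized inverse for every B1 (r x (m-r)) and B2 ((n-r) x r)
B1 = rng.standard_normal((r, m - r))
B2 = rng.standard_normal((n - r, r))
E = np.block([[np.eye(r), B1],
              [B2,        B2 @ B1]])
AI = np.linalg.inv(V) @ E @ np.linalg.inv(R)

print(np.allclose(A @ AI @ A, A))          # condition (1): AXA = A
print(np.allclose(AI @ A @ AI, AI))        # condition (2): XAX = X
print(np.linalg.matrix_rank(AI) == np.linalg.matrix_rank(A))   # cf. Proposition 2(b)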
Finally, combining these results with the fact that AAI is a projector it is Consequently, for some r x (m - r) matrix X and (lJ - r) x r matrix Y,
found that
'"

which is equation (5).


= Im(AA I) + Ker(AAI) = 1m A + Ker AI,
Now define AI by
91 = R Im[I~J ~= V-I Im[~J
Since the defining equations for AI are symmetric in A, AI we can replace
A, AI in (5) by AI, A, respectively, and obtain (6). . AI = V-I[IYr
-X ]R-I
-YX '
CoroUary 1. If AI is a generalized inverse of A, then AAI is a projector onto
1m A along Ker AI, and AlA is a projector onto 1m AI along Ker A. and it is.easily verified that AI is a generalized inverse of A with the required
properties.
Note also the following property ofthe generalized inverse of A that follows
immediately from the Corollary, and develops further the relationship with . This section is to be concluded with a result showing how any generalized
properties of the ordinary inverse. . mverse can be used to construct the orthogonal projector onto Im A' this
sho~d be compared with Corollary 1 above in which AAI is shown to' be a
Exercise 2. If A e p" n, prove that projector onto I~ A, but n~t necessarily an orthogonal projector, and also
with a construction of Section 5.8. The following exercise will be useful in
the proof. It concerns situations in which left or right cancellation ofmatrices
is permissible. ' ,
for any x E 1m AI and ye 1m A. In other words, considered as linear trans-
formations, A and AI are mutual inverses on the pairs of subspaces Im Al Exercise 3. Prove that whenever the matrix products exist,
and 1m A, respectively. "D
(a) A*AB = A*AC implies'AB = AC
Proposition 3 shows that a generalized inverse for an m x n matrix (b) BA*A = CA*A implies BA* = CA*.
determines a complementary subspace for Im A in fF'" and another com-
plementary subspace for Ker A in IF". The next resultsshows that a converse Hint. Use Exercise 5.3.6 to establish part (a) and then obtain part (b) from
(a). 0 .
statement also holds.
Proposition 4. Given a matrix A E ", x n, let 9'1' 9'z be subspaces of '" and Proposition 5. If A E '''''''' and (A *A)I is any generalized inverse of A *A,
then P = A(A *A}'A * is the orthogonal projector of m onto 1m A.
''', respectively, suchthat
PROOF. Equation (1) yields
ImA+91=fF'" and KerA +~ =".
Then there is a generalized inverse AI .of A such that Ker AI = 91
A*A(A*A)IA*A = A*A,
Im AI =~. and the results of Exercise 3 above show that

PROOF. Let A have rank r and write A in the form (3). It is easily seen that A(A*A)'A*A = A, A*A(A*A}'A* = A*. (8)
. Using either equation it is seen immediately that p z = P.
V(Ker A) = Ker[~ ~l R-I(lm A)"= Im[~ ~] Using Proposition 2(c) one may write
p* = AA*A)')*A* = A(A*AiA*,
and then, using (7), that where (A*AY is (possibly) another generalized inverse of A*A. It can be
shown that p* = P by proving (P - P*)(P - P*)* = O. Write I
    Im[I_r  0; 0  0] + R^{-1}(𝒮_1) = F^m,    Ker[I_r  0; 0  0] + V(𝒮_2) = F^n.

    (P - P*)(P - P*)* = PP* + P*P - P^2 - (P*)^2,
and use equation (8) (for either inverse) to show that the sum on the right reduces to zero.

Obviously, P is a projector of F^m into Im A. To show that the projection is onto Im A, let y ∈ Im A. Then y = Ax for some x ∈ F^n and, using the first equality in (8), we obtain

    y = A(A*A)^I A*(Ax) = Pz,

where z = Ax, and the result follows.  ∎

12.8 The Moore-Penrose Inverse

It has been shown in Propositions 12.7.3 and 12.7.4 that a generalized inverse of an m × n matrix determines and is determined by a pair of direct complements to Im A in F^m and Ker A in F^n. An important case arises when orthogonal complements are chosen. To ensure this, it suffices to require the projectors AA^I and A^I A to be Hermitian (see Section 5.8). Indeed, in this case AA^I is an orthogonal projector onto Im A along Ker A^I (Corollary 12.7.1) and then

    Im A ⊕ Ker A^I = F^m,    (1)

and similarly,

    Ker A ⊕ Im A^I = F^n.    (2)

Thus, we define the Moore-Penrose inverse† of A ∈ F^{m×n}, written A^+, as a generalized inverse of A satisfying the additional conditions

    (AA^I)* = AA^I,
    (A^I A)* = A^I A.

Obviously, the existence of an orthogonal complement to a subspace, and Proposition 12.7.4 together guarantee the existence of A^+ for any matrix A. It turns out, moreover, that the Moore-Penrose inverse is unique.

Proposition 1.  Let A ∈ F^{m×n}. There exists a unique matrix X ∈ F^{n×m} for which

    AXA = A,    (3)
    XAX = X,    (4)
    (AX)* = AX,    (5)
    (XA)* = XA.    (6)

† E. H. Moore, Bull. Amer. Math. Soc. 26 (1920), 394-395, and R. Penrose, Proc. Cambridge Phil. Soc. 51 (1955), 406-413.

PROOF.  Let X_1 and X_2 be two matrices satisfying the Eqs. (3) through (6). By Eqs. (4) and (5), we have

    X_i* = X_i*A*X_i* = AX_iX_i*,    i = 1, 2,

and hence X_1* - X_2* = A(X_1X_1* - X_2X_2*). Thus

    Im(X_1* - X_2*) ⊆ Im A.    (7)

On the other hand, using Eqs. (3) and (6), it is found that

    A* = A*X_i*A* = X_iAA*,    i = 1, 2,

which implies (X_1 - X_2)AA* = 0, or AA*(X_1* - X_2*) = 0. Hence

    Im(X_1* - X_2*) ⊆ Ker AA* = Ker A*

(by Exercise 5.3.6). But Ker A* is orthogonal to Im A (see Exercise 5.1.7) and therefore it follows from Eq. (7) that X_1* - X_2* = 0.  ∎

It was pointed out in Exercise 5.1.7 that for any A ∈ F^{m×n},

    Im A ⊕ Ker A* = F^m,
    Ker A ⊕ Im A* = F^n.

Comparing this with Eqs. (1) and (2), the following proposition is obtained immediately.

Proposition 2.  For any matrix A,

    Ker A^+ = Ker A*,    Im A^+ = Im A*.

A straightforward computation can be applied to determine A^+ from A. Let r = rank A and let

    A = FR*,    F ∈ F^{m×r},  R* ∈ F^{r×n},    (8)

be a rank decomposition of A, that is, rank F = rank R* = r. Noting that the r × r matrices F*F and R*R are of full rank, and therefore invertible, we define the n × m matrix

    X = R(R*R)^{-1}(F*F)^{-1}F*.    (9)

It is easy to check that this is the Moore-Penrose inverse of A, that is, X = A^+. Note that the linearly independent columns of A can be chosen to be the columns of the matrix F in Eq. (8). This has the useful consequences of the next exercise.

Exercise 1.  Let A be m × n and have full rank. Show that if m ≤ n, then A^+ = A*(AA*)^{-1} and if m ≥ n, then A^+ = (A*A)^{-1}A*.  □
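Formula (9) and the full-rank formulas of Exercise 1 can be checked against a library routine. A minimal sketch in Python/NumPy (numpy.linalg.pinv returns the Moore-Penrose inverse; the random matrices are illustrative):

import numpy as np
rng = np.random.default_rng(2)

# A rank decomposition A = F R*, with F (m x r) and R* (r x n) of full rank r
F = rng.standard_normal((5, 2))
Rs = rng.standard_normal((2, 4))           # plays the role of R*
A = F @ Rs
R = Rs.conj().T

# Formula (9):  A+ = R (R*R)^{-1} (F*F)^{-1} F*
Aplus = R @ np.linalg.inv(Rs @ R) @ np.linalg.inv(F.conj().T @ F) @ F.conj().T
print(np.allclose(Aplus, np.linalg.pinv(A)))     # agrees with the library value

# Full-rank special cases (Exercise 1)
B = rng.standard_normal((3, 5))                  # m <= n, full rank
print(np.allclose(np.linalg.pinv(B), B.T @ np.linalg.inv(B @ B.T)))
C = rng.standard_normal((5, 3))                  # m >= n, full rank
print(np.allclose(np.linalg.pinv(C), np.linalg.inv(C.T @ C) @ C.T))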
Representation (9) can also be used in proving the following properties of The bases constructed in this way, are called the 'singular bases of A. It
the Moore-Penrose inverse. turns out that the singular bases allow us to gain deeper insight into the
structure ofMoore-Penrose inverses.
Proposition 3. Let A + be the Moore-Penrose inverse of A. Then
(a) (A*)+ = (A+)*; Proposi~on 4. Let A ep"n and let Sl ~ S2 ~ ... ~s,> O=S,+I = ... =Sn
be the smgular values of A. Then s1 1 si\ ... s,- 1 are the nonzero singular
(b) (aA)+ = a-IA+ for any a e 9'. a:#: 0;
values of .1'1.+. Moreover. if {X/}~"I and bar'=1 are singular bases of A. then
(c) If A is normal(Hermitian. positive semi-definite). then so is A +; {YIlr'= I, and {xll~=I are singular bases of A + that is.
(d) For unitary U. V.
(UAV)+ = V*A+U*. .1'1.+ -i s/ IXI
Y, - 0
if i= 1.2... r,
if' ' 1'=r+ ..... m. (12)
PROOF. Let us illustrate the method used hereon the proof of the assertion
(d).
(A+)*x =fsjlYI if i=I.2. r, (13)
If A = FR* is a rank decomposition of A. then (UF)(R*V) is a rank
/ lO if i = r + 1... n.
decomposition for UAV. Hence. using Eq. (9) and the unitary properties of PROOF. Consider the matrix (.1'1.+)*.1'1.+ and observe that. by Exercise 2(e)
and Proposition 2.
U and V. we have
(UAV)+ = (V*R)(R*VV*Rrl(F*UU*F)-I(F*U*) = V*A+U*.
Kerf1+)*A+) = KerAA*V) = Ker AA*.
and each subspace has dimension m - r. Thus A + has m - r zero singular
Additional properties of A + are listed in the next exercises. values and the bases {y j}j= I and {XI}~= I can be used as singular bases of A +.
Exercise 2. Prove that Furthermore. applying A + to Eqs, (10) and using the first result of Exercise 3.
we obtain, for x/elm A*. i = 1.2... n,
(a) 0+ = OT;
(b) (A+)+ = A; X, == .1'1'+.
. 1'1. XI = {S/A+Y/ . i=I.2
if
. r,
,
(c) (.1'1..1'1.*)+ = (A+)*A+ and (.1'1.*.1'1.)+ =A+(A+)*; . o IfI = r + 1... n.
(d) (.1'1.1)+ = (.1'1.+)" for any positive integer k and a normal matnx A. which implies (12). Clearly. si- I (i = 1.2, ... r) are the nonzero singular
values of A+ . "
Exercise 3. Show that Now rewrite the first relation in Exercise 3 for the matrix .1'1.*:
A+Ax'=x. .1'1..1'1.+, = 1. (.1'1.*)+ .1'1.*% = %. (14)
,I
for any x e Im A*. 1 E 1m A. for any % e Im A. and apply (A +)* = (.1'1.*)+ to (11). Then. since
YIo ... , Y, belong to Im A. we have
Hint. Use Exercise 12.7.2 and Proposition 2. 0
- (.1'1.+)*.1'1.* _ fs,{A+)*XI if i = 1.2... r,
Now recall that in the proofs of Theorems 5.7.1 and 5.7.2. we introduced Y/- YI-lo
if i = r + 1... , n,
a pair of orthonormal eigenbases {XI' X2 xn} and {YI. 12 . '. . 1m} of
the matrices .1'1.*.1'1. and AA*. respectively. such that and the result in (13) follows.
S11 1 if i = 1.2..... r. Exercise 4. If .1'1.= UDv* .as in Theorem 5.7.2. and det D '" O. show that
AXI = { 0 I' f 'l=r + 1..... n.
A+ = VD-IU*. 0 .
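Exercise 4 also suggests the standard numerical route to A^+: compute a singular value decomposition and invert the singular values. A minimal sketch in Python/NumPy (here A has full rank, so all singular values are inverted; for a rank-deficient A only the nonzero ones would be):

import numpy as np
rng = np.random.default_rng(3)

A = rng.standard_normal((4, 3))                   # full rank, so det D != 0 in A = U D V*
U, s, Vh = np.linalg.svd(A, full_matrices=False)  # A = U @ diag(s) @ Vh
Aplus = Vh.conj().T @ np.diag(1.0 / s) @ U.conj().T   # A+ = V D^{-1} U*
print(np.allclose(Aplus, np.linalg.pinv(A)))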

where it is assumed that the nonzero singular values Sl (l SiS r) of A a~~


12.9 The Best Approximate Solution of the Equation Ax = b
associated with the eigenvectors x.. X2 ..... x,. Note that since ~*Axi ='
sf XI for i = 1.2... , n, the multiplication of (10) by .1'1.* on the left yields)
The Moore-Penrose inverse has a beautiful application in the study of the
SIXI if i = 1. 2... , r equation
.1'1.*1/ = { 0 if i = r + 1..... m. Ax= b. (1)

where A e,"'''ltand 6 e,ItI, and we wish to solvefor xe". We have seent It remains only to show that IIx111 ;;:: lIotoll for any vector .It E:Ii" minimiz-
that there is a solution of Eq. (1) if and only if 6 e 1mA. In general, the ing !lAx - bll. But then, substituting XI instead of x in Eq. (3), !IAxI - bll ==
solution set is a "manifold" of vectors obtained by a "shift" of Ker A, as lIAxo - b II yields A(x I - xo) == 0, andhence x I - X o e Ker A. According to
described in Theorem 3.10.2. Using Exercise 12.8.3, we see that the vector; Proposition 12.8.2, Xo = A + b e 1m A'" and, since Ker A .1 1m A"', it follows
Xo == A + b is one of the solutions in this manifold: that xo.l (Xt - x o)' Applying the Pythagorean theorem again, we finally
obtain
Axo == AA+b == b,
IIxl1l1 = IIxol12 + IIxI - xoll 2 ~ II xoll z.
and it is the only solution of Eq. (1) if and only if A is nonsingular, and then
A+ = A-I. The proof of the theorem is complete.
Also, a vector x e [Fit is a solution of Bq, (1) if and only if IIAx - 611 == 0 -, Note that the uniquenessof the Moore-Penrose inverse yieldsthe unique-
for any vector norm II II in !F m . nessof the best approximate solution of Eq. (1).
Consider the case b,; 1m A. Now there are no solutions of Eq. (1) but,
keeping in mind the preceding remark, wecan definean approximate solutio~ Exercise 1. Check that the best approximate solution of the equation
of Ax == b to be a vector Xo for which IIAx - 611 is minimized: .; Ax = b, where

..,
IIAxo - 611 == min. IIAx - "" \
\
in the Euclidean vector norm II II. (Then Xo is also known as a least squar~i,
solution of the inconsistent, or overdetermined, system Ax = b. Note that ifxo
is such a vector, so is any vector of the form Xo +" where" e KerA.) .:~.
isgivenby the vector Xo = -h[l lO]T. Find the best approximate solution of
Now for any A e p"" and any be fFtII, the best approximate solution Of Bx = e if B = AT and e = [1 l]T.
Eq. (1) is a vector Xo e'" of the least euclidean length satisfying Eq. (2). Bxercise 2. Generalize the idea of the" best approximate solution" for the \
We are going to show that Xo == A + b is such a vector.Thus, if b e 1m A, then matrix equation AX = B, where A e C"''''', X e C....t , and Be C.... t , and
A + b will be the vector of least norm in the manifold of solutions of Eq. (1~ obtain the formula X 0 = A:I" B for this solution.
(or that equation's unique solution), and if b 1m A, then A + b is the uniqu~
Exercise 3.. Show. that an approximate solution of Ax = b is the best
vector which attains the minimumof (2), and also has minimum norm. approximate solution of the equation if and only if it belongsto the image
Theorem 1. The vector %0 = A + b is the best approximate solution of of A.... 0
Ax == b. If the singular basis {XI' x 2 , , x,,} of A e fFm .... in the linear space ,"
PROOF. For any x e , " write and the nonzero singular values s I' Sz, ... , s; of A are known, then an explicit
form of the best approximate solution of Eq. (1) can be given.
Ax - b == A(x - A+b) + (1 - AA+)( -b),
t Theorem 2. With the notation of the preceding paragraph, the best approxi-
and observe that, because 1 - AA + is the orthogonal projector onto (1m A),. mate solution of Ax = b is given by the formula
(ref. Corollary 12.7.1), the summands on the right are mutually orthogonij
vectors: (4)
(1 - AA +)( -b) e (1m A).L. where

Thus, the extension of the Pythagorean theorem implies IXI = (A"'b,1 XI) , i == 1,2, ... , r. (5)
Sl
IIAx - bllz == IIA(x - A+b)1I 1 + 11(1- AA+)(-b)II
Z

'~1 .~
PROOF. In Eq. (5) and in this proof we usethe notation ( , )forthestandard
== IIA(x - xo)1I 2 + IIAxo _ 1111 2
inner product on IF", or on IF". Thus, (u, e) = v"'u. Let Yt, 11"'" 1m be
and, consequently, the relation (2).
."
i
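The two minimality properties of x_0 = A^+b established above are easy to observe numerically. A minimal sketch in Python/NumPy (the inconsistent system below is an illustrative choice; numpy.linalg.lstsq is used only for comparison):

import numpy as np

# Inconsistent system: b is not in Im A, so Ax = b has no exact solution
A = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
b = np.array([1.0, 3.0, 2.0])

x0 = np.linalg.pinv(A) @ b           # x0 = A+ b, the best approximate solution
# x0 minimizes ||Ax - b|| and, among all minimizers, has the smallest norm
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x0, x_ls)                       # both give [2., 2.]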
an orthonormal eigenbasis of AA* in'''' (the second singular basis of A). for any X E ,'" X". Prove that the matrix representations of T, and T-
Write the representation of II with respect to this basis: .h h 1 2
Wit respect to t e standard basis [E/J]7:i':.t for "'11<" (see Exercise
3.S.1(c are the matrices
= P1YI + pz Yz + ... + PIIlY""
II
and note that P, = (b, yj), i = 1, 2, ... ,m. Then, using Eq. (12.8.12), we and (I,, A) + (B T I,J,
find that '
respectively.
'" ,
Xo = A+b = L fJiA+Yi = LfJiS,lXi. S. If A e [F"'x"" B e [F"x", and II II denotes either the spectral norm or the
i=l i=l norm of Exercise 10.4.9, prove that
In view ofEq. (12.8.10),
IIA BII = IIAIIIIBII.
i = 1,2, ... , r, Hint. For the spectral norm use Exercise 10.4.5. Compute row sums of
the absolute values of the elements to prove the equality for the second
and therefore norm.

PiS,l = ~ (b, Axj) = 1:z (A"'b, x,), 6. Let A e [F"'x", and C e [F"XtI be invertible matrices. Prove that if
BE [F'" lltl is left (respectively, right) invertible, so is the matrix ABC.
Si Si

for i = 1, 2, ... , r. This completes the proof. 7. Let A e [F"'lltl be a left (or right) invertible matrix. Prove that there
exists Il > 0 such that every matrix B e fF'" ll" satisfying the condition
IIA - BII < Il also is left (or right) invertible. .
In other words, one-sided invertibility is "stable" under small per-
turbations.
12.10 Miscellaneous Exercises
Hint. Pick 8 < II Ai' 111- 1 for the left invertibility case and show that
(I + Cr i AL1B = I, where C = AL'I(B - A).
I. Let Ae,,,,xl,andBe[F"x". S.Let A e [F"'x" Be [F'x" (r ~ min(m, n be matrices offullrank. Prove
, that ,
(a) Prove Proposition 12.1.4 for rectangular matrices ofappropriat~
sizes. , . (AB)+ = B+ A+.
(b) Show that there are permutation matrices PI and P:z such th~!;
P l(A B)p2 = B A. (This is a generalization of Proposition 12.1.3.), Find an example of two matrices A and B such that this relation does
not hold.
2. If {X1o X2"'" x",} is a basis in [F'" and {,1o Y2'" ~, ),,,} is a basis in,
prove that the Kronecker products yJ Xl' 1 s j s n, 1 s i ~ Hint. Use Bq, (12.8.9) with A = F, B = R*.
constitute a basis in [F"'x". 9. If A e [F" X" and A = HU = U IH 1 are the polar decompositions of A,
3. Prove that the Kronecker product of any two square unitary (Hermitian, show that A+ = U"'H+ = HtUr are those of A+.
positive definite) matrices is a unitary (Hermitian, positive definit~J. Hint. Use Proposition 12.8.3d.
matrix.
10. Prove 'that a matrix X is the Moore-Penrose inverse of A if and only if
4. Let A e fF"'x"', Be [F"x" be fixed and consider the transformations 1i.
72 e 9'([F"'X") defined by the rules XAA'" = A'" and X=BA'"
Ti(X) = AXB, 72(X) = AX + XB, for some matrix B.

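The matrix representations named in Exercise 4 can be verified with the column-stacking vec operation; identifying the ordering of the standard basis with column stacking is an assumption of this sketch (Python/NumPy):

import numpy as np
rng = np.random.default_rng(4)

m, n = 3, 2
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
X = rng.standard_normal((m, n))

vec = lambda M: M.flatten(order='F')          # stack the columns of M

# T1(X) = AXB  corresponds to  B^T kron A
print(np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X)))

# T2(X) = AX + XB  corresponds to  (I_n kron A) + (B^T kron I_m)
M = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))
print(np.allclose(vec(A @ X + X @ B), M @ vec(X)))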

11. Verify that the matrices [ ⋯ ] and [ ⋯ ] are mutual Moore-Penrose inverses, each of the other.

12. Consider the permutation matrix P of Proposition 12.1.3. Show that when m = n, P is a symmetric matrix. In this case, show also that A ∈ F^{n×n} is a symmetric matrix if and only if vec A = P(vec A).

13. (a) Generalize Exercise 12.1.3(b) to admit rectangular A and B.
    (b) Let A ∈ C^{m×n} and rank A = r. Note that the equation AXA = A (Eq. (12.7.1)) can be written in the form (A^T ⊗ A) vec X = vec A. Find the rank of A^T ⊗ A. What can be said about the number of linearly independent solutions of AXA = A?

14. Show by example that, in general, (AB)^+ ≠ B^+A^+.

15. If B^+ is the Moore-Penrose inverse of B and A = [B; B], show that A^+ = ½[B^+  B^+].

CHAPTER 13

Stability Problems

In Section 12.3 we defined a stable matrix. Matrices of this kind are of particular significance in the study of differential equations, as demonstrated by the following proposition: The matrix A is stable if and only if, for every solution vector x(t) of ẋ = Ax, we have x(t) → 0 as t → ∞. This result is evident when the solutions x(t) = e^{At}x_0 of the equation ẋ = Ax (Theorem 9.10.3) are examined, as in the proof of Theorem 12.3.3 (see also Exercise 10.9.14).

In terms of inertia, a matrix A ∈ C^{n×n} is stable if and only if its inertia (see Section 5.5) is (0, n, 0). The Lyapunov method, presented in this chapter, expresses the inertia of any given matrix in terms of the inertia of a related Hermitian matrix and is of great theoretical importance.

In particular, the Lyapunov theory provides a criterion for a matrix to be stable that avoids a direct calculation of the characteristic polynomial of A and its zeros. Moreover, given a scalar polynomial with real coefficients, the distribution of its zeros with respect to the imaginary axis (the Routh-Hurwitz problem) will be obtained by use of the method of Lyapunov. Also, a solution of the Schur-Cohn problem of zero location with respect to the unit circle, as well as other problems concerning the location of zeros of a real polynomial, will be discussed.

13.1 The Lyapunov Stability Theory and Its Extensions

To illustrate Lyapunov's approach to stability theory, we shall be concerned in the beginning of this section with quadratic forms. Denote by v the quadratic form whose matrix is a real and symmetric matrix V (see Section


5.10), and similarly for w and W. The positive definiteness of V (or of W) will be expressed by writing v > 0 (or w > 0), and similarly for positive semi-definite forms.

Let A ∈ R^{n×n} and consider the differential equation

    ẋ(t) = Ax(t).    (1)

Evaluate a quadratic form v at a solution x(t) of Eq. (1), where x(t) ∈ R^n for each t. We have v(x) = x^T Vx and hence

    v̇(x) = ẋ^T Vx + x^T Vẋ = x^T(A^T V + VA)x.

If we write

    A^T V + VA = -W,    (2)

then clearly W is real and symmetric and v̇(x) = -w(x), where w(x) = x^T Wx.

Lyapunov noted that, given a positive definite form w, the stability of A can be characterized by the existence of a positive definite solution matrix V for Eq. (2), for then v(x) is a viable measure of the size of x. Thus the equation ẋ = Ax has the property that lim_{t→∞} x(t) = 0 for every solution vector x(t) if and only if we can find positive definite forms w and v such that v̇(x) = -w(x). Furthermore, the positive definite matrix W of Eq. (2) can be chosen arbitrarily, so a natural choice is often W = I.

To see the significance of this heuristically, let A be 2 × 2 and consider a solution

    x(t) = [x_1(t); x_2(t)],

in which x_1(t) and x_2(t) are real-valued functions of t. The level curves for a positive definite form v (see Exercise 5.10.6) are ellipses in the x_1, x_2 plane (Fig. 1), and if x_0 = x(t_0) we locate a point on the ellipse v(x) = v(x_0) corresponding to x(t_0). Now consider the continuous path (trajectory) of the solution x(t) in a neighbourhood of (x_1(t_0), x_2(t_0)). If w > 0 then v̇ < 0, and the path must go inside the ellipse v(x) = v(x_0) as t increases. Furthermore, the path must come within an arbitrarily small distance of the origin for sufficiently large t.

Returning to a more precise and general discussion, let us now state an algebraic version of the classical Lyapunov theorem. More general inertia theorems are to be proved later in this section, as well as a more general stability theorem (Theorem 5).

[Fig. 13.1  The trajectory for a solution when A is stable.]

Theorem 1 (A. M. Lyapunov†).  Let A, W ∈ C^{n×n} and let W be positive definite.
(a) If A is stable then the equation

    AH + HA* = W    (3)

has a unique solution matrix H and H is negative definite.
(b) If there is a negative definite matrix H satisfying Eq. (3), then A is stable.

PROOF.  (a) Let A be stable. Then A and -A* obviously have no eigenvalues in common and therefore Theorem 12.3.2 implies the solvability and uniqueness of the solution of Eq. (3). We may use Theorem 12.3.3 to write the solution H of Eq. (3) explicitly in the form

    H = -∫_0^∞ e^{At} W e^{A*t} dt.    (4)

† Problème général de la stabilité du mouvement, Ann. Math. Studies, no. 17, 1947, Princeton Univ. (Translation of an 1897 original.)

To show that H < 0, compute In numerical practice, we can decide whether a given real matrix A is
stablebysolvingforHintheequationAH + HA* = l(orAH + HA* = W,
x"'Hx = - LOll (x"'eA')W(x"'eA')'" dt, for some other choice of W) and then applying the result of Exercise 8.5.2 to
decide whether H is negative definite. Thus, A is stable if and only if the
and observe that the positive definiteness of Wand the nonsingularity of eA, leading principal minors dl , d1 ; , d" of H satisfy the conditions
for all t yield the positivity of the integrand in Eq. (4). Hence x'" H x < 0 for
all nonzero x e en and thus H < O. dl < 0 and d1dl+ 1 < 0, i = 1, 2, ... , n - 1.
(b) Conversely, let A be an eigenvalue of A"', where A"'x = AX with If the equation AH + HAlle = 1 can be solved to give the coefficients of H
x #:.0. Then we also have x~ = h*. Multiplying Eq. (3) on the left and right 'explicitly in terms of the elements of A, then we obtain a direct determinantal
by J;'" and .t, respectively, we obtain characterization of stable matrices. For general matrices this is not a practical
proposition, but for relatively simple classes of matrices, explicit stability
(1 + A)x*Hx = x*Wx. (5)
criteria may be obtained in this way. In subsequent sections we develop this
Our hypothesis implies that x"'Hx < 0 and also x*Wx > 0, so we have possibility in the particularly important case in which A is defined directly
1 + A < 0, that is, fJle A < O. Since this is true for each eigenvalue A, it by its characteristic' polynomial, for example, when A is a companion
follows that A is stable. matrix. Here we keep the hypothesis that W is positive definite in Eq. (3),
l'lote that, in fact, a slightly stronger statement than part (b) ofthe theorem and we seek some connection between the inertia of A and that of H (where
has been proved. If H isnegative definite and satisfies the Lyapunov equation H'" .. H), which should reduce to thc hypothcsis ofTheorem 1 whcn A is
AH + HA'" = W, then the stability of A can be deduced if x*Wx < 0 for stable.
just one eigenvector x associated with each eigenvalue of A"'. Thus, if A has We now consider a generalization of Lyapunov's theorem due to M. G.
multiple eigenvalues, then W need not be definite on the whole space cn~ Krein t in the USSR and, independently, to A. Ostrowski and H. Schneider!
in the West. .
EXlll'ei.e 1. Verify that if a> 0 then both eigenvalues of
Theorem 2. Let A e e""".1! H is a Hermitian matrix suchthat
A=[~ -~] AH+HA*=W, W>O, (6)

have positive real parts. Show that the matrix


then H is nonsingular and
v(A) = v(H), cS(A) = cS(H) = O.
+ a -2a]
- [4 -2a
1 n(A) = 1I:(H), (7)
H_
4+a 1 Conversely, if cS(A) = 0, then there exists a Hermitian matrix H such that
Eqs. (6) and (7) hold.
is positive definite, and so is AH + HAT.
PROOF. Let Eq. (6) hold. Proceeding as in the proof of Theorem 1(b), we
Ex,reise 2. Check that the Matrix .obtain Eq, (5), which shows that W > 0 yields A + A =F O. Thus A has no
A = P(S - Q), pure imaginary eigenvalues and therefore In A = {p, n - p, O} for some
integer p ~ n. A Jordan form J of A can then be written in the form
where P and Qare positive definite, is stable for any skew-Hermitian matrix S.
Hint. Compute A"'P- I + P-IA and use Theorem 1. P-IAP = J = [~ ~J (8)
Exereise 3. Show that if fJte A = !(A + A lIe) is negative definite then A is
stable. Show that the converse does not hold by considering the matrix 'Stabiliiy of solutions of differential equations in Banach space, by Ju, L. Daleckii and
M. G. Krein, Amer. Math. Soc.,Providence, 1974. Theorem 2 seems to have been included
A = [-o I],
-8
in the lectures of M. G. Krein for some years before first appearing in print in a restricted
edition (in Russian) of an early (1964) version of the 1974 edition.
* J. Math. Anal. Appl. 4 (1962), 72-84. See also O. Taussky, J. Soc. Indust. Appl. Math. 9
where 8 > 0 is small. 0 (1961). 640-643.
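The numerical procedure just described — solve AH + HA* = W for a convenient positive definite W and test the definiteness of H — can be carried out with standard software. A minimal sketch in Python/SciPy, assuming scipy.linalg.solve_continuous_lyapunov, which solves AX + XA* = Q:

import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[-1.0,  2.0],
              [ 0.0, -3.0]])                  # eigenvalues -1, -3: A is stable

# Solve AH + HA* = W with the convenient choice W = I
H = solve_continuous_lyapunov(A, np.eye(2))
print(np.all(np.linalg.eigvalsh(H) < 0))      # H is negative definite, so A is stable

# Equivalent test via the leading principal minors d1, d2 of H
d1, d2 = H[0, 0], np.linalg.det(H)
print(d1 < 0 and d1 * d2 < 0)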

where J t e C''''P, J z eC(n-p) l ,, - I'), and In J 1 ;:; {p, 0,O}, In J 2 = In contrast to Theorem 2 the next results are concerned with the case when
AH + HA == W is merely a positive semi-definite matrix. First, let us
{O, n - p, OJ. Define R ... P-tH(P*)-t and rewrite Eq. (6) in the form
agree to write In A ~ In B for any two n x n matrices A and B such that
JR + RJ* = Wo,
7t{A) S n(B),v(A) S v(B).
where Wo = P- 1W(p)-1
> O. The partitions of Rand Wo corresponding
to that of J in Eq. (8) give Clearly, in this case c5(A) ~ c5(B).

J 0
[ 01 J 2
][RRf RR
1 2]
3
[R 1
+ Rf
Proposition 1. Let A E C""" and S(A) = O. If H is Hermitian and
AH+HA*= W~O, (10)
and hence
then In H S In A.

where WI > 0, W3 > O. By Theorem 1 the matrices R1 E cP l(I' and - R3 E PROOF. By Theorem 2, there exists a Hermitian matrix H 0 such that
are positive definite. It remains to show that, in this case,
Cl,,-p)l"-p)
AH o + HoA* > 0 and In A = In Ho

InH = In [:j ::] = {p,n - p,Ol = InA. Putting H. = H + aHo, we observe that
AH, + H.A == (AH + HA) + e(AH o + HoA) > 0,
This follows from the easily verified congruence relation
for all a > O. Thus, using Theorem 2 again, we find that for any B > 0,

QRQ == [~1 -RrRi~R2 + RJ In A = In H. (11)


where =
In particular, the last equality implies t5(A) t5(H.) = 0, and hence the

Q == [~ -Rr Rz
}
matrices A and H. have no eigenvalues on the imaginary axis. In particular,
they are nonsingular, The continuity ofthe (real) eigenvalues of H. as func-
or
tions of 8 (see Section 10.5) then implies that none them can, in the limit,
"jump" throush the origin. Hence the only possibility is that, as 8 -+ 0, one
Conversely, suppose that deAl == 0, so tbat J has a Jordan form as de-
or more eigenvalues of H. become zero as H. approaches H. The result now .
scribed in Eq. (8). A Hermitian matrix H is to be constructed so that Eqs, follows from Eq. (11):
(6) and (7) are valid. To this end, apply Theorem 1, which ensures the exist~
enee of negative definite matrices HI e CPl(P, Hz e 0"-1'1"("-1') such that In A = In H. ~ In H.
-J1 H 1 - HtJ! > 0, . Note that if A is stable, then certainly t5(A) = 0 and the proposition shows
J 2Hz + HzJf > O. that H is semidefinite, a result that has already been seen in Exercise 12.3.1.
Now it is easily checked by use of Sylvester's law of inertia that the matrix Proposition 2. If A E e""" then, for any nonsingular Hermitian matrix
HE cn"" such that Eq. (10) holds,
H== p[-Ho Hz0]p* 1

inA S InB. (12)


is Hermitian and satisfies condition (6). It remains to recall that H 1 < 0,
H 2 < 0, and hence PROOF. Denote A. = A + aH- 1 for B real. Then the matrix
In H == In{ -HI) + In Hz = {p, n - p,O} = In A. A.H + HA: = (AB + HA) + 2cl

ispositive definite foreach > O. Hence Theorem 2 implies In A, = In H . It follows that x*W = OT and, as noted before, this implies that x*B = OT.
and, in particular, c5(AJ = 0 for all 6 > O. The continuity of the eigenvalues Hence
of A. as functions of 6 now implies that the inertia of A can differfrom that of
x*AH = x*W - x*HA* = OT. (14)
A. only by the number of eigenvalues that become pure imaginary in the
limit as 6 -+ O. Thus, Now we have
InH=lnA.~lnA, x*ABB*A*x S x*AWA*x = x*(A 2HA* + AH(A*)2)X = 0,
and the proposition is established. usingEq, (14) twice. Thus, x*AB = OT. Repeating this procedure, it is found
Combining the results of Propositions 1 and 2, we obtain a generalization that x*A"B = OT for k = 0, 1,.. , n - 1. Therefore we have found an x "!: 0
of Theorem 2 in which W is semi-definite. such that
Theorem 3 (D. Carlson and H. Schneidert), If A E Cft)(ft,.5(A) = 0, and the x*[B AB . . . A"- 1B] = OT,
Hermitian nonsingular matrix HeCft"ft satisfies the condition AH + HA*
= W~. 0, then In A = InH. . and the pair (A, B) failsto becontrollable. Thisshows that, actually, det H "!: 0
and hence the requirements of Theorem 3 are fulfilled.
When A, H, and Ware related as in Eq. (10),with H = H* and W positive
semi-definite, the equality of In A and In H is achieved at the expenseof the It is clear that if (A, W) is controllable, then we may take B = W1/2 (the
further hypotheses that c5(A) = 0 and H is nonsingular. A useful sufficient unique positive semi-definite square root of W) and obtain the next conclu-
condition for these latter hypotheses, and hence for equality of the inertias, sion.
involves the idea of controllability of a matrix pair (see Section 5.13). Recall Coronary 1 (c. T. Chen"; H. K. Wimmerf ) . If AH + HA* = W,
that an n x n matrix A and an n x m matrix B together are controllable if H = H*, W ~ 0, and (A, W) is controllable, then c5(A) = 6(H) = 0 and
and only if InA = InH.
rank[B AB . . . A"-I B] = n. (13)
Now a useful generalization of the classical stability theorem is readily
Theorem 4. If AH + HA* = W ~ BB*, where H* =H and (A,B)is
proved.
controllable, then c5(A) = c5(H) = 0 and In A == In H.
PROOF. We shall show that the controllability of (A, W) yields.5(A) = 0 and Theorem 5. Let A e C'""n with (A, B) controllable, and let W be positive
det H "!: 0, so that the required assertion follows from Theorem 3. semi-definite, with
Suppose on the contrary that c5(A) "!: 0; hence there is a nonzero vector W ~ BB* ~ O.
x e e" such that x*A = irzx* and ex is real. Then A *x = - icxx and
Then the conclusions(a) and (b) of Theorem 1 hold.
x*Wx = x*(AH + HA*)x = (-iex + icx)x*Hx = O.
PROOF. For conclusion (a) follow the proof of Theorem 1 and use Lemma
Since W ~ BB* ~ 0, it follows that 0 = x*Wx ~ IIB*xIl 2, that is, 9.11.2 together with the controllability of (A, B) to establish the definite
=
x*B OT. But then the controllability condition is violated, since property of H.
x*[B AB ... A"-IB] = [x*B iexx*B ... (iexr- 1x*B] =OT, Conclusion (b) is a special case of Theorem 4.
for x"!: O. and hence rank[B AB ... A"-IB] < n. So we must have We conclude by noting that the inertia we have been using is often referred
c5(A) = O. to as the inertia of a matrix with respect to the imaginary axis. In contrast,
Assume now that H is singular. Then there is a vector x "!: 0 such that the triple {x'(A), v'(A), .5'(A)}, where x'(A) (respectively, v'(A) and c5'(A
Hx=Oand
x*Wx = x*(AH + HA*)x = O.
t SIAM J. Appl. Math. 15(1973).158-161.
t J. Math. Anal. Appl. 6 (1963),430-446. t U""or A~q. Appl. 8 (1974).337-343.

denotes the number of eigenvalues (counted with their multiplicities) with 13.2 Stability with Respec:;t to the Unit Circle
positive (respectively, negative and zero) imaginary parts, is said to be the
inertia ofA with respect to the real axis. Note that obviously
We have seen that the stability problem for a linear differential equation
{1t(iA), v(iA), c5(iA)} = {v'(A),1t'(A), c5'(A)}. with constant coefficients is closely related to the location of the eigenvalues
of a matrix with respect to the imaginary axis. An analogous role for linear
Exercise 4. Check that the n x n Schwarzmatrix difference equations is played by the location of eigenvalues relative to the
unit circle.
0 1 0 0 To illustrate these ideas,let A e C"XII and a nonzero Xo e CD be given, and
-ell 0 1 consider the simplest difference equation:
S= 0 -Cll - l 0 (IS)
0 1 XJ e C", j = I, 2, .... (1)

0 o -C2 ...,..Cl This equation is a difference analogue of the differential equation (13.1.1),
but its solution is much more simple: xJ = AJx o for j =: 0, I, .... We say
satisfies the Lyapunov equation that the matrix A in Eq. (1) is stable (with respect to the unit circle) if the

where H = diag[clc2'" c", C2C3'" c",


STH +HS= W,

, C"-lC", cJ and
solution sequence xo, Xh ,x" .., of (1) converges to zero for any choice of
the initial vector xo' It is easily seen that this occurs if and only if AJ -+ 0 as
j -+ 00, and this occurs (see Exercise 11.1.8) if and only if the spectral radius of
A is less than 1. Thus, a matrix A is stable(withrespeci to the unit circle) ifand
I
I'

W =: diag[O; 0, - ic:].
Exercise 5. If Ch C2' , e, in (15) are nonzero real numbers, show that
the pair of matrices (W, S") is controllable and hence
only if all its eigenvalues 410 42'... , 411 lie inside the unit circle, that is, l.tll < I
for i = I, 2, ... , n.
- In this section we develop a method analogous to that of Lyapunov for
determining the number of eigenvalues of a matrix (counted with their
multiplicities) lying inside and outside the unit circle and on its circum-
I
7E{S) =: n - k, v(S) = k, c5(S) = 0, ference. It turns out that this problem is closely related to the existence of
'. positive definite solutions H of a Stein equation H - A"'HA = V for some
where k is the number of positive terms in the sequence positive definite V.

Theorem 1. Let A, V e C" x D and let V be positive definite.


Exercise 6. (Sylvester). If A is an n x n positive definite matrix and . (a) If A is stablewith respectto the unit circle, then the equation
He C"X" is Hermitian, show that In AH = In H.
H-A"'HA= V (2)
Exercise 7. (A generalization of Exercise 6 by H. Wielandt). If A e C"X"i
918 A is positive definite, and He C"XII is Hermitian, show' that In AH = hasa uniquesolution H, and H is positivedefinite.
InH. (b) If there is a positive definite matrix H satisfying Eq. (2), then A is
. f.
stablewith respect to the unit circle.
Exercise 8. (W. Habn f ) . Show that if Ole A s 0 and (A,91.e A) is 'l!.-
controllable pair, then A is stable (see Exercise 3). PROOF. First suppose that A is stable and let Al' A2'... ,A" denote the eigen-
values of A, so that IAII < 1 for i = I, 2, ... , n. Then (see Exercise 9.4.3)
Exercise 9. Show that if W1 ~ "Vz ~ 0, then Ker ~ r= Ker '2. 0 A'" + 1 is nonsingular and therefore we may define the matrix

t MOlliltsheftefur Math. 7S (1971), 118-122.


C = (A'" + I)-l(A'" - I). (3)

Observe that, byExercise 9.4.4, the matrix C is stable (with respect to the where J 1 e CP llp, J 2 e C(II- P) 1l (II - P), and the eigenvalues of J. and J a lie
imaginary ws). A little calculation shows that for anyH e e" II II inside and outside the unit cirole, respeotively. Preserving the notation or
the proof of Theorem 13.1.2, Eq. (2) yields the equation
CH + HC'" = (A'" + I)-I(A'" - I)H(A + I) + (A'" + I)H(A -1)]
x (A + I)-I R - J*RJ= Vo, J'o > O.
= 2(A'" + l)-I[A"'HA - H](A + Ir l. and, partitioning R and J'o compatibly with J, we obtain
and hence
C(-H) + (-H}e'" = 2(A'" + Irl[H - A"'HA](A + I)-I. (4)
Now apply Theorem 13. 1.1(a) to Eq. (4) to see that, if H - A",HA is positive where VI> 0, Va> O. Now Theorem 1 yields R) > 0 and R 3 < 0 and
definite, so that the matrix on the right ofEq. 4 is positive definite, then His therefore, using Eq, (13.1.9), it is found that
unique and is positive definite. ,
The converse is proved with the line of argument used in proving Theorem In H = In R = (p, n - p, 0),
13.U(b). Suppose that Eq. (2) holds with H > O. If A is an eigenvalue of A
as required.
corr~nding to the eigenvector x, then Ax = =
Ax and x*A'" h"', so that' Conversely, suppose that IAI ,;: 1 for any eigenvalue Aof A. Then A can
Eq. (2) implies be transformed to the Jordan form (6). By Theorem 1 there are positive
%*Vx = (1 - IAI 2 )%*H%. (5) definite matrices HI E Cl'lll', H, e CIII-I') " In-I') such that
The positive definiteness of the matrices V and H now clearly implies IAI < 1
for each eigenvalue A of A. HI - JtHIJ I = JIl, JIl > 0,
following the lead set in the previous section, we now seek a generalization H 2 - (JtrIH 2J;: I = V2, JI2 > o.
of Theorem 1 of the following kind: retain the hypotheses V > 0 and
Hence the positive definite matrix
H'" = H, det H ,;: O,for matrices ofEq. (2) and look for a connection between
the inertia ofH (relative to the imaginary axis) and that of A (relative to the
ulii~circle). Theorem 2 provides such a generalization and is due to Taussky,
H = (S*)-1 [HOI 0
-(J~)-IH2J;:1
]S-I
HiU,and Wimmer. t
satisfies the condition H - A"'HA > O. Obviously, the number of eigen-
neerem Z. Let A e C" ll ". If H is a Hermitian matrix satisfying the Stein
values of A lying inside (respectively, outside) the unit circle is equal to
equation (2) with V > 0, then H is nonsinyular and the matrix A has 'Jt(H)
p = 'Jt(H) (respectively, n - p = v(H, as required.
(respectively,v(H eigenvalues lying inside (respectively, outside) the (open)' ,
uniteircle and no eigenvalues qfmodulus 1.
The following results are analogues of Theorems 13.1.3 and 13.1.4 and
Conversely, if A has no eigenvalues ofmodulus 1, then there exists a Hermitian can be established in the same fashion. They are stated here without proof.
matrix H such that Eq. (2) holds and its inertia gives the distribution of eigen~
values ofA with respect to the unit circle, as described above. Theorem 3. Let A e cn II II and let A have no eigenvalues ofmodulus 1. Ifthere
(5) shows that A has no eigen~
PROOF. . If Eq.(2) is valid, then the relation exists an n x n nonsingular Hermitian matrix H such'that
values of modulus 1. Proceeding as in the proof of Theorem 13.1.2, let the
Jordan form of A be
H - A*HA = V, V ~ 0, (7)
then.A haS1t(H) and v(H) eigenvalues ofmodulus less than 1 and greater than 1,
\
(6) , II
respectively. ~. 'li
~~ ..

t See. respectively, J. Algebra I (19M), 5-10; Linear A/g. App/.:I (1969),131-142; J. Math.
Theorem 4. IfEq. (7) holds and the pair (A *, Y) is controllable, then A has no 'II

Anal. App/. 14 (1973). 164--169. ' eigenvalues ofmodulus 1, det H ~ 0, and the assertion of Theorem 3 is valid. , :~
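Theorem 1 of this section is the discrete-time (unit-circle) analogue of the Lyapunov test and is just as easy to apply numerically. A minimal sketch in Python/SciPy, assuming scipy.linalg.solve_discrete_lyapunov, which solves X - aXa* = Q; passing a = A* yields the Stein equation H - A*HA = V:

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.5, 0.3],
              [0.0, -0.4]])                      # spectral radius < 1

# Solve the Stein equation  H - A* H A = V  with V = I
H = solve_discrete_lyapunov(A.conj().T, np.eye(2))
print(np.all(np.linalg.eigvalsh(H) > 0))         # H > 0, so A is stable w.r.t. the unit circle
print(np.max(np.abs(np.linalg.eigvals(A))) < 1)  # consistent with the eigenvalue criterion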

13.3 The Bezoutian and the Resultant

The proofs of the stability results given in the following sections depend on two broad areas of development. The first is the development of inertia theory for matrices, which we have already presented, and the second is a more recent development of properties of Bezout matrices for a pair of scalar polynomials, which will now be discussed.

Let a(λ) = Σ_{j=0}^{l} a_jλ^j and b(λ) = Σ_{j=0}^{m} b_jλ^j (a_l ≠ 0, b_m ≠ 0, l ≥ m) be arbitrary complex polynomials. Since b(λ) can be viewed as Σ_{j=0}^{l} b_jλ^j, where b_{m+1} = ⋯ = b_l = 0, we may assume without loss of generality that the polynomials are of the form Σ_{j=0}^{l} a_jλ^j, a_l ≠ 0, and Σ_{j=0}^{l} b_jλ^j. The expression

    (a(λ)b(μ) - b(λ)a(μ)) / (λ - μ)

is easily verified to be a polynomial in the two variables λ and μ:

    (a(λ)b(μ) - b(λ)a(μ)) / (λ - μ) = Σ_{i,j=0}^{l-1} γ_{ij} λ^i μ^j.    (1)

Then the matrix B = [γ_{ij}]_{i,j=0}^{l-1} is called the Bezout matrix (or Bezoutian) associated with the polynomials a(λ) and b(λ). To emphasize this, we also write B = Bez(a, b).

The main objectives of this section are to obtain formulae expressing Bez(a, b) in terms of the coefficients of a and b (Eqs. (7), (17), and (20), in particular), to make the connection with the resultant of a(λ) and b(λ) (Eq. (24)), and to make the first deductions from these concerning the greatest common divisor of a(λ) and b(λ) in the form of Theorems 1 and 2.

Exercise 1.  Check that

    Bez(a, b) = -Bez(b, a),

for any two polynomials a(λ) and b(λ), and that Bez(a, b) is symmetric.

Exercise 2.  Show that the matrix

    B = [  0  -1   4
          -1   4   0
           4   0  -2 ]

is the Bezout matrix associated with the polynomials a(λ) = 1 - λ^2 + 2λ^3 and b(λ) = 2 - λ^2 = 2 + 0·λ - λ^2 + 0·λ^3.  □
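The entries γ_ij can be generated directly from the expansion (1). A minimal sketch in Python/NumPy (the explicit index formula used below is our own restatement of Eq. (1), not a formula from the text); it reproduces the matrix of Exercise 2:

import numpy as np

def bezout(a, b):
    # a, b: coefficient lists [c0, c1, ..., cl] of a(x) and b(x) (same length l + 1);
    # returns the l x l Bezout matrix [gamma_ij] defined by Eq. (1)
    l = len(a) - 1
    B = np.zeros((l, l))
    for i in range(l):
        for j in range(l):
            B[i, j] = sum(a[j + k + 1] * b[i - k] - b[j + k + 1] * a[i - k]
                          for k in range(min(i, l - 1 - j) + 1))
    return B

a = [1, 0, -1, 2]        # a(x) = 1 - x^2 + 2x^3
b = [2, 0, -1, 0]        # b(x) = 2 - x^2, padded to the same degree
print(bezout(a, b))      # [[0, -1, 4], [-1, 4, 0], [4, 0, -2]]  -- the matrix of Exercise 2
print(np.allclose(bezout(a, b), bezout(a, b).T))   # Bezout matrices are symmetric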

If a(λ) = a_0 + a_1λ + ⋯ + a_lλ^l, the triangular Hankel matrix of coefficients,

    S(a) = [ a_1  a_2  ⋯  a_l
             a_2  ⋯   a_l  0
             ⋮          ⋰
             a_l  0   ⋯   0 ],    (2)

arises very frequently. For a reason that will be clear in a moment (see Exercises 3 and 4), we call S(a) the symmetrizer of a(λ). Note that S(a) does not depend on a_0.

Exercise 3.  If C_a denotes the companion matrix of a(λ),

    C_a = [  0     1     0    ⋯     0
             0     0     1          ⋮
             ⋮                      0
             0     ⋯           0    1
            -â_0  -â_1   ⋯        -â_{l-1} ],

where â_j = a_l^{-1}a_j, check that

    S(a)C_a = [ -a_0  0    ⋯   0
                 0    a_2  ⋯   a_l
                 ⋮         ⋰
                 0    a_l  ⋯   0 ],

and note that this matrix is symmetric.  □

Define the polynomial â(λ) = λ^l a(λ^{-1}) = a_0λ^l + ⋯ + a_{l-1}λ + a_l; its symmetrizer is

    S(â) = [ a_{l-1}  a_{l-2}  ⋯   a_0
             a_{l-2}  ⋯        a_0  0
             ⋮                   ⋰
             a_0      0        ⋯   0 ].    (3)

Another device will also be useful. Define the permutation matrix P (the rotation matrix) by

    P = [ 0  0  ⋯  0  1
          0  0  ⋯  1  0
          ⋮        ⋰
          0  1  ⋯  0  0
          1  0  ⋯  0  0 ],    (4)
and observe that p2 == I and that SCalP and PS(a) are Toeplitz matrices: for i == 0, 1, ... , 1- 1 we see that "IT forms row (i + 1) in the Bezout matrix
B and that the last equation implies

l~: ."~" :~:jl. [7 o. :::00:]. 1


SCalP ==
o ...
a2

0 a,
PS(al _
a2
aJ a2 . .. a,
"IT[ rJ- [0,+1 0,,, . .. a, 0 ... 0]
b{Jl)
b~)1l
P.'-I
[
Exercise 4. Generalize the result of Exerc~e 3. Namely, show that, for any b(P)Il'-l
I$, k s r- I,
a(ll) ]
S(a)C: = diag[ -es, S2], P, s, e ct x lc, S2 e C'-Ic) x ('-Ic), (5)
- [bl+ J bl+ 2 ... bl 0 ... 0] a(~)p. . (8)
where SI and S2 stand for the symmetrizers of the polynomials 010.) = [
a" + a,,_IA + ... + aDA" and a2(A) == .a" + a/c+ IA + ... + a,A'-", respec- a(Il)Il'-1
tively. Now a little calculation will verify that for any polynomial C(Il) with
Also, show that degree at most I,
S(a)C~ = - PS(a)p. 0 (6) col(c{Jl)p.'-l)!= 1 = S(t)P cOI(p'-l)!= 1 + PS(c) col(",'-I)I= 1"'" (9)
The formula derived in the following ~,K'oposition will be useful in the Whereby definition
sequel.
Proposition 1. With the previous notation, the Bezoutian B of scalar poly-
nomials a(A) and b(A) is given by COI(Rill;;;A~f ~ ]. (10)
B == S(a)S(6)P - S(b)S(ti)P, R'-1
PROOF. Rewrite Eq. (1) in the form and, in general, R o, R ..... , R'-l may be scalars or matrices of the same size.
Using Eq. (9) for a(A) and b(A), we find from Eq. (8) that
'-I
(A - Il) :E "INt)AI = a(A)b{Jl)
'=0 .
- b(A)a(JL), Bcol(",'-l)I=1 == S(a)(S(6)Pcol(p'-I)I=1 + PS(b)col{Jll-l)l=lll')
where 'VNl) == D:;;~ 'V,}Il} for i == 0, I, ... , I - 1. Comparing the coefficients
- S(b)(S(ti)P col(p'-l)l= 1 + PS(a) COI(",I-l):=I"")..
of Al on both sides of the equation we obtain Note that for any a(A) and b(A),

'Vo{Jl) - ll'Vl{Jl) == alb(p) - b 1a{Jl), S(a)PS(b) == S(b)PS(a), S(4)pS(6) = S(6)PS(4). (12)


'Vl{Jl) - 1l'V2{Jl) = a2b{Jl) - b 2a{Jl), .Thus, using the first equality in (12), we obtain
Bcol(",'-l)l=l == S(a)S(b)Pcol(P'-l)l=l - S(b)S(ti)P col(P' - ' )I= l'
'V'-2{Jl) -1l1,-J(It) =a'-l b{Jl) - b'-la{Jl),
Since the last relation holds for any Il e C, and the Vandermonde matrix
1'-l(1t) == a,b{Jl) - b,a{Jl).
for distinct "'10 Il:z,., III is nonsingular (see Exercise 2.3.6), the required
From these equations it is easily deduced that for i == 0, 1, ... I - 1, fonnula (7) is established. .
"INl) = (Jl'-i-l a, + 1l'-'-2a, _ 1 + ... + ItOI+l + 01+ l)b(p) A dual expression for the Bezoutian B = Bez(a, b) is presented next.
- (p'-'-lb, + p.'-'-2b,_1 + '" + pbl+2 + bI+ 1 )a{Jt). Exerdse 5. Check that
Note that deg(y;(p. ::;; I - 1 for 0 ::;; i S; I - 1. Defining
-B = pS(a)S(b) - pS(6)S(a). (13)
1T == [1'0 1'1 . . . '1'1.'-1] Hint. Rewrite Eq, (9) in terms of rows. 0

It is easily seen that, if the polynomial a is fixed then the matrix &z(a, b) It will be useful to have at hand another representation for the Barnett
is a linear function of the polynomial b. Indeed, if b(A) and C(A) are poly- factorization. Observe first that by the Cayley-Hamilton theorem, a(Cal = 0,
nomials with degree not exceeding I and fJ, y E C, we see that in the definition and therefore
ofEq. (I), I
b(CJ = b(C a) - a,lb,a(Ca) = L 6Jc~,
a(A){fJb(p) + yc(p)} - {Pb(l) + yc(I)}a(1J) j=O

= fJ{a(l)b(1J) - b(I)a(1J)} + y{a(A)c(1J) - c(,l)a(p)}. where 6J = bj - a,-lb,a j ' for j = 0, 1, '" ,I - 1. Define
This immediately implies that pT = [6 0 61 6'-1] (18)
Bez(a, fJb + yc) = fJ Bez(a, b} + y Bez(a, c). and denote the ith row of b(CJ by df for i = 1,2, , I. Since
In particular, writing bel) = D'=o bJAl we have erC~=~ 1 0 0]
m j
Bez(a, b} = L bJ Bez(a, ,li). we obtain
J=O
(19)
Now examine the "primitive" Bezout matrices Bez(a, AJ} for j = 0, 1,2,
... , m. Calculating directly with the definition of Eq. (1) we find first ofall Furthermore, for j = 2, 3, ... , I,
that &z(a, I} = Sea}. More generally, to study Bez(a, lll} we note that
dJ = eJb(CJ = eJ-ICab(C,,} =eJ-tb(CIl)Ca = dT-tCa = drct 1 = pTC~-t.
a(A}pll - llla{p) , lJ~ - ).."p'
-.:...:.:....;-----"~ =
).. - p
L aj--=-:---=--
j=o).. - p
Thus,

P~~. ].
k-I
=; - L at)..kpJ + ... + Aipll-I)
j=O b(CJ= [
, pTC~-1
+ L at)..i-lpll +';" + AllpJ-l).
J=Il+ 1
and combining with the first equation of (l7) we have
Checking the form of the matrices in Eq. (5) we find that

Bez(a, )..k) = diag[ -PSIP, S2] = S{a)C~.


Combining this with Eq. (ls) we get the first part ofthe following proposition,
Bez(a, b) == s{a).[ p!i ] (20)

known as the Bameufacumzauon of Bez(a, b). , pTC~-1

PrOpositiOD 2. lbeorem 1. The polynomials a{l) and bel) have no zeros in common if and
Bez{a, b) = S{a)b{Ca) = -S(b)a(Cb) . only if the associated Bezouimatrix Bez(a, b) is nonsingular.

PROOF. For the second statement, we first use Exercise 1 and then' apply PRoo~.' For definiteness, let 1= deg a(l} ~ deg bel}. Write the corres-
the first statement. Thus. . pondmg Bezo~t matrix B = Bez(a, b) in the first form of (17) and observe
Bez(a, b) = -Bez{b. a) = -S(b)a{Cb) . that det B :F 0 Ifand only if b(Ca) is nonsingular. Since the eigenvalues ofthe
latter matrix are of the form b(A;), where lit A2 , ,).., are the zeros of
as required. tI(A}, the result follows.

Let a(A) = E",o a,A', aj .;.: 0, and b(A) = !::'o bIA', b", #= 0, where I ~ m. Bxerelae 6. Ve~ify that the Bezout matrix associated with the polynomials
As in the definition of the Bezout matrix, we may assume without loss of a(A) = L~=o a,A' and ii(A) = L~=o a,A' is skew-Hermitian. Combine this
generality that b(A) is written in the form L~.o bIA', where the coefficients with Exercise 1 to show that the elements of Bez(a, ii) are all pure imaginary.
of the higher powers are, perhaps, zeros. With this assumption, we define
Exer~ise 7. Che~k that. the Bezout matrix B associated with the poly-
the resultant (or Sylvester) matrix R(a, b) associated with the polynomials
nomials o(A) = LI=O a,A', a, #= 0, and b(A) = :D'=o blAI, I ~ m, satisfies the
a(A) and b(A) as the 21 x21 matrix equation
ao al al-l al CaB = BCa, .(22)
. . .. where C", and C", denote the first and second companion matrices associated
ao al al-l with a(A.), respectively. (The second companion matrix is defined by Ca =
R(a, b) =
bo b 1 ... b", 0 0 cr, and C", is defined in Exercise 3.)
' .. ..
' .. .. Hint. Apply Eq, (17).
bo b1 b", 0
Exercise a.
Let Bez(a, b) denote the Bezout matrix associated with the
Partitioning R(a, b) into I x I matrices and using the notation introduced polynomials a(A) and b(A) of degrees I and m (I ~ m~, respectively. Show that
above we may also write
Bez(~, &) = -P Bez(a, b)P,
s calP PS(a)l
R(a, b) = [ S()P PS(b)j" where P is defined in Eq, (4).

Using Eqs. (7) and (12) it follows that Bxereise 9. Let a(A) and b(A) be two polynomials of degrees ( and In
(l ~ m), respectively. Denote a.. = a..(A) = a(A + IX), b.. = b..(A) = b(A. + IX),
P O] [ PS(tt)P sea)] and check that
[ -S(b) Sea) R(a, b) = Bez(a, b) 0
Bez(a.., bJ = V~) Bez(a, b)(V~)T,

.
= [1 0]
0 Bez(a, b)
[PS({t)P S(a~.
1 0 J where V~) = [/~)IXj- ,]'-1 and it is assumed that I~) = 0 for j < i.
\, i,jaO \,

This equation demonstrates an intimate connection between the resultant.


of a(A) and b(A) and their Bezout matrix. In particular, it is clear from equa- Exercise 10. Prove that, for any polynomials a(A.) and bel) of degree not
exceeding I,
tion (21) that R(a, b) and Bez(a, b) have the same rank. Also, the resultan~.,
matrix (as well as the Bezoutian) can be used in solving the problem or -b, a>[~ ~]R(a, b) = [~"""oB).
T
R ( B = Bez(a, b). 0
whether the polynomials a(A) and b(A) have zeros in common.
Theorem 2. The polynomials a(A) and b(A) defined above have no comman
zeros if and only if the resultant matrix R(a, b) is nonsingular. 13.4 The Hermite and the Routh-Hurwitz Theorems
PROOF. Since Sea) is nonsingular, it follows immediately from Eq, (21) that
R(a, b) is nonsingular if and only if Bez(a, b) is nonsingular, So the resulf' The question of stability of eigenvalues can always be posed in terms of the
follows from Theorem 1. " zeros of a polynomial rather than the eigenvalues of a matrix. We need only
There is an interesting and more, general result that we will not discu~l~ replace the eigenvalue equation, Ax = Ax, by the characteristic equation,
Namely, that the degree of the greatest common divisor of a(A) andb(Ai1i ~et(.u - A) = O. One advantage of the Lyapunov theorem of Section 13.1
is just I - rank R(a, b). IS that it enables us to avoid computation of the coefficients of this polynomial
Note that the use of the resultant matrix may be more convenient than t nd any consequent loss of accuracy. Nevertheless, it is important that we
use of the Bezoutian in the search for common divisors since, despite i ~e~elop necessary and sufficient conditions that all the zeros of a polynomial
ie 10 the left half of the complex plane. Consider a polynomial
double size, theelements oftheresultant arejustthecoefficients of thegiv
polynomials (orzero). a(A) = ao + alA + ... + a,A', a, #- 0,
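Both facts are easy to experiment with numerically. The following sketch (Python with NumPy; the helper name resultant_matrix and the test polynomials are ours, not the book's) assembles the 2l × 2l resultant matrix directly from the coefficients, checks Theorem 2 on a pair with a common zero, and reads off the degree of the greatest common divisor as 2l − rank R(a, b).

    import numpy as np

    def resultant_matrix(a, b):
        """2l x 2l resultant (Sylvester) matrix of a and b, deg a = l >= deg b.
        Coefficients are listed from a_0 up to a_l; b is padded with zeros to length l + 1."""
        l = len(a) - 1
        b = list(b) + [0] * (l + 1 - len(b))
        R = np.zeros((2 * l, 2 * l))
        for i in range(l):                    # l shifted rows of a-coefficients ...
            R[i, i:i + l + 1] = a
        for i in range(l):                    # ... followed by l shifted rows of b-coefficients
            R[l + i, i:i + l + 1] = b
        return R

    # a = (x - 1)(x - 2), b = (x - 1)(x - 3): one common zero, so R is singular
    a = [2, -3, 1]
    b = [3, -4, 1]
    R = resultant_matrix(a, b)
    l = len(a) - 1
    rank = np.linalg.matrix_rank(R)
    print(rank, 2 * l - rank)                                    # rank 3; 2l - rank = 1 = deg gcd(a, b)

    # a together with (x - 3)(x - 4): no common zeros, so R is nonsingular (Theorem 2)
    print(np.linalg.matrix_rank(resultant_matrix(a, [12, -7, 1])))   # 4 = 2l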

with complex coefficients. The inertia In(a) of the polynomial a(λ) is defined    PROOF. In applying formula (13.3.22), observe that
as the triple of nonnegative integers {π(a), ν(a), δ(a)}, where π(a) (respectively,
ν(a) and δ(a)) denotes the number of zeros of a(λ), counting multiplicities,
with positive (respectively, negative and zero) real parts. Like matrices, the
c. = C: - [0 ... 0 v],

inertia of a polynomial a(.1) is also called the inertiaofa(04) with respect to the
imaginary axis. In particular, a polynomial a(04) with n(a) = eS(a) = 0 is said
to be a stable polynomial (with respect to the imaginary axis). .
In contrast, there is the triple In'(a) = {π'(a), ν'(a), δ'(a)}, where π'(a),
ν'(a), δ'(a) denote the number of zeros of a(λ) lying in the open upper half-
plane, the open lower half-plane, and on the real axis, respectively. This triple and we are now to show that W ~ O. Using the first factorization of (13.3.17),
and noting that
is referred to as the inertia ofa(A) with respect to the real axis. Clearly,

{π'(ã), ν'(ã), δ'(ã)} = {π(a), ν(a), δ(a)},


we have
where ã = ã(λ) = a(−iλ). If a(λ) is a polynomial with real coefficients, then its
complex zeros occur in conjugate pairs and hence π'(a) = ν'(a).
The main result of this section is due to C. Hermite! and is concerned
But
with the distribution ofzeros ofa complex polynomial with respect to the real
axis. The formulation in terms of the Bezoutian is due to M. Fujiwara! in , '-I
1926. Indeed, it was Fujiwara who first used Bezout matrices to develop a eTiilC ) - ~ a-eTCj ~ - T
1""\ a -.t.. j 1 a = .t.. aJeJ+ l
+ a,e,
- TCa = -
-Q,lI,
systematic approach to stability criteria, including those of Sections 13.4and J=O J=O
13.5.
Considerapolynomiala(.1) = ~=o aJ AJ (a, :F 0) and let ii(A) = ~=o aJAJ.
and consequently, W = -la,l2 vvT = la,l2vll* ~ O.
Write Eq, (1) in the form
If B = Bez(a. ii) denotes the Bezout matrix associated with a(A) and ii(04), then
itis skew-Hermitian (see Exercise 13.3.6)and it is alsononsingular if and only .
ifa(A) and ii(04)haveno zeros in common (Theorem 13.3.1). Thus,det8:F On .
and only if a(A) has no real zeros and no zeros occuring in conjugate pairs.
(-iCa)*(fB) + (fB)(-iCa) = W~ 0, (2)

Theorem 1. With the notation of the previous paragraph, assume that . and recall that the assumption of nonsingularity of B implies that c5(iCa) = O.
det B #: O. Then Hence Theorem 13.1.3 yields .

π'(a) = π((1/i)B),

ν'(a) = ν((1/i)B),    and the assertion follows in view of the relations

n(-iC..)* = n(-iC..) = v(iC:) = n'(C..) =n'(a),


δ'(a) = δ((1/i)B) = 0.
and similarly for v( -iC..)*.
Corollary 1. All the zeros of the polynomial a(λ) lie in the open upper half-plane if and only if (1/i)B > 0, where B = Bez(a, ā).

† C. R. Acad. Sci. Paris 35 (1852), 52–54; 36 (1853), 294–297. See also J. für die Reine u. Angew. Math. 51 (1856), 39–51.
‡ Math. Zeit. 24 (1926), 161–169.
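Theorem 1 and its corollary can be verified numerically. In the sketch below (Python/NumPy; the helpers bez and inertia and the test polynomial are ours) the Bezout matrix is obtained by dividing a(x)ā(y) − a(y)ā(x) by x − y; the sign convention is one common choice and is the one that reproduces the numerical examples of this chapter. The inertia of (1/i)B is then compared with a direct root count.

    import numpy as np

    def bez(p, q):
        """Bezout matrix of p and q (coefficients listed from the constant term up):
        (p(x)q(y) - p(y)q(x)) / (x - y) = sum_{i,j} B[i, j] x^i y^j."""
        l = max(len(p), len(q)) - 1
        p = np.pad(np.asarray(p, complex), (0, l + 1 - len(p)))
        q = np.pad(np.asarray(q, complex), (0, l + 1 - len(q)))
        C = np.outer(p, q) - np.outer(q, p)
        B = np.zeros((l, l), complex)
        for i in range(l, 0, -1):             # peel off one power of x at a time
            for j in range(l):
                B[i - 1, j] = C[i, j] + (B[i, j - 1] if (i < l and j > 0) else 0)
        return B

    def inertia(H, tol=1e-9):
        w = np.linalg.eigvalsh(H)             # H is assumed Hermitian
        return sum(w > tol), sum(w < -tol), sum(abs(w) <= tol)

    # a(lambda) = (lambda - 2i)(lambda + i): one zero above, one below the real axis
    a = [2, -1j, 1]
    abar = np.conj(a)                         # coefficients of a-bar
    B = bez(a, abar)
    print(inertia(B / 1j))                    # (1, 1, 0) = (pi'(a), nu'(a), delta'(a))
    print(np.roots(a[::-1]).imag)             # imaginary parts: one positive, one negative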

Exainple 1. The polynomial note that the matrix B = BD is nonsingular iCand only if a(A) has no pure
imaginary zeros or zeros symmetric relative to the origin.
a(A) = A4 + 4i.A3 - 4A 2 - 1 = (A + i)2(A2 + 2iA + 1) Theorem 2. With the notation ofthe previous paragraph, the matrix 8 = BD
clearly has zeros -i, -i, (-1 .Ji)i, We have is Hermitian. IfIi is also nonsingular, then
B = Dez(a, ii) n(a) = v(R),

~4i -:i1 ~i0 0


~] [-~ _~ -~ =:i]
v(a) = n(8),
= [_, O{a) = (;(8) = O.
0 0 -1 0
1 0 0 0 0 0 0-1 PROOF. The assertion of the theorem follows from Theorem 1 and the
easily verified relation
0-4
-4 -4i
-4i
1
1] [-1 0-4
0 0 -1 0-4
4i]
FHB)F* = -D,
[-4i 1
'1 0
0
0
0
0
0 0 -1 0
0 0 0-1 where F= diag[i'- I, i
'-
2 , ,
i, 1] and B is the Dezoutian associated with
altA) = a(-iA) and al(A) = i.i(iA).

$$= i\begin{bmatrix} 0 & 0 & -8 & 0 \\ 0 & -8 & 0 & 0 \\ -8 & 0 & -32 & 0 \\ 0 & 0 & 0 & -8 \end{bmatrix}.$$

Hence, using the Jacobi theorem, Theorem 8.6.1,

$$\operatorname{In}\Bigl(\tfrac{1}{i}B\Bigr) = \operatorname{In}\begin{bmatrix} 0 & 0 & -8 & 0 \\ 0 & -8 & 0 & 0 \\ -8 & 0 & -32 & 0 \\ 0 & 0 & 0 & -8 \end{bmatrix} = \{1, 3, 0\}.$$

Thus, π'(a) = 1 and ν'(a) = 3, as required. □

Note that Theorem 1 gives no information if a(λ) is a real polynomial, since in this case B = 0. Methods for evaluating the inertia of a real polynomial with respect to the real axis, as well as a condition for a polynomial to have no real zeros, are discussed in Section 13.6.

Corollary 1. A polynomial a(λ) is stable if and only if the matrix B̂ is positive definite.

Example 2. Consider the polynomials a(λ) = λ² − 2λ + 5 and a(−λ) = λ² + 2λ + 5. The corresponding Bezoutian is

$$B = \begin{bmatrix} -20 & 0 \\ 0 & 4 \end{bmatrix},$$

and hence the matrix

$$\hat B = BD = \begin{bmatrix} -20 & 0 \\ 0 & -4 \end{bmatrix}$$

is negative definite. Thus both zeros, λ₁ and λ₂, of the polynomial lie in the open right half-plane, as required, since λ₁ = 1 + 2i and λ₂ = 1 − 2i. □

If a(λ) in Theorem 2 is a real polynomial, then those i, jth elements of the matrix B̂ with i + j odd are zeros. This is illustrated in the following exercise.
As soon as the zero-location problem is solved with respect to the uppef
Exercise 3. Check that the matrix D defined in Theorem 2 and associated
and lower half-planes, the results can be reformulated for any other pair of
with the real polynomial a(A) = ao + alA + a212 + a3A3 + a4A4 is
mutual~y.comple!D;ent~~alf-planes in the complex plane. The problem of
dete~m1Jl1n~ the dl~trI~ution of zeros relative to the imaginary axis is of o
part~cular lDter~t.lD View of the application to differential equations (s
Section 13.1); this IS known as the Routh-Hurwitz problem.
Let a(l) =:= L~=o aJAl (a, :# 0) and let B stand for the Bezout matri
associated with alA) and ii( -A). Defme D = diag[l, -1, ... , (_1)'-1] an
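The criterion of Theorem 2 and its corollary (a(λ) is stable if and only if B̂ = BD is positive definite) can be tested in the same way; a minimal sketch, reusing the Bezoutian convention of the previous sketch and the polynomial of Example 2 (zeros 1 ± 2i):

    import numpy as np

    def bez(p, q):                            # Bezoutian helper, identical to the earlier sketch
        l = max(len(p), len(q)) - 1
        p = np.pad(np.asarray(p, complex), (0, l + 1 - len(p)))
        q = np.pad(np.asarray(q, complex), (0, l + 1 - len(q)))
        C = np.outer(p, q) - np.outer(q, p)
        B = np.zeros((l, l), complex)
        for i in range(l, 0, -1):
            for j in range(l):
                B[i - 1, j] = C[i, j] + (B[i, j - 1] if (i < l and j > 0) else 0)
        return B

    def inertia(H, tol=1e-9):
        w = np.linalg.eigvalsh(H)
        return sum(w > tol), sum(w < -tol), sum(abs(w) <= tol)

    a = [5, -2, 1]                                                   # a(lambda) = lambda^2 - 2 lambda + 5
    l = len(a) - 1
    abar_minus = [(-1) ** j * np.conj(c) for j, c in enumerate(a)]   # coefficients of abar(-lambda)
    B = bez(a, abar_minus)
    D = np.diag([(-1.0) ** j for j in range(l)])
    Bhat = B @ D
    print(inertia(Bhat))                      # (0, 2, 0): pi(Bhat) = nu(a) = 0, nu(Bhat) = pi(a) = 2
    print(np.roots(a[::-1]).real)             # both real parts positive, so a(lambda) is not stable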

The existence ofsymmetrical patterns of zeros in the matrix allows us to matrix B = PB, is Hermitian and, ifit is nonsingular,
compute the inertia of the matrix " IX+(a) = n(B),
IX_(a) = v(B),
o(a) = !5(B) = O.
instead oCIn S. Here
PROOF. The proof of the theorem consists in proving the relation

Jj - C:'JjC~ = (a(C a* a(C a) > 0 (1)


is a permutation matrix defined by and in applying Theorem 13.2.2. Note that the equality Jj = Jj* is easily
Ql = [el e) .. eJ, Q:z = [e:z e4 .. , eJ] , established by transposing the expression for B in Eq. (13.3.7) and reducing it
where e h e:z ... e, are the unit vectors in C', k = 1- 1 andj = I if I is even,
=
by use of the relation p:z I to the expression in Eq. (13.3.13).
First use the identity (13.3.6) to deduce that
=
k I andj = 1- 1 if lis odd.
S(a)PS(a)C~ = -S(a)S(Ii)P,
Exercise 4. Show that the inertia of the matrix S discussed in Exercise 3 and then (13.3.12) implies
is equal to In B 1 + In B:z, where
S(a)S(Ii)p = -S(a)PS(a)C~.
Use this in the representation for B given by Proposition 13.3.1:
B = S(a)S(a)P - S(a)S(Ii)P
A"different form of the Routh-Hurwitz theorem for real polynomials will
= S(a)S(a)P + S(a)PS(,,>C~.
be obtained in Section 13.9.
But Proposition 13.3.2 also gives B = S(a)ii(CJ, and so
3(cJ = S(a)p + PS(a)~.
13.5 The Schur-Cohn Theorem    Consider the conjugate transposed relation and postmultiply by ā(C_a)
= S(a)-l B to obtain
The determination of the number of zeros of a polynomial lying inside the a(CJ*a(Ca) = PB + C:'S(Ii)pii(cJ. (2)
unit circle is known as the Schur-Cohn problem and, as noted in Section 13.2,    But using (13.3.6) again,
It IS closely related to the stability problem for linear difference equations. A
solution of this problem in terms of a corresponding Bezout matrix is pre- S(a)Pa(Ca) = -PS(a)C~ii(Ca) = -PS(a)a(CJC~
sented in this section. = -PBC~.
ConSider a complex polynomial a(,t) = ao +al,t + '" + a,,t' (a, =1= 0)
Thus, Eq. (2) takes the form
and define the inertia ofa(A) with respect to the unit circle to be the triple of
numbers cx(a) = {IX +(a), IX_(a), IXo(a)}, where IX+(a) (respectively, ex_(a) and a(CJ*a(Ca) = B - C:'BC~,
~o.(a.denotesthe number of zeros of a(,t), counted with their multiplicities, which is Eq, (1).
}'lng Inside the open unit circle (respectively, outside and on the unit cir-
Cumference). Hence by Theorem 13.2.2, the matrix C~ has nCB) eigenvalues inside the
unit circle and v(B) eigenvalues outside it. Obviously, the same can be said
Theo~ellli. Let a(A) = B=o aJAJ (a, =1= 0) and let B standfor the Bezoutian about the eigenvalues of C a and, consequently, about the zeros of the
assOCiated with li(l) and a(,i) A ,i'a(,i- 1). If P is a rotation matrix then the polynomial a(,t). II
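Theorem 1 can also be exercised numerically. The sketch below (helper names ours; the Bezoutian convention is the same as in the earlier sketches, the second argument is the reversed polynomial λ^l a(λ⁻¹), and the flip matrix with ones on the antidiagonal plays the role of the rotation matrix P) compares the inertia of B̂ = PB with a count of the zeros inside and outside the unit circle for a simple test polynomial.

    import numpy as np

    def bez(p, q):                            # Bezoutian helper, identical to the earlier sketch
        l = max(len(p), len(q)) - 1
        p = np.pad(np.asarray(p, complex), (0, l + 1 - len(p)))
        q = np.pad(np.asarray(q, complex), (0, l + 1 - len(q)))
        C = np.outer(p, q) - np.outer(q, p)
        B = np.zeros((l, l), complex)
        for i in range(l, 0, -1):
            for j in range(l):
                B[i - 1, j] = C[i, j] + (B[i, j - 1] if (i < l and j > 0) else 0)
        return B

    def inertia(H, tol=1e-9):
        w = np.linalg.eigvalsh(H)
        return sum(w > tol), sum(w < -tol), sum(abs(w) <= tol)

    a = [1.5, -3.5, 1]                        # a(lambda) = (lambda - 1/2)(lambda - 3)
    l = len(a) - 1
    abar = np.conj(a)                         # coefficients of a-bar
    ahat = a[::-1]                            # lambda^l * a(1/lambda): coefficients in reverse order
    B = bez(abar, ahat)
    P = np.fliplr(np.eye(l))                  # flip ("rotation") matrix
    Bhat = P @ B
    print(inertia(Bhat))                      # (1, 1, 0): one zero inside, one outside the unit circle
    print(abs(np.roots(a[::-1])))             # moduli 0.5 and 3.0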

CoroDary 1. A polynomial a(l) is stablewith respect to the unit circle if and The companion matrix CjJ is related to CII by
only if the matrix B is positive definite. CII == Ca + FD, (4)
where D == diag[b o, bit ... , b,- 1] and
13.6 Perturbations of a Real Polynomial 0
F== .
Let a(.t)be a polynomial of degree I with realcoefficients. In this section we
[ . ~,
are concerned with the effectof perturbing the coefficients of a(.t) with purely la,
imaginary numbers. In particular, we will study the relation between the
inertias (with respect to the real axis) of the original and the perturbed Substituting from Eq. (4) into Eq. (3), it is found that
polynomials. These results will be applied in subsequent sections to obtain (iC;s)*B + B(iCa> == iDF*B - iBFD. (5)
criteria for a real polynomial tobe stable.
The main result oftbis section is presented first and then is used to provide a Now use the representation (13.3.20) for B to obtain
unified approach to some classical stability criteria. T

Theorem 1 (P. Lancaster and M. Tismenetskyt), Let a(4) == LI=o ai.t '
iDF'B-D
1 0
~ ~
0] [
0 6
bT
f' ]
-DAD,
I
(a, oF 0) and b(.t) == ,EI:A b,.ti be polynomials with real coefficients. Write
il(.t) == a(.t) + ib(.t).Ijthe Bezoutmatrix B == Bez(a, b) is nonsingular, that is, [
1 0 0 bTC~-1 I
if a(.t)andb(.t)are relatively prime, then

and
n'(a) S 1t'(ii) == v(B),
8'(a) == 0

\I'(a) S v'(a) == nCB).


(1)

(2)
where bT == [b o b, . . . b,_,] and A is the I x I matrix with all elements
equal to 1. Furthermore, -iBFD == (iDF*B)* == DAD and therefore Eq. (5)
implies
I
(iCa>*B + B(iC;s) == W, (6)
Furthermore, equalities hold throughout Eq. (2) if 8'(a) == O.
where W = 2DAD ~ O.
This theorem means that the perturbation b(A) has the effect of shifting all Since ~'(a) == 0, then 8(iCa) == 0 and, by Theorem 13.1.3, In(iC,) = In B.
real zeros, of a(J.) off the real axis, and the numbers 1t'(a) and v'(a) of eigen- When B is nonsingular, Proposition 13.1.2 can be applied to Eq. (3) to
values ofthe perturbed polynomial above and below the real axis, respectively, yield the inequalities in (2).
can be counted using the inertia properties of B. The additional condition ~'(a) == 8(iCJ == 0 together with det B oF 0
yields (in view of Theorem 13.1.3 and Eq. (3 equality throughout (2).
PROOF. Suppose that ii(Ao) == 0 and 1 0 is real. Then a(40), b(Ao) are real
This completes the proof.
and so a(J.o) == a(J.o) + ib(J.o) == 0 implies that a(.t o) == b(.to) == O. Thus
a(.t) and b(1) have a common zero, contradicting the hypothesis that B is Corollary 1. If, inthe previous notation, B < 0 or B > 0, thenthe zeros o/the
nonsingular. Hence the relation (1). polynomial a(l) lie in the open upper half-plane or open lower half-plane.
As noted earlier, the Bezout matrix B satisfies Eq. (13.3.22) which, when respectively.
a(A) == a(J.), can be written in the form C:B == BCII, where CII is the com-
panion matrix associated with a(,t), or, what is equivalent, Example 1. The polynomial a(,t) == l2 - 1 has two real zeros. The perturba-
tion ib(.l) == 2iJ. satisfies the hypothesis of Theorem land
(iCJ*B + B(iCII) == O. (3)
We also know (see Exercise 13.3.1) that B is real and symmetric. B = [~ ~][~ ~] == 21 > O.
t Linear A.lg. A.ppl. 52 (1983). 479-496. Hence both zeros of 1 2 + 2iA - 1 are in the open lower half-plane.
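Theorem 1 and Example 1 can be reproduced with the same machinery; a minimal sketch (helpers as in the earlier sketches), using the data of Example 1:

    import numpy as np

    def bez(p, q):                            # Bezoutian helper, identical to the earlier sketch
        l = max(len(p), len(q)) - 1
        p = np.pad(np.asarray(p, complex), (0, l + 1 - len(p)))
        q = np.pad(np.asarray(q, complex), (0, l + 1 - len(q)))
        C = np.outer(p, q) - np.outer(q, p)
        B = np.zeros((l, l), complex)
        for i in range(l, 0, -1):
            for j in range(l):
                B[i - 1, j] = C[i, j] + (B[i, j - 1] if (i < l and j > 0) else 0)
        return B

    def inertia(H, tol=1e-9):
        w = np.linalg.eigvalsh(H)
        return sum(w > tol), sum(w < -tol), sum(abs(w) <= tol)

    a = [-1, 0, 1]                            # a(lambda) = lambda^2 - 1, two real zeros
    b = [0, 2]                                # b(lambda) = 2*lambda
    B = bez(a, b)                             # = 2I, positive definite
    print(inertia(B))                         # (2, 0, 0): nu'(a + ib) = pi(B) = 2, pi'(a + ib) = nu(B) = 0
    atilde = np.asarray(a, complex) + 1j * np.pad(np.asarray(b, complex), (0, 1))
    print(np.roots(atilde[::-1]).imag)        # both imaginary parts negative (zeros -i, -i)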

Bxernat 2. By use of Theorem 1,check that the polynomial ao.) = 14 + Theorem 1. Let Eq. (1) be the L-C splitting of the real polynomial a(A).
4i,P - 4AZ - 1 has three zeros with negative imaginary parts and a zero Denote
with a positive imaginary part. 0 he- ,l2) if I is even,
Another hypothesis leading to the main conclusion (Eq. (2 of Theorem 1 0,1(.1.) = { Ag(_A2 ) if I is odd;
is expressed in terms of controllability.
Ag( - A2) if lis even,
Theorem 2. With the notation of Theorem 1 (and its proof), let the pair a2(A) = { -he _A2) if I is odd.
(ct, W) be controllable. Then the Bezout matrix B associated with a(A) and
b(A) is nonsingular, <5(0) = 0, and the relations (2) hold. lIthe Bezout matrix B associated with al(A) and0,2(.1.) is nonsingular, then a(A)
hasno pureimaginary zeros and
The proof of Theorem 2 is easily obtained by applying Theorem 13.1.4 to
Eq. (6). 7t(a) = n(B), veal = v(B). (2)

Exercise 3. Let a(A) = El-o a,A', a,::F 0,0" == til> i == 0,1, '" ,I. Prove that
PROOF. Let I be even. Define the polynomial ii(A) by ii(A) = a(iA) =
the polynomial ii(A) == a(A) + ibo (bo > 0) has no real spectrum and that h( _A2) + iAg( _A2) = al(A) + ia1.(A) and observe that ii(A) has no real
zeros. Indeed, if there were a real Ao such that 0,1(Ao) + ia20'0) = 0 then,
v'(a) _ 1t'(a) = {O ~f I ~s even, taking complex conjugates, we would obtain al(Ao) - iaz(Ao) = O. Thus,
sgn at If lIS odd. it would follow by comparison that a2(A.O) = 0 and hence al0l0) = O. This
Hint. Apply Theorem 2. 0 conflicts with the assumed relative primeness of al(A) and a2(A) (implied
by det B ::F 0).
Now applying Theorem 13.6.1, we obtain

13.7 The lienard-Chipart Criterion v(B) = n'(ii) = v(a),


nCB) = v'(ii) = n(a),
The names of A. Lienard and M. Chipart are strongly associated with a t5(B) = t5'(ii) = 15(0,),
stability criterion for the zeros of real polynomials. Using the results of the and the theorem is established for the case of an even I.
previous section, we can now present, a modified version of the original If I is odd, then write
Lienard-Chipart! stability criterion in terms of a Bezoutian. '
Consider a polynomial a(A) = E~",o aiAi, with a, ::F 0 and real coefficients;
-iii()') = Ag( _A2 ) - ih(_A2 ) = a l ( ).) + ia2(A)
If I is even, define and observe that the zeros of ti().) and - Ui(A) coincide. Now apply Theorem
,,1. (1/1.)-1 13.6..J. .again,
heAl = 'E aZJ AJ, g(A) == L a1.J+I AJ
Corollary I. The polynomial a(A) is stable if and only if B = Bez(ah a2)
i"'O i-O
If I is odd, then we put is negativf! definite.
l/Z(I-l) 1/Z(I-1) PROOF. If a(A) is stable, then the polynomials a1(A) and a2(A) defined above
h(A) = L aZj AJ, g(A) == L aZi+HV, are relatively prime. Indeed, if a1(A o) = 0,20'0) = 0 then also al(Ao) =
, J=o J=O
"2(10 ) = 0 and ii(Ao) = ii(lo) =. O. Hence a(A) has complex zeros p. and
For brevity, we refer to h(A) and g(A), as well as to the obvious representa- - Ji (p. == iAo) or a pure imaginary zero iAo (Ao = Ao)' In both cases we
tion '
contradict the assumption that all zeros of a(A) lie in the open left half-plane.
a(A) == 2)
h(A + Ag(A2), Thus, B is nonsingular and, by the theorem, must be negative definite.
as the L-C splitting of a(A). Example 1. Consider the polynomial a(A) = (A - 1)2(A + 2) and the
representation
t J. Math. Pures Appl. 10 (1914),291-346. -iii(A) = -ia(iA) = A(-3 _1 2 ) - 2i.

Here al(l) = -.1.(3 + 12), a2(1) = -2, and the corresponding Bezoutian Consider the matrix Q = [QI Q2] defined by Eq. (13.4.3). For j =
is 1,2, ... , k, we have
QT diag[A~Y_I' A!lJ-tJQ = diag[AW+ b Afll+ h AW-b Afl}- tJ,
[~ ~ ~],
(5)
B= where
200
withIn B = {2, 1,O}. (To seethis,usethe Jacobi method.) Thus,by Bqs, (2),
also In(a) = {2, 1,O}. 0 AY:,l-I=[ ~ ... 0 -~ ],
The following result is a consequence of Theorem 1 and permits the -ao a2 . : '('_1)8-I:a2&_4
description of the inertiaof a real polynomial in terms of that of two Bezou-
tians of smaller size. ~
( 1)&a2& .~ - ~:)kal].
Theorem 2. If a(A) is a real polynomial and h(A) and g(A) form the L-C
A~~-l =[ :
(-1)kal 0 ... 0
splittingofa(it), wherethe BezoutiansB I = Bez(h(A), g(A andB2 = Bez(h(A),
A.g(A are nonsingular, then a(A)has no pure imaginary zeros and Thus, by use of Eq. (5), it follows from Eq. (4) that
n(a) = lI(B I ) + n(B2 ) , QTBQ = diag[.82 , Btl, (6)

lI(a) = n(B I ) + 1I(B2). where, appealing to Eq, (13.3.5) and writing C for the companionmatrix of
g( -A),
In particular, CA) is stable if and only if B 1 > 0 and B 2 < O.
k

PROOF. We give the proof for the case of I even, say I = 2k.
JJ I = L (-I)J-l a"J_1 diag[AW- b Al,f}-l]
J=I

Let B denote the Bezout matrix associated with the polynomials al(A)
and a2(A) defined in: Theorem 1. Applying Proposition 13.3.2 and Eq.
(13.3.5), it is found that
k
B= L (-l}'-l a" J_ I diag[AW_I' AW-il,
J=I
where
0 0 -ao
- a2 a4 (-1)k a,]
-ao 0
4
a2 = .a 00: (aIC-a3C"+ ... +(-I)k-Ial_ICk).
A~~_I =
(~I)kal
0 [
0 .
0
Nowuse Proposition 13.3.2 again to obtain
.81 = B(h( -A), g( -A, jj" = B(h( -A), Ag(-A.
Using Eq. (13.3.7), we easily checkthat
pBIP = -B I, pR2P = -B(h(l), -Ag(l = B 2 , (7)

where P = diag[l, -1, ... , (_1)1:-1] e Cl:. Now Sylvester's law of inertia Proposition 1. Let bel) = D=o hjli and c(l) = CjA} he two real D=o
applied to Eqs. (6) and (7) yields, in view of Theorem 1, the required result polynomials in which c, .p O. If B = Dez(c, b) denotes the Bezou: matrix
provided I is even. The case of I odd is siinilarly established. associated with c(A) and b(A), then
Example 2. Let a(.t) be the polynomial considered in Example 1. We have B = S(c)HS(c), (1)
h(l) = 2, g(A) = - 3 + A., and
where H denotes the Hankel matrix of Markov parameters of the rational
function R(.t) = b(A)/c(A).
BI = -2, e, = [_~ -~J
PROOF. Let H = [h/+j-dtj;t, where ht, h2 , , h21- t are defined by
Thus In Bl = {O, 1,O}, In Bz = {I, 1,O}, and a(l) has two zeros in the open
right half-plane and one zero in the open left half-plane, as required. (2)

Example3. Leta(A) = (A - 1)3(.t + 2).Thenh(.t) = -2 - 3A + Xt,g(A) = It is easily found, by comparing the corresponding coefficients on both
5 - A. and sides of Eq. (2), that

s, = [-1~ -~J. = [ ~~ -~l


bT = hIs(c), (3)
B2
where b = [bo bt ... b,_I], bJ = bj - b,cjc,- t,j
T
= 0, 1, ... , I - 1, and
hI = [hi h2 .. h,). !L I'
Since In BI = {I, 1, O}, In B2 = {2, 0, O}, then, by Eq. (3), In(a) = {3, 1,0}, , .'

as required. 0 Furthermore, using Eq. (13.3.5), it is found that .I
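The inertia count of Theorem 2 is easily checked numerically; a minimal sketch (helpers as in the earlier sketches), using the polynomial of Example 3:

    import numpy as np

    def bez(p, q):                            # Bezoutian helper, identical to the earlier sketch
        l = max(len(p), len(q)) - 1
        p = np.pad(np.asarray(p, complex), (0, l + 1 - len(p)))
        q = np.pad(np.asarray(q, complex), (0, l + 1 - len(q)))
        C = np.outer(p, q) - np.outer(q, p)
        B = np.zeros((l, l), complex)
        for i in range(l, 0, -1):
            for j in range(l):
                B[i - 1, j] = C[i, j] + (B[i, j - 1] if (i < l and j > 0) else 0)
        return B

    def inertia(H, tol=1e-9):
        w = np.linalg.eigvalsh(H)
        return sum(w > tol), sum(w < -tol), sum(abs(w) <= tol)

    a = [-2, 5, -3, -1, 1]           # (lambda - 1)^3 (lambda + 2) = lambda^4 - lambda^3 - 3 lambda^2 + 5 lambda - 2
    h = a[0::2]                      # [-2, -3, 1]: h(lambda) of the L-C splitting
    g = a[1::2]                      # [5, -1]:     g(lambda)
    B1 = bez(h, g)
    B2 = bez(h, [0] + g)             # Bez(h(lambda), lambda*g(lambda))
    p1, n1, _ = inertia(B1)
    p2, n2, _ = inertia(B2)
    print(n1 + p2, p1 + n2)          # pi(a) = 3, nu(a) = 1, as in Example 3
    print(np.roots(a[::-1]).real)    # real parts 1, 1, 1, -2 confirm the count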


"
A simplified version of the criteria of Theorem 2 will be presented in
Cj+l

I
Section 13.10. C,

~1
Co ] , [ ~, J] = "J+IS(C),
Cj-l
where IIJ+I = [hJ+ I hj+ 2 '" hJ+,] and j = 1,2, ... , I - 1. Combining
13.8 The Markov Criterion this with Eq. (3) we obtain

[;:c, ]. =
Let a(A) = ao + alA + ... + a,A' be a polynomial with real coefficients
a,
and .p 0, and let at(A), az(.t) be defined as in Theorem 13.7.1. Note that
HS(c).
generally deg a2(.t) S deg al(l). Hence we may define a rational function bTC~-l
rCA) = az(A)/al(A) with the property limA... .., Ir(A)1 < 00.
For sufficiently large IAI, we may represent r(.t) by a Laurent series, It remains only to use Eq. (13.3.20) to obtain Eq. (1). . '
00
Now we are in a position to state a modified Markov stability test.
r(A) = L hjA-i,
j=O
Theorem 1. Let H be the Hankel matrix of the Markov parameters of the
and construct the corresponding (truncated) Hankel matrix H of the Markov polynomial a(A) and assume that det H .p O. Then
par~ters hJ:H = [h,+j-l]tj=l' It turns outthat the results obtained in
Sections 13.3and 13.4can bereformulated in terms ofthe inertia of a Hankel ' n(a) = n(H), veal = v(H), 6(a) = O. (5)
matrix of Markov parameters. This is possible in view of the relationship
between Hankel and Bezout matrices that is established in the next PROOF. The assertion of the theorem trivially follows from Theorem 13.7.1,
proposition. Proposition I, and Sylvester's law of inertia.
Corollary 1. A real polynomial aU) is stable if andonly if H < O. forsufficiently large IAI. Therefore by Eq. (1) the matrices B z = B(h(..t), Ag(l
EXlIIIIple 1. Let a(A) = (A + 1)2(A + 2). Then and H 2 are congruent. This proves the theorem in the even case.

r(A) = az(A.) =
a10,)
+-2 4:
2
=t
..1.(5 - A) J'" 1
hJA.-J,
Now consider the case when I is odd. Here deg h(l) :::;; deg gel) and
h(A) ~ h
-gel) =J"'O
i.J }+111.
1- j
,
h(A) _ ~ h 1 -
- i.J JII.
J

Ag(A) J'" 1
where the coefficients hh hz, ... , hs are found to be hi = -4, h2 = h4
= 0, h3 = -18,;'s = -90. Hence the Hankel matrix ThusH 1 (respectively,Hz)is congruent to the Bezoutian -B z = B(Ag(A), h(A
(respectively, -B 1 = B(g(l), h(l), and Eqs. (8) follow from Theorem
-40 0-18]0 13.7.2.
H=
[
-18
-18
0 -90 CoroUary I. A real polynomial a(A) is stable
H 2 < 0, where HI and Hz are definedin Eqs. (6).
if and only if HI> 0 and

is negative definite (use Jacobi's method to see this), and therefore a(l) is
stable. 0 Note that by writing the Markov parameters in the traditional form
(I is even),
Applying Eq. (1) and Sylvester's law of inertia to the Bezout matrices B 1
and B 2 , the next result follows from Theorem 13.7.2.
(9)
Theorem 2 (A. A. Markovt), In the notationofthis sectton.let

HI =.
:: hz '"
.:
h
k
1 Hz =.
I:: h
3

.
~m+l1
: ,(6)
it is found that
HI = PH1P, H2 = -pR2 P,
[
~k ~2k-l ' ~m+I' : where P = diag[l, -1, ... , (_1)",-1] and m = t/, and

where k = m ;:: il or k = i(l + I), m = t(l - I), according as lis evenor odd, ro rl ... r m - 1]
and where the Markov parameters hj,j = 0, 1,... , are defined by R1 -_ rl
. . ... ,
r
h.l- j = {9(l)/h(l)
J"'O J h(l)/Ag(l)
if I is even,
if I is odd.
(7) [

.: rZ m - 2

If HI and Hz are nonsingular, then a(l) has no pureimaginary zeros and Thus, the stability criterion with respect to the form (9) is equivalent to the
conditions R 1 > 0, R z > 0, as required.
n(a) = v(H1 ) + n(H2),
v(a) = 7t{H I) + v(Hz). (8) . Example 2. Let a(A.) be as defined in Example 1. Then h(l) = 2 + 4A.,
g(A.) = 5 + A., and
PROOF. Let I be even. Then deg g(A) < deg h(A) and hence, in Eq. (7),
h o = O. Note that since the leading coefficient of h(A) is nonzero, formula (1) h(l) = 4r1 _ 18A.-2 + 90A.-3 + ....
yields the congruence of B 1 = B(h(A), gel~ and HI' Furthermore, it follows Ag(A.)
that
Hence

HI = [:~ ::] = [-1: -:~J. H2 = [h2 ] = [-18],


t zap. Petersburg Akad. Nauk, 1894. See also "The Collected Works of A. A. Markov,"
Moscow, 1948, pp. 78-105. (In Russian.)
and the properties H ~ > 0, H 2 < 0 yield the stability of the given polynomial.
o
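The Markov parameters, and hence the Hankel test of Theorem 1 and Corollary 1, can be produced directly from the coefficients by matching powers of λ in a₁(λ)r(λ) = a₂(λ). A minimal sketch (Python/NumPy; helper name ours), reproducing Example 1 for a(λ) = (λ + 1)²(λ + 2):

    import numpy as np

    def markov(num, den, count):
        """Coefficients h_0, h_1, ... in num(z)/den(z) = sum_j h_j z^(-j) for large |z|.
        Polynomials are given by coefficient lists, constant term first."""
        n = len(den) - 1
        c = np.asarray(den, float)
        b = np.zeros(n + 1)
        b[:len(num)] = num
        h = np.zeros(count)
        for m in range(count):
            s = sum(c[n - m + j] * h[j] for j in range(m) if 0 <= n - m + j <= n)
            h[m] = ((b[n - m] if n - m >= 0 else 0.0) - s) / c[n]
        return h

    # a(lambda) = lambda^3 + 4 lambda^2 + 5 lambda + 2, L-C splitting h = 2 + 4u, g = 5 + u;
    # l odd: a1(lambda) = lambda*g(-lambda^2) = 5*lambda - lambda^3, a2(lambda) = -h(-lambda^2) = -2 + 4*lambda^2
    a1 = [0, 5, 0, -1]
    a2 = [-2, 0, 4]
    l = 3
    h = markov(a2, a1, 2 * l)                  # 0, -4, 0, -18, 0, -90, as in Example 1
    H = np.array([[h[i + j + 1] for j in range(l)] for i in range(l)])
    print(H)                                   # the Hankel matrix of Example 1
    print(np.all(np.linalg.eigvalsh(H) < 0))   # True: H < 0, so a(lambda) is stable (Corollary 1)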

13.9 A Determinantal Version of the PROOF. Introduce the 1 x 2s vectors


Routh-Hurwitz Theorem hr = [0 " . 0 ho hi
and
A disadvantage of the previous methods for computing the inertia of a bI = [0 ... 0 b, b'-1 b,-J,
given polynomial is that this inertia is expressed in terms of the inertia of a for k = 0, 1, ... , 2s - 1. Note that
matrix with elements that must first be calculated. In this section wedetermine
the inertia of a real polynomial by computing the leading principal minors of a hI.-l = rho hi h 2.- 1]
matrix whose non-zero elements are simply the coefficients of the poly- and
nomial. b1.-l = [b, b'-1 b'-2.+ 1]'
Let a(A) = E""o ajAj be a polynomial with real coefficients and a, =F 0,
Using Eq, (13.8.3), it is found that, for each k,
and define the I x I matrix
a,-1 a,-3 C'-1 ... C'_~'+I]
a, a,-2 C, ". : _ bT (3)
- k'
0 a,-1 a,-3 . C'-1
b= 0 a, a,-2 o c,
0 0 a,-1 Hence, denoting the unit vectors in C 28 by "1' "2"", "2., we easily check
0 0 a, that
"T cI'-l
called the Hurwitz matrix associated with a(A). We shall show that the
111,-1 b1.-l
Hermitian matrix H from Section 13.8 and the non-Hermitian Hurwitz "I cI'-2
matrix fl have the same leading principal minors. We state a more general h1.-2 b1.-2 (4)
result due to A. Hurwitz. t
j
Proposition 1. Let b(A) = L}ao bjA and c(A.) = L}""o CjA/ be two real.
polynomials with c, =F 0, and let ho' hl> h2, ... denote the Markov parameters
of the rational function 1'(.1) = b(l)/c(l). Then if we define bj = Cj = 0 for where
j<o, ' cI = [0 ... 0 C, C'-l . c,-J, k = 0, I, ... , 2s - I,
C, C'-1 C'-2.+1
and Cj = bj = 0 for j > I. Observe that the Toeplitz matrix in Eq, (4) coin-

Cf8 det
r
t
h

h.
h2

h.+1 '"
h8
t~
h2.- 1
]= det
b,
0
0
b'-1
C,
b,
C'-1
b'-1
h'-2.+ 1
C'-2.+2
b'-2.+2 ,
cides with the matrix in Eq, (3), and therefore their determinants are equal.
Computing the determinant on the left side in Eq, (4), we find that it is equal
to

0 C, C,_.
j) h, b,_.

for s = 1, 2, ... , I. =cf'det [~ ~].


Note that the matrices on the left and right of Eq. (2) have sizes s and 2~,
respectively.

, Math. Ann. 46 (1895), 273-284.



where Now apply Eq. (2) for bel) = gel), c(l) = h(..l.) to obtain, for s = 1, 2, ... , k,
a, a.-2
. h. h.+ l 2 I
hi ... h.-;-l.] h . - z] 0 a,-l a,-3
I.
"0
'.
,- IXT
n.
= h.-I
.
:
hs
...
hz..- . al'b~l) = det
0 a, a,-2
". ". hi ' [ 0 0 a,-l
= a,Azs- l' (8)
... 0 ho hi '" h.
Thus the determinant of the matrix on the right in Eq. (2) is 0 0 al
h2 '" Using Theorem 13.8.2 and the Jacobi rule, if Ai :F 0 for i = 1,2, ... , I,
we obtain
n(a) = V(1, 5~I), .. , bPI) + P(1, b~21, .. . ,15\2, (9)
where V (respectively, P) denotes the number of alterations (respectively,
as required. constancies) of sign in the appropriate sequence. In view of Eqs, (7) and (8),
we find from Eq. (9) in the case of a positive a, that
The famous Routh-Hurwitz stability result is presented in the following n(a) = V(l, Ah A3 , , AI-I) + P(l, -A2 , A4 , , (-l)kA /)
theorem.
= V(l, Ah A3 , , A,-I) + V(l, Az, A4 , , A,), (10)
Theorem 1. Ifa(A) is a realpolynomial andthere arenozeros in the sequence which is equivalent (in the case a, > 0) to the first equality in Eq. (6). The
(5) second equality in Eq. (6), as well as the proof of bothof them for a, < 0,
is similar. It remains to mention that the case for I odd is established in a
of leading principal minors ofthe corresponding Hurwitz matrix, thena(..t) has similar way.
no pureimaginary zerosand CoroUary.t. A real polynomial a(A) of degree I with a, > 0 is stable if and
only if
n(a) = Va.,
Az A, )
( Ai> AI' ... , A.- 1 ' Al > 0, A2 > 0, ... , A, > O. (11)

veal =I - v(
ah Ab ~:' , A~~) = p(a l' AI' ~:' ... , A~~ J (6)
Example1. If a(A) = (A - 1)3(A + 2) (see Example 13.7.3), then

where V (respectively, P) denotes the number of alterations (respectively,


constancies) ofsign in the appropriate sequence. 9. = [-i =~ -~
o 1 -3 -2
~],
PROOF. Consider the case of I even, say I =
2k. For s = 1,2, ... , k, denote
with Al = -1, A2 = -2, A3 = -8, A4 = 16. Hence
by 5~I) (respectively, 5~Z the leading principal minor of order s of the matrix
HI (respectively H 2) defined in Eqs. (13.8.6). For s = 1,2, ... , I, let A. stand n(a) = V(l, -1,2,4, -2) = 3, v(a) = 1,
for the leading principal minor of order s of the Hurwitz matrix fl. Applying
Eq. (2) for the polynomials bel) = Ag(l), e(l) = h(l), where h(l) and g(A) as required. 0
generate the L-C splitting of a(.t), it is found that, after s interchanges of Note that the Gundelfinger-Frobenius method of Section 8.6 permits us
rows in the matrix on the right in Eq. (2), to generalize the Routh-Hurwitz theorem in the form (10) or in the form
a~'5~Z) = (-1)SA2 s , S = 1,2, ... , k. (7) 7t(a) = V(a" Ai> A3 , , A.) + V(l, A2 , , A.-I), (12)

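The determinantal test requires nothing beyond the coefficients. The sketch below (helper name ours) fills the Hurwitz matrix by the pattern of its definition at the start of this section (entry (i, j), counted from zero, is a_{l−1+i−2j}, with out-of-range coefficients replaced by zero), computes the leading principal minors, and reproduces Example 1 for a(λ) = (λ − 1)³(λ + 2).

    import numpy as np

    def hurwitz(a):
        """l x l Hurwitz matrix of a(lambda) = a_0 + a_1*lambda + ... + a_l*lambda^l."""
        l = len(a) - 1
        H = np.zeros((l, l))
        for i in range(l):
            for j in range(l):
                k = l - 1 + i - 2 * j
                if 0 <= k <= l:
                    H[i, j] = a[k]
        return H

    a = [-2, 5, -3, -1, 1]                       # (lambda - 1)^3 (lambda + 2)
    H = hurwitz(a)
    minors = [np.linalg.det(H[:s, :s]) for s in range(1, len(a))]
    print(np.round(minors, 6))                   # [-1, -2, -8, 16], as in Example 1
    print(all(m > 0 for m in minors))            # False: a(lambda) is not stable (Corollary 1)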

according as I is even or odd. It can be seen from the proof of Theorem 1 Exercise 2. Verify that if
that this extension holds whenever Theorem 13.8.2 and the Grundelfinger-
Frobenius method apply. Theorem 13.8.2 requires the nonsingularity of . r(' ~ /li
Il.) = t... A _ A. + rl(A),
HI and H2othat is, a~l) :F 0 and a!:) :F 0 (or, equivalently, A2/<-l :F 0 and i= I 1

A2 :F 0), where k and m are as defined in that theorem. where /li> Ai are real (i == 1,2, ... , s) and rl(l1) is a rational function without
Application of the Gundelfinger-Frobenius method to the sequences of =
real poles, then 1~r(A) Li sgn /li' where the summation is over all Ai
leading principal minors of HI and H 2, 15\1), a~l), ... , all) and 15\2), a~2), lying in (a, b).
... , a~), requires no two consecutive zeros in these sequences and all) :F 0,
c5~) :F O. In terms ofthe A's this means that there must be no two consecutive Exercise 3. Check that
zeros in the sequences AI' A3 , , A21 - 1 and A2 , A4 , , A2m , and also
~-1 :F 0, A, :F O. I: b(A) == rsgn b(A,)
C(A) II<AI<" c'(A,)'
Example 2. For the polynomial a(A) discussed in Example 13.7.1, it is found
that provided that all zeros Ai of e(A) are simple. 0
I:
[~ -~ ~],
Proposition 1. The Cauchy index e'(A)/c(l) is equal to the number of
fl == distinct real zeros in (a, b) ofthe real polynomial c(A).
o 0 2
PROOF. Let e(A) == CoCA - .1. 1)'1'(..1. - A2)P2 ... (A - Ak)"k, where Ai A for
with A1 == 0, A2 == - 2, A3 = -4. Thus, in view of the Eq. (12), i :Fi, i, j == 1,2, ... k. Suppose that only the first p distinct zeros of) C(A)
7t(a) == V(l,O, -4) + V(l, -2) = 2, are real. If p == 0, then c'(A)/c(A) has no real poles and hence the Cauchy
index is zero. If 1 :s;; P :s;; k, then
as required: 0
c'(A) k P
-==
e(A)
L
II
_,..i_=
lElA-Ai i=lA-Ai
II

l'
L
-""-+r (A)
13.10 The Cauchy Index and its Applications where r 1(A) has no real poles. It remains now to apply the result of Exercise 2.

The notion of the Cauchy index for a rational function clarifies some'
properties of the polynomials involved in the quotient and, as a result, plays Our first theorem links the Cauchy index of polynomials bel) and c(l)

an important role in stability problems. with the inertia properties of the corresponding Bezout matrix. First recall
Consider a real rational function r(A) == b(A)/c(A), where b(A) and c(A) the notion of the signature of a Hermitian matrix as introduced in Section
are real polynomials. The difference between the number of points at which 5.5.
r(A.) suffers discontinuity from - 00 to + 00 as A increases from a to b
(a, be R, a < b) and the number of points at which r(A) suffers discontinuity Theorem I. Let b(A) and c(A) be two real polynomials such that
from + 00 to - 00 on the same interval is called the Cauchy index of rCA)
deg bel) ~ deg e(l)
on (a. b) and is written I:b(l)/c(l). Note that a and b may be assigned
the values a == - 00 and/or b == + 00. and let B = Bez(b, c) denote the corresponding Bezout matriX. Then
Exercise 1. Verify that ifr(A) == /lo/(A - .1.0 ), where /lo and .1.0 are real, then r~ bel) .
- ~ e(A) = slg B. (1)
I if /lo>O,
I~:=sgn/lo= -1 if /lo<O, Notethat,in viewofEq. (13.8.1),the Bezoutmatrix Bin Eq.(I)can be replaced
{
o if #10=0. by the Hankel matrix H of the Markov parameters of b(l)/c(A).

Proposition 1. If the real polynomial a(A) = ao + alA + ... + alA' with


PROOF. Let US establish Eq. (1) for the case when all the real zeros of the
polynomial c(A) are simple. Then we may write the rational function
a, > is stable, then
r(A) = b(.\)/c(A) in the form ao > 0, al > 0, ... , a,_1 > 0, a, > 0. (6)
S P, --
rCA) = L .,..---;- + r 1(A), (2)
'-1 . . - . . ,
.
PROOF. Any real polynomial a(A) of degree I can be represented
where AI' A",... , A. are the distinct real zeros of the polynomial c(A) and
r I (A) is a real polynomial without real poles. Since a(A) =
i=1
n" (A2 + J.l.jA + Vi),
a,TI (A + A,) i-I (7)

1 At
L where s + 2k = n and the numbers Ai> J.l.i' Vi are all real. Obviously, the con-
CD
.,----, = U+ 1 (3)
A - A, t-o A dition that a(A) has only zeros with negative real parts implies that A, > 0,
for sufficiently large A., we may substitute from Eq. (3) into Eq. (2) and find i = 1, 2, , s, and also that the quadratic polynomials .12 + J.l.i A + vl'
j = 1, 2, , k have zeros in the open left half-plane. The last requirement
r(),) = ~
11=0
( PIAt)j),H
1=1
1 + r 1(A). yields Iti > 0, "i > 0 for all l- Now it is obvious from Eq, (7) that since the
coefficients of a(A) are sums of products of positive numbers, they are
Therefore hI; = Lt-l PiAt are the Markov parameters of the function r(A), positive.
for k = 0, 1,....
Note that the condition (6) is not sufficient for the stability of a(A}, as the
Now observe that in view of Eq.(13.8.1} and Exercise 2, Eq. (1) will be
proved for the case being considered if we establish that example of the polynomial P + 1 shows.
Thus, in the investigation of stability criteria we may first assume that the

h"~ll
coefficients of a(.\)are positive. It turns out that in this case the determinantal
sig [:; hi..... = sgn Pi> (4) . inequalities in (13.9.11) are not independent. Namely, the positivity of the
Hurwitz determinants A, of even order implies that of the Hurwitz deter-
: 1-1
minants of odd order, and vice versa. This fact will follow from Theorem 4.
h"-l h""-l We first establish simplifications of the Lienard-Chipart and Markov
in which n = deg'c(A} and ht.d L~=lP,A~, k = 0,1, .... But if H" = stability criteria. This is done with the help of the Cauchy index.
[hl+iJtj~o, a simple verification shows that
H" = v;. diag[pl' P" , It 0, ... O]V~. (5)
Theorem 2. (The Lienard-Chipart criterion). Let a(l) = I:1=o alAi be a
real polynomial andlet
where v;. stands for the n x n Vandermonde matrix associated with
a(A) = h(A") + ..1.g(A2 )
.1.10 .12 , , ..t.... X10 X2' , X,,_. for some distinct real numbers Xl. X", ,X,,_.
different from .11, .\,,' ... , A.. Using Sylvester's law of inertia and the non- be its L-C splitting. Then a(l) is stable if and only if the coefficients of heAl
singularity of the Vandermonde matrix lI", the relation (4) readily follows. havethe samesignas a, andthe Bezoutian B 1 = B(h,g) is positive definite.
In the general case of multiple real zeros of c(~}. the reasoning becomes
more complicated because the decomposition of a rational function into PRooF. The necessity of the condition follows easily from Theorem 13.7.2
elementary fractions contains sums of expressions of the form (A - Air. and Proposition 2. The converse statemer will be proved for the even case,
for m = -1, - 2, ... , -Ii> where I, is the multiplicity of A, as a zero of c(A). I = 2k, while the verification of the odd case is left as an exercise.
Differentiating Eq. (3). the Markov parameters can be evaluated. The details
LetB1 = B(h, g) > 0 and assume the coefficients ao, "", "', a, of h(l)
are left as an exercise.
The result of Theorem 1 has several applications. We will start with some
are positive. Then obviously heAl :<:!: for all positive ..1. E R. Furthermore,
if h(A 1) = for Al > 0, then g(ll) #: 0 since otherwise the Bezoutian B will
simplifications ofthe Lienard-Chipart and Markov stability criteria, but be singular (Theorem 13.3.1). Thus, although the fractions g(A)/h(A) and
we first present an interesting preliminary result. ..1.g(A)/h(A) may suffer infinite discontinuity as Apasses from 0 to + 00, there

is no discontinuity in the value of the quotients from - 00 to + 00, or from We conclude this section with one more application of Theorem I, but
+00 to -00. Hence first we need the following statement, which is of independent interest.

rOO g(A.) = roo ,lg(A.) = Theorem S. The number of distinct zeros of a real polynomial a(A) of
o h(A.) h()') . 0 degree I is equal to the rank of the Hankel matrix. H = [s'+J]I.J~o of the
Newton sums So = I, s" = A~ + A.~ + ... + At, k = 1, 2, ... , where AI'
Furthermore, it is clear that
,1.2' , A, are the zeros of a(A).
o g(A) _ 0 Ag(A) We first observe that H = VV T , where V denotes the I x I
L oo h(A.) - -Loo h(A) ,
PROOF.
Vandermonde matrix associated with AI' A2 , , A,. Hence rank H =
and therefore rank V and the result follows.
rOO g().) :=: /0 g(A) + roo g(A) _ /0 g(A) _ _ /0 ,lg(A)
(8)
Note that the significance of this and the following theorem lies in the fact
-110 h(A) -110 h(A) 0 h(A) - -00 h(A) - -00 h(A) . that the Newton sums so, s b can be evaluated recursively via the co-
efficients of the polynomial. If a().) = ao + alA. + ... + a,_IA.'-l + A', then
But the Bezoutian B 1 = Bez(h, g) is assumed to be positive definite, and
therefore, by Theorem 1, So = I, (9)
00 g(A.) for k = I, 2, ... , I - I, and
L 00 h().) = k.
aos,,_, + als"-'-1 + ... + a'-ls"-l + s" :=: 0, (10)
Applying Theorem 1 again, we find from Eq. (8) that B,Z = Bez(h, ,lg) is
negative definite. Theorem 13.7.2now provides the stability ofthe polynomial for k = I, I + 1,... .
a(A). Theorem 6 (C. Borhardt and G. Jacobi"), The real polynomial a(A) =
Note that Theorem 2 can be reformulated in an obvious way in terms of L\=o a,A,has v different pairs of complex conjugatezeros and 7t - v different
the Bezoutian B2 = Bez(h, g) and the coefficients of h(A) (or g(A. realzeros, where {7t, v, IS} is the inertia of the Hankel matrix H = [s,+ J]I.J~o
The congruence of the matrices B 1 and HI defined in Theorem 13.8.2 of the Newton sumsassociated with a(jI,).
leads to the following result. PROOF. Putting C(A) = a(A), b(A) = o'(A) in Theorem 1, we obtain
Theorem 3 (The Markov criterion). A real polynomial a(l1.) is stable if a'(I1.)
and only if the coefficients of h(A),from the L-C splittingof a(A) have the same Bez( a, a') = /+00
-110 a(A)'
signas the leading coefficient of a(A) and also HI> O. Since
The remark made after Theorem 2 obviously applies to Theorem 3. This (11)
fact, as well as the relations (13.9.7) and (13.9.8), permit us to obtain a
where so, 51' 52' are the Newton sums associated with a(A), it follows that
different form of the Lienard-Chipart criterion.
by formula (13.8.1) the matrices Bez(a, a') and the matrix H of the Newton
Theorem 4 (The Lienard-Chipart criterion). A real polynomial a(A) = sums are congruent. Hence the relation (1) shows, in view of Proposition I,
D=o alA', a, > 0, is stable if and only if one of thefollowing four conditions that the number of distinct real zeros of a(..i) is equal to sig H = 7t - v. By
holds: . Theorem 5, rank H = 7t + v is equal to the total number of distinct zeros of
a(A.) and therefore a(A) has 2v distinct complex zeros. The result follows.
(0) aD > 0, a2 > 0, ; Al > 0, A3 > 0, ;
(b) ao > 0, a2 > 0, A2 > 0, A4 > 0, ;
; Exercise 4. Evaluate the number of distinct real zeros and the number of
(c) ao > 0, a 1 > 0, a3 > 0, ; Al > 0, A3 > 0, ... ; different pairs of complex conjugate zeros of the polynomial a(A) = ,1.4 -
(d) ao > 0, al > 0, a3 > 0, ; A2 > 0, A4 > 0, ... 211. 3 + 211. 2 - 2A + 1.
The detailed proof is left to the reader. t J. Math. Pures App/. 11 (1847), 5()-67, and J. Reine Angew. Math. 53 (18S7), 281-283.

SOLUTION. To 1ind the Newton sums of a(A), we may use formulas (9) and
(10) or the representation (11). We have 80 = 4. 8 1 = 2. 82 == O. 83 == 2,
84 :;: 4, 8, == 2, 86 = 0, and hence the corresponding Hankel matrix H is CHAPTER 14

H= [~ ~ ~l
; Matrix Polynomials
8in<;e InB == {2. I. I}. there is one pair of distinct complex conjugate
zeros (v = 1) and one real zero (n - v == 1). It is easily checked that. actually.
a(..t) == (A - 1)2(A2 + 1). and the results are confirmed. 0
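Theorems 5 and 6, together with the recursions (9) and (10), translate into a short numerical procedure: generate the Newton sums, form the Hankel matrix H = [s_{i+j}], and read off the counts from its inertia. A minimal sketch (monic polynomial assumed; helper name ours), applied to the polynomial of Exercise 4:

    import numpy as np

    def newton_sums(c, count):
        """Power sums s_0, s_1, ... of the zeros of the monic polynomial
        lambda^l + c[l-1]*lambda^(l-1) + ... + c[0], via Newton's identities."""
        l = len(c)
        s = [float(l)]                        # s_0 = l
        for k in range(1, count):
            t = 0.0
            for j in range(1, k):             # sum of c_{l-j} * s_{k-j}
                if l - j >= 0:
                    t += c[l - j] * s[k - j]
            if k <= l:
                t += k * c[l - k]
            s.append(-t)
        return s

    # a(lambda) = lambda^4 - 2 lambda^3 + 2 lambda^2 - 2 lambda + 1 = (lambda - 1)^2 (lambda^2 + 1)
    c = [1.0, -2.0, 2.0, -2.0]                # c_0, c_1, c_2, c_3 (leading coefficient 1 omitted)
    l = len(c)
    s = newton_sums(c, 2 * l - 1)             # 4, 2, 0, 2, 4, 2, 0
    H = np.array([[s[i + j] for j in range(l)] for i in range(l)])
    w = np.linalg.eigvalsh(H)
    pi, nu = sum(w > 1e-9), sum(w < -1e-9)
    print(s)
    print(pi - nu, nu)                        # 1 distinct real zero, 1 pair of complex conjugate zeros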

It was seen in Sections 4.13 and 9.10 that the solution of a system of
constant-coefficient (or time-invariant) differential equations. ;f(r) - Ax(t)
:;: I(t). where A e e" ~" and x(O) is prescribed, is completely determined
by the properties of the matrix-valued function U - A. More specifically,
the solution is determined by the zeros of det(U - A) and by the bases
for the subspaces Ker(U - A)l of en, j:;: 0, 1, 2, ... ,n. These spectral
properties are. in turn, summarized in the Jordan form for A or. as we.shall
now say, in the Jordan form for M - A..
The analysis of this chapter! will admit the description of solutions for
initial-value problems for higher-order systems:
$$L_l x^{(l)}(t) + L_{l-1} x^{(l-1)}(t) + \cdots + L_1 x^{(1)}(t) + L_0 x(t) = f(t), \qquad (1)$$
where $L_0, L_1, \ldots, L_l \in \mathbb{C}^{n \times n}$, $\det L_l \neq 0$, and the indices denote derivatives
with respect to the independent variable t.
now concerns the associated matrix polynomial L(A} :;: L~=o AlLl of degree
I, as introduced in Section 7.1. We shall confine our attention to the case in
which det L1 'I: O. Since the coefficient of A'" in the polynomial det L(A} is
just det L h it follows that L(A} is regular or, in the terminology of Section
7.5, has rank n.
As for the first-order system of differential equations, the formulation
of general solutions for Eq. (1) (even when the coefficients are Hermitian)
requires the development of a "Jordan structure" for L(1}. and that will be
done in this chapter. Important first steps in this direction have been taken

t The systematic spectral theory of matrilt polynomials presented here is due to I. Gohberg,
P. Lancaster. and L. Rodman. See their 1982 monograph of Appendix 3.


T~is mean~ that the latent roots of L(..1.) (as defined in Section 7.7) coincide
in Chapter 7; we recall particularly the Smith normal form and the elemen-
with the eigenvalues of C L. Admitting an ambiguity that should cause no
tary divisors; both of which are invariant under equivalence transformations
?iffic~lties, the set of latent roots of L(..1.) is called the spectrum of L(..1.) and
of L(..1.). In Chapter 7, these notions were used to develop theJordan structure
IS wntten a(L). From the point of view of matrix polynomials, the relation
of ..1.1 - A (or of A). Now they will be used in the analysis of the spectral
(4) says that L(..1.) and ..1.1 - C L have the same invariant polynomial of highest
properties of more general matrix polynomials.
degree. However, the connection is deeper than this, as the first theorem
shows.
Theorem 1. The In x In matrix polynomials

$$\begin{bmatrix} L(\lambda) & 0 \\ 0 & I_{(l-1)n} \end{bmatrix} \qquad \text{and} \qquad \lambda I_{ln} - C_L$$

are equivalent.
It has been seen on several occasions in earlier chapters that if $a(\lambda) = \sum_{j=0}^{l} a_j\lambda^j$ is a monic scalar polynomial then the related $l \times l$ companion matrix,

$$C_a = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{l-1} \end{bmatrix}, \qquad (1)$$

is a useful idea. Recall the important part played in the formulation of the first natural normal form of Chapter 7 and in the formulation of stability criteria in Chapter 13 (see also Exercises 6.6.3 and 7.5.4). In particular, note that $a(\lambda) = \det(\lambda I - C_a)$, so that the zeros of $a(\lambda)$ correspond to the eigenvalues of $C_a$.

For a matrix polynomial of size $n \times n$,

$$L(\lambda) = \sum_{j=0}^{l} \lambda^j L_j, \qquad \det L_l \neq 0, \qquad (2)$$

we formulate the generalization of the matrix in (1):

PROOF. First define $ln \times ln$ matrix polynomials $E(\lambda)$ and $F(\lambda)$ by

$$F(\lambda) = \begin{bmatrix} I & 0 & & & 0 \\ -\lambda I & I & & & \\ 0 & -\lambda I & I & & \\ & & \ddots & \ddots & \\ 0 & & & -\lambda I & I \end{bmatrix}, \qquad E(\lambda) = \begin{bmatrix} B_{l-1}(\lambda) & B_{l-2}(\lambda) & \cdots & B_0(\lambda) \\ -I & 0 & \cdots & 0 \\ 0 & -I & & \vdots \\ & & \ddots & \\ 0 & \cdots & -I & 0 \end{bmatrix},$$
we formulate the generalization of the matrix in(l):
det E(l) == det Lit det F(l) == 1. (5)

CL-
-[ ~ ~ ~
: I ~] ' (3)
Hence F(..1.) - J is also a matrix polynomial. It is easily verified that

E(l)(M - CL ) = [L(..1.) 0 ]F(A), (6)


-L o -L, -L2 -~-J o /(1-1)..

where LJ = L ,-ILJ for j = 0, 1, ... ,1- 1. Now consider .the relationships and so .
between L(..1.) and ..1.1," - CL The In x In matrix CL is called the (first)
companion matrix for L(A).
The first observation is that the characteristic polynomials satisfy
[L~) I 0
(/-1)11
] -= E(A)(ll - CLlF(..1.)-l

det L(..1.) = det(M," - CL)(det L/). (4) determines the equivalence stated in the theorem. .
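The companion matrix C_L and relation (4) are easy to set up for a concrete matrix polynomial. A minimal sketch (Python/NumPy; the random coefficients are ours and almost surely give det L_l ≠ 0):

    import numpy as np

    rng = np.random.default_rng(0)
    n, l = 2, 3
    L = [rng.standard_normal((n, n)) for _ in range(l + 1)]       # L_0, ..., L_l

    def L_of(lam):                                    # L(lambda) = sum_j lambda^j L_j
        return sum(lam ** j * L[j] for j in range(l + 1))

    # first companion matrix C_L, built from Lhat_j = L_l^{-1} L_j
    Linv = np.linalg.inv(L[l])
    C = np.zeros((l * n, l * n))
    C[:-n, n:] = np.eye((l - 1) * n)                  # identity blocks on the superdiagonal
    for j in range(l):
        C[-n:, j * n:(j + 1) * n] = -Linv @ L[j]      # last block row: -Lhat_0, ..., -Lhat_{l-1}

    lam = 1.7                                         # any sample point
    lhs = np.linalg.det(L_of(lam))
    rhs = np.linalg.det(lam * np.eye(l * n) - C) * np.linalg.det(L[l])
    print(np.isclose(lhs, rhs))                       # relation (4)

    # the eigenvalues of C_L are the latent roots of L(lambda)
    mu = np.linalg.eigvals(C)[0]
    print(abs(np.linalg.det(L_of(mu))) < 1e-8)        # True (up to roundoff)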

Theorem 1 (together with Theorem 7.5.1 and its corollary) shows that all SOLUTION. The solution in the latter case is either
of the nontrivial invariant polynomials (and hence all of the elementary

r~ ~ ;rl r~ ~ ;~]
divisors) of L(A.) and A.l - CL coincide. Here, "nontrivial" means a poly-
nomial that is not identically equal to 1.
Exerci,e 1. Establish the relations (4) and (5). or
Exerci,e 2. Show that if 1 is an eigenvalue of CL then the dimension of
Ker(1t - CL ) cannot exceed n. (Note the special case n = 1.) Exercise 6. Let Lj = LJL,-t for j = 0, I, ... , I - 1, and define the second
companion matrix for L(A) to be the In x In matrix
Extirci,e 3. Formulate the first companion matrix and find all elementary
divisors of (a) L(l) = 1 .. and (b) L(A) = A'I.. - I... 0
'1
Observe now that there are other matrices A, as well as CL , for which
)J - A and diag[L(,1.), I{I-1),,] are equivalent. In particular, if A is a matrix
similar to CL , say A = TC LT- 1, then )J - A = T(AJ - CL)T- 1, so that
Al - A and M - CLare equivalent and, by the transitivity of the equivalence
relation, M - A and diag[L(A), 1(1-1 I J are equivalent. It is useful to make
a formal definition: If L(A) is an n x n matrix polynomial of degree I with Also (see Chapter 13) define the symmetrizer for L(A.) to be
non singular leading coefficient, then any In x In matrix A for which A.l - A L1 L2 ... L ,- 1 L,
and diag[L(A), 1(/-1)11] are equivalent is called a linearization of L(A).
L2 L1 0
The next theorem shows that the matrices A that are similar to CLare,
in fact, all of the linearizations of L(A). (7)

Theorem 1. Any matrix A is a linearization of L(A) if andonly if A is similar u., L,


to the companion matrix CL of L(A). L1 0 0
PROOF. It has been shown that if A is similar to CL then it is a linearization Show that SL is nonsingnlar, that C 2 = SLCLSi 1
and, hence, that C 2 is
ofL(A). Conversely, if Ais a linearization of L(A) then Al- A,.., L(A) + a linearization of L(A). 0
1(1-1)11 (using>; as an abbreviation for "equivalent to"). But by Theorem
+
I, L(l) 1(,-t)" .... A.l - CL so, using transitivityagain,)J - A,.., A.l - CL
Then Theorem 7.6.1 shows that A and C L are similar.
This result, combined with Theorem 6.5.1 (or 7.8.2),shows that there is a 14.2 Standard Triples and Pairs
matrix in Jordan normal form that is a linearization for L(A) and, therefore,
it includes all the information concerning the latent roots of L(A) and their
algebraic multiplicities (i.e., the information contained in a Segre character-
In the spectral theory of a general matrix A E e""", the resolvent function
istic, as described in Section 6.6).
(ll - A)-l plays an important part on several occasions. For a matrix
Exercise 4. Use Exercise 6.6.3 to find a matrix in Jordan normal form that polynomial L(A), as defined in the relations (14.1.2), the matrix-valued
is a linearization for a scalar polynomial (i.e., a 1 x 1 matrix polynomial function L(Ar 1 (defined for aliA; a(L plays the corresponding role and is
L(A. called the resolvent of L(A). The next step in our analysis shows how the
Exercise 5. Find matrices in Jordan normal form that are linearizations resolvent"of the first companion matrix CL can be used to determine the re-
for the matrix polynomials of Exercise 3. Do the same for solvent of L(A).
Theorem 1. For every complex Af; a(L),
L(A) = [A; ~2A]. L(A)-1 = Pt(Al - C L)-tR 1, (1)

where PI' R 1 are the matrices of size n x nland In x n, respectively, defined PROOF. Since (U, 1; V) is a standard triple, there is a nonsingular matrix
by S such that
PI = US, CL = S-lTS, HI = S-IV.

PI = [Ill 0 .. . 0],
HI =[ !].
L,-l
(2)
Observe that (ll - C L)- 1 = (S-I(ll - T)S)-1 = S-I(M - T)- I S, and
substitute into Eq. (1) to obtain the result.

Thus we obtain a representation of the resolvent of L(A) in terms of the


PaooF.First observe that, because of Eq. (14.1.4), the matrix (ll - Cd- 1 resolvent of any linearization for L(.il.). Our next objective is to characterize
is defined at just those points A where L(A)-1 exists. Now consider the proof ' standard triples for L(A.) in a more constructive and direct way; it is not
of Theorem 14.1.1 once more. The relation (14.1.6) implies that generally reasonable to try to find the matrix S of the proof of Theorem 2
explicitly. First we will show how to "recover" the companion matrix CL

[ LOro i 0]
11l-1)n
= F(.t)(ll _ Cd- 1E(.t)- 1. (3)
from any given standard triple.

Lemma 1. Let (U, T, V) be a standard triplefor L(A.) and define


Using the definition of E(,t), it is easily verified that

Q ~T 1.
=[ UT'-1 (5)

Then Q is nonsingular and CL = QTQ - 1.


and this means that the first n columns of EOr 1 are simply equal to HI'
Since the first n rows of F(,t) are equal to Pl' if we equate the leading n x n PaOOF. Recall that PI = [In 0 ... 0] and C L is given by Eq. (14.1.3).
submatrices on the left and right of Eq. (3), we obtain Eq. (1). It is easily seen that
Three matrices (U, T, V) are said to be admissible for the n x n matrix

P~~L] I'N'
polynomial L(A) of degree I (as defined in Eq. (14.1.2 if they are of sizes
nx In, In x In, and In x n, respectively. Then two admissible triples for (6)
L(,t), say (0 h T1, VI) and (U 2, T2 , V2 ) , are said to be similar if there is a non-
singular matrix S such that
[Pl~tl =

U1 = U2 S, T1 = S-l1;S, VI = S-IV2 (4) Substitute for PI and CL from Eqs. (4) to obtain QS = I'll' Obviously, Q
, is nonsingular and S = Q-l. In particular, CL = QTQ-l.
Exercise 1. Show that similarity of triples is an equivalence relation on
the set of all admissible triples for L(A); '0 In fact, this lemma shows that for any standard triple (U, T, V), the primitive
triple (PI' CL , HI) can be recovered from Eqs. (4) on choosing S = Q-l,
Now, the triple (P b CL , HI) is admissible for L(,t), and any triple similar
where Q is given by Eq. (5).
to (Ph CL , H 1)is said to be a standardtriple for L(A.). Of course, with a trans-
forming matrix S = I, this means that (P I' CL , HI) is itself a standard triple.. Theorem 3. An admissible triple (U, T, V) for L(,t) is a standard triple if and
Note that, by Theorem 14.1.2, the second member of a 'standard triple for only if the threefollowing conditions are satisfied:
L(.:I.) is always a linearization of L(.:I.). Now Theorem 1 is easily generalized.
(a) The matrix Q ofEq. (5) is nonsingular;
Theorem 2. If(U, T, V) is a standard triplefor L(A) and A (J(L), then
(b) L,UT' + L'_lUT'-l + ... + L1UT + LoU = O.
(4)-1 = U(M - T)-lV. (c) V = Q-IR bwhereQandR l aregiven by Eqs. (5) and (2), respectively.

PROOF. We first check that conditions (a). (b), and (c) are satisfied by the This theorem makes it clear that a standard triple is completely determined
standard triple (PI' CL R I ) . Condition (a) is proved in the proof of Lemma by its first two members. It is frequently useful to work in terms of such a
1. The relation (6) also gives pair and so they are given a name: a standard pair for L(A) is the first two
members of any standard triple for L(A). Thus (U, T) is a standard pair for
PtC!. =: (PICi-l)CL == [0 0 I]CL =: [-0 ... -,-t]. L(A) if and only if conditions (a) and (b) ofthe theorem are satisfied.
Consequently,
Exercise 2. Show that, if (U, T) is a standard pair for L(A), then (U, T)
(7) is also a standard pair for any matrix polynomial of the form AL(A), where
But also, detA:;l:O.

~~L]
Exerdse 3. Let C2 be the second companion matrix for L(A), as defined in
'f l:;JPtC{ == [L o
~o
t; '" L,-l] [
:
== [L o t.; ... L,-t]. Exercise 14.1.6. Show that the admissible triple

P1Ci- l
So, a.dding the .last two relations. we get B..
o LJP I C{ =: 0, which is just the
relation (b) With (U, T. V) replaced by (Ph CL , R t ) . In this case 'l =: I,
so condition (c) is trivially satisfied.
Now let (U, T, V) be any standard triple and let
forms a standard triple for L(A).
U =: PIS, T == S-lCLS, V=: S-tR t.
Then Exerdse 4. Let (U, T, V) be a standard triple for L(A), and let r be a con-
tour in the complex plane with o(L) inside r. If I(A) is a function that is
analytic inside r and continuous on r, show that
== UT PtCL'jO._
U ] _ [Pt
'l . - . S-S, 2~i ff(A)L(A)-1 dA == Uf(T)V.
UT~-l Pt~i-l
[

using (6) so that (a) is satisfied. Then


Exercise S. If U, T) is a standard pair for L(A), show that in the termin-
ology of Section 5.13, (U, T) is an observable pair of index I. (This is also a


J=O
LJUTi == (f
i-O
LJPtCi)S == 0,
sufficient condition for (U, T) of sizes n x In and In x In, respectively,
to be a standard pair for some monic matrix polynomials of degree I; see
Theorem 1). 0
which is condition (b). Finally, QV == S(S-l R l ) == R h which is equivalent
to (c). . There is a useful converse statement for Theorem 2.
Conversely. suppose that we are given an admissible triple for L(A) that Theorem 4. Let (U, T, V) be an admissible triple for L(A) and assume that
satisfies conditions (a). (b), and (c). We must show that (U. T, V) is similar
to (Ph CL, R l ) First, from the definition (5) or'l,h is obvious that U == PtQ L(A)-l == U(U - T)-lV. (8)
and, using condition (a), Q is nonsingular. Then(U, T, V) is a standard triplefor L(A).
Then use condition (b) to verify that 'IT =: CLQ by multiplying out the
partitioned matrices, Thus, T =: Q-ICLQ. PROOF. For IAI sufficiently large, L(A.)- 1 has a Laurent expansion ofthe form
Combine these results with condition (c) to get . L(A)-l == A-'L,-t + r'-lA l + A-'-2A 2 + ... , (9)
U == PtQ, T == Q-ICL'l, V=: Q-tR l ,
for some matrices AI. A 2 , Now let r be a circle in the complex plane
and so (U, T, V) is similar to (Ph CL , R 1 ) and hence is a standard triple. with centre at the origin and having a(L) and the eigenvalues of T in its
interior. Then integrating Eq, (9) term by term and using the theorem of It has now been shown that U and T satisfy conditions (a) and (b) of
residues,
Theorem 3. But condition (c) is also contained in Eqs. (11). Hence (U. T. V)
is a standard triple.
~ f AlL(A)-l dA =
2m J
{OL, -I
if j = O. 1,... 1- 2;
if j=l-l. Coronary 1. If(U. T. V) is a standard triple for L(l). then (VT. TT. U'f) is a
But also (see Theorem 9.9.2), standard triple for LT(A.) A L~=o lJLT, and (V*. T*. U) is a standard triple
for L(A.) ~ B=o AJLt.
~f.
2m L
Ai(AI - T)-1 dA. = v. j = 0.1.2.... (10)
PROOF. Apply Theorem 2 to obtain Eq. (8). Then take transposes and apply

It therefore follows from Eq. (8) (using to denote an n x n matrix of no Theorem 4 to get the first statement. Take conjugate transposes to get the
immediate interest) that . second.

Exercise 6. Let (U. T. V) be a standard triple for L(l) and show that
.. 0 0

~L( ~
,
A'-' ] 0
: L(lr 1d1 = UTiV = {~i"1 if j = O. I, .... 1 - 2;
if j=l-l.
(12)
0 L I-1
1'-1 12l-:Z
Hint. Examine Eqs. (11). 0
L,l *
UTV '" The properties in the preceding example can be seen as special cases of the
following useful result for standard triples.

Theorem 5. If (U. T, V) is a standard triple for L(A), S is the symmetrizer


for L(A.) as defined in Eq. (14.1.7), and matrices Q. R are defined by

= [ ~T J
UT'-l
[V TV .. T'-lV). (11) Q=
UT
U . ],
[UTt-1
R = [V TV ... T '- 1VJ. (13)

Both matrices in the last factorization are In x In and must be nonsingular, then
since the matrix above of triangular form is clearly nonsingular.
Using (10) once more we also have. for j = O. 1... ,I - 1.
(14)
1.
o =; 2 r
mJ r
1JL(1)L(1)-1 d1
1
= 2m .Sr 1iL(A)U(1I - T)- l V dA PROOF. First use Exercise 14.1.6 to write C L = Si1C:zSL (where C:z is the
second companion matrix for L(l so that. for j = O. 1. 2...
= (L, UT' + ... + L 1 UT + L o U)TJV. CiR 1 = Si 1C!(SLR 1) = SilC~PI
Hence,
Now it is easily verified that
(L,UT'+"+L 1UT+LoU)[V TV.:. T'-lV]=O.
[R 1 CLR1 ... q- 1 Rtl = Si 1[PI C:zPI ... Q- 1PD = Si 1.
and since [V TV '" T 1- 1VJ
is nonsingular, D-o LJUTJ = O.
(15)

For any triple $(U, T, V)$ there is a nonsingular $S$ such that $U = P_1S$,
$T = S^{-1}C_LS$, and $V = S^{-1}R_1$. Thus, using Eq. (6),
$$Q = \begin{bmatrix} U \\ UT \\ \vdots \\ UT^{l-1} \end{bmatrix}
= \begin{bmatrix} P_1 \\ P_1C_L \\ \vdots \\ P_1C_L^{l-1} \end{bmatrix}S = S,$$
and, using Eq. (15),
$$R = S^{-1}\begin{bmatrix} R_1 & C_LR_1 & \cdots & C_L^{l-1}R_1 \end{bmatrix} = S^{-1}S_L^{-1}.$$
Thus, $RS_LQ = (S^{-1}S_L^{-1})S_LS = I$, as required.

14.3 The Structure of Jordan Triples

By definition, all standard triples for a matrix polynomial $L(\lambda)$ with non-
singular leading coefficient are similar to one another in the sense of Eqs.
(14.2.4). Thus, there are always standard triples for which the second term,
the linearization of $L(\lambda)$, is in Jordan normal form, $J$. In this case, a standard
triple $(X, J, Y)$ is called a Jordan triple for $L(\lambda)$. Similarly, the matrices
$(X, J)$ from a Jordan triple are called a Jordan pair for $L(\lambda)$. We will now
show that complete spectral information is explicitly given by a Jordan triple.

Let us begin by making a simplifying assumption that will subsequently
be dropped. Suppose that $L(\lambda)$ is an $n \times n$ matrix polynomial of degree $l$
with $\det L_l \neq 0$ and with all linear elementary divisors. This means that any
Jordan matrix $J$ that is a linearization for $L(\lambda)$ is diagonal. This is certainly
the case if all latent roots of $L(\lambda)$ are distinct. Write $J = \mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_{ln}]$.

Let $(X, J)$ be a Jordan pair for $L(\lambda)$ and let $x_j \in \mathbb{C}^n$ be the $j$th column of
$X$, for $j = 1, 2, \ldots, ln$. Since a Jordan pair is also a standard pair, it follows
from Theorem 14.2.3 that the $ln \times ln$ matrix
$$Q = \begin{bmatrix} X \\ XJ \\ \vdots \\ XJ^{l-1} \end{bmatrix} \qquad (1)$$
is nonsingular. Also,
$$L_lXJ^l + \cdots + L_1XJ + L_0X = 0. \qquad (2)$$

First observe that for each $j$, the vector $x_j \neq 0$, otherwise $Q$ has a complete
column of zeros, contradicting its nonsingularity. Then note that each term
in Eq. (2) is an $n \times ln$ matrix; pick out the $j$th column of each. Since $J$ is
assumed to be diagonal, it is found that for $j = 1, 2, \ldots, ln$,
$$L_l\lambda_j^lx_j + \cdots + L_1\lambda_jx_j + L_0x_j = 0.$$
In other words, $L(\lambda_j)x_j = 0$ and $x_j \neq 0$; or $x_j \in \mathrm{Ker}\,L(\lambda_j)$.

Now, the nonzero vectors in $\mathrm{Ker}\,L(\lambda_j)$ are defined to be the (right) latent
vectors of $L(\lambda)$ corresponding to the latent root $\lambda_j$. Thus, when $(X, J)$ is a
Jordan pair and $J$ is diagonal, every column of $X$ is a latent vector of $L(\lambda)$.

The general situation, in which $J$ may have blocks of any size between $1$
and $ln$, is more complicated, but it can be described using the same kind of
procedure. First it is necessary to generalize the idea of a "latent vector" of
a matrix polynomial in such a way that it will include the notion of Jordan
chains introduced in Section 6.3. Let $L^{(r)}(\lambda)$ be the matrix polynomial ob-
tained by differentiating $L(\lambda)$ $r$ times with respect to $\lambda$. Thus, when $L(\lambda)$ has
degree $l$, $L^{(r)}(\lambda) = 0$ for $r > l$. The set of vectors $x_0, x_1, \ldots, x_k$, with $x_0 \neq 0$,
is a Jordan chain of length $k + 1$ for $L(\lambda)$ corresponding to the latent root
$\lambda_0$ if the following $k + 1$ relations hold:
$$L(\lambda_0)x_0 = 0; \quad
L(\lambda_0)x_1 + \tfrac{1}{1!}L^{(1)}(\lambda_0)x_0 = 0; \quad \ldots; \quad
L(\lambda_0)x_k + \tfrac{1}{1!}L^{(1)}(\lambda_0)x_{k-1} + \cdots + \tfrac{1}{k!}L^{(k)}(\lambda_0)x_0 = 0. \qquad (3)$$

Observe that $x_0$, the leading vector of the chain, is a latent vector of
$L(\lambda)$ associated with $\lambda_0$, and the successive vectors of the chain each satisfy
an inhomogeneous equation with the singular coefficient matrix $L(\lambda_0)$.

Exercise 1. Show that if $L(\lambda) = \lambda I - T$, then this definition of a Jordan
chain is consistent with that of Eqs. (6.3.3).  □

Now we will show that if $(X, J)$ is a Jordan pair of the $n \times n$ matrix poly-
nomial $L(\lambda)$, then the columns of $X$ are made up of Jordan chains for $L(\lambda)$.
Let $J = \mathrm{diag}[J_1, \ldots, J_s]$, where $J_j$ is a Jordan block of size $n_j$, $j = 1, 2, \ldots, s$.
Form the partition $X = [X_1 \;\; X_2 \;\; \cdots \;\; X_s]$, where $X_j$ is $n \times n_j$ for
$j = 1, 2, \ldots, s$, and observe that for $r = 0, 1, 2, \ldots$,
$$XJ^r = \begin{bmatrix} X_1J_1^r & X_2J_2^r & \cdots & X_sJ_s^r \end{bmatrix}.$$

Thus, Eq. (2) implies that, for $j = 1, 2, \ldots, s$,
$$L_lX_jJ_j^l + \cdots + L_1X_jJ_j + L_0X_j = 0. \qquad (4)$$
Now name the columns of $X_j$:
$$X_j = \begin{bmatrix} x_1^{(j)} & x_2^{(j)} & \cdots & x_{n_j}^{(j)} \end{bmatrix}.$$
With the convention $\lambda_j^{r-p} = 0$ if $r < p$, we can write
$$J_j^r = \begin{bmatrix}
\lambda_j^r & \binom{r}{1}\lambda_j^{r-1} & \cdots & \binom{r}{n_j-1}\lambda_j^{r-n_j+1} \\
0 & \lambda_j^r & \ddots & \vdots \\
\vdots & & \ddots & \binom{r}{1}\lambda_j^{r-1} \\
0 & \cdots & 0 & \lambda_j^r
\end{bmatrix}.$$
Using the expressions in (4) and examining the columns in the order
$1, 2, \ldots, n_j$, a chain of the form (3) is obtained with $\lambda_0$ replaced by $\lambda_j$, and
so on. Thus, for each $j$, the columns of $X_j$ form a Jordan chain of length $n_j$
corresponding to the latent root $\lambda_j$.

The complete set of Jordan chains generated by $X_1, X_2, \ldots, X_s$ is said
to be a canonical set of Jordan chains for $L(\lambda)$. The construction of a canonical
set from a given polynomial $L(\lambda)$ is easy when $J$ is a diagonal matrix: simply
choose any basis of latent vectors in $\mathrm{Ker}\,L(\lambda_j)$ for each distinct $\lambda_j$. The con-
struction when $J$ has blocks of general size is more delicate and is left for
more intensive studies of the theory of matrix polynomials. However, an
example is analyzed in the first exercise below, which also serves to illustrate
the linear dependencies that can occur among the latent vectors and Jordan chains.

Exercise 2. Consider the $2 \times 2$ monic matrix polynomial of degree 2,
$$L(\lambda) = \lambda^2\begin{bmatrix}1&0\\0&1\end{bmatrix}
+ \lambda\begin{bmatrix}0&0\\1&-1\end{bmatrix}
+ \begin{bmatrix}0&0\\1&0\end{bmatrix}
= \begin{bmatrix}\lambda^2 & 0\\ \lambda+1 & \lambda(\lambda-1)\end{bmatrix},$$
and note that $\det L(\lambda) = \lambda^3(\lambda - 1)$. Thus, $L(\lambda)$ has two distinct latent roots:
$\lambda_1 = 0$ and $\lambda_2 = 1$. Since
$$L(0) = \begin{bmatrix}0&0\\1&0\end{bmatrix} \quad\text{and}\quad L(1) = \begin{bmatrix}1&0\\2&0\end{bmatrix},$$
any vector of the form $[0 \;\; \alpha]^T$ ($\alpha \neq 0$) can be used as a latent vector for
$L(\lambda)$ corresponding to either of the latent roots.

Now observe that the Jordan chain for $L(\lambda)$ corresponding to the latent
root $\lambda_2 = 1$ has only one member, that is, the chain has length 1. Indeed,
if $x_0 = [0 \;\; \alpha]^T$ and $\alpha \neq 0$, the equation for $x_1$ is
$$L(1)x_1 + L^{(1)}(1)x_0 = \begin{bmatrix}1&0\\2&0\end{bmatrix}x_1
+ \begin{bmatrix}2&0\\1&1\end{bmatrix}\begin{bmatrix}0\\\alpha\end{bmatrix}
= \begin{bmatrix}0\\0\end{bmatrix},$$
which has no solution.

The corresponding equation at the latent root $\lambda_1 = 0$ is
$$L(0)x_1 + L^{(1)}(0)x_0 = \begin{bmatrix}0&0\\1&0\end{bmatrix}x_1
+ \begin{bmatrix}0&0\\1&-1\end{bmatrix}\begin{bmatrix}0\\\alpha\end{bmatrix}
= \begin{bmatrix}0\\0\end{bmatrix},$$
which has the general solution $x_1 = [\alpha \;\; \beta]^T$, where $\beta$ is an arbitrary complex number.

A third member of the chain, $x_2$, is obtained from the equation
$$L(0)x_2 + L^{(1)}(0)x_1 + \tfrac12L^{(2)}(0)x_0
= \begin{bmatrix}0&0\\1&0\end{bmatrix}x_2
+ \begin{bmatrix}0&0\\1&-1\end{bmatrix}\begin{bmatrix}\alpha\\\beta\end{bmatrix}
+ \begin{bmatrix}1&0\\0&1\end{bmatrix}\begin{bmatrix}0\\\alpha\end{bmatrix}
= \begin{bmatrix}0&0\\1&0\end{bmatrix}x_2 + \begin{bmatrix}0\\2\alpha-\beta\end{bmatrix}
= \begin{bmatrix}0\\0\end{bmatrix},$$
which gives $x_2 = [-2\alpha + \beta \;\; \gamma]^T$ for arbitrary $\gamma$.

For a fourth member of the chain, we obtain
$$L(0)x_3 + L^{(1)}(0)x_2 + \tfrac12L^{(2)}(0)x_1
= \begin{bmatrix}0&0\\1&0\end{bmatrix}x_3
+ \begin{bmatrix}0&0\\1&-1\end{bmatrix}\begin{bmatrix}-2\alpha+\beta\\\gamma\end{bmatrix}
+ \begin{bmatrix}1&0\\0&1\end{bmatrix}\begin{bmatrix}\alpha\\\beta\end{bmatrix}
= \begin{bmatrix}0&0\\1&0\end{bmatrix}x_3 + \begin{bmatrix}\alpha\\-2\alpha+2\beta-\gamma\end{bmatrix}
= \begin{bmatrix}0\\0\end{bmatrix},$$
which has no solution since $\alpha \neq 0$, whatever value of $\gamma$ may be chosen.

Choosing $\alpha = 1$, $\beta = \gamma = 0$, we have a chain
$$x_0 = \begin{bmatrix}0\\1\end{bmatrix}, \qquad x_1 = \begin{bmatrix}1\\0\end{bmatrix}, \qquad x_2 = \begin{bmatrix}-2\\0\end{bmatrix},$$
corresponding to the latent root $\lambda_1 = 0$.

It is easily verified that the matrices
$$X = \begin{bmatrix}0&0&1&-2\\1&1&0&0\end{bmatrix}, \qquad
J = \begin{bmatrix}1&0&0&0\\0&0&1&0\\0&0&0&1\\0&0&0&0\end{bmatrix}$$
form a Jordan pair for $L(\lambda)$.
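The Jordan pair just exhibited is easy to confirm by direct computation. The following sketch (NumPy; the block ordering of $J$ is the one used above, with the $1\times1$ block for $\lambda_2 = 1$ first) checks Eq. (2) and the nonsingularity of the matrix in Eq. (1):

```python
import numpy as np

# Coefficients of L(lambda) = lambda^2 I + lambda L1 + L0 from Exercise 2
L2 = np.eye(2)
L1 = np.array([[0., 0.], [1., -1.]])
L0 = np.array([[0., 0.], [1., 0.]])

# Candidate Jordan pair: a 1x1 block for lambda = 1, a 3x3 block for lambda = 0
J = np.zeros((4, 4))
J[0, 0] = 1.0            # block for the latent root 1
J[1, 2] = J[2, 3] = 1.0  # 3x3 nilpotent Jordan block for the latent root 0
X = np.array([[0., 0., 1., -2.],
              [1., 1., 0., 0.]])

# A Jordan pair must satisfy L2 X J^2 + L1 X J + L0 X = 0 (Eq. (2)) ...
residual = L2 @ X @ J @ J + L1 @ X @ J + L0 @ X
# ... and Q = [X; XJ] must be nonsingular (Eq. (1) with l = 2)
Q = np.vstack([X, X @ J])

print(np.allclose(residual, 0))        # True
print(abs(np.linalg.det(Q)) > 1e-12)   # True
```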

Bxerciae J. Find a Jordan pair for the matrix polynomial discussion of Jordan pairs can he used to describe Y.Thus, let Ybe partitioned
in the form
.. [A2 - :u- 2 A+ 2 ]
L(1) .. ), + 2 12 - 2l - 2 .
Hint. First confirm that u(L) == {-I, 0, 1,4}.
Exercise 4. As in Exercise 14.1.5, let

Verify that
L(l) == e; ~21]. where 1)is n} x n for} == 1,2, ... , s. Then yTp == [YIp. ... Y:P J, and
it follows that. for each}, the transposed rows of 1) taken in reverse order
form a Jordan chain of LT(l) oflength nJ corresponding to A.J.
These Jordan chains are called left Jordan chains of L(A); and left latent
vectors of L()') are the nonzero vectors in Ker LT ( ), } ) . } -1,2, ... s. Thus.

x == o r 100]
. l1 0 1 0 '
the leading members of left Jordan chains are left latent vectors.
Exercise 6. (a) Let A EC,P' n and assume A has only linear elementary
divisors. Use Exercises 4.14.3 and 4.14.4 to show that right and left eigen-
vectors of A can be defined in such a way that the resolvent of A has the
is a Jordan pair for L(A).
spectral representation
Eurcise 5. Find a Jordan pair for each of the scalar polynomials LI(l) == n T
(A - -to)' and L 2(1) == (A. - 1IY(A - A.2)9, where ..1. 1 :I: oi 2 and p, q are posi- (M - A)-I == L xJY) .
tiveintegers. 0 J=l A - A.j
Now suppose that (X, J, Y) is a Jordan triple for L(A.). The roles of X and (b) Let L(A.) be an n x n matrix polynomial of degree I with det L, :I: 0
J are understood, but what about the matrix Y? Its interpretation in terms and only linear elementary divisors. Show that right and left latent vectors
of spectral data can he obtained from Corollary 14.2.1, which says that for L(l) can be defined in such a way that
(yT,J'f, XC)js a standard triple for LT(A.). Note that this is not a Jordan 'n
=1: T
x~J
triple, becau~~ t~e linearization JT is the transpose ofa Jordan form and so L(l)-I .
is not, in general. in Jordan form itself. To transform it to Jordan form we )=IA-A.J
introduce the matrix Exercise 7. Suppose that A.)E a(L) has just one associated Jordan block
P == diag[P 1, P2 , , p.J, J), and the size of J) is v. Show that the singular part of the Laurent expansion
for L -I(A) about AJ can be written in the form
where p) is the n) x PI) rotation matrix:
LL
y y-I< T

[! :;:.:~. ~].
Xy_I<_rY,
1<= 1,-0 (A. - A.l'
r, = for some right and left Jordan chains {XI''" xv} and {)lI>"" )ly}, re-
1 0 ... 0 spectively (see Exercise 9.5.2).

(Recall that J == diag[J 10 J J, where JJ has size nJ.) Note that p 2 == I


Hint. First observe that
(A - A.Jr l
Y
(so that p-l == P) and p. == P. It is easily verified that for each}, PjJJPJ - J) (A - AJ) - 2 (A - AJr
and hence that pJ'fp == J. Now the similarity transformation o (A. - A.)-I
yTp, J = pJTP, PXT
(A. - A.J)-I (A. - AJ)-2
of (yT J'f, X T ) yields a standard triple for LT(A) in which the middle term is
in Jordan form. ThuS.(yTP, J, PXT) is a Jordan triplefor LT(A), and our o 0 (A - A)-l

14.4 Applications to Differential Equations In other words.

Consider the constant-coefficient-matrix differential equation of order I

r"(t)
given by so that. using Eq, (3),
+ L,-Ix"-(t) + ... + L lx(1)(t) + Lox(t) = f(t),
L,r')(t)
where L o L Io L,e C"XR. det L, =F O. and the indiceson x denote deriva-
(1)
(:t - ..1.0 1 1 = O. (4)

tiveswith respect to the independent variable t. The vectorfunction f(t) has Let us represent the differential operator L(dldt) in terms of (dldt) - ..1.0 1.
valuesin CR, is piecewise continuous. and supposed given; x(t) is to be found. To do this, first write the Taylor expansion for L(A) about ,1.0 :
The equation is inhomogeneous when f(t) =F O. and the equation .
L,x'l)(t) + ... + L lx(1)(t) + Lox(t) = 0 (2) L(A) = L(Ao) + 11, L(1)(Ao)(A - ..1.0 ) + ... + h L(I)(Ao)(A - ..1.0)'. I
is said to be homogeneous and has already been introduced in Section 7.10.
Recall the discussion in that section of the space /l(CR) of all continuous
It follows that I
I',

vector-valued functions x(t) defined for real t with values in CR. We now d ) = L(Ao) + 111 L(1)(Ao) (d
L ( dt dt - ..1.0 1) + ... + TI1V,)(..1.0 ) (ddt - Aol )'. \
discussthe relationship betweenJordan chainsof L(A) and primitivesolutions IL
from ~(C") of the homogeneous Eq. (2). . (5)
Observe first that if . 1.0 is a latent root of (..1.) with latent vector xo. so
that L(Ao)xo = 0, then Eq. (2) has a solution xoe AoI, and conversely. This is
because

( !!..)
dt J(X0 eAo') = ,V0 x 0 e Ao',
Now Eq. (4) impliesthat

d )1
( dt - Aol . "l(t) = 0, j=2.3.....
I
so that Thus I
I
L(:t) (xoe
Ao
') = Jo LJ(:tY(xoe
Ao
') = L(Ao)xo = O. L(:t )"l(t) = L(Ao)"t(t) + 1\ L(1)(Ao)"o(t) -I
I
Now suppose that xo. x t are two leading members of a Jordan chain
of L(A) associated with the latent root ..1.0 ' We claim that = (L(Ao)xo)te40' + (L(Ao)XI + 1\ L(1)(Ao)xo)e Ao
' I ;~

,
"o(t) = xoe Ao' and "l(t) = (txo + xl)eAo' = O. j

are linearly independent solutions of Eq. (2). We have already seen that the . using the first two relations of Eqs, (14.3.3). which define a Jordan chain.
first function is a solution. Also. the first function obviously satisfies Thus fIlet) is also a solution of Eq. (2).

(:t - Aol)"o(~ =0. (3)


To see that "o(t) and "l(t) are linearly independent functions of t, observe
the fact that lXIIo(t) + fl"l (t) =0 for all t implies
I
i!
V'

t(lXXo) + (lXXI + flxo) = o.


Now observe that. for the second function.
. Since this is true for all t we must have IX = 0, and the equation reduces.to
flxo = O. Therefore fl = 0 since Xo =F O. Hence "o(t) and "l(t) are linearly
(:t - ..1.0 1) (tx o + xl)e Aor
= xoe Aor

independent.
Now the fun~tion eAa' can be cancelled from this equation. Equating the
The argument we have just completed cries out for generalization to a
vect~r coefficients of powers of t to zero in. the remaining equation, th!=
sequence of functions IIo(t). "t(t)... "I:-t(t). where %o.Xt. . XI:-t is a relations (14.3.3) are obtained and since we assume %0 :F 0 we see that Ao is
Jordan chain and a latent root of L(..t)and XO, Xl' ... Xk- 1 is a corresponding Jordan chain.
lIz(t) = (;, tZxo + tXt + X2 )e""" Exercise 1. Complete the proof of theorem 1.
Exercise 2. Use Theorem 1 and the results of Exercise 14.3.2 to find four
and so on. It is left as an exercise to complete this argument and to prove linearly independent solutions of the system of differential equations
the next result. dZxl
Theorem 1. Let Xo. Xt %t-t be a Jordan chainfor L(A) at Ao Then
dtZ == 0,
the k jUnctions d Zx 2 dx, dx z
Uo(t) == %oe Ao' . dtZ + tit - (it + X t = 0.
Ut(t) == (txo + Xt)eA'. SoLUTION Written in the form of Eq, (2). where x == [Xl xzY, we have

'C-lt) )
[~ ~]X<Z)(t) + [~ _ ~]x(1)(t) + [~ ~]x(t) == O. (7)
lIt-l(t) == ~o]iXt-t-) e"'" (6)
So the associated matrix polynomial is L(..t) of Exercise 14.3.2.
are linearly independent solutionsof Eq. (2). . There is a latent root ,11 = 1 with latent vector [0 I]T. hence one solution
1S
There is also a converse statement.

Theorem 1. Let xo, Xl' ... XI:-t e e" with Xo:F O. 1/ the vector-valued "o(t) = [~}"
function Then there is the latent root ..t2 == 0 with Jordan chain (0 1]7. (1 O)T.
f-t ) [ - 2 O]T. and there are three corresponding linearly independent solutions:
Ut-t(t) =. ( (k _ I)! Xo + .,. + tXt-Z + Xt-l eJ.o'

is a solution0/ Eq. (2), then .to is a ,"tent root 0/ L(.il) and xo, Xl"'" Xt-l is
IIo(t) = [0]l '
t " l (t) = [1]
t' uz(t) == [-2"+ t]
ttZ .
a Jordan chain/or L(.il) corresponding to AO' It is easily verified that {vo(t), uo(t). "t(t),lIz(t)} is a linearly independent
PROOF. Let functions "o(t), ... , Ut- 2(t) be defined as in Eqs, (6) and, for
set of vector-valued functions in rt(C Z) . Furthermore. it follows from
Theorem 7.10.1 that they form a basis for the solution space of Eq. (7). 0
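The four solutions listed in this exercise can also be checked directly against the scalar form of the system; a minimal symbolic sketch (SymPy, not part of the original solution) is:

```python
import sympy as sp

t = sp.symbols('t')
# Candidate solutions of  x1'' = 0,  x2'' + x1' - x2' + x1 = 0
candidates = [
    (sp.Integer(0), sp.exp(t)),      # v0(t), from the latent root 1
    (sp.Integer(0), sp.Integer(1)),  # u0(t), from the latent root 0
    (sp.Integer(1), t),              # u1(t)
    (t - 2, t**2 / 2),               # u2(t)
]
for x1, x2 in candidates:
    eq1 = sp.simplify(sp.diff(x1, t, 2))
    eq2 = sp.simplify(sp.diff(x2, t, 2) + sp.diff(x1, t) - sp.diff(x2, t) + x1)
    print(eq1 == 0 and eq2 == 0)     # True for all four candidates
```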
convenience, let ",(t) be identically zero if i == -1, -2, .. , . It is found (see
Eqs. (3) and (4 that

d )1
( dt - Aol IIt-t(t) == IIt-i-l(t), 14.5 General Solutions of Differential Equations

for j == 0, 1, 2, .... Using the expansion (5),


Results 'obtained in the preceding section are natural generalizations of
Proposition 9.10.1. Indeed. this line of investigation can be pursued further
0== L(:t)"k-l(t) to generalize Theorem 9.10.1 and show that the In-dimensional solution
space ofEq. (14.4.2) (see Theorem 7.10.1) can be spanned by linearly mdepen-'
== L(Ao)uk-l(t) + L(l)(J'0)ut-2(t) + " .+ ~ L(I)(A.o)uI:-I-t(t}. dent vector functions defined (as in Theorem 14.4.1) in terms of all latent

roots and a canonical set of Jordan chains. The completion of this argument We have now proved the following result.
is indicated in Exercise 2 oftbis section. But first we turn our attention to the
Lemma 1. Everysolution 01Eq. (14.4.1) hastheform (5) for some % E ~.
question of general solutions of initial-value problems in the spirit of Theo-
rems 9.10.4 and 9.10.5. Our main objective is to show how a general solution Note that this lemma achieves the preliminary objective of expressing all
of Bq, (14.4.1) can be expressed in terms of any standard triple for L(A). solutions of;:Eq. (14.4.1) in terms of the standard triple (Ph CL , R I ) . Also,
Consider first the idea of linearization of Eq. (14.4.1), or of Bq, (14.4.2), the general s~lution ofEq. (14.4.2) is obtained from Eq. (4) on settingf(t) == O.
as used in the theory of differential equations. Functions xo(t), ... , X,-let) The step f~om the lemma to our main result is a small one.
are defined in terms of a solution function x(t) and its derivatives by
Theorem 1. Let (U, T, V) be a standard triplefor L(A). Then every solution
xo(t) = x(t), XI(t) = xU)(t), .. , X,-let) = x(r-l)(t). (1) ofEq. (14.4.1) has theform
Then Eq, (14.4.1) takes the form
dX,-I(t)
L, dt + L,-IX'-l(t) + ... + Loxo(t) = I(t). (2)
x(t) == UeT1% + U E eT(I-')VI(r) dr, (6)

for some % E C'".


If we define
PROOF. Recall that, from the definition of a standard triple (see the proof of

get) = [ !],
Lil/(t)
(3)
Theorem 14.~.2), there is a nonsingular S E C'"" '" such that
PI == US, CL == S-ITS, R I = S-lv.
Also, using Theorem 9.4.2, eCL == S-leTS for any IX e IR. Consequently,
P trL == UeTflS,
1
P leCLf1.R ::;; UeTIIV,
1
and let CL be the now-familiar companion matrix for L(A), Egs. (1) and (2)
can be condensed to the form Substitution of these expressions into Eq. (5) with IX == t concludes the
proof.
df(t)
---;jt= CLf(t) + get). (4) In practicethere are two important candidates for the triple (U, T. V) of
the theorem. The first is the triple (PI' CL R l ) , as used in the lemma that
In fact, it is easily verified that Bqs, (1), (2), and (3) determine a one-to-one expresses the igeneral solution in terms of the coefficients of the differential
correspondence between the solutions .r(t) o/Eq. (14.4.1) and f(t) o/Eq. (4). equation. ThCl second is a Jordan triple (X, J, Y), in which case the solution
Thus, the problem is reduced from order I to order 1 in the derivatives, but is expressed in terms of the spectral properties of L(A). The latter choice of
at the expense of increasing the size of the vector variable from I to In and, triple is; of course, intimately connected to the ideas of the preceding section.
of course, at the expense of a lot ofredundancies in the matrix CL
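For the homogeneous case of a monic polynomial, the recipe of Theorem 1 with the companion triple $(P_1, C_L, R_1)$ reduces to $x(t) = P_1e^{C_Lt}z$, where $z$ collects the initial values of $x$ and its derivatives. A small numerical sketch (NumPy/SciPy, reusing the $2 \times 2$ polynomial of Exercise 14.3.2; the particular initial values are our illustrative choice) follows:

```python
import numpy as np
from scipy.linalg import expm

L1 = np.array([[0., 0.], [1., -1.]])
L0 = np.array([[0., 0.], [1., 0.]])

# Companion linearization C_L of the monic polynomial lambda^2 I + lambda L1 + L0
CL = np.block([[np.zeros((2, 2)), np.eye(2)],
               [-L0,              -L1      ]])
P1 = np.hstack([np.eye(2), np.zeros((2, 2))])   # U = P1 extracts the first block

# Initial values chosen to reproduce the known solution x(t) = [t - 2, t^2/2]
z = np.array([-2., 0., 1., 0.])                  # [x(0); x'(0)]

for t in [0.0, 0.5, 1.3]:
    x = P1 @ expm(CL * t) @ z
    print(np.allclose(x, [t - 2.0, t**2 / 2]))   # True at each sample time
```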
Exercise 1. Suppose that, in Eq. (14.4.1),
However, we take advantage of Theorem 9.10.5 and observe that every
solution of Eq. (4) has the form I(t) == {fo for t ~ 0,

x(t) = eCL1% + L eCL(I-')g(r) dr,


o for t< O.
Let L o be nonsingular and dis A < 0 for every AE u(L). Show that the solu-
tion x(t) satis6es
for some % E Crll. Now recall the definitions (14.2.2) of matrices PI and R 1
in the standard triple (PI' Cb R I) for L(A). Noting that Plx(t) == .r(t), it is
found that, multiplying the last equation on the left by PI' 1"'00

.r(t) ~ P 1eCL1% + PI E eCL('-')RI!(r) dr. (5)


E.rereise 2. tet (X, J) be a Jordan pair for the matrix polynomial L(A)
associated with Eq. (14.1.2). Show that every solution of that equation has

where L o L..... , L , e C"lC , det L,:F O. the sequence / = (/0'/""') of


lI
the form x(t) =: X tlr% for some %e C In Hence show that the solution space
has a basis of Infunctions ofthe form described in Theorem 14.4.1. 0 vectors in C" is given, and a sequence x = (Xo, Xl'" .), called a solution of
Eq. (I), is to be found.
Theorem 1 can be used to give an explicit representation for the solution The theory of difference equations is analogous to that of differential
of a classical initial-value problem. equations developed in the previous sections, and therefore some of the
proofs and technicalities are omitted from this presentation.
Theorem 2. Let (U, T, V) bea standard tripleforL(..t). Thenthere is aunique The following results are analogues of Theorems 14.4.1 and 14.4.2 for the
solution of the differential equation (14.4.1) satisfying initial conditions homogeneous difference equation
xU)(O) = XJ' j = 0, 1, ... , I - 1, (7) L,xJ+' + LI+IXj+l-l + '"+ L 1xj+l + Loxj =: 0, i = 0, 1, ... , (2)
for any given vectors xo, Xl' ... , X,-l e C". This solution is given by Eq. (6) associated with the matrix polynomial L(..t) = L};o ..tiL}, where det L , =F O.
with Once again, we recall the discussion and result of Section 7.10 concerning
the homogeneous equation. In particular, recall that 9'(C") denotes the
(8) linear space of infinite sequences of vectors from cn. Here, it is convenient
to define "binomial" coefficients e) A 0 for integers j, r with j < r,
where SL is the symmetnzerfor L(..t).
Theorem 1. If xo, Xl' ... ' X"-1 is a Jordan chain for L(..t) corresponding to
PROOF. We first verify that the function x(t) given by Bqa,(6) and (8) does. ..to, then the k sequences (u~), u~), ...)e 9'(C"). s = 0, 1, ... k - 1. with
indeed, satisfy the initial conditions (7). Differentiating Eq. (6), it is found that "~o) = ..tbXo

:~~~)]. ~T ]%
=: [ = Q,;. (9)
rJjl) ~ G) ..t{,-l XO+ ..t{,X 1,

[
r (O)
l - ll U~'- 1 ("-1) ~ ( j \ ..t j -"+1 x 0 + (k _j 2)\ ..tj-H2X + ... + (i)..tJ-1x
") k- 1) 0 0 1 1 0 "-2
in the notation ofEq. (14.213). But the result of Theorem 14.2.5 implies that
=
QRSL I, and so substitution of % from Eq. (8) into Eq. (9) shows that + ..tJX"_I' (3)
m
x(t) defined this way satisfies the initial conditions. for i = 0, 1... are linearly independent solutions ofEq. (2).
Finally, the uniqueness of % follows from Eq. (9) in view of the invertibility
ofQ. 1'beorem 2. Let xo, Xl . . . X"-l eCn, with Xo =F O. If the sequence (u~-ll,
u<f-l), ...) e 9'(Cn) , with

("-1)
")
"~1 (i)]j-.
= z: "'0 Xk-l-.'
.;0 S

is a solution of Sq. (2), thenA.o is a latentrootof L(..t) andXo. Xl' ... , X1-1 is a
14.6 Difference Equations
Jordan" chainfor L(A) corresponding to Ao
Exercise 1. Solve the system of difference equations for j = O. 1, ... :
Consider the constant-coefficient-matrix difference equation of order I
given by j+2 - 0,
X (1 ) -

X (2)
j+ 2
+ xU)
j+ 1
- x(1)
j+ 1 +. X }(1) -- 0
L,xj+! + L'-lXj+l-l + ... + L 1xj+l + Loxj = fj' j =: 0, 1, ... , (1)
SOLUTION. Write the given system in the form (2): PROOF. First observe that the general solution x(O) = (xW), x~O), ) of the
homogeneous equation (2) is given by
[~ ~}J+2 + [~ _?}~J+l + [~ ~]XJ == O. } == 0.1.....
x~O) = P lC{%, } == O. 1.2. .... (6)
where xJ == [X~l) X~2)]T for each}. Since the leading coefficient of the as- where % E e'n is arbitrary. This is easily verified by substitution and by using
sociated matrix polynomial L(A) is nonsingular (L 2 == 1). it follows from the Theoren,t7.1O.2.to prove that Eq. (6) gives all the solutions of Eq. (2).
corollary to Theorem 7.10.2 that the given system has four linearly indepen- Thus it remains to show that the sequence (0' (1) ...(1) ) M'th

dent solutions. To find four such solutions. recall the result of Exercise
14.3.2 and use Theorem 1: x~1) = P t
J-lL C{-k-1Rt/". 1 ''''2 , ... ,

j = 1,2, ... ,
,,~o) == [0 I]T. } == 0, 1, ....
"-0
is a particular solution ofEq. (1). This is left as an exercise. We only note that
.,W) == [0 I]T. ,,~o) = 0, } == 1. 2... , the proof relies, in particular. on the use of Theorem 14.2.3 and Exercise
14.2.6 applied to the companion triple (P t, CL R t ).
.,~l) == [1 O]T. ,,\1) = [0 I]T. .,~l) = O. } == 2, 3, ...
Using the similarity of the standard triples of L(A) and the assertion of
.,Ii) == [-2 O]T. .,\2) = [l O]T, "If) = [0. I]T. Lemma 1, we easily obtain the following analogue ofTheorem 14.5.1.
,,~2)==O, }=3,4, ....
Theorem 3. 1/ (U. T. V) is a standard triple/or L(A), then every solution of
The verification of the linear independence of the vectors 1.(0). ,,(0), ,,(1), ,,(2) in Eq. (2) has the form X o = U%. and/or} = 1.2...
.$P(C2 ) is left as an exercise.
Since the solution space has dimension 4, every solution has the form xJ = UTJ% +U L
J-l TJ-"-ty/",
01:1,,(0) + 01:2.,(0) + 01:3,,(1) + ~ ,,(2) ""0
where Z E c-,
for some 01:1. 01:2. 01:3.01:4 E C. 0
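The chain-generated solution $u^{(2)}$ found above can be confirmed by running the recursion itself; a short numerical check (NumPy, ours rather than part of the text):

```python
import numpy as np

L1 = np.array([[0., 0.], [1., -1.]])
L0 = np.array([[0., 0.], [1., 0.]])

def step(x_prev, x_cur):
    # Homogeneous recursion  x_{j+2} = -L1 x_{j+1} - L0 x_j  (leading coefficient I)
    return -L1 @ x_cur - L0 @ x_prev

# The solution u^(2): its first two terms come from the Jordan chain at 0
seq = [np.array([-2., 0.]), np.array([1., 0.])]
for _ in range(5):
    seq.append(step(seq[-2], seq[-1]))

print(np.allclose(seq[2], [0., 1.]))             # third term is the latent vector
print(all(np.allclose(v, 0) for v in seq[3:]))   # the sequence then terminates in zeros
```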
We conclude this section by giving an explicit solution of the initial-value
Consider now the formulation of a general solution of the inhomogen- problem for a system ofdifference equations (compare with Theorem 14.5.2).
eous difference equation (1). Observe first that the general solution x E .$P(en)
of (1) can be represented in the following way: Theorem 4. Let (U, T, Y) be a standard triple for L(il). There is a unique
solution 0/ Eq. (1) satiifying initial conditions
x = x(O) + x(1). (4)
where x(O) is a general solution of the homogeneous equation (2) and x(1) xJ = tl J j = 0.1, ... , I - 1, (7)
is a fixed particular solution of Eq. (1). Indeed. it is easily checked that the for any given vectors "0' "1' ... "'-1 E en. This solution is given explicitly
sequence in Eq. (4) satisfies Eq. (1) for every solution X(O) of Eq. (2) and. by the results 0/ Theorem 3 together with the choice
conversely. the difference x - x(1) is a solution of Eq. (2).
Now we take advantage of the idea of linearization to show how a general
solution of Eq. (1) can be expressed in terms of any standard triple of L(l).

Lemma 1. The general solution x =


(Xo, Xl,"') 0/ the nonhomogeneous
equation (1) is given by Xo = P l %, and/or j 1.2, ... , = where SL is the symmetrizerfor L(A). ~ I
J-l
xJ == PlC{% + PI L Ct"-lRd,, (5) The proof of this result is similar to that of Theorem 14.5.2.
"-0
where the matrices PI andR 1 are defined by Eq. (14.2.2) and % E c- is arbitrary. Exercise 2. Complete the proofs of Theorems 1, 2, and 4.
Exercile 3. Suppose that I J = I for j = 0, I, 2,, . in Eq. (I), and assume For any other standard triple (U, T, V), write
that all latent roots of L(A) are inside the unit circle. Show that the solution
of Eq. (1) satisfies
lim x J = (L, + L ,- 1 + ... + L O) - I / . 0 Then
J-tCIJ

PIc:.:
PI ] -1 = UT'
[U: ]-1
14.7 A Representation Theorem [ plci- I UT'-I'
and Eqs. (1) and (2) are established.
It has already been seen, in Theorem 14.2.2,that any standard triple for a To obtain Eq. (3), observe that Corollary 14.2.1 shows that (y T, TT, UT)
matrix polynomial L(A) (with det L, :f:: 0) can be used to give an explicit is a standard triple for LTO.), so applying the already established Eq. (2),
representation of the (rational) resolvent function L(A) -I. In this section
we look for a representation of L(A) itself, that is, of the coefficient matrices
Lo , Lit ... , L" in terms of a standard triple. We will show that, in particular, -[L~ LI ... LT-tJ = LTVT(TT)'
yT
vTrT
]-1 I
. ;!
the representations given here provide a generalization of the reduction of a
[
square matrix to Jordan normal form by similarity. y T(TT)' - l
Deorem 1. Let (U, T, V) be a standard triple for a matrix polynomial Taking transposes, Eq. (3) is obtained.
L(A) (as defined in Eq. (14.1.2. Then the coefficients 0/L(A)admitthe/ollow-
ing representations:
Observe that if L(A.) is monic, that is, if L, = I, then the I coefficients
(1) L o , .. , L'-l of L(A) are determined by any standard pair (U, n,
or by a
left standard pair (T, V). Note also that if the theorem is applied using a
. U
UT ]-1 Jordan triple (X, J, Y) in place of (U, 1; Y) the coefficients of L(A) are
.completely determined in terms of the spectral properties of L(A).
... L,-tl = -L,UT' : (2)
[ Consider, in particular, the case in which L(A) = 1A. - A. Thus, L(A)
UT'-I is monic and I = 1. In this case Eqs, (1) and (2), written in terms of a Jordan
and triple (X, J, Y), yield

~
] = (Xy)-l, A = XJY.
] = -[: TY ... T'-lyr IT'YL,. (3) Thus the left Jordan chains (or columns of Y) are given by Y = X-I, and
[ the second equation gives the expression of an,arbitrary A e e"lC" in terms of
L,_I its Jordan form and chains, namely, A = XJX-1.
PROOF. We first show that Eqs. (1) and (2) hold when (U, T, V) is the special
Exercise1. Let B, C e e" lC" and define the matrix polynomial L(A) =
standard triple (PI' CL , R 1 ) . First, Eq. (1) is obtained from Eq. (14.2.12).
Then Eqs. (14.2.6) and (14.2.7) give the special case of Eq. (2):
A2] + AB + C. Show that if (U, T) is a standard pair for L(A), then

[C,B] = -UT;{~Trl.
" ,
If (U, T, V) is a standard triple for L(A), show also that 1
,I. :1
B = 2
-UT y, C = -UT 3 y + B 2' 0 . j
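For a monic quadratic, the first formula of this exercise, $[C \;\; B] = -UT^2\bigl[\begin{smallmatrix} U \\ UT \end{smallmatrix}\bigr]^{-1}$, recovers the coefficients from any standard pair. The sketch below applies it to the Jordan pair of Exercise 14.3.2 (NumPy; illustrative only):

```python
import numpy as np

# Jordan pair (X, J) of L(lambda) = lambda^2 I + lambda B + C from Exercise 14.3.2
X = np.array([[0., 0., 1., -2.],
              [1., 1., 0., 0.]])
J = np.zeros((4, 4)); J[0, 0] = 1.0; J[1, 2] = J[2, 3] = 1.0

Q = np.vstack([X, X @ J])               # [U; UT] with (U, T) = (X, J)
CB = -X @ J @ J @ np.linalg.inv(Q)      # the exercise's formula: [C  B]
C, B = CB[:, :2], CB[:, 2:]

print(np.round(B, 10))   # [[0, 0], [1, -1]]
print(np.round(C, 10))   # [[0, 0], [1, 0]]
```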
14.8 Multiples and Divisors Theorem 1. Let L(A), L.(A), Lz(l) be monic matrix polynomials such that
L(.t) = Lz(,t)L. (A.), and let (U., T., Y1 ) , (Uz, 72, Vz) be standard triples
for L.(..t) and L 2(A.), respectively. Then the admissible triple
The idea of division of one matrix polynomial by another was introduced
in Section 7.2. Recall that a matrix polynomial L&1.) with invertible leading
coefficient is said to be a right divisor of a matrix polynomial L(,t) if L(,t) = U = rU I OJ, T= [~. v~zJ V= [~] (2)
L 2(,t)L 1('\) for some other matrix polynomial L 2 ('\) . The reason for requiring
that L 1(A) have invertible leading coefficient is to avoid the multiplicities of for L(A) is a standard triple for L(l).
divisors that would arise otherwise.For example, a factorization like
PROOF. WithT defined in (2), it is easily seen that

[~ ~] = [:1: ~] [c~m d~nl (ll _ T)-I = [(ll-. T.)- (U - T.rY.U 2(ll - T2)- I ]
o ui : T2)- 1 .
where a, b, c, dee and k, I, m, n are nonnegative integers, should be excluded,
not only because there are so many candidates, but mainly because such Then the definitions of U and V give
factorizations are not very informative. Here we go a little'further and con-
fine our attention to divisors that are monic, where the coefficient of the U(..tI - T)- l y = U 1 (ll - TI ) - I Y1 U Z ( l l - T2 ) - YZ
highest power of A is I. In fact, it will be convenient to consider only monic = L.(lrL 2(lr = (L z(A)L 1(l-1
divisors of monic polynomials.
It turns out that, from the point of view of spectral theory, multiples of =L(A)-I,
matrix polynomials are more easily analyzed than questions concerning
divisors. One reason for this concerns the question of existence of divisors. where Theorem 14.2.2 has been used. The result now follows from Theorem
Thus, it is not difficult to verify that the monic matrix polynomial 14.2.4.

(1) To see the significance of this result for the spectral properties of L(i),
L 1(,t), and L 2(..t), replace the standard triples for L 10I.) and L 2(..t) by Jordan
triples (X 10 J l' Yl) and (X 2' J 2' Y2 ) , respectively. Then L(A.) has a standard
has no monic divisor of degree 1 in .t A thorough analysis of the existence triple (not necessarily a Jordan triple)
question is beyond the scope of this brief introduction to questions concern-
ing spectral theory, multiples, and divisors.
Standard triples, and hence spectral properties (via Jordan triples), of (3)
products of monic matrix polynomials are relatively easy to describe if
triples of the factors are known. Observe first that the following relation Although the second matrix is not in Jordan form, it does give important
between monic matrix polynomials, L(A) = L 2('\)L1('\), implies some partial information about the spectrum of L(..1.) because of its triangular
obvious statements: form. Property (b) of Theorem 14.2.3 implies, for example, (see also Eq.
(a) 0'(L1 ) U 0'(L2 ) = O'(L); (14.3.4 that lJ=oLJX.J{ = 0, where L(l) = D=oAJLJ. This implies the
following corollary.
(b) Latent vectors of L 1(A) and L!(,\) are latent vectors of L(,t) and
LTO.), respectively. Corollary. 1. Under the hypotheses of Theorem I, O'(L.) c:: O'(L), and if
Some of the complications of this theory are associated with cases in which Ao E O'(L.), then every Jordan chain for L.(A.) corresponding to AO is also a
O'(L I ) n O'(L z) :F 0 in condition (a). In any case, it is possible to extend Jordan chain for L(A.) corresponding to 10 ,
statement (b) considerably. This will be done in the first corollary to the Similarly, q(Lz} c:: O'(L) and every Jordan chain of L!(l) is a Jordan chain
following theorem. ofLT(A).

Given the standard triple (3) for L(A), it is natural to ask when the two This is, of course, a natural generalization of the notion of a zero of a scalar
matrices polynomial, and it immediately suggests questionsconcerningthe factoriza-
tions of L(A) determinedby solventsof L(A). .
[~I Y~:ll [~;J (4)
It has been shown in the Corollaryto Theorem7.2.3 that S satisfies Eq. (1)
if and only if U - S is a right divisor of L(A), that is, if and only if L(A) =
L 2(),.)(Al - S) for some monic matrix polynomial L 2(A) of degree I - 1;
are similar, in which case a Jordan form for L(~) is just the direct sum of
This immediately puts questionsconcerning solvents of L(,t) in the context
Jordan forms forthe factors. An answer is providedimmediately byTheorem
of the preceding section. In particular, recall that matrix polynomials may
12.5.3, which states that the matrices of (4) are similar if and only if there
have no solvent at all (seethe polynomialof Eq. (14.8.1. Furthermore, it is
exists a solutionZ (generally rectangular) for the equation
deduced from Corollary 1 of Theorem 14.8.1 that when a solvent Sexists,
JIZ - ZJ2 = Y2XI (5) all eigenvalues of S are latent roots of L(A) and aU Jordan chains of U - S
must be Jordan chains of L(A). Also, if S = XIJIXi l whereJ, is in Jordan
This allows us to draw the following conclusions.
normal form, then (X I' J 1) is a Jordan pair for U - S. Thus. the matrix
Corollary 2. Let (Xl>J 10 YI ) and (X 2 , J 2 , Y2 ) be Jordan triples for LI(A)

=[ ~~. ]
and L 2(A), respectively. Then diag[J I' J 2] is a linearization for L(A) =
L 2(A)LI(A) if and only if there is a solution Zfor Eq. (5); in this case
RI

[~ ;J = [~ ~] l~ Y2J~I] [~ -IZl (6)


XIJ~-I

has rank n and its columnsform Jordan chamsfor CL' It follows that Im R I
PROOF. Observe that is a Crinvariant subspaceof dimension n. Development of this line of argu-
ment leads to the following criterion.
[ ~ -IZ] = [~ ~r I ,
Theorem1 (H. Langer"), The monic matrix polynomial L(A) of degree
so that the relation (6) is, in fact, the explicitsimilaritybetween the matrices I has a right solvent S if and only if there exists an invariant subspace f/ of CL
of (4). To verify Eq. (6), it is necessary only to multiplyout the product on o/the/orm f/ = Im Ql where
the right and use Eq, (5).
An important specialcase of Corollary 2 ariseswhen a(L I ) rva(L 2 ) = 0,
for then, by Theorem 12.3.2, Eq. (5) has a solution and it is unique. Thus, (2)
a(L I ) n a(L 2 ) = 0 is a sufficient conditionfordiag[J I , J 2 ] to bea lineariza-
tion of L 2(A)LI(A).

PROOF. If S is a right solventof L(l) then.as wehaveseen. M - S is a right


14.9 Solvents of Monic Matrix Polynomials divisor of L(A). It is easilyverified that (I, S, l) is a standard triplefor M - S,
so if (U 2,72, V2 ) is a standard triple for the quotient L(l)(M - S)-I. then
Theorem 14.8.1 impliesthat L(A) has a standard pair of the form
As in the preCeding section, we confine attention here to n x n monic
matrix polynomials L(A) = A'l + D=A JlL}. An n x n matrix S is said to
be a (right) solvent of L(A) if, in the notation of Section7.2, the right value of
L(l1.) at 5 is the zero-matrix,that is,
X = [1 0]. T = [~ l
~:
S' + L,_15'-1 + ... + LIS + L o = O. (1) t Acla Sci. Math. (Szeged) 35 (1973),83-96.

Defining an In x In matrixQ by for some % e en.Therefore, comparing the first n coordinates of fP and Ql"
it is found that % = 0 and consequently. = O. '

x:~] =[:'S'~1 ~'ll.


By using Theorem 1, we show in the next example that certain monic
matrix polynomials have no solvents. This generalizes the example in Eq.
Q =[X;,-l (14.8.1).

Example 1. Consider the monic n x n (I ~ 2) matrix polynomial


(where * denotes a matrix of no immediate interest), it follows, as in Lemma L(A.) = A.'ln - J n,
14.2.1, that CLQ = QT. This implies C LQl = Q1S, where Ql is defined by
Eq. (2). Consequently, CL .9 c:: f/'. where J n denotes the n x n Jordan block with zeros on the main diagonal.
Conversely, let .9 = 1m Ql' where Ql is defined by Eq. (2) for some n x n Clearly t1(L) = {OJ and the kernel of L(O) has dimension 1. It follows that
matrix S, and let fI' be CL-invariant. Then for any y E Cn there is a Z E cn 'n,
L(.il) has just one elementary divisor, .il and so the Jordan matrix associated
such that with L(i!.) is J'n, the Jordan blocksize In with zeros on the main diagonal.
Observe that J'n has just one invariant subspace .9'J of dimension j for
CL Q1Y = Ql%' j = 1, 2, ... , n, and it is given by
Using the definition (14.1.3) of CL , this means that
fIj = span{el' e2,'''' eJ} c:: c-.
S I
S2 S By Lemma 14.2.1,it follows that CL also has just one n-dimeneional invariant
subspace and it is given by QfI'n, where
S'-1
I-I
Y= S'-2 Z.
\
- L LJSi S'-l
.\
J=O
In particular, % = Sy and so - Lj:A LiSiy = S'-IZ implies
i
and (X, J ,n) is a Jordan pair for L(.il). If XI is the leading n x n block of
~".

(S' + LI _ IS'-l + ... + LIS + Lo)Y = O.


Since this is true for any ye en, Eq. (1) follows and S is a right solvent of X, then
L(A).
Note that a subspace fI' of the form 1m Q.. with QI given by EQ. (2),
can be characterized as an n-dimensional subspace containing no nonzero
vectors of the form
= [0 .I ... .T_l]T,
For this subspace to have the form described in Theorem 1 it is necessary
where 0, ., e en, i = 1, 2, ... , 1- 1. Indeed, if. e fI' is such a vector, then
and sufficient that XI be nonsingular, However, since L(")(O) = 0 for k =
1 1,2, ... , I - 1, it is easily seen that we may choose the unit vector e l in en
8 for each of the first I columns of X. Thus XI cannot be invertible if I ~ 2,

.= 82 Z
and C L has no invariant subspace of the required form. 0
In contrast to Example 1, a monic matrix polynomial may have many
S'-I distinct solvents, as shown in the following example.
Exercise 2. Check that the matrices Ex~rcise 3. Prove that if the Vandermonde matrix Y associated with pre-
scribed n x. n matrices S h S2' .. , 8, is invertible, then there exists a monic
n x n matrix polynomial L(A) of degree I such that {S.. S2, ... , S,} is a
complete set of solvents for L(A).
are solvents of the matrix polynomial Hint. Consider the n x In matrix [-Si -S~ ... -SfJv- 1 0

$$L(\lambda) = \lambda^2I + \lambda\begin{bmatrix}1&1\\-2&-3\end{bmatrix} + \begin{bmatrix}0&-1\\0&2\end{bmatrix}.$$
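A solvent is checked simply by evaluating the right value $S^2 + A_1S + A_0$. The sketch below (NumPy) uses the two solvents of this polynomial displayed in Eq. (8) of Example 4; the numerical values are the ones consistent with the exponentials computed there:

```python
import numpy as np

A1 = np.array([[1., 1.], [-2., -3.]])
A0 = np.array([[0., -1.], [0., 2.]])

def right_value(S):
    # Right value of L at S:  S^2 + A1 S + A0
    return S @ S + A1 @ S + A0

S1 = np.array([[0., 1.], [0., 0.]])
S2 = np.array([[1., 0.], [-2., 1.]])

print(np.allclose(right_value(S1), 0))        # True: S1 is a solvent
print(np.allclose(right_value(S2), 0))        # True: S2 is a solvent
print(abs(np.linalg.det(S2 - S1)) > 1e-12)    # True: {S1, S2} is a complete set (l = 2)
```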
Theorem 3. Let S.. S2' ... , S, form a complete set of solvents of the monic
n x n matrix polynomial L(A) of degree I. Then every solution of the dijJer-
ential equation L(dldt)x = 0 is ofthe form
The example of scalar polynomials suggests that we investigate the
possibility of capturing spectral information about a monic polynomial of x(t) = e't'zl + e'2'Z2 + ... + e"'Z/ (5)
degree I in a set ofn solvents. Thus, define a set of solvents Sl' S2' .. ' S, of where %.. %2' , z,eC".
L(A) to be a complete set ofsolvents of L(A) if the corresponding Vandermonde
(block) matrix PROOF. Substituting Eq. (4) in the formula (14.5.5) (with jet) = 0) we obtain
x(t) = PleVDv-t,z = PIYeD' y - 1z,
~.
I I
where D = diag[Sh S2'.'.' S,]. Since PI Y = (I I .. I] and eD' =
V= . ~l ~2 ] eC'" (3) diag[e't', e'2', ... , e"I), the result in Eq. (5) follows if we write y-l z =
[
S~-l S~~l SI_I .
[zI zl ... zn T .

Example 4. Consider the differential equation


is nonsingular. Note that in the case I = 2, this condition is equivalent to
the nonsingularity ofthe n x n matrix S2 - SI.
Clearly, in the scalar case (n = 1), if L(A) has a set of I distinct zeros then,
. ()+ [1_ 1]
xt 2 _ 3 t(t)
fO -1]
+ lo 2 x(t) = 0, (6)
according to this definition, they form a complete set of solvents (zeros)
and the matrix equation
of the polynomial. This observation shows that not every (matrix or scalar)
polynomial possesses a complete set of solvents. However, if such a set of
solvents does exist, then there are remarkable parallels with the theory of
X2
+[- 2 1 1] + [0 -1] =
- 3 X 0 2 O. (7)
scalar polynomial equations.
We saw in Exercise 2 that the matrices
Theorem 2. If S.. S2' ... ' S, is a complete set of solvents of a monic n x n

= [~~J. [_~ ~]
matrix polynomial L(A), then diag[S.. S2' .. , S,] is a linearization of L(A).
In particular, 81 S2 = (8)
, satisfy Eq. (7), and since det(S2 - S.) '" 0, they form a complete pair of
a(L) = U a(SJ. solvents for the corresponding matrix polynomial. Thus the general solution
1=1
of Eq, (6) is of the form (5) with / = 2, where S 1 and S2 are given by (8). To I.
PROOF. The assertion of the theorem immediately follows from the easily simplify this, observe that
verified relation
c; V = Vdiag[S.. S2' ... ' Sa, (4) .e
li
' ::;;. I + S1t + fSft 2 + ... = I + S1t = [~ ~J.
where V is given by Sq. (3) (see Exercise 7.11.14). while
The converse problem of finding a monic matrix polynomial with a given
set of solvents is readily solved ; it forms the contents of the next exercise. [ 10] = [0 -t] fl 1] [0 -t]-1
-2 1 1 0 lOll 0 .


Therefore
$$e^{S_2t} = \begin{bmatrix}0&-\tfrac12\\1&0\end{bmatrix}
\begin{bmatrix}e^t&te^t\\0&e^t\end{bmatrix}
\begin{bmatrix}0&1\\-2&0\end{bmatrix}
= \begin{bmatrix}e^t&0\\-2te^t&e^t\end{bmatrix}.$$
Thus, the general solution of Eq. (6) is
$$x(t) = \begin{bmatrix}x_1(t)\\x_2(t)\end{bmatrix}
= \begin{bmatrix}1&t\\0&1\end{bmatrix}\begin{bmatrix}\alpha_1\\\alpha_2\end{bmatrix}
+ \begin{bmatrix}e^t&0\\-2te^t&e^t\end{bmatrix}\begin{bmatrix}\alpha_3\\\alpha_4\end{bmatrix},$$
and hence
$$x_1(t) = \alpha_1 + \alpha_2t + \alpha_3e^t, \qquad
x_2(t) = \alpha_2 - 2\alpha_3te^t + \alpha_4e^t,$$
for arbitrary $\alpha_1, \alpha_2, \alpha_3, \alpha_4 \in \mathbb{C}$.  □

Note that the conditions of Theorem 3 can be relaxed by requiring only
similarity between $C_L$ and any block-diagonal matrix. This leads to a gener-
alization of the notion of a complete set of solvents for a monic matrix
polynomial and to the form $\sum_i e^{X_it}z_i$ for the solution of the differential
equation. Also, the notion of a solvent of multiplicity $r$ of a matrix polynomial
can be developed to obtain a complete analogue of the form of solutions for a
linear differential equation of order $l$ with scalar coefficients in terms of
zeros of the corresponding characteristic equation.

Exercise 5. (a) Let $A$ be a Hermitian matrix. Show that the equation
$$X^2 - X + A = 0 \qquad (9)$$
has Hermitian solutions if and only if $\sigma(A) \subset (-\infty, \tfrac14]$.
(b) Let $A$ be positive semidefinite. Show that all Hermitian solutions of
Eq. (9) are positive semidefinite if and only if $\sigma(A) \subset [0, \tfrac14]$, and in this case
$\sigma(X) \subset [0, 1]$.  □

CHAPTER 15

Nonnegative Matrices

A matrix $A \in \mathbb{R}^{m \times n}$ is said to be nonnegative if no element of $A$ is negative.
If all the elements of $A$ are positive, then $A$ is referred to as a positive matrix.
Square matrices of this kind arise in a variety of problems and it is perhaps
surprising that this defining property should imply some strong results
concerning their structure. The remarkable Perron-Frobenius theorem is
the central result for nonnegative matrices and is established in Theorems
15.3.1 and 15.4.2.

Matrices whose inverses are nonnegative, or positive, also play an important
part in applications, and a brief introduction to such matrices is also
contained in this chapter.

If $A = [a_{ij}]$ and $B = [b_{ij}]$ are matrices from $\mathbb{R}^{m \times n}$, we shall write $A \geq B$
(respectively, $A > B$) if $a_{ij} \geq b_{ij}$ (respectively, $a_{ij} > b_{ij}$) for all pairs $(i, j)$
such that $1 \leq i \leq m$, $1 \leq j \leq n$. With this notation, $A$ is nonnegative if
and only if $A \geq 0$. It should be noted that $A \geq 0$ and $A \neq 0$ do not imply
$A > 0$. Also, if $A \geq 0$, $B \geq C$, and $AB$ is defined, then $AB \geq AC$; and if
$A \geq 0$, $B > 0$, and $AB = 0$, then $A = 0$. Note that most of these remarks
apply to vectors as well as to matrices.

The duplication of notations with those for definite and semidefinite
square matrices is unfortunate. However, as the latter concepts do not play
an important part in this chapter, no confusion should arise.

15.1 Irreducible Matrices Exercise 2. Verify that the directed graph associated with the matrix

The concept of reducibility of square matrices plays an important part


in this chapter. This was introduced in Section 10.7; it would be wise to read
[~o ~ ~]
4 -1
the discussion there once more. Here, the combinatorial aspect oflhis concept fails to be strongly connected.
will be emphasized. Exercise 3. Show that if all the off-diagonal entries of a row A (or a
It is interesting that the concept of reducibility is not connected in any way column of A) are zero, then the graph of A is not,strongly connected.
with the magnitudes or signs of the elements of a matrix but depends only on
the disposition of zero and nonzero elements. This idea is developed in the SOLUTION. In the case of a row of A, say row i, there are no directed lines
concept of the directed graph associated with a matrix; we shall take only the PIP) starting at node i. So there is no connection from node i to any other
first step in this theory to obtain a second characterization of an irreducible node. So the graph is not strongly connected. 0
matrix in Theorem 1. The next two exercises form part of the proof of the next theorem.
Let P l' P2 , , Pitbe distinct points of the complex plane and let A e Cit x It.
Exercise 4. Let A ecnx n have the partitioned form
~ each nonzero element au of A, connect PI to PJ with a directed line
PiP}' The resulting figure in the complex plane is a directed graph for A. We
illustrate this in Fig. 15.1 with a 3 x 3 matrix and is directed graph.
A= [AUo Au],
A 22

[~ -~ ~l
where Au is k x k and 1 S k S n - 1 (so that A is trivially reducible).
Show that the directed graph of A is not strongly connected.
1 -1 0 SoLUTION. Consider any directed path from a node i with i > k. The first
We say that a directed graph is stronglyconnected if,for each pair of nodes segment of the path is determined by the presence of a nonzero element
Ph p) with i :F- j, there is a directed path au in the ith row of A. This row has zeros in the first k positions, so it is
possible to make a connection from node i to nodej only ifj > k: Similarly,
PiP"',, P"JPl~' ... , Pl.-I.p; the path can be extended from node j only to another node greater than k.
connecting Pi to Pi' Here, the path consists of r directed lines. Continuing in this way, it is found that a directed path from node i with
Observe that nodes i and j may be connected by a directed path while j > k cannot be connected to a node less than k + 1. Hence the directed
j and i are not. graph is not strongly connected.
Exercise 1. .Check that the graph of Fig. 1 is strongly connected. Exercise 5. Let A EC"X" and let P be an n x n permutation matrix. Show
that the directed graph of A is strongly connected if and only if the directed
graph of PApT is strongly connected.
SOLUTION. Observe that the graph of PApT is obtained from that of A
just by renumbering the nodes, and this operation does not affect the con-
nectedness of the graph. 0
Tbeorem 1. A square matrix is irreducible if and only if its directed graph
is strongly connected.
PROOF. Exercise 4 shows that if A is reducible, then the graph of PApT
is not strongly connected, for some permutation matrix P. Then Exercise 5
shows that this is equivalent to the statement that A itself has a directed
graph that is not strongly connected. Thus, a strongly connected directed
graph implies that the matrix is irreducible. Proof of the converse statement
is postponed to Exercise 15.3.3.

[Fig. 15.1: The directed graph of a matrix.]
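Theorem 1 turns irreducibility into a reachability question on the directed graph, which is straightforward to test by computer. A small sketch follows (NumPy; the function and the two test matrices are ours, not from the text):

```python
import numpy as np

def is_irreducible(A):
    """Test irreducibility via strong connectivity of the directed graph of A
    (Theorem 15.1.1): there is an edge i -> j whenever a_ij != 0."""
    n = A.shape[0]
    adj = (A != 0)
    def reaches_all(start):
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adj[i]):
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        return len(seen) == n
    return all(reaches_all(i) for i in range(n))

A = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])   # a directed cycle: strongly connected
B = np.array([[1, 1], [0, 1]])                    # block triangular: reducible
print(is_irreducible(A), is_irreducible(B))       # True False
```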

In testing particular matrices for reducibility, it is important to keep if A-I is a nonnegative matrix, we can give precise bounds for the solution
in mind the obvious fact that if A is irreducible and B is obtained from A vector because, in this case,
by replacing zero entries by nonzero entries, then B is also irreducible.
A- 1" 1 S; A-I" = X S A- l h2
Exercise 6. Check that the following matrices are irreducible. In both
cases the elements alJ not set equal to zero are, in fact, nonzero. A matrix A E IR n " ft is said to be monotone if it is nonsingular and A -1 ~ O.
Monotone matrices frequently occur with a sign pattern among the elements;
0 au 0 0 0 au 0 0 for example, main-diagonal elements positive and off-diagonal elements
0 0 au au 0 a23 nonpositive. Such matrices are given their own name: a monotone matrix is
0 0 a32 0 0 called an M-matrix if all the off-diagonal elements are nonpositive, The
0 matrices A and B of Exercise 2.11.17 are monotone, and B is an M -matrix,
0 0 an-l. n 0 an- l n
ani 0 0 0 0 0 ~.n-l 0 Exercise J. Use Exercise 1.8.5 to prove that if 0 S A S; B, then Ak S d
fork = 1,2, ....
16.2 Nonnegative Matrices and Nonnegative Inverses t Exercise 2. Prove that if A is an M-matrix, then each main-diagonal
element of A is positive.
Let A e en H ft with elements a)le and let IA I denote the nonnegative matrix Hint. Observe that ifnot, then A has a complete column that is nonpositive,
in IRn Hft with elements laJil. Clearly, one of the properties shared by A and say thejth column, and then At) S O. 0
IA I is that of reducibility or irreducibility. In addition, many norms of A
and IAI take the same value (see Sections 10.3 and 10.4). However, the We now discuss some criteria that ensure that a matrix A e IR" Hft is an M-
relationship between the spectral radii of A and IAI is more delicate, and matrix. They all give sufficient conditions that are often easy to check.
it is frequently of interest. It is plausible that IlA S; iliA" and we first show Theorem 2. Let BE IRn " ft with B ~ O. The matrix 1 - B is an M- matrix if
that this is the case. We return in Section 15.4 to the important question of and only if IlB < 1.
conditions under which Il.. < iliA"
PROOF. If IlB < I then Theorems 10.3.1 and 11.1.1 imply that 1- B is
Theorem J. Let AeC"Hft and BelRftHft.lflAI S B, thenll A S IlB' nonsingular and
PROOF. Let 8 be an arbitrary positive number and define,the scalar multiples (I - B)-1 = 1 + B + B 2 + ....
A. = (p,B + B)-lA, B. = (PB + 8)-lB. Since B ~ 0 it is clear that (l - B)-1 ~ O. Obviously, the elements of 1 - B
Clearly IA.I S B., and it is easily seen (see Exercise 1) that for k = 1,2, ... , off the main diagonal are nonpositive and so I _. B is an M -matrix,
IA.li s~. Conversely, if 1 - B is an M-matrix, then (I - B)-1 ~ O. Let ,t E a(B)
Now it isalsoclear that sg, < l,andso(seeExercise9.8.5)B~ .... 0 ask -+ 00. and lx = Bx with x O. Then, since B ~ 0, it is easily seen that
Since IA: I s IAtti S ~, it follows that A: -+ 0 as k -+ 00 and so (by Exercise
l,tllxl S Blxl,
9.8.5 again) Il.., < 1. But this implies IlA < IlB + 8, and since 8 is arbitrary,
Il.. S IlB' which implies (I - B)lxl S (1 - IAl)lxl, and hence
Note that on putting B = IAI, we obtain Il.. S il,A, from the theorem. Ixl S (1 -1,tI)(I - B)- l lxl ;
Consider now the question of solving an equation Ax = b, where .1
A e IRft " ft and be IRft are given and A is invertible. The unique solution is, of Since Ixl ~ 0 and (l - B)-1 ~ 0, it follows that IAI < 1. But this is the case
course, x = A-lb. However, if b is not known precisely and we are given for any ,t e a(B), so have IlB < 1.
only bounding vectors b l , b 2 E IRft in the sense that b l S "S b z , what can Theorem 3: Let A = [aIJJ7.J= 1 e Rft "II and assume that au > 0 for each
then be said about the solution vector x? The perturbation analysis of i and alj S 0 whenever i j. If A is diagonally dominant, that is,
Section 11.1 gives one answer in terms of the condition number of A. But ft
"The presentation of this section is strongly influenced by the expositions of R. S. Varga ail> L laiJI, i = 1,2, ... , n,
and of J. M. Ortega and W. C. Rheinboldt. See Appendix 3. J=l.J"'1

or. if A is irreducible and Proposition 1. If the matrix A E Rill( II is nonnegative and irreducible, then

all~
II

I: lai/i. i=1.2... ,n,


(I + A)"-1 > O.
}= 1.}"'i
PROOF. Consider a vector ye R" such that y ~0 and y 'I: 0 and write
withstrict inequalityfor at leastonei, then A is an M -matrix.
% = (I + A)y = y + Ay. (1)
PROOF. First consider the case in which A is diagonally dominant. Let
D = diag[aw ... ,"nII] and define B = I - D- 1A. Note that B has zero Since A ~ 0, the product Ay ~ 0 and so % has at least as many nonzero (and
elements on the main diagonal and that B~ O. Also, the fact that A is hence positive) elements as y. If y is not already positive, we shall prove that
diagonally dominant implies that % has at least one more nonzero element than y. Indeed, if P is a permutation
II matrix such that Py = r"T OT]T and" > 0, then it follows from Eq. (1)
2: Ibijl < 1. i = 1,2, ... , n. and the relation ppT = I that
}=1

It follows immediately from the Gerigorin theorem (Section 10.6)that IlB < 1. (2)
Now we have D -1 A = I - B, and it follows from Theorem 2 that D- 1 A
is an M-matrix. Consequently, A is also an M-matrix. Hence, if we partition % and PApT consistently with the partition of y,
With the second set of hypotheses we follow the same line of argument
but use the corollary to Theorem 10.7.2 to ensure that IlB < 1.
Exercise J. Show that, under either of the hypotheses of Theorem 3,
% = [:.],
...
PApT = [Au
A
Au]
A 21 22 '
In A = {n, 0, O}. If, in addition, A is symmetric then A is positive definite. then Eq. (2) implies
Exercise 4. Let A be a lower-triangular matrix, with positive elements on 11 = II + Au" and w = A 2 1" . (3)
the main diagonal and nonpositive elements elsewhere. Show that A is an
M-matrix. Now observe that the matrix P ApT is nonnegative and irreducible.
Hence, in particular, A II ~ O. A 2 1 ~ 0, A 2 1 :/= 0, and therefore it follows
Exercise 5. Show that if A is an M-matrix and D is a nonnegative diagonal from (3) that 11 > 0, w 4: O. But " > 0 and, therefore, w:!= O. Thus, % has
matrix, then A + D is an M -matrix, at least one more positive element than y. .
Exercise 6. Show that a symmetric M-matrix is positive definite. (Such a If (I + A)y is not already positive then, starting with the element
matrix is called a Stieltjes matrix.) (I + A):z = (I + Afy and repeating the argument, it follows that it has at
least two more positive elements than y. Continuing this process, we find
SOLUTION. Let A be a symmetric M-matrix and .t E u(A). If A S 0 then, after at most n - 1 steps that
by Exercise 5, A - AI is also an M-matrix and must therefore benonsingular.
This contradicts the assumption that AE u(A). Thus AE u(A) implies A > 0 (I + A)II-1 y > 0
and hence A is positive definite. 0
for any y 4: 0, y :!= O. Putting y = t41 for j = I, 2, ... , n, we obtain the desired
result.

15.3 The Perron-Frobenius Theorem (I) Observe that there is a simple result in the other direction. If (I + A)I > 0
for any A E R"XII and any positive integer j, then A must be irreducible.
Otherwise, assuming that A has the partitioned form of Exercise 15.1.4, we
It is obvious that if A ~ 0 then AI' ~ 0 for any positive integer p. We easily obtain a contradiction with the hypothesis.
might also expect that if A has a sufficiently high density of nonzero elements
then. for large enough p. we would obtain AI' > O. Our first result is of this Exercise 1. If A e Rill( n is irreducible and D is a nonsingular, nonnegative
kind. diagonal matrix, prove that (D + ..4)"-1 > O.

1.
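Proposition 1 and Exercise 1 are easy to illustrate numerically; in the sketch below the cycle matrix is a convenient irreducible example and the triangular matrix a reducible one (NumPy; illustrative choices only):

```python
import numpy as np

n = 5
# An irreducible nonnegative matrix: the single directed cycle 1 -> 2 -> ... -> n -> 1
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = 1.0

P = np.linalg.matrix_power(np.eye(n) + A, n - 1)
print((P > 0).all())     # True, as Proposition 1 predicts

# A reducible nonnegative matrix (upper triangular) fails the same test
B = np.triu(np.ones((n, n)))
print((np.linalg.matrix_power(np.eye(n) + B, n - 1) > 0).all())   # False
```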

Exercise 2. Let A e IR" lIlI be nonnegative and irreducible, and let ai1) be consider the closed set .,II of vectors x such that x ~ f} and L xl == 1.
the i,j element of AV. Show that there is a positive integer q such that tl,1 > O. Thus, .,II c: ! and
Show also that if ",(A) is the minimal polynomial of A, then we may choose
q S deg("'). r == sup rex).
lre"ff
Hint. Observe that A(I + A)"-I > 0 and use the binomial theorem. For If the function rex) ~ere continuous on .,II, we could equate this to max rex),
the second part let rCA) be the remainder on division of A(l + ;;1'-1 by ",(A) x E J{ (see AppendIX 2). However rex) may have discontinuities at points
and use the fact that rCA) > O. where elements of x vanish. We therefore consider the set .AI' of vectors y
defined by
Exercise 3. Let matrix A be as in Exercise 2.
y == (l + A)"-I X , xeJt.
(a) Show that ai1 > 0 if and only if there is a sequence of q edges in the
directed graph of A connecting nodes P, and Pj' By Proposition 1 every member of .AI' is a positive vector and so .AI' c: !.
(b) Use the result of Exercise 2 to show that A has a. strongly connected Now, ,AI' is the image of the closed and bounded set .,II under a continuous
directed graph. 0 func~jon and is therefore also closed and bounded. Furthermore, r(y) is
contmuous on .AI:
If A is a nonnegative irreducible matrix, consider..the real-valued function For any x E J( and corresponding y,
r defined on the nonzero vectors x ~ 0 by
r(x)y == r(x)(I + A)"-I x S (I + A)"-IAx,
. (Ax),
x = nun - - ,
r(.) (4) since r(x)x S Ax, and hence r(x)y S Ay for every y e.AI: Now, re,) is the
I :!i:':!i:1I X, greatest number p such that py S Ayand hence rex) S r(y). Thus,
lr."'O

where (Ax), denotes the ith element of the vector Ax. Then rex) ~ 0 and r == sup rex) :s; max r(y).
",eo IE.H'
forj == 1,2, ... , n, r(x)Xj S (Ax)j' with equality for somej. Thus r(x)x :s; Ax
and, furthermore, rex) is the largest number p such that px S Ax for this x. But since ,AI' c: ~

Exercise 4. Find rex) if max r(y) S sup rex) == sup rex).


,e.H' ",eSl' , "'EO

A == [~~] and x == [~l Hence


r == max r(y), (6)
SoLUTION. Since Ax = [2 I]T, the largest number p such that px S Ax is J'E.H'

2. Thus rex) == 2. and there is a y > 0 such that r == r(y).


There may be other vectors in ! for which rex) attains the value r. Any
Exercise 5. If A == [alj)~.j", I and x = [l 1 I]T, show that
such vector is called an extremal vector of A. Thus, a nonzero vector % ~ 0
II
is an extremal vector of A if r(z) == r or, what is equivalent, r% S A%.
rex) == min L all:' 0 The reason for our interest in the number r is clarified in the next result.
I~'SII AI"'!

Let ℒ denote the domain of the function r, that is, the set of all nonzero nonnegative vectors of order n, and define the number r (not to be confused with the function r) by

    r = sup_{x∈ℒ} r(x).                (5)

From the definition of r(x), we observe that r is invariant if x is replaced by αx for any α > 0. Thus, in evaluating this supremum, we need only consider the closed set ℳ of vectors x such that x ≥ 0 and Σ_i x_i = 1. Thus ℳ ⊂ ℒ and

    r = sup_{x∈ℳ} r(x).

If the function r(x) were continuous on ℳ, we could equate this to max_{x∈ℳ} r(x) (see Appendix 2). However, r(x) may have discontinuities at points where elements of x vanish. We therefore consider the set 𝒩 of vectors y defined by

    y = (I + A)^{n-1} x,    x ∈ ℳ.

By Proposition 1 every member of 𝒩 is a positive vector and so 𝒩 ⊂ ℒ. Now, 𝒩 is the image of the closed and bounded set ℳ under a continuous function and is therefore also closed and bounded. Furthermore, r(y) is continuous on 𝒩.

For any x ∈ ℳ and corresponding y,

    r(x)y = r(x)(I + A)^{n-1}x ≤ (I + A)^{n-1}Ax,

since r(x)x ≤ Ax, and hence r(x)y ≤ Ay for every y ∈ 𝒩. Now, r(y) is the greatest number ρ such that ρy ≤ Ay and hence r(x) ≤ r(y). Thus,

    r = sup_{x∈ℳ} r(x) ≤ max_{y∈𝒩} r(y).

But since 𝒩 ⊂ ℒ,

    max_{y∈𝒩} r(y) ≤ sup_{x∈ℒ} r(x) = sup_{x∈ℳ} r(x).

Hence

    r = max_{y∈𝒩} r(y),                (6)

and there is a y > 0 such that r = r(y).

There may be other vectors in ℒ for which r(x) attains the value r. Any such vector is called an extremal vector of A. Thus, a nonzero vector z ≥ 0 is an extremal vector of A if r(z) = r or, what is equivalent, rz ≤ Az.

The reason for our interest in the number r is clarified in the next result.

Proposition 2. If the matrix A ∈ ℝ^{n×n} is nonnegative and irreducible, then the number r defined by Eq. (5) is positive and is an eigenvalue of A. Furthermore, every extremal vector of A is positive and is a right eigenvector of A associated with the eigenvalue r.

PROOF. Let x = [1 1 ⋯ 1]^T. Then (see Exercise 5) r(x) = min_i Σ_k a_{ik} > 0, for, if any row of A consists entirely of zeros, then A is reducible. Since r ≥ r(x), we deduce that r > 0.

Let z be an extremal vector and let w = (I + A)^{n-1}z. Without loss of generality, we may suppose that z ∈ ℳ. Proposition 1 implies that w > 0, and clearly w ∈ 𝒩. We also have Az − rz ≥ 0 and, if Az − rz ≠ 0, then

    (I + A)^{n-1}(Az − rz) > 0.

Hence Aw − rw > 0, or rw < Aw, which implies that r < r(w). But this contradicts the definition (5) of r, and so we must have Az = rz. Thus, any extremal vector z is a right eigenvector of A with associated eigenvalue r.

Finally, since Az = rz, we have

    w = (I + A)^{n-1}z = (1 + r)^{n-1}z,

and since w > 0 and r > 0, we must have z > 0.  ■
We can now state and prove the first part of the Perron-Frobenius theorem for irreducible matrices.

Theorem 1. If the matrix A ∈ ℝ^{n×n} is nonnegative and irreducible, then

(a) The matrix A has a positive eigenvalue, r, equal to the spectral radius of A;
(b) There is a positive (right) eigenvector associated with the eigenvalue r;
(c) The eigenvalue r has algebraic multiplicity 1.

PROOF. In Proposition 2 we have established the existence of a positive eigenvalue with an associated positive eigenvector. To complete parts (a) and (b) of the theorem it suffices to show that other eigenvalues of A cannot exceed r in absolute value. Indeed, if we have αy = Ay and y ≠ 0, then, since A ≥ 0, it is easily seen that

    |α||y| = |Ay| ≤ A|y|,

where, if z ∈ ℝ^n, |z| denotes the vector obtained from z by replacing all its components by their absolute values. Hence |α| ≤ r(|y|) ≤ r, which is what we wanted to prove.

Suppose now that z is any right eigenvector associated with r. Thus, Az = rz and z ≠ 0, and, as we just showed,

    r|z| ≤ A|z|.

This implies that |z| is an extremal vector and, by Proposition 2, |z| > 0. Thus z_i ≠ 0, i = 1, 2, ..., n. Hence the dimension of the right eigenspace of r is 1. Otherwise, we could find two linearly independent right eigenvectors z_1, z_2 and then determine numbers α, β such that αz_1 + βz_2 has a zero element. This proves that the geometric multiplicity of r is 1.

To complete the proof,† we show that, in the terminology of Section 6.3, there is no generalized eigenvector of order 2 associated with r, and this will imply that the algebraic multiplicity of r is 1. To see this, let x_1 > 0 and y > 0 be eigenvectors of A and A^T, respectively, associated with eigenvalue r. Thus,

    (rI − A)x_1 = 0,    (rI − A^T)y = 0.

Suppose that there is a generalized eigenvector x_2 ≠ 0 for which (rI − A)x_2 = x_1 (see Eqs. (6.3.3)). Then, since y^T(rI − A) = 0^T, we have y^T x_1 = 0. But this contradicts the positivity of the vectors x_1 and y, and r must have algebraic multiplicity 1.  ■

† The authors are indebted to D. Flockerzi for this part of the proof.
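All three conclusions of Theorem 1 can be observed numerically. The following minimal sketch assumes NumPy; the matrix is an arbitrary irreducible nonnegative example (not taken from the text), and np.linalg.eig is used only as a black box.

```python
import numpy as np

A = np.array([[0., 2., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])        # nonnegative and irreducible

evals, evecs = np.linalg.eig(A)
r = np.max(np.abs(evals))                     # spectral radius
k = int(np.argmin(np.abs(evals - r)))         # eigenvalue closest to r
print(np.isclose(evals[k], r))                # (a): the spectral radius is itself an eigenvalue
x = evecs[:, k].real
x = x / x[np.argmax(np.abs(x))]               # rescale; a Perron vector can be chosen positive
print(np.all(x > 0))                          # (b): the associated eigenvector is positive
print(np.sum(np.isclose(evals, r)))           # (c): r is a simple eigenvalue -- prints 1
```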

Exercise 6. Using the above notations and taking advantage of Exercise 7.9.1, show that if B(λ) = adj(λI − A), then

    B(λ)(λI − A) = c(λ)I.

Deduce that the derivative c′(r) > 0, and hence

    B(r) = kxy^T > 0

for some positive k.

Exercise 7. If A ∈ ℝ^{n×n} is irreducible and nonnegative and σ_j = Σ_{k=1}^n a_{jk}, j = 1, 2, ..., n, prove that

    min_j σ_j ≤ r ≤ max_j σ_j,

and that there is equality in either case if and only if σ_1 = σ_2 = ⋯ = σ_n.

SOLUTION. Adding the first n − 1 columns of rI − A to the last one, we obtain

    det(rI − A) = det [ r − a_11    −a_12       ...   −a_{1,n−1}    r − σ_1
                        −a_21       r − a_22    ...   −a_{2,n−1}    r − σ_2
                         ...
                        −a_{n1}     −a_{n2}     ...   −a_{n,n−1}    r − σ_n ]  = 0.

Expanding by the last column and denoting the i, jth element of B(r) = adj(rI − A) by b_{ij}(r), we obtain

    Σ_{j=1}^n (r − σ_j) b_{nj}(r) = 0.
But we have seen in Exercise 6 that B(r) > 0 and so b_{nj}(r) > 0 for j = 1, 2, ..., n. It follows that either σ_1 = σ_2 = ⋯ = σ_n or there exist indices k, l such that r < σ_k and r > σ_l. Hence the result. □

Exercise 8. Using Exercise 7, and without examining the characteristic polynomial, prove that the following matrix has the maximal real eigenvalue 7 and that there is an eigenvalue equal to 4:

    [~ ~ J ~l

Exercise 9. If A > 0 and G is the component matrix of A associated with r, prove that G > 0.

Exercise 10. If A is irreducible and nonnegative and C(λ) is the reduced adjoint of A (see Section 7.9), prove that C(r) > 0.

Hint. Observe that δ(r) = d_{n−1}(r) > 0 (Section 7.9).

Exercise 11. If A is nonnegative and irreducible, show that A cannot have two linearly independent nonnegative eigenvectors (even associated with distinct eigenvalues).

Hint. Use Exercise 10.7.2 and Theorem 1(c) for the matrices A and A^T to obtain a contradiction. □
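The row-sum bounds of Exercise 7 need no eigenvalue computation and are convenient in practice. A minimal numerical sketch, assuming NumPy; the matrix is an arbitrary irreducible example chosen only for illustration.

```python
import numpy as np

A = np.array([[0., 3., 1.],
              [2., 0., 2.],
              [1., 1., 1.]])              # nonnegative and irreducible

row_sums = A.sum(axis=1)                  # sigma_1, ..., sigma_n = 4, 4, 3
r = np.max(np.abs(np.linalg.eigvals(A)))  # spectral radius
print(row_sums.min() <= r <= row_sums.max())   # True: min_j sigma_j <= r <= max_j sigma_j
```

Since the row sums are not all equal here, both inequalities are strict, as the exercise asserts.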
15.4 The Perron-Frobenius Theorem (II)

The main results of the preceding section generalize results established first by Perron†. Frobenius‡ published them in 1912 along with deeper results of this section, results that give more information on the structure of a nonnegative irreducible matrix A in those cases where A has more than one eigenvalue with absolute value equal to r, the spectral radius of A.

An important first step in our argument is a result due to Wielandt that is independently useful and gives a stronger result than the comparison theorem, Theorem 15.2.1, provided the bounding matrix is irreducible. Note that the proof depends heavily on the Perron-Frobenius results already established.

† Math. Ann. 64 (1907), 248-263.
‡ Sitzungsber. Preuss. Akad. Wiss. (1912), 456-477.

Theorem 1 (H. Wielandt†). Let A ∈ ℂ^{n×n}, B ∈ ℝ^{n×n} satisfy |A| ≤ B and let B be irreducible. Then μ_A = r, where μ_A and r are the spectral radii of A and B, respectively, if and only if A is of the form

    A = ωDBD^{-1},                (1)

where |ω| = 1, |D| = I, and ωr ∈ σ(A).

† Math. Zeits. 52 (1950), 642-648.

Note first of all that, by Theorem 15.3.1, r is in fact an eigenvalue of B.

PROOF. If A has the form (1), it is clear that A and B have the same spectral radius.

Conversely, let μ_A = r and λ ∈ σ(A) with |λ| = μ_A = r. Let a be a left eigenvector of A corresponding to λ, that is, a ≠ 0 and a*A = λa*. Then

    |λa*| = |λ||a*| = r|a*| ≤ |a*||A| ≤ |a*|B.                (2)

Hence for any positive vector u,

    r|a*|u ≤ |a*|Bu.                (3)

By Theorem 15.3.1 there is a u > 0 such that Bu = ru, so for this u we have r|a*|u = |a*|Bu. Thus, for this choice of u equality obtains in (3) and hence, also, throughout (2). It follows that |a*|B = r|a*| and, applying Theorem 15.3.1 again, we must have |a*| > 0.

Then (2) also yields

    |a*|(B − |A|) = 0^T,

which, together with B ≥ |A|, implies B = |A|.

Now define ω = λ/r and

    D = diag[ a_1/|a_1|, a_2/|a_2|, ..., a_n/|a_n| ],

where a = [a_1 a_2 ⋯ a_n]^T. Then |ω| = 1, |D| = I, and it is easily verified that a*D = |a*|. Now, a*A = λa* implies

    |a*|D^{-1}A = λ|a*|D^{-1},

so that

    |a*|D^{-1}AD = λ|a*| = ωr|a*|.

In other words,

    r|a*| = |a*|(ω^{-1}D^{-1}AD).                (4)

But B = |A| = |ω^{-1}D^{-1}AD| and, since |a*| > 0, it follows from Eq. (4) that B = ω^{-1}D^{-1}AD, and hence conclusion (1).  ■

A particular deduction from Theorem 1 will be important for the proof of Theorem 2. Suppose that, in the theorem, A is nonnegative and irreducible. Apply the theorem with B = A, so that |A| ≤ B and μ_A = r are trivially satisfied. The conclusion is as follows:

Corollary 1. Let A be nonnegative and irreducible and let λ ∈ σ(A) with |λ| = μ_A = r, the spectral radius of A. Then A satisfies the condition

    A = (λ/r) DAD^{-1}                (5)

for some diagonal matrix D with |D| = I.

Exercise 1. Let the matrix A of Theorem 1 be nonnegative. Show that μ_A = r if and only if A = B.

Exercise 2. Suppose that, in Theorem 1, the matrix A is obtained from B by replacing certain rows and/or columns by zero rows and/or columns, respectively. Show that μ_A < r. □
Theorem 2. Let the matrix A ∈ ℝ^{n×n} be nonnegative and irreducible and have eigenvalues λ_1, λ_2, ..., λ_n. If there are exactly k eigenvalues λ_1 = r, λ_2, ..., λ_k of modulus r and if ω_1, ..., ω_k are the distinct kth roots of unity, then λ_j = ω_j r, j = 1, 2, ..., k.

Moreover, the n points of the complex plane corresponding to λ_1, ..., λ_n are invariant under rotations about the origin through 2π/k. If k > 1, then there is a permutation matrix P such that P^T A P has the symmetrically partitioned form

    [ 0        A_12    0       ...   0
      0        0       A_23    ...   0
      .        .       .             .
      0        0       0       ...   A_{k-1,k}
      A_{k1}   0       0       ...   0         ].

PROOF. Let λ_j = ω_j r denote all the eigenvalues of A of modulus r, where ω_j = e^{iφ_j}, 0 = φ_1 < φ_2 < ⋯ < φ_k < 2π. To show that ω_1, ω_2, ..., ω_k are the kth roots of unity, apply the result of Corollary 1 and write

    A = ω_j D_j A D_j^{-1},    j = 1, 2, ..., k.                (6)

By Theorem 15.3.1(b), there exists a vector x ∈ ℝ^n, x > 0, such that Ax = rx. Then Eq. (6) implies that the vector x_j = D_j x is an eigenvector of A associated with λ_j, j = 1, 2, ..., k. Moreover, since r has algebraic multiplicity 1, the algebraic multiplicity of λ_j = ω_j r is also 1, j = 1, 2, ..., k. Indeed, r is a simple zero of the characteristic polynomial of A and hence ω_j r is a simple zero of the characteristic polynomial of ω_j A, 1 ≤ j ≤ k. In view of Eq. (6) these polynomials coincide:

    det(λI − A) = det(λI − ω_j A),    1 ≤ j ≤ k,

and therefore the result follows.

Note that the geometric multiplicity of each of λ_1, λ_2, ..., λ_k is then also 1 and hence the vectors x_j, and therefore the matrices D_j, j = 1, 2, ..., k, are defined uniquely up to scalar multiples. We thus assume that D_1 = I and that all ones appearing in the matrices D_2, D_3, ..., D_k are located in the first parts of the main diagonals. With this assumption, the matrices D_1, D_2, ..., D_k are uniquely defined.

Now observe that Eq. (6) implies the relation

    A = ω_j D_j A D_j^{-1} = ω_j D_j (ω_s D_s A D_s^{-1}) D_j^{-1} = ω_j ω_s D_j D_s A (D_j D_s)^{-1}

for any pair j, s, where 1 ≤ j, s ≤ k, and therefore the same reasoning implies that the vector D_j D_s x is an eigenvector of A associated with the eigenvalue ω_j ω_s r of modulus r. Hence ω_j ω_s = ω_i for some i, 1 ≤ i ≤ k, and, consequently, D_j D_s = D_i. Thus, {ω_1, ω_2, ..., ω_k} is a finite commutative (or Abelian) group consisting of k distinct elements.

It is well known that the kth power of any element of this group is equal to its identity element. Thus, ω_1, ω_2, ..., ω_k are all the kth roots of unity, as required.

Note that the relation (6) can now be rewritten in the form

    A = e^{2πi/k} D A D^{-1}                (7)

for some diagonal D with |D| = I, and therefore the matrix A is similar to the matrix e^{2πi/k}A. Thus the spectrum σ(A) is carried into itself if it is rotated bodily about the origin through 2π/k.

Furthermore, the argument used above yields D_j^k = I, j = 1, 2, ..., k, and therefore the diagonal elements of D = D_j are kth roots of unity. If k > 1, there exists a permutation matrix P such that

    D_0 = P^T D P = diag[ μ_1 I_1, μ_2 I_2, ..., μ_s I_s ],                (8)

where the I_l are identity matrices of appropriate sizes, l = 1, 2, ..., s, and μ_l = e^{iψ_l}, ψ_l = n_l 2π/k, 0 = n_1 < n_2 < ⋯ < n_s < k. Note that, in view of Eq. (7),

    A_0 = e^{2πi/k} D_0 A_0 D_0^{-1},                (9)
in which A_0 = P^T A P. If A_0 = [A_ij]_{i,j=1}^s is a partition of A_0 consistent with that of D_0 in (8), then Eq. (9) implies

    A_ij = e^{2πi/k} μ_i μ_j^{-1} A_ij,    1 ≤ i, j ≤ s.                (10)

In particular, this implies A_ii = 0 for i = 1, 2, ..., s. Note also that no block-row in A_0 consists entirely of zeros, since otherwise we contradict the assertion of Proposition 15.3.1. Thus, starting with i = 1 in Eq. (10), at least one of the matrices A_12, A_13, ..., A_1s is not the zero-matrix and therefore at least one of the numbers

    e^{2πi/k} μ_1 μ_j^{-1} = exp[2πi(1 + n_1 − n_j)/k]

is equal to 1. This is possible only if n_j = 1 + n_1 = 1, 1 ≤ j ≤ s. Since n_1, n_2, ..., n_s are integers and n_1 < n_2 < ⋯ < n_s, this implies j = 2. Hence A_13 = A_14 = ⋯ = A_1s = 0. Putting i = 2 in Eq. (10), we similarly derive that A_2j = 0 for j = 1, 2, ..., s but j ≠ 3. Continuing this process, the following form of A_0 is obtained:

    [ 0        A_12    0       ...   0
      0        0       A_23    ...   0
      .        .       .             .
      0        0       0       ...   A_{s-1,s}
      A_{s1}   A_{s2}  ...  A_{s,s-1}  0       ].

The same argument applied for the case i = s in Eq. (10) yields the existence of at least one j, 1 ≤ j ≤ s − 1, such that

    exp[2πi(1 + n_s − n_j)/k] = 1,    1 ≤ j ≤ s − 1.

Since 0 ≤ n_j < n_s < k, 1 ≤ j ≤ s − 1, this equality holds only if 1 + n_s − n_j = k. But n_1 = 0, n_2 = 1, ..., n_{s-1} = s − 2, and therefore n_s = k − 1 and n_j = 0, hence j = 1. Thus A_{s2} = A_{s3} = ⋯ = A_{s,s-1} = 0. It remains to observe that s = k. To this end, recall that when i = s − 1, the relation (10) holds for A_ij ≠ 0 only if j = s. Thus 1 + n_{s-1} − n_s = 0, and since n_{s-1} = s − 2 and n_s = k − 1, we obtain s = k. The proof is complete.  ■

Perron's original results of 1907 can now be stated as a corollary to Theorems 2 and 15.3.1.

Corollary 2. A positive square matrix A has a real and positive eigenvalue r that has algebraic multiplicity 1 and exceeds the moduli of all other eigenvalues of A. To the eigenvalue r corresponds a positive right eigenvector.

PROOF. The last part of Theorem 2 implies that if A > 0 we must have k = 1, and hence r exceeds the moduli of all other eigenvalues of A.  ■
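The rotational symmetry described in Theorem 2, and the contrast with Perron's case k = 1, are easy to see on small examples. A minimal sketch, assuming NumPy; both matrices are arbitrary illustrations, not taken from the text.

```python
import numpy as np

C = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])               # irreducible with k = 3 (cyclic pattern)
print(np.round(np.linalg.eigvals(C), 3))   # the three cube roots of unity: rotation by 2*pi/3

P = np.array([[1., 2.],
              [3., 4.]])                   # positive matrix, so k = 1 (Corollary 2)
print(np.abs(np.linalg.eigvals(P)))        # exactly one eigenvalue attains the spectral radius
```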
15.5 Reducible Matrices

We now consider the possibility of removing the assumption that A is irreducible from the hypothesis of Theorem 15.3.1. Conclusions (a) and (b) of that theorem yield results for reducible matrices because such a matrix can be thought of as the limit of a sequence of positive (and hence irreducible) matrices. Our conclusions will then follow from the continuous dependence of the dominant eigenvalue (and its normalized right eigenvector) on the matrix elements.

In contrast to this, we know from our experience with perturbation theory in Chapter 11 that the structure of a matrix (as determined by its elementary divisors) does not depend continuously on the matrix elements. For this reason, conclusion (c) of Theorem 15.3.1 yields no useful results in the reducible case.

Theorem 1. If the matrix A ∈ ℝ^{n×n} is nonnegative, then

(a) The matrix A has a real eigenvalue, r, equal to the spectral radius of A;
(b) There is a nonnegative (right) eigenvector associated with the eigenvalue r.

PROOF. Define the matrix D(t) ∈ ℝ^{n×n} by its elements d_ij(t) as follows:

    d_ij(t) = a_ij   if a_ij > 0,
              t      if a_ij = 0.

Then D(t) > 0 for t > 0 and D(0) = A. Let ρ(t) be the maximal real eigenvalue of D(t) for t ≥ 0. Then all the eigenvalues of D(t) are continuous functions of t (Section 11.3) and, since ρ(t) is equal to the spectral radius of D(t) for t > 0, lim_{t→0+} ρ(t) = r is an eigenvalue of D(0) = A. Furthermore, this eigenvalue is equal to the spectral radius of A.

We also know (see Exercise 11.7.6) that there is a right eigenvector x(t) that can be associated with ρ(t) and that depends continuously on t for t ≥ 0. By Theorem 15.3.1, x(t) > 0 for t > 0 and hence x(t) ≥ 0 at t = 0.  ■

Exercise 1. What happens if we try to extend the results of Theorem 15.4.2 to reducible matrices by means of a continuity argument?

Exercise 2. If A ∈ ℝ^{n×n} is nonnegative and σ_j = Σ_{k=1}^n a_{jk}, prove that

    min_j σ_j ≤ r ≤ max_j σ_j.

Hint. Note Exercise 15.3.7 and apply a continuity argument.
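In the reducible case parts (a) and (b) of Theorem 1 survive, but the positivity of the eigenvector can be lost. The following minimal sketch (assuming NumPy; the matrix is an arbitrary reducible example) shows both effects on a 2 × 2 block-triangular matrix.

```python
import numpy as np

A = np.array([[2., 1.],
              [0., 1.]])             # reducible: block upper triangular
evals, evecs = np.linalg.eig(A)
k = int(np.argmax(evals.real))
print(evals[k])                      # (a): r = 2 is an eigenvalue and equals the spectral radius
print(np.abs(evecs[:, k]))           # (b): proportional to [1, 0]^T -- nonnegative but not positive
```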

Exercise3. Show that for any reducible matrix A e !R">l/l there exists a Exercise 1. Check that any positive matrix is primitive. I
permutation matrix P such that Exercise 2. Check that the index of imprimitivity of the matrix
Btt Bu ... B1Ic]
pTAP= ~ B 22 .. B~t ,
[ .. .. . .. ...
o ... 0 Bilk
(1)
A - (! ~ !l
is equal to 2. 0
where the square matrices BII , i = 1, 2, . , k, are either irreducible or 1 x 1
zero-matrices. 0 The index of imprimitivity is easily found (see Exercise2) if the character-
istic polynomial is known. Suppose that
The form (1) of A is often called a normalfarm of a reducible matrix.
c(A) = A" + atA"! + a2A"2 + .. , + a,A"' (1)
Exercise 4. Check that the matrix
is the characteristic polynomial of A E R" >l ", where ai' a2 , , tic are nonzero

l~ ~ ~ ~]
and
A = n > n1 > n2 > ... > n, ~ O.
1 020
If Ao is any eigenvalueand ro1 , , rot are the kth roots of unity, then we
220 1 saw in Theorem 15.3.1 that roJA.o, j = 1,2, ... , k, are all eigenvalues of A.
is reducible and show that its normal form is Thus, c(A) is divisible by the product
= A" - A~;
[~o l.i~i2 . _~] .
(;I; - rol.1.o)(A - ro2.1.o)'" (A - (1)".1. 0 )
B= ... without remainder. Hence, for some polynomial g and an integer s, we have
0 1
c(A) = g(At)A.
o 0:1 0
Comparing this with Eq. (I), we see that k is a common factor of the
Exercise 5. If A e R">l" and al" > 0 for J < k, prove that the eigenvalue differences n - nJ, j = 1,2, ... , t. Suppose that I is the greatest common
p of A with largest real part is real and has multiplicity 1. Prove also that divisor of these differences; then using the reverse argument we see that the
there is a positive (right) eigenvector of A associated with p. What can be said spectrum of A is invariant under rotations 21l/1. If k < I, then 21l/1 < 21/k
if we have only aJt ~ 0 for j < k? and the invariance under rotations through 21l/1 is not consistent with the
Hint. Consider A + 11.1 for large 11.. 0 definition of k. Hence k = 1and we have established the following fact.
'Proposldon 1. If A e Rft>lll is irreducible and nonnegative, and c(A) written
in theform (1) is the characteristicpolynomialofA, then k, the index ofimprimi-
tivity of A, is the greatest commondivisor ofthe differences
15.6 Primitive and Imprimitive Matrices
n - n z, ... ,n - n,.
For example, if
The Perron-Frobenius theorems make it clear that the structure of a
nonnegative irreducible matrix A depends on the number k of eigenvalues c(A) = AtO + alA' + a2A., al' a2 < 0,
whose moduli are equal to the spectral radius of A. In particular, it is con- then k = 3. However, if
venient to distinguish between the cases k = 1and k > 1.Thus, an irreducible
nonnegative matrix is said to be primitive or imprimitive according as k - 1 c(A) = AI O + alA' + a2A + a3' ai' a 2 , a 3 < 0,
or k > 1. The number k is called the index ofimprimitivity. then k = 1.
The next result provides a rather different characterization of primitive Exercise 6. Check that the matrices
matrices.

k[~
0 1
Theorem I. A square nonnegative matrix A is primitive if and only if there
is a positive integer p such that A" > O.
7 0
2 8
and [0! t ifJ
0
PROOF. First suppose that AI' > 0 and hence that AI' is irreducible. We
have seen (Exercise 10.7.1) that A reducible would imply A" reducible, hence
1 1 iJ ito
are primitive. 0
AI' > 0 implies that A is irreducible. If A has index of imprimitivity k > 1,
then since the eigenvalues of A" are the pth powers of those of A, the matrix The num~r p such that A" > 0 for an n x n nonnegative primitive matrix
A" also has index of imprimitivity k > 1. But this contradicts Perron's A can be estimated as follows:
theorem applied to the positive matrix A". Hence k = 1 and A is primitive.
    p ≤ n² − 2n + 2.                (3)
Conversely, we suppose A to be primitive; hence A has a dominant real
eigenvalue r of multiplicity 1 (Theorems 15.3.1 and 15.4.2). Using Theorem Since p =n 2
- 2n +2 for the matrix
9.5.1 and writing r = AI' we have 010 o
I mr-l
001
A" = r"ZID +L L h,)Z,),
1-2 )-0 A= AeR/I"/I,
where f(A) = A" and h,) is the jth derivative of f at A" We easily deduce o 1
from this equation that, since r is the dominant eigenvalue, 1 1 0 0
. A" the estimation (3) is exact. We omit the proof.
lim p := ZID' (2)
""'rIO r
From Exercise 9.9.7 we also have 15.7 Stochastic Matrices
C(r)
ZID = m(I)(r)' The ~eal matrix P = [PljJi.J-t is said to be a stochastic matrix if P ~ 0
and aU Its row sums are equal to 1:
and the minimal polynomial m is given by /I
LPlJ = 1, i = 1,2, ... , n. (1)
meA) = (J. - r) n(.\ - A,)m,.
I
j= I

1"'2 Matri.ces of thi~ kind .arise in problems of probability theory, which we


Since r is the largest real zero of m(A.) and m(A.) is monic, it follows that shall introduce m Section 15.8. In this section we derive some of the more
m(ll(r) > O. Then since' C(r) > 0 (Exercise 15.3.10), we have ZID > O.Itnow important properties of stochastic matrices.
follows from Eq. (2) that A" > 0 from some p onward. ~eorem I. A nonnegative matrix P is stochastic if and only if it has the
Exercise 3. If A is primitive and p is a positive integer, prove that A" is eiqenualue 1 with (right) eigenvector given by u = [1 1 '" l]T. Further-
more, the spectral radius ofa stochastic matrix is 1.
primitive.
PROOF. If P is stochastic, then the condition (1) may obviously be written
Exercise 4. If A ~ 0 is irreducible and 11 > 0, prove by examining eigen-
values that el + A is primitive. Compare this with Proposition 15.3.1.
Pu = u. Hence 1 is an eigenvalue with (right) eigenvector ".
Conversely, Pu = u implies Eq. (1) and hence that P is stochastic. .
Exercise 5. Check that the first matrix in Exercise 15.1.6fails to be primitive For the last part of the theorem we use Exercise 15.3.7 to show that the
for any n > 1. dominant real eigenvalue of P is 1. '
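Theorem 1 of Section 15.6 turns primitivity into a finite computation: a nonnegative matrix is primitive exactly when some power A^p is positive, and the bound p ≤ n² − 2n + 2 tells us how far we need to look. A minimal sketch, assuming NumPy; the helper name is ours, the 2 × 2 cyclic matrix is an arbitrary imprimitive example, and the second matrix is the 3 × 3 instance of the matrix noted in the text as attaining the bound.

```python
import numpy as np

def is_primitive(A):
    """Nonnegative A is primitive iff A^p > 0 for some p <= n*n - 2*n + 2."""
    n = A.shape[0]
    P = np.eye(n)
    for _ in range(n * n - 2 * n + 2):
        P = P @ A
        if np.all(P > 0):
            return True
    return False

cyclic = np.array([[0., 1.],              # irreducible but imprimitive (k = 2)
                   [1., 0.]])
wielandt = np.array([[0., 1., 0.],        # primitive; needs the full p = n^2 - 2n + 2 = 5
                     [0., 0., 1.],
                     [1., 1., 0.]])
print(is_primitive(cyclic), is_primitive(wielandt))   # False True
```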

In the next theorem we see that a wide class of nonnegative matrices can For p sufficiently large we have
be reduced to stochastic matrices by a simple transformation. I' _ p! A -j+1
JI,J - (p _ j + I)! f , (3)
Theorem 2. Let A be an n x n nonnegative matrix with maximalreal eigen-
value r. If there is a positive (right) eigenvector x = [Xl X2 Xn]T and since 1,1.,1 < 1 for 1= k + 1, ... , s, it follows that the last term in Eq,
associatedwith r and we write X= diag[xl' ... , xJ, then (2) approaches the zero-matrix as P -+ 00. If we now assume that P is primitive,
A = rXPX-l, then the second term on the right of Eq. (2) does not appear. Hence it is
clear that P'" exists and, furthermore, P'" = ZlO'
where P is a stochastic matrix. Conversely, suppose it is known that P'" exists; then we deduce from
Eq. (2) that
Note that some nonnegative matrices are excluded from Theorem 2 by the i: Ift,-l
hypothesis that x is positive (see Theorem 15.5.1).
Q = lim L L Ji,jZ'J
PROOF. Let P = r- 1X- 1AX. We have only to prove that P is stochastic. 1'""'''' '=2 J=O
Since Ax = rx we have exists. The m2 + m3 + ... + mt matrices ZIj appearing in this summation
n
c
are linearly independent members of n x n (Theorem 9.5.1) and generate
L alJxl = rx" i = 1,2, ... , n. a subspace 9i say, of cn x n Since Qmust also belong to 9i there are numbers
J~l a'j such that
'f j

By definition, P,j = r-lx/1a'jxl and hence


n n
'"
t.. PiJ = r -1 Xi-l~
t.. a'jXj = 1. It now follows that, for each I and j,
Jm 1 jml

Thus, P is stochastic and the theorem is proved. a'l = lim Ji.J


1'""''''
Finally, we consider the sequence of nonnegative matrices P, r, P 3
, ,
But for 1= 2,3, ... , we havelA,1 = 1 (A, #= 1) and it follows from Eq. (3)
where P is stochastic. We are interested in conditions under which that the limit on the right does not exist Thus, we have a contradiction that
P'" = limP" can be resolved only if the existence of P'" implies k = 1. .
It should be noted that the hypothesis that P be irreducible is notnecessary.
exists. We already know that P'" #= 0, for P has an eigenvalue equal to 1 It can be proved that the eigenvalue 1 of P has only linear elementary
and p r -+ 0 as r -+ 00 if and only if all the eigenvalues of P are less than 1 divisors even when P is reducible. Thus, Eq. (2) is still valid and the above
(Exercise 11.1.8). argument goes through. The proof of the more general result requires a
deeper analysis of reducible nonnegative matrices than we have chosen to
Theorem 3. If P is an irreducible stochastic matrix, then the matrix P'"
present,
= limp"",,,, pI' exists if and only if P is primitive.
Note also that, using Exercise 9.9.7, we may write
PROOF. We know that the maximal real eigenvalue I, of P is equal to 1. .., C(1)
Recall that, since P is irreducible, the eigenvalue 1 is unrepeated (Theorem P = m(1)(I)'
15.3.1(c. So let ,t2, ... ,,tt be the other eigenvalues of P with modulus 1
and let At+ l' ... , ,t" be the distinct eigenvalues of P with modulus less than 1. where C(,t) and m(A) are the reduced adjoint and the minimal polynomial
Wewritef(,t) = AI' and Ji.J for thejth derivative oflaU"and using Theorem =
of P, respectively. If B()") adj(IA - P) and c(A) is the characteristic poly-
9.5.1 again we have nomial of P, then the polynomial identity
B(A) q)..)
(2)
C(A) = m(J.) \\
~
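Theorem 3 above says that, for an irreducible stochastic matrix P, the powers P^p converge exactly when P is primitive, and the limit can be identified with the outer product of the right eigenvector [1 1 ⋯ 1]^T and the suitably normalized left eigenvector for the eigenvalue 1 (this representation appears as Eq. (4) below). A minimal sketch, assuming NumPy; the stochastic matrix is an arbitrary primitive example, not one from the text.

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],       # row sums 1: stochastic; irreducible and primitive
              [0.2, 0.3, 0.5],
              [0.4, 0.0, 0.6]])

P_inf = np.linalg.matrix_power(P, 200)        # P^p for large p

evals, evecs = np.linalg.eig(P.T)             # left eigenvectors of P
y = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
y = y / y.sum()                               # normalize so that y^T [1 ... 1]^T = 1
print(np.allclose(P_inf, np.ones((3, 1)) @ y.reshape(1, 3)))   # True: the limit is x y^T with x = ones
```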
implies that we may also write of being in state s, at time t.. given that the system is in state Sj at time t.-t.
The laws of probability then imply \.
OD B(1)
P = c(t)(l)' II

Yet another alternative arises if we identify Zto with the constituent


p,(t.) = L quCl.1 t.-t)PP.-l)'
Jel
matrix G1 of Exercise 4.14.3. Thus, in the case in which P is irreducible, or, in matrix-vector form,
there are positive right and left eigenvectors x, " respectively,of P associated
with the eigenvalue 1 such that p, = Q(t.. t,-t)P.-l' r == 1,2" .. , (2)
,TX =1 and P"" = X,T. (4) where we define
Exercise 1. A nonnegative matrix P e Rille II is called doubly stochastic if P' = [Pt(t.) P2(t.) ... PII(t.)Y, S = 0, 1,2,.,.,
both P and p T are stochastic matrices. Check that the matrices
and Q(t.. t.- 1) to be the matrix in R"le" with elements q,tt.ltr-t), called the

PI
i !
= [.0 ! !
0] and P;a = fi i 0**] transition matrix.
Since the system must go to one of the n states at time t; after being.in
i 0 i [o 1 i state j at time t.: 1t it follows that i
.1 I

II
are stochastic and doubly stochastic matrices, respectively,and that P'f and
Pi exist. 0
L q,;(t.lt.-t) = 1. (3)
'=1
Finally, it is assumed that the initial state of the system is known. Thus, jf
the initial state is S, then we have Po = '" the vector with 1 in the ith place
15.8 Markov Chains and zeros elsewhere.We deduce from Eq. (2) that at time t, we have I!
P. = Q(t.. t.- 1 )Q(t. - 10 t.- 2 ) , 2(tt, to)Po' (4)
Nonnegative matrices arise in a variety of physical problems. We sha;ll The process we have just described is known as a Markov chain. If the
briefly present a problem of probability theory that utilizes some of the conditional probabilities qitt.lt.-t) are independent of t or are the same at
theorems developed earlier in the chapter. each instant to, t lt t a" . , then the: process is described as a homogeneous
For simplici~y, we consider a physical system whose state, or condition, Markov chain. We shall confine our attention to this relatively simple
or position, can be described in n mutuany exclusive and exhaustive ways. problem. Observe that we may now write
We let SI' Sz, . SII denote these states. We suppose it possible to examine
the system at times to < t 1 < t z < ' . , and to observe the state of the system Q(t.. t.- 1 ) = Q, r = 1,2, ....
at each of these times. We suppose further that there exist absolute proba- Equation (4) becomes
bilities PI(t.) of finding the system in state s, at time t. for i = I, 2, ... , n
and r = 0, I, 2, .... Since the system must be in one of the n states at time P. = Q'po,
I we have and Eq, (3) implies that the transition matrix Q associated with a homog-
II
enous Markov chain is such that QT is stochastic. For convenience, we define
L p,(t.) = I, r = 0,1,2, .... (1) P= QT,
'=1 In many problems involving Markov chains, we are interested in the
It is assumed that we can determine the probability that state Si is attained limiting behavior of the sequence Po, PI' Pz", This leads us to examine I'
I:
at time t, after having been in state 5J at time t.- 1, for all i andj. Thus, we the existence of Jim....", Q', or lim......, P". If P (or Q) is irreducible, the9 I

assume a knowledge of the conditional probability according to Theorem 15.7.3, the limits exist if and only if P is primitive.
Q,tt.lt.- 1 ) , i,j = 1,2"", n, Since P = QT it follows that Q has an eigenvalue I, and the associated right

and left eigenvectors x, 1 of Eq. (15.7.4) are left and right eigenvectors of Q,
respectively. Thus, there are positive right and left eigenvectors e
(-1)
and, (=x) associated with the eigenvalue 1 of Q for which ,l~ = 1 and APPENDIX 1
QGO = ~"T.
We then have
P", = '('IT Po) = (,T Po)~, A Survey of Scalar
since"T Po is a scalar. So, whatever the starting vector Po, the limiting vector Polynomials
is proportional to ~. However, Pr satisfies the condition (1) for every r
and so Po satisfies the same condition. Thus, if we define the positive vector
=
~ = ['1 '2 ... 'n]T so that ~:r.=1 " = 1, then p.., ~ for every possible
choice of Po.
e
The fact that > 0 means that it is possible to end up in any of the n
states Sl' 82' , Sn' We have proved the following result.
Theorem 1. If Q is the transition matrix of a homogeneous Markov chain
and Q is irreducible, then the limiting absolute probabilities PI(t..,} exist if
and only if Q is primitive. Furthermore, when Q is primitive, p,(t..,) > 0 for The expression
i = 1, 2, .. , n and these probabilities are the components of the (properly
normalized) right eigenvector of Q associated with the eigenvalue 1. P(A) = Po + PIA + P2,t2 + ... + PIAl, (1)
with coefficients p_0, p_1, ..., p_l in a field ℱ and p_l ≠ 0, is referred to as a polynomial of degree l over ℱ. Nonzero scalars from ℱ are considered polynomials of zero degree, while p(λ) ≡ 0 is written for polynomials with all zero coefficients.

In this appendix we give a brief survey of some of the fundamental properties of such polynomials. An accessible and complete account can be found in Chapter 6 of "The Theory of Matrices" by Sam Perlis (Addison-Wesley, 1958), for example.

Theorem 1. For any two polynomials p(λ) and q(λ) over ℱ (with q(λ) ≢ 0), there exist polynomials d(λ) and r(λ) over ℱ such that

    p(λ) = q(λ)d(λ) + r(λ),                (2)

where deg r(λ) < deg q(λ) or r(λ) ≡ 0. The polynomials d(λ) and r(λ) satisfying Eq. (2) are uniquely defined.

The polynomial d(λ) in Eq. (2) is called the quotient on division of p(λ) by q(λ) and r(λ) is the remainder of this division. If the remainder r(λ) in Eq. (2) is the zero polynomial, then d(λ) (or q(λ)) is referred to as a divisor of p(λ). In this event, we also say that d(λ) (or q(λ)) divides p(λ), or that p(λ) is divisible by d(λ). A divisor of degree 1 is called linear.
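The quotient and remainder of Theorem 1 can be computed mechanically; the next exercise asks for them when p(λ) = λ³ + λ + 2 and q(λ) = λ² + λ + 1. A minimal sketch, assuming NumPy's polynomial module (coefficients are listed from the constant term upward).

```python
import numpy as np
from numpy.polynomial import polynomial as Pnm

p = [2., 1., 0., 1.]        # p(λ) = λ^3 + λ + 2   (constant term first)
q = [1., 1., 1.]            # q(λ) = λ^2 + λ + 1
d, r = Pnm.polydiv(p, q)
print(d)                    # [-1.  1.]  i.e. d(λ) = λ - 1
print(r)                    # [ 3.  1.]  i.e. r(λ) = λ + 3
```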

Exercise 1. Find the quotient and the remainder on division of P(A) Theorem 4. Any polynomial over C of nonzero degree has at least one zero
== A:J + A + 2 by q(l) == ,1,2 + A + 1. in C. Ii;
Answer. deAl == 1 - 1, r(l) == 1 + 3. Dividing peA) by (1 - AI) with the use of Theorem 1, the next exercise is
easily established.
Exercise 2. Show that P(A) divides q(A) and simultaneously q(A) divides
P(A) if and only if Pel) == cq(l) for some c e:F. 0 Exercise 5. Show that Al is a zero of p(A) if and only if A - Al is a (linear)
divisor of peA). In other words, for some polynomial Pt<-t) over fF,
Given polynomials Pl(A), Pz(A), ... , Pk(A), q(.t) over " the polynomial
q(.t) is caned a common divisor of PiCA) (i == 1,2, ... , k) if it divides each of p(.t) = Pl(,t)(,t - .t1) . 0 (4)
the polynomials Pi(.!)' A greatestcommon divisor deAl of Pi(.t) (i = 1, 2, ... , k), Let A.1 be a zero of P(A). It may happen that, along with A- AI' higher
abbreviated g.c.d.{p,(A)}, is defined to be a common divisor of these poly- powers, (A - A1)2, (A - AI)3, ... , are also divisors of peA). The scalar Al e!F
nomials that is divisible by any other common divisor. To avoid unnecessary is referred to as a zero of P(A) of multiplicity k if (A - AI'! divides p(A) but
complications, we shall assume in the remainder of this section (if not (A - AI)k+ 1 does not. Zeros of multiplicity 1 are said to be simple. Other
indicated otherwise) that the leading coefficient, that is, the coefficient of the zeros are called multiple zeros of the polynomial.
highest power of the polynomial being considered, is equal to 1. Such
polynomials are said to be monic. Exercise 6. Check that 1 1 is a zero of multiplicity k of PeA) if and only if
p(A 1 ) = p'(At) = ... == p("-I)(ll) = 0 but Ik)(AI) :F O. 0
Theorem Z. For any set of nonzero polynomials over fF, there exists a Consider again the equality (4) and suppose pl(l) has nonzero degree.
uniquegreatest common divisor. By virtue of Theorem 4 the polynomial PI(A) has a zero A2 Using Exercise 5,
This result follows immediately from the definition of a greatest common we obtain P(A) = P2(A)(A - A2)(A - .1 1) , Proceeding to factorize Pz(A) and
continuing this process, the following theorem is obtained.
divisor. If the greatest common divisor oCa set of polynomials is I, then they
are said to be relativelyprime. Theorem 5. Any polynomial over C of degree I (I ~ 1) has exactly I zeros
At, A2. '" ,A, in C (which may include repetitions). Furthermore, there isa
Exercise J. Check that the polynomials P(A) = "iz + 1 and q(A) = 1 + 13
representation
are relatively prime.
p(l) = pl.,l - A,)(.1. - 1,-1) ... (A - At), (5)
Exercise 4. Construct a set of three relatively prime polynomials, no two
where P, = 1 in the caseofa monicpolynomial p(A).
of which are relatively prime. 0
Collecting the equal zeros in Eq. (5) and renumbering them, we obtain
Theorem 3. If deAl is the greatest common divisor of Pl(A) and P2(A), then
there exist polynomials cI(A)and c2(A) such that p(A) = PAA - 1 1)"'(1 - Az)kZ (A - As'!-, (6)
+ k 2 + ... + k. == I. j.
cl(l)pl(l) + c2(A)pz{A) = del). where 1, "" AJ (i "" j) and k l
I
In particular, ifPl(.!) and P2(A) are relatively prime, then Given PI(A), P2(A), ... , p,,(A), the polynomial peA) is said to be a common
multipleof the given polynomials if each of them divides P(A):
CI(A)Pl(A) + c2(A)P2(A) == 1 (3)
p(A) == qi(A)pi(A), 1 sis k; (7)
for some polynomials cl(A) and c2(A).
for some polynomials q;(A) over the same field. A common multiple that is a
This important theorem requires more proof. The argument is omitted divisor of any other common multiple is called a least common multiple of J i
in the interest of brevity. the polynomials.
Let Al e:F. If p(A 1 ) = E-o
p,A~ == 0, then Al is a zero of PeA) or a root
Theorem 6. For any set of monic polynomials, there exists a unique monic
of the equation PeA) == O. The fundamental theorem of algebra concerns
the zeros of a polynomial over C. leastcommon multiple.

Exercise 7. Show that the product n~=l PI(A) is a least common multiple
of the polynomials PI(A), p:z(A), , Pk(A) if and only if any two ofthe given
polynomials are relatively prime. 0 APPENDIX 2
Exercise 7 can now be used to complete the next exercise.
Exercise 8. Check that if peA) is the least common multiple of PI(A),
p:z{A), . , PleA), then the polynomials q~A) (i == 1, 2, ... , k) in Eq. (7) are Some Theorems and Notions
relatively prime. 0
Note that the condition in Exercise 7 is stronger than requiring that the
from Analysis
set of polynomials be relatively prime.
Exercise 9. Prove that if peA) is the least common multiple of pIOL),
P2(A), . , Pk-1(..1.), then the least common multiple of PI(A), Pz(A), ... , pkCA)
\ '
coincides with the least common multiple of jj(A) and PleA). 0
A field !F that contains the zeros of every polynomial over !F is said to be
algebraically closed. Thus Theorem 4 asserts, in particular, that the field C
I
of complex numbers is algebraically closed. Note that the polynomial
peA) = ..1.2 + 1 viewed as a polynomial over the field IR of real numbers In this appendix we discuss without proof some important theorems of
shows that IR fails to be algebraically closed. analysis that are used several times in the main body ofthe book. We present
. If peA) is a polynomial over !F of degree I, and 1 is the only monic poly- the results in a setting sufficiently general for our applications, although they
nomial over !F of degree less than I that divides PeA), then peA) is said to be can be proved just as readily in a more general context. We first need some
irreducible over :F. Otherwise, pel) is reducible over iF. Obviously, in view definitions.
of Exercise 5, every polynomial of degree greater than lover an algebraically Recall that in en any vector norm may be used as a measure of magnitude
closed field is reducible. In fact, if peA) is a monic polynomial over an alge- of distance (Section 10.2). Given a vector norm 1/ 1/ on C" and a real number
braically closed field !F and PeA) has degree I, then peA) has a factorization 8 > 0, we define a neighbourhood .AI;(xo) of Xo E en to be the set of vectors
p(A) = n~= I (A - Aj) into linear factors, where ..1.1> . 1.2 , , A,E!F and are (also called pointsin this context) x in C" for which IIx - xol/ < 8 (that is, it is
not necessarily distinct. If the field fails to be algebraically closed, then the interior of a ball with centre Xo and radius 6; see Section 10.2). If f/' is a
polynomials of degree higher than 1 may be irreducible. set of points of en, then x e en is a limitpoint of f/' if every neighbourhood of x
Exercise 10. Check that the polynomial peA) = A2 + 1 is irreducible over contains some point 1 e f/' with 1 "" x. The set 9' is said to be open if it is a
IR and over the field of rational numbers but is reducible over C. 0 union of neighbourhoods, and closed if every limit point of f/' is in f/. A
set f/' is bountted if there is a real number K ~ 0 and an x e C" such that
Note that all real polynomials of degree greater than 2 are reducible over 111 - xl/ < K for all 1 e f/.
IR, while for any positive integer I there exists a polynomial of degree I with Clearly, these definitions can be applied to sets in e or in IR. We have
rational coefficients that is irreducible over the field of rational numbers. only to replace the norm by the absolute value function. We can now state
the first result.

Theorem 1. Let X be a closed and bounded set of real numbers and let M
and m be the least upper bound and greatest lower bound, respectively, of the
numbers x in X. Then M, m ∈ X.

The proof is a straightforward deduction from the definitions.


Now consider a function I with values in C'" and defined on an open (c) With
set f/ s;; CII. We say that / is continuous at a point x e f/ if, given Il > 0,
there is a ~ > 0 such that y e .K,,(x) implies lex) = {~: Os x < 1,
x = I,
II/(Y) - /(x)1I < e.
we have f/ = [0, 1], which is closed and bounded, but / is not continuous
We say that I is continuous on f/ if I is continuous at every point of f/. on f/. Nowf(sP) = [0, 1] and SUPxE.9' I(x) = 1 is not attained. 0
If f/ is closed, then I is continuous on f/ if it is continuous on some open set
A set sP in the complex plane C is said to be convex if for any OCIo OC2 EsP
containing f/. The range I(f/) of the function I is a subset of C'" defined by
and any 8 E [0,1],
saying that s e/(f/) if and only if there is a vector 6 ECII such that" = f(6).
(1 - 9)1 + 9OC2 EsP.
Theorem2. Let/ be defined as above. 1/f/ is a closed andbounded set in CIt
andfis continuous on.Y, then/(f/) is a closed and bounded subset o/C"'. The convex set that contains the points eX l' /X 2 , ,OCt and is included in
any other convex set containing them is called the convex hull of oc 1 , OC 2,
In Theorem 2 and the next theorem we may replace CII by IR" or C'" , /X1t;. In other words, the convex hull is the minimal convex set containing
by R"', or both. The next theorem is obtained by combining Theorems 1 the given points.
and 2.
Exercise 2. Prove that the convex hull of 0110 /X2' , Ot:t in C is the set of
Theorem3. Let / be a continuous real-valued function defined on a closed all linear combinations of the form
and bounded subset sPo/CII.If It;

M = sup f(x), m = inf /(x), 9, ~ 0 and L 9, = 1.


1=1
(1)
ItE.9' ItE.9'
(Such sums are called convex linear combinations.)
then thereare vectors Y, % e sPsuch that
Hint. First show that the set of elements given by (1) is convex, and then
M =f(y), m= fez).
check its minimality (in the previous sense) by use of an induction argument.
In this statement SUPn.9' /(x) is read, "the supremum, or least upper bound, ~ 0
ofthe numbers j'(z), where x e fI'." Similarly, infn.9' f(x) denotes an infimum The notion of convexity can be extended to linear spaces in a natural
or greatest lower bound. way. Thus, let 11' be a linear space over :IF (where :IF is IR or C). The line
Theorem 2 tells us that, under the hypotheses of Theorem 3, fey) is a segment joining members x and y of 11' is the set of elements of the form
closed and bounded set of real numbers and the result then follows from (1 - 9)x + (Jy where 9 e [0, 1]. Then a subset Y of!l' is said to be convex if,
Theorem 1. It may be said that, under the hypotheses of Theorem 3, f has for each pair x and yin .Y, the line segment joining x and y is also in sP. For
a maximum and a minimum on sP and we may write example, it is easily seen that any subspace of!l' is convex. iI
sup f(x) = maxf(x), inf lex) = min/ex). Exercise 3. Show that any ball in CIt is a convex set.
xe.9' ",,,.9' x".9' lte.9'

We now show in the following example that the three conditions of Exercise 4. Let A e C"''''' and be C"'. Show that the solution set of Ax = b
Theorem 3-j continuous on.Y, sPbounded, and Y closed -are all necessary. is a convex set (when it is not empty). 0

Exerdse 1. (a) Let lex) = x, and let f/ be the open interval (0, 1).
Here f/ is bounded but not closed and / is continuous on sP. Clearly M = 1,
m = 0, and f does not attain these values on sP.
(b) With f(x) = l/x and f/ the set of all real numbers not.less than I,
we have / continuous on f/ and sP closed but unbo~nded. In ~hls case j(Y)
is the half-open interval (0, 1] and inf"",.9' f(x) = 0 IS not attained.

Gantmacber, F. R. The Theoryof Matrices, vols. 1 and2. New York: Chelsea, 1959. [1-15]
Glazman, 1. M., and Liubich, J. 1. Finite-Dimensional Linear Analysis. Cambridge, Mass.:
M.I.T. Press, 1974. [3-10]
APPENDIX 3 Gohberg.L, Lancaster, P., and Rodman, L. Matrix Polynomials. New York: AcademicPress,
1982. [14] .
Golub, G. H., and Van Loan, C. F. Matrix Computations. Baltimore, Md.: Johns Hopkins Univ. Press, 1983. [3, 4, 10]
Halmos, P. R. Finite-Dimensional Vector Spaces. New York: Van Nostrand, 1958. [3-6]
Suggestions for Further Hamburger, M. L., and Grimshaw, M. E. Linear Transformations in n-Dtmensional Vector
, Spaces. London: Cambridge Univ., 1956. [3-6]
Reading Heinig, G., and Rost, K. Algebraic Methodsfor Toeplitz-like Matrices and Operators. Berlin:
Akademie Verlag, 1984. [13].
Householder, A. S. The Theoryof Matrices in Numerical Analysis. Boston: Ginn (Blaisdell),
1964. [10, 15]
Kato, T. Pelturbation Theory for Linear Operotors. Berlin: Springer-Yerlag, 1966 (2nd ed.,
1976). [11]
Krein, M. G., and Naimark, M. A. "The Method of Symmetricand Hermitian Forms in the
Theory of the Separation of the Roots of Algebraic Equations." Lin. and Multi/in. Alg. 10
(1981):265-308. [13]
Lancaster, P. Lambda Matrices and Vibrating Systems. Oxford: Pergamon Press, 1966. [14]
Lerer, L., and Tismenetsky, M. "The Bezoutian and the Eigenvalue-SeparationProblem for
Matrix Polynomials." Integral Eq. and Operator Theory 5 (1982): 386-445. [13]
In this appendix, references to several books and papers are collected MacDuffee, C. C. The Theory ofMatrices. New York: Chelsea, 1946. [12]
that cast more light on the subject matter of the book. They may extend the Marcus, M., and Mine, H. A Survey 01 Matrix TheoryandMatrix Inequalities. Boston: Allyn
(finite-dimensional) theory further, or go more deeply into it, or contain and Bacon, 1964. [1-10, 11, 15]
interesting applications. Some of them have also provided sources of in- Mirsky, L. An Introduction to LinearAlgebra. Oxford: Oxford Univ. (Clarendon), 1955. [1-7]
Ortega, J. M., and Rheinboldt, W. C. Iterative Solution of Nonli'near Equations in Several
formation and stimulation for the authors. Variables. New York: AcademicPress, 1970. [15]
The list is prepared alphabetically by author, and the numbers in square Rellic:h, .P. Perturbation The.ory of EigenlJalue Problems. New York: Gordon and Breach,
brackets after each reference indicate the chapters of this book to which the 1969. [11]
reference is mainly related. Russell, D. L. Mathematics ofFinite-Dimensional Control Systems. New York: Marcel Dekker,
1979. [9]
Varga, R. S. Matrix /lerativeAnalysis. EnglewoodCliffs,N.J.: Prentice-Hall, 1962. [15]
Wilkinson,J. H. The Algebraic EigenlJalue Problem. Oxford: Oxford Univ. (Clarendon), 1965.
Barnett, S. Polynomials andLinearControlSystems. New York: Marcel Dekker, 1983. [13]
Barnett, S., and Storey, C. Matrix Methods in Stability Theory. London: Nelson, 1970. [13] [10] I'
! .
Wonham, W. M. Linear Multiuariable Control: A Geometric Approach. Berlin: Springer-
Baumgirtel, H. Endlich-dimenslollQ/e Analytishe StlJrungstheory. Berlin: Acad~mie-Verlag,
Verlag. 1979. [9]
1972.[11] To appear in English translation by Birkhii.user Verlag.
Bellman, R.Introduction to Matrix Analysis. New York: McGraw-HiH, 1960. [1-15]
Ben-Israel, A., and Greville, T. N. E. Generalized Inverses: Theory and Applications. New I
York: Wiley, 1974. [IZ]
Berman, A., and Plemmons, R. J. NonnegatilJe Matrices in the Mathematical Sciences. New
York: Academic Press, 1979.[15]
I
Boullion, T., and Odell, P. Generalized Inverse Matrices. New York: Wiley (lnterseience),
1971.[11]
I;
CampbeII,S. L., and Meyer, C. D. Jr., Generalized InlJerses 01 LinearTransformations. London: I

Pitman, 1979.[12]
'II I
I
Daleckjj, J. L., and Krein, M. G. Stability of Solutions 01 Differential Equations in Banach
Space. Providence. R.ls.: TranS.Amer. Math. Soc. Monographs. vol. 43.1974.[1%,14]
Dunford, N., and Schwartz, J. T. LinearOperators, Part 1: General Theory. New York: Wiley
(lnterscience), 5, 9, 10]

Index

A Berman, A., 560


Bessel's inequality, 109
Abel's formula, 346
Bezout matrix, 454-466
Absolute norm, 367-370, 387
Bezoutian,454-466
Adjoint matrix, 42, 274 Bilinear, 104
reduced, 272
Bilinearforms, 202
Adjointtransformation, 169-174 Binary operation, 72
Admissible pairs, 212-217
closed,72
Admissible triple, for a matrix polynomial, Binet-Cauchy formula, 39, 97
494,519
Biorthogonal systems, 113, 154, 195
Algebra, of linear transformations, 121
Block matrix, 17, 55
Algebraic multiplicity, 159,239,279,349 Booleansum of projectors, 199
Algebraically closed field, 556 Borhardt; C., 487
Analyticperturbations, 391-405 Bouillon, T., 560
Analyticfunction, 331 Boundedset, 557
Angle, 106
Annihilating polynomial, 223-229
Appollonius'sTheorem, 107
Approximate solutionsof Ax = b, 436-438 c I:
Augmented matrix, 57
Campbell, S. L., 560
Canonical set of Jordan chains, 502 i l'

B Cardinal polynomials, 308


Carlson, D., 443
Ball, 355, 356, 557 Cartesiandecomposition, 179, 219
Barnettfactorization, ,458 Cauchy index, 482
Barnett, S., 560 theorem, 332
Basis, 83 Cauchy-Schwarz inequality, 42, 106, 114,
standard, 84 381
elements; 84 Cayley transform, 219
Bauer, R L., 368, 387 Cayley-Hamilton theorem, 165, 228, 252
Baumgiirtel, H., 560 Characteristic polynomial, 155-159, 228, 240,
Bellman, R., 560 271,490
Bendixson,I., 379 Chebyshev polynomial, 166
Ben-Israel,A., 560 Chen, C. T., 449

Cholesky factorization, 63, 180
Chrystal's theorem, 276
D Eigenvectors
Generalized eigenvector, 229
left, 154 order of, 229-231
Circulant matrix, 66, 146, 166 Daleckii, J. L., 445, 560
generalized, 66 linear independence of, ISO, 153 Generalized inverse, 428-438
Decomposable matrix, 273
Closed set, 557 of a linear transformation, 148 Geometric mUltiplicity, 159,239,279
Defect, of a linear transformation, 135-138,
Cofactor, 33, 42 of a matrix, 152 Gerigorin disks, 372
172
CoI,457 Elementary divisors, 266-271 Gerigorin tbeorem, 371-377
Definite matrix, 179
Degree of a function of a matrix, 313 Glazrnan, I. M., 561
Column space, 80, 93
linear and nonlinear, 266 Gohberg, I., 489, 561
Common divisor, 554 of a A.-matrix, 247
Elementary matrices, 48, 255 GolUb, G. H., 561
multiple, 555 of nilpotency, 12
"Elementary operations, 47, 253 Gram matrix, 110, 218
Commuting linear transformations, 143 Derivative of a matrix, 330
Equivalence relation, 52, 130, 184, 220, 255, Gramian, 110
Commuting matrices, 166,416-420 Derogatory matrix, 240
354,494 Gram-Schmidt process, 108
normal,420 Determinant, 23 et seq., 157, 184,373,379 Equivalence transformation on matrix Greatest common divisor, 460, 554
simple, 420 bordered, 65
polynomials, 255, 491 Greville, T. N. E., 560
Companion matrix, 32, 36, 68, 69, 157, 262, derivative of, 65
Equivalent matrices, 52 Grimshaw, M. E., 561
455 of a linear transformation, 141
Equivalent matrix polynomials, 255, 261, 279 Gundelfinger's criterion, for inertia of a
of a matrix polynomial, 280, 490, 493 Diagonal, of a matrix, 15
Equivalent norms, 353 Hermitian matrix. 298, 481, 482
second, 461, 490, 493, 497 main, 2
Euclidean norm, 351, 352, 358, 378
Compatible norms, 360, 363, 367 secondary, 2
Euclidean space, 105
Complementary SUbspace, 91 Diagonal matrix, 2 H
Exponential function, 308, 313, 323, 329,
orthogonal, III Diagonally dominant mauix, 373, 531 382
Complexification, 200 Difference equations, 277-278, 340, 512-516 Hadamard's inequality, 65
Extension
C;omponent matrices, 314-322, 332-334, Differential equations, 161-163,274-277, HaIm, W.,450
of a linear transformation, 142
347 334-340,348,506-612,525 Halmos, P. R., 561
of a nonnegative irreducible matrix, 535 Hamburger, M. L., 561
analyticityof,393-395 inhomogeneous, 162, 338-340, 348,
509-512 Hankel matrix, 66
Composition of ttansfonnations, 121
initial-value problem, 336-340, 510-512 Heinig, G., 561
Condition number, 385-387
matrix, 346, 348 F Hermite. C., 462
spectral, 386
steady-state solution, 340 . Hermite interpolation, 306
Conformable matrices, 9 Fibonacci matrix, 35
Dimension, 86 Hermitian forms, 203
Congruent matrices, 184-188,203 Field of values, 283-286
Direct sum Hermi.tian matrix, 3, 15, 63, 178-180,
Conjugate bilinear fonns, 202 Fischer, E., 288
of linear transformations, 144 184-188, 284-302
Consistem equations, 100 Flanders, H., 422
of matrices, 146 Hill, D. R., 452
Constituent matrix, 164 F1ockerzi, D., 537
Directed graph of a matrix, 528 Hirsch, A., 378
Constrained eigenvalues, 291-294 Fr&:tional powers of a matrix, 323
Distance, 354 Holder inequality, 359, 381
Constrained eigenvectors, 291-294 Fredholm alternative, 174,219
Divisor (right and left), of a matrix Holder norms, 35 I, 358, 381
Continuous function, 558 Fredholm tbeorem, 115
polynomial, 249-253, 5t8 Homogeneous equations. 61
Control function, 342 . Frobenius, G., 295, 299, 415, 481, 482, 538
Doubly stochastic matrix, 21 Householder, A. S., 387, 561
Controllable pair, 216, 448-450, 470 Fujiwara, M., 462
Dual norm, 381 Householder transformation, 190
Controllable subspace, 216, 342 Full rank, 96, Ill, 139, 140, 180
Dual system, 345 Hurwitz, A., 478
Controllable system, 342-345 Function defined on the spectrum, 305
Dunford, N., 560 Hurwitz matrix, 478
Convergence Function of-a matrix, 308
in a normed linear space, 356, 357 composite, 324
in C" , 315, 361 E Fundamental polynomials, 308 I
Convex hull, SS9
Convex linear combination, 559
Eigenbasis, lSI, 173 Idempotent matrix, n, 13. 133, 158, 164, i
orthonormal, 174, 190, 193 166, 194-199, 244, 279, 426
Convex set, 356 G
Eigenspace, 149, 159, 175 Image
Courant-Fischer theorem, 289, 300 Eigenvalue Gantmacher, F. R., 561
Courant, R., 288 of a linear lransformation, 133
dominant, 361 Gauss reduction, 49, 56 of a matrix, 77, 80
Cramer's rule, 24, 60
of a linear transformation, 148 Generalized eigenspace, 229. 232-236, 239 of a subspace, 86, 133
Cyclic subspace, 231 of a matrix, 152 order of, 229 of a transformation, 116
Imaginary part of a matrix, 179 Jordan decomPQsition, 236 Linear transformation, 117
Imprimitive matrices, 544-547 Jordan pair, 500-505 Neighborhood, 356, 557
Inconsistent equations, 56 invertible, 140
Jordan subspace, 230, 23\ Newton sums, 487
Indecomposable matrix, 273 Linearization, 492 et seq., 520, 524
Jordan triple, 500-505 Liouville's formula, 346 Nilpotent matrix, II, 133, 158,244
Index Nonderogatory matrix, 240, 273
Liubich, J. I., 561
of an eigenvalue, 226, 228, 240, 269 Nonnegative matrix, 527
of imprimitivity, 544 Logarithmic function, 312, 313, 317
K Nonsingular matrix, 43, 375
of stabilization, 214 Lower bound, 369, 370, 387 Norm
Kato, T., 400, 561 Lyapunov equation, 416, 443-450
Induced matrix norms, 363-367 euclidean, 351, 352, 358,378
Kemel Lyapunov stability criterion, 443
Inertia of a matrix, 186-188,296-300 Frobenius, 358
with respect to the real axis, 450 of a linear transformation, 135 Holder, 35 I, 358, 381
Inertia of a polynomial, 462-466 of a matrix, 77 infinity, 35 I, .352
with respect to the real axis, 462 Krein, M. G., 445, 560, 561 M of a linear transformation, 363
with respect to !be unit circle, 466 Kronecker delta, 79 of a matrix, 358
M-matrices, 531, 532
Inf,558 Kronecker product, 407-413, 438, 439 MacDuffee, C. C., 415, 561 of a vector, 105, 106, 350 et seq.
Inner product, 7, 104, 180, 186 eigenvalues of, 412 Marcus, M., 561 submultiplicative. 358
standard, 104 for rectangular matrices, 413, 438, 440 Markov, A. A., 476 Normal coordinates, 210
Integral of a matrix, 330 Kronecker sum, 412 Normal linear transformation, 174-177
Markov chains, 550-552
Interpolatory polynomials, 306-308 Markov criterion, 475-477, 486 Normal matrices, 174-177, 183,284, 365,
Invariant polynomials, of a matrix polynomial, Markov parameters, 474, 477, 478, 483 390
261 L Matrix, I analytic perturbation of, 404
Invariant subspace, 143, ISS, 172, 197, WI, Matrix equations, 413-424 commuting, 420
374 A-matrix, 246 et seq. See also Matrix Normal modes, 210
Matrix norm, 358, 381, 382
. trivial, 143 polynomial Normalized vector, 106
induced,363-367
Inverse L-C splitting of a polynomial, 470, 485 NOrmed linear space, 351
LU decomposition, 61 Matrix polynomial, 246 et seq., 489 et seq.
left, 45, 138 monic, 248 Nullspace of a matrix, 77
of a linear transformation, 141 Lagrange interpolating polynomial. 307
Metric, 355
of a matrix, 44, 53 Lancaster, P., 468, 489, 561
Meyer, C. D., 560
right, 45, 139 Langer, H., 521
Minc, H., 561 o
Invertible ).matrix, 247 Laplace's theorem, 37
Minimal polynomial, 224, 240, 245, 271-273 Observable pair, 214-217, 341.497
Invertible linear transformation, 140, 151 Latent roots, 265, 280, 281. 491
Minkowski inequality, 351 Observable system, 340
Invertible matrix, 44, 53, 425 Latent vectors, 280, 501
Minor of a matrix, 32, 36 Odell, P., 560
Involutory matrix, ..12, 44 Laurent expansion, 333 Mirsky, L., 561 Open set, 557
Leading principal minors, 36
Irreducible matrix, 374-377, 528-542 Monotone matrices, 531 Ortega, J. M., 530, 561
Leading vector of a Jordan chain, 501 ! \
Isomorphic spaces, 113, 124, 126 Monotonic norm, 367 Orthogonal complement, III, 195
Isomorphism of algebras, 126 Least squares solution, 436 Moore, E. H., 432 ,!
Left invertihle linear transformation, 138 Orthogonal matrix, 46, 175,201,219,346
Moore-Penrose inverse, 432-440 Orthogonal projection of a vector, 112
Left invertible matrices, 140, 425-427. 439
. Multiples of matrix polynomials, 518-520 Orthogonal projector, 195,380,431
J Legendre polynomials, 114
Multiplicity Orthogonal similarity, 175
Length of a vector, 4, 105, 106
Jacobi criteron, for inertia of a Hermitian of a lalent root, 279. See also Algebraic Orthogonal subspaces, III
Lerer, L. 561
matrix,296 multiplicity. Geometric multiplicity complementary, III
Uvy-Desplanques-Hadamard theorem, 373
Jacobi, G., 487 of a zero of a polynomial. 555 Orthogonal system, 107
Lienard-Cbipart criterion, 470-474, 455, 486
Jacobi identity, 346 Orthogonal vectors, 107
Limit point, 557
Jacobi matrix, 35 Line segment, 559 Orthonormal basis, 108
Jordan basis, 232, 234 N Orthonormal system, 108 I!
r
Linear combination, 78
Jordan block, 237, 244, 268, 274, 311 Ostrowski, A., 445
Linear dependence, 8 I Naimark, M. A., 561
Jordan canonical form, 237, 239, 270, 311 Outer product. 9
Linear hull, 79 Natural frequencies, 210
for real matrices, 242-243 Linear independence, 82 Natural normal form (first and second), 264,
Jordan chain, 230, 235 Linear space, 73 269 p !
for a matrix polynomial, SOl-50S, 519 !
finite dimensional, 84, 86 Nt;:gative definite matrix, 179 "!

length of, 230 Parallelogmm theorem, 107


infinite dimensional, 84, 86 Negative semidefinite matrix, 179
Parseval's relation, 109, 112, 113
Partition of a matrix, 16
Penrose, R., 432
Perlis, S., 553
Permutation matrix, 64
Perron, O., 538, 542
Perron-Frobenius theorem, 536
Perturbation coefficients, 397-399
Perturbation of eigenvalues, 387-405
Plemmons, R. J., 560
Polar decomposition, 190, 380, 439
Polynomial in a matrix, 19
Polynomials, 553-556
Position vector, 4, 72
Positive definite matrices, 179-182, 185, 218, 219, 309, 373, 382
  Kronecker product of, 413
Positive matrix, 527
Positive semidefinite matrices, 179-182, 185, 218
Power method, 361
Preimage of a vector, 137
Primitive matrices, 544-548
Principal minors, 36
Projectors, 164, 194-199, 315, 321, 333, 426, 430
  commuting, 199
Puiseux series, 392, 401
Pythagoras's theorem, 107

Q

QR decomposition, 111, 184
Quadratic forms, 203
Quotient (right and left), 249-253

R

Range of a function, 558
Range of a matrix, 77
Rank decomposition, 97, 433
Rank
  of a linear transformation, 133-136, 172
  of a matrix, 53 et seq., 93, 114, 115, 129
  of a matrix polynomial, 259-262
Rayleigh quotient, 282, 286-294, 363
Rayleigh theorem, 294
Reachable vector, 342, 345
Real part of a matrix, 179
Reduced adjoint, 272, 274, 334
Reducible matrices, 374-377, 543, 544
  normal form for, 544
Reducing subspaces, 197-198
Regular λ-matrix, 247, 259, 489
Relative perturbation, 383
Relatively prime polynomials, 554
Rellich, F., 400, 405, 561
Remainder (right and left), 249-253
Remainder theorem, 251-253
Representation
  of a linear transformation, 122, 140, 145
  of a set of vectors, 92, 95
  of a vector, 85
  standard, 123
  theorem for matrix polynomials, 516
Resolution of the identity, 315
Resolvent, 164, 315, 321, 322, 330-333, 505
  of a matrix polynomial, 493, 505
Resolvent condition, 386
Resonance, 212
Restriction of a linear transformation, 142
Resultant matrix, 460, 461
Rheinboldt, W. C., 530, 561
Right-invertible linear transformation, 139
Right-invertible matrices, 140, 425-427, 439
Rodman, L., 489, 561
Rost, K., 561
Rotation matrix, 455
Roth, W. E., 422
Routh-Hurwitz problem, 464-466
Routh-Hurwitz stability test, 480-482
Row echelon form, 50, 57
Row space, 80, 94
Russell, D. L., 561

S

Scalar matrix, 2, 133
Scalar multiplication, 6, 72
Scalar product, 7
Schneider, H., 445, 448
Schur-Cohn problem, 466-468
Schur complement, 46, 56
Schur, I., 165, 176, 377
Schwartz, J. T., 560
Schwarz matrix, 450
Segre characteristic, 241, 492
Semi-definite matrix, 179
Sequences of matrices, 325-327
Series of matrices, 327-329
Sherman-Morrison formula, 64
Shift operator, 277
Signature, 187
Similar admissible triples, 494
Similar matrices, 130-133, 175, 262-271
Simple linear transformation, 146, 151, 164, 173, 177
Simple matrices, 143, 146, 147, 153, 154, 160, 239, 271, 273, 419
  eigenvalues of, 387-390
  Kronecker product of, 413
Singular bases, 435, 437
Singular matrix, 43, 94, 155
Singular-value decomposition, 192
Singular values, of a matrix, 182-184, 192-193, 380, 386, 435
Size of a matrix, 1
Skew-Hermitian matrix, 16, 180
Skew-symmetric matrix, 14, 219, 346
Small oscillations, 208-212, 302, 303
Smith canonical form, 261, 262
Solvents of matrix polynomials, 252, 280, 520-526
  complete set of, 524
Span, 79
Spectral norm, 365, 366, 367, 381
Spectral radius, 359, 365, 377, 530-532, 539, 540
Spectral resolution, 314-320
Spectral theorem, 154, 164, 175
Spectrum, 150, 152, 178, 188, 331
  of a matrix polynomial, 281, 491
Sphere, 355
Square root of a positive (semi-)definite matrix, 180, 309, 324
Stable matrix, 414, 416, 441
  with respect to the unit circle, 451-453
Stable polynomial, 462, 465
Standard pair, 497
Standard triple, 494, 519, 520
State vector, 340
Stein equation, 451-453
Stephanos, C., 411
Stieltjes matrices, 532
Stochastic matrices, 21, 547-552
  doubly, 550
Stoer, J., 368
Storey, C., 560
Strongly connected directed graph, 528
Submatrix, 16
Subspace, 75, 105
  complementary, 91
  trivial, 76
Sum of subspaces, 87
  direct, 89, 112
  orthogonal, 112
Sup, 558
Sylvester, J. J., 450, 460
Sylvester's law of inertia, 188, 204
Symmetric matrix, 3, 14
Symmetric partition, 17
Symmetrizer
  of a matrix polynomial, 493
  of a polynomial, 455
Systems of algebraic equations, 56

T

Taussky, O., 375, 376, 445, 452
Tensor product, 407
Tismenetsky, M., 468, 561
Toeplitz matrix, 68, 69, 79
Toeplitz, O., 176
Trace of a matrix, 22, 107, 132, 157, 166
Transfer function, 70
Transformation, 116
  addition and scalar multiplication, 120
  zero, 120
Transition matrix, 98
Transposed matrix, 13
Triangle inequality, 351
Triangular matrix, 2, 44, 176
Tridiagonal matrix, 35

U

Unimodular λ-matrix, 247, 255, 268
Union of bases, 89
Unit sphere, 283
Unit vectors, 79
Unitarily similar matrices, 175
Unitary equivalence, 193
Unitary matrix, 47, 115, 174, 184, 188-190, 219, 346, 348
Unitary similarity, 175, 176, 365
Unitary space, 105, 351
Unobservable subspace, 214

V

Van Loan, C. F., 561

Vandermonde matrix, 35, 66, 69, 307, 524
  generalized, 70, 281, 307
Varga, R. S., 530, 561
Vec-function, 409, 410
Vector, 3
  position, 4
Vibrating systems, 208, 302

W

Well-conditioned equations, 383
Wielandt, H., 450, 539
Wilkinson, J. H., 561
Wimmer, H. K., 422, 452
Witzgall, C., 368
Wonham, M., 561
