
Compact Textbooks in Mathematics

Belkacem Said-Houari

Linear
Algebra
Compact Textbooks in Mathematics

This textbook series presents concise introductions to current topics in


mathematics and mainly addresses advanced undergraduates and master
students. The concept is to offer small books covering subject matter
equivalent to 2- or 3-hour lectures or seminars which are also suitable for
self-study. The books provide students and teachers with new perspectives
and novel approaches. They feature examples and exercises to illustrate
key concepts and applications of the theoretical contents. The series
also includes textbooks specifically speaking to the needs of students
from other disciplines such as physics, computer science, engineering, life
sciences, finance.
• compact: small books presenting the relevant knowledge
• learning made easy: examples and exercises illustrate the application
of the contents
• useful for lecturers: each title can serve as basis and guideline for a
semester course/lecture/seminar of 2–3 hours per week.

More information about this series at http://www.springer.com/series/11225


Belkacem Said-Houari

Linear Algebra
Belkacem Said-Houari
Department of Mathematics, College of Sciences
University of Sharjah
Sharjah, United Arab Emirates

ISSN 2296-4568 ISSN 2296-455X (electronic)


Compact Textbooks in Mathematics
ISBN 978-3-319-63792-1 ISBN 978-3-319-63793-8 (eBook)
DOI 10.1007/978-3-319-63793-8

Library of Congress Control Number: 2017951442

Mathematics Subject Classification (2010): 15A03, 15A04, 15A18, 15A42, 15A63, 15B10, 11C20,
11E16

© Springer International Publishing AG 2017


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole
or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical
way, and transmission or information storage and retrieval, electronic adaptation, computer
software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are
exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information
in this book are believed to be true and accurate at the date of publication. Neither the
publisher nor the authors or the editors give a warranty, express or implied, with respect
to the material contained herein or for any errors or omissions that may have been made.
The publisher remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Printed on acid-free paper

This book is published under the trade name Birkhäuser, www.birkhauser-science.com


The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To my daughter NOUR

Preface

Linear algebra is the study of the algebraic properties of linear transfor-


mations and matrices and it is an essential part of virtually all areas of
mathematics. It is also a fundamental and an extremely powerful tool in every
single discipline of sciences and engineering.
This is a self-contained textbook on linear algebra written in an easy
way, so that it can be accessible to many readers. It begins in  Chap. 1
with the simplest linear equation and generalizes many notions about this
equation to systems of linear equations, and then introduces the main ideas
using matrices and their properties. We believe that this is the right approach,
since most students take the first course of linear algebra already knowing
something about linear equations, lines, and systems of linear equations.
Then follows a detailed chapter ( Chap. 2) on determinants and their
properties where we also study the relationship between determinants and
the inverses of matrices and the use of determinants in solving systems of
linear equations. We introduce the main ideas with detailed proofs. We also
investigate some particular determinants that are very useful in applications.
In addition, we explain in a simple way where the ideas of determinants come
from and how they fit together in the whole theory.
In  Chap. 3, we introduce the Euclidean spaces using very simple geo-
metric ideas and then discuss various important inequalities and identities.
These ideas are present in the theory of general Hilbert spaces in a course
of functional analysis, so it is much better for students to learn them and
understand them clearly in Euclidean spaces.
The core of  Chap. 4 is a detailed discussion of general vector spaces
where rigorous proofs to all the main results in this book are given. This
is followed by a chapter ( Chap. 5) on linear transformations and their
properties.
In  Chap. 6, we introduce notions concerning matrices through linear
transformations, trying to bridge the gap between matrix theory and linear
algebra.
 Chapters 7 and 8 are more advanced, where we introduce all the
necessary ideas concerning eigenvalues and eigenvectors and the theory of
symmetric and orthogonal matrices.
One of the aspects that should make this textbook useful for students is
the presence of exercises at the end of each chapter. We did choose these
exercises very carefully to illustrate the main ideas. Since some of them
are taken (with some modifications) from recently published papers, it is
possible that these exercises appear for the first time in a textbook. All the
exercises are provided with detailed solutions and in each solution, we refer
to the main theorems in the text when necessary, so students can see the main
tools used in the solution. In addition, all the main ideas in this book come
with illustrating examples. We strove to choose solutions and proofs
that are elegant and short. We also tried to keep this textbook to about
400 pages by focusing on the main ideas, so that students will be able to
understand things easily and quickly. In addition, we tried to maintain a
balance between the theory of matrices and that of vector spaces and
linear transformations.
This book can be used as a textbook for a first course in linear algebra for
undergraduate students in all disciplines. It can also be used as a booklet for
graduate students, allowing them to acquire some concepts, examples, and basic
results. It is also suitable for those students who are looking for a simple, easy,
and clear textbook that summarizes the main ideas of linear algebra. Finally,
it is also intended for those students who are interested in rigorous proofs of
the main theorems in linear algebra. We believe that if a good student uses
this book, then she (he) can read and learn the basics of linear algebra on her
(his) own.
We would like to thank Salim A. Messaoudi (from KFUPM) for valuable
suggestions and corrections which improved the contents of some parts of
this book. We also thank Sofiane Bouarroudj (from NYU Abu Dhabi) for the
many discussions that we have had about the proofs of some theorems of
linear algebra.

Abu Dhabi, United Arab Emirates Belkacem Said-Houari


October 06, 2016

Contents

1 Matrices and Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1


Belkacem Said-Houari
1.1 Systems of Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 The Group of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.1.2 Multiplication of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2 Square Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.2.1 The Ring of Square Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.2.2 The Identity Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.2.3 Inverse Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.2.4 Diagonal Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.2.5 Triangular Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.2.6 Trace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.3 Solving Linear Systems with Elementary Row Operations . . . . . . . . . . 39
1.3.1 The Gauss–Jordan Elimination Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.4 The Matrix Transpose and Symmetric Matrices . . . . . . . . . . . . . . . . . . . . . 49
1.4.1 Transpose of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
1.4.2 Symmetric Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
1.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

2 Determinants .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Belkacem Said-Houari
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.2 Determinants by Cofactor Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
2.3 Properties of the Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.4 Evaluating Determinants by Row Reduction . . . . . . . . . . . . . . . . . . . . . . . . 79
2.4.1 Determinant Test for Invertibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
2.5 The Adjoint of a Square Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
2.5.1 Cramer’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

3 Euclidean Vector Spaces .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121


Belkacem Said-Houari
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
3.2 Vector Addition and Multiplication by a Scalar . . . . . . . . . . . . . . . . . . . . . 121
3.2.1 Vector Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
3.2.2 Multiplication of a Vector by a Scalar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
3.2.3 Vectors in Coordinate Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
3.2.4 Linear Combination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
3.3 Norm, Dot Product, and Distance in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
3.3.1 The Norm of a Vector in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
3.3.2 Distance in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
3.3.3 The Dot Product of Two Vectors in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

3.4 Orthogonality in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141


3.4.1 Orthogonal Projections in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

4 General Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159


Belkacem Said-Houari
4.1 Definition and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
4.2 Properties of Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
4.3 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
4.3.1 Direct Sum Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
4.4 Linear Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
4.5 Bases of Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
4.6 Dimension of a Vector Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
4.6.1 Dimension of a Subspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
4.6.2 Construction of a Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
4.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

5 Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199


Belkacem Said-Houari
5.1 Definition and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
5.2 Fundamental Properties of Linear Transformations . . . . . . . . . . . . . . . . . 201
5.2.1 The Kernel and the Image of a Linear Transformation . . . . . . . . . . . . . . . . 204
5.3 Isomorphism of Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
5.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215

6 Linear Transformations and Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 227


Belkacem Said-Houari
6.1 Definition and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
6.2 Change of Basis and Similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
6.3 Rank of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
6.3.1 Some Properties of the Rank of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
6.4 Methods for Finding the Rank of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
6.4.1 The Method of Elementary Row and Column Operations . . . . . . . . . . . . . 246
6.4.2 The Method of Minors for Finding the Rank of a Matrix . . . . . . . . . . . . . . . 251
6.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255

7 Eigenvalues and Eigenvectors .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269


Belkacem Said-Houari
7.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
7.2 Properties of Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . 272
7.3 Eigenvalues and Eigenvectors of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
7.4 Diagonalization .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
7.4.1 Spectrum of Symmetric Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296

7.5 Triangularization and the Jordan Canonical Form . . . . . . . . . . . . . . . . . . 298


7.5.1 Triangularization of an Endomorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
7.5.2 The Jordan Canonical Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
7.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307

8 Orthogonal Matrices and Quadratic Forms . . . . . . . . . . . . . . . . . . . . 323


Belkacem Said-Houari
8.1 Orthogonal Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
8.1.1 The Gram–Schmidt Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
8.1.2 The QR Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
8.1.3 The LU Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
8.2 Positive Definite Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
8.2.1 The Cholesky Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
8.3 Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
8.3.1 Congruence and Sylvester’s Law of Inertia . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
8.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361

Servicepart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381

List of Figures

Fig. 1.1 The case where the system has a unique solution
(x_0, y_0): the solution is the intersection point
of two lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Fig. 1.2 The case where the system has no solution . . . . . . . . . . . . . . . . 2
Fig. 1.3 The two lines coincide and there are infinitely
many points of intersection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Fig. 1.4 The addition of two vectors in the xy-plane . . . . . . . . . . . . . . . . 8
Fig. 1.5 An easy way to perform the multiplication of two
matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Fig. 2.1 To evaluate the 3 × 3 determinant, we take the
products along the main diagonal and the lines
parallel to it with a (+) sign, and the products
along the second diagonal and the lines parallel
to it with a (−) sign . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

Fig. 3.1 The vector v = \overrightarrow{AB} . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122


Fig. 3.2 The vectors v and w are equal since they have the
same length and the same direction . . . . . . . . . . . . . . . . . . . . . . . . . 122
Fig. 3.3 The sum of two vectors v + w . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
Fig. 3.4 The sum of two vectors v + w . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
Fig. 3.5 The sum of vectors is associative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Fig. 3.6 Multiplication of a vector by a scalar . . . . . . . . . . . . . . . . . . . . . . . . . 123
Fig. 3.7 Components of a vector if the initial point is
the origin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Fig. 3.8 Components of a vector if the initial point is not
the origin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Fig. 3.9 The sum of two vectors v + w in a coordinate system . . . 125
Fig. 3.10 The vector v . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Fig. 3.11 The vector v is a linear combination of e1 and e2 . . . . . . . . . . 128
Fig. 3.12 The dot product of u and v . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Fig. 3.13 The cosine law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Fig. 3.14 The sum of two vectors v + w in a coordinate system . . . 133
Fig. 3.15 The parallelogram identity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
Fig. 3.16 The projection of u on v .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Fig. 3.17 Pythagoras’ theorem in a right triangle . . . . . . . . . . . . . . . . . . . . . . 144
Fig. 3.18 Apollonius’ identity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
1 Matrices and Matrix Operations

Belkacem Said-Houari

© Springer International Publishing AG 2017


B. Said-Houari, Linear Algebra, Compact Textbooks in Mathematics,
DOI 10.1007/978-3-319-63793-8_1

1.1 Systems of Linear Equations

In order to introduce the main ideas of Linear Algebra, we first study matrix algebra.
So, the first thing we begin with is the following simple linear equation:

$$ax = b, \tag{1.1}$$

where a and b are two real numbers. We know from elementary algebra that if a ≠ 0,
then Eq. (1.1) has in R the unique solution

$$x = \frac{b}{a} = a^{-1}b. \tag{1.2}$$

Next, suppose that we want to solve the following system of two linear equations in R²:

$$\begin{cases} ax + by = p, \\ cx + dy = q, \end{cases} \tag{1.3}$$

where a, b, c, d, p and q are real numbers. There are at least two ways of looking for
the solutions of (1.3): geometrically and algebraically.

Geometrically
It is well known that both equations in (1.3) are equations of lines in the xy-plane. Thus,
the solutions of the system in the xy-plane are the points of intersection of the two lines.
Therefore, we may distinguish the following three cases:

Case 1. The two lines intersect in exactly one point (x_0, y_0) as in ⊡ Fig. 1.1. This is
the case where the slopes of the two lines are not the same. That is,

$$-\frac{a}{b} \ne -\frac{c}{d}, \quad \text{or equivalently,} \quad \frac{a}{b} \ne \frac{c}{d}.$$

In this case the system (1.3) has a unique solution (x_0, y_0).
⊡ Fig. 1.1 The case where the system has a unique solution (x_0, y_0): the solution is the intersection
point of the two lines


⊡ Fig. 1.2 The case where the system has no solution

Case 2. The two lines may be parallel and distinct, which occurs when

$$\frac{a}{b} = \frac{c}{d} \quad \text{and} \quad \frac{p}{b} \ne \frac{q}{d}.$$

In this case there is no intersection and the system (1.3) has no solution. See ⊡ Fig. 1.2.

Case 3. The two lines may coincide, which occurs when

$$\frac{a}{b} = \frac{c}{d} \quad \text{and} \quad \frac{p}{b} = \frac{q}{d}.$$

In this case there are infinitely many points of intersection and consequently, there
are infinitely many solutions to (1.3). See ⊡ Fig. 1.3.
Algebraically
Algebraically, we may solve system (1.3) by at least two methods: The method of
substitution and the method of elimination. For the substitution method, we express

⊡ Fig. 1.3 The two lines coincide and there are infinitely many points of intersection

one of the two variables from the first equation in (1.3) and substitute it in the second
equation as follows:

$$y = -\frac{a}{b}x + \frac{p}{b}, \tag{1.4}$$

provided that b ≠ 0, and then plugging expression (1.4) into the second equation of
(1.3), we find

$$(ad - bc)x = pd - bq. \tag{1.5}$$

It is clear that if

$$ad - bc \ne 0,$$

then the system (1.3) has a unique solution. On the other hand, if ad − bc = 0 and
pd − bq ≠ 0, then Eq. (1.5) shows that system (1.3) has no solution. Finally, if ad − bc = 0
and pd − bq = 0, then system (1.3) has infinitely many solutions.
Thus, as we have said earlier, Eq. (1.1) has a unique solution if and only if

$$a \ne 0.$$

On the other hand, system (1.3) has a unique solution if and only if

$$ad - bc \ne 0.$$
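For readers who like to experiment, the following short Python sketch (our addition, not part of the original text; the function name and sample coefficients are illustrative) turns the formulas just derived into code and checks the condition ad − bc ≠ 0:

```python
def solve_2x2(a, b, c, d, p, q):
    """Solve ax + by = p, cx + dy = q via x = (pd - bq)/(ad - bc), y = (aq - cp)/(ad - bc)."""
    det = a * d - b * c
    if det == 0:
        return None   # no unique solution: the lines are parallel or coincide
    return (p * d - b * q) / det, (a * q - c * p) / det

# Example: 2x + y = 5 and x - y = 1 intersect at the single point (2, 1).
print(solve_2x2(2, 1, 1, -1, 5, 1))   # (2.0, 1.0)
```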

So, the question now is what if we have a 3 × 3 system, or in general an n × n system?


Can we always discuss existence of solutions in terms of similar relations between the
coefficients? And if so, how can we find such relations in general? As we will see later,
the answer to the above question leads us to the study of Matrix Algebra.
Definition 1.1.1 (Linear Equation)
A linear equation in n variables x_1, x_2, ..., x_n is an equation of the form

$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b, \tag{1.6}$$

where a_1, a_2, ..., a_n and b are real numbers.

ⓘ Remark 1.1.1 Equation (1.6) can be written as

$$f(X) = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b,$$

where f is a linear transformation from Rⁿ to R and X is the vector

$$X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{1.7}$$

Also, in a linear equation such as (1.6) all variables occur only to the first power; the
equation does not involve any products of variables, and the variables do not appear as
arguments of trigonometric, logarithmic, exponential, or other functions.

Example 1.1
The following equations are not linear:

$$x_1 x_2 = 1, \qquad x_1 x_2 + x_2 + x_3 = 0, \qquad x_1 + 2\cos x_2 + 3x_3 = 1, \qquad \sqrt{x_1} - x_2 = 0.$$

Definition 1.1.2
We define a system of linear equations as a set of m equations with n unknowns
$$\begin{cases}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = b_1, \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n = b_2, \\
\quad \vdots \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = b_m,
\end{cases} \tag{1.8}$$

where a_{ij}, 1 ≤ i ≤ m, 1 ≤ j ≤ n, are real numbers.



ⓘ Remark 1.1.2 As we have seen for the system of two equations (1.3), we will show in
the sequel that system (1.8) has either
▬ no solution, or
▬ exactly one solution, or
▬ infinitely many solutions.

As in Remark 1.1.1, we may write the system (1.8) using a linear transformation from Rⁿ
to Rᵐ:

$$f(X) = \begin{pmatrix} f_1(X) \\ \vdots \\ f_m(X) \end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix},$$

where X is the vector in (1.7) and

$$f_i(X) = a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n = b_i, \qquad 1 \le i \le m. \tag{1.9}$$

Now, going back to the system (1.8), let us assume that n = 3. Then each equation
is the equation of a plane in three-dimensional space. So, the solution of the system is
represented by a point in the intersection of m planes in the xyz-space. It is quite hard to
find the intersections of those planes. In general, the geometric method becomes hard to
apply if n ≥ 3, so we rely on the algebraic method to solve such systems for n ≥ 3. The
core problem of linear algebra is how to solve the system (1.8).

Definition 1.1.3 (Homogeneous System)


If all the right-hand sides in (1.8) are zero, that is, b_i = 0, 1 ≤ i ≤ m, then system
(1.8) is called homogeneous.

We can easily see that every homogeneous system has the zero vector X = 0 as a
solution.
Now, we introduce the following definition which will help us to rewrite system
(1.8) in a more convenient form.

Definition 1.1.4 (Dot Product)


Let

$$X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \quad \text{and} \quad Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$$

be two vectors in Rⁿ. Then the dot product or the inner product of X and Y is the
real number X · Y defined as

$$X \cdot Y = Y \cdot X = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n.$$
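As a small illustration (ours, not the book's; the sample vectors are arbitrary), the dot product of Definition 1.1.4 can be computed in a couple of lines of Python:

```python
def dot(X, Y):
    """Dot product of two vectors of the same length (Definition 1.1.4)."""
    assert len(X) == len(Y), "X and Y must have the same number of components"
    return sum(x * y for x, y in zip(X, Y))

print(dot([1, 2, 3], [4, 0, -1]))   # 1*4 + 2*0 + 3*(-1) = 1
```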

Now, using Definition 1.1.4, we may rewrite Eq. (1.9) as

$$V_i \cdot X = b_i, \qquad 1 \le i \le m,$$

where V_i is the vector

$$V_i = \begin{pmatrix} a_{i1} \\ \vdots \\ a_{in} \end{pmatrix}, \qquad 1 \le i \le m. \tag{1.10}$$

We may also write the dot product as a row vector times a column vector (with no dot):

$$(a_{i1}, \ldots, a_{in}) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n.$$

Using this notation, we recast system (1.8) as

$$\begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{bmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= \begin{pmatrix} V_1 \cdot X \\ V_2 \cdot X \\ \vdots \\ V_m \cdot X \end{pmatrix}
= \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}, \tag{1.11}$$

or equivalently as

$$AX = b, \tag{1.12}$$

where

$$A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{bmatrix}, \quad
X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad \text{and} \quad
b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}. \tag{1.13}$$
Definition 1.1.5
In the above formulas, the rectangular array of numbers A is called a matrix. The
numbers a_{ij}, 1 ≤ i ≤ m, 1 ≤ j ≤ n, are called the entries or coefficients of the
matrix A.ᵃ

ᵃ See Chap. 6 for the definition of matrices through linear transformations.

The matrix A consists of m rows (horizontal lines) and n columns (vertical lines).

Notation A matrix A is sometimes denoted by A = (a_{ij}) with 1 ≤ i ≤ m and 1 ≤ j ≤ n.
The entry a_{ij} lies in the ith row and the jth column.

Now, if we want to solve system (1.8), then it is natural to consider the system in its
matrix form (1.12), since it looks similar to Eq. (1.1). Therefore, our first intuition is to
write the solution as in (1.2), that is,

$$X = A^{-1}b. \tag{1.14}$$

In doing so, many questions arise naturally. For instance:

▬ How can we define the inverse of a matrix?
▬ We know that the inverse of a real number a exists if and only if a ≠ 0; when does
the inverse of a matrix exist? And if it does exist, how can we find it?
▬ How do we multiply a matrix by a vector?
▬ Can we perform the usual algebraic operations (addition, multiplication, subtraction, ...)
on matrices?

The answers to the above questions are the building blocks of matrix algebra in
particular, and linear algebra in general.
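As a concrete illustration of the matrix form (1.12) (our addition, with arbitrary data), the following sketch computes AX entry by entry, each entry being a dot product of a row of A with X as in (1.11):

```python
def mat_vec(A, X):
    """Compute AX: the i-th entry is the dot product of the i-th row of A with X (cf. (1.11))."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, X)) for row in A]

# The system  x1 + 2*x2 - x3 = 3,  4*x2 + x3 = 5  has coefficient matrix A and right-hand side b.
A = [[1, 2, -1],
     [0, 4,  1]]
b = [3, 5]
X = [1, 1, 0]            # a candidate vector
print(mat_vec(A, X))     # [3, 4]: X satisfies the first equation but not the second
```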
One of the interesting cases is when m = n. In this case, we say that A is a square
matrix and we have the following definition.

Definition 1.1.6 (Square Matrix)

A matrix A with m rows and n columns is called an m × n matrix. If m = n, then the
matrix A is called a square matrix. The collection of the entries a_{ij} with i = j is
called the main diagonal.

Example 1.2
1. The matrix

$$A = \begin{bmatrix} 1 & 0 & 2 \\ 1 & \sqrt{2} & 3 \end{bmatrix}$$

is a 2 × 3 matrix (two rows and three columns).
2. The matrix

$$A = \begin{bmatrix} 3 & 1 & 0 \\ 2 & 0 & 2 \\ 1 & \pi & 9 \end{bmatrix}$$

is a square matrix and the entries of the main diagonal are 3, 0, and 9.

In order to define the addition of matrices, let us first consider two vectors in R²,

$$X_1 = \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} \quad \text{and} \quad X_2 = \begin{pmatrix} x_2 \\ y_2 \end{pmatrix}.$$

Each vector can be seen as a 2 × 1 matrix. In order to define X_1 + X_2, we first need to think
geometrically and draw both vectors in the xy-plane.
It is clear from ⊡ Fig. 1.4 that the vector X_1 + X_2 is

$$X_1 + X_2 = \begin{pmatrix} x_1 + x_2 \\ y_1 + y_2 \end{pmatrix}. \tag{1.15}$$

Guided by the case of 2 × 1 matrices, we can go ahead and define analogously the
addition of two m × n matrices.

Definition 1.1.7 (Addition of Matrices)


Let A and B be two m × n matrices (that is, A and B have the same size).ᵃ Then the
sum of A and B is the matrix A + B obtained by adding the entries of A to the
corresponding entries of B. More precisely, if A = (a_{ij}) and B = (b_{ij}), with
1 ≤ i ≤ m and 1 ≤ j ≤ n, then

$$A + B = (a_{ij} + b_{ij}), \qquad 1 \le i \le m, \ 1 \le j \le n.$$

ᵃ The size of a matrix is described in terms of the rows and columns it contains.

⊡ Fig. 1.4 The addition of two vectors in the xy-plane

It is clear from above that the addition of matrices is commutative, that is,

$$A + B = B + A.$$

Example 1.3
Consider the matrices

$$A = \begin{bmatrix} 1 & 0 & -1 \\ -2 & 0 & 3 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 & 0 & 1 \\ 3 & 1 & 0 \end{bmatrix}.$$

Then A + B is the matrix given by

$$A + B = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 3 \end{bmatrix}.$$

Using the same intuition, when we multiply a vector by a scalar, this means
geometrically that we multiply each of its components by the same scalar. In a similar
manner we define the multiplication of a matrix by a scalar.

Definition 1.1.8 (Multiplication by Scalar)

Let A be an m × n matrix and λ be a scalar. Then the product λA is the matrix
obtained by multiplying each entry of the matrix A by λ. Thus,

$$\lambda A = \lambda (a_{ij}) = (\lambda a_{ij}), \qquad 1 \le i \le m, \ 1 \le j \le n.$$

Example 1.4
Take

$$A = \begin{bmatrix} 2 & 1 \\ 3 & 0 \end{bmatrix} \quad \text{and} \quad \lambda = 2.$$

Then

$$2A = \begin{bmatrix} 4 & 2 \\ 6 & 0 \end{bmatrix}.$$
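The two operations just defined are easy to implement; the following sketch (ours, not the author's) mirrors Definitions 1.1.7 and 1.1.8, using the matrices of Examples 1.3 and 1.4 as a check:

```python
def mat_add(A, B):
    """Entrywise sum of two matrices of the same size (Definition 1.1.7)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scalar_mul(lam, A):
    """Multiply every entry of A by the scalar lam (Definition 1.1.8)."""
    return [[lam * a for a in row] for row in A]

print(mat_add([[1, 0, -1], [-2, 0, 3]],
              [[0, 0, 1], [3, 1, 0]]))    # [[1, 0, 0], [1, 1, 3]], as in Example 1.3
print(scalar_mul(2, [[2, 1], [3, 0]]))    # [[4, 2], [6, 0]], as in Example 1.4
```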

Notation We denote the set of scalars by K (K will ordinarily be R or C) and the set of m × n
matrices by M_{m×n}(K). If m = n, then we write M_{m×n}(K) simply as M_n(K).

ⓘ Remark 1.1.3 It is clear that for A and B in M_{m×n}(K), A + B is in M_{m×n}(K). Thus, the
addition (+) (a binary operation)ᵃ on the set M_{m×n}(K) satisfies the closure property.
Also, we have seen that for any λ in K and for any A in M_{m×n}(K), λA is in M_{m×n}(K).

ᵃ This binary operation is defined from M_{m×n}(K) × M_{m×n}(K) → M_{m×n}(K) and takes (A, B) from
M_{m×n}(K) × M_{m×n}(K) to A + B in M_{m×n}(K).

Definition 1.1.9 (Zero Matrix)


We define the zero matrix in M_{m×n}(K), and denote it by 0 or 0_{m×n}, as the matrix
whose entries are all zero.

Example 1.5
The following matrices are zero matrices:

$$\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \qquad \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad [0].$$

Theorem 1.1.4 (Properties of the Zero Matrix)


Let A be a matrix in M_{m×n}(K) and 0 be the zero matrix in M_{m×n}(K). Then
1. 0 + A = A + 0 = A,
2. 0A = 0,

where 0 is the zero in K.

The proof of this theorem is a straightforward consequence of the definition of the
addition of matrices and of the multiplication of a matrix by a scalar.
1.1.1 The Group of Matrices

In this subsection we introduce an important algebraic structure on the set of matrices
M_{m×n}(K). We start with the following definition.

Definition 1.1.10
A group is a nonempty set G together with a binary operation

$$G × G → G, \qquad (a, b) \mapsto a * b,$$

satisfying the following conditions:

G1. (associativity) for all a, b, c in G,

$$(a * b) * c = a * (b * c);$$

G2. (existence of a neutral (identity) element) there exists an element e in G
such that

$$a * e = e * a = a,$$

for all a in G;

G3. (existence of the inverse) for each a in G, there exists a′ in G such that

$$a * a′ = a′ * a = e.$$

We usually denote a group by (G, *).

If

$$a * b = b * a$$

for all a, b in G, then the group is called commutative or Abelian.¹

Example 1.6
One of the simplest Abelian groups is the group of integers (Z, +) with the addition (+) as
the binary operation. The group structure on this set gives meaning to the negative integers
as the inverses of the positive integers with respect to the addition law (+).

¹ Named after the Norwegian mathematician Niels Henrik Abel.

Theorem 1.1.5
The set (M_{m×n}(K), +) is a commutative group.

Proof
We saw earlier that addition in M_{m×n}(K) is a binary operation from M_{m×n}(K) ×
M_{m×n}(K) → M_{m×n}(K) : (A, B) ↦ A + B. It is not hard to see from Definition 1.1.7
that if A, B, and C are matrices in M_{m×n}(K), then

$$(A + B) + C = A + (B + C),$$

i.e., the binary operation (+) is associative in M_{m×n}(K).

Also, if 0 is the zero matrix in M_{m×n}(K), then

$$A + 0 = 0 + A = A,$$

for each matrix A in M_{m×n}(K). Thus, the zero matrix is the neutral element in M_{m×n}(K)
with respect to (+).
Now, for each A in M_{m×n}(K), −A is also a matrix in M_{m×n}(K) and satisfies

$$A + (−A) = (−A) + A = 0.$$

Therefore, the matrix −A is the inverse of A in M_{m×n}(K) with respect to the binary
operation (+).
Next, since

$$A + B = B + A,$$

for any two matrices A and B in M_{m×n}(K), the group M_{m×n}(K) is commutative. This
completes the proof of Theorem 1.1.5. □

1.1.2 Multiplication of Matrices

The next step is to define the multiplication of matrices. We have seen in (1.11) the
multiplication of a matrix by a vector. Now, consider a matrix B in M_{p×r}(K). Then B
can be written as

$$B = [B_1, B_2, \ldots, B_r],$$

where B_j, 1 ≤ j ≤ r, are vectors in M_{p×1}(K), that is,

$$B_j = \begin{pmatrix} b_{1j} \\ \vdots \\ b_{pj} \end{pmatrix}.$$

The matrix A in (1.11) is in M_{m×n}(K) and we write as before

$$A = \begin{bmatrix} V_1 \\ \vdots \\ V_m \end{bmatrix},$$

where V_i is the row vector defined as

$$V_i = (a_{i1}, a_{i2}, \ldots, a_{in}), \qquad 1 \le i \le m.$$

Now, using the same idea as in (1.11), and assuming that p = n, we may find, for
instance,

$$\begin{bmatrix} V_1 \\ \vdots \\ V_m \end{bmatrix} B_1 = C_1,$$

where C_1 is a vector in M_{m×1}(K) whose first component is the dot product V_1 · B_1 and,
more generally, whose ith component is the dot product V_i · B_1. In order for these dot products
to be defined, B_1 must have the same number of components as V_i, 1 ≤ i ≤ m; that is,
p = n. If we do this for all the vectors B_j, 1 ≤ j ≤ r, then we obtain the matrix

$$C = [C_1, C_2, \ldots, C_r],$$

where each C_k, 1 ≤ k ≤ r, is a vector with m components. Therefore, the matrix C is in
the set M_{m×r}(K) and we have the following definition.

Definition 1.1.11 (Multiplication of Matrices)


Let A be a matrix in M_{m×n}(K) and B be a matrix in M_{p×r}(K). Then:
▬ If p = n, we define the product AB as

$$AB = (a_{ij}) \cdot (b_{jk}) = C = (c_{ik}), \qquad 1 \le i \le m, \ 1 \le j \le n, \ 1 \le k \le r,$$

with

$$c_{ik} = a_{i1} b_{1k} + a_{i2} b_{2k} + \cdots + a_{in} b_{nk} = \sum_{j=1}^{n} a_{ij} b_{jk}.$$

▬ If p ≠ n, then the product AB is undefined.

⊡ Figure 1.5 shows an easy way to multiply two matrices by positioning them as in the
figure. For instance, to get the entry c_{22}, we multiply the entries of the second row of
the matrix A with the entries of the second column of the matrix B.

Example 1.7
Consider the two matrices

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix}.$$

Then

$$AB = \begin{bmatrix} 8 & 5 \\ 20 & 13 \\ 2 & 1 \end{bmatrix}.$$
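The row-times-column rule of Definition 1.1.11 translates directly into a short loop; the sketch below (added by us, not part of the text) recomputes the product of Example 1.7:

```python
def mat_mul(A, B):
    """Product AB with entries c_ik = sum_j a_ij * b_jk (Definition 1.1.11)."""
    n, r = len(B), len(B[0])
    assert all(len(row) == n for row in A), "the number of columns of A must equal the number of rows of B"
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(r)]
            for i in range(len(A))]

A = [[1, 2], [3, 4], [0, 1]]
B = [[4, 3], [2, 1]]
print(mat_mul(A, B))   # [[8, 5], [20, 13], [2, 1]], as in Example 1.7
```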

Example 1.8
Consider the square matrices

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.$$

Find all values of a, b, c and d (if any) for which AB = BA.

Solution
We first compute

$$AB = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & a \\ 0 & c \end{bmatrix}. \tag{1.16}$$
⊡ Fig. 1.5 An easy way to perform the multiplication of two matrices

On the other hand,

$$BA = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} c & d \\ 0 & 0 \end{bmatrix}. \tag{1.17}$$

From (1.16) and (1.17), we deduce that AB = BA if and only if

$$\begin{bmatrix} 0 & a \\ 0 & c \end{bmatrix} = \begin{bmatrix} c & d \\ 0 & 0 \end{bmatrix},$$

that is, c = 0 and a = d. Thus, the matrices A satisfying AB = BA are the matrices of the
form

$$A = \begin{bmatrix} a & b \\ 0 & a \end{bmatrix},$$

where a and b are any real numbers. Some examples of such matrices are

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}, \qquad \begin{bmatrix} 3 & 7 \\ 0 & 3 \end{bmatrix};$$

there are infinitely many of them.

ⓘ Remark 1.1.6 It is clear that even if we can form both products AB and BA, as in the
case of square matrices, for instance, in general

$$AB \ne BA.$$

To illustrate this, take

$$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.$$

Then

$$AB = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad \text{but} \quad BA = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

Theorem 1.1.7 (Multiplication is Distributive over the Addition)

Let A be a matrix in M_{m×n}(K) and B and C be matrices in M_{n×r}(K). Then

$$A(B + C) = AB + AC. \tag{1.18}$$

The proof is straightforward. We can just compute the two sides in (1.18) and find that
they are equal. We omit the details.

Theorem 1.1.8 (Multiplication is Associative)

Let A, B and C be matrices in M_{m×n}(K), M_{n×p}(K) and M_{p×r}(K), respectively. Then
we have

$$A(BC) = (AB)C. \tag{1.19}$$

Proof
Let

$$A = (a_{ij}), \quad B = (b_{jk}), \quad C = (c_{kl}), \qquad 1 \le i \le m, \ 1 \le j \le n, \ 1 \le k \le p, \ 1 \le l \le r.$$

Thus, we have

$$AB = (\alpha_{ik}), \quad \text{with} \quad \alpha_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk},$$

and

$$BC = (\beta_{jl}), \quad \text{with} \quad \beta_{jl} = \sum_{k=1}^{p} b_{jk} c_{kl}.$$

Therefore,

$$A(BC) = (\gamma_{il}),$$

with

$$\gamma_{il} = \sum_{j=1}^{n} a_{ij} \beta_{jl} = \sum_{j=1}^{n} \sum_{k=1}^{p} a_{ij} b_{jk} c_{kl}
= \sum_{k=1}^{p} \sum_{j=1}^{n} a_{ij} b_{jk} c_{kl}
= \sum_{k=1}^{p} \Bigl( \sum_{j=1}^{n} a_{ij} b_{jk} \Bigr) c_{kl}
= \sum_{k=1}^{p} \alpha_{ik} c_{kl},$$

which is the (i, l) entry of (AB)C. This finishes the proof of Theorem 1.1.8. □

Example 1.9
Consider the matrices

$$A = \begin{bmatrix} 1 & 1 \\ 2 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & -3 \\ 4 & 5 \end{bmatrix}, \quad C = \begin{bmatrix} 2 & 6 \\ 3 & -7 \end{bmatrix}.$$

Then

$$(AB)C = \begin{bmatrix} 6 & 2 \\ 20 & 14 \end{bmatrix} \begin{bmatrix} 2 & 6 \\ 3 & -7 \end{bmatrix} = \begin{bmatrix} 18 & 22 \\ 82 & 22 \end{bmatrix}$$

and

$$A(BC) = \begin{bmatrix} 1 & 1 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} -5 & 33 \\ 23 & -11 \end{bmatrix} = \begin{bmatrix} 18 & 22 \\ 82 & 22 \end{bmatrix}.$$

Example 1.10
Consider the matrices

$$A = \begin{bmatrix} 1 & 4 \\ -2 & 3 \\ 1 & -2 \end{bmatrix}, \quad
B = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & -1 \end{bmatrix}, \quad
C = \begin{bmatrix} 8 & 6 & -6 \\ 6 & -1 & 1 \\ -4 & 0 & 0 \end{bmatrix}.$$

Find a matrix K such that AKB = C.

Solution
Since A is a 3 × 2 matrix and B is a 2 × 3 matrix, K must be a 2 × 2 matrix, otherwise the
product AKB is not defined. Thus, we put

$$K = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$

Since the product of matrices is associative (Theorem 1.1.8), we first compute the product
AK as

$$AK = \begin{bmatrix} 1 & 4 \\ -2 & 3 \\ 1 & -2 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix}
= \begin{bmatrix} a + 4c & b + 4d \\ 3c - 2a & 3d - 2b \\ a - 2c & b - 2d \end{bmatrix}.$$

Now, we multiply AK by B and obtain

$$(AK)B = \begin{bmatrix} a + 4c & b + 4d \\ 3c - 2a & 3d - 2b \\ a - 2c & b - 2d \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & -1 \end{bmatrix}
= \begin{bmatrix} 2(a + 4c) & b + 4d & -b - 4d \\ 2(3c - 2a) & 3d - 2b & 2b - 3d \\ 2(a - 2c) & b - 2d & 2d - b \end{bmatrix}.$$

The equality AKB = C now reads

$$\begin{bmatrix} 2(a + 4c) & b + 4d & -b - 4d \\ 2(3c - 2a) & 3d - 2b & 2b - 3d \\ 2(a - 2c) & b - 2d & 2d - b \end{bmatrix}
= \begin{bmatrix} 8 & 6 & -6 \\ 6 & -1 & 1 \\ -4 & 0 & 0 \end{bmatrix}.$$
This gives

$$\begin{cases}
2(a + 4c) = 8, \\
b + 4d = 6, \\
2(3c - 2a) = 6, \\
3d - 2b = -1, \\
b - 2d = 0.
\end{cases}$$

Solving this system, we find a = 0, b = 2, c = 1, and d = 1. Thus, the matrix K is

$$K = \begin{bmatrix} 0 & 2 \\ 1 & 1 \end{bmatrix}.$$
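As a quick numerical check of this solution (a sketch we added; it assumes the matrices as reconstructed above), one can verify that AKB indeed equals C:

```python
def mat_mul(A, B):
    """Product AB with entries c_ik = sum_j a_ij * b_jk."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

A = [[1, 4], [-2, 3], [1, -2]]
K = [[0, 2], [1, 1]]
B = [[2, 0, 0], [0, 1, -1]]
C = [[8, 6, -6], [6, -1, 1], [-4, 0, 0]]
print(mat_mul(mat_mul(A, K), B) == C)   # True: AKB = C
```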

1.2 Square Matrices

We have introduced square matrices in Definition 1.1.6. In this section, we show that this
class of matrices plays a central role in matrix algebra in particular, and linear algebra
in general.

1.2.1 The Ring of Square Matrices

The class of square matrices enjoys very important algebraic properties. One of these
properties is that the set M_n(K) has the closure property under multiplication. That is,
for any two matrices A and B in M_n(K), the product AB is an element of M_n(K). (This
does not hold for matrices in M_{m×n}(K) if m ≠ n.) In other words, we have the binary
operation

$$M_n(K) × M_n(K) → M_n(K) : (A, B) \mapsto AB. \tag{1.20}$$

Another property is that the multiplication is distributive over addition from the right
and from the left. That is, for all matrices A, B, and C in M_n(K), one can easily
verify that

$$A(B + C) = AB + AC \tag{1.21}$$

and

$$(B + C)A = BA + CA. \tag{1.22}$$

ⓘ Remark 1.2.1 (Binomial Formula) Let A and B be two matrices in M_n(K) such that
AB = BA. Then we have the binomial formula

$$(A + B)^m = \sum_{k=0}^{m} C_m^k A^k B^{m-k}, \qquad C_m^k = \frac{m!}{k!(m-k)!}.$$

In particular, since the identity matrix commutes with all matrices in M_n(K), we have

$$(I + A)^m = \sum_{k=0}^{m} C_m^k A^k.$$
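A quick way to convince oneself of the binomial formula is to test it numerically; the sketch below (ours, with an arbitrary matrix A) checks the second identity for m = 4:

```python
from math import comb

def mat_mul(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(c, A):
    return [[c * a for a in row] for row in A]

def mat_pow(A, k):
    """A^k with the convention A^0 = I."""
    P = [[int(i == j) for j in range(len(A))] for i in range(len(A))]
    for _ in range(k):
        P = mat_mul(P, A)
    return P

A = [[1, 2], [0, 1]]
I = [[1, 0], [0, 1]]
m = 4
lhs = mat_pow(mat_add(I, A), m)                               # (I + A)^m
rhs = [[0, 0], [0, 0]]
for k in range(m + 1):
    rhs = mat_add(rhs, mat_scale(comb(m, k), mat_pow(A, k)))  # sum of C_m^k A^k
print(lhs == rhs)   # True
```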

Definition 1.2.1
Let R be a nonempty set on which two binary operations called addition (+) and
multiplication (·) are defined. Then R is a ring (with respect to the given addition
and multiplication) if:
R1. (R, +) is an Abelian group;
R2. multiplication is associative, that is,

$$a \cdot (b \cdot c) = (a \cdot b) \cdot c,$$

for all a, b and c in R;
R3. multiplication is distributive over addition from the right and from the left,
that is,

$$a \cdot (b + c) = a \cdot b + a \cdot c$$

and

$$(b + c) \cdot a = b \cdot a + c \cdot a.$$

The ring R is commutative if

$$a \cdot b = b \cdot a$$

for all elements a, b in R, and noncommutative otherwise.

If there exists e in R such that

$$a \cdot e = e \cdot a = a$$

for all elements a in R, then e is called a unit or identity element of R, and R is
called a unitary ring.
Example 1.11
The set of real numbers (R, +, ·) with the usual addition and multiplication is a commutative
unitary ring.

1.2.2 The Identity Matrix

Here we define the identity matrix precisely as the identity element in a ring R.

Definition 1.2.2
Let I be a square matrix in M_n(K). Then I is an identity matrix if and only if (here
we omit the "·" in the multiplication)

$$AI = IA = A, \tag{1.23}$$

for any square matrix A in M_n(K).

We can easily check, by using the definition of the product of two matrices
(Definition 1.1.11), that (1.23) holds if and only if the matrix I has the form

$$I = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}, \tag{1.24}$$

i.e., all entries are zero except the entries a_{ii} on the main diagonal, which are a_{ii} = 1.

Notation In the sequel, we will sometimes denote by I_n the identity matrix in M_n(K).

Example 1.12
The following are examples of identity matrices:

$$I = [1], \qquad I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Theorem 1.2.2
The set of square matrices (M_n(K), +, ·) with the binary operations (+) and (·)
introduced in Definitions 1.1.7 and 1.1.11, respectively, is a unitary noncommutative
ring.

Proof
We know from Theorem 1.1.5 that (M_n(K), +) is an Abelian group. Since (1.19)–(1.22)
are also satisfied, (M_n(K), +, ·) is a ring. It is clear that this ring is noncommutative, since
the multiplication of matrices is noncommutative. This ring also has an identity element, the
identity matrix defined above. □

1.2.3 Inverse Matrices

As we have seen before, the solution of the linear equation (1.1) is given by (1.2). The
constant a in (1.1) can be seen as a square matrix in M_1(K) and a⁻¹ is the inverse matrix
of a in M_1(K). So, the solution in (1.2) is defined only if a⁻¹ exists. Thus, the natural
question now is whether we can generalize this idea to any square matrix in M_n(K)
with n ≥ 1. In other words, can we write a solution of system (1.12) in the case m = n
in the form

$$X = A^{-1}b, \tag{1.25}$$

where A⁻¹ is the inverse matrix of A, analogously to (1.2)? To answer this question,
we first need to define A⁻¹. For n = 1 and a ≠ 0 we have a⁻¹ = 1/a, which satisfies

$$aa^{-1} = a^{-1}a = 1; \tag{1.26}$$

thus a⁻¹ exists if and only if

$$a \ne 0. \tag{1.27}$$

So, as we indicated above, if we think of the constants in (1.26) as square matrices
in M_1(K) and of 1 as the identity matrix in M_1(K), then we can define the inverse of the
matrix a as a new matrix a⁻¹ in M_1(K) satisfying (1.26). Now it is quite obvious how
to extend this definition to matrices in M_n(K), n ≥ 1, and introduce the inverse of the
matrix A as follows.

Definition 1.2.3 (Inverse of a Square Matrix)

Let A and B be two square matrices in M_n(K). Then B is the inverse of A if and
only if

$$AB = BA = I. \tag{1.28}$$
Notation The inverse of the matrix A is denoted by A⁻¹. Thus, (1.28) reads

$$AA^{-1} = A^{-1}A = I. \tag{1.29}$$

Now, using (1.29) and multiplying the equation

$$AX = b \tag{1.30}$$

by A⁻¹, we get (formally)

$$A^{-1}(AX) = A^{-1}b.$$

Since the multiplication of matrices is associative, we obtain from the above that

$$A^{-1}(AX) = (A^{-1}A)X = IX = X.$$

Therefore,

$$X = A^{-1}b. \tag{1.31}$$

Consequently, to find a solution to the system of linear equations (1.8) (with m = n) it is
enough to find the inverse matrix A⁻¹. Here the following questions arise naturally:
▬ Does the inverse A⁻¹ of a square matrix A always exist? If yes, we say that A is
invertible or nonsingular.
▬ In practice it is really important to know whether the solution of the system of linear
equations is unique or not. Formula (1.31) indicates that the solution is unique if and
only if A⁻¹ is unique. So, is the inverse A⁻¹ unique?
▬ If the inverse A⁻¹ exists, then how can we find it?

We can immediately answer the first question in the negative. As a simple example,
for a matrix a in M_1(K), a⁻¹ exists if and only if a ≠ 0. Also, the zero matrix 0 in
M_n(K) has no inverse, since for any matrix A in M_n(K),

$$0A = A0 = 0 \ne I,$$

which violates the definition of the inverse A⁻¹. So, we need a criterion to determine
which square matrices in M_n(K) have inverses in M_n(K). This will be investigated in
the coming sections.
For the second question, we have the following theorem.

Theorem 1.2.3 (Uniqueness of the Inverse Matrix)


Let A be a square matrix in M_n(K) and assume that A⁻¹ exists. Then A⁻¹ is unique.
Proof
To prove this statement, we assume that there are two inverses B and C of the matrix A and
show that B = C. Now, since both B and C are inverses of A, they both satisfy (1.28). That is,

$$AB = BA = I \quad \text{and} \quad AC = CA = I. \tag{1.32}$$

Now, since the multiplication of matrices is associative (Theorem 1.1.8), we have that

$$A(BC) = (AB)C = IC = C. \tag{1.33}$$

On the other hand, the first identity in (1.32) yields

$$A(BC) = (AB)C = (BA)C = B(AC) = BI = B. \tag{1.34}$$

Relations (1.33) and (1.34) show that

$$B = C,$$

which ends the proof of Theorem 1.2.3. □

Finally, answering the third question is a really challenging problem, especially if the
matrix A is of large size. To understand why this is the case, let us consider a 2 × 2 matrix
and compare the amount of work required with that in the case of a 1 × 1 matrix.
So, we consider the matrix A in M_2(K) given by

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

and try to find the inverse A⁻¹. Actually, there are at least two obvious ways to proceed.
First, we may just assume that A⁻¹ exists as a matrix in M_2(K) and apply
(1.29) to find the entries of A⁻¹. The second method is based on the strong connection
between the inverse of A and the solution of the linear system (1.30). That is, if we know
A⁻¹, then we know the solution by formula (1.31). Conversely, if the solution of system
(1.30) exists, then it should be writable in the form (1.31). Consequently, our strategy is
to solve the 2 × 2 system where A is the matrix of coefficients and then, once we have found
the solution, try to write it in the form (1.31) and thus obtain A⁻¹.
We consider system (1.3), that is,

$$\begin{cases} ax + by = p, \\ cx + dy = q. \end{cases} \tag{1.35}$$

As we have seen in (1.5), we can write

$$x = \frac{pd - bq}{ad - bc}.$$
Plugging this expression into the first equation of (1.35), we get

$$y = -\frac{a}{b}x + \frac{p}{b} = \frac{aq - cp}{ad - bc}. \tag{1.36}$$

Therefore, the solution of (1.35) is

$$X = \frac{1}{ad - bc} \begin{pmatrix} pd - bq \\ aq - cp \end{pmatrix},$$

which can be written as

$$X = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \begin{pmatrix} p \\ q \end{pmatrix}. \tag{1.37}$$

Of course, formula (1.37) makes sense only if

$$ad - bc \ne 0. \tag{1.38}$$

We summarize the above discussion in the following theorem.

Theorem 1.2.4
If ad − bc ≠ 0, then the inverse of the square matrix

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

is given by

$$A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \tag{1.39}$$

We can plainly see how the level of difficulty of finding the inverse changes from
the 1 × 1 matrix to the 2 × 2 matrix. For the 1 × 1 matrix, the inverse exists if and only if
(1.27) is satisfied, and then a⁻¹ = 1/a. On the other hand, the inverse of the 2 × 2 matrix
exists if and only if (1.38) is verified, and then the inverse is given by (1.39). So, as we
have seen above, the difficulty of finding A⁻¹ is increasing. We will see in the coming
sections other methods for finding the inverse A⁻¹ of a matrix A in M_n(K) with n ≥ 3.

Example 1.13
Find the inverses of the following matrices:

$$A = \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}.$$

Solution
For the matrix A, since ad − bc = 3 ≠ 0, A⁻¹ exists and, applying (1.39), we get

$$A^{-1} = \frac{1}{3} \begin{bmatrix} 3 & 0 \\ -2 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -2/3 & 1/3 \end{bmatrix}.$$

For the matrix B, since ad − bc = 0, B⁻¹ does not exist.
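Formula (1.39) is also easy to code; the sketch below (added by us, not part of the text) returns None when ad − bc = 0, reproducing the two outcomes of Example 1.13:

```python
def inverse_2x2(M):
    """Inverse of the 2x2 matrix M via formula (1.39); None if ad - bc = 0."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        return None
    return [[d / det, -b / det], [-c / det, a / det]]

print(inverse_2x2([[1, 0], [2, 3]]))   # [[1.0, 0.0], [-0.666..., 0.333...]], i.e. the A^{-1} above
print(inverse_2x2([[1, 2], [1, 2]]))   # None: B is not invertible
```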

We have defined the product of two matrices before, so the question now is how the
inverse is related to the product. For matrices in M_1(K), we have

$$(ab)^{-1} = a^{-1}b^{-1} = b^{-1}a^{-1}.$$

But keep in mind that this is true because the product of matrices in M_1(K) is commutative,
while we already know that the product of matrices in M_n(K), n > 1, is not commutative
in general. So, only one of the above equalities remains true for matrices, and as we will see,
it is the second one.

Theorem 1.2.5 (Inverse of the Product of Two Matrices)


Let A and B be two matrices in M_n(K) and assume that their inverses exist. Then

$$(AB)^{-1} = B^{-1}A^{-1}. \tag{1.40}$$

Proof
Since the multiplication of matrices is associative, we can write

$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = I.$$

Similarly,

$$(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}IB = I.$$

Consequently, by (1.29), B⁻¹A⁻¹ is an inverse of AB, and since the inverse of a matrix is
unique (Theorem 1.2.3), B⁻¹A⁻¹ is the only inverse of AB. □

ⓘ Remark 1.2.6 Using induction, we may easily generalize (1.40) to any finite number
of matrices:

$$(A_1 A_2 \cdots A_{\ell-1} A_\ell)^{-1} = A_\ell^{-1} A_{\ell-1}^{-1} \cdots A_2^{-1} A_1^{-1},$$

where A_i, 1 ≤ i ≤ ℓ, are matrices in M_n(K).


Example 1.14
Given the two matrices

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix},$$

find (AB)⁻¹ by two methods.

Solution
In the first method, we compute the product AB and then use Theorem 1.2.4 to find the
inverse (AB)⁻¹. We have

$$AB = \begin{bmatrix} 1 & -2 \\ 0 & 2 \end{bmatrix},$$

and so

$$(AB)^{-1} = \frac{1}{2} \begin{bmatrix} 2 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1/2 \end{bmatrix}.$$

In the second method, we use Theorem 1.2.5. Thus, using (1.39), we have

$$A^{-1} = \frac{1}{2} \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1/2 \end{bmatrix}$$

and

$$B^{-1} = \frac{1}{1} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}.$$

Therefore,

$$(AB)^{-1} = B^{-1}A^{-1} = \begin{bmatrix} 1 & 1 \\ 0 & 1/2 \end{bmatrix}.$$
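The same computation can be checked mechanically; the following sketch (ours) applies the 2 × 2 inverse formula and the product rule of Theorem 1.2.5 to the matrices of Example 1.14:

```python
def mat_mul(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def inv2(M):
    """Inverse of a 2x2 matrix via (1.39); assumes ad - bc != 0."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 0], [0, 2]]
B = [[1, -2], [0, 1]]
print(inv2(mat_mul(A, B)))          # [[1.0, 1.0], [0.0, 0.5]]
print(mat_mul(inv2(B), inv2(A)))    # [[1.0, 1.0], [0.0, 0.5]] -- the same matrix, so (AB)^{-1} = B^{-1}A^{-1}
```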

Theorem 1.2.7
Let A be an invertible matrix in M_n(K) and let λ ≠ 0 be a scalar in K. Then (A⁻¹)⁻¹
and (λA)⁻¹ exist and we have

$$(A^{-1})^{-1} = A \quad \text{and} \quad (\lambda A)^{-1} = \frac{1}{\lambda} A^{-1}. \tag{1.41}$$
Proof
The first property in (1.41) is trivial. To prove the second one, we have

$$(\lambda A)\Bigl(\frac{1}{\lambda}A^{-1}\Bigr) = \lambda \cdot \frac{1}{\lambda}\, AA^{-1} = I.$$

Similarly,

$$\Bigl(\frac{1}{\lambda}A^{-1}\Bigr)(\lambda A) = I.$$

The uniqueness of the inverse yields the desired result. □

Now we can collect the above properties of invertible matrices in M_n(K) and give them
an algebraic structure.

Theorem 1.2.8 (The General Linear Group GL(n, K))

The set of invertible square matrices in M_n(K) is a group with respect to multiplication.
This group is non-Abelian for n ≥ 2. We denote this group by GL(n, K).

Proof
To prove the theorem, we simply verify the requirements of Definition 1.1.10. First, it is
clear that GL(n, K) is not empty, since the identity matrix lies in this set. In addition, it is
clear from Theorem 1.2.5 that if A and B are two elements of GL(n, K), then AB is also an
element of GL(n, K). Since the multiplication of matrices is associative in M_n(K), it is also
associative in GL(n, K).
Furthermore, it is clear that I is the identity element in GL(n, K). Next, for any A in
GL(n, K) there exists A⁻¹ in GL(n, K) satisfying (1.29). Thus, (GL(n, K), ·) is a group. It is
non-Abelian since, as we know, multiplication is noncommutative: for example, take

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 1 \\ 0 & 3 \end{bmatrix}.$$

Then both A and B belong to GL(2, K), but

$$AB = \begin{bmatrix} 1 & 1 \\ 0 & 6 \end{bmatrix} \quad \text{whereas} \quad BA = \begin{bmatrix} 1 & 2 \\ 0 & 6 \end{bmatrix}. \qquad □$$

In the next theorem we exhibit the relationship between invertible matrices and
homogeneous systems of linear equations.

Theorem 1.2.9
Let A be a square matrix in M_n(K). Then the following two properties are
equivalent:
1. the matrix A is invertible;
2. the homogeneous system associated to the matrix A has the trivial solution X = 0
as its unique solution.

Proof
We first need to show that (1) implies (2). So, assume that A is invertible. Then the
homogeneous system associated to A has the solution

$$X = A^{-1}b, \tag{1.42}$$

where b = 0 is the zero vector. Since the inverse of A is unique, the solution X is uniquely
defined by (1.42), and so X = 0 is the unique solution of the homogeneous system. We leave
it to the reader to show that (2) implies (1), which can be done in several ways. □

As we have stated before, one of the main goals of matrix algebra is to provide
necessary conditions for the invertibility of a square matrix A and ways to calculate its
inverse A⁻¹. Hence, we want to characterize at least some particular sets of square
matrices for which we can easily determine whether their members are invertible or not
and compute the inverses if possible. Among these sets are the set of diagonal matrices
and the set of triangular matrices.
First, we exclude some classes of square matrices that have no inverse.

Theorem 1.2.10
A square matrix that has either a zero row or a zero column is not invertible.

Proof
Let A be a square matrix in M_n(K) with a zero row. Then, for any matrix B in M_n(K),
the corresponding row of the product AB is also zero. So, AB cannot be the identity matrix.
Similarly, if A has a zero column, then the product BA has a zero column, so BA cannot be
the identity matrix. □

Example 1.15
The matrices

$$\begin{bmatrix} 0 & 0 \\ 3 & 2 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & 3 \\ 5 & 0 & 2 \end{bmatrix}$$

are not invertible.


1.2.4 Diagonal Matrices

Here we introduce the set of diagonal matrices that plays an important role in the theory
of matrices.

Definition 1.2.4 (Diagonal Matrix)


Let D = (d_{ij}), 1 ≤ i ≤ n, 1 ≤ j ≤ n, be a square matrix in M_n(K). Then D is a
diagonal matrix if all the entries outside the main diagonal are zero. That is,

$$D = \begin{bmatrix}
d_1 & 0 & 0 & \cdots & 0 \\
0 & d_2 & 0 & \cdots & 0 \\
0 & 0 & d_3 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & d_n
\end{bmatrix}, \tag{1.43}$$

with d_i = d_{ii}, 1 ≤ i ≤ n. We may also write D = diag(d_1, d_2, ..., d_n).

Example 1.16
The following matrices are diagonal:
$$A = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad
C = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \sqrt{2} \end{bmatrix}, \quad
I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
0 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

The next theorem provides an easy test which tells us whether a diagonal matrix is
invertible or not, and gives us the inverse right away.

Theorem 1.2.11 (Inverse of a Diagonal Matrix)

Let D be a diagonal matrix in M_n(K). Then D is invertible if and only if all its entries
(i.e., the entries of its main diagonal) are nonzero. In this case the inverse of D is given
by

$$D^{-1} = \begin{bmatrix}
1/d_1 & 0 & 0 & \cdots & 0 \\
0 & 1/d_2 & 0 & \cdots & 0 \\
0 & 0 & 1/d_3 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1/d_n
\end{bmatrix} = \operatorname{diag}\Bigl(\frac{1}{d_1}, \frac{1}{d_2}, \ldots, \frac{1}{d_n}\Bigr).$$
Proof
We first suppose that di ¤ 0. Then, one can easily check that the matrix B defined by
2 3
1=d1 0 0  0
6 7
6 0 1=d2 0    0 7
6 7
6 0 1=d3    0 7
BD6 0 7
6 : :: 7
6 : : 7
4 : 5
0 0 0    1=dn

satisfies

DB D BD D I;

which means that B is an inverse of D and since the inverse is unique (Theorem 1.2.3),

B D D1 :

Also, it is clear that D1 exists if and only if di ¤ 0; 1  i  n:


Now assume that D is invertible. We need to show that di ¤ 0; 1  i  n. Indeed, there
exists a matrix K D .kij /; 1  i; j  n such that

DK D KD D I: (1.44)

We have (see Exercise 1.2)


2 3
d1 k11 d1 k12 d1 k13 : : : d1 k1n
6d k d k d k : : : d2 k2n 7
6 2 21 2 22 2 23 7
DK D 6
6 :: :: :: :: :: 77
4 : : : : : 5
dn kn1 dn kn2 dn kn3 : : : dn knn

and
2 3
d1 k11 d2 k12 d3 k13 : : : dn k1n
6d k d k d k : : : dn k2n 7
6 1 21 2 22 3 23 7
KD D 6
6 :: :: :: :: :: 77:
4 : : : : : 5
d1 kn1 d2 kn2 d3 kn3 : : : dn knn

Hence, (1.44) gives

di kii D 1; 1  i  n:

This shows that di ¤ 0; 1  i  n. t


u
Example 1.17
Find the inverses of the following matrices:
2 3 2 3
40 0 60 0
6 7 6 7
A D 40 3 0 5; B D 40 0 0 5:
0 0 2 0 0 1

Solution
For the matrix A, since all the entries of its main diagonal are nonzero, A is invertible and
2 3
1=4 0 0
6 7
A1 D 4 0 1=3 0 5 :
0 0 1=2

On the other hand, since one entry of the main diagonal of B is zero, B is not
invertible, i.e., B^{-1} does not exist. J
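
The conclusion of Theorem 1.2.11 is easy to check numerically. In the following Python/NumPy sketch (an illustrative addition; the diagonal entries of Example 1.17 are taken with positive signs for concreteness), the inverse of A is the diagonal matrix of reciprocals, while B, which has a zero on its main diagonal, is singular.

import numpy as np

A = np.diag([4.0, 3.0, 2.0])
print(np.allclose(np.linalg.inv(A), np.diag([1/4, 1/3, 1/2])))   # True

B = np.diag([6.0, 0.0, 1.0])
try:
    np.linalg.inv(B)
except np.linalg.LinAlgError:
    print("B is not invertible")             # reached, since one diagonal entry is 0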

Among the interesting applications of matrix algebra is the solution of systems of


differential equations. This is essentially based on the computation of the exponential of
a square matrix A, which is defined as the infinite sum

A2 A3 Ak
eA D I C A C C C  C C  (1.45)
2Š 3Š kŠ

for a matrix A in Mn .K/. So, to compute eA , we need to compute Ak for all k  1,


and this is a challenging problem even with advanced computers, especially if the size
of the matrix is large. In addition, since the sum in (1.45) is infinite, we need some
advanced mathematical tools to tackle such a problem (see Sect. 7.5.2). So, the problem
of finding (1.45) reduces to the computation of A^k.
One of the useful properties of a diagonal matrix D is that we can easily compute Dk for
any k  1, as shown in the following theorem.

Theorem 1.2.12 (Power of a Diagonal Matrix)


Let D be a diagonal matrix (defined as in (1.43)). Then for any positive integer k, we
have
2 3
dk 0 0  0
6 1 7
60 07
d2k 0   
6 7
6 07
D D60
k 0 d3k    7 D diag.d1k ; d2k ; : : : ; dnk /: (1.46)
6 : :: :: 7
6 : : 7
4 : : 5
0 0 0    dnk
Proof
The proof of (1.46) is simple and can be done by induction. First, it is clear that (1.46) holds
for k D 1. Now, we assume that (1.46) holds for k and show that it also holds for k C 1. That
is, we assume that
2 3
dk 0 0  0
6 1 7
60 07
d2k 0   
6 7
6 07
D D60
k 0 d3k    7 (1.47)
6 : :: :: 7
6 : : 7
4 : : 5
0 0 0    dn k

and show that


2 3
dkC1 0 0  0
6 1 7
6 0 d2 kC1
0  0 7
6 7
6 7
DkC1 D6 0 0 d3kC1  0 7: (1.48)
6 : :: :: 7
6 : : 7
4 : : 5
0 0 0    dnkC1

It is straightforward to see that (1.48) can be obtained by simply computing

DkC1 D Dk  D

and using (1.43) and (1.47). t


u

Example 1.18
Consider the matrix
2 3
1 0 0
6 7
A D 4 0 2 0 5:
p
0 0 2

Find A6 . 

Solution
Since A is diagonal, Theorem 1.2.12 shows that
2 3 2 3
.1/6 0 0 1 0 0
6 7 6 7
A6 D 4 0 .2/6 0 5 D 4 0 64 0 5 :
p 6
0 0 . 2/ 0 0 8

J
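
The value of A^6 in Example 1.18 can be confirmed with a short numerical check (an added sketch; the first two diagonal entries are taken here as -1 and 2, but any choice of signs gives the same sixth power since the exponent is even).

import numpy as np

A = np.diag([-1.0, 2.0, np.sqrt(2.0)])
A6 = np.linalg.matrix_power(A, 6)
print(np.allclose(A6, np.diag([1.0, 64.0, 8.0])))   # True: diag((-1)^6, 2^6, (sqrt(2))^6)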
Example 1.19
Consider the matrix
2 3
1 0 0
6 7
A D 4 0 1 0 5 :
0 0 3

Show that A1 exists and find .A1 /5 . 

Solution
Since A is diagonal, and the main diagonal does not contain zero, it follows (see Theo-
rem 1.2.11) that A1 exists and can be computed easily as
2 3
1 0 0
6 7
A1 D 4 0 1 0 5 :
0 0 1=3

Also, since A1 is diagonal we have (see Theorem 1.2.12)


2 3 2 3
.1/5 0 0 1 0 0
6 7 6 7
.A1 /5 D 4 0 .1/5 0 5 D 4 0 1 0 5 :
5
0 0 .1=3/ 0 0 1=243

Example 1.20
Find an invertible diagonal matrix A that satisfies
2 3
16 0 0
6 7
A2 D 4 0 9 05:
0 01

Solution
Take A in the form
2 3
d1 0 0
6 7
A D 4 0 d2 0 5 ;
0 0 d3
with di ¤ 0; i D 1; 2; 3. The inverse of A is
2 3
1=d1 0 0
6 7
A1 D 4 0 1=d2 0 5 ;
0 0 1=d3

Therefore,
2 3 2 3
.1=d1 /2 0 0 16 0 0
6 7 6 7
A2 D .A1 /2 D 4 0 .1=d2 /2 0 5 D 4 0 9 05:
2
0 0 .1=d3 / 0 01

Whence
1 1 1
D 4; D 3; D 1:
d1 d2 d3
This yields
1 1
d1 D ; d2 D ; d3 D 1:
4 3
Therefore, the matrix A is given by
2 3
1=4 0 0
6 7
A D 4 0 1=3 0 5 :
0 0 1

1.2.5 Triangular Matrices

We have introduced the diagonal matrices in Sect. 1.2.4 and showed that these matrices
have very important properties. In particular, we have shown that we can immediately
know if a diagonal matrix is invertible or not and if it is invertible, then we can easily
find its inverse. Now, since the diagonal matrices form a very narrow set in the class of
square matrices Mn .K/, it is quite natural to ask the following question: is there a larger
set of square matrices than the set of diagonal matrices which enjoy some properties of
the diagonal matrices? The answer is yes and this class of square matrices consists of
the so-called triangular matrices.

Definition 1.2.5 (Triangular Matrix)


A square matrix in which all the entries above the main diagonal are zero is called
lower triangular, and a square matrix in which all the entries below the main
diagonal are zero is called upper triangular. A matrix that is either upper triangular
or lower triangular is called triangular.
Example 1.21
The matrix
2 3
1 4 1 2
60 3 5 1 7
6 7
AD6 7
40 0 1 6 5
0 0 0 2

is upper triangular while the matrix


2 3
1 0 0 0
6 2 4 0 0 7
6 7
BD6 7
4 1 3 1 0 5
6 2 5 2

is lower triangular.
Obviously, every diagonal matrix is triangular. 

In the next theorem we characterize the invertible triangular matrices exactly as we


did for the diagonal matrices; however, we do not know the inverse immediately as in
the diagonal case. Thus, we have already lost some properties by expanding the set of
diagonal matrices.

Theorem 1.2.13
Let A be a triangular matrix in Mn .K/. Then, A is invertible if and only if all the entries
of the main diagonal are nonzero.

Proof
We prove the statement for upper triangular matrices. The proof for lower triangular matrices
is similar.
Let A be an upper triangular matrix in Mn .K/, then A has the form
2 3
a11 a12 a13  a1n
6 7
6 0 a22 a23  a2n 7
6 7
6 a3n 7
AD6 0 0 a33  7;
6 : :: :: :: :: 7
6 : : 7
4 : : : : 5
0 0 0    ann
that is aij D 0 for all i > j. The linear homogeneous system associated to the matrix A is

a11 x1 C a12 x2 C    C a1n xn D 0;


a22 x2 C    C a2n xn D 0;
::
: (1.49)
a.n1/.n1/ xn1 C a.n1/n xn D 0;
ann xn D 0:

It is clear that if ann ¤ 0, then the last equation in (1.49) has only one solution xn D 0.
Inserting this value into the equation just before the last one, we deduce that if a.n1/.n1/ ¤
0, then xn1 D 0 is the unique solution. If we apply the same procedure to all the equations
in (1.49), we deduce that if aii ¤ 0; 1  i  n, then the only solution of (1.49) is the trivial
solution X D 0. Consequently, applying Theorem 1.2.9, we conclude that A is invertible if
and only if aii ¤ 0; 1  i  n. t
u
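
As a numerical illustration of Theorem 1.2.13 (an added Python/NumPy sketch, not part of the proof, using an arbitrary upper triangular matrix), a triangular matrix with nonzero diagonal entries is invertible, while placing a single zero on the main diagonal destroys invertibility.

import numpy as np

U = np.array([[1.0, 4.0, -1.0],
              [0.0, 3.0,  5.0],
              [0.0, 0.0, -2.0]])             # nonzero diagonal, hence invertible
print(np.allclose(U @ np.linalg.inv(U), np.eye(3)))   # True

V = U.copy()
V[1, 1] = 0.0                                # put a zero on the main diagonal
try:
    np.linalg.inv(V)
except np.linalg.LinAlgError:
    print("V is not invertible")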

1.2.6 Trace

As we have seen, for diagonal and triangular matrices, the entries of the main diagonal of
those matrices are very important and by examining those entries we can immediately
identify the invertible matrices. Since the entries of the main diagonal are also
important in a general square matrix, can we gain something by performing the usual
algebraic operations on those entries? For example, for diagonal and triangular
matrices, if the product of all the entries of the main diagonal is not zero, then the
matrix is invertible. Now, what about the sum of the entries of the main diagonal in a
square matrix, does it give us anything? The answer is affirmative, and as we will see
later, it turns out to be very useful. We call this sum the trace of the square matrix.

Definition 1.2.6 (Trace)


Let A D .aij /; 1  i  n; 1  j  n be a square matrix in Mn .K/. The trace of A,
denoted by tr.A/, is defined to be the sum of the entries of the main diagonal of A:

X
n
tr.A/ D aii : (1.50)
iD1

Example 1.22
Consider the matrices
2 3
1 0 2 " #
6 7 b11 b12
A D 4 3 4 0 5; BD :
b21 b22
1 5 2
Then we have

tr.A/ D 1 C 4  2 D 3; tr.B/ D b11 C b22 :

In the next theorem we summarize some properties of the trace.

Theorem 1.2.14
Let A and B be two square matrices in Mn .K/ and k be a scalar. Then:
1. tr.A C B/ D tr.A/ C tr.B/.
2. tr.AT / D tr.A/.
3. tr.kA/ D k tr.A/.
4. tr.AB/ D tr.BA/.

In fact the last property holds for A in Mmn .K/ and B in Mnm .K/. Here AT denotes
the transpose of A (see Definition 1.4.1).

Proof
Properties (1)–(3) are trivial and follow directly from the definition. So, we only need to
show property (4). We have, by the definition of the multiplication of matrices,

X
n
AB D .cik / with cik D aij bjk ; 1  i  m; 1  j  n; 1  k  m:
jD1

Hence,

X
m m X
X n 
tr.AB/ D cii D aij bji
iD1 iD1 jD1

m X
X n 
D bji aij
iD1 jD1

n X
X m 
D bji aij
jD1 iD1

D tr.BA/:

t
u
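
The four properties of Theorem 1.2.14 are easy to test numerically. The sketch below (an illustrative Python/NumPy addition with randomly chosen matrices) checks them, including property (4) for non-square factors of compatible sizes.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
k = 2.5

print(np.isclose(np.trace(A + B), np.trace(A) + np.trace(B)))    # property (1)
print(np.isclose(np.trace(A.T), np.trace(A)))                    # property (2)
print(np.isclose(np.trace(k * A), k * np.trace(A)))              # property (3)

M = rng.standard_normal((2, 5))              # non-square case of property (4)
N = rng.standard_normal((5, 2))
print(np.isclose(np.trace(M @ N), np.trace(N @ M)))              # True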
Example 1.23
Use Theorem 1.2.14 to show that we cannot find two square matrices A and B in Mn .R/
such that

AB  BA D I; (1.51)

where I is the identity matrix in Mn .R/. 

Solution
We assume that (1.51) holds and show that this leads to a contradiction. Indeed, if (1.51)
holds, then by Theorem 1.2.14 we have

tr.AB/ D tr.BA/:

Whence

tr.AB/  tr.BA/ D tr.AB  BA/ D 0:

On the other hand

tr.I/ D n:

This is a contradiction. Hence, there are no matrices A and B such that (1.51) holds.
J

1.3 Solving Linear Systems with Elementary Row Operations

As we have seen above, in order to find the solution of a linear system (of n equations
and n unknowns) it is enough to compute the inverse of its associated n  n matrix A.
Moreover, since it is very simple to find the inverse of a diagonal matrix, it is quite
simple to solve the systems associated to them. We know from elementary algebra that
if we add an equation in the system to another one and then replace the original equation
by the sum of the two, then the solution does not change. For example, in system (1.3),
if we replace the second equation by the sum of the two equations, we obtain
(
ax C by D p;
(1.52)
.a C c/x C .b C d/y D p C q:

Thus, if we assume that ad − bc ≠ 0, then the solution of (1.52) is the same as the solution of
(1.3). In matrix language, this operation is equivalent to replacing the second row in the
matrix A
" #
ab
AD
cd
to get
" #
a b
aCc bCd

and replacing the vector


" #
p
bD
q

by
" #
p
:
pCq

For simplicity, we may collect these operations in one matrix and transform the matrix
" #
ab p
BD (1.53)
cd q

into the matrix


" #
a b p
:
aCc bCd pCq

The matrix B in (1.53) is called the augmented matrix associated to the system (1.3).
Similarly, the same thing is true if we replace a row r in the augmented matrix by the
product kr, where k is a scalar.

Definition 1.3.1 (Augmented Matrix)


The augmented matrix associated to system (1.8) is the matrix in Mm.nC1/ .K/
defined as
2 3
a11 a12 a13 ::: a1n b1
6a ::: b2 7
6 21 a22 a23 a2n 7
6 : :: 7
6 : :: :: :: :: 7: (1.54)
4 : : : : : : 5
am1 am2 am3 : : : amn bm

The following elementary row operations will not change the solution of (1.8):
▬ Multiply a row through by a nonzero constant.
▬ Interchange two rows.
▬ Add a constant multiple of a row to another row.
1.3.1 The Gauss–Jordan Elimination Method

This method is simply based on some row operations that lead to the simplest diagonal
matrix (the identity matrix if possible) for which the inverse matrix can be easily
computed if it exists. To apply the method, and for simplicity, we use the augmented
matrix described in Definition 1.3.1. Essentially, the idea is to reduce the augmented
matrix ŒAjb, where A is a square matrix in Mn .K/ and b is a vector in Mn1 .K/, to
the form ŒDjc where D is a diagonal matrix in Mn .K/ and c is a vector in Mn1 .K/,
or simply to ŒIjd, where I is the identity matrix in Mn .K/ and d is in Mn1 .K/. In this
case the solution of the system

AX D b (1.55)

will be simply X D d. As an example, consider the system


8
ˆ
< 2x1 C x2 C x3 D 5;
8x2  2x3 D 12; (1.56)
:̂ 8x C 3x D 14;
2 3

which can be written in matrix form as

AX D b;

where
2 3 2 3 2 3
2 1 1 x1 5
6 7 6 7 6 7
A D 4 0 8 2 5 ; X D 4 x2 5 and b D 4 12 5 :
0 8 3 x3 14

To apply the row operation method, consider the augmented matrix


2 3
2 1 1 5
6 7
B D 4 0 8 2 12 5 :
0 8 3 14

So, we want to get zeros everywhere except on the main diagonal. Let us denote in each
step of the row operation the obtained first, second, and third rows by r1 ; r2 , and r3
respectively. So, first in the matrix B we replace r3 by r3 C r2 and obtain
2 3
2 1 1 5
6 7
4 0 8 2 12 5 : (1.57)
0 0 1 2
Next, in (1.57) we replace r1 by 8r1 C r2 and obtain
2 3
16 0 6 28
6 7
4 0 8 2 12 5 : (1.58)
0 0 1 2

Now, in (1.58) we replace r1 by r1  6r3 and obtain


2 3
16 0 0 16
6 7
4 0 8 2 12 5 ; (1.59)
0 0 1 2

and then in (1.59) we replace r2 by r2 + 2r3 , obtaining


2 3
16 0 0 16
6 7
4 0 8 0 8 5 : (1.60)
0 0 1 2

Finally, in (1.60) we replace r1 by (1/16) r1 and r2 by −(1/8) r2 , obtaining
2 3
100 1
6 7
40 1 0 15: (1.61)
001 2

Now, since the inverse of the identity matrix is itself, we deduce from (1.61) that
2 3
1
6 7
X D 415
2

is the solution of (1.56).
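
The same answer can be obtained directly with a linear solver. The sketch below (an added check, assuming the coefficients of (1.56) are read as 2x1 + x2 + x3 = 5, -8x2 - 2x3 = -12, and 8x2 + 3x3 = 14) confirms that X = (1, 1, 2).

import numpy as np

A = np.array([[2.0,  1.0,  1.0],
              [0.0, -8.0, -2.0],
              [0.0,  8.0,  3.0]])
b = np.array([5.0, -12.0, 14.0])
print(np.linalg.solve(A, b))                 # [1. 1. 2.]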

Finding the Matrix Inverse


A simple and important application of the Gauss–Jordan method is to compute the
inverse A1 of an invertible matrix A in Mn .K/. The idea is as follows: suppose that
we want to solve the system (1.55) with b the vector
2 3
1
6 7
607
b D e1 D 6 7
6 :: 7 :
4:5
0
We apply the Gauss–Jordan method to transform the augmented matrix ŒAje1  to the
matrix ŒIjb1 , where I is the identity matrix in Mn .K/ and b1 is the new resulting vector
in Mn1 .K/. Then the solution of (1.55) is X1 D b1 .
We can repeat the same process for all standard vectors e2 ; e3 ; : : : ; en given by
2 3
0
607
6 7
6:7
6:7
6:7
ei D 6 7 ;
617
6 7
6 :: 7
4:5
0

i.e., all components are zero except the ith component which is 1. In this way, we get the
augmented matrices ŒIjbi  and the corresponding solutions Xi D bi . For each vector ei
the steps are the same: apply the Gauss–Jordan method to the augmented matrix ŒAjei 
to get the new augmented matrix ŒIjbi . Hence, we can do all the steps simultaneously
and transform the matrix

ŒAje1 ; e2 ; : : : ; en 

into the matrix

ŒIjb1 ; b2 ; : : : ; bn :

Now, since e1 ; e2 ; : : : ; en are the column vectors of the identity matrix I, if we denote
by B the matrix which has b1 ; b2 ; : : : ; bn as column vectors then the above procedure is
equivalent to transform the matrix

ŒAjI

to the new matrix

ŒIjB:

It is readily verified that B D A1 . Indeed, since Xi D bi we have

AB D AŒb1 ; b2 ; : : : ; bn  D ŒAb1 ; Ab2 ; : : : ; Abn 


D ŒAX1 ; AX2 ; : : : ; AXn 
D Œe1 ; e2 ; : : : ; en 
D I:
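
To make the procedure concrete, here is a short, self-contained Gauss–Jordan routine (an illustrative Python/NumPy sketch, not code from the book): it carries the augmented matrix [A | I] to [I | B] using the three elementary row operations, choosing at each stage a row with a nonzero pivot, and returns B = A^{-1} or reports that A is not invertible. It is applied here to the matrix of Example 1.24 below.

import numpy as np

def gauss_jordan_inverse(A, tol=1e-12):
    # Reduce the augmented matrix [A | I] to [I | B]; then B = A^{-1}.
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])
    for j in range(n):
        p = j + np.argmax(np.abs(M[j:, j]))   # pick a row with a usable pivot
        if abs(M[p, j]) < tol:
            raise ValueError("matrix is not invertible")
        M[[j, p]] = M[[p, j]]                 # interchange two rows
        M[j] /= M[j, j]                       # multiply a row by a nonzero constant
        for i in range(n):
            if i != j:
                M[i] -= M[i, j] * M[j]        # add a multiple of one row to another
    return M[:, n:]

A = np.array([[5.0, 8.0, 4.0],
              [2.0, 3.0, 2.0],
              [1.0, 2.0, 1.0]])               # the matrix of Example 1.24
B = gauss_jordan_inverse(A)
print(np.allclose(A @ B, np.eye(3)))          # True
print(np.allclose(B, np.linalg.inv(A)))       # True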
Example 1.24
Use the Gauss–Jordan method to find the inverse of the matrix
2 3
584
6 7
A D 42 3 25:
121

Solution
We apply the Gauss–Jordan method to find A1 . Consider the matrix
2 3
584 100
6 7
42 3 2 0 1 05:
121 001

We apply elementary row operations and in each step, we denote by r1 ; r2 , and r3 the rows of
the new matrix. First, we replace r2 by r2  2r3 and get
2 3
5 8 4 10 0
6 7
4 0 1 0 0 1 2 5 :
1 2 1 00 1

Next, we replace r3 by 5r3  r1 and get


2 3
5 8 4 1 0 0
6 7
4 0 1 0 0 1 2 5 :
0 2 1 1 0 5

Continuing, we replace r3 by r3 C 2r2 to obtain


2 3
5 8 4 1 0 0
6 7
4 0 1 0 0 1 2 5 ;
0 0 1 1 2 1

then replace r1 by r1 C 8r2 to obtain


2 3
5 0 4 1 8 16
6 7
4 0 1 0 0 1 2 5 :
0 0 1 1 2 1
Furthermore, we replace r1 by r1  4r3 to get
2 3
5 0 0 5 0 20
6 7
4 0 1 0 0 1 2 5 :
0 0 1 1 2 1

Finally, replacing r1 by (1/5) r1 and r2 by −r2 , we get


2 3
1 0 0 1 0 4
6 7
4 0 1 0 0 1 2 5 :
0 0 1 1 2 1

Consequently,
2 3
1 0 4
6 7
A1 D 4 0 1 2 5 :
1 2 1

Example 1.25
Consider the matrix
2 3
100
6 7
A D 45 4 05:
101

Show that A1 exists and use elementary row operations (Gauss–Jordan method) to find A1 .


Solution
Since A is a lower triangular matrix and the entries of its main diagonal are nonzero, the
inverse exists (Theorem 1.2.13).
To find A1 , use elementary row operation to transform the matrix

ŒAjI

into a matrix of the form

ŒIjB:

If we achieve this, then A1 D B.


So, we consider the matrix
2 3
100 100
6 7
45 4 0 0 1 05: (1.62)
101 001

Let r1 ; r2 , and r3 be the rows of all the matrices obtained by means of row operations. We
replace in (1.62) r2 by r2  5r1 to get
2 3
100 1 00
6 7
4 0 4 0 5 1 0 5 ; (1.63)
101 0 01

then replace r3 in (1.63) by r3  r1 to get


2 3
100 1 00
6 7
4 0 4 0 5 1 0 5 ; (1.64)
0 0 1 1 0 1

and finally replace r2 by (1/4) r2 in (1.64), obtaining


2 3
100 1 0 0
6 7
4 0 1 0 5=4 1=4 0 5 : (1.65)
0 0 1 1 0 1

Consequently,
2 3
1 0 0
6 7
A1 D 4 5=4 1=4 0 5 :
1 0 1

Example 1.26
Find the inverse of the matrix
2 3
0 0 0 k1
6 7
60 0 k2 07
AD6 7
40 k3 0 05
k4 0 0 0

where k1 ; k2 ; k3 , and k4 are all different from zero. 


Solution
We proceed as above and write
2 3
0 0 0 k1 1 0 0 0
6 7
60 0 k2 0 0 1 0 07
6 7: (1.66)
40 k3 0 0 0 0 1 05
k4 0 0 0 0 0 0 1

We may exchange the rows as follows: r1 and r4 , and then r3 and r2 , to obtain
2 3
k4 0 0 0 0 0 0 1
6 7
60 k3 0 0 0 0 1 07
6 7: (1.67)
40 0 k2 0 0 1 0 05
0 0 0 k1 1 0 0 0

Now, in (1.67) we replace r1 by (1/k4) r1 , r2 by (1/k3) r2 , r3 by (1/k2) r3 , and r4 by (1/k1) r4 , obtaining
2 3
1 0 0 0 0 0 0 1=k4
6 7
60 1 0 0 0 0 1=k3 0 7
6 7:
40 0 1 0 0 1=k2 0 0 5
0 0 0 1 1=k1 0 0 0

Consequently, the inverse of A is given by


2 3
0 0 0 1=k4
6 0 0 1=k3 0 7
6 7
A1 D6 7:
4 0 1=k2 0 0 5
1=k1 0 0 0

Example 1.27
Let k ¤ 0 be a real number. Consider the matrix
2 3
k10
6 7
A D 40 k 15:
00k

Show that A1 exists and use elementary row operations to find A1 . 

Solution
Since A is an upper triangular matrix, the inverse exists if and only if all the entries of the
main diagonal are nonzero. So, since we took k ¤ 0, A1 exists.
To find it, we use elementary row operations to transform the matrix

ŒAjI

to a matrix of the form

ŒIjB:

Once we achieve this, A1 D B.


So, we consider the matrix
2 3
k10 100
6 7
40 k 1 0 1 05: (1.68)
00k 001

As above, let r1 ; r2 and r3 be the rows of all the matrices obtained from row operations. In
(1.68) we replace r2 by kr2  r3 to get
2 3
k 1 0 10 0
6 2 7
4 0 k 0 0 k 1 5 : (1.69)
0 0 k 00 1

Next, in (1.69), we replace r1 by k2 r1  r2 to obtain


2 3
k3 0 0 k2 k 1
6 7
4 0 k2 0 0 k 1 5 ; (1.70)
0 0 k 0 0 1

and then in (1.70), replace r1 by (1/k^3) r1 , r2 by (1/k^2) r2 , and r3 by (1/k) r3 , to find
2 3
1 0 0 1=k 1=k2 1=k3
6 7
4 0 1 0 0 1=k 1=k2 5 :
001 0 0 1=k

Consequently,
2 3
1=k 1=k2 1=k3
6 7
A1 D 4 0 1=k 1=k2 5 :
0 0 1=k

J
Example 1.28
Show that the matrix
2 3
1 6 4
6 7
A D 4 2 4 1 5
1 2 5

is not invertible. 

Solution
To show that A is not invertible, it suffices to do some row operations and find one row which
has only zeros.
So, we consider the matrix
2 3
1 6 4 100
6 7
4 2 4 1 0 1 0 5 : (1.71)
1 2 5 0 0 1

Let r1 ; r2 and r3 be as before. In (1.71) we replace r2 by r2 C 2r3 , obtaining


2 3
1 64 1 00
6 7
4 0 8 9 5 1 2 5 : (1.72)
1 2 5 0 0 1

Now, in (1.72) we replace r3 by r3 C r1 to get


2 3
164 1 0 0
6 7
4 0 8 9 5 1 2 5 ; (1.73)
0 8 9 6 1 1

and then in (1.73), we replace r3 by r3  r2 to finally get


2 3
164 1 0 0
6 7
4 0 8 9 5 1 2 5 : (1.74)
0 0 0 11 2 3

Since the third row in left-hand side of (1.74) contains only zeros, A is not invertible.
J

1.4 The Matrix Transpose and Symmetric Matrices

In this section, we introduce two important notions: the transpose of a matrix and
symmetric matrices.
1.4.1 Transpose of a Matrix

As we have seen before, we usually use two notations for a vector X:


3
2
x1
6 : 7
XD6 7
4 :: 5 or X D .x1 ; : : : ; xn /:
xn

Using the first notation, we can write the system (1.8) in the matrix from (1.12), with
A the matrix given in (1.13). The question now is: can we write the system (1.8) in a
matrix form using the second notation for the vector X? To do this, we recast (1.8) as
2 3
a11 a21 a31 ::: am1
6 7
6 a12 a22 a32 ::: am2 7
.x1 ; : : : ; xn / 6
6 :: :: :: :: :: 7
7 D .b1 ; : : : ; bm /:
4 : : : : : 5
a1n a2n a3n ::: amn

The n  m matrix appearing here is called the transpose of the matrix A.

Definition 1.4.1 (Transpose of a Matrix)


Let A be a matrix in Mmn .K/. We define the transpose of A, denoted by AT , as the
matrix in Mnm .K/ obtained by interchanging the rows and columns of A. That is,
the first row of AT is the first column of A, the second row of AT is the second
column of A, and so on.

Example 1.29
Let
" #
102
AD :
340

Then
2 3
1 3
6 7
AT D 4 0 45:
2 0

Now, we list some properties of the transpose of matrices.



Theorem 1.4.1 (Properties of Transpose)


Let k be a scalar. We assume that the sizes of the matrices A and B are such that the
operations below can be performed. Then:
1. .AT /T D A:
2. .A C B/T D AT C BT :
3. .kA/T D kAT :
4. .AB/T D BT AT :

Proof
The first three properties are direct consequences of the definition of the transposed matrix.
We only need to prove the last one. The proof of (4) can be done by a direct computation. So,
assume that A is a matrix in Mmn .K/ and B is a matrix in Mnr .K/. Then,

AB D C D .cik /; 1  i  m; 1  k  r;

with

X
n
cik D aij bjk :
jD1

Hence,

X
n 
CT D .AB/T D .cki /1im D bkj aji 1im D BT AT :
1kr 1kr
jD1

t
u
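
Property (4) of Theorem 1.4.1, and Theorem 1.4.2 below, can be checked quickly on random matrices (an added Python/NumPy sketch).

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
print(np.allclose((A @ B).T, B.T @ A.T))      # (AB)^T = B^T A^T

C = rng.standard_normal((3, 3)) + 5 * np.eye(3)    # comfortably invertible
print(np.allclose(np.linalg.inv(C.T), np.linalg.inv(C).T))   # (C^T)^{-1} = (C^{-1})^T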

We would also like to know how to find the inverse of the transpose A^T if we know
the inverse of A. The answer is given in the following theorem.

Theorem 1.4.2 (The Inverse of the Transpose of a Matrix)


Let A be a square matrix in Mn (K). If A is invertible, then A^T is invertible and

.AT /1 D .A1 /T :

Proof
We can establish the invertibility and obtain the formula at the same time, by showing that

AT .A1 /T D .A1 /T AT D I; (1.75)


and using the uniqueness of the inverse (Theorem 1.2.3) to conclude that .AT /1 D .A1 /T :
To show (1.75), we have, by (4) in Theorem 1.4.1,

AT .A1 /T D .A1 A/T D I T D I:

and

.A1 /T AT D .AA1 /T D I T D I:

Thus, (1.75) is verified and Theorem 1.4.2 is proved. t


u

1.4.2 Symmetric Matrices

In this section, we discuss an important class of square matrices. We have introduced


above the matrix transpose AT associated to a matrix A and its properties. The interesting
question now is the following: can we gain something if the matrix A is a square matrix
and its transpose AT turns out to coincide with A? In this case the main diagonal will
not change and all the other entries of the matrix A are symmetric with respect to the
main diagonal. Accordingly, we call A a symmetric matrix and in fact yes, such matrices
enjoy many interesting properties.

Definition 1.4.2 (Symmetric Matrix)


Let A be a square matrix in Mn .K/. The matrix A is said to be symmetric if

AT D A: (1.76)

Example 1.30
The following matrices are symmetric:
2 3 2 3
" # 1 4 5 d1 0 0
12 6 7 6 7
; 4 4 3 0 5 ; 4 0 d2 0 5 :
24
5 0 2 0 0 d3

In the next theorem we exhibit some important symmetric matrices.

Theorem 1.4.3
Let A be a matrix in Mn .K/. Then AAT ; AT A, and A C AT are symmetric matrices.
Proof
First, for the matrix B D AAT , Theorem 1.4.1 shows that

BT D .AAT /T D .AT /T AT D AAT D B:

Thus, B is symmetric.
Second, by the same method we have for C D AT A

CT D .AT A/T D AT .AT /T D AT A D C:

Therefore, C is symmetric.
Finally, for D D A C AT , then, we have, again by Theorem 1.4.1,

DT D .A C AT /T D AT C .AT /T D AT C A D D;

so, D is also symmetric. t


u
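
The three constructions of Theorem 1.4.3 are easily verified numerically (an added sketch with an arbitrary random matrix).

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
for S in (A @ A.T, A.T @ A, A + A.T):
    print(np.allclose(S, S.T))               # True, True, True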

1.5 Exercises

Exercise 1.1
We consider, for any real number x, the matrix
" #
cosh x sinh x
AD :
sinh x cosh x

1. For x and y in R, find A.x/A. y/.


2. Let n be an integer. Find ŒA.x/n .

Solution
1. We have
" #" #
cosh x sinh x cosh y sinh y
A.x/A. y/ D
sinh x cosh x sinh y cosh y
" #
cosh x cosh y C sinh x sinh y cosh x sinh y C sinh x cosh y
D
cosh x sinh y C sinh x cosh y cosh x cosh y C sinh x sinh y
" #
cosh.x C y/ sinh.x C y/
D
sinh.x C y/ cosh.x C y/
D A.x C y/; (1.77)

where we have used the known identities

cosh x cosh y C sinh x sinh y D cosh.x C y/


and

cosh x sinh y C sinh x cosh y D sinh.x C y/:

2. It is clear that A0 D I2 D A.0/: Now, let n > 0; then by (1) above we have

ŒA.x/2 D A.x/A.x/ D A.2x/:

We show by induction that for all n > 0,

ŒA.x/n D A.nx/: (1.78)

First, it is clear that (1.78) holds for n D 0; 1, and 2.


Second, assume that (1.78) holds for n and prove it for n C 1. Thus, we have

[A(x)]^{n+1} = [A(x)]^n A(x) = A(nx) A(x) = A((n + 1)x).

Thus, (1.78) holds for n C 1 and therefore, it is true for all n  0:


Now, for n < 0, we see first, by using (1.77), that
" #
10
A.x/A.x/ D A.x/A.x/ D A.0/ D I2 D :
01

Therefore, the uniqueness of the inverse shows that ŒA.x/1 D A.x/, and by definition
Ap D ŒA1 p ; p > 0. Hence, we have for n D p < 0,

ŒA.x/n D ŒA.x/p D ŒA1 .x/p D A.px/ D A.nx/:

Consequently, for any integer n, we have

ŒA.x/n D A.nx/:

Exercise 1.2 (Multiplication by a Diagonal Matrix)


Let D be the diagonal matrix in Mn .K/ with diagonal entries d1 ; : : : ; dn and A be a square
matrix in Mn .K/. Compute the products DA and AD. 

Solution
First, let us examine the simple case n D 2; then we will generalize it for all n. So, let
" # " #
d1 0 a11 a12
DD and AD :
0 d2 a21 a22
We can easily verify that
" # " #
d1 a11 d1 a12 d1 a11 d2 a12
DA D and AD D :
d2 a21 d2 a22 d1 a21 d2 a22

So, we see that the multiplication of the matrix A from the left by D is effected by multiplying
the successive rows of A by the successive diagonal entries of D, and the multiplication of A
from the right by D is effected by multiplying the successive columns of A by the successive
diagonal entries of D.
Now, we want to show that this property holds for any n  2. So, let A D .ajk /; 1  j 
n; 1  k  n and D D dij ; 1  i  n; 1  j  n with dij D 0 for i ¤ j and dii D di . Using
Definition 1.1.11, we get

DA D .cik /; 1  i  n; 1  k  n;

with the entries of the ith row being

X
n
cik D dij ajk D dii aik D di aik ; 1  k  n:
jD1

Thus,
2 3
d1 a11 d1 a12 d1 a13 : : : d1 a1n
6d a d a d a : : : d2 a2n 7
6 2 21 2 22 2 23 7
DA D 6
6 :: :: :: :: :: 77:
4 : : : : : 5
dn an1 dn an2 dn an3 : : : dn ann

The same argument shows


2 3
d1 a11 d2 a12 d3 a13 : : : dn a1n
6d a d a d a : : : dn a2n 7
6 1 21 2 22 3 23 7
AD D 6
6 :: :: :: :: :: 77:
4 : : : : : 5
d1 an1 d2 an2 d3 an3 : : : dn ann
J

Exercise 1.3 (Nilpotent Matrices)


A square matrix in Mn .K/ is nilpotent of order k if Ak D 0.
1. Show that if A is nilpotent, then I C A is invertible.
2. Calculate the inverse of the matrix
23
123
6 7
L D 40 1 25:
001


Solution
1. Assume that A is nilpotent of order k. We want to show that .I C A/1 exists. In the case
of real 1  1 matrices, we have (under some assumptions) the Taylor series expansion

X1
1
.1 C a/1 D D 1  a C a2  a3 C    D .1/n an :
aC1 nD0

Analogously, we may look for the inverse of I C A as the matrix B defined by

X
k1
B D I  A C A2  A3 C    D .1/n An : (1.79)
nD0

The sum in the above equation will be finite since Ak D 0. It remains to verify that B is the
inverse of .I C A/, that is we have to prove that

.I C A/B D B.I C A/ D I:

Indeed, since Ak D 0, we have

X
k1
.I C A/B D .I C A/ .1/n An
nD0

X
k1 X
k1
D .1/n An C .1/n AnC1
nD0 nD0

X
k1 X
k2
D .1/n An C .1/n AnC1
nD0 nD0

X
k1 X
k1
D .1/n An C .1/n1 An
nD0 nD1

D I:

The same argument easily shows that B.I C A/ D I: Consequently,

B D .I C A/1 :

2. It is clear that the matrix L can be written as L D I3 C A, with


2 3
023
6 7
A D 40 0 25:
000
Then, we can easily calculate
2 3 2 3
004 000
6 7 6 7
A2 D 4 0 0 0 5 and A3 D 4 0 0 0 5 :
000 000

Hence, A is nilpotent of order 3. Using (1.79), we find that


2 3
1 2 1
6 7
L1 D I3  A C A2 D 4 0 1 2 5 :
0 0 1
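
The computation in part 2 can be checked numerically. The sketch below (an illustrative addition) builds the finite sum I − A + A^2 for the nilpotent matrix A of order 3 and confirms that it is the inverse of L = I_3 + A.

import numpy as np

A = np.array([[0.0, 2.0, 3.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])              # nilpotent: A^3 = 0
L = np.eye(3) + A
B = np.eye(3) - A + A @ A                    # the finite sum (1.79)
print(np.allclose(L @ B, np.eye(3)))         # True
print(np.allclose(B @ L, np.eye(3)))         # True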

Exercise 1.4
Let A be a matrix in M2 .K/ of the general form
" #
ab
AD :
cd

Show that

p.A/ D A2  .a C d/A C .ad  bc/I2 D 0:2

Solution
We compute first A2 :
" #
a2 C bc b.a C d/
A2 D AA D
c.a C d/ cb C d2

and then
" # " #
a.a C d/ b.a C d/ ad  bc 0
.a C d/A D ; .ad  bc/I2 D :
c.a C d/ d.a C d/ 0 ad  bc

This clearly yields,

A2  .a C d/A C .ad  bc/I2 D 0:

2
As we will see later, the number a C d is called the trace of A, the number ad  bc is called the determinant
of A, and the polynomial p./ D 2  .a C d/ C .ad  bc/ is called the characteristic polynomial of A. See
Definition 7.3.2.
Exercise 1.5 (Idempotent Matrices)
Let A be a matrix in Mn .K/. Then A is said to be idempotent if A2 D A.
1. Show that if A is idempotent, then so is I  A.
2. Show that if A is idempotent, then 2A  I is invertible and is its own inverse.
3. Find all the idempotent matrices in M2 .R/.
4. Show that if A is idempotent, and if p is a positive integer, then Ap D A.

Solution
1. Since AI D IA D A, we can easily show that

.I  A/2 D I  2A C A2

D I  2A C A

D I  A;

where we have used the fact that A2 D A. This shows that I  A is idempotent since
.I  A/2 D I  A:
2. We have

.2A  I/.2A  I/ D .A C .A  I//.A C .A  I//

D A2 C .A  I/2 C 2A.A  I/
D A2 C .I  A/2 C 2A2  2A
D I;

where we have used the fact that A and I  A are idempotent matrices. Consequently,

.2A  I/1 D 2A  I:

3. Let us consider the matrix


" #
ab
AD :
cd

Then, we have
" #" # " #
2 ab ab a2 C bc ab C bd
A D D :
cd cd ac C cd bc C d2
Therefore, A2 D A leads to the system of equations
8 2
ˆ
ˆ a C bc D a;
ˆ
< ab C bd D b;
(1.80)
ˆ
ˆ ac C cd D c;

bc C d2 D d:

From the first and the third equations, we deduce that if b D 0 and a C d ¤ 1, then a D 0
or a D 1 and c D 0. If a C d D 1, then c can be any real number. Then from the fourth
equation, we deduce that a D 1 or d D 0. Then in this case, the idempotent matrices are
" # " # " # " #
00 10 00 10
; ; ; :
00 01 c1 c0

If b ¤ 0, then, from the second equation, we have a C d D 1 and from the first equation,
2 2
we have c D a ba D dd b . Thus, the idempotent matrices of M2 .R/ are the matrices of the
form
2 3
a b
4 a  a2 5
1a
b

where a is in R and b is in R  f0g.


4. We can show by induction that if A2 D A, then

Ap D A; (1.81)

for any positive integer p. It is clear that (1.81) is satisfied for p D 1 and p D 2. Now, assume
that (1.81) holds for p and show that it is still holds for p C 1. We have

ApC1 D Ap A D AA D A2 D A:

Consequently, (1.81) holds for any positive integer p. J

Exercise 1.6 (Rotation Matrix)


We define the rotation matrix in M2 .R/ as
" #
cos   sin 
R./ D ;
sin  cos 

where  is the rotation angle.


1. Show that R1 ./ D R./ (rotate back by ).
2. Show that R.1 /R.2 / D R.1 C 2 /.


Solution
1. Since R./ is a matrix in M2 .R/, using (1.39) we deduce that
" #
1 1 cos  sin 
R ./ D
cos2  C sin2   sin  cos 
" #
cos./  sin./
D
sin./ cos./
D R./:

2. We have by a simple computation


" #" #
cos 1  sin 1 cos 2  sin 2
R.1 /R.2 / D
sin 1 cos 2 sin 2 cos 2
" #
cos 1 cos 2  sin 1 sin 2  sin 2 cos 1  sin 1 cos 2
D
sin 2 cos 1 C sin 1 cos 2 cos 1 cos 2  sin 1 sin 2
" #
cos.1 C 2 /  sin.1 C 2 /
D
sin.1 C 2 / cos.1 C 2 /
D R.1 C 2 /:

The above result means that rotating by 1 and then by 2 , is the same as rotating by 1 C 2 .
J
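
Both identities of this exercise can be confirmed numerically (an added Python/NumPy sketch with arbitrary angles).

import numpy as np

def R(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

t1, t2 = 0.7, 1.9
print(np.allclose(R(t1) @ R(t2), R(t1 + t2)))        # rotations compose by adding angles
print(np.allclose(np.linalg.inv(R(t1)), R(-t1)))     # the inverse rotates back by -theta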

Exercise 1.7 (Involutory Matrix)


Let A be a matrix in Mn (K). We say that A is an involutory matrix if

A2 D I:

1. Check that for any real number , the matrix


" #
cos  sin 
AD
sin   cos 

is an involutory matrix.
2. Find all the involutory matrices in M2 .R/.
3. Show that a matrix A is involutory if and only if

.I  A/.I C A/ D 0:

4. Show that if A is an involutory matrix, then, the matrix B D 12 .I C A/ is idempotent.


Solution
1. We need to verify that A2 D I. By a simple computation,
" #" #
2 cos  sin  cos  sin 
A D AA D
sin   cos  sin   cos 
" #
cos2  C sin2  0
D
0 cos2  C sin2 
" #
10
D D I:
01

2. Let A be a matrix in M2 .R/, thus,


" #
ab
AD ;
cd

with a; b; c and d real numbers. We compute A2 to find


" #" # " #
2 ab ab a2 C bc ab C bd
A D AA D D :
cd cd ac C cd bc C d2

Therefore, A is involutory if and only if A2 D I, that is


8 2
ˆ
ˆ a C bc D 1;
ˆ
< ab C bd D 0;
(1.82)
ˆ ac C cd D 0;
ˆ

bc C d2 D 1:

If b D 0, then a D ˙1 and d D ˙1. Thus, the third equation in the above system gives: if
a D 1 and d D 1 or a D 1 and d D 1, then c D 0, in the other cases a D 1 and d D 1
or a D 1 and d D 1, then c can be any real number. Therefore, for b D 0, the involutory
matrices are
" # " # " # " #
10 1 0 1 0 1 0
; ; ; :
01 0 1 c 1 c 1

Now, if b ¤ 0, then the second equation in (1.82) yields d D a and a2 C bc D 1. Therefore,


the involutory matrices in this case are
" #
a b
; with a2 C bc D 1:
c a
3. Let A be an involutory matrix in Mn .K/. Then, since I commutes with any matrix in
Mn .K/, and since A2 D I,

.I  A/.I C A/ D I 2  A2  AI C IA D I  I D 0:

4. The matrix B is idempotent (Exercise 1.5), if and only if B2 D B. We have

1 1
B2 D BB D .I C A/ .I C A/
2 2
1
D .A2 C 2IA C I 2 /
4
1
D .I C 2A C I/
4
1
D .I C A/ D B;
2

where we have used the fact that A2 D I.


J

Exercise 1.8
Let A and B be two matrices in Mn .K/ and I be the identity matrix in Mn .K/. Check that if
I C AB is invertible, then I C BA is invertible, and find its inverse. 

Solution
Assume that I C AB is invertible, that is .I C AB/1 exists. Now, a matrix C in Mn .K/ is the
inverse of .I C BA/ if and only if

.I C BA/C D C.I C BA/ D I: (1.83)

The first equality, i.e.,

.I C BA/C D I;

leads to (since matrix multiplication is associative and distributes over addition)

C C B.AC/ D I:

Or, equivalently,

B.AC/ D I  C: (1.84)

Multiplying from the left by A, we get

.AB/.AC/ D A  AC;
whence

AC C .AB/.AC/ D A:

That is

.I C AB/.AC/ D A:

So, multiplying from the left by .I C AB/1 we have

AC D .I C AB/1 A: (1.85)

Now, using the second identity in (1.83), we obtain

C.BA/ D I  C: (1.86)

Multiplying from the right by B we get

C.BA/B D B  CB:

That is

.CB/.I C AB/ D B:

Multiplying here from the right by .I C AB/1 , we get

CB D B.I C AB/1 : (1.87)

From (1.85) and (1.87) we deduce that

(CB)A = B(AC) = B(I + AB)^{-1} A.

On the other hand, (1.84) and (1.86) imply that

(CB)A = B(AC) = I − C.

Consequently,

I  C D B.I C AB/1 A;

and so

.I C BA/1 D C D I  B.I C AB/1 A:

J
Exercise 1.9
Solve in Mn .K/ the equation

2A C 3AT D I: (1.88)

Solution
Using the properties of the transpose, we recast (1.88) as

.2A C 3AT /T D I T :

That is,

2AT C 3A D I: (1.89)

Multiplying Eq. (1.88) by 2 and Eq. (1.89) by 3 and adding the results, we obtain

5A D I:

Therefore, A D 15 I: J

Exercise 1.10
Let
" #
ab
AD
cd

and B be two matrices in M2 .K/ such that

A2 B D BA2 and a C d ¤ 0:

Show that

AB D BA:

Solution
We have seen in Exercise 1.4 that if A is a matrix in M2 .K/, then

A2  .a C d/A C .ad  bc/I2 D 0:

Consequently, we can write A as

1
AD ŒA2 C .ad  bc/I2 :
.a C d/
Since the two matrices A2 and I2 commute with B, we have

1
AB D ŒA2 C .ad  bc/I2 B
.a C d/
1
D ŒA2 B C .ad  bc/I2 B
.a C d/
1
D ŒBA2 C .ad  bc/BI2 
.a C d/
1
D BŒ ŒA2 C .ad  bc/I2 
.a C d/
D BA:

Thus, A and B commute. J

Exercise 1.11 (Subgroup of GL3 .R/)


Let .G; / be a group and let H be a nonempty subset of G. Then H is a subgroup of G if
(H1) For all x; y in H, x  y is in H.
(H2) If x is in H, then the inverse x0 is in H.
Let a be a real number, and define the matrix M.a/ in M3 .R/ as
2 3
1 a a
6 a2 a2 7
6 7
M.a/ D 6 a 1 C 2 2 2 7:
4 5
a2 a
a  1
2 2
Next introduce the set

H D fM.a/ W where a is in Rg

and the matrix


2 3
0 11
6 7
U D 4 1 0 05:
1 0 0

1. Show that M.a/ D I3 C aU C a2 U 2 .


2. Prove that, for all a and b in R,

M.a C b/ D M.a/M.b/: (1.90)

3. Show that H is a subgroup in GL.3; R/ with respect to multiplication of matrices.


4. Find .M.a//k , where k is an integer.


Solution
1. We compute
2 32 3 2 3
0 11 0 11 0 0 0
6 76 7 6 7
U 2 D UU D 4 1 0 0 5 4 1 0 0 5 D 4 0 1 1 5 :
1 0 0 1 0 0 0 1 1

Now, we write the matrix M.a/ as


2 3
1 a a
6 a2 7a2
6 7
M.a/ D 6 a 1 C 2 7 2 2
4 5
a2 a
a  1
2 2
2 3 2 3 2 3
100 0 11 2 0 0 0
6 7 6 7 a 6 7
D 40 1 05 C a4 1 0 05 C 40 1 1 5:
2
001 1 0 0 0 1 1

Thus,

a2 2
M.a/ D I3 C aU C U :
2

2. By assertion (1) above, and since I3 commute with U and U 2 ,


  
a2 2 b2 2
M.a/M.b/ D I3 C aU C U I3 C bU C
U
2 2
2  2 
.a C b/ 2 a b b2 a a2 b2 4
D I3 C .a C b/U C U C C U3 C U :
2 2 2 4

We can easily check that


2 32 3 2 3
0 0 0 0 11 000
6 7 6 7 6 7
U3 D U2 U D 4 0 1 1 5 4 1 0 0 5 D 4 0 0 0 5 :
0 1 1 1 0 0 000

Also,

U 4 D U 3 U D 0:

Consequently,

.a C b/2 2
M.a/M.b/ D I3 C .a C b/U C U
2
D M.a C b/:
We may also prove the above identity by a direct computation, using the form of the matrices
M.a/ and M.b/.
3. It is clear that H is nonempty, since M.0/ D I3 is in H. Also, by (1.90), if M.a/ and
M.b/ are in H, then the product M.a/M.b/ is also in H. In addition, using (1.90) once again,
we have

M.a/M.a/ D M.a  a/ D M.0/ D I3 :

Thus, if M.a/ is in H, then the inverse M 1 .a/ D M.a/ is in H and so H is a subgroup of


GL.3; R/.
4. First, let us assume that k  0. We claim that (1.90) yields

.M.a//k D M.a/M.a/    M.a/


„ƒ‚…
k times

D M.a C a C    C a/ D M.ka/: (1.91)


„ƒ‚…
k times

This can be verified by induction. It is clear that (1.91) is true for k D 0, k D 1, and k D 2.
Next, assume that (1.91) holds for k and show that it also holds for k C 1. We have, by (1.90),

.M.a//kC1 D .M.a//k M.a/

D M.ka/M.a/

D M.ka C a/

D M..k C 1/a/:

Therefore, (1.91) also hold for k C 1. Consequently, for any integer k  0,

.M.a//k D M.ka/:

Now, if k  0, we have k0 D k  0, and so we can write

.M.a//k D Œ.M.a//1 k


0
D .M.a//k

D M.k0 a/

D M.ka/:

Consequently, for any integer k,

.M.a//k D M.ka/:

J
Exercise 1.12
Show that any matrix A in M2 .R/, with A ¤ I2 , satisfying A3 D I2 has trace equal to 1. 

Solution
As we have seen in Exercise 1.4, if A is a matrix in M2 .K/, then we have

A2  tr.A/A C det.A/I2 D 0: (1.92)

Multiplying (1.92) by A, we get

A3  tr.A/A2 C det.A/A D 0:

Since A3 D I2 , this yields


 
I2  tr.A/ tr.A/A  det.A/I2 C det.A/A D 0; (1.93)

where we have used again (1.92). Rearranging (1.93), we get


   
det.A/  .tr.A//2 A C 1 C tr.A/ det.A/ I2 D 0:

This gives, since A ¤ I2 ,

det.A/ D .tr.A//2 and 1 C tr.A/ det.A/ D 0:

Consequently,

.tr.A//3 D 1:

Since A is a real matrix, tr.A/ is a real number and so

tr.A/ D 1:

J

Determinants

Belkacem Said-Houari

© Springer International Publishing AG 2017


B. Said-Houari, Linear Algebra, Compact Textbooks in Mathematics,
DOI 10.1007/978-3-319-63793-8_2

2.1 Introduction

As, indicated before, one of the main goals in linear algebra is to be able to determine
whether a given square matrix is invertible or not, and if invertible, to find its inverse.
In this chapter, we give a general criterion for the invertibility of square matrices. So, let
us first recall the equation

ax D b; (2.1)

and its solution, given by

b
xD D a1 b: (2.2)
a

So, the solution in (2.2) is defined if and only if

a ¤ 0: (2.3)

Now, for a system of two equations and two unknowns, we have seen that the system
(
ax C by D p;
(2.4)
cx C dy D q;

has a unique solution if and only if

ad  bc ¤ 0: (2.5)

Using matrix notation, then system (2.4) can be rewritten as

AX D b; (2.6)

where
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad X = \begin{bmatrix} x \\ y \end{bmatrix}, \qquad b = \begin{bmatrix} p \\ q \end{bmatrix}.

The number adbc constructed from the entries of the matrix A is called the determinant
of the 2  2 matrix A and is denoted by

det.A/ D ad  bc:

In analogy with this, and if we regard the constant a in (1.1) as a square matrix in
M1 .K/, then the number a in (2.3) is the determinant of the matrix Œa. As (2.3) and
(2.5) show, the Eq. (2.1) and the system (2.4) have unique solutions, that is to say, the
associated matrices are invertible, if and only if their determinants are not zero. So, the
natural question is the following: can we extend this condition for any square matrix A
in Mn .K/? That is, can we show that the matrix A in Mn .K/ is invertible if and only if
its determinant is not zero? Before answering this question, we need to explain how to
find the determinant of a square matrix A in Mn .K/. A main goal in this chapter is to
give answers to the above two questions.

2.2 Determinants by Cofactor Expansion

Since according to the above definitions, the determinant of the 1  1 matrix Œa is a, we
may write the determinant of the 2  2 matrix as

det.A/ D ad  bc D detŒa detŒd  detŒb detŒc: (2.7)

Thus, we expressed the determinant of the 2  2 matrix A using determinants of 1  1


matrices. Now, we can define the determinant recursively, meaning that the definition of
the determinant of an n  n matrix makes use of the determinant of .n  1/  .n  1/
matrices.
So, we want to obtain an analogous formula of (2.7) for a matrix in Mn .K/ for any
n  1. For this purpose, it is more convenient to rewrite the matrix A in M2 .K/ as
" #
a11 a12
AD
a21 a22

and rewrite (2.7) as

det.A/ D detŒa11  detŒa22   detŒa12  detŒa21 : (2.8)


Now, in order to find an expression for the determinant of a matrix A in M3 (K), we
consider the system of linear equations
8
ˆ
< a11 x1 C a12 x2 C a13 x3 D b1 ;
a21 x1 C a22 x2 C a23 x3 D b2 ; (2.9)
:̂ a x C a x C a x D b :
31 1 32 2 33 3 3

We leave it to the reader to check that the above system has a unique solution if and only
if

a11 .a22 a33  a23 a32 /  a21 .a12 a33  a13 a32 / C a31 .a12a23  a13 a22 / ¤ 0: (2.10)

We call the number in (2.10) the determinant of the matrix


2 3
a11 a12 a13
6 7
A D 4 a21 a22 a23 5 :
a31 a32 a33

The left-hand side in (2.10) can be rewritten as

a11 .a22 a33  a23 a32 /  a12 .a21a33  a13 a32 / C a31 .a12 a23  a13 a22 /
" # " # " #
a22 a23 a21 a23 a12 a22
D a11 det  a12 det C a31 det :
a32 a33 a31 a33 a13 a23

So, we have written the determinant of a 3  3 matrix in terms of determinants of 2  2


matrices.
If, we denote
" # " # " #
a22 a23 a21 a23 a12 a22
M11 D det ; M21 D det ; M31 D det ;
a32 a33 a31 a33 a13 a23

we can rewrite the determinant of the 3  3 matrix as


2 3
a11 a12 a13
6 7
det 4 a21 a22 a23 5 D a11 M11  a12 M12 C a13 M13 : (2.11)
a31 a32 a33

We observe that M11 is obtained by removing the first row and the first column from
the matrix A and computing the determinant of the resulting 2  2 matrix. Similarly,
we can find M12 by removing the first row and the second column and computing the
determinant of the remaining matrix, and so on. These Mij are called the minors of the
matrix A.

Definition 2.2.1 (Minor)


Let A = (a_ij), 1 ≤ i, j ≤ n, be a matrix in Mn (K), n ≥ 2. For any 1 ≤ i, j ≤ n, the
determinant of the matrix in Mn1 .K/ obtained by deleting row i and column j
from A is called a minor and is denoted by Mij , and we write

2 3
a11 a12 : : : a1.j1/ a1.jC1/ : : : a1n
6 : : : a2.j1/ : : : a2n 7
6 a21 a22 a2.jC1/ 7
6 :: :: :: :: :: :: :: 7
6 7
6 : : : : : : : 7
6 7
Mij D det 6
6 a.i1/1 a.i1/2 : : : a.i1/.j1/ a.i1/.jC1/ : : : a.i1/n 7
7: (2.12)
6 7
6 a.iC1/1 a.iC1/2 : : : a.iC1/.j1/ a.iC1/.jC1/ : : : a.iC1/n 7
6 7
6 :: :: :: :: :: :: :: 7
4 : : : : : : : 5
an1 an2 : : : an.j1/ an.jC1/ : : : ann

Example 2.1
Consider the matrix
2 3
103
6 7
A D 42 1 25:
051

Then,
" # " #
12 22
M11 D det D 1  1  2  5 D 9; M12 D det D 2;
51 01
" # " #
21 03
M13 D det D 10; M21 D det D 15;
05 51
" # " #
13 10
M22 D det D 1; M23 D det D 5;
01 05
" # " #
03 13
M31 D det D 3; M32 D det D 4;
12 22
" #
10
M33 D det D 1:
21


In (2.11), we saw that the second term has a negative sign, while the first and the last
terms have a positive sign. So, to avoid the negative signs and to be able to write an easy
formula for the determinant, we define what we call the cofactor.

Definition 2.2.2 (Cofactor)


Let Mij ; 1  i; j  n be the minors associated to the square matrix
A D .aij /; 1  i; j  n. Then, we define the cofactor Cij as

Cij D .1/iCj Mij : (2.13)

Example 2.2
In Example 2.1, we have, for instance,

C11 D .1/1C1 M11 D M11 D 9

and

C23 D .1/2C3 M23 D M23 D 5:

Using the above definition, we can rewrite (2.11) as


2 3
a11 a12 a13
6 7
det 4 a21 a22 a23 5 D a11 C11 C a12 C12 C a13 C13
a31 a32 a33
D a21 C21 C a22 C22 C a23 C2 3
D a31 C31 C a32 C32 C a33 C33 :

We can also write the above determinant using the columns, as follows:
2 3
a11 a12 a13
6 7
det 4 a21 a22 a23 5 D a11 C11 C a21 C21 C a31 C31
a31 a32 a33
D a12 C12 C a22 C22 C a32 C32
D a13 C13 C a23 C23 C a33 C33 :

Now, the above formulas for the determinant can be generalized to any square matrix in
Mn .K/ as follows.

Definition 2.2.3 (Determinant)


Let A = (a_ij), 1 ≤ i, j ≤ n, be a matrix in Mn (K), n ≥ 2. Then, we define the
determinant of A, using the rows of the matrix A, as

det.A/ D ai1 Ci1 C ai2 Ci2 C    C ain Cin ; (2.14)

for any fixed i. Or, using the columns of the matrix A, as

det.A/ D a1j C1j C a2j C2j C    C anj Cnj ; (2.15)

for any fixed j. The above two formulas are called the cofactor expansion of the
determinant.

Example 2.3
Find the determinant of the matrix A given by
2 3
103
6 7
A D 42 1 25:
051

Solution
To calculate the determinant of A, we need first to choose one row or one column and make
use of Definition 2.2.3 accordingly. The smart choice is to take the row or the column that
contain the largest number of zeros. In this case, we may choose the first row and use it to do
the cofactor expansion. So, we use (2.15) with i D 1 and write

det.A/ D a11 C11 C a12 C12 C a13 C13 :

Since a12 D 0, this becomes

det.A/ D a11 C11 C a13 C13

D C11 C 3C13 : (2.16)

We have computed the minors of the above matrix A in Example 2.1, so we have

C11 D .1/1C1 M11 D M11 D 9; C13 D M13 D 10:

Inserting these values in (2.16), we get

det.A/ D 9 C 3  10 D 21:
⊡ Fig. 2.1 To evaluate the 3 × 3 determinant, we take the products along the main diagonal and the lines parallel to it with a (+) sign, and the products along the second diagonal and the lines parallel to it with a (−) sign

We can obtain the above result using the trick in ⊡ Fig. 2.1 as follows:

det(A) = 1·1·1 + 0·2·0 + 3·2·5 − 3·1·0 − 1·2·5 − 0·2·1 = 31 − 10 = 21.
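
Definition 2.2.3 translates directly into a recursive procedure. The following Python/NumPy sketch (an illustrative addition, expanding always along the first row; it is meant only to mirror formula (2.14), not to be efficient) reproduces det(A) = 21 for the matrix of Example 2.3.

import numpy as np

def det_cofactor(A):
    # Cofactor expansion along the first row, i.e., formula (2.14) with i = 1.
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # delete row 1 and column j+1
        total += (-1) ** j * A[0, j] * det_cofactor(minor)      # a_{1,j+1} C_{1,j+1}
    return total

A = [[1, 0, 3], [2, 1, 2], [0, 5, 1]]
print(det_cofactor(A))                       # 21.0
print(np.linalg.det(A))                      # 21.0 (up to rounding)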

Example 2.4
Calculate the determinant by a cofactor expansion for
2 3
2 1 5
6 7
A D 4 1 4 3 5 :
4 2 0

Solution
We may calculate the determinant of A using the third row:

det.A/ D a13 C13 C a23 C23 C a33 C33

D 4C13 C 2C23 :

Now, we have
" #
1C3 1 5
C13 D .1/ M13 D det D 17
4 3

and
" #
2C3 21
C23 D .1/ M23 D  det D 1:
53

Consequently,
det.A/ D 4  17 C 2  .1/ D 66:

2.3 Properties of the Determinants

In this section, we give the determinant of some particular matrices and establish some
properties of the determinant.
It is clear from Definition 2.2.3 that if we use the cofactor expansion along one of
the rows or along one of the columns of the matrix A, then we obtain the same value
for the determinant. This implies that, A and AT have the same determinant and we state
this in the following theorem.

Theorem 2.3.1 (Determinant of the Transpose)


Let A be a matrix in Mn .K/. Then

det.A/ D det.AT /: (2.17)

In the following theorems, we calculate the determinant of diagonal and triangular


matrices.

Theorem 2.3.2 (Determinant of a Diagonal Matrix)


Let D D .dij /; 1  i  n; 1  j  n be a diagonal matrix in Mn .K/. Then the
determinant of D is the product of the entries of the main diagonal. That is,

det.D/ D d11 d22    dnn : (2.18)

Proof
The proof of (2.18) can be done by induction. We first take n D 2 and let D2 be the matrix
" #
d11 0
D2 D :
0 d22

Clearly,

det.D2 / D d11 d22 :


Therefore, (2.18) holds for n D 2. Now, assume that (2.18) holds for n  1, that is
2 3
d11 0 0    0
6 7
6 0 d22 0    0 7
6 7
6 7
det.Dn1 / D det 6 0 0 d33    0 7 D d11 d22    d.n1/.n1/ ;
6 : :: 7
6 : : 7
4 : 5
0 0 0    dn1

and let us show that (2.18) holds for n. Choosing i D n and applying formula (2.14), we get

det.D/ D dn1 Cn1 C dn2 Cn2 C    C dnn Cnn


D dnn Cnn

D dnn .1/nCn Mnn

D dnn det.Dn1 /

D dnn d11 d22    d.n1/.n1/ : (2.19)

Thus, (2.18) holds for n as claimed. t


u

Example 2.5
Let
2 3
1 0 0
6 7
A D 4 0 3 05
0 05

Then

det.A/ D .1/  3  5 D 15:

ⓘ Remark 2.3.3 We deduce immediately from Theorem 2.3.2 that if In is the identity
matrix in Mn .K/, then

det.In / D 1:

Theorem 2.3.4 (Determinant of a Triangular Matrix)


Let A D .aij /; 1  i; j  n be a triangular matrix in Mn .K/. Then the determinant of
A is the product of the entries of the main diagonal. That is

det.A/ D a11 a22    ann : (2.20)



Proof
We prove this statement for upper triangular matrices; the same argument works for lower
triangular matrices. So, let
2 3
a11 a12 a13    a1n
6 7
6 0 a22 a23    a2n 7
6 7
6 a3n 7
AD6 0 0 a33    7
6 : :: :: 7
6 : : 7
4 : : 5
0 0 0    ann

As in the case of diagonal matrices, we proceed by induction. For n D 2, we take


" #
a11 a21
A2 D ;
0 a22

and then

det.A2 / D a11  a22 :

Now, assume that the determinant of the matrix


2 3
a11 a12 a13    a1.n1/
6 7
6 0 a22 a23    a2.n1/ 7
6 7
6 7
An1 D6 0 0 a33    a3.n1/ 7
6 : :: 7
6 : : 7
4 : 5
0 0 0    a.n1 /.n1/

is equal to

det.An1 / D a11 a22    a.n1/.n1/ :

Letting i D n in (2.14), we get

det A D ann Cnn D ann .1/nCn Mnn

D ann det.An1 /

D ann a11 a22    a.n1/.n1/ :

This completes the proof of Theorem 2.3.4. t


u
Example 2.6
Find the determinant of the matrices
2 3
2 3 1 0 0 0
1 3 1 63
6 7 6 1 0 0 77
A D 40 2 0 5 and BD6 7:
40 5 3 0 5
0 0 4
3 1 6 3

Solution
Since A is upper triangular and B is lower triangular using (2.20) we get

det.A/ D 1  2  .4/ D 8

and

det.B/ D 1  .1/  3  .3/ D 9:

ⓘ Remark 2.3.5 We have seen in Theorems 1.2.11 and 1.2.13 that diagonal and
triangular matrices are invertible if and only if all the entries of the main diagonal are
not zero. That is, if the product of those entries is not zero, which is equivalent to the
fact that the determinant is not zero.

2.4 Evaluating Determinants by Row Reduction

As we have seen above, it is easy to compute the determinant of diagonal and triangular
matrices. So, if we can apply the row operations to transform a square matrix into
a triangular matrix (which is easier than transforming it to a diagonal one), then the
determinant of the new matrix can be calculated by just taking the product of the entries
of the main diagonal. Therefore, the question is: how do the row operations affect the
determinant? In this section, we answer this question and compute some determinants
using the row reduction method.
We begin by a fundamental theorem that will lead us to an efficient procedure for
evaluating the determinant of square matrices.

Theorem 2.4.1
Let A D .aij /; 1  i; j  n, be a square matrix in Mn .K/. If A has a row of zeros or a
column of zeros, then

det.A/ D 0:

Proof
Suppose that there exists 1 ≤ i0 ≤ n such that a_{i0 j} = 0 for all 1 ≤ j ≤ n. Then, using (2.14)
for i D i0 , we deduce that

det.A/ D ai0 1 Ci0 1 C ai0 2 Ci0 2 C    C ai0 n Ci0 n D 0:

Similarly, if there exists 1  j0  n, such that aij0 D 0 for all 1  i  n, then using (2.15)
for j D j0 , we get

det.A/ D a1j0 C1j0 C a2j0 C2j0 C    C anj0 Cnj0 D 0:

This finishes the proof of Theorem 2.4.1. t


u

Now let
" #
ab
AD
cd

be a matrix in M2 .K/ and let B1 be the matrix that results by interchanging the two rows
and B2 be the matrix that results by interchanging the two columns; that is,
" # " #
cd ba
B1 D and B2 D :
ab dc

Then,

det.B1 / D det.B2 / D cb  ad D .ad  bc/ D  det.A/:

Next, let B3 the matrix that results by multiplying one row (the first row for instance)
by a scalar k and B4 be the matrix that results from multiplying the first column by a
scalar k; that is
" # " #
ka kb ka b
B3 D and B4 D :
c d kc d

Then,

det.B3 / D det.B4 / D k.ad  bc/ D k det.A/:

Finally for this case, let B5 be the matrix that results by adding a multiple of one row
of the matrix A to another row and B6 be the matrix that results by adding a multiple of
one column of A to another column, that is, for instance,
" # " #
a C kc b C kd a C kb b
B5 D and B6 D :
c d c C kd d
Then,

det.B5 / D det.B6 / D ad  bc D det.A/:

The ways the above row operations affect the value of the determinant remain valid
for any square matrix in Mn .K/; n  1.

Theorem 2.4.2
Let A be a matrix in Mn .K/.
1. If B is the matrix obtained by interchanging two rows or two columns of A,
then

det.B/ D  det.A/: (2.21)

2. If B is the matrix obtained by multiplying a single row or a single column of


A by a scalar k, then

det.B/ D k det.A/: (2.22)

3. If B is the matrix obtained by multiplying one row of A by a scalar k and


adding it to another row, or from multiplying one column of A by a scalar k and
adding it to another column, then

det.B/ D det.A/: (2.23)

Proof
Let A D .aij / and B D .bij /; 1  i; j  n.
1. Without loss of generality, we can consider for instance the case where B is the matrix
obtained from A by interchanging the first two rows. Let Mij and Mij0 ; 1  i; j  n, denote
the minors of A and B and Cij and Cij0 denote the cofactors of A and B, respectively. Then, by
using the cofactor expansion through the first row of A, we have

det.A/ D a11 C11 C a12 C12 C    C a1n C1n

D a11 M11  a12 M12 C    ˙ a1n M1n :

On the other hand, we have, by using the second row of B,

0 0 0
det.B/ D b21 C21 C b21 C21 C    C b2n C2n
0 0 0
D a11 C21 C a12 C22 C    C a1n C2n ; since b2j D a1j ; 1  j  n;
0 0 0
D a11 M21 C a12 M22   ˙ a1n M2n

0
D a11 M11 C a12 M12     ˙ a1n M1n ; since M2j D M1j ; 1  j  n;
D  det.A/:

2. Assume that the matrix B is obtained by multiplying a row i0 of A by a nonzero constant


k. Then, using the cofactor expansion along the row i0 , we may calculate det.A/ and det.B/
using Ci0 j and Ci00 j ; 1  j  n, to denote the cofactors of ai0 j and bi0 j , respectively. We have

det.A/ D ai0 1 Ci0 1 C ai0 2 Ci0 2 C    C ai0 n Ci0 n

and

det.B/ D bi0 1 Ci00 1 C bi0 2 Ci00 2 C    C bi0 n Ci00 n :

Since

aij D bij ; for i ¤ i0 ;

we have

Ci0 j D Ci00 j ; 1  j  n:

We also have bi0 j D kai0 j . Therefore,

det.B/ D bi0 1 Ci00 1 C bi0 2 Ci00 2 C    C bi0 n Ci00 n

D .ka/i0 1 Ci0 1 C .ka/i0 2 Ci0 2 C    C .ka/i0 n Ci0 n

D k.ai0 1 Ci0 1 C ai0 2 Ci0 2 C    C ai0 n Ci0 n /

D k det.A/:

Similarly, we can show .3/. We leave this as an exercise to the reader. Also, the same
argument can be applied if we use columns instead of rows. t
u
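
The three statements of Theorem 2.4.2 can be observed numerically (an added sketch with a random 4 × 4 matrix).

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
d = np.linalg.det(A)

B = A[[1, 0, 2, 3]]                          # interchange the first two rows
print(np.isclose(np.linalg.det(B), -d))      # property (1): the sign flips

C = A.copy(); C[2] *= 5.0                    # multiply a single row by k = 5
print(np.isclose(np.linalg.det(C), 5 * d))   # property (2)

D = A.copy(); D[3] += 2.0 * A[0]             # add a multiple of one row to another
print(np.isclose(np.linalg.det(D), d))       # property (3): the determinant is unchanged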

Elementary Matrices
If the matrix A in Theorem 2.4.2 is the identity matrix, then the matrix B is called an
elementary matrix.

Definition 2.4.1 (Elementary Matrix)


A matrix E is called an elementary matrix if it can be obtained from the identity
matrix by performing a single elementary row (or column) operation.
Example 2.7
The following matrices are elementary matrices:
2 3
2 3 2000 2 3
100 6 7 103
6 7 60 1 0 07 6 7
40 0 15; 6 7; 40 1 05:
40 0 1 05
010 001
„ƒ‚… 0001 „ƒ‚…
Interchange the second and the third row of I3 „ƒ‚… Add 3 times the third row of I3 to the first row
Multiply the first row of I4 by 2

Our goal now is to show that the row operations on a matrix A are equivalent of
multiplying the matrix A from the left by a finite sequence of elementary matrices, and
similarly the column operations on A are equivalent of multiplying the matrix A from
the right by a finite sequence of elementary matrices. To see this first on an example,
consider the matrix
2 3
1 2 3
6 7
A D 44 6 05
5 1 7

and let B be the matrix obtained from A by interchanging the first and the second rows:
2 3
4 6 0
6 7
B D 4 1 2 3 5 :
5 1 7

Passing from A to B is equivalent to multiply A by the elementary matrix


2 3
010
6 7
E1 D 4 1 0 0 5 ;
001

that is

B D E1 A:

Next, let C be the matrix obtained by multiplying the second column of A by 2:


2 3
1 4 3
6 7
C D 4 4 12 0 5 :
5 2 7

Then,
C D AE2 ;

with
2 3
100
6 7
E2 D 4 0 2 0 5 :
001
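The effect of elementary matrices can be checked numerically. The following short sketch (not part of the original text) uses NumPy to verify that left-multiplying by E1 interchanges the first two rows and right-multiplying by E2 doubles the second column; the entries of A are taken as printed above, although some of their signs may have been lost in this reproduction.

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 6., 0.],
              [5., 1., 7.]])    # entries as printed above; signs are illustrative

E1 = np.array([[0., 1., 0.],    # interchange rows 1 and 2 of I3
               [1., 0., 0.],
               [0., 0., 1.]])
E2 = np.array([[1., 0., 0.],    # multiply column 2 of I3 by 2
               [0., 2., 0.],
               [0., 0., 1.]])

print(E1 @ A)    # B: rows 1 and 2 of A interchanged
print(A @ E2)    # C: column 2 of A doubled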

ⓘ Remark 2.4.3 It is not hard to see that every elementary matrix is invertible and its
inverse is also an elementary matrix.

Now, from the definition of elementary matrices, we can easily deduce the following:
▬ If E1 is the elementary matrix obtained by interchanging two rows or two columns of the identity matrix, then det(E1) = −1.
▬ If E2 is the elementary matrix obtained by multiplying a single row or a single column of the identity matrix by a scalar k, then det(E2) = k.
▬ If E3 is the elementary matrix obtained by adding a multiple of one row (respectively, one column) of the identity matrix to another row (respectively, another column), then det(E3) = 1.

The following theorem is also very useful.

Theorem 2.4.4
Let A be a matrix in Mn .K/. If A contains two proportional rows or two proportional
columns, then

det.A/ D 0:

Proof
Let A D .aij /; 1  i; j  n, be a matrix in Mn .K/. Assume that there exist 1  i0 ; i1  n
such that the row ri0 and the row ri1 satisfy ri1 D kri0 .
Let B be the matrix obtained from A by adding −k ri0 to the row ri1 . Then, by using (2.23), we
have

det.B/ D det.A/:

But all the entries of the resulting row in B are zero. Thus, Theorem 2.4.1 implies that
det.B/ D 0 and therefore, det.A/ D 0. The same method can be applied if two columns
of A are proportional. t
u
Example 2.8
Consider the matrices
2 3 2 3
1 3 1 1 3 9
6 7 6 7
A D 40 2 0 5 and B D 40 2 6 5:
2 6 2 0 1 3

Since the first row and the third row of A are proportional .r3 D 2r1 /, det.A/ D 0. Similarly,
since the second column and the third column of B are proportional, det.B/ D 0. 

Example 2.9
Use the row reduction method to calculate the determinant of the matrix
2 3
0 1 5
6 7
A D 4 3 6 9 5 :
2 6 1

Solution
Since the determinant of a triangular matrix is the product of the entries of the main diagonal,
we apply the necessary row operations in order to get a triangular matrix. First, let A1 be the matrix obtained by interchanging r1 (the first row) and r2 in A, that is
2 3
3 6 9
6 7
A1 D 4 0 1 5 5 :
2 6 1

Theorem 2.4.2 leads to

det.A1 / D  det.A/:

Next, let A2 be the matrix obtained by multiplying the first row in A1 by k D 1=3, i.e.,
2 3
1 2 3
6 7
A2 D 4 0 1 5 5 :
2 6 1

By Theorem 2.4.2,

1
det.A2 / D det.A1 /:
3

Let A3 be the matrix obtained by replacing r3 in A2 by r3  2r1 , i.e.,


2
2 3
1 2 3
6 7
A3 D 4 0 1 5 5 :
0 10 5

Then again by Theorem 2.4.2,

det.A3 / D det.A2 /:

Finally, let A4 be the matrix obtained by replacing r3 in A3 by r3  10r2 , i.e.,


2 3
1 2 3
6 7
A4 D 4 0 1 5 5 :
0 0 55

Then

det.A4 / D det.A3 / D 55;

since A4 is a triangular matrix.


Now, using the above formulas, we have

det.A/ D  det.A1 / D 3 det.A2 / D 3 det.A3 / D 3 det.A4 / D 3  .55/ D 165:
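The bookkeeping of Example 2.9 can be automated. The following sketch (an illustration, not part of the text; the function name is ours) computes a determinant by row reduction using only row swaps and the operation of adding a multiple of one row to another, exactly the operations that, by Theorem 2.4.2, change the determinant at most by a sign.

def det_by_row_reduction(M):
    # Gaussian elimination: a row swap flips the sign of the determinant,
    # adding a multiple of one row to another leaves it unchanged.
    A = [list(map(float, row)) for row in M]
    n = len(A)
    sign = 1.0
    for j in range(n):
        p = next((i for i in range(j, n) if A[i][j] != 0), None)
        if p is None:
            return 0.0                      # zero column: determinant is 0
        if p != j:
            A[j], A[p] = A[p], A[j]         # swap: det changes sign
            sign = -sign
        for i in range(j + 1, n):
            m = A[i][j] / A[j][j]
            A[i] = [a - m * b for a, b in zip(A[i], A[j])]
    prod = 1.0
    for j in range(n):
        prod *= A[j][j]                     # triangular matrix: product of the diagonal
    return sign * prod

# with the signs of Example 2.9 read as below, this returns 165.0
print(det_by_row_reduction([[0, 1, 5], [3, -6, 9], [2, 6, 1]]))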

Theorem 2.4.5
Let A be a matrix in Mn .K/ and k be a scalar. Then, for B D kA, we have

det.B/ D kn det.A/: (2.24)

Proof
Let A D .aij /; 1  i; j  n. Then the matrix B is given by B D .kaij /; 1  i; j  n. So, to
get B we need to do n row operations. Let A0 D A, An D B and Ai ; 1  i  n be the matrix
obtained by multiplying the row ri of the matrix Ai1 by k. Then, applying Theorem 2.4.2,
we get

det.Ai / D k det.Ai1 /
and therefore

det.B/ D det.An / D kn det.A0 / D kn det.A/:

This finishes the proof of Theorem 2.4.5. t


u

Theorem 2.4.6 (The Determinant is Not Linear)


Let A and B be two matrices in Mn .K/; n  2. Then in general

det.A C B/ ¤ det.A/ C det.B/: (2.25)

Proof
To show (2.25), we provide a counterexample. Thus, consider the two matrices
" # " #
12 3 0
AD and BD :
03 1 2

Then,
" #
42
ACBD :
11

We have

det.A/ D 3; det.B/ D 6; and det.A C B/ D 2:

so,

2 D det.A C B/ ¤ det.A/ C det.B/ D 3:

t
u

A very important property of the determinant is the following multiplicative property.

Theorem 2.4.7 (Multiplicativity)


Let A and B be two matrices in Mn .K/. Then

det.AB/ D det.A/ det.B/: (2.26)



Proof
If A is a singular matrix, then det(A) and det(AB) are both zero (Theorem 2.4.8). Hence,
(2.26) holds. So, we can assume that A is invertible. Then, A can be row reduced (using the
Gauss–Jordan elimination method in  Sect. 1.3.1) to the identity matrix. That is, we can
find a finite sequence of elementary matrices E1 ; E2 ; : : : ; E` such that

E 1 E 2    E ` A D In : (2.27)

Hence,

A D E`1    E21 E11 In

and Ei1 ; 1  i  ` are elementary matrices.


Since

AB D E`1    E21 E11 B;

the proof of (2.26) is reduced to show that for any elementary matrix E and any square matrix
M in Mn .K/, we have

det.EM/ D det.E/ det.M/: (2.28)

It is clear that (2.28) is satisfied, since (see Theorem 2.4.2):


▬ If E represents a row exchange, then det(E) = −1 and det(EM) = −det(M), since
the product EM is equivalent to exchange two rows in M.
▬ If E represents the multiplication of the ith row by a nonzero constant k, then det.E/ D k
and det.EM/ D k det.M/.
▬ If E represents adding k times row j to row i, then det.E/ D 1 and det.EM/ D det.M/.

Consequently,

det.AB/ D det.E`1    E21 E11 B/ D det.E`1 / det.E`1


1
   E21 E11 B/

since E D E`1 is an elementary matrix and we may take M D E`11


   E21 E11 B. We can
1 1 1 1
continue the process for E D E`1 and M D E`2    E2 E1 B, eventually obtaining

det.AB/ D det.E`1 / det.E`1


1
/    det.E11 / det.B/
 
D det E`1 E`1
1
det.E`21
/    det.E11 / det.In / det.B/;

by using (2.28) with E D E`1 and M D E`1


1
and the fact that det.In / D 1: Finally, we
arrive at
 
det.AB/ D det E`1 E`1
1
   E11 In det.B/

D det.A/ det.B/:

This ends the proof of Theorem 2.4.7. t


u
Example 2.10
Consider the matrices
" # " #
3 0 2 0
AD and BD :
1 1 4 3

Then,
" #
6 0
AB D :
2 3

Hence,

det.A/ D 3; det.B/ D 6;

and

det.AB/ D 18 D det.A/ det.B/:
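A quick numerical check of the multiplicativity property, and of the failure of additivity from Theorem 2.4.6, can be done as follows; this sketch is illustrative and uses randomly generated matrices.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# det(AB) = det(A) det(B), up to floating-point round-off
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))   # True
# the determinant is not additive in general
print(np.isclose(np.linalg.det(A + B), np.linalg.det(A) + np.linalg.det(B)))   # typically False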

2.4.1 Determinant Test for Invertibility

We have seen that Eq. (2.1) has a unique solution if and only if

a ¤ 0; or detŒa D a ¤ 0:

Similarly, we have shown that system (2.4) has unique solution if and only if
" #
ab
ad  bc ¤ 0; or det D ad  bc ¤ 0:
cd

This is equivalent to saying that the inverse of the above matrix exists if and only if its determinant is not zero. In fact, this is the case for any matrix A in Mn(K).

Theorem 2.4.8
Let A be a matrix in Mn .K/. Then, A is invertible if and only if

det.A/ ¤ 0:

Proof
2 First assume that A is invertible, and A1 is its inverse. Then we have

AA1 D A1 A D In :

Hence, applying Theorem 2.4.7, we get

det.AA1 / D det.A/ det.A1 / D det.In / D 1:

This shows that (since K D R or C) det.A/ ¤ 0.


Conversely, assume that det.A/ ¤ 0. Then using the row operation method (or the Gauss–
Jordan elimination method), we write A as in (2.27):

E 1 E 2    E ` A D In ; (2.29)

and so as before, that

A D E`1    E21 E11

and Ei1 ; 1  i  `, is the elementary matrix corresponding to the ith row operation applied
by the Gauss–Jordan elimination algorithm.
Now, denoting

B D E1 E2    E` ;

we get

AB D BA D In :

Hence, A is invertible and A1 D B. t


u

Now, if A is an invertible matrix, then what is the relationship between the


determinant of A and the determinant of A1 ? This is answered in the following
theorem.

Theorem 2.4.9 (Determinant of the Inverse)


Let A be an invertible matrix in Mn .K/. Then,

1
det.A1 / D : (2.30)
det.A/
Proof
Since A is invertible, Theorem 2.4.8 implies that det.A/ ¤ 0. Writing the invertibility relation

AA1 D A1 A D I;

taking the determinants, and using Theorem 2.4.7, we obtain

det.A1 A/ D det.A1 / det.A/ D det.I/ D 1:

Hence,

1
det.A1 / D ;
det.A/

as claimed. t
u

Example 2.11
Consider the matrix
" #
ab
AD :
cd

We have seen in Theorem 1.2.4 that if A is invertible, then


" #
1 1 d b
A D :
ad  bc c a

Now, we have by using (2.24),


" #
1 1 d b ad  bc 1 1
det.A /D det D D D :
.ad  bc/2 c a .ad  bc/ 2 ad  bc det.A/

2.5 The Adjoint of a Square Matrix

We have seen in Theorem 2.4.8 that the inverse of A exists if and only if det(A) ≠ 0. Since our ultimate goal is to compute A^{-1}, we may ask whether there is a way to compute A^{-1} by using the determinant. To answer this question, let us consider a matrix A in M2(K),
" #
ab
AD :
cd

Recall again that the inverse A1 is given by


2 " #
1 1 d b
A D : (2.31)
det.A/ c a

We want now to explain the relationship between A and the matrix


" #
d b
BD :
c a

It can be easily seen that the cofactors of A are

C11 D d; C12 D c; C21 D b; C22 D a:

Form the matrix

C = [C11 C12; C21 C22] = [d −c; −b a].

Then we can easily see that

B = C^T.

Consequently, formula (2.31) can be rewritten as

1
A1 D CT : (2.32)
det.A/

The matrix C is called the cofactor matrix of A or matrix of cofactors of A and CT is


called the adjoint of A and we may generalize this idea and give the following definition.

Definition 2.5.1 (Adjoint of a Matrix)


Let A be a matrix in Mn .K/ and Cij ; 1  i; j  n, be the cofactors of A, the matrix C
defined by
2 3
C11 C12 ::: C1n
6C ::: C2n 7
6 21 C22 7
CD6
6 :: :: :: :: 7
7
4 : : : : 5
Cn1 Cn2 : : : Cnn

is called the cofactor matrix or matrix of cofactors of A and the transpose of C is


called the adjoint of A, and we write

adj.A/ D CT :
Example 2.12
Find the adjoint of the matrix
2 3
1 0 2
6 7
A D 4 1 3 0 5 :
1 0 2

Solution
We compute the cofactors of A as
" # " #
3 0 1 0
C11 D det D 6; C12 D  det D 2;
0 2 1 2

and similarly,

C13 D 3; C21 D 0; C22 D 4; C23 D 0;


C31 D 6; C32 D 2; C33 D 3:

Consequently, the cofactor matrix of A is


2 3
6 2 3
6 7
C D 4 0 4 0 5 ;
6 2 3

and then
2 3
6 0 6
6 7
adj.A/ D CT D 4 2 4 2 5 :
3 0 3
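The computation of Example 2.12 can be organized as a small routine. The sketch below is illustrative (the helper names minor_matrix, cofactor_matrix, and adjugate are ours, not the book's); it builds the cofactor matrix entry by entry as in Definition 2.5.1 and returns its transpose. As the next theorem shows, A · adj(A) = det(A) I, which gives a convenient correctness check for such a routine.

import numpy as np

def minor_matrix(A, i, j):
    # matrix obtained from A by deleting row i and column j
    return np.delete(np.delete(A, i, axis=0), j, axis=1)

def cofactor_matrix(A):
    # C[i, j] = (-1)^(i+j) times the corresponding minor
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor_matrix(A, i, j))
    return C

def adjugate(A):
    # adj(A) is the transpose of the cofactor matrix
    return cofactor_matrix(A).T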

Now, as in the case of the inverse of a 2  2 matrix considered in (2.32), we have the
following theorem.

Theorem 2.5.1 (The Adjoint Formula for the Inverse)


Let A be an invertible matrix in Mn .K/. Then its inverse is given by

1
A1 D adj.A/: (2.33)
det.A/

Proof
2 We need to show that the matrix B defined by

1
BD adj.A/
det.A/

satisfies

AB D BA D I: (2.34)

Then, B is an inverse of A and the uniqueness of the inverse (Theorem 1.2.3) leads to B D
A1 . To check (2.34), let A D .aij /; 1  i; j  n, and adj.A/ D .dji /; 1  j; i  n, with
dji D Cij . By Definition 1.1.11,

A adj.A/ D .bij /; 1  i; j  n;

with the entries of the above product satisfying

bij D ai1 d1j C ai2 d2j C    C ain dnj

D ai1 Cj1 C ai2 Cj2 C    C ain Cjn :

Now, if i D j, then the above formula is the cofactor expansion of the determinant of the
matrix A along the ith row.
On the other hand, if i ¤ j, then

ai1 Cj1 C ai2 Cj2 C    C ain Cjn D 0:

The above equation is just the determinant of the matrix A, where we replace the ith row by
the jth row. Then, in this case the matrix contains two identical rows and so its determinant
is zero (Theorem 2.4.4).
Therefore, we obtain
2 3
det.A/ 0 : : : 0
6 0 det.A/ : : : 0 7
6 7
A adj.A/ D 6
6 :: :: :: :: 77 D det.A/I:
4 : : : : 5
0 0 : : : det.A/

Since A is invertible, det.A/ ¤ 0, and we have


 
1
A adj.A/ D AB D I:
det.A/

By the same method, we can show that BA D I, and therefore, B D A1 . This completes the
proof of Theorem 2.5.1. t
u
Example 2.13
Use the adjoint formula to find the inverse of the matrix
2 3
1 0 2
6 7
A D 4 1 3 0 5 :
1 0 2

Solution
We have computed the cofactors of A in Example 2.12. Let us now find the determinant of A.
Using the cofactor expansion along the second column, we have

det.A/ D a12 C12 C a22 C22 C a32 C32

D 3C22

D 12:

Thus, since det.A/ ¤ 0, A1 exists and

1
A1 D adj.A/:
det.A/

From Example 2.12, we have


2 3
6 0 6
6 7
adj.A/ D 4 2 4 2 5 :
3 0 3

Therefore
2 3 2 3
6 0 6 1=2 0 1=2
1 6 7 6 7
A1 D  4 2 4 2 5 D 4 1=6 1=3 1=6 5 :
12
3 0 3 1=4 0 1=4
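Assuming the adjugate helper sketched after Example 2.12 is in scope, a computation of this kind can be checked against NumPy's built-in inverse; the matrix entries below carry the signs as reconstructed here, which may differ from the original typesetting.

import numpy as np

A = np.array([[ 1., 0.,  2.],
              [-1., 3.,  0.],
              [ 1., 0., -2.]])     # det(A) = -12 with these signs

A_inv = adjugate(A) / np.linalg.det(A)         # formula (2.33)
print(np.allclose(A_inv, np.linalg.inv(A)))    # True
print(np.allclose(A @ A_inv, np.eye(3)))       # True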

Example 2.14
Use formula (2.33) to find the inverse of the matrix
2 3
3 2 1
6 7
A D 4 2 0 1 5:
1 2 1



Solution
2 We need first to compute the cofactors of A as in Example 2.12. We find

C11 D 2; C12 D 3; C13 D 4; C21 D 4; C22 D 4; C23 D 4;

C31 D 2; C32 D 1; C33 D 4

Consequently, the cofactor matrix is


2 3
2 3 4
6 7
C D 4 4 4 4 5
2 1 4

and so
2 3
2 4 2
6 7
adj.A/ D 4 3 4 1 5 :
4 4 4

Now, we use the cofactor expansion along the second row to find the determinant of A as

det.A/ D a21 C21 C a22 C22 C a23 C23


D 2C21 C C23

D 4:

Since det.A/ ¤ 0, A1 exists and is given by


2 3 2 3
2 4 2 1=2 1 1=2
1 16 7 6 7
A1 D adj.A/ D  4 3 4 1 5 D 4 3=4 1 1=4 5 :
det.A/ 4
4 4 4 1 1 1

Example 2.15
Use the adjoint matrix to find the inverse of the matrix
2 3
2 1 1
6 7
A D 4 0 1 3 5 :
0 0 2


Solution
First, it is clear that since A is a triangular matrix,

det.A/ D 2  .1/  .2/ D 4 ¤ 0:

This means that A is invertible. Now, we need to find the adjoint of A. We compute first the
cofactor matrix C of A. We have
" #
1 3
C11 D det D 2; C12 D 0; C13 D 0; C21 D 2;
0 2
C22 D 4; C23 D 0; C31 D 2; C32 D 6; C33 D 2:

Consequently,
2 3 2 3
C11 C12 C13 2 0 0
6 7 6 7
C D 4 C21 C22 C23 5 D 4 2 4 0 5 :
C31 C32 C33 2 6 2

Thus,
2 3
2 2 2
6 7
adj.A/ D CT D 4 0 4 6 5 ;
0 0 2

and so
3 2
2 3
2 2 2 1=2 1=2 1=2
1 16 7 6 7
A1 D adj.A/ D 4 0 4 6 5 D 4 0 1 3=2 5 :
det.A/ 4
0 0 2 0 0 1=2

Example 2.16
1. Use the row reduction method to find the determinant of the matrix
2 3
2 4 6
6 7
A D 4 0 0 1 5 :
2 1 5

2. Use the adjoint matrix to find A1 .


3. Use the above results to solve the system of equations
8
ˆ
< 2x1 C 4x2 C 6x3 D 1;
x3 D 2; (2.35)

2x1  x2 C 5x3 D 1:



Solution
2 1. Denote by r1 ; r2 , and r3 the rows of A and of all the matrices obtained by means of row
operations. Our goal is to apply the row operation method to get a triangular matrix from A.
First, we exchange r2 and r3 and get
2 3
2 4 6
6 7
A1 D 4 2 1 5 5 ;
0 0 1

and det.A1 / D  det.A/. Next, we replace r2 by r2  r1 and obtain


2 3
2 4 6
6 7
A2 D 4 0 5 1 5
0 0 1

and det.A2 / D det.A1 /. Now, since A2 is a triangular matrix,

det.A2 / D 2  .5/  .1/ D 10:

Consequently,

det.A/ D  det.A1 / D  det.A2 / D 10:

2. Since det.A/ ¤ 0, A is invertible. We need to find the adjoint of A. We compute first


the cofactor matrix C of A. We have
" #
0 1
C11 D det D 1; C12 D 2; C13 D 0; C21 D 26;
1 5
C22 D 2; C23 D 10; C31 D 4; C32 D 2; C33 D 0:

Consequently,
2 3 2 3
C11 C12 C13 1 2 0
6 7 6 7
C D 4 C21 C22 C23 5 D 4 26 1 10 5 ;
C31 C32 C33 4 2 0

and so
2 3
1 26 4
6 7
adj.A/ D CT D 4 2 2 2 5 :
0 10 0
This gives
2 3 2 3
1 26 4 1=10 13=5 2=5
1 1 6 7 6 7
A1 D adj.A/ D  4 2 2 2 5 D 4 1=5 1=5 1=5 5 :
det.A/ 10
0 10 0 0 1 0

3. We may write the system (2.35) as


2 3 2 3 2 3
2 4 6 x1 1
6 7 6 7 6 7
AX D b; with A D 4 0 0 1 5 ; X D 4 x2 5 ; and b D 4 2 5:
2 1 5 x3 1

Since A is invertible, the solution of (2.35) is given by


2 3 2 3 2 3
1=10 13=5 2=5 1 49=10
6 7 6 7 6 7
X D A1 b D 4 1=5 1=5 1=5 5  4 2 5 D 4 4=5 5 :
0 1 0 1 2

2.5.1 Cramer’s Rule

In this subsection, we will use the adjoint formula to find the solution of the system

AX D b (2.36)

where A is an invertible matrix in Mn .K/ and X and b are vectors in Mn1 .K/. That is
2 3
a11 a12 ::: a1n 32 2
3
6 7 x1 b1
6 a21 a22 ::: a2n 7 6 : 7 6 : 7
AD6
6 :: :: :: :: 7
7; XD6 7
4 :: 5 and bD6 7
4 :: 5 : (2.37)
4 : : : : 5
xn bn
an1 an2 ::: ann

To clarify the idea, take n D 2. Then we have


" # " # " #
a11 a12 x1 b1
AD ; XD and bD :
a21 a22 x2 b2

Since A is invertible, the solution of (2.36) is given by


2 " #
x1 1
XD D A1 b D adj.A/b
x2 det.A/
" #" #
1 a22 a12 b1
D
det.A/ a21 a11 b2
" #
1 b1 a22  b2 a12
D :
det.A/ b1 a21 C b2 a11

Now, if we consider the two matrices


" # " #
b1 a12 a11 b1
A1 D and A2 D ;
b2 a22 a21 b2

then we have

b1 a22  b2 a12 D det.A1 / and  b1 a21 C b2 a11 D det.A2 /:

It is clear that the matrix A1 is obtained by replacing the first column of A by the vector
b and the matrix A2 is obtained by replacing the second column of A by the vector b.
This shows that the solution of (2.36) is given by
det.A1 / det.A2 /
x1 D and x2 D :
det.A/ det.A/
This method of finding x1 and x2 is called the Cramer rule and is generalized in the
following theorem.

Theorem 2.5.2 (Cramer’s Rule)


Consider the system of linear equations (2.36). Assume that det.A/ ¤ 0. Then, the
components of the unique solution of (2.36) are given by

det.A1 / det.A2 / det.An /


x1 D ; x2 D ;:::; xn D ; (2.38)
det.A/ det.A/ det.A/

where Aj ; 1  j  n, is the matrix obtained by replacing the entries in the jth column
of the matrix A by the entries of the column
2
3
b1
6b 7
6 27
bD6 7
6 :: 7 :
4 : 5
bn
Proof
First method. It is clear that if det.A/ ¤ 0, then A is invertible and the unique solution of
(2.36) is given by X D A1 b. Now using formula (2.33), we have

1
XD adj.A/b
det.A/
2 32 3
C11 C21 ::: Cn1 b1
6C C ::: Cn2 7 6 7
6
1 6 12 22 7 6 b2 7
D
det.A/ 6 : : :: :: 7 6 7
76 : 7;
4 :: :: : : 5 4 :: 5
C1n C2n : : : Cnn bn

whence
2 3 2 3
x1 b1 C11 C b2 C21 C    C bn Cn1
6x 7 6b C C b C C  C b C 7
6 27 6 1 12 n n2 7
6 : 7D 1 6 :
2 22
6 : 7 6 : :: :: 77:
4 : 5 det.A/ 4 : : : 5
xn b1 C1n C b2 C2n C    C bn Cnn

Thus, the jth component of X is given by

b1 C1j C b2 C2j C    C bn Cnj


xj D :
det.A/

But it is clear that if we replace a1j ; 1  j  n, by bj , then

b1 C1j C b2 C2j C    C bn Cnj

is the cofactor expansion of the determinant of the resulting matrix


2 3
a11 a12 : : : a1.j1/ b1 a1.jC1/ : : : a1n
6 7
6 a21 a22 : : : a2.j1/ b2 a2.jC1/ : : : a2n 7
Aj D 6
6 :: :: :: :: :: :: : 77
4 : : : : : : : : : :: 5
an1 an2 : : : an.j1/ bn an.jC1/ : : : ann

since C1j ; C2j ; : : : ; Cnj are simultaneously cofactors of A and of Aj . Consequently,

det.Aj /
xj D :
det.A/

Second method. We denote by Aj .b/ the matrix obtained by replacing the jth column of
A by the vector b. Let a1 ; a2 ; : : : ; an be the column vectors of A and let e1 ; e2 ; : : : ; en be the
column vectors of the identity matrix I. Then, we have, for 1  j  n,

Ij .X/ D Œe1 ; : : : ; X; : : : ; en ;

where we replaced ej by X. Clearly,


AIj .X/ D ŒAe1 ; : : : ; AX; : : : ; Aen 

D Œa1 ; : : : ; b; : : : ; an  D Aj .b/:

Now, applying Theorem 2.4.7, we have

det.AIj .X// D det.A/ det.Ij .X// D det.Aj .b//:

Since det.Ij .X// D xj , we conclude that

det.Aj .b// det.Aj


xj D D :
det.A/ det.A/
This finishes the proof of Theorem 2.5.2. t
u

Example 2.17
Use Cramer’s rule to find the solution of the linear system
(
x1  x2 D 1;
(2.39)
x1 C 2x2 D 3:

Solution
System (2.39) can be written in matrix form as

AX D b;

with
" # " # " #
1 1 x1 1
AD ; XD and bD :
1 2 x2 3

Following Cramer’s rule, we introduce the matrices


" # " #
1 1 11
A1 D and A2 D :
3 2 13

Since det.A/ D 3 ¤ 0, A is invertible and the components of the unique solution of (2.39)
are given by

det.A1 / 5 det.A2 / 2
x1 D D and x2 D D :
det.A/ 3 det.A/ 3

J
Example 2.18
Use Cramer’s rule to find the solution of the system
8
ˆ
< x1 C 2x2 C 3x3 D 1;
2x1 C 5x2 C 3x3 D 6; (2.40)

x1 C 8x3 D 6:

Solution
The system (2.40) can be written in the form

AX D b;

with
2 3 2 3 2 3
123 x1 1
6 7 6 7 6 7
A D 42 5 35; X D 4 x2 5 ; and b D 4 6 5:
108 x3 6

We can use the method of cofactor expansion to show that

det.A/ D 1 ¤ 0:

Therefore, A is invertible, and following Cramer’s rule we introduce


2 3 2 3 2 3
1 23 1 1 3 12 1
6 7 6 7 6 7
A1 D 4 6 5 3 5 ; A2 D 4 2 6 3 5 ; A3 D 4 2 5 6 5 :
6 0 8 1 6 8 1 0 6

By the cofactor expansion, we show that

det.A1 / D 2; det.A2 / D 1; det.A3 / D 1:

Hence, the components of the solution of (2.40) are given by

det.A1 / 2 det.A2 / det.A3 /


x1 D D D 2; x2 D D 1; x3 D D 1:
det.A/ 1 det.A/ det.A/

J
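Cramer's rule translates directly into a few lines of code. The sketch below is illustrative (the name cramer_solve is ours); it is only sensible for small systems, since it computes n + 1 determinants, and for larger systems one would use a linear solver instead.

import numpy as np

def cramer_solve(A, b):
    # Solve A x = b by Cramer's rule (Theorem 2.5.2)
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    if np.isclose(d, 0.0):
        raise ValueError("det(A) = 0: Cramer's rule does not apply")
    n = A.shape[0]
    x = np.empty(n)
    for j in range(n):
        Aj = A.copy()
        Aj[:, j] = b                      # replace column j of A by b
        x[j] = np.linalg.det(Aj) / d
    return x

# the system of Example 2.17, with the signs read as below: x1 = 5/3, x2 = 2/3
print(cramer_solve([[1, -1], [1, 2]], [1, 3]))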

2.6 Exercises
Exercise 2.1
1. Consider the matrix
2 3
c20
6 7
A D 41 c 25;
01c

where c is a real number. Find all values of c, if any, for which A is invertible.
2. Put c D 1 and use the adjoint matrix to find A1 .


Solution
1. The matrix A is invertible if and only if det.A/ ¤ 0. Now, using the cofactor expansion,
we have

det.A/ D a11 C11 C a12 C12 C a13 C13


" # " #
c2 12
D c det  2 det
1c 0c

D c.c2  2/  2c
D c.c2  4/ D c.c  2/.c C 2/:

So, the matrix A is invertible if and only if c ¤ 0, c ¤ 2 and c ¤ 2.


2. Since for c D 1, det.A/ ¤ 0, then A is invertible. We need to find the adjoint of A. We
compute first the cofactor matrix C of A. A simple computation (see  Sect. 2.5) shows that
2 3 2 3
C11 C12 C13 1 1 1
6 7 6 7
C D 4 C21 C22 C23 5 D 4 2 1 1 5 :
C31 C32 C33 4 2 1

Thus,
2 3
1 2 4
6 7
adj.A/ D CT D 4 1 1 2 5 ;
1 1 1

and so
2 3 2 3
1 2 4 1=3 2=3 4=3
1 16 7 6 7
A1 D adj.A/ D  4 1 1 2 5 D 4 1=3 1=3 2=3 5 :
det.A/ 3
1 1 1 1=3 1=3 1=3

J
Exercise 2.2
Let
" #
ab
AD :
cd

1. Find A2 and tr.A2 /:


2. Show that
" #
1 tr.A/ 1
det.A/ D det :
2 tr.A2 / tr.A/

Solution
1. We have
" #" # " #
2 ab ab a2 C bc ab C db
A D AA D D :
cd cd ac C dc d2 C bc

Thus, the trace of A2 is

tr.A2 / D a2 C bc C d2 C bc D a2 C d2 C 2bc:

2. We have
" #
tr.A/ 1
det D .tr.A//2  tr.A2 /
tr.A2 / tr.A/

= (a + d)^2 − (a^2 + d^2 + 2bc)


D 2ad  2bc

D 2 det.A/;

which gives the desired result. J

Exercise 2.3
Let A and B be two invertible matrices in Mn(R). Show that if

AB = −BA,

then n is even. 

Solution
Since AB = −BA, we have

det(AB) = det(−BA).

Using the properties of the determinant (Theorem 2.4.5), we get

det(AB) = (−1)^n det(BA).

Using the product rule (Theorem 2.4.7) and the fact that det(A) ≠ 0 and det(B) ≠ 0 (Theorem 2.4.8), we have det(AB) = det(A) det(B) = det(BA) ≠ 0, and therefore (−1)^n = 1. This shows that n is even. J

Exercise 2.4
1. Find the determinant of the matrix A in M2 .C/ given by
" #
! !
AD
1 !

and the determinant of the matrix B in M3 .C/ given by


2 3
1 1 1
6 7
B D 4 1 ! !2 5 ;
1 !2 !

where ! D cos 2 3 C i sin


2
3 :
2. Find B1 . 

Solution
1. By direct computation,

det.A/ D ! 2 C !:

Using Euler’s formula, we have

! 3 D cos.2/ C i sin.2/ D 1:

Thus, ! is a cubic root of 1, so

! 3  1 D .!  1/.! 2 C ! C 1/ D 0;

which gives

! 2 C ! C 1 D 0;
since ! ¤ 1. Hence,

det.A/ D ! 2 C ! D 1:

Similarly, to find the determinant of B, we use the cofactor expansion along the first row, to
get

det.B/ D a11 C11 C a12 C12 C a13 C13

D C11 C C12 C C13


" # " # " #
! !2 1 !2 1 !
D det  det C det
!2 ! 1 ! 1 !2

D ! 4 C 3! 2  2!:

Since ! 3 D 1, we have ! 4 D !. Therefore,

det.B/ D 3.! 2  !/:

On the other hand, since ! satisfies the equation

a2 C a C 1 D 0 (2.41)

and since ! 3 D 1, it follows that ! 2 satisfies the same equation, that is

! 4 C ! 2 C 1 D 0:

Since the coefficients of the Eq. (2.41) are real, one necessarily has that !N is also a solution
to (2.41), therefore, ! 2 D !.
N Consequently,

det.B/ D 3.! 2  !/
D 3.!N  !/

D 6i sin.2=3/:

2. To find the inverse of B, we need first to find the adjoint matrix adj.B/. We compute
the cofactors of B as follows:
" # " #
! !2 2 4 2 1 !2
C11 D det D !  ! D !  !; C12 D  det D ! 2  !;
!2 ! 1 !
" # " #
1 ! 2 1 1
C13 D det D !  !; C21 D  det D ! 2  !;
1 !2; !2 !

" # " #
1 1 1 1
2 C22 D det D !  1; C23 D  det D 1  !2;
1! 1 !2
" # " #
1 1 2 1 1
C31 D det D !  !; C23 D  det D 1  !2;
! !2 1 !2
" #
1 1
C33 D det D !  1:
1!

Consequently, the cofactor matrix is


2 3
!2  ! !2  ! !2  !
6 7
C D 4 !2  ! !  1 1  !2 5 ;
!2  ! 1  !2 !  1

and thus
2 3
!2  ! !2  ! !2  !
6 7
adj.B/ D CT D 4 ! 2  ! !  1 1  ! 2 5 :
!2  ! 1  !2 !  1

Consequently, applying (2.33), we get


2 3
!2  ! !2  ! !2  !
1 1 6 2 7
B1 D adj.B/ D 4 !  ! !  1 1  !2 5
det.B/ 3.! 2  !/ 2 2
! ! 1! !1
2 3
1 1 1
63 3 3 7
6 7
61 1 ! 7
D66 7;
7
6 3 3! 3 7
41 ! 1 5
3 3 3!

where we have used the fact that ! 2 C ! C 1 D 0. J

Exercise 2.5 (Vandermonde Determinant)


Let .a1 ; a2 ; : : : ; an / be in Kn . We define the Vandermonde determinant V.a1 ; a2 ; : : : ; an / to
be the determinant of the Vandermonde matrix
2 3
1 a1 a21 : : : an1
1
61 a2 a22 : : : an1 7
6 2 7
Vn D 6
6 :: :: :: 7
7: (2.42)
4: : : 5
1 an a2n : : : an1
n
1. Find V.a1 ; a2 ; : : : ; an / for all n  1.
2. Deduce that the Vandermonde determinant is zero if and only if at least two of the ai ’s
coincide.

Solution
1. We find the determinant of Vandermonde by induction. First, for n D 1, we have V.a1 / D
1. Next, for n D 2, we have
" #
1 a1
V.a1 ; a2 / D det D a2  a1 :
1 a2

For n D 3, we have
2 3
1 a1 a21
6 7
V.a1 ; a2 ; a3 / D det 4 1 a2 a22 5 :
2
1 a3 a3

To find the above determinant, we use Theorem 2.4.2 and replace c2 in the above matrix by
c2  a1 c1 and c3 by c3  a1 c2 , where c1 ; c2 and c3 are the first, the second, and the third
columns of the above matrix to get
2 3
1 0 0
6 7
V.a1 ; a2 ; a3 / D det 4 1 a2  a1 a2 .a2  a1 / 5
1 a3  a1 a3 .a3  a1 /
" #
a2  a1 a2 .a2  a1 /
D det
a3  a1 a3 .a3  a1 /
D .a2  a1 /a3 .a3  a1 /  .a3  a1 /a2 .a2  a1 /
D .a2  a1 /.a3  a1 /.a3  a2 /
Y
D .ai  aj /:
1j<i3

Now, we assume that


Y
V.a1 ; a2 ; : : : ; an1 / D .ai  aj / (2.43)
1j<in1

and show that


Y
V.a1 ; a2 ; : : : ; an / D .ai  aj /: (2.44)
1j<in

Indeed, using the same method as before and replacing c2 by c2  a1 c1 , c3 by c3  a1 c2 up to


2 cn , which we replace by cn  a1 cn1 , we get
2 3
1 0 0 ::: 0
6 7
6 1 .a2  a1 / .a2  a1 /a2 : : : .a2  a1 /an2 2 7
V.a1 ; a2 ; : : : ; an / D det 6
6 :: :: :: 7
7
4: : : 5
1 .an  a1 / .an  a1 /an : : : .an  a1 /an2 n
2 3
.a2  a1 / .a2  a1 /a2 : : : .a2  a1 /a2 n2
6 :: :: 7
D det 6
4 : :
7
5
.an  a1 / .an  a1 /an : : : .an  a1 /an2 n
2 3
1 a2 : : : an2
2
6: :: 7
D .a2  a1 /.a3  a1 /      .an  a1 / det 6
4 ::
7
: 5
1 an : : : an2
n

D .a2  a1 /.a3  a1 /      .an  a1 /V.a1 ; a2 ; : : : ; an1 /


Y Y
D .ai  a1 /  .ai  aj /:
1<in 1j<in1

Now, by using (2.43), we conclude that (2.44) holds.


2. It is clear that from (2.44) if there exists 1  i0 ; j0  n and ai0 D aj0 with i0 ¤ j0 , then
V.a1 ; a2 ; : : : ; an / D 0: J
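The product formula (2.44) is easy to confirm numerically. The following sketch is illustrative; the nodes are arbitrary choices of ours, not taken from the text.

import numpy as np
from itertools import combinations

a = np.array([2.0, -1.0, 0.5, 3.0])          # arbitrary distinct values a_1, ..., a_n
n = len(a)
V = np.vander(a, increasing=True)            # row i is (1, a_i, a_i^2, ..., a_i^{n-1})

prod = 1.0
for j, i in combinations(range(n), 2):       # all pairs with j < i
    prod *= a[i] - a[j]

print(np.isclose(np.linalg.det(V), prod))    # True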

Exercise 2.6 (Hankel Matrix)


We define the Hankel matrix by
2 3
s0 s1 s2 : : : sn1
6 s s s ::: s 7
6 1 2 3 n 7
Hn D 6
6 :: :: :: 7
7 D .siCj2 /; 1  i; j  n; (2.45)
4 : : : 5
sn1 sn snC1 : : : s2n2

where

X
n
sk D aki ; 0  k  2n  2;
iD1

for some a1 ; a2 ; : : : ; an in K.
1. Show that Hn D VnT Vn , where Vn is the Vandermonde matrix (2.42).
2. Find det.Hn /.


Solution
We have
2 32 3
1 1 ::: 1 1 a1 a21 : : : an1
1
6 a2 : : : an 7 6 a2 a22 : : : an1 7
6 a1 761 2 7
VnT Vn D 6
6 :: :: :: 7 6
76 : :: :: 7
7
4 : : : 5 4 :: : : 5
an1
1 an1
2 : : : an1
n 1 an a2n : : : an1
n

To get siCj2 , we need to multiply row ri1 of VnT with column cj1 of Vn . For instance

s0 D s1C12 D r1  c1 ; i D j D 1;

s1 D s1C22 D s2C12 D r1  c2 D r2  c1 ; i D 1; j D 2; or i D 2; j D 1:
::
:

sk D ri  cj ; with i C j D k C 2:

Thus, for example


2 j1 3
a1
6 aj1 7 X X
i1 6 2
6 7 n n
sk D Œai1 ; i1
; : : : ;   7 D
iCj2
D aki ; .0  k  2n  2/:
1 a2 an 6 :: 7 ai
4 : 5 iD1 iD1
aj1
n

Consequently, it is clear that VnT Vn D Hn .


2. Using Theorem 2.4.7 and Theorem 2.3.1, we have

det.Hn / D det.VnT Vn / D det.VnT / det.Vn /

D Œdet.V/2
Y
D .ai  aj /2 ;
1j<in

where we have used (2.44). J
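Both identities of this exercise can be checked in a few lines; the sketch below is illustrative and reuses the arbitrary nodes of the previous check.

import numpy as np
from itertools import combinations

a = np.array([2.0, -1.0, 0.5, 3.0])
n = len(a)
V = np.vander(a, increasing=True)

s = np.array([np.sum(a ** k) for k in range(2 * n - 1)])        # power sums s_0, ..., s_{2n-2}
H = np.array([[s[i + j] for j in range(n)] for i in range(n)])  # H[i, j] = s_{i+j} (0-based)

print(np.allclose(H, V.T @ V))                                  # H_n = V_n^T V_n
prod_sq = 1.0
for j, i in combinations(range(n), 2):
    prod_sq *= (a[i] - a[j]) ** 2
print(np.isclose(np.linalg.det(H), prod_sq))                    # det(H_n) = prod (a_i - a_j)^2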

Exercise 2.7 (Skew–Symmetric Matrix)


Let A be a square matrix in Mn .K/. A is called skew symmetric if AT D A.
1. Show that for any matrix A in Mn .K/, then A C AT is symmetric and A  AT is skew
symmetric.
2. Show that any matrix A in Mn .K/ can be written as the sum of a symmetric matrix and a
skew symmetric one.
3. Let A be a skew symmetric matrix in Mn .K/. Show that if n is odd, then

det.A/ D 0:



Solution
2 1. Let A be a matrix in Mn .K/. Then,

.A C AT /T D AT C .AT /T D AT C A:

Consequently, A C AT is symmetric.
Similarly,

.A  AT /T D AT  .AT /T D AT  A D .A  AT /:

So, A  AT is skew symmetric.


2. Now, for any matrix A in Mn .K/, we have

1 1
AD .A C AT / C .A  AT /:
2 2
Since A C AT is symmetric, 12 .A C AT / is also symmetric; moreover, since A  AT is skew
symmetric, so is 12 .A  AT /.
3. Since A is skew symmetric, we have A D AT . This gives

det.A/ D det.AT / D .1/n det.AT /;

by making use of (2.24). Now, since det.A/ D det.AT / (Theorem 2.3.1), then by the above
reasoning if n is odd, then

2 det.A/ D 0:

This yields

det.A/ D 0:

Exercise 2.8 (Companion Matrix)


The companion matrix associated with the polynomial

p.x/ D a0 C a1 x C    C an1 xn1 C xn

is defined to be the matrix


2 3
0 1 0 ::: 0
6 : 7
6 7
6 0 0 1 :: 0 7
6 7
Cp D 6
6
:: :: :: :: 7 ;
6 : : : : 77
6 7
4 0 0 0 ::: 1 5
a0 a1 a2 : : : an1

where a0 ; a1 ; : : : an1 are in K.


1. Show that det.Cp / D .1/n a0 .
2. Prove that det.I  Cp / D p./.
3. Deduce that det.0 I  Cp / D 0 if and only if 0 is a root of p./.

Solution
1. We use the row reduction method (Theorem 2.4.2) to compute the determinant of Cp . We
can apply two methods:
For the first method, we denote the columns of the matrix Cp and the columns of the
obtained matrices after any column operation by C1 ; C2 ; : : : ; Cn . Permuting the first and the
second columns, we get
2 3
1 0 0 ::: 0
6 : 7
6 7
6 0 0 1 :: 0 7
6 7
Cp;1 D6
6
:: :: :: :: 7
6 : : : : 77
6 7
4 0 0 0 ::: 1 5
a1 a0 a2 : : : an1

and det.Cp;1 / D  det.Cp /. Similarly, permuting C2 and C3 , we obtain


2 3
1 0 0 ::: 0
6 : 7
6 7
6 0 1 0 :: 0 7
6 7
Cp;2 D6
6
:: :: :: :: 7
6 : : : : 77
6 7
4 0 0 0 ::: 1 5
a1 a2 a0 : : : an1

and det.Cp;2 / D  det.Cp;1 / D .1/2 det.Cp /. We continue in this way and in each operation,
we permute Ci and CiC1 ; i D 1; : : : ; n  1, and finally, after .n  1/ operations, we get
2 3
1
0 ::: 0 0
6 : 7
6 7
6 0 1 :: 0 0 7
6 7
Cp;.n1/ D6 :
6 ::
:: :: 7
6 : : 0 77
6 7
4 0 0 : : : 1 0 5
a1 a2 : : : an1 a0

and det.Cp;.n1/ / D .1/n1 det.Cp /. Since Cp;.n1/ is a triangular matrix, its determinant is
the product of the entries of the main diagonal, that is

det.Cp;.n1/ / D a0 :

Consequently,
det.Cp / D .1/n a0 :

For the second method, we just replace the nth row rn in Cp by rn C a1 r1 C a2 r2 C    C


an1 rn1 , and obtain
2 3
0 1 0 ::: 0
6 : 7
6 7
6 0 0 1 :: 07
6 7
AD6
6
:: :: : : :: 7 ;
6 : : : :77
6 7
4 0 0 0 ::: 15
a0 0 0 : : : 0

with det.A/ D det.Cp /. Now, we use the cofactor expansion along the first column to
compute the determinant of A

det.A/ D a0 Cn1 D a0 .1/nC1 Mn1 ;

where
2 3
1 0 ::: 0
6 7
6 0 1 ::: 07
6 7
D det 6 D 1:
:: 7
Mn1
6 :: : : 7
4: : :5
0 0 ::: 1

Thus,

det.Cp / D det.A/ D .1/n a0 :

2. We have
2 3
 1 0 : : : 0
6 : 7
6 7
6 0  1 :: 0 7
6 7
6
I  Cp D 6 :: :: :: :
:: 7:
: : : 7
6 7
6 7
4 0 0 0 ::: 1 5
a0 a1 a2 : : :  C an1

Now, to compute the determinant of I  Cp , we use the cofactor expansion along the first
column, obtaining

.n1/ .n1/
det.I  Cp / D C11 C a0 Cn1 D M11 C a0 .1/nC1 Mn1 ;
where
2 3
::
6  1 : 0 7
6 : :: :: 7
.n1/ 6 : : 7
M11 D det 6 : : 7
6 7
4 0 0 ::: 1 5
a1 a2 : : :  C an1

and
2 3
1 0 : : : 0
6 7
6  1 ::: 0 7
.n1/ 6 7
D det 6 D .1/n1 ;
:: 7
Mn1 (2.46)
6 :: :: 7
4 : : : 5
0 0 : : : 1

since the above matrix is triangular. Thus,


.n1/ .n1/
det.I  Cp / D M11 C a0 .1/nC1 .1/n1 D M11 C a0 :

Now, observe that the matrix in (2.46) is the same as the matrix I  Cp , but of size
.n  1/  .n  1/. So, the same method as before yields
.n1/ .n2/
M11 D M11 C a1 ;
.n2/
where M11 is the determinant of the .n  2/  .n  2/ matrix
2 3
::
6  1 : 0 7
6 : :: :: 7
6 : : 7
6 : : 7:
6 7
4 0 0 ::: 1 5
a2 a3 : : :  C an1

Consequently, we obtain

.n2/
det.I  Cp / D 2 M11 C a1  C a0 :

Repeating this process until the end, we obtain

.2/
det.I  Cp / D n2 M11 C an3 n3 C    C a1  C a0 ; (2.47)

where
" #
.2/  1
M11 D det D 2 C an1 C an2 :
an2  C an1

Plugging this into (2.47), we get


det.I  Cp / D n C n1 an1 C an2 n2 C an3 n3 C    C a1  C a0 D p./:

3. It is clear, since det.I  Cp / D p./, that if p.0 / D 0, then det.0 I  Cp / D 0 and


if det.0 I  Cp / D 0, then p.0 / D 0. J
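The identity det(λI − Cp) = p(λ) can also be verified numerically. The sketch below is illustrative: the coefficients and the test value of λ are arbitrary choices, not taken from the text.

import numpy as np

a = np.array([2.0, -3.0, 1.0, 4.0])          # a_0, a_1, ..., a_{n-1}
n = len(a)
Cp = np.zeros((n, n))
Cp[:-1, 1:] = np.eye(n - 1)                  # ones on the superdiagonal
Cp[-1, :] = -a                               # last row: -a_0, ..., -a_{n-1}

lam = 1.7
p_lam = lam ** n + np.sum(a * lam ** np.arange(n))                    # p(lam)
print(np.isclose(np.linalg.det(lam * np.eye(n) - Cp), p_lam))         # True
print(np.isclose(np.linalg.det(Cp), (-1) ** n * a[0]))                # det(Cp) = (-1)^n a_0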

Exercise 2.9 (Tridiagonal Matrix)


The matrix An in Mn .K/ is called tridiagonal if it has the form
2 3
a1 b1 0 : : : : : : 0
6 7 ::
6 7
6 c1 a2 b2 7 :
6 7
6 0 c2 a3 b3 : : : 7 0
6 7
An D 6 : :: :: :: 7: ::
6 :: : : : 7 :
6 7
6 : 7
6 : :: :: :: 7
4 : : : : bn1 5
0 ::: 0 cn1 an

1. Show that

det.An / D an det.An1 /  bn1 cn1 det.An2 /; n  3: (2.48)

2. The Fibonacci sequence is the sequence of numbers 1; 2; 3; 5; 8; 13; : : : generated by the


relation Fn D Fn1 C Fn2 ; n  3, with F1 D 1 and F2 D 2. Show that the determinant
of the Fibonacci matrix
2 3
1 1 0 ::: ::: 0
6 :: 7
6 7
6 1 1 1 :7
6 7
6 0 1 1 1 : : : 0 7
6 7
Bn D 6 : :: :: :: 7
6 :: : : : 7
6 7
6 : 7
6 : :: :: :: 7
4 : : : : 15
0 ::: 0 1 1

is the nth Fibonacci number Fn .

Solution
1. The cofactor expansion along the last column, yields

det.An / D an Cnn C bn1 C.n1/n :


Consequently,
2 3
a1 b1 0 : : : : : : 0
6 7::
6 7
6 c1 a2 b2 7 :
6 7
6 0 c2 a3 b3 : : : 70
6 7
det.An / D an .1/ nCn
det 6 : : : : 7
6 :: :: :: :: 7
6 7
6 : 7
6 : :: :: :: 7
4 : : : : bn2 5
0 ::: 0 cn2 an1
2 3
c1 a1 b1 : : : : : : 0
6 :: 7
6 7
6 0 c2 a2 b2 : 7
6 7
6 : 7
6 0 0 c3 a3 : : 0 7
C bn1 .1/ n1Cn
det 6
6 :
7:
7 (2.49)
6 :: :: :: :: 7
6 : : : bn3 7
6 : 7
6 : :: :: 7
4 : : :c a 5
n3 n2
0 ::: 0 0 cn1

On the other hand, we have


2 3
c1 a1 b1 : : : : : :0
6 :: 7
6 7
6 0 c2 a2 b2 : 7
6 7
6 : 7
6 0 0 c3 a3 : : 0 7
det 6
6 :
7
7
6 :: :: :: ::
6 : : : bn3 7
7
6 : 7
6 : :: :: 7
4 : : : cn3 an2 5
0 ::: 0 0 cn1
2 3
a1 b1 0 : : : ::: 0
6 7 ::
6 7
6 c1 a2 b2 7 :
6 7
6 0 c2 a3 b3 ::: 7 0
2n2 6 7
D cn1 .1/ det 6 : :: :: :: 7
6 :: : : : 7
6 7
6 : 7
6 : :: :: :: 7
4 : : : : bn3 5
0 ::: 0 cn3 an2
D cn2 det.An2 /:

Plugging the last expression into (2.49), we get

det.An / D an det.An1 /  bn1 cn1 det.An2 /;

which is exactly (2.48).



2. Applying (2.48) with an = 1, bn = 1, and cn = −1 for all n, we get


det.Bn / D det.Bn1 / C det.Bn2 /; n  3:

Thus, for n = 1, we have det(B1) = det[1] = 1. For n = 2, we have

det(B2) = det [1 1; −1 1] = 2.

Therefore, det.Bn / D Fn for all n  1. J
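The recurrence (2.48) gives a fast way to evaluate det(Bn). The sketch below is illustrative (the function name is ours); it takes the subdiagonal entries of Bn to be −1, which is what makes the recurrence read det(Bn) = det(Bn−1) + det(Bn−2), and cross-checks against the determinant of the explicit matrix.

import numpy as np

def fibonacci_det(n):
    # det(B_n) via det(B_n) = det(B_{n-1}) + det(B_{n-2}), det(B_1) = 1, det(B_2) = 2
    if n == 1:
        return 1
    d1, d2 = 1, 2
    for _ in range(n - 2):
        d1, d2 = d2, d1 + d2
    return d2

n = 8
B = np.eye(n) + np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
print(fibonacci_det(n), round(np.linalg.det(B)))    # 34 34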

Exercise 2.10 (Determinant of Block Matrices and Schur’s Formula)


Let M be a matrix in Mn .K/ and let A; B; C and D be matrices in Mp .K/ with 2p D n.
1. Show that if
" #
A 0pp
MD ;
0pp D

then

det.M/ D det.A/ det.D/ D det.AD/: (2.50)

2. Show that (2.50) remains true if M has one of the forms


" # " #
A 0pp A B
MD ; or MD :
C D 0pp D

3. Show that if M is the matrix written blockwise as


" #
A B
MD ;
CD

then the formula

det M D det.AD  BC/ (2.51)

is not true in general.


4. Show that if D is invertible, then

det.M/ D det.D/ det.A  BD1 C/ (Schur’s formula):


Solution
1. Assume that A D .aij / is an r  r matrix .r  p/ and D is a p  p matrix. We use induction
on r to show (2.50). If r D 1, then by expanding along the first row, we obtain
2 3
a11 0 : : : 0
6 7
6 7
6 7
6 0 7
det 6 7 D a11 det.D/ D det.A/ det.D/:
6 :: 7
6 7
4 : D 5
0

Now, we assume that (2.50) is true for r D p  1 and show that it remains true for r D p. We
have
" #
A 0pp
det.M/ D det ;
0pp D

where the matrix A can be written as


2 3
A.p1/.p1/ a1p
6 :: 7
AD6
4
7
: 5;
ap1 : : : ap.p1/ app

with A.p1/.p1/ D .aij /; 1  i; j  p  1. Obviously,

det.A/ D app det.A.p1/.p1/ /

and
" #
A.p1/.p1/ 0.p1/p
det.M/ D app det D app det.A.p1/.p1/ / det.D/
0p.p1/ D
D det.A/ det.D/;

where we have used the induction hypothesis.


2. The first formula can be proved by the same method as in the first question. The second
one can be proved by using the determinant of the transpose (Theorem 2.3.1).
3. It is clear that (2.51) is true for p D 1. We provide a counterexample for p  2.
Assume, for instance, that p D 2 and we consider the matrices
" # " # " # " #
1 2 2 1 1 0 42
AD ; BD ; CD ; DD :
1 1 1 3 1 4 23

Then,
2 " #
7 4
det.AD  BC/ D det D 115:
6 13

On the other hand, we have


2 3
1 2 2 1
6 7
6 1 1 1 37
det.M/ D det 6 7 D 94:
4 1 0 4 25
1 4 2 3

4. To prove Schur’s formula, we observe that


" #" # " #
A B Ip 0pp A  BD1 C B
D :
CD D1 C Ip 0pp D

Then, we apply (2.50), to get


" # " # " #
A B Ip 0pp A  BD1 C B
det det D det :
CD D1 C Ip 0pp D

That is,

det.M/ D det.D/ det.A  BD1 C/;

since det Ip D 1. J
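Schur's formula, and the failure of the naive formula (2.51), are easy to test on random blocks. This sketch is purely illustrative.

import numpy as np

rng = np.random.default_rng(1)
p = 3
A, B, C, D = (rng.standard_normal((p, p)) for _ in range(4))

M = np.block([[A, B], [C, D]])
lhs = np.linalg.det(M)
print(np.isclose(lhs, np.linalg.det(D) * np.linalg.det(A - B @ np.linalg.inv(D) @ C)))  # True
print(np.isclose(lhs, np.linalg.det(A @ D - B @ C)))                                    # typically False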

Euclidean Vector Spaces

Belkacem Said-Houari

© Springer International Publishing AG 2017


B. Said-Houari, Linear Algebra, Compact Textbooks in Mathematics,
DOI 10.1007/978-3-319-63793-8_3

3.1 Introduction

The main objectives in this chapter are to generalize the basic geometric ideas in R2 or
R3 to nontrivial higher-dimensional spaces Rn . Our approach is to start from geometric
concepts in R2 or R3 and then extend them to Rn in a purely algebraic manner.
In engineering and physics, many quantities, such as force and velocity, are defined
by their direction and magnitude. Mathematically, this is what we call a vector. Thus,
we can simply define a vector in 2- or 3-dimensional space as a line segment with a
definite direction, or graphically as an arrow connecting a point A (called initial point)
and a point B (called terminal point), as shown in ⊡ Fig. 3.1
In engineering and physics, some quantities like weight or height can be represented
by a scalar. On the other hand, a force, for example, can be described by its magnitude
and direction, therefore, a vector is characterized by its magnitude (length) and its
direction. Accordingly, two vectors are equivalent (or equal) if they have the same length
and the same direction. See ⊡ Fig. 3.2.

3.2 Vector Addition and Multiplication by a Scalar

3.2.1 Vector Addition

Consider the vector v with initial point A and terminal point B and let w be the vector
with initial point B and terminal point C. Then it is not hard to see that v C w is the
vector with initial point A and terminal point C, as in ⊡ Fig. 3.3.
Now the question is: how can we add v to w if the terminal point of v is not the initial point of w? In this case, as we stated before, two vectors with the same length and the same direction are equal. Thus, as in ⊡ Fig. 3.4, the dashed vector is equal to the vector w and its initial point is at the same time the terminal point of v, and thus we can add the two vectors as above.

⊡ Fig. 3.1 The vector v = AB
⊡ Fig. 3.2 The vectors v and w are equal since they have the same length and the same direction
⊡ Fig. 3.3 The sum of two vectors v + w
⊡ Fig. 3.4 The sum of two vectors v + w

Now, from this simple remark, we may deduce directly
that:
1. The sum of vectors is commutative. That is

v C w D w C v; (3.1)

as illustrated in ⊡ Fig. 3.4.


2. The sum of vectors is associative, that is

u C .v C w/ D .u C v/ C w; (3.2)

as illustrated in ⊡ Fig. 3.5.


⊡ Fig. 3.5 The sum of vectors is associative
⊡ Fig. 3.6 Multiplication of a vector by a scalar
⊡ Fig. 3.7 Components of a vector if the initial point is the origin

3.2.2 Multiplication of a Vector by a Scalar

Let v be a nonzero vector in 2- or 3-dimensional space and let k be a scalar. Then, we


define the product kv to be the vector whose length is jkj times the length of v and
whose direction is the same direction of v if k > 0 and the opposite one if k < 0. See ⊡
Fig. 3.6.

3.2.3 Vectors in Coordinate Systems

Let v be a vector in 2-dimensional space with its initial point at the origin of
a rectangular coordinate system. Then the vector is completely determined by the
coordinates of its terminal point; see ⊡ Fig. 3.7.

Thus, v D OP D .v1 ; v2 /, where v1 and v2 are called the components of v. Now,
in the coordinate system the vector is determined by its components rather than by its
direction and its length.
Now, if the initial point P1 is not the origin, then we just need to translate the
coordinate system in such a way that the point P1 will be the origin in the new coordinate
system as in ⊡ Fig. 3.8.
Let P.x1 ; y1 / be the initial point of the vector v and P2 .x2 ; y2 / be the terminal point
of v. Then in the x0 y0 -plane the components of the vector v are v D .x02 ; y02 /, since P1 is
the origin in the x0 y0 -plane. Since in the xy-plane we have x02 D x2  x1 and y02 D y2  y1 ,
the components of v in the xy-plane are v D .x2  x1 ; y2  y1 /.

⊡ Fig. 3.8 Components of a y


vector if the initial point is not
the origin
y
3 y2 P2
y2

y1 x
P1 x2
x
x1 x2

Now, we are going to generalize the above notions to any n-dimensional space.
We have seen in (1.15) how to add two vectors in 2-dimensional space (R2 ) and
3-dimensional space (R3 ). To generalize (1.15), let us first define the space Rn .

Definition 3.2.1 (Euclidean Space Rn )


Let n be a positive integer. An ordered n-tuple is a finite sequence of n real numbers
.v1 ; v2 ; : : : ; vn /. The space Rn is the set of all ordered n-tuples. This space Rn can
be also regarded as the space of all vectors with n components.

Now, similarly to the geometric approach in (1.15), we may define the addition in
Rn . Thus, if v D .v1 ; v2 ; : : : ; vn / and w D .w1 ; w2 ; : : : ; wn / are two vectors in Rn , we
define the sum v C w to be the vector

v C w D .v1 C w1 ; v2 C w2 ; : : : ; vn C wn /: (3.3)

Example 3.1
Let v D .1; 3; 1/ and w D .1; 0; 2/ be two vectors of R3 . Then,

v C w D .0; 3; 1/:

As depicted in ⊡ Fig. 3.9, if v is the vector with the initial point A and terminal point B
and w is the vector with initial point B and terminal point C, then the sum v C w is the
vector with the initial point A and terminal point C.
The real numbers v1 ; v2 ; : : : ; vn are called the components of the vector v. Form
(3.3), we may easily show that

v C w D w C v:

Thus, the addition .C/ of vectors is commutative. Moreover, and as we have seen for
the 2-dimensional space, it is clear that
⊡ Fig. 3.9 The sum of two y w
vectors v C w in a coordinate
system B C

+w

A
x

⊡ Fig. 3.10 The vector v y

− 1
x
1

− 2

u C .v C w/ D .u C v/ C w;

for any three vectors u; v and w in Rn . Thus, the addition of vectors is also associative.
In particular, if w D v, then

v C v D 2v D .v1 C v1 ; v2 C v2 ; : : : ; vn C vn / D .2v1 ; 2v2 ; : : : ; 2vn /:

We may also define the vector .1/v D v (⊡ Fig. 3.10) to be the vector with the same
length as v and its direction the opposite of that of v. Using the geometric representation
in R2 , for example, we find that if v D .v1 ; v2 /, then v D .v1 ; v2 /.
Analogously, if k is a scalar, then we define kv to be the vector

kv D .kv1 ; kv2 ; : : : ; kvn /: (3.4)

Also, we define the zero vector in Rn to be the vector whose components are all zero.
That is, 0 D 0Rn D .0; 0; : : : ; 0/, and this vector has the property

0Cv DvC0Dv

for any vector v in Rn . Thus, the zero vector is an identity element with respect to the
addition of vectors. Furthermore, for any vector v in Rn ,

v C .v/ D 0:

Thus, v is the inverse of v with respect to the addition of vectors.



Consequently, we may collect all the above properties in the following theorem (see
Definition 1.1.10).

Theorem 3.2.1 (Group Structure of Rn )
The space .Rn ; C/ is an Abelian group with respect to the addition .C/.

Let us list a number of further properties:


Using (3.4), we have for any vector v in Rn ,

1v D 1.v1 ; v2 ; : : : ; vn / D .1v1 ; 1v2 ; : : : ; 1vn / D .v1 ; v2 ; : : : ; vn / D v:

Thus, 1 is the identity element for multiplication by scalars.


Next, it is clear that for any two scalars k1 and k2 , we have

.k1 k2 /v D .k1 k2 v1 ; k1 k2 v2 ; : : : ; k1 k2 vn / D k1 .k2 v1 ; k2 v2 ; : : : ; k2 vn / D k1 .k2 v/:

This property represents the compatibility of scalar multiplication with the multiplica-
tion in R.
Next, (3.3) and (3.4) imply that

k.v C w/ D k.v1 C w1 ; v2 C w2 ; : : : ; vn C wn /
D .k.v1 C w1 /; k.v2 C w2 /; : : : ; k.vn C wn //
D .kv1 C kw1 ; kv2 C kw2 ; : : : ; kvn C kwn /
D k.v1 ; v2 ; : : : ; vn / C k.w1 ; w2 ; : : : ; wn /
D kv C kw:

Thus, distributivity of scalar multiplication with respect to vector addition holds.


Finally, distributivity of scalar multiplication with respect to the addition in R also
holds. That is, for any two scalars k1 and k2 , we have

.k1 C k2 /v D ..k1 C k2 /v1 ; .k1 C k2 /v2 ; : : : ; .k1 C k2 /vn /


D .k1 v1 C k2 v1 ; k1 v2 C k2 v2 ; : : : ; k1 vn C k2 vn /
D .k1 v1 ; k1 v2 ; : : : ; k1 vn / C .k2 v1 ; k2 v2 ; : : : ; k2 vn /
D k1 v C k2 v:

The above four properties together with Theorem 3.2.1 endow Rn with an algebraic
structure called vector space.
Let us give a general definition of a vector space E over a field K.
Definition 3.2.2 (Vector Space)
Let E be a nonempty set and K be a field. Let two operations be given on E: one
internal denoted by .C/ and defined as:

EE!EW .v; w/ 7! v C w

and one external denoted by ./ and defined as:

KE !E W .k; v/ 7! k  v D kv:

Then, E is called a vector space over the field K if for all x; y in E and for all ;  in
K the following properties hold:
1. .E; C/ is an Abelian group.
2. 1K x D x.
3. .x/ D ./x.
4. .x C y/ D x C y.
5. . C /x D x C x.

Example 3.2
As we have seen above, Rn is a vector space over R. 

Notation. We can also write a vector v = (v1, v2, ..., vn) in Rn as a column vector


23
v1
6 : 7
vD6 7
4 :: 5 :
vn

In this case and as we have seen in  Chap. 1, the vector v can be seen as an n  1
matrix.

3.2.4 Linear Combination

As shown in ⊡ Fig. 3.11, the vector v can be written in the coordinate system as

.v1 ; v2 / D .v1 ; 0/ C .0; v2 / D v1 .1; 0/ C v2 .0; 1/:

Now, if we denote e1 D .1; 0/ and e2 D .0; 1/, then v can be written as

v D v1 e1 C v2 e2 ; (3.5)

where v1 and v2 are scalars. Relation (3.5) says that the vector v is a linear combination
of the two vectors e1 and e2 .

⊡ Fig. 3.11 The vector v is a y


linear combination of e1 and e2

3 2e2

x
1e1

Now, we generalize the above notion to the space Rn as follows.

Definition 3.2.3 (Linear Combination)


Let v; v1 ; v2 ; : : : ; vp be vectors in Rn . We say that v is a linear combination of
v1 ; v2 ; : : : ; vp if there exist scalars k1 ; k2 ; : : : ; kp such that

v D k1 v1 C k2 v2 C    C kp vp : (3.6)

Example 3.3
Let v D .v1 ; v2 ; v3 / be a vector in R3 ; then v can be written as

v D v1 .1; 0; 0/ C v2 .0; 1; 0/ C v3 .0; 0; 1/:

Thus, v is a linear combination of the vectors e1 D .1; 0; 0/; e2 D .0; 1; 0/, and e3 D .0; 0; 1/.


3.3 Norm, Dot Product, and Distance in Rn

In this section, we define some numbers associated to vectors, called norm, dot product,
and distance.

3.3.1 The Norm of a Vector in Rn

As, we have said earlier, a vector is determined by its length (or magnitude or norm) and
its direction. Now, to define the norm of a vector in Rn , let us first consider the vector
v D .v1 ; v2 / in the coordinate system R2 as in ⊡ Fig. 3.11. Then, using Pythagoras’
theorem, we have
q
kvk2 D v12 C v22 ; or kvk D v12 C v22 ;

where kvk denotes the length of v. This indicates how to generalize the above notion to
vectors in Rn for any n  2.
Definition 3.3.1 (Norm of a Vector)
Let v D .v1 ; v2 ; : : : ; vn / be a vector in Rn . Then the norm of v is denoted by kvk
and is defined as
q
kvk D v12 C v22 C    C vn2 : (3.7)

The norm in (3.7) is called the Euclidean norm since it is associated with the
Euclidean geometry.

Example 3.4
Consider the vector v D .1; 0; 2/. Then,
p p
kvk D 12 C 02 C .2/2 D 5:

Let us list some properties of the norm of a vector.

Theorem 3.3.1 (Properties of the Norm of a Vector)


Let v D .v1 ; v2 ; : : : ; vn / be a vector in Rn and k be a scalar. Then, we have
1. kvk  0.
2. kvk D 0 if and only if v D 0.
3. kkvk D jkjkvk.

Proof
1. The first property is a direct consequence of (3.7).
2. Assume first that kvk D 0. Then by (3.7), we deduce that

v12 C v22 C    C vn2 D 0;

whence, v1 D v2 D    D vn D 0: Thus, v D .0; 0; : : : ; 0/ D 0. Conversely, by definition,


p
0 D 02 C 02 C    C 02 D 0:

3. Since kv D .kv1 ; kv2 ; : : : ; kvn /, then,


p
kkvk D .kv1 /2 C .kv2 /2 C    C .kvn /2
q
D k2 .v12 C v22 C    C vn2 /
q
D jkj v12 C v22 C    C vn2

D jkjkvk:

ⓘ Remark 3.3.2
It is clear from above that the norm of a vector in Rn can be seen as a mapping N
defined on the spaces Rn and with values into RC as:
N W Rn ! RC ; v 7! N .v/ D kvk;

such that the properties in Theorem 3.3.1 are satisfied. t


u

Unit Vectors

Definition 3.3.2 (Unit Vector)


A vector of norm 1 is called a unit vector.

Example 3.5
The vectors e1 D .1; 0/ and e2 D .0; 1/ are unit vectors in R2 , since ke1 k D ke2 k D 1. 

Example 3.6 (Normalizing a Nonzero Vector)


Let v be a nonzero vector in Rn . Let
v
uD :
kvk

Then, u is a unit vector since


 
 v  1
kuk D  
 kvk  D kvk kvk D 1;

where we have applied the last property in Theorem 3.3.1, with k D 1=kvk. The above
process of obtaining the unit vector u is called the normalization of v. We can write v as

v D kvku:

Since kvk  0, we see that the vectors u and v have the same direction. 
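Numerically, the norm and the normalization above amount to the following minimal sketch (illustrative; the vector is the one of Example 3.4, with the sign of the last component as reconstructed here).

import numpy as np

v = np.array([1.0, 0.0, -2.0])
norm_v = np.sqrt(np.sum(v ** 2))                 # definition (3.7); equals sqrt(5) here
print(np.isclose(norm_v, np.linalg.norm(v)))     # True

u = v / norm_v                                   # normalization of v
print(np.isclose(np.linalg.norm(u), 1.0))        # u is a unit vector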

Example 3.7 (Standard Unit Vectors in Rn )


The vectors

e1 D .1; 0; 0; : : : ; 0/ e2 D .0; 1; 0; : : : ; 0/; : : : ; en D .0; : : : ; 0; 1/

are unit vectors in Rn , called the standard unit vectors in Rn . 

ⓘ Remark 3.3.3 It is clear that any vectors v D .v1 ; v2 ; : : : ; vn / can be written as a linear
combination of the standard unit vectors e1 ; e2 ; : : : ; en as

v D v1 e1 C v2 e2 C : : : vn en :
3.3.2 Distance in Rn

Definition 3.3.3 (Distance)


Let P1 .x1 ; x2 ; : : : ; xn / and P2 . y1 ; y2 ; : : : ; yn / be two points in Rn . We define the
# »
distance between P1 and P2 to be the norm of the vector P1 P2 and we write

# » p
d.P1 ; P2 / D kP1 P2 k D . y1  x1 /2 C . y2  x2 /2 C    C . yn  xn /2 : (3.8)

Example 3.8
Let P1 .0; 1; 2/ and P2 .1; 3; 1/ be two points in R3 . Then

# » p p
d.P1 ; P2 / D kP1 P2 k D .1  0/2 C .3  1/2 C .1  2/2 D 6:

In the following theorem, we list some properties of the distance.

Theorem 3.3.4 (Properties of the Distance)


Let P1 .x1 ; x2 ; : : : ; xn /, P2 . y1 ; y2 ; : : : ; yn / and P3 .z1 ; z2 ; : : : ; zn / be three points in Rn .
Then, we have
1. d.P1 ; P2 /  0.
2. d.P1 ; P2 / D 0 if and only if P1 D P2 .
3. d.P1 ; P2 / D d.P2 ; P1 /.
4. d(P1, P2) ≤ d(P1, P3) + d(P3, P2) (triangle inequality).

Proof
The first and the second properties are obvious. To prove the third property, we have
p
d.P1 ; P2 / D . y1  x1 /2 C . y2  x2 /2 C    C . yn  xn /2
p
D .x1  y1 /2 C .x2  y2 /2 C    C .xn  yn /2

D d.P2 ; P1 /:

For the fourth property, since distances are nonnegative, it is enough to show that

(d(P1, P2))^2 ≤ (d(P1, P3) + d(P3, P2))^2.   (3.9)

We have
 2
d.P1 ; P2 / D .x1  y1 /2 C .x2  y2 /2 C    C .xn  yn /2
D ..x1  z1 / C .z1  y1 //2 C ..x2  z2 / C .z2  y2 //2 C : : :
C ..xn  zn / C .zn  yn //2
D .x1  z1 /2 C .x2  z2 /2 C    C .xn  zn /2
C .z1  y1 /2 C .z2  y2 /2 C    C .zn  yn /2

C 2.x1  z1 /.z1  y1 / C 2.x2  z2 /.z2  y2 / C    C 2.xn  zn /.zn  yn /:


(3.10)

Clearly,

.x1  z1 /.z1  y1 / C .x2  z2 /.z2  y2 / C    C .xn  zn /.zn  yn /


rh ih i
 .x1  z1 /2 C .x2  z2 /2 C    C .xn  zn /2 .z1  y1 /2 C .z2  y2 /2 C    C .zn  yn /2

= d(P1, P3) d(P3, P2).

Plugging this last inequality into (3.10), then, we obtain (3.9). t


u

3.3.3 The Dot Product of Two Vectors in Rn

Since vectors in Rn can be seen as column matrices, we can not multiply two vectors
using matrix multiplication (Definition 1.1.11). Instead, we can define the dot product
of two vectors. To examine a simple case first consider two vectors u D .u1 ; u2 / and
v D .v1 ; v2 / in R2 and let  be the acute angle between them, as in ⊡ Fig. 3.12. We
define the dot product of these two vectors as the product of their norms and the cosine
of the angle  :

u  v D kukkvk cos : (3.11)

Formula (3.11) is based on 2-dimensional geometry and it cannot be applied for vectors
in Rn with n > 3. For this reason we will provide an algebraic formula to define the

⊡ Fig. 3.12 The dot product of u


u and v

θ
⊡ Fig. 3.13 The cosine law
A c

B
b
a
C

⊡ Fig. 3.14 The sum of two y


vectors v C w in a coordinate
system
P(u 1 , u 2 )

u
Q( 1 , 2)

θ
x

dot product of two vectors in Rn for any n  1. To do this, let us first review the cosine
formula for any triangle with A; B and C as its angles and a; b and c the lengths of its
sides, as shown in ⊡ Fig. 3.13.
We have

c2 D a2 C b2  2ab cos C;
a2 D b2 C c2  2bc cos A; (3.12)
b2 D a2 C c2  2ac cos B:

Consider two vectors u D .u1 ; u2 / and v D .v1 ; v2 / in the standard coordinate


system of R2 as in ⊡ Fig. 3.14.
Using the cosine rule (3.12), we have

kPQk2 D kuk2 C kvk2  2kukkvk cos : (3.13)

As we have seen before,



kPQk2 D .v1  u1 /2 C .v2  u2 /2 ; kvk2 D v12 C v22 ; kuk2 D u21 C u22 :

Plugging these last expressions into (3.13), we have

u  v D kukkvk cos  D u1 v1 C u2 v2 :

In this last formula, we got a rid of the cosine of the angle  and so, by knowing the
components of the vectors u and v, we can easily compute the dot product between the
two vectors using the algebraic expression
u  v D u1 v1 C u2 v2 :

This indicates how to generalize the definition of the dot product for any two vectors
u D .u1 ; u2 ; : : : ; un / and v D .v1 ; v2 ; : : : ; vn / in Rn , for all n  2.

Definition 3.3.4 (Dot Product)


Let u D .u1 ; u2 ; : : : ; un / and v D .v1 ; v2 ; : : : ; vn / be two vectors in Rn . Then, the
dot product (also called the inner product or scalar product) of u and v is defined as

u  v D u1 v1 C u2 v2 C    C un vn : (3.14)

Example 3.9
Consider the vectors u D .1; 2; 1/ and v D .3; 0; 1/ in R3 . Then

u  v D 1  3 C 2  0 C .1/  .1/ D 2:
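Formula (3.14) and the properties listed in the next theorem are straightforward to check in code; the vectors below are arbitrary illustrative choices, not the ones of the example.

import numpy as np

u = np.array([2.0, -1.0, 3.0])
v = np.array([1.0, 4.0, 0.0])

print(np.dot(u, v))                                          # component formula (3.14): -2.0
print(np.isclose(np.dot(u, u), np.linalg.norm(u) ** 2))      # u . u = ||u||^2
print(np.isclose(np.dot(u, v), np.dot(v, u)))                # symmetry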

It is clear that formula (3.14) allows us to deduce several properties of the dot
product. We list some of them.

Theorem 3.3.5 (Properties of the Dot Product)


Let u D .u1 ; u2 ; : : : ; un /; v D .v1 ; v2 ; : : : ; vn /, and w D .w1 ; w2 ; : : : ; wn / be vectors
in Rn and k be a scalar. Then, we have
1. u  0 D 0:
2. u  u D kuk2 :
3. u  v D v  u.
4. u  .v C w/ D u  v C u  w.
5. k.u  v/ D .ku/  v D u  .kv/.

Proof
1. It is clear that, since 0 D .0; 0; : : : ; 0/, then u  0 D 0.
2. By Definition 3.3.1,

u  u D u21 C u22 C    C u2n D kuk2 :


3. It is obvious that

u  v D u1 v1 C u2 v2 C    C un vn
D v1 u1 C v2 u2 C    C vn un
D v  u:

4. By (3.3),

v C w D .v1 C w1 ; v2 C w2 ; : : : ; vn C wn /:

Hence,

u  .v C w/ D u1 .v1 C w1 / C u2 .v2 C w2 / C    C un .vn C wn /

D .u1 v1 C u2 v2 C    C un vn / C .u1 w1 C u2 w2 C    C un wn /
D u  v C u  w:

5. Similarly, using (3.4), we have

k.u  v/ D .ku1 /v1 C .ku2 /v2 C    C .kun /vn


D .ku/  v
D u1 .kv1 / C u2 .kv2 / C    C un .kvn /
D u  .kv/:

t
u

It is clear from above that the dot product has very similar properties to those of the
usual multiplication in R. Here is a simple example.

Example 3.10
For any two vectors u and v in Rn , we have

.u  2v/  .u C 3v/ D u  .u C 3v/ C .2v/  .u C 3v/


D .u  u/ C .u  .3v// C .2v/  u C .2v/  .3v/

D kuk2 C 3.u  v/  2.v  u/  6kvk2


D kuk2 C .u  v/  6kvk2 :

Next, we prove a very important inequality, called the Cauchy–Schwarz inequality.


This inequality evaluates the absolute value of the dot product of two vectors in terms

of their norms. Its idea is very simple and for vectors in R2 or R3 it is a direct
consequence of the definition of the dot product (formula (3.11)) and the properties
of the cosine function. So, let u and v be two nonzero vectors in R2 or R3 . Then, from
(3.11), we have
uv
cos  D ;
kukkvk

and since

j cos  j  1;

it follows that
ˇ ˇ
ˇ uv ˇ
ˇ ˇ
ˇ kukkvk ˇ  1:

This yields

ju  vj  kukkvk: (3.15)

Inequality (3.15) is called the Cauchy–Schwarz inequality (in R2 or R3 ) and trivially


holds if u or v is the zero vector. Now, we can state and prove this inequality for any two
vectors u and v in Rn .

Theorem 3.3.6 (Cauchy–Schwarz Inequality)


Let u D .u1 ; u2 ; : : : ; un / and v D .v1 ; v2 ; : : : ; vn / be two vectors in Rn . Then,

ju  vj  kukkvk: (3.16)

That is,
q q 
ju1 v1 C u2 v2 C    C un vn j  u21 C u22 C    C u2n v12 C v22 C    C vn2 :

Equality holds if and only if u = kv for some scalar k (when v ≠ 0, one can take k = (u · v)/‖v‖²).

Proof
In fact there are several proofs of the Cauchy–Schwarz inequality; here we present one that
uses the above properties of the dot product.
As we have seen above, it is clear that for any two vectors u and v in Rn and for any λ
in R, we have

(u + λv) · (u + λv) = ‖u + λv‖² ≥ 0.


On the other hand, by using the properties of the dot product, we have

(u + λv) · (u + λv) = (u · u) + λ²(v · v) + 2λ(u · v)
                    = λ²‖v‖² + 2λ(u · v) + ‖u‖²
                    = aλ² + 2bλ + c,   (3.17)

with

a = ‖v‖²,   b = (u · v),   c = ‖u‖².

Since a > 0, the polynomial in λ in (3.17) is nonnegative for every λ in R if and only if the
discriminant

Δ = b² − ac ≤ 0,

that is,

|b| ≤ √a √c,

or equivalently

|u · v| ≤ ‖u‖ ‖v‖,

which is exactly the Cauchy–Schwarz inequality.


Now, it is clear that if u = kv, then

|u · v| = |k| ‖v‖² = |k| ‖v‖ ‖v‖ = ‖u‖ ‖v‖.

On the other hand, if |u · v| = ‖u‖ ‖v‖, then u = ((u · v)/‖v‖²) v. This finishes the proof of
Theorem 3.3.6. t
u

Example 3.11
Consider the vectors u D .1; 0; 1/ and v D .1; 2; 0/. Thus, we have
‖u‖ = √2,   ‖v‖ = √5,   u · v = 1.

It is clear that

1 < √2 · √5,

which means that the Cauchy–Schwarz inequality holds. 
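A quick numerical experiment also makes the Cauchy–Schwarz inequality (3.16) tangible. The following NumPy sketch, an illustration of ours with arbitrary random vectors, checks it on many random pairs.

```python
# Sanity check of the Cauchy-Schwarz inequality (3.16) on random vectors.
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    u = rng.normal(size=5)
    v = rng.normal(size=5)
    lhs = abs(np.dot(u, v))
    rhs = np.linalg.norm(u) * np.linalg.norm(v)
    assert lhs <= rhs + 1e-12   # small tolerance for rounding error
print("Cauchy-Schwarz verified on 1000 random pairs")
```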



Example 3.12 (Young’s Inequality)


Consider the two vectors u D .a; b/ and v D .b; a/ in R2 . Then, by applying the Cauchy–
Schwarz inequality, we have
|u · v| = 2|ab| ≤ ‖u‖ ‖v‖ = a² + b².

Thus, for any two real numbers a and b,

|ab| ≤ a²/2 + b²/2.   (3.18)

Now, let ε > 0 be any real number. Applying (3.18) to a′ = √ε a and b′ = b/√ε, we obtain

|a′b′| ≤ a′²/2 + b′²/2,

or equivalently,

|ab| ≤ ε a²/2 + b²/(2ε).   (3.19)

Inequality (3.19) is known as Young's inequality. If we put ε/2 = δ, then inequality (3.19)
becomes

|ab| ≤ δ a² + b²/(4δ).   (3.20)

Example 3.13
Show that for any real numbers a1 ; a2 ; : : : ; an , we have the inequality

.a1 C a2 C    C an /2  n.a21 C a22 C    C a2n /: (3.21)

Solution
Consider in Rn the two vectors

u D .1; 1; : : : ; 1/ and v D .a1 ; a2 ; : : : ; an /:

Then, we have the Cauchy–Schwarz inequality,

ju  vj  kukkvk:
That is,

|a1 + a2 + ··· + an| ≤ √n · √(a1² + a2² + ··· + an²),

or equivalently

.a1 C a2 C    C an /2  n.a21 C a22 C    C a2n /:

Which is exactly (3.21). J
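As a small numerical illustration of (3.21), assuming NumPy, one can compare both sides for a concrete choice of numbers; the values below are ours.

```python
# Checking inequality (3.21) for one concrete choice of numbers.
import numpy as np

a = np.array([1.0, -2.0, 4.0, 0.5])
n = len(a)
lhs = a.sum() ** 2              # (a1 + ... + an)^2
rhs = n * np.sum(a ** 2)        # n (a1^2 + ... + an^2)
print(lhs, rhs, lhs <= rhs)     # 12.25  85.0  True
```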

Now, having the Cauchy–Schwarz inequality, we can prove some important inequal-
ities and identities.

Theorem 3.3.7 (Triangle Inequality)


Let u and v be two vectors in Rn . Then,

ku C vk  kuk C kvk: (3.22)

Proof
We know that

.u C v/  .u C v/ D ku C vk2 : (3.23)

On the other hand, by using the properties of the dot product,

.u C v/  .u C v/ D u  u C v  v C 2.u  v/

D kuk2 C kvk2 C 2.u  v/: (3.24)

By the Cauchy–Schwarz inequality, we have

2.u  v/  2j.u  v/j  2kukkvk:

Plugging this last inequality into (3.24), we obtain

.u C v/  .u C v/  kuk2 C kvk2 C 2kukkvk


D .kuk C kvk/2 : (3.25)

Comparing (3.23) and (3.25), we get

ku C vk2  .kuk C kvk/2 ;

which eventually leads to (3.22). t


u

⊡ Fig. 3.15 The parallelogram identity
Theorem 3.3.8 (Parallelogram Identity)
Let u and v be two vectors in Rn . Then,

ku C vk2 C ku  vk2 D 2.kuk2 C kvk2 /: (3.26)

Proof
The parallelogram identity in R2 is illustrated in ⊡ Fig. 3.15. To prove it, we have, on one
hand

ku C vk2 D .u C v/  .u C v/ D .u  u/ C .v  v/ C 2.u  v/
D kuk2 C kvk2 C 2.u  v/: (3.27)

On the other hand,

ku  vk2 D .u  v/  .u  v/ D u  u C v  v  2.u  v/
D kuk2 C kvk2  2.u  v/: (3.28)

Combining (3.27) and (3.28), we obtain (3.26). t


u

Above, we wrote the norm of a vector u in terms of the dot product. Now, we want
to do the opposite. So, can we write the dot product of two vectors using their norms?
The answer of this question is given in the following theorem.

Theorem 3.3.9 (Polarization Identity)


Let u and v be two vectors in Rn . Then, we have

u · v = (1/4)‖u + v‖² − (1/4)‖u − v‖².   (3.29)
Proof
To prove (3.29), we subtract (3.28) from (3.27), obtaining

ku C vk2  ku  vk2 D 4.u  v/;

which is exactly (3.29). t


u
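Both the parallelogram identity (3.26) and the polarization identity (3.29) are easy to confirm numerically. The sketch below is an illustration of ours, assuming NumPy.

```python
# Numerical check of the parallelogram identity (3.26) and polarization identity (3.29).
import numpy as np

rng = np.random.default_rng(1)
u, v = rng.normal(size=4), rng.normal(size=4)
norm = np.linalg.norm

lhs_par = norm(u + v)**2 + norm(u - v)**2
rhs_par = 2 * (norm(u)**2 + norm(v)**2)
dot_polar = 0.25 * norm(u + v)**2 - 0.25 * norm(u - v)**2

print(np.isclose(lhs_par, rhs_par))         # True
print(np.isclose(dot_polar, np.dot(u, v)))  # True
```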

3.4 Orthogonality in Rn

From Euclidean geometry we know that two vectors u and v in R2 are orthogonal
or perpendicular if and only if the angle between them is θ = π/2. This is a
geometric definition. To define the orthogonality of vectors in Rn for any n  2, we
need to find instead an algebraic formulation. We have seen from formula (3.11) that if
we know the angle between two vectors, then we can define the algebraic quantity that
we called the dot product. Thus, if θ = π/2, then we have from (3.11) that

u  v D 0: (3.30)

We see immediately that the two vectors are orthogonal if and only if their dot product
is zero. Since (3.30) is an algebraic definition of the orthogonality, then we can take it
as a definition of the orthogonality of vectors in Rn for any n  2.

Definition 3.4.1 (Orthogonality in Rn )


Let u and v be two vectors in Rn . Then u and v are said to be orthogonal if

u  v D 0: (3.31)

Example 3.14
1. The vectors e1 D .1; 0/, and e2 D .0; 1/ are orthogonal in R2 since e1  e2 D 0.
2. In R3 , the vectors e1 D .1; 0; 0/; e2 D .0; 1; 0/ and e3 D .0; 0; 1/ are pairwise orthogonal
since

e1  e2 D e2  e3 D e3  e1 D 0:

3. The vectors u D .1; 0; 1; 0/ and v D .1; 0; 1; 2/ are orthogonal in R4 .




Example 3.15
1. Show that the vectors u D .a; b/ and v D .b; a/ are orthogonal in R2 .
2. Use the above result to find two vectors that are orthogonal to w D .2; 3/.
3. Find a unit vector that is orthogonal to y D .5; 12/.


Solution
1. We have

u · v = a(−b) + b(a) = −ab + ba = 0,

so u and v are orthogonal.


2. Let w = (2, 3). Then, we have w = (a, b) with a = 2 and b = 3. From the first
question, we deduce that w is orthogonal to the vector (−b, a) = (−3, 2). Also, the vector
(a, b) is orthogonal to the vector (b, −a). Thus, w is also orthogonal to (3, −2).
3. Let y = (5, 12). Then from above, we deduce that y is orthogonal to (−12, 5). Denote
z = (−12, 5). We form a unit vector from this vector by putting (see Example 3.6)

u = z/‖z‖ = (1/√((−12)² + 5²)) (−12, 5) = (1/13)(−12, 5) = (−12/13, 5/13).

It is clear that u is orthogonal to y since

y · u = 5(−12/13) + 12(5/13) = 0.

3.4.1 Orthogonal Projections in Rn



Let F be a force applied to a particle moving along a path. The magnitude of this force
in the direction of a nonzero vector v is given by

#» #» v
kFv k D k F k cos 
kvk

where v=kvk is the unit vector in the direction of v. The vector Fv is called the

orthogonal projection of F on v.
Before we define the projection of a vector u on a nonzero vector v, we prove the
following assertion.

Theorem 3.4.1
Let u and v be two vectors in Rn . Assume that v is a nonzero vector. Then, u can be
written in exactly one way in the form

u D w1 C w2 ; (3.32)

where w1 D kv and w2 is orthogonal to v (as shown in ⊡ Fig. 3.16).


⊡ Fig. 3.16 The projection of u on v

Proof
Since w1 D kv, where k is a scalar, and since w2 is orthogonal to v, we have w2  v D 0.
Thus, we obtain

u  v D .w1 C w2 /  v

D w1  v C w2  v
D .kv/  v

D kkvk2 :

Consequently,

k = (u · v)/‖v‖²,

whence

w1 = ((u · v)/‖v‖²) v   and   w2 = u − ((u · v)/‖v‖²) v.

t
u

Definition 3.4.2 (Orthogonal Projection)


The above vector w1 is called the orthogonal projection of u on v and we denote

projv u = ((u · v)/‖v‖²) v.   (3.33)

ⓘ Remark 3.4.2 It is clear that the vector w1 can be written as

w1 = ((u · v)/‖v‖) (v/‖v‖),

and since v/‖v‖ is a unit vector, we call the number

(u · v)/‖v‖

the component of u in the direction of v.



⊡ Fig. 3.17 Pythagoras' theorem in a right triangle

Example 3.16
Let u = (1, 1, 0) and v = (2, 0, 1). Then, we have

projv u = ((u · v)/‖v‖²) v = (2/5)(2, 0, 1) = (4/5, 0, 2/5).
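Formula (3.33) translates directly into code. The following NumPy sketch (the helper name proj is ours) computes the projection and checks that the remainder u − w1 is orthogonal to v.

```python
# Direct implementation of the orthogonal projection formula (3.33).
import numpy as np

def proj(u, v):
    """Orthogonal projection of u on the nonzero vector v."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return (np.dot(u, v) / np.dot(v, v)) * v

u = np.array([1.0, 1.0, 0.0])
v = np.array([2.0, 0.0, 1.0])
w1 = proj(u, v)            # [0.8 0.  0.4], i.e. (2/5)(2, 0, 1)
w2 = u - w1                # component of u orthogonal to v
print(w1, np.dot(w2, v))   # w2 . v is 0 up to rounding
```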

Pythagoras’ Theorem in Rn
In Euclidean geometry, Pythagoras’ theorem says that in a right triangle as in ⊡
Fig. 3.17, we have

ku C vk2 D kuk2 C kvk2 : (3.34)

Since the angle between the vectors u and v is  D =2, we have by making use of
(3.11) that u  v D 0. Consequently, instead of the geometric property discussed above,
we formulated an algebraic property that the two vectors should satisfy in order for the
Pythagoras theorem to hold. That is to say, if the dot product of two vectors is zero, then
the identity (3.34) is satisfied. Thus, using this algebraic assumption, we can generalize
the Pythagoras theorem for any two orthogonal vectors in Rn and we have the following
statement.

Theorem 3.4.3 (Pythagoras’ Theorem in Rn )


Let u D .u1 ; u2 ; : : : ; un / and v D .v1 ; v2 ; : : : ; vn / be two orthogonal vectors in Rn .
Then,

ku C vk2 D kuk2 C kvk2 : (3.35)


Proof
Since u and v are orthogonal, u  v D 0. On the other hand,

ku C vk2 D .u C v/  .u C v/ D .u  u/ C .v  v/ C 2.u  v/

D kuk2 C kvk2 C 2.u  v/

D kuk2 C kvk2 ;

which is exactly (3.35). t


u

3.5 Exercises

Exercise 3.1 (Orthogonal Matrix)


Consider the matrix

A = [  1/√2    1/√2    0    ]
    [ −1/2     1/2     1/√2 ]
    [  1/2    −1/2     1/√2 ]

Show that:
1. Each row of A is a unit vector and each column of A is a unit vector.
2. The row vectors of A are pairwise orthogonal.
3. The column vectors of A are pairwise orthogonal.

A matrix with the above properties is called an orthogonal matrix. See  Sect. 8.1. 

Solution
We denote by
     
r1 = (1/√2, 1/√2, 0),   r2 = (−1/2, 1/2, 1/√2),   r3 = (1/2, −1/2, 1/√2)

the row vectors of A, and by


     
c1 = (1/√2, −1/2, 1/2),   c2 = (1/√2, 1/2, −1/2),   c3 = (0, 1/√2, 1/√2)

the column vectors of A.


1. It is clear that
r
1 1
kr1 k D kc3 k D C D 1:
2 2

Similarly, we have
r
1 1 1
3 kr2 k D kr3 k D kc1 k D kc2 k D C C D 1:
4 4 2

2. We have

1 1 1 1
r1  r2 D  p   p D 0:
2 2 2 2

Similarly,

r1  r3 D r2  r3 D 0:

Thus, the row vectors r1 ; r2 , and r3 are pairwise orthogonal.


3. We can easily check that

c1  c2 D c2  c3 D c3  c1 D 0;

i.e., the column vectors c1 ; c2 , and c3 are pairwise orthogonal. J
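The two properties verified above are equivalent to A Aᵀ = AᵀA = I, which can also be checked numerically. The NumPy sketch below uses the entries of A as reconstructed above, so it is an illustration rather than part of the exercise.

```python
# Numerical confirmation that A has orthonormal rows and columns.
import numpy as np

s = 1 / np.sqrt(2)
A = np.array([[   s,    s,  0.0],
              [-0.5,  0.5,    s],
              [ 0.5, -0.5,    s]])

print(np.allclose(A @ A.T, np.eye(3)))   # True: rows are orthonormal
print(np.allclose(A.T @ A, np.eye(3)))   # True: columns are orthonormal
```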

Exercise 3.2
Let u and v be two vectors in Rn .
1. Show that
| ‖u‖ − ‖v‖ | ≤ ‖u − v‖.   (3.36)

2. Prove that

ku C vkku  vk  kuk2 C kvk2 : (3.37)

When does equality hold? Give a geometric interpretation of this inequality in R2 .

Solution
1. To establish (3.36), we write u D v C .u  v/. Then, by the triangle inequality (3.22),

kuk D kv C .u  v/k
 kvk C ku  vk:

That is

kuk  kvk  ku  vk: (3.38)


On the other hand, we also have v D u C .v  u/, so, as above

kvk D ku C .v  u/k

 kuk C kv  uk:

Therefore,

kvk  kuk  kv  uk D ku  vk: (3.39)

Now (3.38) and (3.39) yield (3.36).


2. To prove (3.37), we have, as in (3.27) and (3.28),

ku C vk2 D kuk2 C kvk2 C 2.u  v/ (3.40)

and

ku  vk2 D kuk2 C kvk2  2.u  v/; (3.41)

which in turn yield


  
ku C vk2 ku  vk2 D kuk2 C kvk2 C 2.u  v/ kuk2 C kvk2  2.u  v/
D .kuk2 C kvk2 /2  4.u  v/2 :

Since .u  v/2  0, then we have

ku C vk2 ku  vk2  .kuk2 C kvk2 /2 ;

which is exactly (3.37). Of course equality holds if and only if u  v D 0, which means
if and only if the vectors are orthogonal. Geometrically, this means that the product of the
lengths of the diagonals in a parallelogram as in ⊡ Fig. 3.15 is less than or equal to the sum
kuk2 C kvk2 , and we have equality if and only if the parallelogram is a square, that is, if and
only if the vectors u and v are orthogonal. J

Exercise 3.3
1. Suppose that u; v; w, and z are vectors in Rn such that

u C v C w C z D 0:

Prove that u C v is orthogonal to u C z if and only if

kuk2 C kwk2 D kvk2 C kzk2 : (3.42)



2. Prove that
 
16 ≤ (a + b + c + d)(1/a + 1/b + 1/c + 1/d)   (3.43)

for all positive numbers a; b; c; d. 

Solution
1. We first compute kwk2 , obtaining

kwk2 D w  w

D ..u C v C z//  ..u C v C z//


 
D .u  u/ C .v  v/ C .z  z/ C 2 .u  v/ C .v  z/ C .u  z/
 
D kuk2 C kvk2 C kzk2 C 2 .u  v/ C .u  u/  .u  u/ C .v  z/ C .u  z/ : (3.44)

We have

.u  v/ C .u  u/ D .u  .u C v// and .v  z/ C .u  z/ D .u C v/  z:

Consequently,

.u  v/ C .u  u/ C .v  z/ C .u  z/ D u  .u C v/ C .v C u/  z

D .u C v/  .u C z/:

Plugging this last identity into (3.44), we get


 
kwk2 D kuk2 C kvk2 C kzk2 C 2 .u C v/  .u C z/  u  u

D kvk2 C kzk2  kuk2 C 2.u C v/  .u C z/;

since u  u D kuk2 . Now, from this last formula, we see that if u C v and u C z are orthogonal,
then

kwk2 C kuk2 D kvk2 C kzk2 : (3.45)

Conversely, if (3.45) holds, then .u C v/  .u C z/ D 0, i.e., u C v and u C z are orthogonal.


2. We consider the two vectors in R4
 
u = (√a, √b, √c, √d)   and   v = (1/√a, 1/√b, 1/√c, 1/√d).
Then

‖u‖ = √(a + b + c + d)   and   ‖v‖ = √(1/a + 1/b + 1/c + 1/d).

Also, we have

u · v = √a (1/√a) + √b (1/√b) + √c (1/√c) + √d (1/√d) = 4.

Applying the Cauchy–Schwarz inequality, we get

.u  v/2  kuk2 kvk2 ;

that is
 
16 ≤ (a + b + c + d)(1/a + 1/b + 1/c + 1/d),

which is exactly inequality (3.43). In fact, the above inequality can be generalized to any
positive numbers ai ; 1  i  n, as follows:
n² ≤ (a1 + a2 + ··· + an)(1/a1 + 1/a2 + ··· + 1/an).

Exercise 3.4 (Apollonius’ Identity)


Let u; v, and w be vectors in Rn . Show that

‖u − v‖² = 2‖w − u‖² + 2‖w − v‖² − 4‖w − (1/2)(u + v)‖².   (3.46)

Solution
Recall the parallelogram identity:

kU C Vk2 C kU  Vk2 D 2.kUk2 C kVk2 /; (3.47)

for any two vectors U and V in Rn . Applying (3.47) to the vectors

wu wv
UD and VD ;
2 2
we get

1 1 1 1
kw  .u C v/k2 C k .u  v/k2 D 2k .w  u/k2 C 2k .w  v/k2 ;
2 2 2 2

⊡ Fig. 3.18 Apollonius' identity

or

1 1 1 1
kw  .u C v/k2 C ku  vk2 D kw  uk2 C k.w  v/k2 :
2 4 2 2

This gives (3.46).


In R2 , Apollonius’ identity has the following geometric meaning: in a triangle with sides
of lengths a; b and c, let d be the length of the line segment from the midpoint of the side of
length c to the opposite vertex, see ⊡ Fig. 3.18. Then,

a² + b² = (1/2)c² + 2d².
J

Exercise 3.5
Prove that the following statements are equivalent:
1. ku  vk D ku C vk:
2. ku C vk2 D kuk2 C kvk2 :
3. The vectors u and v are orthogonal.

Solution
In order to show that the above statements are equivalent, we need to show that .1/ ) .2/ )
.3/ ) .1/:
First, assume that ku  vk D ku C vk: Then,

ku  vk2 D ku C vk2 :

On the other hand,

ku C vk2 D kuk2 C kvk2 C 2.u  v/ (3.48)

and

ku  vk2 D kuk2 C kvk2  2.u  v/: (3.49)


Hence, the right-hand sides of (3.48) and (3.49) must coincide, i.e.,

u  v D u  v;

or equivalently, u  v D 0: Using this last equality in (3.48), we obtain

ku C vk2 D kuk2 C kvk2 :

Thus, we have proved that .1/ ) .2/.


Next, assume that .2/ is satisfied. Then (3.48) implies that u  v D 0. This means that u
and v are orthogonal. This shows that .2/ ) .3/.
Finally, we assume that u and v are orthogonal, that is u  v D 0. Using (3.48) and (3.49),
we deduce that ku C vk2 D ku  vk2 , which is exactly .1/. Thus, we have proved that
.3/ ) .1/. J

Exercise 3.6 (Dunkl–Williams Inequality)


Let u and v be two nonzero vectors in Rn . Show that
 
(1/2)(‖u‖ + ‖v‖) ‖ u/‖u‖ − v/‖v‖ ‖ ≤ ‖u − v‖,   (3.50)

where equality holds if and only if kuk D kvk or kuk C kvk D ku  vk. 

Solution
We have
 2    
 u v  u v u v
   D   
 kuk kvk  kuk kvk kuk kvk
1 1 2
D .u  u/ C .v  v/  .u  v/
kuk2 kvk2 kukkvk
2
D 2 .u  v/: (3.51)
kukkvk

Using (3.41), we recast then, (3.51) as


 2
 u v  1  
  2 2 2
 kuk  kvk  D kukkvk 2kukkvk  .kuk C kvk  ku  vk / :

Hence
 2
 u v  1  
  2 2
 kuk kvk  D kukkvk ku  vk  .kuk  kvk/ :


Multiplying both side in the last identity by .kuk C kvk/2 =4 and adding ku  vk2 to both
sides, we get
 2  2
3  u v  kuk C kvk
ku  vk2   
 kuk kvk  2
.kuk C kvk/2  
D ku  vk2  ku  vk2  .kuk  kvk/2
4kukkvk
.kuk  kvk/2  
D .kuk C kvk/2  ku  vk2 : (3.52)
4kukkvk

Using the triangle inequality (3.22), we deduce that the right-hand side in (3.52) is positive.
Therefore, we obtain
 2  2
 u v 
ku  vk2     kuk C kvk  0;
 kuk kvk  2

which in turn yields (3.50).


It is clear from (3.52) that its right-hand side is zero if and only if kuk D kvk or kuk C
kvk D ku  vk. These last conditions lead to the identity
 
1  u v 
.kuk C kvk/  
 kuk kvk  D ku  vk:

2

Exercise 3.7 (Hölder’s Inequality)


Let u D .u1 ; u2 ; : : : ; un / and v D .v1 ; v2 ; : : : ; vn / be two vectors in Rn . Show that for any
p; q > 1 with 1=p C 1=q D 1,
|u · v| ≤ |u1v1| + |u2v2| + ··· + |unvn| ≤ (|u1|^p + ··· + |un|^p)^{1/p} (|v1|^q + ··· + |vn|^q)^{1/q} = ‖u‖p ‖v‖q,

with

‖u‖p = (|u1|^p + ··· + |un|^p)^{1/p}   and   ‖v‖q = (|v1|^q + ··· + |vn|^q)^{1/q}.

For p D 2, the Hölder inequality reduces to the Cauchy–Schwarz inequality. 

Solution
Without loss of generality, we may assume that

kukp ¤ 0 and kvkq ¤ 0:


We introduce the new vectors
u v
uD D .u1 ; u2 ; : : : ; un / and vD D .v1 ; v2 ; : : : ; vn /;
kukp kvkq

where
ui vi
ui D and vi D ; i D 1; : : : ; n:
kukp kvkq

We can easily check that

kukp D kvkq D 1:

Now, using the more general Young inequality1

jajp jbjq 1 1
jabj  C ; C D 1;
p q p q

for a D ui and b D vi , we get

jui jp jvi jq
jui vi j  C :
p q

Summing over i we have


ˇXn ˇ X n
ˇ ˇ
ju  vj D ˇ ui vi ˇ  jui vi j
iD1 iD1

X
n
D kukp kvkq jui vi j
iD1
!
Xn
jui jp X jvi jq
n
 kukp kvkq C
iD1
p iD1
q
 
kukp kvkq
D kukp kvkq C D kukp kvkq ;
p q

since kukq D kvkq D 1 and 1=p C 1=q D 1. This yields the desired result. J
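Hölder's inequality can be spot-checked numerically for a particular pair of conjugate exponents; the following NumPy sketch, ours and with arbitrary random vectors, does this for p = 3.

```python
# Spot-check of Holder's inequality for p = 3, q = 3/2 on random vectors.
import numpy as np

rng = np.random.default_rng(2)
p = 3.0
q = p / (p - 1)                 # conjugate exponent, 1/p + 1/q = 1
for _ in range(1000):
    u = rng.normal(size=6)
    v = rng.normal(size=6)
    lhs = np.sum(np.abs(u * v))
    rhs = np.sum(np.abs(u)**p)**(1/p) * np.sum(np.abs(v)**q)**(1/q)
    assert lhs <= rhs + 1e-12
print("Holder's inequality verified for p = 3")
```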

Exercise 3.8 (Minkowski’s Inequality)


Let xi, yi ≥ 0 for 1 ≤ i ≤ n, and let p > 1. Show that

((x1 + y1)^p + ··· + (xn + yn)^p)^{1/p} ≤ (x1^p + ··· + xn^p)^{1/p} + (y1^p + ··· + yn^p)^{1/p}.   (3.53)

1
This Young inequality can be easily shown by using the concavity of the logarithm function.

Solution
The inequality is trivial if xi D yi D 0 for all 1  i  n.
Now, without loss of generality, assume that
3
X
n
.xi C yi /p ¤ 0:
iD1

We have

X
n X
n X
n
.xi C yi /p D xi .xi C yi /p1 C yi .xi C yi /p1 : (3.54)
iD1 iD1 iD1

Applying the Hölder inequality to the two terms in the right-hand side of (3.54), with
q D p=.p  1/, we get
!1=p !1=q
X
n X
n
p
X
n
xi .xi C yi / p1
 xi .xi C yi /p
iD1 iD1 iD1

and
!1=p !1=q
X
n X
n
p
X
n
yi .xi C yi / p1
 yi .xi C yi /p :
iD1 iD1 iD1

Using these two estimates and (3.54), we get


!1=p !1=q !1=p !1=q
X
n X
n
p
X
n X
n
p
X
n
.xi C yi /p  xi .xi C yi /p C yi .xi C yi /p ;
iD1 iD1 iD1 iD1 iD1

which is equivalent to (3.53) since 1=p C 1=q D 1.


We leave it to reader to show that the inequality (3.53) is reversed for 0 < p < 1, that is
!1=p !1=p !1=p
X
n X
n
p
X
n
p
.xi C yi /p  xi C yi : (3.55)
iD1 iD1 iD1

Exercise 3.9
Let u and v be two nonzero vectors in Rn. Prove that

‖ (1/‖u‖) u − ‖u‖ v ‖ = ‖ (1/‖v‖) v − ‖v‖ u ‖.   (3.56)


Solution
We have
 2    
 1  1 1
 u  kukv  D u  kukv  u  kukv
 kuk  kuk kuk
1
D .u  u/  2.u  v/ C kuk2 .v  v/
kuk2
D 1  2.u  v/ C kuk2 kvk2 :

On the other hand, we have


 2    
 1  1 1
 v  kvku D v  kvku  v  kvku
 kvk  kvk kvk
1
D .v  v/  2.u  v/ C kvk2 .u  u/
kvk2
D 1  2.u  v/ C kuk2 kvk2 :

Comparing the two identities above, we obtain


 2  2
 1   
 u  kukv  D  1 v  kvku :
 kuk   kvk 

Taking the square root of both sides in the above identity, then we obtain (3.56). J

Exercise 3.10
Let u and v be two nonzero vectors in Rn .
1. Show that if kuk ¤ kvk, then for any
> 0, we have

kuk
C1  kvk
C1
k.ukuk
 vkvk
/k  ku  vk: (3.57)
kuk  kvk

2. Show that if u ¤ 0 and v ¤ 0, then


 
 u v  kuk
C1  kvk
C1 ku  vk
 
 kuk
C2  kvk
C2   kuk  kvk kuk
C1 kvk
C1
: (3.58)

Solution
1. To show (3.57), it is enough to prove that (since the right-hand side of (3.57) is positive
for
> 1)

k.ukuk
 vkvk
/k2 .kuk  kvk/2  .kuk
C1  kvk
C1 /2 ku  vk2 :

By expanding both sides in the above inequality, and using the fact that
   
k.ukuk
 vkvk
/k2 D ukuk
 vkvk
 ukuk
 vkvk

3
D kuk2
C2 C kvk2
C2  2kuk
kvk
.u  v/

and

ku  vk2 D kuk2 C kvk2  2.u  v/;

we obtain
h ih i
kuk2
C2 C kvk2
C2  2kuk
kvk
.u  v/ kuk2 C kvk2  2kukkvk
h ih i
 kuk2
C2 C kvk2
C2  2kuk
C1 kvk
C1 kuk2 C kvk2  2.u  v/ :

After cancelling like terms and moving all terms to the right side, we get the equivalent
inequality

0  .2kukkvk  2.u  v//.kuk


C2  kvk
C2 /.kuk
 kvk
/: (3.59)

By the Cauchy–Schwarz inequality, we deduce that

2kukkvk  2.u  v/  0:

In addition, the two factors

.kuk
C2  kvk
C2 /.kuk
 kvk
/

have the same sign for


> 0. Therefore, (3.59) is satisfied and consequently, (3.57) holds
true.
2. To show (3.58), we have first (by squaring each side and expanding)
   !
 u v   ukvk
C2  vkuk
C2  
 
 kuk
C2  kvk
C2  D  kuk
C2 kvk
C2

 
 ukuk
 vkvk

D kuk
C1 kvk
C1 


1
D k.ukuk
 vkvk
/k :
kuk
C1 kvk
C1

Applying (3.57) to the last term on the right-hand side in the above identity, then (3.58)
holds. J
Exercise 3.11
Let u and v be two nonzero vectors in Rn. Show that

u · v ≤ (1/4)(‖u + v‖² − (‖u‖ − ‖v‖)²)   (3.60)

and

u · v ≥ (1/4)((‖u‖ − ‖v‖)² − ‖u − v‖²).   (3.61)


Solution
To show (3.60), we have, by the polarization identity (identity (3.29))

u · v = (1/4)‖u + v‖² − (1/4)‖u − v‖².   (3.62)
Now, using (3.38), we obtain

kuk  kvk  ku  vk;

which gives

ku  vk2  .kuk  kvk/2 : (3.63)

Now (3.60) follows by using (3.63) in (3.62).


To show (3.61), we have first

ku C vk2  .kuk  kvk/2 D kuk2 C kvk2 C 2.u  v/  .kuk2 C kvk2  2kukkvk/


D 2kukkvk C 2.u  v/:

Using the Cauchy–Schwarz inequality, we have

2kukkvk C 2.u  v/  0:

Therefore, we deduce that

ku C vk2  .kuk  kvk/2 :

Plugging this last inequality into (3.62), then, (3.61) is satisfied. J

Exercise 3.12 (Lagrange’s Identity)


Let u = (u1, u2, u3) and v = (v1, v2, v3) be two vectors in R3. We define the cross product
u  v of u and v in R3 to be the vector

u  v D .u2 v3  u3 v2 ; u3 v1  u1 v3 ; u1 v2  u2 v1 /:

Show that

‖u‖² ‖v‖² = (u · v)² + ‖u × v‖².   (3.64)
This identity relates norms, the dot product, and the cross product. 

Solution
To show (3.64), we compute all the terms in this identity. So, we have

ku  vk2 D .u2 v3  u3 v2 /2 C .u3 v1  u1 v3 /2 C .u1 v2  u2 v1 /2


D u22 v32 C u23 v22 C u23 v12 C u21 v32 C u21 v22 C u22 v12

2.u2 v3 u3 v2 C u3 v1 u1 v3 C u1 v2 u2 v1 /: (3.65)

On the other hand,

.u  v/2 D .u1 v1 C u2 v2 C u3 v3 /2

D u21 v12 C u22 v22 C u23 v32 C 2.u1 v1 u2 v2 / C 2.u1 v1 u3 v3 / C 2.u2 v2 u3 v3 /: (3.66)

Adding (3.65) to (3.66), we get

ku  vk2 C .u  v/2
D u22 v32 C u23 v22 C u23 v12 C u21 v32 C u21 v22 C u22 v12 C u21 v12 C u22 v22 C u23 v32

D .u21 C u22 C u23 /.v12 C v22 C v32 /

D kuk2 kvk2 :

J
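Lagrange's identity (3.64) is also easy to confirm numerically with NumPy's built-in cross product; the sketch below is an illustration of ours.

```python
# Numerical check of Lagrange's identity (3.64) for random vectors in R^3.
import numpy as np

rng = np.random.default_rng(3)
u, v = rng.normal(size=3), rng.normal(size=3)

lhs = np.dot(u, u) * np.dot(v, v)                        # ||u||^2 ||v||^2
w = np.cross(u, v)                                       # cross product u x v
rhs = np.dot(u, v)**2 + np.dot(w, w)                     # (u.v)^2 + ||u x v||^2
print(np.isclose(lhs, rhs))                              # True
```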

General Vector Spaces


4.1 Definition and Examples

In  Chap. 3 we studied the properties of the space Rn and extended the definition of its
algebraic structure to a general vector space (Definition 3.2.2). In this chapter, we study
some properties of general vector spaces. We first recall Definition 3.2.2.

Definition 4.1.1 (Vector Space)


Let E be a nonempty set and Ka be a commutative field. We define two operations.
An internal operation on E denoted by (+) and defined as:

E × E → E : (v, w) ↦ v + w,

and an external one denoted by (·) and defined as:

K × E → E : (λ, v) ↦ λ · v = λv.

So, E is a vector space over the field K if (E, +) is an Abelian group and for all x, y
in E and for all λ, μ in K the following properties hold:
1. 1 · x = x.ᵇ
2. λ(μx) = (λμ)x.
3. λ(x + y) = λx + λy.
4. (λ + μ)x = λx + μx.
a
A commutative field is a commutative unitary ring (Definition 1.2.1) such that e ¤ 0 and all nonzero
elements are invertible with respect to multiplication. For example, .R; C; / and .C; C; / are fields.
b
Here 1 is the identity element with respect to the multiplication in K.

Example 4.1
As we have seen in Example 3.2, Rn is a vector space over the field R. 

Example 4.2 (C Is a Vector Space over R)


We consider the set of complex numbers z = x + iy with i² = −1 and x, y real numbers. We
define the addition and multiplication by scalars as follows: if z = x + iy and w = a + ib are
two complex numbers and λ is a real number, then

z + w = (x + a) + i(y + b)   and   λz = (λx) + i(λy).

Then, it is easily checked that .C; C; / is a vector space over R. In fact this vector space has
the same algebraic property as R2 . See Example 5.8 for more details.


Example 4.3 (The Vector Space of m  n Matrices)


The set of matrices .Mmn .K/; C; / with the usual matrix operations of addition and
multiplication by scalars is a vector space over the field K D R or K D C. Indeed, we
have proved in Theorem 1.1.5 that .Mmn .K/; C/ is an Abelian group. Also, it is clear that
for any λ and μ in K and for any two matrices A and B in Mmn(K), we have
1. 1 · A = A.
2. λ(μA) = (λμ)A.
3. λ(A + B) = λA + λB.
4. (λ + μ)A = λA + μA.

Example 4.4 (The Vector Space of Real-Valued Functions)


Let X be an arbitrary nonempty set and let E D F .X; R/ be the set of all functions from X to
R. We define the operation of addition .C/ and multiplication by scalars ./, respectively, as

EE!EW .f ; g/ 7! f C g

where for all x in X,

.f C g/.x/ D f .x/ C g.x/

and

R × E → E : (λ, f) ↦ λ · f = λf,

where

(λf)(x) = λ f(x),
for all x in X. Then one can easily check that E D .F .X; R/; C; / is a vector space over the
field R. 
We have seen in Example 3.2 that Rn is a vector space over the field R. In fact, this
is true for any field K.

Example 4.5 (Kn Is a Vector Space over K)


Define the addition in Kn by

.a1 ; a2 ; : : : ; an / C .b1 ; b2 ; : : : ; bn / D .a1 C b1 ; a2 C b2 ; : : : ; an C bn /

and multiplication by a scalar by

λ(a1, a2, ..., an) = (λa1, λa2, ..., λan)

for ai, bi, 1 ≤ i ≤ n, and λ in K. Then, we can easily show that Kn is a vector space


over K. 

Notation We denote the vector space .E; C; / simply by E.

4.2 Properties of Vector Spaces

Let us list some important algebraic properties of vector spaces. These properties will
be used later on in the book. We start with the following theorem.

Theorem 4.2.1 (Properties of Vector Spaces)


Let E be a vector space over a field K. Let u be a vector in E and λ be a scalar in K.
Then, we have
1. 0K u = 0E;
2. λ 0E = 0E;
3. (−λ)u = −λu;
4. if λu = 0E, then λ = 0K or u = 0E,

where 0K is the zero of the field K (the identity element with respect to the addition in
K) and 0E is the zero of E (the identity element of the Abelian group .E; C/). If K D R,
we denote 0R simply by 0.

Proof
To prove the first property, we have by using axiom (4) in Definition 4.1.1 that

(λ − μ)u + μu = ((λ − μ) + μ)u = λu.

Since (E, +) is an Abelian group, it follows that

(λ − μ)u = λu − μu,   (4.1)

which holds for any λ and μ in K and for any u in E. Recall that if μ is in K, then −μ is also
in K, since (K, +) is an Abelian group with respect to addition in K and −μ is the inverse of
μ with respect to the addition in K. Since (4.1) holds for any λ and μ in K, setting λ = μ we
have

0K u = 0E.

Next, to prove the second property, we use axiom (3) in Definition 4.1.1 and the fact that
(E, +) is an Abelian group (−v is in E for any v in E), to get

λ(u − v) + λv = λ((u − v) + v) = λu

for any u and v in E and any λ in K. This gives, for the same reason as before,

λ(u − v) = λu − λv,   (4.2)

which for u = v yields

λ 0E = 0E.

For the third property, we have

(−λ)u = (0K − λ)u = 0K u − λu = 0E − λu = −λu.

Finally, for the fourth property, we deduce from the first and the second properties that

0K u = 0E   and   λ 0E = 0E,

so λu = 0E whenever λ = 0K or u = 0E. Conversely, the identity

λu = 0E

implies that either λ = 0K, or λ ≠ 0K and in this case λ is invertible (since (K, +, ·) is a
commutative field) and we have

λ⁻¹(λu) = (λ⁻¹λ)u = 1 · u = u = 0E,

where we have used axiom (1) in Definition 4.1.1 and the fact that λu = 0E. Thus, the fourth
property holds and the proof of Theorem 4.2.1 is complete. t
u
4.3 Subspaces

Verifying all the axioms in Definition 3.2.2 seems rather tedious, but as we will see in
many situations, we do not need to verify all of them since in many cases, the vector
spaces under consideration are contained in a larger vector space. For example we can
easily show that the set of 2  2 matrices with real entries is a vector space over R with
the usual matrix operations of addition and multiplication by scalars. But this space is
contained in the larger vector space .Mmn .R/; C; /. So, in this section, we will discuss
how to recognize such vector spaces. We start with the following definition.

Definition 4.3.1 (Subspace)


A nonempty subset F of a vector space E is called a subspace of E if F is itself a
vector space under the addition and multiplication by scalar defined on E.

Now, we should gain something from the fact that F is a subset of the larger space
E. Indeed, to show that F is itself a vector space under the addition and the scalar
multiplication defined on E, it is not necessary to verify all the axioms in Definition 4.1.1
since many of them are “inherited” from E. For example, there is no need to prove that
u C v D v C u for all elements of F since this property is already satisfied for all the
elements of E (recall that .E; C/ is an Abelian group) and hence for all elements of F,
since F is a subset of E. However, other properties are not inheritable. For example, for
two elements u and v in F, u C v is of course in E, but it may not lie in the subset F;
similarly, u is in E for any  in K, but it might not be in F.
Thus, we can easily prove the following theorem.

Theorem 4.3.1 (Characterization of Subspaces)


Let E be a vector space and let F be a nonempty subset of E. Then, F is a subspace of
E if and only if for all u and v in F and for all  in K, we have
1. u  v 2 F, and
2. u 2 F:

ⓘ Remark 4.3.2 The first property in Theorem 4.3.1 means that F is a subgroup of the
group .E; C/ (see Exercise 1.11). In addition, this property can be replaced by

u C v 2 F: (4.3)

Of course, this property alone does not imply that F is a subgroup of .E; C/.

The proof of Theorem 4.3.1 is very simple and we omit it. We simply need to verify
that the axioms of Definition 4.1.1 are satisfied if and only if the two properties of
Theorem 4.3.1 hold.
Now, we may combine the two properties in Theorem 4.3.1 and state the following
result.
4
Theorem 4.3.3
Let E be a vector space over a field K and F be a nonempty subset of E. Then, F is a
subspace of E if and only if for all u and v in F and all λ and μ in K, we have

λu + μv ∈ F.   (4.4)

Proof
First, assume that F is a subspace of E. Then, by the second property in Theorem 4.3.1, λu
and μv belong to F; therefore, (4.3) yields (4.4). Conversely, assume that (4.4) is satisfied;
then for λ = μ = 1 we get the property (4.3). In addition, for μ = 0K, we get the second
property in Theorem 4.3.1. This completes the proof of Theorem 4.3.3. t
u

ⓘ Remark 4.3.4 From now on, we consider Theorem 4.3.1 or Theorem 4.3.3 as the
definition of a subspace.

Example 4.6
Let E be a vector space over a field K and F be a subspace of E. Then F always contains 0E .
In addition, the set f0E g is itself a subspace of E which is the smallest subspace of E, called
the zero subspace of E. Also, E is a subspace of itself and it is the largest subspace of E. 

Example 4.7
Let E be a vector space over a field K and u be an element of E. Then, the set

F = {λu : λ ∈ K}

is a subspace of E. 

Solution
Let v and w be two elements of F. Then there exist λ and μ in K such that

v = λu   and   w = μu.

Then

v + w = λu + μu = (λ + μ)u,

which means that v + w ∈ F, since λ + μ ∈ K.


Next, for any α ∈ K, we have

αv = α(λu) = (αλ)u.

This shows that αv ∈ F, since αλ ∈ K. Consequently, F is a subspace of E. J

Example 4.8 (Orthogonal Subspace)


Consider the vector space Rn over the field R and let u be a vector of Rn . Denote by F the
set which consists of all the vectors of Rn that are orthogonal to u. Show that F is a subspace
of Rn . 

Solution
We may define F to be the set of all vectors whose dot product with u is zero (Defini-
tion 3.4.1): if v 2 F, then u  v D 0. It is clear that F is not empty, since u  0Rn D 0:
This implies that 0Rn ∈ F. Now, let v and w be two elements of F and α and β be two real
numbers. Then we have, by using the properties of the dot product,

u  .˛v C ˇw/ D ˛.u  v/ C ˇ.u  w/ D 0;

and so ˛v C ˇw 2 F. Hence, F is a subspace of Rn . J

Example 4.9 (Subspace of Linear Combinations)


Let E be a vector space over a field K and u1 ; u2 ; : : : ; un be n elements of E. Then the set
( )
X n
F D u D 1 u1 C 2 u2 C    C n un D i ui ;
iD1

where 1 ; 2 ; : : : ; n are elements of K is a subspace of E. F it is called the subspace of


linear combination of u1 ; u2 ; : : : ; un , or the subspace spanned by u1 ; u2 ; : : : ; un . 

Solution
First of all, it is clear that the set F is not empty since it does contain 0E because

0E D 0K u1 C 0K u2 C    C 0K un :

Second, let u and v be two elements of F and ˛ and ˇ be two elements of K. Then there exist
i ; 1  i  n, and i ; 1  i  n, in K such that

u D 1 u1 C 2 u2 C    C n un and v D 1 u1 C 2 u2 C    C n un :

Consequently,

˛u C ˇv D ˛.1 u1 C 2 u2 C    C n un / C ˇ.1 u1 C 2 u2 C    C n un /

D .˛1 /u1 C .˛2 /u2 C    C .˛n /un C .ˇ1 /u1 C .ˇ2 /u2 C    C .ˇn /un

D .˛1 C ˇ1 /u1 C .˛2 C ˇ2 /u2 C    C .˛n C ˇn /un ;



which is an element of F, since ˛i Cˇi ; i  1  n are elements of K. Thus, F is a subspace


of E. J

Example 4.10 (Subspaces of Even and Odd Functions)


A real-valued function defined on the real line is said to be an even function if f .x/ D f .x/
4 for all x in R and it is called an odd function if f .x/ D f .x/ for all x in R. Let F1 be the set
of all even functions and F2 the set of all odd functions. Show that F1 and F2 are subspaces
of the space F .R; R/ with the operations of addition and multiplication by scalars defined in
Example 4.4. 

Solution
We will show that F1 is a subspace of F .R; R/, leaving it to the reader to show that F2 is
also a subspace of F .R; R/.
First, it is clear that F1 is not empty since the zero function 0F .R;R/ , defined as

0F .R;R/ W R ! R W x 7! 0F .R;R/ .x/ D 0R ;

is an element of F1 since 0F .R;R/ .x/ D 0F .R;R/ .x/ D 0R for all x in R.


Now, consider two functions f and g in F1 and let ˛ and ˇ be two real numbers. Then,

f .x/ D f .x/; and g.x/ D g.x/

for all x in R, and so

.˛f C ˇg/.x/ D ˛f .x/ C ˇg.x/

D ˛f .x/ C ˇg.x/

D .˛f C ˇg/.x/

for all x in R, This means that ˛f C ˇg is an even function and thus it is an element of F1 .
Therefore, F1 is a subspace of F .R; R/. J

Example 4.11 (Subspaces of Diagonal and Triangular Matrices)


Consider the space of 2  2 square matrices M2 .R/ over the field R.
We define F1 to be the set of upper triangular matrices of the form
" #
ab
0d

and F2 to be the set of diagonal matrices of the form


" #
a0
;
0d
with a; b and d are real numbers. Show that F1 and F2 are subspaces of M2 .R/. 

Solution
First, it is clear that
" #
00
0M2 .R/ D
00

belongs to both F1 and F2 . Second, it is clear that the sum of two upper triangular matrices
is an upper triangular matrix and also the multiplication of an upper triangular matrix by a
scalar yields an upper triangular matrix. This means that F1 is a subspace of M2 .R/. We can
show similarly that F2 is a subspace of M2 .R/. J

Example 4.12 (Null Space)


In the vector space Rn , consider the set

N D N .A/ D f u is a column vector in Rn such that Au D 0Rm g; 1

where A is a matrix in Mmn .R/. Show that N is a subspace of Rn . This subspace is called
the null space of the matrix A. 

Solution
First of all N is not empty, since A0Rn D 0Rm , which means that 0Rn belongs to N . Now,
let u and v be two elements in N and ˛ and ˇ be two elements of R. Then, we have

Au D 0Rm and Av D 0Rm :

Now, using the properties of matrices, we have

A.˛u C ˇv/ D ˛.Au/ C ˇ.Av/


D ˛0Rm C ˇ0Rm D 0Rm :

Hence, N is a subspace of Rn . J
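In computations, a basis of the null space N(A) is often obtained from the singular value decomposition: the rows of Vᵀ associated with (numerically) zero singular values span N(A). The following NumPy sketch, with a sample matrix of ours, illustrates this.

```python
# Computing a basis of the null space N(A) numerically via the SVD.
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])        # rank 1, so N(A) has dimension 2

U, s, Vt = np.linalg.svd(A)
tol = 1e-10
rank = int(np.sum(s > tol))
null_basis = Vt[rank:]                  # rows of Vt for (near-)zero singular values

print(null_basis.shape)                 # (2, 3): two basis vectors of N(A)
print(np.allclose(A @ null_basis.T, 0)) # True: A u = 0 for each basis vector
```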

In the following theorem we will show that if we have a family of subspaces of a vector
space E, then we can always construct other subspaces from this family.

Theorem 4.3.5 (Intersection of Subspaces)


Let E be a vector space over a field K and let F1 and F2 be two subspaces of E. Then

F D F1 \ F2

(Continued )

1
We may also denote N .A/ D Ker.A/.

Theorem 4.3.5 (continued)


is also a subspace of E. More generally, if Fi is a family of subspaces of E indexed by
a nonempty set I, then
\
4 FD Fi ;
i2I

is a subspace of E.

Proof
To simplify things, we prove the above theorem for two vector spaces F1 and F2 . The same
prove can be adapted for the general case.
It is clear that

F D F1 \ F2

is not empty, since 0E belongs to both subspaces F1 and F2 due to the fact that both are
subspaces of E.
Next, consider two elements u and v in F and two elements ˛ and ˇ in K. Then, u and
v are also elements of F1 and of F2 . Then, we deduce that ˛u C ˇv is an element of F1 and
F2 , again because F1 and of F2 are subspaces of E, and therefore, ˛u C ˇv is an element of
F. This shows that F is a subspace of E. t
u

ⓘ Remark 4.3.6 The subspace F spanned by u1 ; u2 ; : : : ; un , defined in Example 4.9, is the


smallest subspace containing u1 ; u2 ; : : : ; un . In fact, F is the intersection of all subspaces
of E containing u1 ; u2 ; : : : ; un .

ⓘ Remark 4.3.7 If F1 and F2 are two subspaces of a vector space E, then the union

F D F1 [ F2

is not a vector space in general, unless F1 contains F2 or F2 contains F1 . See


Exercise 4.9 for more details.

As we have seen above, the union of two subspaces is not necessarily a subspace.
However, for any subspaces F1 and F2 of a vector space E, there is at least one subspace
containing F1 and F2 , called their sum and denoted F1 C F2 .

Theorem 4.3.8 (Sum of Two Subspaces)


Let E be a vector space over a field K and F1 and F2 be two subspaces of E. Then, the
set
˚
F D F1 C F2 D u 2 E such that u D u1 C u2 with u1 2 F1 and u2 2 F2

is a subspace of E. In fact, F is the smallest subspace containing F1 and F2 .

Proof
It is clear that 0E belongs to both F1 and F2 since both are subspaces of E. In addition, since
we can write

0E D 0E C 0E ;

0E belongs to F, and so F is not empty.


Next, let u and v be two elements of F and ˛ and ˇ be two elements of K. Thus,

u D u1 C u2 and v D v1 C v2 ;

with u1 ; v1 2 F1 and u2 ; v2 2 F2 . Then,

˛u C ˇv D ˛.u1 C u2 / C ˇ.v1 C v2 /
D .˛u1 C ˇv1 / C .˛u2 C ˇv2 /:

Now, since F1 and F2 are subspaces of E, ˛u1 C ˇv1 2 F1 and ˛u2 C ˇv2 2 F2 , whence
˛u C ˇv 2 F. Consequently, F is a subspace of E.
Now, if U is another subspace of E containing F1 and F2 , then U contains all the elements
u1 of F1 as well as all the elements u2 of F2 , hence, it contains all the elements u D u1 C u2 ,
since U is a subspace. Thus, U contains all the elements u of F. Hence, F1 CF2 is the smallest
subspace containing F1 and F2 . t
u

ⓘ Remark 4.3.9 Let E be a vector space over a field K and F be a subspace of E. Define
˚
CE .F/ D u 2 E such that u … F :

Then we have

E D F C CE .F/;

but CE .F/ is not a subspace of E since it does not contain 0E .



4.3.1 Direct Sum Decomposition

We have seen in Theorem 3.4.1, that if we fix a vector v in Rn , then each vector u in Rn
can be written in exactly one way in the form

4 u D w1 C w2 ; (4.5)

where w1 = λv and w2 is orthogonal to v. In other words, we consider Rn as a vector
space over the field R and introduce the sets F1 consisting of all vectors parallel to v,

F1 = {w1 = λv, with λ ∈ R}

and the set F2 of all vectors orthogonal to v,

F2 = {w2 such that v · w2 = 0}.

We have seen in Example 4.7 that F1 is a subspace of Rn and in Example 4.8 that F2 is
also a subspace of Rn .
From (4.5), we can say that

Rn D F1 C F2 ;

with the additional property, that each element of Rn can be written in a unique way as
the sum of an element of F1 and an element of F2 . In this case, we say that Rn is the
direct sum of F1 and F2 . More generally, we have

Definition 4.3.2 (Direct Sum)


Let F1 and F2 be two subspaces of a vector space E. Then E is said to be the direct
sum of F1 and F2 , and one writes

E D F1 ˚ F2 ;

if for each u in E there exist unique vectors v in F1 and w in F2 such that

u D v C w:

Theorem 4.3.10
Let F1 and F2 be two subspaces of a vector space E. Then E is the direct sum of F1
and F2 if and only if E D F1 C F2 and F1 \ F2 D f0E g.
Proof
First, assume that E D F1 ˚ F2 . Let u be an element in F1 \ F2 . Then we can write u as

u D u C 0E ;

with u 2 F1 and 0E 2 F2 , or

u D 0E C u;

with 0E 2 F1 and u 2 F2 . Since u can be written in exactly one way as the sum of an element
of F1 and an element of F2 , we deduce that u D 0E . Therefore, F1 \ F2 D f0E g.
Conversely, assume that E D F1 C F2 and F1 \ F2 D f0E g and let u be an element of E
such that

u D v1 C w1 D v2 C w2

with v1 ; v2 2 F1 and w1 ; w2 2 F2 . Then,

v1  v2 D w1  w2 :

Thus, we deduce that v1  v2 belongs to F2 and thus belongs to F1 \ F2 , and w1  w2 belongs


to F1 and therefore belongs to F1 \ F2 . Since F1 \ F2 D f0E g, we deduce that

v1  v2 D w1  w2 D 0E :

This gives v1 D v2 and w1 D w2 . Therefore, E D F1 ˚ F2 . t


u

Example 4.13
Consider the two subspaces F1 of even functions and F2 of odd functions defined in
Example 4.10. Show that

F .R; R/ D F1 ˚ F2 :

Solution
Let f be a function in F .R; R/. Then for all x in R, f can be written as

f .x/ C f .x/ f .x/  f .x/


f .x/ D C
2 2
D g.x/ C h.x/;

with

f .x/ C f .x/ f .x/  f .x/


g.x/ D and h.x/ D :
2 2

It is clear that g is an even function, since

f .x/ C f .x/
g.x/ D D g.x/
2

and h is an odd function since,


4
f .x/  f .x/
h.x/ D D h.x/:
2

Thus, we have shown that F .R; R/ D F1 C F2 : It remains to prove that F1 \ F2 D f0F .R;R/ g.
Let f be an element in F1 \ F2 ; then f is odd and even at the same time. That is, for all x in
R, we have

f .x/ D f .x/ and f .x/ D f .x/:

This gives f .x/ D 0 for all x in R. Hence f D 0F .R;R/ . Therefore, F1 \ F2 D f0F .R;R/ g and
consequently, Theorem 4.3.10 implies that F .R; R/ D F1 ˚ F2 . J
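The even/odd splitting used in this example is easy to express in code. The sketch below (the helper name even_odd_parts is ours) checks the decomposition numerically for a sample function, assuming NumPy.

```python
# The even/odd decomposition f = g + h with g even and h odd.
import numpy as np

def even_odd_parts(f):
    g = lambda x: 0.5 * (f(x) + f(-x))   # even part
    h = lambda x: 0.5 * (f(x) - f(-x))   # odd part
    return g, h

f = lambda x: np.exp(x) + x**3           # an arbitrary test function
g, h = even_odd_parts(f)

x = np.linspace(-2, 2, 101)
print(np.allclose(g(x), g(-x)))          # True: g is even
print(np.allclose(h(x), -h(-x)))         # True: h is odd
print(np.allclose(f(x), g(x) + h(x)))    # True: f = g + h
```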

4.4 Linear Independence

In this section, we investigate whether the elements in a given set of a vector space are
interrelated in the sense that one element can be written as a linear combination of the
others, as we have seen in Example 4.9. As we will see later, it is really important to
know if such a relation exists. Now, we start with the following definition.

Definition 4.4.1 (Linear Dependence and Independence)


Let E be a vector space over a field K and u1, u2, ..., un be elements of E. The
elements u1, u2, ..., un are linearly dependent (or the family of elements
u1, u2, ..., un is linearly dependent) if there exist λ1, λ2, ..., λn ∈ K, not all 0K,
such that

λ1u1 + λ2u2 + ··· + λnun = 0E.   (4.6)

Otherwise, they are linearly independent. In other words, u1, u2, ..., un are linearly
independent if

λ1u1 + λ2u2 + ··· + λnun = 0E

implies that λ1 = λ2 = ··· = λn = 0K.
Example 4.14
In the vector space R2, the two vectors e1 = (1, 0) and e2 = (0, 1) are linearly independent,
since the relation

λ1e1 + λ2e2 = 0R2

implies that λ1 = λ2 = 0. But the vectors u1 = (1, 1) and u2 = (−2, −2) are linearly dependent,
since

2u1 + u2 = 0R2.

Example 4.15
Show that in the vector space F .R; R/, the two functions est and ert are linearly independent
if and only if s ¤ r. 

Solution
First, assume that s ¤ r and let ˛ and ˇ be two real numbers such that

˛est C ˇert D 0:

Differentiating both sides in the above equation, we get

˛sest C ˇrert D 0:

Multiplying the first equation by s and subtracting the result from the second equation, we
obtain ˇ.r  s/et D 0. Since s ¤ r, we have ˇ D 0. Plugging this into the first equation, we
obtain ˛ D 0. Hence the two functions est and ert are linearly independent.
Conversely, assume that the functions est and ert are linearly independent and r D s.
Then, we have

est  ert D 0;

which contradicts the linear independence of est and ert . Thus, necessarily r ¤ s. J

We shall often use the following linear dependence criterion.

Theorem 4.4.1
Let E be a vector space over a field K and u1 ; u2 ; : : : ; un be elements of E. The elements
u1 ; u2 ; : : : ; un are linearly dependent if and only if (at least) one of the elements is a
linear combination of the rest.

Proof
First, assume for instance that u1 is a linear combination of the other elements. Then, there
exist ˛2 ; ˛3 ; : : : ; ˛n 2 K such that

u1 D ˛2 u2 C ˛3 u3 C    C ˛n un :
4
That is

u1  ˛2 u2  ˛3 u3      ˛n un D 0E :

Hence, (4.6) is satisfied with 1 D 1; i D ˛i ; i D 2; : : : ; n:


Conversely, assume that u1 ; u2 ; : : : ; un are linearly dependent. Then, there exists at least
one p, 1  p  n, such that p ¤ 0K and

1 u1 C 2 u2 C    C p up C    C n un D 0E :

This gives

1 2 n
up D  u1  u2      un ;
p p p

i.e., up is a linear combination of the remaining elements. t


u

Theorem 4.4.2
Let E be a vector space over a field K and u1 ; u2 ; : : : ; un be elements of E. If
u1 ; u1 ; : : : ; un are linearly independent, then ui ¤ 0E for all 1  i  n. In particular,
if u is an element of E, then the family fug is linearly independent if and only if
u ¤ 0E .

Proof
The proof is straightforward since if there is 1  p  n with up D 0E , then, we have for
some  ¤ 0K ,

0K u1 C 0K u2 C    C up C    C 0K un D 0E ;

which contradicts the fact that u1 ; u2 ; : : : ; un are linearly independent. t


u

Theorem 4.4.3
Let E be a vector space over a field K and u1 ; u2 ; : : : ; un be linearly independent
elements of E. Let 1 ; 2 ; : : : ; n and 1 ; 2 ; : : : ; n be elements of K such that

1 u1 C 2 u2 C    C n un D 1 u1 C 2 u2 C    C n un : (4.7)

Then i D i ; i D 1; : : : ; n.

Proof
We can write (4.7) as

(λ1 − μ1)u1 + (λ2 − μ2)u2 + ··· + (λn − μn)un = 0E.

Since u1, u2, ..., un are linearly independent, it follows that

λi = μi,   i = 1, ..., n.

t
u

4.5 Bases of Vector Spaces

In this section, our main goal is to show how a small finite set of vectors or elements of a
vectors space (called a basis) can be used to describe all the other vectors in that vector
space. We introduce this fundamental property of basis in linear algebra. To explain the
idea, consider the vector space R2 and take the two vectors e1 D .1; 0/ and e2 D .0; 1/
of R2 . Then any vector u D .x; y/ of R2 can be written as a linear combination of the
vectors e1 and e2 as

u D xe1 C ye2 :

In this case, we say that the set fe1 ; e2 g spans the vector space R2 . In addition, since the
vectors e1 and e2 are linearly independent (Example 4.14), the above linear combination
is unique (Theorem 4.4.3). That is to say, there is one and only one way to write u
as a combination of the two vectors e1 and e2 . So, in order to be able to write u in a
unique way as a linear combination of the two vectors e1 and e2 , two properties should
be satisfied:
▬ The set fe1 ; e2 g spans R2 , and
▬ the two vectors e1 and e2 are linearly independent.

Any set of two vectors in R2 satisfying the above two properties is called basis. Now,
we extend the above idea to general vector spaces.

Definition 4.5.1 (Span of a Vector Space)


Let E be a vector space over a field K. We say that the set of elements
{u1, u2, ..., un} of E spans E if their linear combinations fill E. That is, for each u in
E, there exist λ1, λ2, ..., λn in K such that

u = λ1u1 + λ2u2 + ··· + λnun.

We may also say that the elements u1, u2, ..., un generate the vector space E and
write

E = span(u1, u2, ..., un).

Example 4.16
It is trivial to see that in the vector space Rn , the set of vectors fe1 ; e2 ; : : : ; en g with

ei D .0; 0; : : : ; 0; 1; 0; : : : ; 0/ .0 everywhere except the ith component/

spans Rn . 

Example 4.17
Let Pn be the set of all polynomials with real coefficients and of degree less than or equal
to n. One can easily show that this set is a vector space over the field R. Now, the family
f1; x; x2 ; : : : ; xn g spans this vector space, since any polynomial p.x/ in Pn can be written as

p.x/ D a0 C a1 x C    C an xn ;

where a0 ; a1 ; : : : ; an are real numbers. 

Definition 4.5.2 (Basis)


Let E be a vector space over a field K. A basis B of the vector space E is a set of
elements of E, B = {u1, u2, ..., un}, which is linearly independent and spans E. In
this case each element u of E can be written in exactly one way as

u = λ1u1 + λ2u2 + ··· + λnun

for some λ1, λ2, ..., λn in K. The elements λ1, λ2, ..., λn are called the
components of u in the basis B.
Example 4.18
The set of vectors fe1 ; e2 ; : : : ; en g defined in Example 4.16 is a basis of the vector space Rn
since it spans Rn and is linearly independent. 

Example 4.19
Show that the family f1; x; x2 ; : : : ; xn g is a basis of the vector space Pn defined in
Example 4.17. 

Solution
We have seen in Example 4.17 that the family f1; x; x2 ; : : : ; xn g spans Pn . It remains to show
that this family is linearly independent. So, let 0 ; 1 ; : : : ; n be elements in R such that for
all x in R, we have

0 C 1 x C    C n xn D 0: (4.8)

Since (4.8) holds for all x in R, it also holds for x D 0. Thus, putting x D 0 in (4.8), we get
0 D 0. Now, by taking the derivative of (4.8) with respect to x, we obtain

1 C 22 x C    C nn xn1 D 0: (4.9)

Similarly, taking x D 0 in (4.9), we get 1 D 0. By repeating this process up to the nth


derivative of (4.8), we obtain

0 D 1 D    D n D 0:

Thus, the family f1; x; x2 ; : : : ; xn g is linearly independent and hence forms a basis in Pn . J

Example 4.20
We consider the space M2 .R/ of 2  2 square matrices over the field R.
We define in this space the elements
" # " # " # " #
10 01 00 00
M1 D ; M2 D ; M3 D ; M4 D :
00 00 10 01

Show that the set {M1, M2, M3, M4} is a basis of M2(R). 

Solution
First, we need to show that the family fM1 ; M2 ; M2 ; M4 g spans M2 .R/. So, consider a matrix
" #
ab
MD
cd

in M2 .R/. Then, it is clear that

M D aM1 C bM2 C cM3 C dM4 :

Thus, fM1 ; M2 ; M2 ; M4 g spans M2 .R/.



Second, we need to show that fM1 ; M2 ; M2 ; M4 g is linearly independent. So, let


1 ; : : : ; 4 2 R be such that

1 M1 C 2 M2 C 3 M3 C 4 M4 D 0M2 .R/ :

That is
4 " # " # " # " # " #
10 01 00 00 00
1 C 2 C 3 C 4 D ;
00 00 10 01 00

or equivalently,
" # " #
1 2 00
D :
3 4 00

This yields 1 D 2 D 3 D 4 D 0: Thus, the family fM1 ; M2 ; M2 ; M4 g is linearly


independent and consequently form a basis in M2 .R/. J
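Linear independence of M1, M2, M3, M4 can also be verified numerically: flatten each matrix into a vector of R4 and check that the resulting 4 × 4 matrix has rank 4. The NumPy sketch below is an illustration of ours.

```python
# Checking linear independence of the four matrices by flattening them to vectors.
import numpy as np

M1 = np.array([[1, 0], [0, 0]])
M2 = np.array([[0, 1], [0, 0]])
M3 = np.array([[0, 0], [1, 0]])
M4 = np.array([[0, 0], [0, 1]])

S = np.vstack([M.flatten() for M in (M1, M2, M3, M4)])
print(np.linalg.matrix_rank(S))   # 4, so M1, ..., M4 are linearly independent
```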

ⓘ Remark 4.5.1 (Non-uniqueness of Bases) A basis in a vector space is not unique,


meaning that we may find more than one basis in the same vector space. As an example,
we can consider the vector space R2 and the two vectors u1 D .1; 2/ and u2 D .1; 3/.
Then, we easily show that fu1 ; u2 g is a basis in R2 . But we have also shown in
Example 4.16 that the set fe1 ; e2 g D f.1; 0/; .0; 1/g is a basis in R2 .

4.6 Dimension of a Vector Space

In this section, we will associate a number to a vector space. Recall that the cardinality
of a set B is the number of elements of B.

Definition 4.6.1 (Dimension of a Vector Space)


Let E be a vector space over a field K. The dimension of E, denoted by dimK E is
defined to be the cardinality of a basis B for E. In addition, the zero vector space is
defined to have dimension zero.

Example 4.21
Let us list the dimensions of some spaces that we have encountered before.
▬ The dimension of Rn is n since fe1 ; e2 ; : : : ; en g is a basis of Rn (Example 4.18).
▬ The dimension of M2 .R/ is 4 since fM1 ; M2 ; M3 ; M4 g is a basis of M2 .R/ (Exam-
ple 4.20). In general, we can easily prove that dimK Mmn .K/ D m  n:
▬ We have proved in Example 4.19 that the set f1; x; x2 ; : : : ; xn g is a basis of the space Pn .
Consequently, dimR Pn D n C 1:


Now, we state one of the main theorems in linear algebra.

Theorem 4.6.1 (Existence of a Basis)


Every non-zero vector space E of finite dimension has at least one basis.

The proof of this theorem is based on Zorn’s lemma and is beyond the scope of this
book.

Definition 4.6.2 (Maximal Linearly Independent Set)


Let E be a vector space over a field K and let S be a set of elements of E. Then the
set S is called a maximal linearly independent set if the set S [ fvg is linearly
dependent for every element v in the vector space E, that does not belong to S.

Example 4.22
In R2, the set S = {e1, e2} = {(1, 0), (0, 1)} is a maximal linearly independent subset, since
for any vector u = (x, y) in R2, the set S ∪ {u} = {e1, e2, u} is linearly dependent because
u = xe1 + ye2.

Theorem 4.6.2
Let E be a vector space over a field K and let S D fu1 ; u1 ; : : : ; un g be a maximal set of
linearly independent elements of E. Then, S is a basis of E.

Proof
We just need to show that S spans E. That is, each element of E can be expressed as a linear
combination of the elements of S. Let w be an element in E. Since S is a maximal linearly
independent set, the elements w; u1 ; u2 ; : : : ; un are linearly dependent. Hence there exists
0 ; 1 ; : : : ; n in K, not all 0K , such that

0 w C 1 u1 C    C n un D 0E : (4.10)

It is clear that 0 ¤ 0K : otherwise the elements u1 ; u2 ; : : : ; un would be linearly dependent.


Thus, we obtain from (4.10)

1 2 n
wD u1  u2 C    C un :
0 0 0

This shows that w is a linear combination of the elements of S. Thus, S is a basis of E. t


u

Theorem 4.6.3
Different bases of the same vector space E have the same number of elements (the
same cardinality).

4 In order to prove this theorem, we first establish the following lemma.

ⓘ Lemma 4.6.4 Let E be a vector space over a field K and B D fu1 ; u2 ; : : : ; un g be a


basis of E. Let v1 ; v2 ; : : : ; vm be elements in E. If m > n, then v1 ; v2 ; : : : ; vm are linearly
dependent. That is, any linearly independent set S of n elements is maximal.

Proof
To prove this lemma, we assume that v1 ; v2 ; : : : ; vm are linearly independent. Since B D
fu1 ; u1 ; : : : ; un g is a basis of E, each element from the set fv1 ; v2 ; : : : ; vm g can be written in a
unique way as a linear combination of the elements of B. For instance, we have

v1 D 1 u1 C 2 u2 C    C n un

for some 1 ; 2 ; : : : ; n in K. By assumption, we know that v1 ¤ 0E , therefore, there exists at


least 1  p  n, such that p ¤ 0K (without loss of generality, we may assume that p D 1).
Hence, we get

1 2 n
u1 D v1  u2      un :
1 1 1

This implies that the set

G1 D fv1 ; u2 ; : : : ; un g

spans E. Indeed, for w in E, we have, since B is a basis,

w D a1 u1 C a2 u2 C    C an un
 
1 2 n
D a1 v1  u2      un C a2 u2 C    C an un
1 1 1
   
a1 a1 2 a1 n
D v1 C a2  u2 C    C an  un
1 1 1

for some a1 ; a2 ; : : : ; an in K. Next, since G1 spans E, we have, for some 1 ; 2 ; : : : ; n in K

v2 D 1 v1 C 2 u2 C    C n un :

Now, it is clear that there exists at least 2  j  n, such that j ¤ 0K : otherwise, if 2 D


3 D    D n D 0K , then we have v2 D 1 v1 , which contradicts the assumed linear
independence of the family fv1 ; v2 ; : : : ; vm g . Thus, as before, without loss of generality, we
may assume that 2 ¤ 0K . Then, we have as before,

1 1 3 n
u2 D v2  v1  u3      un :
2 2 2 2

Hence, the set

G2 D fv1 ; v2 ; u3 ; : : : ; un g

also spans E. The idea now is to continue our procedure and to replace u1 ; u2 ; : : : by
v1 ; v2 ; : : : to conclude at the end (by induction) that

Gn D fv1 ; v2 ; : : : ; vn g

spans E, and since m > n, the elements vnC1 ; vnC2 ; : : : ; vm are linear combinations
of v1 ; v2 ; : : : ; vn , which contradicts our assumption on the linear independence of
v1 ; v2 ; : : : ; vm . This concludes our proof. t
u

Proof of Theorem 4.6.3


Let B1 D fu1 ; u2 ; : : : ; un g and B2 D fv1 ; v2 ; : : : ; vm g be two bases of the same vector space
E. Lemma 4.6.4 implies that it is not possible to have m > n or n > m. Then necessarily
m D n. t
u

Theorem 4.6.5
Let E be a vector space over a field K with dimK E D n and let u1 ; u2 ; : : : ; un be
linearly independent elements of E. Then the set fu1 ; u2 ; : : : ; un g constitutes a basis
of E.

Proof
According to Lemma 4.6.4, the set fu1 ; u2 ; : : : ; un g is a maximal set of linearly independent
elements of E, thus, Theorem 4.6.2, implies that this set is a basis of E. t
u

ⓘ Remark 4.6.6 Let E be a vector space over a field K with dimK E D n. Then we deduce
from above that:
▬ Any set of linearly independent elements of E has at most n elements.
▬ Any set that has at least n C 1 elements is linearly dependent.

4.6.1 Dimension of a Subspace

As we have said before, any subspace F of a vector space E is itself a vector space. So,
we need to find the dimension of this subspace and compare it with the dimension of E.
Thus, we have the following theorem.
4
Theorem 4.6.7 (Dimension of a Subspace)
Let E be a vector space over a field K with dimK E D n .n > 0, that is E ¤ f0E g). Let
F be a subspace of E with F ¤ f0E g. Then,

dimK F  dimK E:

In particular, if

dimK F D dimK E; then F D E:

Proof
Suppose that dimK F > dimK E. Then there exists at least one basis of F with at least n + 1 elements (such a basis has at least one element u0 ≠ 0E, since F ≠ {0E}); that is, there exists at least one linearly independent set in F with at least n + 1
elements. But each linearly independent set of F is also linearly independent in E. Hence,
we obtain a linearly independent set in E with at least n C 1 elements, which contradicts
Remark 4.6.6. Thus, dimK F  dimK E.
Now, if dimK F D dimK E, then there exists a basis B D fu1 ; u2 ; : : : ; un g in F. Then,
B is also a basis of E (Theorem 4.6.5). Therefore, any v 2 E can be written as a linear
combination of elements of F of the form

v D 1 u1 C 2 u2 C    C n un

with 1 ; 2 ; : : : ; n 2 K. Consequently, v also belongs to F. This yields E F and since


F E (by definition), we have F D E. t
u

4.6.2 Construction of a Basis

Suppose that we are given r linearly independent vectors u1 ; u2 ; : : : ; ur of a vector space


E of dimension n, with r < n. Then, we may ask the following question: is it possible to
extend the set of vectors fu1 ; u2 ; : : : ; ur g so as to obtain a basis of E? The answer turns

out to be positive:

Theorem 4.6.8 (Construction of a Basis)


Let E be a vector space over a field K with dimK E D n: Let r be a positive integer
with r < n and let v1 ; v2 ; : : : ; vr be linearly independent elements of E. Then, there
exist elements vrC1 ; : : : ; vn of E such that the set

fv1 ; v2 ; : : : ; vn g

is a basis of E.

Proof
Since dimK E D n and r < n, the set S D fv1 ; v2 ; : : : ; vr g cannot be a basis of E, and then by
Theorem 4.6.2, S cannot be a maximal set of linearly independent elements of E. Hence, by
Definition 4.6.2, there exists vrC1 2 E such that the set S [ fvrC1 g is linearly independent.
Now, if r + 1 = n, then according to Theorem 4.6.5, the set S ∪ {vr+1} is a basis of E. If r + 1 < n, then we
repeat the same procedure until we construct (by induction) a set of n linearly independent
elements {v1, v2, . . . , vn} of E, and this set is then a basis of E for the same reason as
before (Theorem 4.6.5). t
u

Example 4.23
We know that dimR R3 D 3. Consider the two vectors u D .1; 0; 1/ and v D .1; 3; 2/. It is
clear that u and v are linearly independent. If we now take the vector w D .1; 2; 4/, then
we may easily show that u; v, and w are linearly independent and thus form a basis of R3 . 
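As a quick numerical illustration of this example, one can check the claim in Python/NumPy: three vectors of R3 form a basis exactly when the matrix having them as columns has nonzero determinant (the entries below are typed as printed here, so the signs should be compared with the text).

import numpy as np

# vectors of Example 4.23
u = np.array([1.0, 0.0, 1.0])
v = np.array([1.0, 3.0, 2.0])
w = np.array([1.0, 2.0, 4.0])

A = np.column_stack([u, v, w])   # matrix whose columns are u, v, w
print(np.linalg.det(A))          # nonzero, so u, v, w are linearly independent and form a basis of R^3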

4.7 Exercises

Exercise 4.1
Consider the vector space F .R; R/ introduced in Example 4.4, defined over the field R. Show
that the three functions in this vector space

f .x/ D sin x; g.x/ D sin.x C p/; and h.x/ D cos.x C q/;

where p and q are two real numbers, are linearly dependent. 

Solution
Using the sine and cosine laws, we have

sin.x C p/ D sin x cos p C cos x sin p (4.11)



and

cos.x C q/ D cos x cos q  sin x sin q: (4.12)

Multiplying (4.11) by cos q and (4.12) by (− sin p) and adding the results, we obtain

cos q sin(x + p) − sin p cos(x + q) = cos q cos p sin x + sin q sin p sin x
                                    = (cos q cos p + sin q sin p) sin x
                                    = cos(p − q) sin x.

Denoting α = cos q, β = − sin p, and γ = − cos(p − q), we get

α sin(x + p) + β cos(x + q) + γ sin x = 0,

but ˛; ˇ and  are not all zero, for any real numbers p and q. Consequently, f ; g, and h are
linearly dependent. J
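The relation can also be verified numerically. In the following NumPy sketch, p and q are arbitrarily chosen sample values, and the combination found above is evaluated at many points; it vanishes up to rounding error.

import numpy as np

p, q = 0.7, -1.3                      # arbitrary sample values for p and q
x = np.linspace(-5.0, 5.0, 200)       # sample points

f = np.sin(x)
g = np.sin(x + p)
h = np.cos(x + q)

# alpha*g + beta*h + gamma*f with alpha = cos q, beta = -sin p, gamma = -cos(p - q)
combo = np.cos(q) * g - np.sin(p) * h - np.cos(p - q) * f
print(np.max(np.abs(combo)))          # of the order of machine precision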

Exercise 4.2 (Components of a Vector in a Basis)


Consider the space R4 over the field R. Let

u1 D .1; 2; 1; 2/; u2 D .2; 3; 0; 1/; u3 D .1; 3; 1; 0/; u4 D .1; 2; 1; 4/

be vectors in R4 .
1. Show that the set B D fu1 ; u2 ; u3 ; u4 g is a basis in R4 .
2. Find the components of the vector

v D .7; 14; 1; 2/

in this basis.

Solution
1. We have seen in Example 4.21 that dimR R4 D 4. Now, since B has also four elements,
then according to Theorem 4.6.5, in order to show that B is a basis, it is enough to prove that
B is a linearly independent set. So, let 1 ; 2 ; 3 and 4 be elements in R satisfying

1 u1 C 2 u2 C 3 u3 C 4 u4 D 0R4 ; (4.13)

that is

1 .1; 2; 1; 2/ C 2 .2; 3; 0; 1/ C 3 .1; 3; 1; 0/ C 4 .1; 2; 1; 4/ D .0; 0; 0; 0/:
This gives (see  Chap. 3) the system of equations
λ1 + 2λ2 + λ3 + λ4 = 0,
2λ1 + 3λ2 + 3λ3 + 2λ4 = 0,
λ1 − λ3 + λ4 = 0,
2λ1 − λ2 + 4λ4 = 0.

Now, it is clear that the above system has .1 ; 2 ; 3 ; 4 / D .0; 0; 0; 0/ as a solution. This
solution is unique, since the matrix
2 3
1 2 1 1
6 2 3 3 27
6 7
AD6 7
4 1 0 1 15
2 1 0 4

is invertible (Theorem 1.2.9), because (Theorem 2.4.8)

det.A/ D 2 ¤ 0:

Thus, (4.13) implies 1 D 2 D 3 D 4 D 0. Hence, the vectors u1 ; u2 ; u3 and u4 are


linearly independent in R4 and thus B is a basis of R4 .
2. Since B is a basis of R4 , there exists ˛1 ; ˛2 ; ˛3 , and ˛4 in R such that

v D ˛1 u1 C ˛2 u2 C ˛3 u3 C ˛4 u4 : (4.14)

Hence, ˛1 ; ˛2 ; ˛3 and ˛4 are the components of v in the basis B. To find these components,
we proceed as before and obtain the system of equations
α1 + 2α2 + α3 + α4 = 7,
2α1 + 3α2 + 3α3 + 2α4 = 14,
α1 − α3 + α4 = 1,
2α1 − α2 + 4α4 = 2.

Its solution is given by (see  Sect. 1.2.3)


2 3 2 3
˛1 7
6 7 6 7
6 ˛2 7 6 14 7
6 7 D A1 6 7: (4.15)
4 ˛3 5 4 1 5
˛4 2

We can easily use the method described in Theorem 2.5.1 to find


2 3
17=2 5 13=2 2
6 7
6 3 2 3 1 7
A1 D6 7;
4 5 3 3 1 5
4 7=2 2 5=2 1

so (4.15) becomes
2 3 2 32 3 2 3
˛1 17=2 5 13=2 2 7 0
6 7 6 76 7 6 7
6 ˛2 7 6 3 2 3 1 7 6 14 7 6 2 7
6 7D6 76 7 D 6 7:
4 ˛3 5 4 5 3 3 1 5 4 1 5 4 2 5
˛4 7=2 2 5=2 1 2 1

Consequently, the components of the vector v in the basis B are .0; 2; 2; 1/. J
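The same computation can be carried out numerically: if the basis vectors are stored as the columns of a matrix A, the components of v in the basis B are obtained by solving the linear system Ax = v. A NumPy sketch (the entries below follow the printed text, so the signs should be checked against the exercise data; solving the system is preferable to forming A^{-1} explicitly):

import numpy as np

# columns of A are the basis vectors u1, u2, u3, u4
A = np.array([[1.0, 2.0, 1.0, 1.0],
              [2.0, 3.0, 3.0, 2.0],
              [1.0, 0.0, 1.0, 1.0],
              [2.0, 1.0, 0.0, 4.0]])
v = np.array([7.0, 14.0, 1.0, 2.0])

print(np.linalg.det(A))        # nonzero, so the columns form a basis of R^4
x = np.linalg.solve(A, v)      # components of v in the basis B
print(x)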

Exercise 4.3 (Dimension of a Direct Sum)


Let E be a finite-dimensional vector space over a field K and let F1 and F2 be two subspaces
of E.
1. Show that

dimK .F1 C F2 / D dimK F1 C dimK F2  dimK .F1 \ F2 /: (4.16)

2. Deduce that if E D F1 ˚ F2 , then

dimK E D dimK F1 C dimK F2 : (4.17)

Solution
1. Denote

dimK F1 D n; dimK F2 D m; and dimK .F1 \ F2 / D r:

Of course, here r  min.n; m/ since F1 \ F2 F1 and F1 \ F2 F2 (Theorem 4.6.7).


To prove (4.16), we will construct a basis of the subspace F1 C F2 and find the dimension
of F1 C F2 by proving that the number of elements (cardinality) of this basis is exactly
nCmr. Since F1 \F2 is a subspace of E (Theorem 4.3.5), then according to Theorem 4.6.1,
this subspace has a basis. So, let S D fw1 ; w2 ; : : : ; wr g be a basis of F1 \ F2 . Thus, S is a
linearly independent set in F1 \ F2 and therefore, it is a linearly independent set in both
F1 and F2 . Theorem 4.6.8 implies that there exist elements urC1 ; : : : ; un of F1 and elements
vrC1 ; : : : ; vm of F2 , such that

B1 D fw1 ; w2 ; : : : ; wr ; urC1 ; : : : ; un g
is a basis of F1 and

B2 D fw1 ; w2 ; : : : ; wr ; vrC1 ; : : : ; vm g

is a basis of F2 .
Now,

B D B1 [ B2 D fw1 ; w2 ; : : : ; wr ; urC1 ; : : : ; un ; vrC1 ; : : : ; vm g

contains n C m  r elements. Consequently, it is enough to show that B is a basis of F1 C F2 .


That is, B spans F1 C F2 and the elements of B are linearly independent elements in F1 C F2 .
First, it is clear that if u is an element of B, then u is either in F1 or in F2 . If u is in F1 , then
we may write it as

u D u C 0E ;

which is an element of F1 C F2 since 0E belongs to F2 (F2 is a subspace). Similarly, if u


belongs to F2 , we write

u D 0E C u

which is an element of F1 C F2 . Thus, it is clear that B is a set of F1 C F2 .


Now, let z be an element in F1 C F2 . Then there exist z1 in F1 and z2 in F2 , such that

z D z1 C z2 :

Since B1 is a basis of F1 and B2 is a basis of F2 , we have

z1 D 1 w1 C 2 w2 C    C r wr C rC1 urC1 C    C n un ;

and

z2 D 1 w1 C 2 w2 C    C r wr C rC1 vrC1 C    C m vm ;

where i ; i D 1; : : : ; n, and j ; j D 1; : : : ; m, are elements of K. Hence, we may write z as

z D z1 C z2 D .1 C 1 /w1 C .2 C 2 /w2 C    C r wr

CrC1 urC1 C    C n un C rC1 vrC1 C    C m vm :

So, z is a linear combination of the elements of B, which shows that B spans F1 C F2 .


Next, we want to show that the elements of B are linearly independent. Let αi, i = 1, . . . , r, λj, j = r + 1, . . . , n, and μk, k = r + 1, . . . , m, be elements of K satisfying

α1 w1 + α2 w2 + · · · + αr wr + λr+1 ur+1 + · · · + λn un + μr+1 vr+1 + · · · + μm vm = 0E.   (4.18)

This implies that

μr+1 vr+1 + · · · + μm vm = −α1 w1 − · · · − αr wr − λr+1 ur+1 − · · · − λn un,

which shows that μr+1 vr+1 + · · · + μm vm belongs to F1 and hence to F1 ∩ F2. Thus, since S = {w1, w2, . . . , wr} is a basis of F1 ∩ F2, there exist β1, β2, . . . , βr in K such that

μr+1 vr+1 + · · · + μm vm = β1 w1 + β2 w2 + · · · + βr wr,

or equivalently,

μr+1 vr+1 + · · · + μm vm − β1 w1 − β2 w2 − · · · − βr wr = 0E.

Since B2 is a linearly independent set, this last relation yields

β1 = β2 = · · · = βr = μr+1 = · · · = μm = 0K.   (4.19)

By the same method, we may write, using (4.18),

λr+1 ur+1 + · · · + λn un = −α1 w1 − · · · − αr wr − μr+1 vr+1 − · · · − μm vm.

This shows that λr+1 ur+1 + · · · + λn un is an element of F1 ∩ F2, and as before, using the fact that B1 is a linearly independent set, we can show that

λr+1 = · · · = λn = 0K.   (4.20)

Next, thanks to (4.19) and (4.20), relation (4.18) becomes

α1 w1 + α2 w2 + · · · + αr wr = 0E,

which gives α1 = α2 = · · · = αr = 0K.
Consequently, the set B is linearly independent and therefore, is a basis of F1 C F2 .
Hence,

dimK .F1 C F2 / D n C m  r D dimK F1 C dimK F2  dimK .F1 \ F2 /:

2. If E D F1 ˚F2 , then F1 \F2 D f0E g (Theorem 4.3.10) and then dimK .F1 \F2 / D
0. Thus, by (4.16), the identity (4.17) holds.
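Formula (4.16) can also be observed numerically for subspaces of Rn described by spanning vectors: the dimension of a span is the rank of the matrix whose columns are the spanning vectors, F1 + F2 is spanned by the two families put together, and dim(F1 ∩ F2) can be computed independently from the null space of [B1 | −B2]. A NumPy sketch with arbitrarily chosen subspaces of R5:

import numpy as np
from numpy.linalg import matrix_rank

rng = np.random.default_rng(0)
n = 5
B1 = rng.integers(-2, 3, size=(n, 3)).astype(float)   # columns span F1
B2 = rng.integers(-2, 3, size=(n, 3)).astype(float)   # columns span F2

dim_F1 = matrix_rank(B1)
dim_F2 = matrix_rank(B2)
dim_sum = matrix_rank(np.hstack([B1, B2]))             # dim(F1 + F2)

# dim(F1 ∩ F2) computed independently: pairs (x, y) with B1 x = B2 y give the intersection
M = np.hstack([B1, -B2])
_, _, Vt = np.linalg.svd(M)
null_basis = Vt[matrix_rank(M):].T                     # basis of the null space of M
inter = B1 @ null_basis[:B1.shape[1], :]               # corresponding vectors of F1 ∩ F2
dim_inter = matrix_rank(inter) if inter.size else 0

print(dim_sum, dim_F1 + dim_F2 - dim_inter)            # the two numbers agree, as in (4.16)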

Exercise 4.4 (Direct Sum of Symmetric and Skew-Symmetric Matrices)


We consider the space of square real matrices Mn .R/. We define the set of symmetric
matrices S to be the set of all matrices in Mn .R/ satisfying AT D A and the set of skew-
symmetric matrices W to be the set of all matrices in Mn .R/ satisfying AT D A.
1. Prove that S and W are subspaces of Mn .R/.
2. Prove that

Mn .R/ D S ˚ W: (4.21)

Solution
1. It is clear that the zero matrix 0 D 0Mn .R/ satisfies

0T D 0 D 0:

So, 0Mn .R/ belongs to both S and W.


Next, we can easily deduce from Theorem 1.4.1 that the sum of two symmetric matrices
is symmetric and the multiplication of a symmetric matrix by a scalar gives a symmetric
matrix. Hence, S is a subspace of Mn(R). Using the same Theorem 1.4.1, we have, for A and B two elements of W and λ in R,

(A + B)^T = A^T + B^T = −A − B = −(A + B)

and

(λA)^T = λ A^T = λ(−A) = −(λA).

Hence, W is a subspace of Mn .R/.


2. Now, let A be a matrix in Mn .R/. Then we can write A as

1 1
AD .A C AT / C .A  AT /:
2 2

We have shown in Exercise 2.7 that A C AT is symmetric and A  AT is skew-symmetric.


Thus, we have proved that

Mn .R/ D S C W:

It remains to show that (see Theorem 4.3.10) S \ W D f0Mn .R/ g. So, let A be an element of
the intersection S \ W. Thus, A satisfies

A^T = A and A^T = −A.

This means 2A = 0Mn(R), and so A = 0Mn(R). Hence, S ∩ W = {0Mn(R)}, and therefore
(4.21) holds. J
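The decomposition used in part 2 is easy to check numerically; the following NumPy sketch splits an arbitrarily chosen 4 × 4 matrix into its symmetric and skew-symmetric parts:

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))      # an arbitrary matrix of M_4(R)

S = (A + A.T) / 2                    # symmetric part:      S^T = S
W = (A - A.T) / 2                    # skew-symmetric part: W^T = -W

print(np.allclose(S, S.T))           # True
print(np.allclose(W, -W.T))          # True
print(np.allclose(A, S + W))         # True: A = S + W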

Exercise 4.5 (Linearly Independent Column Vectors of a Square Matrix)


1. Consider the vector space R2 over the field R. Let u1 D .a; b/ and u2 D .c; d/ be two
vectors in R2 . Show that u1 and u2 are linearly independent if and only if

ad  bc ¤ 0:

2. Now, consider the vector space Rn .n  2/ over R and let u1 ; u2 ; : : : ; un be n vectors in


Rn . Show that u1 ; u2 ; : : : ; un are linearly independent (form a basis of Rn ) if and only if

det.A/ ¤ 0;

where A is the matrix in Mn .R/ defined as

A D Œu1 ; u2 ; : : : ; un  (u1 ; u2 ; : : : ; un are the column vectors of A):

3. Show that the vectors v1 D .1; 2; 3/; v2 D .1; 1; 0/, and v3 D .3; 4; 3/ are linearly
dependent in R3 .


Solution
1. Assume that u1 and u2 are linearly independent. Then the equation

1 u1 C 2 u2 D 0R2 ; (4.22)

with λ1 and λ2 in R, has the trivial solution λ1 = λ2 = 0 as its unique solution. Equation
(4.22) is equivalent to the system
(
a1 C c2 D 0;
b1 C d2 D 0:

This system has the trivial solution 1 D 2 D 0 as the unique solution if and only if (see
Theorem 1.2.9) the matrix
" #
ac
AD
bd
is invertible. That is, if and only if

det.A/ D ad  bc ¤ 0:

2. By the same method, let ui D .ui1 ; ui2 ; : : : ; uin /; 1  i  n, be vectors in Rn . These


vectors are linearly independent if and only if the equation

1 u1 C 2 u2 C    C n un D 0Rn

has the unique trivial solution 1 D 2 D    D n D 0. Expanding this equation, we obtain


8
ˆ
ˆ u11 1 C u21 2 C    C un1 n D 0;
ˆ
ˆ
< u12 1 C u22 2 C    C un2 n D 0;
ˆ :::
ˆ
::
:
::
:
::
:
ˆ

u1n 1 C u2n 2 C    C unn n D 0:

This system has the unique solution 1 D 2 D    D n D 0 if and only if the matrix
2 3
u11 u21 u31 ::: un1
6u ::: un2 7
6 12 u22 u32 7
AD6
6 :: :: :: :: :: 7
7 D Œu1 ; u2 ; : : : ; un 
4 : : : : : 5
u1n u2n u3n : : : unn

is invertible. That is, if and only if det.A/ ¤ 0.


3. We have
2 3
113
6 7
detŒv1 ; v2 ; v3  D det 4 2 1 4 5 D 0:
303

Thus, according to our previous proof, v1 ; v2 , and v3 are linearly dependent in R3 . J
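In practice the determinant criterion is applied exactly as in part 3; for instance, with NumPy:

import numpy as np

v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([1.0, 1.0, 0.0])
v3 = np.array([3.0, 4.0, 3.0])

A = np.column_stack([v1, v2, v3])
print(np.linalg.det(A))   # 0 (up to rounding), so v1, v2, v3 are linearly dependent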

Exercise 4.6
Let E be a vector space over a field K. Let G; F1 , and F2 be three subspaces of E. Show that
if

E D G ˚ F1 D G ˚ F2 and F1 F2 ; (4.23)

then F1 D F2 . 

Solution
To show that F1 D F2 , it is enough to prove that F2 F1 . Let w2 be an element of F2 , then
w2 is an element of E. Hence, using (4.23), we write w2 as

w2 D v C w1 ;
4 where v 2 G and w1 2 F1 . Since F1 F2 , we deduce that w1 2 F2 . This implies that

v D w2  w1 2 F2

and thus v 2 G \ F2 D f0E g. That is, v D 0E . Therefore, w2 D w1 , and so w2 2 F1 . Hence


F2 F1 . J

Exercise 4.7 (Direct Product of Two Vector Spaces)


Let E and F be two finite-dimensional vector spaces over the same field K. We define the
space (called direct product of E and F) E  F to be the set of all pairs .u; v/ whose first
component is an element u of E and whose second component is an element v of F. We
define the addition in E  F by

(u1, v1) + (u2, v2) = (u1 + u2, v1 + v2),

where .u1 ; v1 / and .u2 ; v2 / are elements of E  F. Also, for .u; v/ in E  F and for  in K,
we define the multiplication by scalars as

.u; v/ D .u; v/:

1. Show that .E  F; C; / is a vector space over K with 0EF D .0E ; 0F /.


2. Show that

dimK .E  F/ D dimK E C dimK F:

3. We define the sets E1 and F1 as follows:

E1 D E  f0F g D f.u; 0F /; where u 2 Eg

and

F1 D f0E g  F D f.0E ; v/; where v 2 Fg :

Show that E1 and F1 are subspaces of E  F.


4. Prove that

E  F D E1 ˚ F1 : (4.24)


Solution
1. We leave it to the reader to verify that E  F satisfies the axioms in Definition 4.1.1.
2. Denote dimK E D r and dimK F D s. Then, according to Theorem 4.6.1, there exist
elements u1 ; u2 ; : : : ; ur of E and v1 ; v2 ; : : : ; vs of F such that fu1 ; u2 ; : : : ; ur g is a basis of E
and fv1 ; v2 ; : : : ; vs g is a basis of F. Now define in E  F the sets

B1 D f.u1 ; 0F /; .u2 ; 0F /; : : : ; .ur ; 0F /g

and

B2 D f.0E ; v1 /; .0E ; v2 /; : : : ; .0E ; vs /g:

So, we want to show that the set B D fB1 ; B2 g which consists of the elements of B1 [ B2 is a
basis of E  F. First we prove that B spans E  F. So, let .u; v/ be an element of E  F, with
u 2 E and v 2 F. There exist ˛1 ; ˛2 ; : : : ; ˛r and ˇ1 ; ˇ2 ; : : : ; ˇs in K such that

u = α1 u1 + α2 u2 + · · · + αr ur and v = β1 v1 + β2 v2 + · · · + βs vs.

Then

.u; v/ D ˛1 .u1 ; 0F / C ˛2 .u2 ; 0F / C    C ˛r .ur ; 0F /

Cˇ1 .0E ; v1 / C ˇ2 .0E ; v2 / C    C ˇs .0E ; vs /:

Thus, it is clear that the set B spans E  F. Now, since B1 and B2 are linearly independent
sets, one can easily show that B is a linearly independent set (we leave this to the reader) and
thus conclude that B is a basis of E  F. Hence

dimK .E  F/ D r C s D dimK E C dimK F:

3. We show that E1 is a subspace of E  F. First, it is clear that 0EF D .0E ; 0F / 2 E1 .


Now, let w1 and w2 be two elements of E1 and 1 and 2 be two elements of K. Then, we
have

w1 D .u1 ; 0F / and w2 D .u2 ; 0F /

for some u1 and u2 in E. We have

1 w1 C 2 w2 D 1 .u1 ; 0F / C 2 .u2 ; 0F / D .1 u1 C 2 u2 ; 0F /:

Since E is a vector space, λ1 u1 + λ2 u2 ∈ E, and thus λ1 w1 + λ2 w2 ∈ E1. This shows that E1 is a vector subspace of E × F. By the same method, one can show that F1 is a vector subspace of E × F.
4. To prove (4.24), in view of Theorem 4.3.10, we need to show that

E × F = E1 + F1 and E1 ∩ F1 = {0E×F}.

So, let w be an element of E  F. Then there exists u 2 E and v 2 F such that

w D .u; v/ D .u; 0F / C .0E ; v/:

Since .u; 0F / 2 E1 and .0E ; v/ 2 F1 , we deduce that E  F D E1 C F1 .


Now let w be an element of E1 ∩ F1. Then there exist u ∈ E and v ∈ F such that

w D .u; 0F / D .0E ; v/:

This yields u = 0E and v = 0F. Thus, w = (0E, 0F) = 0E×F. Hence E1 ∩ F1 = {0E×F}, and


so (4.24) holds. J

Exercise 4.8 (Complement of a Subspace)


Let E be a finite-dimensional vector space over a field K, and U be a subspace of E. Show
that there exists a subspace W of E such that

E D U ˚ W:

Solution
Since E has finite dimension, so has U, since U is a subspace of E (Theorem 4.6.7). Denote

dimK E D n; dimK U D m; with m  n:

First, if m = n, then U = E (Theorem 4.6.7), and in this case we can take W = {0E}.
Second, if m < n, then, according to Theorem 4.6.1, U has a basis, say B1 = {u1, u2, . . . , um}. Thus, B1 is a linearly independent set in U and hence a linearly independent set in E. By Theorem 4.6.8, there exist elements wm+1, . . . , wn of E such that the set

B D fu1 ; u2 ; : : : ; um ; wmC1 ; : : : ; wn g

is a basis of E. We define the space W as

W D spanfwmC1 ; : : : ; wn g:

We need to show that E D U ˚ W, that is (Theorem 4.3.10),

E D UCW and U \ W D f0E g:

To prove the first equality, let v be an element of E. Since B is a basis in E, then it spans E
and thus there exist 1 ; 2 ; : : : ; n in K such that

v D 1 u1 C 2 u2 C    C m um C mC1 wmC1 C    C n wn :
Put

u D 1 u1 C 2 u2 C    C m um and w D mC1 wmC1 C    C n wn :

Then u 2 U, w 2 W, and v D u C w. This shows that E D U C W.


Next, we need to prove that U \ W D f0E g. So, let v be an element in U \ W. Then
v 2 U and v 2 W, that is

v D ˛1 u1 C ˛2 u2 C    C ˛m um

and

v D ˛mC1 wmC1 C    C ˛n wn :

Consequently,

˛1 u1 C ˛2 u2 C    C ˛m um  ˛mC1 wmC1      ˛n wn D 0E :

Since B is a linearly independent set,

˛i D 0; i D 1; : : : ; n:

Therefore v D 0E . Thus our result holds. J

Exercise 4.9 (Union of Two Subspaces)


Let E be a vector space over a field K. Let F1 and F2 be two subspaces of E. Show that
F1 [ F2 is a subspace of E if and only if F1 F2 or F2 F1 . 

Solution
First, assume that F1 [ F2 is a subspace of E and let w1 be an element of F1 and w2 be an
element of F2 . Then, both w1 and w2 are elements of F1 [ F2 and since F1 [ F2 is a subspace,

w D w1 C w2

is also an element of F1 [ F2 . That is, w 2 F1 or w 2 F2 . If w 2 F1 , then

w2 D w  w1 2 F1 ;

since F1 is a subspace. This shows that F2 F1 . Similarly, if w 2 F2 , then

w1 D w  w2 2 F2 ;

since F2 is a subspace. Thus, F1 F2 .


Conversely, assume for instance that F1 F2 . We need to show that F1 [F2 is a subspace.
It is clear that 0E belongs to F1 [ F2 since it does belong to both subspaces. Now, let u and

v be two elements of F1 [ F2 and  and  be two elements of K. Then, u and v belong to


F2 since F1 F2 . Then u C v 2 F2 , since F2 is a subspace. Thus, u C v 2 F1 [ F2 .
Consequently, F1 [ F2 is a subspace. J

Exercise 4.10 (Orthogonal Complement)


4 Consider the vector space Rn over the field R. Let F be a subspace of Rn . We define the
orthogonal complement of F to be the set of all vectors in Rn that are orthogonal to all
vectors of F, and denoted it F ? . That is

F ? D fu 2 Rn ; such that the dot product u  v D 0; for all v 2 Fg :

1. Show that F⊥ is a subspace of Rn and that F ⊆ (F⊥)⊥.3


2. Show that Rn D F ˚ F ? .
3. Now, let G be another subspace of Rn . Prove the following

.F C G/? D F ? \ G? ; (4.25)
? ? ?
.F \ G/ DF CG : (4.26)

4. Deduce that if Rn D F ˚ G, then Rn D F ? ˚ G? .

Solution
1. It is clear that 0Rn is an element of F ? since for any vector u 2 Rn , we have u  0Rn D 0.
In particular u  0Rn D 0 for all vectors u 2 F. Now, let u and v be two vectors in F ? and 
and  be two real numbers. By Theorem 3.3.5, for any w in F, we have

.u C v/  w D .u  w/ C .v  w/


D 0;

since u · w = 0 and v · w = 0. Therefore, F⊥ is a subspace of Rn. To show that F ⊆ (F⊥)⊥, let u be an element of F. Then for any w ∈ F⊥, we have u · w = 0. That is, u ∈ (F⊥)⊥. This yields F ⊆ (F⊥)⊥.
2. To show that Rn D F ˚ F ? , let v be an element of F and u be an element of Rn . Then,
using Theorem 3.4.1, we deduce that u can be written in exactly one way as

u D w1 C w2

with w1 D kv 2 F and w2 is orthogonal to w1 , that is w2 2 F ? . We deduce that Rn D F˚F ? .

3
In fact since dimK Rn is finite, we even have F D .F? /? , but the proof of .F? /?  F requires some
knowledge of topology. We omit it here.
3. To prove (4.25), we need to show that

.F C G/? .F ? \ G? / and .F ? \ G? / .F C G/? :

Let v be an element of .F C G/? . Then for any u 2 F C G, we have

v  u D 0: (4.27)

It is clear that (4.27) is also satisfied for the elements of F and for the elements of G, since
F ⊆ F + G and G ⊆ F + G. Thus, v ∈ F⊥ and v ∈ G⊥, i.e., v ∈ F⊥ ∩ G⊥. This implies that (F + G)⊥ ⊆ F⊥ ∩ G⊥.
Next, let w be an element of F ? \ G? . Then

w  u D 0; and w  v D 0;

for all u 2 F and for all v 2 G. This implies

w  .u C v/ D 0:

That is, w 2 .F C G/? , since any element of F C G is written as u C v with u 2 F and v 2 G.


Thus, F ? \ G? .F C G/? . Consequently, (4.25) holds.
Now to show (4.26), we have, since for any subspace H of Rn , .H ? /? D H,

.F \ G/? D Œ.F ? /? \ .G? /? ? :

Applying (4.25) to F ? and G? , we get

.F \ G/? D Œ.F ? /? \ .G? /? ? D Œ.F ? C G? /? ? D F ? C G? :

This ends the proof of (4.26).


4. Now, if Rn D F ˚ G, then we have

Rn D F C G and F \ G D f0Rn g:

Hence, we have by using (4.26)

F ? C G? D .F \ G/? D f0Rn g? D Rn ;

since f0Rn g is orthogonal to all the elements of Rn . On the other hand, making use of (4.25),
we get

F ? \ G? D .F C G/? D fRn g? D f0Rn g;



since f0Rn g is the only vector which is orthogonal to all vectors of Rn . So, we have already
proved that

Rn D F ? C G? and F ? \ G? D f0Rn g:

Consequently, Rn = F⊥ ⊕ G⊥. J
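For a subspace F of Rn given by spanning vectors, an orthonormal basis of F⊥ can be computed from the singular value decomposition, since F⊥ is the null space of the matrix whose rows span F. The following NumPy sketch, with an arbitrarily chosen F in R4, also confirms that dim F + dim F⊥ = n, in agreement with Rn = F ⊕ F⊥:

import numpy as np
from numpy.linalg import matrix_rank, svd

n = 4
B = np.array([[1.0, 0.0, 2.0, -1.0],       # rows span F (arbitrary choice for the illustration)
              [0.0, 1.0, 1.0,  3.0]])

r = matrix_rank(B)
_, _, Vt = svd(B)
Fperp = Vt[r:]                             # rows form an orthonormal basis of F^perp

print(np.allclose(B @ Fperp.T, 0))         # every basis vector of F^perp is orthogonal to F
print(r + Fperp.shape[0] == n)             # dim F + dim F^perp = n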

Linear Transformations


5.1 Definition and Examples

In  Chap. 1, we defined a matrix as a rectangular array of numbers (Defini-


tion 1.1.5). In this chapter, we give the mathematical definition of matrices through
linear transformations. We will see that the multiplication of two matrices is equivalent
to the composition of two linear transformations. One of the important proper-
ties of linear transformations is that they carry some algebraic properties from one
vector space to another. Sometimes, this will provide us with the necessary knowl-
edge of some vector spaces, without even studying them in detail, but by rather
seeing them as the result of a linear transformation of other well-known vector
spaces.
We saw in  Chap. 1 that if we multiply an m  n matrix A by a vector u in Rn we
obtain a vector v D Au in Rm . This multiplication satisfies a very important property
called linearity. That is, if u1 and u2 are two vectors in Rn , v1 D Au1 , and v2 D Au2 ,
then

A(u1 + u2) = Au1 + Au2 = v1 + v2,

and for any scalar , we have

A.u1 / D Au1 D v1 :

So, by doing the above matrix multiplication, we have transformed linearly the
elements of the space Rn to elements of the space Rm . This is what we call linear
transformation, and we can generalize this notion to any two vector spaces as follows.

Definition 5.1.1 (Linear Transformation)


Let E and F be two vector spaces over the same field K. A linear transformation
(also called homomorphism) from E to F is defined to be a linear map f from E into
F and satisfying, for all u and v in E and  in K:

f .u C v/ D f .u/ C f .v/; (5.1)

and
5
f .u/ D  f .u/: (5.2)

If, in addition, f is bijective (or one-to-one and onto), then f is called an


isomorphism. If E D F, then f is called a linear operator or endomorphism. If in
addition f is bijective, then f is called an automorphism.

ⓘ Remark 5.1.1 The two properties (5.1) and (5.2) can be combined in a single property
and one writes: f is a linear transformation from the vector space E to the vector space F
if for all u; v 2 E and all ;  2 K, we have

f .u C v/ D  f .u/ C  f .v/: (5.3)

Notation The set of linear transformations from E to F is denoted by L .E; F/. If E D F,


then we denote L .E; F/ simply by L .E/.

Example 5.1 (The Zero Transformation)


Let E and F be two vector spaces over the same field K. The zero transformation 0L .E;F/ is
the transformation that maps all elements u of E to the zero element 0F of F. That is

0L .E;F/ .u/ D 0F :

Clearly, 0L(E,F) is a linear transformation. Indeed, for all u, v ∈ E and all λ, μ ∈ K, we have

0L(E,F)(λu + μv) = 0F = λ0F + μ0F = λ 0L(E,F)(u) + μ 0L(E,F)(v).

Example 5.2 (The Identity Operator)


Let E be a vector space over a field K. We define the identity IdE to be the operator which
maps each element u of E into itself. That is, for all u in E,

IdE .u/ D u:
Then, IdE is an endomorphism, because for all u; v 2 E and all ;  2 K, we have

IdE .u C v/ D u C v


D IdE .u/ C IdE .v/:

Example 5.3 (Linear Transformation Associated to Matrix Multiplication)


Let A be a fixed matrix in Mmn .R/. Consider the transformation f acting from the vector
space Rn to the vector space Rm as follows:

f .u/ D Au (5.4)

for any column vector u in Rn . This transformation f is linear. Indeed, let u, v be two vectors
in Rn, and λ, μ be real numbers. Then, using the properties of matrices, we have

f(λu + μv) = A(λu + μv) = λAu + μAv = λ f(u) + μ f(v).
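This linearity is easy to observe numerically; a minimal NumPy check with arbitrarily chosen data:

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))           # a fixed matrix in M_{3x4}(R)
u, v = rng.standard_normal(4), rng.standard_normal(4)
lam, mu = 2.0, -0.5

print(np.allclose(A @ (lam * u + mu * v), lam * (A @ u) + mu * (A @ v)))   # True: f(lam*u + mu*v) = lam*f(u) + mu*f(v)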

Example 5.4 (The Derivative is a Linear Transformation)


Let F .R; R/ be the vector space of all real-valued functions defined on R. Let D .R; R/
be the vector space of all differentiable real-valued functions defined on R. Define the
transformation D from F .R; R/ to D .R; R/ by

df
D. f / D :
dx

Then, it is clear that D is linear, since

d
D.f C g/ D .f C g/ D D. f / C D.g/:
dx

5.2 Fundamental Properties of Linear Transformations

In this section we list some fundamental properties of linear transformations. As usual,


we start with algebraic properties. Let E and F be two vector spaces over the same
field K. For any f and g in L .E; F/ and for any  in K, we define the addition and

multiplication by scalars as follows:

. f C g/.u/ D f .u/ C g.u/ and .f /.u/ D  f .u/:

One can easily verify that L .E; F/ with the above addition and multiplication by scalars
is a vector space over K.
In addition to the above algebraic structure of the vector space L .E; F/, we exhibit
5 now more properties of linear transformations.

Theorem 5.2.1 (Composition of Linear Transformations)


Let E; F, and G be three vector spaces over the same field K. Let f be an element of
L .E; F/ and g be an element of L .F; G/. Then g ı f is an element of L .E; G/.

Proof
We have

gıf W E !G

u 7! .g ı f /.u/ D gŒ f .u/:

Now, let u, v be elements of E and ,  be elements of K. Then by the linearity of f and g,


we have

.g ı f /.u C v/ D gŒ f .u C v/

D gΠf .u/ C  f .v/

D gΠf .u/ C gΠf .v/

D .g ı f /.u/ C .g ı f /.v/:

This establishes the linearity of g ı f . t


u

Linear transformations preserve some algebraic properties of vector spaces, as we


will show in the following theorem. We have seen in  Chap. 1 that the multiplication
of any m  n matrix by the zero vector 0Rn of Rn results in the zero vector 0Rm in Rm .
In fact, this property is also true for any linear transformation, as we will prove in the
following theorem.

Theorem 5.2.2
Let E and F be two vector spaces over the same field K and f be a linear transformation
from E into F. Then we have:
1. f .0E / D 0F .
2. f(−u) = −f(u).
3. If H is a subspace of E, then f .H/ is a subspace of F.
4. If M is a subspace of F, then f 1 .M/ is a subspace of E.

Proof
1. To show that f .0E / D 0F , since f .0E / is an element of f .E/, we need to show that for all v
in f .E/,

v C f .0E / D f .0E / C v D vI (5.5)

this will imply that f .0E / D 0F . So, let v be an element of f .E/. Then there exists u 2 E such
that v D f .u/. We have

u C 0E D 0E C u D u:

This implies, due to the linearity of f , that

f .u C 0E / D f .u/ C f .0E / D f .u/:

Similarly,

f .0E C u/ D f .0E / C f .u/ D f .u/:

The last two relations give (5.5), since v D f .u/.


2. For u in E we have

u + (−u) = 0E.

Thus, the linearity of f implies

f(u) + f(−u) = f(0E) = 0F.

This gives f(−u) = −f(u), due to the uniqueness of the inverse in the Abelian group (F, +).
3. If H is a subspace of E, then H contains 0E . Consequently, f .H/ contains f .0E / D 0F .
This implies that f .H/ is not empty. Now, let v1 , v2 be two elements of f .H/ and ,  be two
elements of K. Then there exist u1 and u2 in H such that

v1 D f .u1 / and v2 D f .u2 /:



It is clear that u1 C u2 2 H, since H is a subspace of E. By using the linearity of f ,

v1 C v2 D  f .u1 / C  f .u2 /


D f .u1 C u2 /

which is an element of f .H/. Hence, f .H/ is a subspace of F.


4. Let M be a subspace of F. Clearly, f 1 .M/ contains 0E . Let u1 and u2 be two elements
5 of f 1 .M/. Then, f .u1 / and f .u2 / are elements of M. Hence, for any ˛ and ˇ in K, ˛ f .u1 / C
ˇ f .u2 / 2 M, since M is a subspace of F. The linearity of f implies

˛ f .u1 / C ˇ f .u2 / D f .˛u1 C ˇu2 /:

This shows that ˛u1 C ˇu2 is an element of f 1 .M/. Consequently, f 1 .M/ is a subspace
of E. t
u

5.2.1 The Kernel and the Image of a Linear Transformation

We saw in Example 4.12 that for a fixed matrix A in Mmn .R/ the null space

N D f column vectors u in Rn ; such that Au D 0Rm g

is a subspace of Rn . In view of (5.4), the subspace N can be written as

N D f column vectors u in Rn ; such that f .u/ D 0Rm g D f 1 f0Rm g;

where f is defined in (5.4). The space N is called the null space of the matrix A, or the
kernel of the linear transformation f . We can generalize this to any linear transformation
as follows.

Definition 5.2.1 (The Kernel of a Linear Transformation)


Let E and F be two vector spaces over the same field K and f be a linear
transformation from E into F. The kernel of f is the set of all elements u 2 E such
that f .u/ D 0F . We denote the kernel of f by Ker. f / and write

Ker. f / D fu 2 E; such that f .u/ D 0F g D f 1 f0F g:

As usual, when introducing a new set in algebra, a natural question is to check if this
set has an algebraic structure. Here we show that the kernel of a linear transformation is
a subspace.

Theorem 5.2.3 (The Subspace Ker. f /)


Let E and F be two vector spaces over the same field K and f be a linear transformation
from E into F. Then Ker. f / is a subspace of E.

Proof
We may prove Theorem 5.2.3 directly, by showing that Ker. f / satisfies the properties of
a subspace. Or, we may easily see that since f0F g is a subspace of F and since Ker. f / D
f 1 f0F g, Theorem 5.2.2 gives that Ker. f / is a subspace of E. t
u

Example 5.5
Consider the linear transformation f W R2 ! R defined as

f .u; v/ D u  2v:

Then

Ker. f / D f.u; v/ 2 R2 ; such that f .u; v/ D 0g


D f.u; v/ 2 R2 ; such that u D 2vg

Thus, Ker. f / is the subspace of R2 spanned by the vector .2; 1/. 

The kernel of a linear transformation is useful in determining when the transforma-


tion is injective (one-to-one). Indeed, we have:

Theorem 5.2.4 (Injective Linear Transformation)


Let E and F be two vector spaces over the same field K and f be a linear transformation
from E into F. Then, the following two statements are equivalent:
1. Ker. f / D f0E g.
2. The linear transformation f is injective. That is, if u and v are elements in E such
that f .u/ D f .v/, then u D v.

Proof
First, we show that .1/ implies .2/. That is, we assume that Ker. f / D f0E g and show that f
is injective. So, let u, v be elements in E such that f .u/ D f .v/, That is

f .u/  f .v/ D f .u  v/ D 0F ; (5.6)



where we have used the linearity of f . The identity (5.6) implies that u  v 2 Ker. f /, and
since Ker. f / D f0E g, we deduce that u  v D 0E , that is u D v. This implies that f is
injective.
Conversely, assume that f is injective; we need to show that Ker. f / D f0E g. First, it is
clear that f0E g Ker. f / since Ker. f / is a subspace of E (or since f .0E / D 0F ). Now, let u
be an element of Ker. f /. Then, by definition, we have

f(u) = 0F = f(0E).

Since f is injective, it follows that u D 0E . Thus, we proved that Ker. f / f0E g and therefore
Ker. f / D f0E g. Hence, we have shown that .2/ implies .1/. This completes the proof of
Theorem 5.2.4. t
u

As mentioned earlier, one of the main properties of linear transformations is that they
allow to transfer some algebraic properties from one vector space to another. As we saw
in  Chap. 4, it is very important to know the dimension of a vector space. In other
words, it is important to know, or to be able to construct a basis in a vector space. More
precisely, let E and F be two vector spaces over the same field K and let f be a linear
transformation from E into F. Assume that dimK E D n and let BE D fu1 ; u2 ; : : : ; un g
be a basis of E (which exists according to Theorem 4.6.1). A natural question is: under
what conditions on f we have dimK E D dimK F? And if dimK E D dimK F, then could
BF D f f .u1/; f .u2 /; : : : ; f .un/g be a basis of F? To answer these questions, we start
with the following statement.

Theorem 5.2.5
Let E and F be two vector spaces over the same field K and f be an injective linear
transformation from E into F. Let u1 ; u2 ; : : : ; un be linearly independent elements of E.
Then f .u1 /; f .u2 /; : : : ; f .un / are linearly independent elements of F.

Proof
Let 1 ; 2 ; : : : ; n be elements of K such that

1 f .u1 / C 2 f .u2 / C    C n f .un / D 0F : (5.7)

The linearity of f implies that

f .1 u1 C 2 u2 C    C n un / D 0F :

This means that 1 u1 C 2 u2 C    C n un 2 Ker. f /. Since f is injective, Ker. f / D f0E g


(Theorem 5.2.4). Hence,

1 u1 C 2 u2 C    C n un D 0E :
Since u1 ; u2 ; : : : ; un are linearly independent elements of E, it follows

1 D 2 D    D n D 0K : (5.8)

Thus, we have proved that (5.7) yields (5.8), which proves that f .u1 /; f .u2 /; : : : ; f .un / are
linearly independent. t
u

From the above result, we deduce the following result.

Theorem 5.2.6
Let E and F be two vector spaces over the same field K, with dimK F D n; and f
be an injective linear transformation from E into F. Let u1 ; u2 ; : : : ; un be linearly
independent elements of E. Then the set BF D f f .u1 /; f .u2 /; : : : ; f .un /g is a basis
of F.

Proof
By Theorem 5.2.5, BF is a linearly independent set of elements in F. Since the cardinality
of BF is equal to n, Lemma 4.6.4 implies that BF is a maximal linearly independent set of
elements in F. Hence, Theorem 4.6.2 implies that BF is a basis in F. t
u

ⓘ Remark 5.2.7 Theorem 5.2.6 can be used to construct a basis of a vector space as
follows: suppose that we have a vector space F of finite dimension n and we want to
construct a basis for F. Then it is enough to find an injective linear transformation f
from another space E to F and n linearly independent elements of E. The basis of F will
then be given by the images of the n linearly independent elements of E under f .

Definition 5.2.2 (The Image of a Linear Transformation)


Let E and F be two vector spaces over the same field K and f be a linear
transformation from E into F. We define the image of f to be the set of elements w
in F for which there exists an element u in E such that w D f .u/. We denote the
image of f by Im. f / and write

Im. f / D f f .u/ 2 F; such that u 2 Eg D f .E/:

Example 5.6
In Example 5.3, the image of the linear transformation defined by f .u/ D Au is the set of all
vectors w in Rm that can be written as the product of the matrix A and a vector u in Rn . 

Next, we endow the image of a linear transformation with an algebraic structure.



Theorem 5.2.8 (The Subspace Im. f /)


Let E and F be two vector spaces over the same field K and f be a linear transformation
from E into F. Then, Im. f / is a subspace of F.

Proof
5 First, it is clear that 0F is an element of F since 0F D f .0E / (Theorem 5.2.2). Second, let w1 ,
w2 be elements of Im. f / and ,  be elements of K. Then there exist u1 and u2 in E such that

w1 D f .u1 / and w2 D f .u2 /:

Since f is linear,

w1 C w2 D  f .u1 / C  f .u2 /

D f .u1 C u2 /: (5.9)

Since E is a vector space, u1 C u2 2 E. Thus, in (5.9) we have expressed w1 C w2 as the
image of an element of E. Hence, w1 C w2 2 Im. f /. Consequently, Im. f / is a subspace
of F. t
u

The image of f can be used to determine if the linear transformation is surjective (or
onto), as in the following theorem.

Theorem 5.2.9 (Surjective Linear Transformation)


Let E and F be two vector spaces over the same field K and f be a linear transformation
from E into F. Then, the following two statements are equivalent:
1. Im. f / D F.
2. f is surjective. That is, every element w in F has a corresponding element u in E
such that w D f .u/.

Proof
We prove first that .1/ implies .2/. Let w be an element of F. Since Im. f / D F, then w 2
Im. f /. Thus, by definition there exists u 2 E such that w D f .u/. This means that f is
surjective. Conversely, assume that f is surjective and let z be an element of F. Then there
exists v in E such that z D f .v/. This implies that z 2 Im. f /, which shows that F Im. f /.
Since by definition Im. f / F, we deduce that Im. f / D F, and so .2/ implies .1/. t
u

Next, we introduce one of the fundamental theorems of linear algebra that relates
the dimension of the kernel and the dimension of the image of a linear transformation
to the dimension of the vector space on which the transformation is defined.

Theorem 5.2.10 (Rank-Nullity Theorem)


Let E and F be two vector spaces over the same field K and f be a linear transformation
from E into F. Assume that the dimension of E is finite. Then, we have

dimK E D dimK Ker. f / C dimK Im. f /: (5.10)

The dimension of the image of f , dimK Im. f /, is also called the rank of f and is denoted
by rank. f /. Also, the dimension of the kernel of f dimK Ker. f / is called the nullity of
f and is denoted by null. f /. Thus, (5.10) can be also recast as

dimK E D null. f / C rank. f /:

Proof
Denote

dimK E D n; dimK Ker. f / D q; and rank. f / D s:

First, if Im. f / D f0F g, then f .u/ D 0F for any u in E. This means that u 2 Ker. f /. Hence,
E D Ker. f / and thus

dimK E D dimK Ker. f /;

which is exactly (5.10). Now, if Im. f / ¤ f0F g, then we have s > 0, and so by Theorem 4.6.1
the space Im. f / has a basis. Let fw1 ; w2 ; : : : ; ws g be a basis of Im. f /. Thus, by definition
there exist u1 ; : : : ; us 2 E such that

wi D f .ui /; i D 1; : : : ; s: (5.11)

The elements u1 ; u2 ; : : : ; us are linearly independent in E. Indeed, let ˛1 ; ˛2 ; : : : ; ˛s be


elements of K such that

˛1 u1 C ˛2 u2 C    C ˛s us D 0E :

Then, by the linearity of f and (5.11), we have

f .˛1 u1 C ˛2 u2 C    C ˛s us / D f .0E / D 0F ;

or equivalently

˛1 w1 C ˛2 w2 C    C ˛s ws D 0F :

Since fw1 ; w2 ; : : : ; ws g is a basis of Im. f /, it follows that

˛1 D ˛2 D    D ˛s D 0K :

Now, if Ker. f / D f0E g, then we can show that the set fu1 ; u2 ; : : : ; us g spans E. Indeed, let
v be an element of E. Then f .v/ 2 Im. f /. Since fw1 ; w2 ; : : : ; ws g spans Im. f /, there exist
1 ; 2 ; : : : ; s in K such that
f .v/ D 1 w1 C 2 w2 C    C s ws
D 1 f .u1 / C 2 f .u2 / C    C s f .us /

D f .1 u1 C 2 u2 C    C s us /:

Since f is injective (Theorem 5.2.4), then we have from above

v D 1 u1 C 2 u2 C    C s us :

This shows that fu1 ; u2 ; : : : ; us g spans E and thus is a basis of E. Consequently,

s D dimK E D dimK Ker. f / C rank. f / D 0 C s;

so (5.10) holds.
Next, if Ker. f / ¤ f0E g, then q > 0, and hence there exists a basis fv1 ; v2 ; : : : ; vq g of
Ker. f /. Our goal is to show that the set

B D fu1 ; u2 ; : : : ; us ; v1 ; v2 ; : : : ; vq g

is a basis for E. This will suffice to prove (5.10). First, we show that B spans E. Let v be an
element in E. Then, as above, there exist 1 ; 2 ; : : : ; s in K such that

f .v/ D f .1 u1 C 2 u2 C    C s us /:

By the linearity of f , we have

f .v  1 u1  2 u2      s us / D 0F :

Hence v  1 u1  2 u2      s us 2 Ker. f /. Thus, there exist ˇ1 ; ˇ2 ; : : : ; ˇq 2 K such that

v  1 u1  2 u2      s us D ˇ1 v1 C ˇ2 v2 C    C ˇq vq :

This gives

v D 1 u1 C 2 u2 C    C s us C ˇ1 v1 C ˇ2 v2 C    C ˇq vq ;
which shows that B spans E. To prove that B is a linearly independent set, let γ1, γ2, . . . , γs, δ1, δ2, . . . , δq be elements of K satisfying

γ1 u1 + · · · + γs us + δ1 v1 + · · · + δq vq = 0E.   (5.12)

This implies, by the linearity of f,

f(γ1 u1 + · · · + γs us) + f(δ1 v1 + · · · + δq vq) = f(0E) = 0F,

whence

f(γ1 u1 + · · · + γs us) = γ1 w1 + · · · + γs ws = 0F,

since δ1 v1 + · · · + δq vq ∈ Ker(f). Therefore,

γ1 = γ2 = · · · = γs = 0K,

since {w1, w2, . . . , ws} is a linearly independent set. Plugging this into (5.12) yields (for the same reason)

δ1 = δ2 = · · · = δq = 0K.

Therefore, B is a linearly independent set and hence a basis of E. This completes the proof
of Theorem 5.2.10. t
u
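For the transformation u ↦ Au of Example 5.3, Theorem 5.2.10 states that the rank of A plus the dimension of its null space equals the number n of columns of A. A small NumPy illustration with an arbitrarily chosen matrix:

import numpy as np
from numpy.linalg import matrix_rank

A = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],       # twice the first row, so the rank drops
              [0.0, 1.0, 1.0, 1.0]])

n = A.shape[1]
rank = matrix_rank(A)                     # dim Im(f)
nullity = n - rank                        # dim Ker(f)

print(rank, nullity, rank + nullity == n) # 2 2 True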

5.3 Isomorphism of Vector Spaces

In an attempt to carry over some algebraic properties of a vector space to another, we


now define what we call isomorphism between two vector spaces.

Definition 5.3.1 (Isomorphism of Vector Spaces)


Let E and F be two vector spaces over the same field K and f be a linear
transformation from E into F. Then f is said to be an isomorphism if f is bijective
(surjective and injective). If F D E, then this isomorphism is called an
automorphism.

(Continued )

Definition 5.3.1 (continued)


If such isomorphism exists, then the two spaces E and F have the same algebraic
properties (are algebraically equivalent) and we say that they are isomorphic, and
write E Š F.

Example 5.7
5 The identity IdE transformation defined in Example 5.2 is an automorphism. 

Example 5.8 (C Is Isomorphic to R2 )


We consider the vector space C over the field R. Then C is isomorphic to R2 . This can be
easily seen by considering the linear transformation f from R2 to C given by

f W R2 ! C
.x; y/ 7! x C iy:

We can easily show that f is an isomorphism. Thus, we deduce directly that dimR C D 2,
since dimR R2 D 2:
We may also easily prove that the vector space Cn over the field R is isomorphic
to R2n . 

Theorem 5.3.1 (Inverse of an Isomorphism)


Let E and F be two vector spaces over the same field K and f be an isomorphism from
E into F. Then f 1 is an isomorphism from F into E.

Proof
It is clear that if f is bijective, then f 1 is bijective. We need to show only that if f is linear,
then f 1 is also linear. So, let w1 , w2 be elements in F and ,  be elements of K. Then, since
f is bijective, then there exist a unique u1 and a unique u2 in E such that

w1 D f .u1 / and w2 D f .u2 /;

and so

u1 D f 1 .w1 / and u2 D f 1 .w2 /:

Now, we have

w1 C w2 D  f .u1 / C  f .u2 /

D f .u1 C u2 /;
whence

f 1 .w1 C w2 / D u1 C u2


D  f 1 .w1 / C  f 1 .w2 /:

Therefore, f 1 is linear, as needed. t


u

In the following theorem we show that the composition of two isomorphisms is also an
isomorphism.

Theorem 5.3.2 (Composition of Isomorphism)


Let E, F and G be three vector spaces over the same field K, f be an isomorphism
from E into F, and g be an isomorphism from F to G. Then the composition g ı f is an
isomorphism from E to G.

The proof of Theorem 5.3.2 is obvious, since the composition of two bijective
transformations is bijective and in Theorem 5.2.1 we have seen that the composition
of two linear transformations is a linear transformation.
We have observed above that in order to prove that a transformation is an
isomorphism, we need to show that this transformation is linear and bijective. This last
requirement can be relaxed under some algebraic assumptions as we will see in the
following theorem.

Theorem 5.3.3
Let E and F be two vector spaces over the same field K and f be a linear transformation
from E into F. Assume that

dimK E D dimK F: (5.13)

If f is injective or surjective, then f is an isomorphism.

Proof
First, assume that f is injective. Then according to Theorem 5.2.4, Ker. f / D f0E g. Then
using (5.10) and (5.13), we deduce that

dimK F D rank. f /: (5.14)

Since Im. f / is a subspace of F, Theorem 4.6.7 together with (5.14) implies that F D Im. f /.
Hence, by Theorem 5.2.9, f is surjective and since we assumed that f is injective, then f is
bijective and therefore, an isomorphism.

Now, if f is surjective, then we have as above F D Im. f /. This implies that

dimK E D dimK F D rank. f /:

Therefore, (5.10) leads to dimK Ker. f / D 0. Then, Ker. f / D f0E g, and hence by
Theorem 5.2.4, f is injective and consequently f is bijective. u
t

5 ⓘ Remark 5.3.4 Theorem 5.3.3 implies that if f is an element in L .E/ (with dimK E
finite), then to show that f is an automorphism, we need just to show that Ker. f / D f0E g
or Im. f / D E.

As we have seen in  Chap. 4, it is very useful to find a basis of a vector space, since
this could allow one to infer many properties of the space. Now suppose that we have
two vector spaces E and F over the same field K, f is an isomorphism from E to F, and
B is a basis of E. So the natural question is whether f .B/ is basis of F. This turns out to
be true as we will see in the following theorem.

Theorem 5.3.5
Let E and F be two vector spaces over the same field K and f be an isomorphism from
E into F. Let B be a basis of E. Then f .B/ is a basis of F.

Proof
Let B be a basis of E. Since f is injective and B is a linearly independent set in E, then
according to Theorem 5.2.5, f .B/ is a linearly independent set in F. Clearly, since B spans E,
then f .B/ spans f .E/. Since f is surjective, then we have (see Theorem 5.2.9) f .E/ D F and
thus f .B/ spans F. Consequently, f .B/ is a basis of F. t
u

ⓘ Corollary 5.3.6 Two finite-dimensional vector spaces over the same field are
isomorphic if and only if they have the same dimension.

Proof
First, assume that E Š F. Then, by Theorem 5.3.5, we deduce that

dimK E D dimK F:

Conversely, assume that

dimK E D dimK F D n:
Then, Theorem 4.6.1 implies that there exists BE D fu1 ; u2 ; : : : ; un g, a basis of E, and BF D
fw1 ; w2 ; : : : ; wn g, a basis of F. We define the transformation f from E to F as follows:

f WE!F

u D 1 u1 C 2 u2 C    C n un 7! f .u/ D w D 1 w1 C 2 w2 C    C n wn ;

for 1 ; 2 ; : : : ; n 2 K. It is clear that f is linear. Now, let u and v be two elements in E, such
that

f .u/ D f .v/:

Since BE is a basis of E, there exists i and i ; i D 1; 2; : : : ; n elements in K such that

u D 1 u1 C 2 u2 C    C n un and v D 1 u1 C 2 u2 C    C n un :

Thus, f .u/ D f .v/ which implies that

1 w1 C 2 w2 C    C n wn D 1 w1 C 2 w2 C    C n wn :

Theorem 4.4.3 shows that

i D i ; i D 1; 2; : : : ; n:

Hence, u D v. Therefore, f is injective. Hence, since dimK E D dimK F, Theorem 5.3.3


shows that f is bijective. t
u

5.4 Exercises

Exercise 5.1 (The Kernel and the Image of a Projection)


Let E be a vector space over a field K and let f W E ! E be an endomorphism of E such that
f 2 D f ı f D f .1
Show that
1. Ker. f / D Ker. f 2 / if and only if Im. f / \ Ker. f / D f0E g:
2. E D Ker. f / ˚ Im. f /:
3. If g is an endomorphism of E, then g ı f D f ı g if and only if

g.Ker. f // Ker. f / and g.Im. f // Im. f /:

1
A linear transformation satisfying this property is called a projection.

Solution
1. We need to show that Ker. f / Ker. f 2 / and Ker. f 2 / Ker. f /. Let u be an element in
Ker. f /. i.e., f .u/ D 0E . This gives

f 2 .u/ D f . f .u// D . f ı f /.u/ D f .0E / D 0E ;

so u 2 Ker. f 2 /. Thus, Ker. f / Ker. f 2 /. Conversely, let w be an element of Ker. f 2 /. Then,


5 since f 2 D f ,

f 2 .w/ D f .w/ D 0E ;

so w 2 Ker. f /. Hence, Ker. f 2 / Ker. f /.


2. In view of Theorem 4.3.10, we need to prove that

E D Ker. f / C Im. f /; and Ker. f / \ Im. f / D f0E g: (5.15)

So, let u be an element in E. Then we can write u as

u D .u  f .u// C f .u/:

It is clear that f .u/ 2 Im. f /. On the other hand, we have

f .u  f .u// D f .u/  f 2 .u/ D 0:

Thus, u  f .u/ 2 Ker. f /. Hence, we have proved the first assertion in (5.15).
Next, let v be an element in Ker. f / \ Im. f /. Then there exists u in E such that

v D f .u/ and f .v/ D 0E :

This gives f 2 .u/ D 0E . Thus, u 2 Ker. f 2 /. Since Ker. f / D Ker. f 2 /, then u 2 Ker. f / and we
have v D f .u/ D 0E . Hence, Ker. f /\Im. f / D f0E g and consequently, E D Ker. f /˚Im. f /:
3. Assume that g ı f D f ı g and let w be an element in g.Ker. f //. Then there exists u in
Ker. f / such that w D g.u/. Now, since g ı f D f ı g, we have

f .w/ D f .g.u// D g. f .u// D g.0E / D 0E ;

because u 2 Ker. f / and g is a linear transformation. Thus, we have shown that w 2 Ker. f /.
Therefore g.Ker. f // Ker. f /.
Now, let z be an element of g.Im. f //. Then there exists w in Im. f / such that z D g.w/.
Since w 2 Im. f /, there exists u in E such that w D f .u/. Thus,

z D g. f .u// D f .g.u// D f .y/; y D g.u/:

Hence, z 2 Im. f /, which proves that g.Im. f // Im. f /.


Conversely, assume that g.Ker. f // Ker. f / and g.Im. f // Im. f /. Let u be an
element of E. We claim that

f .g.u// D g. f .u//: (5.16)

Indeed, using .2/, we deduce that

u D u1 C u2 ;

with u1 2 Ker. f / and u2 2 Im. f /. Thus, there exists u3 in E such that

f .u1 / D 0 and u2 D f .u3 /:

Now, we have

g. f .u// D g. f .u1 C u2 // D g. f .u1 // C g. f .u2 //


D g.0E / C g. f .u2 //

D g. f .u2 // (5.17)

and

f .g.u// D f .g.u1 C u2 // D f .g.u1 // C f .g.u2 //: (5.18)

Next, it is clear that g.u1 / 2 g.Ker. f //, and since g.Ker. f // Ker. f /, we deduce that
g.u1 / 2 Ker. f / and hence f .g.u1 // D 0: Also, since u2 2 Im. f /, we have g.u2 / 2
g.Im. f // Im. f /, and so there exists u4 in E such that

g.u2 / D f .u4 /:

Consequently, taking all these into account, we can rewrite (5.17) and (5.18) as

g. f .u// D g. f .u2 // D g. f 2 .u3 // D g. f .u3 // D g.u2 / (5.19)

and respectively

f .g.u// D f .g.u2 // D f . f .u4 // D f .u4 / D g.u2 /: (5.20)

Now, (5.19) and (5.20) yield (5.16), and so f ı g D g ı f . J

Exercise 5.2
Consider the endomorphism f W Rn ! Rn defined as

f .u/ D .u1 C un ; u2 C un1 ; : : : ; un C u1 /; for u D .u1 ; u2 ; : : : ; un /:

Find dimR Ker. f / and dimR Im. f / D rank. f /. 



Solution
First, we need to find the subspace Ker. f /. So, let u D .u1 ; u2 ; : : : ; un / be a vector in Rn .
Then u is a vector in Ker(f) if and only if f(u) = 0Rn. This implies the system

u1 + un = 0,
u2 + un−1 = 0,
. . .
un + u1 = 0.

Solving this system, we find

u1 = −un,
u2 = −un−1,
. . .
un = −u1.

First, if n is even, say n = 2p, then u can be written as

u = (u1, u2, . . . , up, −up, −up−1, . . . , −u1).

This vector can be written as

u = u1 (1, 0, . . . , 0, . . . , 0, −1) + u2 (0, 1, . . . , 0, . . . , −1, 0) + · · · + up (0, . . . , 1, −1, . . . , 0).

Thus, the set of vectors B = {a1, a2, . . . , ap}, where aj, 1 ≤ j ≤ p, has all components zero except for the j-th component, which is 1, and the (n − j + 1)-th component, which is −1, spans Ker(f). We may easily show that B is a linearly independent set, hence a basis of Ker(f). Thus, if n is even, then

dimR Ker(f) = p = n/2.

Now, applying Theorem 5.2.10, we deduce that

rank(f) = dimR Rn − dimR Ker(f) = n − n/2 = n/2.

Second, if n is odd, say n = 2q + 1, each element u of Ker(f) can be written as

u = (u1, u2, . . . , uq, 0, −uq, . . . , −u1),

and then

u = u1 b1 + u2 b2 + · · · + uq bq,

where bk, 1 ≤ k ≤ q, is the vector with all components zero, except for the k-th component, which is equal to 1, and the (n − k + 1)-th component, which is equal to −1. As above, we can easily show that the set of vectors S = {b1, b2, . . . , bq} is a basis of Ker(f), and we have

dimR Ker(f) = q = (n − 1)/2.

Consequently, we obtain as above

rank(f) = n − (n − 1)/2 = (n + 1)/2.
J
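In matrix form, f is given by I + J, where J is the exchange matrix (ones on the anti-diagonal), so the formulas above can be checked numerically; a NumPy sketch:

import numpy as np
from numpy.linalg import matrix_rank

for n in (4, 5, 6, 7):
    A = np.eye(n) + np.fliplr(np.eye(n))          # matrix of f: (Au)_k = u_k + u_{n-k+1}
    expected = n // 2 if n % 2 == 0 else (n + 1) // 2
    print(n, matrix_rank(A), expected)            # rank(f) = n/2 for even n, (n+1)/2 for odd n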

Exercise 5.3 (Rank of the Composition)


Let E; F and G be three vector spaces over the same field K. Assume that dimK E is finite.
Let f be an element of L .E; F/ and g be an element in L .F; G/.
1. Show that Im.g ı f / Im.g/.
2. Prove that Ker. f / Ker.g ı f /.
3. Deduce that

rank.g ı f /  min.rank. f /; rank.g//: (5.21)

Solution
First, it is clear that g ı f is an element of L .E; G/, (see Theorem 5.2.1).
1. Let w be an element in Im.g ı f /. Then there exists u in E such that

w D .g ı f /.u/ D g. f .u//:

Put v D f .u/; then v 2 F and w D g.v/, so w 2 Im.g/.


2. Let u be an element in Ker. f /, then we have f .u/ D 0F . Since g is linear, then it holds
that

g. f .u// D g.0F / D 0G

(see Theorem 5.2.2). This means that u 2 Ker.g ı f /.


3. Since Im.g ı f / Im.g/, then we have, according to Theorem 4.6.7,

rank.g ı f / D dimK Im.g ı f /  dimK Im.g/ D rank.g/: (5.22)

On the other hand, from .2/ we have, for the same reason,

dimK Ker. f /  dimK Ker.g ı f /:



Thus, applying Theorem 5.2.10 twice, we have

rank.g ı f / D dimK E  dimK Ker.g ı f /  dimK E  dimK Ker. f / D rank. f /: (5.23)

Finally (5.22) and (5.23) yield (5.21).


J
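In terms of matrices, inequality (5.21) reads rank(BA) ≤ min(rank(A), rank(B)), since composing the maps u ↦ Au and v ↦ Bv amounts to multiplying the matrices. A quick NumPy check with arbitrarily chosen matrices:

import numpy as np
from numpy.linalg import matrix_rank

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 4)) @ rng.standard_normal((4, 6))    # a 5x6 matrix of rank at most 4: f maps R^6 to R^5
B = rng.standard_normal((3, 5))                                  # g maps R^5 to R^3

print(matrix_rank(B @ A), min(matrix_rank(A), matrix_rank(B)))   # rank(g o f) <= min(rank f, rank g)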

5 Exercise 5.4
Let E be a vector space over a field K such that dimK E is finite and f be an element of L .E/.
Show that the following statements are equivalent:
1. There exist two projections P and Q in L .E/ such that

f D PQ and Im.P/ D Im.Q/:

2. f 2 D 0L .E/ :

Solution
We have seen in Exercise 5.1 that P and Q satisfy P2 D P and Q2 D Q. Now, we need to
show that .1/ ) .2/ and .2/ ) .1/.
First, assume that .1/ is satisfied and let u be an element of F. Then, we have

f 2 .u/ D f . f .u// D .P  Q/ ..P  Q/.u//

D P2 .u/  Q.P.u//  P.Q.u// C Q2 .u/

D P.u/  Q.P.u//  P.Q.u// C Q.u/:

Now, it is clear that P.u/ 2 Im.P/ D Im.Q/. Then, we have Q.P.u// D P.u/.2 By the same
argument P.Q.u// D Q.u/. Consequently, taking these into account, we have from above
that f . f .u// D 0E for all u in E. This means that f 2 D 0L .E/ . Thus, we have proved that
.1/ ) .2/.
Conversely, since dimK E is finite, then (see Exercise 4.8), Ker. f / has a complement in
E. Hence, there exists a projection P such that Im.P/ D Ker. f /.
We define Q as Q D P  f so f D P  Q. To show that Q is a projection, we need to
prove that Q2 D Q. We have

Q2 D .P  f / ı .P  f / D P2  P ı f  f ı P C f 2 :

Since P2 D P and f 2 D 0L .E/ (by assumption), we deduce that

Q2 D P  P ı f  f ı P: (5.24)

2
Since Q is a projection, we have Q.y/ D y for all y in Im.Q/.
Now, since Im.P/ Ker. f /, then we deduce that f ı P D 0L .E/ . Also, since f 2 D 0L .E/ ,
we have

Im. f / Ker. f / D Im.P/ D Ker.IdL .E/  P/:

This gives

.IdL .E/  P/ ı f D 0L .E/ ;

whence P ∘ f = f.

Inserting these relations into (5.24), we get

Q2 D P  P ı f  f ı P D P  f D Q:

Thus, Q is a projection.
Now, we need to show that Im.P/ D Im.Q/. That is Im.P/ Im.Q/ and Im.Q/
Im.P/. So, first, let w be an element of Im.P/ D Ker. f /. Since w 2 Im.P/, then P.w/ D
w. We also have f .w/ D 0E . On the other hand, we have

Q.w/ D .P  f /.w/ D P.w/ D w:

This means that w 2 Im.Q/ and so Im.P/ Im.Q/.


On the other hand, let u be an element of Im.Q/. Then

u D Q.u/ D P.u/  f .u/;

whence

P.u/ D P2 .u/  .P ı f /.u/ D P.u/  f .u/:

This yields f .u/ D 0E and consequently,

u D Q.u/ D P.u/:

This means that u 2 Im.P/. Thus Im.Q/ Im.P/, which concludes our proof.

Exercise 5.5
Let E be a vector space over a field K such that dimK E D n and f be an element of L .E/.
Show that the following statements are equivalent:
1. Ker. f / D Im. f /.
2. f 2 = 0L(E) and rank(f) = n/2.



Solution
As usual, we show that .1/ ) .2/ and .2/ ) .1/. First, assume that Ker. f / D Im. f / and
let u be an element of E. Then f .u/ 2 Im. f /, and since Ker. f / D Im. f /, then f .u/ 2 Ker. f /
and so

f . f .u// D f 2 .u/ D 0E :

Since this holds for all elements of E, we deduce that f² = 0_{L(E)}.


Now, using Theorem 5.2.10, we have

dimK E D dimK Ker. f / C rank. f / D dimK Im. f / C rank. f / D 2 rank. f /:

This gives rank. f / D n=2. Thus, we have proved that .1/ ) .2/.
Conversely, assume that .2/ holds; we need to show that Ker. f / D Im. f /. Let w be an
element in Im. f /, then there exists u in E such that w D f .u/. Now, we have

f(w) = f(f(u)) = f²(u) = 0_E,

so, w 2 Ker. f /. This shows that Im. f / Ker. f /.


Now, since rank. f / D n=2, Theorem 5.2.10 implies that dimK Ker. f / D n=2. Since,
Im. f / Ker. f / and rank. f / D dimK Ker. f /, Theorem 4.6.7 shows that Im. f / D Ker. f /. J

Exercise 5.6 (Nilpotent Linear Transformation)


Let E be a vector space over a field K such that dimK E D n and f be an element of L .E/.
Let k be a positive integer. f is called nilpotent of index k 3 if

f^k = f ∘ f ∘ ⋯ ∘ f  (k times) = 0_{L(E)}.

Show that if f is nilpotent of index n, then for any u in E satisfying f n1 .u/ ¤ 0E , the set
B D fu; f .u/; f 2 .u/; : : : ; f n1 .u/g is a basis of E. 

Solution
Since the cardinality of B equals n, Theorem 4.6.5 shows that it is enough to show that B is
a linearly independent set. First, it is clear that since f n1 .u/ ¤ 0E , the linearity of f shows
that for any 0 ≤ k ≤ n − 1, we have f^k(u) ≠ 0_E. Now, let λ_0, λ_1, …, λ_{n−1} be elements in
K satisfying

λ_0 u + λ_1 f(u) + λ_2 f²(u) + ⋯ + λ_{n−1} f^{n−1}(u) = 0_E.    (5.25)

3
See Exercise 1.2 for the definition of a nilpotent matrix.
Applying f^{n−1} to (5.25), using the linearity of f^{n−1} and the fact that f^m(u) = 0_E for all m ≥ n, we get

λ_0 f^{n−1}(u) = 0_E.

Since f^{n−1}(u) ≠ 0_E, it follows that λ_0 = 0_K. Next, arguing in the same way, we apply f^{n−2} to (5.25)
and, using the fact that λ_0 = 0_K, we obtain λ_1 = 0_K. By continuing the process and applying
each time f^{n−ℓ}, 1 ≤ ℓ ≤ n, to (5.25), we can show that λ_{ℓ−1} = 0_K. Hence,

λ_0 = λ_1 = ⋯ = λ_{n−1} = 0_K.

Consequently, B is a linearly independent set, and thus a basis of E. J
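A concrete way to see this result is to take the standard nilpotent "shift" matrix of index n and verify that the vectors u, f(u), …, f^{n−1}(u) are linearly independent. The sketch below is illustrative only (the matrix N and the vector u are my choices, not taken from the text).

```python
import numpy as np

n = 4
# Nilpotent "shift" matrix of index n: ones on the subdiagonal, N**n = 0, N**(n-1) != 0.
N = np.diag(np.ones(n - 1), k=-1)
assert not np.allclose(np.linalg.matrix_power(N, n - 1), 0)
assert np.allclose(np.linalg.matrix_power(N, n), 0)

u = np.array([1.0, 0.0, 0.0, 0.0])          # satisfies N**(n-1) @ u != 0
B = np.column_stack([np.linalg.matrix_power(N, k) @ u for k in range(n)])
# The n vectors u, N u, ..., N**(n-1) u form a basis iff this matrix has full rank.
assert np.linalg.matrix_rank(B) == n
```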

Exercise 5.7 (Triangle and Frobenius Inequalities for the Rank)


Let E and F be two finite-dimensional vector spaces over the same field K. Let f and g be
elements of L .E; F/.
1. Prove that Im. f C g/ Im. f / C Im.g/.
2. Show that rank. f C g/  rank. f / C rank.g/.
3. Deduce that j rank. f /  rank.g/j  rank. f C g/:
4. Show that if f1 ; f2 , and f3 are elements of L .F1 ; F2 /; L .F2 ; F3 /; and L .F3 ; F4 /;
respectively, where F1 ; F2 ; F3 , and F4 are four finite-dimensional vector spaces over the
same field K, then

rank. f2 /  rank. f3 ı f2 / D dimK .Im. f2 / \ Ker. f3 //: (5.26)

Deduce that

rank. f2 ı f1 / C rank. f3 ı f2 /  rank f2 C rank. f3 ı f2 ı f1 / (Frobenius inequality).

(5.27)

Solution
1. First, it is clear that f C g 2 L .E; F/ since L .E; F/ is a vector space. It is also obvious
that Im. f /, Im.g/ and Im. f C g/ are subspaces of F (see Theorem 5.2.8). Now, let w be an
element of Im. f C g/. Then there exists u in E such that

w D . f C g/.u/ D f .u/ C g.u/:



Since f(u) ∈ Im(f) and g(u) ∈ Im(g), we see that w ∈ Im(f) + Im(g). This shows that Im(f + g) ⊆ Im(f) + Im(g).
2. Applying formula (4.16) for F1 D Im. f / and F2 D Im.g/, we get

dimK .Im. f / C Im.g//  dimK Im. f / C dimK Im.g/ D rank. f / C rank.g/:

On the other hand, since Im. f C g/ Im. f / C Im.g/, Theorem 4.6.7 shows that
rank. f C g/ D dimK Im. f C g/  dimK .Im. f / C Im.g//  rank. f / C rank.g/:

3. We can apply .2/ for f C g and g, and get

rank. f C g C .g//  rank. f C g/ C rank.g/:

Since rank.g/ D rank.g/, it follows that

rank. f /  rank. f C g/ C rank.g/:

Equivalently,

rank. f /  rank.g/  rank. f C g/:

By applying the same method to f C g and .f /, we get

rank.g/  rank. f /  rank. f C g/:

Combining the above two relations, then we obtain the desired result.
4. First, we need to prove (5.26). Since

rank. f3 ı f2 / D dimK f3 .Im. f2 //;

applying Theorem 5.2.10 to the restriction of f_3 to H = Im(f_2), we have

dim_K f_3(H) = dim_K H − dim_K Ker(f_3|_H).

It is clear that the kernel of this restriction is Ker(f_3|_H) = H ∩ Ker(f_3) = Im(f_2) ∩ Ker(f_3). It follows that

rank(f_2) − rank(f_3 ∘ f_2) = dim_K (Im(f_2) ∩ Ker(f_3)).

Now to prove (5.27) we apply (5.26). We have

rank. f2 /  rank. f3 ı f2 / D dimK .Im. f2 / \ Ker. f3 //


and

rank. f2 ı f1 /  rank. f3 ı f2 ı f1 / D dimK .Im. f2 ı f1 / \ Ker. f3 //:

Consequently, to show (5.27) it is enough to prove that

dimK .Im. f2 ı f1 / \ Ker. f3 //  dimK .Im. f2 / \ Ker. f3 //;

which is true since Im. f2 ı f1 / Im. f2 / (see the first question in Exercise 5.3). J

Linear Transformations and Matrices


6.1 Definition and Examples

The goal of this chapter is to make a connection between matrices and linear transfor-
mations. So, let E and F be two finite-dimensional vector spaces over the same field K
such that dimK E D n and dimK F D m. Then, according to Theorem 4.6.1, both spaces
have bases. So, let BE D fu1 ; u2 ; : : : ; un g be a basis of E and BF D fw1 ; w2 ; : : : ; wm g
be a basis of F. Let f be an element of L .E; F/. Since BF is a basis of F, for any
uj ; 1  j  n in BE , f .uj / is uniquely written as a linear combination of the elements
of BF :

f .uj / D a1j w1 C a2j w2 C    C amj wm ; 1  j  n;

where

aij ; 1  i  m; 1jn

are elements of K. It is clear that knowledge of the aij , completely determines the linear
transformation f and allows us to define the matrix associate to f as follows.

Definition 6.1.1 (Matrix Associated to a Linear Transformation)


The matrix associated to the linear transformation f given above is defined as the
map defined from I  J to K as

.i; j/ 7! aij ;

where I D f1; 2; : : : ; mg and J D f1; 2; : : : ; ng, i is an element of the set I and j is an


element of the set J and aij are as above. We denote this matrix as M. f ; BE ; BF / or
sometimes just M. f / (when there is no confusion) and write it as

(Continued )

Definition 6.1.1 (continued)


2 3
a11 a12 a13 ::: a1n
6a ::: a2n 7
6 21 a22 a23 7
M. f / D 6
6 :: :: :: :: :: 7
7:
4 : : : : : 5
am1 am2 am3 : : : amn

It is clear that the entries of the jth column (1  j  n) of this matrix M. f / are the
components of f .uj / in the basis BF . We call M. f / the matrix of f in the bases BE
and BF .

Example 6.1
Let f be the linear transformation defined as

f W R2 ! R3 ;

.u; v/ 7! .u  v; u C 2v; v/:

Find the matrix for f in the standard bases of R2 and R3 . 

Solution
The standard bases of R2 and R3 are respectively

BR2 D f.1; 0/; .0; 1/g; and BR3 D f.1; 0; 0/; .0; 1; 0/; .0; 0; 1/g:

To find the matrix M. f / associated to the linear transformation f , we need to find the
components of f(1, 0) and f(0, 1) in the basis B_{R^3}; then f(1, 0) will be the first column of
M(f) and f(0, 1) will be the second column of M(f). We have

f .1; 0/ D .1; 1; 0/ D 1.1; 0; 0/ C 1.0; 1; 0/ C 0.0; 0; 1/:

Thus,
2 3
1
6 7
f .1; 0/ D 4 1 5 :
0

Similarly,

f .0; 1/ D .1; 2; 1/ D 1.1; 0; 0/ C 2.0; 1; 0/ C 1.0; 0; 1/;


and so
2 3
1
6 7
f .0; 1/ D 4 2 5 :
1

Consequently, the matrix M. f / is


2 3
1 1
6 7
M. f / D 4 1 2 5 :
0 1

It is clear that

f(1, 0) = M(f) · (1, 0)^T = (1, 1, 0)^T

and

f(0, 1) = M(f) · (0, 1)^T = (−1, 2, 1)^T.

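The computation of Example 6.1 is easy to reproduce numerically: the columns of M(f) are the images of the standard basis vectors of R². The following NumPy sketch is illustrative and not part of the original text.

```python
import numpy as np

def f(u, v):
    # The linear transformation of Example 6.1, from R^2 to R^3.
    return np.array([u - v, u + 2 * v, v])

# Columns of M(f) are f(e1) and f(e2), expressed in the standard basis of R^3.
M = np.column_stack([f(1, 0), f(0, 1)])
print(M)                                   # [[ 1 -1], [ 1  2], [ 0  1]]
assert np.array_equal(M @ np.array([1, 0]), f(1, 0))
assert np.array_equal(M @ np.array([0, 1]), f(0, 1))
```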
Example 6.2 (Identity Matrix Associated to the Identity Transformation)


Let E be a vector space over a field K with dimK E D n and f be the identity operator defined
in Example 5.2. Then, for any basis B D fu1 ; u2 ; : : : ; un g of E, we can easily see that
f(u_1) = u_1 = 1u_1 + 0u_2 + ⋯ + 0u_n,
f(u_2) = u_2 = 0u_1 + 1u_2 + ⋯ + 0u_n,
⋮
f(u_n) = u_n = 0u_1 + 0u_2 + ⋯ + 1u_n.

Consequently, we easily obtain


2 3
1 0 0  0
6 7
60 1 0  07
6 7
6 0 1  07
M. f / D 6 0 7 D I;
6: :: 7
6: : 7
4: 5
0 0 0  1

where I is identity matrix given in Definition 1.2.2. 



Example 6.3 (Zero Matrix Associated to the Zero Transformation)


Let E and F be two vector spaces over the same field K such that dimK E D n and dimK F D
m. Then, for any basis BE of E and any basis BF of F, the matrix associated to the zero
transformation 0L .E;F/ defined in Example 5.1 is

M.0L .E;F/ / D 0mn ;

where 0mn is the zero matrix introduced in Definition 1.1.9. 

Now, a natural question is: how to define matrices associated to the sum and composition
of two linear transformations? To answer this question, we may easily show some
properties of these matrices:

Theorem 6.1.1
Assume that E and F are as above and let G be another vector space defined over the
same field K with dimK G D r. Let BG D fv1 ; v2 ; : : : ; vr g be a basis of G. Let f and g
be elements of L .E; F/, h be an element of L .F; G/, and  be an element of K. Then,
it holds that
1. M. f C g/ D M. f / C M.g/.
2. M.f / D M. f /:
3. M.h ı f / D M.h/M. f /.
4. If E D F and f is bijective, then M. f / is invertible and M. f 1 / D .M. f //1 .

Proof
1. As above, assume that

f .uj / D a1j w1 C a2j w2 C    C amj wm ; 1jn

and

g.uj / D d1j w1 C d2j w2 C    C dmj wm ; 1  j  n;

where aij and dij ; 1  i  m; 1  j  n are elements of K. Then, we have

(f + g)(u_j) = f(u_j) + g(u_j) = (a_{1j} + d_{1j})w_1 + (a_{2j} + d_{2j})w_2 + ⋯ + (a_{mj} + d_{mj})w_m.

Consequently,
2 3
a11 C d11 a12 C d12 a13 C d13 ::: a1n C d1n
6 7
6 a21 C d21 a22 C d22 a23 C d23 ::: a2n C d2n 7
M. f C g/ D 6
6 :: :: :: :: :: 7
7
4 : : : : : 5
am1 C dm1 am2 C dm2 am3 C dm3 ::: amn C dmn
D M. f / C M.g/:
2. In the same way, we can show that M.f / D M. f /.
3. Assume that

h.wi / D b1i v1 C b2i v2 C    C bri vr ; 1  i  m:

We have .h ı f /.uj / 2 G for any 1  j  n. Hence,

.h ı f / D h. f .uj // D c1j v1 C c2j v2 C    C crj vr ; 1  j  n; (6.1)

where ckj ; 1  j  n; 1  k  r are elements of K.


On the other hand, we have

h. f .uj // D h.a1j w1 C a2j w2 C    C amj wm /

D a1j h.w1 / C a2j h.w2 / C    C amj h.wm /


X
m
D aij h.wi /
iD1

X
m
D aij .b1i v1 C b2i v2 C    C bri vr /
iD1
! ! !
X
m X
m X
m
D b1i aij v1 C b2i aij v2 C    C bri aij vr : (6.2)
iD1 iD1 iD1

Comparing (6.1) and (6.2) and using Theorem 4.4.3, we find that

X
m
ckj D bki aij ; 1  k  r;
iD1

which are exactly the entries of the matrix product M.h/M. f / as introduced in Defini-
tion 1.1.11 (with some changes in the indices).
4. If f is an automorphism, then

f ı f 1 D f 1 ı f D IdL .E/ ;

whence, by using .3/,

M. f ı f 1 / D M. f /M. f 1 / D M. f 1 /M. f / D I:

The uniqueness of the inverse (Theorem 1.2.3) shows that M. f 1 / D .M. f //1 : t
u

Example 6.4
Consider the linear transformation f defined in Example 6.1 and the linear transformation
h W R3 ! R2 defined by

h.w; y; z/ D .w C y  z; y C z/:

Find the matrix associated to h ı f relative to the standard basis of R2 . 

Solution
We need first to find the matrix M.h/. We have
h.1; 0; 0/ D .1; 0/ D 1.1; 0/ C 0.0; 1/; h.0; 1; 0/ D .1; 1/ D 1.1; 0/ C 1.0; 1/

and

h.0; 0; 1/ D .1; 1/ D 1.1; 0/ C 1.0; 1/;

so
" #
1 1 1
M.h/ D :
01 1

Now, we need to find the matrix M.h ı f /. We have

h ı f W R2 ! R2

.u; v/ 7! .2u; u C 3v/:

Thus, we have

.hıf /.1; 0/ D .2; 1/ D 2.1; 0/C1.0; 1/ and .hıf /.0; 1/ D .0; 3/ D 0.1; 0/C3.0; 1/:

Consequently,
" #
20
M.h ı f / D :
13

On the other hand, we have


2 3
" # 1 1 " #
1 1 1 6 7 20
M.h/M. f / D 41 2 5 D D M.h ı f /:
01 1 13
0 1

J
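Point (3) of Theorem 6.1.1, which Example 6.4 illustrates, can also be checked numerically: composing the transformations and multiplying their matrices must give the same result. An illustrative NumPy sketch, not from the text:

```python
import numpy as np

# M(f): matrix of f(u, v) = (u - v, u + 2v, v) from Example 6.1 (standard bases).
Mf = np.array([[1, -1],
               [1,  2],
               [0,  1]])
# M(h): matrix of h(w, y, z) = (w + y - z, y + z) from Example 6.4.
Mh = np.array([[1, 1, -1],
               [0, 1,  1]])

# M(h o f) = M(h) M(f), which here equals [[2, 0], [1, 3]].
print(Mh @ Mf)

def h_of_f(u, v):
    w, y, z = (u - v, u + 2 * v, v)       # f(u, v)
    return np.array([w + y - z, y + z])   # h(f(u, v))

for (u, v) in [(1, 0), (0, 1), (2, -3)]:
    assert np.array_equal((Mh @ Mf) @ np.array([u, v]), h_of_f(u, v))
```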

ⓘ Remark 6.1.2 (Linear Transformation Associated to a Matrix) We have seen above


how to find the matrix associated to a linear transformation. Conversely, assume that we
have a matrix associated to a linear transformation f defined from E to F, where E and F
are as above. Then, we can easily see that

f .uj / D M. f /uj :

Example 6.5
Assume, for example, that we know the matrix M. f / defined in Example 6.1. Then
2 3
" # 1
1 6 7
f .1; 0/ D M. f / D 415:
0
0

and
2 3
" # 1
0 6 7
f .0; 1/ D M. f / D 4 2 5:
1
1

Now, to find f , let .u; v/ be an element of R2 . Then, we have

.u; v/ D u.1; 0/ C v.0; 1/:

Hence, by the linearity of f ,

f .u; v/ D uf .1; 0/ C vf .0; 1/ D u.1; 1; 0/ C v.1; 2; 1/ D .u  v; u C 2v; v/;

which is precisely the linear transformation given in Example 6.1. 

So, we have seen above, that for any linear transformation f in L .E; F/ we associate
a unique matrix M. f / in Mmn .K/ and for any matrix in Mmn .K/, we have a
unique linear transformation associated to it. So, clearly, the transformation T defined
as follows:

T W L .E; F/ ! Mmn .K/


f 7! T. f / D M. f /

is a bijection from L .E; F/ to Mmn .K/. In addition, as we have seen in Theorem 6.1.1,
this transformation is linear. Thus, it defines an isomorphism between L .E; F/ and
Mmn .K/. Thus, we summarize these in the following theorem.

Theorem 6.1.3 (L .E; F/ and Mmn are Isomorphic)


Let E and F be two finite-dimensional vector spaces over the same field K, and
denote dimK E D n and dimK F D m. Then the spaces L .E; F/ and Mmn .K/ are
isomorphic.

ⓘ Remark 6.1.4 Since dimK Mmn .K/ D m  n, Theorem 6.1.3 shows that,

dim_K L(E, F) = m · n = dim_K E · dim_K F.

6.2 Change of Basis and Similarity

We have seen that a finite-dimensional vector space may have more than one basis. So,
one question is: what happens to the matrix associated to a linear transformation if
we change bases? Also, we have seen in  Chap. 1 that it is much easier to deal with
diagonal matrices, especially when dealing with differential equations or powers of a
matrix or eigenvalues, as we will see in the coming chapters, since diagonal matrices
enjoy nice properties. So, suppose that we have a linear operator f in L .E/. Can we
choose a basis of E in which M(f) is a diagonal matrix? It is therefore important to
investigate the effect of a change of basis on the matrix M(f).
Let E be a vector space over a field K. Denote dimK E D n and let

B1 D fu1 ; u2 ; : : : ; un g and B2 D fv1 ; v2 ; : : : ; vn g

be two bases of E. We define the linear transformation

pWE!E
uj 7! p.uj / D vj ; 1  j  n:

As before, since vj 2 E, then it can be uniquely written as a linear combination of the


elements of B1 :

vj D p.uj / D ˛1j u1 C ˛2j u2 C    C ˛nj un ; 1  j  n;

where α_{ij}, 1 ≤ i, j ≤ n, are elements of K. Let Φ = Φ(p, B1, B2) be the matrix associated
to p. Then, we have
2 3
˛11 ˛12 ::: ˛1n
6 7
˛
6 21 ˛22 ::: ˛2n 7
D6
6 :: :: :: :: 7
7:
4 : : : : 5
˛n1 ˛n2 ::: ˛nn

It is quite obvious that p is an automorphism, so according to Theorem 6.1.1, Φ^{−1} exists
and satisfies

Φ^{−1} = Φ(p, B2, B1) = Φ(p^{−1}, B1, B2).    (6.3)

Thus, we have the following definition.

Definition 6.2.1 (Transition Matrix)


The matrix Φ = Φ(p, B1, B2) defined above is called the transition matrix from the basis
B1 to the basis B2.

Let us denote by Œvj Bk the components of vj in the basis Bk ; k D 1; 2. As we have


seen above, these components can be found using the formula

[v_j]_{B1} = Φ [v_j]_{B2}.

Similarly, if Œuj Bk are the components of uj with respect to the basis Bk ; k D 1; 2, then

[u_j]_{B2} = Φ^{−1} [u_j]_{B1}.

Example 6.6
Consider the vector space R3 and the two bases

B1 D fu1 ; u2 ; u3 g; u1 D .1; 1; 0/; u2 D .1; 1; 0/; u3 D .0; 2; 1/:

and

B2 D fv1 ; v2 ; v3 g; v1 D .1; 2; 0/; v2 D .1; 1; 2/; v3 D .1; 0; 3/:

Find the transition matrix from B1 to B2 . 

Solution
First, one can easily check that B1 and B2 are bases of R3 . Now, we need to find the
components of vj ; j D 1; 2; 3; in the basis B1 . We easily see that

3 1
v1 D u1 C u2 C 0u3 ;
2 2
v2 D u1  2u2 C 2u3 ;
7 5
v3 D  u1  u2 C 3u3 :
2 2

Thus, we obtain
2 3
3=2 1 7=2
6 7
D 4 1=2 2 5=2 5 :
0 2 3

Now, it is clear that

[v_1]_{B1} = (3/2, 1/2, 0)^T = Φ [v_1]_{B2},   [v_2]_{B1} = Φ [v_2]_{B2},   [v_3]_{B1} = Φ [v_3]_{B2}.

Since
2 3 2 3 2 3
1 0 0
6 7 6 7 6 7
Œv1 B2 D 405; Œv2 B2 D 415; Œv3 B2 D 405;
0 0 1

we have
2 3
2=7 8=7 9=7
6 7
1 D 4 3=7 9=7 4=7 5 :
2=7 6=7 5=7

Now, the columns of 1 are the components of uj ; j D 1; 2; 3, in the basis B2 . J
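The transition matrix can also be computed by solving linear systems: if U and V are the matrices whose columns are the vectors of B1 and B2 respectively, then U Φ = V, so Φ = U^{−1}V. The sketch below is illustrative; since the extraction loses minus signs, the signs of the basis vectors are an assumption chosen to be consistent with the matrix Φ found above.

```python
import numpy as np

# Basis vectors as columns (assumed signs, consistent with Phi above).
U = np.column_stack([(1, 1, 0), (-1, 1, 0), (0, 2, 1)])   # B1 = {u1, u2, u3}
V = np.column_stack([(1, 2, 0), (1, 1, 2), (-1, 0, 3)])   # B2 = {v1, v2, v3}

# Column j of Phi holds the coordinates of v_j in the basis B1, i.e. U @ Phi = V.
Phi = np.linalg.solve(U, V)
print(Phi)                                # first column is (3/2, 1/2, 0), as in the example

# Coordinates transform by [x]_B1 = Phi [x]_B2.
x_B2 = np.array([1.0, 0.0, 0.0])          # v1 has coordinates e1 in B2
assert np.allclose(U @ (Phi @ x_B2), V @ x_B2)
```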

Example 6.7
Consider P2 to be the set of all polynomials with real coefficients and of degree less than or
equal to 2. We have seen in Example 4.19, that

B1 D f1; x; x2 g

is a basis of P2 . Consider the set

B2 D f1; 1 C x; 1 C x C x2 g:

1. Show that B2 is a basis of P2 .


2. Find the transition matrix from B1 to B2 .
3. Find the components of q.x/ D 3 C 2x C 4x2 relative to the basis B2 .


Solution
1. By Theorem 4.6.5, it is enough to show that B2 is a linearly independent set. So, let ˛; ˇ
and  be real numbers satisfying, for all x 2 R,

˛ C ˇ.1 C x/ C .1 C x C x2 / D 0:

Taking some particular values of x, such as .x D 0; x D 1; x D 1/; we get the system of
equations
8
ˆ
< ˛ C ˇ C  D 0;
˛ C  D 0;

˛ C 2ˇ C 3 D 0:

It is clear that this system has the unique solution ˛ D ˇ D  D 0: Thus, B2 is a linearly
independent set and hence a basis of P2 .
2. To find the transition matrix from B1 to B2 , we need to find the components of the
elements of B2 with respect to B1 . We have

1 D 1 C 0x C 0x2 ;
1 C x D 1 C x C 0x2 ;
1 C x C x2 D 1 C 1x C 1x2 :

Thus,
2 3
111
6 7
D 40 1 15:
001

We can also easily see that the transition matrix from B2 to B1 is


2 3
1 1 0
6 7
1 D 4 0 1 1 5 :
0 0 1

3. We can write the components of q.x/ with respect to B1 as


2 3
3
6 7
Œq.x/B1 D 425:
4

Then, the components of q.x/ with respect to B2 are


2 32 3 2 3
1 1 0 3 1
6 76 7 6 7
Œq.x/B2 D 1 Œq.x/B1 D 4 0 1 1 5 4 2 5 D 4 2 5 :
0 0 1 4 4

Now, let E and F be two finite-dimensional vector spaces over the same field K.
Denote dimK E D n and dimK F D m. Let B be a basis of E and let S1 and S2 be
two bases of F. Let f be a linear transformation from E to F and let M(f, B, S1) be the
corresponding matrix of f with respect to the bases B and S1 . Then, the question is: how
to find M. f ; B; S2 /?

(E, B) —f→ (F, S1) —p→ (F, S2),   with composite p ∘ f.

Let p be the linear operator that transform the elements of S1 into the element of S2 as
shown above and . p; S1 ; S2 / be its corresponding matrix. Then, Theorem 6.1.1, yields

M. f ; B; S2 / D M. p ı f ; B; S2 / D . p; S1 ; S2 /M. f ; B; S1 /: (6.4)

Example 6.8
Consider the linear transformation f defined as

f W R2 ! .R3 ; B1 /

.x; y/ 7! .x  y; x C y; y/;

where in R2 we consider the standard basis. Let p be the linear operator defined from R3 to
R3 as

p.uj / D vj ; j D 1; 2; 3;

where uj , vj and B1 are given in Example 6.6. Find M. pıf /, the matrix associated to p ıf . 

Solution
First, we need to find M. f /. So, we need to find the components of f .1; 0/ and f .0; 1/ with
respect to the basis B1 . We have

f .1; 0/ D .1; 1; 0/ D u1 C 0u2 C 0u3

and

f .0; 1/ D .1; 1; 1/ D u1 C 2u2  u3 :


Thus,
2 3
1 1
6 7
M. f / D 4 0 2 5 :
0 1

Using (6.4), we have

M(p ∘ f) = Φ M(f)
2 32 3 2 3
3=2 1 7=2 1 1 3=2 3
6 76 7 6 7
D 4 1=2 2 5=2 5 4 0 2 5 D 4 1=2 1 5 :
0 2 3 0 1 0 1

Now, let E and F be two finite-dimensional vector spaces over the same field K,
with dimK E D n and dimK F D m. Let B1 and B2 be two bases of E and S1 and S2
be two bases of F. Let f be a linear transformation from E into F and M. f ; B1 ; S1 /
be the corresponding matrix to f with respect to the bases B1 and S1 . We want now
to find M. f ; B2 ; S2 /, the corresponding matrix of f with respect to B2 and S2 . So, let
. p1 ; B1 ; B2 / be the transition matrix from B1 to B2 and . p2 ; S1 ; S2 / be the transition
matrix from S1 to S2 , as shown in the following diagram:

(E, B2) —p_1^{−1}→ (E, B1) —f→ (F, S1) —p_2→ (F, S2),   with composite p_2 ∘ f ∘ p_1^{−1}.

Then, it is clear that

M. f ; B2 ; S2 / D M. p2 ı f ı p1
1 ; B2 ; S2 /

D . p2 ; S1 ; S2 //  M. f ; B1 ; S1 /  . p1
1 ; B1 ; B2 /; (6.5)
1
D . p2 ; S1 ; S2 //  M. f ; B1 ; S1 /  . p1 ; B1 ; B2 /;

where we have used (6.3). The above formula can be rewritten as

M. f ; B1 ; S1 / D 1 . p2 ; S1 ; S2 //  M. f ; B2 ; S2 /  . p1 ; B1 ; B2 /:

Two matrices satisfying (6.5) are called equivalent and we may give a general definition
of two equivalent matrices as follows.

Definition 6.2.2 (Equivalent Matrices)


Let A and B be two matrices in Mmn .K/. We say that A and B are equivalent if
there exist two invertible matrices R in Mm .K/ and S in Mn .K/ such that

B D RAS:

ⓘ Remark 6.2.1 It is clear that the relation A is equivalent to B is an equivalence relation


on the set of matrices Mmn .K/, since:
▬ A is equivalent to A (reflexivity).
▬ If A is equivalent to B, then B is equivalent to A (symmetry).
▬ If A is equivalent to B and B is equivalent to C, then C is equivalent to A (transitivity).

One of the important cases in (6.5) is when E D F, S1 D B1 and S2 D B2 ; then f is


an endomorphism, p1 D p2 , and we have

M. f ; B1 ; B1 / D 1 . p1 ; B1 ; B2 //  M. f ; B2 ; B2 /  . p1 ; B1 ; B2 /:

In this case, we say that the matrix M. f ; B1 ; B1 / is similar to the matrix M. f ; B2 ; B2 /


and we can give a more general definition as follows:

Definition 6.2.3 (Similarity)


Let A and B be two square matrices in Mn .K/. Then, A and B are called similar or
conjugate if there exists an invertible matrix P in Mn .K/ such that

B D P1 AP: (6.6)

It is clear that two similar matrices are equivalent. As we have seen above, similar
matrices represent the same endomorphism with respect to two different bases. The
matrix P is also called a change of bases matrix.
As indicated in the beginning of  Sect. 6.2, one of the main goals of the change of
bases is to transform matrices to diagonal form. Formula (6.6) is a very important tool to
obtain diagonal matrices from some square matrices called diagonalizable matrices; the
process is known as diagonalization of matrices. We will see in the coming chapters that
in this case A and B have something in common. For instance, similar matrices share the
same eigenvalues. We will come back to this in details in the next chapter, but just to
clarify the ideas, we give the following example.

Example 6.9
Consider in M3 .R/ the matrix
2 3
200
6 7
A D 40 3 45:
049
Take two bases in R3 :

B1 Dfe1 ; e2 ; e3 g; and B2 Dfu1 ; u2 ; u3 g; u1 D.0; 2; 1/; u2 D.1; 0; 0/; u3 D.0; 1; 2/:

Find the matrix B D P1 AP, where P is the transition matrix from B1 to B2 . 

Solution
It is clear that since B1 is the standard basis of R3 ,
2 3
0 10
6 7
P D Œu1 ; u2 ; u3  D 4 2 0 1 5 :
1 02

Now, we may easily check, by using the methods in  Chap. 1 (for instance), that
2 3
0 2=5 1=5
6 7
P1 D 4 1 2=5 4=5 5 :
0 1=5 2=5

Thus, we have by a simple computation


2 3
10 0
6 7
B D P1 AP D 4 0 2 0 5 :
0 0 11

It is clear that B is a diagonal matrix. This method of obtaining B from the matrix A is a very
important topic in linear algebra and linear programming. J
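The computation of Example 6.9 can be verified numerically. Because minus signs are lost in the printed vectors, the sketch below fixes one sign-consistent reading of the data (u1 = (0, 2, −1), u2 = (1, 0, 0), u3 = (0, 1, 2)); with this reading, P^{−1}AP is indeed the diagonal matrix diag(1, 2, 11). Illustrative NumPy code, not part of the text:

```python
import numpy as np

A = np.array([[2, 0, 0],
              [0, 3, 4],
              [0, 4, 9]], dtype=float)

# Transition matrix from the standard basis to B2: columns are u1, u2, u3
# (one sign-consistent reading of the example).
P = np.column_stack([(0, 2, -1), (1, 0, 0), (0, 1, 2)]).astype(float)

B = np.linalg.inv(P) @ A @ P
print(np.round(B, 10))                      # diag(1, 2, 11)
assert np.allclose(B, np.diag([1.0, 2.0, 11.0]))
```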

6.3 Rank of Matrices

We have defined in  Sect. 5.2.1 the rank of a linear transformation to be the dimension
of its image. Also, we have seen in  Chap. 6 that any linear transformation between
two finite-dimensional vector spaces can be represented by a matrix. So, now, in order
to define the rank of a matrix, we have to define first a subspace corresponding to the
image of a linear transformation. This subspace is known as the column space of the
matrix.
So, let E and F be two finite-dimensional vector spaces over the same field K
and let f be a linear transformation from E into F, as in Definition 6.1.1. Then,
once bases are chosen in E and F, one can associate a unique matrix to this linear
transformation. Conversely, we have also seen in Remark 6.1.2 that we can always
associate a unique linear transformation to a given matrix. So, let M. f / be the matrix
given in Definition 6.1.1. We define the set

R .M. f // D fM. f /u; with u 2 Eg D Im. f /:



As we have seen in Theorem 5.2.8, this set is a subspace of F. Thus, the rank of M. f / is
the rank of f and it is the dimension of R .M. f //, and we have the following definition.

Definition 6.3.1 (The Rank of a Matrix)


Let E and F be two vector spaces over the same field K, with dimK E D n and
dimK F D m. The rank of A is defined to be the dimension of the column space
R .A/ defined as

R .A/ D fAu; with u 2 Eg:


and we write

rank.A/ D dimK R .A/:

ⓘ Remark 6.3.1 From above and since R .A/ is a subspace of F, Theorem 4.6.7 shows
that rank.A/  dimK F D m:
In addition, recall from Example 4.12 that the null space N .A/ of the matrix A is
defined as

N .A/ D fu 2 E; such that Au D 0F g D Ker. f /:

Thus, using Theorem 5.2.10, we deduce that

n D dimK E D rank.A/ C null.A/:

This shows that rank.A/  n. Consequently,

rank.A/  min.n; m/: (6.7)

If rank.A/ D min.n; m/, then we say that A is a matrix of full rank.
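Both statements of this remark — inequality (6.7) and the identity rank(A) + null(A) = n — can be illustrated numerically; a basis of the null space N(A) can be read off from the singular value decomposition. The following sketch is illustrative and not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 6
A = rng.integers(-3, 4, size=(m, n)).astype(float)

rank = np.linalg.matrix_rank(A)
assert rank <= min(m, n)                       # inequality (6.7)

# A basis of the null space N(A): rows of Vt associated with zero singular values.
_, _, Vt = np.linalg.svd(A)
null_basis = Vt[rank:]                         # (n - rank) vectors of R^n
assert np.allclose(A @ null_basis.T, 0)        # each of them satisfies Au = 0
assert rank + null_basis.shape[0] == n         # rank(A) + null(A) = n
```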

One of the main goals in  Chaps. 1 and 2 was to determine whether a matrix is
invertible or not. One of the requirements for A to be invertible was det.A/ ¤ 0, (see
Theorem 2.4.8). Here we can easily deduce an equivalent invertibility criterion as given
in the following theorem.

Theorem 6.3.2 (Rank Criterion for Invertibility)


Let A be a square matrix in Mn .K/. Then, A is invertible if and only rank.A/ D n.
Proof
The assertion is obvious since in this case, the linear transformation associated to A
is surjective and therefore according to Theorem 5.3.3 it is an isomorphism, and thus
invertible. t
u

6.3.1 Some Properties of the Rank of a Matrix

Next, we discuss some properties of the rank of a matrix. To clarify the ideas, we assume
that E D Kn , F D Km and, for simplicity, take K D R, but the method works for any
field K. In Rn and Rm we use the standard bases, that is

BRn D fe1 ; e2 ; : : : ; en g and BRm D fe1 ; e2 ; : : : ; em g:

Let A be a matrix in Mmn .K/ and f be the linear transformation associated to it. It is
clear that the components of f .ej /; 1  j  n, with respect to the basis BRm form the jth
column of A. Moreover, the rank of f is the number of the linearly independent columns
of A and we have the following definition.

Definition 6.3.2
Let A be a matrix in Mmn .K/. Then, the rank of A is equal to the largest number of
columns of A which are linear independent.

So, now the question is: how to find the linearly independent columns of a matrix?
If n D m, then we have seen that a square matrix A in Mn .K/ is invertible if and
only if one of the following conditions is satisfied:
1. det.A/ ¤ 0,
2. rank.A/ D n.

So, necessarily these two conditions should be equivalent and thus, we have the
following theorem.

Theorem 6.3.3
Let A be a square matrix in Mn .K/. Then,

rank.A/ D n

if and only if det.A/ ¤ 0.

Proof
We have seen in Exercise 4.5 that the columns of A are linearly independent if and only if
det.A/ ¤ 0. This means that rank.A/ D n if and only if det.A/ ¤ 0. u
t

Theorem 6.3.3 shows that in the case where det.A/ ¤ 0, the problem of finding the
rank of A is completely solved: it is equal to the size of the square matrix A.
Combining Theorem 6.3.3 and Theorem 2.4.8, we easily deduce another invertibility
criterion.

Theorem 6.3.4
Let A be a square matrix in Mn .K/. Then, A is invertible if and only if

rank.A/ D n:
We have seen in Theorem 2.3.1 that det.A/ D det.AT /; in particular, det.A/ ¤ 0 if
and only if det.AT / ¤ 0. In this case, and according to Theorem 6.3.3,

rank.A/ D rank.AT /: (6.8)

Since the columns of AT are the rows of A, then we can say that the rank of A is also the
maximal number of its linearly independent rows. In fact (6.8) is true for any matrix in
Mmn .K/ and the fact that the rank of a matrix A is equal to the rank of its transpose AT
is one of the important theorems in linear algebra. The space R .AT / is also called the
row space of A. We have the following assertion.

Theorem 6.3.5
Let A be a matrix in Mmn .K/. Then we have

rank.A/ D rank.AT /:

Proof
Actually, several proofs of the above theorem are available. Here we present a simple proof
from [16], assuming that K D R (the same proof also works for K D C). First, it is clear that
if Y is a vector in Rn , then

Y T Y D kYk2 D 0; (6.9)

if and only if Y D 0Rn , (Theorem 3.3.1). Hence, using (6.9), we deduce, by taking Y D AX,
that

AX D 0Rm

if and only if AT AX D 0Rn for any vector X in Rn . Using this last property, we see
that the vectors AX1 ; AX2 ; : : : ; AXk are linearly independent if and only if the vectors
AT AX1 ; AT AX2 ; : : : ; AT AXk are linearly independent, where X1 ; X2 ; : : : ; Xk are vectors in Rn .
Consequently, we deduce that

rank.A/ D rank.AT A/: (6.10)

Now, we have

rank(A) = rank(A^T A) = dim_R {A^T(AX) : X ∈ R^n} ≤ dim_R {A^T Y : Y ∈ R^m} = rank(A^T).

On the other hand, since .AT /T D A, we have

rank.AT /  rank..AT /T / D rank.A/:

This finishes the proof of Theorem 6.3.5. t


u

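The two identities used in the proof, (6.10) and rank(A) = rank(A^T), are easy to confirm on random matrices. An illustrative NumPy sketch, not from the text:

```python
import numpy as np

rng = np.random.default_rng(2)
for _ in range(100):
    m, n = rng.integers(1, 7, size=2)
    A = rng.integers(-2, 3, size=(m, n)).astype(float)
    r = np.linalg.matrix_rank(A)
    assert r == np.linalg.matrix_rank(A.T)       # Theorem 6.3.5
    assert r == np.linalg.matrix_rank(A.T @ A)   # identity (6.10)
```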
It is natural to expect that equivalent matrices share some algebraic properties. So,
do equivalent matrices have the same rank? In fact this turns out to be true.

Theorem 6.3.6 (Equivalent Matrices and Rank)


Let A and B be two equivalent matrices. Then

rank.A/ D rank.B/:

Proof
As we have seen in (6.5), if two matrices are equivalent, then they represent the same linear
transformation, with respect to different bases and so they have the same rank. t
u

This theorem has the following immediate corollary.

ⓘ Corollary 6.3.7 (Similarity and Rank) Let A and B be two similar matrices in
Mn .K/. Then

rank.B/ D rank.A/:

ⓘ Remark 6.3.8 The opposite of Corollary 6.3.7 is not true: The fact that two matrices
have the same rank does not imply that they are similar. For example, the matrices
2 3 2 3
0 0 0 0 0 1 0 0
6 7 6 7
60 0 1 07 60 0 0 07
AD6 7 and BD6 7
40 0 0 15 40 0 0 15
0 0 0 0 0 0 0 0

have the same rank equal to 2, but they are not similar. Indeed, we can easily see that
A2 ¤ 0M4 .R/ , whereas B2 D 0M4 .R/ . So, if there exists an invertible matrix P such that
B D P1 AP, then

A^2 = PBP^{−1}PBP^{−1} = PB^2 P^{−1} = 0_{M_4(R)},

which is a contradiction. Thus, A and B are not similar.

6.4 Methods for Finding the Rank of a Matrix


Definition 6.3.1 does not necessarily make the computation of the rank of a matrix easy
because finding the dimension of the space spanned by the column vectors of a matrix
could be difficult. However, there are several methods that can be applied to calculate
the rank of a matrix. Here we introduced two of them, that we believe are the most
effective and easy.

6.4.1 The Method of Elementary Row and Column Operations

Here we introduce a method for finding the rank of a matrix A in Mmn .K/. Given A,
we can change bases in Kn in such a way that there exists 1  r  min.m; n/ such that
the matrix A can be written (in the new basis) as

A = [ I_r           0_{r×(n−r)}
      0_{(m−r)×r}   0_{(m−r)×(n−r)} ],    (6.11)

where 0pq is the p  q matrix with all its entries equal to zero and Ir is the r  r identity
matrix.
We have seen in  Sect. 6.2 that the change of basis is equivalent to the multiplica-
tion of the original matrix by an invertible matrix. Since the form (6.11) can be obtained
by a series of changes of bases, the procedure is equivalent to the multiplication of the
original matrix by a series of invertible matrices. That is, for any matrix A in Mmn .K/,
we can show that there exists a finite sequence of invertible matrices E1 ; E2 ; : : : ; EkCs
such that the matrix

Ek Ek1    E1 AEkC1 EkC2    EkCs (6.12)

has the form (6.11). It is clear that the matrices on the left-hand side of (6.12) belong to
Mm .K/ and the matrices in the right-hand side belong to Mn .K/. Of course according
to Theorem 6.3.6, this operation does not change the rank of the matrix, since all the
matrices that we obtain by means of such a multiplication will be equivalent.
Now, once the form (6.11) is obtained, then it is clear that rank.A/ D r, and we can
also show that each matrix of rank r is equivalent to the matrix in (6.11).
So, the question now is: how to choose the matrices in (6.12) so as to reach the form
(6.11)?
To clarify this, consider the example
" #
1 3 1
AD :
01 7

Multiplying A from the left by the matrix


" #
11
E1 D
01

we get
" #
146
A1 D E1 A D :
017

It is clear that E1 is invertible. Also, multiplication by E1 from the left is clearly


equivalent to the row operation r1 C r2 , where r1 is the first row in the matrix A and
r2 is the second row. Now, replacing the first row r1 in A1 by r1  4r2 , we get
" #
1 0 22
A2 D :
01 7

This is equivalent to multiply A1 by the matrix


" #
1 4
E2 D :
0 1

Thus, we have A2 D E2 A1 . Also, it is clear that E2 is invertible.


Next, multiplying A2 from the right by the matrix
2 3
1 0 22
6 7
E3 D 4 0 1 0 5 ;
00 1

we get
" #
100
A3 D A2 E3 D :
017

This last operation is equivalent to replace c3 by 22c1 C c3 , where c1 ; c2 , and c3 are,


respectively, the first, the second, and the third columns of A2 . Finally, replacing c3 in

A3 by c3  7c2 , we get
" #
100
A4 D : (6.13)
010

This last operation is also equivalent to multiply A3 from the right by the matrix
2 3
10 0
6 7
E4 D 4 0 1 7 5 :
0 0 1

Summarizing all the above operations, we obtain

A4 D E2 E1 AE3 E4 :

It is clear that all the above matrices Ei ; 1  i  4 are invertible. Now, if we put
R D E2 E1 and S D E3 E4 , then we obtain

A4 D RAS:

This means that the matrices A4 and A are equivalent, so they have the same rank. It is
clear that A4 is in the form (6.11), with r D 2. Consequently,

rank.A/ D rank.A4 / D 2:

What we have done above is a series of changes of bases in the vector spaces R2
and R3 to find the appropriate bases in these spaces in which A has the final form (6.13).
Also, as we have seen above, to find these appropriate bases, we need to perform some
row and column operations on the matrix. Basically, these row and column operations
are:
▬ Multiply a row (or a column) through by a nonzero constant.
▬ Interchange two rows (or two columns).
▬ Add a constant times one row to another row (or a constant times one column to
another column).

These operations are called elementary row (respectively, column) operations.
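The reduction carried out above can be automated. The sketch below (illustrative only, not part of the text) uses elementary row operations alone — interchanging rows, scaling a row, and adding a multiple of one row to another — and returns the number of pivots, which is the rank one would also read off from the form (6.11).

```python
import numpy as np

def rank_by_row_reduction(A, tol=1e-10):
    """Rank of A computed by elementary row operations (Gaussian elimination)."""
    M = np.array(A, dtype=float)
    rows, cols = M.shape
    r = 0                                             # index of the next pivot row
    for c in range(cols):
        p = r + np.argmax(np.abs(M[r:, c]))           # choose the largest pivot
        if abs(M[p, c]) < tol:
            continue                                  # no pivot in this column
        M[[r, p]] = M[[p, r]]                         # interchange two rows
        M[r] = M[r] / M[r, c]                         # scale the pivot row
        for i in range(rows):
            if i != r:
                M[i] -= M[i, c] * M[r]                # add a multiple of the pivot row
        r += 1
        if r == rows:
            break
    return r

# The 2 x 3 matrix of the worked example above (the entry -1 is how I read the print).
A = np.array([[1, 3, -1],
              [0, 1,  7]])
print(rank_by_row_reduction(A))                       # 2, as found above
```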

Example 6.10
Find the rank of the matrix
2 3
0 1 2 1
6 7
A D 4 2 0 0 6 5 :
4 2 4 10


Solution
Denote the columns of A and the matrices obtained by means of elementary operations on A
by cj ; 1  j  4, and their rows by ri ; 1  i  3. First, keeping in mind (6.7), we deduce
that rank.A/  3: Now, our goal is to perform some row or column elementary operations so
as to obtain a matrix of the form (6.11). Interchanging r1 and r2 , we get
2 3
2 0 0 6
6 7
A1 D 4 0 1 2 1 5 ;
4 2 4 10

and replacing r3 by r3 C 2r2 in A1 , we get


2 3
2 0 0 6
6 7
A2 D 4 0 1 2 1 5 :
0 2 4 2

Now, replacing r3 by r3 C 2r2 in A2 , we obtain


2 3
2 0 0 6
6 7
A3 D 4 0 1 2 1 5 :
0 00 0

Next, replacing c3 by c3  2c2 in A3 we obtain


2 3
2 0 0 6
6 7
A4 D 4 0 1 0 1 5 ;
0 00 0

and replacing c4 by c4 C 3c1 in A4 , we get


2 3
2 0 0 0
6 7
A5 D 4 0 1 0 1 5 :
0 00 0

Now, replacing c4 by c4 C c2 in A5 , we get


2 3
2 0 0 0
6 7
A6 D 4 0 1 0 0 5 ;
0 000

and finally, replacing here c1 by  12 c1 , we get


2 3
1000
6 7
A7 D 4 0 1 0 0 5 :
0000

Thus, we have brought A to the form (6.11) with r D 2. Consequently,

rank.A/ D rank.A7 / D 2:

ⓘ Remark 6.4.1 As we have seen in Example 6.10 it can be a long process to reach the
form (6.11), especially if the size of the matrix is large. But if on the way we found the
rank of the matrix, then we can stop even before finding the final form (6.11), since our
main goal is to find the rank of the matrix, not writing it as in (6.11). For instance, in
Example 6.10, we can easily determine the rank from A2 since in A2, r2 and r3 are
linearly dependent and r1 and r2 are linearly independent. This gives
rank.A2 / D 2 D rank.A/. So, in the process of finding the form (6.11), by applying
elementary row (or column) operations, it is very helpful to check, after each step, for
linearly independent columns and linearly independent rows, since this together with
the above theorems may help us finding the rank of a matrix quickly without even
reaching the final form (6.11). We illustrate this in the following example.

Example 6.11
Find the rank of the matrix
2 3
2 3 4
6 7
6 3 1 5 7
AD6 7:
4 1 0 1 5
0 2 4

Solution
First, using (6.7), we deduce that rank.A/  3. Our goal now is to perform some elementary
row and column operations so as to obtain a matrix in the form (6.11). We interchange the
second row r2 and the fourth row r4 , and obtain
2 3
2 3 4
6 0 2 4 7
6 7
A1 D 6 7:
4 1 0 1 5
3 1 5

Now, replacing the third column c3 in A1 by c1  c3 , we get


2 3
2 3 6
6 0 2 4 7
6 7
A2 D 6 7:
4 1 0 0 5
3 1 2
Interchanging here r1 and r3 , we get
2 3
1 0 0
6 7
6 0 2 4 7
A3 D 6 7:
4 2 3 6 5
3 1 2

We see immediately that c3 D 2c2 . Thus, we deduce that rank.A3 /  2. On the other hand,
we see that c1 and c2 are linearly independent. Then rank.A3 /  2 (since the rank is the
maximal number of linearly independent columns). Thus, we deduce that rank.A/ D 2. J

Example 6.12
Find the rank of the matrix
2 3
2 4 1
6 7
A D 4 1 2 0 5 :
0 5 3

Solution
Since A is a square matrix, we may first calculate the determinant of A and get

det.A/ D 29:

Since det.A/ ¤ 0, Theorem 6.3.3 implies that rank.A/ D 3. J

6.4.2 The Method of Minors for Finding the Rank of a Matrix

In Example 6.12, we have seen that when det.A/ ¤ 0, the problem of finding the
rank of A is easily solved. This approach is more convenient than the elementary row
and column operations in Example 6.10. But up until now, it seems to work only for
invertible square matrices. So, it is natural to look for a possible extension of this
approach to matrices that are not necessarily square or invertible. In fact, it turns out that
this method can be applied to an arbitrary matrix and we have the following theorem.

Theorem 6.4.2 (Minors and Rank)


Let A be a matrix in Mmn .K/. Then, the rank of A is equal to r; 1  r  min.m; n/, if
and only if there exists a nonzero minora of order r which is the largest nonzero minor.
a
See Definition 2.12 for the definition of the minors of a matrix.

Proof
Let A be the matrix written in the standard bases of Kn and Km as
2 3
a11 a12 a13 ::: a1n
6a ::: a2n 7
6 21 a22 a23 7
AD6
6 :: :: :: :: :: 7
7:
4 : : : : : 5
am1 am2 am3 : : : amn

Assume first that rank.A/ D r. Then, there exist r linearly independent column vectors
v1 ; v2 ; : : : ; vr (without loss of generality, we assume that these vectors are the first r columns
of A). Let B1 D fe1 ; e2 ; : : : ; em g be the standard basis of Km .
First, if r D m  n, then B2 D fv1 ; v2 ; : : : ; vm g constitutes a basis of Km and we have

X
m
vj D aij ei ; 1  j  mI
iD1

hence, the transition matrix from the basis B1 to the basis B2 is


2 3
a11 a12 ::: a1m
6a ::: a2m 7
6 21 a22 7
D6
6 :: :: :: :: 7
7:
4 : : : : 5
am1 an2 ::: amm

It is clear that Φ is invertible, so det(Φ) ≠ 0. Also, det(Φ) is a minor of A, since the matrix Φ
can be obtained from A by removing the last n − m columns.
Second, if r < m, since the elements of B2 are linearly independent, Theorem 4.6.8 shows
that there exist vrC1 ; : : : ; vm such that (we are allowed to choose vj D ej ; r C 1  j  m)

S D fv1 ; v2 ; : : : ; vr ; erC1 ; : : : ; em g

is a basis of Km . In this case, the transition matrix from B to S takes the form
2 3
a11 a12 : : : a1r 0 ::: 0
6 : : : a2r 0 ::: 07
6 a21 a22 7
6 :: :: :: :: :: :: :: 7
6 7
6 : : : : : : :7
6 7
D6
6 ar1 an2 : : : arr 0 ::: 07 7:
6 7
6 a.rC1/1 a.rC1/2 : : : a.rC1/r 1 ::: 07
6 7
6 :: :: :: :: :: :: :: 7
4 : : : : : : :5
am1 am2 : : : amr 0 ::: 1
Since S is a basis of K^m, Φ is invertible and det(Φ) ≠ 0. This determinant can be computed
by expanding along the last columns (m − r times) as
2 3
a11 a12 ::: a1r
6a ::: a2r 7
6 21 a22 7
det. / D det 6
6 :: :: :: :: 7
7:
4 : : : : 5
ar1 an2 ::: arr

Therefore, this minor is not zero.


Conversely, assume that there exists a nonzero minor of order r of the matrix A and that
the minor corresponding to the rows i1 ; i2 ; : : : ; ir is nonzero, that is the determinant of the
matrix
2 3
ai1 1 ai1 2 : : : ai1 r
6 7
6 ai2 1 ai2 2 : : : ai2 r 7
BD6 6 :: :: :: :: 7
7
4 : : : : 5
air 1 air 2 : : : air r

is not zero (det.B/ ¤ 0). Now, to show that v1 ; v2 ; : : : ; vr are linearly independent, we take
1 ; 2 ; : : : ; r in K such that

1 v1 C 2 v2 C    C r vr D 0Km ; (6.14)

and we need to show that 1 D 2 D    D r D 0K .


Now, as before, each vj ; 1  j  r can be written as a linear combination of the elements
of the basis B1 D fe1 ; e2 ; : : : ; em g as

X
m
vj D aij ei ; 1  j  r: (6.15)
iD1

Taking into account (6.15), then (6.14) can be expressed as a linear system of the form
8
ˆ a11 1 C a12 2 C    C a1r r D 0;
ˆ
ˆ
ˆ
< a21 1 C a22 2 C    C a2r r D 0;
ˆ :::
ˆ
ˆ

am1 1 C am2 2 C    C amr r D 0:

This leads, by retaining the rows i1 ; i2 ; : : : ; ir ,


8
ˆ ai1 1 1 C ai1 2 2 C    C ai1 r r D 0;
ˆ
ˆ
ˆ
< ai2 1 1 C ai2 2 2 C    C ai2 r r D 0;
ˆ :::
ˆ
ˆ

air 1 1 C air 2 2 C    C air r r D 0:

This last system has a unique solution 1 D 2 D    D r D 0K , since the matrix B is


invertible. This shows that v1 ; v2 ; : : : ; vr are linearly independent, and thus rank.A/ D r.
This completes the proof of Theorem 6.4.2. t
u

Example 6.13
Use Theorem 6.4.2 to find the rank of the matrix
2 3
2 3 4
6 7
6 3 1 5 7
AD6 7:
4 1 0 1 5
0 2 4

Solution
First, according to (6.7), it is clear that rank.A/  3. We may easily check that the
determinants of all the 3  3 submatrices are zero. On the other hand, we have
" #
3 1
det D 1 ¤ 0:
1 0

Thus, rank.A/ D 2. J

Example 6.14
Consider the matrix
2 3
a 2 1 b
6 7
A D 4 3 0 1 4 5 ;
5 4 1 2

where a and b are real numbers.


1. Show that for any real numbers a and b, we have rank.A/  2.
2. Find the values a and b for which rank.A/ D 2.

Solution
1. Using (6.7), we deduce that rank.A/  3: To show that rank.A/  2 and according to
Theorem 6.4.2, it suffices to find a nonzero minor of order 2. Indeed, we have
" #
30
det D 12 ¤ 0:
54

Then, rank.A/  2 for all values a and b.


2. To find the values a and b for which rank.A/ D 2 we need to look for all values a and
b that make the first row a linear combination of the second and the third rows. So, let  and
 be two real numbers such that

.a; 2; 1; b/ D .3; 0; 1; 4/ C .5; 4; 1; 2/:

This leads to the system of equations


8
ˆ
ˆ 3 C 5 D a;
ˆ
< 4 D 2;
ˆ
ˆ    D 1;

4 C 2 D b;

which gives  D 1=2;  D 1=2; a D 1 and b D 3. So, if a D 1 and b D 3, then


rank.A/ D 2, otherwise, the three rows will be linearly independent and rank.A/ D 3. J
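Theorem 6.4.2 translates directly into a brute-force algorithm: the rank is the largest r for which some r × r submatrix has a nonzero determinant. The sketch below is illustrative only (it inspects every square submatrix, so it is practical only for small matrices) and cross-checks the result against NumPy's built-in rank on random examples.

```python
import numpy as np
from itertools import combinations

def rank_by_minors(A, tol=1e-10):
    """Largest r such that some r x r minor of A is nonzero (Theorem 6.4.2)."""
    m, n = A.shape
    for r in range(min(m, n), 0, -1):
        if any(abs(np.linalg.det(A[np.ix_(rows, cols)])) > tol
               for rows in combinations(range(m), r)
               for cols in combinations(range(n), r)):
            return r
    return 0

rng = np.random.default_rng(3)
for _ in range(50):
    A = rng.integers(-2, 3, size=(3, 4)).astype(float)
    assert rank_by_minors(A) == np.linalg.matrix_rank(A)
```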

6.5 Exercises

Exercise 6.1
Consider in M3 .R/ the matrix
2 3
a0b
6 7
A D 4b a 05:
0ba

Find the rank of A according to the values of a and b. 

Solution
First, if a D b D 0, then rank.A/ D 0. Also, by (6.7), rank.A/  3. On the other hand, we
have

det.A/ D a3 C b3 :

Hence, if a ≠ −b, then det(A) ≠ 0, and Theorem 6.3.3 shows that rank(A) = 3.
Now, if a = −b ≠ 0, then A becomes
2 3
a 0 a
6 7
A D 4 a a 0 5 :
0 a a

Adding the first column to the last column, we get


2 3
a 0 0
6 7
A1 D 4 a a a 5 :
0 a a

Clearly the second and third columns are linearly dependent. Hence, rank.A1 /  2. On the
other hand, since a ¤ 0, then the first and second columns are linearly independent. Thus,
rank(A1) ≥ 2. Consequently, if a = −b ≠ 0, then rank(A) = rank(A1) = 2. J
Exercise 6.2 (A Property of a Matrix of Rank 1)
1. Let A and B be two matrices in Mmn .K/. Show that if rank.B/ D 1, then

j rank.A C B/  rank.A/j  1: (6.16)

2. Let A be a square matrix in Mn .K/ and B be a matrix in Mnm .K/. Show that if A is
invertible, then

rank.AB/ D rank.B/:

Solution
1. Since any matrix can be represented by a linear transformation and its rank is the rank
of this transformation, then all the properties for rank and nullity of linear transformations
obtained in  Chap. 5 remain true for matrices. So, it is clear from Exercise 5.7 that

rank.A C B/  rank.A/ C rank.B/:

This means, since rank.B/ D 1, that

rank.A C B/  rank.A/  1: (6.17)

On the other hand, Exercise 5.7 also shows that

rank.A/  rank.B/  rank.A C B/:

Again, since rank.B/ D 1, it follows that

rank.A C B/  rank.A/  1: (6.18)

Combining (6.17) and (6.18), we obtain inequality (6.16).


2. It is clear that (5.21) can be written (in matrix form) as

rank.AB/  min.rank.A/; rank.B//: (6.19)


Since A is invertible, we have B D A1 AB, so applying (6.19), we get

rank.B/ D rank.A1 AB/  rank.AB/:

Consequently, we deduce that

rank.AB/  rank.B/  rank.AB/;

which finally yields rank.AB/ D rank.B/. J

Exercise 6.3 (Frobenius Inequality and Sylvester Law of Nullity)


Let A be a matrix in Mmr .K/, B be a matrix in Mrp .K/, and C be a matrix in Mpn .K/.
1. Show that

rank.AB/Crank.BC/  rank.B/Crank.ABC/ (Frobenius’ inequality). (6.20)

2. Deduce that for any two matrices A in Mmr .K/ and K in Mrn .K/

rank.AK/  rank.A/ C rank.K/  r: (6.21)

3. Prove that if A and B are square matrices in Mn .K/; then

null.AB/  null.A/ C null.B/ (Sylvester’s law of nullity). (6.22)

Solution
1. Inequality (6.20) is a direct consequence of (5.27).
2. Applying the Frobenius inequality for p D r, C D K and B D Ir , we obtain

rank.A/ C rank.K/  rank.B/ C rank.AK/:

Since rank.B/ D r, (6.21) holds.


3. If A is square matrix of order n, then Theorem 5.2.10 implies that

null.A/ D n  rank.A/:

Thus, using (6.21) for K D B and r D n, we get

n  rank.AB/  n  rank.A/  rank.B/ C n:

This leads to (6.22), by applying once again Theorem 5.2.10. J
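Both the Frobenius inequality (6.20) and Sylvester's law of nullity (6.22) are convenient to sanity-check numerically on random matrices. An illustrative NumPy sketch, not from the text:

```python
import numpy as np

rng = np.random.default_rng(4)
rank = np.linalg.matrix_rank
for _ in range(200):
    m, r, p, n = rng.integers(1, 5, size=4)
    A = rng.integers(-2, 3, size=(m, r))
    B = rng.integers(-2, 3, size=(r, p))
    C = rng.integers(-2, 3, size=(p, n))
    # Frobenius inequality (6.20)
    assert rank(A @ B) + rank(B @ C) <= rank(B) + rank(A @ B @ C)
    # Sylvester's law of nullity (6.22) for square matrices of order n
    S, T = rng.integers(-2, 3, size=(n, n)), rng.integers(-2, 3, size=(n, n))
    assert n - rank(S @ T) <= (n - rank(S)) + (n - rank(T))
```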



Exercise 6.4 (Rank of Idempotent Matrices)


Let A be a matrix in Mn .K/ such that rank.A/ D r:
1. Prove that there exist matrices B in Mnr .K/ and C in Mrn .K/ such that

A D BC and rank.B/ D rank.C/ D r:

The matrix A in Mn .K/ is called idempotenta (see Exercise 1.5) if A2 D A.


2. Show that if A is an idempotent matrix, then

rank.A/ D tr.A/:b
3. Deduce that if A is an idempotent matrix in Mn .K/ with rank.A/ D n, then A D In .
4. Show that if A is idempotent with rank.A/ D r, then

rank.In  A/ D n  r:

5. Let A and B be two idempotent matrices in Mn .K/. Show that if AB D BA D 0, then A C B


is idempotent and

rank.A C B/ D rank.A/ C rank.B/:

6. Find the rank of the matrix


2 3
2 2 4
6 7
A D 4 1 3 4 5 :
1 2 3


a
Idempotent matrices are the matrices associated to projections.
b
In fact if A2 D kA, then we have tr.A/ D k  rank.A/.

Solution
1. Since rank.A/ D r, then there exists r linearly independent column vectors B1 ; B2 ; : : : ; Br
of A. Then, these vectors form a basis of the column space R .A/. We introduce the matrix B
whose columns are the vectors B1 ; B2 ; : : : ; Br as

B D ŒB1 ; B2 ; : : : ; Br :

It is clear that B is in Mnr .K/. Since B1 ; B2 ; : : : ; Br form a basis, they are linearly
independent. Hence, rank.B/ D r.
Now, it is clear that any column of A, say the ith column Ai , may be expressed as

Ai D BCi
where Ci is the vector of the coefficients of the linear combination of B1 ; B2 ; : : : ; Br that gives
Ai . Denoting

C D ŒC1 ; C2 ; : : : ; Cn 

and

A D ŒA1 ; A2 ; : : : ; An ;

then we have

A D BC:

Finally, (6.19) shows that

r D rank.A/ D rank.BC/  rank.C/  r:

Hence, rank.C/ D r.
2. Now, since A is idempotent, we have

A2 D BCBC D A D BC: (6.23)

Using (6.19), we deduce that

rank.CB/  rank.A/ D r:

Since CB is a square matrix in Mr .K/, rank.CB/  r. This shows that

rank.CB/ D r;

so CB is invertible (Theorem 6.3.2). Hence, multiplying (6.23) from the left by .CB/1 C and
from the right by B.CB/1 , we obtain

CB D Ir :

Now using Theorem 1.2.14, we have

tr.A/ D tr.BC/ D tr.CB/ D tr.Ir / D r D rank.A/:

3. It is clear that, according to Theorem 6.3.4, A is invertible and we have

A D In A D A1 AA D A1 A2 D A1 A D In :



4. We have seen in Exercise 1.5 that if A is idempotent, then In  A is also idempotent.


Thus, using assertion (1), we get

rank.In  A/ D tr.In  A/ D tr.In /  tr.A/ D n  r:

5. We have

.A C B/2 D A2 C B2 C AB C BA D A2 C B2 D A C B;

since AB D BA D 0 and A and B are idempotent. This shows that A C B is idempotent.


Now, using assertion (2) together with Theorem 1.2.14, we have

rank.A C B/ D tr.A C B/ D tr.A/ C tr.B/ D rank.A/ C rank.B/:

6. We may easily check that A2 D A and thus A is idempotent. Then applying (1), we get

rank.A/ D tr.A/ D 2 C 3  3 D 2:
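Claims (2) and (4) can be verified directly on the matrix of part 6. Since the extraction loses minus signs, the entries used below are one sign-consistent reading of the printed matrix (it does satisfy A² = A). Illustrative NumPy sketch:

```python
import numpy as np

# The matrix of part 6 (assumed signs, chosen so that the matrix is idempotent).
A = np.array([[ 2, -2, -4],
              [-1,  3,  4],
              [ 1, -2, -3]], dtype=float)

assert np.allclose(A @ A, A)                                  # A is idempotent
assert np.linalg.matrix_rank(A) == round(np.trace(A)) == 2    # rank(A) = tr(A)
I = np.eye(3)
assert np.linalg.matrix_rank(I - A) == 3 - 2                  # rank(I - A) = n - r
```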

Exercise 6.5
Let f be an endomorphism of R3 whose matrix with respect to the standard basis B D
fe1 ; e2 ; e3 g is
2 3
0 1 0
6 7
M. f / D 4 0 0 1 5 :
1 3 3

1. Show that f is an automorphism and find its inverse f 1 .


2. Find a basis S D fs1 ; s2 ; s3 g in R3 such that

f .s1 / D s1 ; f .s2 / D s1 C s2 ; f .s3 / D s2 C s3 :

3. Find the transition matrix from the basis B to the basis S and find 1 .

Solution
First, we have rank. f / D dimR Im. f / D rank.M. f //: Now, we have by a simple computation

det.M. f // D 1 ¤ 0:

This immediately shows, by using Theorem 6.3.3, that rank.M. f // D 3. Consequently,

dimR Im. f / D dimR R3 :


Hence, since Im. f / is a subspace of R3 , Theorem 4.6.7 implies that

Im. f / D R3 :

Now Theorem 5.2.9 shows that f is surjective and hence, applying Theorem 5.3.3, we deduce
that f is bijective and therefore an automorphism.
Applying Theorem 6.1.1, we have

M. f 1 / D .M. f //1 :

So, we can easily, by using the methods in  Chap. 1 or  Chap. 2, find that
2 3
3 3 1
6 7
.M. f //1 D 41 0 05:
0 1 0

Consequently, we have (see Remark 6.1.2),


2 3 2 32 3 2 3
u 3 3 1 u 3u  3v C w
6 7 6 76 7 6 7
f 1 .u; v; w/ D .M. f //1 4 v 5 D 4 1 0 0 5 4 v 5 D 4 u 5;
w 0 1 0 w v

Thus, we deduce that

f 1 .u; v; w/ D .3u  3v C w; u; v/:

2. It is clear that
2 3 2 3
u v
6 7 6 7
f .u; v; w/ D M. f / 4 v 5 D 4 w 5;
w u  3v C 3w

so

f .u; v; w/ D .v; w; u  3v C 3w/:

Let s1 D .u1 ; v1 ; w1 / then f .s1 / D s1 implies that


8
ˆ
< v1 D u1 ;
w1 D v1 ;

u1  3v1 C 3w1 D w1 :

This means that u1 D v1 D w1 . Consequently, s1 D .u1 ; u1 ; u1 / D u1 .1; 1; 1/. So, we can


choose s1 D .1; 1; 1/. Similarly, we have f .s2 / D f .u2 ; v2 ; w2 / D s1 C s2 , whence
8
ˆ
< v2 D 1 C u2 ;
w2 D 1 C v2 ;

u2  3v2 C 3w2 D 1 C w2 :

So, s2 D .u2 ; u2 C 1; 2 C u2 /, and we can choose s2 D .0; 1; 2/. By the same method and
since f .s3 / D f .u3 ; v3 ; w3 / D s2 C s3 , we obtain the system of equations
v_3 = u_3,
w_3 = 1 + v_3,
u_3 − 3v_3 + 3w_3 = 2 + w_3.

Then, we get s3 D .u3 ; u3 ; 1 C u3 /. As before, we can choose s3 D .0; 0; 1/.


3. It is clear that the entries of the column j; j D 1; 2; 3 of the transition matrix are the
components of sj with respect to the basis B. That is,
2 3
100
6 7
D 41 1 05:
121

It is clear that this matrix is invertible. To find 1 , we simply need to find the components
of ej ; j D 1; 2; 3 with respect to the basis S. We have
8
ˆ
< s1 D e1 C e2 C e3 ;
s2 D e2 C 2e3 ;

s3 D e3 :

Then, this leads to


8
ˆ
< e1 D s1  s2 C s3 ;
e2 D s2  2s3 ;

e3 D s3 :

Consequently, we obtain
2 3
1 0 0
6 7
1 D 4 1 1 0 5 :
1 2 1

J
Exercise 6.6
Let A be a matrix in Mmn .K/ and B be a matrix in Mnm .K/. Show that if m > n, then

det.AB/ D 0:

Solution
First, using (5.21) together with Theorem 6.1.1, we deduce that

rank.AB/  min.rank.A/; rank.B//: (6.24)

Also, keeping in mind (6.7), we have rank.A/  min.n; m/. Since m > n, we deduce that
rank.A/  n. Consequently, applying (6.24), we obtain

rank.AB/  rank.A/  n < m:

Since AB is a square matrix in Mm .K/, Theorem 6.3.3 and the fact that rank.AB/ ¤ m imply
that det.AB/ D 0: J

Exercise 6.7 (Consistency and Rank)


Consider the system of m linear equations in n unknowns of the form

AX D b; (6.25)

where A is a matrix in Mmn .K/, X is a vector in Mn1 .K/, and b is a vector in Mm1 .K/.
The system (6.25) is said to be consistent (or solvable) if it possesses at least one solution.
1. Show that system (6.25) is consistent if and only if
h i
rank A b D rank.A/:

2. Show that the system


2 32 3 2 3
1 2 0 x1 11
6 76 7 6 7
4 2 3 7 5 4 x2 5 D 4 2 5 (6.26)
1 4 2 x3 7

is inconsistent.



Solution
1. First, if the system is consistent, then it has at least one solution. Then, in this case b should
be a linear combination of the columns of A, that is

b D x1 A1 C x2 A2 C    C xn An ;
h i
where A1 ; A2 ; : : : ; Am are the columns of A. Therefore, the rank of A b is the number of
linearly independent elements from the set fA1 ; A2 ; : : : ; An g, which is exactly (by definition)
the rank of A. h i
Conversely, assume that rank A b D rank.A/; then b is a linear combination of the
columns of A, and the coefficients of this combination provide a solution to (6.25). This
means that (6.25) is consistent.
2. We have
2 3
1 2 0
6 7
det 4 2 3 7 5 D 0;
1 4 2

and
" #
1 2
det D 7 ¤ 0:
2 3

Thus, rank.A/ D 2. On the other hand, for the matrix


2 3
1 2 0 11
6 7
4 2 3 7 2 5
1 4 2 7

we have
2 3
1 2 11
6 7
det 4 2 3 2 5 D 174 ¤ 0:
1 4 7
h i
Consequently, rank A b D 3. Hence, the system (6.26) is inconsistent. J
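The rank criterion of part 1 is straightforward to apply in practice: compare rank(A) with the rank of the augmented matrix [A | b]. The sketch below is illustrative (the small system in it is my own choice, not the system (6.26)) and uses NumPy.

```python
import numpy as np

def is_consistent(A, b):
    """AX = b is solvable iff rank([A | b]) == rank(A) (part 1 above)."""
    return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

# A small system chosen here for illustration only:
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(is_consistent(A, np.array([3.0, 6.0])))   # True : b lies in the column space
print(is_consistent(A, np.array([3.0, 5.0])))   # False: the system is inconsistent
```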

Exercise 6.8 (Diagonally Dominant Matrix)


Let A D .aij /; 1  i; j  n, be a matrix in Mn .C/. Show that if for all i D 1; : : : ; n,
X
jaii j > jaij j; (6.27)
i¤j
then A is invertible. Property (6.27) means that, in each row, the magnitude of the diagonal entry
is strictly larger than the sum of the magnitudes of all the other (non-diagonal) entries
in that row. A matrix satisfying this property is called (strictly) diagonally dominant by rows.
As an application to the above result, show that the matrix
2 3
3 2 0
6 7
A D 4 1 3 1 5
1 2 4

is invertible. 

Solution
To show that A is invertible it is equivalent to prove that the null space

N .A/ D fX 2 Cn ; such that AX D 0Cn g

is f0Cn g, since in this case rank.A/ D n and Theorem 6.3.2 shows that A is invertible. Assume
that A is not invertible. Then, there exists a vector X in N .A/ with
2
3
x1
6x 7
6 27
XD6 7
6 :: 7 ¤ 0Cn :
4 : 5
xn

Let xi0 be the component in X satisfying

jxi0 j D max jxi j; i D 1; : : : ; n:

It is clear that since X ¤ 0Cn , then jxi0 j > 0: On the other hand, since X is in N .A/, we have
AX D 0Cn . This implies that for all i D 1; : : : ; n, we have

∑_{j=1}^{n} a_{ij} x_j = 0.

Now, in particular, for i D i0 , we have

∑_{j=1}^{n} a_{i_0 j} x_j = 0,

which yields
ˇ X ˇ X X
ˇ ˇ
jai0 i0 xi0 j D ˇ  ai0 j xj ˇ  jai0 j j  jxj j  jxi0 j jai0 j j:
j¤i0 j¤i0 j¤i0

Since jxi0 j ¤ 0, we have


X
jai0 i0 j  jai0 j j:
j¤i0

Consequently, (6.27) is not satisfied for i D i0 . Thus, we deduce that if (6.27) holds, then A
has to be invertible.
For the application, we have for the matrix A,

j3j > j1j C j0j; j  3j > j1j C j1j; j4j > j  1j C j2j:
Consequently, A is invertible. The converse of the above result is not true. For instance, the
matrix
2 3
3 2 0
6 7
4 1 4 4 5
1 2 4

is invertible, although (6.27) is not satisfied, since j  4j < j1j C j4j. J
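Condition (6.27) is simple to test programmatically; as the last remark shows, it is sufficient but not necessary for invertibility. An illustrative NumPy sketch (the matrix below is my own strictly dominant example, not the one printed above):

```python
import numpy as np

def is_strictly_diagonally_dominant(A):
    """Check condition (6.27): |a_ii| > sum of |a_ij| over j != i, for every row i."""
    A = np.asarray(A, dtype=float)
    off_diag_sums = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
    return bool(np.all(np.abs(np.diag(A)) > off_diag_sums))

A = np.array([[ 3.0, -1.0,  0.0],
              [ 1.0, -3.0,  1.0],
              [-1.0,  2.0,  4.0]])
if is_strictly_diagonally_dominant(A):
    print("A is strictly diagonally dominant, hence invertible:", np.linalg.det(A) != 0)
```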

Exercise 6.9 (Idempotent Matrix)


Let P be a matrix in Mn .K/. We say that P is idempotent if P2 D P.
Now, let A be a matrix in Mn .R/ satisfying for ˛ and ˇ in R with ˛ ¤ ˇ,

.A  ˛In /.A  ˇIn / D 0: (6.28)

1. Show that there exist two elements λ and μ in K such that the two matrices P and Q
defined as

P = λ(A − αI_n) and Q = μ(A − βI_n)

are idempotent. Show that P + Q = I_n.


2. Write the matrix A in terms of P and Q and deduce Ak for any positive integer k.
3. Show that if ˛ˇ ¤ 0, then A is invertible and find A1 .
4. Consider for m ¤ 0 the matrix
2 3
0 m m2
6 7
A D 4 1=m 0 m 5 : (6.29)
1=m2 1=m 0

Find ˛ and ˇ such that A satisfies (6.28) and deduce Ak for any integer k.


Solution
1. We need to find λ and μ such that P² = P and Q² = Q. Indeed, we have

P² = λ(A − αI_n) · λ(A − αI_n)
   = λ²(A² − 2αA + α²I_n)
   = −λ²(α − β)(A − αI_n),

where we have used (6.28). Thus, P² = P if and only if λ = −1/(α − β). By the same argument,
we can show that Q² = Q if and only if μ = 1/(α − β). Now, it is clear that

P + Q = −(1/(α − β))(A − αI_n) + (1/(α − β))(A − βI_n) = I_n.

2. We can easily deduce from the definition of P that

A D .˛  ˇ/P C ˛In

D ˇP C ˛Q:

Now, since P2 D P; Q2 D Q, and

PQ D .A  ˛In /.A  ˇIn / D 0;

by (6.28), then, it follows that for any positive integer k, we have

Ak D ˇ k P C ˛ k Q:

3. Now, if ˛ˇ ¤ 0, then, we may rewrite (6.28) as


   
˛Cˇ 1 ˛Cˇ 1
A In  A D In  A A D In :
˛ˇ ˛ˇ ˛ˇ ˛ˇ

Hence A is invertible and the uniqueness of the inverse shows that

˛Cˇ 1
A1 D In  A:
˛ˇ ˛ˇ
In terms of P and Q, the above formula can be rewritten as

1 1
A1 D P C Q:
ˇ ˛

4. For A defined as in (6.29), we have

.A  ˛In /.A  ˇIn / D 0;



whence
2 32 3 2 3
˛ m m2 ˇ m m2 000
6 76 7 6 7
4 1=m ˛ m 5 4 1=m ˇ m 5 D 4 0 0 0 5 :
1=m2 1=m ˛ 1=m2 1=m ˇ 000

That is,

$$\begin{bmatrix}
\alpha\beta + 2 & m(1-\alpha-\beta) & m^2(1-\alpha-\beta)\\
\dfrac{1-\alpha-\beta}{m} & \alpha\beta + 2 & m(1-\alpha-\beta)\\
\dfrac{1-\alpha-\beta}{m^2} & \dfrac{1-\alpha-\beta}{m} & \alpha\beta + 2
\end{bmatrix}
=
\begin{bmatrix} 0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix}.$$

Since m ≠ 0, this last equation leads to the two equations

$$\alpha\beta + 2 = 0 \qquad\text{and}\qquad \alpha + \beta - 1 = 0.$$

That is, α = 2 and β = −1. Hence, the matrices P and Q become
2 3
2=3 m=3 m2 =3
1 6 7
P D  .A  2I3 / D 4 1=.3m/ 2=3 m=3 5
3 2
1=.3m / 1=.3m/ 2=3

and
2 3
1=3 m=3 m2 =3
1 6 7
Q D .A C I3 / D 4 1=.3m/ 1=3 m=3 5 :
3
1=.3m2 / 1=.3m/ 1=3

Thus $A = -P + 2Q$; it is invertible, and therefore for any integer k, we have


2 3
2.1/k C 2k 2k  .1/k 2k  .1/k
6 m m2 7
6 3 3 3 7
6 2k  .1/k 2.1/k C 2k 2k  .1/k 7
A D .1/ P C 2 Q D 6
k k k
6 m 7:
7
6 3m 3 3 7
4 2k  .1/k 2k  .1/k 2.1/k C 2k 5
3m2 3m 3
J
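The closed form for Ak obtained above can be confirmed numerically. The sketch below assumes NumPy is available and uses the illustrative value m = 2; any nonzero m gives the same conclusion.

```python
import numpy as np

m = 2.0  # any nonzero value of m works
A = np.array([[0,      m,   m**2],
              [1/m,    0,   m   ],
              [1/m**2, 1/m, 0   ]])
I3 = np.eye(3)
P = -(A - 2*I3) / 3   # idempotent: P @ P == P
Q = (A + I3) / 3      # idempotent: Q @ Q == Q

assert np.allclose(P @ P, P) and np.allclose(Q @ Q, Q)
for k in range(1, 8):
    assert np.allclose(np.linalg.matrix_power(A, k), (-1)**k * P + 2**k * Q)
print("A^k = (-1)^k P + 2^k Q confirmed for k = 1, ..., 7")
```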
7 Eigenvalues and Eigenvectors

7.1 Definitions

In the previous chapters, we have defined some numbers associated to a matrix, such as
the determinant, trace, and rank. In this chapter, we focus on scalars and vectors known
as eigenvalues and eigenvectors. The eigenvalues and eigenvectors have many important
applications, in particular, in the study of differential equations.
Let E be a vector space over a field K with dimK E D n, and f be an endomorphism
in L .E/. We have seen in Example 6.9 that it might be possible to find a basis of E
in which the matrix M. f / associated to f is diagonal. So, the question is: Does such
basis always exist, and if so, how can we find it? In addition, is the matrix M. f / always
diagonal in this basis? One of the main goals here is to answer these questions.

Definition 7.1.1 (Eigenvector and Eigenvalue of an Endomorphism)


Let E be a vector space over a field K and f be an endomorphism in L .E/. Let u be
an element in E such that u ¤ 0E . Then we say that u is an eigenvector of f if there
exists  in K such that

f .u/ D u: (7.1)

In this case, we say that  is an eigenvalue of f , and u is an associated eigenvector.

It is clear that if u D 0E , then (7.1) is satisfied for any  in K.

Theorem 7.1.1 (Uniqueness of the Eigenvalue)


Let f be an endomorphism in L .E/ and u ¤ 0E be an eigenvector of f . Then, the
eigenvalue  of f associated to the eigenvector u is unique.a
a
We can deduce here that two different eigenvalues of f cannot have the same associated eigenvector u.

Proof
Assume that there exists another eigenvalue  associated to u and satisfying (7.1). Then

u D u:

This gives, .  /u D 0E . This yields, by using Theorem 4.2.1,    D 0K , which shows
that  D . This gives the uniqueness of the eigenvalue. t
u

We have seen above that the eigenvalue  associated to the eigenvector u is unique.
On the other hand, the eigenvector u is not unique. In fact, if u is an eigenvector, then
for any ˛ ¤ 0K , ˛u is also an eigenvector, since

f .˛u/ D ˛f .u/ D ˛.u/ D .˛u/:

Moreover, if u and v are two eigenvectors associated to the same eigenvalue , then
u C v is also an eigenvector associated to , since

f .u C v/ D f .u/ C f .v/ D u C v D .u C v/:

Thus, if  is an eigenvalue of f , then we may construct a whole subspace V./ ¤ f0E g


of E associated to  and we have the following definition.

Definition 7.1.2
Let  be an eigenvalue of f as in Definition 7.1.1. Then the eigenspace V./ is
defined to be the set of all u in E such that f .u/ D u:

Now, we introduce an important property of the eigenspace.

Theorem 7.1.2
Let f be an endomorphism as in Definition 7.1.1. Let 1 and 2 be two eigenvalues of
f such that 1 ¤ 2 . Then we have

V.1 / \ V.2 / D f0E g: (7.2)

Proof
Let u be an element of V.1 / \ V.2 /. Then we have

f .u/ D 1 u D 2 u:

That is .1  2 /u D 0E . Since 1 ¤ 2 , we deduce that u D 0E , which completes


the proof. t
u
Notation When there is no confusion, we write vectors in Kn as
23
x1
6x 7
6 27
XD6 7
6 :: 7 or simply as X D .x1 ; x2 ; : : : ; xn /;
4 : 5
xn

where x1 ; x2 ; : : : ; xn are elements of K.

Example 7.1
Consider the endomorphism f defined as follows:

f W R3 ! R3 ;

.x; y; z/ 7! .2x C z; 4y C 5z; z/:

We have

f .2; 0; 0/ D .4; 0; 0/ D 2.2; 0; 0/:

Consequently,  D 2 is an eigenvalue of f and the vector .2; 0; 0/ is an associated


eigenvector. 

Example 7.2
Consider the endomorphism f defined as follows:

f W R3 ! R3 ;

.x; y; z/ 7! .x  y; x C y C z; 3z/:

Find its eigenvalues and the corresponding eigenvectors and eigenspaces. 

Solution
Let  be an eigenvalue of f , i.e.,

f .x; y; z/ D .x; y; z/ D .x; y; z/:

This yields the system of equations


8
ˆ
< x  y D x;
x C y C z D y;

3z D z:

Solving this system gives

 D 0; and x D y; z D 0

 D 2; and x D y; z D 0;
 D 3; and y D 2x; z D 3x:

Thus, the eigenvalues are 1 D 0; 2 D 2 and 3 D 3 and the eigenspaces are

V.1 / D f.x; x; 0/g; V.2 / D f.x; x; 0/g; V.3 / D f.x; 2x; 3x/g;

with x 2 R. Clearly, V.1 / is spanned by the vector X1 D .1; 1; 0/, V.2 / by the vector
X2 D .1; 1; 0/, and V.3 / by the vector X3 D .1; 2; 3/. Thus, X1 , X2 and X3 are the
eigenvectors associated to the eigenvalues 1 , 2 , and 3 , respectively. J
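A quick numerical cross-check is possible, assuming NumPy is available. Since some minus signs are lost in this copy, the matrix of f below is taken as the one consistent with the eigenvalues 0, 2, 3 and the eigenvectors found above.

```python
import numpy as np

M = np.array([[ 1, -1, 0],
              [-1,  1, 1],
              [ 0,  0, 3]], dtype=float)   # assumed matrix of f in the standard basis
vals, vecs = np.linalg.eig(M)
print(np.round(np.sort(vals), 6))          # [0. 2. 3.]
for lam, v in zip(vals, vecs.T):
    assert np.allclose(M @ v, lam * v)     # each returned column is an eigenvector
```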

Example 7.3 (Eigenvalues of a Projection)


Let E be a finite-dimensional vector space over a field K. Show that if f is a projection in
L .E/, then its eigenvalues are 0K or 1. 

Solution
We have defined in Exercise 5.1 a projection as an element f in L .E/ that satisfies f ı f D f .
So, let  be an eigenvalue of f , i.e.,

f .u/ D u;

for some u in E with u ¤ 0E . Then

. f ı f /.u/ D f . f .u// D f .u/ D f .u/ D 2 u:

On the other hand, since f is a projection, we have

f . f .u// D f .u/ D u:

Combining the two identities above, we obtain u D 2 u: Since u ¤ 0E , it follows that


 D 1 or  D 0K . J

7.2 Properties of Eigenvalues and Eigenvectors

By (7.1)  being an eigenvalue of the endomorphism f means that

f .u/ D u; with u ¤ 0E :


This is equivalent to

. f   IdE /.u/ D 0E ; with u ¤ 0E :

This is also equivalent to say that Ker. f  IdE / contains u and since u ¤ 0E , then
we deduce that  is an eigenvalue of f if and only if Ker. f  IdE / ¤ f0E g. Hence,
Theorem 5.2.4 shows that f  IdE is not injective, and therefore is not bijective. Hence,
f  IdE is not invertible.
So, we have already proved the following statement.

Theorem 7.2.1 (Characterization of Eigenvalues of an Endomorphism)


Let E be a finite dimensional vector space over a field K and f be an endomorphism in
L .E/. Then the following statements are equivalent:
1.  is an eigenvalue of f .
2. The endomorphism f  IdE is not invertible.

We go on to investigate some properties of the eigenvalues and eigenvectors. First,


looking at Example 7.2 one more time, we see that the eigenvectors X1 ; X2 , and X3 are
linearly independent and thus according to Theorem 4.6.5, they form a basis of R3 : In
fact, this turns out to be always true if the eigenvalues are distinct:

Theorem 7.2.2 (Eigenvectors Associated to Distinct Eigenvalues Form a Basis)


Let E be a vector space over a field K such that dimK E D n and f be an endomorphism
in L .E/. Suppose that f has n distinct eigenvalues 1 ; 2 ; : : : ; n and let u1 ; u2 ; : : : ; un
be the associated eigenvectors, respectively. Then the set

B D fu1 ; u2 ; : : : ; un g

is a basis of E.

Proof
Let ˛1 ; ˛2 ; : : : ; ˛n be n elements of K satisfying

˛1 u1 C ˛2 u2 C    C ˛n un D 0E : (7.3)

Now, let

gi D f  i IdE ; i D 1; 2; : : : ; n:

Applying g1 to (7.3) and using the fact that g1 .u1 / D 0E (since 1 is an eigenvalue of f ) and

g1 .uj / D .j  1 /uj ; j D 2; : : : n;

we obtain

X
n
.j  1 /˛j uj D 0E : (7.4)
jD2

Now, applying g2 to (7.4), we obtain, as above,

X
n

7 .j  1 /.j  2 /˛j uj D 0E : (7.5)


jD3

After n  1 such operations, by applying each time gi ; i D 2; : : : ; n, to


!
X
n Y
i1
.j  k / ˛j uj D 0E ; (7.6)
jDi kD1

we obtain

.n  1 /.n  2 /    .n  n1 /˛n un D 0E :

This gives ˛n D 0K , since all the eigenvalues are distinct. Since the ordering of the
eigenvalues and eigenvectors is arbitrary, we can easily verify that ˛1 D ˛2 D    D
˛n1 D 0K . This shows that the set B D fu1 ; u2 ; : : : ; un g is linearly independent, and since
dimK E D n, Theorem 4.6.5 implies that B is a basis of E. t
u

ⓘ Corollary 7.2.3 Let E be a vector space over a field K such that dimK E D n and f be
an endomorphism in L .E/. Then f has at most n distinct eigenvalues.

Proof
Suppose that 1 ; 2 ; : : : ; m are distinct eigenvalues of f . Let u1 ; u2 ; : : : ; um be their
corresponding eigenvectors. Thus, Theorem 7.2.2 shows that u1 ; u2 ; : : : ; um are linearly
independent. Hence, Lemma 4.6.4 shows that m  n. t
u

Now, consider the endomorphism f defined as

f W R3 ! R3 ;
.x; y; z/ 7! .2x C 4y C 3z; 4x  6y  3z; 3x C 3y C z/: (7.7)

We can easily check that f has three eigenvalues

1 D 1; 2 D 3 D 2:
One of the eigenvalues has a multiplicity 2. We may also show that the eigenspace V.1 /
is spanned by the vector X1 D .1; 1; 1/ and V.2 / D V.3 / is spanned by the vector
X2 D .1; 1; 0/. By Theorem 4.6.3, the set fX1 ; X2 g is not a basis of R3 .
So, we have seen in Theorem 7.2.2, that if E is a vector space over a field K with
dimK E D n and f is an endomorphism of E which has n distinct eigenvalues (all with
multiplicity one), then the associated eigenvectors form a basis of E. On the other hand,
we have shown that for the endomorphism defined in (7.7), the eigenvectors associated
to the eigenvalues of f do not form a basis of R3 , since not all the eigenvalues are
of multiplicity one. Thus, the question now is: when do the eigenvectors associated to
eigenvalues with multiplicities not necessarily equal to one form a basis of E? To answer
this question, we define what we call algebraic multiplicity and geometric multiplicity
of an eigenvalue.

Definition 7.2.1 (Algebraic and Geometric Multiplicities)


Let E be a finite-dimensional vector space over a field K and f be an endomorphism
of E. Let  be an eigenvalue of f .
▬ The algebraic multiplicity of the eigenvalue  is the number of times  appears
as an eigenvalue of f .
▬ The geometric multiplicity of the eigenvalue  is the number of linearly
independent eigenvectors associated to , that is, the dimension of the eigenspace
associated to .

For example for the endomorphism defined in (7.7), the eigenvalue 2 D 2 has
algebraic multiplicity 2 and geometric multiplicity 1.

Definition 7.2.2 (Complete and Defective Eigenvalues)


Let E be a finite-dimensional vector space over a field K and f be an endomorphism
of E. Let  be a repeated eigenvalue of f with algebraic multiplicity `.
▬ The eigenvalue  is called complete if there are ` corresponding linearly
independent associated eigenvectors. That is, if the geometric multiplicity of 
is equal to its algebraic multiplicity.
▬ The eigenvalue  is defective if the geometric multiplicity is strictly less than
the algebraic multiplicity.

Example 7.4
Show that all the eigenvalues of the endomorphism

f W R3 ! R3 ;

.x; y; z/ 7! .2x C y C z; x  2y C z; x C y  2z/; (7.8)

are complete. 

Solution
First, we can easily show that f has two eigenvalues,

1 D 0; 2 D 3 D 3:

It is clear that 2 has algebraic multiplicity 2 (that is ` D 2). Now, in order for 2 to be
complete, we need to find two independent eigenvectors associated to 2 D 3. That is, we
need to show that the geometric multiplicity is also equal to 2. So, let X D .x; y; z/ be an
eigenvector associated to 2 D 3, i.e.,

f .X/ D 3X:

Equivalently,
8
ˆ
< x C y C z D 0;
x C y C z D 0;

x C y C z D 0:

This means that X is an eigenvector associated to  D 3 if and only if its components


satisfy

x C y C z D 0;

or

z D x  y:

Therefore, we can write X as

X D .x; y; z/ D x.1; 0; 1/ C y.0; 1; 1/:

Consequently,

X1 D .1; 0; 1/ and X2 D .0; 1; 1/

are two linearly independent eigenvectors associated to 2 D 3. Therefore, the geometric
multiplicity is equal to 2, so indeed 2 D 3 is a complete eigenvalue. J
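The geometric multiplicity can also be computed as n − rank(A − λI), where A is the matrix of f in the standard basis. Below is a short sketch assuming NumPy is available; the matrix is the one consistent with the system x + y + z = 0 obtained above (some minus signs are dropped in this copy).

```python
import numpy as np

A = np.array([[-2, 1, 1],
              [ 1,-2, 1],
              [ 1, 1,-2]], dtype=float)   # assumed matrix of f in the standard basis
n = A.shape[0]
for lam in (0.0, -3.0):
    geometric_multiplicity = n - np.linalg.matrix_rank(A - lam * np.eye(n))
    print(lam, geometric_multiplicity)     # 0.0 -> 1, -3.0 -> 2
```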

Example 7.5
Show that the endomorphism

f W R2 ! R2 ;

.x; y/ 7! .y; 0/; (7.9)

has a defective eigenvalue. 


Solution
It is easy to check that  D 0 is an eigenvalue of f with algebraic multiplicity 2. If X D .x; y/
is an eigenvector associated to , then y D 0. Thus all the eigenvectors of f are multiples
of the vector X1 D .1; 0/. Therefore, the geometric multiplicity is 1, and so the eigenvalue
 D 0 is defective. J

Theorem 7.2.4
Let E be a vector space over a field K such that dimK E D n and f be an endomorphism
in L .E/ such that all its eigenvalues are complete. Then, the set of eigenvectors
associated to the complete eigenvalues form a basis of E.

Proof
Let 1 ; 2 ; : : : ; ` be the set of complete eigenvalues of f and let ki ; i D 1; 2; : : : ; `; be the
algebraic multiplicities of the i . Then,

k1 C k2 C    C k` D n:

Since the eigenvalues are complete,

dimK V.i / D ki ; i D 1; 2; : : : ; `;

where V.i / is the eigenspace associated to i ; i D 1; 2; : : : ; `. Using (4.16) together with


(7.2), we deduce that

dimK ŒV.1 / ˚ V.2 / ˚    ˚ V.` / D n:

Therefore, applying Theorem 4.6.7, we obtain

E D V.1 / ˚ V.2 / ˚    ˚ V.` /:

Hence, the union of the bases of V.i /; i D 1; 2; : : : ; ` which consists of all the eigenvectors
of f forms a basis of E. This completes the proof of Theorem 7.2.4. t
u

7.3 Eigenvalues and Eigenvectors of a Matrix

In this section, we can restate for matrices all the results concerning the eigenvalues of
an endomorphism. As we saw in  Chap. 6, to each linear transformation f , one can
associate a unique matrix A D M. f / and vice versa. The eigenvalues, eigenvectors, and
eigenspaces of A D M. f / are then, by definition, the eigenvalues, eigenvectors, and
eigenspaces of f .

Definition 7.3.1 (Eigenvalues of a Matrix and Spectrum)


Let A be a matrix in Mn .K/. Let X be a vector in Kn such that X ¤ 0Kn . Then we
say that X is an eigenvector of A if there exists  in K such that

AX D X: (7.10)

In this case, we say that  is an eigenvalue of A and X is called an associated


eigenvector.
The set of all distinct eigenvalues of A is called the spectrum of A and it is denoted
by .A/.

Example 7.6
Consider the matrix
2 3
1 b1 c1
6 7
A D 4 2 b2 c2 5 :
3 b3 c3

Find the entries bi ; ci ; i D 1; 2; 3 such that A will have the following eigenvectors:
2 3 2 3 2 3
1 1 0
6 7 6 7 6 7
X1 D 4 0 5 ; X2 D 4 1 5 ; X3 D 4 1 5 :
1 0 1

What are the eigenvalues of A? 

Solution
By Definition 7.3.1, X1 is an eigenvector of A if

AX1 D 1 X1 ;

for some 1 in R. This gives the system


8
ˆ
< 1 C c1  1 D 0;
c2 C 2 D 0;

c3  1 C 3 D 0;

the solution of which is

c2 D 2; and c1  c3 D 2: (7.11)

Similarly, X2 and X3 are eigenvectors of A, if

AX2 D 2 X2 ; and AX3 D 3 X3 ;


for some 2 and 3 in R. Thus, we obtain the systems
8 8
ˆ
< b1 C 2  1 D 0; ˆ
< b1  c1 D 0;
b2  2  2 D 0; and b2  c2  3 D 0;
:̂ :̂
b3  3 D 0: b3  c3 C 3 D 0:

Consequently, we obtain

b3 D 3; b1 D c1 ; b1 C b2 D 3; b2 C b3 D c2 C c3 : (7.12)

Hence, (7.11) and (7.12) yield

b1 D 5; b2 D 2; b3 D 3; c1 D 5; c2 D 2; c3 D 3:

Thus, the matrix A is


2 3
1 5 5
6 7
A D 4 2 2 2 5 :
3 3 3

Now, from the above systems, we can easily see that 1 D 6; 2 D 4 and 3 D 0, which
are the eigenvalues of A. J

We have seen in Theorem 7.2.1 that  is an eigenvalue of an endomorphism f if and


only if f   IdE is not invertible. Now, similarly,  is an eigenvalue of A if and only if
A  In is not invertible, and this according to Theorem 2.4.8 is equivalent to say that
det.A  In / D 0: Thus, we summarize these in the following theorem.

Theorem 7.3.1 (Characterization of Eigenvalues of a Matrix)


Let A be a matrix in Mn .K/ and  be an element of K. Then the following statements
are equivalent:
1.  is an eigenvalue of A.
2. The matrix A  In is not invertible i.e., it is singular.
3. det.A  In / D 0.

In Examples 7.1 and 7.6, we have found the eigenvalues and eigenvectors simul-
taneously. The third statement in Theorem 7.3.1 separates completely the problem of
finding the eigenvalues of a matrix from that of finding the associated eigenvectors. So,
we can easily find an eigenvalue of a matrix, without needing to know a corresponding
eigenvector.

Example 7.7
Find the eigenvalues and the corresponding eigenvectors for the matrices:
2 3
" # 5 01
1 6 6 7
AD and B D 4 1 1 05:
0 5
7 1 0

Solution
1. To find the eigenvalues of A, we use Theorem 7.3.1 and we compute det.A  I2 /. We get
" #
7 1   6
det.A  I2 / D det D .1  /.5  /:
0 5

Therefore,  is an eigenvalue of A if and only if .1  /.5  / D 0. This gives

1 D 1; 2 D 5:

Now, if
" #
x1
X1 D
x2

is an eigenvector associated to 1 , then we have


" # " #
x 0
.A  1 I/ 1 D :
x2 0

This gives 6x2 D 0, so x2 D 0, and then


" # " #
x1 1
X1 D D x1 :
0 0

So, we can choose


" #
1
X1 D :
0

Similarly, if
" #
x1
X2 D
x2
is an eigenvector associated to 2 D 5, then we have
" # " #
x1 0
.A  2 I2 / D :
x2 0

This leads to 6x1 C 6x2 D 0, that is, x1 D x2 and then


" #
1
X2 D x1 :
1

Thus,
" #
1
X2 D :
1

2. For the matrix B, we have


2 3
5 0 1
6 7
det.B  I3 / D det 4 1 1   0 5 D 3 C 62  12 C 8 D .  2/3 :
7 1 

Thus,  D 2 is an eigenvalue of algebraic multiplicity 3. By the same method as above,


we can find one eigenvector associated to  D 2 W
2 3
1
6 7
Y1 D 4 1 5 :
3

Example 7.8
Find the eigenvalues and the corresponding eigenvectors of the matrix
2 3
200
6 7
A D 40 3 45:
049

Solution
As above, we compute

det.A  I/ D 3 C 142  35 C 22

D .  11/.  2/.  1/:



Thus, the eigenvalues of A are the zeros of this polynomial, namely

1 D 1; 2 D 2; 3 D 11:

Now, if
2 3
x1
6 7
X1 D 4 x2 5
x3

is an eigenvector associated to 1 D 1, that is,


2 3 2 3
7 x1 0
6 7 6 7
.A  1 I/ 4 x2 5 D 4 0 5 ;
x3 0

then x1 D 0 and x2 D 2x3 . Hence,


2 3 2 3
0 0
6 7 6 7
X1 D 4 2x3 5 D x3 4 2 5 :
x3 1

So, we can take


2 3
0
6 7
X1 D 4 2 5 :
1

Similarly, if
2 3
x1
6 7
X2 D 4 x2 5
x3

is an eigenvector associated to 2 D 2, i.e.,


2 3 2 3
x1 0
6 7 6 7
.A  2 I/ 4 x2 5 D 4 0 5 ;
x3 0
then we get x2 D x3 D 0. Hence,
2 3
1
6 7
X2 D x1 4 0 5 :
0

Thus,
2 3
1
6 7
X2 D 4 0 5 :
0

By the same method, we find that


2 3
0
6 7
X3 D 4 1 5
2

is an eigenvector associated to 3 D 11: J
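In practice such computations are carried out numerically. Assuming NumPy is available, numpy.linalg.eig reproduces the eigenvalues and eigenvectors of the matrix of Example 7.8:

```python
import numpy as np

A = np.array([[2, 0, 0],
              [0, 3, 4],
              [0, 4, 9]], dtype=float)
vals, vecs = np.linalg.eig(A)
print(np.round(np.sort(vals), 6))      # [ 1.  2. 11.]
for lam, v in zip(vals, vecs.T):
    assert np.allclose(A @ v, lam * v)  # e.g. the vector for 11 is proportional to (0, 1, 2)
```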

Example 7.9
Consider the matrix A in M3 .C/ given by
2 3
cos   sin  0
6 7
A D 4 sin  cos  0 5 ; 0 <  < 2:
0 0 1

Find the eigenvalues of A and the corresponding eigenvectors. 

Solution
We have
 
det.A  I/ D .  1/ 2 cos  C 2 C 1 :

Thus, det.A  I/ D 0 implies that

1 D 1; 2 D cos  C i sin ; and 3 D cos   i sin :

Alternatively, using the Euler formula

ei D cos  C i sin ;



we have

1 D 1; 2 D ei ; and 3 D ei :

By the same method as before, we easily find eigenvectors X1 ; X2 and X3 associated to 1 ; 2 ,


and 3 , respectively, as
2 3 2 3 2 3
0 1 1
6 7 6 7 6 7
X1 D 4 0 5 ; X2 D 4 i 5 ; and X3 D 4 i 5 :
1 0 0

J
Definition 7.3.2 (Characteristic Polynomial)
Let A be a matrix in Mn .K/. Then, for any  in K,

p./ D det.A  In /; (7.13)

is a polynomial of degree n in  called the characteristic polynomial of A.

So, it is clear from Theorem 7.3.1 that  is an eigenvalue of A if and only if p./ D 0.
That is, if and only if  is a zero of p./.

Example 7.10
The characteristic polynomial associated to the matrix A in Example 7.8 is

p./ D 3 C 142  35 C 22:

We proved in Exercise 1.4 that if A is a matrix in M2 .K/, then

p./ D 2  tr.A/ C det.A/:

Or equivalently,

p./ D .1/n 2 C .1/n1 tr.A/ C det.A/; n D 2:

In fact this formula can be generalized to any matrix A in Mn .K/ and we have the
following theorem.

Theorem 7.3.2
Let A be a matrix in Mn .K/. Then, its characteristic polynomial has the form

p./ D .1/n n C.1/n1 tr.A/n1 Can2 n2 Can3 n3 C  Ca1 Cdet.A/;
(7.14)

where a1 ; : : : ; an2 are elements of K.

Proof
Let A D .aij /; 1  i; j  n. Then, we have
2 3
a11   a12 ::: a1n
6 7
6 a21 a22   ::: a2n 7
p./ D det.A  In / D det 6
6 :: :: :: 7:
7
4 : : : 5
an1 an2 : : : ann  

Computing this determinant using the cofactor expansion along the first row, we get
2 3
a22   a23 ::: a2n
6 a a33   ::: a3n 7
6 32 7
det.A  In / D .a11  / det 6
6 :: :: :: 7 C Qn2 ./;
7
4 : : : 5
an2 an3 : : : ann  

where Qn2 is a polynomial of degree n2 with coefficients in K. The determinant appearing
here is the characteristic polynomial of an .n  1/  .n  1/ matrix. So, by induction, we find

p./ D det.A  In / D .a11  /.a22  /    .ann  / C Qn2 ./; (7.15)

We compute

.a11  /.a22  /    .ann  / D ./n C ./n1 .a11 C a22 C    C ann / Ce Qn2 ./;
„ƒ‚…
WD tr.A/

(7.16)

where e
Qn2 is a polynomial of degree at most n  2. Thus, combining (7.15) and (7.16), we
get

p./ D .1/n n C .1/n1 tr.A/n1 C Pn2 ;



where Pn2 is a polynomial of degree n  2 and has the form

Pn2 ./ D an2 n2 C an3 n3 C    C a1  C a0 : (7.17)

The last coefficient in (7.17) can be obtained easily. Indeed, from (7.14), we have

p.0/ D det.A/:

On the other hand, using (7.13), we have

p.0/ D a0 :

This gives a0 D det.A/ and concludes the proof of Theorem 7.3.2. t


u

ⓘ Corollary 7.3.3 Let A be a matrix in Mn .K/ and let 1 ; 2 ; : : : ; n be the eigenvalues


of A, not necessarily distinct. Then,

$$\operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i \qquad\text{and}\qquad \det(A) = \prod_{i=1}^{n} \lambda_i.$$

Proof
The eigenvalues of A are the roots of its characteristic polynomial p./. Hence, from
elementary algebra, the polynomial p./ can be factored as

$$p(\lambda) = (-1)^n \prod_{i=1}^{n} (\lambda - \lambda_i).$$

Expanding this last formula, we find that the coefficient of $\lambda^{n-1}$ is $(-1)^{n-1}\sum_{i=1}^{n} \lambda_i$ and the constant term is $\prod_{i=1}^{n} \lambda_i$. Comparing this with (7.14), the result follows.
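Corollary 7.3.3 is easy to check on any example; a minimal numerical sketch, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
eigenvalues = np.linalg.eigvals(A)
# trace equals the sum of the eigenvalues, determinant equals their product
assert np.isclose(np.trace(A), eigenvalues.sum().real)
assert np.isclose(np.linalg.det(A), np.prod(eigenvalues).real)
```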

Theorem 7.3.4 (Spectrum of the Transpose)


Let A be a matrix in Mn .K/. Then, we have

.AT / D .A/:

Proof
This is clear, since

det.AT  In / D det.A  In /T D det.A  In /;

where we have used Theorem 2.3.1. t


u
We have seen in Theorems 2.3.2 and 2.3.4 that if A is a diagonal or triangular matrix,
then it is easy to find its determinant, because it is the product of the entries of the main
diagonal. As we show next, it is also not difficult to find the eigenvalues and eigenvectors
of a diagonal or triangular matrix and we have the following theorem.

Theorem 7.3.5 (Eigenvalues and Eigenvectors of Diagonal and Triangular


Matrices)
Let A be a diagonal or triangular matrix in Mn .K/. Then, the eigenvalues of A are the
entries of the main diagonal.

Proof
We prove the statement for diagonal matrices; the same argument applies to triangular
matrices. So, let
2 3
d1 0 0  0
6 7
6 0 d2 0    0 7
6 7
6 0 0 d3    0 7
ADDD6 7;
6 :: :: 7
6 : 7
4 : 5
0 0 0    dn

Then,
2 3
d1   0 0  0
6 7
6 0 d2   0  0 7
6 7
6 7
D  In D 6 0 0 d3    0 7:
6 : :: 7
6 : : 7
4 : 5
0 0 0    dn  

Consequently, the characteristic polynomial

p./ D det.D  In / D .d1  /.d2  /    .dn  /;

has the roots

i D di ; i D 1; : : : ; n:

This yields the desired result. t


u

Example 7.11
Consider the two matrices
2 3 2 3
1 0 0 1 2 5
6 7 6 7
A D 4 0 3 0 5 and B D 4 0 0 9 5:
0 0 4 0 0 5

Then, since A is diagonal, the eigenvalues of A are 1 D 1; 2 D 3 and 3 D 4. Also, since


B is triangular, the eigenvalues of B are 1 D 1; 2 D 0 and 3 D 5 

We have seen in Theorem 6.3.6 that similar matrices have the same rank. So, one
can ask: do similar matrices share the same spectrum? The answer turns out to be
affirmative:

Theorem 7.3.6 (Spectrum of Similar Matrices)


Let A and B be two similar matrices in Mn .K/. Then we have

.A/ D .B/: (7.18)

In addition, A and B have the same characteristic polynomial.

Proof
To show (7.18), we need to prove that .A/ .B/ and .B/ .A/. Since A and B are
similar matrices, (see Definition 6.2.3), there exists an invertible matrix P such that

B D P1 AP and A D PBP1 :

Let  be an eigenvalue of A, i.e., there exists X in Kn with X ¤ 0Kn such that AX D X. This
implies that PBP1 X D X. This gives

B.P1 X/ D .P1 X/:

Now, since X ¤ 0Kn and P1 is invertible, then Y D P1 X ¤ 0Kn and we have BY D Y.
Hence  is an eigenvalue of B and Y is its associated eigenvector. This shows that .A/
.B/. Conversely, let  be an eigenvalue of B, then there exists Y in Kn such that Y ¤ 0Kn
and BY D Y. This gives P1 APY D Y: This yields

A.PY/ D .PY/:

Hence,  is an eigenvalue of A and X D PY is its corresponding eigenvector. Thus, .B/


.A/.
It remains to show that A and B have the same characteristic polynomial. We have

pA ./ D det.A  In / D det.P.B  In /P1 / D det.B  In / D pB ./;

where we have used the fact that In D PIn P1 , (2.26), and (2.30). t
u

ⓘ Remark 7.3.7 The converse of Theorem 7.3.6 is not true. For example, the two
matrices
" # " #
10 12
A D I2 D and BD
01 01

have the same eigenvalues, but they are not similar. Indeed, assuming that there exists
an invertible matrix P such that

B D P1 AP;

we would have

B D P1 AP D P1 I2 P D I2

which is a contradiction, since B ¤ I2 .

ⓘ Remark 7.3.8 If  is an eigenvalue of A, i.e., there exists an X in Kn , X ¤ 0Kn , with

AX D X;

this gives

A2 X D A.AX/ D .AX/ D 2 X;

i.e., 2 is an eigenvalue of A2 . It easily follows, by induction, that if  is an eigenvalue


of A, then n is an eigenvalue of An for any positive integer n. In addition, if A is
invertible, then the statement is also true for negative integers n.

Example 7.12
The matrix
" #
2 1
AD
1 2

has the eigenvalues 1 D 3 and 2 D 1.



Then
" #
2 5 4
A D
4 5

which has the eigenvalues 1 D 9 D .3/2 and 2 D 1 D .1/2 . 

7.4 Diagonalization

We have seen before the importance of diagonal matrices. In this section, we look for
necessary conditions for a square matrix A to be diagonalizable (similar to a diagonal
matrix). We start with the following definition.

Definition 7.4.1 (Diagonalizable Matrix)


Let A be a matrix in Mn .K/. We say that A is diagonalizable if it is similar to a
diagonal matrix. That is, if there exists an invertible matrix P in Mn .K/ such that
the matrix

B D P1 AP

is diagonal.

Example 7.13
The matrix A in Example 7.8 is diagonalizable, since the matrix
2 3 2 32 32 3
10 0 0 2=5 1=5 200 0 10
6 7 6 76 76 7
B D 40 2 0 5 D 41 0 0 5 4 0 3 4 5 4 2 0 1 5 D P1 AP
0 0 11 0 1=5 2=5 049 1 02

with
2 3 2 3
0 2=5 1=5 0 10
6 7 6 7
P1 D 41 0 0 5; P D 4 2 0 1 5 ;
0 1=5 2=5 1 02

is a diagonal matrix. 

The diagonalization of a matrix has many important applications. We give here three
of them.
Powers of Diagonalizable Matrices
Suppose we have a square matrix A in Mn .K/ and we want to compute Ak for any k  0.
If A is diagonalizable, i.e., there exist a diagonal matrix B and an invertible matrix P such
that

B D P1 AP; or A D PBP1 ;

then, we have

Ak D AA A
„ƒ‚…
k times

D .PBP1 /.PBP1 /    .PBP1 /


D PB.P1 P/    .P1 P/BP1
D PBk P1 :

Since B is a diagonal matrix, computing Bk is trivial; see Theorem 1.2.12.
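As a small illustration of this power computation, here is a sketch assuming NumPy is available; the 2 × 2 matrix is only an example.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # diagonalizable, eigenvalues 1 and 3
vals, P = np.linalg.eig(A)               # columns of P are eigenvectors of A
k = 10
A_k = P @ np.diag(vals**k) @ np.linalg.inv(P)   # A^k = P B^k P^{-1}
assert np.allclose(A_k, np.linalg.matrix_power(A, k))
```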


Decoupling of a System of Differential Equations
Another very important application of diagonalization is the decoupling technique for
systems of differential equations. So, consider the system of differential equations

dX.t/
D AX.t/; (7.19)
dt

where A D .aij /; 1  i; j  n, with aij in R and X.t/ a vector in Rn ,


2 3
x1 .t/
6 : 7
X.t/ D 6
4 :: 5 :
7
xn .t/

That is, (7.19) has the form


8
ˆ dx1 .t/
ˆ
ˆ D a11 x1 .t/ C    C a1n xn .t/;
ˆ
< dt
::
ˆ : ::::::::::::::::::
ˆ
ˆ dxn .t/
:̂ D an1 x1 .t/ C    C ann xn .t/:
dt

Suppose A is a diagonalizable matrix, then there exist a diagonal matrix B and an


invertible matrix P such that B D P1 AP. Performing the change of variables

Y.t/ D P1 X.t/; X.t/ D PY.t/;



we obtain

dY.t/ dX.t/
D P1
dt dt
D P1 AX.t/
D P1 APY.t/
D BY.t/;

where B D .i /; 1  i  n, is the diagonal matrix which has on its main diagonal the
eigenvalues of A. The last system can be rewritten as
8
7 ˆ
ˆ
dy1 .t/
D 1 y1 .t/
ˆ
ˆ
< dt
::
ˆ : ::::::
ˆ
ˆ dyn .t/
:̂ D n yn .t/:
dt

This system is decoupled and each of its equations can be solved separately.
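The decoupling procedure can be sketched numerically as follows, assuming NumPy and SciPy are available; SciPy's matrix exponential is used only as an independent check of the decoupled solution.

```python
import numpy as np
from scipy.linalg import expm   # SciPy assumed, used only as a cross-check

A = np.array([[-3.0, 1.0],
              [ 1.0,-3.0]])     # diagonalizable, eigenvalues -2 and -4
X0 = np.array([1.0, 0.0])
t = 0.5
vals, P = np.linalg.eig(A)
# Decoupled solution: Y(t) = diag(exp(lambda_i * t)) Y(0), with Y = P^{-1} X
X_t = P @ np.diag(np.exp(vals * t)) @ np.linalg.inv(P) @ X0
assert np.allclose(X_t, expm(A * t) @ X0)
```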
The Solution of a System of Recurrence Sequences
In numerical analysis, we sometimes end up with a system of recurrence sequences, and
if the matrix representing this system is diagonalizable, then it is easy to write each term
of the resulting sequences as a function of n, as we show in the following example.

Example 7.14
Consider the two sequences .un / and .vn / defined for all n in N by the relations
(
unC1 D 2un C vn ;
(7.20)
vnC1 D un C 2vn

with u0 D 1 and v0 D 0. Find un and vn as functions of n for all n in N. 

Solution
We can write system (7.20) as
" # " # " #
21 un unC1
XnC1 D AXn ; with A D ; Xn D ; XnC1 D :
12 vn vnC1

The eigenvalues of A are 1 D 1 and 2 D 3. Thus A is diagonalizable and we have


" # " # " #
1 10 1 1 1 1=2 1=2
A D PBP ; with BD ; PD ; P D :
03 1 1 1=2 1=2
Consequently, for any n in N, we have
" #" #" #
n 1 1 0 1 1 1=2 1=2
A D PB P D
n
0 3n 1 1 1=2 1=2
2 1 3n 1 3n 3
C  C
6 2 2 2 2 7
D64 1 3n 1
7
3n 5 :
 C C
2 2 2 2

Consequently,
2 1 3n 1 3n 3 2 1 3n 3
C  C " # C
6 2 2 2 2 7 1 6 2 2 7
Xn D An X0 D 6
4 1 3n 1 3n
7 6
5 0 D4 1 3n
7:
5
 C C  C
2 2 2 2 2 2

Finally, for all n in N, we have

1 3n 1 3n
un D C ; vn D  C :
2 2 2 2
J
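The closed forms for un and vn are easy to verify with a few lines of plain Python (no external packages are needed):

```python
# Check u_n = 1/2 + 3^n/2 and v_n = -1/2 + 3^n/2 against the recurrences (7.20).
u, v = 1, 0                      # u_0 = 1, v_0 = 0
for n in range(12):
    assert u == (1 + 3**n) // 2 and v == (-1 + 3**n) // 2
    u, v = 2*u + v, u + 2*v      # the recurrences (7.20)
```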

Now we state the following result on diagonalizability.

Theorem 7.4.1 (Necessary and Sufficient Conditions for Diagonalization)


Let A be a matrix in Mn .K/. Then, A is diagonalizable if and only if the matrix A has
n linearly independent eigenvectors.

Proof
First, assume that A has n linearly independent eigenvectors u1 ; u2 ; : : : ; un . Then the matrix
P whose columns are these eigenvectors, that is

P D Œu1 ; u2 ; : : : ; un  (7.21)

is invertible, (see Exercise 4.5). We have

P1 AP D P1 ŒAu1 ; Au2 ; : : : ; Aun 

D P1 Œ1 u1 ; 2 u2 ; : : : ; n un 

D P1 Œu1 ; u2 ; : : : ; un B

D P1 PB D B;

where
2 3
1 0 0  0
6 7
6 0 2 0    0 7
6 7
6 0 7
BD6 0 0 3    7:
6 : :: :: 7
6 : : 7
4 : : 5
0 0 0    n

Consequently, A is similar to a diagonal matrix B i.e., A is diagonalizable.


Conversely, assume that A is diagonalizable, then there exists an invertible matrix P such
that the matrix B D P1 AP is diagonal. This yields
AP D PB: (7.22)

Let v1 ; v2 ; : : : ; vn be the column vectors of P. Thus,

P D Œv1 ; v2 ; : : : ; vn :

Formula (7.22) shows that

Avi D bi vi ; i D 1; 2; : : : ; n (7.23)

where bi is the ith diagonal entry of B. Since P is invertible, vi ¤ 0Kn ; i D 1; 2; : : : ; n.


Consequently, (7.23) shows that bi is an eigenvalue of A and vi is an associated
eigenvector. t
u

Combining Theorems 7.2.4 and 7.4.1, we deduce the following corollary.

ⓘ Corollary 7.4.2 Let A be a matrix in Mn .K/, all of whose eigenvalues are complete.
Then A is diagonalizable.

Example 7.15
Show that the matrix
2 3
1 2 0 0
6 7
60 1 2 07
AD6 7
40 0 1 25
0 0 0 1

is defective. 

Solution
The matrix A has one eigenvalue  D 1 with algebraic multiplicity equal to 4: this is easily
seen because A is a triangular matrix. By Definition 7.2.2 the geometric multiplicity of  D 1
is not equal to the algebraic multiplicity. Or, more importantly, we can use Corollary 7.4.2
to show that A is not diagonalizable, and thus is defective. Indeed, since  D 1 is the only
eigenvalue of A, then assuming that A is diagonalizable, there exists an invertible matrix P
such that

A D PI4 P1 D I4 :

This is a contradiction, since A ¤ I4 , and so A is not diagonalizable and therefore is


defective. J

ⓘ Remark 7.4.3 The transition matrix P defined in (7.21) is also called the eigenvector
matrix and it is not unique, since the eigenvectors are not unique. For instance, in
Example 7.13, if we multiply the first column by 2, then we get
2 3
0 10
6 7
Q D 4 4 0 1 5 ;
2 02

and Q also satisfies

B D Q1 AQ:

The following result is important in applications.

Theorem 7.4.4 (Diagonalization and Commuting Matrices)


Let A and B be two diagonalizable matrices in Mn .K/. Then, A and B share the same
eigenvector matrix P if and only if they commute, that is AB D BA.

Proof
First, assume that A and B are diagonalizable, with the same eigenvector matrix P. Then there
exist two diagonal matrices, D1 and D2 , such that

A D PD1 P1 and B D PD2 P1 :

Hence,

AB D .PD1 P1 /.PD2 P1 / D PD1 D2 P1 :

On the other hand, we have

BA D .PD2 P1 /.PD1 P1 / D PD2 D1 P1 :



Since D1 and D2 are diagonal matrices, D1 D2 D D2 D1 (diagonal matrices always commute),


and therefore AB D BA.
Conversely, assume that A and B are diagonalizable and AB D BA. Let X be an
eigenvector of A, that is

AX D X (7.24)

for some  in K (we assume that  ¤ 0K , since for  D 0K the result is trivial). We need to
show that X is also an eigenvector of B. Indeed, we have

ABX D BAX D BX D BX: (7.25)

From (7.24) and (7.25) we deduce that both X and BX are eigenvectors of A sharing the
same eigenvalue  (unless BX D 0Kn , which is not the case since  ¤ 0K and X ¤ 0Kn ).
If  is a simple eigenvalue, then the eigenspace V./ is a one-dimensional vector space, so
necessarily BX D X, for some  in K with  ¤ 0K (since X and BX must be linearly
dependent). Therefore, X is an eigenvector of B with corresponding eigenvalue . We leave
the case where the eigenvalues are not simple to the reader, since it requires some additional
work.
t
u

7.4.1 Spectrum of Symmetric Matrices

In this section, we study some important properties of symmetric matrices. First, we


show that for real symmetric matrices all the eigenvalues are real. In this case, it is easy
to determine the signs of the eigenvalues (which is very important, for instance, in the
stability theory of differential equations). We also prove that symmetric matrices are
diagonalizable. Therefore, all the properties of diagonalizable matrices can be carried
over to symmetric matrices.

Theorem 7.4.5 (Eigenvalues of Real Symmetric Matrices)


Let A be a symmetric matrix in Mn .C/, with all its entries real numbers. Then all the
eigenvalues of A are real or, equivalently, its characteristic polynomial splits over R.

Proof
By Definition 1.4.2, the matrix A is symmetric if and only if A D AT . Now, let  be an
eigenvalue of A. To show that  is real, we need just to prove that N D , where N is the
complex conjugate of . Since  is an eigenvalue of A, there exists u in Cn with u ¤ 0Cn such
that Au D u: Then, we can take the conjugate to get AN N u. Since the entries of A are
N u D N
real, we have AN D A. Hence,

N u:
ANu D N

Now, by taking the transpose of this equation and using Theorem 1.4.1, we get

N uT :
uN T AT D uN T A D N

Next, multiplying both sides of the above equation from the right by u, we get

N uT u:
uN T Au D N

This yields

N uT u:
uN T .u/ D NuT u D N

Since u ¤ 0Cn , then uN T u ¤ 0. This gives N D . t


u

The following theorem is extremely important since it can be used in many


applications.

Theorem 7.4.6 (Eigenvalues of a Symmetric Matrix)


If A is a symmetric matrix with real entries, then all its eigenvalues are complete. Therefore,
it is diagonalizable.

The proof of Theorem 7.4.6 is beyond the scope of this book and can be found in
advanced linear algebra textbooks.

Example 7.16
Consider the matrix
2 3
13 4 2
6 7
4 4 13 2 5 :
2 2 10

Show that A is diagonalizable. 

Solution
It is not hard to check that A has the eigenvalues 1 D 9; 2 D 9, and 3 D 18. Consequently,
1 has algebraic multiplicity 2. Now, since A is symmetric, according to Theorem 7.4.6,
the geometric multiplicity of 1 is also equal to 2. This can be easily seen by computing the

eigenvectors of A associated to 1 D 9, which yields


23 2 3
1 1
6 7 6 7
u1 D 4 0 5 ; u2 D 4 1 5 :
2 0

The two vectors u1 and u2 are linearly independent, showing that the geometric multiplicity
of 1 is equal to 2. Hence, A is diagonalizable. J
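For real symmetric matrices, numerical libraries provide specialized routines. Assuming NumPy is available, numpy.linalg.eigh applied to the matrix of Example 7.16 (with its minus signs restored, which are dropped in this copy) returns real eigenvalues and an orthonormal eigenvector basis, in line with Theorems 7.4.5 and 7.4.6.

```python
import numpy as np

A = np.array([[13, -4,  2],
              [-4, 13, -2],
              [ 2, -2, 10]], dtype=float)
vals, Q = np.linalg.eigh(A)      # real eigenvalues, orthonormal eigenvectors
print(np.round(vals, 6))         # [ 9.  9. 18.]
assert np.allclose(Q @ np.diag(vals) @ Q.T, A)   # A = Q D Q^T
```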

7.5 Triangularization and the Jordan Canonical Form


In this section, we discuss the case of non diagonalizable matrices (endomorphisms).
So, we try to find simple forms for those matrices (endomorphisms) that can be easily
handled.

7.5.1 Triangularization of an Endomorphism

We have seen above that if a matrix A is complete (that is, all its eigenvalues are
complete), then it is similar to a diagonal matrix (diagonalizable), with the eigenvector
matrix providing the transition matrix. In this case, the matrix satisfies some nice
properties that are inherited from the properties of diagonal matrices. Now, if A is not
diagonalizable (as in the defective case), is it possible for A to be similar to a triangular
matrix, for instance? That is to say, is there an invertible matrix M such that the matrix
B D M 1 AM is triangular? If such a matrix M exists, then we say that the matrix A is
triangularizable. Of course in this case we do not expect to have all the nice properties
of diagonalizable matrices, but at least we can keep some of them. So, in this section, we
will be looking for the necessary assumptions that a defective matrix A should satisfy in
order to be triangularizable. We start with the following definition.

Definition 7.5.1 (Triangulability)


1. Let E be a finite-dimensional vector space over a field K. Let f be an
endomorphism of E. Then, f is said to be triangularizable if there exists a basis
B in E such that the matrix M. f / associated to f in the basis B is triangular.a
2. A matrix A in Mn .K/ is said to be triangularizable if it is similar to a triangular
matrix. That is, if there exists an invertible matrix M such that the matrix
B D M 1 AM is triangular.
a
See Definition 1.2.5.

Now, we want to find a triangularizability criterion for matrices. We state the


theorem for matrices, but the same thing can be said for endomorphisms.

Theorem 7.5.1 (Necessary and Sufficient Triangularizability Conditions)


Let A be a matrix in Mn .K/. Then, A is triangularizable if and only if its characteristic
polynomial has n zeros (counted with their algebraic multiplicity) in K.

Proof
First assume that the characteristic polynomial pA ./ of A has n zeros in K. We prove by
induction that A is similar to a triangular matrix.
For n D 1, it is clear that a matrix of order 1 is triangular, and thus triangularizable.
Now, assume that all matrices of order n whose characteristic polynomials have n zeros
in K are triangularizable. Let A be a matrix of order n C 1 for which pA has n C 1 zeros in
K. Then there exist at least an eigenvalue  in K and an eigenvector Y in KnC1 associated
to it, i.e., Y ¤ 0KnC1 and AY D Y. Now, according to Theorem 4.6.8, there exist vectors
X1 ; X2 ; : : : ; Xn in KnC1 such that the set fY; X1 ; : : : ; Xn g forms a basis in KnC1 . Consider the
matrix P1 with Y; X1 ; : : : ; Xn as its column vectors. That is,

P1 D ŒY; X1 ; : : : ; Xn :

This matrix P1 is the transition matrix from the standard basis of KnC1 to the basis
fY; X1 ; : : : ; Xn g. It is clear that P1 is invertible and we have
2 3
  ::: 
6 7
6 7 " #
6 7  VT
6 7
A1 D P1 AP1 D 6 0
1
7D ;
6: 7 0n1 B
6: 7
4: B 5
0

where B is a matrix in Mn .K/ and V is a vector in Kn . It is clear from the hypothesis


that pB has n zeros in K. Thus, applying the induction hypothesis, we deduce that B is
triangularizable. Thus, there exists an invertible matrix Q in Mn .K/ such that the matrix
A2 D Q1 BQ is triangular. We consider the matrix P2 defined as
2 3
1 0 ::: 0
6 7
6 7 " #
6 7 1 0Tn1
6 7
P2 D 6 0 7 D :
6: 7 0n1 Q
6: 7
4: Q 5
0

It is clear that P2 is invertible and we have


2 3
1 0 ::: 0
6 7
6 7 " #
6 7 1 0Tn1
6 7
P1
2 D 60 7D :
6: 7 0n1 Q1
6: 7
4: Q1 5
0

We have
" #" #" # " #
1 0Tn1  VT 1 0Tn1  VTQ
T D P1
2 A1 P2 D D :
0n1 Q1 0n1 B 0n1 Q 0n1 A2
This last matrix is triangular and we have

T D M 1 AM; with M D P1 P2 :

Conversely, suppose A is triangularizable, i.e., there exist a triangular matrix K and an


invertible matrix ƒ such that
2 3
1      
6 7
6 0 2      7
6 7
6 7
K D ƒ1 Aƒ D 6 0 0 3     7 ;
6 : : : 7
6 : : : :: 7
4 : 5
0 0 0    n

where 1 ; 2 ; : : : ; n are the eigenvalues of A. Then

pK ./ D .  1 /.  2 /    .  n /:

Hence, since pK ./ D pA ./ (Theorem 7.3.6), we deduce that pA ./ has n zeros in K. This
finishes the proof of Theorem 7.5.1. t
u

Example 7.17
Consider the matrix
2 3
2 1 2
6 7
A D 4 15 6 11 5 :
14 6 11

Show that A is not diagonalizable, but it is triangularizable. 


Solution
We compute the eigenvalues of A. We have

det.A  I3 / D pA ./ D .  1/3 :

Thus, A has one eigenvalue  D 1 with algebraic multiplicity 3. We have only one
eigenvector (up to a constant multiple) associated to , namely
2 3
1
6 7
X D 415:
2

Thus, the geometric multiplicity of this eigenvalue is equal to 1. Hence, A is defective and therefore not
diagonalizable. But since pA has three zeros (one zero with algebraic multiplicity equal to 3)
in R, then A is triangularizable. One can also verify that
3 2 2 3 2 3
100 11 0 1 0 0
6 7 6 7 6 7
A D M 1 BM; with M D 4 1 3 2 5 ; B D 4 0 1 1 5 ; M 1 D 4 3 1 2 5 :
221 00 1 4 2 3

There are several methods for finding the matrix M. We will not discuss those
methods now. J

According to the theorem of d’Alembert, that says that each polynomial of order n
with coefficients in C has n zeros in C, we deduce from Theorem 7.5.1, the following
corollary.

ⓘ Corollary 7.5.2 Let A be a matrix in Mn .C/. Then A is triangularizable.

7.5.2 The Jordan Canonical Form

We have defined in (1.45) the exponential of a matrix A in Mn .K/, as

A2 A3 Ak
eA D I C A C C C  C C ::: (7.26)
2Š 3Š kŠ

One of the challenging problems, for instance in the study of differential equations, is to
compute (7.26). For a diagonal matrix D D diag.d1 ; d2 ; : : : ; dn /, this is simple and we
obtain
2 3
ed1 0 0  0
6 0 ed2 0    0 7
6 7
6 7
e D6
D
6
0 0 ed3    0 7 D diag.ed1 ; ed2 ; : : : ; edn /:
7
6 :: :: :: 7
4 : : : 5
0 0 0  edn

If A is not a diagonal matrix, but is diagonalizable, that is

A D PDP1 ;

then we can also compute eA as

eA D PeD P1 :

Another case is when we can write the matrix A as the sum of two matrices,

A D D C N;

7 where D is a diagonal matrix and N (called a nilpotent matrix, see Exercise 1.2) satisfies

N k0 D 0;

(here zero is the n  n matrix with all its entries equal to zero) for some positive integer
k0 > 0: The above decomposition is also known as the Dunford decomposition.
Here the simplicity of the method depends on the smallness of k0 . In this case, D
commutes with any other square matrix of the same order. In particular, DN D ND.
Thus, applying the formula eDCN D eD eN D eN eD ,1 we obtain

eA D eN eD
 
N2 Ak0 1
D ICNC CC diag.ed1 ; ed2 ;    ; edn /: (7.27)
2Š .k0  1/Š

Example 7.18
Find eA for
" #
ab
AD ;
0a

where a and b are scalars. 

Solution
The matrix A can be written as A D D C N with
" # " #
a0 0b
DD and ND :
0a 00

1
This formula holds only for commuting matrices.
It is clear that $N^2 = 0$. Thus, applying formula (7.27), we get

$$e^A = e^a (I + N) = e^a \begin{bmatrix} 1 & b\\ 0 & 1 \end{bmatrix} = \begin{bmatrix} e^a & b e^a\\ 0 & e^a \end{bmatrix}.$$

J
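Assuming SciPy is available, the formula of Example 7.18 can be checked against a general-purpose matrix exponential routine:

```python
import numpy as np
from scipy.linalg import expm    # SciPy assumed

a, b = 0.5, 2.0                  # illustrative values of the scalars a and b
A = np.array([[a, b],
              [0.0, a]])
# e^A = e^a [[1, b], [0, 1]], as derived above
assert np.allclose(expm(A), np.exp(a) * np.array([[1.0, b], [0.0, 1.0]]))
```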

Example 7.19
Find the Dunford decomposition of the matrix
2 3
110
6 7
A D 40 1 15:
001

Solution
We can simply write A as
2 3 2 3
100 010
6 7 6 7
A D 4 0 1 0 5 C 4 0 0 1 5 D I3 C N:
001 000

Clearly,
2 3
001
6 7
N2 D 4 0 0 0 5
000

and N 3 D 0. Thus, N is nilpotent and NI3 D I3 N. Hence, the above decomposition is the
Dunford decomposition of the matrix A. J

The problem now is to find eA when A is not diagonalizable. It turns out that there are
several methods of how to do this (at least 19 of them, see [19]). One of these methods
is to reduce the matrix to the so-called Jordan canonical form.
We can easily see that a matrix of the form
2 3
 1 0  0
6 7
6 0  1 ::: 07
6 7
6 : 7
J./ D 6
6 0 0  :: 07
7 (7.28)
6: 7
6: :: 7
4: : 15
0 0 0  

has the Dunford decomposition

J./ D In C N; (7.29)

where In is the identity matrix and N is the matrix


2 3
0 1 0  0
6 7
6 0 0 1 ::: 0 7
6 7
6 : 7
6
N./ D 6 0 0 0 : 0 7 : 7:
6: 7
6: :: 7
4: : 15
7 0 0 0  0

Clearly, the matrix N is nilpotent and thus we can compute the exponential of the matrix
J as we have shown above. The matrix J./ is called a Jordan block.

Definition 7.5.2 (Jordan Block)


A Jordan block with value  is a square, upper triangular matrix whose entries are
all equal to  on the diagonal, all equal to 1 immediately above the diagonal, and
equal to 0 elsewhere as in (7.28).

Thus, we know how to compute the exponential of a Jordan block using the Dunford
decomposition. Hence, if we can show that there exists an invertible matrix P such that

A D PJP1 ; (7.30)

where J is the block diagonal matrix


2 3
J.1 / 0    0
6 7
6 0 J.2 /    0 7
JD6 6 :: :: : : : 77 (7.31)
4 : : : :: 5
0 0    J.` /

where Ji D J.i /; 1  i  `  n, are Jordan blocks, then we can compute eA . For


example, every diagonal matrix is a matrix in the Jordan canonical form with each
Jordan block a 1  1 block.
Writing A in the form (7.30) leads eventually to the computation of eA by simply
computing eJ as
2 3
eJ.1/ 0  0
6 7
6 0 eJ.2 /  0 7
eJ D 6
6 :: :: :: : 7
7
4 : : : :: 5
0 0    eJ.` /
and hence it remains to compute the exponential of each Jordan block. However,
the method of obtaining the matrix J requires detailed knowledge of the generalized
eigenvectors for each eigenvalue of A, which are vectors X satisfying

.A  I/k X D 0Kn

for some positive integer k, and also requires some knowledge of the generalized
eigenspace of ,
˚
V./ D Vj .A  I/k X D 0Kn ; for some k :

So now, the question is: when does the matrix J exist? Here is a theorem that answers
this question.

Theorem 7.5.3 (Existence of the Jordan Form Matrix)


Let A be a matrix in Mn .C/ with ` distinct eigenvalues 1 ; 2 ; : : : ; ` . For each i ; 1 
i  `, denote its algebraic multiplicity by mi and its geometric multiplicity by i . Then
A is similar to a Jordan form matrix of the form (7.31), where
▬ ` D 1 C 2 C    C ` .
▬ For each i , the number of Jordan blocks in J with value i is equal to i .
▬ i appears on the diagonal of J exactly mi times.

Furthermore, the matrix J is unique, up to re-ordering the Jordan blocks on the


diagonal.

The proof of Theorem 7.5.3 is very technical, and we omit it here. In fact several
proofs are available; for a discussion of these proofs we refer the reader to the paper
[31] and references therein.
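In practice the Jordan form is usually obtained with a computer algebra system. Here is a sketch assuming SymPy is available, applied to the defective matrix of Example 7.15:

```python
import sympy as sp

A = sp.Matrix([[1, 2, 0, 0],
               [0, 1, 2, 0],
               [0, 0, 1, 2],
               [0, 0, 0, 1]])
P, J = A.jordan_form()           # A = P * J * P**(-1)
sp.pprint(J)                     # a single 4x4 Jordan block with value 1
assert sp.simplify(P * J * P.inv() - A) == sp.zeros(4, 4)
```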
Next, we formulate one of the important theorems in linear algebra.

Theorem 7.5.4 (Cayley–Hamilton Theorem)


Let A be a matrix in Mn .K/ and let

pA ./ D a0 C a1  C    C an n

be the characteristic polynomial of A. Then, we have

pA .A/ D a0 In C a1 A C    C an An D 0:

Proof
First, if A is a diagonal matrix, that is A D diag.1 ; 2 ; : : : ; n /, then

pA ./ D det.A  In / D .1/n .  1 /.  2 /    .  n /:

Hence,

pA .A/ D .1/n .A  1 In /.A  2 In /    .A  n In /


D .1/n diag.0; 2  1 ; : : : ; n  1 /    diag.1  n ; 2  n ; : : : ; n1  n ; 0/

D 0:

Second, if A is diagonalizable, i.e., there exists a diagonal matrix B such that A D PBP1 ,
then

pA .A/ D PpB .B/P1 D 0;

since pB .B/ D 0:
Third, in general, if A is not diagonalizable, then, according to Theorem 7.5.3, A is
similar to a matrix J of Jordan form. In this case, we have J D PAP1 and the characteristic
polynomial of J can be written as

pJ ./ D .1/n .  1 /m1 .  2 /m2    .  ` /m` :

Since J is a block diagonal matrix, it suffices to show that pJ .Ji / D 0. Indeed, we have

pJ .Ji / D .1/n .Ji  1 I/m1 .Ji  2 I/m2    .Ji  ` I/m` :

We can easily see that the matrix Ji  i I has the form


2 3
0 1 0  0
6 7
6 : 7
6 0 0 1 :: 07
6 7
6 7
Ji  i I D 6 0 0 0 : : : 07 :
6 7
6: :: 7
6: : 7
4: 15
0 0 0  0

This is an mi  mi nilpotent matrix. Thus, .Ji  i I/mi D 0. Consequently, we deduce that

pJ .Ji / D 0:

Hence, we obtain as before pA .A/ D 0: This completes the proof of Theorem 7.5.4. t
u
Example 7.20
Consider the matrix
" #
12
AD :
34

Then, we have

pA ./ D 2  5  2:

Moreover,
" #
2 7 10
A D :
15 22

Now, we can easily check that

A2  5A  2I2 D 0:
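A short numerical illustration of the Cayley–Hamilton theorem, assuming NumPy is available; np.poly returns the coefficients of det(λI − A), whose roots are the eigenvalues, so evaluating that polynomial at A must give the zero matrix.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
I2 = np.eye(2)
# p_A(lambda) = lambda^2 - 5*lambda - 2, as computed in Example 7.20
assert np.allclose(A @ A - 5*A - 2*I2, np.zeros((2, 2)))

# The same check for a random 4x4 matrix, with the characteristic-polynomial
# coefficients taken from np.poly (highest degree first, leading coefficient 1).
B = np.random.default_rng(1).standard_normal((4, 4))
coeffs = np.poly(B)
pB = sum(c * np.linalg.matrix_power(B, len(coeffs) - 1 - i)
         for i, c in enumerate(coeffs))
assert np.allclose(pB, np.zeros((4, 4)))
```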

7.6 Exercises

Exercise 7.1 (Spectrum of Commuting Matrices)


Let A and B be matrices in Mn .K/. We say that A and B commute if AB D BA. Show that if
A and B commute, then

.AB/  f0K g D .BA/  f0K g;

meaning that AB and BA have the same nonzero eigenvalues. 

Solution
We need to show that

.AB/  f0K g .BA/  f0K g and .BA/  f0K g .AB/  f0K g:

So, let  be an eigenvalue of AB with  ¤ 0K . Then there exists an eigenvector X in Kn ,


X ¤ 0Kn , such that

.AB/X D X:

We have

BA.BX/ D B.AB/X D B.X/ D BX:

Thus, BX is an eigenvector of BA associated to , with BX ¤ 0Kn (since  ¤ 0K and


X ¤ 0Kn ). This yields

.AB/  f0K g .BA/  f0K g:

Since A and B play a symmetric role, we also have

.BA/  f0K g .AB/  f0K g:


J

Exercise 7.2 (Matrix of Rank 1)


Let A be a matrix in Mn .K/.
1. Show that if rank.A/ D k; k  n, then A has the eigenvalue  D 0 with multiplicity at least
n  k.
2. Deduce that if rank.A/ D 1 and tr.A/ ¤ 0, then  D tr.A/ is an eigenvalue of multiplicity
1 and if tr.A/ D 0, then  D 0 is the only eigenvalue of A of multiplicity n.

Solution
1. Since rank.A/ D k, then it is clear from Theorem 6.4.2, that all minors of A of order strictly
bigger than k are zero. Hence, in this case, the characteristic polynomial of A reads as

p./ D .1/n n C .1/n1 tr.A/n1 C an2 n2 C    C ank nk ;

which has  D 0 as a root of order at least n  k.


2. Applying .1/ for k D 1, we deduce that the characteristic polynomial is

p./ D .1/n n C .1/n1 tr.A/n1 :

Hence, if tr.A/ ¤ 0, then  D tr.A/ is a root of p./, otherwise  D 0 is a root of


multiplicity n. J

Exercise 7.3 (Properties of Nilpotent Matrix)


Let N be a square matrix in Mn .K/.
1. Show that N is nilpotent if and only if all its eigenvalues are equal to 0.
2. Show that if N is a nilpotent matrix, then tr.N/ D 0.
3. Show that if N is a nilpotent matrix, then N is similar to a strictly upper or strictly lower
triangular matrix (a triangular matrix whose main-diagonal entries are all equal to zero).
4. Show that a nilpotent matrix remains nilpotent with respect to any basis.

Solution
1. First, if all the eigenvalues of N are equal to 0, then its characteristic polynomial is

pN ./ D det.N  In / D .1/n n :

Then using the Cayley–Hamilton theorem (Theorem 7.5.4), we have

pN .N/ D .1/n N n D 0:

Hence, N n D 0, i.e., N is nilpotent.


Conversely, assume that N is nilpotent, then, we have N k D 0 and N k1 ¤ 0, for some
positive integer k  1. Let  be an eigenvalue of N, i.e., there exists X in Kn , X ¤ 0Kn such
that

NX D X:

Hence,

N k X D k X:

This means that k is an eigenvalue of N k and since N k is the zero matrix, k D 0K . This
yields  D 0K .
2. Since N is nilpotent, all its eigenvalues are equal to zero. Then, using Corollary 7.3.3, we have

X
n
tr.N/ D i D 0:
iD1

3. Since N is nilpotent, its characteristic polynomial has one root equal to 0, with
algebraic multiplicity n. Then according to Theorem 7.5.1, N is triangularizable. That is,
there exist an invertible matrix P and a triangular matrix T such that N D P1 TP. Since
N and T are similar matrices, they share the same eigenvalues, (Theorem 7.3.6). Since, the
eigenvalues of a triangular matrix are on its main diagonal, the entries of the main diagonal
of T are all equal to zero (because the eigenvalues of N are all equal to zero). Thus, the matrix
T must be strictly upper or lower triangular.
4. Let B1 be a basis of Kn and N be a nilpotent matrix with respect to the basis B1 . Let
B2 be another basis of Kn and P be the transition matrix from B1 to B2 . Then, there exists a
matrix B such that

B D P1 NP:

Since N is nilpotent, there exists a positive integer k such that N k D 0. Then

Bk D P1 N k P D 0;

so B is nilpotent. J

Exercise 7.4 (Nilpotent Matrix and Determinant)


Let A and N be two matrices in Mn .K/. Assume that N is nilpotent and that AN D NA.
1. Show that if A is invertible, then A C N is invertible.
2. Show that

det.A C N/ D det.A/:
Study first the case where A is invertible.

Solution
1. Assume that A is invertible. Then, A1 N is nilpotent, since if for some k0 we have N k0 D 0,
then (since A1 also commutes with N)

.A1 N/k0 D Ak0 N k0 D 0:

Consequently (see Exercise 1.2), In C A1 N is invertible and

A C N D A.In C A1 N/

is invertible, since it is the product of two invertible matrices.


2. If A is invertible, then from above, we have that

det.A C N/ D det.A/ det.In C A1 N/:

We claim that

det.In C A1 N/ D 1:

Indeed, we proved in Exercise 7.3 that a nilpotent matrix is similar to a triangular matrix in
which all the entries of the main diagonal are equal to zero. We denote this triangular matrix
by T. Thus, there exists an invertible matrix P such that A1 N D PTP1 . Then, we have

In C A1 N D PIn P1 C PTP1 D P.In C T/P1 :

Consequently, the matrix In C A1 N is similar to the matrix In C T, and thus det.In C A1 N/ D
det.In C T/ (similar matrices have the same determinant). Since In C T is a triangular matrix
with all the entries on the main diagonal are equal to 1, then, det.In C T/ D 1. This proves
the claim.
If A is not invertible, then det.A/ D 0 (see Theorem 2.4.8). If det.A C N/ ¤ 0, then
A C N is invertible. In addition, .A C N/ commutes with N, which is a nilpotent matrix. So,
applying what we have proved above, we find that

det.A/ D det.A C N  N/ D det.A C N/ ¤ 0:

This is a contradiction. Thus, one necessarily has det.A C N/ D det.A/ D 0. J

Exercise 7.5
Let E D Pn be the vector space of all polynomials with real coefficients and of degree less
than or equal to n. Consider the endomorphism f in L .E/ defined for any p in E as

f . p/ D .x C 1/.x  3/p0  xp:

Find the eigenvalues and eigenvectors of f . 

Solution
Let p be an element of E such that p ¤ 0E . Then, for  in R,

f . p/ D p;

implies that

.x C 1/.x  3/p0  .x C /p D 0: (7.32)

By looking at the highest-degree terms in this relation, we deduce that p has degree 1. Therefore,


p.x/ D ax C b. Plugging this into (7.32), we obtain
(
2a C a C b D 0;
3a C b D 0;

or equivalently,
(
b D .2 C /a;
.2 C 2  3/a D 0:

Since we assumed that p ¤ 0E , we deduce that 1 D 1 and 2 D 3. Now, for 1 D 1, we


get b D 3a and the eigenvector associated to 1 is p1 .x/ D x  3. For 2 D 3, we have
b D a and the eigenvector associated to 2 is p2 .x/ D x C 1. J

Exercise 7.6
Let A and B be two matrices in Mn .R/. Assume that AB  BA D A.
1. Show that for any k in N, we have

Ak B  BAk D kAk :

2. Deduce that A is nilpotent.

Solution
1. It is clear that the above identity is true for k D 0. Now, for any k in N  f0g, we have
Ak B  BAk D Ak B  Ak1 BA C Ak1 BA  Ak2 BA2 C Ak2 BA2

     ABAk1 C ABAk1  ABk


X
k1
D .Aki BAi  Aki1 BAiC1 /
iD0

X
k1
D Aki1 .AB  BA/Ai
iD0

X
k1
D Aki1 AAi
iD0

D kAk :

2. Define the endomorphism f as follows:

f W Mn .R/ ! Mn .R/

K 7! KB  BK:

We have, for any k in N  f0g,

f .Ak / D kAk :

Now, if Ak ¤ 0, then Ak is an eigenvector of f associated to the eigenvalue  D k.


Consequently, if Ak ¤ 0 for every k, then f would have infinitely many distinct eigenvalues.
This is impossible, since dimR Mn .R/ D n2 < 1. Hence, Ak D 0 for some k, that is, A is nilpotent. J

Exercise 7.7 (Minimal Polynomial)


Let A be a matrix in Mn .K/.
1. Show that there exists a polynomial p such that p.A/ D 0. In this case we say that the
polynomial p annihilates the matrix A.
2. The minimal polynomial of A is the smallest monic (the term of highest degree equals
1) polynomial mA that satisfies mA .A/ D 0. Show that the minimal polynomial of A is
unique.
3. Show that the degree of mA is less than or equal to n.
4. Show that if p is a polynomial such that p.A/ D 0, then mA divides p.
5. Prove that  is an eigenvalue of A if and only if  is a root of mA .
6. Show that if the characteristic polynomial pA of A has the form


  pA(λ) = ∏_{i=1}^{ℓ} (λ − λi)^{a_i},

with a1 + a2 + ⋯ + aℓ = n, then the minimal polynomial mA of A has the form

  mA(λ) = ∏_{i=1}^{ℓ} (λ − λi)^{m_i},        (7.33)

where 1 ≤ mi ≤ ai.
7. Use the result in (6) to find the minimal polynomial of the matrix
2 3
3 1 1
6 7
A D 41 0 15:
1 1 2

8. Show that if A and B are similar matrices in Mn .K/, then they have the same
characteristic polynomial.

Solution
1. We have seen that dim_K Mn(K) = n². Consequently, any set of n² + 1 elements of Mn(K) is a linearly dependent set (Lemma 4.6.4). So, consider the set {I, A, A², ..., A^{n²}}. This set contains n² + 1 elements, so it is linearly dependent. Therefore, there exist a0, a1, a2, ..., a_{n²} elements in K, not all zero, such that

  a0 I + a1 A + a2 A² + ⋯ + a_{n²} A^{n²} = 0.        (7.34)

So, if we consider the polynomial p(x) defined by

  p(x) = a0 + a1 x + a2 x² + ⋯ + a_{n²} x^{n²},

then, by (7.34), p(A) = 0.


2. If there are two minimal polynomials m1 .x/ and m2 .x/ of A, then they should have
the same degree r and both have the coefficient of the leading term (the term of the highest

degree) equal to 1. Thus,

m1 .x/ D a0 C a1 x C    C ar1 xr1 C xr

and

m2 .x/ D b0 C b1 x C    C br1 xr1 C xr :

Then m(x) = m1(x) − m2(x) is a polynomial of degree at most r − 1 and it can be written as

  m(x) = c0 + c1 x + ⋯ + c_{r−1} x^{r−1}, with ci = ai − bi, i = 0, ..., r − 1.

Thus, we have

  m(A) = c0 I + c1 A + ⋯ + c_{r−1} A^{r−1} = 0.

Hence, ci = 0_K for all i = 0, ..., r − 1, since otherwise m(x), once divided by its leading coefficient, would be a monic polynomial of degree smaller than r annihilating A, which contradicts the minimality of mA.
3. We have seen that the characteristic polynomial pA of A is of degree n. Also, the
Cayley–Hamilton theorem (Theorem 7.5.4) gives pA .A/ D 0. This gives us the desired result,
by the definition of mA .
4. Let p be a polynomial such that p.A/ D 0. Then the degree of m is less than or equal
to the degree of p. In this case we can write p as p D qmA C r, where q and r are two
polynomials with the degree of r strictly less than the degree of mA . Hence,

r.A/ D p.A/  q.A/mA .A/ D 0:

This contradicts the minimality of mA unless r is the zero polynomial. Thus, mA divides p.
5. First, assume that  is an eigenvalue of A, then we have AX D X, for some X ¤ 0Kn .
Thus, as we know, A^k X = λ^k X for any integer k ≥ 0. Hence, for any polynomial p, we have p(A)X = p(λ)X. In particular, mA(A)X = mA(λ)X. But since mA(A) = 0 and X ≠ 0_{K^n}, we
deduce that mA ./ D 0K .
Conversely, if  is a root of mA , then according to (4),  is also a root of pA , since mA
divides pA . Thus,  is an eigenvalue of A.
6. Since pA .A/ D 0, from (4), we deduce that mA divides pA . Therefore, the roots of pA
should be the roots of mA with different multiplicity. Indeed, since pA D qmA , if  is a root
of mA , then it is clear that  is also a root of pA . Conversely, if  is a root of pA , then  is an
eigenvalue of A, hence, according to (5),  is a root of mA . Thus, the only possibility for mA
is to have the form (7.33).
7. The characteristic polynomial of A is

pA ./ D .  2/2 .  1/:


Thus, we have two possibilities for the minimal polynomial mA ./:

mA ./ D .  2/.  1/ or mA ./ D .  2/2 .  1/:

First, we compute .A  2I/.A  I/. If this matrix is the zero matrix, then the minimal
polynomial is mA ./ D .  2/.  1/. Otherwise, mA ./ D .  2/2 .  1/. We may
easily check that .A  2I/.A  I/ ¤ 0, so indeed mA ./ D .  2/2 .  1/.
8. Since A and B are similar matrices, there exists an invertible matrix S such that B D
S1 AS. Then

mA .B/ D mA .S1 AS/


D S1 mA .A/S

D 0:

So, if there is a minimal polynomial of B of a smaller degree, say mB , then we have, by the
same argument, mB .A/ D 0 which contradicts the minimality of mA . Thus, we conclude that
mA is a minimal polynomial for B, and since the minimal polynomial is unique, we deduce
that mA D mB . J
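The strategy of question 7, namely testing the divisors of the characteristic polynomial from lowest degree upward until one of them annihilates the matrix, is easy to carry out numerically. The following short sketch (Python with NumPy; the matrix used here is a hypothetical illustration chosen for its known minimal polynomial, not the matrix of question 7) shows the idea.

  import numpy as np

  # Hypothetical example: one Jordan block of size 2 for the eigenvalue 2 and a
  # 1x1 block for the eigenvalue 1, so that p_A(t) = (t - 2)^2 (t - 1).
  A = np.array([[2.0, 1.0, 0.0],
                [0.0, 2.0, 0.0],
                [0.0, 0.0, 1.0]])
  I = np.eye(3)

  # Candidate m_A(t) = (t - 2)(t - 1): does it annihilate A?
  print(np.allclose((A - 2*I) @ (A - I), 0))              # False
  # Candidate m_A(t) = (t - 2)^2 (t - 1): this one does.
  print(np.allclose((A - 2*I) @ (A - 2*I) @ (A - I), 0))  # True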

Exercise 7.8 (Minimal Polynomial and Jordan Canonical Form)


Let A be a matrix in Mn .K/. Let 0 be an eigenvalue of A. We define the Jordan block of
order m0 associated to 0 as
  J(λ0) =
    [λ0  1   0   ⋯  0 ]
    [0   λ0  1   ⋱  0 ]
    [0   0   λ0  ⋱  0 ]
    [⋮               1 ]
    [0   0   0   ⋯  λ0]        (7.35)

1. Show that the minimal polynomial of J.0 / is:

mJ.0 / ./ D .  0 /m0 :

2. Show that the minimal polynomial of the matrix A is


  mA(λ) = ∏_{i=1}^{ℓ} (λ − λi)^{m_i},        (7.36)

where mi ; 1  i  `, is the order (the size) of the largest Jordan block J.i / in the Jordan
canonical form of A.



Solution
1. It is clear from the Dunford decomposition (7.29), that N D J.0 /  0 Im0 is a nilpotent
matrix of order m0 . Thus, we have

N m0 D .J.0 /  0 Im0 /m0 D 0;

and N m0 1 ¤ 0. Hence, we have shown that mJ.0 / is the polynomial of the smallest degree
which satisfies mJ.0 / .J.0 // D 0. Thus, mJ.0 / is the minimal polynomial of J.0 /.
2. Let J be the block diagonal matrix
  J =
    [J(λ1)  0      ⋯  0    ]
    [0      J(λ2)  ⋯  0    ]
    [⋮      ⋮      ⋱  ⋮    ]
    [0      0      ⋯  J(λℓ)]        (7.37)

where J.i /; 1  i  `  n, are the Jordan blocks. We have

A D PJP1 ;

for some invertible matrix P. Now, it is clear that

mA ./ D .  1 /m1    .  ` /m`

is the minimal polynomial of J and since A and J are similar matrices, they have the same
minimal polynomial (Exercise 7.7). J

Exercise 7.9 (Minimal Polynomial and Diagonalizationa )


Let A be a matrix in Mn .K/ and let 1 ; 2 ; : : : ; n be the eigenvalues of A.
1. Show that A is diagonalizable if and only if its minimal polynomial has the form

  mA(λ) = ∏_{i=1}^{n} (λ − λi).

2. Consider the matrix


  A =
    [1 1 1]
    [0 1 1]
    [0 0 1].

Is A diagonalizable?


a
As this exercise shows, the minimal polynomial provides another criterion for diagonalizability.
Solution
1. In this case, according to Exercise 7.8, each Jordan block is of order 1. Hence, the matrix
J defined in (7.37) is diagonal, and it is similar to A. Hence, A is diagonalizable.
2. We compute the characteristic polynomial of A and find that

pA ./ D .  1/3 :

So, according to Exercise 7.7, its minimal polynomial is

mA ./ D .  1/m0 ; m0  3:

We may easily check that A  I ¤ 0, and .A  I/2 ¤ 0. Thus m0 D 3. Consequently, the


matrix A is not diagonalizable. J
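The computation in question 2 can be checked directly; here is a short sketch in Python with NumPy (using the entries of A as printed above).

  import numpy as np

  A = np.array([[1.0, 1.0, 1.0],
                [0.0, 1.0, 1.0],
                [0.0, 0.0, 1.0]])
  N = A - np.eye(3)                       # A - I

  print(np.allclose(N, 0))                # False: (A - I) != 0
  print(np.allclose(N @ N, 0))            # False: (A - I)^2 != 0
  print(np.allclose(N @ N @ N, 0))        # True:  m_A(t) = (t - 1)^3, so A is not diagonalizable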

Exercise 7.10 (Nonderogatory Matrix)


A matrix A in Mn .K/ is said to be nonderogatory if its characteristic polynomial and minimal
polynomial coincide (up to a multiplicative constant). Otherwise it is called derogatory.
1. Show that if A is nonderogatory, then every eigenvalue of A has geometric multiplicity 1;
equivalently, A has only one Jordan block for each eigenvalue.
2. Show that the companion matrix Cp , defined in Exercise 2.8 as
2 3
0 1 0 ::: 0
6 : 7
6 7
6 0 0 1 :: 0 7
6 7
Cp D 6
6
:: :: :: :: 7 ;
6 : : : : 77
6 7
4 0 0 0 ::: 1 5
a0 a1 a2 : : : an1

where a0 ; a1 ; : : : ; an1 are in K, is a nonderogatory matrix.

Solution
1. Suppose that there exists one eigenvalue k of A such that to k there correspond two
Jordan blocks. Let the order of the first block be n1 and that of the second block be n2 , with
n1  n2 . Then the characteristic polynomial of A is

pA(λ) = (λ − λ1)^{m1} (λ − λ2)^{m2} ⋯ (λ − λk)^{n1} (λ − λk)^{n2} ⋯ (λ − λℓ)^{mℓ}.

Hence, the polynomial

.  1 /m1 .  2 /m2    .  k /n1    .  ` /m` ;



where we removed .  k /n2 , annihilates A. This contradicts the minimality of mA , since,


we assumed that mA D pA .
2. We need to show that the minimal polynomial of Cp is exactly its characteristic
polynomial

p./ D a0 C a1  C    C an1 n1 C n :

Let mCp be the minimal polynomial of Cp . By the preceding results, mCp divides p. If mCp is
a polynomial of degree r < n, then

mCp ./ D ˛0 C ˛1  C    C ˛r1 r1 C r ; r < n:

We have

mCp .Cp / D 0

Moreover,

CpT ei D eiC1 ; 1  i  n  1;

where ei is the vector with all components equal to 0K , except for the ith component, which
is 1K . Since Cp and CpT have the same characteristic and minimal polynomials, we have for
e1 D .1K ; 0K ; : : : ; 0K /T that

0 D mCpT .CpT /e1 D ˛0 e1 C ˛1 e2 C    C ˛r1 er C erC1

D .˛0 ; ˛1 ; : : : ; ˛r1 ; 1; 0; : : : ; 0/T


¤ 0:

This is a contradiction. Hence, r D n and therefore the matrix Cp is nonderogatory.


J

Exercise 7.11
Consider the matrix
  A =
    [1   0   0   0]
    [a1  1   0   0]
    [a2  b1  2   0]
    [a3  b2  c1  2],

where a1 ; a2 ; a3 ; b1 ; b2 , and c1 are real numbers. Study the diagonalizability of the


matrix A. 
Solution
Actually, there are several methods that we can use to study the diagonalizability of A.
Here we use the one based on the minimal polynomial. Since the matrix A is triangular,
the eigenvalues are the elements of the main diagonal. Thus, A has two eigenvalues 1 D 1
and 2 D 2, each of algebraic multiplicity 2. Thus, the characteristic polynomial of A is

pA ./ D .  1/2 .  2/2 :

According to the first question in Exercise 7.9, A is diagonalizable if and only if its minimal
polynomial has the form

mA ./ D .  1/.  2/:

That is, if and only if

.A  I4 /.A  2I4 / D 0;

or equivalently,
  [ 0               0      0   0]   [0 0 0 0]
  [−a1              0      0   0]   [0 0 0 0]
  [ a1 b1           0      0   0] = [0 0 0 0]
  [ a1 b2 + a2 c1   b1 c1  c1  0]   [0 0 0 0].

This gives a1 D c1 D 0. Consequently, A is diagonalizable if and only if a1 D c1 D 0. J
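A quick numerical check of this criterion (a Python/NumPy sketch; the helper name build_A and the nonzero values chosen for a2, a3, b1, b2 are ours, for illustration only):

  import numpy as np

  def build_A(a1, a2, a3, b1, b2, c1):
      # The lower triangular matrix of the exercise.
      return np.array([[1.0, 0.0, 0.0, 0.0],
                       [a1,  1.0, 0.0, 0.0],
                       [a2,  b1,  2.0, 0.0],
                       [a3,  b2,  c1,  2.0]])

  I4 = np.eye(4)

  A = build_A(0.0, 5.0, -2.0, 7.0, 4.0, 0.0)       # a1 = c1 = 0
  print(np.allclose((A - I4) @ (A - 2*I4), 0))     # True: A is diagonalizable

  B = build_A(1.0, 5.0, -2.0, 7.0, 4.0, 0.0)       # a1 != 0
  print(np.allclose((B - I4) @ (B - 2*I4), 0))     # False: B is not diagonalizable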

Exercise 7.12 (Circulant Matricesa )


An n  n circulant matrix is a matrix formed from any vector in Cn by cyclically permuting
the entries. For example, if v D .a; b; c/ is a vector in R3 , then the associated 3  3 circulant
matrix is
  C =
    [a b c]
    [c a b]
    [b c a].

We see that circulant matrices have constant values on the main diagonal.
1. Find the eigenvalues and the corresponding eigenvectors of the matrix C.
2. Show that if !j ; j D 1; 2; 3 are the cubic roots of the unity, then the eigenvalues of C are

j D q.!j /; j D 1; 2; 3;

where

q.t/ D a C bt C ct2 ;

with the coefficients being the entries of the first row of C.



3. Consider the polynomial

p.t/ D t2 C ˛t C ˇ:

Show that there exists a 2  2 circulant matrix such that p.t/ is its characteristic
polynomial and find a polynomial q.t/ such that the eigenvalues 1 and 2 are

1 D q.1/ and 2 D q.1/

(1 and 1 are the square roots of the unity 1).


a
The goal of this exercise is to exhibit the beautiful unity of the solutions of the quadratic and cubic equations,
in a form that is easy to remember, which is based on the circulant matrices. This exercise is based on a result
in [12].

Solution
1. By a simple computation, the eigenvalues of C are

  λ1 = a + b + c,  λ2 = a + bω + cω²,  λ3 = a + bω̄ + cω̄²,

where ω = −1/2 + (√3/2) i, with ω² = ω̄ and ω³ = 1. The corresponding eigenvectors are

  V1 = (1, 1, 1)^T,  V2 = (1, ω, ω²)^T,  V3 = (1, ω̄, ω̄²)^T.

We see that ω is a cubic root of unity and satisfies ω = e^{2πi/3}.
2. The cubic roots of unity are 1, ω, and ω̄. Hence, we have

  q(1) = a + b + c = λ1,  q(ω) = a + bω + cω² = λ2,  and  q(ω̄) = a + bω̄ + cω̄² = λ3.

3. Consider the circulant matrix

  C =
    [a b]
    [b a].

The characteristic polynomial of C is

  det(C − λI2) = λ² − 2aλ + a² − b².

This characteristic polynomial equals p if and only if

  a = −α/2 and b = ±√(α²/4 − β).

Hence, we can take for C the matrix

  C =
    [−α/2           √(α²/4 − β)]
    [√(α²/4 − β)    −α/2       ].

Now, we can construct the polynomial q(t) whose coefficients are the entries of the first row of C, as:

  q(t) = −α/2 + t √(α²/4 − β).

Now, we have

  q(1) = −α/2 + √(α²/4 − β) and q(−1) = −α/2 − √(α²/4 − β),

so q(1) and q(−1) are the roots of the polynomial p(t).


As we have seen here, to find the roots of a polynomial p, we first need to find a circulant
matrix C having .1/n p as its characteristic polynomial. The first row of C then defines
a different polynomial q and the roots of p are the eigenvalues of C and are obtained by
applying q to the nth roots of the unity. The same ideas can be applied for cubic and quartic
polynomials.
J
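The identity between the eigenvalues of a circulant matrix and the values of q at the roots of unity is easy to confirm numerically. A small sketch in Python with NumPy, where the values of a, b, c are arbitrary illustration values:

  import numpy as np

  a, b, c = 2.0, -1.0, 3.0
  C = np.array([[a, b, c],
                [c, a, b],
                [b, c, a]])

  omega = np.exp(2j * np.pi * np.arange(3) / 3)    # the cubic roots of unity
  q_values = a + b * omega + c * omega**2          # q(t) = a + b t + c t^2

  eigenvalues = np.linalg.eigvals(C)

  # The two lists coincide up to ordering.
  print(np.allclose(np.sort_complex(q_values), np.sort_complex(eigenvalues)))   # True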

Orthogonal Matrices and Quadratic


Forms

Belkacem Said-Houari

© Springer International Publishing AG 2017


B. Said-Houari, Linear Algebra, Compact Textbooks in Mathematics,
DOI 10.1007/978-3-319-63793-8_8

8.1 Orthogonal Matrices

In Definition 1.4.2 we introduced the symmetric matrices as those square matrices


that are invariant under the transpose operation. That is, those matrices A in Mn .K/
satisfying A D AT . This class of matrices has very important properties: for instance,
they are diagonalizable (Theorem 7.4.6) and if the entries of a symmetric matrix are
real, then it has only real eigenvalues (Theorem 7.4.5). Here we will study another class
of matrices, whose inverses coincide with their transpose. These matrices are called the
orthogonal matrices. In this section we restrict ourselves to the case of matrices with real
entries. But all the results can be easily extended to the matrices with complex entries.

Definition 8.1.1 (Orthogonal Matrix)


Let A be a matrix in Mn .R/. Then, A is said to be orthogonal if its columns and
rows are orthogonal unit vectors (i.e., orthonormal vectors). See Definition 3.4.1 for
orthogonal vectors in Rn .

ⓘ Remark 8.1.1 We have a similar situation if K D C, the matrix will be then called a
unitary matrix and it enjoys properties similar to those of an orthogonal matrix, but with
respect to the inner product in Cn defined by .u1 ; u2 ; : : : ; un /  .v1 ; v2 ; : : : ; vn / D
u1 vN 1 C    C un vN n . We will not discuss this here, since all the results on orthogonal
matrices can be easily adapted to the case of unitary matrices.

Example 8.1
The matrices
2 3
" # " # 3=7 2=7 6=7
10 cos   sin  6 7
; ; 4 6=7 3=7 2=7 5
01 sin  cos 
2=7 6=7 3=7

are orthogonal. 

Example 8.2 (Permutation Matrix)


A permutation matrix is a square matrix obtained from the same size identity matrix by a
permutation of rows. There are nŠ permutation matrices of size n. For example, if n D 2, the
permutation matrices are

  [1 0]       [0 1]
  [0 1]  and  [1 0].

It is clear that a permutation matrix is orthogonal since all its row vectors and column vectors
are orthonormal vectors. 

One of the important properties of orthogonal matrices is given next.

Theorem 8.1.2 (Characterization of Orthogonal Matrices)


Let A be a matrix in Mn(R). Then, A is orthogonal if and only if it is invertible and its inverse is
the same as its transpose. That is, if

A1 D AT ; (8.1)

or equivalently, if

AAT D AT A D In : (8.2)

Proof
To prove the theorem it is enough to show that (8.2) holds. Then, the uniqueness of the inverse
(Theorem 1.2.3) gives (8.1). Let v1 ; v2 ; : : : ; vn be the row vectors of the matrix A. Thus
23
v1
6v 7
6 27
AD6 7
6 :: 7 I
4 : 5
vn
hence, the columns of A^T are v1^T, v2^T, ..., vn^T. Then, we have
23
v1
6 7
6 v2 7
AAT D 6 7 Œv1T ; v2T ; : : : ; vnT  D Œe1 ; e2 ; : : : ; en  D In ;
4:::5
vn

where ei ; 1  i  n, are the standard unit vectors in Rn . Here we used the fact that the
row vectors of A are orthonormal, that is vi vjT D 0 for i ¤ j and vi viT D 1. By the same
argument, we may show that AT A D In . Thus, (8.2) holds and the proof of Theorem 8.1.2 is
complete. t
u

Theorem 8.1.3 (Determinant of an Orthogonal Matrix)


Let A be an orthogonal matrix in Mn .R/. Then

det.A/ D ˙1:

Proof
Since A is orthogonal, then according to Theorem 8.1.2,

AAT D AT A D In :

Thus, using Theorems 2.3.1 and 2.4.7, we have

det.AAT / D det.A/ det.AT / D .det.A//2 D det.In / D 1;

as claimed. t
u

In the following theorem we show an important property of the orthogonal matrices
in M2 .R/.

Theorem 8.1.4 (Characterization of Orthogonal Matrices in M2 .R/)


Let A be an orthogonal matrix in M2 .R/. Then, A has one of the following two forms:
" # " #
cos   sin  cos  sin 
AD or AD ; (8.3)
sin  cos  sin   cos 

for some angle .



Proof
The first matrix in (8.3) is called the counterclockwise rotation matrix and if  D =2, the
second one is called the reflection matrix. First, it is clear that the matrices defined in (8.3)
are orthogonal. Now, let
" #
ab
AD
cd

be a matrix in M2 .R/. By definition, A is orthogonal if and only if its columns and row
vectors of A are orthonormal vectors, i.e., if and only if the following holds:
  a² + b² = 1,
  a² + c² = 1,
  c² + d² = 1,
  b² + d² = 1,

and

  ac + bd = 0,
  ab + cd = 0.

From the first system, we deduce that there exists an angle θ such that

  |a| = |d| = |cos θ| and |b| = |c| = |sin θ|.

The second system gives

  a = cos θ, b = −sin θ, c = sin θ, d = cos θ

or

  a = cos θ, b = sin θ, c = sin θ, d = −cos θ.

Hence, the matrix A has one of the forms in (8.3). t


u

Now, we can easily show that the inverse of any orthogonal matrix is orthogonal.
Indeed, if A is orthogonal, then

AAT D AT A D In ;

so taking here the inverse, we get

.A1 /T A1 D A1 .A1 /T D In :


Also, the product of two orthogonal matrices is also orthogonal: if A and B are
orthogonal, then

AAT D AT A D In and BBT D BT B D In :

Hence, by Theorem 1.4.1,

  (AB)(AB)^T = A(BB^T)A^T = AA^T = In and (AB)^T(AB) = B^T(A^T A)B = B^T B = In.

Note that the set of orthogonal matrices is not empty, since it contains at least the matrix
In . Consequently, we have here the algebraic structure of a subgroup (see Exercise 1.11)
and thus we have already proved the following theorem.

Theorem 8.1.5 (The Group O.n/)


The orthogonal matrices form a subgroup of the group of matrices .Mn .R/; /. This
subgroup is called the orthogonal subgroup (or group) and it is denoted by O.n/.

We have proved in Theorem 7.4.5, that the eigenvalues of a real symmetric matrix are
all real. Now, we introduce a very important property of the eigenvectors of a symmetric
matrix.

Theorem 8.1.6 (Eigenvectors of a Real Symmetric Matrix)


Let A be a symmetric matrix in Mn .R/. Then the eigenvectors of A associated to
distinct eigenvalues are orthogonal.

Proof
Let 1 and 2 be two eigenvalues of A with 1 ¤ 2 and let X1 and X2 be corresponding
associated eigenvectors, i.e.,

AX1 D 1 X1 and AX2 D 2 X2 : (8.4)

Multiplying the first equation in (8.4) by X2T , we obtain

1 X2T X1 D X2T .AX1 /


D .X2T A/X1
D .AT X2 /T X1

D .AX2 /T X1

D 2 X2T X1 ;

where we have used the fact that AT D A. It follows that

.1  2 /X2T X1 D 0:

Since 1 ¤ 2 , we deduce that X2T X1 D 0, i.e., X1 and X2 are orthogonal. t


u

Since the eigenvectors associated to distinct eigenvalues of a real symmetric matrix


are orthogonal, then in order to build an orthogonal matrix of these eigenvectors, it
remains to normalize them. This process of normalizing orthogonal vectors is known as
the Gram–Schmidt process.

8.1.1 The Gram–Schmidt Process

More generally, the Gram–Schmidt process is a method of orthonormalizing a set


S0 = {v1, v2, ..., vk}, k ≤ n, of linearly independent vectors in R^n. Basically, it takes
the set S0 and generates a new set S1 D fu1 ; u2 ; : : : ; uk g of orthonormal vectors that
spans the same k-dimensional subspace of Rn as S0 . To make the set S0 orthonormal,
there are two main steps. First, one has to make S0 orthogonal, and second, once it is
orthogonal, one has to make it orthonormal. The main idea of this method is based on the
projection formula (3.33). To explain the process, let us, for example, take three linearly
independent vectors v1 ; v2 , and v3 of Rn . We want to find three linearly independent and
orthonormal vectors u1 ; u2 , and u3 . So, the first vector u1 can go with the direction of v1 ,
it just has to be normalized. So, we put
  u1 = v1 / ‖v1‖.

Now, the task is to choose u2 such that it is orthogonal to u1 and has norm equal to 1.
We proceed exactly as we did in Theorem 3.4.1. If we choose

  w2 = v2 − ((v2 · u1)/‖u1‖²) u1 = v2 − proj_{u1} v2,

then since w2 is orthogonal to v1 , we need to take

  u2 = w2 / ‖w2‖,

since u2 is required to be a unit vector. We see that u1 has the same direction as v1 , but
for u2 we subtracted from v2 the component in the direction of u1 (which is the direction
of v1 ). Now, at this point, the vectors u1 and u2 are set. Now, we need to choose u3 such
that it will not lie in the plane of u1 and u2 , which is exactly the plane of v1 and v2 . So,
we simply need to subtract from v3 any component of u3 in the plane of u1 and u2 . Thus,
we take w3 to be the vector

w3 D v3  proju1 v3  proju2 v3 :
It is clear that w3 is orthogonal to u1 and u2 . Since u3 is required to be a unit vector, we
choose
  u3 = w3 / ‖w3‖.

This process of choosing u1, u2, and u3 is called the Gram–Schmidt process, and we
may apply the same ideas to any finite number of vectors. So, suppose now that we
have the set S0 D fv1 ; v2 ; : : : ; vk g; k  n, of linearly independent vectors. Hence, to
construct the set S1 D fu1 ; u2 ; : : : ; uk g described above, we use the following algorithm:

  w1 = v1,                                      u1 = w1/‖w1‖,
  w2 = v2 − proj_{u1} v2,                       u2 = w2/‖w2‖,
  w3 = v3 − proj_{u1} v3 − proj_{u2} v3,        u3 = w3/‖w3‖,        (8.5)
  ⋮
  wk = vk − Σ_{j=1}^{k−1} proj_{uj} vk,         uk = wk/‖wk‖.

We may easily show that the vectors u1 ; u2 ; : : : ; uk are orthonormal. We see that at the
step k, we subtracted from vk its components in the directions that are already settled.
Now, if S0 contains n vectors, then according to Theorem 4.6.2, it forms a basis of
Rn . Therefore, the set S1 is also a basis of Rn , but it is already an orthonormal basis.
Thus, we have already proved the following theorem.

Theorem 8.1.7 (Orthonormal Basis)


In any finite dimensional Euclidean vector space there exists an orthonormal basis.

Example 8.3
Apply the Gram–Schmidt process to the following vectors in R4 :
  v1 = (1, 2, 3, 0)^T and v2 = (1, 2, 0, 0)^T.



Solution
We follow the Gram–Schmidt process described in (8.5) and define

  u1 = v1/‖v1‖ = (1/√14) v1 = (1/√14, 2/√14, 3/√14, 0)^T.

Now, to find u2, we first need to find w2, so we compute

  proj_{u1} v2 = ((v2 · u1)/‖u1‖²) u1 = (v2 · u1) u1.

We have v2 · u1 = 5/√14. Thus,

  proj_{u1} v2 = (5/√14) u1 = (5/14, 5/7, 15/14, 0)^T,

and then

  w2 = v2 − proj_{u1} v2 = (9/14, 9/7, −15/14, 0)^T.

Finally,

  u2 = w2/‖w2‖ = (1/√70) (3, 6, −5, 0)^T.

To be convinced, one can easily verify that u1 and u2 are orthonormal vectors. J
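The algorithm (8.5) translates almost line by line into code. The following sketch (Python with NumPy; the function name gram_schmidt is ours) reproduces the computation of this example.

  import numpy as np

  def gram_schmidt(vectors):
      """Orthonormalize linearly independent vectors, following (8.5)."""
      basis = []
      for v in vectors:
          w = v.astype(float)
          for u in basis:
              w = w - np.dot(v, u) * u          # subtract proj_u(v); u is a unit vector
          basis.append(w / np.linalg.norm(w))
      return basis

  v1 = np.array([1.0, 2.0, 3.0, 0.0])
  v2 = np.array([1.0, 2.0, 0.0, 0.0])
  u1, u2 = gram_schmidt([v1, v2])

  print(u1)               # (1, 2, 3, 0)/sqrt(14)
  print(u2)               # proportional to (3, 6, -5, 0)
  print(np.dot(u1, u2))   # ~0, and both vectors have norm 1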

8.1.2 The QR Factorization

One of the important ideas in linear algebra is to write a real square matrix as the product
of two matrices, one orthogonal and the other one upper triangular. This process is called
the QR factorization or QR decomposition. So, if A is a square matrix in Mn .R/, we will
show that one can write

A D QR;

where Q is an orthogonal matrix and R is an upper triangular matrix. We will also show
that if A is invertible, then the above decomposition is unique. Actually, several methods
of finding the QR decomposition are available; here we discuss the method based on the
Gram–Schmidt process. We state the following result.

Theorem 8.1.8 (The QR Decomposition)


Let A be a matrix in Mn .R/. Then, A can be factorized as

A D QR; (8.6)

where Q is an orthogonal matrix and R is an upper triangular matrix. In addition, if A


is invertible and if the diagonal entries of R are positive, then the decomposition (8.6)
is unique.

Proof
First, let us prove the uniqueness. So, assume that A is invertible and assume that there exists
two orthogonal matrices Q1 and Q2 and two upper triangular matrices R1 and R2 , such that

A D Q1 R1 D Q2 R2 :

Then multiplying from the left by Q2^{-1} and from the right by R1^{-1}, we obtain

  Q := Q2^{-1} Q1 = R2 R1^{-1} =: R.

Now, since Q is orthogonal, we have

QT D Q1 D R1 :

Since the inverse of a triangular matrix is triangular of the same type, we deduce that Q and
QT are both upper triangular matrices. Hence, Q is an upper as well as a lower triangular
matrix, and so it is a diagonal matrix. In addition its diagonal entries are strictly positive,
then we have

Q2 D QQT D In :

This means that Q D In and therefore the uniqueness of the inverse gives Q1 D Q2 and
R1 D R2 .

The existence of the decomposition (8.6) follows from the Gram–Schmidt process.
Indeed, let v1 ; v2 ; : : : ; vn be the column vectors of the matrix A. Then, according to the Gram–
Schmidt process, we can form a set of orthogonal vectors u1 ; u2 ; : : : ; un as described in (8.5).
Thus, the matrix Q defined as

Q D Œu1 ; u2 ; : : : ; un 

is an orthogonal matrix. Thus, we need to find the matrix R D Œr1 ; r2 ; : : : ; rn  with the column
vectors r1 ; r2 ; : : : ; rn such that

A D Œv1 ; v2 ; : : : ; vn  D QR D Œu1 ; u2 ; : : : ; un Œr1 ; r2 ; : : : ; rn :

To find R, we simply need to write (8.5) in matrix form, which yields


  R =
    [v1 · u1  v2 · u1  ⋯  vn · u1]
    [0        v2 · u2  ⋯  vn · u2]
    [⋮        ⋮        ⋱  ⋮      ]
    [0        0        ⋯  vn · un].

It is clear that R is an upper triangular matrix. This completes the proof of Theorem 8.1.8. u
t

Example 8.4
Find the QR decomposition of the matrix
  A =
    [1 1 0]
    [1 0 1]
    [0 1 1].

Solution
Let v1, v2, and v3 be the column vectors of A. Now, we need to find u1, u2, and u3 using the Gram–Schmidt process. Indeed, we have

  u1 = v1/‖v1‖ = (1/√2, 1/√2, 0)^T.

Now, we need to find w2; thus, we first take

  proj_{u1} v2 = ((v2 · u1)/‖u1‖²) u1 = (v2 · u1) u1 = (1/2, 1/2, 0)^T.

Thus,

  w2 = v2 − proj_{u1} v2 = (1/2, −1/2, 1)^T.

Hence,

  u2 = w2/‖w2‖ = (1/√6, −1/√6, 2/√6)^T.

Now, to find w3, we first need to compute proj_{u1} v3 and proj_{u2} v3. We have

  proj_{u1} v3 = (v3 · u1) u1 = (1/2, 1/2, 0)^T,

and similarly,

  proj_{u2} v3 = (v3 · u2) u2 = (1/6, −1/6, 2/6)^T.

Hence,

  w3 = v3 − proj_{u1} v3 − proj_{u2} v3 = (−2/3, 2/3, 2/3)^T.

Then,

  u3 = w3/‖w3‖ = (−1/√3, 1/√3, 1/√3)^T.

Consequently, the matrix Q has the form

  Q = [u1, u2, u3] =
    [1/√2   1/√6   −1/√3]
    [1/√2  −1/√6    1/√3]
    [0      2/√6    1/√3].

Also, a simple computation yields

  R =
    [v1 · u1  v2 · u1  v3 · u1]     [√2   1/√2   1/√2]
    [0        v2 · u2  v3 · u2]  =  [0    3/√6   1/√6]
    [0        0        v3 · u3]     [0    0      2/√3].
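Numerical libraries compute this factorization directly. A short check with NumPy (a sketch; note that np.linalg.qr may return Q and R with some columns and rows multiplied by −1, which is still a valid QR factorization):

  import numpy as np

  A = np.array([[1.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 1.0]])

  Q, R = np.linalg.qr(A)

  print(np.allclose(Q @ R, A))                # True: A = QR
  print(np.allclose(Q.T @ Q, np.eye(3)))      # True: Q is orthogonal
  print(np.allclose(np.tril(R, -1), 0))       # True: R is upper triangular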

Now, we state another important theorem of linear algebra. We have seen in


Theorem 7.4.6 that every symmetric matrix is diagonalizable. Thus, if A is a symmetric
matrix, then there exists an invertible matrix S such that the matrix

D D S1 AS

is diagonal. Moreover, the matrix S is the eigenvector matrix (the matrix whose column
vectors are the eigenvectors of A). So, in the light of Theorem 8.1.6, we may ask
whether one can choose these eigenvectors to be orthonormal? In other words, is there
an orthogonal matrix S such that the identity

S1 AS D ST AS

holds true? The answer to this question is affirmative and in this case the matrix A is
called orthogonally diagonalizable .

Theorem 8.1.9 (Spectral Theorem)


Let A be a matrix in Mn(R). Then, A is orthogonally diagonalizable (that is, there exists
an orthogonal matrix S such that the matrix

D D S1 AS D ST AS (8.7)

is diagonal), if and only if A is symmetric.

Proof
First, if (8.7) is satisfied, then, A can be written as

A D SDS1 D SDST :
Hence,

AT D .SDST /T D SDT ST D SDST D A;

since D is a diagonal matrix, and therefore symmetric. Consequently, A is symmetric.


Now, assume that A is symmetric. To show that A is orthogonally diagonalizable, we first
show that there exists an orthogonal matrix S such that the matrix

T D ST AS (8.8)

is upper triangular. (This result is known as Schur’s lemma).


Since A is a symmetric matrix, T is also symmetric matrix; indeed

.ST AS/T D ST AT S D ST AS D T T :

Since T is symmetric and triangular, it is automatically diagonal. Thus, T D D. It remains to


prove (8.8) which will be done in the next lemma. t
u

ⓘ Lemma 8.1.10 (Schur’s Lemma) Let A be a matrix in Mn .R/ with real eigenvalues.a
Then, there exists an orthogonal matrix S such that the matrix

T D ST AS (8.9)

is upper triangular.
a
If A is a matrix in Mn .C/; then the assumption of real eigenvalues is not needed and we need to use a unitary
matrix instead of the orthogonal one.

Proof
We proceed by induction on the size of the matrix n. For n D 1, A D Œa, therefore, we can
take S D Œ1. Now suppose that the lemma holds true for any .n  1/  .n  1/ (with n  2)
matrix. Let A be an n  n matrix. Let 1 be an eigenvalue of A and v1 be an eigenvector
associated to 1 . By dividing v1 by kv1 k, one can assume that v1 is a unit vector. Thus,
according to Theorem 4.6.8, we can extend the set fv1 g to a basis fv1 ; u2 ; : : : ; un g of Rn .
Using the Gram–Schmidt process, we transform this basis into an orthonormal basis B D
fv1 ; v2 ; : : : ; vn g of Rn . Consider the matrix Q whose columns are v1 ; v2 ; : : : ; vn ,

Q D Œv1 ; v2 ; : : : ; vn :

It is obvious that Q is an orthogonal matrix. Now, set A1 D QT AQ. Then for


2 3
1
6 7
607
e1 D 6 7
6 :: 7
4:5
0

we have

QA1 e1 D AQe1 D Av1 D 1 v1 D 1 Qe1 :

Hence, we obtain A1 e1 D 1 e1 . In other words, A1 has the block-triangular form


2 3
1  : : : 
6 7
6 7
6 7
6 0 7
A1 D 6 7;
6 :: 7
6 7
4 : B 5
0

where B is an .n  1/  .n  1/ matrix. Applying the induction hypothesis, there exists an


.n  1/  .n  1/ orthogonal matrix W such that the matrix W T BW is upper triangular. Now
we consider the matrix (block-diagonal) W1 D diag.1; W/. Then the matrix T D W1T A1 W1 is
upper triangular. Set

S D QW1 :

Then it is clear that S is orthogonal (since the product of two orthogonal matrices is
orthogonal) and we have

ST AS D W1T QT AQW1 D W1T A1 W1 D T:

This finishes the proof of Lemma 8.1.10. t


u

8.1.3 The LU Factorization

In many applications where linear systems arise, one needs to solve the equation
AX D b, where A is an invertible matrix in Mn .K/1 and X and b are vectors in Kn . The
best way to solve this equation (system) is to replace the coefficient matrix A (through
some row operations) with another matrix that is triangular. This procedure is known as
the Gauss elimination method and is basically equivalent of writing the matrix A in the
form

A D LU;

where L is a lower triangular matrix and U is an upper triangular matrix. The question
now is the following: does such decomposition always exist, and if so, is it unique? To
provide an answer, we start with the following definitions.

1
K here is not necessary R or C.
Definition 8.1.2 (Principal Minor)
Let A be a matrix in Mn .K/. The principal submatrix of order k (1  k  n) is the
submatrix formed by deleting from A n  k rows and the n  k columns with the
same indices (for example, delete rows 1; 2 and 5 and columns 1; 2 and 5). The
determinant of this submatrix is called the principal minor of order k of A.

Definition 8.1.3 (Leading Principal Minor)


Let A be a matrix in Mn .K/. A minor of A of order k, (1  k  n) is called a
leading principal minor of order k if it is the determinant of the submatrix obtained
by deleting from A the last n  k rows and columns. This submatrix is called the
leading principal submatrix.

Example 8.5
Consider the matrix
2 3
1 20
6 7
A D 4 1 3 4 5 :
2 25

Then, the leading principal minors of A are


" #
.1/ .2/ 1 2
A D detŒ1 D 1; A D det D 5; A.3/ D det.A/ D 33:
1 3

We will see in  Sect. 8.2 that the leading principal minors can be used as a test for
the definiteness of symmetric matrices.

Theorem 8.1.11 (LU Decomposition)


Let A be an invertible matrix in Mn .K/: Then A admits a unique decomposition of the
form

A D LU; (8.10)

where L is a lower triangular matrix with 1’s on the diagonal and U is an upper
triangular matrix, if and only if all its leading principal minors are nonzero.

Proof
We first prove the uniqueness. Assume that there exist L1 ; L2 ; U1 , and U2 satisfying the
assumptions in Theorem 8.1.11, such that

A D L1 U1 D L2 U2 :

This gives, by multiplying from the left by L2^{-1} and from the right by U2^{-1} (recall that all four matrices above are invertible, since A is invertible),

  L2^{-1} L1 U1 U2^{-1} = In.

Now, we put L = L2^{-1} L1 and U = U1 U2^{-1}. It is clear that L is a lower triangular matrix and U is an upper triangular matrix. Thus, since L = U^{-1}, they necessarily are diagonal matrices, with the diagonal entries of L being 1's. Then, we have L = U = In. This shows that L1 = L2 and U1 = U2, so the decomposition (8.10) is unique.
The proof of the existence of the factorization (8.10), is done by induction on n (the size
of the matrix A). Trivially, if n D 1, then A D Œa D Œ1Œa, so (8.10) is satisfied for n D 1.
Now, assume that (8.10) holds for square matrices of size .n  1/  .n  1/ and partition A as
" #
An1 a1
AD
aT2 ann

where An1 is an .n  1/  .n  1/ matrix, a1 and a2 are vectors of Kn , and ann is in K. Now,


it is clear from the hypothesis of the theorem that all the minors of An1 are not zero and so
An1 is invertible. Hence, we may apply the induction assumption to An1 and write it as

An1 D Ln1 Un1 ;

where Ln1 and Un1 are .n  1/  .n  1/ matrices with Ln1 a lower triangular matrix
whose diagonal elements are equal to 1 and Un1 an upper triangular matrix.
Now, consider the two n  n matrices
" # " #
Ln1 0.n1/1 Un1 c
LD and UD (8.11)
dT 1 01.n1/ ann  dT c

where c and d are vectors in Kn to be determined. Now, the formula A D LU, gives
" # " #
Ln1 Un1 Ln1 c An1 a1
LU D D :
dT Un1 ann aT2 ann

This yields

Ln1 c D a1 ; and dT Un1 D aT2 :


Since Ln1 and Un1 are invertible matrices, then c and d are uniquely determined as

c D L1
n1 a1 ; and dT D aT2 Un1
T
:

Thus, we have shown that A D LU, with L and U unique and given as in (8.11).
The converse is clear, since if A has the factorization (8.11), then, we may easily show
that if A.k/ is the leading principal minor of A of order k, then A.k/ D L.k/ U .k/ ; where L.k/ and
U .k/ are the principal leading minors of order k of L and U, respectively. Consequently,
  A^(k) = ∏_{1 ≤ j ≤ k} u_{jj},

which is nonzero, since U is an invertible matrix. This finishes the proof of Theorem 8.1.11.
t
u

Example 8.6
Find the LU decomposition of the matrix
  A =
    [1 2  4]
    [3 8 14]
    [2 6 13].

Solution
First, we compute the leading principal minors of A, obtaining

  A^(1) = det[1] = 1,  A^(2) = det [1 2; 3 8] = 2,  A^(3) = det(A) = 6.

Thus, according to Theorem 8.1.11, the LU decomposition exists and we have


  A =
    [1 2  4]     [1    0    0] [u11  u12  u13]
    [3 8 14]  =  [l21  1    0] [0    u22  u23]
    [2 6 13]     [l31  l32  1] [0    0    u33]

             [u11      u12                 u13                  ]
          =  [l21 u11  l21 u12 + u22       l21 u13 + u23        ]
             [l31 u11  l31 u12 + l32 u22   l31 u13 + l32 u23 + u33].

Solving the above system, we obtain

  L =
    [1 0 0]
    [3 1 0]
    [2 1 1]
  and
  U =
    [1 2 4]
    [0 2 2]
    [0 0 3].        (8.12)

J
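A short numerical check of this example (Python with NumPy, a sketch): the leading principal minors are nonzero and the factors in (8.12) reproduce A.

  import numpy as np

  A = np.array([[1.0, 2.0,  4.0],
                [3.0, 8.0, 14.0],
                [2.0, 6.0, 13.0]])
  L = np.array([[1.0, 0.0, 0.0],
                [3.0, 1.0, 0.0],
                [2.0, 1.0, 1.0]])
  U = np.array([[1.0, 2.0, 4.0],
                [0.0, 2.0, 2.0],
                [0.0, 0.0, 3.0]])

  print([np.linalg.det(A[:k, :k]) for k in (1, 2, 3)])   # ~[1.0, 2.0, 6.0], all nonzero
  print(np.allclose(L @ U, A))                           # True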

The LU Algorithm
Computers usually solve square linear systems using the LU decomposition, since it
is simpler and less costly (see Remark 8.2.9 below). When such a decomposition is
available, then solving the system

AX D b (8.13)

is relatively fast and simple. We first write A D LU and solve

LY D b (8.14)

for Y. Of course here we get a unique solution Y, since L is invertible. Also, since L is a
lower triangular matrix, then system (8.14) should be solved in the “forward” direction.
That is, if y1 ; y2 ; : : : ; yn are the components of the vector Y, then these components are
found successively in the same order. Once Y is obtained, one solves for X the system
UX D Y: (8.15)

Once again the triangular form of the matrix U makes the computations of X from (8.15)
easy. In this case, we solve the system (8.15) in the “backward” direction. That is, the
components of X are found in the order xn ; xn1 ; : : : ; x2 ; x1 . Consequently, (8.14) and
(8.15) yield

AX D L.UX/ D LY D b:

Hence, the solution X obtained in (8.15) is the solution of (8.13).
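In code, the two triangular solves can be done, for instance, with SciPy's solve_triangular. The following sketch uses the factors of Example 8.6 and the right-hand side of Example 8.7 below; the same computation is carried out there by hand.

  import numpy as np
  from scipy.linalg import solve_triangular

  L = np.array([[1.0, 0.0, 0.0],
                [3.0, 1.0, 0.0],
                [2.0, 1.0, 1.0]])
  U = np.array([[1.0, 2.0, 4.0],
                [0.0, 2.0, 2.0],
                [0.0, 0.0, 3.0]])
  b = np.array([1.0, 0.0, 1.0])

  Y = solve_triangular(L, b, lower=True)     # forward substitution: LY = b
  X = solve_triangular(U, Y, lower=False)    # back substitution:    UX = Y

  print(Y)    # [ 1. -3.  2.]
  print(X)    # approximately [ 8/3, -13/6, 2/3 ]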

Example 8.7
Solve the system of equations
  x1 + 2x2 + 4x3 = 1,
  3x1 + 8x2 + 14x3 = 0,        (8.16)
  2x1 + 6x2 + 13x3 = 1.

Solution
The system (8.16) can be written as

AX D b

with
2 3 2 3 2 3
12 4 x1 1
6 7 6 7 6 7
A D 4 3 8 14 5 ; X D 4 x2 5 and b D 405:
2 6 13 x3 1

We have seen in Example 8.6 that the matrix A can be written as A D LU, with L and U as in
(8.12). Next, we solve the system LY D b, where Y is the vector in R3 with the components
y1 ; y2 , and y3 . This gives
  y1 = 1,
  3y1 + y2 = 0,
  2y1 + y2 + y3 = 1.

Therefore, y1 = 1, y2 = −3, and y3 = 2. Finally, we solve the system UX = Y, that is

  x1 + 2x2 + 4x3 = 1,
  2x2 + 2x3 = −3,
  3x3 = 2.

This yields x3 = 2/3, x2 = −13/6, and x1 = 8/3. Hence, the solution of (8.16) is

  X = (8/3, −13/6, 2/3)^T.

8.2 Positive Definite Matrices

We have seen in Theorem 7.4.5, that real symmetric matrices have only real eigenvalues;
these, of course, can be positive or negative. Thus, the question now is what happens
if all the eigenvalues of a symmetric matrix are positive? Does the matrix enjoy
some particular properties? The signs of the eigenvalues of a matrix are important
in applications, for instance in the stability theory of differential equations. So, it is
quite important to determine which matrices have positive eigenvalues. These matrices
are called positive definite matrices. Symmetric positive definite matrices have rather
nice properties; for example, as we will see later on, every positive definite matrix is
invertible. Also, by studying these matrices we will bring together many things that we
have learned about determinants, eigenvalues,: : : We restrict our discussion here to the
case K D R, but all the properties can be easily extended to the case K D C.
Now, we start with the definition.

Definition 8.2.1 (Positive Definite Matrix)


Let A be a symmetric matrix in Mn .R/. Then A is said to be positive definite if all
its eigenvalues are positive.

Example 8.8
The matrix
" #
12
AD
25

is positive definite, since it is symmetric and its eigenvalues 1 and 2 satisfy (see
Corollary 7.3.3)

1 C 2 D tr.A/ D 6; and det A D 1 2 D 1:

Thus, 1 and 2 have to be positive. 

Notation Here we introduce a new notation which is very useful in this chapter. If X and Y
are two vectors in Rn , we denote
hX; Yi D X T Y:

Then for any matrix A in Mn .R/ it holds that

hAX; Yi D hX; AT Yi:

As we have seen in the previous chapters, computing the eigenvalues of a matrix


can be a challenging problem, especially for matrices of high order. So, it seems
difficult to use Definition 8.2.1 in general. Hence, our goal is to find a test that can
be applied directly to a symmetric matrix A, without going through the computation
of its eigenvalues, which will guarantee that the matrix is positive definite. This test is
given in the following theorem.

Theorem 8.2.1 (Characterization of Positive Definite Matrices)


Let A be a symmetric matrix in Mn .R/. Then A is positive definite if and only if

hX; AXi D X T AX > 0; a (8.17)

for all nonzero vectors X in Rn .


a
Since A is symmetric, we have hAX; Yi D hX; AT Yi D hX; AYi.

Proof
First assume that A is positive definite. Then, by Theorem 8.1.9, the eigenvectors
v1 ; v2 ; : : : ; vn of A form an orthonormal basis of Rn . Thus, if i is an eigenvalue of A
and vi is an associated eigenvector, then we have

hvi ; Avi i D hvi ; i vi i D i hvi ; vi i D i kvi k2 D i > 0; i D 1; 2; : : : ; n:


Since (8.17) is satisfied for the elements of a basis of Rn , it is satisfied for any nonzero vector
X of Rn .
Second, assume that (8.17) is satisfied and let  be an eigenvalue of A, i.e., AX D X,
for some eigenvector X associated to . Then

hX; AXi D hX; Xi D kXk2 > 0:

This shows that  > 0. Thus, A is positive definite. t


u

Example 8.9
Show that the matrix
  A =
    [ 2 −1  0]
    [−1  2 −1]
    [ 0 −1  2]

is positive definite. 

Solution
To do so, we apply Theorem 8.2.1. So, let
  X = (x1, x2, x3)^T        (8.18)

be a nonzero vector in R3. Then,

  hX, AXi = 2( x1² − x1 x2 + x2² + x3² − x2 x3 ).

Since

  −x1 x2 ≥ −(1/2)(x1² + x2²)  and  −x2 x3 ≥ −(1/2)(x2² + x3²),

it is clear that

  hX, AXi ≥ x1² + x3² > 0,

if x1 and x3 are not both zero. Otherwise, if x1 = x3 = 0, then x2 ≠ 0 and hence we have

  hX, AXi = 2 x2² > 0.

Consequently, the matrix A is positive definite. J



Now, we can show an important property of positive definite matrices.

Theorem 8.2.2 (Invertibility and Positive Definite Matrices)


Let A be a positive definite matrix in Mn .R/. Then A is invertible and A1 is also
positive definite.

Proof
If A is positive definite, then all its eigenvalues 1 ; 2 ; : : : ; n are necessarily positive. Then,
by Corollary 7.3.3, we have

  det(A) = λ1 λ2 ⋯ λn ≠ 0,
and Theorem 2.4.8 shows that A is invertible.
Now, A1 is symmetric since .A1 /T D .AT /1 D A1 . Also, if  is a positive
eigenvalue of A, then 1 is a positive eigenvalue of A1 . Thus, A1 is positive definite. t
u

ⓘ Remark 8.2.3 It is clear that if A is positive definite, then det.A/ > 0. However, if
det.A/ > 0, then A is not necessarily positive definite. For, example, the determinant of
the matrix
" #
1 1
AD
0 1

is positive, but A is not positive definite.

Next, we introduce another important property of positive definite matrices.

Theorem 8.2.4 (Leading Principal Minors and Positive Definite Matrices)


Let A be a symmetric matrix in Mn(R). Then, A is positive definite if and only if all its
leading principal minors are positive.

To prove Theorem 8.2.4, we need the following lemma, known as the Rayleigh–Ritz
theorem, which gives the relation between the eigenvalues of the matrix A and those
of its principal submatrices. We do not prove this lemma; we need it in order to show
Theorem 8.2.4. The reader is referred to [13, Theorem 8.5.1].

ⓘ Lemma 8.2.5 (Rayleigh–Ritz) Let A be a symmetric matrix in Mn .R/ and let Ak be


its principal submatrix of order k. Let 1 ; 2 ; : : : ; n be the eigenvalues of A and let
1 ; 2 ; : : : ; k be the eigenvalues of Ak . Then, we have

  λ_i ≤ μ_i ≤ λ_{i+n−k},  i = 1, ..., k.
Proof of Theorem 8.2.4
First assume that A is positive definite. Then we can prove that all its principal minors
are positive. Let Ak be the principal submatrix of order k, and let 1 ; 2 ; : : : ; k be the
eigenvalues of Ak . Since Ak is symmetric, its eigenvalues are real (Theorem 7.4.5). Hence,
1 ; 2 ; : : : ; k can be ordered as 1  2      k , for instance. Applying Lemma 8.2.5,
we deduce that

0 < 1  1  2      k :

This shows that


  det(Ak) = μ1 μ2 ⋯ μk > 0.

(Recall that 1 > 0, since A is positive definite). Hence, all the principal minors of A are
positive. Consequently, in particular its leading principal minors are positive.
Conversely, assume that all the leading principal minors of A are positive. We denote
by Ak the principal leading submatrix of order k. Then, we prove by induction on k.k D
1; 2; : : : ; n/ that the matrix A D An is positive definite. For k D 1, A1 D Œa is positive
definite, since in this case a D det.A/ > 0 (by assumption) and at the same time a is the
eigenvalue of A1 . Now, for k  2 we assume that Ak1 is positive definite and show that Ak
is positive definite. Let

0 < 1  2      k1 (8.19)

be the eigenvalues of Ak1 and ˛1  ˛2      ˛k be the eigenvalues of Ak . Since Ak1 is a


principal submatrix of Ak , then applying Lemma 8.2.5 we get

˛1  1  ˛2      ˛k1  k  ˛k :

The above formula together with (8.19) show the positivity of ˛2 ; : : : ; ˛k . Since all the
leading principal minors of A are positive, det.Ak / is positive and we have
  det(Ak) = α1 (α2 α3 ⋯ αk) > 0.

Thus, ˛1 > 0. Hence, all the eigenvalues of Ak are positive, therefore Ak is positive definite.
We conclude that A is positive definite. t
u

Example 8.10
Use Theorem 8.2.4 to show that the matrix
2 3
3 0 3
6 7
A D 4 0 1 2 5
3 2 8

is positive definite. 

Solution
We just need to verify that the principal minors of A are positive. Indeed, we have
" #
.1/ .2/ 30
A D 3; A D det D 3; A.3/ D det A D 3:
01

Thus, A is positive definite. J
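Both tests, positivity of the leading principal minors (Theorem 8.2.4) and positivity of the eigenvalues (Definition 8.2.1), are easy to run numerically. A sketch in Python with NumPy, applied to the matrix of Example 8.9:

  import numpy as np

  A = np.array([[ 2.0, -1.0,  0.0],
                [-1.0,  2.0, -1.0],
                [ 0.0, -1.0,  2.0]])

  # Leading principal minors of order 1, 2, 3.
  print([np.linalg.det(A[:k, :k]) for k in (1, 2, 3)])   # ~[2.0, 3.0, 4.0], all positive

  # Eigenvalues of the symmetric matrix A.
  print(np.linalg.eigvalsh(A))                           # ~[0.586, 2.0, 3.414], all positive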

Next, we investigate the relationship between the eigenvalues of a symmetric matrix


and its rank.

Theorem 8.2.6 (Characterization of the Rank of a Symmetric Matrix)


Let A be a symmetric matrix in Mn .R/. Then, the rank of A is the total number of
nonzero eigenvalues of A. In particular, if A is positive definite, then A has full rank.

To prove the above result, we need to show the following lemma.

ⓘ Lemma 8.2.7 Let A and C be two invertible matrices. Then for any matrix B, we have

rank.ABC/ D rank.B/;

provided that the size of the above matrices are chosen such that the product ABC
makes sense.

Proof
We define the matrix K D ABC. Now, our goal is to show that rank.K/ D rank.B/. Applying
(6.24), we deduce that

rank.K/  rank.AB/  rank.B/:

On the other hand, B D A1 KC1 . Hence, applying (6.24) again, we get

rank.B/  rank.A1 K/  rank.K/:

The two inequalities above show that rank.K/ D rank.B/. t


u

Proof of Theorem 8.2.6


Let 1 ; 2 ; : : : ; n be the eigenvalues of A. Using the spectral theorem (Theorem 8.1.9), then
we can write A as A D SDST , where D D diag.1 ; 2 ; : : : ; n / and S is an invertible matrix.
Hence, applying Lemma 8.2.7, we deduce that rank.A/ D rank.D/, which is clearly the total
number of nonzero eigenvalues of A. If A is positive definite, then all its eigenvalues are
positive, and therefore A has full rank. t
u
Now, if we relax the condition (8.17) a little bit and allow the inequality hX; AXi  0,
then we obtain a new class of symmetric matrices, called positive semi-definite matrices.

Definition 8.2.2 (Semi-Definite Matrix)


Let A be a symmetric matrix in Mn .R/. Then A is called positive semidefinite if all
its eigenvalues are nonnegative, or equivalently, if

hX; AXi D X T AX  0; (8.20)

for all nonzero vectors X in Rn .

Example 8.11
Show that the matrices
2 3
" # 111
00 6 7
AD and B D 41 1 15
01
111

are positive semi-definite. 

Solution
First, the matrix A is positive semi-definite since its eigenvalues are 1 D 0 and 2 D 1.
Second, for the matrix B, we have

hBX; Xi D hX; BXi D .x1 C x2 C x3 /2  0

for any nonzero vector X given as in (8.18). Thus, B is positive semi-definite. J

Example 8.12
Show that for any rectangular matrix A in Mmn .R/, the matrices AT A and AAT are positive
semi-definite. 

Solution
First, it is clear that AT A is a square symmetric matrix in Mn .R/. For any nonzero vector X
in Rn ,

hX; AT AXi D X T AT AX D .AX/T .AX/ D hAX; AXi D kAXk2  0:

Thus, the matrix AT A is positive semi-definite.



Second, it is also clear that AAT is a symmetric matrix in Mm .R/. Thus, for any nonzero
vector Y in Rm , we have

hAAT Y; Yi D hAT Y; AT Yi D kAT Yk2  0:

Consequently, the matrix AAT is positive semi-definite. J

8.2.1 The Cholesky Decomposition

We have seen in Theorem 8.1.8 that if A is a square matrix in Mn .R/, then we can write
A as

A D QR;
where Q is an orthogonal matrix and R is an upper triangular matrix.
Also, we have seen in Theorem 8.1.11 that if A is an invertible matrix and if all its
leading principal minors are nonzero, then we can write A as

A D LU;

where L and U are as before. Now, if we assume that the matrix A is positive definite,
then U has to be equal to LT and we write A as the product of a lower triangular
matrix L and its transpose LT . This product is known as the Cholesky decomposition
(or factorization) and it is very useful in numerical analysis. See Remark 8.2.9 below.

Theorem 8.2.8 (Cholesky’s Decomposition)


Let A be a positive definite matrix in Mn .R/. Then there exists a uniquea lower
triangular matrix L in Mn .R/, with strictly positive diagonal entries, such that

A D LLT : (8.21)
a
We obtain uniqueness only if we assume that the diagonal entries of L are positive.

Proof
Let us first prove the uniqueness. Assume that there exist two lower triangular matrices L1
and L2 satisfying (8.21). Then,

L1 LT1 D L2 LT2 :

This gives, upon multiplying from the left by L2^{-1} and from the right by (L2^T)^{-1} = (L2^{-1})^T,

  In = L L^T, with L = L2^{-1} L1.

This means that

  L^{-1} = L^T.        (8.22)

Since L is a lower triangular matrix (as the product of two lower triangular matrices), then
L1 is a lower triangular matrix and LT is an upper triangular matrix. Then (8.22) shows that
L is a diagonal matrix and satisfies L² = In. Since its diagonal entries are positive (keep in mind that the diagonal entries of L1 and L2 are positive), we obtain L = In, and thus
L1 D L2 . This shows the uniqueness of the decomposition (8.21).
To establish the existence of the decomposition (8.21), we proceed by induction on the
size of the matrix A. The statement is trivial for n D 1, since if A D Œa, with a > 0, we can
take L = L^T = [√a].
Now, assume that the decomposition (8.21) exists for any .n  1/  .n  1/ matrix. The
matrix A can be written as
" #
An1 b
AD ;
bT ann

where An1 is a leading principal submatrix of A which is positive definite (Theorem 8.2.4),
b is a vector in Rn1 , and ann is a real positive number. By the induction hypothesis, An1
satisfies (8.21). Thus, there exists a unique lower triangular matrix Ln1 with strictly positive
diagonal entries, such that

An1 D Ln1 LTn1 :

Next, we look for the desired matrix L in the form


" #
Ln1 0.n1/1
LD ;
cT ˛

where c a vector in Rn1 and ˛ > 0 are to be determined. Now, the desired identity

An1 D LLT

leads to
" # " #" #
An1 b Ln1 0.n1/1 LTn1 c
D :
bT ann cT ˛ 01.n1/ ˛

This equation gives

  Ln−1 c = b and α² + ‖c‖² = ann.

Since Ln−1 is invertible, it follows that c = Ln−1^{-1} b. Also, α = √(ann − ‖c‖²). It is clear that ann − ‖c‖² > 0, since 0 < det(A) = α² (det(Ln−1))². This shows that (8.21) holds for the
matrix A. Thus the proof of Theorem 8.2.8 is complete. t
u

Example 8.13
Find the Cholesky decomposition of the matrix
  A =
    [ 25 15 −5]
    [ 15 18  0]
    [ −5  0 11].

Solution
First, it is clear that A is symmetric. To show that A is positive definite, we need to compute
hX; AXi for any nonzero vector
  X = (x1, x2, x3)^T

of R3 . We have

  hX, AXi = 25x1² + 30 x1 x2 − 10 x1 x3 + 18x2² + 11x3².

Next, using the inequalities

  x1 x2 ≥ −(1/2)(x1² + x2²)  and  −x1 x3 ≥ −(1/2)(x1² + x3²),

we obtain from above that

  hX, AXi ≥ 5x1² + 3x2² + 6x3² > 0.

Now, according to Theorem 8.2.8, there exists a unique lower triangular matrix
2 3
l11 0 0
6 7
L D 4 l21 l22 0 5
l31 l32 l33

with l11 ; l22 ; l33 > 0, such that A D LLT . That is


2 3 2 32 3
25 15 5 l11 0 0 l11 l21 l31
6 7 6 76 7
4 15 18 0 5 D 4 l21 l22 0 5 4 0 l22 l32 5
5 0 11 l31 l32 l33 0 0 l33
2 3
l211 l11 l21 l11 l31
6 7
D 4 l11 l21 l221 C l222 l21 l31 C l22 l32 5 :
2 2 2
l11 l31 l21 l31 C l22 l32 l31 C l32 C l33

Hence, we obtain the following system of equations


8
ˆ
ˆ l211 D 25
ˆ
ˆ
ˆ
ˆ l11 l21 D 15;
ˆ
ˆ
< l l D 5;
11 31
ˆ l221 C l222 D 18;
ˆ
ˆ
ˆ
ˆ
ˆ
ˆ l21 l31 C l22 l32 D 0;
:̂ 2
l31 C l232 C l233 D 11:

Solving this system, we obtain

  l11 = 5, l21 = 3, l22 = 3, l31 = −1, l32 = 1, l33 = 3.

Consequently,

  L =
    [ 5 0 0]
    [ 3 3 0]
    [−1 1 3].
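Numerically, the Cholesky factor can be obtained directly. A sketch with NumPy follows; np.linalg.cholesky returns the lower triangular factor with positive diagonal, so it should reproduce the L found above.

  import numpy as np

  A = np.array([[25.0, 15.0, -5.0],
                [15.0, 18.0,  0.0],
                [-5.0,  0.0, 11.0]])

  L = np.linalg.cholesky(A)       # lower triangular, A = L L^T

  print(L)                        # [[5, 0, 0], [3, 3, 0], [-1, 1, 3]]
  print(np.allclose(L @ L.T, A))  # True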

ⓘ Remark 8.2.9 The Cholesky decomposition can be used to solve linear systems of the
form

AX D b;

where A is a matrix in Mn .R/ and X and b are vectors in Rn . If A is invertible, then to


solve the above system by computing A1 is costly. First, we compute A1 , which costs
2n3 flops, and then compute X D A1 b, which costs 2n2 flops.
However, if we use the LU decomposition in Theorem 8.1.11, then the cost is
.2=3/n3 flops, which is three times less than the cost in the first method.
Now, if A is positive definite, then we use the Cholesky decomposition and write
A D LLT which costs .1=3/n3 flops, and then we solve LY D b for Y by forward
substitution at the cost of n2 flops. Then, we solve LT X D Y for X by back substitution
and as above the cost for this is n2 . We see that the first method costs six times as much
as the Cholesky decomposition, and the Cholesky decomposition costs two times less
than the LU method.
As we see, there are large differences in the computation cost between the first
method, the LU method, and the Cholesky decomposition method, especially if n is
large.

8.3 Quadratic Forms

Quadratic forms arise in various areas of application; for example, in mechanics a


quadratic form can describe the angular momentum of a solid body rotating about an
axis. Quadratic forms are also used in optimization problems. Now, we introduce the
definition of a quadratic form. Recall that a symmetric matrix is positive definite if for
any nonzero vector X in Rn , the quantity X T AX is a positive real number. Define the
function f as

f W Rn ! R
X 7! X T AX D hX; AXi:

This function is called the quadratic form associated to the symmetric matrix A. In
Example 8.9, we have
 
  f(X) = f(x1, x2, x3) = 2( x1² − x1 x2 + x2² + x3² − x2 x3 ),        (8.23)

which is the quadratic form associated to the symmetric matrix A considered in


Example 8.9. We see in (8.23) that each term has degree two. So, there are no linear
or constant terms.

Definition 8.3.1 (Quadratic Form)


Let A be a real symmetric matrix. Then the quadratic form associated to A is the
function f that maps each vector X in Rn to the real number

f .X/ D X T AX D hX; AXi: (8.24)

It is clear that the function in (8.24) is a polynomial of degree two and it can be
written as

f .X/ D a11 x21 C a22 x22 C    C ann x2n C 2a12 x1 x2 C 2a13x1 x3 C    C 2an1;nxn1 xn ;
(8.25)

where x1 ; x2 ; : : : ; xn , are the components of the vector X and aij ; 1  i; j  n are the
entries of the matrix A. The terms involving the products xi xj are called the mixed
products, and the matrix A is called the coefficient matrix of the quadratic form f .X/.
We have seen above that the symmetric matrix A is positive definite if the function
f .X/ defined in (8.25) is positive for each nonzero vector X in Rn . One way to see this is
to write the quadratic form (8.25) as the sum of squares. This can be accomplished by
using the spectral theorem (Theorem 8.1.9), as shown in the following theorem.

Theorem 8.3.1 (The Principal Axes Theorem)


Let A be a symmetric matrix in Mn .R/ and 1 ; 2 ; : : : ; n be its eigenvalues. Let S be
the orthogonal matrix given in Theorem 8.1.9. Then, the change of variable X D SY
transforms the quadratic form (8.25) into the standard form

f .Y/ D 1 y21 C 2 y22 C    C n y2n ;

where y1 ; y2 ; : : : ; yn are the components of the vector Y in Rn .

Proof
Putting X D SY, we have

f .X/ D X T AX D .SY/T A.SY/

D Y T .ST AS/Y

D Y T DY

D 1 y21 C 2 y22 C    C n y2n ;

where D is the diagonal matrix defined in (8.7). This process is also called the diagonalization
of the quadratic form f .X/. t
u

Example 8.14
Consider the quadratic form

  f(x1, x2, x3) = 4x1² + 4x2² + x3² − 2x1 x2.        (8.26)

Find its standard form. 

Solution
The quadratic form can be written as f .X/ D X T AX, with
  X = (x1, x2, x3)^T and A =
    [ 4 −1 0]
    [−1  4 0]
    [ 0  0 1].

We see that the diagonal entries of A are the coefficients of the squared terms in (8.26) and
the off-diagonal entries are half the coefficient of the mixed product.
The eigenvalues of A are 1 D 1; 2 D 5, and 3 D 3. Thus, f can be written in the
standard form as

f . y1 ; y2 ; y3 / D y21 C 5y22 C 3y23 :

J
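The change of variable of Theorem 8.3.1 is exactly an orthogonal eigendecomposition of the coefficient matrix. A short sketch in Python with NumPy for this example:

  import numpy as np

  # Coefficient matrix of the quadratic form (8.26).
  A = np.array([[ 4.0, -1.0, 0.0],
                [-1.0,  4.0, 0.0],
                [ 0.0,  0.0, 1.0]])

  eigenvalues, S = np.linalg.eigh(A)      # S orthogonal, A = S diag(eigenvalues) S^T

  print(eigenvalues)                                      # ~[1., 3., 5.]
  print(np.allclose(S @ np.diag(eigenvalues) @ S.T, A))   # True
  # With X = S Y, the form becomes f(Y) = 1*y1^2 + 3*y2^2 + 5*y3^2.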

Definition 8.3.2 (Positive Definite Quadratic Form)


Let A be a symmetric matrix in Mn .R/ and let f .X/ be the quadratic form
associated to it. Then f is said to be positive definite (respectively, semi-definite) if
A is positive definite (respectively, semi-definite).

8.3.1 Congruence and Sylvester’s Law of Inertia

We have seen in the principal axes theorem (Theorem 8.3.1) that if f .X/ is the quadratic
form associated to a symmetric matrix A, then we can write f .X/ as

f .X/ D f .SY/ D Y T diag.1 ; 2 ; : : : ; n /Y


D .ST X/T diag.1 ; 2 ; : : : ; n /.ST X/
where 1 ; 2 ; : : : ; n are the eigenvalues of A and S is the matrix of eigenvectors of A.
Now, we have the so-called Sylvester’s law of inertia, which says that if there exists
another matrix R such that

f .X/ D .RT X/T diag.1 ; 2 ; : : : ; n /.RT X/

then if we consider the two sets

ƒ D f1 ; 2 ; : : : ; n g and M D f1 ; 2 ; : : : ; n g;

the number of positive elements in ƒ is equal to the number of positive elements in


M, the number of negative elements in ƒ is equal to the number of negative elements
in M and the number of zeros in ƒ is equal to the number of zeros in M. This means
that regardless of how we diagonalize a quadratic form, it will have the same number of
positive coefficients, negative coefficients, and zero coefficients.
Now, in order to prove the above result, we start with this some definitions.

Definition 8.3.3 (Inertia of a Symmetric Matrix)
The inertia of a real symmetric matrix A is defined to be the triple (p, ν, q), where
p, ν, and q are, respectively, the numbers of positive, negative, and zero eigenvalues
of A, counted with algebraic multiplicity. We denote it by In(A) and write
$$\operatorname{In}(A) = (p, \nu, q).$$
Example 8.15
Find the inertia of the matrix
$$A = \begin{bmatrix} 20 & 6 & 8 \\ 6 & 3 & 0 \\ 8 & 0 & 8 \end{bmatrix}.$$

Solution
We compute the eigenvalues of A, finding
$$\lambda_1 = \frac{1}{2}\left(31 + \sqrt{385}\right),\qquad \lambda_2 = \frac{1}{2}\left(31 - \sqrt{385}\right),\qquad \lambda_3 = 0.$$
Hence, we have p = 2, ν = 0, and q = 1, so In(A) = (2, 0, 1). J

Definition 8.3.4 (Congruent Matrices)
Let A and B be two symmetric matrices in M_n(R). Then A and B are called
congruent if there exists an invertible matrix S such that
$$A = S^T B S. \tag{8.27}$$

Now, we establish the following important result.

Theorem 8.3.2
Let A be a symmetric matrix in M_n(R). Then A is congruent to the matrix
$$D_0 = \begin{bmatrix} I_p & 0 & 0 \\ 0 & -I_\nu & 0 \\ 0 & 0 & 0_{q\times q} \end{bmatrix} \tag{8.28}$$
with
$$p + \nu + q = n,\qquad \nu = r - p,\qquad q = n - r,$$
where r = rank(A). The matrix D_0 is the canonical form of A with respect to
congruence.

In terms of the quadratic form, this means that if there exists an orthonormal basis
{e_1, e_2, ..., e_n} of R^n in which the quadratic form associated to the symmetric matrix A
has the form
$$f(X) = \sum_{i=1}^{n} a_i x_i^2,$$
then there exists an orthogonal basis {v_1, v_2, ..., v_n}, with v_i = e_i/\sqrt{|a_i|} whenever a_i ≠ 0
and v_i = e_i otherwise, in which the quadratic form can be written as
$$f(X) = \sum_{i=1}^{p} x_i^2 - \sum_{j=p+1}^{r} x_j^2.$$

Proof of Theorem 8.3.2
Since A is symmetric, using the spectral theorem (Theorem 8.1.9) we write A as
$$A = S D S^T, \tag{8.29}$$
where D = diag(λ_1, λ_2, ..., λ_n) is the diagonal matrix of the eigenvalues of A and S is an
orthogonal matrix. Without loss of generality we may assume that the first p eigenvalues
λ_1, λ_2, ..., λ_p of A are positive and the next r − p eigenvalues λ_{p+1}, λ_{p+2}, ..., λ_r of A are
negative (if this is not the case, we may simply order these eigenvalues by using a permutation
matrix, which is an orthogonal matrix as shown in Example 8.2). Now, let D_1 be the diagonal
matrix
$$D_1 = \operatorname{diag}\left(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_p}, \sqrt{|\lambda_{p+1}|}, \sqrt{|\lambda_{p+2}|}, \ldots, \sqrt{|\lambda_r|}, 1, \ldots, 1\right).$$
Then the matrix D can be written as
$$D = D_1 D_0 D_1$$
(the last n − r diagonal entries of D_1 play no role here, since the corresponding entries of D_0
vanish), where D_0 is the matrix defined in (8.28). Now, substituting into (8.29), we obtain
$$A = S D_1 D_0 D_1 S^T = Q D_0 Q^T,$$
with Q = S D_1, which is invertible. This completes the proof of Theorem 8.3.2. □
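The construction used in this proof can be carried out numerically. The sketch below (NumPy assumed; the variable names are ad hoc) builds D_1, D_0, and Q for the matrix of Example 8.15 and checks that A = Q D_0 Q^T.

```python
import numpy as np

A = np.array([[20.0, 6.0, 8.0],
              [6.0, 3.0, 0.0],
              [8.0, 0.0, 8.0]])

lam, S = np.linalg.eigh(A)              # A = S diag(lam) S^T with S orthogonal

# Reorder the eigenpairs: positive eigenvalues first, then negative, then zero.
tol = 1e-9
key = np.where(lam > tol, 0, np.where(lam < -tol, 1, 2))
order = np.argsort(key, kind="stable")
lam, S = lam[order], S[:, order]

signs = np.where(lam > tol, 1.0, np.where(lam < -tol, -1.0, 0.0))
D0 = np.diag(signs)                                    # diag(I_p, -I_nu, 0_q)
D1 = np.diag(np.where(np.abs(lam) > tol, np.sqrt(np.abs(lam)), 1.0))

Q = S @ D1                              # invertible, since D1 has no zero diagonal entry
print(np.allclose(Q @ D0 @ Q.T, A))     # True: A = Q D0 Q^T
```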

Theorem 8.3.2 exhibits an important connection between congruent matrices and
positive definite matrices, as shown in the next corollary.

ⓘ Corollary 8.3.3 Let A be a symmetric matrix in M_n(R). Then A is positive definite if
and only if it is congruent to the identity matrix.
Proof
First, assume that A is positive definite. Then all its eigenvalues are positive. Hence p = n
in (8.28), and therefore D_0 becomes the identity matrix. Conversely, if A is congruent to the
identity matrix, then
$$A = S I_n S^T$$
for some invertible matrix S. Let X be a nonzero vector in R^n. The quadratic form associated to A is
$$f(X) = X^T A X = X^T S I_n S^T X = Y^T I_n Y = \sum_{i=1}^{n} y_i^2 > 0,$$
since Y = S^T X is a nonzero vector in R^n. Hence, A is positive definite. □

Some properties are preserved under congruence; among them, the rank.

Theorem 8.3.4 (Rank of Congruent Matrices)
Let A and B be two congruent matrices in M_n(R). Then
$$\operatorname{rank}(A) = \operatorname{rank}(B).$$

Proof
Since A and B are congruent, there exists an invertible matrix S such that A = S^T B S. Applying
the result of Exercise 5.3 to two matrices C and D, we have
$$\operatorname{Ker}(D) \subseteq \operatorname{Ker}(CD), \tag{8.30}$$
with equality if C is invertible (which is very easy to check), and
$$\operatorname{Im}(CD) \subseteq \operatorname{Im}(C), \tag{8.31}$$
with equality if D is invertible.
Now, since S is invertible, applying (8.31) with C = B and D = S, we deduce that
$$\operatorname{Im}(B) = \operatorname{Im}(BS).$$

Hence,
$$\operatorname{rank}(B) = \operatorname{rank}(BS).$$
Likewise, since S^T is invertible, applying (8.30) we deduce that
$$\operatorname{Ker}(S^T(BS)) = \operatorname{Ker}(BS).$$
Now, applying Theorem 5.2.10, we have
$$\operatorname{rank}(A) = \operatorname{rank}(S^T B S) = n - \operatorname{null}(S^T B S) = n - \operatorname{null}(BS) = n - \operatorname{null}(B) = \operatorname{rank}(B).$$
This gives the desired result. □

Finding a matrix S satisfying (8.27) can be quite difficult in practice. So, we need
a test that allows us to quickly determine whether two matrices are congruent or not
without computing the matrix S. This test is given by Sylvester’s law of inertia. This
law asserts that two congruent matrices have the same number of positive eigenvalues,
the same number of negative eigenvalues, and the same number of zero eigenvalues.
Moreover, this law is very useful in the study of the stability of solutions of differential
equations, which usually requires the knowledge of the signs of the eigenvalues of some
symmetric matrices.
First, we start with the following definition.

Definition 8.3.5 (Index and Signature of a Diagonal Matrix)
Let D be a diagonal matrix. The index of D is the number of positive entries of D,
and the signature of D is the number of positive entries of D minus the number of
negative entries of D. We denote the index of D by p and the number of negative
entries by ν.

Since D is a diagonal matrix, the rank r of D is the number of nonzero entries of D.
Thus,
$$r = p + \nu.$$
If s is the signature of D, then we have
$$s = p - \nu.$$

Hence,
$$s = 2p - r.$$

Theorem 8.3.5 (Sylvester's Law of Inertia)
Let A and B be two symmetric matrices in M_n(R). Then A and B are congruent if
and only if the diagonal representations of A and B have the same rank, index, and
signature; that is, if and only if In(A) = In(B).

Proof
First, assume that A and B are congruent. Then Theorem 8.3.4 implies that rank(A) =
rank(B). Now, let p_1 be the number of positive eigenvalues of A and p_2 be the number of
positive eigenvalues of B. To show that In(A) = In(B) it is enough to prove that p_1 = p_2,
since the two matrices have the same rank. Now, since A and B are congruent, there exists an
invertible matrix S such that
$$A = S B S^T. \tag{8.32}$$
Using Theorem 8.3.2, we deduce that there exist two matrices D_0^{(1)} and D_0^{(2)} such that
$$D_0^{(1)} = \begin{bmatrix} I_{p_1} & 0 & 0 \\ 0 & -I_{r-p_1} & 0 \\ 0 & 0 & 0_{(n-r)\times(n-r)} \end{bmatrix} = P A P^T$$
and
$$D_0^{(2)} = \begin{bmatrix} I_{p_2} & 0 & 0 \\ 0 & -I_{r-p_2} & 0 \\ 0 & 0 & 0_{(n-r)\times(n-r)} \end{bmatrix},$$
with B = Q D_0^{(2)} Q^T, where P and Q are invertible matrices. Plugging these into (8.32), we get
$$D_0^{(1)} = P A P^T = P S Q D_0^{(2)} Q^T S^T P^T = R D_0^{(2)} R^T, \tag{8.33}$$

with R = PSQ. Now, assume that p_2 ≠ p_1, for instance p_2 < p_1. We have to reach a
contradiction. Let X be a nonzero vector in R^n whose first p_1 components are not all zero
and whose last n − p_1 components are all equal to zero. That is,
$$X = \begin{bmatrix} X_1 \\ 0_{(n-p_1)\times 1} \end{bmatrix},\qquad\text{with}\qquad X_1 = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{p_1} \end{bmatrix}$$
and X_1 ≠ 0_{R^{p_1}}. Hence, the quadratic form associated to D_0^{(1)} reads
$$X^T D_0^{(1)} X = \sum_{i=1}^{p_1} x_i^2 > 0. \tag{8.34}$$

Partition R^T in the form
$$R^T = \begin{bmatrix} R_1 & R_2 \\ R_3 & R_4 \end{bmatrix},$$
where R_1, R_2, R_3, and R_4 are sub-matrices, with R_1 a p_2 × p_1 matrix, R_2 a p_2 × (n − p_1) matrix,
R_3 an (n − p_2) × p_1 matrix, and R_4 an (n − p_2) × (n − p_1) matrix. Since p_2 < p_1, we can choose
X_1 such that X_1 ≠ 0_{R^{p_1}} and R_1 X_1 = 0_{R^{p_2}}. Now, we define the vector Y = R_3 X_1 in R^{n-p_2} and
we have
$$R^T X = \begin{bmatrix} R_1 & R_2 \\ R_3 & R_4 \end{bmatrix}\begin{bmatrix} X_1 \\ 0_{(n-p_1)\times 1} \end{bmatrix} = \begin{bmatrix} R_1 X_1 \\ R_3 X_1 \end{bmatrix} = \begin{bmatrix} 0_{p_2\times 1} \\ Y \end{bmatrix}.$$
Consequently, using (8.33), we get
$$X^T D_0^{(1)} X = (R^T X)^T D_0^{(2)} (R^T X) = -\sum_{j=1}^{r-p_2} y_j^2 \le 0.$$
This contradicts (8.34). Similarly, interchanging the roles of D_0^{(1)} and D_0^{(2)}, we can prove that
it is impossible to have p_1 < p_2. Consequently, p_1 = p_2.
Conversely, if A and B have the same inertia, In(A) = In(B) = (p, ν, q), then both
matrices are congruent to the matrix
$$D_0 = \begin{bmatrix} I_p & 0 & 0 \\ 0 & -I_\nu & 0 \\ 0 & 0 & 0_{q\times q} \end{bmatrix},$$
and hence they are congruent to each other (since congruence is an equivalence relation). □

Theorem 8.3.5 is very interesting, since it tells us that we can determine whether two
symmetric matrices are congruent just by computing their eigenvalues, as shown in the
following example.

Example 8.16
Show that the matrices
$$A = \begin{bmatrix} 2 & 0 & 2 \\ 0 & 6 & 2 \\ 2 & 2 & 4 \end{bmatrix} \qquad\text{and}\qquad B = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 5 & 1 \\ 1 & 1 & 5 \end{bmatrix}$$
are congruent.
Solution
The eigenvalues of A are
$$\lambda_1 = 2(2 + \sqrt{3}),\qquad \lambda_2 = 4,\qquad \lambda_3 = 2(2 - \sqrt{3}).$$
Hence, In(A) = (3, 0, 0).
Now, the eigenvalues of B are
$$\mu_1 = \frac{1}{2}(7 + \sqrt{33}),\qquad \mu_2 = 4,\qquad \mu_3 = \frac{1}{2}(7 - \sqrt{33}).$$
Thus, In(B) = (3, 0, 0). Since In(A) = In(B), A and B are congruent. J
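The same conclusion can be checked numerically by comparing the inertias (NumPy assumed; the small inertia helper below is ad hoc):

```python
import numpy as np

def inertia(M, tol=1e-9):
    lam = np.linalg.eigvalsh(M)
    return (int(np.sum(lam > tol)),
            int(np.sum(lam < -tol)),
            int(np.sum(np.abs(lam) <= tol)))

A = np.array([[2.0, 0.0, 2.0], [0.0, 6.0, 2.0], [2.0, 2.0, 4.0]])
B = np.array([[1.0, 1.0, 1.0], [1.0, 5.0, 1.0], [1.0, 1.0, 5.0]])

print(inertia(A), inertia(B))   # both (3, 0, 0), so A and B are congruent
```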

8.4 Exercises

Exercise 8.1 (Square Root of a Definite Matrix)
Let A be a positive semi-definite matrix. A matrix A_0 such that A_0^2 = A is called the square
root of A.
1. Show that a matrix A in M_n(R) is positive definite (respectively, semi-definite) if and
only if it has a positive definite (respectively, semi-definite) square root A_0.
2. Show that if A_0 is the square root of A, then rank(A) = rank(A_0).
3. Prove that if A is positive semi-definite, then the positive semi-definite square root A_0 of
A is unique.
4. Use the above results to find the square root of the matrix
$$A = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}.$$

Solution
1. We give the proof for the case of a positive definite matrix (the positive semi-definite
case can be proved by the same method). So, assume that A is a positive definite matrix.
Since A is symmetric, according to the spectral theorem (Theorem 8.1.9) there
exists an orthogonal matrix S such that A = S D S^T, where D is the diagonal matrix
D = diag(λ_1, λ_2, ..., λ_n). Now, since A is positive definite, all its eigenvalues are positive.
Introduce the matrix
$$D_0 = \operatorname{diag}\left(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_n}\right).$$
Hence, D_0^2 = D and for
$$A_0 = S D_0 S^T, \tag{8.35}$$

we have
$$A_0^2 = A_0 A_0 = (S D_0 S^T)(S D_0 S^T) = S D_0^2 S^T = S D S^T = A.$$
Consequently, A_0 is a square root of A. In addition, it is clear from the above that the eigenvalues
of A_0 are √λ_1, √λ_2, ..., √λ_n, which are positive. Hence, A_0 is positive definite.
Conversely, if A_0 is positive definite, then its eigenvalues are positive. Since the eigenvalues
of A are the squares of those of A_0, they are also positive; hence A is positive definite.
2. If A is positive definite, then according to Theorem 8.2.2, A is invertible. Also,
according to (1), A_0 is positive definite, hence A_0 is also invertible. Consequently,
Theorem 6.3.2 yields
$$\operatorname{rank}(A) = \operatorname{rank}(A_0) = n.$$

Now, if A is positive semi-definite and rank(A) = r, then according to Theorem 8.2.6, A
has exactly r positive eigenvalues. Hence the matrix A_0 introduced above also has r positive
eigenvalues, and therefore its rank is also equal to r.
3. Assume that there exists a positive semi-definite matrix A_1 such that A_1^2 = A. Hence,
the eigenvalues of A_1 are the square roots of those of A, and since A_1 is positive semi-
definite, its eigenvalues must be nonnegative. Consequently, the eigenvalues of A_1 and of
the matrix A_0 given in (8.35) coincide. Now, since A_1 is symmetric, the spectral theorem
(Theorem 8.1.9) implies that
$$A_1 = Q D_0 Q^T,$$
for some orthogonal matrix Q. Since A_0^2 = A_1^2 = A, we have
$$S D_0^2 S^T = Q D_0^2 Q^T,$$
whence
$$(S^T Q) D_0^2 = D_0^2 (S^T Q).$$
This implies, since D_0 is positive semi-definite,
$$(S^T Q) D_0 = D_0 (S^T Q).$$
Multiplying from the left by S and from the right by Q^T, this yields A_1 = A_0.
4. The eigenvalues of A are λ_1 = 9 and λ_2 = 1. Hence, A is positive definite, and A has
a positive definite square root matrix A_0 satisfying A_0^2 = A. To find A_0, we first note that A
can be written as
$$A = S D S^T,$$
where D is the diagonal matrix
$$D = \begin{bmatrix} 9 & 0 \\ 0 & 1 \end{bmatrix}$$
and S is the orthogonal matrix
$$S = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\[2pt] \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}.$$
Hence, we obtain from the above that
$$A_0 = S D_0 S^T = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$
It is clear that A_0 is also positive definite, since its eigenvalues are λ_1 = 3 and λ_2 = 1. J
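The computation of part 4 is easy to reproduce numerically (NumPy assumed); it recovers the same square root.

```python
import numpy as np

A = np.array([[5.0, 4.0],
              [4.0, 5.0]])

lam, S = np.linalg.eigh(A)                  # A = S diag(lam) S^T
A0 = S @ np.diag(np.sqrt(lam)) @ S.T        # positive definite square root (8.35)

print(np.round(A0, 10))                     # [[2, 1], [1, 2]]
print(np.allclose(A0 @ A0, A))              # True
```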

Exercise 8.2 (The Gram Matrix)
Let u_1, u_2, ..., u_ℓ be vectors in R^n. The Gram matrix (or Gramian matrix) is the matrix
defined as
$$G = (g_{ij}),\qquad\text{with}\qquad g_{ij} = u_i \cdot u_j,\quad 1 \le i, j \le \ell.$$
That is,
$$G = \begin{bmatrix} u_1\cdot u_1 & u_1\cdot u_2 & \cdots & u_1\cdot u_\ell \\ u_2\cdot u_1 & u_2\cdot u_2 & \cdots & u_2\cdot u_\ell \\ \vdots & \vdots & \ddots & \vdots \\ u_\ell\cdot u_1 & u_\ell\cdot u_2 & \cdots & u_\ell\cdot u_\ell \end{bmatrix}.$$
1. Prove that there exists a matrix A such that G = A^T A.
2. Show that G is symmetric and positive semi-definite.
3. Show that G is positive definite if and only if the vectors u_1, u_2, ..., u_ℓ are linearly
independent.
4. Prove that
$$\operatorname{rank}(G) = \dim_{\mathbb{R}} \operatorname{span}\{u_1, u_2, \ldots, u_\ell\}.$$

Solution
1. Let A be the n × ℓ matrix whose columns are the vectors u_1, u_2, ..., u_ℓ. Then we can easily
check that G = A^T A.

2. It is clear that G is symmetric, since u_i · u_j = u_j · u_i for all 1 ≤ i, j ≤ ℓ. Now, to show that
G is positive semi-definite, we can use two approaches. First, using (1) and Example 8.12,
we deduce immediately that G is positive semi-definite.
Second, we can use a direct approach. Indeed, let X be a nonzero vector in R^ℓ. Then, by
using the properties of the dot product in R^n, we have
$$X^T G X = \sum_{i,j=1}^{\ell} (u_i\cdot u_j)\, x_i x_j = \sum_{i,j=1}^{\ell} (x_i u_i)\cdot(x_j u_j) = \left(\sum_{i=1}^{\ell} x_i u_i\right)\cdot\left(\sum_{j=1}^{\ell} x_j u_j\right) = \left\|\sum_{i=1}^{\ell} x_i u_i\right\|^2 \ge 0, \tag{8.36}$$
where x_i, i = 1, 2, ..., ℓ, are the components of the vector X. Hence G is positive semi-definite.
3. We see that the inequality (8.36) is an equality if and only if
$$\sum_{i=1}^{\ell} x_i u_i = 0_{\mathbb{R}^n}.$$
This is not the case if X is a nonzero vector and the vectors {u_1, u_2, ..., u_ℓ} are linearly
independent. Hence, in this case (8.36) is a strict inequality and therefore G is positive
definite.
Conversely, if G is positive definite, then X^T G X > 0 whenever X ≠ 0_{R^ℓ}. Hence, for every
such X,
$$\sum_{i=1}^{\ell} x_i u_i \ne 0_{\mathbb{R}^n}.$$
Thus, the vectors {u_1, u_2, ..., u_ℓ} are linearly independent.
4. This can be seen from the rank identity
$$\operatorname{rank}(G) = \operatorname{rank}(A^T A) = \operatorname{rank}(A A^T) = \operatorname{rank}(A) = \operatorname{rank}(A^T). \tag{8.37}$$
Since {u_1, u_2, ..., u_ℓ} are the column vectors of A, we have (see Definition 6.3.1)
$$\operatorname{rank}(G) = \operatorname{rank}(A) = \dim_{\mathbb{R}} \operatorname{span}\{u_1, u_2, \ldots, u_\ell\}.$$
We leave it to the reader to check (8.37); see (6.10) for instance. J
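A small numerical illustration of these facts (NumPy assumed; the vectors are arbitrary test values, the third chosen to depend on the first two):

```python
import numpy as np

# Three vectors in R^4; the third is a linear combination of the first two.
u1 = np.array([1.0, 0.0, 2.0, -1.0])
u2 = np.array([0.0, 1.0, 1.0, 3.0])
u3 = 2 * u1 - u2

A = np.column_stack([u1, u2, u3])       # n x l matrix with the u_i as columns
G = A.T @ A                             # Gram matrix

print(np.allclose(G, G.T))                        # True: G is symmetric
print(np.min(np.linalg.eigvalsh(G)) >= -1e-9)     # True: G is positive semi-definite
print(np.linalg.matrix_rank(G), np.linalg.matrix_rank(A))   # both equal 2
```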


Exercise 8.3 (Minimization of Quadratic Functions^a)
Let A be a symmetric matrix in M_n(R). Let X and b be two vectors in R^n and c be a real
number. Consider the function
$$F(X) = X^T A X - 2X^T b + c. \tag{8.38}$$
We say that F has a global minimum point X_0 if F(X_0) ≤ F(X) for all X in R^n.
1. Show that if A is positive definite, then the quadratic function F(X) has a unique global
minimum X_0, which is the solution of the linear system
$$AX = b,\qquad\text{namely}\qquad X_0 = A^{-1}b.$$
2. Prove that the minimum value of F is
$$F(X_0) = c - X_0^T A X_0.$$
3. Show that if A is positive definite, then the function F is strictly convex, that is,
$$F(\lambda X_1 + (1-\lambda)X_2) < \lambda F(X_1) + (1-\lambda)F(X_2) \tag{8.39}$$
for all X_1 and X_2 in R^n with X_1 ≠ X_2 and for any λ in (0, 1). If we have "≤" instead of
"<" in (8.39), then we say that F is convex.
4. We define the function F by
$$F:\; \mathbb{R}^2 \to \mathbb{R},\qquad (x_1,x_2) \mapsto F(x_1,x_2) = 4x_1^2 - 2x_1x_2 + 3x_2^2 + 3x_1 - 2x_2 + 1.$$
Find the unique minimum of F.

^a Quadratic optimization problems appear frequently in applications. For instance, many problems in physics
and engineering can be stated as the minimization of some energy function.

Solution
1. First, since A is positive definite, Theorem 8.2.2 implies that A is invertible. Hence the
linear system AX = b has a unique solution X_0 = A^{-1}b. Now, replacing b by AX_0, we
write the function F(X) as
$$F(X) = X^T A X - 2X^T b + c = X^T A X - 2X^T A X_0 + c = (X - X_0)^T A (X - X_0) + (c - X_0^T A X_0). \tag{8.40}$$

It is clear that if X ≠ X_0, then the first term on the right-hand side of (8.40) is strictly positive,
since the matrix A is positive definite. Also, this term is zero for X = X_0. Since the second
term on the right-hand side of (8.40) does not depend on X, the minimum of F(X) occurs at
X = X_0 = A^{-1}b.
2. We have from the above that
$$F(X_0) = F(A^{-1}b) = (A^{-1}b)^T A (A^{-1}b) - 2(A^{-1}b)^T b + c = c - b^T A^{-1} b = c - b^T X_0 = c - X_0^T A X_0,$$
since b^T = X_0^T A^T = X_0^T A.


3. According to Exercise 8.5, it is enough to show that the Hessian matrix of F,
$$H(X) = \left(\frac{\partial^2 F}{\partial x_i \partial x_j}(X)\right)_{1\le i,j\le n},$$
is positive definite. Here a direct computation gives
$$H(X) = 2A.$$
Since A is positive definite, H(X) is positive definite and consequently the function F is
strictly convex.
4. The function F can be written in the form (8.38), with
$$X = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix},\qquad A = \begin{bmatrix} 4 & -1 \\ -1 & 3 \end{bmatrix},\qquad b = \begin{bmatrix} -\tfrac{3}{2} \\ 1 \end{bmatrix},\qquad c = 1.$$
It is clear that A is positive definite, since its eigenvalues are
$$\lambda_1 = \frac{1}{2}\left(7 + \sqrt{5}\right),\qquad \lambda_2 = \frac{1}{2}\left(7 - \sqrt{5}\right).$$
Hence, applying the above results, we deduce that F has a global minimum, which is attained at
$$X_0 = A^{-1}b = \begin{bmatrix} -\tfrac{7}{22} \\[2pt] \tfrac{5}{22} \end{bmatrix}.$$
Now, the value of F at X_0 is
$$F(X_0) = c - X_0^T A X_0 = \frac{13}{44}.$$
Hence, we deduce that F(X) ≥ 13/44 for every X in R^2, with equality only at X = X_0. J
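Numerically (NumPy assumed), the minimizer is obtained by solving AX = b, and the minimum value agrees with 13/44.

```python
import numpy as np

A = np.array([[4.0, -1.0],
              [-1.0, 3.0]])
b = np.array([-1.5, 1.0])
c = 1.0

def F(X):
    return X @ A @ X - 2 * X @ b + c

X0 = np.linalg.solve(A, b)              # unique minimizer A^{-1} b
print(X0)                               # [-7/22, 5/22] = [-0.3181..., 0.2272...]
print(F(X0), 13 / 44)                   # minimum value 13/44 = 0.2954...

# Any other point gives a strictly larger value, e.g.:
print(F(np.array([0.0, 0.0])) > F(X0))  # True
```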

Exercise 8.4
Let A and B be two matrices in M_n(R).
1. Show that the following two statements are equivalent:
(i) for all X and Y in R^n, we have
$$X^T A Y = X^T B Y; \tag{8.41}$$
(ii) A = B.
2. Prove that if A and B are symmetric, then the following statements are equivalent:
(i) for all X in R^n, we have
$$X^T A X = X^T B X; \tag{8.42}$$
(ii) A = B.

Solution
1. First, it is clear that (ii) implies (i). Now, let us assume that (i) holds and show that A = B.
Fix a vector Y in R^n. Then (8.41) implies that for any vector X in R^n,
$$X^T (AY - BY) = 0.$$
In particular, this equality also holds for X = AY − BY, which is also a vector in R^n.
Hence, we have
$$(AY - BY)^T (AY - BY) = \|AY - BY\|^2 = 0.$$
This yields (see Theorem 3.3.1)
$$AY - BY = 0_{\mathbb{R}^n}.$$
Since the last equality is true for all vectors Y in R^n, necessarily A − B = 0_{M_n(R)}, that is, A = B.
2. As above, it is trivial to see that (ii) implies (i). Now, if A and B are symmetric and
(8.42) holds, then, applying (8.42) to the vectors X, Y, and X + Y in R^n, we get
$$X^T A X = X^T B X,\qquad Y^T A Y = Y^T B Y,\qquad\text{and}\qquad (X+Y)^T A (X+Y) = (X+Y)^T B (X+Y).$$
This implies
$$Y^T A X + X^T A Y = Y^T B X + X^T B Y.$$
Since A and B are symmetric, we have
$$Y^T A X = (X^T A Y)^T = X^T A Y \qquad\text{and}\qquad Y^T B X = X^T B Y.$$
Hence, we obtain X^T A Y = X^T B Y for any X and Y in R^n. Thus, applying (1), we obtain
A = B. J

Exercise 8.5 (The Hessian Matrix)
Let f be a function f : R^n → R. If all the second derivatives of f exist and are continuous
on the domain of f, the Hessian matrix of f at a point X = (x_1, x_2, ..., x_n) of R^n is the n × n
matrix
$$H(X) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2}(X) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(X) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(X) \\[4pt] \frac{\partial^2 f}{\partial x_2 \partial x_1}(X) & \frac{\partial^2 f}{\partial x_2^2}(X) & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(X) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1}(X) & \frac{\partial^2 f}{\partial x_n \partial x_2}(X) & \cdots & \frac{\partial^2 f}{\partial x_n^2}(X) \end{bmatrix}.$$
1. Show that the Hessian matrix is symmetric.
2. Show that if f is differentiable, then f is convex if and only if for any two vectors X and
Y in R^n,
$$f(Y) \ge f(X) + \nabla f(X)^T (Y - X), \tag{8.43}$$
where ∇f(X)^T is the vector
$$\nabla f(X)^T = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right).$$
3. Prove that if H(X) is positive definite (respectively, semi-definite) for all X in R^n, then f
is strictly convex (respectively, convex) in R^n.
4. Deduce that the function
$$f(x_1,x_2,x_3) = x_1^2 + 2x_2^2 + 3x_3^2 + 2x_1x_2 + 2x_1x_3 + 3$$
is strictly convex on R^3.


Solution
1. Since ∂²f/∂x_i∂x_j = ∂²f/∂x_j∂x_i (the second derivatives being continuous), we deduce that H(X) is symmetric.
2. First, assume that (8.43) is satisfied and let X and Y be two vectors in R^n. Let λ be in
[0, 1]. We put Z = λY + (1 − λ)X. Then,
$$f(Y) \ge f(Z) + \nabla f(Z)^T (Y - Z) \tag{8.44}$$
and
$$f(X) \ge f(Z) + \nabla f(Z)^T (X - Z). \tag{8.45}$$
Hence, multiplying (8.44) by λ and (8.45) by 1 − λ and adding the results, we obtain
$$\lambda f(Y) + (1-\lambda) f(X) \ge f(Z) + \nabla f(Z)^T\bigl(\lambda Y + (1-\lambda)X - Z\bigr) = f(\lambda Y + (1-\lambda)X),$$
since Z = λY + (1 − λ)X. Hence, f is convex.
Conversely, assume that f is convex. For any X and Y in R^n and λ in (0, 1], let Z =
λY + (1 − λ)X. By the convexity of f,
$$f(Z) = f(\lambda Y + (1-\lambda)X) \le \lambda f(Y) + (1-\lambda) f(X).$$
Hence, we get
$$\frac{f(Z) - f(X)}{\lambda} \le \frac{\lambda f(Y) + (1-\lambda)f(X) - f(X)}{\lambda} = f(Y) - f(X).$$
Recall that
$$\nabla f(X)^T d = \lim_{\lambda \to 0^+} \frac{f(X + \lambda d) - f(X)}{\lambda},$$
where d is in R^n. Therefore, taking d = Y − X, we deduce that
$$\nabla f(X)^T (Y - X) = \lim_{\lambda \to 0^+} \frac{f(X + \lambda(Y - X)) - f(X)}{\lambda} \le f(Y) - f(X).$$
Hence, (8.43) holds.
It is also clear, by the same method, that the inequality in (8.43) is strict for X ≠ Y if and only if f
is strictly convex.
3. Let X and Y be two vectors in R^n with X ≠ Y. By using Taylor's theorem and the
Lagrange form of the remainder, we have
$$f(Y) = f(X) + \nabla f(X)^T (Y - X) + \frac{1}{2}(Y - X)^T H\bigl(X + \alpha(Y - X)\bigr)(Y - X),$$



for some α in [0, 1]. Since H is positive definite, we have
$$(Y - X)^T H\bigl(X + \alpha(Y - X)\bigr)(Y - X) > 0.$$
Consequently, we obtain from the above
$$f(Y) > f(X) + \nabla f(X)^T (Y - X).$$
Therefore, from question (2), we deduce that f is strictly convex. By the same argument, if
H is positive semi-definite, then f is convex.
4. To show that f is strictly convex, it is enough to show that its Hessian matrix is
positive definite. Indeed, we have
$$H(X) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2}(X) & \frac{\partial^2 f}{\partial x_1\partial x_2}(X) & \frac{\partial^2 f}{\partial x_1\partial x_3}(X) \\[4pt] \frac{\partial^2 f}{\partial x_2\partial x_1}(X) & \frac{\partial^2 f}{\partial x_2^2}(X) & \frac{\partial^2 f}{\partial x_2\partial x_3}(X) \\[4pt] \frac{\partial^2 f}{\partial x_3\partial x_1}(X) & \frac{\partial^2 f}{\partial x_3\partial x_2}(X) & \frac{\partial^2 f}{\partial x_3^2}(X) \end{bmatrix} = \begin{bmatrix} 2 & 2 & 2 \\ 2 & 4 & 0 \\ 2 & 0 & 6 \end{bmatrix}.$$
We see that the leading principal minors of H(X) are
$$\Delta_1 = 2,\qquad \Delta_2 = 4,\qquad \Delta_3 = 8.$$
Hence, H(X) is positive definite (see Theorem 8.2.4). Therefore, according to question (3), we
deduce that f is strictly convex. J
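A quick numerical confirmation that this Hessian is positive definite (NumPy assumed), via its leading principal minors and its eigenvalues:

```python
import numpy as np

H = np.array([[2.0, 2.0, 2.0],
              [2.0, 4.0, 0.0],
              [2.0, 0.0, 6.0]])

# Leading principal minors: determinants of the upper-left k x k blocks.
minors = [np.linalg.det(H[:k, :k]) for k in range(1, 4)]
print(np.round(minors, 10))             # [2, 4, 8]

# Equivalently, all eigenvalues are positive.
print(np.linalg.eigvalsh(H))            # three positive numbers
```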

Exercise 8.6
1. Let v_1, v_2, ..., v_n be a basis of a vector space E over a field K. Let W be a k-dimensional
subspace of E. Show that if m < k, then there exists a nonzero vector in W which is a
linear combination of the vectors v_{m+1}, ..., v_n.
2. Let A be a symmetric matrix in M_n(R). Show that if Y^T A Y > 0 for all nonzero vectors Y
in a k-dimensional subspace W of R^n, then A has at least k positive eigenvalues (counted
with multiplicity).^a

^a There are several proofs of this result; here we adapt the one in [7].

Solution
1. Consider the subspace F defined as
$$F = \operatorname{span}\{v_{m+1}, \ldots, v_n\}.$$
Then dim_K F = n − m. Hence, applying (4.16), we have
$$\dim_K W + \dim_K F = \dim_K(F + W) + \dim_K(F \cap W).$$
Here k + n − m > n, since m < k. Hence, the subspace F ∩ W contains at
least one nonzero vector v; otherwise we would have dim_K(F + W) = k + n − m > dim_K E, which is impossible
according to Theorem 4.6.7. Thus, v ∈ F ∩ W, and v is a linear combination of
v_{m+1}, ..., v_n since v ∈ F.
2. Let v_1, v_2, ..., v_n be an orthonormal basis of R^n consisting of eigenvectors
of A (such a basis exists according to Theorem 8.1.7). Let λ_1, λ_2, ..., λ_n be the
corresponding eigenvalues. Without loss of generality, we may assume that the first
m (m ≤ n) eigenvalues of A are positive and the rest are not. If m < k, then from (1) we
deduce that there exists a nonzero vector Y in W such that
$$Y = c_{m+1} v_{m+1} + \cdots + c_n v_n.$$
Hence, since v_1, v_2, ..., v_n are orthonormal, we have
$$Y^T A Y = c_{m+1}^2 \lambda_{m+1} + \cdots + c_n^2 \lambda_n \le 0.$$
This is a contradiction, since we assumed that Y^T A Y > 0 for all nonzero vectors Y in
W. Consequently, m ≥ k. J

Exercise 8.7 (The Submultiplicative Norm of a Matrix)
Let A = (a_ij), 1 ≤ i, j ≤ n, be a matrix in M_n(R).^a We define the norm
$$\|A\|_F^2 = \sum_{j=1}^{n}\sum_{i=1}^{n} |a_{ij}|^2 = \operatorname{tr}(A^T A) \qquad\text{(Frobenius norm).}^b$$
1. Find ‖A‖_F for the matrix
$$A = \begin{bmatrix} 1 & -2 & -3 \\ 0 & 5 & 1 \\ 6 & 2 & 4 \end{bmatrix}.$$
2. Show that
$$\|AB\|_F \le \|A\|_F \|B\|_F \qquad\text{(submultiplicativity property)}, \tag{8.46}$$
and ‖I‖_F = √n, where I is the identity matrix in M_n(R).
3. Show that the two norms
$$\|A\|_\infty = \max_{1\le i\le n} \sum_{j=1}^{n} |a_{ij}| \qquad\text{and}\qquad \|A\|_1 = \max_{1\le j\le n} \sum_{i=1}^{n} |a_{ij}|$$
satisfy (8.46).

4. Show that if Q is an orthogonal matrix, then
$$\|AQ\|_F = \|QA\|_F = \|A\|_F.$$
That is, the Frobenius norm is invariant under orthogonal transformations.
5. We define the norm
$$\|A\|_2 = \sup_{X \ne 0_{\mathbb{R}^n}} \frac{\|AX\|}{\|X\|} = \max_{\|X\| = 1} \|AX\|, \tag{8.47}$$
where ‖·‖ is the Euclidean norm in R^n introduced in Definition 3.3.1. Show that ‖·‖_2 is well
defined, satisfies property (8.46), and ‖I‖_2 = 1.
6. Show that for any matrix A in M_n(R),
$$\|A\|_2 \le \|A\|_F \le \sqrt{n}\,\|A\|_2. \tag{8.48}$$

^a All the results here remain true if we replace R by C. See Remark 8.1.1.
^b Sometimes referred to as the Hilbert–Schmidt norm, and defined as the usual Euclidean norm of the matrix A
when it is regarded as a vector in R^{n^2}.

Solution
1. We have
$$A^T A = \begin{bmatrix} 1 & 0 & 6 \\ -2 & 5 & 2 \\ -3 & 1 & 4 \end{bmatrix}\begin{bmatrix} 1 & -2 & -3 \\ 0 & 5 & 1 \\ 6 & 2 & 4 \end{bmatrix} = \begin{bmatrix} 37 & 10 & 21 \\ 10 & 33 & 19 \\ 21 & 19 & 26 \end{bmatrix}.$$
Hence, tr(A^T A) = 96, so ‖A‖_F = √96.
2. We put A = (a_ij), B = (b_ij), and C = AB = (c_ij). Hence, we have (see
Definition 1.1.11)
$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.$$
Therefore, we obtain
$$|c_{ij}|^2 = \left|\sum_{k=1}^{n} a_{ik} b_{kj}\right|^2 \le \left(\sum_{k=1}^{n} |a_{ik}||b_{kj}|\right)^2.$$
Now, applying the Cauchy–Schwarz inequality (see (3.16)), we obtain
$$\left(\sum_{k=1}^{n} |a_{ik}||b_{kj}|\right)^2 \le \left(\sum_{k=1}^{n} |a_{ik}|^2\right)\left(\sum_{k=1}^{n} |b_{kj}|^2\right).$$
This yields
$$\|AB\|_F^2 = \sum_{i,j} |c_{ij}|^2 \le \left(\sum_{i,k} |a_{ik}|^2\right)\left(\sum_{l,j} |b_{lj}|^2\right) = \|A\|_F^2 \|B\|_F^2,$$
which gives the desired result. As a side note, property (8.46) can be seen as a generalization
of the Cauchy–Schwarz inequality.
It is obvious that ‖I‖_F^2 = tr(I^T I) = tr(I) = n. This yields ‖I‖_F = √n.
3. First, we show that ‖·‖_∞ has the submultiplicativity property (8.46). Indeed,
we have from above
$$|c_{ij}| \le \sum_{k=1}^{n} |a_{ik}||b_{kj}|.$$
Hence,
$$\sum_{j=1}^{n} |c_{ij}| \le \sum_{j,k} |a_{ik}||b_{kj}| = \sum_{k=1}^{n} |a_{ik}| \left(\sum_{j=1}^{n} |b_{kj}|\right).$$
Since
$$\sum_{j=1}^{n} |b_{kj}| \le \|B\|_\infty,$$
we obtain
$$\sum_{j=1}^{n} |c_{ij}| \le \left(\sum_{k=1}^{n} |a_{ik}|\right)\|B\|_\infty \le \|A\|_\infty \|B\|_\infty.$$
Taking the maximum over i, we conclude that
$$\|C\|_\infty = \|AB\|_\infty \le \|A\|_\infty \|B\|_\infty.$$
By the same reasoning, we can show that the norm ‖·‖_1 has the submultiplicativity
property.
4. Since Q is orthogonal, we have (see (8.2)) Q^T Q = Q Q^T = I. Hence, we get
$$\|QA\|_F^2 = \operatorname{tr}\bigl((QA)^T(QA)\bigr) = \operatorname{tr}(A^T Q^T Q A) = \operatorname{tr}(A^T A) = \|A\|_F^2.$$
Similarly,
$$\|AQ\|_F^2 = \operatorname{tr}\bigl((AQ)^T(AQ)\bigr) = \operatorname{tr}(Q^T A^T A Q) = \operatorname{tr}(Q Q^T A^T A) = \operatorname{tr}(A^T A) = \|A\|_F^2,$$
where we have used the cyclic property of the trace, that is, tr(ABC) = tr(CAB). (The reader
should be careful: tr(ABC) ≠ tr(ACB) in general.)

5. We need to show that the supremum in (8.47) is well defined. Let
$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
be a vector in R^n. Then, we have
$$\|AX\|^2 = \sum_{i=1}^{n}\left|\sum_{j=1}^{n} a_{ij} x_j\right|^2.$$
Applying the Cauchy–Schwarz inequality, we get
$$\|AX\|^2 \le \sum_{i=1}^{n}\left(\sum_{j=1}^{n} |a_{ij}|^2\right)\left(\sum_{j=1}^{n} |x_j|^2\right),$$
whence
$$\|AX\|^2 \le \|A\|_F^2 \|X\|^2,$$
and therefore
$$\frac{\|AX\|}{\|X\|} \le \|A\|_F,\qquad\text{for any } X \ne 0_{\mathbb{R}^n}.$$
This implies that the set of real numbers $\left\{\frac{\|AX\|}{\|X\|} : X \ne 0_{\mathbb{R}^n}\right\}$ is bounded, and therefore it has a
supremum, and we have
$$\|A\|_2 \le \|A\|_F.$$
Now, we need to show that ‖·‖_2 satisfies (8.46). Indeed, for any vector X ≠ 0_{R^n} we have
$$\|(AB)X\| = \|A(BX)\| \le \|A\|_2 \|BX\| \le \|A\|_2 \|B\|_2 \|X\|.$$
Hence, we obtain
$$\frac{\|(AB)X\|}{\|X\|} \le \|A\|_2 \|B\|_2,$$
and so
$$\|AB\|_2 \le \|A\|_2 \|B\|_2.$$


If A = I, then we have
$$\frac{\|AX\|}{\|X\|} = \frac{\|X\|}{\|X\|} = 1,\qquad X \ne 0_{\mathbb{R}^n}.$$
Hence, ‖I‖_2 = 1.
6. We have already proved in (5) that ‖A‖_2 ≤ ‖A‖_F. So, we just need to show that ‖A‖_F ≤
√n ‖A‖_2. This inequality follows from the fact that ‖A‖_F^2 = tr(A^T A) ≤ n ρ(A^T A) = n‖A‖_2^2 (see Exercise 8.8). J
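The inequalities (8.48) can be observed numerically (NumPy assumed); the spectral norm is the largest singular value, which NumPy's norm computes for ord=2.

```python
import numpy as np

A = np.array([[1.0, -2.0, -3.0],
              [0.0, 5.0, 1.0],
              [6.0, 2.0, 4.0]])
n = A.shape[0]

norm_2 = np.linalg.norm(A, 2)           # spectral norm (largest singular value)
norm_F = np.linalg.norm(A, 'fro')       # Frobenius norm, here sqrt(96)

print(norm_2 <= norm_F <= np.sqrt(n) * norm_2)   # True
print(norm_F, np.sqrt(96))
```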

Exercise 8.8 (Spectral Radius)
Let A be a matrix in M_n(C) and let λ_1, λ_2, ..., λ_n be its eigenvalues. We define the spectral
radius ρ(A) of A as
$$\rho(A) = \max_{1\le i\le n} |\lambda_i|.$$
1. Show that ρ(A) ≤ ‖A‖_2, where ‖·‖_2 is defined in (8.47) (with C instead of R).^a
2. Prove that
$$\lim_{m\to\infty} A^m = 0 \iff \rho(A) < 1.$$
3. Show that
$$\rho(A) = \lim_{m\to\infty} \|A^m\|_2^{1/m} \qquad\text{(spectral radius formula).}^b \tag{8.49}$$

^a In fact, this holds for any matrix norm.
^b This formula yields a technique for estimating the top eigenvalue of A.

Solution
1. Let λ_i be an eigenvalue of A, i.e.,
$$A X_i = \lambda_i X_i,\qquad X_i \ne 0_{\mathbb{C}^n}.$$
On the other hand, for any X ≠ 0_{C^n} we have
$$\|AX\| \le \|A\|_2 \|X\|.$$
In particular, for X = X_i, we have
$$\|A X_i\| = |\lambda_i| \|X_i\| \le \|A\|_2 \|X_i\|.$$
This yields |λ_i| ≤ ‖A‖_2, which concludes the proof.



2. First, assume that lim_{m→∞} A^m = 0. Let λ be an eigenvalue of A and let X be an
eigenvector corresponding to λ. Then
$$AX = \lambda X \qquad\text{and}\qquad A^m X = \lambda^m X.$$
Hence, since lim_{m→∞} A^m = 0, we deduce that lim_{m→∞} λ^m X = 0. Since X ≠ 0_{C^n},
we have lim_{m→∞} λ^m = 0. This shows that |λ| < 1. This last inequality is satisfied by
all the eigenvalues of A. Consequently, ρ(A) < 1. We leave it to the reader to show that ρ(A) < 1
implies lim_{m→∞} A^m = 0.
3. To show the spectral radius formula, we first have from (1) that ρ(A^m) ≤ ‖A^m‖_2, and
since ρ(A) = (ρ(A^m))^{1/m}, we deduce that
$$\rho(A) \le \|A^m\|_2^{1/m},\qquad\text{for all } m.$$
Now, to prove (8.49), we need to show that for any ε > 0 there exists a positive integer
N = N(ε) such that for any m ≥ N,
$$\|A^m\|_2^{1/m} \le \rho(A) + \varepsilon.$$
Let ε > 0 be given. Consider the matrix
$$A_\varepsilon = \frac{1}{\rho(A) + \varepsilon} A.$$
Then ρ(A_ε) < 1. Therefore, we deduce from (2) that
$$\lim_{m\to\infty} A_\varepsilon^m = 0.$$
Consequently, there exists a positive integer l(ε) such that for m ≥ l(ε) we have
$$\|A_\varepsilon^m\|_2 = \frac{1}{(\rho(A) + \varepsilon)^m}\|A^m\|_2 < 1.$$
Now, it is enough to choose N(ε) = l(ε). This finishes the proof of (8.49). J
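The spectral radius formula (8.49) can also be observed numerically (NumPy assumed; the matrix is an arbitrary test value): ‖A^m‖_2^{1/m} approaches ρ(A) from above as m grows.

```python
import numpy as np

A = np.array([[0.5, 0.4],
              [0.1, 0.3]])

rho = np.max(np.abs(np.linalg.eigvals(A)))      # spectral radius
print(rho)

for m in (1, 5, 20, 80):
    Am = np.linalg.matrix_power(A, m)
    print(m, np.linalg.norm(Am, 2) ** (1.0 / m))   # tends to rho from above
```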
References

1. H. Anton, C. Rorres, Elementary Linear Algebra: with Supplemental Applications, 11th edn.
(Wiley, Hoboken, 2011)
2. M. Artin, Algebra, 2nd edn. (Pearson, Boston, 2011)
3. S. Axler, Linear Algebra Done Right. Undergraduate Texts in Mathematics, 2nd edn. (Springer,
New York, 1997)
4. E.F. Beckenbach, R. Bellman, Inequalities, vol. 30 (Springer, New York, 1965)
5. F. Boschet, B. Calvo, A. Calvo, J. Doyen, Exercices d’algèbre, 1er cycle scientifique, 1er année
(Librairie Armand Colin, Paris, 1971)
6. L. Brand, Eigenvalues of a matrix of rank k. Am. Math. Mon. 77(1), 62 (1970)
7. G.T. Gilbert, Positive definite matrices and Sylvester’s criterion. Am. Math. Mon. 98(1), 44–46
(1991)
8. R. Godement, Algebra (Houghton Mifflin Co., Boston, MA, 1968)
9. J. Grifone, Algèbre linéaire, 4th edn. (Cépaduès–éditions, Toulouse, 2011)
10. G.N. Hile, Entire solutions of linear elliptic equations with Laplacian principal part. Pac. J. Math
62, 127–140 (1976)
11. R.A. Horn, C.R. Johnson, Matrix Analysis, 2nd edn. (Cambridge University Press, Cambridge,
2013)
12. D. Kalman, J.E. White, Polynomial equations and circulant matrices. Am. Math. Mon. 108(9),
821–840 (2001)
13. P. Lancaster, M. Tismenetsky, The Theory of Matrices, 2nd edn. (Academic Press, Orlando, FL,
1985)
14. S. Lang, Linear Algebra. Undergraduate Texts in Mathematics, 3rd edn. (Springer, New York,
1987)
15. L. Lesieur, R. Temam, J. Lefebvre, Compléments d’algèbre linéaire (Librairie Armand Colin,
Paris, 1978)
16. H. Liebeck, A proof of the equality of column and row rank of a matrix. Am. Math. Mon. 73(10),
1114 (1966)
17. C.D. Meyer, Matrix Analysis and Applied Linear Algebra (SIAM, Philadelphia, PA, 2000)
18. D.S. Mitrinović, J.E. Pečarić, A.M. Fink, Classical and New Inequalities in Analysis. Mathematics
and Its Applications (East European Series), vol. 61 (Kluwer Academic, Dordrecht, 1993)
19. C. Moler, C. Van Loan, Nineteen dubious ways to compute the exponential of a matrix, twenty-five
years later. SIAM Rev. 45(1), 3–49 (2003)
20. J.M. Monier, Algèbre et géométrie, PC-PST-PT, 5th edn. (Dunod, Paris, 2007)
21. P.J. Olver, Lecture notes on numerical analysis, http://www.math.umn.edu/~olver/num.html.
Accessed Sept 2016
22. F. Pécastaings, Chemins vers l’algèbre, Tome 2 (Vuibert, Paris, 1986)
23. M. Queysanne, Algebre, 13th edn. (Librairie Armand Colin, Paris, 1964)
24. J. Rivaud, Algèbre linéaire, Tome 1, 2nd edn. (Vuibert, Paris, 1982)
25. S. Roman, Advanced Linear Algebra. Graduate Texts in Mathematics, vol. 135 (Springer, New
York, 2008)
26. H. Roudier, Algèbre linéaire: cours et exercices, 3rd edn. (Vuibert, Paris, 2008)
27. B. Said-Houari, Differential Equations: Methods and Applications. Compact Textbook in
Mathematics (Springer, Cham, 2015)


28. D. Serre, Matrices. Theory and Applications. Graduate Texts in Mathematics, vol. 216, 2nd edn.
(Springer, New York, 2010)
29. G. Strang, Linear Algebra and Its Applications, 3rd edn. (Harcourt Brace Jovanovich, San Diego,
1988)
30. V. Sundarapandian, Numerical Linear Algebra (PHI Learning Pvt. Ltd., New Delhi, 2008)
31. H. Valiaho, An elementary approach to the Jordan form of a matrix. Am. Math. Mon. 93(9),
711–714 (1986)
Index

Block matrix, 118 Components


Cofactor – of a vector, 124
– matrix, 92 Congruent
Fibonacci – matrices, 355
– sequence, 116 Consistent
Matrix – system of linear equations, 263
– skew–symmetric, 111 Convex
– function, 365
Abelian Cramer’s rule, 100
– group, 11 Cyclic property
Abelian group, 160 – of the trace, 373
Addition
– of matrices, 8
Determinant, 74
Adjoint
– of Vandermonde, 108
– of a matrix, 92
Determinant , 57
Apollonius’ identity, 149
Diagonalizable
Automorphism, 200
– matrix, 240, 290
– of vector spaces, 211
Diagonally
– dominant matrix, 264
Basis Dimension
– of a vector space, 176 – of a direct sum , 186
bijective – of a subspace, 182
– tarnsformation, 215 – of a vector space, 178
Binomial formula, 20 Direct sum
– of vector spaces, 170
cardinality Distance, 131
– of a set, 178 Dot product, 6, 132
Cauchy–Schwarz Dunford
– inequality, 136, 372 – decomposition, 302
Cayley–Hamilton Dunkl–Williams inequality, 151
– theorem, 305
Characteristic
– polynomial, 57 Eigenspace, 270
Characteristic polynomial, 284 Eigenvalue
Cholesky – complete, 275
– decomposition, 348 – defective, 275
Cofactor – of a matrix, 278
– expansion, 74 – of a projection, 272
– of a matrix, 73 – of an endomorphism, 269
Column space, 241 Eigenvector
Commutative – generalized, 305
– group, 12 – matrix, 295
Complement – of a matrix, 278
– of a subspace, 194 – of an endomorphism, 269


Elementary Identity
– matrix, 82 – operator, 200
Elementary row operation, 246 Image
Endomorphism, 200 – of a linear transformation, 207
Equation Inconsistent
– linear, 4 – system of linear equations, 263
Equivalent inconsistent, 264
– matrices, 240 Index
Euclidean – of a matrix, 358
– norm, 129 Inertia
– vector space, 126 – of a matrix, 354, 359
Euler Injective
– formula, 283 – linear transformation, 205
Exponential Inverse
– of a matrix, 32 – of an isomorphism, 212
Isomorphic
Factorization – vector spaces, 212
– LU, 336 Isomorphism, 200
– QR, 330 – of vector spaces, 211
Fibonacci
– matrix, 116 Jordan
Frobenius inequality, 257 – block, 304
– for rank, 223 – canonical form, 303

Gauss elimination, 336 Kernel


Gauss–Jordan – of a linear transformation, 204
– elimination method, 41
Global minimum Lagrange form
– of a function, 365 – of the reminder, 369
Gram’s Lagrange’s identity, 157
– matrix, 363 Linear
Gram–Schmidt process, 328 – combination, 127, 165
Group, 11, 126 – dependence, 172
– GL.n; K/, 28 – independence, 172
– Abelian, 126 – operator, 200
– of invertible matrices, 65 – transformation, 200
– of matrices, 12 Linear transformation
– orthogonal, 327 – associated to a matrix, 233

Hölder’s Matrix
– inequality, 152 – associated to a linear transformation, 227
Hankel – augmented, 40
– matrix, 110 – circulant, 319
Hessian – companion, 112, 317
– matrix, 368 – diagonal, 30, 166
Homogeneous – idempotent, 58, 60
– system, 5 – identity, 21
– inverse, 22
Idempotent – involutory , 60
– matrix, 258, 266 – nilpotent, 55, 302

– non derogatory, 317 – projection, 142, 143


– of full rank, 242 – subspace, 165
– of Vandermonde, 108, 110 – vectors, 141
– orthogonally diagonalizable, 334 Orthonormal
– positive definite, 341 – basis, 329
– positive semi-definite, 347 – vectors, 328
– skew-symmetric, 189
– square, 7, 19
– symmetric, 52, 189, 297 Parallelogram identity, 140
– transpose, 50 Polarization identity, 140
– triangular, 35, 166 Positive definite
– tridiagonal, 116 – quadratic form, 354
Matrix inverse Principal axes
– determinant of, 90 – theorem, 353
Maximal Product
– linearly independent set, 179 – of two vector space, 192
Method Projection
– of elimination, 2 – transformation, 215
– of substitution , 2 Pythagoras theorem in Rn , 144
Minimal
– polynomial, 312
Minkowski Quadratic
– inequality, 153 – form, 352
Minor – function, 365
– of a matrix, 251
– leading principal , 337
Rank
– of a matrix, 72
– of a linear transformation, 209
– principal, 337
– of a matrix, 242
Multiplication
– of a symmetric matrix, 346
– of matrices, 13
Rank-nullity theorem, 209
Multiplicity
Rayleigh–Ritz
– algebraic, 275
– theorem, 344
– geometric, 275
Reflection matrix, 326
Ring, 20
Nilpotent – of matrices, 22
– matrix, 308 Rotation matrix, 59, 326
– linear transform, 222 Row operation, 39
– matrix, 310 Row reduction method, 79
Norm Row space
– of a matrix, 371 – of a matrix, 244
– of a vector, 128
– of Frobenius, 371
– submultiplicative, 371 Schur’s
Null space, 167 – formula, 118
Nullity – lemma, 335
– of a linear transform, 209 Semi-definite
– matrix, 347
Orthogonal – quadratic form, 354
– complement, 196 Signature
– matrix, 145, 323 – of a matrix, 358

Similar Taylor’s
– matrices, 240, 288 – theorem, 369
Singular Trace, 57
– matrix, 279 – of a matrix, 37
Spectral Transition matrix, 235, 252
– radius, 375 Triangle inequality, 139
– theorem, 334 – for rank, 223
Spectrum, 278, 296 Triangularization
Square root – of a matrix, 298
– of a matrix, 361 – of an endomorphism, 298
Submatrix
– leading principal, 337
– principal, 337 Unit
Subspace, 163 – vector, 130
Surjective
– linear transformation, 208
Vector space, 127, 159
Sylvester’s law
– of nullity, 257
Sylvester’s law of inertia, 359 Young’s inequality, 138, 153
