
LINEAR ALGEBRA

Erin P. J. Pearse

[Cover figure: the four fundamental subspaces of an m × n matrix A of rank r.
 Rn = rowsp(A) ⊕ null(A) and Rm = colsp(A) ⊕ null(A^T), where
 rowsp(A) = ran(A^T) and colsp(A) = ran(A) both have dimension r,
 nullity(A) = n - r, and nullity(A^T) = m - r.
 A solution of Ax = b decomposes as x = xp + xh, with Axp = b and Axh = 0.]

These notes follow Elementary Linear Algebra with Applications (9ed) by B. Kolman and
D. R. Hill, but also include material borrowed freely from other sources, including the
classic text by G. Strang. This document is not to be used for any commercial purpose.

Version of October 3, 2011


Disclaimer:
These notes were typed hurriedly in preparation for lecture. They are FULL of errors and
are not intended for distribution to the students. Typically, the errors get corrected while
proceeding through lecture on-the-fly. Caveat, caveat, caveat, etc.
Contents

Course Overview
0.1 Outline
0.2 Preliminaries and reference
0.2.1 Common notations and terminology
0.2.2 Logic and inference
0.2.3 Proof Techniques

1 Linear systems
1.1 Introduction
1.1.1 Systems of Linear Equations
1.1.2 Solution sets
1.2 Matrices
1.2.1 Types of matrices
1.2.2 Matrix operations
1.3 Matrix multiplication
1.3.1 Column-by-column, row-by-row
1.4 Algebraic properties of matrix operations
1.4.1 The transpose
1.5 Special matrices
1.5.1 Inverses of Matrices
1.5.2 Diagonal matrices
1.5.3 Triangular matrices
1.5.4 Symmetric matrices
1.6 Matrix transformations
1.6.1 Functions from Rn to R
1.6.2 Functions from Rn to Rm
1.6.3 Some operators
1.6.4 Dilation and contraction
1.6.5 Compositions

2 Solving linear systems
2.1 Echelon form of a matrix
2.1.1 Back substitution
2.2 Solving linear systems
2.3 Elementary matrices
2.3.1 Elementary matrices
2.3.2 Representation of Row Operations
2.3.3 How to find inverses
2.4 Systems & Invertibility
2.4.1 The number of solutions of a system of linear equations

3 Determinants
3.3 Cofactor expansion
3.3.1 Determinants by cofactors
3.2 Determinant Properties
3.2.1 Determinants and row operations
3.5 Applications of Determinants
3.5.1 Cramer's Rule
3.5.2 Linear systems of the form Ax = λx

4 Vector Spaces
4.1 Vectors in R2 and R3
4.1.1 Vector arithmetic
4.2 Vector Spaces
4.3 Subspaces
4.4 Span
4.5 Linear independence
4.6 Basis and dimension
4.6.1 Basis
4.6.2 Dimension
4.7 Homogeneous systems
4.7.1 Nonhomogeneous systems
4.8 Coordinates and isomorphism
4.8.1 Coordinates and change of basis
4.9 Rank

5 Length and direction
5.1 Vector arithmetic and norms
5.1.1 Distance and length
5.1.2 Dot products
5.1.3 Arithmetic of the Dot Product
5.1.4 Projections
5.2 Cross Product
5.3 Inner product spaces
5.3.1 Discursion relating to ODE and many other applications
5.4 The Gram-Schmidt Algorithm
5.5 Orthogonal complements

7 Eigenvalues and eigenvectors
7.1 Eigenvalues
7.1.1 Finding eigenvalues
7.1.2 Complex and degenerate eigenvalues
7.2 Diagonalization and similar matrices
7.3 Diagonalization of symmetric matrices
7.3.1 Epilogue: an application of eigenvectors to calculus that you've probably already seen (but maybe didn't know it)
Course Overview

0.1 Outline

Linear algebra: the study of linear equations, vectors, matrices, and vector spaces. Let's
discuss these informally; we'll go over these ideas more precisely later.

1. Linear algebra begins as the study of linear equations. A linear equation is a sum of
variables with coefficients, like

    2x - 3y + (1/4)z = 4.

This is a simple type of equation, the kind with which you have the most familiarity;
it is an equation whose graph is straight: a line or a plane.

These are the most simple equations around, and the only ones that we really know
about. The lesson of calculus is that if a function is locally linear, i.e., smooth under
a magnifying glass, then we can study it. That is, if a function is differentiable, we
can study it with calculus.

A solution to

    2x - 3y + (1/4)z = 4

is a choice of x, y, z that makes the equation true, like x = 6, y = 3, z = 4.

The next step is to consider systems of linear equations (that is, a group of a few
linear equations).

Can they all be solved simultaneously? Is the solution unique? The answers depend
on the system under consideration: not all can be solved, and some have many
solutions. A solution to a system like

    2x - 3y + (1/4)z = 4

     x - 2y + (1/3)z = 3

     x + 9y - z = 0

is a choice of x, y, z that makes each equation true simultaneously. The solution


also has a geometric interpretation: each of those equations is a plane in 3-space. A
solution to this set of equations is a point in space where all three planes intersect.
Why might such a thing not exist? How many ways can 3 planes intersect?

(Examples with 3 sheets of paper)

2. Matrices are a natural way to write and solve systems of linear equations. At first,
matrices seem like just a notational convenience. However, after working with them,
one discovers that they allow for efficient computation. After working with them
even more, you start to see how properties of the matrices give information about
the system that might not be initially apparent. Example: determinants.

3. Next, one is led to consider vectors, that is, objects which must be described
in terms of more than one coordinate or component, like (2, 3, 1/4). The most
natural examples are points in the plane or in space. Vectors allow one to describe
multidimensional phenomena, and so are inherently adapted to describing geometry.

4. In order to study vectors, it is often helpful to consider the collection of all vectors
of a certain type. Examples:

    R2 = {(x, y) : x, y are real numbers}
    R3 = {(x, y, z) : x, y, z are real numbers}
    C2 = {(x, y) : x, y are complex numbers}.

5. Abstracting vectors, one can describe them solely in terms of their properties. For
example, if you add two vectors, you get another vector (and it doesn't matter which
one came first). This leads to the notion of a vector space, that is, the collection of
all vectors of a certain type. Examples: the three just above.

How about

    {(x, y, z) : x, y, z are integers}?
    {(x, y, z) : x, y, z are positive}?

More interesting examples:

    {f(x) : f is a polynomial in x}
    {f(x) : f is a continuous function}
    {f(x) : f is a differentiable function}.

Also, we discover the essential notion of a basis. A basis is a toolkit containing the
building blocks of your entire vector space. That means that any element of your
vector space (i.e., any vector) can be built up out of basis elements. It is easy to
provide an example of a basis of R3 :

(1, 0, 0), (0, 1, 0), (0, 0, 1).

Here is an example of an expansion in terms of this basis:

    (2, -3, 1/4) = 2 (1, 0, 0) - 3 (0, 1, 0) + (1/4) (0, 0, 1).

What is a basis for the space of polynomials? {1, x, x2 , x3 , . . . }.

What is a basis for the space of continuous functions? This gets a bit more tricky, but
there are many good answers. One of the classical answers to this question consists
of functions that look like a cos θ + b sin θ, etc. This is the starting point of Fourier
analysis and is what almost all digital communication and information compression
technology is based upon.

Wavelets are a newer answer to this question. In fact, this is still an area of current
research.

6. If we add an operation (called an inner product) to a vector space that gives us the
angle between two vectors, then we can tell when two vectors are perpendicular
(orthogonal), etc., and study geometry in any dimension.

It turns out that an inner product also gives a very good way to determine how
far apart two points are in a vector space, and how to approximate one element by
others.

7. All this study of vectors gives a different perspective on matrices. A matrix can be
thought of as a transformation of vectors. (Specifically, a linear transformation.) How
can one characterize such a transformation? Eigenvectors tell which directions remain
fixed, and eigenvalues tell how much things are stretched in these fixed directions.

There are innumerable applications of eigenvectors and eigenvalues, for example,


long-term behavior. Suppose you have a transition matrix: the eigenvalues will tell
you the long-term distribution of values. Example: soda pop sales projections.

8. We will continue to study linear transformations by examining the kernel (the set
of vectors that gets killed, i.e., sent to 0) and the range. Can a transformation be
inverted? Sometimes, but not always; the determinant knows the answer.

9. Finally, well discuss various applications as time permits. The previously mentioned
topics have uses in geometry, differential equations, data analysis, signal processing
and the approximation of functions in general, economics, business, electrical net-
works, optimization, computer graphics, probability, game theory, fractals and chaos,
quantum mechanics, and generally any other area where you have multiple players,
goods, particles, etc.

The set of complex numbers C (aka the complex plane) is basically R2 with multiplica-
tion. Enroll in Complex Analysis if you want to discover the amazing and far-reaching
consequences of endowing points in R2 with a new operation (multiplication) defined by

    (u, v) · (x, y) = (ux - vy, uy + vx).

0.2 Preliminaries and reference

0.2.1 Common notations and terminology

Sets will be defined by listing their elements or providing the criteria to be a member.

Example 0.2.1.

    A = {0, 2, 4, . . . }              (pattern)
    B = {x : x^2 = 1}                  (algebraic req)
    C = {x : x^4 = 1}
    D = {x : |x - y| ≤ 1}              (geometric req)

Definition 0.2.2. If x is an element of the set A, we write x ∈ A; if not, we write x ∉ A.
If every element of the set A is also an element of the set B, then A is a subset of B and
we write A ⊆ B.

Example 0.2.3. If y = 0 in the example above, then we have an interval

    D = [-1, 1] = {x ∈ R : -1 ≤ x ≤ 1},    and B ⊆ D.

Definition 0.2.4. If A ⊆ B and B ⊆ A, then the sets are equal and we write A = B.

∅ = {}, the empty set; it contains no elements and is a subset of every other set.

N = {1, 2, 3, . . . }, the natural numbers

Z = {. . . , -2, -1, 0, 1, 2, . . . }, the integers (Zahlen in German)

Q = {m/n : m, n ∈ Z, n ≠ 0}, the rational numbers (quotients)

R, the real numbers (the completion of Q)

R2 = {(x, y) : x, y ∈ R}, the plane

R3 = {(x, y, z) : x, y, z ∈ R}, 3-space

Rn = {(x1, x2, . . . , xn) : each xi ∈ R}, (Euclidean) n-space

Set operations:

intersection: A ∩ B = {x : x ∈ A and x ∈ B}          union: A ∪ B = {x : x ∈ A or x ∈ B}

complement: A^c = {x : x ∉ A}          difference: A \ B = {x : x ∈ A and x ∉ B} = A ∩ B^c

product: A × B = {(x, y) : x ∈ A and y ∈ B}          containment: A ⊆ B ⟺ (x ∈ A ⇒ x ∈ B)

Subsets of R:

[a, b] = {x : a ≤ x ≤ b} is a closed interval          [a, b]^2 = [a, b] × [a, b] = {(x, y) : a ≤ x, y ≤ b}

(a, b) = {x : a < x < b} is an open interval

(a, b] or [a, b) are half-open intervals

Some common vector spaces (each has an analogue with R replaced by C):

R2 = {(x, y) : x, y ∈ R} = R × R

R3 = {(x, y, z) : x, y, z ∈ R} = R × R × R

Rn = {(x1, x2, . . . , xn) : xi ∈ R for i = 1, . . . , n},          ⟨x, y⟩ = x · y = x1 y1 + . . . + xn yn

C(X) = {f : X → R : f is continuous} = C^0(X),          ⟨f, g⟩ = ∫_X f(x) g(x) dx

C^k(X) = {f : X → R : f, f', f'', . . . , f^(k) are continuous}

Mmn = {m × n matrices with entries from R}

Pn = {p(t) = a0 + a1 t + . . . + an t^n : each ak ∈ R}

Inner product (dot product) properties:

‖x‖ = √⟨x, x⟩ = the length of x          ‖x - y‖ = distance from x to y

⟨x, y⟩ = ‖x‖ ‖y‖ cos θ, so |⟨x, y⟩| ≤ ‖x‖ ‖y‖          ⟨x, y⟩ = 0 ⟺ x ⊥ y

0.2.2 Logic and inference

A implies B is written A ⇒ B and means that if A is true, then B must also be true.
This is if-then or implication. A is the hypothesis and B is the conclusion. To say the
hypothesis is satisfied means that A is true. In this case, one can make the argument

    A ⇒ B
    A
    -------
    B

and infer that B must therefore be true, also. Logical equivalence: when A ⇒ B
and B ⇒ A, then the statements are equivalent and we write "A if and only if B" as
A ⇔ B, A ↔ B, or A iff B.
Equivalent forms of an implication:

    A  B  |  A ⇒ B    ¬(A and ¬B)    ¬A or B    ¬B ⇒ ¬A
    T  T  |    T           T            T           T
    T  F  |    F           F            F           F
    F  T  |    T           T            T           T
    F  F  |    T           T            T           T

DeMorgan laws:

    A  B  |  ¬(A and B)    ¬A or ¬B    ¬(A or B)    ¬A and ¬B
    T  T  |      F             F           F             F
    T  F  |      T             T           F             F
    F  T  |      T             T           F             F
    F  F  |      T             T           T             T

Set version of the DeMorgan laws: (A ∩ B)^c = A^c ∪ B^c, and (A ∪ B)^c = A^c ∩ B^c.

Distribution laws for sets: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C), and A ∪ (B ∩ C) =
(A ∪ B) ∩ (A ∪ C).
Containment law (set version of contrapositive): A ⊆ B ⟺ B^c ⊆ A^c ⟺ A ∩ B^c = ∅.

Universal quantifier: ∀x, A(x) means A(x) is true for all values of x.

Existential quantifier: ∃x, A(x) means A(x) is true for some x (at least one, anyway).
Note: ∃!x, A(x) means there is a unique x for which A(x) is true.
Quantifier rules:

    ¬(∀x, A(x)) ⟺ ∃x, ¬A(x)        ∀x, ∀y, A(x, y) ⟺ ∀y, ∀x, A(x, y)        ∃x, ∀y, A(x, y) ⇒ ∀y, ∃x, A(x, y)

    ¬(∃x, A(x)) ⟺ ∀x, ¬A(x)        ∃x, ∃y, A(x, y) ⟺ ∃y, ∃x, A(x, y)

0.2.3 Proof Techniques

How to prove a statement of the form A ⇒ B.

Direct proof:

1. Assume the hypothesis A, for the moment.

2. Use this assumption, and whatever else you know, to prove that B is true.

Indirect proof:

Use the fact that A ⇒ B is equivalent to ¬B ⇒ ¬A (contrapositive).

1. Assume, for the moment, that the negation of the conclusion, ¬B, is true.

2. Use this assumption, and whatever else you know, to prove ¬A is true.

Contradiction:

This works for proving statements that are not necessarily of the form A ⇒ B. Suppose
you are asked to show that some proposition P is true.

1. Assume, for the moment, that P is false.

2. Show that this assumption implies a fallacy (like x < x, 9 is prime, or some other
blatant lie).

Mathematical induction:

This works for proving statements which are supposed to be true for every natural number.
To prove that P(n) is true whenever n ∈ N:

1. Show P (1).

2. Show that P(k) ⇒ P(k + 1).


Chapter 1

Systems of Linear Equations and Matrices

1.1 Introduction to Systems of Linear Eqns


Definition 1.1.1. A linear equation in n variables is

    a1 x1 + a2 x2 + . . . + an xn = b,

where ai , b are real numbers and the xi are unknown.

Example 1.1.2.
A linear equation in 2 variables looks like ax1 + bx2 = d where a, b, d are constants and
x1 , x2 are the two variables:

    x2 = 2x1 + 1

    3x1 - x2 = 4

    x1 + x2 - 1 = 0

A linear equation in 3 variables looks like ax1 + bx2 + cx3 = d where a, b, c, d are
constants and x1, x2, x3 are the three variables:

    0.5x1 - 3x2 + x3 = 2

    x3 - 2x2 + 3 = x1

    (2/3)x1 - (3/4)x2 = x3

In general, a linear equation in n variables looks like

a1 x1 + a2 x2 + a3 x3 + . . . + an xn = b

where a1 , a2 , a3 , . . . , an , b are constants and x1 , x2 , x3 , . . . , xn are n variables.


For contrast, here are some examples of equations that are not linear:
    x1 x2 = 1                    The variables are multiplied together.
    x3 = 1/x1                    Reciprocals are not linear (equivalent to the above).
    2x1 + 2x2^2 + 3x3 = 1        Raising any variable to a power produces a nonlinear eqn.
    x2 = sin x1                  Trigonometric functions are not linear.
    e^{x3} - 3x2 = 0             The exponential function is not linear.
General rule: linear iff graph is flat.

1.1.1 Systems of Linear Equations

Definition 1.1.3. A system of linear equations (or linear system) is simply a collection
of two or more equations which share the same variables.

Example 1.1.4. Suppose you have a collection of dimes and nickels worth 80 cents, and
you have 11 coins total. The associated system of linear equations is

10x1 + 5x2 = 80

x1 + x2 = 11

Solution by back-substitution: the second equation may be rewritten as x2 = 11 - x1. This
new expression for x2 may be substituted into the first equation to produce

    10x1 + 5(11 - x1) = 80,

which then gives

    5x1 = 25    ⇒    x1 = 5 and x2 = 6.

Alternative: multiplying the second equation by -10 gives

     10x1 +  5x2 =  80
    -10x1 - 10x2 = -110

    ⇒        -5x2 = -30

which also gives x2 = 6 and x1 = 5.
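For readers who want to check such computations numerically, here is a minimal sketch
in Python using the numpy library (an addition to these notes, not part of the original
text); it solves the coin system above with a built-in solver rather than by hand.

    import numpy as np

    # Coefficient matrix and right-hand side for:
    #   10*x1 + 5*x2 = 80
    #    1*x1 + 1*x2 = 11
    A = np.array([[10.0, 5.0],
                  [1.0, 1.0]])
    b = np.array([80.0, 11.0])

    x = np.linalg.solve(A, b)   # solves A x = b for a square, invertible A
    print(x)                    # [5. 6.]  ->  x1 = 5 dimes, x2 = 6 nickels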

Definition 1.1.5. A solution to a system of linear equations is a sequence of numbers
s1, s2, . . . , sn such that the system of eqns is satisfied (i.e., true) when si is substituted in
for xi. In the previous example,

    10 · 5 + 5 · 6 = 80

        5 + 6 = 11

shows that (x1 , x2 ) = (5, 6) is a solution to the system. Geometrically, a solution is a point
where all the graphs intersect.
A solution set for a system of linear equations is the set of all possible solutions for the
system. A system with no solutions is said to be inconsistent; a system with at least one
solution is consistent.
A system is homogeneous iff all constants bi are equal to 0. A homogeneous system always
has the trivial solution x1 = x2 = . . . = xn = 0.

This last definition might prompt you to ask, How many solutions can a system of
linear eqns have? Intuitively, you might expect that every system has exactly one solution,
but this is not the case. Consider the following systems:

Example 1.1.6.

x1 + x2 = 2

    x1 - x2 = 2

This system represents two lines which intersect at the point (2, 0). Hence, it has the
unique solution (2, 0).

x1 + x2 = 2

x1 + x2 = 1

This system represents two parallel lines. Since these lines do not intersect, there is no
solution (s1 , s2 ) which satisfies both equations simultaneously. More intuitively, think of
this system as being impossible to solve because two numbers cannot sum to two different
values.

x1 + x2 = 2

    -x1 - x2 = -2

This system represents the same line two different ways. Since these two lines overlap each
other, any point on one line is also on the other. Hence, any point on the line is a solution
to the system.

1.1.2 Solution sets

In the last example, we saw a system with an infinite solution set (any point on the line
will work!). How to express this?

Example 1.1.7. Consider the linear system

    x1 + 2x2 - 3x3 = -4                  (1.1.1)

    2x1 + x2 - 3x3 = 4                   (1.1.2)

What's the first thing you notice about this system? It has two equations and 3 unknowns.
So can we still solve it? Well, mostly ...
Begin by eliminating x1 by multiplying (1.1.1) by -2 and adding it to the second equation
to obtain

    -3x2 + 3x3 = 12.                     (1.1.3)

Now solve (1.1.3) for x2 as

    x2 = x3 - 4.                         (1.1.4)

Since this is about as far as we can go in solving this system, we let x3 = t, where t is a
parameter that can be any number, i.e., t ∈ R or -∞ < t < ∞ or t ∈ (-∞, ∞). Now by
substituting x3 = t into (1.1.4), we get x2 = t - 4. Now we rewrite equation (1.1.1) as

    x1 = -4 - 2x2 + 3x3

       = -4 - 2(t - 4) + 3t

       = t + 4

and we obtain the solution set (t + 4, t - 4, t), where -∞ < t < ∞. Note that there are an
infinite number of solutions, but not just any three numbers (a, b, c) is a solution of the
system. A solution needs to have the specific form (t + 4, t - 4, t).
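A short Python/numpy check (added to these notes, using the coefficients as reconstructed
above) that the parametric family (t + 4, t - 4, t) really satisfies both equations for several
values of t:

    import numpy as np

    # Coefficient matrix and right-hand side of (1.1.1)-(1.1.2)
    A = np.array([[1.0, 2.0, -3.0],
                  [2.0, 1.0, -3.0]])
    b = np.array([-4.0, 4.0])

    for t in (-2.0, 0.0, 3.5):
        x = np.array([t + 4, t - 4, t])     # the claimed solution form
        assert np.allclose(A @ x, b)        # both equations hold for every t
    print("parametric solution verified")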

Definition 1.1.8. A parameter is a variable, usually with a specified range, which remains
as part of the solution; the solution set is then said to be parameterized.
An infinite solution set which is described in terms of a parameter is also called a
parametric representation. A variable which has been set equal to the parameter is called
a free variable.

A parametric representation is not unique; it can be written many ways. For example,
the parametric solution to the system above may also be written as:

    (r, r - 8, r - 4),        -∞ < r < ∞        x1 is a free variable.

    (s + 8, s, s + 4),        -∞ < s < ∞        x2 is a free variable.

    (u + 2, u - 6, u - 2),    -∞ < u < ∞        No free variable.

For systems with more variables, the solution set may have many parameters. A
particular solution can be obtained from a parameterized solution by substituting in
certain values of the parameters:

    r = 1   ⇒   (1, -7, -3).

Fixing s = -7 gives the same point in R3.

This example serves to illustrate the general case: for any system of linear equations,
it is always the case that there is either one unique solution, no solution, or an infinite
number of solutions. In other terminology, the solution set can consist of one point, it can
be empty, or it can contain infinitely many points. This is due to the nature of straight
lines and the ways they can intersect. For example, it is impossible for two straight lines
to intersect in precisely two places (in flat space). We'll prove this later on.

HW 1.1: 1, 2, 9, 10, 10, 16, 18, 26, 27, 30, 31



1.2 Matrices

Definition 1.2.1. A matrix is a rectangular array of numbers. An m × n matrix is a
matrix with m rows and n columns:

    [ a11  a12  a13  ...  a1n ]
    [ a21  a22  a23  ...  a2n ]
    [ a31  a32  a33  ...  a3n ]
    [  .    .    .         .  ]
    [ am1  am2  am3  ...  amn ]

Definition 1.2.2. Each entry aij in the matrix is a number, where i tells what row the
number is on, and j tells which column it is in. For example, a23 is the number in the
second row and third column of the matrix. The subscripts i, j can be thought of as giving
the address of an entry within the matrix.

Definition 1.2.3. Two matrices are equal iff they have the same size (equal dimensions)
and all of their entries are equal, so aij = bij for all i, j.

Definition 1.2.4. If we have an m × n matrix where m = n, then it is called a square


matrix. For a square matrix, the entries a11 , a22 , . . . , ann are called the main diagonal or
sometimes just the diagonal.

Remark. We will discuss how to perform arithmetic operations with matrices shortly, that
is, how to add two matrices together or what it might mean to multiply two together.
First, however, we will apply matrices to the task of solving linear systems, and develop
some motivation for why matrices might be important.

1.2.1 Types of matrices

Definition 1.2.5. A matrix with only one row is called a row vector. E.g.:

    a = [ a1  a2  . . .  an ]

    b = [ 3  1  4  1/2 ]

A matrix with only one column is called a (GUESS WHAT) column vector. E.g.:

        [ a1 ]              [ 3   ]
        [ a2 ]              [ 2   ]
    a = [ .  ]          b = [ 6.8 ]
        [ .  ]              [ 1   ]
        [ an ]              [ 0   ]

Definition 1.2.6. A matrix with the same number of rows as columns is a square matrix.

Both of the first two examples were square.

1.2.2 Matrix operations

Definition 1.2.7. If two matrices A and B are both of the same size, then we define the
sum of A and B as follows:


    [ a11  a12  ...  a1n ]   [ b11  b12  ...  b1n ]
    [ a21  a22  ...  a2n ] + [ b21  b22  ...  b2n ]
    [  .    .         .  ]   [  .    .         .  ]
    [ am1  am2  ...  amn ]   [ bm1  bm2  ...  bmn ]

        [ a11 + b11    a12 + b12    ...    a1n + b1n ]
      = [ a21 + b21    a22 + b22    ...    a2n + b2n ]
        [     .            .                   .     ]
        [ am1 + bm1    am2 + bm2    ...    amn + bmn ]

Remark. This is probably a good time to introduce some shorthand notation for matrices.
In future, we may write the matrix

    A = [ a11  a12  ...  a1n ]
        [ a21  a22  ...  a2n ]
        [  .    .         .  ]
        [ am1  am2  ...  amn ]

in the abbreviated form


    A = [aij].

In this notation, the sum of two matrices A = [aij ], B = [bij ] is written

A + B = [aij ] + [bij ] = [aij + bij ] .

Example 1.2.8. For

    A = [ 1  -2   4 ]        and        B = [ 0   2  -4 ],
        [ 2  -1   3 ]                       [ 1   3   1 ]

the sum is given by

    A + B = [ 1+0   -2+2   4-4 ] = [ 1   0   0 ]
            [ 2+1   -1+3   3+1 ]   [ 3   2   4 ]

Note that this definition only makes sense when A and B are the same size. If
two matrices are of different size, then their sum is undefined.

Definition 1.2.9. Scalar multiplication (or multiplication by a number, or multiplica-


tion by a constant) of a matrix


    A = [ a11  a12  ...  a1n ]
        [ a21  a22  ...  a2n ]  =  [aij]
        [  .    .         .  ]
        [ am1  am2  ...  amn ]

by a scalar c is defined by

    cA = c [ a11  ...  a1n ]   [ ca11  ...  ca1n ]
           [  .         .  ] = [   .          .  ]  =  c [aij] = [caij]
           [ am1  ...  amn ]   [ cam1  ...  camn ]

Example 1.2.10. If we have the matrix

    A = [ 4   8   2 ],
        [ 6   8  10 ]

then two scalar multiples of it are

    (1/2)A = [ 2  4  1 ]        and        3A = [ 12  24   6 ]
             [ 3  4  5 ]                        [ 18  24  30 ]

Scalar multiplication is good for changing the size of entries in a matrix/vector so as to


have certain properties.

Example 1.2.11. Let A = [3 7 2 1]. Then for c = 1/7, the largest entry of cA is 1. For
b = 1/(3+7+2+1) = 1/13, the entries of bA are percentages (or probabilities) associated to A.
If d = 2.54 and the entries of A are measurements in inches, then dA gives the same
measurements in cm.
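The entrywise definitions above are exactly what numpy's + and * operators do for
arrays; a small illustrative sketch (added to these notes, reusing the matrices from the
examples above):

    import numpy as np

    A = np.array([[1, -2, 4],
                  [2, -1, 3]])
    B = np.array([[0, 2, -4],
                  [1, 3, 1]])

    print(A + B)        # entrywise sum, as in Example 1.2.8: [[1 0 0], [3 2 4]]
    print(0.5 * A)      # scalar multiple: every entry is halved
    print(2.54 * np.array([3, 7, 2, 1]))   # inches -> centimeters, as in Example 1.2.11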

Now you can take sums and differences of matrices. More generally,

Definition 1.2.12. A linear combination of matrices A1 , A2 , . . . , An is an expression

    c1 A1 + c2 A2 + . . . + cn An = Σ_{i=1}^{n} ci Ai,

where each ci is a real number (coefficient).

Later, we'll see that linear combinations of vectors are especially useful.
Here Σ_{i=1}^{n} xi is the standard summation notation for the sum of n (different) things.
For example, if x = (3, 2, 1, 4, 7), then

    Σ_{i=2}^{4} xi = x2 + x3 + x4 = 2 + 1 + 4 = 7.

Sums have the following properties:

1. Σ_{i=1}^{n} ai (ui + vi) = Σ_{i=1}^{n} ai ui + Σ_{i=1}^{n} ai vi

2. Σ_{i=1}^{n} c ai vi = c Σ_{i=1}^{n} ai vi

3. Σ_{i=1}^{m} ( Σ_{j=1}^{n} aij ) = Σ_{j=1}^{n} ( Σ_{i=1}^{m} aij )

The transpose of A is denoted by AT and has entries aTij = aji . So the rows of A are
the columns of AT and vice versa.
    A = [ 1  2  3 ]        ⇒        A^T = [ 1  4  7 ]
        [ 4  5  6 ]                       [ 2  5  8 ]
        [ 7  8  9 ]                       [ 3  6  9 ]

Every matrix has a transpose; the matrix need not be square.

    B = [ 1  2  3 ]        ⇒        B^T = [ 1  4 ]
        [ 4  5  6 ]                       [ 2  5 ]
                                          [ 3  6 ]

The transpose of a row vector is a column vector, and vice versa.

Vectors as data storage

Suppose you own a store that sells 100 different products. The inventory of the store
is then a vector x R100 . Say u R100 is your inventory at the beginning of the week,
v R100 tells how many of each item was sold in the week, and w R100 tells how many
items arrive on the truck with this weeks delivery. Then at the end of the week, your
inventory is u v + w.

Vectors can also store relational data. A graph is a collection of vertices (nodes/points)
and edges (line segments showing connectivity/adjacency). The adjacency matrix of a
graph encodes this data with a 1 in the (i, j)th entry if Pi Pj and a 0 otherwise.

    [Figure: a graph on four vertices P1, P2, P3, P4, drawn as a square with edges
    P1-P2, P2-P3, P3-P4, and P4-P1.]

              P1  P2  P3  P4
        P1  [  0   1   0   1 ]
        P2  [  1   0   1   0 ]
        P3  [  0   1   0   1 ]
        P4  [  1   0   1   0 ]

HW 1.2: 7(df), 9(cd), 10, 12, 14, 17, 19, 20, 21



1.3 Matrix multiplication


Definition 1.3.1. The dot product of x and y is

    x · y = Σ_{i=1}^{n} xi yi = x1 y1 + . . . + xn yn.

Example 1.3.2. u = (0.31, 0.23, 0.23, 0.23) and v = (96, 87, 43, 81). Then

    u · v = (0.31)(96) + (0.23)(87) + (0.23)(43) + (0.23)(81) = 78.29

Example 1.3.3. Let a = (x, 2, 3) and b = (4, 1, 2). If a · b = -4, what is x?

    a · b = 4x + 2 + 6 = -4    ⇒    4x = -12    ⇒    x = -3.
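In Python, the dot product of Definition 1.3.1 is np.dot (or the @ operator); a quick
check of the two examples above, added to these notes:

    import numpy as np

    u = np.array([0.31, 0.23, 0.23, 0.23])
    v = np.array([96, 87, 43, 81])
    print(np.dot(u, v))        # 78.29, matching Example 1.3.2

    a = np.array([-3, 2, 3])   # the value x = -3 found in Example 1.3.3
    b = np.array([4, 1, 2])
    print(a @ b)               # -4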

Definition 1.3.4. The product of two matrices A = [aij ] and B = [bij ] is only defined
when the number of columns of A is equal to the number of rows of B. Suppose A is an
m × n matrix and B is an n × p matrix so that the product AB is well-defined. Then AB
is defined as follows:

    AB = [cij]    where    cij = ai · bj = Σ_{k=1}^{n} aik bkj.

Here ai denotes the ith row of A and bj the jth column of B.

Although this formula may look hideous and slightly terrifying, you should not be alarmed.
In practice, the entries of a product are not too difficult to compute, and there is a very
simple mnemonic for remembering which entries from the factor matrices are used: to find
the entry in the ith row and jth column of the product, use the ith row of A and the jth
column of B. Using full-blown matrix notation, we have

    [ a11  a12  ...  a1n ]                                 [ c11  ...  c1j  ...  c1p ]
    [  .    .         .  ] [ b11  ...  b1j  ...  b1p ]     [  .         .         .  ]
    [ ai1  ai2  ...  ain ] [ b21  ...  b2j  ...  b2p ]  =  [ ci1  ...  cij  ...  cip ]
    [  .    .         .  ] [  .         .         .  ]     [  .         .         .  ]
    [ am1  am2  ...  amn ] [ bn1  ...  bnj  ...  bnp ]     [ cm1  ...  cmj  ...  cmp ]

where

    cij = (AB)ij = ai1 b1j + ai2 b2j + ai3 b3j + . . . + ain bnj.

You can see why A must have the same number of columns as B has rows; otherwise
these numbers would not match up equally, and the product wouldn't be well-defined.

Example 1.3.5. Consider the matrices

    A = [ 3  -2 ]        and        B = [ -2   1   3 ]
        [ 2   4 ]                       [  4   1   6 ]
        [ 1  -3 ]

Since A has 2 columns and B has 2 rows, the product of these two matrices is well-defined:

    AB = [ 3  -2 ] [ -2   1   3 ]
         [ 2   4 ] [  4   1   6 ]
         [ 1  -3 ]

       = [ 3(-2) - 2(4)    3(1) - 2(1)    3(3) - 2(6) ]
         [ 2(-2) + 4(4)    2(1) + 4(1)    2(3) + 4(6) ]
         [ 1(-2) - 3(4)    1(1) - 3(1)    1(3) - 3(6) ]

       = [ -14    1    -3 ]
         [  12    6    30 ]
         [ -14   -2   -15 ]

Note that B has 3 columns and A has 3 rows, so the product BA is also defined:

    BA = [ -2   1   3 ] [ 3  -2 ]
         [  4   1   6 ] [ 2   4 ]
                        [ 1  -3 ]

       = [ (-2)(3) + 1(2) + 3(1)    (-2)(-2) + 1(4) + 3(-3) ]
         [   4(3) + 1(2) + 6(1)       4(-2) + 1(4) + 6(-3)  ]

       = [ -1   -1 ]
         [ 20  -22 ]

This example illustrates a very important point: when we multiply matrices, AB is not
necessarily equal to BA. In fact, they need not even have the same size!
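A numpy sketch (added to these notes, using the matrices as reconstructed in Example
1.3.5) showing that AB and BA need not agree, or even have the same size:

    import numpy as np

    A = np.array([[3, -2],
                  [2,  4],
                  [1, -3]])          # 3 x 2
    B = np.array([[-2, 1, 3],
                  [ 4, 1, 6]])       # 2 x 3

    print(A @ B)    # the 3 x 3 product computed above
    print(B @ A)    # a 2 x 2 product -- a different size, so certainly AB != BA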

Example 1.3.6. Let

    A = [ 3  4 ]        and        B = [ 1  2 ]
        [ 1  2 ]                       [ 4  5 ]
                                       [ 3  6 ]

Then we can find the product

    BA = [ 1  2 ] [ 3  4 ]  =  [  5   8 ]
         [ 4  5 ] [ 1  2 ]     [ 17  26 ]
         [ 3  6 ]              [ 15  24 ]

because B has 2 columns and A has 2 rows. However, the product AB is not even defined!

Note that in general, the product matrix gets its height from the first matrix and its
width from the second.

Definition 1.3.7. The coefficient matrix of a system of linear equations is the matrix
whose entries aij represent the coefficient of the jth unknown in the ith equation.

Example 1.3.8. Given the linear system

     x1 + 2x2 +  x3 =  3

    3x1 -  x2 - 3x3 = -1

    2x1 + 3x2 +  x3 =  4

which we solved previously, the coefficient matrix of this system is

    [ 1   2   1 ]
    [ 3  -1  -3 ]
    [ 2   3   1 ]

Definition 1.3.9. The augmented matrix of a system of linear equations is like the
coefficient matrix, but we include the additional column of constants on the far right side.

Example 1.3.10. The augmented matrix of the system given above is

    [ 1   2   1   3 ]
    [ 3  -1  -3  -1 ].
    [ 2   3   1   4 ]

Sometimes augmented matrices are written with a bar to emphasize that they are augmented
matrices:

    [ 1   2   1 |  3 ]
    [ 3  -1  -3 | -1 ].
    [ 2   3   1 |  4 ]

Example 1.3.11. Note that any term which is missing from an equation (in a system of
linear equations) must be represented by a 0 in the coefficient matrix:

    x1 - x2 + 2x3 = 1                    [ 1  -1   2   1 ]
         x2 -  x3 = 3          ⇒         [ 0   1  -1   3 ].
              4x3 = 2                    [ 0   0   4   2 ]

1.3.1 Column-by-column, row-by-row

Above, we saw how to compute just one entry of a product matrix. We can also compute
just one column, or just one row.

j th col of AB = A[j th col of B]

ith row of AB = [ith row of A]B

This means you can compute AB column-by-column as

AB = A[b1 b2 . . . bn ] = [Ab1 Ab2 . . . Abn ]

where bj is a column of B, or else row-by-row as


a1 a1 B

a2 a2 B


AB = B =
.. ..
,

. .

an an B

where ai is a row of A.
1.3 Matrix multiplication 15

Let

a11 ... a1n x1
.. .. ..

A= ..
, x= .

. . . .

an1 ... ann xn

Then

a11 x1 + + a1n xn a11 a1n
.. .. ..

Ax = = x1 + + xn

. . .

an1 x1 + + ann xn an1 ann

This shows two nifty things.


(1) A matrix times a vector is a linear combination of vectors. More specifically, the
product Ax can be represented as a linear combinations of the columns of A, where the
coefficients are the entries of x.
(2) A system of linear equations can be encoded in matrices. Let


b1
..

b= .

.
bn

Then

a11 ... a1n x1 a11 x1 + + a1n xn b1
.. .. .. ..

Ax = = = =b

.
. . .
an1 ... ann xn an1 x1 + + ann xn bn

shows that Ax = b is the same thing as the system

a11 x1 + + a1n xn = b1
..
.

an1 x1 + + ann xn = bn .
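Both observations are easy to see numerically. A small numpy sketch (added to these
notes, using an arbitrary 2 × 2 example):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    x = np.array([5.0, -1.0])

    # (1) Ax is a linear combination of the columns of A with coefficients x1, x2:
    combo = x[0] * A[:, 0] + x[1] * A[:, 1]
    assert np.allclose(A @ x, combo)

    # (2) Ax = b encodes a linear system; solving it recovers x:
    b = A @ x
    assert np.allclose(np.linalg.solve(A, b), x)
    print("both checks pass")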

HW 1.3: 6, 7, 13, 15, 19, 33, 34, 36, 43, 48, 49



1.4 Algebraic properties of matrix operations


Remark. Mathematical thought: study an object until one can extract the salient features
of the system and develop rules about how they interact.

1. First: basic arithmetic is learning specific sums/times tables.

2. Later: properties of numbers in algebra class.

Now, we do the same with matrices.

1. Algebra is the distillation of properties of numbers and how they behave with respect
to the operations of addition and multiplication.

2. Linear Algebra is the distillation of properties of matrices and how they behave under
addition and multiplication.

(Also, other operations unique to matrices.)

Algebraic Properties of Scalars

                     additive                                multiplicative

Commutative          a + b = b + a                           ab = ba
Associative          a + (b + c) = (a + b) + c               a(bc) = (ab)c
Identity             ∃!b s.t. a + b = a   (b = 0)            ∃!b s.t. a · b = a   (b = 1)
Inverses             ∃!b s.t. b + a = 0   (b = -a)           for a ≠ 0, ∃b s.t. b · a = 1   (b = a^{-1})

Mixed additive/multiplicative properties:

Distributive         a(b + c) = ab + ac                      (a + b)c = ac + bc
Zero                 a · 0 = 0                               ab = 0 ⇒ a = 0 or b = 0

Even if the names are not familiar, the properties are. Now contrast this with the rules
governing matrices (c, d are scalars, A, B, C are matrices):

Matrix Identities

What are identities for matrices?


Additive identity:

    0mn = [ 0  0  ...  0 ]
          [ 0  0  ...  0 ]
          [ .  .       . ]
          [ 0  0  ...  0 ]

Multiplicative identity:

    In = [ 1  0  ...  0 ]
         [ 0  1  ...  0 ]
         [ .  .       . ]
         [ 0  0  ...  1 ]

Theorem 1.4.1. If R is a square matrix in reduced row-echelon form, then R either has
a row of zeros, or else R is the identity matrix.

Algebraic Properties of Matrices

                     additive                                    multiplicative

Commutative          A + B = B + A                               AB ≠ BA (in general)
Associative          A + (B + C) = (A + B) + C                   A(BC) = (AB)C
Identity             ∃!B s.t. A + B = A,   B = 0mn               ∃!B s.t. AB = A,   B = In
Inverses             ∃!B s.t. A + B = 0mn,   B = (-1)A           (sometimes) ∃!B s.t. AB = In,   B = A^{-1}

Mixed additive/multiplicative properties:

Distributive         A(B + C) = AB + AC,   (A + B)C = AC + BC
Zero                 A 0mn = 0mn,   but AB = 0mn does NOT imply A = 0mn or B = 0mn

Mixed scalar/matrix properties:

Associative          (cd)A = c(dA),   A(cB) = c(AB) = (cA)B
Distributive         c(A + B) = cA + cB,   (c + d)A = cA + dA
Zero                 cA = 0mn ⇒ c = 0 or A = 0mn



Note the special cases:

(1) matrix multiplication is NOT commutative,

(2) multiplicative identity is only defined for SQUARE matrices,

(3) multiplicative inverses do NOT always exist, and

(4) there ARE zero-divisors (so cancellation laws fail).

Example 1.4.2. Example of (1).

    A = [ 0  1 ],   B = [ 1  1 ],   AB = [ 3  2 ]  ≠  BA = [ 0  3 ].
        [ 0  2 ]        [ 3  2 ]         [ 6  4 ]          [ 0  7 ]

Example 1.4.3. Example of (4). Suppose

    A = [ 2  1 ],    and    B = [  1   1 ]
        [ 2  1 ]                [ -2  -2 ]

so that we have

    AB = [ 2  1 ] [  1   1 ]  =  [ 0  0 ]  =  0_{22}
         [ 2  1 ] [ -2  -2 ]     [ 0  0 ]

Remark. It is precisely because of this last fact that the familiar Law of Cancellation does
NOT hold for matrices. For scalars, we have the Law of Cancellation:

    for a ≠ 0,    ab = ac   ⇒   b = c

For matrices, it is not true in general that

    AC = AD   ⇒   C = D.

Let

    C = [ 1  2 ],    D = [ 0  -3 ].
        [ 2  0 ]         [ 4  10 ]

Then

    AC = [ 4  4 ] = AD,    but C ≠ D.
         [ 4  4 ]
We do, however, have the following result: if C is an invertible matrix, then

    AC = BC   ⇒   A = B,    and

    CA = CB   ⇒   A = B.

1.4.1 The transpose

Definition 1.4.4. The transpose of a matrix A is the matrix AT obtained by interchanging


rows for columns. This corresponds to reflecting across the diagonal:

(AT )ij = Aji .

Example 1.4.5.

    A = [ 1  2  3 ]        ⇒        A^T = [ 1  4 ]
        [ 4  5  6 ]                       [ 2  5 ]
                                          [ 3  6 ]

    B = [ 1  2  3 ]        ⇒        B^T = [ 1  4  7 ]
        [ 4  5  6 ]                       [ 2  5  8 ]
        [ 7  8  9 ]                       [ 3  6  9 ]

Note that it is precisely the diagonal entries which remain fixed.



    C = [ 1  2 ]        ⇒        C^T = [ 1  2 ]  =  C
        [ 2  3 ]                       [ 2  3 ]

So it is possible for a matrix to be its own transpose.

Definition 1.4.6. For a square matrix A, when AT = A, we say A is symmetric.


(Compare to earlier defn.)

Theorem 1.4.7. Some algebraic properties of transposition, for matrices of appropriate


sizes:

1. (AT )T = A

2. (A + B)T = AT + B T

3. (cA)T = c(AT )

4. (AB)T = B T AT

Proof of (4). Let AB = [cij]. Then the (i, j)th entry of (AB)^T is

    c^T_ij = c_ji                            by defn of transpose

           = Σ_{k=1}^{n} a_jk b_ki           by defn of matrix mult.

On the other hand, B^T A^T has (i, j)th entry

    Σ_{k=1}^{n} b^T_ik a^T_kj = Σ_{k=1}^{n} b_ki a_jk      by defn of transpose

                              = Σ_{k=1}^{n} a_jk b_ki      by commutativity of scalar mult.

Together, these show the (i, j)th entry of (AB)^T is equal to the (i, j)th entry of B^T A^T for
any i, j, and hence (AB)^T = B^T A^T.
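A quick numerical sanity check of property (4), added to these notes and using randomly
chosen integer matrices:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-5, 5, size=(2, 3))
    B = rng.integers(-5, 5, size=(3, 4))

    lhs = (A @ B).T
    rhs = B.T @ A.T
    assert np.array_equal(lhs, rhs)     # (AB)^T = B^T A^T
    print(lhs)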

HW 1.4: 5, 8, 20, 22, 23, 24, 27, 32, 37, 38



1.5 Special matrices


We've seen one special kind of matrix already: an augmented matrix. This is the matrix
associated with a system of linear equations. There are many other special kinds of
matrices.

Example 1.5.1. The following 4 × 4 matrix gives the airline distances between the indicated
cities (in miles).

                  London   Madrid   New York   Tokyo
    London      [      0      785       3469    5959 ]
    Madrid      [    785        0       3593    6706 ]
    New York    [   3469     3593          0    6757 ]
    Tokyo       [   5959     6706       6757       0 ]

What special properties does this matrix have? Why?

Example 1.5.2. Suppose you send your minions to do a poll at the supermarket and ask
customers which type of soda pop they bought that week.
(Let's assume for the sake of this example that there are only four kinds of pop, and
everyone buys exactly one type in a given week; these conditions can be removed but it
will needlessly complicate the example for now.)
After several weeks, your minions present you with a report indicating how likely
someone is to buy one type, based on what they bought last time.

                  Cola   Root beer   Orange   Lemon-lime
    Cola        [ 0.30        0.30     0.15         0.25 ]
    Root beer   [ 0.40        0.10     0.30         0.20 ]          (1.5.1)
    Orange      [ 0.25        0.25     0.25         0.25 ]
    Lemon-lime  [ 0.20        0.20     0.20         0.40 ]

What special properties does this matrix have? Why?


Later on, we'll see how certain properties of this matrix (the eigenvectors and eigenvalues)
can tell you what percentage of the population is drinking what, at any time in the distant
future.

Definition 1.5.3. A matrix is stochastic if all entries are nonnegative, and the sum of
each row is 1.

Suppose that a row vector records the present state of affairs, say [1 0 0 0] if everyone is
drinking Cola, etc. Next week,

    [ 1  0  0  0 ] [ 0.30  0.30  0.15  0.25 ]
                   [ 0.40  0.10  0.30  0.20 ]   =   [ 0.30  0.30  0.15  0.25 ]
                   [ 0.25  0.25  0.25  0.25 ]
                   [ 0.20  0.20  0.20  0.40 ]

The week after,

    [ 1  0  0  0 ] [ 0.30  0.30  0.15  0.25 ]^2
                   [ 0.40  0.10  0.30  0.20 ]     =   [ 0.2975  0.2075  0.2225  0.2725 ]
                   [ 0.25  0.25  0.25  0.25 ]
                   [ 0.20  0.20  0.20  0.40 ]

After 6 or more weeks (so n ≥ 6),

    [ 0.30  0.30  0.15  0.25 ]^n     [ 0.2827  0.2175  0.2185  0.2813 ]
    [ 0.40  0.10  0.30  0.20 ]    =  [ 0.2827  0.2175  0.2185  0.2813 ]
    [ 0.25  0.25  0.25  0.25 ]       [ 0.2827  0.2175  0.2185  0.2813 ]
    [ 0.20  0.20  0.20  0.40 ]       [ 0.2827  0.2175  0.2185  0.2813 ]

so the customer distribution has stabilized at (0.2827, 0.2175, 0.2185, 0.2813), regardless of the
initial state.
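A short Python sketch (added to these notes) reproduces the soda-pop computation:
powers of the transition matrix converge to a matrix with identical rows, the stable
distribution.

    import numpy as np

    P = np.array([[0.30, 0.30, 0.15, 0.25],
                  [0.40, 0.10, 0.30, 0.20],
                  [0.25, 0.25, 0.25, 0.25],
                  [0.20, 0.20, 0.20, 0.40]])
    x0 = np.array([1.0, 0.0, 0.0, 0.0])       # everyone drinks Cola this week

    print(x0 @ P)                              # next week: [0.30 0.30 0.15 0.25]
    print(x0 @ np.linalg.matrix_power(P, 2))   # [0.2975 0.2075 0.2225 0.2725]
    print(x0 @ np.linalg.matrix_power(P, 10))  # approx [0.2827 0.2175 0.2185 0.2813]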

1.5.1 Inverses of Matrices

Definition 1.5.4. A matrix A is invertible iff it has an inverse, that is, a matrix B such
that AB = BA = I. A matrix which is not invertible is singular.

Theorem 1.5.5. A 2 × 2 matrix

    A = [ a  b ]
        [ c  d ]

is invertible iff ad - bc ≠ 0. In this case,

    A^{-1} = 1/(ad - bc) [  d  -b ].                    (1.5.2)
                         [ -c   a ]

Note: this formula only works for 2 × 2 matrices. (More on this later.)


Example 1.5.6. The inverse of A = [ 3  -2 ]  is
                                  [ 1  -1 ]

    A^{-1} = 1/(3(-1) - (-2)(1)) [ -1   2 ]  =  [ 1  -2 ].
                                 [ -1   3 ]     [ 1  -3 ]

To check, multiply

    A A^{-1} = [ 3  -2 ] [ 1  -2 ]  =  I,
               [ 1  -1 ] [ 1  -3 ]

    A^{-1} A = [ 1  -2 ] [ 3  -2 ]  =  I.
               [ 1  -3 ] [ 1  -1 ]

The most important Properties of Inverses are

Theorem 1.5.7. Assume that both A and B are invertible. Then

1. (A^{-1})^{-1} = A

2. (cA)^{-1} = (1/c) A^{-1},  c ≠ 0

3. (AB)^{-1} = B^{-1} A^{-1}

4. (A^T)^{-1} = (A^{-1})^T.

Note: this shows that if A and B are invertible, then AB is also invertible. In fact, any
product of invertible matrices is invertible.
Example 1.5.8. Let A = [ 1  -2 ]. Then
                       [ 0   3 ]

    A^{-1} = 1/(1·3 - (-2)·0) [ 3  2 ]  =  1/3 [ 3  2 ],        (A^{-1})^T = 1/3 [ 3  0 ]
                              [ 0  1 ]         [ 0  1 ]                          [ 2  1 ]

    A^T = [  1  0 ],        (A^T)^{-1} = 1/(1·3 - (-2)·0) [ 3  0 ]  =  1/3 [ 3  0 ].
          [ -2  3 ]                                        [ 2  1 ]         [ 2  1 ]

Proof of (4). To show (A^T)^{-1} = (A^{-1})^T, we need to show

    A^T (A^{-1})^T = (A^{-1})^T A^T = I.

First,

    A^T (A^{-1})^T = (A^{-1} A)^T        (BC)^T = C^T B^T
                   = (I)^T               A^{-1} A = I
                   = I                   I^T = I.

Next,

    (A^{-1})^T A^T = (A A^{-1})^T        (BC)^T = C^T B^T
                   = (I)^T               A A^{-1} = I
                   = I                   I^T = I.

Theorem 1.5.9. If the inverse of a matrix exists, then it is unique.

Proof. Suppose B and C are both inverses of A. Since B is an inverse of A, we have


AB = BA = In . Since C is an inverse of A, we have AC = CA = In . Together,

    B = B In          by multiplicative identity property

      = B(AC)         by hypothesis: C is an inverse of A

      = (BA)C         by associativity of matrix mult

      = In C          by hypothesis: B is an inverse of A

      = C             by multiplicative identity property

Powers of a matrix

By multiplicative associativity for matrices, it makes sense to multiply the same matrix
with itself multiple times; in other words, exponents are well defined for matrices:

    A^k := A A · · · A      (k times).

If A is invertible,

    A^{-k} := A^{-1} A^{-1} · · · A^{-1}      (k times).

Theorem 1.5.10. A^n A^m = A^{n+m} = A^m A^n and (A^n)^m = (A^m)^n = A^{mn}.


Note, however, that A^k ≠ [a_ij^k], and there is no general explicit formula for A^k; it must
be worked out by hand.

In some special cases, one can see how the pattern works:

    A = [ 1  1 ]   ⇒   A^2 = [ 1  1 ] [ 1  1 ] = [ 1+1  1+1 ] = [ 2  2 ]
        [ 1  1 ]             [ 1  1 ] [ 1  1 ]   [ 1+1  1+1 ]   [ 2  2 ]

Then

    A^3 = A^2 A = [ 2  2 ] [ 1  1 ] = [ 4  4 ]
                  [ 2  2 ] [ 1  1 ]   [ 4  4 ]

Also,

    B = [ 1  2 ]   ⇒   B^2 = [ 1  4 ],   B^3 = . . .
        [ 0  1 ]             [ 0  1 ]

Putting the above theorems together, you get things like:

    A is invertible   ⇒   A^k has an inverse: (A^k)^{-1} = A^{-k} = (A^{-1})^k.

Convention: A^0 = I.

Other functions of a matrix

Let f(x) = x^2. We just saw that for a square matrix A, f(A) is well-defined. Similarly,
given any polynomial

    p(x) = a0 + a1 x + a2 x^2 + . . . + an x^n,

one can define

    p(A) = a0 I + a1 A + a2 A^2 + . . . + an A^n.

Example 1.5.11. Let p(x) = 2 - 3x^2 and A = [ 3  -2 ]. Then
                                             [ 1  -1 ]

    p(A) = 2I - 3 [ 3  -2 ] [ 3  -2 ]  =  [ 2  0 ] - 3 [ 7  -4 ]  =  [ -19  12 ]
                  [ 1  -1 ] [ 1  -1 ]     [ 0  2 ]     [ 2  -1 ]     [  -6   5 ]
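Polynomials of a matrix are easy to evaluate in code as well; a small numpy sketch (added
to these notes) reproducing Example 1.5.11:

    import numpy as np

    A = np.array([[3, -2],
                  [1, -1]])

    # p(x) = 2 - 3x^2, so p(A) = 2I - 3A^2
    pA = 2 * np.eye(2) - 3 * np.linalg.matrix_power(A, 2)
    print(pA)       # [[-19. 12.], [-6. 5.]]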

One can even define the exponential of a square matrix, using the power series repre-
sentation of e^x:

    e^A = Σ_{k=0}^{∞} A^k / k!.

Here, the first term of the sum is I, following the convention A^0 = I.
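The series can be summed numerically; the sketch below (an addition to these notes, not
part of the original text) compares a truncated partial sum against scipy's matrix
exponential on a small 2 × 2 matrix. It assumes scipy.linalg.expm is available.

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 1.0],
                  [-1.0, 0.0]])

    # Partial sums of the power series e^A = I + A + A^2/2! + A^3/3! + ...
    S, term = np.eye(2), np.eye(2)
    for k in range(1, 20):
        term = term @ A / k
        S += term

    assert np.allclose(S, expm(A))
    print(S)        # for this A, e^A turns out to be a rotation matrix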

1.5.2 Diagonal matrices

Definition 1.5.12. A diagonal matrix is a square matrix for which all non-diagonal entries
are 0. E.g.:

    [ 2  0  0 ]            [ 1  0  0  0 ]
    [ 0  1  0 ]            [ 0  0  0  0 ]
    [ 0  0  2 ]            [ 0  0  2  0 ]
                           [ 0  0  0  4 ]

Diagonal matrices may look too simple to be useful, but they are actually incredibly
useful. You will spend a lot of time wishing all matrices were diagonal, and some time in
Chapter 7 trying to make matrices diagonal.

A general diagonal matrix looks like

    D = [ d1   0  ...   0 ]
        [  0  d2  ...   0 ]
        [  .   .        . ]
        [  0   0  ...  dn ]

and its powers look like

    D^k = [ d1^k    0   ...    0  ]
          [   0   d2^k  ...    0  ].
          [   .     .          .  ]
          [   0     0   ...  dn^k ]

Here, k ∈ Z = {. . . , -2, -1, 0, 1, 2, . . . }.
Products with diagonal matrices are easy to compute:

    [ a  0  0 ] [ m  n  o ]   [ am  an  ao ]
    [ 0  b  0 ] [ p  q  r ] = [ bp  bq  br ]
    [ 0  0  c ] [ s  t  u ]   [ cs  ct  cu ]

    [ m  n  o ] [ a  0  0 ]   [ am  bn  co ]
    [ p  q  r ] [ 0  b  0 ] = [ ap  bq  cr ]
    [ s  t  u ] [ 0  0  c ]   [ as  bt  cu ]
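numpy's diag helper makes these facts easy to experiment with; a short sketch added to
these notes:

    import numpy as np

    D = np.diag([2.0, 3.0, 5.0])
    M = np.arange(1.0, 10.0).reshape(3, 3)     # an arbitrary 3 x 3 matrix

    print(D @ M)                               # multiplies the ROWS of M by 2, 3, 5
    print(M @ D)                               # multiplies the COLUMNS of M by 2, 3, 5
    print(np.linalg.matrix_power(D, 4))        # the diagonal entries are simply raised to the 4th power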

1.5.3 Triangular matrices

Definition 1.5.13. An upper triangular matrix is a square matrix which has only zeros
below the diagonal, that is,

    i > j   ⇒   aij = 0.

A lower triangular matrix is a square matrix which has only zeros above the diagonal, that
is,

    i < j   ⇒   aij = 0.

Example 1.5.14. General lower, upper triangular matrices look like



    L = [ a  0  0 ]            U = [ a  b  c ]
        [ b  c  0 ]                [ 0  d  e ].
        [ d  e  f ]                [ 0  0  f ]

Here, a, b, c, d, e, f R, and any (or all) of them can be 0.

Theorem 1.5.15. The transpose of an upper triangular matrix is lower triangular, and
vice versa.

Proof. Clear by inspection.

Theorem 1.5.16. The product of two lower triangular matrices is a lower triangular
matrix. (Similarly for upper triangular.)

Proof. Suppose A = [aij] and B = [bij] are both n × n lower triangular matrices. By
definition, this means that

    i < j   ⇒   aij = bij = 0.

Let C = AB = [cij] be the product matrix. Then by the defn of matrix mult,

    cij = Σ_{k=1}^{n} aik bkj = ai1 b1j + ai2 b2j + . . . + ain bnj.
To show that C is also lower triangular, we need to see that cij = 0 for i < j. For cij
where i < j, we have

    cij = ai1 b1j + ai2 b2j + . . . + ai(j-1) b(j-1)j + aij bjj + . . . + ain bnj

        = ai1 · 0 + ai2 · 0 + . . . + ai(j-1) · 0 + 0 · bjj + . . . + 0 · bnj

        = 0.

Theorem 1.5.17. A triangular matrix is invertible iff its diagonal entries are all nonzero.
In this case, the inverse is also triangular (same type).

1.5.4 Symmetric matrices

A is a symmetric matrix iff A^T = A, that is, iff aij = aji for all i, j.
A is a skew-symmetric matrix iff A^T = -A, that is, iff aij = -aji for all i, j.

Theorem 1.5.18. If A and B are symmetric, then

(i) AT is symmetric,

(ii) A + B and A B are symmetric, and

(iii) kA is symmetric, for k ∈ R.

Proof. Homework!

Theorem 1.5.19. Suppose A is symmetric. If A is invertible, then A1 is also symmetric.

Proof. Assume that A is symmetric and invertible. We need to show that A1 is symmetric.

    (A^{-1})^T = (A^T)^{-1}        by Theorem 1.5.7(4)

               = A^{-1}            since A^T = A, by hypothesis (A symmetric)

So A^{-1} is its own transpose; it must be symmetric.

Theorem 1.5.20. If A is invertible, then AAT and AT A are also invertible.

Proof. Assume A is invertible. Then AT is also invertible, by Thm. Then note that
products of invertible matrices are invertible.

Theorem 1.5.21. AAT and AT A are always symmetric.



Proof. For the first one,

(AAT )T = (AT )T AT = AAT .

For the second one,

(AT A)T = AT (AT )T = AT A.

Recurrence relations

The Fibonacci sequence 1, 1, 2, 3, 5, 8, 13, 21, . . . is generated by the recurrence relation


un = un1 + un2 with the initial conditions u0 = u1 = 1. So each term of the sequence
can be written as the sum of the previous two.

This sum can be converted into a product using matrices:



    [ u_n     ]   [ 1  1 ] [ u_{n-1} ]
    [ u_{n-1} ] = [ 1  0 ] [ u_{n-2} ].

With A = [ 1  1 ] and v_n = [ u_{n+1} ], this is
         [ 1  0 ]           [ u_n     ]

    v_{n-1} = A v_{n-2} = A^2 v_{n-3} = . . . = A^{n-1} v_0 = A^{n-1} [ 1 ].
                                                                      [ 1 ]

A closed-form solution for this recurrence is

    u_n = ( φ^{n+1} - (1-φ)^{n+1} ) / √5,    n = 0, 1, 2, . . . ,        where  φ = (1+√5)/2 = lim_{n→∞} u_n / u_{n-1}.

Also, φ is the largest root of x^2 = x + 1.
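The matrix form of the recurrence is convenient computationally as well; a small Python
sketch (added to these notes) generates the sequence with matrix powers and watches the
ratio of consecutive terms approach φ:

    import numpy as np

    A = np.array([[1, 1],
                  [1, 0]])

    v0 = np.array([1, 1])                            # (u1, u0) = (1, 1)
    for n in range(2, 11):
        vn = np.linalg.matrix_power(A, n - 1) @ v0   # equals (u_n, u_{n-1})
        print(n, vn[0], vn[0] / vn[1])               # the ratio approaches phi

    print((1 + 5 ** 0.5) / 2)                        # phi = 1.6180...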

HW 1.5: 4, 8, 22, 31, 3438, 50, 59


Also: prove that a diagonal matrix is symmetric.
So start by assuming that A is a diagonal matrix. Now you need to prove aij = aji. Try
examining cases.
Also: compute e^{At}, where t ∈ R is a parameter and

    A = [ 0  3  4 ]
        [ 0  0  6 ].
        [ 0  0  0 ]

Use the power series expansion for e^x and you should obtain a 3 × 3 matrix whose entries
are functions of t.

1.6 Matrix transformations

1.6.1 Functions from Rn to R

f : Rn → R is a function which eats an n-vector and spits out a number, i.e., it takes n
inputs and gives one output.

Example 1.6.1. f : Rn → R by f(x) = ‖x‖.

fa : Rn → R by fa(x) = x · a = a1 x1 + a2 x2 + . . . + an xn, where a ∈ Rn is fixed.
f : R2n → R by f(x, y) = x · y = x1 y1 + x2 y2 + . . . + xn yn.
f : R3 → R by f(x) = f(x1, x2, x3) = x1. (Coordinate map)

1.6.2 Functions from Rn to Rm

f : Rn → Rm is a function which eats an n-vector and spits out an m-vector, i.e., it takes
n inputs and gives m outputs. These are often called maps or transformations. When
m = n, they are called operators.

Example 1.6.2. f : Rn → Rn by f(x) = c. (Constant at c ∈ Rn.) Esp.: f(x) = 0.

f : Rn → Rn by f(x) = x. (Identity operator)
fa : Rn → Rn by fa(x) = ax, where a ∈ R is fixed.
f : R3 → R3 by f(x) = f(x1, x2, x3) = (x1, 0, 0). (Projection)

If we have a bunch of real-valued functions on Rn

f1 (x1 , x2 , . . . , xn ) = y1 ,

f2 (x1 , x2 , . . . , xn ) = y2 ,
..
.

fm (x1 , x2 , . . . , xn ) = ym ,

then taken together, they define a transformation T : Rn → Rm via

    T(x1, x2, . . . , xn) = (y1, y2, . . . , ym).

For x = (x1 , x2 , . . . , xn ), this is written:

T (x) = (f1 (x), f2 (x), . . . , fm (x))



Definition 1.6.3. A linear function f is one for which

1. f (x + y) = f (x) + f (y), for all vectors x, y, and

2. f (ax) = af (x), for all vectors x and every a R.

Theorem 1.6.4. For T (x) = (f1 (x), f2 (x), . . . , fm (x)) defined as above, T is a linear
transformation iff fi (x1 , x2 , . . . , xn ) is a linear combination of its variables, for each
i = 1, 2, . . . , m.

This means
    fi(x1, x2, . . . , xn) = ai1 x1 + ai2 x2 + . . . + ain xn

Consequently, if T : Rn → Rm is a linear transformation, then T(x) can be written as

    T(x) = [ a11  ...  a1n ] [ x1 ]  =  Ax.
           [  .         .  ] [ .  ]
           [ am1  ...  amn ] [ xn ]

Later, we'll prove that linear functions are essentially the same thing as functions
defined by matrix multiplication:

Theorem 1.6.5. f is a linear function iff f (x) = Ax for some matrix A.

Definition 1.6.6. Given a matrix A, write TA for the associated function defined by
multiplying against this matrix, i.e.,

TA (x) := Ax.

Similarly, [T ] = A is the standard matrix for T . (Later: there are many matrices for T .)

1.6.3 Some operators

To determine what kind of transformation is effected by a given T , it may help to look at


the image of a simple figure under the action of T . The unit cube is often helpful.

Definition 1.6.7. The unit cube in Rn is

    Qn := [0, 1]^n = [0, 1] × [0, 1] × . . . × [0, 1]   (n times)
        = {x ∈ Rn : 0 ≤ xi ≤ 1, i = 1, 2, . . . , n}.

So Q1 = [0, 1] is the unit interval, Q2 = [0, 1]2 is the unit square, etc.

Example 1.6.8. Let

    A = [ 1   0 ],        x = [ a ].
        [ 0  -1 ]             [ b ]

Now define f(x) := Ax, so f([a; b]) = [a; -b].

Example 1.6.9. Let

    B = [ 0  -1 ],        x = [ a ].
        [ 1   0 ]             [ b ]

Now define g(x) := Bx, so g([a; b]) = [-b; a] (rotation by 90° ccw).

Note that h(x) = -g(x) corresponds to rotation in the other direction (cw).

Example 1.6.10 (Reflection). Consider TA : R3 → R3 defined by

    A = [ -1  0  0 ]
        [  0  1  0 ].
        [  0  0  1 ]

Then

    TA(x) = Ax = [ -1  0  0 ] [ x1 ]   [ -x1 ]
                 [  0  1  0 ] [ x2 ] = [  x2 ]
                 [  0  0  1 ] [ x3 ]   [  x3 ]

is reflection in the x1-direction, that is, it is the symmetric image of x reflected through
the x2x3-plane.

Consider TB : R3 → R3 defined by

    B = [ -1   0   0 ]
        [  0  -1   0 ].
        [  0   0  -1 ]

Then

    TB(x) = Bx = [ -1   0   0 ] [ x1 ]   [ -x1 ]
                 [  0  -1   0 ] [ x2 ] = [ -x2 ]
                 [  0   0  -1 ] [ x3 ]   [ -x3 ]

is reflection in the origin, that is, it is the symmetric image of x on the other side of the
origin, along the line that passes through x and 0.

Consider TC : R3 → R3 defined by

    C = [ 0  1  0 ]
        [ 1  0  0 ].
        [ 0  0  1 ]

This matrix is obtained by a rowswap, and has the effect of interchanging the first two
coordinates:

    TC(x) = Cx = [ 0  1  0 ] [ x1 ]   [ x2 ]
                 [ 1  0  0 ] [ x2 ] = [ x1 ]
                 [ 0  0  1 ] [ x3 ]   [ x3 ]

is reflection in the vertical diagonal plane between the x1 and x2 axes.

(SKETCH: lower triangular prism before & after)

Example 1.6.11 (Projection operators). Consider TA : R3 → R3 defined by

    A = [ 1  0  0 ]
        [ 0  0  0 ].
        [ 0  0  0 ]

Then

    TA(x) = Ax = [ 1  0  0 ] [ x1 ]   [ x1 ]
                 [ 0  0  0 ] [ x2 ] = [  0 ]
                 [ 0  0  0 ] [ x3 ]   [  0 ]

is projection to the x1-axis.

Consider TB : R3 → R3 defined by

    B = [ 0  0  0 ]
        [ 0  1  0 ].
        [ 0  0  1 ]

Then

    TB(x) = Bx = [ 0  0  0 ] [ x1 ]   [  0 ]
                 [ 0  1  0 ] [ x2 ] = [ x2 ]
                 [ 0  0  1 ] [ x3 ]   [ x3 ]

is projection to the x2x3-plane.

Example 1.6.12 (Rotation). Consider TC : R3 → R3 defined by

    C = [ 1      0       0    ]
        [ 0  cos θ  -sin θ    ].
        [ 0  sin θ   cos θ    ]

This fixes the x1-coordinate but rotates the x2x3-plane.

    TC(x) = Cx = [ 1      0       0  ] [ x1 ]   [          x1          ]
                 [ 0  cos θ  -sin θ  ] [ x2 ] = [ x2 cos θ - x3 sin θ  ]
                 [ 0  sin θ   cos θ  ] [ x3 ]   [ x2 sin θ + x3 cos θ  ]
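A numerical illustration of the rotation operator (added to these notes): build C for
θ = π/2 and check that it fixes the x1-coordinate, rotates the x2x3 part, and preserves
lengths.

    import numpy as np

    theta = np.pi / 2
    C = np.array([[1, 0, 0],
                  [0, np.cos(theta), -np.sin(theta)],
                  [0, np.sin(theta),  np.cos(theta)]])

    x = np.array([2.0, 1.0, 0.0])
    y = C @ x
    print(y)                        # approx [2, 0, 1]: the x2x3 part turned 90 degrees
    assert np.isclose(np.linalg.norm(x), np.linalg.norm(y))   # rotations preserve length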

1.6.4 Dilation and contraction

Definition 1.6.13. T(x) = kx is called a contraction iff 0 ≤ k ≤ 1 and a dilation iff
k ≥ 1.

A contraction uniformly compresses Rn toward the origin, and a dilation uniformly
expands Rn away from the origin. The standard matrix for a contraction or dilation is a
diagonal matrix with ALL entries equal to k, that is, a scalar multiple of the identity:

    [T] = [T_{kI}] = kI = [ k  0  0 ]
                          [ 0  k  0 ].
                          [ 0  0  k ]
Example 1.6.14 (Contraction). Consider TC : R3 → R3 defined by

    C = [ 1/2    0    0 ]
        [   0  1/2    0 ]  =  (1/2) I.
        [   0    0  1/2 ]

Then

    TC(x) = Cx = [ 1/2    0    0 ] [ x1 ]   [ x1/2 ]          [ x1 ]
                 [   0  1/2    0 ] [ x2 ] = [ x2/2 ]  = (1/2) [ x2 ]
                 [   0    0  1/2 ] [ x3 ]   [ x3/2 ]          [ x3 ]

Example 1.6.15 (Dilation). Consider TD : R3 → R3 defined by

    D = [ 3  0  0 ]
        [ 0  3  0 ]  =  3I.
        [ 0  0  3 ]

Then

    TD(x) = Dx = [ 3  0  0 ] [ x1 ]   [ 3x1 ]        [ x1 ]
                 [ 0  3  0 ] [ x2 ] = [ 3x2 ]  =  3  [ x2 ]
                 [ 0  0  3 ] [ x3 ]   [ 3x3 ]        [ x3 ]

1.6.5 Compositions

Definition 1.6.16. If TA : Rn → Rk and TB : Rk → Rm, then the composition is the
transformation TB ∘ TA = TB(TA(·)) : Rn → Rm.

Theorem 1.6.17. The composition of two linear transforms is a linear transform.

Proof. TB ∘ TA(x) = TB(TA(x)) = TB(Ax) = B(Ax) = BAx.

Corollary 1.6.18. TB ∘ TA = T_{BA}. Also, [TB ∘ TA] = [TB][TA].

By repeated application,

    TC ∘ TB ∘ TA = T_{CBA},  etc.

Corollary 1.6.19. Composition is not commutative.

Proof. If it were always true that T_{BA} = TB ∘ TA = TA ∘ TB = T_{AB}, this would imply that
AB = BA.

Example 1.6.20. The composition of two rotations is always another rotation, but these
don't usually commute. (EXAMPLE: chalk brush)

HW 1.6: 5, 7, 8, 18, 19
Describe in words the geometric action of f in #5.
HW 1.7: 3, 5, 14, 15
Chapter 2

Solving linear systems

2.1 Echelon form of a matrix

Definition 2.1.1. A matrix in row-echelon form is a matrix which has the following
properties:

1. The first nonzero entry in each row is a 1.

2. The first 1 of each row appears to the right of the first 1 in the row above it.

3. If any row consists entirely of zeroes, it appears at the bottom of the matrix.

Definition 2.1.2. A matrix in reduced row-echelon form is a matrix in row-echelon form


which has the additional requirement that the leading 1 of each row has only zeroes above
and below it.

Example 2.1.3. Each of these matrices is in row-echelon form

    [ 1  4  2 ]    [ 1  2  3 ]    [ 1  3  0  0 ]    [ 0  1  2  0 ]
    [ 0  1  3 ]    [ 0  0  1 ]    [ 0  0  1  3 ]    [ 0  0  0  1 ]
    [ 0  0  1 ]    [ 0  0  0 ]    [ 0  0  0  0 ]    [ 0  0  0  0 ]

but only the last two are in reduced row-echelon form.



2.1.1 Back substitution

The significance of an augmented matrix in row-echelon form is that it is easy to find the
solution of the associated system.
If the augmented matrix is in reduced row-echelon form, one can simply read off the
solution of the associated system.
We can solve a system that is in triangular form

    2x1 - x2 + 3x3 - 2x4 = 1

          x2 - 2x3 + 3x4 = 2

               4x3 + 3x4 = 3

                     4x4 = 4

using the technique of back substitution as follows:

    4x4 = 4                              ⇒   x4 = 1

    4x3 + 3 · 1 = 3                      ⇒   x3 = 0

    x2 - 2 · 0 + 3 · 1 = 2               ⇒   x2 = -1

    2x1 - (-1) + 3 · 0 - 2 · 1 = 1       ⇒   x1 = 1

So the solution to this system is (1, -1, 0, 1).
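Back substitution is simple to automate. The sketch below (added to these notes) solves
an upper triangular system U x = b by working from the last equation upward; it
reproduces the solution (1, -1, 0, 1) of the example above.

    import numpy as np

    def back_substitute(U, b):
        """Solve U x = b for an upper triangular U with nonzero diagonal."""
        n = len(b)
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):                      # last row first
            x[i] = (b[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
        return x

    U = np.array([[2.0, -1.0,  3.0, -2.0],
                  [0.0,  1.0, -2.0,  3.0],
                  [0.0,  0.0,  4.0,  3.0],
                  [0.0,  0.0,  0.0,  4.0]])
    b = np.array([1.0, 2.0, 3.0, 4.0])
    print(back_substitute(U, b))        # [ 1. -1.  0.  1.]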

Definition 2.1.4. The elementary row operations for matrices:

I. Interchange two rows.

II. Multiply a row by a nonzero constant.

III. Add (a multiple of) one row to another.

Definition 2.1.5. Two matrices are said to be row-equivalent iff one can be obtained
from the other by a sequence of elementary row operations.

Thus, two equal matrices are certainly row-equivalent, but two row-equivalent
matrices need not be equal.
The reason for this name is that performing a row operation does not change the
solution set of the system. Thus, two row-equivalent systems have the same solution set.

2.2 Solving linear systems

Definition 2.2.1. Gaussian elimination is the following method of solving systems of


linear equations:

1. Write the system as an augmented matrix.

2. Use elementary row operations to convert this matrix into an equivalent matrix which
is in row-echelon form.

3. Write this new matrix as a system of linear equations.

4. Solve this simplified equivalent system using back-substitution.

Example 2.2.2. Consider the following system:

x1 − 3x3 = −2
3x1 + x2 − 2x3 = 5
2x1 + 2x2 + x3 = 4

First, we write the system as an augmented matrix:

\left[\begin{array}{ccc|c} 1 & 0 & -3 & -2 \\ 3 & 1 & -2 & 5 \\ 2 & 2 & 1 & 4 \end{array}\right]

Second, we perform elementary row operations as follows:

\left[\begin{array}{ccc|c} 1 & 0 & -3 & -2 \\ 0 & 1 & 7 & 11 \\ 2 & 2 & 1 & 4 \end{array}\right]   (−3)R1 + R2 → R2

\left[\begin{array}{ccc|c} 1 & 0 & -3 & -2 \\ 0 & 1 & 7 & 11 \\ 0 & 2 & 7 & 8 \end{array}\right]   (−2)R1 + R3 → R3

\left[\begin{array}{ccc|c} 1 & 0 & -3 & -2 \\ 0 & 1 & 7 & 11 \\ 0 & 0 & -7 & -14 \end{array}\right]   (−2)R2 + R3 → R3

\left[\begin{array}{ccc|c} 1 & 0 & -3 & -2 \\ 0 & 1 & 7 & 11 \\ 0 & 0 & 1 & 2 \end{array}\right]   (−1/7)R3 → R3

Third, we write this last matrix as a system of equations:

x1 − 3x3 = −2
x2 + 7x3 = 11
x3 = 2

Finally, we use back-substitution to obtain

x2 + 7 · 2 = 11  ⟹  x2 = −3
x1 − 3 · 2 = −2  ⟹  x1 = 4

Thus, Gaussian elimination yields the solution (4, −3, 2).
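A quick numerical check of this example (a sketch only): np.linalg.solve carries out an equivalent elimination internally.

```python
import numpy as np

A = np.array([[1., 0., -3.],
              [3., 1., -2.],
              [2., 2.,  1.]])
b = np.array([-2., 5., 4.])
print(np.linalg.solve(A, b))   # [ 4. -3.  2.]
```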

Gauss-Jordan elimination

Definition 2.2.3. Gauss-Jordan elimination is the following method of solving systems


of linear equations:

1. Write the system as an augmented matrix.

2. Use elementary row operations to convert this matrix into an equivalent matrix which
is in reduced row-echelon form.

3. Write this new matrix as a system of linear equations.

4. Solve this simplified equivalent system using back-substitution.

So Gauss-Jordan elimination is just an extension of Gaussian elimination where you


convert the matrix all the way to reduced row-echelon form before converting back to a
system of equations.

Example 2.2.4. Continuing from the previous example, we could convert the matrix to
reduced row-echelon form as follows:

\left[\begin{array}{ccc|c} 1 & 0 & 0 & 4 \\ 0 & 1 & 7 & 11 \\ 0 & 0 & 1 & 2 \end{array}\right]   (3)R3 + R1 → R1

\left[\begin{array}{ccc|c} 1 & 0 & 0 & 4 \\ 0 & 1 & 0 & -3 \\ 0 & 0 & 1 & 2 \end{array}\right]   (−7)R3 + R2 → R2

Now when we convert this matrix back into a linear system, we see that it immediately
gives the solution (4, −3, 2). This is the point at which you can simply read off the
solution from the matrix.

Remark. I don't require you to write the particular row operation being used, as I have
done above. However, I do recommend it, as it is a good way of avoiding computational
mistakes.

Remark. Whenever you are working with an augmented matrix and you obtain a row
which is all zeroes except for the last entry, then you have an inconsistent system. That is, if
you get a row of the form

[ 0   0   ⋯   0 | c ]

for c ≠ 0, then the original system of linear equations has no solution.

Definition 2.2.5. One particular important and useful kind of system is one in which
all the constant terms are zero. Such a system is called a homogeneous system. It is a
fact that every homogeneous system is consistent (ie, has at least one solution). One easy
way to remember this is to notice that every homogeneous system is satisfied by the trivial
solution, that is, x1 , x2 , . . . , xn = 0. When you set all variables to zero, the left side of
each equation becomes 0.

Theorem 2.2.6. A homogeneous system can only be row-equivalent to another homoge-


neous system.

Proof. No row operation alters any column of 0s.

Theorem 2.2.7. A homogeneous system with more variables than equations must have
infinitely many solutions.
Proof. The reduced row-echelon form can have no more nonzero rows than the original
matrix has rows. Each nonzero row corresponds to a leading variable, which will be given
as a function of the free variables, so the number of free variables is

total − leading = free.

So if there are fewer leading variables than total variables, the number of free variables
is positive. The presence of one free variable indicates infinitely many solutions.

Example 2.2.8. We can solve the homogeneous system

2x1 + 2x2 − x3 + x5 = 0
−x1 − x2 + 2x3 − 3x4 + x5 = 0
x1 + x2 − 2x3 − x5 = 0
x3 + x4 + x5 = 0

by Gauss-Jordan elimination. The augmented matrix

\left[\begin{array}{ccccc|c} 2 & 2 & -1 & 0 & 1 & 0 \\ -1 & -1 & 2 & -3 & 1 & 0 \\ 1 & 1 & -2 & 0 & -1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 0 \end{array}\right]

row-reduces (beginning with 2R2 + R1 → R1 and R2 + R3 → R3, then a row swap,
rescaling, and clearing each pivot column) to the reduced row-echelon form

\left[\begin{array}{ccccc|c} 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right]

The corresponding system is

x1 + x2 + x5 = 0

x3 + x5 = 0

x4 = 0.

Solving for the leading variables,

x1 = −x2 − x5
x3 = −x5
x4 = 0,

so the solution set is

{(−s − t, s, −t, 0, t) : s, t ∈ R} ⊆ R5 .

Note that no operation affects the far right column, as all these entries are 0.

HW 2.1: 1, 6, 10
HW 2.2: 6, 10, 12, 19, 20
For #6, just do Gauss-Jordan; no need to do Gaussian.

2.3 Elementary matrices and finding inverses

2.3.1 Elementary matrices

Definition 2.3.1. An n n matrix is called an elementary matrix if it can be obtained


from the identity matrix In by a single elementary row operation.

Example 2.3.2.

E1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad
E2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix} \qquad
E3 = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

E1 comes from I3 by an application of the first row operation - interchanging two rows.
E2 comes from I3 by an application of the second row operation - multiplying one row by
the nonzero constant 3.
E3 comes from I3 by an application of the third row operation - adding twice the third
row to the first row.

2.3.2 Representation of Row Operations

Example 2.3.3. Suppose we have the matrices



A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\qquad and \qquad
E1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}

so that E1 is the elementary matrix obtained by swapping the first two rows of I3 . Now
we work out the matrix products as

E1 A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
= \begin{bmatrix} 4 & 5 & 6 \\ 1 & 2 & 3 \\ 7 & 8 & 9 \end{bmatrix}

AE1 = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 2 & 1 & 3 \\ 5 & 4 & 6 \\ 8 & 7 & 9 \end{bmatrix}

Conclusion:
multiplying by E1 on the left ⟷ swapping the first two rows of A;

multiplying by E1 on the right ⟷ swapping the first two columns of A.
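A short sketch verifying this conclusion numerically with the matrices of Example 2.3.3:

```python
import numpy as np

A  = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
E1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])   # I3 with rows 1 and 2 swapped

print(E1 @ A)   # rows 1 and 2 of A are swapped
print(A @ E1)   # columns 1 and 2 of A are swapped
```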


E2 A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
= \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 21 & 24 & 27 \end{bmatrix}

So multiplying on the left by E2 is the same as multiplying the third row by 3.


This is the same operation by which E2 was obtained from the identity matrix.


E3 A = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
= \begin{bmatrix} 1 + 14 & 2 + 16 & 3 + 18 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}

So multiplying on the left by E3 is the same as adding twice the third row to the first.
This is the same operation by which E3 was obtained from the identity matrix.

Row operations correspond to (matrix) multiplication by elementary matrices. Every-


thing that can be performed by row operations can similarly be performed using elementary
matrices.

Earlier: two matrices are row-equivalent iff there is some sequence of row operations
which would convert one into the other. Now:

Definition 2.3.4. Two matrices A and B are row-equivalent iff there is some sequence of
elementary matrices E1 , E2 , . . . , Ek such that

Ek Ek−1 · · · E2 E1 A = B.

Inverses and Elementary Matrices

Theorem 2.3.5. If E is an elementary matrix, then E is invertible and its inverse E 1


is an elementary matrix of the same type.
Example 2.3.6. Since

E = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

comes by 2R3 + R1 → R1 , we choose the operation that would undo this, namely,
(−2)R3 + R1 → R1 . Then the elementary matrix corresponding to this is

E −1 = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.

Theorem 2.3.7 (Characterization of Invertibility). The following are equivalent:

(1) A is invertible.

(2) A can be written as the product of elementary matrices.

(3) A is row equivalent to I.

(4) The RREF (reduced row-echelon form) of A is I.

(5) The system of n equations in n unknowns given by Ax = b has exactly one solution.

(6) The system of n equations in n unknowns given by Ax = 0 has only the trivial solution
x1 = x2 = . . . = xn = 0.

Corollary 2.3.8. Let A be square. If BA = I, or if AB = I, then B = A1 .

Proof. First, we show A is invertible by using (6) of the previous theorem. Suppose Ax = 0,
so that x is a homog solution. Left-multiply the first hypothesis by B to get

BAx = B0 = Ix = 0 = x = 0.

So x = 0. Then by the prev thm, A is invertible. Thus we can right-multiply the first
hypothesis by A1 to obtain

BAA1 = IA1 = B = A1 .

The proof for AB = I follows by the same argument, with the roles of A and B exchanged.



2.3.3 How to find inverses

From part (3) of the theorem (A is row equivalent to I), A is invertible iff

Ek · · · E2 E1 A = I.

Multiply on the right by A−1 to get

Ek · · · E2 E1 A A−1 = I A−1 ,   or
Ek · · · E2 E1 I = A−1 .

SO: the same sequence of elementary operations that takes A to I will take I to A1 .
This suggests a method:

1. Write [A|I].

2. Apply row operations to [A|I] to obtain [I|X].

3. Then X = A1 .
1 01
h i
Example 2.3.9. Let A = 1 1 1 . We compute the inverse:
0 10


1 0 1 1 0 0 1 0 1 1 0 0

1 0 0 R1 + R2 R2 ,

1 1 0 1 1 2 1 1 0

0 1 0 0 0 1 0 1 0 0 0 1

1 0 1 1 0 0

0 1 R3 + R2 R2 ,

0 2 1 1

0 1 0 0 0 1

1 0 1 1 0 0

0 R2 R3 ,

1 0 0 0 1

0 0 2 1 1 1

1 0 1 1 0 0
1
0 R3 R3 ,

1 0 0 0 1
2
1 1 1
0 0 1 2 2 2

1
1 0 0 2 21 1
2

0 R3 + R1 R1 ,

1 0 0 0 1

1 1
0 0 1 2 2 12

Thus,
1
2 12 1
2

A1 = 0 1 .

0

1 1
2 2 21

IMPORTANT NOTE: by the theorem, if the RREF of A is not I, then A is not
invertible. So if you wind up with a matrix B (row-equivalent to A) that has a row of 0s
during row-reduction, then A is not invertible!
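The [A | I] → [I | A−1] procedure translates directly into code. This is a minimal sketch (my own implementation, with only a basic pivot swap and no other numerical safeguards); the test matrix is a hypothetical example, not the one from Example 2.3.9.

```python
import numpy as np

def inverse_by_row_reduction(A):
    """Row-reduce [A | I] to [I | X]; then X = A^{-1}."""
    A = A.astype(float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])             # the augmented matrix [A | I]
    for j in range(n):
        p = j + np.argmax(np.abs(M[j:, j]))   # pick a usable pivot row, swap it up
        if np.isclose(M[p, j], 0.0):
            raise ValueError("A is not invertible: no pivot in this column")
        M[[j, p]] = M[[p, j]]
        M[j] /= M[j, j]                       # scale to get a leading 1
        for i in range(n):                    # clear the rest of the pivot column
            if i != j:
                M[i] -= M[i, j] * M[j]
    return M[:, n:]

A = np.array([[2., 1., 1.], [1., 3., 2.], [1., 0., 0.]])   # sample matrix (assumption)
X = inverse_by_row_reduction(A)
print(np.allclose(A @ X, np.eye(3)))   # True
```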

HW 2.3: 4, 11, 19, 21, 23, 24, 25



2.4 Systems of equations & Invertibility

2.4.1 The number of solutions of a system of linear equations

Theorem 2.4.1. Every system of linear equations has no solutions, exactly one solution,
or infinitely many solutions.

Proof. It is clear by example that a system can have no solutions or one solution. Therefore,
we just need to show that any system with more than one solution actually has infinitely
many.

Assume that x1 ≠ x2 are solutions, and define x0 := x1 − x2 ≠ 0. Then

Ax0 = A(x1 − x2 ) = Ax1 − Ax2 = b − b = 0.   (⋆)

Now for any t ∈ R, the system has a new solution x1 + tx0 :

A(x1 + tx0 ) = Ax1 + tAx0 = b + t0 = b,   by (⋆).

Since there are infinitely many t ∈ R and x0 ≠ 0, there are infinitely many solutions
yt = x1 + tx0 .

Theorem 2.4.2. If A is invertible, then Ax = b has a unique solution.

Proof. We have at least one solution, since x = A1 b works:

Ax = A(A1 b) = (AA1 )b = Ib = b.

To see this is the only solution, suppose Ay = b also. Then

A1 Ay = A1 b

Iy = A1 b,

so y = A1 b, the same solution.
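In practice one rarely forms A−1 explicitly; np.linalg.solve exploits the same uniqueness. A minimal sketch with an arbitrary invertible matrix (my own choice):

```python
import numpy as np

A = np.array([[3., 1.], [1., 2.]])    # arbitrary invertible matrix (assumption)
b = np.array([9., 8.])

x1 = np.linalg.solve(A, b)            # solve Ax = b directly
x2 = np.linalg.inv(A) @ b             # the x = A^{-1} b of Theorem 2.4.2
print(np.allclose(x1, x2))            # True: the solution is unique
```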



Example 2.4.3. Solve the system

x1 + x3 = 1

x1 + x2 + x3 =3

x2 = 2

So
1
1 0 1 2 21 1
2

A = 1 1 , A1 = 0 1 .

1 0

1 1
0 1 0 2 2 12

(We computed the inverse of this matrix already.) So the solution is



1
x1 2 12 1
2 1 1
2 + 3
2 +1 3

x2 = A1 b = 0 1 3 = 0 + 0 + 2 = 2 .

0

1 1
x3 2 2 21 2 1 3
2 2 1 2

Check:
3 + (2) = 1

3 + 2 + (2) =3

2 = 2

Theorem 2.4.4. Let A, B be n n matrices. If AB is invertible, then so are A and B.

We'll prove this later, but the idea is similar to the following: if 1/a and 1/b are both
defined, then (1/a)(1/b) = 1/(ab) is defined. I.e., if a ≠ 0 and b ≠ 0, then ab ≠ 0.

The method of finding inverses can also tell you what conditions b must satisfy for a
system to be solvable; it indicates what the solution will look like in terms of b.

Example 2.4.5. Consider the system

10x1 + 5x2 15x3 = b1

x1 + x2 x 3 = b2

x1 + 2x2 + x 3 = b3

We can attempt to solve this system symbolically by row-reducing the augmented matrix

1
10 5 15 b1 2 1 3 5 b1 1

5 R1 R1
1 1 b2 1 1

1 1 b2
R2 + R3 R3
1 2 1 b3 0 3 0 b2 + b3

1
0 1 5 5 b1 2b2
2R2 + R1 R1
1 1 1

b2
1

1

3 R3 R3
0 1 0 3 b2 + 13 b3

1 1 1 b2
R1 R2
1
0 + 13 b3

1 0 3 b2

1
R2 R3
0 1 5 5 b 1 2b 2

2
1 0 1 3 b2 13 b3
R2 + R1 R2
1
0 + 13 b3

1 0 3 b2


1
R2 + R3 R3
0 0 5 5 b1 53 b2 + 13 b3

1
1 0 0 25 b1 + 13 b2 4
15 b3 1

5 R3 R3
1
0 + 13 b3

1 0 3 b2


1
R3 + R1 R1
0 0 1 25 b1 13 b2 + 1
15 b3

So the solution, as a function of b, is the vector



1
25 b1 + 13 b2 4
15 b3

x=
1 1 .

3 b2 + 3 b3
1 1 1
25 b1 3 b2 + 15 b3

But this is solvable because A was invertible.

Example 2.4.6. Consider the system

2x1 + 2x2 + x3 = b1

x 1 + x 2 x 3 = b2

x1 x2 + x3 = b3

We attempt to solve this system symbolically by row-reducing the augmented matrix



2 2 1 b1 0 0 3 b1 2b2
2R2 + R1 R1
1 1 b2 1 1

1 1 b2
R2 + R3 R3
1 1 1 b3 0 0 0 b2 + b3

1 1 1 b2
R1 R2
1
0 23 b2

0 1 3 b1 1

3 R2 R2
0 0 0 b2 + b3

1 1 0 31 b1 + 31 b2
R1 R2
0 0 1 13 b1 32 b2

1

3 R2 R2
0 0 0 b2 + b3

From the third row, we see that b3 = b2 . Thus, the system has a solution iff b is of the
form
b1

b= b2 ,


b2

in which case the RREF is


1
1 1 0 3 b1 + 31 b2

1
32 b2 ,

0 0 1 3 b1

0 0 0 0

and the solution is of the form



1
3 b1 + 13 b2 t

x= t R.

t

1 2
3 b1 3 b2

HW 2.4: 1, 4, 5, 9, 11
Chapter 3

Determinants

3.3 Determinants by cofactor expansion

You know functions of a real variable, like

f : R → R by f (x) = x2        g : R → R by g(x) = sin x.

We've seen functions of a vector, like

f : R2 → R2 by f (x) = Ax,

where

A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad
x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.

Coming soon:

g : R2 → R by g(x) = \sqrt{x_1^2 + x_2^2}, so that g(x) = ‖x‖.

We can also define functions of a general matrix, like

f (A) = sum(A) = \sum_{i,j} a_{ij} .

This example isn't so useful. Here's one that is.

Definition 3.3.1. The determinant of a 2 × 2 matrix A is

\det \begin{bmatrix} a & b \\ c & d \end{bmatrix}
= \begin{vmatrix} a & b \\ c & d \end{vmatrix} := ad − bc.

This defines a function f : M2 → R by f (A) = det(A). (M2 = {2 × 2 matrices}.)

NOTE: don't confuse this with absolute value; a determinant can be negative!

How to extend it to f : Mn → R? Recursively.

Definition 3.3.2. If A is square, the minor of entry aij is the determinant of the submatrix
obtained by removing the row and column in which aij appears, and is denoted Mij . The
cofactor cij is the number (−1)^{i+j} Mij .

Example 3.3.3. The signs (−1)^{i+j} attached to the cofactors of a 3 × 3 matrix form the
checkerboard pattern

\begin{bmatrix} + & - & + \\ - & + & - \\ + & - & + \end{bmatrix}.

Definition 3.3.4. The adjoint of A is the transpose of the cofactor matrix, denoted
adj(A).

Example 3.3.5. The adjoint matrix of

A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}

is the transpose of its matrix of cofactors (each entry is ± a 2 × 2 minor of A):

adj(A) = \begin{bmatrix} -3 & 6 & -3 \\ 6 & -12 & 6 \\ -3 & 6 & -3 \end{bmatrix}^T
= \begin{bmatrix} -3 & 6 & -3 \\ 6 & -12 & 6 \\ -3 & 6 & -3 \end{bmatrix}.

NOTE: the adjoint need not be symmetric; this was a fluke for this example.

3.3.1 Determinants by cofactors

Theorem 3.3.6. det A can be computed by performing a cofactor expansion along any
row or column:

det A = ai1 ci1 + ai2 ci2 + + ain cin along ith row

= a1j c1j + a2j c2j + + anj cnj along j th column

Can use ANY row or column so pick one with a lot of 0s to make life easier!

Example 3.3.7. Compute the determinant by cofactor expansion.




2 0 1
2

4

0 3 1
det A = (use 2nd col)
2 6 3

0

1 0

1 4

4 3

1 2 1 2 2 1 2 2 1 2

= (0) 0 6 3 + (0) 3 (2) 4 1 + (1) 4

0 6 3 3 1

3

1 0 4 1 0 4 1 0 4 0 6

Now, compute each of these:




2 1 2
4

1 2 2 2 2
(2) 4

3 1 = (2) (1)
+ (3) (0)
4




1 4 1 4 1
1 0 4

= (2) ((1)(16 1) + (3)(8 2))

= (2) (17 + 18)

= 70



2 1 2











3 1 1 2 1 2
(1) 4

3 1

= (1) (2) (4) + (0)
3 3

6 6 3 1
3

0 6

= (1) ((2)(9 6) + (4)(3 12))

= (1) (30 60)



= 90

So det A = −70 + 90 = 20.
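Cofactor expansion translates directly into a recursive function. The sketch below (my own code) expands along the first row and checks the result against np.linalg.det on a random matrix; it runs in factorial time and is only meant to mirror the definition.

```python
import numpy as np

def det_by_cofactors(A):
    """Determinant by cofactor expansion along the first row (didactic, O(n!))."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)    # delete row 1, column j+1
        total += (-1) ** j * A[0, j] * det_by_cofactors(minor)   # a_{1j} c_{1j}
    return total

A = np.random.rand(4, 4)
print(np.isclose(det_by_cofactors(A), np.linalg.det(A)))   # True
```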

The next theorems deal with square matrices, since only square matrices can be
invertible or have determinants.

Theorem 3.3.8. A is invertible iff det A 6= 0.

Proof. Later.

Lemma 3.3.9. If i 6= j, then ai1 cj1 + ai2 cj2 + + ain cjn = 0.

Sketch of proof. Following the book, we work part of the 3 3 case explicitly.
Consider a11 c31 + a12 c32 + a13 c33 , obtained by choosing i = 1, j = 3. Replace the third
row of A with a copy of the first:

A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\quad\Longrightarrow\quad
A' = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \end{bmatrix}.

A' has two identical rows, so its RREF has a row of zeros. By the Invertibility Characteri-
zation Thm, this means A' is not invertible. Hence, by the previous theorem, det A' = 0.
The cofactor matrix associated with A' is

c' = \begin{bmatrix} c'_{11} & c'_{12} & c'_{13} \\ c'_{21} & c'_{22} & c'_{23} \\ c'_{31} & c'_{32} & c'_{33} \end{bmatrix}
   = \begin{bmatrix} c'_{11} & c'_{12} & c'_{13} \\ c'_{21} & c'_{22} & c'_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix},

since the last row of c' depends only on the first two rows of A', and these are the same as
the first two rows of A. If we compute det A' using cofactor expansion along the third row,
the result is

det A' = a11 c31 + a12 c32 + a13 c33 .

Therefore, a11 c31 + a12 c32 + a13 c33 = 0.

Theorem 3.3.10. If det A ≠ 0, then

A−1 = \frac{1}{\det A}\, \mathrm{adj}\, A.

Proof. We will prove that A adj A = (det A)I. Then, since det A ≠ 0, the previous theorem
tells us A is invertible, and we can multiply by A−1 :

(det A)I = A adj A   ⟹   A−1 (det A)I = A−1 A adj A   ⟹   A−1 = \frac{1}{\det A}\, \mathrm{adj}\, A,

to complete the proof. So consider

B := A\, \mathrm{adj}\, A =
\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \dots & a_{in} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix}
\begin{bmatrix} c_{11} & c_{21} & \dots & c_{j1} & \dots & c_{n1} \\ c_{12} & c_{22} & \dots & c_{j2} & \dots & c_{n2} \\ \vdots & \vdots & & \vdots & & \vdots \\ c_{1n} & c_{2n} & \dots & c_{jn} & \dots & c_{nn} \end{bmatrix}.

Now the entries of B = [bij ] are given by

b_{ij} = a_{i1} c_{j1} + a_{i2} c_{j2} + \dots + a_{in} c_{jn} .

If i = j, then bij is the cofactor expansion of det A, as given in Theorem 3.3.6.
If i ≠ j, then bij = 0 by the Lemma. Therefore,

B = A\, \mathrm{adj}\, A =
\begin{bmatrix} \det A & 0 & \dots & 0 \\ 0 & \det A & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \det A \end{bmatrix}
= \det A \begin{bmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{bmatrix}
= (\det A) I.

By the initial remarks, this completes the proof.

NOTE: this is not such a good formula for computing inverses. The row-reduction
method is probably less work.
However, this formula will help establish useful properties of the inverse.
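For completeness, here is a sketch of the adjoint formula in code (didactic only; the test matrix is a hypothetical example).

```python
import numpy as np

def adjugate(A):
    """adj(A): the transpose of the matrix of cofactors."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)   # cofactor c_{ij}
    return C.T

A = np.array([[1., 0., 2.], [2., 1., 3.], [0., 1., 1.]])   # sample matrix (assumption)
A_inv = adjugate(A) / np.linalg.det(A)                     # Theorem 3.3.10
print(np.allclose(A @ A_inv, np.eye(3)))                   # True
```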

Next: use the formula to prove those theorems about triangular matrices. (Recall: this
includes diagonal matrices!)

Theorem 3.3.11. If A is triangular, then det A = a11 a22 . . . ann .


Proof. Suppose A is upper triangular. Compute the determinant via a cofactor expansion
down the first column:

\det A = \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ 0 & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & a_{nn} \end{vmatrix}
= a_{11} \begin{vmatrix} a_{22} & \dots & a_{2n} \\ \vdots & \ddots & \vdots \\ 0 & \dots & a_{nn} \end{vmatrix} + 0 \cdot c_{12} + \dots + 0 \cdot c_{1n}

= a_{11} a_{22} \begin{vmatrix} a_{33} & \dots & a_{3n} \\ \vdots & \ddots & \vdots \\ 0 & \dots & a_{nn} \end{vmatrix}
= a_{11} a_{22} a_{33} \begin{vmatrix} a_{44} & \dots & a_{4n} \\ \vdots & \ddots & \vdots \\ 0 & \dots & a_{nn} \end{vmatrix}
= \dots
= a_{11} a_{22} \dots a_{(n-2)(n-2)} \begin{vmatrix} a_{(n-1)(n-1)} & a_{(n-1)n} \\ 0 & a_{nn} \end{vmatrix}

= a_{11} a_{22} \dots a_{nn} .

Corollary 3.3.12. Suppose A is triangular (/diagonal). Then A is invertible iff all


diagonal entries are nonzero.

Proof. Let A be a triangular matrix. Then

A is invertible  ⟺  det A ≠ 0              (an earlier Thm)
                 ⟺  a11 a22 . . . ann ≠ 0     (last result)
                 ⟺  aii ≠ 0 for every i       (no zero-divisors in R).

Corollary 3.3.13. The inverse of an invertible triangular matrix is also triangular (of
the same type).

Proof. Suppose A is an invertible triangular matrix. Invertibility gives

1
A1 = adj A.
det A

Scalar multiplication does not change triangularity, so this shows that we only need to
prove adj A is triangular.
Lets take A to be upper triangular, for definiteness. Then

i>j = aij = 0.

The adjoint is the transpose of the cofactor matrix, so we want to show

i < j  ⟹  cij = (−1)^{i+j} Mij = 0,  i.e., Mij = 0.

Let Bij be the submatrix obtained from A when the ith row and j th column are deleted,
so that Mij = det Bij .
Since i < j, the (i + 1)th row of A starts with at least i zeros.
But the ith row of Bij is just this same row with the j th entry removed, so the ith row of
Bij starts with at least i zeros. So Bij has a zero on the diagonal, in the ith row. This
means

0 = det Bij = Mij  ⟹  cij = 0.

Here is a diagram for the final argument:



[Diagram: an upper triangular matrix with the ith row and j th column crossed out; the
remaining submatrix Bij picks up a zero on its diagonal.]

In the theory of abstract algebra, this means that the upper triangular matrices Un
form a (unitary) ring (and similarly for lower triangular):

1. Un is an abelian group under addition. (A + B = B + A Un , Un contains inverses


& identity.)

2. Un has a multiplication which is associative & distributive.



3. (Unitary) Un has a multiplicative identity.

In fact, taking into account scalar multiplication, Un is an algebra over R .

HW 3.3: 1, 3, 5, 6, 7, 12, 18

3.2 Properties of the Determinant Function


What properties does f (A) = det A have, as a mapping f : Mn R? Is it additive?
Multiplicative? Linear? Continuous? Differentiable? Under what transformations is the
determinant invariant?

Theorem 3.2.1. det A = det AT .

Proof. First, note that this is clearly true for 2 2 matrices, just from the basic formula.
det A can be calculated by cofactor expansion along the first row.
det AT can be calculated by cofactor expansion along the first column.
These are the same thing.

This means that most row statements about determinants are still true for columns.

3.2.1 Determinants and row operations

This provides a much faster way of computing determinants than by cofactor expansion.

Theorem 3.2.2. Let A be n n.

1. If B comes from A by multiplying a row by k, then det B = k det A.

2. If B comes from A by swapping two rows, then det B = det A.

3. If B comes from A by adding a multiple of one row to another, then det B = det A.

Example 3.2.3. For A = [aij ],

\begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= ka_{11} c_{11} + ka_{12} c_{12} + ka_{13} c_{13}
= k(a_{11} c_{11} + a_{12} c_{12} + a_{13} c_{13}) = k \det A.

Similarly, expanding along a swapped row attaches the opposite cofactor signs from the
checkerboard pattern, which is why swapping two rows produces a sign change in the
determinant.

How to exploit this to compute determinants:

1. Use row operations to reduce the matrix to triangular form.

2. Record the operations used as a leading coefficient.

3. Find the determinant of the reduced matrix by taking the product along the diagonal
(prev thm).

4. Multiply by the coefficient to obtain the desired determinant.

Example 3.2.4. Find the determinant of



0 3 1

A= 1

1 2

2 0 1

Solution.

0 3 1



1
det A = 2 1 1 2 2 R3 R3


12

1 0

0 3 1



= 2
5 R3 + R2 R2
0 1 2

12

1 0

0 0 17

2


= 2
5 3R2 + R1 R1
0 1 2

12

1 0

1 0 12


= 2 0 1
5 R3 R1
2
17
2

0 0

= (2)( 17
2 ) triangular matrix thm

= 17

Example 3.2.5. Find the determinant of



1 0 2

A= 3 6 .

5

2 2 1

Solution.


1 3 2 1 3 2

T
det A = det A = 0 2 = 0 2 = 15

5 5

3

2 6 1 0 0

Theorem 3.2.6. If A has a row or column of zeros, then det A = 0.

Proof. Do a cofactor expansion along that row or column and obtain

det A = 0 c1 + 0 c2 + + 0 cn = 0.

Theorem 3.2.7. det kA = k n det A.

Proof. Factor k out of each of the n rows in turn:

\det kA = \begin{vmatrix} ka_{11} & ka_{12} & \dots & ka_{1n} \\ ka_{21} & ka_{22} & \dots & ka_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ ka_{n1} & ka_{n2} & \dots & ka_{nn} \end{vmatrix}
= k \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ ka_{21} & ka_{22} & \dots & ka_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ ka_{n1} & ka_{n2} & \dots & ka_{nn} \end{vmatrix}
= \dots
= k^n \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix}
= k^n \det A.

The determinant is not additive; i.e., it is NOT generally true that det(A + B) =
det A + det B.

But it IS multiplicative: det(AB) = det A det B.


For the proof, need some lemmas.

Lemma 3.2.8. If E is elementary, then det(EB) = det E det B.

Proof. case (1) E comes by multiplying a row by a constant k.


Then EB is B with one row multiplied by k, so

det(EB) = k det B.

But we already saw det E = k, so

det(EB) = det E det B.

case (2) E comes by row swap. Similar.

case (3) E comes by row addition. Similar.

Theorem 3.2.9. A is invertible iff det A 6= 0.

Proof. Let R = Ek . . . E2 E1 A be the RREF of A. Applying the lemma k times,

det R = det Ek . . . det E2 det E1 det A.

Since det E ≠ 0 for any elementary matrix,

det R = 0  ⟺  det A = 0.

(⟹) Suppose A is invertible. Then R = I, and det R = 1 ≠ 0 shows det A ≠ 0.

(⟸) Suppose det A ≠ 0. Then det R ≠ 0, so R cannot have a row of zeros. Then by the
prev Thm, R = I. This implies A is invertible, by the Characterization Thm.

Definition 3.2.10. A vector x is proportional to a vector y iff x = ay for some a 6= 0.

Corollary 3.2.11. If A has two proportional rows (or two proportional columns) then
det A = 0.

Proof. Suppose A has two proportional rows. Then if B is the RREF of A, B has a row
of zeros, so B is not invertible and det B = 0. But B is invertible iff A is invertible, so
det A = 0 also. (We are using the Invertibility Characterization Thm.)
If A has two proportional columns, then AT has two proportional rows. Since det A =
det AT , this reduces to the previous case and we are done.

Theorem 3.2.12. det is multiplicative: det AB = det A det B.

Proof. case (1) A is invertible. Then A = E1 E2 . . . Ek , so

AB = E1 E2 . . . Ek B

det AB = det(E1 E2 . . . Ek B)

= det E1 det E2 . . . det Ek det B

= det(E1 E2 . . . Ek ) det B

= det A det B

case (2) A is not invertible. Then AB is not invertible, by Thm. Hence

0 = det AB = 0 det B = det A det B.

Example 3.2.13. Let

A = \begin{bmatrix} 2 & 0 \\ 3 & -1 \end{bmatrix}, \quad
B = \begin{bmatrix} 4 & 1 \\ -1 & 2 \end{bmatrix}, \quad
AB = \begin{bmatrix} 8 & 2 \\ 13 & 1 \end{bmatrix}.

Then

det A = −2 − 0 = −2,   det B = 8 − (−1) = 9,   det AB = 8 − 26 = −18 = (−2)(9).
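A quick numerical confirmation (sketch) that det is multiplicative but not additive, using the matrices above:

```python
import numpy as np

A = np.array([[2., 0.], [3., -1.]])
B = np.array([[4., 1.], [-1., 2.]])

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
print(np.isclose(np.linalg.det(A + B), np.linalg.det(A) + np.linalg.det(B)))  # False
```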

Theorem 3.2.14. If A is invertible, then det A−1 = (det A)−1 = \frac{1}{\det A}.

Proof. Since A−1 A = I,

det(A−1 A) = det I
det A−1 · det A = 1                (prev Thm)
det A−1 = \frac{1}{\det A}          (det A ≠ 0).

HW 3.2: 1, 6(a), 8, 9, 13, 15, 17, 18, 34



3.5 Applications of Determinants

3.5.1 Cramers Rule

Theorem 3.5.1 (Cramer's Rule). Suppose Ax = b, where det A ≠ 0. Then the solution
to the system is given by

x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \qquad x_i = \frac{\det A_i}{\det A},

where Ai is obtained from A by replacing the ith column of A with b.

Example 3.5.2. Solve the system with Cramers Rule.

2x1 + x2 + 2x3 = 1

4x1 x2 + x3 = 0

x2 2x3 = 1

Solution. First, gather the matrices



2 1 2 1 1 2

A = 4 1 A1 = 0 1

1 1

0 1 1 1 1 1

2 1 2 2 1 1

A2 = 4 A3 = 4 1 0

0 1

0 1 1 0 1 1

Then compute the determinants (introduce DIAGONAL TECHNIQUE for 3 3):

det A = 6 det A1 = 6 det A2 = 18 det A3 = 6

Now the solution is



det A1 6
det A 6 1

x=
det A2
=
18 = 3

det A 6
det A3 6
det A 6 1

Advantages of Cramers Rule:

1. Fast for small systems.

2. Can compute xi without computing the entire solution.

Disadvantages:

1. Too long/slow for large systems. Row-reduction is more efficient for systems larger
than 3 × 3.
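Cramer's Rule is only a few lines of code; the sketch below (my own implementation, on a hypothetical 3 × 3 system) follows the statement of Theorem 3.5.1 literally.

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's Rule (requires det A != 0)."""
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                    # replace the i-th column of A with b
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[2., 1., 1.], [1., 3., 2.], [1., 0., 0.]])   # sample system (assumption)
b = np.array([4., 5., 6.])
print(np.allclose(cramer(A, b), np.linalg.solve(A, b)))    # True
```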

3.5.2 Linear systems of the form Ax = λx

Note: such a system is equivalent to

λIx = Ax
λIx − Ax = 0
(λI − A)x = 0

which is homogeneous, with coefficient matrix

λI − A = \begin{bmatrix} λ & 0 & 0 \\ 0 & λ & 0 \\ 0 & 0 & λ \end{bmatrix}
− \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
= \begin{bmatrix} λ − a_{11} & −a_{12} & −a_{13} \\ −a_{21} & λ − a_{22} & −a_{23} \\ −a_{31} & −a_{32} & λ − a_{33} \end{bmatrix}

Definition 3.5.3. The characteristic polynomial of A is

det(λI − A),

and the characteristic equation of A is

det(λI − A) = 0.

The system (λI − A)x = 0 will have nontrivial solutions iff det(λI − A) = 0.
(A nonzero determinant means invertibility of (λI − A), which implies only the trivial
solution exists.)

Definition 3.5.4. A solution λ of the characteristic equation is called an eigenvalue. The
corresponding nontrivial solutions of (λI − A)x = 0 are called eigenvectors.

More later.

Example 3.5.5. For A = \begin{bmatrix} 2 & 0 \\ 3 & 1 \end{bmatrix} from a previous example, the characteristic equation is

det(λI − A) = \begin{vmatrix} λ − 2 & 0 \\ −3 & λ − 1 \end{vmatrix} = (λ − 2)(λ − 1) − 0 = λ^2 − 3λ + 2 = 0.

Thus the eigenvalues of A are λ = 1, 2.

For λ = 1, solving (λI − A)x = 0 yields

\begin{bmatrix} −1 & 0 \\ −3 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\;⟹\;
\left[\begin{array}{cc|c} −1 & 0 & 0 \\ −3 & 0 & 0 \end{array}\right] →
\left[\begin{array}{cc|c} 1 & 0 & 0 \\ 0 & 0 & 0 \end{array}\right],

which yields x1 = 0 and x2 = anything. So an eigenvector corresponding to λ = 1 is

x = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.

Meanwhile, for λ = 2,

\begin{bmatrix} 0 & 0 \\ −3 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\;⟹\;
\left[\begin{array}{cc|c} 0 & 0 & 0 \\ −3 & 1 & 0 \end{array}\right] →
\left[\begin{array}{cc|c} 1 & −1/3 & 0 \\ 0 & 0 & 0 \end{array}\right],

which means that x1 − (1/3)x2 = 0, or 3x1 = x2 . So an eigenvector corresponding to λ = 2 is

x = \begin{bmatrix} 1 \\ 3 \end{bmatrix}.
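np.linalg.eig returns the same information (eigenvalues, and unit-length eigenvectors as columns); here is a sketch using the matrix of this example.

```python
import numpy as np

A = np.array([[2., 0.], [3., 1.]])
vals, vecs = np.linalg.eig(A)
print(vals)                              # eigenvalues 2. and 1. (order may vary)
for lam, v in zip(vals, vecs.T):         # columns of `vecs` are eigenvectors
    print(np.allclose(A @ v, lam * v))   # True, True
```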

HW 3.2: 10, 11, 12, 26


HW 3.5: 14
Chapter 4

Vector Spaces

4.1 Vectors in R2 and R3


Add two vectors geometrically by putting the tail of x on the head of y (or v.v.).

2x is the vector in the direction of x but twice as long.


NOTE: the scalar multiple of a vector still points in the same direction.

x = (1)x is the vector in the opposite direction of x and of the same length.

x y goes from the head of y to the head of x. Think

(x y) + y = x.

Written in cartesian coordinates,



x1

x = (x1 , x2 , x3 ) = x2 .


x3

So if in doubt, go back to x = (x1 , x2 , x3 ) and check componentwise.

x + y = (x1 + y1 , x2 + y2 , x3 + y3 )

kx = (kx1 , kx2 , kx3 ), kR

Just like matrices, two vectors are equal iff all their corresponding entries are equal.

The vector from a point P = (x, y) to a point Q = (x′ , y′ ) is

\vec{PQ} = Q − P = \begin{bmatrix} x′ − x \\ y′ − y \end{bmatrix}.

4.1.1 Vector arithmetic

Recall from matrices:

Theorem 4.1.1. If u, v, w are vectors of the same length, and k, ` R, then

(i) u + v = v + u,

(ii) u + (v + w) = (u + v) + w = u + v + w,

(iii) u + 0 = 0 + u = u,

(iv) u + (u) = 0,

(v) k(`u) = (k`)u,

(vi) (k + `)u = ku + `u,

(vii) k(u + v) = ku + kv, and

(viii) 1u = u.

4.2 Vector Spaces


A vector space is a set of things that you can take linear combinations of.

Definition 4.2.1. A vector space is a set V such that:

(a) There is an operation ⊕ : V × V → V satisfying

(a) u ⊕ v = v ⊕ u, for any u, v ∈ V .

(b) u ⊕ (v ⊕ w) = (u ⊕ v) ⊕ w, for any u, v, w ∈ V .

(c) There is an element 0 such that u ⊕ 0 = u, for any u ∈ V .

(d) For every u ∈ V , there exists an element v ∈ V such that u ⊕ v = 0. Denote this
by v = −u.

(b) There is a field F acting on V via ⊙ : F × V → V , such that:

(a) a ⊙ (u ⊕ v) = (a ⊙ u) ⊕ (a ⊙ v), for any u, v ∈ V and a ∈ F.

(b) (a + b) ⊙ u = (a ⊙ u) ⊕ (b ⊙ u), for any u ∈ V and a, b ∈ F.

(c) (ab) ⊙ u = a ⊙ (b ⊙ u), for any u ∈ V and a, b ∈ F.

(d) 1 ⊙ u = u, for every u ∈ V .

Elements of V are called vectors. Elements of F are called scalars.


The first four properties make V into an (abelian) group.

Usually, F = R, Q, C. When F = R, V is called a real vector space. When F = C, V is


called a complex vector space.

Theorem 4.2.2. From the properties above, one can prove that any vector space satisfies:

(a) 0 u = 0, for every u V .

(b) a 0 = 0, for every a F.

(c) if a u = 0, then either a = 0 or u = 0.

(d) (1) u = u, for every u V .

Example 4.2.3. V = (Rn , +, ) is the vector space weve studied until now.
A vector here is just u = (u1 , . . . , un ).

Example 4.2.4. V = (Mmn , +, ) is the vector space of m n matrices.


A vector here is
u11 u1n
.. ..

u=
..
. . .

um1 umn

Example 4.2.5. V = (Un , +, ) is the vector space of n n upper triangular matrices.


A vector here is
u11 u1n
..

u=
..
. .

unn

Example 4.2.6. V = (Dn , +, ) is the vector space of n n diagonal matrices.


A vector here is
u11

u=
..
.

unn

Example 4.2.7. V = (Sn , +, ) is the vector space of n n symmetric matrices.


A vector here is
u11 u1n
.. ..

u=
..
. . .

u1n unn

Example 4.2.8. V = (T rn0 , +, ) is the vector space of n n matrices with trace 0.


A vector here is

u11 u1n
n
... ..

u=
..
where T r(u) =
X
uii = 0
. .
i=1
un1 unn

Example 4.2.9. V = (C(X), +, ) is the vector space of continuous functions on X.


Here, X might be Rn or some subset of it. A vector here is

u : X R.

Example 4.2.10. V = (RR , +, ) is the vector space of R-valued functions on R.


A vector here is just any function u : R R.

Example 4.2.11. V = (C k (X), +, ) is the vector space of functions on X whose deriva-


tives u(k) (x), u(k1) (x), . . . u00 (x), u0 (x), u(x) are continuous.
Here, X might be R or (a, b). A vector here is

u : X R.

This is extendable to multivariable functions: if X = R3 , then C 2 (X) would consist of all


2u
functions u : Rn R for which xi xj is continuous for each i, j = 1, 2, 3.

Example 4.2.12. V = (C (X), +, ) is the vector space of functions on X whose deriva-


tive u(k) (x) are continuous, for any k = 0, 1, 2, . . . .

Example 4.2.13. V = (Pn , +, ) is the vector space of polynomials of degree k n.


Here, X might be R or (a, b). A vector here is

u(x) = a0 + a1 x + a2 x2 + + an xn .

Example 4.2.14. V = (Exp, ⊕, ⊙) is the vector space of exponentials e^{px} , if we define

a e^{px} ⊕ b e^{qx} := ab e^{(p+q)x} .

HW 4.2: 1, 2, 3, 13, 24, 25


Prove that every real vector space (other than the trivial vector space {0}) contains
infinitely many vectors.

4.3 Subspaces
Definition 4.3.1. Let W be a nonempty subset of V which is a vector space under the
operations of V . Then W is a subspace of V (Note: both V and W have the same field F.)

Theorem 4.3.2. If W ⊆ V and W is closed under the operations of V , then W is a
subspace of V . That is, W is a subspace whenever

u, v ∈ W  ⟹  (i) u ⊕ v ∈ W,  and  (ii) a ⊙ u ∈ W for any a ∈ F.

In particular, if W ⊆ V and W is closed, then W is a vector space.

Example 4.3.3. Every vector space V has two (trivial) subspaces: V itself and the zero
subspace {0}.

Example 4.3.4. R ⊆ R2 ⊆ R3 ⊆ ⋯ ⊆ Rn , for n ≥ 3.

Dn ⊆ Un ⊆ Mnn , and Dn ⊆ Sn ⊆ Mnn .
Let m ≤ n and X = R. Then Pm ⊆ Pn ⊆ C∞ (X) ⊆ C k (X) ⊆ C(X).
Example 4.3.5. Let V = R3 and W = {(x, y, z) : x − 2y + 6z = 0}. This is a plane
through the origin.

Example 4.3.6. Let V = R3 and W = {(a, b, c) : c = a + b} = {(a, b, a + b)}.

Example 4.3.7. Let V = R3 and U = {au1 +bu2 } = {all linear combinations of u1 and u2 },
where
1 0

u1 = 0 and u2 = 1


1 1

Then U = W from the previous example:



1 0 a 0 a

au1 + bu2 = a 0 + b 1 = 0 + b = b



1 1 a b a+b

Theorem 4.3.8. Let V be a vector space, and u1 , u2 , . . . , uk ∈ V . Then

W = \Big\{ \bigoplus_{i=1}^{k} (a_i ⊙ u_i) : a_i ∈ F \Big\} = \{ (a_1 ⊙ u_1) ⊕ \dots ⊕ (a_k ⊙ u_k) \}

is a subspace of V .

Definition 4.3.9. For an m n matrix A, the null space of A is

.
null(A) := {x .. Ax = 0}.

Theorem 4.3.10. null A is a subspace of Rn .

Proof. Let x, y ker(A). Then A(x + y) = Ax + Ay = 0 + 0 = 0, so x + y ker(A).


If c R, then A(cx) = cAx = c 0 = 0, so cx ker(A).

Definition 4.3.11. The kernel of a linear transform T : Rn Rm is

.
ker(T ) := {x .. T (x) = 0} = T 1 (0).

The range of a linear transformation T is

.
ran T = {y Rm .. y = T (x), for some x Rn }.

Theorem 4.3.12. ker T is a subspace of Rn and ran T is a subspace of Rm .

Proof. HW.

HW 4.3: 19, 20, 23, 24, 25, 32, 40


Prove that ker T is a subspace of Rn and ran T is a subspace of Rm , for any linear
transformation T : Rn Rm .
Prove that for TA (x) = Ax, ker TA = null A.
.
Prove or disprove: V = ({u : R R .. lim|x| u(x) = 0}, +, ) is a vector space.
A function f : R R is even iff f (x) = f (x); it is odd iff f (x) = f (x). Prove or
disprove: (i) the even functions on R form a vector space, (ii) the even functions on R
form a vector space.
Hint: for both of these, recall that the collection of all R-valued functions on R is a vector
space and use the theorem.

4.4 Span

Definition 4.4.1. If S = {v1 , v2 , . . . , vn } is any set of vectors from a vector space V , then
the span of S is

span S = \Big\{ \sum_{i=1}^{n} a_i v_i : a_i ∈ F \Big\}
\;\Big( = \Big\{ \bigoplus_{i=1}^{n} (a_i ⊙ v_i) : a_i ∈ F \Big\} \Big).

Conversely, if V = span S for some S ⊆ V , then S is called a spanning set for V .

Theorem 4.4.2. For any S V , the span of S is a subspace of V .

Proof. Let u = \sum a_i v_i and w = \sum b_i v_i . Then

u + w = \sum a_i v_i + \sum b_i v_i = \sum (a_i + b_i) v_i ∈ span S,
and   cu = c \sum a_i v_i = \sum (c a_i) v_i ∈ span S.

Example 4.4.3. Let V = P8 be the polynomials of degree at most 8. Then span{x, x3 , x5 , x7 }


is the subspace of P8 consisting of odd functions.

Example 4.4.4. Let V = M22 be the 2 2 matrices, and let



1 0 0 0
A= , and B= .
0 0 0 1

Then span{A, B} is the subspace of M22 consisting of diagonal matrices.

Example 4.4.5. For V = R3 and



3 1

v1 = 0 , and v2 = 1 ,


1 2

is v = (2, 1, 3) in span{v1 , v2 }? I.e., can we solve av1 + bv2 = v?


3 1 2 3 1 2

a 0 + b 1 = 1

0 1 1

1 2 3 1 2 3


3 1 2 3 0 1 1 0 5

1 0 1 0

0 1 1 1 1

1 2 3 1 0 5 0 0 14

This is inconsistent, so the answer is no: v ∉ span{v1 , v2 }.

Example 4.4.6. We can solve the homogeneous system

2x1 + 2x2 − x3 + x5 = 0
−x1 − x2 + 2x3 − 3x4 + x5 = 0
x1 + x2 − 2x3 − x5 = 0
x3 + x4 + x5 = 0

by Gauss-Jordan elimination, exactly as in Example 2.2.8: the augmented matrix

\left[\begin{array}{ccccc|c} 2 & 2 & -1 & 0 & 1 & 0 \\ -1 & -1 & 2 & -3 & 1 & 0 \\ 1 & 1 & -2 & 0 & -1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 0 \end{array}\right]

row-reduces to

\left[\begin{array}{ccccc|c} 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right]

The corresponding system is

x1 + x2 + x5 = 0
x3 + x5 = 0
x4 = 0.

Solving for the leading variables,

x1 = −x2 − x5
x3 = −x5
x4 = 0,

so the solution set is

{(−s − t, s, −t, 0, t) : s, t ∈ R} ⊆ R5 .

Thus, any solution can be written

\begin{bmatrix} -s - t \\ s \\ -t \\ 0 \\ t \end{bmatrix}
= s \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ t \begin{bmatrix} -1 \\ 0 \\ -1 \\ 0 \\ 1 \end{bmatrix}.

So we found that one spanning set for the null space of A is

\left\{ \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} -1 \\ 0 \\ -1 \\ 0 \\ 1 \end{bmatrix} \right\}.

Theorem 4.4.7. Suppose S is a spanning set for V . Let R be some other subset of V . If
each element of S can be written as a linear combination of elements of R, then R is a
spanning set for V .

Proof. HW

HW 4.4: 7, 8, 10, 11 Prove the last theorem.


This theorem may be helpful for #7, 8, 10.

4.5 Linear independence

Example 4.5.1. Let V = R3 and U = {au1 +bu2 } = {all linear combinations of u1 and u2 },
where
1 0

u1 = 0 and u2 = 1


1 1

Then
1 0 a 0 a

au1 + bu2 = a 0 + b 1 = 0 + b = b



1 1 a b a+b

So S1 = {u1 , u2 } is a spanning set for U . But so are:



1 3 0 1 2 0 1

S2 = { 0 , 2 , 1 } and S3 = { 0 , 0 , 1 , 1 }


1 5 1 1 2 1 0

But S1 is the most efficient in some sense: S2 and S3 both contain redundant information.
(If you were trying to describe the plane S1 , you only need two vectors.)

Important principle (to be formalized as a theorem shortly):

S1 ⊆ S2 and span S1 = span S2 = U  ⟹  every vector in S2 is a lin. comb. of vectors of S1

However, containment is not quite the right idea for the theorem: consider

1 1 1

S4 = { 1 , 1 , 2 }


0 2 3

One can see that no element of S1 is contained in S4 , or vice versa. Nonetheless: each
element of S4 is a linear combination of elements of S1 and vice versa:

1 1 1

1 = u1 u2 , 1 = u1 + u2 , 2 = u1 + 2u2 ,


0 2 3

and

1 1 1 1
1 1
u1 = 1 + 1 , u2 = 2 1

2 2
0 2 3 2

This shows that span S4 = U .


Note also that S4 is redundant in the sense that any of its elements can be built out
of the other two:

1 1 1 1 1 1 1 1 1
1 2 1 3
1 = 3 1 2 2 , 1 = 1 + 2 , 2 = 1 + 1 ,

3 3 2 2
0 2 3 2 0 3 3 0 2

When describing a plane, you really only need two vectors!

In order to avoid the redundancy in S4 where some element can be written as a


linear combination of the others, we introduce the condition of independence. The idea is
that u is not independent of {v1 , v2 , . . . , vk } if you can write

u = a 1 v1 + + a k vk

for some coefficients a1 , . . . , ak .

Definition 4.5.2. The set of vectors {v1 , v2 , . . . , vk } is linearly dependent iff there exist
constants a1 , . . . , ak (not all 0) such that

k
X
ai vi = 0.
i=1

The set of vectors {v1 , v2 , . . . , vk } is linearly independent iff

k
X
ai vi = 0 = a1 = a2 = = ak = 0.
i=1

Pk
In other words, the only way i=1 ai vi can equal 0 is if all the ai equal 0.
Linear independence is a condition that applies to sets of vectors.
Special case: no set containing the zero vector is linearly independent.
Special case: a set of 2 vectors is linearly independent iff one vector is not a scalar

multiple of the other.

Recall from 2.2:

Theorem 4.5.3. A homogeneous system of m equations in n unknowns always has a


nontrivial solution when m < n.

Consider the homogeneous system

k
X
ai vi = 0
i=1

which can also be written



a_1 v_1 + a_2 v_2 + \dots + a_n v_n =
\begin{bmatrix} | & | & \dots & | \\ v_1 & v_2 & \dots & v_n \\ | & | & \dots & | \end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}
= Ba = 0

So the question of linear dependence vs. independence amounts to: does the system Ba = 0
have nontrivial solutions a or only the trivial solution a = 0?

Example 4.5.4. Are the vectors linearly independent?



1 1 3 2

v1 = 2 , v2 = 2 , v3 = 2 , v4 = 0 .


1 1 1 0

Pk
Need to look for nontrivial solutions of i=1 ai vi = 0, i.e., of the homogeneous system

a1 + a2 3a3 + 2a4 = 0

2a1 2a2 + 2a3 =0

a1 + a2 a3 =0

Pk
By the theorem, there are nontrivial solutions, i.e., i=1 ai vi = 0 does not imply that all
the ai = 0. So this set is linearly dependent.

The theorem above may be rephrased:

Corollary 4.5.5. In a vector space of dimension n, any set of n + 1 or more vectors will
be linearly dependent.

(We havent seen the formal definition of dimension yet, but for now all you need is
that the dimension of Rn is n.)

Sometimes it takes more work to determine independence:

Example 4.5.6. Are the vectors v1 = t2 + t + 2, v2 = 2t2 + t, and v3 = 3t2 + 2t + 2


linearly independent in P2 ?
Pk
Need to look for nontrivial solutions of i=1 ai vi = 0, i.e., of the homogeneous system

\left[\begin{array}{ccc|c} 1 & 2 & 3 & 0 \\ 1 & 1 & 2 & 0 \\ 2 & 0 & 2 & 0 \end{array}\right]
\;\longrightarrow\;
\left[\begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]

Thus the coefficient matrix of the homogeneous system is noninvertible, and so there are
Pk
infinitely many solutions. In other words, i=1 ai vi = 0 does not imply that all the ai = 0:
the set {v1 , v2 , v3 } is linearly dependent.

Theorem 4.5.7. Let S = {v1 , v2 , . . . , vn } Rn , and let A be a matrix whose rows are
elements of S (in any order). Then S is linearly independent det(A) 6= 0.

Proof. (⟹) Let B = AT , so the columns of B are elements of S. If S is linearly independent,
then Ba = 0 implies a = 0, so this homogeneous system has only the trivial solution and
so B is invertible. But then det(A) = det(AT ) = det(B) ≠ 0.
(⟸) We show the contrapositive: if S is linearly dependent, then det(A) = 0. Suppose
S is linearly dependent. Then one element of S can be written as a linear combination of
the others, so one row of A can be written as a linear combination of the others. Then
using row operations, one can eventually replace that row by a row of 0s. Therefore A is
not invertible, and so det(A) = 0.
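In code, the determinant test applies to n vectors in Rn, and the rank test covers the general case; a sketch with sample vectors (my own choices):

```python
import numpy as np

# n vectors in R^n: independent  <=>  det != 0 (Theorem 4.5.7)
S = np.array([[1.,  1., 0.],
              [1., -1., 2.],
              [0.,  2., 1.]])                  # rows are the sample vectors (assumption)
print(not np.isclose(np.linalg.det(S), 0.0))   # True: linearly independent

# in general: independent  <=>  rank equals the number of vectors
V = np.array([[1., 2., 3.], [2., 4., 6.]])     # second row is twice the first
print(np.linalg.matrix_rank(V) == V.shape[0])  # False: linearly dependent
```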

Theorem 4.5.8 (Characterization of Invertibility). The following are equivalent:

(1) A is invertible (i.e., A1 exists).

(2) A can be written as the product of elementary matrices.

(3) The RREF of A is I (so A is row equivalent to I).

(4) The system Ax = b has exactly one solution.

(5) The system Ax = 0 has only the trivial solution x = 0.



(6) det A 6= 0.

(7) The rows/columns of A are linearly independent vectors.

Theorem 4.5.9 (Independence and containment). Let S1 , S2 be finite subsets of a vector


space V , with S1 S2 .
(i) If S1 is linearly dependent, then so is S2 . (ii) If S2 is independent, then so is S1 .

Proof. (i) Let S1 = {v1 , v2 , . . . , vj } and S2 = {v1 , v2 , . . . , vj , vj+1 , . . . , vk }. If S1 is depen-


dent, there is some linear combination

a1 v1 + + aj vj = 0

where not all ai = 0. Hence

a1 v1 + + aj vj + 0vj+1 + + 0vk = 0,

with not all ai = 0, which shows that S2 is dependent. (ii) is the contrapositive of (i).

HW 4.5: 7, 9, 11, 13, 14, 17, 19, 23, 24, 25, 26


Prove that if v 6= 0, then {v} is linearly independent.
Prove that if {v1 , v2 } is a linearly dependent set, then one is a scalar multiple of the
other, i.e., v1 = av2 or v2 = av1 for some a R.
Prove that {0} is linearly dependent. Using this, prove that any set containing 0 is
linearly dependent.

4.6 Basis and dimension

4.6.1 Basis

Definition 4.6.1. A basis for a vector space V is a set of vectors S = {v1 , v2 , . . . , vn }


such that

1. S spans V (i.e., S is a spanning set for V ), and

2. S is linearly independent.

This means: for any u V ,


P
(i) u = ai vi , for some choice of coefficients ai R, and

(ii) S is minimal in the sense that it contains no redundant information.

More precisely, the efficiency of (ii) ensures that there is ONLY ONE way to write u as a
linear combination of the vi .
A primary use of basis is this: it suffices to define any linear transform by specifying
its action on the basis. Suppose S = {u1 , u2 , . . . , un } is a basis for U and T : U V is
linear. If you know T (ui ) for each i, then you know everything about T because:
P
(i) given any u U , there is a unique way to write u = ai ui , and
P P
(ii) T (u) = T ( ai ui ) = ai T (ui ), because T is linear.

Theorem 4.6.2. If S = {v1 , v2 , . . . , vn } is a basis for V , then any vector u V can be


written in a unique way as a linear combination of elements of S.

Proof. If S is a basis, then it is a spanning set, so there is some solution of the system
P
u = ai vi by definition of spanning set. To see the uniqueness, suppose

n
X n
X
u= ai vi = bi v i .
i=1 i=1

Then we need to show ai = bi for each i = 1, 2, . . . , n.

n
X n
X n
X
0=uu= ai vi bi v i = (ai bi )vi .
i=1 i=1 i=1

S is a basis, so the vi are linearly independent, whence ai − bi = 0 for each i.



Example 4.6.3. Recall our previous example





1

0

1 1

1

S1 = u1 = , u = S4 = v1 = 1 , v2 = , v =

0 2 1 1 3 2






1 1 0 2 3

We saw that S4 was not linearly independent. Here is an example of how this leads to
nonuniqueness of representation:

1 1 1 1 1 1
1 1 1 1
v1 + v2 = 1 + 1 = 0 = u1 = 0 = 2 1 2 = 2v2 v3

2 2 2 2
0 2 1 1 2 3

Example 4.6.4. Is the following set a basis for R3 ?





1 1 1


S = 1 , 1 , 2





0 2 1

To check that S is independent,




1 1 1 1 1 1 1 1 1

1 1 2 = 0 3 = 0 3 = 4 6= 0

2 2

2

0 2 1 0 2 1 0 0

so S is linearly independent by the Characterization of Invertibility theorem.


To check that S spans R3 , we need to be able to find a solution of

1 1 1 a 1 1 1 x1 a

x1 1 + x2 1 + x3 2 = b 1 2 x2 = b ,

1

0 2 1 c 0 2 1 x3 c

for any choice of a, b, c. However, the coefficient matrix has nonzero determinant and
therefore is invertible, so there is always a unique solution, by Char of Inv theorem.

Note: Since R3 is 3-dimensional, S must have exactly 3 nonzero vectors, if S is to be a


basis. If S has 2 elements, it cannot span R3 . If S has 4 or more elements, then it cannot
be independent, by a theorem. From the above example, one can see that this is equivalent

to the following result:

Theorem 4.6.5. Let S = {v1 , v2 , . . . , vn }, and let A be the matrix whose columns (or
rows) are elements of S. Then S is a basis for Rn if and only if det(A) 6= 0.

Proof. The Characterization of Invertibility theorem contains: S is linearly independent


det(A) 6= 0, and (ii) Ax = b has a unique solution det(A) 6= 0.

Example 4.6.6. Is S = {1 + t2 , −1 + t, 2 + 2t} a basis for P2 ?

To see that S spans P2 , we need to see that a generic quadratic polynomial a + bt + ct2
can be written as a linear combination of these three:

a + bt + ct2 = a1 (1 + t2 ) + a2 (−1 + t) + a3 (2 + 2t) = (a1 − a2 + 2a3 ) + (a2 + 2a3 )t + a1 t2 .

Two polynomials agree for all t if and only if the coefficients of the respective powers agree,
so this gives three equations to solve:

1:  a1 − a2 + 2a3 = a
t:  a2 + 2a3 = b          ⟹   a1 = c,   a2 = \tfrac{1}{2}(−a + b + c),   a3 = \tfrac{1}{4}(a + b − c).
t2: a1 = c

Suppose we are given the vector 5 − 2t − t2 . Then this formula gives

(a1 , a2 , a3 ) = (−1, −4, 1)  ⟹  5 − 2t − t2 = (−1)(1 + t2 ) − 4(−1 + t) + (1)(2 + 2t).

To check that S is linearly independent, consider the homogeneous system

a1 = 0
a2 + 2a3 = 0              ⟹   a1 = 0,  a2 = 0,  a3 = 0.
a1 − a2 + 2a3 = 0

So the formula implies that this homogeneous system has only the trivial solution, and
hence S is independent.

Example 4.6.7. Is S = {1 + t2 , −1 + t, 2 + 2t} a basis for P2 ?

Alternative approach: note that P2 has the same linear structure as R3 (to be made
precise in 4.8). Any vector a + bt + ct2 ∈ P2 can be written as (a, b, c) ∈ R3 and vice
versa, and under this encoding, the vector space operations are compatible:

(a + bt + ct2 ) + (p + qt + rt2 )   ⟷   (a, b, c) + (p, q, r)
(a + p) + (b + q)t + (c + r)t2      ⟷   (a + p, b + q, c + r)

So S is a basis of P2 if and only if Q = {(1, 0, 1), (−1, 1, 0), (2, 2, 0)} is a basis of R3 , and

\begin{vmatrix} 1 & 0 & 1 \\ -1 & 1 & 0 \\ 2 & 2 & 0 \end{vmatrix} = −4 ≠ 0.

Theorem 4.6.8. Let S = {v1 , v2 , . . . , vn } be any finite subset of a vector space V , and let
W = span S. Then some subset of S is a basis for W .

Proof. If S is linearly independent, then S is a basis for W .


If not, then some vj can be written as a linear combination of the others, so delete it
to obtain S2 = {v1 , v2 , . . . , vj1 , vj+1 , . . . , vn }. Note that span S2 = W . If S2 is linearly
independent, were done; if not, then repeat.

Use this idea to solve #11, for example.

HW 4.6: 2, 6, 7, 8, 11, 15, 29, 36


Also: Let 1 = (4, 1) and 2 = (2, 1)

(i) Show that {1 , 2 } is a basis for R2 and solve (2, 5) = a1 + b2 .

(ii) Find a matrix A such that TA ([ 10 ]) = 1 and TA ([ 01 ]) = 2 .

(iii) Find a matrix B such that TB (1 ) = [ 10 ] and TB (2 ) = [ 01 ].

(iv) Use TB to represent the vector (2, 5) in terms of the basis {1 , 2 }.

(v) Explain why B is called a change-of-basis matrix (or in this book, a transition
matrix).

4.6.2 Dimension

Theorem 4.6.9. If S = {v1 , v2 , . . . , vn } is a basis for a vector space V and Q =


{w1 , w2 , . . . , wr } is a linearly independent set of vectors in V , then r n.

Corollary 4.6.10. Any two bases B1 and B2 of V have the same number of elements.

Proof. (When the Bi have finite cardinality): apply the theorem with S = B1 and T = B2 ,
and then the other way.

This means that the following definition makes sense:

Definition 4.6.11. The dimension of a nonzero vector space V is the number dim V of
vectors in a basis for V . If V = {0}, we define dim V = 0.

Example 4.6.12. dim Rn = n, for any n = 1, 2, . . . .


{1, t, t2 } is a basis for P2 , so dim P2 = 3.
{[ 10 00 ] , [ 00 10 ] , [ 01 00 ] , [ 00 01 ]} is a basis for M22 , so dim M22 = 4.
{[ 10 00 ] , [ 00 01 ]} is a basis for D2 , the diagonal subspace of M22 , so dim D2 = 2.

Definition 4.6.13. T is a maximally independent subset of V iff T is independent and


not contained in any strictly larger independent subset of V .
T is a minimal spanning set for V iff span T = V and no strictly smaller subset of T
spans V .

Theorem 4.6.14. TFAE (the following are equivalent):

1. B is a basis for V .

2. B is a maximally independent subset of V .

3. B is a minimal spanning set for V

HW 4.6: 11, 13, 16, 18, 41, 42, 43, 48, 49


For #16, also give the dimension of this space.

4.7 Homogeneous systems


.
Recall that null(A) := {x .. Ax = 0}.

Definition 4.7.1. The nullity of a matrix A is nullity(A) := dim null(A).

Example 4.7.2. Find a basis for null(A) and compute nullity(A), where

1 1 4 1 2


0 1 2 1 1
A=



0 0 0 1 2

2 1 6 0 1

Solution: starting with R3 + R4 R4 , we have



1 1 4 1 2 1 0 2 0 1 1 0 2 0 1

1

0 1 2 1 1 0 1 2 1 1 0 1 2 0


0 0 0 1 2 0 0 0 1 2 0 0 0 1 2

2 1 6 1 3 2 0 4 0 2 0 0 0 0 0

The leading variables are x1 , x2 , and x4 , so make x3 = s and x5 = t and solve in terms of
these to get

2s t 2 1

2s + t 2

1

Ax = 0 x= = s 1 + t 0

s

2t 2

0

t 0 1

So {b1 = (2, 2, 1, 0, 0), b2 = (1, 1, 0, 2, 1)} is a basis of null(A) and nullity(A) = 2.
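For exact arithmetic, SymPy computes a null-space basis directly from the RREF. The matrix below is a hypothetical example (not the A above).

```python
from sympy import Matrix

A = Matrix([[1, 2, 0, 3],
            [0, 0, 1, 4],
            [1, 2, 1, 7]])        # sample matrix (assumption); row 3 = row 1 + row 2

basis = A.nullspace()             # list of column vectors spanning null(A)
print(len(basis))                 # nullity(A) = 2
for v in basis:
    print(A * v)                  # each product is the zero vector
```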

Example 4.7.3. For A = \begin{bmatrix} 1 & 5 \\ 3 & -1 \end{bmatrix}, find all real numbers λ such that the homogeneous
system (λI − A)x = 0 has a nontrivial solution.

Solution: the system

(λI − A)x = \begin{bmatrix} λ − 1 & −5 \\ −3 & λ + 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 0

has nontrivial solutions if and only if

0 = det(λI − A) = \begin{vmatrix} λ − 1 & −5 \\ −3 & λ + 1 \end{vmatrix} = (λ − 1)(λ + 1) − 15 = λ^2 − 16 = (λ − 4)(λ + 4),

so λ = ±4.
Example 4.7.4. For A = \begin{bmatrix} 1 & 5 \\ 3 & -1 \end{bmatrix} and each of λ = ±4, find a basis for null(λI − A).
Solution: solve the homogeneous system (λI − A)x = 0 for each value of λ.
For λ = 4,

λI − A = \begin{bmatrix} 3 & -5 \\ -3 & 5 \end{bmatrix} → \begin{bmatrix} 3 & -5 \\ 0 & 0 \end{bmatrix}
⟹ x = t \begin{bmatrix} 5 \\ 3 \end{bmatrix}, for any t ∈ R.

For λ = −4,

λI − A = \begin{bmatrix} -5 & -5 \\ -3 & -3 \end{bmatrix} → \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}
⟹ x = t \begin{bmatrix} 1 \\ -1 \end{bmatrix}, for any t ∈ R.

Then {(5, 3)} is a basis for null(4I − A), and {(1, −1)} is a basis for null(−4I − A).
Alternatively, {(5, 3), (1, −1)} is a basis for R2 consisting of eigenvectors of A.

4.7.1 Nonhomogeneous systems


Recall that {x : Ax = b} is a vector space if and only if b = 0.
However, we have the following theorem:

Theorem 4.7.5. Suppose that xp is a particular solution to the nonhomogeneous system,
i.e., Axp = b. Then any solution of Ax = b can be written as x = xp + xh , where xh is
some solution of the associated homogeneous system Axh = 0.

HW 2.2: 28, 29
HW 4.7: 2, 15, 16, 1921

4.8 Coordinates and isomorphism


Definition 4.8.1. An isomorphism is an invertible linear transformation T : U → V
between two vector spaces U and V . The spaces U and V are called isomorphic iff there
exists an isomorphism between them, in which case one writes U ≅ V .

Remark 4.8.2. Recall that a transformation is invertible if and only if it is both one-to-one
and onto.

f : X Y is one-to-one (or injective) iff no two points of X are mapped to the


same point of Y :

For x1 , x2 ∈ X, f (x1 ) = f (x2 )  ⟹  x1 = x2 .

f : X Y is onto (or surjective) iff for every point of Y , there is some point in X
that gets mapped to it:

For every y Y, there exists x X such that f (x) = y.

If f is both injective and surjective, it is called bijective or said to be a bijection.


Formally, the three properties must be verified in order to check that something is an
isomorphism. However, we will see that there are shortcuts.

Theorem 4.8.3. Let U, V, W be vector spaces.

(a) U ≅ U .

(b) If U ≅ V , then V ≅ U .

(c) If U ≅ V and V ≅ W , then U ≅ W .

Proof. HW (#28)

Example 4.8.4. P2 is isomorphic to R3 . Define T : P2 R3 by T (a + bt + ct2 ) = (a, b, c).


To check that T is linear, let f = a + bt + ct2 and g = p + qt + rt2 and recall

(a + bt + ct2 ) + (p + qt + rt2 ) / (a, b, c) + (p, q, r)


T


 
(a + p) + (b + q)t + (c + r)t2 / (a + p, b + q, c + r)
T

Then

T (f + g) = T ((a + p) + (b + q)t + (c + r)t2 ) = (a + p, b + q, c + r)

T (f ) + T (g) = (a, b, c) + (p, q, r) = (a + p, b + q, c + r).

To check that T is invertible, verify that the inverse is given by T 1 (a, b, c) = a + bt + ct2 :

T 1 (T (a + bt + ct2 )) = T 1 (a, b, c) = a + bt + ct2 ,

T (T 1 (a, b, c)) = T (a + bt + ct2 ) = (a, b, c).

Lemma 4.8.5 (Basis lemma). Let B = {b1 , b2 , . . . , bn } be a basis of U .

(i) If T : U V is an isomorphism, then T (B) = {T (b1 ), T (b2 ), . . . , T (bn )} is a basis


of V .

(ii) Conversely, if {c1 , c2 , . . . , cn } is a basis for V and we define T to be the linear


transform satisfying T (bi ) = ci , then T : U V is an isomorphism.

Proof. Let v V ; need to write v as a linear combo of the T (bi ):

T 1 (v) = x1 b1 + x2 b2 + + xn bn B is a basis

v = T (x1 b1 + x2 b2 + + xn bn )

v = x1 T (b1 ) + x2 T (b2 ) + + xn T (bn ) T is linear.

This shows that T (B) is a spanning set. It follows from a homework problem that T (B) is
independent, so we have a basis.
For (ii), define T to be the linear map that satisfies T (bi ) = ci ; it is immediate (by the
properties of the basis) that this extends to a linear map from all of U into V . You can
check injectivity by writing elements of U in terms of the basis {b1 , b2 , . . . , bn } and applying
T , and surjectivity by writing elements of V in terms of the basis {c1 , c2 , . . . , cn }.

The point of this lemma is that basically an isomorphism amounts to a renaming. In


other words, two vector spaces are isomorphic if and only if there is:

1. a basis B1 of U and

2. a basis B2 of V and

3. a bijection between them.



Next, we show that every finite-dimensional vector space is isomorphic to Rn , i.e., is


Rn up to a relabeling of the basis vectors.

Theorem 4.8.6. V is an n-dimensional real vector space iff V is isomorphic to Rn .

Proof. V is n-dimensional iff it has a basis {v1 , v2 , . . . , vn } of n vectors. Define T : V Rn


by T (vj ) = ej , where ej is the standard basis vector of Rn having j th entry 1 and consisting
of 0s elsewhere. One can check that T is an isomorphism.
For the converse, the Basis lemma says that {e1 , . . . , en } is mapped to a basis of V
under the isomorphism.

Corollary 4.8.7. Let U, V be finite-dimensional. Then U


= V iff dim U = dim V = n.

Proof. If U
= V , then let {u1 , . . . , un } be a basis for U and let T : U V be the
isomorphism between U and V . By the basis lemma, {T (u1 ), . . . , T (un )} is a basis for V
and |{u1 , . . . , un }| = |{T (u1 ), . . . , T (un )}|.
If dim U = dim V = n then both are isomorphic to Rn by the previous theorem, and
hence they are isomorphic to each other by transitivity of the equivalence relation
= (part
(iii) of Theorem 4.8.3).

HW 4.8: 28, 29, 30

4.8.1 Coordinates and change of basis

Suppose that S = {v1 , v2 , . . . , vn } is a basis for V . This is not necessarily an ordered set;
when setting up a problem, it is often up to you to pick which vector to take as the first
basis vector.

Definition 4.8.8. We say S = (v1 , v2 , . . . , vn ) is an ordered basis when the order of the
elements is important, so that S is not the same as (v2 , v1 , . . . , vn ). For an ordered basis,
given

u = a1 v1 + a2 v2 + + an vn ,

we write

a1

a2


u=
..



.

an
S

to express u in the coordinate system S.


For the standard coordinate system, we leave off the subscript, so

c1

c2


u=
..

u = c1 e1 + c2 e2 + + cn en ,

.

cn

where

1 0 0

0 1 0


e1 =
..
, e2 =

, . . . , en =
.. ..


.

.

.

0 0 1

Example 4.8.9. Recall the homework problem with 1 = (4, 1) and 2 = (2, 1). You
are asked to:

(i) Show that {1 , 2 } is a basis for R2 and solve (2, 5) = a1 + b2 .

(ii) Find a matrix A such that TA ([ 10 ]) = 1 and TA ([ 01 ]) = 2 .

(iii) Find a matrix B such that TB (1 ) = [ 10 ] and TB (2 ) = [ 01 ].

(iv) Use TB to represent the vector (2, 5) in terms of the basis {1 , 2 }.

Solutions:

(i) With what we know now, the quick way is to compute the determinant

2

4
= 4 (2) = 6 6= 0

1 1
4.8 Coordinates and isomorphism 97

so that det(A) 6= 0 implies linear independence of {1 , 2 }. Since dim R2 = 2, this is


a maximally independent set and hence a basis. Then

4 2 a 2 1 1 2 2 2
a1 + b2 = (2, 5) = =
1 1 b 5 6 1 4 5 3

(ii) To find A, solve



a b 1 4 a b 0 2
= and =
c d 0 1 c d 1 1

a b  4 2 
to find A = c d = 1 1 .

(iii) Since TB does the opposite transform of A, it must be that B = A1 , so



1 1 2
B= .
6 1 4

(iv) Then to represent the vector (2, 5) in terms of the basis {1 , 2 }, compute

2 1 2 2 2
TB (2, 5) = TB = 1 =
5 6 1 4 5 3

to find that

2 4 2
= 2 + 3 = 21 + 32
5 1 1

In terms of the ordered basis S = (1 , 2 ), we would now write



2 2
= .
5 3
S

In terms of the ordered basis S 0 = (2 , 1 ), you can check that



2 3
= .
5 2
S0

Definition 4.8.10. Suppose that S = (v1 , v2 , . . . , vn ) and T = (w1 , w2 , . . . , wn ) are


98 Vector Spaces

ordered bases for V .

Denote the vector v V as written in T -coordinates by



c1

c2


[v]T :=
..

v = c1 w1 + c2 w2 + + cn wn

.

cn

Suppose each wj is written in the coordinate system S as


a1j

a2j


[wj ]S :=
..

wj = a1j v1 + a2j v2 + + anj vn
.

anj

Then

[v]S = [c1 w1 + c2 w2 + + cn wn ]S

= c1 [w1 ]S + c2 [w2 ]S + + cn [wn ]S



a11 a12 a1n

a21 a22 a2n


= c1 .. + c2 .. + + cn ..


. . .

an1 an2 ann

= PST c,

where PST is the n n matrix with j th column [wj ]S . This matrix PST is the transition
matrix (or change-of-basis matrix ) from the T -basis to the S-basis.

To compute the change of basis matrix:

a1 v1 + a2 v2 + a3 v3 = w1

b1 v1 + b2 v2 + b3 v3 = w2

c1 v1 + c2 v2 + c3 v3 = w3
4.8 Coordinates and isomorphism 99

Each of these equations is a system with augmented matrix [v1 v2 v3 | wj ]. Since


each system has the same coefficient matrix [v1 v2 v3 ], we can solve all three systems
simultaneously by row-reducing

| | | | | |


v1 v2 v3 w1 w2 w3

| | | | | |

This is an algorithm for how to find PST :

1. Set up the augmented matrix

2. Transform to RREF.

3. The columns on the right side form the columns of PST .

Example 4.8.11. Consider






2 1
1
6 4
5



S= v1 = 0 , v2 = 2 , v3 = 1 and T = w1 = 3 , w2 = 1 , w3 = 5 .






1 0 1 3 3 2

If v = (4, 9, 5), then v is expressed in the T -basis by



4 6 4 5 1

v = 9 = (1) 3 + (2) 1 + (2) 5 = [v]T = 2 ,


5 3 3 2 2

and v is expressed in the S-basis by



4 2 1 1 4

v = 9 = (4) 0 + (5) 2 + (1) 1 = [v]S = 5 .


5 1 0 1 1

The transition matrix PST is found from



2 1 1 6 4 5 1 0 0 2 2 1

1 5 0 1 0 1

0 2 1 3 1 2

1 0 1 3 3 2 0 0 1 1 1 1
100 Vector Spaces

and we verify

2 2 1 1 2+42 4

PST [v]T = 1 1 2 = 124 = 5 = [v]S

2

1 1 1 2 1+22 1

Remark 4.8.12. Notice that



a11 a12 a1n

a21 a22 a2n


A=
.. .. .. ..


. . . .

an1 an2 ann

is the matrix that sends ej to (a1j , a2j , . . . , anj ), for example:


0
a11 a12 a1n
a12


1
a21 a22 a2n a22


Ae2 = 0 = . .

.. .. .. .. ..
. . . . ..

.


an1 an2 ann an2
0

Therefore, A1 can be thought of as the matrix that sends (a1j , a2j , . . . , anj ) to ej , and
hence transforms a given basis into the standard basis.

Lemma 4.8.13. Let S and T be bases of V , and let MS be the matrix whose columns are
elements of S, and let MT be the matrix whose columns are elements of T . Then

PST = MS1 MT .

Note that [v]S = PST [v]T = MS1 MT [v]T iff MS [v]S = MT [v]T = v. For example:

4 2 2 2
AS [v]S = = = 2e1 + 5e2 .
1 1 3 5

HW 4.8: 13, 15, 18, 32, 33, 35, 3941


4.9 Rank 101

4.9 Rank

Definition 4.9.1. For an m n matrix



a11 a12 a1n

a21 a22 a2n


A=
.. .. .. ..
,

. . . .

am1 am2 amn

the row space of A is the subspace of Rn spanned by the rows, and the column space of A
is the subspace of Rm spanned by the columns:

h i h i
rowspace(A) := span{ a11 a12 a1n ,..., am1 am2 amn } Rn ,

a11 a1n

a21 a2n

} Rm

colspace(A) := span{
..
,...,
..
. .

am1 amn

Theorem 4.9.2. If A is row-equivalent to B, then rowspace(A) = rowspace(B), and


similarly for columns.

Proof. A is row-equivalent to B if and only if B can be obtained from A be some sequence


of row operations, which means that each row of B is a linear combination of rows of
A. Thus the rows of A span rowspace(B), so rowspace(B) rowspace(A). The other
containment follows BSA. The results for columns also follows BSA.

Suppose we apply this theorem to the case when B is the RREF of A. Then:

1. A and B are clearly row-equivalent, so the rows of B span rowspace(A), and

2. the nonzero rows of B are linearly independent, so B is actually a basis for


rowspace(A).
102 Vector Spaces

Example 4.9.3. Find a basis for the subspace of R5 spanned by





1 3 2 1





2



2 3 2



v1 = 0 , v2 = 8 , v3 = 7 , v4 = 0 .







3 1 2 4





4 3

4 3

Solution:

1 2 0 3 4 1 0 2 0 1


3 2 8 14 0 1 1 0 1
A= =B
1

2 3 7 2 3 0 0 0 1

1 2 0 4 3 0 0 0 0 0

So one basis for span{v1 , v2 , v3 , v4 } is





1 0 0








0 1 0



B = b1 = 2 , b2 = 1 , b3 = 0 .







0 0 1





1

1 1

Definition 4.9.4. The dimension of rowspace(A) is called the row rank of A (and similarly
for the column space).

From the theorem, we know that if A and B are row-equivalent, then they have the
same row rank, etc.

Corollary 4.9.5. If B is the RREF of A, then the row rank of A is equal to the number
of nonzero rows of B.

In the previous example, we found that the row rank of A is 3.

Theorem 4.9.6. The row rank and the column rank of A are equal.

Proof. Let B be the RREF of A. The row rank of A is k n if and only if the columns
of B include the first k standard basis vectors {e1 , . . . , ek }. Note that any other nonzero
column is a linear comb of these. Thus the column rank of A is also k.
4.9 Rank 103

Since row rank and column rank are equal, the following definition makes sense:

Definition 4.9.7. The rank of A is the row rank (or equivalently, the column rank) of A
and is denoted rank(A).

Theorem 4.9.8 (Rank theorem). If A is an m n matrix, then rank A + nullity A = n.

Let TA : Rn Rm be the linear transformation given by TA (x) = Ax. Recall that we


proved ker TA = null A. Now we can also see:

nullity A = dim ker TA and rank A = dim ran TA .

If the range has dimension r, then the Rank theorem just asserts that

n = (n r) + r.

Roughly speaking, this means that for any given A, Rn decomposes into a direct sum
(or Cartesian product) of the subspace of vectors which is killed by A and the subspace of
vectors which is not killed by A:

Rn = null A rowspace A.

More precisely, given a matrix A, there is

a (n r)-dimensional subspace K = null A = ker TA ,

an r-dimensional subspace L = rowspace A = TA1 (ran TA ) on which TA is invertible,

every vector in K is orthogonal to every vector in K .

This means that any x Rn can be written uniquely as x = ay + bz with y K and z L,


and y z = 0. Note that null(A) rowspace(A) = {0}.

Example 4.9.9. Let



3 1 2 1 0 1

A= 2 3 0 1 =B

1 1

7 1 8 0 0 0
104 Vector Spaces

Then rank A = 2 and 3 = dim R3 = rank A + nullity A = 2 + 1. So rowspace A is a


(2-dimensional) plane passing through the origin of R3 and null A is a line passing through
the origin of R3 which meets rowspace A orthogonally.
From B, a basis for rowspace A is given by



1 0

0 , 1




1
1

Also from B, the solution to the homogeneous system is



t

Ax = 0 x = t ,


t

so a basis for null A is given by



1




1 .





1

These are orthogonal subspaces:



1 0 1 a t

a 0 + b 1 t 1 = b t = (a)(t) + (b)(t) + (a + b)(t) = 0


1 1 1 a+b t

The rank theorem states that rank A + nullity A = n. In the case when A is n n, note
that null A = 0 if and only if Ax = 0 has only the trivial solution x = 0. This means we
have an update:

Theorem 4.9.10 (Characterization of Invertibility). The following are equivalent:

(1) A is invertible (i.e., A1 exists).

(2) A can be written as the product of elementary matrices.

(3) The RREF of A is I (so A is row equivalent to I).

(4) The system Ax = b has exactly one solution.


4.9 Rank 105

(5) The system Ax = 0 has only the trivial solution x = 0.

(6) det A 6= 0.

(7) The rows/columns of A are linearly independent vectors.

(8) The rows/columns of A span Rn .

(9) nullity(A) = 0.

(10) rank(A) = n.

HW 4.9: 1, 5, 13, 15, 39, 40


For #15, note that you can use column operations also.
106 Vector Spaces
Chapter 5

Length and direction

5.1 Vector arithmetic and norms


Recall from matrices:

Theorem 5.1.1. If u, v, w are vectors of the same length, and k, ` R, then

(i) u + v = v + u,

(ii) u + (v + w) = (u + v) + w = u + v + w,

(iii) u + 0 = 0 + u = u,

(iv) u + (u) = 0,

(v) k(`u) = (k`)u,

(vi) (k + `)u = ku + `u,

(vii) k(u + v) = ku + kv, and

(viii) 1u = u.

Definition 5.1.2. The norm of a vector u = (u1 , u2 ) R2 is its length

q
kuk := u21 + u22 .

In general, the norm of a vector u = (u1 , . . . , un ) Rn is

n
!1/2
X
kuk := u2i .
i=1
108 Length and direction

The norm of a sequence {un }


n=1 = (u1 , u2 , . . . ) R is
N


!1/2
X
kuk := u2i .
i=1

The norm of a function ux = u(x) RR is

Z 1/2
kuk := u(x)2 dx .

NOTE: k k : Rn [0, ) is a function on vectors. It is actually a continuous function.


In fact, it is differentiable everywhere except at x = 0.

Theorem 5.1.3. For x, y Rn and any scalar k,

(i) kxk 0.

(ii) kxk = 0 iff x = 0.

(iii) kkxk = |k| kxk.

(iv) kx + yk kxk + kyk. (Triangle ineq)

Proof of (i)(iii). Homework.

Proof of (iv). Postponed.

5.1.1 Distance and length

The distance between two points is the size of the space between them, i.e., the length of
the vector connecting them.

Definition 5.1.4. For x, y Rn , the distance from x to y is the length of x y.

q
dist(x, y) = kx yk = x21 + + x2n .

NOTE: sometimes it is not easy working with kxk because of the square root. In this
case, use kxk2 = x21 + + x2n .

Example 5.1.5. The surface defined by x21 + x22 = x3 is a paraboloid. The surface defined
by y1 + y2 + y3 = 0 is a plane. What is the closest point on the plane to the paraboloid?
5.1 Vector arithmetic and norms 109

Solution. You need to minimize kx yk, where x is on the paraboloid and y is on the

plane. However, is an increasing function, and hence order-preserving. This means that
it is enough to find x, y minimizing kx yk2 .

5.1.2 Dot products and projections

We have another function mapping vectors to numbers, but this one actually takes TWO
vectors.

Definition 5.1.6. If x, y Rn , the dot product of x and y is

n
X
x y := xi yi = x1 y1 + + xn yn .
i=1

Sometimes this is called the inner product and written hx, yi or hx|yi.

The dot product is also a differentiable function Rn Rn R (even at 0!); it is just a


polynomial in 2n variables.


Theorem 5.1.7. kxk = x x.

Proof. HW.

Recall the law of cosines: if a, b, c are the side lengths of a triangle and is the angle
opposite c, then
c2 = a2 + b2 2ab cos .


If = 2, then c is the hypotenuse of a right triangle: Pythagorean theorem!

Theorem 5.1.8. If x, y Rn , x y := kxkkyk cos , where is the angle between them.

Proof. For x, y R2 , the law of cosines gives

ky xk2 = kxk2 + kyk2 2kxkkyk cos

kxkkyk cos = 12 kxk2 + kyk2 ky xk2




n n n
!
X X X
= 1
2 x2i + yi2 (yi xi ) 2

i=1 i=1 i=1


n n n
!
X X X
= 1
2 x2i + yi2 (yi2 2xi yi + x2i )
i=1 i=1 i=1
110 Length and direction

n
!
X
1
= 2 (2xi yi )
i=1

= x y.

xy
Theorem 5.1.9. The angle [0, ) between x and y is given by cos = kxkkyk .

Corollary 5.1.10. The sign of x y corresponds to the angle between them:

is obtuse xy <0

is 2 xy =0

is acute x y > 0.

Definition 5.1.11. x and y are orthogonal iff x y = 0. (From second part above.)

5.1.3 Arithmetic of the Dot Product

Theorem 5.1.12. If u, v, w are vectors and k is a scalar, then

1. (Commutativity) u v = v u.

2. (Distributivity) u (v + w) = u v + u w.

3. (Scalar associativity) k(u v) = (ku) v.

4. (Positive definite) v v > 0 iff v 6= 0, and v v = 0 iff v = 0.

The last one should be clear from the earlier theorem:


vv >0 v v = kvk 0.

How about:

1. u (v w) = (u v) w?

2. Given u 6= 0, is there a unique 1 such that u 1 = u?

3. Given u 6= 0, is there a unique v such that u v = 1?


5.1 Vector arithmetic and norms 111

HW 5.1 #16, 21, 22, 25, 26, 33, 34, 35, 36(R2 )
Also: before doing the problems from the text, obtain the following results so that you
can use them to simplify your computations:

(i) Prove that kxk 0 for every x Rn .

(ii) For x Rn , prove that kxk = 0 iff x = 0.

(iii) For x Rn , prove that kxk = k xk.

(iv) Describe the geometric meaning of (ii) and (iii).

5.1.4 Projections

We want to decompose a vector in terms of others, for example, to express in terms of a


given basis.

Example 5.1.13. Let u = (2, 3). Consider the standard basis vectors e1 = (1, 0) and
e2 = (0, 1).

u e1 = (2)(1) + (3)(0) = 2

u e2 = (2)(0) + (3)(1) = 3

Now u = (u e1 )e1 + (u e2 )e2 = 2e1 3e2 is a decomposition of u into a linear combination


of e1 and e2 .

u e1 tells you how long u is in the e1 -direction, i.e., the component of u that is parallel
to e1 . In general,

n
X
u= (u ei )ei .
i=1

This can be more complicated if vectors other than e1 and e2 are used.

Example 5.1.14. Let u = (2, 3) again, but let x = (4, 3) and y = (2, 2).

u x = (2)(4) + (3)(3) = 8 9 = 1

u y = (2)(2) + (3)(2) = 4 6 = 2.
112 Length and direction

Now

(u x)x + (u y)y = (1)x + (2)y

= (4, 3) + (4, 4)

= (8, 7)

= 4(2, 47 )

This isnt even in the same direction as u! What happened? In the first example, we had
x y = (1)(0) + (0)(1) = 0, i.e. the basis vector were orthogonal (perp).

Example 5.1.15. Let u = (2, 3) again, but let x = (1, 1) and y = (1, 1). Then

x y = (1)(1) + (1)(1) = 0,

so x and y are perpendicular.

u x = (2)(1) + (3)(1) = 2 3 = 1

u y = (2)(1) + (3)(1) = 2 3 = 5.

Now

(u x)x + (u y)y = (1)x + (5)y = (1, 1) + (5, 5) = (4, 6) = 2u

This is a scalar multiple of u! Why not just u?


kxk, kyk =
6 1. Now:
     
ux uy 1 5 1 1 5 5 4 6
2
x+ 2
y= x+ y= , + , = , =u
kxk kyk 2 2 2 2 2 2 2 2

Definition 5.1.16. Let {x, y} be a basis for Rn .


If x y = 0, then {x, y} is an orthogonal basis for R2 .
If x y = 0 and kxk = kyk = 1, then {x, y} is an orthonormal basis for R2 .

ux ux
Theorem 5.1.17. The length of u in the direction of x is kxk2 = xx . Therefore, the
(orthogonal) projection of u onto (the line spanned by) x is

ux
projx u := x.
kxk2
5.1 Vector arithmetic and norms 113

Note: this is a scalar mult of x, so certainly in the same direction.


(x 6= 0 or it doesnt make sense.)

Corollary 5.1.18 (Orthogonal decomposition). The component of u which is orthogonal


to x is
ux
u projx u = u x.
kxk2

This is helpful if you have only one vector x, but not a basis. In fact, it produces a
basis
{x, u projx u}

Proof of the theorem and the corollary. Define

w1 := projx u

w2 := u projx u.

Now w2 is orthogonal to x because

w2 x = (u projx u) x
ux
=ux xx
kxk2
=uxux x x = kxk2 .

Now since w1 is in the direction of x, we know w1 = kx for some k. To find k,

u = w1 + w2 = kx + w2 = u x = (kx + w2 ) x

= k(x x) + w2 x dot prod arith

= kkxk2 + w2 x x x = kxk2

= kkxk2 w2 x = 0
ux
k= .
kxk2

This shows that the projection onto x is

ux
projx u = w1 = kx = x.
kxk2
114 Length and direction

Corollary 5.1.19. If {vi }ni=1 is an orthogonal basis of Rn , then for any u Rn ,

n n
X X u vi
u= projvi u = v.
2 i
i=1 i=1
kv ik

If {vi }ni=1 is an orthonormal basis of Rn , then for any u Rn ,

n
X n
X
u= projvi u = (u vi )vi .
i=1 i=1

Recall that a basis allows us to break a vector into parts and deal with each part
separately:

n
X n
X n
X
TA (u) = Au = A (u vi )vi = (u vi )Avi = (u vi )TA (vi ).
i=1 i=1 i=1

We saw this formula before, but orthogonality or orthonormality means that now the
coefficients are given more explicitly (provided {vi } is given).

HW 5.1
(i) Find an orthonormal basis for the subspace of R3 consisting of vectors of the form
(a, a + b, b).

(ii) Find the projections of each vector onto each of the other two:

5 4 3


5 2 1
a=
, b=
, c=

5 4 3


5 8 9

(iii) Find an orthonormal basis for R2 that contains a vector parallel to (1, 1) and write each
of the following vectors in terms of it: a = (2, 1), b = (3, 4), e1 = (1, 0), e2 = (0, 1).
Also, sketch the orthogonal decomposition of each vector, in the basis you found.

5.2 Cross Product


Skip.
5.3 Inner product spaces 115

5.3 Inner product spaces


Recall the properties of the dot product:

Theorem 5.3.1. If u, v, w are vectors and k is a scalar, then

1. (Commutativity) u v = v u.

2. (Positive definite) v v > 0 iff v 6= 0, and v v = 0 iff v = 0.

3. (Distributivity) u (v + w) = u v + u w.

4. (Scalar associativity) k(u v) = (ku) v.

We now use this to define general inner products axiomatically:

Definition 5.3.2. If V is a real vector space, then an inner product on V is a symmetric


positive semidefinite bilinear mapping V V R. In other words, any function that
assigns each pair of vectors u, v to a real number hu, vi, in such a way that

1. (Symmetry) hu, vi = hv, ui.

2. (Positive definite) hv, vi > 0 iff v 6= 0, and hv, vi = 0 iff v = 0.

3. (Linearity) hu, av + bwi = ahu, vi + bhu, wi, for a, b R.

Definition 5.3.3. An inner product space is a vector space V , equipped with an inner
product.

Example 5.3.4. Rn is an inner product space with the dot product, which is also called
the standard inner product. In this case, the inner product is given by a matrix product

v1

n
X h i v2

= uT v

hu, vi = u v = ui vi = u1 u2 ... un
..
i=1
.

vn

If C is a matrix satisfying certain properties, Rn can also be equipped with a nonstandard


inner product

c11 c12 ... c1n v1

c21
i c22 ... c2n v2
h
hu, viC = uT Cv =

u1 u2 ... un ..

.. .. ..

..


. . . . .

cn1 cn2 ... cnn vn
116 Length and direction

Well see that in order for C to give an inner product in this way:

1. (Symmetry) hu, viC = hv, uiC iff C = C T .

2. (Positive definite) hu, viC is positive definite iff uT Cu > 0 whenever v 6= 0.

3. (Linearity) This follows automatically from the linearity of matrix multiplication.

One important way to find a symmetric and positive definite matrix C is by using an
ordered basis S = (u1 , u2 , . . . , un ): Given S, define C by cij = hui , uj i.

Theorem 5.3.5. Let S = (u1 , . . . , un ) be an ordered basis for an inner product space V
and define C = [cij ] by cij = hui , uj i. Then

(i) C is symmetric.

(ii) C determines hv, wi for every v, w V in the sense that hv, wi = [v]TS C[w]S .

Proof. (i) This is straightforward: cij = hui , uj i = huj , ui i = cji .


(ii) Write v and w in terms of the basis as

n
X n
X
v= a i ui and w= bj uj ,
i=1 j=1

so that [v]S = (a1 , . . . , an ) and [w]S = (b1 , . . . , bn ). Then


* n n
+
X X
hv, wi = a i ui , bj uj
i=1 j=1
n
X n
X
= ai bj hui , uj i
i=1 j=1
Xn Xn
= ai cij bj
i=1 j=1

= [v]TS C[w]S .

Example 5.3.6. Let S = (e1 , e2 , . . . , en ) be the standard basis on Rn . Then



1,

i = j,
hei , ej i =
0, i 6= 0,

so that C defined by cij = hei , ej i is C = I. Then

hu, viI = uT Iv = uT v = u v.
5.3 Inner product spaces 117

Theorem 5.3.7. In Rn , Ax y = x AT y and x Ay = AT x y, for any n n matrix A.

Proof. hAx, yi = hy, Axi = y T Ax = (y T A)x = (AT y)T x = hAT y, xi = hx, AT yi, and
similarly for the other one.

Example 5.3.8. If C is defined in terms of a basis as above, then it is guaranteed to be


positive definite. In general, however, it can be tricky to check that a given matrix C is
positive definite. To do this, one would compute as follows: suppose

2 1
C=
1 2

Then

h i 2 1 x1
xT Cx = x1 x2 = 2x21 + 2x1 x2 + 2x22 = x21 + x22 + (x1 + x2 )2 ,
1 2 x2

which is strictly positive iff x 6= 0.

Example 5.3.9. Let V = C(0, 1) be the vector space of all continuous function on the
closed unit interval [0, 1]. For f, g V , define

Z 1
hf, gi := f (t)g(t) dt.
0

This makes C(0, 1) into an inner product space, as we now check:

1. (Symmetry) hu, vi = hv, ui.


This boils down to the commutativity of multiplication for real numbers:
Z 1 Z 1
f (t)g(t) dt = g(t)f (t) dt.
0 0

2. (Positive definite) hv, vi > 0 iff v 6= 0, and hv, vi = 0 iff v = 0.


This is immediate from basic properties of the integral:
Z 1 Z 1
2 2
f (t) 0 = f (t) dt 0 dt 0.
0 0

Also, if f is nonzero anywhere, then f is nonzero on some interval [a, b] [0, 1]


118 Length and direction

(because f is continuous). Consequently, f (t)2 > 0 for a t b and

Z 1 Z b Z b
f (t)2 dt f (t)2 dt dt = (b a) > 0.
0 a a

For the converse, it is clear that the integral of the zero function is 0.

3. (Linearity) This is immediate from the linearity properties of integrals:

Z 1 Z 1 Z 1
(af (t) + bg(t)) dt = a f (t) dt + b g(t) dt
0 0 0

Example 5.3.10. A standard and classical use of nonstandard inner products in the
above context is to replace
Z 1
hf, gi := f (t)g(t) dt
0

with
Z 1
hf, gi := f (t)g(t)w(t) dt,
0

where w(t) > 0 is weight function. For example, a probability density function.

Definition 5.3.11. A norm on a vector space V is a function V R+ which satisfies:

(i) (Positive semidefiniteness) kuk 0, and kuk = 0 iff u = 0.

(ii) kkuk = |k| kuk.

(iii) (Triangle inequality) ku + vk kuk + kvk.

A norm does not always have a corresponding inner product, but if you are given an
inner product, you can always define a norm in terms of it.

Lemma 5.3.12. If V is an inner product space, then a norm can be defined on V by


p
kvk := hu, ui.

Proof. HW. For the triangle inequality, use Cauchy-Schwarz inequality: |hx, yi| kxkkyk.

Theorem 5.3.13. Suppose (V, h, iC ) is an inner product space, where the matrix C = [cij ]
is defined by cij = hui , uj i, for some ONB S = (u1 , u2 , . . . , un ). The for vectors expressed
5.3 Inner product spaces 119

in terms of this ONB, the inner product behaves just like the dot product, i.e.,

X X X
v= ai ui , w = bi ui = hv, wiC = ai bi .

Proof. With respect to the basis S, the matrix C = [cij ] defined by cij = hui , uj i is just
the identity matrix I, so

b1

h i
b2 X
hv, wi = [v]TS C[w]S = [v]TS [w]S = a1 a2 ... an

=
a i bi .
...

bn

HW 5.3: #2, 3, 4, 10, 20, 24, 41, 48


Let C 1 (a, b) be the set of continuously differentiable functions on [a, b]:

.
C 1 (a, b) = {f : [a, b] R .. f 0 exists and is continuous on [a, b]}.

Let C01 (a, b) be the subset of C 1 (a, b) consisting of functions which vanish at the endpoints:

.
C01 (a, b) = {f C 1 (a, b) .. f (a) = f (b) = 0}.

Consider the transformation

D : C 1 (a, b) C 0 (a, b) defined by D(f ) = f 0 .

(1) Check that C 1 (a, b) is a vector space, and that C01 (a, b) is a subspace of C 1 (a, b).

(2) Check that D is a linear transformation.

(3) Check that D is a antisymmetric transformation, in the sense that

hf, Dgi = hDf, gi,

Rb
where hf, gi := a
f (t)g(t) dt.

(4) Explain why the property in the previous part agrees with the meaning of antisym-
metric as we apply it to matrices.
120 Length and direction

5.3.1 Discursion relating to ODE and many other applications

There are some technical issues that arise with infinite-dimensional vector spaces. Consider
the example of the vector space of sequences RN : this is infinite-dimensional with basis

1,

j=k
(e1 , e2 , e3 , . . . ) = (ej )
j=1 , (ej )k = ej (k) =
0, else.

The definition of the norm given here may not exist for all sequences.

Example 5.3.14. For u = (1, 1, 1, . . . ) RN ,


X
X n
X
kuk2 = u2i = 1 = lim 1 = lim n = .
n n
i=1 i=1 i=1

This limit doesnt exist, so neither does kuk!

Conclusion: RN is too big. Instead, consider just the sequences with finite norm:

.
`2 (N) := {u RN .. kuk < }.

Further complication: with infinite-dimensional vector spaces, there are different norms:
the norm may determine whether or not something is in the vector space. For example:


.
X
`1 (N) := {u RN .. kuk1 < }, kuk1 := |ui |.
i=1

Then one can show the strict containment `1 (N) `2 (N). For example, consider

u = (1, 21 , 13 , . . . , n1 , . . . ).

Then you have


!1/2
X X 1 2 X X 1
kuk2 = u2n = 2
= but kuk1 = |un | = = .
n=1 n=1
n 6 n=1 n=1
n

What this means in terms of the normed vector spaces `2 (N) and `1 (N), is that

u `2 (N) but / `1 (N).


u
5.3 Inner product spaces 121

In general, for 1 p < , the vector spaces


!1/p
.
X
p p
` (N) := {u RN .. kukp < }, kukp := |ui |
i=1

come up often, and are well-studied in a variety of applications. (Note that p, q do not
need to be integers!) The limiting case is also well-studied:

.
` (N) := {u RN .. kuk < }, kuk := sup |ui |.

For 1 p < q , one has a strict containment `p (N) `q (N).

Remark 5.3.15. Of all the normed vector spaces `p (N), for 1 p , the only one of
these norms that has an associated inner product (i.e., that can be defined in terms of an
p
inner product via kuk = h, u, ui) is the case p = 2.

Theorem 5.3.16 (Cauchy-Schwarz Ineq). In any inner product space, |hx, yi| kxkkyk.

Alternative form in Rn : if you square both sides, this reads

n
!2 n
! n
!
X X X
xi yi x2i yi2
i=1 i=1 i=1

The proof requires a lemma about numbers.

Lemma 5.3.17. Let a, b, c be fixed (real) numbers with b, c 0, and let t > 0 be a real
variable. Then
0 b 2ta + t2 c, t = a2 bc.

Proof. First, suppose c = 0. Then for every t R, we have

1
2ta b = a 2t b.

Since this is true for ANY t, must have a = 0.


Now suppose c 6= 0. Since we are assuming the inequality to be true for ANY t, it is
true for t = ac , in which case

a2 a2
0 b 2 ac a + c2 c = c b = a2 bc.

Pn
Proof of Cauchy-Schwarz in Rn . Let t > 0 be a variable again. Then i=1 (xi tyi )
2
0,
122 Length and direction

because it is a sum of nonnegative numbers. Expanding the summand,

n
X
0 (xi tyi )2
i=1
n
X
= (x2i 2tyi2 + t2 yi2 ) FOIL
i=1
Xn n
X n
X
= x2i 2txi yi + t2 yi2 rearrange
i=1 i=1 i=1
Xn n
X n
X
= x2i 2t xi yi + t2 yi2 factor out
i=1 i=1 i=1

= b 2ta + t2 c,

Pn Pn Pn
where b = i=1 x2i 0 and c = i=1 yi2 0 and a = i=1 xi yi may be any number. By
the lemma, a2 bc, and hence

n
!2 n
! n
!
X X X
xi yi x2i yi2 .
i=1 i=1 i=1

Theorem 5.3.18 (Triangle Inequality). kx + yk kxk + kyk.

Proof. We work with the squares and take roots at the end.

kx + yk2 = hx + y, x + yi

= hx, xi + 2hx, yi + hy, yi

= kxk2 + 2hx, yi + kyk2 kuk2 = hu, ui

kxk2 + 2|hx, yi| + kxk2 u |u|

kxk2 + 2kxkkyk + kyk2 Cauchy-Schwartz

= (kxk + kyk)2

kx + yk kxk + kyk.

Corollary 5.3.19 (Pythagoras). Let x, y be orthogonal. Then kx + yk2 = kxk2 + kyk2 .

Proof. If x, y orthogonal, then hx, yi = 0, so from the previous proof,

kx + yk2 = kxk2 + 2hx, yi + kyk2 = kxk2 + kyk2 .

Theorem 5.3.20 (Polarization identity). For x, y Rn , 4hx, yi = kx + yk2 kx yk2 .


5.3 Inner product spaces 123

Proof. FOIL out kx + yk2 and kx yk2 :

kx + yk2 = hx + y, x + yi = kxk2 + 2hx, yi + kyk2

kx yk2 = hx y, x yi = kxk2 2hx, yi + kyk2 .

Subtracting the second from the first gives the identity.

The polarization identity allows you to express a dot product in terms of norms.

Theorem 5.3.21 (Parallelogram law). For x, y Rn , kxk2 + kyk2 = kx + yk2 kx yk2 .

Proof. HW.

Think about this diagram when studying the Parallelogram and Polarization identities:

? NNNN gg3
NNN ggggggggg
NNNxy gggg
NNN gggggggg
g
ggNN
y
gg ggggg NNNN
ggx+y
g gg NNN
gggg ggggg NNN
g N/'
gg x

HW 5.3: #16, 33, 34, 35, 40, 42, 46


Keep the proof of the polarization identity in mind when doing #16.
124 Length and direction

5.4 The Gram-Schmidt Algorithm


Corollary 5.4.1. If {vi }ni=1 is an orthonormal basis of the inner product space (Rn , h, , i),
then for any u Rn ,

n
X
u= hu, vi ivi .
i=1

Example 5.4.2. Let



2 2 1
3 3 3 3

u = 4 , and v1 = 23 , v2 = 1 , v3 = 2

3 3


1
5 3 23 2
3

One can check that {v1 , v2 , v3 } is orthonormal, so there is a unique way to write u as a
linear combination of the vi s.
Without the theorem, youd need to solve a system of 3 equations in 3 unknowns:

u = c1 v1 + c2 v2 + c3 v3 .

With the theorem, you only need to compute some inner products:

6 8 5
c1 = hu, v1 i = 3 3 + 3 = 1,
6 4 10
c2 = hu, v2 i = 3 + 3 3 = 0,
3 8 10
c3 = hu, v3 i = 3 + 3 + 3 = 7,

to find u = v1 + 7v3 .

The Gram-Schmidt process is an algorithm which removes linear dependency from a


set of given vectors in an inner product space. In particular, this means it can be used to
convert a generic basis into an orthonormal basis.

Definition 5.4.3. When U = span{u1 , . . . , uk } is a subspace of a vector space V , and


{u1 , . . . , uk } are orthogonal vectors,

hv, u1 i hv, un i
projU v := u1 + + un .
hu1 , u1 i hun , un i

So v = projU v + (v projU v), where projU v U and projU v (v projU v).


5.4 The Gram-Schmidt Algorithm 125

Theorem 5.4.4 (Gram-Schmidt). Let V be an inner product space and let W be a nonzero
m-dimensional subspace of V . Then there exists an orthonormal basis for W .

Proof. Since every vector space has a basis, let S = {u1 , u2 , . . . , um } be any basis for W .
We construct a new basis, using these vectors, as follows:

(1) Define v1 := u1 .

(2) Use projections to decompose u2 into its components which are parallel to and
orthogonal to v1 = u1 :


u2 = projv1 u2 + u2 projv1 u2 ,

and then define

hu2 , v1 i
v2 := u2 projv1 u2 = u2 v1
hv1 , v1 i

Note that v2 6= 0, since {ui }m


i=1 is a basis of W , and is hence linearly independent.

Also, note that v2 is orthogonal to v1 by construction:


 
hu2 , v1 i
hv1 , v2 i = v1 , u2 v1
hv1 , v1 i
 
hu2 , v1 i
= hv1 , u2 i v1 , v1
hv1 , v1 i
hu2 , v1 i
= hv1 , u2 i hv1 , v1 i
hv1 , v1 i
= 0.

(3) Use projections to decompose u3 into its components which lie in span{v1 , v2 } and are
orthogonal to span{v1 , v2 }:

 
u3 = projspan{v1 ,v2 } u3 + u3 projspan{v1 ,v2 } u3 ,

and then define

v3 := u3 projspan{v1 ,v2 } u3 .

6 0, and that v3 is orthogonal to both v1 and v2 by construction.


Again, note that v3 =
126 Length and direction

(4) Continue to construct the remaining basis elements via

vk := uk projspan{v1 ,...,vk1 } uk ,

until you run out of uk s (when k = m).

By construction, this gives an orthogonal basis {vi }m


i=1 of W . For the final step, define

vi
wi := ,
kvi k

to obtain an orthonormal basis {wi }m


i=1 .

Example 5.4.5. Find an ONB for the subspace of R3 spanned by {x, y, z}, where

1 2 3

x = 1 , y = 0 , z = 3 .


0 2 3

(Note that these are independent.)


First:

1

v1 = x = 1 .


0

Second: compute v2 = y projspan{v1 } y.


2 1 1
2+0+ 0
v2 = y projv1 y = 0 p  2 1 = 1


2
1 + (1)
 2
2  0 2

Check that v2 v1 : hv2 , v1 i = 1 1 + 0 = 0.


Third: compute v3 = z projspan{v1 ,v2 } z.


3 1 1 33+1 1
3+3+0 336
v3 = z projv1 z projv2 z = 3 2 1 2 1 = 3 + 3 + 1 = 1

2 6
3 0 2 302 1

Check that v3 span{v1 , v2 }: hv3 , v1 i = 1 1 + 0 = 0 and hv3 , v2 i = 1 + 1 2 = 0.


5.4 The Gram-Schmidt Algorithm 127

So {v1 , v2 , v3 } is an orthogonal basis. To normalize and obtain an ONB, take



1 1 1
1 1 1
w1 = 1 , w2 = 1 , w3 = 1 .

2 6 3
0 2 1

HW 5.4: #7, 8, 10, 11, 14, 18, 3133


128 Length and direction

5.5 Orthogonal complements


Definition 5.5.1. Let W be a subspace of an inner product space V . A vector u V is
orthogonal to W iff it is orthogonal to every vector in W , i.e.,

wW = hu, wi = 0.

The set of all vectors in V that are orthogonal to W is called the orthogonal complement
of W and written

.
W := {v V .. v W }.

Theorem 5.5.2. Let W be a subspace of an inner product space V . Then

(1) W is a subspace of V .

(2) W W = {0}.

In cases when V is finite-dimensional, we also have

(3) V = W W .

(4) (W ) = W .

Proof. Part (1) already appeared in the HW: suppose that x, y W . Then

wW = hw, ax + byi = ahw, xi + bhw, yi = a 0 + b 0 = 0.

For part (2), if u W W , then u is orthogonal to itself: hu, ui = 0. But then


kuk = 0, so u = 0.
For part (3), let {wi }ki=1 by a basis of W , which we can take to be orthonormal by
Gram-Schmidt. For v V , take u = v projW v. Then
* k
+
X
hu, wi i = hv projW v, wi i = hv, wi i hv, wj iwj , wi
j=1
k
X
= hv, wi i hv, wj i hwj , wi i
j=1

= hv, wi i hv, wi i hwi , wi i

=0
5.5 Orthogonal complements 129

which shows that u is orthogonal to every basis vector of W , and hence also to every linear
combination of basis vectors. In other words, u is orthogonal to everything in W . So
u W .
For part (4), if w W , then w is orthogonal to every u W , so w (W ) . This
shows that W is a subspace of (W ) . To see that (W ) is a subspace of W , pick any
v (W ) and show that v W . By part (3), we can write v as v = w + u with w W
and u W , so well have v W if we can show u = 0:

hu, ui = hu, v wi = hu, vi hu, wi = 0 0 = 0.

Theorem 5.5.3 (Fundamental Theorem of Linear Algebra). Let A be an m n matrix.

(a) The null space of A is the orthogonal complement of the row space of A.

(b) The null space of AT is the orthogonal complement of the column space of A.

Rm = colsp(A) null(AT)
Rn = rowsp(A) null(A)

rowsp(A) = ran(AT) Axp = b


rank(A) = r = dim rowsp(A) colsp(A) = ran(A)
Ax = b rank(A) = r = dim colsp(A)
xp
x = xp + xh Axh = 0 b

A
null(AT)
0 xh 0
null(A) nullity(AT) = m-r
AT
nullity(A) = n-r

Theorem 5.5.4. Let W be a finite-dimensional subspace of the inner product space V ,


and pick v V . The closest point of W to v is projW v, i.e.,

kv projW vk = min kv wk.


wW

Proof. Letting w W and decomposing v w according to V = W W , we have

v w = (projW v w) + (v projW v)
130 Length and direction

with projW v w) W and (v projW v) W . Since these are orthogonal, Pythagoras


says that

kv wk2 = k projW v wk2 + kv projW vk2 .

The choice of w that minimizes this expression is w = projW v, which leaves kv wk2 =
kv projW vk2 . The conclusion follows by taking square roots.

HW 5.5: #4, 10, 11, 13, 18, 22, 26, 28


Chapter 7

Eigenvalues and eigenvectors

7.1 Eigenvalues

Definition 7.1.1. Let L : V V be a linear operator on a vector space V . The number


is called an eigenvalue of L iff there is a nonzero vector x V such that

L(x) = x.

Any x satisfying this equation is an eigenvector of L for the eigenvalue .


The eigenspace of is

.
E := span{x .. Ax = x}.

Note that x 6= 0. Otherwise, L(x) = L(0) = 0 = 0 for any and the definition would
be meaningless.

Any scalar multiple of an eigenvector is also an eigenvector:

L(cx) = cL(x) = cx = (cx)

Theorem 7.1.2. If dim V = n and S = {vk }nk=1 is a basis for V , then

L(x) = Ax,

where the n n matrix A has k th column equal to the S-coordinates of L(vk ).


132 Eigenvalues and eigenvectors

Note, however, that the eigenvalues are independent of any choice of basis.

7.1.1 Finding eigenvalues

To go about finding eigenvalues:

L(x) = x x Ax = 0 (I A)x = 0.

This system always has the trivial solution x = 0, but we are interested exclusively in
nonzero solutions when looking for eigenvalues, which means we need to find that satisfy

det(I A) = 0.

Example 7.1.3. Suppose L([ 10 ]) = [ 12 ] and L([ 01 ]) = [ 21 ]. Find the eigenvalues and vectors
of L.

In the standard basis, L(x) = Ax for



1 2
A= ,
2 1

so find eigenvalues via



1 2

det(I A) = = ( 1)2 4 = 2 2 3 = ( + 1)( 3)
2 1

so the eigenvalues are = 1, 3.

Now, to find the eigenvectors.

For = 1, solve the homogeneous system (I A)x = 0:



1 1 2 2 2 1 1 1
= = x1 + x2 = 0 = x= .
2 1 1 2 2 0 0 1

For = 3, solve the homogeneous system (I A)x = 0:



31 2 2 2 1 1 1
= = x1 x2 = 0 = x=
2 31 2 2 0 0 1
7.1 Eigenvalues 133

Check that these are eigenvectors:



1 2 1 12 1 1
= 1 : = = = (1)
2 1 1 21 1 1

1 2 1 1+2 3 1
=3: = = = (3)
2 1 1 2+1 3 1

Plotting these vectors before and after multiplication by A, one sees that (1, 1) flips
and (1, 1) is scaled by a factor of 3.

Example 7.1.4. Solve the eigenproblem for



4 2 2

A = 2 1 4 .


0 0 3


4 2
2


3 2
det(I A) = 4 = 6 + 5 + 12 + 0 + 0 0 0 (12 4)

2 +1

0 3

0

= (2 6 + 9)

= ( 3)2

so the eigenvalues are = 0, 3, 3. Here 3 is an eigenvalue of multiplicity 2.

It is important to record the multiplicity of the eigenvalues!

Now, to find the eigenvectors.

For = 0, solve the homogeneous system (I A)x = 0:



4 2 2 4 2 0 2 1 0

4 2 0 0

2 1 1 0 1

0 0 3 0 0 1 0 0 0
134 Eigenvalues and eigenvectors

This gives x3 = 0 and 2x1 + x2 = 0, whence an eigenvector is any multiple of



1

x = 2


0

For = 3, solve the homogeneous system (I A)x = 0:



1 2 2 1 2 2

4 0

2 4 0 0

0 0 0 0 0 0

This gives x1 + 2s + 2t = 0, whence an eigenvector is any vector of the form



2s 2t 2 2

x= = s 1 + t 0

s

t 0 1

In other words, E3 is a 2-dimensional eigenspace with basis





2 2



1 , 0





0 1

Check that these are eigenvectors:



4 2 2 1 44+0 0 1

=0: 2 1 4 2 = 2 + 2 + 0 = 0 = (0) 2


0 0 3 0 0+0+0 0 0

4 2 2 2 8 + 2 + 0 6 2

=3: 2 1 4 1 = 41+0 = 3 = (3) 1


0 0 3 0 0+0+0 0 0

4 2 2 2 8 + 2 + 0 6 2

2 1 4 0 = 0+0+0 = 0 = (3) 0


0 0 3 1 3+0+0 3 1

Geometric interpretation: there is a projection. The eigenvector for = 0 is a basis for


7.1 Eigenvalues 135

null(A). You can see that it is orthogonal to the rowspace of A:



4 2 2 2 1 0

2 0



A = 2 1 4 0 = rowspace(A) = span 1 , 0

0 1




0 0 3 0 0 0 0 1

Note that this basis for null(A) has nothing to do with the eigenvectors of A except that
they are orthogonal to E0 (the similarity to the basis vectors of E3 is coincidental).

Handy tip: To factorize a polynomial of degree 3 or higher, start by looking for


integer factors of a0 . This will give you one eigenvalue; then you can reduce the degree by
polynomial long division.

Example 7.1.5. Suppose p() = 8 + 2 52 + 3 . The factors of 8 are 1, 2, 4, 8,


so check these by evaluation:

p(1) = 8 + 2 5 + 1 = 6 6= 0

p(1) = 8 2 5 1 = 0,

so = 1 is a root. Then divide out ( + 1) to get

p()
= 2 6 + 8.
+1

Now it is easy to see that the other eigenvalues are = 2 and = 4.


Refresher on polynomial long division: x2 6x + 8

x+1 x3 5x2 + 2x + 8
x3 x2

6x2 + 2x
6x2 + 6x

8x + 8
8x 8

Theorem 7.1.6. Suppose you have a polynomial

p() = a0 + a1 + a2 2 + + an1 n1 + n .
136 Eigenvalues and eigenvectors

Then the product of the roots is (1)n a0 . Furthermore, if the coefficients ai are integers,
then the roots must also be integers.

HW 7.1: #6abd, 7, 8ac, 14, 15 (&16), 18, 21, 25, 27, 28


Also: prove that if 0 is an eigenvalue of A, then E0 = null(A). Conversely, if null(A) 6=
{0}, prove that 0 is an eigenvalue of A.

7.1.2 Complex and degenerate eigenvalues

Example 7.1.7. Suppose L is given by L(x) = Ax where



cos sin
A= ,
sin cos

Find the eigenvalues and vectors of L.


Geometric interpretation of this example: the given transformation has the effect of

rotating everything in R2 counterclockwise by 2.

Every vector in R2 changes direction under this transformation.


Therefore, this transformation has no eigenvalues. Or does it?
Since

cos

sin
det(I A) = = 2 2 cos + 1
sin cos

p
so the eigenvalues are = cos cos2 1.
If a polynomial with real coefficients has complex roots, these roots will always
appear as complex conjugate pairs. In other words, if a + ib is a root, then a ib
will also be a root. Note: ( (a + ib))( (a ib)) = 2 2a + (a2 + b2 ).
To make the computations less intense, well find the eigenvectors for the case when

= 2, in which case the eigenvalues are

p
= cos cos2 1 = 0 0 1 = 1 = i.

For = i, solve the homogeneous system (I A)x = 0:



i0 1 i 1 i 1 1
= ix1 + x2 = 0 = x= .
1 i0 i 1 0 0 i
7.1 Eigenvalues 137

Notice that

1
i(i) = (1)(i)2 = (1)(1) = 1 so = i = (1)i = (i)3 .
i

So another valid choice for the eigenvector would be



1 1 i
ix = =
i i 1

We saw already that if x is an eigenvector, then so is cx, where c is any scalar.


c is now allowed to be a complex scalar.

For = i, solve the homogeneous system (I A)x = 0:



i 0 1 i 1 i 1 1
= ix1 x2 = 0 = x=
1 i 0 i 1 0 0 i

Check that these are eigenvectors:



0 1 1 0+i i 1
=i: = = = (i)
1 0 i 1+0 1 i

0 1 1 0i i 1
= i : = = = (i)
1 0 i 1+0 1 i

Geometric interpretation of this example: the given matrix has the effect of rotating

everything in R2 counterclockwise by 2.

Every vector in R2 changes direction under this transformation.


Multiplication by a complex number of norm 1 corresponds to rotation; see the related
extra credit assignment. (In general, multiplying by a complex number involves both
rotation and scaling.)

Example 7.1.8. Suppose L([ 10 ]) = [ 10 ] and L([ 01 ]) = [ k1 ]. Find the eigenvalues and vectors
of L.

This transformation is a shear : the x-axis is fixed and the vector on the y-axis at
height a gets shifted horizontally ka to the right. What lines in R2 are invariant under
this transformation?
138 Eigenvalues and eigenvectors

In the standard basis, L(x) = Ax for



1 k
A= ,
0 1

so find eigenvalues via



1 k

det(I A) = = 2 2 + 1 0 = ( 1)2
1

0

so the eigenvalues are = 1, 1.

Now, to find the eigenvectors.

For = 1, solve the homogeneous system (I A)x = 0:



11 k 0 k 0 1 t
= = x2 = 0 = x= ,
0 11 0 0 0 0 0

and

1
E2 = span .
0

Now the multiplicity of = 2 was 2, but dim E2 = 1. What happened?


(The geometric multiplicity of is defined to be dim E , so this is sometimes referred to as
the situation where the geometric multiplicity of is less than the algebraic multiplicity.)

A is defective in some sense: it has fewer eigenvectors than it should.

Definition 7.1.9. Suppose Ax = x. Define to be a degenerate eigenvector or generalized


eigenvector for iff (I A) = x.

Now if is a generalized eigenvector,

(I A)2 = (I A)(I A) = (I A)x = 0,

since x was an eigenvector. Note that is not an eigenvector. If it were, then youd have
(I A) = 0, but (I A) = x 6= 0 (0 is never an eigenvector).
7.1 Eigenvalues 139

For generalized eigenvectors of = 2, solve the nonhomogeneous system (I A) = x:



0 k 1 t 0 1
= 2 = k1 = = = + t
0 0 0 k1 k1 0

From what weve seen before, any solution of (I A) = x can be written as = p + h


with p rowspace(I A) and h E0 .
Check the eigenvector:

1 k 1 1+0 1 1
=1: = = = (1)
0 1 0 0+0 0 0

Check the degenerate eigenvector:



1 k t 1 k 1 1 k 0 1 1 t1
= t + = t + =
0 1 k1 0 1 0 0 1 k1 0 k1 k1

t
h i
Thus, when multiplied by A, every vector of the form k
1 is shifted by 1 in the direction
parallel to x = [ 0t ].
In other words, even though it is not a subspace of R2 , the line

0 1 .

+ t .. t R
1 0
k

is invariant under A.

HW 7.1: #6c, 9, 10, 11, 12, 22, 23, 29, 30b


For #23, it may help to look at 6c, which is an example of this phenomenon.
Also: find the eigendata for the 2 2 matrix [k, 0; 0, k]. Can you give an example of a
vector in R2 which is not an eigenvector of this matrix?
140 Eigenvalues and eigenvectors

7.2 Diagonalization and similar matrices


P
Weve seen that if u = ai vi is a linear combination of eigenvectors of L : V V , then

X  X X
L(u) = L ai vi = ai L(vi ) = ai i vi . (7.2.1)

Suppose that dim V = n, and that there are n linearly independent eigenvectors.

In this case, the eigenvectors of L form a basis for V , and (7.2.1) can be

expressed as multiplication by a diagonal matrix.

This is a very desirable situation, because diagonal matrices are so easy to work with.
Let S denote this basis of eigenvectors, and let D be the diagonal matrix containing
the eigenvalues of L, so we have

1 a1
..

D= .. and [u]S = ,

. .

n an

where u is expressed in S-coordinates. Now (7.2.1) becomes


a1 1 1 a1
.. ..

L(u) =
X
ai i vi = ..
= = D[u]S .

. . .

an n n an

Definition 7.2.1. Let L : V V be a linear transformation. L is diagonalizable iff there


exists a basis in which L(x) = Dx for a diagonal matrix D. The matrix A is diagonalizable
iff it can be written A = P DP 1 for a diagonal matrix D.

To diagonalize L(x) = Ax:

1. Find the characteristic polynomial p() = det(I A).

2. Find the eigenvalues, i.e., the roots of p().

3. For each eigenvalue , find a basis for the eigenspace E .

If dim E = mult() for each , then A is diagonalizable.

If dim E < mult() for some , then A is not diagonalizable.

4. If A is diagonalizable, write D = diag(1 , . . . , n ) and P = [v1 . . . vn ].


7.2 Diagonalization and similar matrices 141

Here, the vk are the columns of P , and they must appear in the same order as the k .
Note: this method assumes you already have A, i.e., you have already picked a basis).

Example 7.2.2. Let L : R3 R3 be defined by



u1 2u1 u3

L u2 = u1 + u2 u3



u3 u3

With respect to the standard basis, this operator is given by matrix multiplication by

2 0 1

A= 1 1 .

1

0 0 1

However, consider the ordered basis





1 0 1


S = 0 , 1 , 1 .





1
0 0

The transition matrix P from S to the standard basis is



1 0 1

P = 0 1 ,

1

1 0 0

so the representation of L with respect to S is given by multiplication by



0 0 1 2 0 1 1 0 1 1 0 0

D = P 1 AP = 1 1 0 1 = 0 0 .

1 1 1 1 1 1

1 0 1 0 0 1 1 0 0 0 0 2

Definition 7.2.3. If A and B are n n matrices, they are called similar iff there is a
nonsingular matrix P such that B = P 1 AP . This is written A ' B.

The following results are the two key facts about similar matrices.

Theorem 7.2.4. A ' B if and only if A and B represent the same linear transformation
L : V V . (If A 6= B, then there are two different bases involved.)
142 Eigenvalues and eigenvectors

Sketch of proof. Since P is invertible, it can be understood as a transition matrix between


two bases, with a bit of computation. Conversely, if there are two bases given, then take P
to be the transition matrix and A ' B will follow.

Corollary 7.2.5. Similar matrices have the same eigenvalues.

Proof. Since eigenvalues are independent of any choice of basis for V , this follows from the
above theorem.

One can also prove the corollary with a calculation that is useful in its own right.
Suppose that A ' B and we denote the corresponding characteristic polynomials by pA ()
and pB (). Then:

pB () = det(I B)

= det(I P 1 AP )

= det P 1 IP P 1 AP


= det P 1 (I A)P


= det P 1 det(I A)(det P )




= det(I A)

= pA ()

For finite-dimensional vector spaces, this discussion implies:

Theorem 7.2.6. A linear transformation L : V V is diagonalizable if and only


if the eigenvectors of L form a basis of V . Moreover, if D is the matrix representing
L in this basis, then the entries on the diagonal of D are the eigenvalues of L.

In terms of matrices, this means:

Theorem 7.2.7. An n n matrix A is similar to a diagonal matrix D if and only if A


has n linearly independent eigenvectors. In this case, the entries on the diagonal of D are
the eigenvalues of A.

There is also a useful result that tells us when we are in this desirable situation:

Theorem 7.2.8. If 1 , . . . , k are eigenvalues of A that are all different (i.e., all have
multiplicity 1), then the corresponding eigenvectors {v1 , . . . , vk } are linearly independent.
In particular, if all the eigenvalues of A are distinct, then A is diagonalizable.
7.2 Diagonalization and similar matrices 143

Example 7.2.9. It is easy to tell if triangular matrices are diagonalizable:



2 0 0 0


1 4 0 0
A=
2

2 5 0

1 3 8 1

Note that the theorem gives a sufficient but not a necessary condition! A matrix can
be diagonalizable without having distinct eigenvalues, just like an animal can have feathers
without being a duck!

Example 7.2.10. Earlier, we looked at the diagonalization



2 0 1 1 0 1 1 0 0 0 0 1

1
1 = A = P DP = 0 0 1 1 .

1 1 1 1 0 1 1

0 0 1 1 0 0 0 0 2 1 0 1

From Theorem 7.2.6, we can infer that the eigenvalues of A are = 2, 1, 1.


Can you also see this just from looking at A?

HW 7.2: #6, 8, 10, 16, 17abc, 22, 24


I recommend doing #24 after doing (2) below:

(1) If A is similar to B, write A ' B. Show that similarity is an equivalence relation,


i.e., that it satisfies the following properties:

(a) A ' A.
(b) If A ' B, then B ' A.
(c) If A ' B and B ' C, then A ' C.

(2) Suppose that A ' B. In this case, prove the following:

(a) If A ' B, then AT ' B T .


(b) det A = det B.
(c) If A is nonsingular, then B is nonsingular.
(d) If A is nonsingular, then A1 ' B 1 .

If you do the Dynamical Systems extra credit, you will encounter a matrix in Problem
4 which superficially resembles the matrix in #16a (above) very closely. However, when
you work through these problems, youll notice that they behave very differently.
144 Eigenvalues and eigenvectors

7.3 Diagonalization of symmetric matrices


Given a n n matrix A, we consider the problem of diagonalizing A: A = P DP 1 .
It turns out that the case of symmetric matrices A = AT is especially important.

Lemma 7.3.1. Let A be a symmetric matrix with eigenvalues {k }nk=1 and corresponding
eigenvectors {vk }nk=1 . Then

(i) If j 6= k , then vj vk .

(ii) If j has multiplicity m, then j has m linearly independent eigenvectors.

Proof. For (i), we use the key fact about symmetric matrices: you can move them around
inside an inner product.

j hvj , vk i = hj vj , vk i = hAvj , vk i = hvj , Avk i = hvj , k vk i = k hvj , vk i.

Since j 6= k , the only way this can be true is if hvj , vk i = 0.


The proof of (i) would take a couple of weeks, so well skip that one, too.

Symmetric matrices are important because symmetric operators are important.

Definition 7.3.2. L : V V is a symmetric linear operator iff hL(x), yi = hx, L(y)i for
any x, y V .

It is easy to see that L is symmetric iff its representation L(x) = Ax in any basis is
given by a symmetric matrix:

hAx, yi = hL(x), yi = hx, L(y)i = hx, Ayi = hAT x, yi = A = AT .

Theorem 7.3.3 (Spectral theorem for finite-dimensional vector spaces). Let dim V = n
and let L : V V be a symmetric linear operator (hL(x), yii = hx, L(y)i). Then V has a
ONB consisting of eigenvectors of L, and all the eigenvalues of L are real.

Proof. Well skip the proof that R, as this could take a couple of weeks. The part
about the ONB follows from the lemma: for each eigenspace E , you can find an ONB by
Gram-Schmidt. By part (i) of the lemma, all the eigenspaces are orthogonal, and part (ii)
of the lemma, collecting these bases gives you a collection of n vectors. Since they are all
orthogonal, you have an ONB.
7.3 Diagonalization of symmetric matrices 145

In the diagonalization A = P DP 1 of A, this means that P has the form



| |

P = v1

... vn

| |

and as weve seen in the HW,



v1T | | 1
..

PTP = .. P T = P 1
vn = =

. v1 ... .

vnT | | 1

Definition 7.3.4. A square matrix A is orthogonal iff AT = A1 , i.e., iff AT A = I.

Theorem 7.3.5. A is orthogonal if and only if the columns/rows of A form an ONB.

Theorem 7.3.6. If L : V V is defined by L(x) = Ax and A is orthogonal, then L


preserves all angles and distances.

Proof. For any x, y V ,

hL(x), L(y)i = hAx, Ayi def of L

= hx, AT Ayi hBu, vi = hu, B T vi

= hx, A1 Ayi A is orthogonal

= hx, yi.

Thus, the angle between x and y is the same, before and after applying L. This also shows
that kL(x)k2 = hL(x), L(x)i = hx, xi = kxk2 , so that the lengths of the vectors dont
change either.

As a consequence, the geometry of an orthogonal transformation is very simple: it


corresponds to a rigid rotation and/or reflection.

Definition 7.3.7. An isometry is a linear transformation for which kL(x)k = kxk for all
xV.

In terms of the Basis Lemma, L is an isometry if and only if L gives a bijection between
two ONBs. (Since both are orthogonal, all angles are preserved.)
146 Eigenvalues and eigenvectors


4 2 4

Example 7.3.8. Diagonalize A = 2 2 .

1

4 2 4

Eigenvalues:

4 2 4



3 2 2
det(I A) = 2 1 2 = 9 = ( 9)


4 2 4

So the eigenvalues are = 9, 0, 0.

Eigenvectors: 1 = 9.


5 2 4 1 0 1

9I A = 2 2 0 1 21

8

4 2 5 0 0 0


2

So x1 = x3 and x2 = 12 x3 gives eigenvector v1 = 1 .


2

2 = 3 = 0.

1
4 2 4 1 2 1

0I A = 2 1 2 0

0 0

4 2 4 0 0 0

So x1 + 12 x2 + x3 = 0, or x1 = 12 x2 x3 , which gives

2s t 2s t 1 1

= s + 0 = v2 = 2 , v3 = 0

s

t 0 t 0 1

It is clear that v2 v1 and v3 v1 but v2 6 v3 . However, this just results from laziness
with our choice of s = 2 and t = 1 to make the vk s have integer entries. We could instead
choose an orthogonal basis of E0 by finding an element av2 + bv3 E0 which is orthogonal
7.3 Diagonalization of symmetric matrices 147

to v2 :

a b
h i
hv2 , av2 + bv3 i = 1 = a + b + 4a = 5a + b,

2 0 2a

b

so choose b = 5a to get something orthogonal to v2 , so let a = 1 and b = 5:



1 1 4

v3 = (1) 2 + 5 0 = 2


0 1 5

This new v3 is orthogonal to both v1 and v2 , so normalizing gives an ONB.


The key points of this example are that for this symmetric matrix A:

(i) the eigenvalues of multiplicity k have k independent eigenvectors, and

(ii) we were able to find an ONB for R3 consisting of eigenvectors of A.

HW 7.3: #3, 4, 8, 9, 16, 17, 27, 37, 38


For #9, describe the transformation associated to each of these matrices in geometric
terms.
148 Eigenvalues and eigenvectors

7.3.1 Epilogue: an application of eigenvectors to calculus that


youve probably already seen (but maybe didnt know it)

In multivariable calculus, when searching for the extrema of f : Rn R, a R-valued


function of n variables, you first find critical points of f by looking for x = (x1 , . . . , xn )
where f (x) = 0. In single-variable calculus, this is called the 1st derivative test. Once
you have located a critical point x, the next step is to determine whether x is a minimum
or a maximum of f (or neither). For this, you use the 2nd derivative test:

If f 00 (x) < 0 then f has a local maximum at x.

If f 00 (x) > 0 then f has a local minimum at x.

If f 00 (x) = 0, the second derivative test says nothing about the point x, except that
it may be an inflection point.

The second partial derivatives of a function correspond to curvature (often referred to


as convexity or concavity in single-variable calculus). To check curvature in higher
dimensions (i.e., for a function of more than one variable), you need to consider all the
second-order partials, so they are collected into a matrix. The matrix of partials is called
the Hessian of f :

2f 2f 2f

x21 x1 x2 x1 xn
2
f 2f 2f
x2 x1 x22
x2 xn
H(f ) =
.. .. ..

.
..
.
. . .

2f 2f 2f
xn x1 xn x2 x2n

The Hessian is also the coefficient of the quadratic term in the Taylor expansion of a
multivariable function near x Rn :

1
f (x + y) f (x) + f (x)T y + y T H(x)y, for y 0.
2

For functions f : R2 R, this simplifies to



2f 2f
fx1 x1 fx1 x2
H(f ) = x2 21 x1 x2
= ,
f 2f
x2 x1 x22
fx2 x1 fx2 x2
7.3 Diagonalization of symmetric matrices 149

and you are taught the 2nd partial derivative test: if f (a, b) = 0 and

M := det (H(f )) = fx1 x1 fx2 x2 fx21 x2 ,

(note that M is a function of x1 and x2 ), then

(i) If M (a, b) > 0 and fx1 x1 (a, b) > 0 then f has a local minimum at (a, b).

(ii) If M (a, b) > 0 and fx1 x1 (a, b) > 0 then f has a local maximum at (a, b).

(iii) If M (a, b) < 0 then f has a saddle point at (a, b).

(iv) If M (a, b) = 0, the second derivative test says nothing about (a, b), except that it
may be an inflection point.

Whats really going on here? The eigenvectors of the Hessian are the directions of principal
curvature; intuitively, these are the directions in which the graph of the function is curving
most rapidly and least rapidly (or most rapidly in the negative direction).
Key point: Since mixed partials are equal, the Hessian is a symmetric matrix,
and so we know these eigenvectors will be orthogonal (or can be taken to be
orthogonal, in the case of a repeated eigenvalue). This has a physical consequence: the
directions of principal curvature are orthogonal.

If the eigenvalues are both positive, then you are in case (i) above, and f has a local
minimum at (a, b).

If the eigenvalues are both negative, then you are in case (ii) above, and f has a
local maximum at (a, b).

If the eigenvalues have different signs 1 > 0 > 2 , then you are in case (iii) above
and there is a saddle point at (a, b).

The formulation in terms of M above is a trick that calculus textbooks give you to test
the eigenvalues of the Hessian without knowing what eigenvalues are. For Rn with n > 2,
you look at the determinants of the upper left k k submatrices, for k = 1, . . . , n. If they
are all positive, then your eigenvalues will be all positive and you have a minimum; if they
alternate signs, then your eigenvalues will be all negative and you have a maximum.
Additional upshot: Since the eigendata are independent of basis, the 2nd partial
derivative test works for any choice of coordinate system.

Вам также может понравиться