Вы находитесь на странице: 1из 143

Linear Algebra 2

Course Notes for MATH 235


Edition 1.1
D. Wolczuk
Copyright: D. Wolczuk, 1st Edition, 2011
Contents
7 Fundamental Subspaces 1
7.1 Bases of Fundamental Subspaces . . . . . . . . . . . . . . . . . . . 1
7.2 Subspaces of Linear Mappings . . . . . . . . . . . . . . . . . . . . 8
8 Linear Mappings 10
8.1 General Linear Mappings . . . . . . . . . . . . . . . . . . . . . . . 10
8.2 Rank-Nullity Theorem . . . . . . . . . . . . . . . . . . . . . . . . 15
8.3 Matrix of a Linear Mapping . . . . . . . . . . . . . . . . . . . . . 19
8.4 Isomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
9 Inner Products 31
9.1 Inner Product Spaces . . . . . . . . . . . . . . . . . . . . . . . . . 31
9.2 Orthogonality and Length . . . . . . . . . . . . . . . . . . . . . . . 36
9.3 The Gram-Schmidt Procedure . . . . . . . . . . . . . . . . . . . . 49
9.4 General Projections . . . . . . . . . . . . . . . . . . . . . . . . . . 53
9.5 The Fundamental Theorem . . . . . . . . . . . . . . . . . . . . . . 61
9.6 The Method of Least Squares . . . . . . . . . . . . . . . . . . . . . 64
10 Applications of Orthogonal Matrices 71
10.1 Orthogonal Similarity . . . . . . . . . . . . . . . . . . . . . . . . . 71
10.2 Orthogonal Diagonalization . . . . . . . . . . . . . . . . . . . . . . 74
10.3 Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
10.4 Graphing Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . 90
10.5 Optimizing Quadratic Forms . . . . . . . . . . . . . . . . . . . . . 96
10.6 Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . 99
11 Complex Vector Spaces 108
11.1 Complex Number Review . . . . . . . . . . . . . . . . . . . . . . . 108
11.2 Complex Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . 115
11.3 Complex Diagonalization . . . . . . . . . . . . . . . . . . . . . . . 122
11.4 Complex Inner Products . . . . . . . . . . . . . . . . . . . . . . . 125
11.5 Unitary Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . 134
11.6 Cayley-Hamilton Theorem . . . . . . . . . . . . . . . . . . . . . . 140
ii
Chapter 7
Fundamental Subspaces
The main purpose of this chapter is to review a few important concepts from the
rst six chapters. These concepts include subspaces, bases, dimension, and linear
mappings. As you will soon see the rest of the book relies heavily on these and other
concepts from the rst six chapters.
7.1 Bases of Fundamental Subspaces
Recall from Math 136 the four fundamental subspaces of a matrix.
DEFINITION
Fundamental
Subspaces
Let A be an m n matrix. The four fundamental subspaces of A are
1. The columnspace of A is Col(A) = Ax R
m
x R
n
.
2. The rowspace of A is Row(A) = A
T
x R
n
x R
m
.
3. The nullspace of A is Null(A) = x R
n
Ax =

0.
4. The left nullspace of A is Null(A
T
) = x R
m
A
T
x =

0.
THEOREM 1 Let A be an mn matrix. Then Col(A) and Null(A
T
) are subspaces of R
m
and Row(A)
and Null(A) are subspaces of R
n
.
Our goal now is to nd an easy way to determine a basis for each of the four funda-
mental subspaces.
REMARK
To help you understand the following two proofs, you may wish to pick a simple 32
matrix A and follow the steps of the proof with your matrix A.
1
2 Chapter 7 Fundamental Subspaces
THEOREM 2 Let A be an mn matrix. The columns of A which correspond to leading ones in the
reduced row echelon form of A form a basis for Col(A). Moreover,
dimCol(A) = rank A
Proof: We rst observe that if A is the zero matrix, then the result is trivial. Hence,
we can assume that rank A = r > 0.
Denote the columns of the reduced row echelon form R of A by r
1
, . . . , r
n
. Since
rank A = r, R contains r leading ones. Let t
1
, . . . , t
r
denote the indexes of the columns
of R which contain leading ones. We will rst show that J = r
t
1
, . . . , r
t
r
is a basis
for Col(R).
Observe that by denition of the reduced row echelon form the vectors r
t
1
, . . . , r
t
r
are distinct standard basis vectors of R
m
and hence form a linearly independent set.
Additionally, every column of R which does not contain a leading one can be written
as a linear combination of the columns which do contain leading ones, so Span J =
Col(R). Therefore, J is a basis for Col(R) as claimed.
Denote the columns of A by a
1
, . . . , a
n
. We will now show that ( = a
t
1
, . . . , a
t
r
is
a basis for Col(A) by using the fact that J is a basis for Col(R). To do this, we rst
need to nd a relationship between the vectors in J and (.
Since R is the reduced row echelon form of A there exists a sequence of elementary
matrices E
1
, . . . , E
k
such that E
k
E
1
A = R. Let E = E
k
E
1
. Recall that every
elementary matrix is invertible, hence E
1
= E
1
1
E
1
k
exists. Then
R = EA = [Ea
1
Ea
n
]
Consequently, r
i
= Ea
i
, or a
i
= E
1
r
i
.
To prove that ( is linearly independent, we use the denition of linear independence.
Consider
c
1
a
t
1
+ + c
r
a
t
r
=

0
Multiply both sides by E to get
E(c
1
a
t
1
+ + c
r
a
t
r
) = E

0
c
1
Ea
t
1
+ + c
r
Ea
t
r
=

0
c
1
r
t
1
+ + c
r
r
t
r
=

0
Thus, c
1
= = c
r
= 0 since r
t
1
, . . . , r
t
r
is linearly independent. Thus, ( is linearly
independent.
To show that ( spans Col(A), we need to show that if we pick any

b Col(A), we
can write it as a linear combination of the vectors a
t
1
, . . . , a
t
r
. Pick

b Col(A). Then,
Section 7.1 Bases of Fundamental Subspaces 3
by denition of the columnspace, there exists a x R
n
such that Ax =

b. Then we
get

b = Ax

b = E
1
Rx
E

b = Rx
Therefore, E

b is in the columnspace of R and hence can be written as a linear com-


bination of the basis vectors r
t
1
, . . . , r
t
r
. Hence, we have
E

b = d
1
r
t
1
+ + d
r
r
t
r

b = E
1
(d
1
r
t
1
+ + d
r
r
t
r
)

b = d
1
E
1
r
t
1
+ + d
r
E
1
r
t
r

b = d
1
a
t
1
+ + d
r
a
t
r
as required. Thus, Spana
t
1
, . . . , a
t
r
= Col(A). Hence, a
t
1
, . . . , a
t
r
is a basis for
Col(A).
Recall that the dimension of a vector space is the number of vectors in any basis.
Thus, dimCol(A) = r = rank A.
THEOREM 3 Let A be an m n matrix. The set of all non-zero rows in the reduced row echelon
form of A form a basis for Row(A). Hence,
dimRow(A) = rank A
Proof: We rst observe that if A is the zero matrix, then the result is trivial. Hence,
we can assume that rank A = r > 0.
Let R be the reduced row echelon form of A. By denition of the reduced rowechelon
form, we have that the set of all non-zero rows of R form a basis for the rowspace of
R. We will now show that they also form a basis for the rowspace of A.
As in the proof above, denote the columns of A by a
1
, . . . , a
n
. Let E = E
k
E
1
where E
1
, . . . , E
k
are elementary matrices such that E
k
E
1
A = R. We then have
A = E
1
R.
Let

b Row(A). Then

b = A
T
x
= (E
1
R)
T
x
= R
T
(E
1
)
T
x
Let y = (E
1
)
T
x. Then, we have

b = R
T
y, so

b Row(R). Thus, Row(A)
Row(R). Hence, the non-zero rows of R spans Row(A). Since they are also linearly
independent, they form a basis for Row(A), and the result follows.
4 Chapter 7 Fundamental Subspaces
We know how to nd a basis for the nullspace of a matrix from our work on nding
bases of eigenspaces in Chapter 6. You may have noticed in Chapter 6 that the
standard procedure for nding a spanning set of a nullspace always led to a linearly
independent set. We could prove this directly, but it would be awkward. Instead, we
rst prove the following important result.
THEOREM 4 (Dimension Theorem)
Let A be an m n matrix. Then
rank A + dimNull(A) = n
Proof: Let v
1
, . . . , v
k
be a basis for Null(A) so that dimNull(A) = k. Then we can
extend this to a basis v
1
, . . . , v
k
, v
k+1
, . . . , v
n
for R
n
. We will prove that Av
k+1
, . . . , Av
n

is a basis for Col(A).


Consider

0 = c
k+1
A(v
k+1
) + + c
n
A(v
n
) = A(c
k+1
v
k+1
+ + c
n
v
n
)
Then, c
k+1
v
k+1
+ + c
n
v
n
Null(A). Hence, we can write it as a linear combination
of the basis vectors for Null(A). So, we get
c
k+1
v
k+1
+ + c
n
v
n
= d
1
v
1
+ + d
k
v
k
d
1
v
1
d
k
v
k
+ c
k+1
v
k+1
+ + c
n
v
n
=

0
Thus, d
1
= = d
k
= c
k+1
= = c
n
= 0 since v
1
, . . . , v
n
is linearly independent.
Therefore, Av
k+1
, . . . , Av
n
is linearly independent.
Let

b Col(A). Then

b = Ax for some x R
n
. Writing x as a linear combination of
the basis vectors v
1
, . . . , v
n
gives

b = A(c
1
v
1
+ + c
k
v
k
+ c
k+1
v
k+1
+ + c
n
v
n
)
= c
1
Av
1
+ + c
k
Av
k
+ c
k+1
Av
k+1
+ + c
n
Av
n
But, v
i
Null(A) for 1 i k, so Av
i
=

0. Hence, we have

b =

0 + +

0 + c
k+1
Av
k+1
+ + c
n
Av
n
= c
k+1
Av
k+1
+ + c
n
Av
n
Thus, SpanAv
k+1
, . . . , Av
n
= Col(A).
Therefore, we have show that Av
k+1
, . . . , Av
n
is a basis for Col(L) and hence
rank A = dimCol(A) = n k = n dimNull(A)
as required.
Section 7.1 Bases of Fundamental Subspaces 5
COROLLARY 5 Let A be an m n matrix with rank A = r. Then
dimNull(A) = n r
dimNull(A
T
) = m r
We now can nd a basis for the four fundamental subspaces of a matrix. We demon-
strate this with a couple examples.
EXAMPLE 1 Let A =
_

_
1 2 1
1 1 2
1 0 5
_

_
. Find a basis for the four fundamental subspaces.
Solution: Row reducing A we get
_

_
1 2 1
1 1 2
1 0 5
_

_
1 0 5
0 1 3
0 0 0
_

_
= R
A basis for Row(A) is the set of non-zero rows of R. Thus, a basis is
_

_
_

_
1
0
5
_

_
,
_

_
0
1
3
_

_
_

_
(remember that in linear algebra, vectors in R
n
are always written as column vectors).
A basis for Col(A) is the columns of A which correspond to the columns of R which
contain leading ones. Hence, a basis for Col(A) is
_

_
_

_
1
1
1
_

_
,
_

_
2
1
0
_

_
_

_
.
To nd a basis for Null(A), we solve Ax =

0 as normal. That is, we rewrite the system


Rx =

0 as a system of equations:
x
1
+ 5x
3
= 0
x
2
3x
3
= 0
Hence, x
3
is a free variable and the general solution is
_

_
x
1
x
2
x
3
_

_
=
_

_
5x
3
3x
3
x
3
_

_
= x
3
_

_
5
3
1
_

_
Consequently, a basis for Null(A) is
_

_
_

_
5
3
1
_

_
_

_
.
To nd a basis for Null(A
T
), we can just use our method for nding the nullspace of
a matrix on A
T
. That is, we row reduce A
T
to get
_

_
1 1 1
2 1 0
1 2 5
_

_
1 0 1
0 1 2
0 0 0
_

_
6 Chapter 7 Fundamental Subspaces
So, we have
x
1
x
3
= 0
x
2
+ 2x
3
= 0
Thus, x
3
is a free variable and the general solution is
_

_
x
1
x
2
x
3
_

_
=
_

_
x
3
2x
3
x
3
_

_
= x
3
_

_
1
2
1
_

_
Therefore, a basis for the left nullspace of A is
_

_
_

_
1
2
1
_

_
_

_
.
EXAMPLE 2 Let A =
_

_
2 4 4 2
1 2 2 7
1 2 0 3
_

_
. Find a basis for the four fundamental subspaces.
Solution: Row reducing A we get
_

_
2 4 4 2
1 2 2 7
1 2 0 3
_

_
1 2 0 3
0 0 1 2
0 0 0 0
_

_
= R
The set of non-zero rows
_

_
_

_
1
2
0
3
_

_
,
_

_
0
0
1
2
_

_
_

_
of R forms a basis for Row(A).
The columns of A which correspond to the columns of R which contain leading ones
form a basis for Col(A). Hence, a basis for Col(A) is
_

_
_

_
2
1
1
_

_
,
_

_
4
2
0
_

_
_

_
.
To nd a basis for Null(A), we solve Ax =

0 as normal. That is, we rewrite the system


Rx =

0 as a system of equations:
x
1
+ 2x
2
3x
4
= 0
x
3
2x
4
= 0
Hence, x
2
and x
4
are free variables and the general solution is
_

_
x
1
x
2
x
3
x
4
_

_
=
_

_
2x
2
+ 3x
4
x
2
2x
4
x
4
_

_
= x
2
_

_
2
1
0
0
_

_
+ x
4
_

_
3
0
2
1
_

_
Hence, a basis for Null(A) is
_

_
_

_
2
1
0
0
_

_
,
_

_
3
0
2
1
_

_
_

_
.
Section 7.2 Bases of Fundamental Subspaces 7
We row reduce A
T
to get
_

_
2 1 1
4 2 2
4 2 0
2 7 3
_

_
1 0 1/4
0 1 1/2
0 0 0
0 0 0
_

_
Thus, we have
x
1

1
4
x
3
= 0
x
2

1
2
x
3
= 0
Consequently, x
3
is a free variable and the general solution is
_

_
x
1
x
2
x
3
_

_
=
_

_
1
4
x
3
1
2
x
3
x
3
_

_
= x
3
_

_
1/4
1/2
1
_

_
Therefore, a basis for the left nullspace of A is
_

_
_

_
1/4
1/2
1
_

_
_

_
.
REMARK
It is not actually necessary to row reduce A
T
to nd a basis for the left nullspace of A.
We can use the fact that EA = R where E is the product of elementary matrices used
to bring A to its reduced row echelon form R to nd a basis for the left nullspace. The
derivation of this procedure is left as an exercise.
Section 7.1 Problems
1. Find a basis for the four fundamental subspace of each matrix.
(a)
_
1 2 1
2 4 3
_
(b)
_

_
2 1 5
3 4 2
1 1 1
_

_
(c)
_

_
1 2 3 5
2 4 0 4
3 6 5 1
_

_
(d)
_

_
3 2 5 3
1 0 3 1
1 1 1 1
1 4 5 1
_

_
2. Let A be an n n matrix. Prove that Null(A) =

0 if and only if det A = 0.


3. Let B be an m n matrix.
(a) Prove that if x is any vector in the left nullspace of B, then x
T
B =

0
T
.
(b) Prove that if x is any vector in the left nullspace of B and y is any vector
in the columnspace of B, then x y = 0.
4. Invent a 2 2 matrix A such that Null(A) = Col(A).
8 Chapter 7 Fundamental Subspaces
7.2 Subspaces of Linear Mappings
Recall from Math 136 that we dened a linear mapping L : R
n
R
m
to be a function
with domain R
n
and codomain R
m
such that
L(sx + ty) = sL(x) + tL(y)
for all x, y R
n
and s, t R. We dened the range and kernel (nullspace) of a linear
mapping L : R
n
R
m
by
Range(L) = L(x) R
m
x R
n

Ker(L) = x R
n
L(x) =

0
It is easy to verify using the Subspace Test that Range(L) is a subspace of R
m
and
Ker(L) is a subspace of R
n
.
Additionally, we saw that every linear mapping L : R
n
R
m
is a matrix mapping. In
particular, we dened the standard matrix [L] of L by
[L] =
_
L(e
1
) L(e
n
)
_
where e
1
, . . . , e
n
is the standard basis for R
n
. It satises
L(x) = [L]x
for all x R
n
. This relationship between linear mappings and matrix mappings is
very important, and we will continue to look at this during the remainder of the book.
The purpose of the next two theorems is to further demonstrate this relationship.
THEOREM 1 If L : R
n
R
m
is a linear mapping, then Range(L) = Col([L]) and
rank[L] = dimRange(L).
THEOREM 2 If L : R
n
R
m
is a linear mapping, then Ker(L) = Null([L]) and
dim(Ker(L)) = n rank[L].
EXAMPLE 1 Let L : R
4
R
3
be the linear mapping with standard matrix
[L] =
_

_
0 1 0 2
1 2 1 1
2 4 3 1
_

_
Find a basis for the range of L and kernel of L.
Solution: To nd a basis for the range of L, we can just nd a basis for the
columnspace of [L]. Row reducing [L] gives
_

_
0 1 0 2
1 2 1 1
2 4 3 1
_

_
1 0 0 2
0 1 0 2
0 0 1 1
_

_
Section 7.2 Subspaces of Linear Mappings 9
Since the rst three columns of the reduced row echelon form a basis for the
columnspace of the reduced row echelon form of [L], the rst three columns of [L]
form a basis for Col([L]). Thus, a basis for the range of L is
_

_
_

_
0
1
2
_

_
,
_

_
1
2
4
_

_
,
_

_
0
1
3
_

_
_

_
.
To nd a basis for Ker(L), we will nd a basis for the nullspace of [L]. Thus, we need
to nd a basis for the solution space of the homogeneous system [L]x =

0. We found
above that the RREF of [L] is
_

_
1 0 0 2
0 1 0 2
0 0 1 1
_

_
. Thus, we have
x
1
+ 2x
4
= 0
x
2
2x
4
= 0
x
3
+ x
4
= 0
Then the general solution of [L]x =

0 is
x =
_

_
x
1
x
2
x
3
x
4
_

_
=
_

_
2x
4
2x
4
x
4
x
4
_

_
= x
4
_

_
2
2
1
1
_

_
So, a basis for Ker(L) is
_

_
_

_
2
2
1
1
_

_
_

_
.
The Dimension Theorem for matrices gives us the following result.
THEOREM 3 Let L : R
n
R
m
be a linear mapping. Then,
dimRange(L) + dimKer(L) = dim(R
n
)
Section 7.2 Problems
1. Let L(x
1
, x
2
) = (x
1
, 2x
1
+ x
2
, 0).
(a) Prove that L is linear.
(b) Find the standard matrix of L.
(c) Find a basis for Ker(L) and Range(L).
2. Prove Theorem 1 and Theorem 2.
Chapter 8
Linear Mappings
8.1 General Linear Mappings
Linear Mappings L : V W
We now observe that we can extend our denition of a linear mapping to the case
where the domain and codomain are general vectors spaces instead of just R
n
.
DEFINITION
Linear Mapping
Let V and Wbe vector spaces. A mapping L : V Wis called linear if
L(sx + ty) = sL(x) + tL(y)
for all x, y V and s, t R.
REMARKS
1. As before, two linear mappings L and M are equal if and only if they have the
same domain, the same codomain, and L(v) = M(v) for all v in the domain.
2. The denition of a linear mapping above still makes sense because of the clo-
sure properties of vector spaces.
3. As before, we sometimes call a linear mapping L : V V a linear operator.
4. It is important not to assume that any results that held for linear mappings
L : R
n
R
m
also hold for linear mappings L : V W.
10
Section 8.1 General Linear Mappings 11
EXAMPLE 1 Let L : P
3
(R) R
2
be dened by L(a + bx + cx
2
+ dx
3
) =
_
a + d
b + c
_
.
(a) Evaluate L(1 + 2x
2
x
3
).
Solution: L(1 + 2x
2
x
3
) =
_
1 + (1)
0 + 2
_
=
_
0
2
_
.
(b) Prove that L is linear.
Solution: Let a
1
+b
1
x+c
1
x
2
+d
1
x
3
, a
2
+b
2
x+c
2
x
2
+d
2
x
3
P
3
(R) and s, t R,
then
L
_
s(a
1
+ b
1
x + c
1
x
2
+ d
1
x
3
) + t(a
2
+ b
2
x + c
2
x
2
+ d
2
x
3
)
_
= L
_
(sa
1
+ ta
2
) + (sb
1
+ tb
2
)x + (sc
1
+ tc
2
)x
2
+ (sd
1
+ td
2
)x
3
_
=
_
sa
1
+ ta
2
+ sd
1
+ td
2
sb
1
+ tb
2
+ sc
1
+ tc
2
_
= s
_
a
1
+ d
1
b
1
+ c
1
_
+ t
_
a
2
+ d
2
b
2
+ c
2
_
= sL(a
1
+ b
1
x + c
1
x
2
+ d
1
x
3
) + tL(a
2
+ b
2
x + c
2
x
2
+ d
2
x
3
)
Thus, L is linear.
EXAMPLE 2 Let tr : M
nn
(R) R be dened by tr A =
n

i=1
(A)
ii
(called the trace of a matrix).
Prove that tr is linear.
Solution: Let A, B M
nn
(R) and s, t R. Then
tr(sA + tB) =
n

i=1
(sA + tB)
ii
=
n

i=1
(s(A)
ii
+ t(B)
ii
)
= s
n

i=1
(A)
ii
+ t
n

i=1
(B)
ii
= s tr A + t tr B
Thus, tr is linear.
12 Chapter 8 Linear Mappings
EXAMPLE 3 Prove that the mapping L : P
2
(R) M
22
(R) dened by L(a + bx + cx
2
) =
_
a bc
0 abc
_
is not linear.
Solution: Observe that
L(1 + x + x
2
) =
_
1 1
0 1
_
But,
L
_
2(1 + x + x
2
)
_
= L(2 + 2x + 2x
2
) =
_
2 4
0 8
_
2L(1 + x + x
2
)
We now begin to show that many of the results we had for linear mappings
L : R
n
R
m
also hold for linear mappings L : V W.
THEOREM 1 Let V and Wbe vector spaces and let L : V Wbe a linear mapping. Then,
L(

0) =

0
DEFINITION
Addition
Scalar
Multiplication
Let L : V Wand M : V Wbe linear mappings. Then we dene L + M by
(L + M)(v) = L(v) + M(v)
and for any t R we dene tL by
(tL)(v) = tL(v)
EXAMPLE 4 Let L : M
22
(R) P
2
(R) be dened by L
__
a b
c d
__
= a + bx + (a + c + d)x
2
and
M : M
22
(R) P
2
(R) be dened by M
__
a b
c d
__
= (b + c) + dx
2
. Then L and M are
both linear and L + M is the mapping dened by
(L + M)
__
a b
c d
__
= L
__
a b
c d
__
+ M
__
a b
c d
__
= [a + bx + (a + c + d)x
2
] + [(b + c) + dx
2
]
= (a + b + c) + bx + (a + c + 2d)x
2
Similarly, 4L is the mapping dened by
(4L)
__
a b
c d
__
= 4L
__
a b
c d
__
= 4a + 4bx + (4a + 4c + 4d)x
2
Section 8.1 General Linear Mappings 13
THEOREM 2 Let V and W be vector spaces. The set L of all linear mappings L : V W with
standard addition and scalar multiplication of mappings is a vector space.
Proof: To prove that L is a vector space, we need to show that it satises all ten
vector spaces axioms. We will prove V1 and V2 and leave the rest as exercises. Let
L, M L.
V1 To prove that L is closed under addition, we need to show that L+M is a linear
mapping with domain V and codomain W.
By denition, the domain of L + M is V and for any v V we have
(L + M)(v) = L(v) + M(v) W
since L(v) W, M(v) W, and Wis closed under addition. Moreover, since L
and M are linear for any v
1
, v
2
V and s, t R we have
(L + M)(sv
1
+ tv
2
) = L(sv
1
+ tv
2
) + M(sv
1
+ tv
2
)
= sL(v
1
) + tL(v
2
) + sM(v
1
) + tM(v
2
)
= s[L(v
1
) + M(v
1
)] + t[L(v
2
) + M(v
2
)]
= s(L + M)(v
1
) + t(L + M)(v
2
)
Thus, L + M is linear so L + M L.
V2 For any v V we have
(L + M)(v) = L(v) + M(v) = M(v) + L(v) = (M + L)(v)
since addition in Wis commutative. Hence L + M = M + L.

We also dene the composition of mappings in the expected way.


DEFINITION
Composition
Let L : V Wand M : W U be linear mappings. Then we dene M L by
(M L)(v) = M(L(v))
for all v V.
14 Chapter 8 Linear Mappings
EXAMPLE 5 Let L : P
2
(R) R
3
be dened by L(a+bx+cx
2
) =
_

_
a + b
c
0
_

_
and let M : R
3
M
22
(R)
be dened by M
_

_
_

_
x
1
x
2
x
3
_

_
_

_
=
_
x
1
0
x
2
+ x
3
0
_
. Then M L is the mapping dened by
(M L)(a + bx + cx
2
) = M(L(a + bx + cx
2
)) = M
_

_
_

_
a + b
c
0
_

_
_

_
=
_
a + b 0
c 0
_
Observe that M L is in fact a linear mapping from P
2
(R) to M
22
(R).
Section 8.1 Problems
1. Determine which of the following mappings are linear. If it is linear, prove it.
If not, give a counterexample to show that it is not linear.
(a) L : R
2
M
22
(R) dened by L(a, b) =
_
a 0
0 b
_
(b) T : P
2
(R) P
2
(R) dened by T(a + bx + cx
2
) = (a b) + (bc)x
2
.
(c) L : P
2
(R) M
22
(R) dened by L(a + bx + cx
2
) =
__
1 0
a + c b + c
__
.
(d) T : M
22
(R) R
3
dened by T
__
a b
c d
__
=
_

_
a + b
b + c
c a
_

_
(e) D : P
3
(R) P
2
(R) dened by D(a + bx + cx
2
+ dx
3
) = b + 2cx + 3dx
2
(f) L : M
22
(R) R dened by L(A) = det A.
2. Let J be a basis for an n-dimensional vector space V. Prove that the mapping
L : V R
n
dened by L(v) = [v]
J
for all v V is linear.
3. Let L : V Wbe a linear mapping. Prove that L(

0) =

0.
4. Let L : V W and M : W U be linear mappings. Prove that M L is a
linear mapping from V to U.
5. Let L be the set of all linear mappings from V to Wwith standard addition and
scalar multiplication of linear mappings. Prove that
(a) tL L for all L L and t R.
(b) t(L + M) = tL + tM for all L, M L and t R.
Section 8.2 Rank-Nullity Theorem 15
8.2 Rank-Nullity Theorem
Our goal in this section is not only to extend the denitions of the range and kernel of
a linear mapping to general linear mappings, but to also generalize Theorem 8.2.3 to
general linear mappings. We will see that this generalization, called the Rank-Nullity
Theorem, is extremely usefully.
DEFINITION
Range
Kernel
Let L : V Wbe a linear mapping, then the kernel of L is
Ker(L) = v V L(v) =

0
W

and the range of L is


Range(L) = L(v) v V
THEOREM 1 Let L : V W be a linear mapping. Then Ker(L) is a subspace of V and Range(L)
is a subspace of W.
The procedure for nding a basis for the range and kernel of a general linear mapping
is, of course, exactly the same as we saw for linear mappings from R
n
to R
m
.
EXAMPLE 1 Find a basis for the range and kernel of L : P
3
(R) R
2
dened by
L(a + bx + cx
2
+ dx
3
) =
_
a + d
b + c
_
Solution: If a + bx + cx
2
+ dx
3
Ker(L), then
_
0
0
_
= L(a + bx + cx
2
+ dx
3
) =
_
a + d
b + c
_
Therefore, a = d and b = c. Thus, every vector in Ker(L) has the form
a + bx + cx
2
+ dx
3
= d cx + cx
2
+ dx
3
= d(1 + x
3
) + c(x + x
2
)
Since 1 + x
3
, x + x
2
is clearly linearly independent, it is a basis for Ker(L).
The range of L contains all vectors of the form
_
a + d
b + c
_
= (a + d)
_
1
0
_
+ (b + c)
_
0
1
_
So a basis for Range(L) is
__
1
0
_
,
_
0
1
__
.
16 Chapter 8 Linear Mappings
EXAMPLE 2 Determine the dimension of the range and kernel of L : P
2
(R) M
32
(R) dened by
L(a + bx + cx
2
) =
_

_
a b
c b
a + b a + c b
_

_
Solution: If a + bx + cx
2
Ker(L), then
_

_
0 0
0 0
0 0
_

_
= L(a + bx + cx
2
) =
_

_
a b
c b
a + b a + c b
_

_
Hence, we must have a = b = c = 0. Thus, Ker(L) =

0, so a basis for Ker(L) is the


empty set. Hence, dim(Ker(L)) = 0.
The range of L contains all vectors of the form
_

_
a b
c b
a + b a + c b
_

_
= a
_

_
1 0
0 0
1 1
_

_
+ b
_

_
0 1
0 1
1 1
_

_
+ c
_

_
0 0
1 0
0 1
_

_
We can easily verify that the set
_

_
_

_
1 0
0 0
1 1
_

_
,
_

_
0 1
0 1
1 1
_

_
,
_

_
0 0
1 0
0 1
_

_
_

_
is linearly independent
and hence a basis for Range(L). Consequently, dim(Range(L)) = 3.
EXERCISE 1 Find a basis for the range and kernel of L : R
2
M
22
(R) dened by
L
__
x
1
x
2
__
=
_
x
1
x
1
+ x
2
0 x
1
+ x
2
_
Observe in all of the examples above that dim(Range(L)) + dim(Ker(L)) = dimV
which matches the result we had for linear mappings L : R
n
R
m
. Before we
extend this to the general case, we make a couple of denitions.
DEFINITION
Rank
Nullity
Let L : V Wbe a linear mapping. Then we dene the rank of L by
rank(L) = dim(Range(L))
We dene the nullity of L to be
nullity(L) = dim(Ker(L))
Section 8.2 Rank-Nullity Theorem 17
THEOREM 2 (Rank-Nullity Theorem)
Let V be an n-dimensional vector space and let W be a vector space. If L : V W
is linear, then
rank(L) + nullity(L) = n
Proof: Assume that nullity(L) = k and let
_
v
1
, , v
k
_
be a basis for Ker(L). Then
we can extend
_
v
1
, , v
k
_
to a basis
_
v
1
, , v
k
, v
k+1
, , v
n
_
for V. We claim that
_
L(v
k+1
), , L(v
n
)
_
is a basis for Range(L).
Let y Range(L). Then by denition, there exists v V such that
y = L(v) = L(c
1
v
1
+ + c
k
v
k
+ c
k+1
v
k+1
+ + c
n
v
n
)
= c
1
L(v
1
) + + c
k
L(v
k
) + c
k+1
L(v
k+1
) + + c
n
L(v
n
)
=

0 + +

0 + c
k+1
L(v
k+1
) + + c
n
L(v
n
)
= c
k+1
L(v
k+1
) + + c
n
L(v
n
)
Hence
y SpanL(v
k+1
), . . . , L(v
n
)
Consider
c
k+1
L(v
k+1
) + + c
n
L(v
n
) =

0
L(c
k+1
v
k+1
+ + c
n
v
n
) =

0
This implies that c
k+1
v
k+1
+ + c
n
v
n
Ker(L). Thus we can write
c
k+1
v
k+1
+ + c
n
v
n
= d
1
v
1
+ + d
k
v
k
But, then we have
d
1
v
1
d
k
v
k
+ c
k+1
v
k+1
+ + c
n
v
n
=

0
Hence, d
1
= = d
k
= c
k+1
= = c
n
= 0 since
_
v
1
, , v
n
_
is linearly independent
as it is a basis for V.
Therefore, we have shown that L(v
k+1
), , L(v
n
) is a basis for Range(L) and hence
rank(L) = n k = n nullity(L)

REMARK
The proof of the Rank-Nullity Theorem should seem very familiar. It is essentially
identical to that of the Dimension Theorem in Chapter 7. In this book there will be
quite a few times where proofs are essentially repeated, so spending time to make
sure you understand proofs when you rst see them can have real long term benets.
18 Chapter 8 Linear Mappings
EXAMPLE 3 Let L : P
2
(R) M
23
(R) be dened by
L(a + bx + cx
2
) =
_
a a c
a a c
_
Find the rank and nullity of L.
Solution: If a + bx + cx
2
Ker(L) then
_
a a c
a a c
_
= L(a + bx + cx
2
) =
_
0 0 0
0 0 0
_
Hence a = 0 and c = 0. So every vector in Ker(L) has the form bx so a basis for
Ker(L) is x. Thus nullity(L) = 1 and so by the Rank-Nullity Theorem we get that
rank(L) = dimP
2
(R) nullity(L) = 3 1 = 2
In the example above, we could have instead found a basis for the range and then
used the Rank-Nullity Theorem to nd the nullity of L.
Section 8.2 Problems
1. Find the rank and nullity of the following linear mappings.
(a) L : R
2
M
22
(R) dened by L(a + bx + cx
2
) =
_
a 0
0 b
_
(b) L : P
2
(R) P
2
(R) dened by L(a + bx + cx
2
) = (a b) + (bc)x
2
.
(c) T : P
2
(R) M
22
(R) dened by T(a + bx + cx
2
) =
_
0 0
a + c b + c
_
.
(d) L : R
3
P
1
(R) dened by L
_

_
_

_
a
b
c
_

_
_

_
= (a + b) + (a + b + c)x.
2. Prove that if L(v
1
), . . . , L(v
k
) spans W, then dimV dimW.
3. Find a linear mapping L : V Wwhere dimV dimW, but Range(L) W.
4. Find a linear mapping L : V V such that Ker(L) = Range(L).
5. Let V and W be n-dimensional vector spaces and let L : V W be a linear
mapping. Prove that Range(L) = Wif and only if Ker(L) =

0.
6. Let U, V, W be nite dimensional vector spaces and let L : V U and M :
U Wbe linear mappings.
(a) Prove that rank(M L) rank(M).
(b) Prove that rank(M L) rank(L).
(c) Prove that if M is invertible, then rank(M L) = rank L.
Section 8.3 Matrix of a Linear Mapping 19
8.3 Matrix of a Linear Mapping
We now show that every linear mapping L : V Wcan also be represented as a ma-
trix mapping. However, we must be careful when dealing with general vector spaces
as our domain and codomain. For example, it is certainly impossible to represent a
linear mapping L : P
2
(R) M
22
(R) as a matrix mapping L(x) = Ax since we can
not multiply a matrix by a polynomial x P
2
(R). Moreover, we require the output to
be a 2 2 matrix.
Thus, if we are going to dene a matrix representation of a general linear mapping,
we need to convert vectors from V to vectors in R
n
. Recall that the coordinate vector
of v V with respect to an ordered basis J is a vector in R
n
. In particular, if J =
v
1
, . . . , v
n
and v = b
1
v
1
+ + b
n
v
n
, then the coordinate vector of v with respect to
J is dened to be
[v]
J
=
_

_
b
1
.
.
.
b
n
_

_
Using coordinates, we can write a matrix mapping representation for a linear map-
ping L : V W. That is, we want to nd a matrix A such that
[L(x)]
(
= A[x]
J
for every x V, where J is a basis for V and ( is a basis for W.
Consider the left-hand side [L(x)]
(
. Using properties of linear mappings and coordi-
nates, we get
[L(x)]
(
= [L(b
1
v
1
+ + b
n
v
n
)]
(
= [b
1
L(v
1
) + + b
n
L(v
n
)]
(
= b
1
[L(v
1
)]
(
+ + b
n
[L(v
n
)]
(
=
_
[L(v
1
)]
(
[L(v
n
)]
(
_
_

_
b
1
.
.
.
b
n
_

_
= A[x]
J
Thus, we see the desired matrix is
A =
_
[L(v
1
)]
(
[L(v
n
)]
(
_
DEFINITION
Matrix of a
Linear Mapping
Suppose J = v
1
, . . . , v
n
is any basis for a vector space V and ( is any basis for a
nite dimensional vector space Wand let L : V Wbe a linear mapping. Then the
matrix of L with respect to bases J and ( is
(
[L]
J
=
_
[L(v
1
)]
(
[L(v
n
)]
(
_
It satises
[L(x)]
(
=
(
[L]
J
[x]
J
for all x V.
20 Chapter 8 Linear Mappings
EXAMPLE 1 Let L : P
2
(R) R
2
be the linear mapping dened by L(a + bx + cx
2
) =
_
a + c
b c
_
,
J = 1 + x
2
, 1 + x, 1 + x + x
2
, ( =
__
1
0
_
,
_
1
1
__
. Find
(
[L]
J
.
Solution: To nd
(
[L]
J
we need to determine the (-coordinates of the images of the
vectors in J under L. We have
L(1 + x
2
) =
_
2
1
_
= 3
_
1
0
_
+ (1)
_
1
1
_
L(1 + x) =
_
1
1
_
= 0
_
1
0
_
+ 1
_
1
1
_
L(1 + x + x
2
) =
_
0
0
_
= 0
_
1
0
_
+ 0
_
1
1
_
Hence,
(
[L]
J
=
_
[L(1 + x
2
)]
(
[L(1 + x)]
(
[L(1 + x + x
2
)]
(
_
=
_
3 0 0
1 1 0
_
We can check our answer. Let p(x) = a + bx + cx
2
. To use
(
[L]
J
we need [p(x)]
J
.
Hence, we need to write p(x) as a linear combination of the vectors in J. In particular,
we need to solve
a+bx+cx
2
= t
1
(1+x
2
)+t
2
(1+x)+t
3
(1+x+x
2
) = (t
1
+t
2
t
3
)+(t
2
+t
3
)x+(t
1
+t
3
)x
3
Row reducing the corresponding augmented matrix gives
_

_
1 1 1 a
0 1 1 b
1 0 1 c
_

_
(a b + 3c)/3
(a + 2b c)/3
(a + b + c)/3
_

_
Thus,
[p(x)]
J
=
1
3
_

_
a b + 3c
a + 2b c
a + b + c
_

_
So we get
[L(p(x))]
(
=
(
[L]
J
[p(x)]
J
=
_
3 0 0
1 1 0
_
1
3
_

_
a b + 2c
a + 2b c
a + b + c
_

_
=
_
a b + 2c
b c
_
Therefore, by denition of (-coordinates
L(p(x)) = (a b + 2c)
_
1
0
_
+ (b c)
_
1
1
_
=
_
a + c
b c
_
as required.
Section 8.3 Matrix of a Linear Mapping 21
EXAMPLE 2 Let T : R
2
M
22
(R) be the linear mapping dened by T
__
a
b
__
=
_
a + b 0
0 a b
_
,
and let J =
__
2
1
_
,
_
1
2
__
and ( =
__
1 1
0 0
_
,
_
1 0
0 1
_
,
_
1 1
0 1
_
,
_
0 0
1 0
__
. Determine
(
[T]
J
and use it to calculate the T(v) where [v]
J
=
_
2
3
_
.
Solution: To nd
(
[L]
J
we need to determine the (-coordinates of the images of the
vectors in J under L. We have
T(2, 1) =
_
1 0
0 3
_
= 2
_
1 1
0 0
_
+ 1
_
1 0
0 1
_
+ 2
_
1 1
0 1
_
+ 0
_
0 0
1 0
_
T(1, 2) =
_
3 0
0 1
_
= 4
_
1 1
0 0
_
+ 3
_
1 0
0 1
_
+ (4)
_
1 1
0 1
_
+ 0
_
0 0
1 0
_
Hence,
(
[T]
J
=
_
[T(2, 1)]
(
[T(1, 2)]
(
_
=
_

_
2 4
1 3
2 4
0 0
_

_
Thus, we get
[T(v)]
(
=
_

_
2 4
1 3
2 4
0 0
_

_
_
2
3
_
=
_

_
16
7
16
0
_

_
so
T(v) = (16)
_
1 1
0 0
_
+ (7)
_
1 0
0 1
_
+ 16
_
1 1
0 1
_
+ (0)
_
0 0
1 0
_
=
_
7 0
0 9
_
In the special case of a linear operator L acting on a nite-dimensional vector space
V with basis J, we often wish to nd the matrix
J
[L]
J
.
DEFINITION
Matrix of a
Linear Operator
Suppose J = v
1
, . . . , v
n
is any basis for a vector space V and let L : V V be a
linear operator. Then the J-matrix of L (or the matrix of L with respect to the basis
J) is
[L]
J
=
_
[L(v
1
)]
J
[L(v
n
)]
J
_
It satises
[L(x)]
J
= [L]
J
[x]
J
for all x V.
22 Chapter 8 Linear Mappings
EXAMPLE 3 Let J =
_

_
_

_
1
1
1
_

_
,
_

_
1
0
1
_

_
,
_

_
1
3
2
_

_
_

_
and let A =
_

_
6 2 9
5 1 7
7 2 10
_

_
. Find the J-matrix of the linear
mapping dened by L(x) = Ax.
Solution: By denition, the columns of the J-matrix are the J-coordinates of the
images of the vectors in J under L. We nd that
_

_
6 2 9
5 1 7
7 2 10
_

_
_

_
1
1
1
_

_
=
_

_
1
1
1
_

_
= 1
_

_
1
1
1
_

_
+ 0
_

_
1
0
1
_

_
+ 0
_

_
1
3
2
_

_
_

_
6 2 9
5 1 7
7 2 10
_

_
_

_
1
0
1
_

_
=
_

_
3
2
3
_

_
= 2
_

_
1
1
1
_

_
+ 1
_

_
1
0
1
_

_
+ 0
_

_
1
3
2
_

_
_

_
6 2 9
5 1 7
7 2 10
_

_
_

_
1
3
2
_

_
=
_

_
6
6
7
_

_
= 3
_

_
1
1
1
_

_
+ 2
_

_
1
0
1
_

_
+ 1
_

_
1
3
2
_

_
Hence, [L]
J
=
_
[L(1, 1, 1)]
J
[L(1, 0, 1)]
J
[L(1, 3, 2)]
J
_
=
_

_
1 2 3
0 1 2
0 0 1
_

_
EXAMPLE 4 Let L : P
2
(R) P
2
(R) be dened by L(a+bx+cx
2
) = (2a+b)+(a+b+c)x+(b+2c)x
2
.
Find [L]
J
where J = 1+x
2
, 12x+x
2
, 1+x+x
2
and nd [L]
S
where S = 1, x, x
2
.
Solution: To determine [L]
J
we need to nd the J-coordinates of the images of the
vectors in J under L.
L(1 + x
2
) = 2 + 0x + 2x
2
= 2(1 + x
2
) + 0(1 2x + x
2
) + 0(1 + x + x
2
)
L(1 2x + x
2
) = 0 + 0x + 0x
2
= 0(1 + x
2
) + 0(1 2x + x
2
) + 0(1 + x + x
2
)
L(1 + x + x
2
) = 3 + 3x + 3x
2
= 0(1 + x
2
) + 0(1 2x + x
2
) + 3(1 + x + x
2
)
Thus,
[L]
J
=
_
[L(1 + x
2
)]
J
[L(1 2x + x
2
)]
J
[L(1 + x + x
2
)]
J
_
=
_

_
2 0 0
0 0 0
0 0 3
_

_
Similarly, for [L]
S
we calculate the S-coordinates of the image of the vectors in S
under L.
L(1) = 2 + x = 2(1) + 1(x) + 0x
2
L(x) = 1 + x + x
2
= 1(1) + 1(x) + 1(x
2
)
L(x
2
) = x + 2x
2
= 0(1) + 1(x) + 2(x
2
)
Hence,
[L]
S
=
_

_
2 1 0
1 1 1
0 1 2
_

_
Section 8.4 Matrix of a Linear Mapping 23
In Example 4 we found the matrix of L with respect to the basis J is diagonal while
the matrix of L with respect to the basis S is not. This is perhaps not surprising since
we see that the vectors in J are eigenvectors of L (that is, they satisfy L(v
i
) =
i
v
i
),
while the vectors in S are not. Recall from Math 136, that we called such a basis a
geometrically natural basis.
REMARK
The relationship between diagonalization and the matrix of a linear operator is ex-
tremely important. We will review this a little later in the text. You may also nd it
useful at this point to review Section 6.1 in the Math 136 course notes.
Section 8.3 Problems
1. Find the matrix of each linear mapping with respect to the bases J and (.
(a) D : P
2
(R) P
1
(R) dened by D(a + bx + cx
2
) = b + 2cx, J = 1, x, x
2
,
( = 1, x
(b) L : R
2
P
2
(R) dened by L(a
1
, a
2
) = (a
1
+a
2
) +a
1
x
2
, J =
__
1
1
_
,
_
1
2
__
,
( = 1 + x
2
, 1 + x, 1 x + x
2

(c) T : M
22
(R) R
2
dened by T
__
a b
c d
__
=
_
a + c
b + c
_
,
J =
__
1 1
0 0
_
,
_
0 1
1 0
_
,
_
1 0
0 1
_
,
_
0 0
1 1
__
, ( =
__
1
1
_
,
_
1
1
__
2. Find the J-matrix of each linear mapping with respect to the give basis J.
(a) D : P
2
(R) P
2
(R) dened by D(a + bx + cx
2
) = b + 2cx, J = 1, x, x
2

(b) L : M
22
(R) M
22
(R) dened by L
__
a b
c d
__
=
_
a b + c
0 d
_
,
J =
__
1 1
0 0
_
,
_
1 0
0 1
_
,
_
1 1
0 1
_
,
_
0 0
1 0
__
(c) T : U U, where U is the subspace of diagonal matrices in M
22
(R),
dened by T
__
a 0
0 b
__
=
_
a + b 0
0 2a + b
_
, J =
__
1 0
0 1
_
,
_
2 0
0 3
__
3. Let V be an n-dimensional vector space and let L : V V by the linear
operator dened by L(v) = v. Prove that [L]
J
= I for any basis J of V.
4. Invent a mapping L : R
2
R
2
and a basis J for R
2
, such that Col([L]
J
)
Range(L). Justify your answer.
24 Chapter 8 Linear Mappings
8.4 Isomorphisms
Recall that the ten vector space axioms dene a structure for the set based on
the operations of addition and scalar multiplication. Since all vector spaces satisfy
the same ten properties, we would expect that all n-dimensional vector spaces should
have the same structure. This idea seems to be conrmed by our work with coordinate
vectors in Math 136. No matter what n-dimensional vector space Vwe use and which
basis we use for the vector space, we have a nice way of relating the vectors in V to
vectors in R
n
. Moreover, with respect to this basis the operations of addition and
scalar multiplication are preserved. Consider the following calculations:
_

_
1
2
3
_

_
+
_

_
4
5
6
_

_
=
_

_
5
7
9
_

_
(1 + 2x + 3x
2
) + (4 + 5x + 6x
2
) = 5 + 7x + 9x
2
(v
1
+ 2v
2
+ 3v
3
) + (4v
1
+ 5v
2
+ 6v
3
) = 5v
1
+ 7v
2
+ 9v
3
No matter which vector space we are using and which basis for that vector space,
any linear combination of vectors is really just performed on the coordinates of the
vectors with respect to the basis.
We will now look at how to use general linear mappings to mathematically prove
these observations. We begin with some familiar concepts.
One-To-One and Onto
DEFINITION
One-To-One
Onto
Let L : V W be a linear mapping. L is called one-to-one (injective) if for every
u, v V such that L(u) = L(v) we must have u = v.
L is called onto (surjective) if for every w W, there exists v V such that L(v) = w.
EXAMPLE 1 Let L : R
2
P
2
(R) be the linear mapping dened by L(a
1
, a
2
) = a
1
+ a
2
x
2
. Then
L is one-to-one, since if L(a
1
, a
2
) = L(b
1
, b
2
), then a
1
+ a
2
x
2
= b
1
+ b
2
x
2
and hence
a
1
= b
1
and a
2
= b
2
. L is not onto, since there is no a R
2
such that L(a) = x.
EXAMPLE 2 Let L : M
22
(R) R be the linear mapping dened by L(A) = tr A. Then L is not
one-to-one, since L
__
1 0
0 1
__
= L
__
0 0
0 0
__
, but
_
1 0
0 1
_

_
0 0
0 0
_
. L is onto, since if
we pick any x R, then we can nd a matrix A M
22
(R), for example A =
_
x 0
0 0
_
,
such that L (A) = x.
Observe that a linear mapping being one-to-one means that each w Range(L) is
mapped to by exactly one v V while L being onto means that Range(L) = W.
So there is a relationship between a mapping being onto and its range. We now
establish a relationship between a mapping being one-to-one and its kernel. These
relationships will be exploited in some proofs below.
Section 8.4 Isomorphisms 25
LEMMA 1 Let L : V Wbe a linear mapping. L is one-to-one if and only if Ker(L) =

0.
Proof: Assume L is one-to-one. Let x Ker(L). Then L(x) =

0 = L(

0) which
implies that x =

0 since L is one-to-one. Hence Ker(L) =

0.
Assume Ker(L) =

0 and consider L(u) = L(v). Then

0 = L(u) L(v) = L(u v)


Hence, u v Ker(L), so u v =

0. Therefore, u = v and so L is one-to-one.


EXAMPLE 3 Let L : R
3
P
5
(R) be the linear mapping dened by
L(a, b, c) = a(1 + x
3
) + b(1 + x
2
+ x
5
) + cx
4
Prove that L is one-to-one, but not onto.
Solution: Let a =
_

_
a
b
c
_

_
Ker(L). Then,
0 = L(a, b, c) = a(1 + x
3
) + b(1 + x
2
+ x
5
) + cx
4
This gives a = b = c = 0, so Ker(L) =
_

_
_

_
0
0
0
_

_
_

_
. Hence, L is one-to-one by Lemma 1.
Observe that a basis for Range(L) is 1+x
3
, 1+x
2
+x
5
, x
4
. Hence, dim(Range(L)) =
3 < dimP
5
(R). Thus, Range(L) P
5
(R), so L is not onto.
EXAMPLE 4 Let L : M
22
(R) R
2
be the linear mapping dened by
L
__
a b
c d
__
=
_
a
b
_
Prove that L is onto, but not one-to-one.
Solution: Observe that
_
0 0
1 1
_
Ker(L). Thus, L is not one-to-one by Lemma 1.
Let
_
x
1
x
2
_
R
2
. Then observe that L
__
x
1
x
2
0 0
__
=
_
x
1
x
2
_
, so L is onto.
EXERCISE 1 Let L : R
3
P
2
(R) be the linear mapping dened by L(a, b, c) = a +bx +cx
2
. Prove
that L is one-to-one and onto.
26 Chapter 8 Linear Mappings
Isomorphisms
To prove that two vector spaces V and W have the same structure we need to relate
each vector v
i
V with a unique vector w
i
W such that if av
1
+ bv
2
= v
3
, then
a w
1
+ b w
2
= w
3
. Thus, we will require a linear mapping from V to Wwhich is both
one-to-one and onto.
DEFINITION
Isomorphism
Isomorphic
Let V and W be vector spaces. Then V is said to be isomorphic to W if there
exists a linear mapping L : V W which is one-to-one and onto. L is called an
isomorphism from V to W.
EXAMPLE 5 Show that R
4
is isomorphic to M
22
(R).
Solution: We need to invent a linear mapping L fromR
4
to M
22
(R) that is one-to-one
and onto. To dene such a linear mapping, we think about how to relate vectors in
R
4
to vectors in M
22
(R). Keeping in mind our work at the beginning of this section,
we dene the mapping L : R
4
M
22
(R) by
L(a
0
, a
1
, a
2
, a
3
) =
_
a
0
a
1
a
2
a
3
_
Linear: Let a =
_

_
a
0
a
1
a
2
a
3
_

_
,

b =
_

_
b
0
b
1
b
2
b
3
_

_
R
4
and s, t R. Then L is linear since
L(sa + t

b) = L(sa
0
+ tb
0
, sa
1
+ tb
1
, sa
2
+ tb
2
, sa
3
+ tb
3
)
=
_
sa
0
+ tb
0
sa
1
+ tb
1
sa
2
+ tb
2
sa
3
+ tb
3
_
= s
_
a
0
a
1
a
2
a
3
_
+ t
_
b
0
b
1
b
2
b
3
_
= sL(a) + tL(

b)
One-To-One: If L(a) = L(

b), then
_
a
0
a
1
a
2
a
3
_
=
_
b
0
b
1
b
2
b
3
_
which implies a
i
= b
i
for
i = 0, 1, 2, 3. Therefore, a =

b and so L is one-to-one.
Onto: Pick
_
a
0
a
1
a
2
a
3
_
M
22
(R). Then L(a
0
, a
1
, a
2
, a
3
) =
_
a
0
a
1
a
2
a
3
_
, so L is onto.
Hence, L is an isomorphism and so R
4
is isomorphic to M
22
(R).
It is instructive to think carefully about the isomorphism in the example above. Ob-
serve that the images of the standard basis vectors for R
4
are the standard basis vec-
tors for M
22
(R). That is, the isomorphism is mapping basis vectors to basis vectors.
We will keep this in mind when constructing an isomorphism in the next example.
Section 8.4 Isomorphisms 27
EXAMPLE 6 Let T = p(x) P
2
(R) p(1) = 0 and S =
_

_
_

_
a
b
c
_

_
R
4
a + b + c = 0
_

_
. Prove that T
is isomorphic to S.
Solution: By the factor theorem, every vector p(x) T has the form (x 1)(a + bx).
Hence, a basis for T is J = 1 x, x x
2
. Also, every vector x S has the form
x =
_

_
a
b
a b
_

_
= a
_

_
1
0
1
_

_
+ b
_

_
0
1
1
_

_
Therefore, a basis for S is ( =
_

_
_

_
1
0
1
_

_
,
_

_
0
1
1
_

_
_

_
. So, to dene an isomorphism we want
to map the vectors in J to the vectors in (. In particular, we dene L : T S by
L
_
a(1 x) + b(x x
2
)
_
= a
_

_
1
0
1
_

_
+ b
_

_
0
1
1
_

_
Linear: Let p(x), q(x) T with p(x) = a
1
(1 x) + b
1
(x x
2
) and
q(x) = a
2
(1 x) + b
2
(x x
2
). Then for any s, t R we get
L(sp(x) + tq(x)) = L
_
s(a
1
+ ta
2
)(1 x) + (sb
1
+ tb
2
)(x x
2
)
_
= s(a
1
+ ta
2
)
_

_
1
0
1
_

_
+ (sb
1
+ tb
2
)
_

_
0
1
1
_

_
= s
_

_
a
1
_

_
1
0
1
_

_
+ b
1
_

_
0
1
1
_

_
_

_
+ t
_

_
a
2
_

_
1
0
1
_

_
+ b
2
_

_
0
1
1
_

_
_

_
= sL(p(x)) + tL(q(x))
Thus L is linear.
One-To-One: Let a(1 x) + b(x x
2
) Ker(L). Then
_

_
0
0
0
_

_
= L
_
a(1 x) + b(x x
2
)
_
= a
_

_
1
0
1
_

_
+ b
_

_
0
1
1
_

_
=
_

_
a
b
a b
_

_
Hence, a = b = 0, so Ker(L) = 0. Therefore, by Lemma 1, L is one-to-one.
Onto: Let a
_

_
1
0
1
_

_
+ b
_

_
0
1
1
_

_
S. Then
L
_
a(1 x) + b(x x
2
)
_
= a
_

_
1
0
1
_

_
+ b
_

_
0
1
1
_

_
so L is onto.
Thus, L is an isomorphism and T is isomorphic to S.
28 Chapter 8 Linear Mappings
EXERCISE 2 Prove that P
2
(R) is isomorphic to R
3
.
Observe in the examples above that the isomorphic vector spaces have the same di-
mension. This makes sense since if we are mapping basis vectors to basis vectors,
then both vector spaces need to have the same number of basis vectors. We now
prove that this is indeed always the case.
THEOREM 2 Let V and Wbe nite dimensional vector spaces. Then V is isomorphic to Wif and
only if dimV = dimW.
Proof: On one hand, assume that dimV = dimW = n. Let J = v
1
, . . . , v
n
be a
basis for V and ( = w
1
, . . . , w
n
be a basis for W. As in our examples above, we
dene a mapping L : V Wwhich maps the basis vectors in J to the basis vectors
in (. So, we dene
L(t
1
v
1
+ + t
n
v
n
) = t
1
w
1
+ + t
n
w
n
Linear: Let x = a
1
v
1
+ + a
n
v
n
, y = b
1
v
1
+ + b
n
v
n
V and s, t R. Then
L(sx + ty) = L
_
(sa
1
+ tb
1
)v
1
+ + (sa
n
+ tb
n
)v
n
_
= (sa
1
+ tb
1
) w
1
+ + (sa
n
+ tb
n
) w
n
= s(a
1
w
1
+ + a
n
w
n
) + t(b
1
w
1
+ + b
n
w
n
)
= sL(x) + tL(y)
Therefore, L is linear.
One-To-One: If v Ker(L), then

0 = L(v) = L(c
1
v
1
+ + c
n
v
n
) = c
1
L(v
1
) + + c
n
L(v
n
) = c
1
w
1
+ + c
n
w
n
( is linearly independent so we get c
1
= = c
n
= 0. Therefore Ker(L) =

0 and
hence L is one-to-one.
Onto: We have Ker(L) =

0 since L is one-to-one. Thus, we get rank(L) = dimV


0 = n by the Rank-Nullity Theorem. Consequently, Range(L) is an n-dimensional
subspace of Wwhich implies that Range(L) = W, so L is onto.
Thus, L is an isomorphism from V to Wand so V is isomorphic to W.
On the other hand, assume that V is isomorphic to W. Then there exists an isomor-
phism L from V to W. Since L is one-to-one and onto, we get Range(L) = W and
Ker(L) =

0. Thus, the Rank-Nullity Theorem gives


dimW = dim(Range(L)) = rank(L) = dimV nullity(L) = dimV
as required.
Section 8.4 Isomorphisms 29
The proof not only proves the theorem, but it demonstrates a few additional facts.
First, it shows the intuitively obvious fact that if V is isomorphic to W, then W is
isomorphic to V. Typically we just say that V and W are isomorphic. Second, it
conrms that we can make an isomorphism from V to W by mapping basis vectors
of V to basis vectors to W. Finally, observe that once we had proven that L was
one-to-one, we could exploit the Rank-Nullity Theorem and that dimV = dimW to
immediately get that the mapping was onto. In particular, it shows us how to prove
the following theorem.
THEOREM 3 If V and Ware isomorphic vector spaces and L : V Wis linear, then L is one-to-
one if and only if L is onto.
REMARK
Note that if V and W are both n-dimensional, then it does not mean every linear
mapping L : V W must be an isomorphism. For example, L : R
4
M
22
(R)
dened by L(x
1
, x
2
, x
3
, x
4
) =
_
0 0
0 0
_
is denitely not one-to-one nor onto!
We saw that we can make an isomorphism by mapping basis vectors to basis vectors.
The following theorem shows that this property actually characterizes isomorphisms.
THEOREM 4 Let V and Wbe isomorphic vector spaces and let v
1
, . . . , v
n
be a basis for V. Then
a linear mapping L : V Wis an isomorphism if and only if L(v
1
), . . . , L(v
n
) is a
basis for W.
The following exercise is intended to be quite challenging. It requires complete un-
derstanding of what we have done in this chapter.
EXERCISE 3 Use the following steps to nd a basis for the vector space L of all linear operators
L : V V on an 2-dimensional vector space V.
1. Prove that L is isomorphic to M
22
(R) by constructing an isomorphism T.
2. Find an isomorphismT
1
from M
22
(R) to L. Let e
1
, e
2
, e
3
, e
4
denote the stan-
dard basis vectors for M
22
(R). Calculate T
1
(e
1
), T
1
(e
2
), T
1
(e
3
), T
1
(e
4
).
By Theorem 4 these vectors for a basis for L.
3. Demonstrate that you are correct by proving that the set
T
1
(e
1
), T
1
(e
2
), T
1
(e
3
), T
1
(e
4
) is linearly independent and spans L.
30 Chapter 8 Linear Mappings
Section 8.4 Problems
1. For each of the following pairs of vector spaces, dene an explicit isomor-
phism to establish that the spaces are isomorphic. Prove that your map is an
isomorphism.
(a) P
1
(R) and R
2
(b) P
3
(R) and M
22
(R)
(c) L : P
2
(R) P
2
(R) dened by L(a + bx + cx
2
) = (a b) + (bc)x
2
(d) The vector space P = p(x) P
3
p(1) = 0 and the vector space U of
2 2 upper triangular matrices
(e) The vector space S =
_

_
_

_
x
1
x
2
0
x
3
_

_
R
4
x
1
x
2
+ x
3
= 0
_

_
and the vector space
U = a + bx + cx
2
P
2
a = b
(f) Any n-dimensional vector space V and P
n1
(R).
2. Let V and W be n-dimensional vector spaces and let L : V W be a linear
mapping. Prove that L is one-to-one if and only if L is onto.
3. Let L : V Wbe an isomorphism and let v
1
, . . . , v
n
be a basis for V. Prove
that L(v
1
), . . . , L(v
n
) is a basis for W.
4. Let L : V U and M : U Wbe linear mappings.
(a) Prove that if L and M are onto, then M L is onto.
(b) Give an example where L is not onto, but M L is onto.
(c) Is it possible to give an example where M is not onto, but M L is onto?
Explain.
5. Let V and Wbe vector spaces with dimV = n and dimW = m, let L : V W
be a linear mapping, and let A be the matrix of L with respect to bases J for V
and ( for W.
(a) Dene an explicit isomorphismfrom Range(L) to Col(A). Prove that your
map is an isomorphism.
(b) Use (a) to prove that rank(L) = rank(A).
6. Let U and V be nite dimensional vector spaces and L : U V and M : V
U both be one-to-one linear mappings. Prove that U and V are isomorphic.
Chapter 9
Inner Products
In Math 136, we looked at the dot product function for vectors in R
n
. We saw that
the dot product function has an important relationship to the length of a vector and
to the angle between two vectors. Moreover, the dot product gave us an easy way
of nding the projection of a vector onto a plane. In Chapter 8, we saw that every
n-dimensional vector space is identical to R
n
in terms of being a vector space. Thus,
we should be able to extend these ideas to general vector spaces.
9.1 Inner Product Spaces
Recall that we dened a vector space so that the operations of addition and scalar
multiplication had the essential properties of addition and scalar multiplication of
vectors in R
n
. Thus, to generalize the concept of the dot product to general vector
spaces, it makes sense to include the essential properties of the dot product in our
denition.
DEFINITION
Inner Product
Let V be a vector space. An inner product on V is a function ( , ) : V V R that
has the following properties: for every v, u, w V and s, t R we have
I1 (v, v) 0 and (v, v) = 0 if and only if v =

0 (Positive Denite)
I2 (v, w) = ( w, v) (Symmetric)
I3 (sv + tu, w) = s(v, w) + t(u, w) (Bilinear)
DEFINITION
Inner Product
Space
A vector space V with an inner product ( , ) on V is called an inner product space.
In the same way that a vector space is dependent on the dened operations of addition
and scalar multiplication, an inner product space is dependent on the denitions of
addition, scalar multiplication, and the inner product. This will be demonstrated in
the examples below.
31
32 Chapter 9 Inner Products
EXAMPLE 1 The dot product is an inner product on R
n
, called the standard inner product on R
n
.
EXAMPLE 2 Which of the following denes an inner product on R
3
?
(a)
_
x, y
_
= x
1
y
1
+ 2x
2
y
2
+ 4x
3
y
3
.
Solution: We have
_
x, x
_
= x
2
1
+ 2x
2
2
+ 4x
2
3
0
and
_
x, x
_
= 0 if and only if x =

0. Hence, it is positive denite.


_
x, y
_
= x
1
y
1
+ 2x
2
y
2
+ 4x
3
y
3
= y
1
x
1
+ 2y
2
x
2
+ 4y
3
x
3
=
_
y, x
_
Therefore, it is symmetric. Also, it is bilinear since
_
sx + ty, z
_
= (sx
1
+ ty
1
)z
1
+ 2(sx
2
+ ty
2
)z
2
+ 4(sx
3
+ ty
3
)z
3
= s(x
1
z
1
+ 2x
2
z
2
+ 4x
3
z
3
) + t(y
1
z
1
+ 2y
2
z
2
+ 4y
3
z
3
)
= s
_
x, z
_
+ t
_
y, z
_
Thus ( , ) is an inner product.
(b)
_
x, y
_
= x
1
y
1
x
2
y
2
+ x
3
y
3
.
Solution: Observe that if x =
_

_
0
1
0
_

_
, then
_
x, x
_
= 0(0) 1(1) + 0(0) = 1 < 0
So, it is not positive denite and thus it is not an inner product.
(c)
_
x, y
_
= x
1
y
1
+ x
2
y
2
.
Solution: If x =
_

_
0
0
1
_

_
, then
_
x, x
_
= 0(0) + 0(0) = 0
but x

0. So, it is not positive denite and thus it is not an inner product.
REMARK
A vector space can have innitely many dierent inner products. However, dierent
inner products do not necessarily dier in an interesting way. For example, in R
n
all
inner products behave like the dot product with respect to some basis.
Section 9.1 Inner Product Spaces 33
EXAMPLE 3 On the vector space M
22
(R), dene ( , ) by
(A, B) = tr(B
T
A)
where tr is the trace of a matrix dened as the sum of the diagonal entires. Show that
( , ) is an inner product on M
22
(R).
Solution: Let A, B, C M
22
(R) and s, t R. Then:
(A, A) = tr(A
T
A) = tr
__
a
11
a
21
a
12
a
22
_ _
a
11
a
12
a
21
a
22
__
= tr
__
a
2
11
+ a
2
21
a
11
a
12
+ a
21
a
22
a
11
a
12
+ a
21
a
22
a
2
12
+ a
2
22
__
= a
2
11
+ a
2
21
+ a
2
12
+ a
2
22
0
Moreover, (A, A) = 0 if and only if A is the zero matrix. Hence, ( , ) is positive
denite.
(A, B) = tr(B
T
A) = tr
__
b
11
b
21
b
12
b
22
_ _
a
11
a
12
a
21
a
22
__
= tr
__
a
11
b
11
+ a
21
b
21
b
11
a
12
+ b
21
a
22
a
11
b
12
+ a
21
b
22
a
12
b
12
+ a
22
b
22
__
= a
11
b
11
+ a
12
b
12
+ a
21
b
21
+ a
22
b
22
With a similar calculation we nd that
(B, A) = a
11
b
11
+ a
12
b
12
+ a
21
b
21
+ a
22
b
22
= (A, B)
Therefore, ( , ) is symmetric. In Section 8.1 we proved that tr is a linear operator.
Hence, we get
(sA + tB, C) = tr(C
T
(sA + tB)) = tr(sC
T
A + tC
T
B)
= s tr(C
T
A) + t tr(C
T
B) = s (A, C) + t (B, C)
Thus, ( , ) is also bilinear, and so it is an inner product on M
22
(R).
REMARKS
1. Of course, this can be generalized to an inner product (A, B) = tr(B
T
A) on
M
mn
(R). This is called the standard inner product on M
mn
(R).
2. Observe that calculating (A, B) corresponds exactly to nding the dot product
on the isomorphic vectors in R
mn
. As a result, when using this inner product
you do not actually have to compute B
T
A.
34 Chapter 9 Inner Products
EXAMPLE 4 Prove that the function ( , ) dened by
(p(x), q(x)) = p(1)q(1) + p(0)q(0) + p(1)q(1)
is an inner product on P
2
(R).
Solution: Let p(x) = a + bx + cx
2
, q(x), r(x) P
2
(R). Then:
(p(x), p(x)) = [p(1)]
2
+ [p(0)]
2
+ [p(1)]
2
= [a b + c]
2
+ [a]
2
+ [a + b + c]
2
0
Moreover, (p(x), p(x)) = 0 if and only if a b + c = 0, a = 0, and a + b + c = 0,
which implies a = b = c = 0. Hence, ( , ) is positive denite. We have
(p(x), q(x)) = p(1)q(1) + p(0)q(0) + p(1)q(1)
= q(1)p(1) + q(0)p(0) + q(1)p(1)
= (q(x), p(x))
Hence, ( , ) is symmetric. Finally, we get
((sp + tq)(x), r(x)) =(sp + tq)(1)r(1) + (sp + tq)(0)r(0) + (sp + tq)(1)r(1)
=[sp(1) + tq(1)]r(1) + [sp(0) + tq(0)]r(0)+
+ [sp(1) + tq(1)]r(1)
=s[p(1)r(1) + p(0)r(0) + p(1)r(1)]+
+ t[q(1)r(1) + q(0)r(0) + q(1)r(1)]
=s (p(x), r(x)) + t (q(x), r(x))
Therefore, ( , ) is also bilinear and hence is an inner product on P
2
(R).
EXAMPLE 5 An extremely important inner product in applied mathematics, physics, and engineer-
ing is the inner product
( f (x), g(x)) =
_

f (x)g(x) dx
on the vector space C[, ] of continuous functions dened on the closed interval
from to . This is the foundation for Fourier Series.
THEOREM 1 Let V be an inner product space with inner product ( , ). Then for any v V we have
_
v,

0
_
= 0
Section 9.1 Inner Product Spaces 35
REMARK
Whenever one considers an inner product space one must dene which inner product
they are using. However, in R
n
and M
mn
(R) one generally uses the standard inner
product. Thus, whenever we are working in R
n
or M
mn
(R), if no other inner product
is mentioned, we will take this to mean that the standard inner product is being used.
Section 9.1 Problems
1. Calculate the following inner products under the standard inner product on
M
22
(R).
(a)
__
1 2
0 1
_
,
_
1 2
0 1
__
(b)
__
3 4
2 5
_
,
_
4 1
2 1
__
(c)
__
2 3
4 1
_
,
_
5 3
1 5
__
2. Calculate the following given that the function
(p(x), q(x)) = p(0)q(0) + p(1)q(1) + p(2)q(2)
denes an inner product on P
2
(R).
(a) (x, 1 + x + x
2
) (b) (1 + x
2
, 1 + x
2
) (c) (x + x
2
, 1 + x + x
2
)
3. Determine, with proof, which of the following functions denes an inner prod-
uct on the given vector space.
(a) On R
3
, (x, y) = 2x
1
y
1
+ 3x
2
y
2
+ 4x
3
y
3
(b) On P
2
(R), (p(x), q(x)) = p(1)q(1) + p(1)q(1)
(c) On M
22
(R),
__
a
1
a
2
a
3
a
4
_
,
_
b
1
b
2
b
3
b
4
__
= a
1
b
1
a
4
b
4
+ a
2
b
2
a
3
b
3
(d) On P
2
(R), (p(x), q(x)) = p(2)q(2) + p(1)q(1) + p(2)q(2)
(e) On R
3
, (x, y) = 2x
1
y
1
x
1
y
2
x
2
y
1
+ 2x
2
y
2
+ x
3
y
3
(f) On P
2
(R), (p(x), q(x)) = 2p(1)q(1)+2p(0)q(0)+2p(1)q(1)p(1)q(1)
p(1)q(1)
4. Let V be an inner product space with inner product ( , ). Prove that (v,

0) = 0
for any v V.
5. Let ( , ) denote the standard inner product on R
n
and let A M
nn
(R). Prove
that
(Ax, y) = (x, A
T
y)
for all x, y R
n
. [HINT: Recall that x
T
y = x y.]
36 Chapter 9 Inner Products
9.2 Orthogonality and Length
The concepts of length and orthogonality are fundamental in geometry and have
many real world applications. Thus, we now extend these concepts to general in-
ner product spaces.
Length
DEFINITION
Length
Let Vbe an inner product space with inner product ( , ). Then for any v Vwe dene
the length (or norm) of v by
jv j =
_
_
v, v
_
Observe that the length of a vector in an inner product space is dependent on the
denition of the inner product being used in the same way that the sum of two vectors
is dependent on the denition of addition being used. This is demonstrated in the
examples below.
EXAMPLE 1 Find the length of x =
_

_
1
0
2
_

_
in R
3
with the inner product
_
x, y
_
= x
1
y
1
+ 2x
2
y
2
+ 4x
3
y
3
.
Solution: We have
jx j =
_
_
x, x
_
=
_
(1)(1) + 2(0)(0) + 4(2)(2) =

17
Observe that under the standard inner product the length of x would have been

5.
EXAMPLE 2 Let p(x) = x and q(x) = 2 3x
2
be vectors in P
2
(R) with inner product
(p(x), q(x)) = p(1)q(1) + p(0)q(0) + p(1)q(1)
Calculate the length of p(x) and the length of q(x).
Solution: We have
jp(x)j =
_
(p(x), p(x)) =
_
p(1)p(1) + p(0)p(0) + p(1)p(1)
=
_
(1)(1) + 0(0) + 1(1) =

2
jq(x)j =
_
(q(x), q(x)) =
_
q(1)q(1) + q(0)q(0) + q(1)q(1)
=
_
(1)(1) + 2(2) + (1)(1) =

6
Section 9.2 Orthogonality and Length 37
EXAMPLE 3 Find the length of p(x) = x in P
2
(R) with inner product
(p(x), q(x)) = p(0)q(0) + p(1)q(1) + p(2)q(2)
Solution: We have
jp(x)j =
_
(p(x), p(x)) =
_
p(0)p(0) + p(1)p(1) + p(2)p(2)
=
_
0(0) + 1(1) + 2(2) =

5
EXAMPLE 4 On C[, ] nd the length of f (x) = x under the inner product
( f (x), g(x)) =
_

f (x)g(x) dx
Solution: We have
jx j
2
=
_

x
2
dx =
1
3
x
3

=
2
3

3
Thus, jx j =
_
2
3
/3.
EXAMPLE 5 Find the length of A =
_
1/2 1/2
1/2 1/2
_
in M
22
(R).
Solution: Using the standard inner product we get
jAj =
_
(A, A) =
_
(1/2)
2
+ (1/2)
2
+ (1/2)
2
+ (1/2)
2
= 1
In many cases it is useful (and sometimes necessary) to use vectors which have length
one.
DEFINITION
Unit Vector
Let V be an inner product space with inner product ( , ). If v V is a vector such that
jv j = 1, then v is called a unit vector.
In many situations, we will be given a general vector v V and need to nd a unit
vector in the direction of v. This is called normalizing the vector. The following
theorem shows us how to do this.
THEOREM 1 Let V be an inner product space with inner product ( , ). Then for any v V and
t R, we have
jtv j = t jv j
Moreover, if v

0, then v =
1
jv j
v is a unit vector in the direction of v.
38 Chapter 9 Inner Products
EXAMPLE 6 Find a unit vector in the direction of p(x) = x in P
2
(R) with inner product
(p(x), q(x)) = p(0)q(0) + p(1)q(1) + p(2)q(2)
Solution: We have
jp(x)j =
_
p(0)p(0) + p(1)p(1) + p(2)p(2) =
_
0(0) + 1(1) + 2(2) =

5
Thus, a unit vector in the direction of p is
p(x) =
1

5
x
Orthogonality
DEFINITION
Orthogonal
Vectors
Let V be an inner product space with inner product ( , ). If x, y V such that
_
x, y
_
= 0
then x and y are said to be orthogonal.
EXAMPLE 7 The standard basis vectors for R
2
are not orthogonal under the inner product
_
x, y
_
= 2x
1
y
1
+ x
1
y
2
+ x
2
y
1
+ 2x
2
y
2
for R
2
since
__
1
0
_
,
_
0
1
__
= 2(1)(0) + 1(1) + 0(0) + 2(0)(1) = 1
EXAMPLE 8 Show that
_
1 2
3 1
_
is orthogonal to
_
1 2
1 0
_
in M
22
(R).
Solution:
__
1 2
3 1
_
,
_
1 2
1 0
__
= (1)(1) + 2(2) + (1)(3) + 0(1) = 0.
So, they are orthogonal.
DEFINITION
Orthogonal Set
If S = v
1
, . . . , v
k
is a set of vectors in an inner product space V with inner product
( , ) such that
_
v
i
, v
j
_
= 0 for all i j, then S is called an orthogonal set.
Section 9.2 Orthogonality and Length 39
EXAMPLE 9 Prove that 1, x, 3x
2
2 is an orthogonal set of vectors in P
2
(R) under the inner
product (p(x), q(x)) = p(1)q(1) + p(0)q(0) + p(1)q(1).
Solution: We have
(1, x) = 1(1) + 1(0) + 1(1) = 0
_
1, 3x
2
2
_
= 1(1) + 1(2) + 1(1) = 0
_
x, 3x
2
2
_
= (1)(1) + 0(2) + 1(1) = 0
Hence, each vector is orthogonal to all other vectors so the set is orthogonal.
EXAMPLE 10 Prove that 1, x, 3x
2
2 is not an orthogonal set of vectors in P
2
(R) under the inner
product (p(x), q(x)) = p(0)q(0) + p(1)q(1) + p(2)q(2).
Solution: We have (1, x) = 1(0) + 1(1) + 1(2) = 3 so 1 and x are not orthogonal.
Thus, the set is not orthogonal.
EXAMPLE 11 Prove that
__
1 2
0 1
_
,
_
0 1
0 2
_
,
_
0 0
1 0
__
is an orthogonal set in M
22
(R).
Solution: We have
__
1 2
0 1
_
,
_
0 1
0 2
__
= 1(0) + 2(1) + 0(0) + 1(2) = 0
__
1 2
0 1
_
,
_
0 0
1 0
__
= 1(0) + 2(0) + 0(1) + 1(0) = 0
__
0 0
1 0
_
,
_
0 1
0 2
__
= 0(0) + 0(1) + 1(0) + 0(2) = 0
Hence, the set is orthogonal.
EXERCISE 1 Determine if
__
3 1
1 2
_
,
_
1 3
2 1
_
,
_
1 2
3 1
__
is an orthogonal set in M
22
(R).
One important property of orthogonal vectors in R
2
is the Pythagorean Theorem.
This extends to an orthogonal set in an inner product space.
THEOREM 2 If v
1
, . . . , v
k
is an orthogonal set in an inner product space V, then
jv
1
+ +v
k
j
2
= jv
1
j
2
+ + jv
k
j
2
40 Chapter 9 Inner Products
The following theorem gives us an important result about orthogonal sets which do
not contain the zero vector.
THEOREM 3 Let S = v
1
, . . . , v
k
be an orthogonal set in an inner product space V with inner
product ( , ) such that v
i


0 for all 1 i k. Then S is linearly independent.
Proof: Consider

0 = c
1
v
1
+ + c
k
v
k
Take the inner product of both sides with v
i
to get
_
v
i
,

0
_
=
_
v
i
, c
1
v
1
+ + c
k
v
k
_
0 = c
1
_
v
i
, v
1
_
+ + c
i
_
v
i
, v
i
_
+ + c
k
_
v
i
, v
k
_
= 0 + + 0 + c
i
jv
i
j
2
+ 0 + + 0
= c
i
jv
i
j
2
Since v
i


0 we have that jv
i
j 0, and so c
i
= 0 for 1 i k. Therefore, S is
linearly independent.
Consequently, if we have an orthogonal set of non-zero vectors which spans an inner
product space V, then it will be a basis for V. We will call this an orthogonal basis.
Since the vectors in the basis are all orthogonal to each other, our geometric intuition
tells us that it should be quite easy to nd the coordinates of any vector with respect
to this basis. The following theorem demonstrates this.
THEOREM 4 If S = v
1
, . . . , v
n
is an orthogonal basis for an inner product space V with inner
product ( , ) and v V, then the coecient of v
i
when v is written as a linear combi-
nation of the vectors in S is
_
v, v
i
_
jv
i
j
2
. In particular,
v =
_
v, v
1
_
jv
1
j
2
v
1
+ +
_
v, v
n
_
jv
n
j
2
v
n
Proof: Since v V we can write
v = c
1
v
1
+ + c
n
v
n
Taking the inner product of both sides with v
i
gives
_
v
i
, v
_
=
_
v
i
, c
1
v
1
+ + c
n
v
n
_
= c
1
_
v
i
, v
1
_
+ + c
i
_
v
i
, v
i
_
+ + c
n
_
v
i
, v
n
_
= 0 + + 0 + c
i
jv
i
j
2
+ 0 + + 0
since S is orthogonal. Also, v
i


0 which implies that jv
i
j
2
0. Therefore, we get
c
i
=
_
v, v
i
_
jv
i
j
2
This is valid for 1 i n and so the result follows.
Section 9.2 Orthogonality and Length 41
EXAMPLE 12 Find the coordinates of x =
_

_
1
2
3
_

_
in R
3
with respect to the orthogonal basis
J =
_

_
_

_
1
1
1
_

_
,
_

_
1
0
1
_

_
,
_

_
1
2
1
_

_
_

_
.
Solution: By Theorem 4 we have
c
1
=
_
x, v
1
_
jv
1
j
2
=
0
3
= 0
c
2
=
_
x, v
2
_
jv
2
j
2
=
4
2
= 2
c
3
=
_
x, v
3
_
jv
3
j
2
=
6
6
= 1
Thus, [x]
J
=
_

_
0
2
1
_

_
.
Note that it is easy to verify that
_

_
1
2
3
_

_
= 0
_

_
1
1
1
_

_
+ 2
_

_
1
0
1
_

_
+ (1)
_

_
1
2
1
_

_
EXAMPLE 13 Given that J =
__
1 1
1 1
_
,
_
2 0
1 1
_
,
_
0 0
1 1
__
is an orthogonal basis for a subspace S
of M
22
(R), nd the coordinates of A =
_
4 2
3 1
_
with respect to J.
Solution: We have
c
1
=
_
A, v
1
_
jv
1
j
2
=
4(1) + 2(1) + 3(1) + (1)(1)
1
2
+ 1
2
+ 1
2
+ 1
2
=
8
4
= 2
c
2
=
_
A, v
2
_
jv
2
j
2
=
4(2) + 2(0) + 3(1) + (1)(1)
(2)
2
+ 0
2
+ 1
2
+ 1
2
=
6
6
= 1
c
3
=
_
A, v
3
_
jv
3
j
2
=
4(0) + 2(0) + 3(1) + (1)(1)
(0)
2
+ 0
2
+ 1
2
+ (1)
2
=
4
2
= 2
Thus, [A]
J
=
_

_
2
1
2
_

_
.
42 Chapter 9 Inner Products
Orthonormal Bases
Observe that the formula for the coordinates with respect to an orthogonal basis
would be even easier if all the basis vectors were unit vectors. Thus, it is desirable to
consider such bases.
DEFINITION
Orthonormal Set
If S = v
1
, . . . , v
k
is an orthogonal set in an inner product space V such that jv
i
j = 1
for 1 i k, then S is called an orthonormal set.
By Theorem 3 an orthonormal set is necessarily linearly independent.
DEFINITION
Orthonormal
Basis
A basis for an inner product space V which is an orthonormal set is called an
orthonormal basis of V.
EXAMPLE 14 The standard basis of R
n
is an orthonormal basis under the standard inner product.
EXAMPLE 15 The standard basis for R
3
is not an orthonormal basis under the inner product
_
x, y
_
= x
1
y
1
+ 2x
2
y
2
+ x
3
y
3
since e
2
=
_

_
0
1
0
_

_
is not a unit vector under this inner product.
EXAMPLE 16 Let J =
_

_
1

3
_

_
1
1
1
_

_
,
1

2
_

_
1
0
1
_

_
,
1

6
_

_
1
2
1
_

_
_

_
. Verify that J is an orthonormal basis in R
3
.
Solution: We rst verify that each vector is a unit vector. We have
_
1

3
_

_
1
1
1
_

_
,
1

3
_

_
1
1
1
_

_
_
=
1
3
+
1
3
+
1
3
= 1
_
1

2
_

_
1
0
1
_

_
,
1

2
_

_
1
0
1
_

_
_
=
1
2
+ 0 +
1
2
= 1
_
1

6
_

_
1
2
1
_

_
,
1

6
_

_
1
2
1
_

_
_
=
1
6
+
4
6
+
1
6
= 1
Section 9.2 Orthogonality and Length 43
We now verify that J is orthogonal.
_
1

3
_

_
1
1
1
_

_
,
1

2
_

_
1
0
1
_

_
_
=
1

6
(1 + 0 1) = 0
_
1

3
_

_
1
1
1
_

_
,
1

6
_

_
1
2
1
_

_
_
=
1

18
(1 2 + 1) = 0
_
1

2
_

_
1
0
1
_

_
,
1

6
_

_
1
2
1
_

_
_
=
1

12
(1 + 0 1) = 0
Therefore, J is an orthonormal set of 3 vectors in R
3
which implies that it is a basis
for R
3
.
EXERCISE 2 Show that J =
__
1/

2 1/

2
0 0
_
,
_
0 0
1/

2 1/

2
_
,
_
1/2 1/2
1/2 1/2
__
is an orthonormal
set in M
22
(R).
EXERCISE 3 Show that J = 1, x, x
2
is not an orthonormal set for P
2
(R) under the inner product
(p(x), q(x)) = p(2)q(2) + p(0)q(0) + p(2)q(2)
EXAMPLE 17 Turn 1, x, 3x
2
2 into an orthonormal basis for P
2
(R) under the inner product
(p(x), q(x)) = p(1)q(1) + p(0)q(0) + p(1)q(1)
Solution: In Example 9 we showed that 1, x, 3x
2
2 is orthogonal. Hence, it is a
linearly independent set of three vectors in P
2
(R) and thus is a basis for P
2
(R). To
turn it into an orthonormal basis, we just need to normalize each vector. We have
j1j =

1
2
+ 1
2
+ 1
2
=

3
jxj =
_
(1)
2
+ 0
2
+ 1
2
=

2
j3x
2
2j =
_
1
2
+ (2)
2
+ 1
2
=

6
Hence, an orthonormal basis for P
2
(R) is
_
1

3
,
x

2
,
3x
2
2

6
_
44 Chapter 9 Inner Products
EXAMPLE 18 Show that the set 1, sin x, cos x is an orthogonal set in C[, ] under
( f (x), g(x)) =
_

f (x)g(x) dx, then make it an orthonormal set.


Solution:
(1, sin x) =
_

1 sin x dx = 0
(1, cos x) =
_

1 cos x dx = 0
(sin x, cos x) =
_

sin x cos x dx = 0
Thus, the set is orthogonal. Next we nd that
(1, 1) =
_

1
2
dx = 2
(sin x, sin x) =
_

sin
2
x dx =
(cos x, cos x) =
_

cos
2
x dx =
Hence, j1j =

2, j sin xj =

, and j cos xj =

. Therefore,
_
1

2
,
sin x

,
cos x

_
is an orthonormal set in C[, ].
The following theorem shows, as predicted, that it is very easy to nd coordinates of
a vector with respect to an orthonormal basis.
THEOREM 5 If v is any vector in an inner product space V with inner product ( , ) and
J = v
1
, . . . , v
n
is an orthonormal basis for V, then
v =
_
v, v
1
_
v
1
+ +
_
v, v
n
_
v
n
Proof: By Theorem 4 we have
v =
_
v, v
1
_
jv
1
j
2
v
1
+ +
_
v, v
n
_
jv
n
j
2
v
n
=
_
v, v
1
_
v
1
+ +
_
v, v
n
_
v
n
since jv
i
j = 1 for 1 i n.
Section 9.2 Orthogonality and Length 45
REMARK
The formula for nding coordinates with respect to an orthonormal basis looks nicer
that that for an orthogonal basis. However, in practice, is it not necessarily easier to
use as the vectors in an orthonormal basis often contain square roots. Compare the
following example to Example 12.
EXAMPLE 19 Find
_

_
1
2
3
_

_
J
given that J =
_

_
1

3
_

_
1
1
1
_

_
,
1

2
_

_
1
0
1
_

_
,
1

6
_

_
1
2
1
_

_
_

_
is an orthonormal basis for R
3
.
Solution: We have
c
1
=
_
x, v
1
_
=
_
_

_
1
2
3
_

_
,
1

3
_

_
1
1
1
_

_
_
= 0
c
2
=
_
x, v
2
_
=
_
_

_
1
2
3
_

_
,
1

2
_

_
1
0
1
_

_
_
=
4

2
= 2

2
c
3
=
_
x, v
3
_
=
_
_

_
1
2
3
_

_
,
1

6
_

_
1
2
1
_

_
_
=
6

6
=

6
Thus, [x]
J
=
_

_
0
2

6
_

_
.
EXAMPLE 20 Let (p(x), q(x)) = p(1)q(1) + p(0)q(0) + p(1)q(1) be an inner product for P
2
(R).
Write f (x) = 1 + x + x
2
as a linear combination of the vectors in the orthonormal
basis J =
_
1

3
,
x

2
,
3x
2
2

6
_
.
Solution: By Theorem 5, the coordinates of x with respect to J are
c
1
=
_
f , v
1
_
= 1
_
1

3
_
+ 1
_
1

3
_
+ 3
_
1

3
_
=
5

3
c
2
=
_
f , v
2
_
= 1
_
1

2
_
+ 1(0) + 3
_
1

2
_
=

2
c
3
=
_
f , v
3
_
= 1
_
1

6
_
+ 1
_
2

6
_
+ 3
_
1

6
_
=
2

6
Thus,
1 + x + x
2
=
5

3
_
1

3
_
+

2
_
x

2
_
+
2

6
_
3x
2
2

6
_
46 Chapter 9 Inner Products
Orthogonal Matrices
We end this section by looking at a very important application of orthonormal bases
of R
n
.
THEOREM 6 For an n n matrix P, the following are equivalent.
(1) The columns of P form an orthonormal basis for R
n
.
(2) P
T
= P
1
(3) The rows of P form an orthonormal basis for R
n
.
Proof: Let P =
_
v
1
v
n
_
where J = v
1
, . . . , v
n
is an orthonormal basis for R
n
.
(1) (2): By denition of matrix multiplication we have
P
T
P =
_

_
v
T
1
.
.
.
v
T
n
_

_
_
v
1
v
n
_
=
_

_
v
1
v
1
v
1
v
2
v
1
v
n
v
2
v
1
v
2
v
2
v
2
v
n
.
.
.
.
.
.
.
.
.
.
.
.
v
n
v
1
v
n
v
2
v
n
v
n
_

_
Therefore P
T
P = I if and only if v
i
v
j
= 0 whenever i j and v
i
v
i
= 1.
(2) (3): The rows of P form an orthonormal basis for R
n
if and only if the columns
of P
T
from an orthonormal basis for R
n
. We proved above that this is true if and only
if
(P
T
)
T
= (P
T
)
1
P = (P
1
)
T
P
T
=
_
(P
1
)
T
_
T
P
T
= P
1
as required.
DEFINITION
Orthogonal
Matrix
Let P be an n n matrix such that the columns of P form an orthonormal basis for
R
n
. Then P is called an orthogonal matrix.
REMARKS
1. Be very careful to remember that an orthogonal matrix has orthonormal rows
and columns. A matrix whose columns form an orthogonal set but not an
orthonormal set is not orthogonal!
2. By Theorem 6 we could have dened an orthogonal matrix as a matrix P such
that P
T
= P
1
instead. We will use this property of orthogonal matrices fre-
quently.
Section 9.2 Orthogonality and Length 47
EXAMPLE 21 Which of the following matrices are orthogonal.
A =
_

_
0 1 0
0 0 1
1 0 0
_

_
, B
_
1/2 1/2
1/2 1/2
_
, C =
_

_
1/

2 1/

2 0
1/

6 1/

6 2/

6
1/

3 1/

3 1/

3
_

_
Solution: The columns of A are the standard basis vectors for R
3
which we know is
an orthonormal basis for R
3
. Thus, A is orthogonal.
Although the columns of B are clearly orthogonal, we have that
_
_
_
_
_
_
_
1/2
1/2
_
_
_
_
_
_
_
=
_
(1/2)
2
+ (1/2)
2
=
_
1/2 1
Thus, the rst column is not a unit vector, so B is not orthogonal.
By matrix multiplication, we nd that
CC
T
=
_

_
1/

2 1/

2 0
1/

6 1/

6 2/

6
1/

3 1/

3 1/

3
_

_
_

_
1/

2 1/

6 1/

3
1/

2 1/

6 1/

3
0 2/

6 1/

3
_

_
= I
Hence C is orthogonal.
We now look at some useful properties of orthogonal matrices.
THEOREM 7 Let P and Q be n n orthogonal matrices and x, y R
n
. Then
(1) (Px) (Py) = x y.
(2) jPx j = jx j.
(3) det P = 1.
(4) All real eigenvalues of P are 1 or 1.
(5) PQ is an orthogonal matrix.
Proof: We will prove (1) and (2) and leave the others as exercises. We have
(Px) (Py) = (Px)
T
(Py) = x
T
P
T
Py = x
T
y = x y
Thus,
jPx j
2
= (Px) (Px) = x x = jx j
2
as required.
48 Chapter 9 Inner Products
Section 9.2 Problems
1. Consider the subspace Span J in M
22
(R) where
J =
__
1 1
1 1
_
,
_
1 1
1 1
_
,
_
1 1
1 1
__
(a) Prove that J is an orthogonal basis for S.
(b) Turn J into an orthonormal basis ( by normalizing the vectors in J.
(c) Find the coordinates of x =
_
3 6
2 1
_
with respect to the orthonormal
basis (.
2. Consider the inner product ( , ) on P
2
(R) dened by
(p(x), q(x)) = p(1)q(1) + p(0)q(0) + p(1)q(1)
Let J = 1 + x, 2 + 3x, 2 3x
2

(a) Prove that J is an orthogonal basis for P


2
(R).
(b) Find the coordinates of 5 + x 3x
2
with respect to J.
3. Consider the inner product ( , ) on dened R
3
by
_
x, y
_
= x
1
y
1
+ 2x
2
y
2
+ x
3
y
3
.
Let J =
_

_
_

_
1
2
1
_

_
,
_

_
3
1
1
_

_
,
_

_
3
1
7
_

_
_

_
(a) Prove that J is an orthogonal basis for R
3
.
(b) Turn J into an orthonormal basis ( by normalizing the vectors in J.
(c) Find the coordinates of x =
_

_
1
1
1
_

_
with respect to the orthonormal basis (.
4. Which of the following matrices are orthogonal.
(a)
_
1/

10 3/

10
3/

10 1/

10
_
(b)
_
2 1
1 2
_
(c)
_

_
2/3 1/3 2/3
1/

18 4/

18 1/

18
1/

2 0 1/

2
_

_
(d)
_

_
1 1 1
2 1 0
1 1 1
_

_
(e)
_

_
0 1 0
0 0 1
1 0 0
_

_
(f)
_

_
2/

20 3/

11 0
3/

20 1/

11 1/

2
3/

20 1/

11 1/

2
_

_
5. Consider P
2
(R) with inner product (p(x), q(x)) = p(1)q(1) + p(0)q(0) +
p(1)q(1). Given that B = 1 x
2
,
1
2
(x x
2
) is an orthonormal set, extend B to
nd an orthonormal basis for P
2
(R).
6. Let V be an inner product space with inner product ( , ). Prove that for any
v V and t R, we have
jtv j = t jv j
Section 9.3 The Gram-Schmidt Procedure 49
7. Let P and Q be n n orthogonal matrices
(a) Prove that det P = 1.
(b) Prove that all real eigenvalues of P are 1.
(c) Give an orthogonal matrix P whose eigenvalues are not 1.
(d) Prove that PQ is an orthogonal matrix.
8. Given that J =
_

_
_

_
1/

6
2/

6
1/

6
_

_
,
_

_
1/

3
1/

3
1/

3
_

_
,
_

_
1/

2
0
1/

2
_

_
_

_
is an orthonormal basis
for R
3
. Using J, determine another orthonormal basis for R
3
which includes
the vector
_

_
1/

6
1/

3
1/

2
_

_
, and briey explain why your basis is orthonormal.
9. Let v
1
, . . . , v
m
be a set of m orthonormal vectors in R
n
with m < n. Let
Q =
_
v
1
v
m
_
. Prove that Q
T
Q = I
m
.
10. Let v
1
, . . . , v
n
be an orthonormal basis for an inner product space Vwith inner
product ( , ) and let x = c
1
v
1
+ + c
n
v
n
and y = d
1
v
1
+ + d
n
v
n
. Show that
(x, y) = c
1
d
1
+ + c
n
d
n
and jx j
2
= c
2
1
+ + c
2
n
9.3 The Gram-Schmidt Procedure
We saw in Math 136 that every nite dimensional vector space has a basis. In this
section, we derive an algorithm for changing a basis for a given inner product space
into an orthogonal basis. Once we have an orthogonal basis, it is easy to normalize
each vector to make it orthonormal.
Let Wbe a n-dimensional inner product space and let w
1
, . . . , w
n
be a basis W. We
want to nd an orthogonal basis v
1
, . . . , v
n
for W.
To develop a standard procedure, we will look at a few simple cases and then gener-
alize.
First, consider the case where w
1
is a basis for W. In this case, we can take v
1
= w
1
so that v
1
is an orthogonal basis for W.
Now, consider the case where w
1
, w
2
is a basis for W. Starting as in the case above
we let v
1
= w
1
. We then need to pick v
2
so that it is orthogonal to v
1
and v
1
, v
2
spans
W. To see how to nd such a vector v
2
we work backwards. Assume that we have an
orthogonal set v
1
, v
2
which spans W, then we must have that w
2
Spanv
1
, v
2
. If
this is true, then, from our work with coordinates with respect to an orthogonal basis,
we nd that
w
2
=
< w
2
, v
1
>
jv
1
j
2
v
1
+
< w
2
, v
2
>
jv
2
j
2
v
2
50 Chapter 9 Inner Products
where
< w
2
, v
2
>
jv
2
j
2
0 since w
2
is not a scalar multiple of w
1
= v
1
. Rearranging gives
< w
2
, v
2
>
jv
2
j
2
v
2
= w
2

< w
2
, v
1
>
jv
1
j
2
v
1
Multiplying by a non-zero scalar does not change orthogonality or spanning, so we
can take any scalar multiple of v
2
. For simplicity, we take
v
2
= w
2

< w
2
, v
1
>
jv
1
j
2
v
1
Consequently, we have that v
1
, v
2
is an orthogonal basis for W.
For the case where w
1
, w
2
, w
3
is a basis for W, we can repeat the same procedure.
We start by picking v
1
and v
2
as above. We then need to nd v
3
such that v
1
, v
2
, v
3

is orthogonal and spans W. Using the same argument as in the previous case, we get
v
3
= w
3

< w
3
, v
1
>
jv
1
j
2
v
1

< w
3
, v
2
>
jv
2
j
2
v
2
and v
1
, v
2
, v
3
is an orthogonal basis for W. Repeating this algorithm for the case
W = Span w
1
, . . . , w
k
gives us the following result.
THEOREM 1 (Gram-Schmidt Orthogonalization)
Let W be a subspace of an inner product space with basis w
1
, . . . , w
k
. Dene
v
1
, . . . , v
k
successively as follows:
v
1
= w
1
v
2
= w
2

< w
2
, v
1
>
jv
1
j
2
v
1
v
i
= w
i

< w
i
, v
1
>
jv
1
j
2
v
1

< w
i
, v
2
>
jv
2
j
2
v
2

< w
i
, v
i1
>
jv
i1
j
2
v
i1
for 3 i k. Then Spanv
1
, . . . , v
i
= Span w
1
, . . . , w
i
for 1 i k. In particular,
v
1
, . . . , v
k
is an orthogonal basis of W.
EXAMPLE 1 Use the Gram-Schmidt procedure to transform J =
_

_
_

_
1
1
1
_

_
,
_

_
1
1
1
_

_
,
_

_
1
1
0
_

_
_

_
into an
orthonormal basis for R
3
.
Solution: Denote the vectors in J by w
1
, w
2
, and w
3
respectively. We rst take
v
1
= w
1
=
_

_
1
1
1
_

_
. Then
w
2

_
w
2
, v
1
_
jv
1
j
2
v
1
=
_

_
1
1
1
_

1
3
_

_
1
1
1
_

_
=
_

_
2/3
4/3
2/3
_

_
Section 9.3 The Gram-Schmidt Procedure 51
We can take any scalar multiple of this, so we take v
2
=
_

_
1
2
1
_

_
to simplify the next
calculation. Next,
v
3
= w
3

_
w
3
, v
1
_
jv
1
j
2
v
1

_
w
3
, v
2
_
jv
2
j
2
v
2
=
_

_
1
1
0
_

2
3
_

_
1
1
1
_

_
+
1
6
_

_
1
2
1
_

_
=
_

_
1/2
0
1/2
_

_
Then v
1
, v
2
, v
3
is an orthogonal basis for R
3
. To get an orthonormal basis, we nor-
malize each vector. We get
v
1
=
1

3
_

_
1
1
1
_

_
v
2
=
1

6
_

_
1
2
1
_

_
v
3
=
1

2
_

_
1
0
1
_

_
So, v
1
, v
2
, v
3
is an orthonormal basis for R
3
.
EXAMPLE 2 Use the Gram-Schmidt procedure to transform J =
_

_
_

_
1
1
1
_

_
,
_

_
1
1
1
_

_
,
_

_
1
1
0
_

_
_

_
into an orthog-
onal basis for R
3
with inner product
_
x, y
_
= 2x
1
y
1
+ x
2
y
2
+ 3x
3
y
3
Solution: Take v
1
=
_

_
1
1
1
_

_
. Then
w
2

_
w
2
, v
1
_
jv
1
j
2
v
1
=
_

_
1
1
1
_

4
6
_

_
1
1
1
_

_
=
_

_
1/3
5/3
1/3
_

_
To simplify calculations we take v
2
=
_

_
1
5
1
_

_
. Next,
v
3
= w
3

_
w
3
, v
1
_
jv
1
j
2
v
1

_
w
3
, v
2
_
jv
2
j
2
v
2
=
_

_
1
1
0
_

3
6
_

_
1
1
1
_

_
+
3
30
_

_
1
5
1
_

_
=
_

_
3
5
0

2
5
_

_
Thus, an orthogonal basis is
_

_
_

_
1
1
1
_

_
,
_

_
1
5
1
_

_
,
_

_
3
0
2
_

_
_

_
.
52 Chapter 9 Inner Products
EXAMPLE 3 Find an orthogonal basis of P
3
(R) with inner product (p, q) =
_
1
1
p(x)q(x) dx by
applying the Gram-Schmidt procedure to the basis 1, x, x
2
, x
3
.
Solution: Take v
1
= 1. Then
v
2
= x
(x, 1)
j1j
2
1 = x
0
2
1 = x
v
3
= x
2

_
x
2
, 1
_
j1j
2
1
_
x
2
, x
_
jxj
2
x = x
2

2
3
2
1
0
2
3
x = x
2

1
3
We instead take v
3
= 3x
2
1 to make calculations easier. Finally, we get
v
4
= x
3

_
x
3
, 1
_
j1j
2
1
_
x
3
, x
_
jxj
2
x
_
x
3
, 3x
2
1
_
j3x
2
1j
2
(3x
2
1) = x
3

3
5
x
Hence, an orthogonal basis for this inner product space
_
1, x, x
2

1
3
, x
3

3
5
x
_
.
EXAMPLE 4 Use the Gram-Schmidt procedure to nd an orthogonal basis for the subspace W of
M
22
(R) spanned by
w
1
, w
2
, w
3
, w
4
=
__
1 1
0 1
_
,
_
1 0
1 1
_
,
_
2 1
1 2
_
,
_
1 0
0 1
__
under the standard inner product.
Solution: Take v
1
= w
1
. Then
w
2

_
w
2
, v
1
_
jv
1
j
2
v
1
=
_
1 0
1 1
_

2
3
_
1 1
0 1
_
=
_
1/3 2/3
1 1/3
_
So, we take v
2
=
_
1 2
3 1
_
. Next we get
v
3
= w
3

_
w
3
, v
1
_
jv
1
j
2
v
1

_
w
3
, v
2
_
jv
2
j
2
v
2
=
_
2 1
1 2
_

5
3
_
1 1
0 1
_

5
15
_
1 2
3 1
_
=
_
0 0
0 0
_
At rst glance, we may think something has gone wrong since the zero vector cannot
be a member of the basis. However, if we look closely, we see that this implies that
w
3
is a linear combination of w
1
and w
2
. Hence, we have
Span w
1
, w
2
, w
4
= Span w
1
, w
2
, w
3
, w
4

Therefore, we can ignore w


3
and continue the procedure using w
4
.
v
3
= w
4

_
w
4
, v
1
_
jv
1
j
2
v
1

_
w
4
, v
2
_
jv
2
j
2
v
2
=
_
1 0
0 1
_

2
3
_
1 1
0 1
_

2
15
_
1 2
3 1
_
=
_
1/5 2/5
2/5 1/5
_
Hence an orthogonal basis for Wis v
1
, v
2
, v
3
.
Section 9.4 General Projections 53
Section 9.3 Problems
1. Use the Gram-Schmidt procedure to transform J =
_

_
_

_
1
0
1
_

_
,
_

_
2
2
3
_

_
,
_

_
1
3
1
_

_
_

_
into an
orthogonal basis for R
3
.
2. Consider P
2
(R) with the inner product
(p(x), q(x)) = p(1)q(1) + p(0)q(0) + p(1)q(1)
Use the Gram-Schmidt procedure to transform S = 1, x, x
2
into an orthonor-
mal basis for P
2
(R).
3. Let J =
__
1 1
2 2
_
,
_
1 0
1 1
_
,
_
1 1
0 0
_
,
_
1 0
0 1
__
. Find an orthogonal basis for
the subspace S = Span J of M
22
(R).
4. Suppose P
2
(R) has an inner product ( , ) which satises the following:
(1, 1) = 2 (1, x) = 2 (1, x
2
) = 2
(x, x) = 4 (x, x
2
) = 2 (x
2
, x
2
) = 3
Given that J = x, 2x
2
+ x, 2 is a basis of P
2
(R). Apply the Gram-Schmidt
procedure to this basis to nd an orthogonal basis of P
2
(R).
9.4 General Projections
In Math 136 we saw how to calculate the projection of a vector onto a plane in R
3
.
Such projections have many important uses. So, our goal now is to extend projections
to general inner product spaces.
We will do this by mimicking what we did with projections in R
n
. We rst recall
from Math 136 that given a vector x R
3
and a plane P in R
3
which passes through
the origin, we wanted to write x as a the sum of a vector in P and a vector orthogonal
to every vector in P. Therefore, given a subspace Wof an inner product space V and
any vector v V, we want to nd a vector proj
W
v in W and a vector perp
W
v which
is orthogonal to every vector in Wsuch that
v = proj
W
v + perp
W
v
In the case of the plane in R
3
, we knew that the orthogonal vector was a scalar mul-
tiple of the normal vector. In the general case, we need to start by looking at the set
of vectors which are orthogonal to every vector in W.
DEFINITION
Orthogonal
Complement
Let Wbe a subspace of an inner product space V. The orthogonal complement W

of Win V is dened by
W

= v V
_
w, v
_
= 0 for all w W
54 Chapter 9 Inner Products
EXAMPLE 1 Let W = Span
_

_
_

_
1
0
0
1
_

_
,
_

_
1
1
0
0
_

_
_

_
. Find W

in R
4
.
Solution: We want to nd all v =
_

_
v
1
v
2
v
3
v
4
_

_
such that
0 =
_
_

_
v
1
v
2
v
3
v
4
_

_
,
_

_
1
0
0
1
_

_
_
= v
1
+ v
4
0 =
_
_

_
v
1
v
2
v
3
v
4
_

_
,
_

_
1
1
0
0
_

_
_
= v
1
+ v
2
Solving this homogeneous system of two equations we nd that the general solution
is v = s
_

_
0
0
1
0
_

_
+ t
_

_
1
1
0
1
_

_
, s, t R. Thus,
W

= Span
_

_
_

_
0
0
1
0
_

_
,
_

_
1
1
0
1
_

_
_

_
EXAMPLE 2 Let W = Spanx in P
2
(R) with (p(x), q(x)) =
_
1
0
p(x)q(x) dx. Find W

.
Solution: We want to nd all p(x) = a + bx + cx
2
such that (p, x) = 0. This gives
0 =
_
1
0
ax + bx
2
+ cx
3
=
1
2
a +
1
3
b +
1
4
c
So c = 2a
4
3
b. Thus every vector in the orthogonal compliment has the form
a + bx
_
2a
4
3
b
_
x
2
= a(1 2x
2
) + b
_
x
4
3
x
2
_
. Thus we get
W

= Span1 2x
2
, 3x 4x
2

The following theorem shows that for a vector x to be in W

it only need be orthog-


onal to each of the basis vectors of W.
Section 9.4 General Projections 55
THEOREM 1 Let v
1
, . . . , v
k
be a basis for a subspace Wof an inner product space V. If
_
x, v
i
_
= 0
for 1 i k, then x W

.
The following theorem gives some important properties of the othogonal compliment
which we will require later.
THEOREM 2 Let Wbe a nite dimensional subspace of an inner product space V. Then
(1) W

is a subspace of V.
(2) If dimV = n, then dimW

= n dimW.
(3) If dimV = n, then
_
W

= W.
(4) W W

0.
(5) If dimV = n, v
1
, . . . , v
k
is an orthogonal basis for W, and v
k+1
, . . . , v
n
is an
orthogonal basis for W

, then v
1
, . . . , v
k
, v
k+1
, . . . , v
n
is an orthogonal basis
for V.
Proof: For (1), we apply the Subspace Test. By denition Wis a subset of V. Also,

0 W

since (

0, w) = 0 for all w W by Theorem 3.2.1. Let u, v W

and t R.
Then for any w Wwe have
_
u +v, w
_
=
_
u, w
_
+
_
v, w
_
= 0 + 0 = 0
and
_
tu, w
_
= t
_
u, w
_
= t(0) = 0
Thus, u +v W

and tu W

, so W

is a subspace of V.
For (2), let v
1
, . . . , v
k
be an orthonormal basis for W. We can extend this to a
basis for V and then apply the Gram-Schmidt procedure to get an orthonormal basis
v
1
, . . . , v
n
for V. Then, for any x W

we have x V so we can write


x =
_
x, v
1
_
v
1
+ +
_
x, v
k
_
v
k
+
_
x, v
k+1
_
v
k+1
+ +
_
x, v
n
_
v
n
=

0 + +

0 +
_
x, v
k+1
_
v
k+1
+ +
_
x, v
n
_
v
n
Hence, W

= Spanv
k+1
, . . . , v
n
. Moreover, v
k+1
, . . . , v
n
is linearly independent
since it is orthonormal. Thus, dimW

= n k = n dimW.
For (3), if w W, then
_
w, v
_
= 0 for all v W

by denition of W

. Hence W
_
W

. So W
_
W

Also, dimW

= n dimW

= n (n dimW) = dimW.
Hence W =
_
W

.
For (4), if x W W

, then
_
x, x
_
= 0 and hence x =

0. Therefore, W W

0.
For (5), we have that v
1
, . . . , v
k
, v
k+1
, . . . , v
n
is an orthogonal set of non-zero vectors
in V since every vector in W

is orthogonal to every vector W. Moreover, it contains


n vectors by (4). Hence, we have that it is linearly independent set of n vectors. Thus,
it is an orthogonal basis for V.
56 Chapter 9 Inner Products
Property (3) may seem obvious at rst glance which may make us wonder why the
condition that dimV = n was necessary. This is because this property may not be
true in an innite dimensional inner product space! That is, we can have
_
W

W.
We demonstrate this with an example.
EXAMPLE 3 Let P be the inner product space of all real polynomials with inner product
_
a
0
+ a
1
x + + a
k
x
k
, b
0
+ b
1
x + + b
l
x
l
_
= a
0
b
0
+ a
1
b
1
+ + a
p
b
p
where p = min(k, l). Now, let U be the subspace
U =
_
g(x) P g(1) = 0
_
We claim that U

= 0. Let f (x) U

, say f (x) = a
0
+ a
1
x + + a
n
x
n
. Note that
( f , x
n+1
) = 0. Dene g(x) = f (x) f (1)x
n+1
. We get that g(1) = 0 and hence g U.
Therefore, we have
0 = ( f , g) =
_
f , f f (1)x
n+1
_
= ( f , f ) f (1)( f , x
n+1
) = ( f , f )
and hence f (x) = 0.
Of course
_
U

= P, since (

0, v) = 0 for every v V. Therefore,


_
U

U.
Notice that, as in the proof of property (3) above, we do have that U
_
U

.
We now return to our purpose of looking at orthogonal complements, to dene the
projection of a vector v onto a subspace Wof an inner product space V. We stated that
we want to nd proj
W
v and perp
W
v such that v = proj
W
v+perp
W
v with proj
W
v W
and perp
W
v W

.
Suppose that dimV = n, v
1
, . . . , v
k
is an orthogonal basis for W, and v
k+1
, . . . , v
n

is an orthogonal basis for W

. By Theorem 2 we have that v


1
, . . . , v
k
, v
k+1
, . . . , v
n
is
an orthogonal basis for V. So, for any v V we nd the coordinates with respect to
this orthogonal basis are:
v =
_
v, v
1
_
jv
1
j
2
v
1
+ +
_
v, v
k
_
jv
k
j
2
v
k
+
_
v, v
k+1
_
jv
k+1
j
2
v
k+1
+ +
_
v, v
n
_
jv
n
j
2
v
n
This gives us what we want! Taking
proj
W
v =
_
v, v
1
_
jv
1
j
2
v
1
+ +
_
v, v
k
_
jv
k
j
2
v
k
W
and
perp
W
v =
_
v, v
k+1
_
jv
k+1
j
2
v
k+1
+ +
_
v, v
n
_
jv
n
j
2
v
n
W

we get v = proj
W
v + perp
W
v with proj
W
v Wand perp
W
v W

.
Section 9.4 General Projections 57
DEFINITION
Projection
Perpendicular
Suppose Wis a k-dimensional subspace of an inner product space V and v
1
, . . . , v
k

is an orthogonal basis for W. For any v V we dene the projection of v onto Wby
proj
W
v =
_
v, v
1
_
jv
1
j
2
v
1
+ +
_
v, v
k
_
jv
k
j
2
v
k
and the perpendicular of the projection by
perp
W
v = v proj
W
v
We have dened the perpendicular of the projection in such a way that we do not
need an orthogonal basis (or any basis) for W

. To ensure this is valid, we need to


verify that perp
W
v W

.
THEOREM 3 Suppose W is a k-dimensional subspace of an inner product space V. Then for any
v V, we have
perp
W
v = v proj
W
v W

Proof: Let v
1
, . . . , v
k
be an orthogonal basis for W. Then we can write any w W
as w = c
1
v
1
+ + c
k
v
k
. This gives
_
perp
W
v, w
_
=
_
v proj
W
v, w
_
=
_
v, w
_

_
proj
W
v, w
_
Observe that
_
v, w
_
=
_
v, c
1
v
1
+ + c
k
v
k
_
= c
1
_
v, v
1
_
+ + c
k
_
v, v
k
_
and, using the fact that v
1
, . . . , v
k
is an orthogonal set, we get
_
proj
W
v, w
_
=
__
v, v
1
_
jv
1
j
2
v
1
+ +
_
v, v
k
_
jv
k
j
2
v
k
, c
1
v
1
+ + c
k
v
k
_
= c
1
_
v, v
1
_
jv
1
j
2
_
v
1
, v
1
_
+ + c
k
_
v, v
k
_
jv
k
j
2
_
v
k
, v
k
_
= c
1
_
v, v
1
_
+ + c
k
_
v, v
k
_
Thus,
_
v, w
_

_
proj
W
v, w
_
= 0
and hence perp
W
v is orthogonal to every w Wand so perp
W
v W

.
Observe that this implies that proj
W
v and perp
W
v are orthogonal for any v V as
we saw in Math 136. Additionally, these functions satisfy other properties which we
saw for projections in Math 136.
58 Chapter 9 Inner Products
THEOREM 4 Suppose Wis a k-dimensional subspace of an inner product space V. Then, proj
W
is
a linear operator on V with kernel W

.
THEOREM 5 Suppose that W is a subspace of a nite dimensional subspace V. Then, for any
v V we have
proj
W
v = perp
W
v
EXAMPLE 4 Let W = Span
__
1 0
0 1
_
,
_
1 0
0 1
__
be a subspace of M
22
(R). Find proj
W
_
1 1
2 3
_
.
Solution: Denote the vectors in the basis for W by v
1
=
_
1 0
0 1
_
and v
2
=
_
1 0
0 1
_
,
and observe that v
1
, v
2
is orthogonal. Let x =
_
1 1
2 3
_
, then
proj
W
_
1 1
2 3
_
=
_
x, v
1
_
jv
1
j
2
v
1
+
_
x, v
2
_
jv
2
j
2
v
2
=
4
2
_
1 0
0 1
_
+
2
2
_
1 0
0 1
_
=
_
1 0
0 3
_
EXAMPLE 5 Let J =
_

_
_

_
1
0
1
1,
_

_
,
_

_
1
1
0
1
_

_
,
_

_
1
1
0
1
_

_
_

_
and let x =
_

_
4
3
2
5
_

_
. Find the projection of x onto the
subspace S = Span J of R
4
.
Solution: To nd the projection, we must have an orthogonal basis for S. Thus, our
rst step must be to perform the Gram-Schmidt procedure on J. Denote the given
basis by z
1
=
_

_
1
0
1
1,
_

_
, z
2
=
_

_
1
1
0
1
_

_
, z
3
=
_

_
1
1
0
1
_

_
. Let w
1
= z
1
. Then, we get
w
2
= z
2

_
z
2
, w
1
_
j w
1
j
2
w
1
=
_

_
1
1
0
1
_

2
3
_

_
1
0
1
1,
_

_
=
1
3
_

_
1
3
2
1
_

_
To simplify calculations we use w
2
=
_

_
1
3
2
1
_

_
instead. Then, we get
w
3
= z
3

_
z
3
, w
1
_
j w
1
j
2
w
1

_
z
3
, w
2
_
j w
2
j
2
w
2
=
_

_
1
1
0
1
_

_
+
2
3
_

_
1
0
1
1,
_

1
15
_

_
1
3
2
1
_

_
=
2
5
_

_
1
2
2
1
_

_
Section 9.4 General Projections 59
We take w
3
=
_

_
1
2
2
1
_

_
. Thus, the set w
1
, w
2
, w
3
is an orthogonal basis for S. We can
now determine the projection.
proj
S
x =
_
x, w
1
_
j w
1
j
2
w
1
+
_
x, w
2
_
j w
2
j
2
w
2
+
_
x, w
3
_
j w
3
j
2
w
3
=
11
3
w
1
+
14
15
w
2
+
1
10
w
3
=
_

_
9/2
3
2
9/2
_

_
REMARK
Observe that the iterative step in the Gram-Schmidt procedure is just calculating a
perpendicular of a projection. In particular, the Gram-Schmidt procedure can be
restated as follows: If w
1
, . . . , w
k
is a basis for an inner product space W, then
let v
1
= w
1
, and for 2 i k recursively dene W
i1
= Spanv
1
, . . . , v
i1
and
v
i
= perp
W
i1
w
i
. Then v
1
, . . . , v
k
is an orthogonal basis for W.
If v
1
, . . . , v
k
is an orthonormal basis for W, then the formula for calculating a pro-
jection simplies to
proj
W
v =
_
v, v
1
_
v
1
+ +
_
v, v
k
_
v
k
EXAMPLE 6 Let W = Span
_

_
_

_
0
1
0
_

_
,
_

4
5
0
3
5
_

_
_

_
be a subspace of R
3
. Find proj
W
_

_
1
1
1
_

_
and perp
W
_

_
1
1
1
_

_
.
Solution: Observe that
_

_
_

_
0
1
0
_

_
,
_

4
5
0
3
5
_

_
_

_
is an orthonormal basis for W. Hence,
proj
W
x =
_
_

_
1
1
1
_

_
,
_

_
0
1
0
_

_
_
_

_
0
1
0
_

_
+
_
_

_
1
1
1
_

_
,
_

_
4/5
0
3/5
_

_
_
_

_
4/5
0
3/5
_

_
=
_

_
4/25
1
3/25
_

_
perp
W
x =
_

_
1
1
1
_

_
4/25
1
3/25
_

_
=
_

_
21/25
0
28/25
_

_
60 Chapter 9 Inner Products
Section 9.4 Problems
1. Consider S = Span
__
1 0
1 0
_
,
_
2 1
1 3
__
in M
22
(R). Find a basis for S

.
2. Find an orthogonal basis for the orthogonal compliment of each subspace of
P
2
(R) under the inner product (p(x), q(x)) = p(1)q(1) +p(0)q(0) +p(1)q(1).
(a) S = Span
_
x
2
+ 1
_
(b) S = Spanx
2

3. Let w
1
=
_

_
1
2
0
1
_

_
, w
2
=
_

_
2
1
1
2
_

_
, w
3
=
_

_
2
1
3
2
_

_
, and let S = Spanv
1
, v
2
, v
3
.
(a) Find an orthonormal basis for S.
(b) Let y =
_

_
0
0
0
12
_

_
. Determine proj
S
y.
4. Consider the subspace S = Span
__
1 0
1 1,
_
,
_
1 1
1 1
_
,
_
2 0
1 1
__
of M
22
(R).
(a) Find an orthogonal basis for S.
(b) Determine the projection of x =
_
4 3
2 5
_
on to S.
5. On P
2
(R) dene the inner product (p, q) = p(1)q(1) + p(0)q(0) + p(1)q(1)
and let S = Span
_
1, x x
2
_
.
(a) Use the Gram-Schmidt procedure to determine an orthogonal basis for S.
(b) Determine proj
S
(1 + x + x
2
).
6. Dene the inner product (x, y) = 2x
1
y
1
+ x
2
y
2
+ 3x
3
y
3
on R
3
. Extend the set
_

_
_

_
1
0
1
_

_
_

_
to an orthogonal basis for R
3
.
7. Suppose W is a k-dimensional subspace of an inner product space V. Prove
that proj
W
is a linear operator on V with kernel W

.
8. In P
3
(R) using the inner product (p(x), q(x)) = p(1)q(1)+p(0)q(0)+p(1)q(1)+
p(2)q(2) the following four vectors are orthogonal:
p
1
(x) = 9 13x 15x
2
+ 10x
3
p
2
(x) = 1 + x x
2
p
3
(x) = 1 2x
p
4
(x) = 1
Let S = Spanp
1
(x), p
2
(x), p
3
(x). Determine the projection of 1 + x + x
2
x
3
onto S. [HINT: If you do an excessive number of calculations, you have missed
the point of the question.]
Section 9.5 The Fundamental Theorem 61
9.5 The Fundamental Theorem
Let Wbe a subspace of a nite dimensional inner product space V. In the last section
we saw that any v V we can be written as v = x +y where x Wand y W

. We
now invent some notation for this.
DEFINITION
Direct Sum
Let V be a vector space and U and W be subspaces of a vector space V such that
U W =

0. The direct sum of U and Wis


U W = u + w V u U, w W
EXAMPLE 1 Let U = Span
_

_
_

_
1
0
1
_

_
_

_
and W = Span
_

_
_

_
1
1
1
_

_
_

_
. What is U W?
Solution: Since U W =

0 we have
U W = u + w V u U, w W =
_

_
s
_

_
1
0
1
_

_
+ t
_

_
1
1
1
_

_
s, t R
_

_
= Span
_

_
_

_
1
0
1
_

_
,
_

_
1
1
1
_

_
_

_
THEOREM 1 If U and Ware subspaces of a vector space V such that U W =

0, then U Wis
a subspace of V. Moreover, if v
1
, . . . , v
k
is a basis for U and w
1
, . . . , w

is a basis
for W, then v
1
, . . . , v
k
, w
1
, . . . , w

is a basis for U W.
EXAMPLE 2 Let U = Spanx
2
+ 1, x
4
and W = Spanx
3
x, x
3
+ x
2
+ 1. Find a basis for U W.
Solution: Since U W =

0 a basis for U Wis x


2
+ 1, x
4
, x
3
x, x
3
+ x
2
+ 1.
Theorem 9.4.2 tells us that for any subspace W of an n-dimensional inner product
space V that WW

0 and dimW

= ndimW. Combining this with Theorem


1 we get the following result.
THEOREM 2 Let V be a nite dimensional inner product space and let Wbe a subspace of V. Then
W W

= V
62 Chapter 9 Inner Products
We now look at the related result for matrices. We start by looking at an example.
EXAMPLE 3 Find a basis for each of the four fundamental subspaces of A =
_

_
1 2 0 1
3 6 1 1
2 4 2 6
_

_
.
Then determine the orthogonal complement of the rowspace of A and the orthogonal
complement of the columnspace of A.
Solution: To nd a basis for each of the four fundamental subspaces, we use the
method derived in Section 7.1. Row reducing A to reduced row echelon form gives
R =
_

_
1 2 0 1
0 0 1 2
0 0 0 0
_

_
Hence, a basis for Row(A) is
_

_
_

_
1
2
0
1
_

_
,
_

_
0
0
1
2
_

_
_

_
, a basis for Null(A) is
_

_
_

_
2
1
0
0
_

_
,
_

_
1
0
2
1
_

_
_

_
,
and a basis for Col(A) is
_

_
_

_
1
3
2
_

_
,
_

_
0
1
2
_

_
_

_
.
Row reducing A
T
gives
_

_
1 0 8
0 1 2
0 0 0
0 0 0
_

_
. Thus, a basis for the left nullspace is
_

_
_

_
8
2
1
_

_
_

_
.
Observe that the basis vectors for Null(A) are orthogonal to the basis vectors for
Row(A). Additionally, we know that dim(RowA)

= 4 2 = 2 by Theorem 9.4.1
(2). Therefore, since dim(Null(A)) = 2 we have that (RowA)

= Null(A).
Similarly, we see that (Col A)

= Null(A
T
).
The result of this example seems pretty amazing. Moreover, applying parts (4) and
(5) of Theorem 9.4.1 and our notation for direct sums, we get that
R
n
= Row(A) Null(A) and R
m
= Col(A) Null(A
T
)
This tells us that every vector in R
4
is the sum of a vector in the rowspace of A and a
vector in the nullspace of A.
The generalization of this result to any m n matrix is extremely important.
Section 9.5 The Fundamental Theorem 63
THEOREM 3 (The Fundamental Theorem of Linear Algebra)
Let A be an m n matrix, then Col(A)

= Null(A
T
) and Row(A)

= Null(A). In
particular,
R
n
= Row(A) Null(A) and R
m
= Col(A) Null(A
T
)
Proof: Let A =
_

_
v
T
1
.
.
.
v
T
n
_

_
.
If x Row(A)

, then x is orthogonal to each column of A


T
. Hence, v
i
x = 0 for
1 i n. Thus we get
Ax =
_

_
v
T
1
x
.
.
.
v
T
n
x
_

_
=
_

_
v
1
x
.
.
.
v
n
x
_

_
=

0
hence x Null(A).
On the other hand, let x Null(A), then Ax =

0 so v
i
x = 0 for all i. Now, pick any

b Row(A). Then

b = c
1
v
1
+ + c
n
v
n
. Therefore,

b x = (c
1
v
1
+ + c
n
v
n
) x = c
1
(v
1
x) + + c
n
(v
n
x) = 0
Hence x Row(A)

.
Applying what we proved above to A
T
we get
(Col(A))

= (Row(A
T
))

= Null(A
T
)

Observe that the Fundamental Theorem of Linear Algebra tells us more than the
relationship between the four fundamental subspaces. For example, if A is an m n
matrix, then we know that the rank and nullity of the linear mapping L(x) = Ax is
rank L = rank A = dimRowA and nullity L = dim(Null A). Since R
n
= Row(A)
Null(A) we get by Theorem 9.4.1 (2) that
n = rank L + nullity(L)
That is, the Fundamental Theorem of Linear Algebra implies the Rank-Nullity The-
orem.
It can also be shown that the Fundamental Theorem of Linear Algebra implies a lot
of our results about solving systems of linear equations. We will also nd it useful in
a couple of proofs in future sections.
64 Chapter 9 Inner Products
Section 9.5 Problems
1. Let A =
_

_
1 1 3 1
2 2 6 2
1 0 2 3
3 1 7 7
_

_
. Find a basis for Row(A)

.
2. Prove that if U and Ware subspaces of a vector space V such that UW =

0,
then U W is a subspace of V. Moreover, if v
1
, . . . , v
k
is a basis for U and
w
1
, , w

is a basis for W, then v


1
, . . . , v
k
, w
1
, . . . , w

is a basis for U W.
9.6 The Method of Least Squares
In the sciences one often tries to nd a correlation between two quantities by collect-
ing data from repeated experimentation. Say, for example, a scientist is comparing
quantities x and y which is known to satisfy a quadratic relation y = a
0
+ a
1
x + a
2
x
2
.
The scientist would like to nd the values of a
0
, a
1
, and a
2
which best ts their ex-
perimental data.
Observe that the data collected from each experiment will correspond to an equation
which can be used to determine values of the three unknowns a
0
, a
1
, and a
2
. To get as
accurate of a result as possible, the scientists will perform the experiment many times.
We thus end up with a system of linear equations which has more equations than
unknowns. Such a system of equations is called an overdetermined system. Also
notice that due to experimentation error, the system is very likely to be inconsistent.
So, the scientist needs to nd the values of a
0
, a
1
, and a
2
which best approximates
the data they collected.
To solve this problem, we rephrase it in terms of linear algebra. Let A be an m n
matrix with m > n and let the system Ax =

b be inconsistent. We want to nd a
vector x that minimizes the distance between Ax and

b. Hence, we need to minimize


jAx

bj. The following theorem tells us how to do this.


THEOREM 1 (Approximation Theorem)
Let W be a nite dimensional subspace of an inner product space V. If v V, then
the vector closest to v in Wis proj
W
v. That is,
jv proj
W
v j < jv wj
for all w W, w proj
W
v.
Proof: Consider v w = (v proj
W
v) + (proj
W
v w). Then observe that
< v proj
W
v, proj
W
v w >=< perp
W
v, proj
W
v > < perp
W
v, w >= 0 0 = 0
Section 9.6 The Method of Least Squares 65
Hence, vproj
W
v, proj
W
v w is an orthogonal set. Therefore, since the Pythagorean
Theorem holds in an orthogonal set, we get
jv wj
2
= j(v proj
W
v) + (proj
W
v w)j
2
= jv proj
W
v j
2
+ j proj
W
v wj
2
> jv proj
W
v j
2
since j proj
W
v wj
2
> 0 if w proj
W
v. The result now follows.
Notice that Ax is in the columnspace of A. Thus, the Approximation Theorem tells us
that we can minimize jAx

b j by nding the projection of



b onto the columnspace
of A. Therefore, if we solve the consistent system
Ax = proj
Col A

b
we will nd the desired vector x. This might seem quite simple, but we can make it
even easier. Subtracting both sides of this equation from

b we get

b Ax =

b proj
Col A

b = perp
Col A

b
Thus,

b Ax is in the orthogonal complement of the columnspace of A which, by the


Fundamental Theorem of Linear Algebra, means

b Ax is in the nullspace of A
T
.
Hence
A
T
(

b Ax) =

0
or equivalently
A
T
Ax = A
T

b
This is called the normal system and the individual equations are called the normal
equations. This systemwill be consistent by construction. However, it need not have
a unique solution. If it does have innitely many solutions, then each of the solutions
will minimize jAx

bj.
EXAMPLE 1 Determine the vector x that minimizes jAx

bj for the system


3x
1
x
2
= 4
x
1
+ 2x
2
= 0
2x
1
+ x
2
= 1
Solution: We have A =
_

_
3 1
1 2
2 1
_

_
so A
T
A =
_
14 1
1 6
_
, A
T

b =
_
14
3
_
. Since A
T
A is
invertible, we nd that
x = (A
T
A)
1
A
T

b =
1
83
_
6 1
1 14
_ _
14
3
_
=
1
83
_
87
56
_
Thus, x
1
=
87
83
and x
2
=
56
83
.
Note that these give 3x
1
x
2
= 3.82, x
1
+ 2x
2
= 0.3, 2x
1
+ 3x
2
= 1.42, so we have
found a fairly good approximation.
66 Chapter 9 Inner Products
EXAMPLE 2 Determine the vector x that minimizes jAx

bj for the system


2x
1
+ x
2
= 5
2x
1
+ x
2
= 8
2x
1
+ 3x
2
= 1
Solution: We have A =
_

_
2 1
2 1
2 3
_

_
so
A
T
A =
_
12 6
6 11
_
, A
T

b =
_
24
6
_
Since A
T
A is invertible, we nd that
x = (A
T
A)
1
A
T

b =
1
96
_
11 6
6 12
_ _
24
6
_
=
_
300
216
_
Thus, x
1
=
87
83
and x
2
=
56
83
.
This method of nding an approximate solution is called the method of least squares.
It is called this because we are minimizing
jAx

bj =
_
_
_
_
_
_
_
_
_
_

_
v
1
.
.
.
v
m
_

_
_
_
_
_
_
_
_
_
_
=
_
v
2
1
+ + v
2
m
which is equivalent to minimizing v
2
1
+ + v
2
m
.
We now return to our problem of nding the curve of best t for a set of data points.
Say in an experiment we get a set of data points (x
1
, y
1
), . . . , (x
m
, y
m
) and we want to
nd the values of a
0
, . . . , a
n
such that p(x) = a
0
+ a
1
x + + a
n
x
n
is the polynomial
of best t. That is, we want the values of a
0
, . . . , a
n
such that the values of y
i
are
approximated as well as possible by p(x
i
).
Let y =
_

_
y
1
.
.
.
y
n
_

_
and dene p(x) =
_

_
p(x
1
)
.
.
.
p(x
n
)
_

_
. To make this look like the method of least
squares we write
p(x) =
_

_
p(x
1
)
.
.
.
p(x
n
)
_

_
=
_

_
a
0
+ a
1
(x
1
) + + a
n
(x
1
)
n
.
.
.
a
0
+ a
1
(x
m
) + + a
n
(x
m
)
n
_

_
=
_

_
1 x
1
x
n
1
.
.
.
.
.
.
.
.
.
1 x
m
x
n
m
_

_
_

_
a
0
.
.
.
a
n
_

_
= Xa
Thus, we are trying to minimize jXa y j and we can use the method of least squares
above.
This can be summarized in the following theorem.
Section 9.6 The Method of Least Squares 67
THEOREM 2 Let n data points (x
1
, y
1
), . . . , (x
m
, y
n
) be given and write
y =
_

_
y
1
.
.
.
y
m
_

_
, X =
_

_
1 x
1
x
2
1
x
n
1
1 x
2
x
2
2
x
n
2
.
.
.
.
.
.
1 x
m
x
2
m
x
n
m
_

_
Then, if a =
_

_
a
0
.
.
.
a
n
_

_
is any solution to the normal system
X
T
Xa = X
T
y
then the polynomial
p(x) = a
0
+ a
1
x + + a
n
x
n
is the best tting polynomial of degree n for the given data. Moreover, if at least n+1
of the numbers x
1
, . . . , x
m
are distinct, then the matrix X
T
X is invertible and thus a is
unique with
a = (X
T
X)
1
X
T
y
Proof: We prove the rst part above, so we just prove the second part.
Suppose that n + 1 of the x
i
are distinct and consider a linear combination of the
columns of X.
c
0
_

_
1
1
.
.
.
1
_

_
+ c
1
_

_
x
1
x
2
.
.
.
x
m
_

_
+ + c
n
_

_
x
n
1
x
n
2
.
.
.
x
n
m
_

_
=
_

_
0
0
.
.
.
0
_

_
Now let q(x) = c
0
+ c
1
x + + c
n
x
n
. Equating coecients we see that x
1
, x
2
, . . . , x
m
are all roots of q(x). Then q(x) is a degree n polynomial with at least n + 1 distinct
roots which contradicts the Fundamental Theorem of Algebra. Therefore, q(x) must
be the zero polynomial, and so c
i
= 0 for 0 i n. Thus, the columns of X are
linearly independent.
To show this implies that X
T
X is invertible we consider X
T
Xv =

0. We have
jXv j
2
= (Xv)
T
Xv = v
T
X
T
Xv = v
T

0 = 0
Hence, Xv =

0, so v =

0 since the columns of X are linearly independent. Thus X


T
X
is invertible as required.
68 Chapter 9 Inner Products
EXAMPLE 3 Find a
0
and a
1
to obtain the best tting equation of the form y = a
0
+ a
1
x for the
given data.
x 1 3 4 6 7
y 1 2 3 4 5
Solution: We let X =
_

_
1 1
1 3
1 4
1 6
1 7
_

_
and y =
_

_
1
2
3
4
5
_

_
. We then get
X
T
X =
_
1 1 1 1 1
1 3 4 6 7
_
_

_
1 1
1 3
1 4
1 6
1 7
_

_
=
_
5 21
21 111
_
X
T
y =
_
1 1 1 1 1
1 3 4 6 7
_
_

_
1
2
3
4
5
_

_
=
_
15
78
_
By Theorem 2 we get that a =
_
a
0
a
1
_
is unique and satises
a = (X
T
X)
1
X
T
y
=
_
5 21
21 111
_
1
_
15
78
_
=
1
114
_
111 21
21 5
_ _
15
78
_
=
1
38
_
9
25
_
So, the line of best t is y =
9
38
+
25
38
x.
Section 9.6 The Method of Least Squares 69
EXAMPLE 4 Find a
0
, a
1
, a
2
to obtain the best tting equation of the form y = a
0
+ a
1
x + a
2
x
2
for
the given data.
x 3 1 0 1 3
y 3 1 1 2 4
Solution: We have y =
_

_
3
1
1
2
4
_

_
and X =
_

_
1 3 9
1 1 1
1 0 0
1 1 1
1 3 9
_

_
. Hence,
X
T
X =
_

_
5 0 20
0 20 0
20 0 164
_

_
, X
T
y =
_

_
11
4
66
_

_
By Theorem 2 we get that a =
_
a
0
a
1
_
is unique and satises
a = (X
T
X)
1
X
T
y
=
_

_
5 0 20
0 20 0
20 0 164
_

_
1 _

_
11
4
66
_

_
=
1
210
_

_
242
42
55
_

_
So the best tting parabola is
p(x) =
121
105
+
1
5
x +
11
42
x
2
70 Chapter 9 Inner Products
Section 9.6 Problems
1. Find the vector x that minimizes jAx

bj for each of the following systems.


(a) x
1
+ 2x
2
= 1
x
1
+ x
2
= 2
x
1
x
2
= 5
(c) 2x
1
+ x
2
= 4
2x
1
x
2
= 1
3x
1
+ 2x
2
= 8
(e) x
1
x
2
= 2
2x
1
x
2
+ x
3
= 2
2x
1
+ 2x
2
+ x
3
= 3
x
1
x
2
= 2
(b) x
1
+ 5x
2
= 1
2x
1
7x
2
= 1
x
1
+ 2x
2
= 0
(d) x
1
+ 2x
2
= 2
x
1
+ 2x
2
= 3
x
1
+ 3x
2
= 2
x
1
+ 3x
2
= 3
(f) 2x
1
2x
2
= 1
x
1
+ x
2
= 2
3x
1
+ x
2
= 1
2x
1
x
2
= 2
2. Find a
0
and a
1
to obtain the best tting equation of the form y = a
0
+ a
1
x for
the given data.
(a)
x 1 0 1
y 3 2 2
(b)
x 1 0 1 2
y 3 1 0 1
3. Find a
0
, a
1
, a
2
to obtain the best tting equation of the form y = a
0
+a
1
x +a
2
x
2
for the given data.
(a)
x 2 1 1 2
y 0 2 0 1
(b)
x 2 1 0 1 2
y 0 1 1 3 1
Chapter 10
Applications of Orthogonal Matrices
10.1 Orthogonal Similarity
In Math 136, we saw that two matrices A and B are similar if there exists an invertible
matrix P such that P
1
AP = B. Moreover, we saw that this corresponds to applying
a change of coordinates to the standard matrix of a linear operator. In particular, if
L : R
n
R
n
is a linear operator and J = v
1
, . . . , v
n
is a basis for R
n
, then taking
P =
_
v
1
v
n
_
we get
[L]
J
= P
1
[L]P
We then looked for a basis J of R
n
such that [L]
J
was diagonal. We found that such
a basis is made up of the eigenvectors of [L].
In the last chapter, we sawthat orthonormal bases have some very nice properties. For
example, if the columns of P form an orthonormal basis for R
n
, then P is orthogonal
and it is very easy to nd P
1
. In particular, we have P
1
= P
T
. Hence, we now turn
our attention to trying to nd an orthonormal basis J of eigenvectors such that [L]
J
is diagonal. Since the corresponding matrix P will be orthogonal, we will get
[L]
J
= P
T
[L]P
DEFINITION
Orthogonally
Similar
Two matrices A and B are said to be orthogonally similar if there exists an orthogo-
nal matrix P such that
P
T
AP = B
Since P
T
= P
1
we have that if A and B are orthogonally similar, then they are similar.
Therefore, all the properties of similar matrices still hold. In particular, if A and B
are orthogonally similar, then rank A = rank B, tr A = tr B, det A = det B, and A and
B have the same eigenvalues.
We saw in Math 136 that not every square matrix A has a set of eigenvectors which
forms a basis for R
n
. Since we are now going to require the additional condition that
the basis is orthonormal, we expect that even fewer matrices will have this property.
So, we rst just look for matrices A such that there exists an orthogonal matrix P for
which P
T
AP is upper triangular.
71
72 Chapter 10 Applications of Orthogonal Matrices
THEOREM 1 (Triangularization Theorem)
If A is an n n matrix with real eigenvalues, then A is orthogonally similar to an
upper triangular matrix T.
Proof: We prove the result by induction on n. If n = 1, then A is upper triangular.
Therefore, we can take P to be the orthogonal matrix P = [1] and the result follows.
Now assume the result holds for all (n 1) (n 1) matrices and consider an n n
matrix A with all real eigenvalues.
Let v
1
be a unit eigenvector of one of As eigenvalues
1
. Extend the set v
1
to a
basis for R
n
and apply the Gram-Schmidt procedure to produce an orthonormal basis
v
1
, w
2
, . . . , w
n
for R
n
. Then the matrix P
1
=
_
v
1
w
2
. . . w
n
_
is orthogonal and
P
T
1
AP
1
=
_

_
v
T
1
w
T
2
.
.
.
w
T
n
_

_
A
_
v
1
w
2
. . . w
n
_
=
_

_
v
1
Av
1
v
1
A w
2
v
1
A w
n
w
2
Av
1
w
2
A w
2
. . . w
2
A w
n
.
.
.
.
.
.
.
.
.
w
n
Av
1
w
n
A w
2
. . . w
n
A w
n
_

_
Consider the entries in the rst column. Since Av
1
=
1
v
1
we get
w
i
Av
1
= w
i

1
v
1
=
1
( w
i
v
1
) = 0
for 2 i n and v
1
Av
1
=
1
. Hence
P
T
1
AP
1
=
_

1

b
T

0 A
1
_
where A
1
is an (n 1) (n 1) matrix,

b R
n1
, and

0 is the zero vector in R
n1
. A
is similar to
_

1

b
T

0 A
1
_
so all the eigenvalues of A
1
are eigenvalues of A and hence all
the eigenvalues of A
1
are real. Therefore, by the inductive hypothesis, there exists an
(n 1) (n 1) orthogonal matrix Q such that Q
T
A
1
Q = T
1
is upper triangular.
Let P
2
=
_
1

0
T

0 Q
_
. Since the columns of Q are orthonormal, the columns of P
2
are
also orthonormal and hence P
2
is an orthogonal matrix. Consequently, the matrix
P = P
1
P
2
is also orthogonal by Theorem 7. Then by block multiplication we get
P
T
AP = (P
1
P
2
)
T
A(P
1
P
2
) = P
T
2
P
T
1
AP
1
P
2
=
_
1

0
T

0 Q
T
_ _

1

b
T

0 A
1
_ _
1

0
T

0 Q
_
=
_

1

b
T
Q

0 Q
T
A
1
Q
_
=
_

1

b
T
Q

0 T
1
_
is upper triangular. Thus, the result follows by induction.
Section 10.1 Orthogonal Similarity 73
If A is orthogonally similar to an upper triangular matrix T, then A and T must share
the same eigenvalues. Thus, since T is upper triangular, the eigenvalues must appear
along the main diagonal of T.
Notice that the proof of the theorem gives us a method for nding an orthogonal
matrix P so that P
T
AP = T is upper triangular. However, since the proof is by
induction, this leads to a recursive algorithm. We demonstrate this with an example.
EXAMPLE 1 Let A =
_

_
2 1 1
0 3 2
0 2 1
_

_
. Find an orthogonal matrix P such that P
T
AP is upper triangular.
Solution: As in the proof, we rst need to nd one of the real eigenvalues of A. By
inspection, we see that 2 is an eigenvalue of A with unit eigenvector v
1
=
_

_
1
0
0
_

_
. Next
we extend v
1
to an orthonormal basis for R
3
. This is very easy in this case as we
can take w
2
=
_

_
0
1
0
_

_
and w
3
=
_

_
0
0
1
_

_
. Thus we get P
1
= I and
P
T
1
AP
1
=
_

_
2 1 1
0 3 2
0 2 1
_

_
=
_
2

b
T

0 A
1
_
where A
1
=
_
3 2
2 1
_
and

b
T
=
_
1 1
_
.
We now need to apply the inductive hypothesis on A
1
. Thus, we need to nd a
2 2 orthogonal matrix Q such that Q
T
A
1
Q = T
1
is upper triangular. We start
by nding a real eigenvalue of A
1
. Consider A
1
I =
_
3 2
2 1
_
. Then
C() = det(A
1
I) = ( 1)
2
, so we have one eigenvalue = 1. This gives
A
1
I =
_
2 2
2 2
_
so a corresponding unit eigenvector is
_
1/

2
1/

2
_
.
We extend
__
1/

2
1/

2
__
to a orthonormal basis for R
2
by picking
1

2
_
1
1
_
. Therefore,
Q =
1

2
_
1 1
1 1
_
is an orthogonal matrix and
Q
T
A
1
Q =
1

2
_
1 1
1 1
_ _
3 2
2 1
_
1

2
_
1 1
1 1
_
=
_
1 4
0 1
_
= T
1
Now that we have Q and T
1
, the proof of the theorem tells us to choose
P
2
=
_
1

0
T

0 Q
_
=
_

_
1 0 0
0 1/

2 1/

2
0 1/

2 1/

2
_

_
74 Chapter 10 Applications of Orthogonal Matrices
and P = P
1
P
2
= P
2
. Then
P
T
AP =
_
2

b
T
Q

0 T
1
_
=
_

_
2

2 0
0 1 4
0 0 1
_

_
as required.
Since the procedure is recursive, it would be very inecient for larger matrices.
There is a much better algorithm for doing this, although we will not cover it in
this book. The purpose of the example above is not to teach one how to nd such an
orthogonal matrix, but to help one understand the proof.
Section 10.1 Problems
1. Prove that if A is orthogonal similar to B and B is orthogonally similar to C,
then A is orthogonal similar to C.
2. By following the steps in the proof of the Triangularization Theorem nd an
orthogonal matrix P and upper triangular matrix T such that P
T
AP = T for
each of the following matrices.
(a) A =
_
2 4
1 5
_
(b) A =
_

_
1 0 1
2 2 1
4 0 5
_

_
10.2 Orthogonal Diagonalization
We now know that we can nd an orthonormal basis J such that the J-matrix of any
linear operator L on R
n
with real eigenvalues is upper triangular. However, as we
saw in Math 136, having a diagonal matrix would be even better. So, we now look at
which matrices are orthogonally similar to a diagonal matrix.
DEFINITION
Orthogonally
Diagonalizable
An n n matrix A is said to be orthogonally diagonalizable if there exists an or-
thogonal matrix P and diagonal matrix D such that
P
T
AP = D
that is, if A is orthogonally similar to a diagonal matrix.
Our goal is to determine all matrices which are orthogonally diagonalizable. At
rst glance, it may not be clear of how to even start trying to nd matrices which are
orthogonally diagonalizable, if any exist. One way to approach this problemis to start
by looking at what condition a matrix must have if it is orthogonally diagonalizable.
Section 10.2 Orthogonal Diagonalization 75
THEOREM 1 If A is orthogonally diagonalizable, then A
T
= A.
Proof: If A is orthogonally diagonalizable, then there exists an orthogonal matrix
P such that P
T
AP = D with D diagonal. Since P
T
= P
1
we can write this as
A = PDP
T
. Taking transposes gives
A
T
= (PDP
T
)
T
= (P
T
)
T
D
T
P
T
= PDP
T
= A
since D = D
T
as D is diagonal. Thus, A = A
T
.
Observe that this theorem does not even guarantee the existence of an orthogonally
diagonalizable matrix. It only shows that if one exists, then it must satisfy A
T
= A.
But, on the other hand, it gives us a place to start looking for orthogonally diagonal-
izable matrices. Lets rst consider the set of 2 2 such matrices.
EXAMPLE 1 Prove that every 2 2 matrix A =
_
a b
b c
_
is orthogonally diagonalizable.
Solution: We will show that we can nd an orthonormal basis of eigenvectors for A
so that we can nd an orthogonal matrix P which diagonalizes A. We have C() =
det
_
a b
b c
_
=
2
(a + c) + ac b
2
. Thus, by the quadratic formula, the
eigenvalues of A are

+
=
a + c +
_
(a + c)
2
4(ac b
2
)
2
=
a + c +
_
(a c)
2
+ 4b
2
2

=
a + c
_
(a + c)
2
4(ac b
2
)
2
=
a + c
_
(a c)
2
+ 4b
2
2
If a = c and b = 0, then A is already diagonal and thus can be orthogonally diagonal-
ized by I. Otherwise, A has two real eigenvalues with algebraic multiplicity 1. We
have
A
+
I =
_

_
ac

(ac)
2
+4b
2
2
b
b
ca

(ac)
2
+4b
2
2
_

_
1
2b
ac

(ac)
2
+4b
2
0 0
_

_
Hence, an eigenvector for
+
is v
1
=
_

_
2b
ac

(ac)
2
+4b
2
1
_

_
. Also,
A

I =
_

_
ac+

(ac)
2
+4b
2
2
b
b
ca+

(ac)
2
+4b
2
2
_

_
1
2b
ac+

(ac)
2
+4b
2
0 0
_

_
so, an eigenvector for

is v
2
=
_

_
2b
ac+

(ac)
2
+4b
2
1
_

_
.
76 Chapter 10 Applications of Orthogonal Matrices
Observe that
v
1
v
2
=
_

_
2b
a c
_
(a c)
2
+ 4b
2
_

_
_

_
2b
a c +
_
(a c)
2
+ 4b
2
_

_
+ 1(1)
=
4b
2
(a c)
2
(a c)
2
4b
2
+ 1 = 0
Thus, the eigenvectors are orthogonal and hence we can normalize them to get an
orthonormal basis for R
2
. So, A can be diagonalized by an orthogonal matrix.
It is clear that the condition A
T
= A is going to be very important. Thus, we make
the following denition.
DEFINITION
Symmetric
Matrix
A matrix A such that A
T
= A is said to be symmetric.
It is clear fromthe denition that a matrix must be square to be symmetric. Moreover,
observe that a matrix A is symmetric if and only if a
i j
= A
ji
for all 1 i, j n.
EXAMPLE 2 Determine which of the following matrices is symmetric.
(a) A =
_

_
1 2 1
2 0 1
1 1 3
_

_
(b) B =
_

_
1 0 1
1 1 1
1 0 1
_

_
(c) C =
_

_
1 0 0
0 2 0
0 0 3
_

_
Solution: It is easy to verify that A
T
= A and C
T
= C so they are both symmetric.
However,
B
T
=
_

_
1 1 1
0 1 0
1 1 1
_

_
B
so B is not symmetric.
Example 1 shows that every 2 2 symmetric matrix is orthogonally diagonalizable.
Thanks to our work in the last section proving that every symmetric matrix is or-
thogonally diagonalizable is not dicult. We start by stating an amazing result about
symmetric matrices.
LEMMA 2 If A is a symmetric matrix with real entries, then all of its eigenvalues are real.
This proof requires properties of vectors and matrices with complex entries and so
will be delayed until Chapter 11.
Section 10.2 Orthogonal Diagonalization 77
THEOREM 3 (Principal Axis Theorem)
Every symmetric matrix is orthogonally diagonalizable.
Proof: Let A be a symmetric matrix. By Lemma 2 all eigenvalues of A are real.
Therefore, we can apply the Triangularization Theorem to get that there exists an
orthogonal matrix P such that P
T
AP = T is upper triangular. Since A is symmetric,
we have that A
T
= A and hence
T
T
= (P
T
AP)
T
= P
T
A
T
(P
T
)
T
= P
T
AP = T
Therefore, T is an upper triangular symmetric matrix. But, if T is upper triangular,
then T
T
is lower triangular, and so T is both upper and lower triangular and hence T
is diagonal.
Although proving this important theorem is quite easy thanks to the Triangularization
Theorem, it does not give us a nice way of orthogonally diagonalizing a symmetric
matrix. So, we instead refer to our solution of Example 1. In the example we saw that
the eigenvectors corresponding to dierent eigenvalues were naturally orthogonal. To
prove that this is always the case, we rst prove an important property of symmetric
matrices.
THEOREM 4 A matrix A is symmetric if and only if x (Ay) = (Ax) y for all x, y R
n
.
Proof: Suppose that A is symmetric, then for any x, y R
n
we have
x (Ay) = x
T
Ay = x
T
A
T
y = (Ax)
T
y = (Ax) y
Conversely, if x (Ay) = (Ax) y for all x, y R
n
, then
(x
T
A)y = (Ax)
T
y = (x
T
A
T
)y
Since this is valid for all y R
n
we get that x
T
A = x
T
A
T
by Theorem 3.1.4 from
the Math 136 course notes. Taking transposes of both sides gives A
T
x = Ax and this
is still valid for all x R
n
. Hence, applying Theorem 3.1.4 again gives A
T
= A as
required.
THEOREM 5 If v
1
, v
2
are eigenvectors of a symmetric matrix A corresponding to distinct eigenval-
ues
1
,
2
, then v
1
is orthogonal to v
2
.
Proof: We are assuming that Av
1
=
1
v
1
and Av
2
=
2
v
2
,
1

2
. Theorem 4 gives

1
(v
1
v
2
) = (
1
v
1
) v
2
= (Av
1
) v
2
= v
1
(Av
2
) = v
1
(
2
v
2
) =
2
(v
1
v
2
)
Hence, (
1

2
)(v
1
v
2
) = 0. But,
1

2
, so v
1
v
2
= 0 as required.
78 Chapter 10 Applications of Orthogonal Matrices
Consequently, if a symmetric matrix A has n distinct eigenvalues, the basis of eigen-
vectors which diagonalizes A will naturally be orthogonal. Hence, to orthogonally
diagonalize such a matrix A, we just need to normalize these eigenvectors to form an
orthonormal basis for R
n
of eigenvectors of A.
EXAMPLE 3 Find an orthogonal matrix P such that P
T
AP is diagonal, where A =
_

_
1 0 1
0 1 2
1 2 5
_

_
.
Solution: The characteristic polynomial is
C() =

1 0 1
0 1 2
1 2 5

= (1)[(1)(5) 4] (1) = (1)(6)


Thus the eigenvalues of A are
1
= 0,
2
= 1, and
3
= 6. For
1
= 0 we get
A 0I =
_

_
1 0 1
0 1 2
1 2 5
_

_
1 0 1
0 1 2
0 0 0
_

_
Hence, a basis for E

1
, the eigenspace of
1
, is
_

_
_

_
1
2
1
_

_
_

_
. For
2
= 1, we get
A I
_

_
1 2 0
0 0 1
0 0 0
_

_
Hence, a basis for E

2
is
_

_
_

_
2
1
0
_

_
_

_
. For
3
= 6, we get
A 6I
_

_
1 0 1/5
0 1 2/5
0 0 0
_

_
Hence, a basis for E

3
is
_

_
_

_
1
2
5
_

_
_

_
.
As predicted by the theorem, we can easily verify that these are orthogonal to each
other. After normalizing we nd that an orthonormal basis of eigenvectors for A is
_

_
_

_
1/

6
2/

6
1/

6
_

_
,
_

_
2/

5
1/

5
0
_

_
,
_

_
1/

30
2/

30
5/

30
_

_
_

_
Thus, we take
P =
_

_
1/

6 2

5 1/

30
2/

6 1/

5 2/

30
1/

6 0 5/

30
_

_
Section 10.2 Orthogonal Diagonalization 79
Since the columns of P form an orthonormal basis for R
3
we get that P is orthogonal.
Moreover, since the columns of P form a basis of eigenvectors for A, we get that P
diagonalizes A. In particular,
P
T
AP =
_

_
0 0 0
0 1 0
0 0 6
_

_
Notice that Theorem5 does not say anything about dierent eigenvectors correspond-
ing to the same eigenvalue. In particular, if an eigenvalue of a symmetric matrix A has
geometric multiplicity 2, then there is no guarantee that a basis for the eigenspace of
the eigenvalue will be orthogonal. Of course, this is easily xed as we can just apply
the Gram-Schmidt procedure to nd an orthonormal basis for the eigenspace.
EXAMPLE 4 Orthogonally diagonalize A =
_

_
8 2 2
2 5 4
2 4 5
_

_
.
Solution: The characteristic polynomial of A is
C() =

8 2 2
2 5 4
2 4 5

=
_

_
8 4 2
0 0 9
2 1 5
_

_
= ( 9)
2
Thus, the eigenvalues are
1
= 0 and
2
= 9 where the algebraic multiplicity of
1
is
1 and the algebraic multiplicity of
2
is 2. For
1
= 0 we get
A 0I
_

_
1 0 1/2
0 1 1
0 0 0
_

_
Hence, a basis for E

1
is
_

_
_

_
1
2
2
_

_
_

_
= v
1
. For
2
= 9 we get
A 9I
_

_
1 2 2
0 0 0
0 0 0
_

_
Hence, a basis for E

2
is
_

_
_

_
2
1
0
_

_
,
_

_
2
0
1
_

_
_

_
= v
2
, v
3
.
Observe that v
2
and v
3
are both orthogonal to v
1
, but not to each other. Thus, we
apply the Gram-Schmidt procedure on the basis v
2
, v
3
to get an orthonormal basis
80 Chapter 10 Applications of Orthogonal Matrices
for E

2
.
w
2
=
_

_
2
1
0
_

_
w
3
= v
3

_
v
3
, w
2
_
j w
2
j
2
w
2
=
_

_
2
0
1
_

4
5
_

_
2
1
0
_

_
=
_

_
2/5
4/5
1
_

_
Instead, we take w
3
=
_

_
2
4
5
_

_
. Thus, w
2
, w
3
is an orthogonal basis for the eigenspace of

2
and hence v
1
, w
2
, w
3
is an orthogonal basis for R
3
of eigenvectors of A.
We normalize the vectors to get the orthonormal basis
_

_
_

_
1/3
2/3
2/3
_

_
,
_

_
2/

5
1/

5
0
_

_
,
_

_
2/

45
4/

45
5/

45
_

_
_

_
Hence, we get
P =
_

_
1/3 2/

5 2/

45
2/3 1/

5 4/

45
2/3 0 5/

45
_

_
and D = P
T
AP =
_

_
0 0 0
0 9 0
0 0 9
_

_
EXAMPLE 5 Orthogonally diagonalize A =
_

_
1 1 1
1 1 1
1 1 1
_

_
.
Solution: The characteristic polynomial of A is
C() =

1 1 1
1 1 1
1 1 1

=
_

_
1 1 2
1 1 2
0 0
_

_
=
2
( 3)
Thus, the eigenvalues are
1
= 0 with algebraic multiplicity 2 and
2
= 3 with
algebraic multiplicity 1. For
1
= 0 we get
A 0I
_

_
1 1 1
0 0 0
0 0 0
_

_
Hence, a basis for E

1
is
_

_
_

_
1
1
0
_

_
,
_

_
1
0
1
_

_
_

_
.
Section 10.2 Orthogonal Diagonalization 81
Since this is not an orthogonal basis for E

1
, we need to apply the Gram-Schmidt
procedure. We get
w
1
=
_

_
1
1
0
_

_
w
2
=
_

_
1
0
1
_

1
2
_

_
1
1
0
_

_
=
_

_
1/2
1/2
1
_

_
Instead, we take w
2
=
_

_
1
1
2
_

_
. Then, w
1
, w
2
is an orthogonal basis for E

1
. For
2
= 3
we get
A 3I
_

_
1 0 1
0 1 1
0 0 0
_

_
Hence, a basis for E

2
is
_

_
_

_
1
1
1
_

_
_

_
.
We normalize the basis vectors to get the orthonormal basis
_

_
_

_
1/

2
1/

2
0
_

_
,
_

_
1/

6
1/

6
2/

6
_

_
,
_

_
1/

3
1/

3
1/

3
_

_
_

_
Hence, we take
P =
_

_
1/

2 1/

6 1/

3
1/

2 1/

6 1/

3
0 2/

6 1/

3
_

_
and get
P
T
AP =
_

_
0 0 0
0 0 0
0 0 3
_

_
82 Chapter 10 Applications of Orthogonal Matrices
Section 10.2 Problems
1. Orthogonally diagonalize each of the following symmetric matrices.
(a)
_
2 2
2 2
_
(b)
_
4 2
2 7
_
(c)
_
1 2
2 2
_
(d)
_

_
1 1 1
1 1 1
1 1 1
_

_
(e)
_

_
0 2 2
2 1 0
2 0 1
_

_
(f)
_

_
2 2 5
2 5 2
5 2 2
_

_
(g)
_

_
2 4 4
4 2 4
4 4 2
_

_
(h)
_

_
5 1 1
1 5 1
1 1 5
_

_
(i)
_

_
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
_

_
2. Prove that if A is invertible and orthogonally diagonalizable, then A
1
is or-
thogonally diagonalizable.
3. Determine whether each statement is true or false. Justify your answer with a
proof or a counter example.
(a) Every orthogonal matrix is orthogonally diagonalizable.
(b) If A and B are orthogonally diagonalizable, then AB is orthogonally diag-
onalizable.
(c) If A is orthogonally similar to a symmetric matrix B, then A is orthogo-
nally diagonalizable.
10.3 Quadratic Forms
We now use the results of the last section to study a very important class of functions
called quadratic forms. Quadratic forms are not only important in linear algebra, but
also in number theory, group theory, dierential geometry, and many other areas.
Simply put, a quadratic form is a function dened as a linear combination of all
possible terms x
i
x
j
for 1 i j n. For example, for two variables x
1
, x
2
we have
a
1
x
2
1
+a
2
x
1
x
2
+a
3
x
2
2
, and for three variables x
1
, x
2
, x
3
we get a
1
x
2
1
+a
2
x
1
x
2
+a
3
x
1
x
3
+
a
4
x
2
2
+ a
5
x
2
x
3
+ a
6
x
2
3
.
Notice that a quadratic form is certainly not a linear function. Instead, they are tied
to linear algebra by matrix multiplication. For example, if B =
_
1 2
0 1
_
and x =
_
x
1
x
2
_
,
then we have
Q(x) = x
T
Bx =
_
x
1
x
2
_
_
1 2
0 1
_ _
x
1
x
2
_
=
_
x
1
x
2
_
_
x
1
+ 2x
2
x
2
_
= x
1
(x
1
+ 2x
2
) + x
2
(x
2
) = x
2
1
+ 2x
1
x
2
x
2
2
which is a quadratic form. We now use this to precisely dene a quadratic form.
Section 10.3 Quadratic Forms 83
DEFINITION
Quadratic Form
A quadratic form on R
n
with corresponding matrix n n matrix A is dened by
Q(x) = x
T
Ax, for any x R
n
There is a connection between the dot product and quadratic forms. Observe that
Q(x) = x
T
Ax = x (Ax) = (Ax) x = (Ax)
T
x = x
T
A
T
x
Hence, A and A
T
give the same quadratic form (this does not imply that A = A
T
). Of
course, this is most natural when A is symmetric. In fact, it can be shown that every
quadratic form can be written as Q(x) = x
T
Ax where A is symmetric. Moreover, each
symmetric matrix A uniquely determines a quadratic form. Thus, as we will see, we
often deal with a quadratic form and its corresponding symmetric matrix in the same
way.
EXAMPLE 1 Let B =
_
1 2
0 1
_
. Find the symmetric matrix corresponding to Q(x) = x
T
Bx.
Solution: From our work above we have
Q(x) = x
T
Bx = x
2
1
+ 2x
1
x
2
x
2
2
We want to nd a symmetric matrix A =
_
a b
b c
_
such that
x
2
1
+ 2x
1
x
2
x
2
2
= x
T
Ax
We have that
x
2
1
+ 2x
1
x
2
x
2
2
= x
T
Ax =
_
x
1
x
2
_
_
a b
b c
_ _
x
1
x
2
_
=
_
x
1
x
2
_
_
ax
1
+ bx
2
bx
1
+ cx
2
_
= x
1
(ax
1
+ bx
2
) + x
2
(bx
1
+ cx
2
)
= a
1
x
2
1
+ 2a
2
x
1
x
2
+ a
3
x
2
2
Hence, we take a
1
= 1, a
2
= 1, and a
3
= 1. Consequently, the symmetric matrix
corresponding to Q(x) is A =
_
1 1
1 1
_
.
For a given quadratic form on R
n
, an easy way to think of the corresponding sym-
metric matrix is as a grid with the rows and columns corresponding to the variables
x
1
, . . . , x
n
. So the 11 entry of the matrix is the x
1
x
1
grid and so must have the coe-
cient of x
2
1
. The 12 entry of the matrix is the x
1
x
2
grid, but the 21 entry is also a x
1
x
2
grid, so they have to equally split the coecient of x
1
x
2
between them. Of course,
this also works in reverse.
84 Chapter 10 Applications of Orthogonal Matrices
EXAMPLE 2 Let A =
_
1 4
4 3
_
. Then the quadratic form corresponding to A is
Q(x
1
, x
2
) = x
2
1
+ 2(4)x
1
x
2
+ 3x
2
2
= x
2
1
+ 8x
1
x
2
+ 3x
2
2
EXAMPLE 3 Let A =
_

_
1 2 3
2 4 0
3 0 1
_

_
. Then the quadratic form corresponding to A is
Q(x
1
, x
2
, x
3
) = x
2
1
+ 2(2)x
1
x
2
+ 2(3)x
1
x
3
4x
2
2
+ 2(0)x
2
x
3
x
2
3
= x
2
1
+ 4x
1
x
2
6x
1
x
3
4x
2
2
x
2
3
EXERCISE 1 Let A =
_

_
3 1/2 1
1/2 3/2 2
1 2 0
_

_
. Find the quadratic form corresponding to A.
EXAMPLE 4 Let Q(x
1
, x
2
) = 2x
2
1
+ 3x
1
x
2
4x
2
2
. Find the corresponding symmetric matrix A.
Solution: We get a
11
= 2, 2a
12
= 3, and a
22
= 4. Hence,
A =
_
2 3/2
3/2 4
_
EXAMPLE 5 Let Q(x
1
, x
2
, x
3
) = x
2
1
2x
1
x
2
+ 8x
1
x
3
+ 3x
2
2
+ x
2
x
3
5x
2
3
, Find the corresponding
symmetric matrix A.
Solution: We have a
11
= 1, 2a
12
= 2, 2a
13
= 8, a
22
= 3, 2a
23
= 1, and a
33
= 5.
Hence,
A =
_

_
1 1 4
1 3 1/2
4 1/2 5
_

_
EXERCISE 2 Let Q(x
1
, x
2
, x
3
) = 3x
2
1
+
1
2
x
1
x
2
x
2
2
+2x
2
x
3
+2x
2
3
, Find the corresponding symmetric
matrix A.
Section 10.3 Quadratic Forms 85
EXAMPLE 6 Let Q(x
1
, x
2
, x
3
) = x
2
1
2x
2
2
3x
2
3
, then the corresponding symmetric matrix is
A =
_

_
1 0 0
0 2 0
0 0 3
_

_
The terms x
i
x
j
with i j are called cross-terms. Thus, a quadratic form Q(x) has no
cross terms if and only if its corresponding symmetric matrix is diagonal. We will
call such a quadratic form a diagonal quadratic form.
Classifying Quadratic Forms
DEFINITION Let Q(x) be a quadratic form. Then
1. Q(x) is positive denite if Q(x) > 0 for all x

0.
2. Q(x) is negative denite if Q(x) < 0 for all x

0.
3. Q(x) is indenite if Q(x) > 0 for some x and Q(x) < 0 for some x.
4. Q(x) is positive semidenite if Q(x) 0 for all x and Q(x) = 0 for some x

0.
5. Q(x) is negative semidenite if Q(x) 0 for all x and Q(x) = 0 for some x

0.
EXAMPLE 7 Q(x
1
, x
2
) = x
2
1
+ x
2
2
is positive denite, Q(x
1
, x
2
) = x
2
1
x
2
2
is negative de-
nite, Q(x
1
, x
2
) = x
2
1
x
2
2
is indenite, Q(x
1
, x
2
) = x
2
1
is positive semidenite, and
Q(x
1
, x
2
) = x
2
1
is negative semidenite.
We classify symmetric matrices in the same way that we classify their corresponding
quadratic forms.
EXAMPLE 8 A =
_
1 0
0 1
_
corresponds to the quadratic form Q(x
1
, x
2
) = x
2
1
+ x
2
2
, so A is positive
denite.
EXAMPLE 9 Classify the symmetric matrix A =
_
4 2
2 4
_
and the corresponding quadratic form
Q(x) = x
T
Ax.
Solution: Observe that we have
Q(x) = 4x
2
1
4x
1
x
2
+ 4x
2
2
= (2x
1
x
2
)
2
+ 3x
2
2
> 0
whenever x

0. Thus, Q(x) and A are both positive denite.
86 Chapter 10 Applications of Orthogonal Matrices
REMARK
The denitions, of course, can be generalized to any function and matrix. For exam-
ple, the matrix A =
_
1 1
0 1
_
is positive denite since
x
T
Ax = x
2
1
+ x
1
x
2
+ x
2
2
= (x
1
+
1
2
x
2
)
2
+
3
4
x
2
2
> 0
for all x

0. In this book we will just be dealing with quadratic forms and symmetric
matrices. However, it is important to remember that Assume A is positive denite
does not imply that A is symmetric.
Each of the quadratic forms in the example above was simple to classify since it either
had no cross-terms or we were easily able to complete a square. To classify more
dicult quadratic forms, we use our theory from the last section and the connection
between quadratic forms and symmetric matrices.
THEOREM 1 Let A be the symmetric matrix corresponding to a quadratic form Q(x) = x
T
Ax. If P
is an orthogonal matrix that diagonalizes A, then Q(x) can be expressed as

1
y
2
1
+
n
y
2
n
where
_

_
y
1
.
.
.
y
n
_

_
= y = P
T
x and where
1
, . . . ,
n
are the eigenvalues of A corresponding
to the columns of P.
Proof: Since P is orthogonal, if y = P
T
x, then x = Py. Moreover, we have that
P
T
AP = D is diagonal where the diagonal entries of D are the eigenvalues
1
, . . . ,
n
of A. Hence,
Q(x) = x
T
Ax = (Py)
T
A(Py) = y
T
P
T
APy = y
T
Dy =
1
y
2
1
+
n
y
2
n

This theoremshows that every quadratic form Q(x) is equivalent to a diagonal quadratic
form. Moreover, Q(x) can be brought into this form by the change of variables
y = P
T
x where P is an orthogonal matrix which diagonalizes the corresponding
symmetric matrix A.
EXAMPLE 10 Find new variables y
1
, y
2
, y
3
, y
4
such that
Q(x
1
, x
2
, x
3
, x
4
) = 3x
2
1
+2x
1
x
2
10x
1
x
3
+10x
1
x
4
+3x
2
2
+10x
2
x
3
10x
2
x
4
+3x
2
3
+2x
3
x
4
+3x
2
4
has diagonal form. Use the diagonal form to classify Q(x).
Section 10.3 Quadratic Forms 87
Solution: We have corresponding symmetric matrix A =
_

_
3 1 5 5
1 3 5 5
5 5 3 1
5 5 1 3
_

_
.
We rst nd an orthogonal matrix P which diagonalizes A. We have C() = (
12)( + 8)( 4)
2
and nding the corresponding normalized eigenvectors gives
P =
1
2
_

_
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
_

_
with P
T
AP =
_

_
12 0 0 0
0 8 0 0
0 0 4 0
0 0 0 4
_

_
. Thus, if
y =
_

_
y
1
y
2
y
3
y
4
_

_
= P
T
_

_
x
1
x
2
x
3
x
4
_

_
=
1
2
_

_
x
1
x
2
x
3
+ x
4
x
1
x
2
+ x
3
x
4
x
1
+ x
2
+ x
3
+ x
4
x
1
+ x
2
x
3
+ x
4
_

_
then we get
Q = 12y
2
1
8y
2
2
+ 4y
2
3
+ 4y
2
4
Therefore, Q(x) clearly takes positive and negative values, so Q(x) is indenite.
REMARKS
1. The eigenvectors we used to make up P are the principal axes of A. We will
see a geometric interpretation of this in the next section.
2. By changing the order of the eigenvectors in P we also change the order of
the eigenvalues in D and hence the coecients of the corresponding y
i
. For
example, if we took
P =
1
2
_

_
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
_

_
then we would get Q = 8y
2
1
+4y
2
2
+4y
2
3
+12y
2
4
. Notice, that since we can pick
any vector y R
4
, this does in fact give us exactly the same set of values as
our choice above. Alternatively, you can think of just doing another change of
variables z
1
= y
2
, z
2
= y
3
, z
3
= y
4
, and z
4
= y
1
.
Generalizing the method of the example above gives us the following theorem.
88 Chapter 10 Applications of Orthogonal Matrices
THEOREM 2 Let A be a symmetric matrix. Then the quadratic form Q(x) = x
T
Ax is
(1) positive denite if and only if the eigenvalues of A are all positive.
(2) negative denite if and only if the eigenvalues of A are all negative.
(3) indenite if and only if the some of the eigenvalues of A are positive and some
are negative.
(4) positive semidenite if and only if it has some zero eigenvalues and the rest
positive.
(5) negative semidenite if and only if has some zero eigenvalues and the rest
negative.
Proof: We will prove (1). The proofs of the others are similar. By the Principal Axis
Theorem A is orthogonally diagonalizable, so there exists an orthogonal matrix P
such that
Q(x) = x
T
Ax = y
T
Dy =
1
y
2
1
+ +
n
y
2
n
where y = P
T
x and
1
, . . . ,
n
are the eigenvalues of A. Consequently, Q(x) > 0 for
all x

0 if and only if all
i
are positive. Thus, Q(x) is positive denite if and only if
the eigenvalues of A are all positive.
EXAMPLE 11 Classify the following:
(a) Q(x
1
, x
2
) = 4x
2
1
6x
1
x
2
+ 2x
2
2
.
Solution: We have A =
_
4 3
3 2
_
so
det(A I) =

4 3
3 2

=
2
6 1
Applying the quadratic formula we get =
6

40
2
. So, we have both positive and
negative eigenvalues so Q is indenite.
(b) Q(x
1
, x
2
, x
3
) = 2x
2
1
+ 2x
1
x
2
+ 2x
1
x
3
+ 2x
2
2
+ 2x
2
x
3
+ 2x
2
3
.
Solution: We have A =
_

_
2 1 1
1 2 1
1 1 2
_

_
. We can nd that the eigenvalues are 1, 1 and 4.
Hence all the eigenvalues of A are positive so A is positive denite.
(c) A =
_

_
3 2 0
2 2 2
0 2 1
_

_
.
Solution: The eigenvalues of A are 5, 2, and 1. Therefore, A is indenite.
Section 10.3 Quadratic Forms 89
Section 10.3 Problems
1. For each of the following quadratic forms Q(x):
(i) Find the corresponding symmetric matrix.
(ii) Classify the quadratic form and its corresponding symmetric matrix.
(iii) Find a corresponding diagonal form of Q(x) and the change of variables
which brings it into this form.
(a) Q(x) = x
2
1
+ 3x
1
x
2
+ x
2
2
(b) Q(x) = 8x
2
1
+ 4x
1
x
2
+ 11x
2
2
(c) Q(x) = x
2
1
+ 8x
1
x
2
5x
2
2
(d) Q(x) = x
2
1
+ 4x
1
x
2
+ 2x
2
2
(e) Q(x) = 4x
2
1
+ 4x
1
x
2
+ 4x
1
x
3
+ 4x
2
2
+ 4x
2
x
3
+ 4x
2
3
(f) Q(x) = 4x
1
x
2
4x
1
x
3
+ x
2
2
x
2
3
(g) Q(x) = 4x
2
1
2x
1
x
2
+ 2x
1
x
3
+ 4x
2
2
+ 2x
2
x
3
+ 4x
2
3
(h) Q(x) = 4x
2
1
+ 2x
1
x
2
+ 2x
1
x
3
4x
2
2
2x
2
x
3
4x
2
3
2. Let Q(x) = x
T
Ax with A =
_
a b
b c
_
and det A 0.
(a) Prove that Q is positive denite if det A > 0 and a > 0.
(b) Prove that Q is negative denite if det A > 0 and a < 0.
(c) Prove that Q is indenite if det A < 0.
3. Let A be an n n matrix and let x, y R
n
. Dene < x, y >= x
T
Ay. Prove that
< , > is an inner product on R
n
if and only if A is a positive denite, symmetric
matrix.
4. Let A be a positive denite symmetric matrix. Prove that:
(a) the diagonal entries of A are all positive.
(b) A is invertible.
5. Let A and B be symmetric n n matrices whose eigenvalues are all positive.
Show that the eigenvalues of A + B are all positive.
90 Chapter 10 Applications of Orthogonal Matrices
10.4 Graphing Quadratic Forms
In many applications of quadratic forms it is important to be able to sketch the graph
of a quadratic form Q(x) = k for some k. Observe that in general it is not easy
to identify the shape of the graph of a general equation ax
2
1
+ bx
1
x
2
+ cx
2
2
= k by
inspection. On the other hand, if you are familiar with conic sections, it is easy to
determine the shape of the graph of an equation of the form
1
x
2
1
+
2
x
2
2
= k. We
demonstrate this in a table.
The graph of
1
x
2
1
+
2
x
2
2
= k looks like:
k > 0 k = 0 k < 0

1
> 0,
2
> 0 ellipse point(0, 0) dne

1
< 0,
2
< 0 dne point(0, 0) ellipse

2
< 0 hyperbola asymptotes for hyperbola hyperbola

1
= 0,
2
> 0 parallel lines line x
2
= 0 dne

1
= 0,
2
< 0 dne line x
2
= 0 parallel lines
.
In the last section we saw that we could bring a quadratic form into diagonal form
by performing the change variables y = P
T
x where P orthogonally diagonalizes
the corresponding symmetric matrix. Therefore, to sketch an equation of the form
ax
2
1
+ bx
1
x
2
+ cx
2
2
= k we could rst apply the change of variables to bring it into the
form
1
y
2
1
+
2
y
2
2
= k which will be easy to sketch. However, we must determine how
performing the change of variables will aect the graph.
THEOREM 1 Let Q(x) = ax
2
1
+ bx
1
x
2
+ cx
2
2
. Then there exists an orthogonal matrix P which
corresponds to a rotation such that the change of variables y = P
T
x brings Q(x) into
diagonal form.
Proof: Let A =
_
a b/2
b/2 c
_
. Since A is symmetric we can apply the Principal Axis
Theorem to get that there exists an orthonormal basis v
1
, v
2
of R
2
of eigenvectors of
A. Let v
1
=
_
a
1
a
2
_
and v
2
=
_
b
1
b
2
_
. Since v
1
is a unit vector we must have a
2
1
+ a
2
2
= 1.
Hence, the components lie on the unit circle and so there exists an angle such that
a
1
= cos and a
2
= sin . Moreover, since v
2
is a unit vector orthogonal to v
1
we can
pick b
1
= sin and b
2
= cos . Hence we have
P =
_
cos sin
sin cos
_
which corresponds to a rotation by . Finally, from our work above, we know that
this change of basis matrix brings Q into diagonal form.
Section 10.4 Graphing Quadratic Forms 91
REMARK
Observe that our choices for v
1
and v
2
in the proof above were not unique. For
example, we could have picked v
1
=
_
cos
sin
_
and v
2
=
_
sin
cos
_
. In this case the matrix
P =
_
v
1
v
2
_
would correspond to a rotation and a reection. This does not contradict
the theorem though. The theorem only says that we can choose P to correspond to a
rotation, not that a rotation is the only choice. We will demonstrate this in an example
below.
EXAMPLE 1 Consider the equation x
2
1
+ x
1
x
2
+ x
2
2
= 1. Find a rotation so that the equation has no
cross terms.
Solution: We have A =
_
1 1/2
1/2 1
_
. The eigenvalues are
1
=
1
2
and
2
=
3
2
.
Therefore, the graph of x
2
1
+x
1
x
2
+x
2
2
= 1 is an ellipse. We nd that the corresponding
unit eigenvectors are v
1
=
1

2
_
1
1
_
and v
2
=
1

2
_
1
1
_
. So, from our work in the proof
of the theorem, we can pick cos =
1

2
and sin =
1

2
. Hence, =
3
4
and
_
x
1
x
2
_
=
_
1/

2 1/

2
1/

2 1/

2
_ _
y
1
y
2
_
=
_
(y
1
y
2
)/

2
(y
1
y
2
)/

2
_
Substituting these into the equation gives
1 = x
2
1
+ x
1
x
2
+ x
2
2
=
_
y
1
y
2

2
_
2
+
_
y
1
y
2

2
_ _
y
1
y
2

2
_
+
_
y
1
y
2

2
_
2
=
1
2
(y
1
+ 2y
1
y
2
+ y
2
2
) +
1
2
(y
2
1
+ y
2
2
) +
1
2
(y
2
1
2y
1
y
2
+ y
2
2
)
=
1
2
y
2
1
+
3
2
y
2
2
Thus, the graph of x
2
1
+ x
1
x
2
+ x
2
2
= 1 is the graph of 1 =
1
2
y
2
1
+
3
2
y
2
2
rotated by
3
4
degrees. This is shown below.
92 Chapter 10 Applications of Orthogonal Matrices
Let us look at this procedure in general. Assume that we want to sketch Q(x
1
, x
2
) = k
where Q(x
1
, x
2
) is a quadratic form with corresponding symmetric matrix A. We
rst write Q(x
1
, x
2
) in diagonal form
1
y
2
1
+
2
y
2
2
by nding the orthogonal matrix
P =
_
v
1
v
2
_
which diagonalizes A and performing the change of variables y = P
T
x.
We can then easily sketch
1
y
2
1
+
2
y
2
2
= k (nding the equations of the asymptotes if
it is a hyperbola) in the y
1
y
2
-plane. Then using the fact that
x = Py =
_
v
1
v
2
_
_
y
1
y
2
_
= y
1
v
1
+ y
2
v
2
we can nd the y
1
-axis and the y
2
-axis in the x
1
x
2
-plane. In particular, the y
1
-axis in
the y
1
y
2
-plane is spanned by
_
1
0
_
, thus the y
1
-axis in the x
1
x
2
-plane is spanned by
x = 1v
1
+ 0v
2
= v
1
Hence, performing the change of variables rotates the y
1
-axis to be in the direction
of v
1
. Similarly, the y
2
axis is rotated to the v
2
direction. Therefore, we can sketch
the graph of Q(x
1
, x
2
) = k in the x
1
x
2
-plane by drawing the y
1
and y
2
axes in the
x
1
x
2
-plane and sketching the graph of
1
y
2
1
+
2
y
2
2
= k on these axes as we did in
the y
1
y
2
-plane. For this reason, the orthogonal eigenvectors v
1
, v
2
of A are called the
principal axes for Q(x
1
, x
2
).
EXAMPLE 2 Sketch 6x
2
1
+ 6x
1
x
2
2x
2
2
= 5.
Solution: We start by nding an orthogonal matrix which diagonalizes the corre-
sponding symmetric matrix A.
We have A =
_
6 3
3 2
_
, so det(A I) =

6 3
3 2

= ( 7)( + 3). We
nd corresponding eigenvectors of A are
_
3
1
_
for = 7 and
_
1
3
_
for = 3. Thus,
P =
_
3/

10 1/

10
1/

10 3/

10
_
orthogonally diagonalizes A.
Then, from our work above, we know that the
change of variables y = P
T
x changes 6x
2
1
+
6x
1
x
2
2x
2
2
= 5 into 7y
2
1
3y
2
2
= 5. To sketch
the hyperbola 5 = 7y
2
1
3y
2
2
more accurately, we
rst nd the equations of its asymptotes. We get
0 = 7y
2
1
3y
2
2
y
2
=

21
3
y
1
1.528y
1
Section 10.4 Graphing Quadratic Forms 93
Now to sketch 6x
2
1
+6x
1
x
2
2x
2
2
= 5 in the x
1
x
2
-plane we rst draw the y
1
-axis in the
direction of
_
3
1
_
and the y
2
-axis in the direction of
_
1
3
_
. Next, to be more precise, we
use the change of variables y = P
T
x to convert the equations of the asymptotes of the
hyperbola to equations in the x
1
x
2
-plane. We have
_
y
1
y
2
_
=
1

10
_
3 1
1 3
_ _
x
1
x
2
_
=
1

10
_
3x
1
+ x
2
x
1
+ 3x
2
_
Hence, we get the equations of the asymptotes are
y
2
=

21
3
y
1
1

10
(x
1
+ 3x
2
) =

21
3
1

10
(3x
1
+ x
2
)
3x
1
+ 9x
2
= 3

21x
1

21x
2
x
2
(9

21) = (3 3

21)x
1
x
2
=
3 3

21
9

21
x
1
Plotting the asymptotes and copying the graph
from above onto the y
1
and y
2
axes gives the pic-
ture to the right.
REMARK
The advantage of nding the principal axes over just calculating the rotation is that
we actually get precise equations of the axes and the asymptotes which are required
in some applications.
EXAMPLE 3 Sketch 2x
2
1
+ 4x
1
x
2
+ 5x
2
2
= 1.
Solution: We have A =
_
2 2
2 5
_
, so det(A I) =

2 2
2 5

= ( 6)( 1).
We nd corresponding eigenvectors of A are
_
1
2
_
for = 6 and
_
2
1
_
for = 1. To
demonstrate a rotation and a reection, we will pick P =
_
2/

5 1/

5
1/

5 2/

5
_
.
94 Chapter 10 Applications of Orthogonal Matrices
We nd corresponding eigenvectors of A are
_
1
2
_
for = 6 and
_
2
1
_
for = 1. To demon-
strate a rotation and a reection, we will pick
P =
_
2/

5 1/

5
1/

5 2/

5
_
. The change of variables
y = P
T
x changes 2x
2
1
+ 4x
1
x
2
+ 5x
2
2
= 1 into the
ellipse y
2
1
+ 6y
2
2
= 1. Graphing gives the picture
to the right.
To sketch 2x
2
1
+4x
1
x
2
+5x
2
2
= 1 in the x
1
x
2
-plane
we draw the y
1
-axis in the direction of
_
2
1
_
and
the y
2
-axis in the direction of
_
1
2
_
and copy the
picture above onto the axes appropriately to get
the picture to the right.
EXAMPLE 4 Sketch x
2
1
+ 4x
1
x
2
2x
2
2
= 1.
Solution: We have A =
_
1 2
2 2
_
so det(AI) =

1 2
2 2

= (+3)(2). We
nd corresponding eigenvectors of A are
_
1
2
_
for = 3 and
_
2
1
_
for = 2. Thus,
P =
_
1/

5 2/

5
2/

5 1/

5
_
orthogonally diagonalizes A.
Then, the change of variables y = P
T
x changes
x
2
1
+ 4x
1
x
2
2x
2
2
= 1 into 3y
2
1
+ 2y
2
2
= 5. This
is a hyperbola, so we nd the equations of its
asymptotes. We get
0 = 3y
2
1
+ 2y
2
2
y
2
=

6
2
y
1
1.225y
1
Section 10.4 Graphing Quadratic Forms 95
To sketch x
2
1
+ 4x
1
x
2
2x
2
2
= 1 in the x
1
x
2
-plane we rst draw the y
1
-axis in the
direction of
_
1
2
_
and the y
2
-axis in the direction of
_
2
1
_
. Next, we use the change
of variables y = P
T
x to convert the equations of the asymptotes of the hyperbola to
equations in the x
1
x
2
-plane. We have
_
y
1
y
2
_
=
1

5
_
1 2
2 1
_ _
x
1
x
2
_
=
1

5
_
x
1
+ 2x
2
2x
1
+ x
2
_
Hence, we get the equations of the asymptotes are
y
2
=

6
2
x
1
1

5
(2x
1
+ x
2
) =

6
2
1

5
(x
1
+ 2x
2
)
4x
1
+ 2x
2
=

6x
1
2

6x
2
x
2
(2 2

6) = (4

6)x
1
x
2
=
4

6
2 2

6
x
1
Plotting the asymptotes and copying the graph
from above onto the y
1
and y
2
axes gives the pic-
ture to the right.
Notice that in the last two examples, the change of variables we used actually corre-
sponded to a reection and a rotation.
Section 10.4 Problems
1. Sketch the graph of each of the following equations showing both the original
and new axes. For any hyperbola, nd the equation of the asymptotes.
(a) x
2
1
+ 8x
1
x
2
+ x
2
2
= 6 (b) 3x
2
1
2x
1
x
2
+ 3x
2
2
= 12
(c) 4x
2
1
+ 4x
1
x
2
7x
2
2
= 8 (d) 3x
2
1
4x
1
x
2
= 4
(e) x
2
1
+ 4x
1
x
2
+ 2x
2
2
= 6 (f) 4x
2
1
+ 4x
1
x
2
+ 4x
2
2
= 12
96 Chapter 10 Applications of Orthogonal Matrices
10.5 Optimizing Quadratic Forms
In calculus we use quadratic forms to classify critical points as local minimums and
maximums. However, in many other applications of quadratic forms, we actually
want to nd the maximum and/or minimum of the quadratic form subject to a con-
straint. Most of the time, it is possible to use a change of variables so that the con-
straint is jx j = 1. That is, given a quadratic form Q(x) = x
T
Ax on R
n
we want to nd
the maximum and minimum value of Q(x) subject to jx j = 1. For ease, we instead
use the equivalent constraint
1 = jx j
2
= x
2
1
+ x
2
n
To develop a procedure for solving this problem in general, we begin by looking at a
couple of examples.
EXAMPLE 1 Find the maximum and minimum of Q(x
1
, x
2
) = 2x
2
1
+ 3x
2
2
subject to the constraint
x
2
1
+ x
2
2
= 1.
Solution: Since we dont have a general method for solving this type of problem, we
resort to the basics. We will rst try a few points (x
1
, x
2
) that satisfy the constraint.
We get
Q(1, 0) = 2
Q
_

3
2
,
1
2
_

_
=
9
4
Q
_
1

2
,
1

2
_
=
5
2
Q
_

_
1
2
,

3
2
_

_
=
11
4
Q(0, 1) = 3
Although we have only tried a few points, we may guess that 3 is the maximumvalue.
Since we have already found that Q(0, 1) = 3, to prove that 3 is the maximum we just
need to show that 3 is an upper bound for Q(x
1
, x
2
) subject to x
2
1
+ x
2
2
= 1. Indeed,
we have that
Q(x
1
, x
2
) = 2x
2
1
+ 3x
2
2
3x
2
1
+ 3x
2
2
= 3(x
2
1
+ x
2
2
) = 3
Hence, 3 is the maximum.
Similarly, we have
Q(x
1
, x
2
) = 2x
2
1
+ 3x
2
2
2x
2
1
+ 2x
2
2
= 2(x
2
1
+ x
2
2
) = 2
Hence, 2 is a lower bound for Q(x
1
, x
2
) subject to the constraint and Q(1, 0) = 2, so
2 is the minimum.
Section 10.5 Optimizing Quadratic Forms 97
This example was extremely easy since we had no cross-terms in the quadratic form.
We now look at what happens if we have cross-terms.
EXAMPLE 2 Find the maximum and minimum of Q(x
1
, x
2
) = 7x
2
1
+ 8x
1
x
2
+ x
2
2
subject to the
constraint x
2
1
+ x
2
2
= 1.
Solution: We rst try some values of (x
1
, x
2
) that satisfy the constraint. We have
Q(1, 0) = 7
Q
_
1

2
,
1

2
_
= 8
Q(0, 1) = 1
Q
_

2
,
1

2
_
= 0
Q(1, 0) = 7
From this we might think that the maximum is 8 and the minimum is 0, although we
again realize that we have tested very few points. Indeed, if we try more points, we
quickly nd that these are not the maximum and minimum. So, instead of testing
points, we need to try to think of where the maximum and minimum should occur.
In the previous example, they occurred at (1, 0) and (0, 1) which are on the principal
axes. Thus, it makes sense to consider the principal axes of this quadratic form as
well.
We nd that Q(x
1
, x
2
) has eigenvalues = 9 and = 1 with corresponding unit
eigenvectors v
1
=
_
2/

5
1/

5
_
and v
2
=
_
1/

5
2/

5
_
. Taking these for (x
1
, x
2
) we get
Q
_
2

5
,
1

5
_
= 9, Q
_
1

5
,
2

5
_
= 1
Moreover, taking
y = P
T
x =
1

5
_
2 1
1 2
_ _
x
1
x
2
_
=
1

5
_
2x
1
+ x
2
x
1
2x
2
_
we get
Q(x
1
, x
2
) = y
T
Dy = 9
_
2

5
x
1
+
1

5
x
2
_
2

_
1

5
x
1

2

5
x
2
_
2
9
_
2

5
x
1
+
1

5
x
2
_
2
+ 9
_
1

5
x
1

2

5
x
2
_
2
= 9
_

_
_
2

5
x
1
+
1

5
x
2
_
2
+
_
1

5
x
1

2

5
x
2
_
2
_

_
= 9
since
_
2

5
x
1
+
1

5
x
2
_
2
+
_
1

5
x
1

2

5
x
2
_
2
= x
2
1
+ x
2
2
= 1
Similarly, we can show that Q(x
1
, x
2
) 1. Thus, the maximum of Q(x
1
, x
2
) subject
to jx j = 1 is 9 and the minimum is 1.
98 Chapter 10 Applications of Orthogonal Matrices
We generalize what we learned in the examples above to get the following theorem.
THEOREM 1 Let Q(x) be a quadratic form on R
n
with corresponding symmetric matrix A. The
maximum value and minimum value of Q(x) subject to the constraint jx j = 1 are the
greatest and least eigenvalues of A respectively. Moreover, these values occur when
x is taken to be a corresponding unit eigenvector of the eigenvalue.
Proof: Since A is symmetric, we can nd an orthogonal matrix P =
_
v
1
v
n
_
which orthogonally diagonalizes A where v
1
, . . . , v
n
are arranged so that the corre-
sponding eigenvalues
1
, . . . ,
n
satisfy
1

2

n
.
Hence, there exists a diagonal matrix D such that
x
T
Ax = y
T
Dy
where x = Py. Since P is orthogonal we have that
jy j = jPy j = jx j = 1
Thus, the quadratic form y
T
Dy subject to jy j = 1 takes the same set of values as the
quadratic form x
T
Ax subject to jx j = 1. Hence, we just need to nd the maximum
and minimum of y
T
Dy subject to jy j = 1.
Since y
2
1
+ + y
2
n
= jy j
2
= 1 we have
y
T
Dy =
1
y
2
1
+ +
n
y
2
n

1
y
2
1
+ +
1
y
2
n
=
1
(y
2
1
+ + y
2
n
) =
1
and y
T
Dy =
1
with y =
_

_
1
0
.
.
.
0
_

_
. Hence
1
is the maximum value.
Similarly,
y
T
Dy =
1
y
2
1
+ +
n
y
2
n

n
y
2
1
+ +
n
y
2
n
=
n
(y
2
1
+ + y
2
n
) =
n
and y
T
Dy =
n
with y =
_

_
0
.
.
.
0
1
_

_
. Hence
n
is the minimum value.
EXAMPLE 3 Let A =
_

_
3 0 1
0 3 0
1 0 3
_

_
. Find the maximum and minimum value of the quadratic form
Q(x) = x
T
Ax subject to x
2
1
+ x
2
2
+ x
2
3
= 1.
Solution: The characteristic polynomial of A is
C() =

3 0 1
0 3 0
1 0 3

= ( 3)( + 4)( + 2)
Thus, by Theorem 1 the maximum of Q(x) subject to the constraint is 3 and the
minimum is 4.
Section 10.6 Singular Value Decomposition 99
EXAMPLE 4 Find the maximum and minimum value of the quadratic form
Q(x
1
, x
2
, x
3
) = 4x
2
1
2x
1
x
2
+ 2x
1
x
3
+ 4x
2
2
2x
2
x
3
+ 4x
2
3
subject to x
2
1
+ x
2
2
+ x
2
3
= 1. Find vectors at which the maximum and minimum occur.
Solution: The characteristic polynomial of the corresponding symmetric matrix is
C() =

4 1 1
1 4 1
1 1 4

= ( 5)
2
( 2)
We nd that a unit eigenvectors corresponding to
1
= 5 and
2
= 2 are v
1
=
_

_
1/

2
1/

2
0
_

_
and v
2
=
_

_
1/

3
1/

3
1/

3
_

_
respectively. Thus, by Theorem 1 the maximum of Q(x) subject
to x
2
1
+ x
2
2
+ x
2
3
= 1 is 5 and occurs when x = v
1
, and the minimum is 2 and occurs
when x = v
2
.
Section 10.5 Problems
1. Find the maximum and minimum of each quadratic form subject to jx j = 1.
(a) Q(x) = 4x
2
1
2x
1
x
2
+ 4x
2
2
(b) Q(x) = 3x
2
1
+ 10x
1
x
2
3x
2
2
(c) Q(x) = x
2
1
+ 8x
2
2
+ 4x
2
x
3
+ 11x
2
3
(d) Q(x) = 3x
2
1
4x
1
x
2
+ 8x
1
x
3
+ 6x
2
2
+ 4x
2
x
3
+ 3x
2
3
(e) Q(x) = 2x
2
1
+ 2x
1
x
2
+ 2x
1
x
3
+ 2x
2
2
+ 2x
2
x
3
+ 2x
2
3
10.6 Singular Value Decomposition
We have now seen that nding the maximum and minimum of a quadratic form sub-
ject to jx j = 1 is very easy. However, in many applications we are not lucky enough
to get a symmetric matrix; in fact, we often do not even get a square matrix. Thus, it
is very desirable to derive a similar method for nding the maximum and minimum
of jAx j for an m n matrix A subject to jx j = 1. This derivation will lead us to a
very important matrix factorization known as the singular value decomposition. We
again begin by looking at an example.
100 Chapter 10 Applications of Orthogonal Matrices
EXAMPLE 1 Consider a linear mapping f : R
3
R
2
dened by f (x) = Ax with A =
_
1 0 1
1 1 1
_
.
Since the range is now a subspace of R
2
, we want to nd the maximum length of Ax
subject to jx j = 1. That is, we want to maximize jAx j. Observe that
jAx j
2
= (Ax)
T
(Ax) = x
T
A
T
Ax
Since A
T
A is symmetric this is a quadratic form. Hence, using Theorem 10.5.1, to
maximize jAx j we just need to nd the square root of the largest eigenvalue of A
T
A.
We have
A
T
A =
_

_
2 1 0
1 1 1
0 1 2
_

_
The characteristic polynomial of A
T
A is C() = ( 3)( 2). Thus, the eigen-
values of A
T
A are 3, 2 and 0. Hence, the maximum of jAx j subject to jx j = 1 is

3
and the minimum is 0. Moreover, these occur at the corresponding eigenvectors
_

_
1/

3
1/

3
1/

3
_

_
and
_

_
1/

6
2/

6
1/

6
_

_
of A
T
A.
In the example, we found that the eigenvalues of A
T
A were all non-negative. This
was important as we had to take the square root of them to maximize/minimize jAx j.
We now prove that this is always the case.
THEOREM 1 Let A be an mn matrix and let
1
, . . . ,
n
be the eigenvalues of A
T
A with correspond-
ing unit eigenvectors v
1
, . . . , v
n
. Then
1
, . . . ,
n
are all non-negative. In particular,
jAv
i
j =
_

i
Proof: For 1 i n we have A
T
Av
i
=
i
v
i
and hence
jAv
i
j
2
= (Av
i
)
T
Av
i
= v
T
i
A
T
Av
i
= v
T
i
(
i
v
i
) =
i
v
T
i
v
i
=
i
jv
i
j =
i
since jv
i
j = 1.
Observe from the example above that the square root of the largest and smallest
eigenvalues of A
T
A are the maximum and minimum of values of jAx j subject to
jx j = 1. So, these are behaving like the eigenvalues of a symmetric matrix. This
motivates the following denition.
DEFINITION
Singular Values
The singular values
1
, . . . ,
n
of an m n matrix A are the square roots of the
eigenvalues of A
T
A arranged so that
1

2

n
0.
Section 10.6 Singular Value Decomposition 101
EXAMPLE 2 Find the singular values of A =
_
1 1 1
2 2 2
_
.
Solution: We have A
T
A =
_

_
5 5 5
5 5 5
5 5 5
_

_
. The characteristic polynomial is C() =

2
( 15) so the eigenvalues are (from greatest to least)
1
= 15,
2
= 0, and

3
= 0. Thus, the singular values of A are
1
=

15,
2
= 0, and
3
= 0.
EXAMPLE 3 Find the singular values of B =
_

_
1 1
1 2
1 1
_

_
.
Solution: We have B
T
B =
_
3 2
2 6
_
. The characteristic polynomial is C() = (
2)(7). Hence, the eigenvalues of B
T
B are
1
= 7 and
2
= 2, so the singular values
of B are
1
=

7 and
2
=

2.
EXAMPLE 4 Find the singular values of A =
_

_
2 1
1 0
1 2
_

_
and B =
_
2 1 1
1 0 2
_
.
Solution: We have A
T
A =
_
6 0
0 5
_
. Hence, the eigenvalues of A
T
A are
1
= 6 and

2
= 5. Thus, the singular values of A are
1
=

6 and
2
=

5.
We have B
T
B =
_

_
5 2 0
2 1 1
0 1 5
_

_
. The characteristic polynomial of B
T
B is
C() =

5 2 0
2 1 1
0 1 5

= ( 5)( 6)
Hence, the eigenvalues of B
T
B are
1
= 6,
2
= 5, and
3
= 0. Thus, the singular
values of B are
1
=

6,
2
=

5, and
3
= 0.
Recall that 0 is an eigenvalue of an n n matrix A if and only if rank A n. In
particular, we know that the number of non-zero eigenvalues of A
T
A is the rank of
A
T
A. Thus, it is natural to ask if there is a relationship between the number of non-
zero singular values of an m n matrix A and the rank of A. In the problems above,
we see that we have the rank of A equals the number of non-zero singular values. To
prove this result in general, we use the following lemma.
102 Chapter 10 Applications of Orthogonal Matrices
THEOREM 2 Let A be an m n matrix. Then rank(A
T
A) = rank(A).
Proof: If Ax =

0, then A
T
Ax = A
T

0 =

0. Hence the nullspace of A is a subset of the


nullspace of A
T
A. On the other hand, consider A
T
Ax =

0. Then
jAx j
2
= (Ax) (Ax) = (Ax)
T
(Ax) = x
T
A
T
Ax = x
T

0 = 0
Thus, Ax =

0. Hence, the nullspace of A


T
A is a subset of the nullspace of A.
Therefore, dim(Null(A
T
A)) = dim(Null(A)) and so by the Rank-Nullity Theorem
rank(A
T
A) = n dim(Null(A
T
A)) = n dim(Null(A)) = rank(A)

COROLLARY 3 If A is an m n matrix and rank(A) = r, then A has r non-zero singular values.


We now want to extend the similarity of singular values and eigenvalues by dening
singular vectors. Observe that if A is an m n matrix with m n, then we cannot
have Av = v since Av R
m
. Hence, the best we can do to match the denition of
eigenvalues and eigenvectors is to pick suitable non-zero vectors v R
n
and u R
m
such that Av = u.
By denition, for any non-zero singular value of A there is a vector v

0 such
that A
T
Av =
2
v. Thus, if we have Av = u, then we have A
T
Av = A
T
(u) so

2
v = A
T
u. Dividing by , we see that we must also have A
T
u = v. Moreover, by
Theorem 1, if v is a unit eigenvector of A
T
A, then we get that u =
1

Av is also a unit
vector.
Therefore, for a non-zero singular value of A, we will want unit vectors v and u
such that
Av = u and A
T
u = v
However, our derivation does not work for = 0. In this case, we will see that we
are satised with just one of these conditions being satised.
DEFINITION
Singular Vectors
Let A be an m n matrix. If v R
n
and u R
m
are unit vectors and 0 is a
singular value of A such that
Av = u and A
T
u = v
then we say that u is a left singular vector of A and v is a right singular vector of A.
Additionally, if u is a unit vector such that A
T
u =

0, then u is a left singular vector
of A. If v is a unit vector such that Av =

0, then v is a right singular vector of A.


This denition of right singular vectors, not only preserves our relationship of these
vectors and the eigenvectors of A
T
A, but we also get the corresponding result for the
left singular vectors and the eigenvectors of AA
T
.
Section 10.6 Singular Value Decomposition 103
THEOREM 4 Let A be an mn matrix. If v is a right singular vector of A, then v is an eigenvector
of A
T
A. If u is a left singular vector of A, then u is an eigenvector of AA
T
.
Now that we have singular values and singular vectors mimicking eigenvalues and
eigenvectors, we wish to try to mimic orthogonal diagonalization. Let A be an m n
matrix. As was the case with singular vectors, if m n, then P
T
AP is not dened for
a square matrix P. Thus, we want to nd an m m orthogonal matrix U and an n n
orthogonal matrix V such that U
T
AV = , where is an m n diagonal matrix.
To do this, we require that for any mn matrix A there exists an orthonormal basis for
R
n
of right singular vectors and an orthonormal basis for R
m
of left singular vectors.
We now prove that these always exist.
LEMMA 5 Let A be an mn matrix with rank(A) = r and suppose that v
1
, . . . , v
n
is an orthonor-
mal basis for R
n
consisting of the eigenvectors of A
T
A arranged so that the corre-
sponding eigenvalues
1
, . . . ,
n
are arranged from greatest to least and let
1
, . . . ,
n
be the singular values of A. Then
Av
1

1
, . . . ,
Av
r

r
is an orthonormal basis for Col A.
Proof: Since A has rank r, we know that A
T
A has rank r and so A
T
A has r non-zero
eigenvalues. Thus,
1
, . . . ,
r
are the r non-zero singular values of A.
Now observe that for 1 i, j r, i j we have
Av
i
Av
j
= (Av
i
)
T
Av
j
= v
i
A
T
Av
j
= v
T
i

j
v
j
=
j
v
T
i
v
j
= 0
since v
1
, . . . , v
r
is an orthonormal set. Hence, Av
1
, . . . , Av
r
is orthogonal and so

Av
1

1
, . . . ,
Av
r

r
is orthonormal by Theorem 1. Moreover, we know that dimCol(A) = r,
so this is an orthonormal basis for Col A.
THEOREM 6 Let A be an m n matrix with rank r. There exists an orthonormal basis v
1
, . . . , v
n

of R
n
of right singular vectors of A and an orthonormal basis u
1
, . . . , u
m
of R
m
of
left singular vectors of A.
Proof: By denition eigenvectors of A
T
A are right singular vectors of A. Hence,
since A
T
A is symmetric, we can nd an orthonormal basis v
1
, . . . , v
n
of right singu-
lar vector of A.
By Lemma 5, the left singular vector u
i
=
1

i
Av
i
, 1 i r, form an orthonormal basis
for Col(A). Also, by denition, a left singular vector u
j
of A corresponding to = 0
lies in the nullspace of A
T
. Hence, by the Fundamental Theorem of Linear Algebra,
if u
r+1
, . . . , u
m
is an orthonormal basis for the nullspace of A
T
, then u
1
, . . . , u
m
is
an orthonormal basis for R
m
of left singular vectors of A.
104 Chapter 10 Applications of Orthogonal Matrices
Thus, for any m n matrix A with rank r we have an orthonormal basis v
1
, . . . , v
n

for R
n
of right singular vectors of A corresponding to the n singular values
1
, . . . ,
n
of A and an orthonormal basis u
1
, . . . , u
m
for R
m
of left singular vectors such that
A
_
v
1
v
n
_
=
_
Av
1
Av
r
Av
r+1
Av
n
_
=
_

1
u
1

r
u
r

0

0
_
=
_
u
1
u
m
_

where is the m n matrix with ()


ii
=
i
for 1 i r and all other entries of are
0.
Hence, we have that there exists an orthogonal matrix V and orthogonal matrix U
such that AV = U. Instead of writing this as U
T
AV = , we typically, write this as
A = UV
T
to get a matrix decomposition of A.
DEFINITION
Singular Value
Decomposition
(SVD)
A singular value decomposition of an m n matrix A is a factorization of the form
A = UV
T
where U is an orthogonal matrix containing left singular vectors of A, V is an or-
thogonal matrix containing right singular vectors of A, and is the mn matrix with
()
ii
=
i
for 1 i r and all other entries of are 0.
ALGORITHM
To nd a singular value decomposition of a matrix, we follow what we did above.
(1) Find the eigenvalues
1
, . . . ,
n
of A
T
A arranged from greatest to least and a
corresponding set of orthonormal eigenvectors v
1
, . . . , v
n
.
(2) Let V =
_
v
1
v
n
_
and let be the m n matrix whose rst r diagonal
entries are the r singular values of A arranged from greatest to least and all
other entries 0.
(3) Find left singular vectors of A by computing u
i
=
1

i
Av
i
for 1 i r. Then
extend the set u
1
, . . . , u
r
to an orthonormal basis for R
m
. Take
U =
_
u
1
u
m
_
.
Then A = UV
T
.
REMARK
As indicated in the proof of Theorem 5. One way to extend u
1
, . . . , u
r
to an or-
thonormal basis for R
m
is by nding an orthonormal basis for Null(A
T
).
Section 10.6 Singular Value Decomposition 105
EXAMPLE 5 Find a singular value decomposition of A =
_
1 1 3
3 1 1
_
.
Solution: We nd that A
T
A =
_

_
10 2 6
2 2 2
6 2 10
_

_
has eigenvalues (ordered from greatest
to least) of
1
= 16,
2
= 6 and
3
= 0. Corresponding orthonormal eigenvectors are
v
1
=
_

_
1/

2
0
1/

2
_

_
, v
2
=
_

_
1/

3
1/

3
1/

3
_

_
, v
3
=
_

_
1/

6
2/

6
1/

6
_

_
So we let V =
_

_
1/

2 1/

3 1/

6
0 1/

3 2/

6
1/

2 1/

3 1/

6
_

_
.
The singular values of A are
1
=

16 = 4, and
2
=

6, so we let =
_
4 0 0
0

6 0
_
.
Next, we compute
u
1
=
1

1
Av
1
=
1
4
_
4/

2
4/

2
_
=
_
1/

2
1/

2
_
u
2
=
1

2
Av
2
=
1

6
_
3/

3
3/

3
_
=
_
1/

2
1/

2
_
Since this forms a basis for R
2
we take U =
_
1/

2 1/

2
1/

2 1/

2
_
.
Then, we have a singular value decomposition A = UV
T
.
EXAMPLE 6 Find a singular value decomposition of B =
_

_
2 4
2 2
4 0
1 4
_

_
.
Solution: We have B
T
B =
_
25 0
0 36
_
. The eigenvalues are
1
= 36 and
2
= 25 with
corresponding orthonormal eigenvectors
_
0
1
_
,
_
1
0
_
. So V =
_
0 1
1 0
_
.
The singular values of B are
1
= 6 and
2
= 5 so =
_

_
6 0
0 5
0 0
0 0
_

_
.
106 Chapter 10 Applications of Orthogonal Matrices
We compute
u
1
=
1

1
Bv
1
=
1
6
_

_
4
2
0
4
_

_
u
2
=
1

2
Bv
2
=
1
5
_

_
2
2
4
1
_

_
But, we only have 2 orthogonal vectors in R
4
, so we need to extend u
1
, u
2
to an
orthonormal basis for R
4
. We know that we can complete a basis for R
4
by nding
an orthonormal basis for Null(B
T
).
Applying the Gram-Schmidt algorithm to a basis for Null(B
T
) we get vectors
u
3
=
1
3
_

_
1
2
0
2
_

_
, u
4
=
1
15
_

_
8
8
9
4
_

_
.
We let U =
_
u
1
u
2
u
3
u
4
_
and then we have a singular value decomposition UV
T
of B.
EXAMPLE 7 Find a singular value decomposition for A =
_

_
1 1
2 2
1 1
_

_
.
Solution: We have A
T
A =
_
6 6
6 6
_
. Thus, by inspection, the eigenvalues of A
1
= 12
and
2
= 0, with corresponding unit eigenvectors v
1
=
_
1/

2
1/

2
_
and v
2
=
_
1/

2
1/

2
_
respectively. Hence, the singular values of A are
1
=

12 and
2
= 0. Thus, we
have
V =
_
1/

2 1/

2
1/

2 1/

2
_
=
_

12 0
0 0
0 0
_

_
We compute
u
1
=
1

1
Av
1
=
1

25
_

_
2
4
2
_

_
Since we only have one vectors in R
3
, we need to extend u
1
to an orthonormal basis
for R
3
. To do this, we nd an orthonormal basis for Null(A
T
).
Section 10.6 Singular Value Decomposition 107
We observe that a basis for Null(A
T
) is
_

_
_

_
2
1
0
_

_
,
_

_
1
0
1
_

_
_

_
. Applying the Gram-Schmidt
algorithm to this basis and normalizing gives
u
2
=
1

6
_

_
2
1
0
_

_
, u
3
=
1

30
_

_
1
2
5
_

_
We let U =
_
u
1
u
2
u
3
_
and then we have a singular value decomposition UV
T
of
A.
Section 10.6 Problems
1. Find a singular value decomposition for the following matrices.
(a)
_
2 2
1 2
_
(b)
_
3 4
8 6
_
(c)
_
2 3
0 2
_
(d)
_

_
1 2
1 2
1 2
1 2
_

_
(e)
_
1 1 1 1
2 2 2 2
_
(f)
_
4 4 2
3 2 4
_
2. Let A be an m n matrix and let P be an m m orthogonal matrix. Show that
PA has the same singular values as A.
3. Let UV
T
be a singular value decomposition for an m n matrix A with rank
r. Find, with proof, an orthonormal basis for Row(A), Col(A), Null(A) and
Null(A
T
) from the columns of U and V.
4. Let A =
_

_
2 1 1
0 2 0
0 0 2
_

_
be the standard matrix of a linear mapping L. (Observe that
A is not diagonalizable).
(a) Find a basis J for R
3
of right singular vectors of A.
(b) Find a basis ( for R
3
of left singular vectors of A.
(c) Determine
(
[L]
J
.
Chapter 11
Complex Vector Spaces
Our goal in this chapter is to extend everything we have done with real vector spaces
to complex vector spaces. That is, vector spaces which use complex numbers for
their scalars instead of just real numbers. We begin with a very brief review of the
basic operations on complex numbers.
11.1 Complex Number Review
Recall that a complex number is a number of the form z = a + bi where a, b R and
i
2
= 1. The set of all complex numbers is denoted C. We call a the real part of z
and denote it by
Re z = a
and we call b the imaginary part of z and denote it by
Imz = b
It is very important to notice that the real numbers are a subset of the complex num-
bers. In particular, every real number a is the complex number a + 0i. If the real part
of a complex number z is 0, then we say that z is imaginary. Any complex number
which has non-zero imaginary part is said to be non-real.
We dene addition and subtraction of complex numbers by
(a + bi) (c + di) = (a c) + (b d)i
EXAMPLE 1 We have
(3 + 4i) + (2 3i) = (3 + 2) + (4 3)i = 5 + i
(1 3i) (4 + 2i) = (1 4) + (3 2)i = 3 5i
108
Section 11.1 Complex Number Review 109
Observe that addition of complex numbers is dened by adding the real parts and
adding the imaginary parts of the complex numbers. That is, we add complex num-
bers component wise just like vectors in R
n
.
In fact, the set C with this rule for addition and with scalar multiplication by a real
scalar dened by
t(a + bi) = ta + tbi, t R
forms a vector space over R. Lets nd a basis for this real vector space. Notice that
every vector in C can be written as a + bi with a, b R, so 1, i spans C. Also, 1, i
is clearly linearly independent as the only solution to
t
1
1 + t
2
i = 0 + 0i
is t
1
= t
2
= 0 since t
1
and t
2
are real scalars. Hence, C forms a two dimensional
real vector space. Consequently, it is isomorphic to every other two dimensional real
vector space. In particular, it is isomorphic to the plane R
2
. As a result, the set C is
often called the complex plane. We generally think of having an isomorphism which
maps the complex number z = a + bi to the point (a, b) in R
2
.
We can use this isomorphism to dene the absolute value of a complex number z =
a + bi. Recall that the absolute value of a real number x measures the distance that
x is from the origin. So, we dene the absolute value of z as the distance from the
point (a, b) in R
2
to the origin.
DEFINITION
Absolute Value
The absolute value or modulus of a complex number z = a + bi is dened by
z =

a
2
+ b
2
EXAMPLE 2 We have
2 3i =
_
2
2
+ (3)
2
=

13

1
2
+
1
2
i

=
_
_
1
2
_
2
+
_
1
2
_
2
=
_
1
2
Of course, this has the same properties as the real absolute value.
THEOREM 1 Let w, z C. Then
(1) z is a non-negative real number.
(2) z = 0 if and only if z = 0.
(3) w + z w + z
110 Chapter 11 Complex Vector Spaces
REMARK
If z = a + bi is real, then b = 0, so z = a + 0i =

a
2
+ 0
2
= a. Hence, this is a
direct generalization of the real absolute value function.
The ability to represent complex numbers as points in the plane is very important in
the study of complex numbers. However, viewing C as a real vector space has one
serious limitation; it only denes the multiplication of a real scalar by a complex
number and does not immediately give us a way of multiplying two complex num-
bers together. So, we return to the standard form of complex numbers and dene
multiplication of complex numbers by using the normal distributive property and the
fact that i
2
= 1. We get
(a + bi)(c + di) = ac + adi + bci + bdi
2
= ac bd + (ad + bc)i
EXAMPLE 3 We have
(3 + 4i)(2 3i) = 3(2) 4(3) + [4(2) + 3(3)]i = 18 i
(1 3i)(4 + 2i) = 1(4) (3)(2) + [(3)(4) + 1(2)]i = 10 10i
We get the following familiar properties.
THEOREM 2 Let z
1
, z
2
, z
3
C, then
(1) z
1
+ z
2
= z
2
+ z
1
(2) z
1
z
2
= z
2
z
1
(3) z
1
+ (z
2
+ z
3
) = (z
1
+ z
2
) + z
3
(4) z
1
(z
2
z
3
) = (z
1
z
2
)z
3
(5) z
1
(z
2
+ z
3
) = z
1
z
2
+ z
1
z
3
(6) z
1
z
2
= z
1
z
2

We look at one more important example of complex multiplication.


EXAMPLE 4
(a + bi)(a bi) = a
2
+ b
2
+ [b(a) a(b)]i = a
2
+ b
2
= a + bi
2
This example motivates the following denition.
DEFINITION
Complex
Conjugate
Let z = a +bi be a complex number. Then, the complex conjugate of z is dened by
z = a bi
Section 11.1 Complex Number Review 111
EXAMPLE 5 We have
4 3i = 4 + 3i
3i = 3i
4 = 4
2 + 2i = 2 2i
We have many important properties of complex conjugates.
THEOREM 3 Let w, z C with z = a + bi, then
(1) z = z
(2) z is real if and only if z = z
(3) z is imaginary if and only if z = z
(4) z w = z w
(5) zw = z w
(6) z + z = 2 Re(z) = 2a
(7) z z = i2 Im(z) = 2bi
(8) zz = a
2
+ b
2
= z
2
Since the absolute value of a non-zero complex number is a positive real number,
we can use property (8) to divide complex numbers. We demonstrate this with some
examples.
EXAMPLE 6 Calculate
2 3i
1 + 2i
.
Solution: To calculate this, we make the denominator real by multiplying the numer-
ator and the denominator by the complex conjugate of the denominator. We get
2 3i
1 + 2i
=
2 3i
1 + 2i

1 2i
1 2i
=
2 6 + (3 4)i
1
2
+ 2
2
=
4 + i
5
=
4
5

7
5
i
We can easily check this answer. We have
(1 + 2i)
_
4
5

7
5
i
_
=
4
5
+
14
5
+
_

8
5

7
5
_
i = 2 3i
EXAMPLE 7 Calculate
i
1 + i
.
Solution: We have
i
1 + i
=
i
1 + i

1 i
1 i
=
1 + i
1 + 1
=
1
2
+
1
2
i
112 Chapter 11 Complex Vector Spaces
EXERCISE 1 Calculate
5
3 4i
.
In some real world applications, such as problems involving electric circuits, one
needs to solve systems of linear equations involving complex numbers. The proce-
dure for solving systems of linear equations is the same, except that we can now
multiply a row by a non-zero complex number and we can a complex multiple of one
row to another. We demonstrate this with a few examples.
EXAMPLE 8 Solve the system of linear equations
z
1
+ iz
2
+ (1 + 2i)z
3
= 1 + 2i
z
2
+ 2z
3
= 2 + 2i
2z
1
+ (1 + 2i)z
2
+ (6 + 4i)z
3
= 4
Solution: We row reduce the augmented matrix of the system to RREF.
_

_
1 i 1 + 2i 1 + 2i
0 1 2 2 + 2i
2 1 + 2i 6 + 4i 4
_

_
R
3
2R
1

_
1 i 1 + 2i 1 + 2i
0 1 2 2 + 2i
0 1 4 2 4i
_

_
R
1
iR
2
R
3
+ R
2

_
1 0 1 1
0 1 2 2 + 2i
0 0 2 2i
_

_

1
2
R
3

_
1 0 1 1
0 1 2 2 + 2i
0 0 1 i
_

_
R
1
+ R
3
R
2
2R
3

_

_
1 0 0 1 + i
0 1 0 2
0 0 1 i
_

_
Hence, the solution is z
1
= 1 + i, z
2
= 2, and z
3
= i.
REMARK
It is very easy to make calculation mistakes when working with complex numbers.
Thus, it is highly recommended that you check your answer whenever possible.
Section 11.1 Complex Number Review 113
EXAMPLE 9 Solve the system of linear equation
z
1
+ z
2
+ iz
3
= 1
2iz
1
+ z
2
+ (1 2i)z
3
= 2i
(1 + i)z
1
+ (2 + 2i)z
2
2z
3
= 2
Solution: We row reduce the augmented matrix of the system to RREF.
_

_
1 1 i 1
2i 1 1 2i 2i
1 + i 2 + 2i 2 2
_

_
R
2
+ 2iR
1
R
3
+ (1 + i)R
1

_
1 1 i 1
0 1 + 2i 1 2i 0
0 1 + i 1 i 1 i
_

_
1
1+2i
R
2

_

_
1 1 i 1
0 1 1 0
0 1 + i 1 i 1 i
_

_
R
1
R
2
R
3
(1 + i)R
2

_
1 0 1 + i 1
0 1 1 0
0 0 0 1 i
_

_
Hence, the system is inconsistent.
EXAMPLE 10 Find the general solution of the system of linear equations
z
1
iz
2
= 1 + i
(1 + i)z
1
+ (1 i)z
2
+ (1 i)z
3
= 1 + 3i
2z
1
2iz
2
+ (3 + i)z
3
= 1 + 5i
Solution: We row reduce the augmented matrix of the system to RREF.
_

_
1 i 0 1 + i
1 + i 1 i 1 i 1 + 3i
2 2i 3 + i 1 + 5i
_

_
R
2
(1 + i)R
1
R
3
2R
1

_
1 i 0 1 + i
0 0 1 i 1 + i
0 0 3 + i 1 + 3i
_

_
1
1i
R
2

_

_
1 i 0 1 + i
0 0 1 i
0 0 3 + i 1 + 3i
_

_
R
3
(3 + i)R
2

_
1 i 0 1 + i
0 0 1 i
0 0 0 0
_

_
Thus, z
2
is a free variable. Since we are solving this system over C, this means that
z
2
can take any complex value. Hence, we let z
2
= C. Then, the general solution
is z
1
= 1 + i + i, z
2
= , and z
3
= i.
Section 11.1 Problems

1. Calculate the following.
   (a) (3 + 4i) - (2 + 6i)   (b) (1 + 2i)(3 + 2i)   (c) 3i(2 + 3i)
   (d) (1 + i)(1 - 2i)(3 + 4i)   (e) \overline{1 + 6i}   (f) \left| \frac{2}{3} - 2i \right|
   (g) \frac{2}{1 - i}   (h) \frac{4 - 3i}{3 - 4i}   (i) \frac{2 + 5i}{3 - 6i}

2. Solve the following systems of linear equations over C.
   (a) z_1 + (2 + i)z_2 + iz_3 = 1 + i
       iz_1 + (1 + 2i)z_2 + 2iz_4 = i
       z_1 + (2 + i)z_2 + (1 + i)z_3 + 2iz_4 = 2 - i
   (b) iz_1 + 2z_2 - (3 + i)z_3 = 1
       (1 + i)z_1 + (2 - 2i)z_2 - 4z_3 = i
       iz_1 + 2z_2 - (3 + 3i)z_3 = 1 + 2i
   (c) iz_1 + (1 + i)z_2 + z_3 = 2i
       (1 - i)z_1 + (1 - 2i)z_2 + (2 + i)z_3 = 2 + i
       2iz_1 + 2iz_2 + 2z_3 = 4 + 2i
   (d) z_1 - z_2 + iz_3 = 2i
       (1 + i)z_1 - iz_2 + iz_3 = 2 + i
       (1 - i)z_1 + (1 + 2i)z_2 + (1 + 2i)z_3 = 3 + 2i

3. Prove that for any z_1, z_2, z_3 ∈ C we have
   z_1(z_2 + z_3) = z_1 z_2 + z_1 z_3

4. Prove that for any positive integer n and z ∈ C we have
   \overline{z^n} = (\bar{z})^n
11.2 Complex Vector Spaces
We saw in the last section that the set of all complex numbers C can be thought of
as a two dimensional real vector space. However, this is inappropriate for the
solution space of Example 5.1.10 since that space requires multiplication by complex scalars.
Thus, we need to extend the definition of a real vector space to a complex vector
space.
DEFINITION
Complex Vector Space
A set V is called a vector space over C if there is an operation of addition and an
operation of scalar multiplication such that for any v, z, w ∈ V and α, β ∈ C we have:
V1 z + w ∈ V
V2 (z + w) + v = z + (w + v)
V3 z + w = w + z
V4 There is a vector denoted \vec{0} in V such that z + \vec{0} = z. It is called the zero vector.
V5 For each z ∈ V there exists an element -z such that z + (-z) = \vec{0}. -z is called the additive inverse of z.
V6 αz ∈ V
V7 α(βz) = (αβ)z
V8 (α + β)z = αz + βz
V9 α(z + w) = αz + αw
V10 1z = z.

This definition looks a lot like the definition of a real vector space. In fact, the only
difference is that we now allow the scalars to be any complex number, instead of just
any real number. In the rest of this section we will generalize most of our concepts
from real vector spaces to complex vector spaces.
EXAMPLE 1 The set C^n = \left\{ \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix} : z_i ∈ C \right\} is a vector space over C with addition defined by
\begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix} + \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix} = \begin{bmatrix} z_1 + w_1 \\ \vdots \\ z_n + w_n \end{bmatrix}
and scalar multiplication defined by
α \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix} = \begin{bmatrix} αz_1 \\ \vdots \\ αz_n \end{bmatrix}
for any α ∈ C.

Just like a complex number z ∈ C, we can split a vector in C^n into a real and imaginary
part.

THEOREM 1 If z ∈ C^n, then there exist vectors x, y ∈ R^n such that
z = x + iy

Other familiar real vector spaces can be turned into complex vector spaces.

EXAMPLE 2 The set M_{m×n}(C) of all m × n matrices with complex entries is a complex vector space
with standard addition and complex scalar multiplication of matrices.
We now extend all of our theory from real vector spaces to complex vector spaces.
DEFINITION
Subspace
If S is a subset of a complex vector space V and S is a complex vector space under
the same operations as V, then S is said to be a subspace of V.
As in the real case, we use the Subspace Test to prove a non-empty subset S of a
complex vector space V is a subspace of V.
EXAMPLE 3 Prove that S = \left\{ \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} : z_1 + iz_2 = z_3 \right\} is a subspace of C^3.

Solution: By definition S is a subset of C^3. Moreover, S is non-empty since \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
satisfies 0 + i(0) = 0. Let \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}, \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} ∈ S. Then z_1 + iz_2 = z_3 and w_1 + iw_2 = w_3.
Hence,
\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} + \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} z_1 + w_1 \\ z_2 + w_2 \\ z_3 + w_3 \end{bmatrix} ∈ S
since (z_1 + w_1) + i(z_2 + w_2) = (z_1 + iz_2) + (w_1 + iw_2) = z_3 + w_3, and
α \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} = \begin{bmatrix} αz_1 \\ αz_2 \\ αz_3 \end{bmatrix} ∈ S
since αz_1 + i(αz_2) = α(z_1 + iz_2) = αz_3. Thus, S is a subspace of C^3 by
the Subspace Test.
EXAMPLE 4 Is R a subspace of C as a vector space over C?
Solution: By definition R is a non-empty subset of C. Let x, y ∈ R. Then, x + y is
also a real number, so the set is closed under addition. But, the difference between a
real vector space and a complex vector space is that we have scalar multiplication by
complex numbers in a complex vector space. So, if we take 2 ∈ R and the scalar i ∈ C,
then we get i(2) = 2i ∉ R. Thus, R is not closed under complex scalar multiplication, and
hence it is not a subspace of C.
The concepts of linear independence, spanning, bases, and dimension are defined the
same as in the real case except that we now have complex scalar multiplication.
EXAMPLE 5 Find a basis for C as a vector space over C.
Solution: It may be tempting to think that a basis for C is {1, i}, but this would
be incorrect since this set is linearly dependent in C. In particular, the equation
α_1(1) + α_2(i) = 0 has solution α_1 = 1 and α_2 = i since 1(1) + i(i) = 1 - 1 = 0. A basis
for C is {1} since any complex number a + bi is just equal to (a + bi)(1).
EXERCISE 1 What is the standard basis for C^n?
EXAMPLE 6 Determine the dimension of the subspace S = \left\{ \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} : z_1 + iz_2 = z_3 \right\} of C^3.

Solution: Every vector in S can be written in the form
\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} = \begin{bmatrix} z_1 \\ z_2 \\ z_1 + iz_2 \end{bmatrix} = z_1 \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + z_2 \begin{bmatrix} 0 \\ 1 \\ i \end{bmatrix}

Thus, \left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ i \end{bmatrix} \right\} spans S and is clearly linearly independent since neither vector is
a scalar multiple of the other, so it is a basis for S. Thus, dim S = 2.
EXERCISE 2 Find a basis for the four fundamental subspaces of A = \begin{bmatrix} 1 & 1 & i \\ i & 1 & 1+2i \\ i & 1+2i & 3+2i \\ 2 & 0 & 2i \end{bmatrix}.
REMARK
Be warned that it is possible for two complex vectors to be scalar multiples of each
other without looking like they are. For example, can you determine by inspection
which of the following vectors is a scalar multiple of \begin{bmatrix} 1-i \\ -2i \end{bmatrix}?
\begin{bmatrix} 1 \\ 1-i \end{bmatrix}, \quad \begin{bmatrix} i \\ 1-i \end{bmatrix}, \quad \begin{bmatrix} 7-i \\ 8+6i \end{bmatrix}
Coordinates with respect to a basis, linear mappings, and matrices of linear mappings
are also defined in exactly the same way.
EXAMPLE 7 Given that J = \left\{ \begin{bmatrix} 1 \\ 1+i \\ 1-i \end{bmatrix}, \begin{bmatrix} -1 \\ -i \\ -1+2i \end{bmatrix}, \begin{bmatrix} i \\ i \\ 2+2i \end{bmatrix} \right\} is a basis of C^3, find the J-coordinates
of z = \begin{bmatrix} 2i \\ -2+i \\ 3+2i \end{bmatrix}.

Solution: We need to find complex numbers α_1, α_2, and α_3 such that
\begin{bmatrix} 2i \\ -2+i \\ 3+2i \end{bmatrix} = α_1 \begin{bmatrix} 1 \\ 1+i \\ 1-i \end{bmatrix} + α_2 \begin{bmatrix} -1 \\ -i \\ -1+2i \end{bmatrix} + α_3 \begin{bmatrix} i \\ i \\ 2+2i \end{bmatrix}

Row reducing the corresponding augmented matrix gives
\left[\begin{array}{ccc|c} 1 & -1 & i & 2i \\ 1+i & -i & i & -2+i \\ 1-i & -1+2i & 2+2i & 3+2i \end{array}\right]
\quad R_2 - (1+i)R_1,\ R_3 - (1-i)R_1 \quad \sim \quad
\left[\begin{array}{ccc|c} 1 & -1 & i & 2i \\ 0 & 1 & 1 & -i \\ 0 & i & 1+i & 1 \end{array}\right]

\quad R_1 + R_2,\ R_3 - iR_2 \quad \sim \quad
\left[\begin{array}{ccc|c} 1 & 0 & 1+i & i \\ 0 & 1 & 1 & -i \\ 0 & 0 & 1 & 0 \end{array}\right]
\quad R_1 - (1+i)R_3,\ R_2 - R_3 \quad \sim \quad
\left[\begin{array}{ccc|c} 1 & 0 & 0 & i \\ 0 & 1 & 0 & -i \\ 0 & 0 & 1 & 0 \end{array}\right]

Thus, [z]_J = \begin{bmatrix} i \\ -i \\ 0 \end{bmatrix}.
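The following NumPy sketch (not part of the original notes) checks these coordinates numerically. The columns of P below are the basis vectors of J, so the coordinate vector is the solution of P c = z.

```python
import numpy as np

# Numerical check of Example 7: the J-coordinates of z solve P c = z.
P = np.array([[1,      -1,      1j],
              [1 + 1j, -1j,     1j],
              [1 - 1j, -1 + 2j, 2 + 2j]], dtype=complex)
z = np.array([2j, -2 + 1j, 3 + 2j], dtype=complex)

c = np.linalg.solve(P, z)
print(c)   # approximately [0.+1.j, 0.-1.j, 0.+0.j]
```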
EXAMPLE 8 Find the standard matrix of L : C^2 → C^3 given by L(z_1, z_2) = \begin{bmatrix} iz_1 \\ z_2 \\ z_1 + z_2 \end{bmatrix}.

Solution: The standard basis for C^2 is \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}. We have
L(1, 0) = \begin{bmatrix} i \\ 0 \\ 1 \end{bmatrix} and L(0, 1) = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}. Hence, [L] = \begin{bmatrix} L(1,0) & L(0,1) \end{bmatrix} = \begin{bmatrix} i & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}.
Of course, the concepts of determinants and invertibility are also the same.
EXAMPLE 9 We have
det \begin{bmatrix} 1+i & 2 & 3 \\ 0 & 2i & 2+3i \\ 0 & 0 & 1-i \end{bmatrix} = (1+i)(2i)(1-i) = 4i

det \begin{bmatrix} i & i & 2i \\ 1 & 1 & 2 \\ 1+i & 1-i & 3i \end{bmatrix} = 0
since the first row is i times the second row.
EXAMPLE 10 Let A = \begin{bmatrix} 1+i & 1 \\ 1 & 2-i \end{bmatrix} and B = \begin{bmatrix} i & 0 & 1 \\ 1-i & 1 & i \\ 0 & 1 & 1+i \end{bmatrix}. Show that A and B are invertible and
find their inverses.

Solution: We have det A = \begin{vmatrix} 1+i & 1 \\ 1 & 2-i \end{vmatrix} = (1+i)(2-i) - 1 = 2 + i ≠ 0, so A is invertible. Using the
formula for the inverse of a 2 × 2 matrix we found in Math 136 gives
A^{-1} = \frac{1}{2+i} \begin{bmatrix} 2-i & -1 \\ -1 & 1+i \end{bmatrix}

Row reducing [ B \mid I ] to RREF gives
\left[\begin{array}{ccc|ccc} i & 0 & 1 & 1 & 0 & 0 \\ 1-i & 1 & i & 0 & 1 & 0 \\ 0 & 1 & 1+i & 0 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 1 & -1 \\ 0 & 1 & 0 & -2 & -1+i & 2-i \\ 0 & 0 & 1 & 1-i & -i & i \end{array}\right]

Since the RREF of B is I, B is invertible. Moreover, B^{-1} = \begin{bmatrix} 1 & 1 & -1 \\ -2 & -1+i & 2-i \\ 1-i & -i & i \end{bmatrix}.
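Determinants and inverses of complex matrices can also be checked numerically. The following NumPy sketch (not part of the original notes) uses the matrices as stated above.

```python
import numpy as np

# Numerical check of Example 10: determinants and inverses over C.
A = np.array([[1 + 1j, 1],
              [1, 2 - 1j]], dtype=complex)
print(np.linalg.det(A))        # approximately (2+1j), so A is invertible
print(np.linalg.inv(A))        # matches (1/(2+i)) [[2-i, -1], [-1, 1+i]]

B = np.array([[1j, 0, 1],
              [1 - 1j, 1, 1j],
              [0, 1, 1 + 1j]], dtype=complex)
print(np.allclose(np.linalg.inv(B) @ B, np.eye(3)))   # True, so B is invertible
```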
EXERCISE 3 Let A = \begin{bmatrix} 1+i & 2 & i \\ 2 & 3-2i & 1+i \\ 1-i & 1-3i & 1+i \end{bmatrix}. Find the determinant of A and A^{-1}.
Complex Conjugates
We saw in Section 11.1 that the complex conjugate is a very important and useful
operation on complex numbers. Thus, we should expect that it will also be useful
and important in the study of general complex vector spaces. Therefore, we extend
the definition of the complex conjugate to vectors in C^n and matrices in M_{m×n}(C).
DEFINITION
Complex Conjugate in C^n
Let z = \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix} ∈ C^n. Then, we define \bar{z} by \bar{z} = \begin{bmatrix} \bar{z}_1 \\ \vdots \\ \bar{z}_n \end{bmatrix}.

DEFINITION
Complex Conjugate in M_{m×n}(C)
Let A = \begin{bmatrix} z_{11} & \cdots & z_{1n} \\ \vdots & & \vdots \\ z_{m1} & \cdots & z_{mn} \end{bmatrix} ∈ M_{m×n}(C). Then, we define \bar{A} by \bar{A} = \begin{bmatrix} \bar{z}_{11} & \cdots & \bar{z}_{1n} \\ \vdots & & \vdots \\ \bar{z}_{m1} & \cdots & \bar{z}_{mn} \end{bmatrix}.
THEOREM 2 Let A ∈ M_{m×n}(C) and z ∈ C^n. Then \overline{Az} = \bar{A}\,\bar{z}.
EXAMPLE 11 Let w = \begin{bmatrix} 1 \\ 2i \\ 1-i \end{bmatrix}, z = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, and A = \begin{bmatrix} 1 & 1-3i \\ -i & 2i \end{bmatrix}. Then
\bar{w} = \begin{bmatrix} 1 \\ -2i \\ 1+i \end{bmatrix}, \quad \bar{z} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad \bar{A} = \begin{bmatrix} 1 & 1+3i \\ i & -2i \end{bmatrix}
EXAMPLE 12 Is the mapping L : C^2 → C^2 defined by L(z) = \bar{z} a linear mapping?

Solution: For any z_1, z_2 ∈ C^2 we have L(z_1 + z_2) = \overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2 = L(z_1) + L(z_2).
However, L(αz) = \overline{αz} = \bar{α}\bar{z}, which does not equal αL(z) = α\bar{z} if α is not real. So,
L is not linear. For example,
L\left( (1+i) \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right) = L\left( \begin{bmatrix} 1+i \\ 0 \end{bmatrix} \right) = \begin{bmatrix} 1-i \\ 0 \end{bmatrix} ≠ \begin{bmatrix} 1+i \\ 0 \end{bmatrix} = (1+i) L\left( \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right)
Even though the complex conjugate of a vector in C^n does not define a linear mapping,
we will see that complex conjugates of vectors in C^n and matrices in M_{m×n}(C)
occur naturally in the study of complex vector spaces.
Section 11.2 Problems

1. Determine which of the following sets is a subspace of C^3. Find a basis and the dimension of each subspace.
   (a) S_1 = \left\{ \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} ∈ C^3 : iz_1 = z_3 \right\}
   (b) S_2 = \left\{ \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} ∈ C^3 : z_1 z_2 = 0 \right\}
   (c) S_3 = \left\{ \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} ∈ C^3 : z_1 + z_2 + z_3 = 0 \right\}

2. Find a basis for the four fundamental subspaces of A = \begin{bmatrix} 1 & i \\ 1+i & 1+i \\ 1 & i \end{bmatrix}.

3. Let z ∈ C^n. Prove that there exist vectors x, y ∈ R^n such that z = x + iy.

4. Let L : C^3 → C^2 be defined by L(z_1, z_2, z_3) = \begin{bmatrix} iz_1 + (1+i)z_2 + (1+2i)z_3 \\ (1+i)z_1 - 2iz_2 - 3iz_3 \end{bmatrix}.
   (a) Prove that L is linear and find the standard matrix of L.
   (b) Determine a basis for the range and kernel of L.

5. Let L : C^2 → C^2 be defined by L(z_1, z_2) = \begin{bmatrix} z_1 + (1+i)z_2 \\ (1+i)z_1 + 2iz_2 \end{bmatrix}.
   (a) Prove that L is linear.
   (b) Determine a basis for the range and kernel of L.
   (c) Use the result of part (b) to define a basis J for C^2 such that [L]_J is diagonal.

6. Calculate the determinant of \begin{bmatrix} 1 & 1 & i \\ 1+i & i & i \\ 1-i & 1+2i & 1+2i \end{bmatrix}.

7. Find the inverse of \begin{bmatrix} 1 & 1 & 2 \\ 0 & 1+i & 1 \\ i & 1+i & 1+2i \end{bmatrix}.

8. Prove that for any n × n matrix A, we have det \bar{A} = \overline{det A}.
11.3 Complex Diagonalization
In Math 136 we saw that some matrices were not diagonalizable over R because
they had complex eigenvalues. But, if we extend all of our concepts of eigenvalues,
eigenvectors, and diagonalization to complex vector spaces, then we will be able to
diagonalize these matrices.
DEFINITION
Eigenvalue
Eigenvector
Let A ∈ M_{n×n}(C). If there exists λ ∈ C and z ∈ C^n with z ≠ \vec{0} such that Az = λz, then
λ is called an eigenvalue of A and z is called an eigenvector of A corresponding to
λ. We call (λ, z) an eigenpair.
All of the theory of similar matrices, eigenvalues, eigenvectors, and diagonalization
we did in Math 136 still applies in the complex case, except that we now allow the
use of complex eigenvalues and eigenvectors.
EXAMPLE 1 Diagonalize A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} over C.

Solution: The characteristic polynomial is
C(λ) = \begin{vmatrix} -λ & -1 \\ 1 & -λ \end{vmatrix} = λ^2 + 1

Therefore, the eigenvalues of A are λ_1 = i and λ_2 = -i. For λ_1 = i we get
A - iI = \begin{bmatrix} -i & -1 \\ 1 & -i \end{bmatrix} \sim \begin{bmatrix} 1 & -i \\ 0 & 0 \end{bmatrix}

Hence, a basis for E_{λ_1} is \left\{ \begin{bmatrix} i \\ 1 \end{bmatrix} \right\}. For λ_2 = -i we get
A + iI = \begin{bmatrix} i & -1 \\ 1 & i \end{bmatrix} \sim \begin{bmatrix} 1 & i \\ 0 & 0 \end{bmatrix}

So, a basis for E_{λ_2} is \left\{ \begin{bmatrix} -i \\ 1 \end{bmatrix} \right\}. Thus, A is diagonalized by P = \begin{bmatrix} i & -i \\ 1 & 1 \end{bmatrix} to D = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}.
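The following NumPy sketch (not part of the original notes) repeats this computation numerically; numpy.linalg.eig works over C automatically.

```python
import numpy as np

# Numerical check of Example 1: complex eigenvalues of a real matrix.
A = np.array([[0, -1],
              [1,  0]], dtype=complex)
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)     # [0.+1.j, 0.-1.j]
print(np.allclose(eigvecs @ np.diag(eigvals) @ np.linalg.inv(eigvecs), A))  # True
```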
EXAMPLE 2 Diagonalize B = \begin{bmatrix} i & 1+i \\ 1+i & 1 \end{bmatrix} over C.

Solution: The characteristic polynomial is
C(λ) = \begin{vmatrix} i-λ & 1+i \\ 1+i & 1-λ \end{vmatrix} = λ^2 - (1+i)λ - i

Using the quadratic formula we get
λ = \frac{(1+i) ± \sqrt{6i}}{2} = \frac{(1+i) ± \sqrt{6}\left(\frac{1+i}{\sqrt{2}}\right)}{2} = \frac{1 ± \sqrt{3}}{2}(1+i)

So, the eigenvalues are λ_1 = \frac{1+\sqrt{3}}{2}(1+i) and λ_2 = \frac{1-\sqrt{3}}{2}(1+i). For λ_1 we get
B - λ_1 I = \begin{bmatrix} i - \frac{1+\sqrt{3}}{2}(1+i) & 1+i \\ 1+i & 1 - \frac{1+\sqrt{3}}{2}(1+i) \end{bmatrix} \sim \begin{bmatrix} -\sqrt{3}+i & 2 \\ 2 & -\sqrt{3}-i \end{bmatrix} \sim \begin{bmatrix} 1 & -2/(\sqrt{3}-i) \\ 0 & 0 \end{bmatrix}

Then a basis for E_{λ_1} is \left\{ v_1 \right\} where v_1 = \begin{bmatrix} 2 \\ \sqrt{3}-i \end{bmatrix}. For λ_2 we get
B - λ_2 I = \begin{bmatrix} i - \frac{1-\sqrt{3}}{2}(1+i) & 1+i \\ 1+i & 1 - \frac{1-\sqrt{3}}{2}(1+i) \end{bmatrix} \sim \begin{bmatrix} \sqrt{3}+i & 2 \\ 2 & \sqrt{3}-i \end{bmatrix} \sim \begin{bmatrix} 1 & 2/(\sqrt{3}+i) \\ 0 & 0 \end{bmatrix}

Hence, a basis for E_{λ_2} is \left\{ v_2 \right\} where v_2 = \begin{bmatrix} -2 \\ \sqrt{3}+i \end{bmatrix}. Thus, B is diagonalized by
P = \begin{bmatrix} 2 & -2 \\ \sqrt{3}-i & \sqrt{3}+i \end{bmatrix} to D = \begin{bmatrix} λ_1 & 0 \\ 0 & λ_2 \end{bmatrix}.
EXAMPLE 3 Diagonalize F = \begin{bmatrix} 3 & 2+i \\ 2-i & 7 \end{bmatrix} over C.

Solution: The characteristic polynomial is
C(λ) = \begin{vmatrix} 3-λ & 2+i \\ 2-i & 7-λ \end{vmatrix} = λ^2 - 10λ + 16 = (λ - 2)(λ - 8)

So the eigenvalues are λ_1 = 2 and λ_2 = 8. For λ_1 we get
F - λ_1 I = \begin{bmatrix} 1 & 2+i \\ 2-i & 5 \end{bmatrix} \sim \begin{bmatrix} 1 & 2+i \\ 0 & 0 \end{bmatrix}

Then a basis for E_{λ_1} is \left\{ v_1 \right\} where v_1 = \begin{bmatrix} -2-i \\ 1 \end{bmatrix}. For λ_2 we get
F - λ_2 I = \begin{bmatrix} -5 & 2+i \\ 2-i & -1 \end{bmatrix} \sim \begin{bmatrix} 1 & -1/(2-i) \\ 0 & 0 \end{bmatrix}

Hence, a basis for E_{λ_2} is \left\{ v_2 \right\} where v_2 = \begin{bmatrix} 1 \\ 2-i \end{bmatrix}. Thus, F is diagonalized by
P = \begin{bmatrix} -2-i & 1 \\ 1 & 2-i \end{bmatrix} to D = \begin{bmatrix} 2 & 0 \\ 0 & 8 \end{bmatrix}.
As usual, these examples teach us more than just how to diagonalize complex matrices.
In the first example we see that when the matrix has only real entries, the
eigenvalues come in complex conjugate pairs. The second example shows that
when working with matrices with non-real entries our theory of symmetric matrices
for real matrices does not apply. In particular, we had B^T = B, but the eigenvalues of
B are not real. On the other hand, observe that the matrix F has non-real entries but
the eigenvalues of F are all real.
THEOREM 1 Let A ∈ M_{n×n}(R). If λ is a non-real eigenvalue of A with corresponding eigenvector
z, then \bar{λ} is also an eigenvalue of A with eigenvector \bar{z}.

Proof: We have Az = λz. Hence,
\bar{λ}\bar{z} = \overline{λz} = \overline{Az} = \bar{A}\bar{z} = A\bar{z}
since A is a real matrix. Thus, \bar{λ} is an eigenvalue of A with eigenvector \bar{z}.
COROLLARY 2 If A ∈ M_{n×n}(R) with n odd, then A has at least one real eigenvalue.
EXAMPLE 4 Find all eigenvalues of A = \begin{bmatrix} 0 & 2 & 1 \\ -2 & 3 & 0 \\ 1 & 0 & 2 \end{bmatrix} given that λ_1 = 2 + i is an eigenvalue of A.

Solution: Since A is a real matrix and λ_1 = 2 + i is an eigenvalue of A we have by
Theorem 1 that λ_2 = \bar{λ}_1 = 2 - i is also an eigenvalue of A. Finally, we know that the
sum of the eigenvalues is the trace of the matrix, so the remaining eigenvalue, which
must be real, is
λ_3 = tr A - λ_1 - λ_2 = 5 - (2 + i) - (2 - i) = 1
It is easy to verify this result by checking that det(A - I) = 0.
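The following NumPy sketch (not part of the original notes) verifies that the non-real eigenvalues of this real matrix occur in a conjugate pair and that the eigenvalues sum to the trace.

```python
import numpy as np

# Numerical check of Example 4.
A = np.array([[0, 2, 1],
              [-2, 3, 0],
              [1, 0, 2]], dtype=float)
eigvals = np.linalg.eigvals(A)
print(eigvals)                                        # approximately [2.+1.j, 2.-1.j, 1.+0.j]
print(np.isclose(eigvals.sum().real, np.trace(A)))    # True
```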
Section 11.3 Problems

1. For each of the following matrices, either diagonalize the matrix over C, or show that it is not diagonalizable.
   (a) \begin{bmatrix} 2 & 1+i \\ 1-i & 1 \end{bmatrix}   (b) \begin{bmatrix} 3 & 5 \\ 5 & 3 \end{bmatrix}   (c) \begin{bmatrix} 2 & 2 & 1 \\ 4 & 1 & 2 \\ 2 & 2 & 1 \end{bmatrix}
   (d) \begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix}   (e) \begin{bmatrix} 2 & 1 & 1 \\ 2 & 1 & 0 \\ 3 & 1 & 2 \end{bmatrix}   (f) \begin{bmatrix} 1+i & 1 & 0 \\ 1 & 1 & i \\ 1 & 0 & 1 \end{bmatrix}

2. For any θ ∈ R, let R_θ = \begin{bmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{bmatrix}.
   (a) Diagonalize R_θ over C.
   (b) Verify your answer in (a) is correct by showing that the matrix P and the diagonal matrix D from part (a) satisfy P^{-1} R_θ P = D for θ = 0 and θ = π/4.

3. Let A ∈ M_{n×n}(C) and let z be an eigenvector of A. Prove that \bar{z} is an eigenvector of \bar{A}. What is the corresponding eigenvalue?

4. Suppose that a real 2 × 2 matrix A has 2 + i as an eigenvalue with a corresponding eigenvector \begin{bmatrix} 1+i \\ i \end{bmatrix}. Determine A.
11.4 Complex Inner Products
The concepts of orthogonality and orthonormal bases were very useful in real vector
spaces. Hence, it makes sense to extend these concepts to complex vector spaces. In
particular, we would like to be able to define the concepts of orthogonality and length
in complex vector spaces to match those in the real case.

REMARK
In this section we leave the proofs of the theorems to the reader (or homework
assignments/tests) as they are essentially the same as in the real case.

We start by looking at how to define an inner product for C^n.
We first observe that if we extend the standard dot product to vectors in C^n, then this
does not define an inner product. Indeed, if z = x + iy, then we have
z \cdot z = z_1^2 + \cdots + z_n^2 = (x_1^2 + \cdots + x_n^2 - y_1^2 - \cdots - y_n^2) + 2i(x_1 y_1 + \cdots + x_n y_n)

But, to be an inner product we require that ⟨z, z⟩ be a non-negative real number so
that we can define the length of a vector. Since z \cdot z does not even have to be real, it
cannot define an inner product.

Thinking of properties of complex numbers, the only way we can ensure that we get
a non-negative real number when multiplying complex numbers is to multiply the
complex number by its conjugate. In particular, if we define
⟨z, w⟩ = z_1\bar{w}_1 + \cdots + z_n\bar{w}_n = z \cdot \bar{w}
then, we get
⟨z, z⟩ = z_1\bar{z}_1 + \cdots + z_n\bar{z}_n = |z_1|^2 + \cdots + |z_n|^2
which is not only a non-negative real number, but it is 0 if and only if z = \vec{0}. Thus,
this is what we are looking for.
DEFINITION
Standard Inner Product on C^n
The standard inner product ⟨ , ⟩ on C^n is defined by
⟨z, w⟩ = z \cdot \bar{w} = z_1\bar{w}_1 + \cdots + z_n\bar{w}_n
for any z, w ∈ C^n.
REMARK
In some other fields, for example engineering, the standard inner product for C^n is instead
defined by
⟨z, w⟩ = w \cdot \bar{z}
Be warned that many computer programs use the engineering definition of the inner product for C^n.
EXAMPLE 1 Let z = \begin{bmatrix} 1 \\ 2+i \\ 1-i \end{bmatrix} and w = \begin{bmatrix} i \\ 2 \\ 1+3i \end{bmatrix}. Then

⟨z, z⟩ = z \cdot \bar{z} = \begin{bmatrix} 1 \\ 2+i \\ 1-i \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2-i \\ 1+i \end{bmatrix} = 1^2 + (2+i)(2-i) + (1-i)(1+i) = 8

⟨z, w⟩ = z \cdot \bar{w} = \begin{bmatrix} 1 \\ 2+i \\ 1-i \end{bmatrix} \cdot \begin{bmatrix} -i \\ 2 \\ 1-3i \end{bmatrix} = 1(-i) + (2+i)(2) + (1-i)(1-3i) = 2 - 3i

⟨w, z⟩ = w \cdot \bar{z} = \begin{bmatrix} i \\ 2 \\ 1+3i \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2-i \\ 1+i \end{bmatrix} = i(1) + 2(2-i) + (1+3i)(1+i) = 2 + 3i

This example shows that the standard inner product for C^n is not symmetric. The
next example shows that it is also not bilinear.
EXAMPLE 2 Let z, w ∈ C^n. Then, for any α ∈ C we have ⟨αz, w⟩ = α⟨z, w⟩, but ⟨z, αw⟩ = \bar{α}⟨z, w⟩.

Solution: We have
⟨αz, w⟩ = (αz) \cdot \bar{w} = α(z \cdot \bar{w}) = α⟨z, w⟩
⟨z, αw⟩ = z \cdot \overline{αw} = z \cdot (\bar{α}\bar{w}) = \bar{α}(z \cdot \bar{w}) = \bar{α}⟨z, w⟩
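The following NumPy sketch (not part of the original notes) evaluates the inner products of Example 1 with the course convention ⟨z, w⟩ = z · conj(w). Note that np.vdot conjugates its first argument, which matches the engineering convention mentioned in the remark above, so ⟨z, w⟩ here corresponds to np.vdot(w, z).

```python
import numpy as np

# Standard inner product on C^n, course convention: <z, w> = sum(z * conj(w)).
z = np.array([1, 2 + 1j, 1 - 1j])
w = np.array([1j, 2, 1 + 3j])

print(np.sum(z * np.conj(z)).real)   # 8.0,   <z, z>
print(np.sum(z * np.conj(w)))        # (2-3j), <z, w>
print(np.sum(w * np.conj(z)))        # (2+3j), <w, z>
```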
EXERCISE 1 Let z = \begin{bmatrix} 1+i \\ 1-i \\ 2 \end{bmatrix} and w = \begin{bmatrix} 1-i \\ 2i \\ 2 \end{bmatrix}. Find ⟨z, w⟩ and ⟨w, z⟩.
THEOREM 1 Let ⟨ , ⟩ denote the standard inner product for C^n. Then for all v, w, z ∈ C^n and α ∈ C:
(1) ⟨z, z⟩ ∈ R and ⟨z, z⟩ ≥ 0 for all z ∈ C^n, and ⟨z, z⟩ = 0 if and only if z = \vec{0}
(2) ⟨z, w⟩ = \overline{⟨w, z⟩}
(3) ⟨v + z, w⟩ = ⟨v, w⟩ + ⟨z, w⟩
(4) ⟨αz, w⟩ = α⟨z, w⟩

As we saw in the examples, this theorem shows that the standard inner product on
C^n does not satisfy the same properties as a real inner product. So, to define an inner
product on a general complex vector space, we should use these properties instead of
those for a real inner product.
DEFINITION
Hermitian Inner Product
Let V be a complex vector space. A Hermitian inner product on V is a function
⟨ , ⟩ : V × V → C such that for all v, w, z ∈ V and α ∈ C we have
(1) ⟨z, z⟩ ∈ R and ⟨z, z⟩ ≥ 0 for all z ∈ V, and ⟨z, z⟩ = 0 if and only if z = \vec{0}
(2) ⟨z, w⟩ = \overline{⟨w, z⟩}
(3) ⟨v + z, w⟩ = ⟨v, w⟩ + ⟨z, w⟩
(4) ⟨αz, w⟩ = α⟨z, w⟩

A complex vector space with a Hermitian inner product is called a Hermitian inner
product space.
THEOREM 2 If ⟨ , ⟩ is a Hermitian inner product on a complex vector space V, then for all v, w, z ∈ V and α ∈ C we have
⟨z, v + w⟩ = ⟨z, v⟩ + ⟨z, w⟩
⟨z, αw⟩ = \bar{α}⟨z, w⟩

It is important to observe that this is a true generalization of the real inner product.
That is, we could use this for the definition of the real inner product since if ⟨z, w⟩
and α are strictly real, then this will satisfy the definition of a real inner product.

THEOREM 3 The function ⟨A, B⟩ = tr(\bar{B}^T A) defines a Hermitian inner product on M_{m×n}(C).
EXAMPLE 3 Let A = \begin{bmatrix} 1 & -i \\ -2 & -2+i \end{bmatrix} and B = \begin{bmatrix} 3 & -3+i \\ 2i & 0 \end{bmatrix}. Find ⟨A, B⟩ under the inner product
⟨A, B⟩ = tr(\bar{B}^T A).

Solution: We have
⟨A, B⟩ = tr(\bar{B}^T A) = tr\left( \begin{bmatrix} 3 & -2i \\ -3-i & 0 \end{bmatrix} \begin{bmatrix} 1 & -i \\ -2 & -2+i \end{bmatrix} \right) = tr \begin{bmatrix} 3+4i & 2+i \\ -3-i & -1+3i \end{bmatrix} = 3 + 4i + (-1 + 3i) = 2 + 7i
EXAMPLE 4 If A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} and B = \begin{bmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & & \vdots \\ b_{m1} & \cdots & b_{mn} \end{bmatrix}, then
\bar{B}^T A = \begin{bmatrix} \bar{b}_{11} & \cdots & \bar{b}_{m1} \\ \vdots & & \vdots \\ \bar{b}_{1n} & \cdots & \bar{b}_{mn} \end{bmatrix} \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}

Hence, we get
tr(\bar{B}^T A) = \sum_{\ell=1}^{m} a_{\ell 1}\bar{b}_{\ell 1} + \sum_{\ell=1}^{m} a_{\ell 2}\bar{b}_{\ell 2} + \cdots + \sum_{\ell=1}^{m} a_{\ell n}\bar{b}_{\ell n} = \sum_{j=1}^{n} \sum_{\ell=1}^{m} a_{\ell j}\bar{b}_{\ell j}

which corresponds to the standard inner product of the corresponding vectors under
the obvious isomorphism with C^{mn}.
REMARK
As in the real case, whenever we are working with C^n or M_{m×n}(C), if no other inner
product is specified, we will assume we are working with the standard inner product.
EXERCISE 2 Let A, B ∈ M_{2×2}(C) with A = \begin{bmatrix} 1 & 1-i \\ 2+2i & i \end{bmatrix} and B = \begin{bmatrix} 1 & 1-i \\ 2+2i & i \end{bmatrix}. Determine
⟨A, B⟩ and ⟨B, A⟩.
Length and Orthogonality

We can now define length and orthogonality to match the definitions in the real case.

DEFINITION
Length
Unit Vector
Let V be a Hermitian inner product space with inner product ⟨ , ⟩. Then, for any z ∈ V
we define the length of z by
||z|| = \sqrt{⟨z, z⟩}
If ||z|| = 1, then z is called a unit vector.

DEFINITION
Orthogonality
Let V be a Hermitian inner product space with inner product ⟨ , ⟩. Then, for any
z, w ∈ V we say that z and w are orthogonal if ⟨z, w⟩ = 0.
EXAMPLE 5 Let u = \begin{bmatrix} i \\ 1+i \\ 2-i \end{bmatrix} ∈ C^3. Then
||u|| = \sqrt{⟨u, u⟩} = \sqrt{u \cdot \bar{u}} = \sqrt{i(-i) + (1+i)(1-i) + (2-i)(2+i)} = \sqrt{1 + 2 + 5} = \sqrt{8}
EXAMPLE 6 Let z = \begin{bmatrix} 1 \\ i \\ 2+3i \end{bmatrix} ∈ C^3. Which of the following vectors are orthogonal to z:
u = \begin{bmatrix} -i \\ 1 \\ 0 \end{bmatrix}, \quad v = \begin{bmatrix} i \\ 1 \\ 0 \end{bmatrix}, \quad w = \begin{bmatrix} 1-2i \\ -7 \\ 1-i \end{bmatrix}

Solution: We have
⟨z, u⟩ = z \cdot \bar{u} = \begin{bmatrix} 1 \\ i \\ 2+3i \end{bmatrix} \cdot \begin{bmatrix} i \\ 1 \\ 0 \end{bmatrix} = i + i + 0 = 2i

⟨z, v⟩ = z \cdot \bar{v} = \begin{bmatrix} 1 \\ i \\ 2+3i \end{bmatrix} \cdot \begin{bmatrix} -i \\ 1 \\ 0 \end{bmatrix} = -i + i + 0 = 0

⟨z, w⟩ = z \cdot \bar{w} = \begin{bmatrix} 1 \\ i \\ 2+3i \end{bmatrix} \cdot \begin{bmatrix} 1+2i \\ -7 \\ 1+i \end{bmatrix} = (1+2i) - 7i + (-1+5i) = 0

Hence, v and w are orthogonal to z, and u is not orthogonal to z.
Of course, these satisfy all of our familiar properties of length and orthogonality.

THEOREM 4 Let V be a Hermitian inner product space with inner product ⟨ , ⟩. Then, for any
z, w ∈ V and α ∈ C we have
(1) ||αz|| = |α| ||z||
(2) ||z + w|| ≤ ||z|| + ||w||
(3) \frac{1}{||z||} z is a unit vector in the direction of z.

DEFINITION
Orthogonal
Orthonormal
Let S = {z_1, . . . , z_k} be a set in a Hermitian inner product space with Hermitian inner
product ⟨ , ⟩. S is said to be orthogonal if ⟨z_ℓ, z_j⟩ = 0 for all ℓ ≠ j. S is said to be
orthonormal if it is orthogonal and ||z_j|| = 1 for all j.

THEOREM 5 If {z_1, . . . , z_k} is an orthogonal set in a Hermitian inner product space V, then
||z_1 + \cdots + z_k||^2 = ||z_1||^2 + \cdots + ||z_k||^2

THEOREM 6 If S = {z_1, . . . , z_k} is an orthogonal set of non-zero vectors in a Hermitian inner
product space V, then S is linearly independent.
The concept of projections and, of course, the Gram-Schmidt procedure are still valid
for Hermitian inner products.

EXAMPLE 7 Use the Gram-Schmidt procedure to find an orthogonal basis for the subspace S of
C^4 defined by
S = Span\left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} i \\ 1 \\ i \\ 1 \end{bmatrix}, \begin{bmatrix} 1+i \\ -1 \\ i \\ 1+i \end{bmatrix} \right\} = Span\{w_1, w_2, w_3\}

Solution: First, we let v_1 = w_1. Then we have
v_2 = w_2 - \frac{⟨w_2, v_1⟩}{||v_1||^2} v_1 = \begin{bmatrix} i \\ 1 \\ i \\ 1 \end{bmatrix} - \frac{2i}{2} \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \end{bmatrix}

v_3 = w_3 - \frac{⟨w_3, v_1⟩}{||v_1||^2} v_1 - \frac{⟨w_3, v_2⟩}{||v_2||^2} v_2 = \begin{bmatrix} 1+i \\ -1 \\ i \\ 1+i \end{bmatrix} - \frac{1+2i}{2} \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} - \frac{i}{2} \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1/2 \\ -1 - i/2 \\ -1/2 \\ 1 + i/2 \end{bmatrix}

Consequently, {v_1, v_2, v_3} is an orthogonal basis for S.
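For reference, here is a minimal Python sketch (not part of the original notes) of the Gram-Schmidt procedure for the standard Hermitian inner product ⟨z, w⟩ = z · conj(w), applied to the vectors of Example 7. The function name gram_schmidt is only illustrative.

```python
import numpy as np

# Classical Gram-Schmidt with the standard Hermitian inner product on C^n.
def gram_schmidt(vectors):
    basis = []
    for w in vectors:
        v = w.astype(complex)
        for u in basis:
            # subtract the projection of w onto u, using <w, u> / <u, u>
            v = v - (np.sum(w * np.conj(u)) / np.sum(u * np.conj(u))) * u
        basis.append(v)
    return basis

w1 = np.array([1, 0, 1, 0], dtype=complex)
w2 = np.array([1j, 1, 1j, 1], dtype=complex)
w3 = np.array([1 + 1j, -1, 1j, 1 + 1j], dtype=complex)

for v in gram_schmidt([w1, w2, w3]):
    print(v)
# The outputs are pairwise orthogonal; the last one is [0.5, -1-0.5j, -0.5, 1+0.5j].
```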
We recall that a real square matrix with orthonormal columns is an orthogonal matrix.
Since orthogonal matrices were so useful in the real case, we generalize them as well.

DEFINITION
Unitary Matrix
If the columns of a matrix U form an orthonormal basis for C^n, then U is called
unitary.

THEOREM 7 Let U ∈ M_{n×n}(C). Then the following are equivalent:
(1) the columns of U form an orthonormal basis for C^n.
(2) \bar{U}^T = U^{-1}
(3) the rows of U form an orthonormal basis for C^n.
EXAMPLE 8 Determine which of the following matrices are unitary:
A = \begin{bmatrix} 1 & 1+2i \\ 1 & 1+2i \end{bmatrix}, \quad B = \begin{bmatrix} (1+i)/\sqrt{2} & 0 & 0 \\ 0 & 0 & 1 \\ 0 & i & 0 \end{bmatrix}, \quad C = \begin{bmatrix} \frac{1+i}{2} & \frac{1-i}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ \frac{i}{2} & 0 & \frac{-3-3i}{\sqrt{24}} \\ \frac{1}{2} & \frac{2i}{\sqrt{6}} & \frac{1-i}{\sqrt{24}} \end{bmatrix}

Solution: The columns of A are not unit vectors, so A is not unitary.

The columns of B are clearly unit vectors and orthogonal to each other, so the
columns of B form an orthonormal basis for C^3. Hence, B is unitary.

We have
\bar{C}^T C = \begin{bmatrix} \frac{1-i}{2} & \frac{-i}{2} & \frac{1}{2} \\ \frac{1+i}{\sqrt{6}} & 0 & \frac{-2i}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} & \frac{-3+3i}{\sqrt{24}} & \frac{1+i}{\sqrt{24}} \end{bmatrix} \begin{bmatrix} \frac{1+i}{2} & \frac{1-i}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ \frac{i}{2} & 0 & \frac{-3-3i}{\sqrt{24}} \\ \frac{1}{2} & \frac{2i}{\sqrt{6}} & \frac{1-i}{\sqrt{24}} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

so C is unitary.
REMARK
Observe that if the entries of A are all real, then A is unitary if and only if it is
orthogonal.
THEOREM 8 If U and W are unitary matrices, then UW is a unitary matrix.
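For hand computations it is easy to make sign or conjugation errors, so a numerical check is often worthwhile. The following NumPy sketch (not part of the original notes) tests unitarity by checking whether \bar{U}^T U is the identity; the function name is_unitary is only illustrative.

```python
import numpy as np

# Check whether a square complex matrix is unitary: conj(U)^T U = I.
def is_unitary(U, tol=1e-12):
    U = np.asarray(U, dtype=complex)
    return np.allclose(U.conj().T @ U, np.eye(U.shape[0]), atol=tol)

B = np.array([[(1 + 1j) / np.sqrt(2), 0, 0],
              [0, 0, 1],
              [0, 1j, 0]])
print(is_unitary(B))                                    # True
print(is_unitary(np.array([[1, 1 + 2j],
                           [1, 1 + 2j]])))               # False
```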
Notice this is the second time we have seen the conjugate of the transpose of a matrix.
So, we make the following definition.

DEFINITION
Conjugate Transpose
Let A ∈ M_{m×n}(C). Then the conjugate transpose of A is
A^* = \bar{A}^T

EXAMPLE 9 If A = \begin{bmatrix} 1 & 0 & 1+i & 2i \\ 1-3i & 2 & 2+i & i \\ -i & 3i & 4-i & 1-5i \end{bmatrix}, then A^* = \begin{bmatrix} 1 & 1+3i & i \\ 0 & 2 & -3i \\ 1-i & 2-i & 4+i \\ -2i & -i & 1+5i \end{bmatrix}.

REMARK
Observe that if A is a real matrix, then A^* = A^T. Hence, the conjugate transpose is
the complex version of the transpose.

THEOREM 9 Let A, B ∈ M_{m×n}(C) and let α ∈ C. Then
(1) ⟨Az, w⟩ = ⟨z, A^*w⟩ for all z ∈ C^n and w ∈ C^m
(2) (A^*)^* = A
(3) (A + B)^* = A^* + B^*
(4) (αA)^* = \bar{α}A^*
(5) (AB)^* = B^*A^*
We end this section by proving Lemma 10.2.2.

LEMMA 10 If A is a symmetric matrix with real entries, then all of its eigenvalues are real.

Proof: Let λ be any eigenvalue of A with corresponding unit eigenvector z ∈ C^n.
Then, we have
λ = λ⟨z, z⟩ = ⟨λz, z⟩ = ⟨Az, z⟩ = Az \cdot \bar{z} = z \cdot A^T\bar{z} = z \cdot A\bar{z}
by Theorem 10.2.4. Since A has all real entries, we get
A\bar{z} = \bar{A}\bar{z} = \overline{Az} = \overline{λz} = \bar{λ}\bar{z}
Thus,
λ = z \cdot A\bar{z} = z \cdot \bar{λ}\bar{z} = \bar{λ}(z \cdot \bar{z}) = \bar{λ}⟨z, z⟩ = \bar{λ}
Consequently, λ is real.
Section 11.4 Problems

1. Let u = \begin{bmatrix} 1 \\ 2i \\ 1-i \end{bmatrix}, z = \begin{bmatrix} 2 \\ 2i \\ 1-i \end{bmatrix}, and w = \begin{bmatrix} 1+i \\ 1-2i \\ i \end{bmatrix} be vectors in C^3. Calculate the following.
   (a) ||u||   (b) ||w||   (c) ⟨u, z⟩   (d) ⟨z, u⟩
   (e) ⟨u, (2+i)z⟩   (f) ⟨z, w⟩   (g) ⟨w, z⟩   (h) ⟨u + z, 2iw - iz⟩

2. Determine which of the following matrices is unitary.
   (a) \begin{bmatrix} i/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & i/\sqrt{2} \end{bmatrix}
   (b) \begin{bmatrix} (1+i)/2 & (1-i)/\sqrt{6} & (1-i)/\sqrt{12} \\ 1/2 & 0 & 3i/\sqrt{12} \\ 1/2 & 2i/\sqrt{6} & i/\sqrt{12} \end{bmatrix}
   (c) \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}
   (d) \begin{bmatrix} 0 & i & 0 \\ i & 0 & 0 \\ 0 & 0 & (1+i)/2 \end{bmatrix}

3. Consider C^3 with its standard inner product. Let z = \begin{bmatrix} 1+i \\ 2-i \\ 1+i \end{bmatrix}, w = \begin{bmatrix} 1-i \\ 2-3i \\ 1 \end{bmatrix}.
   (a) Evaluate ⟨z, w⟩ and ⟨w, 2iz⟩.
   (b) Find a vector in Span{z, w} that is orthogonal to z.
   (c) Write the formula for the projection of u onto S = Span{z, w} and then find the projection.

4. Let A, B ∈ M_{m×n}(C).
   (a) Prove that (A^*)^* = A.
   (b) Prove that ⟨Az, w⟩ = ⟨z, A^*w⟩ for all z ∈ C^n and w ∈ C^m.

5. Let U ∈ M_{n×n}(C). Prove that if the columns of U form an orthonormal basis for C^n, then U^* = U^{-1}.

6. Let U be an n × n unitary matrix.
   (a) Show that ⟨Uz, Uw⟩ = ⟨z, w⟩ for any z, w ∈ C^n.
   (b) Suppose λ is an eigenvalue of U. Use part (a) to show that |λ| = 1.
   (c) Find a unitary matrix U which has an eigenvalue λ with λ ≠ 1.

7. Let V be a complex inner product space, with complex inner product ⟨ , ⟩. Prove that if ⟨u, v⟩ = 0, then ||u + v||^2 = ||u||^2 + ||v||^2. Is the converse true?
11.5 Unitary Diagonalization

In Section 10.2, we saw that every real symmetric matrix is orthogonally diagonalizable.
It is natural to ask if there is a comparable result in the case of matrices with
complex entries.

First, we observe that if A is a real symmetric matrix, then the condition A^T = A is
equivalent to A^* = A. Hence, this condition should be our equivalent of symmetric.

DEFINITION
Hermitian Matrix
A matrix A ∈ M_{n×n}(C) such that A^* = A is called Hermitian.
EXAMPLE 1 Which of the following matrices are Hermitian?
A = \begin{bmatrix} 2 & 3-i \\ 3+i & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 2i \\ 2i & 3-i \end{bmatrix}, \quad C = \begin{bmatrix} 0 & i & i \\ i & 0 & i \\ i & i & 0 \end{bmatrix}

Solution: We have A^* = \begin{bmatrix} 2 & 3-i \\ 3+i & 4 \end{bmatrix} = A, so A is Hermitian.
B^* = \begin{bmatrix} 1 & -2i \\ -2i & 3+i \end{bmatrix} ≠ B, so B is not Hermitian.
C^* = \begin{bmatrix} 0 & -i & -i \\ -i & 0 & -i \\ -i & -i & 0 \end{bmatrix} ≠ C, so C is not Hermitian.

Observe that if A is Hermitian then we have (A)_{ℓℓ} = \overline{(A)_{ℓℓ}}, so the diagonal entries of
A must be real, and for j ≠ ℓ the jℓ-th entry must be the complex conjugate of the
ℓj-th entry. Moreover, we see that every real symmetric matrix is Hermitian. Thus,
we expect that Hermitian matrices satisfy the same properties as symmetric matrices.
THEOREM 1 An n × n matrix A is Hermitian if and only if for all z, w ∈ C^n we have
⟨Az, w⟩ = ⟨z, Aw⟩

Proof: If A is Hermitian, then A^* = A and hence by Theorem 11.4.9 we get
⟨Az, w⟩ = ⟨z, A^*w⟩ = ⟨z, Aw⟩

If ⟨z, Aw⟩ = ⟨Az, w⟩ for all z, w ∈ C^n, then we have
z^T \bar{A}\bar{w} = z^T \overline{Aw} = ⟨z, Aw⟩ = ⟨Az, w⟩ = (Az)^T \bar{w} = z^T A^T \bar{w}
Since this is valid for all z, w ∈ C^n, we have that \bar{A} = A^T, and hence A = A^*. Thus A
is Hermitian.
THEOREM 2 Suppose A is an n × n Hermitian matrix. Then:
(1) all the eigenvalues of A are real.
(2) if λ_1 and λ_2 are distinct eigenvalues with corresponding eigenvectors v_1 and v_2
respectively, then v_1 and v_2 are orthogonal.

From this result we expect to get something very similar to the Principal Axis Theorem
for Hermitian matrices. But, instead of orthogonally diagonalizing, we now look
at the complex equivalent... unitarily diagonalizing.

DEFINITION
Unitarily Similar
If A and B are matrices such that U^*AU = B, where U is a unitary matrix, then we
say that A and B are unitarily similar.

If A and B are unitarily similar, then they are similar. Consequently, all of our properties
of similarity still apply.

DEFINITION
Unitarily Diagonalizable
An n × n matrix A is said to be unitarily diagonalizable if it is unitarily similar to a
diagonal matrix D.
EXAMPLE 2 Let A = \begin{bmatrix} -4 & 1-3i \\ 1+3i & 5 \end{bmatrix}. Verify that A is Hermitian and show that A is unitarily
diagonalizable.

Solution: We see that A^* = \begin{bmatrix} -4 & 1-3i \\ 1+3i & 5 \end{bmatrix} = A, so A is Hermitian. We have
C(λ) = \begin{vmatrix} -4-λ & 1-3i \\ 1+3i & 5-λ \end{vmatrix} = λ^2 - λ - 30 = (λ + 5)(λ - 6)

Hence, the eigenvalues are λ_1 = -5 and λ_2 = 6. We have
A + 5I = \begin{bmatrix} 1 & 1-3i \\ 1+3i & 10 \end{bmatrix} \sim \begin{bmatrix} 1 & 1-3i \\ 0 & 0 \end{bmatrix}

So a basis for E_{λ_1} is \left\{ \begin{bmatrix} -1+3i \\ 1 \end{bmatrix} \right\}. We also have
A - 6I = \begin{bmatrix} -10 & 1-3i \\ 1+3i & -1 \end{bmatrix} \sim \begin{bmatrix} 1 & -(1-3i)/10 \\ 0 & 0 \end{bmatrix}

So a basis for E_{λ_2} is \left\{ \begin{bmatrix} 1-3i \\ 10 \end{bmatrix} \right\}. Observe that
\left\langle \begin{bmatrix} -1+3i \\ 1 \end{bmatrix}, \begin{bmatrix} 1-3i \\ 10 \end{bmatrix} \right\rangle = (-1+3i)(1+3i) + 1(10) = 0

Thus, the eigenvectors are orthogonal. Normalizing them, we get that A is diagonalized
by the unitary matrix U = \begin{bmatrix} (-1+3i)/\sqrt{11} & (1-3i)/\sqrt{110} \\ 1/\sqrt{11} & 10/\sqrt{110} \end{bmatrix} to D = \begin{bmatrix} -5 & 0 \\ 0 & 6 \end{bmatrix}.
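The following NumPy sketch (not part of the original notes) repeats this computation. The routine numpy.linalg.eigh is designed for Hermitian matrices: it returns real eigenvalues and a unitary matrix of eigenvectors.

```python
import numpy as np

# Numerical check of Example 2 using a Hermitian eigensolver.
A = np.array([[-4, 1 - 3j],
              [1 + 3j, 5]], dtype=complex)
eigvals, U = np.linalg.eigh(A)
print(eigvals)                                              # [-5.  6.]
print(np.allclose(U.conj().T @ A @ U, np.diag(eigvals)))    # True
print(np.allclose(U.conj().T @ U, np.eye(2)))               # True, U is unitary
```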
We expect that every Hermitian matrix is unitarily diagonalizable. We will prove this
exactly the same way that we did in Section 10.2 for symmetric matrices: by first
proving that every square matrix is unitarily similar to an upper triangular matrix.
However, we now get the additional benefit of not having to restrict ourselves to real
eigenvalues as we did with the Triangularization Theorem.
THEOREM 3 (Schur's Theorem)
If A is an n × n matrix, then A is unitarily similar to an upper triangular matrix whose
diagonal entries are the eigenvalues of A.

Proof: We prove this by induction. If n = 1, then A is upper triangular. Assume
the result holds for all (n-1) × (n-1) matrices. Let λ_1 be an eigenvalue of A with
corresponding unit eigenvector z_1. Extend {z_1} to an orthonormal basis {z_1, . . . , z_n} of
C^n and let U_1 = \begin{bmatrix} z_1 & \cdots & z_n \end{bmatrix}. Then, U_1 is unitary and we have

U_1^* A U_1 = \begin{bmatrix} \bar{z}_1^T \\ \vdots \\ \bar{z}_n^T \end{bmatrix} A \begin{bmatrix} z_1 & \cdots & z_n \end{bmatrix} = \begin{bmatrix} \bar{z}_1^T λ_1 z_1 & \bar{z}_1^T A z_2 & \cdots & \bar{z}_1^T A z_n \\ \bar{z}_2^T λ_1 z_1 & \bar{z}_2^T A z_2 & \cdots & \bar{z}_2^T A z_n \\ \vdots & \vdots & & \vdots \\ \bar{z}_n^T λ_1 z_1 & \bar{z}_n^T A z_2 & \cdots & \bar{z}_n^T A z_n \end{bmatrix}

For 1 ≤ j ≤ n we have
\bar{z}_j^T λ_1 z_1 = λ_1 \bar{z}_j^T z_1 = λ_1 (z_1 \cdot \bar{z}_j) = λ_1 ⟨z_1, z_j⟩

Since {z_1, . . . , z_n} is orthonormal, we get that \bar{z}_1^T λ_1 z_1 = λ_1 and \bar{z}_j^T λ_1 z_1 = 0 for
2 ≤ j ≤ n. Hence, we can write
U_1^* A U_1 = \begin{bmatrix} λ_1 & \vec{b}^T \\ \vec{0} & A_1 \end{bmatrix}
where A_1 is an (n-1) × (n-1) matrix and \vec{b} ∈ C^{n-1}. Thus, by the inductive hypothesis
there exists a unitary matrix Q such that Q^* A_1 Q = T_1 is upper triangular.
Let U_2 = \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix}; then U_2 is clearly unitary and hence U = U_1 U_2 is unitary. Thus,

U^* A U = (U_1 U_2)^* A (U_1 U_2) = U_2^* U_1^* A U_1 U_2 = \begin{bmatrix} 1 & 0 \\ 0 & Q^* \end{bmatrix} \begin{bmatrix} λ_1 & \vec{b}^T \\ \vec{0} & A_1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix} = \begin{bmatrix} λ_1 & \vec{b}^T Q \\ \vec{0} & T_1 \end{bmatrix}

which is upper triangular. Since unitarily similar matrices have the same eigenvalues
and the eigenvalues of a triangular matrix are its diagonal entries, we get that the
eigenvalues of A are along the main diagonal of U^*AU, as required.
REMARK
Schur's Theorem shows that for every n × n matrix A there exists a unitary matrix U
such that U^*AU = T. Since U^* = U^{-1} we can solve this for A to get A = UTU^*.
This is called a Schur decomposition of A.
We can now prove the result corresponding to the Principal Axis Theorem.

THEOREM 4 (Spectral Theorem for Hermitian Matrices)
If A is Hermitian, then it is unitarily diagonalizable.

Proof: By Schur's Theorem, there exists a unitary matrix U and an upper triangular
matrix T such that U^*AU = T. Since A^* = A we get that
T^* = (U^*AU)^* = U^*A^*U = U^*AU = T
Since T is upper triangular, we get that T^* is lower triangular. Thus, T is both upper
and lower triangular and hence is diagonal. Consequently, U unitarily diagonalizes A.
In the real case, we also had that every orthogonally diagonalizable matrix was symmetric.
Unfortunately, the corresponding result is not true in the complex case as the
next example demonstrates.

EXAMPLE 3 Prove that A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} is unitarily diagonalizable, but not Hermitian.

Solution: Observe that A^* = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} ≠ A, so A is not Hermitian.

In Example 11.3.1, we showed that A is diagonalized by P = \begin{bmatrix} i & -i \\ 1 & 1 \end{bmatrix}. Observe that
\left\langle \begin{bmatrix} i \\ 1 \end{bmatrix}, \begin{bmatrix} -i \\ 1 \end{bmatrix} \right\rangle = (i)(i) + 1(1) = -1 + 1 = 0

so the columns of P are orthogonal. Hence, if we normalize them, we get that A is
unitarily diagonalized by U = \begin{bmatrix} \frac{i}{\sqrt{2}} & \frac{-i}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}.
The matrix in Example 3 satisfies A^* = -A and so is called skew-Hermitian. Thus,
we see that there are matrices which are not Hermitian but whose eigenvectors form
an orthonormal basis for C^n and hence are unitarily diagonalizable. In particular,
in the proof of the Spectral Theorem for Hermitian Matrices, we can observe that the
reverse direction fails because if A is not Hermitian, we cannot guarantee that T^* = T,
since some of the eigenvalues may not be real.

So, as we did in the real case, we should look for a necessary condition for a matrix
to be unitarily diagonalizable.
Assume that the eigenvectors of A form an orthonormal basis {z_1, . . . , z_n} for C^n. Then,
we know that U = \begin{bmatrix} z_1 & \cdots & z_n \end{bmatrix} unitarily diagonalizes A. That is, we have U^*AU = D,
where D is diagonal. Taking conjugate transposes gives D^* = U^*A^*U. Now, notice that DD^* = D^*D since D
and D^* are diagonal. This gives
(U^*AU)(U^*A^*U) = (U^*A^*U)(U^*AU) \quad \Rightarrow \quad U^*AA^*U = U^*A^*AU \quad \Rightarrow \quad AA^* = A^*A
Consequently, if A is unitarily diagonalizable, then we must have AA^* = A^*A.

DEFINITION
Normal Matrix
An n × n matrix A is called normal if AA^* = A^*A.
THEOREM 5 (Spectral Theorem for Normal Matrices)
A matrix A is unitarily diagonalizable if and only if A is normal.

Proof: We proved above that every unitarily diagonalizable matrix is normal. So,
we just need to prove that every normal matrix is unitarily diagonalizable. Of course,
we do this using Schur's Theorem.

By Schur's Theorem, there exists an upper triangular matrix T and a unitary matrix
U such that U^*AU = T. We just need to prove that T is in fact diagonal. Observe
that
TT^* = (U^*AU)(U^*A^*U) = U^*AA^*U = U^*A^*AU = (U^*A^*U)(U^*AU) = T^*T.
Hence T is also normal and if we compare the diagonal entries of TT^* and T^*T we
get
|t_{11}|^2 + |t_{12}|^2 + \cdots + |t_{1n}|^2 = |t_{11}|^2
|t_{22}|^2 + \cdots + |t_{2n}|^2 = |t_{12}|^2 + |t_{22}|^2
\vdots
|t_{nn}|^2 = |t_{1n}|^2 + |t_{2n}|^2 + \cdots + |t_{nn}|^2
Hence, we must have t_{jℓ} = 0 for all j ≠ ℓ, so T is diagonal as required.

As a result normal matrices are very important. We now look at some useful properties
of normal matrices.
THEOREM 6 If A is an n × n normal matrix, then
(1) ||Az|| = ||A^*z|| for all z ∈ C^n.
(2) A - λI is normal for every λ ∈ C.
(3) If Az = λz, then A^*z = \bar{λ}z.

Proof: For (1), observe that taking the conjugate of AA^* = A^*A gives \bar{A}A^T = A^T\bar{A}. Then
||Az||^2 = ⟨Az, Az⟩ = (Az)^T\overline{Az} = z^T A^T \bar{A}\bar{z} = z^T \bar{A} A^T \bar{z} = (A^*z)^T \overline{A^*z} = ⟨A^*z, A^*z⟩ = ||A^*z||^2
for any z ∈ C^n.

For (2) we observe that
(A - λI)(A - λI)^* = (A - λI)(A^* - \bar{λ}I) = AA^* - \bar{λ}A - λA^* + |λ|^2 I
= A^*A - \bar{λ}A - λA^* + |λ|^2 I = (A^* - \bar{λ}I)(A - λI) = (A - λI)^*(A - λI)

For (3) suppose that Az = λz for some z ∈ C^n and let B = A - λI. Then B is normal
by (2) and
Bz = (A - λI)z = Az - λz = \vec{0}.
So, by (1) we get
0 = ||Bz|| = ||B^*z|| = ||(A^* - \bar{λ}I)z|| = ||A^*z - \bar{λ}z||
Thus, A^*z = \bar{λ}z, as required.
Section 11.5 Problems

1. Unitarily diagonalize the following matrices.
   (a) A = \begin{bmatrix} a & b \\ b & a \end{bmatrix}   (b) B = \begin{bmatrix} 4i & 1+3i \\ -1+3i & i \end{bmatrix}   (c) C = \begin{bmatrix} 1 & 0 & 1+i \\ 0 & 2 & 0 \\ 1-i & 0 & 0 \end{bmatrix}

2. Prove that all the eigenvalues of a Hermitian matrix A are real.

3. Prove that all the eigenvalues of a skew-Hermitian matrix A are purely imaginary.

4. Let A ∈ M_{n×n}(C) satisfy A^* = iA.
   (a) Prove that A is normal.
   (b) Show that every eigenvalue λ of A must satisfy \bar{λ} = iλ.

5. Let A ∈ M_{n×n}(C) be normal and invertible. Prove that B = A^*A^{-1} is unitary.

6. Let A and B be Hermitian matrices. Prove that AB is Hermitian if and only if AB = BA.

7. Let A = \begin{bmatrix} 2i & -2+i \\ 1-i & 3 \end{bmatrix}.
   (a) Prove that λ_1 = 2 + i is an eigenvalue of A.
   (b) Find a unitary matrix U such that U^*AU = T is upper triangular.
11.6 Cayley-Hamilton Theorem

We now use Schur's theorem to prove an important result about the relationship of a
matrix with its characteristic polynomial.

THEOREM 1 (Cayley-Hamilton Theorem)
If A ∈ M_{n×n}(C), then A is a root of its characteristic polynomial C(λ).

Proof: By Schur's theorem, there exists a unitary matrix U and an upper triangular
matrix T such that U^*AU = T. Let
C(λ) = c_n λ^n + c_{n-1} λ^{n-1} + \cdots + c_1 λ + c_0 = (λ_1 - λ) \cdots (λ_n - λ)
For any n × n matrix X we define
C(X) = c_n X^n + c_{n-1} X^{n-1} + \cdots + c_1 X + c_0 I
Observe that
C(T) = C(U^*AU)
= c_n (U^*AU)^n + \cdots + c_1 (U^*AU) + c_0 I
= c_n U^*A^n U + \cdots + c_1 U^*AU + c_0 U^*U
= U^* c_n A^n U + \cdots + U^* c_1 AU + U^* c_0 U
= U^*(c_n A^n + \cdots + c_1 A + c_0 I)U
= U^* C(A) U
We can complete the proof by showing C(T) is the zero matrix.

Since the eigenvalues λ_1, . . . , λ_n of A are the diagonal entries of T we have
C(X) = (-1)^n (X - λ_1 I)(X - λ_2 I) \cdots (X - λ_n I)
so
C(T) = (-1)^n (T - λ_1 I)(T - λ_2 I) \cdots (T - λ_n I).   (11.1)

Observe that the first column of T - λ_1 I is zero since the first column of T is \begin{bmatrix} λ_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.
Then, since the second column of T - λ_2 I has the form \begin{bmatrix} b \\ 0 \\ \vdots \\ 0 \end{bmatrix}, we get that (T - λ_1 I)(T - λ_2 I) has
its first two columns zero, because each of these columns is a multiple of the first
column of T - λ_1 I, which is zero.
Continuing in this way, we see that all the columns of the product in Equation (11.1) are zero, as required.
EXAMPLE 1 Show that A = \begin{bmatrix} a & b \\ 0 & c \end{bmatrix} is a root of its characteristic polynomial.

Solution: Since A is upper triangular, we have C(λ) = (a - λ)(c - λ) = λ^2 - (a+c)λ + ac.
Then,
C(A) = A^2 - (a+c)A + acI = \begin{bmatrix} a^2 & ab+bc \\ 0 & c^2 \end{bmatrix} - \begin{bmatrix} a^2+ac & ab+cb \\ 0 & ac+c^2 \end{bmatrix} + \begin{bmatrix} ac & 0 \\ 0 & ac \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}

EXERCISE 1 Show that A = \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{bmatrix} is a root of its characteristic polynomial in two ways. First,
show it directly by using the method of Example 1. Second, show it by evaluating
C(A) = (A - aI)(A - dI)(A - fI)
by multiplying from left to right as in the proof.
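The theorem is also easy to check numerically. The following NumPy sketch (not part of the original notes) evaluates the characteristic polynomial at a random complex matrix; np.poly returns the coefficients of det(λI - A), which differs from C(λ) only by the sign (-1)^n, so it vanishes at A exactly when C(A) does.

```python
import numpy as np

# Numerical check of the Cayley-Hamilton Theorem for a random complex matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

coeffs = np.poly(A)            # coefficients of det(lambda*I - A), highest power first
p_A = np.zeros_like(A)
for c in coeffs:               # Horner's method, evaluated at the matrix A
    p_A = p_A @ A + c * np.eye(4)

print(np.allclose(p_A, np.zeros((4, 4))))   # True (up to round-off)
```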