
Matrix Computations

Marko Huhtanen
1 Introduction
Matrix computations lie at the center of numerical analysis, requiring knowledge of several mathematical techniques and at least a rudimentary understanding of programming. In the early era, during the 1950s and 1960s, the field could be described as comprising certain fundamental matrix factorizations and how to compute them reliably. (Computer architectures were then sequential, whereas parallel computing has since become the dominant paradigm.) These included the LU factorization and the singular value decomposition (SVD), as well as algorithms for solving the eigenvalue problem. The complexity of these algorithms is $O(n^3)$, while the storage requirement is $O(n^2)$.
Having these algorithms in an acceptable (early) form did not mean that numerical linear algebra problems were wiped away. Typical applications involve partial differential equations, and have done so since the very early era of computing in the 1940s. Once discretized, the matrices approximating the corresponding linear operators are sparse. Roughly, this means that only $O(n)$ of the entries are nonzero. (The reason: differentiation operates locally on functions at a point.) If the matrix is stored by taking this into account, i.e., zeros are not stored, the storage requirement is just $O(n)$ as opposed to $O(n^2)$. Consequently, very large matrices could be generated to approximate the original problem; in fact, so large that the existing matrix computational techniques could not be used at all to solve the corresponding linear algebra problems. This was (and still is) certainly irritating: everything has been carefully set up and it just remains to do the matrix computations. Except that it is not going to happen. The solution turns out to be out of reach because of its severe computational complexity. Something that was originally considered a trivial linear algebra problem has turned into its exact opposite, i.e., an exceptionally tough problem. So either you scale down your ambitions and accept coarser approximations (something you do not want to do), or you try to solve the linear algebra problems somehow (but not at any cost, since you cannot afford it).
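A minimal NumPy sketch of the storage argument, assuming (for illustration only) the one-dimensional second-difference matrix $\operatorname{tridiag}(-1, 2, -1)$ as the discretized operator; the function names are hypothetical. Only the three diagonals are stored, and a matrix-vector product costs $O(n)$ operations:

```python
import numpy as np


def laplacian_1d_diagonals(n):
    """Diagonals of the n-by-n matrix tridiag(-1, 2, -1): 3n - 2 stored numbers."""
    main = 2.0 * np.ones(n)
    lower = -1.0 * np.ones(n - 1)   # subdiagonal entries T[i, i-1]
    upper = -1.0 * np.ones(n - 1)   # superdiagonal entries T[i, i+1]
    return lower, main, upper


def tridiag_matvec(lower, main, upper, x):
    """y = T x in O(n) operations, without ever forming T."""
    y = main * x
    y[1:] += lower * x[:-1]
    y[:-1] += upper * x[1:]
    return y


n = 6
lower, main, upper = laplacian_1d_diagonals(n)
x = np.arange(1.0, n + 1)
y = tridiag_matvec(lower, main, upper, x)

# Dense reference with O(n^2) storage, feasible only for small n.
T = np.diag(main) + np.diag(lower, -1) + np.diag(upper, 1)
print(y)                        # [0. 0. 0. 0. 0. 7.]
print(np.allclose(y, T @ x))    # True
```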
From the 1970s on, iterative methods started seriously gaining ground for solving very large linear algebra problems without the $O(n^3)$ complexity and the $O(n^2)$ storage requirement. Analogous developments are still going on in every area of numerical analysis (or scientific computing, or computational science, or whatever you want to call it). As a rule, in executing iterative methods, the matrix (or matrices) related to the problem cannot be manipulated freely. For example, it may be that matrix-vector products with the matrix are the only information available, although you certainly want to avoid such an extreme. Mathematically, however, this means that the underlying assumptions are getting closer to those usually made in, let us say, operator theory. To get an idea of what this could imply, as opposed to classical matrix analysis: even assuming access to the Hermitian transpose may be unrealistic.
A reason for writing these lecture notes is the hope of being able to combine classical matrix analytic techniques with those mathematical ideas that are useful in studying iterative methods. Occasionally this means that the viewpoint is more abstract and functional analytic. The abstractness is, however, not an aim. This is underscored by the fact that everything that is done takes place in $\mathbb{C}^{n\times n}$.¹
It is assumed that the students have learnt undergraduate linear algebra, know things such as the Gram-Schmidt orthogonalization process and Gaussian elimination, and know the basics of eigenvalues and eigenvectors, such as how to solve tiny (2-by-2 or 3-by-3) eigenvalue problems by hand. One should also be familiar with the standard Euclidean geometry of $\mathbb{C}^n$ originating from the inner product

$$(x, y) = y^*x = \sum_{j=1}^{n} x_j\overline{y_j}$$

of vectors $x, y \in \mathbb{C}^n$. (This is, of course, needed in the Gram-Schmidt orthogonalization.) The Hermitian transpose of a matrix $A$ is the unique matrix $A^*$ satisfying $(Ax, y) = (x, A^*y)$ for all vectors $x$ and $y$.
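The convention $(x, y) = y^*x$ conjugates the second argument, which is easy to get backwards in code. A small NumPy check (the helper `inner` is hypothetical; `np.vdot` conjugates its first argument):

```python
import numpy as np


def inner(x, y):
    """(x, y) = y^* x = sum_j x_j conj(y_j).
    np.vdot(a, b) conjugates its FIRST argument, hence the swapped order."""
    return np.vdot(y, x)


rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

A_star = A.conj().T             # Hermitian transpose A^*
print(np.isclose(inner(A @ x, y), inner(x, A_star @ y)))   # True
```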


2 Product of matrix subspaces
Factoring matrices is the way to solve small linear algebra problems. (Small is relative. It typically means that you can use a PC to finish the computations in reasonable time.) The best-known case is that of solving a linear system

$$Ax = b \tag{1}$$

with a nonsingular $A \in \mathbb{C}^{n\times n}$ and a vector $b \in \mathbb{C}^n$ given. The solution can be obtained by Gaussian elimination, i.e., by computing the LU factorization of $A$: $A$ is represented as (actually replaced with) the product

$$A = LU, \tag{2}$$

where $L$ is a lower triangular and $U$ an upper triangular matrix.² As for the computational complexity, computing the LU factorization requires $O(n^3)$ floating point operations.

¹ Here it is appropriate to quote P. Halmos: "Matrix theory is operator theory in the most important and the most translucent special case."
In view of large scale problems, it is beneficial to approach factoring matrices more generally. One reason is that approximate factoring is then more realistic than exact factoring. (Also in small computational problems the equality (2) holds only approximately, because of finite precision.)
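The notion of a matrix subspace is a reasonably flexible structure to this end. That is, $\mathcal{V} \subset \mathbb{C}^{n\times n}$ is a matrix subspace of $\mathbb{C}^{n\times n}$ over $\mathbb{C}$ (or $\mathbb{R}$) if

$$\alpha V_1 + \beta V_2 \in \mathcal{V}$$

holds whenever $\alpha, \beta \in \mathbb{C}$ (or $\mathbb{R}$) and $V_1, V_2 \in \mathcal{V}$. Any subalgebra of $\mathbb{C}^{n\times n}$ is clearly also a subspace of $\mathbb{C}^{n\times n}$. The lower and the upper triangular matrices form subalgebras of $\mathbb{C}^{n\times n}$.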
Example 1 The set of complex symmetric matrices consists of matrices $M \in \mathbb{C}^{n\times n}$ satisfying $M^T = M$. It is a matrix subspace of $\mathbb{C}^{n\times n}$ of dimension $n + (n-1) + \cdots + 1 = (n+1)n/2$.

Example 2 The set of Hermitian matrices consists of matrices $M \in \mathbb{C}^{n\times n}$ satisfying $M^* = M$. It is a matrix subspace of $\mathbb{C}^{n\times n}$ over $\mathbb{R}$ of dimension $n + 2((n-1) + \cdots + 1) = n^2$.

Example 3 The set of Toeplitz matrices consists of matrices $M \in \mathbb{C}^{n\times n}$ having constant diagonals. It is a matrix subspace of $\mathbb{C}^{n\times n}$ of dimension $2n - 1$.
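The dimensions in Examples 1-3 can be checked numerically by identifying $\mathbb{C}^{n\times n}$ with $\mathbb{R}^{2n^2}$ and computing ranks of spanning sets. A sketch assuming NumPy; `real_dimension` and `unit` are hypothetical helpers:

```python
import numpy as np
from itertools import product


def real_dimension(spanning_set):
    """Dimension over R of the real span of the given complex matrices,
    computed as a rank after splitting into real and imaginary parts."""
    rows = [np.concatenate([M.real.ravel(), M.imag.ravel()]) for M in spanning_set]
    return np.linalg.matrix_rank(np.array(rows))


def unit(n, j, k):
    E = np.zeros((n, n), dtype=complex)
    E[j, k] = 1.0
    return E


n = 5
units = [unit(n, j, k) for j, k in product(range(n), repeat=2)]

# Complex symmetric (a subspace over C): real dimension of the complex span, halved.
sym = [E + E.T for E in units]
sym_complex_dim = real_dimension(sym + [1j * S for S in sym]) // 2

# Hermitian (a subspace over R only): spanned by E + E^* and i(E - E^*).
herm = [E + E.conj().T for E in units] + [1j * (E - E.conj().T) for E in units]
herm_real_dim = real_dimension(herm)

# Toeplitz: one basis matrix per diagonal, offsets -(n-1), ..., n-1.
toep = [np.eye(n, k=d, dtype=complex) for d in range(-(n - 1), n)]
toep_complex_dim = real_dimension(toep + [1j * T for T in toep]) // 2

print(sym_complex_dim, herm_real_dim, toep_complex_dim)   # 15 25 9
```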
The structure you have in the LU factorization is the following.
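Definition 1 Assume $\mathcal{V}_1$ and $\mathcal{V}_2$ are matrix subspaces of $\mathbb{C}^{n\times n}$ over $\mathbb{C}$ (or $\mathbb{R}$). Their set of products is defined as

$$\mathcal{V}_1\mathcal{V}_2 = \{V_1V_2 : V_1 \in \mathcal{V}_1 \text{ and } V_2 \in \mathcal{V}_2\}.$$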
With two matrix subspaces $\mathcal{V}_1$ and $\mathcal{V}_2$, basic questions arising in practice are the following. For a given $A \in \mathbb{C}^{n\times n}$, does $A \in \mathcal{V}_1\mathcal{V}_2$ hold, i.e., do we have $A = V_1V_2$? If it does, how can such a factorization be computed? And what is the computational complexity? If we do not care about an exact factorization, how can $A$ be approximated with an element of the set of products to have $A \approx V_1V_2$?

² Later on, we will see that partial pivoting is needed for a numerically reliable computation of $PA = LU$, where $P$ is a permutation.
For the LU factorization the answers are fortunately known.
Problem 1 Show that a nonsingular $A \in \mathbb{C}^{n\times n}$ has an LU factorization if and only if it is strongly nonsingular.³

Problem 2 Show that either $A \in \mathbb{C}^{n\times n}$ has an LU factorization or there is a matrix arbitrarily close to $A$ which has one. (In other words, the closure of the set of products of lower and upper triangular matrices is $\mathbb{C}^{n\times n}$.)
The singular value decomposition is related to the problem of approximating a given matrix $A \in \mathbb{C}^{n\times n}$ with matrices of rank at most $k$. Such matrices can be expressed as a set of products of matrix subspaces. (For simplicity, we only consider the square matrix case.)

Definition 2 The matrices of rank at most $k$ in $\mathbb{C}^{n\times n}$ form the set of products $\mathcal{V}_1\mathcal{V}_2$, where $\mathcal{V}_1$ (resp. $\mathcal{V}_2$) is the subspace of $\mathbb{C}^{n\times n}$ consisting of matrices whose last $n-k$ columns (resp. rows) are zero.

Matrices of rank at most $k$ in $\mathbb{C}^{n\times n}$ are denoted by $\mathcal{F}_k$.
Often one uses the expansion

$$V_1V_2 = [u_1\ u_2\ \cdots\ u_k\ 0\ \cdots\ 0]\,[v_1\ v_2\ \cdots\ v_k\ 0\ \cdots\ 0]^* = \sum_{j=1}^{k} u_jv_j^* \tag{3}$$

with $u_j, v_j \in \mathbb{C}^n$ for $j = 1, \ldots, k$, to represent matrices from $\mathcal{F}_k$. This shows that for $k < n$ the elements of $\mathcal{F}_k$ are singular (and do not constitute a subspace). Namely, take a nonzero $x \perp \operatorname{span}\{v_1, \ldots, v_k\}$⁴ to have $x \in N(V_1V_2)$.
Observe that, by using (3), an element of $\mathcal{F}_k$ requires storing only $2nk$ complex numbers. In practice, the rank-one matrices in the sum are not formed explicitly unless necessary. Certain operations, such as computing matrix-vector products, are still possible (and less expensive) by using (3).
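A sketch of a matrix-vector product with an element of $\mathcal{F}_k$ through (3), assuming NumPy and storing the vectors $u_j$ and $v_j$ as the columns of two $n$-by-$k$ arrays (array names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 300, 5
# Columns u_1..u_k and v_1..v_k of (3): 2nk stored numbers.
Uk = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
Vk = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# (sum_j u_j v_j^*) x = sum_j u_j (v_j^* x): k inner products, then k vector
# updates, i.e. O(nk) operations. The n-by-n matrix is never formed.
y_lowrank = Uk @ (Vk.conj().T @ x)

# Dense check only: forms the n-by-n matrix explicitly.
F = Uk @ Vk.conj().T
print(np.allclose(y_lowrank, F @ x))   # True
```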
³ $A$ is strongly nonsingular if all its leading principal minors are nonzero. A leading principal minor is the determinant of a $k$-by-$k$ submatrix cut from the upper left corner of $A$.

⁴ Find an element in the null space of the matrix whose rows are $v_1^*, \ldots, v_k^*$.
Example 4 Matrices of the form $I + V_1V_2$ are important as well, where $I \in \mathbb{C}^{n\times n}$ denotes the identity matrix and $V_1V_2 \in \mathcal{F}_k$. Historically they are related to integral equations. More importantly, they can be inverted quickly (whenever invertible), since their structure is preserved in inversion. In the rank-one case, try this out by finding the scalar $\alpha \in \mathbb{C}$ that solves the equation

$$(I + u_1v_1^*)(I + \alpha u_1v_1^*) = I.$$

The cost is about one inner product. This is an example of a matrix which is very easy to invert; never use standard methods such as Gaussian elimination with matrices of this form.
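If you want to check your answer numerically (spoiler: $\alpha = -1/(1 + v_1^*u_1)$, valid whenever $1 + v_1^*u_1 \ne 0$; this is the rank-one case of the Sherman-Morrison formula), the following NumPy sketch solves $(I + uv^*)x = b$ with two inner products and a vector update. The function name and the singularity threshold are hypothetical:

```python
import numpy as np


def solve_identity_plus_rank_one(u, v, b):
    """Solve (I + u v^*) x = b via (I + u v^*)^{-1} = I + alpha u v^*,
    alpha = -1 / (1 + v^* u). Requires 1 + v^* u != 0."""
    denom = 1.0 + np.vdot(v, u)               # 1 + v^* u: one inner product
    if abs(denom) < 1e-14:
        raise ValueError("I + u v^* is (numerically) singular")
    alpha = -1.0 / denom
    return b + alpha * u * np.vdot(v, b)      # b + alpha u (v^* b)


rng = np.random.default_rng(16)
n = 6
u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)

x = solve_identity_plus_rank_one(u, v, b)
M = np.eye(n) + np.outer(u, v.conj())         # I + u v^*, formed only to check
print(np.allclose(M @ x, b))                  # True
```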
3 The singular value decomposition
The singular value decomposition (SVD) has many applications. (The approach is completely analogous if $A$ is rectangular rather than square.) From the viewpoint of data compression, $n^2$ complex numbers must be kept in memory when $A \in \mathbb{C}^{n\times n}$ is stored. There are ways to approximate matrices with fewer parameters, i.e., to compress $A$. The singular value decomposition is related to the approximation problem

$$\min_{F_k \in \mathcal{F}_k} \|A - F_k\|, \tag{4}$$

where the norm of a matrix will be defined below.⁵ Bear in mind that an element of $\mathcal{F}_k$ requires storing just $2nk$ complex numbers. Consequently, if the value of (4) is small for a small $k$, $A$ can be well compressed with the help of the singular value decomposition.
Observe that in the case of a compression where the value of (4) is not zero, we are satisfied with an approximate factorization

$$A \approx V_1V_2 \tag{5}$$

of $A$. (One should keep in mind that, in fact, any numerically computed factorization is approximate.)
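Let us finally define the singular value decomposition.

Definition 3 The singular value decomposition of $A \in \mathbb{C}^{n\times n}$ is a factorization $A = U\Sigma V^*$ with $U, V \in \mathbb{C}^{n\times n}$ unitary and a diagonal $\Sigma \in \mathbb{C}^{n\times n}$ with the diagonal entries satisfying

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0.$$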
⁵ This approximation problem can be formulated in Banach spaces, to measure compactness. It is only in the Euclidean setting of $\mathbb{C}^n$ that the SVD happens to solve the problem.
If $\sigma_k > 0$ and $\sigma_{k+1} = 0$, then the so-called compressed SVD is $A = \hat{U}\hat{\Sigma}\hat{V}^*$ with $\hat{U}$ and $\hat{V}$ of size $n$-by-$k$ consisting of the first $k$ columns of $U$ and $V$. The diagonal matrix $\hat{\Sigma}$ is the $k$-by-$k$ block from the upper left corner of $\Sigma$. Then the columns of $\hat{U}$ yield an orthonormal basis of the column space of $A$. The dropped $n-k$ columns of $V$ yield an orthonormal basis of the null space of $A$.
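The compressed SVD can be read off from `numpy.linalg.svd`, which returns $U$, the singular values in nonincreasing order, and $V^*$ (not $V$). A sketch on a hypothetical rank-3 matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 3
# Hypothetical test matrix of rank k, built from random factors.
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))

U, s, Vh = np.linalg.svd(A)      # A = U @ np.diag(s) @ Vh, where Vh = V^*
V = Vh.conj().T

# Compressed SVD: keep the first k singular triplets.
U_hat, S_hat, V_hat = U[:, :k], np.diag(s[:k]), V[:, :k]
A_compressed = U_hat @ S_hat @ V_hat.conj().T

null_basis = V[:, k:]            # the dropped columns: orthonormal basis of N(A)
print(np.allclose(A_compressed, A))   # True
```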
Recall that a matrix $Q \in \mathbb{C}^{n\times n}$ is unitary if its columns are orthonormal, i.e.,

$$Q^*Q = I \tag{6}$$

holds. Unitary matrices are extremely important for computations. The reason is that they do not magnify errors, since unitary matrices are norm preserving, i.e., (6) is equivalent to having

$$\|Qx\| = \|x\| \tag{7}$$

for every $x \in \mathbb{C}^n$.
Unitary matrices can be generated easily by invoking the Gram-Schmidt process.

Problem 3 Assume $A \in \mathbb{C}^{n\times n}$ is nonsingular. Show that orthonormalizing its columns with the Gram-Schmidt process starting from the leftmost column is equivalent to computing a representation $A = QR$ with $Q$ unitary and $R$ upper triangular with positive diagonal entries. (This is called the QR factorization of $A$.)
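For experimenting with Problem 3, here is a sketch of the process in NumPy (classical Gram-Schmidt; the function name is hypothetical). It mirrors the textbook process and is not meant as a numerically robust QR code:

```python
import numpy as np


def gram_schmidt_qr(A):
    """Classical Gram-Schmidt on the columns of a nonsingular A, left to right.
    Returns Q with orthonormal columns and upper triangular R with positive
    diagonal such that A = Q R."""
    A = np.asarray(A, dtype=complex)
    n = A.shape[1]
    Q = np.zeros_like(A)
    R = np.zeros((n, n), dtype=complex)
    for j in range(n):
        w = A[:, j].copy()
        for i in range(j):
            R[i, j] = np.vdot(Q[:, i], A[:, j])   # (a_j, q_i) = q_i^* a_j
            w -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(w)               # positive for nonsingular A
        Q[:, j] = w / R[j, j]
    return Q, R


rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A))   # True
```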
Recall that unitary matrices also appear in connection with the Hermitian eigenvalue problem.

Problem 4 Assume $A \in \mathbb{C}^{n\times n}$ is Hermitian, i.e., $A^* = A$. Show that $A$ can be unitarily diagonalized, i.e.,

$$A = U\Lambda U^* \tag{8}$$

with a unitary $U \in \mathbb{C}^{n\times n}$ and a diagonal matrix $\Lambda \in \mathbb{C}^{n\times n}$. (Hint: show first that eigenvectors related to differing eigenvalues are orthogonal.)
The existence of a singular value decomposition is based on inspecting the eigenvalue problem for $A^*A$. Since $A^*A$ is Hermitian, it can be unitarily diagonalized as

$$A^*A = V\Lambda V^*.$$

Assume the eigenvalues $\lambda_1, \ldots, \lambda_n$ are ordered nonincreasingly. (Note that $A^*A$ is positive semidefinite, i.e., its eigenvalues are nonnegative.) Take any two eigenvectors $v_j$ and $v_l$, $j \ne l$, of $A^*A$ among the (orthonormal) columns of $V$. Then $Av_j$ and $Av_l$ are orthogonal by the fact that

$$(Av_j, Av_l) = (v_j, A^*Av_l) = (v_j, \lambda_l v_l) = 0. \tag{9}$$
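Consequently, take the columns of $U$ to be

$$u_j = \frac{Av_j}{\sqrt{\lambda_j}} = \frac{Av_j}{\sigma_j} \tag{10}$$

for the nonzero eigenvalues $\lambda_j$ of $A^*A$, where $\sigma_j = \sqrt{\lambda_j}$. These are unit vectors, since the computation (9) with $l = j$ gives $\|Av_j\|^2 = \lambda_j$. For the same reason, the eigenvectors corresponding to zero eigenvalues are in the null space of $A$. Corresponding to these, take any set of orthonormal vectors in the orthogonal complement of the vectors (10) to complete $U$. For them, the corresponding singular values are zero.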
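The existence argument translates directly into code. The following NumPy sketch assumes $A$ is nonsingular (so that every $\lambda_j > 0$) and is meant only to mirror (9)-(10), not to be a reliable SVD algorithm; the function name is hypothetical:

```python
import numpy as np


def svd_via_gram(A):
    """SVD of a nonsingular square A following (9)-(10): diagonalize A^*A,
    order the eigenvalues nonincreasingly, set sigma_j = sqrt(lambda_j) and
    u_j = A v_j / sigma_j. Illustration only."""
    lam, V = np.linalg.eigh(A.conj().T @ A)    # ascending eigenvalues
    order = np.argsort(lam)[::-1]              # nonincreasing order
    lam, V = lam[order], V[:, order]
    sigma = np.sqrt(np.clip(lam, 0.0, None))   # clip tiny negative rounding errors
    U = (A @ V) / sigma                        # divides column j by sigma_j
    return U, sigma, V


rng = np.random.default_rng(17)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, sigma, V = svd_via_gram(A)
print(np.allclose(U @ np.diag(sigma) @ V.conj().T, A))   # True
```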
The (operator) norm of a matrix $A \in \mathbb{C}^{n\times n}$ is defined as

$$\|A\| = \max_{\|x\| = 1} \|Ax\|. \tag{11}$$

It expresses how much $A$ can at most stretch unit vectors.⁶ Because of (7) we have

$$\|A\| = \|\Sigma\|. \tag{12}$$

For diagonal matrices the norm is easy to compute, and we obtain $\|A\| = \sigma_1$. So the first singular value of $A$ is unique.
Problem 5 Show that for a diagonal matrix the norm is the maximal absolute value of its diagonal entries. (Observe that you can restrict the computations to real numbers.)

Repeating these arguments yields the fact that the singular values of a matrix are uniquely determined.

Problem 6 Show that the singular values of a matrix $A \in \mathbb{C}^{n\times n}$ are unique.

Without resorting to (12) it would be very challenging to compute the norm of $A$. In particular, if you use a norm other than the Euclidean norm in $\mathbb{C}^n$, then you face this challenge. (And numerous other challenges. So the reason for not using the Euclidean norm should be exceptionally good.)
How fast the singular values decay is directly related to how well $A$ is approximated in (4).

Theorem 1 Let $A \in \mathbb{C}^{n\times n}$. Then the value of the minimization problem (4) is $\sigma_{k+1}$.
Proof. The value of (4) is at most $\sigma_{k+1}$. This is seen by using the singular value decomposition of $A$ and forming $F_k$ by setting $\sigma_{k+1} = \cdots = \sigma_n = 0$. (This corresponds to replacing the last $n-k$ columns of $U$ and rows of $V^*$ with zeros.)

To see that $\sigma_{k+1}$ is actually the minimum, consider any $V_1V_2 \in \mathcal{F}_k$. Let $w_1^*, \ldots, w_k^*$ be the first $k$ rows of $V_2$ (the remaining rows are zero). Then choose a unit vector $v$ which is a linear combination of $v_1, \ldots, v_{k+1}$ and lies in the orthogonal complement of $w_1, \ldots, w_k$. (For this, find a nonzero solution to a homogeneous linear system of $k$ equations in $k+1$ unknowns.) Then $V_2v = 0$, so $(A - V_1V_2)v = Av$. Writing $v = \sum_{j=1}^{k+1} c_jv_j$ with $\sum_{j=1}^{k+1}|c_j|^2 = 1$, we get $\|Av\|^2 = \sum_{j=1}^{k+1}|c_j|^2\sigma_j^2 \ge \sigma_{k+1}^2$, so the norm of $(A - V_1V_2)v$ is at least $\sigma_{k+1}$. End of proof.

⁶ A unit vector is a vector of length one.
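Theorem 1 is easy to test numerically. A NumPy sketch (function name hypothetical) truncating the SVD and comparing the error with $\sigma_{k+1}$; the Frobenius-norm error, $(\sigma_{k+1}^2 + \cdots + \sigma_n^2)^{1/2}$, is shown as well:

```python
import numpy as np


def truncated_svd(A, k):
    """An element of F_k attaining (4): sigma_{k+1} = ... = sigma_n set to zero."""
    U, s, Vh = np.linalg.svd(A)
    return (U[:, :k] * s[:k]) @ Vh[:k, :], s


rng = np.random.default_rng(4)
A = rng.standard_normal((8, 8))
k = 3
A_k, s = truncated_svd(A, k)

err_2 = np.linalg.norm(A - A_k, 2)          # operator norm
err_F = np.linalg.norm(A - A_k, "fro")      # Frobenius norm
print(np.isclose(err_2, s[k]),                         # sigma_{k+1} (0-based index k)
      np.isclose(err_F, np.sqrt(np.sum(s[k:] ** 2))))   # True True
```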
The solution constructed from the singular value decomposition actually solves the minimization problem (4) in any unitarily invariant norm. A norm $|||\cdot|||$ on $\mathbb{C}^{n\times n}$ is said to be unitarily invariant if

$$|||A||| = |||Q_1AQ_2|||$$

holds for any $A \in \mathbb{C}^{n\times n}$ and any unitary $Q_1, Q_2 \in \mathbb{C}^{n\times n}$. Aside from the operator norm, the Frobenius norm $\|\cdot\|_F$ is often used in practice, due to its computational convenience.
The Frobenius norm is induced by the inner product

$$(A, B) = \operatorname{trace} B^*A \tag{13}$$

on $\mathbb{C}^{n\times n}$. When dealing with matrix subspaces over $\mathbb{R}$, use

$$(A, B) = \operatorname{Re} \operatorname{trace} B^*A. \tag{14}$$

The trace is computed by summing the diagonal entries of the matrix. This simply means that $\mathbb{C}^{n\times n}$ is treated as $\mathbb{C}^{n^2}$ using the standard inner product. Therefore

$$\|A\|_F = \sqrt{\sum_{j=1}^{n}\sum_{k=1}^{n}|a_{jk}|^2}.$$

This is clearly an easy computation, whereas computing $\|A\|$ is more involved.
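A short NumPy check of (13) and of the two norms (helper name hypothetical): the Frobenius norm needs only the entries, while `np.linalg.norm(A, 2)` computes the operator norm through the singular values:

```python
import numpy as np


def frob_inner(A, B):
    """(A, B) = trace B^* A, the inner product (13)."""
    return np.trace(B.conj().T @ A)


rng = np.random.default_rng(5)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

# (13) is the standard inner product of C^{n^2}: vec(B)^* vec(A).
print(np.isclose(frob_inner(A, B), np.vdot(B.ravel(), A.ravel())))   # True

fro = np.sqrt(np.sum(np.abs(A) ** 2))           # O(n^2) operations
op = np.linalg.norm(A, 2)                       # computed via the singular values
print(np.isclose(fro, np.linalg.norm(A, "fro")), op <= fro)   # True True
```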
Numerically reliable algorithms for computing the SVD do not proceed the way we showed the existence of the SVD above. The problem with the existence proof is the formation of $A^*A$, in which numbers of very different magnitudes can appear. Numerically this may cause problems, and in those situations one should never accept the numerical results blindly. (Roughly, for any nontrivial numerical problem, there is no numerical method which is 100 percent reliable. A numerical method is good if the measure of those problems where it fails is negligible.) Fortunately, all the respectable publicly available or commercial codes for computing the SVD can be regarded as reliable.⁷

⁷ Such as LAPACK or Matlab. For numerical libraries, see, e.g., Wikipedia.
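To see the effect of forming $A^*A$, the following NumPy sketch builds a real matrix with hypothetical singular values $1$, $10^{-6}$, $10^{-12}$ and compares the smallest singular value returned by `np.linalg.svd` with the one obtained from the eigenvalues of $A^TA$. Since the eigenvalue $10^{-24}$ lies far below the rounding level of the entries of $A^TA$, the second approach is expected to lose it entirely:

```python
import numpy as np

rng = np.random.default_rng(6)
Q1, _ = np.linalg.qr(rng.standard_normal((3, 3)))
Q2, _ = np.linalg.qr(rng.standard_normal((3, 3)))
sigma_true = np.array([1.0, 1e-6, 1e-12])        # hypothetical singular values
A = Q1 @ np.diag(sigma_true) @ Q2.T

s_svd = np.linalg.svd(A, compute_uv=False)        # LAPACK-based SVD
lam = np.linalg.eigvalsh(A.T @ A)[::-1]           # eigenvalues of A^T A, nonincreasing
s_gram = np.sqrt(np.clip(lam, 0.0, None))         # rounding may make lam < 0

rel_err_svd = abs(s_svd[-1] - 1e-12) / 1e-12
rel_err_gram = abs(s_gram[-1] - 1e-12) / 1e-12
print(f"relative error via SVD: {rel_err_svd:.1e}, via A^T A: {rel_err_gram:.1e}")
```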
Problem 7 In a numerically reliable computation of the SVD, the so-called Householder transformations are used. They are matrices of the form $I + uv^*$ with $u, v \in \mathbb{C}^n$ which are additionally unitary. Using Example 4 for $\alpha$, find explicit conditions on $u$ and $v$ for having a Householder transformation, i.e., a unitary $I + uv^*$.
4 Nonsingular and invertible matrix subspaces
The singular value decomposition can be viewed as a method for solving the approximate factoring problem (5). Other (approximate) factorization problems require completely different techniques. In this section we identify a family of matrix subspaces which allows solving the task.

It is noteworthy that $\mathcal{F}_k$ possesses a very special structure, since both factors involve singular matrices. It is not that we absolutely want to use singular approximations in practice. Often it is quite the opposite, since it is easy to see that an arbitrary $A \in \mathbb{C}^{n\times n}$ is invertible with probability one. Therefore the SVD actually provides a very peculiar way to approximate matrices in general.
Example 5 A way to circumvent the described problems with the SVD is
to consider a more general approximation in terms of summing. This can be
done in many ways. For a classical example, principal component analysis
is an approach in statistics to analyze data. The process is simply an SVD
approximation after the data has been mean centered, i.e., translated so that
its center of mass is at the origin.
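A sketch of principal component analysis as an SVD of mean-centered data, assuming NumPy and hypothetical synthetic data (samples as rows). The resulting approximation is "mean plus an element of $\mathcal{F}_k$", i.e., a sum, and it is compared with the best rank-$k$ approximation of the uncentered data:

```python
import numpy as np

rng = np.random.default_rng(7)
m, d, k = 200, 5, 2
# Hypothetical data: m samples near a k-dimensional affine subspace of R^d
# that does not pass through the origin.
offset = 10.0 * np.ones(d)
X = (offset
     + rng.standard_normal((m, k)) @ rng.standard_normal((k, d))
     + 0.01 * rng.standard_normal((m, d)))

mu = X.mean(axis=0)                    # center of mass
Xc = X - mu                            # mean-centered data
U, s, Vh = np.linalg.svd(Xc, full_matrices=False)
components = Vh[:k]                    # principal directions, one per row
X_pca = mu + (U[:, :k] * s[:k]) @ components     # mean + rank-k term

# For comparison: the best rank-k approximation without centering.
U0, s0, Vh0 = np.linalg.svd(X, full_matrices=False)
X_plain = (U0[:, :k] * s0[:k]) @ Vh0[:k]

print(np.linalg.norm(X - X_pca), np.linalg.norm(X - X_plain))
```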
The question of existence of invertible elements in a matrix subspace is hence
of central relevance for general algorithmic factoring.
Definition 4 A matrix subspace $\mathcal{V}$ of $\mathbb{C}^{n\times n}$ is said to be nonsingular if there are invertible elements in $\mathcal{V}$.
Example 6 In the eigenvalue problem for a given $A \in \mathbb{C}^{n\times n}$ one deals with the nonsingular matrix subspace

$$\mathcal{V} = \operatorname{span}\{I, A\}. \tag{15}$$

The singular elements of $\mathcal{V}$ correspond to the eigenvalues of $A$.
Example 7 In the generalized eigenvalue problem for given $A, B \in \mathbb{C}^{n\times n}$ one is concerned with solving

$$Ax = \lambda Bx \tag{16}$$

for $\lambda \in \mathbb{C}$ and nonzero $x \in \mathbb{C}^n$. In the case of interest, one deals with the nonsingular matrix subspace

$$\mathcal{V} = \operatorname{span}\{A, B\}. \tag{17}$$

The singular elements of $\mathcal{V}$ correspond to the generalized eigenvalues.
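Numerically, when $B$ happens to be invertible, (16) can be reduced to an ordinary eigenvalue problem for $B^{-1}A$; for the general case a dedicated solver such as `scipy.linalg.eig(A, B)` is the usual choice. A NumPy sketch under the invertibility assumption, with hypothetical random data, checking that $A - \lambda B$ is singular at the computed $\lambda$ but not at a generic point:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))        # assumed invertible (true with probability one)

# Ax = lambda B x  <=>  (B^{-1} A) x = lambda x  when B is invertible.
lams = np.linalg.eigvals(np.linalg.solve(B, A))


def smallest_singular_value(M):
    return np.linalg.svd(M, compute_uv=False)[-1]


# A - lambda B is (numerically) singular at each generalized eigenvalue ...
at_eigs = [smallest_singular_value(A - lam * B) for lam in lams]
# ... but not at a generic element of span{A, B}.
generic = smallest_singular_value(A - 0.123 * B)
print(max(at_eigs), generic)
```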
Since the singular elements of a matrix subspace are important, let us make some remarks on their structure. To this end, assume $V_1, \ldots, V_k$ is a basis of a matrix subspace $\mathcal{V}$ of $\mathbb{C}^{n\times n}$ over $\mathbb{C}$ (or $\mathbb{R}$). Thus, for any $V \in \mathcal{V}$ we have a unique representation

$$V = \sum_{j=1}^{k} z_jV_j \tag{18}$$

with $z_j \in \mathbb{C}$ (or $\mathbb{R}$) for $j = 1, \ldots, k$. Define

$$p(z_1, \ldots, z_k) = \det \sum_{j=1}^{k} z_jV_j$$

and set

$$V(p) = \{z \in \mathbb{C}^k : p(z_1, \ldots, z_k) = 0\}. \tag{19}$$

This gives us a so-called determinantal hypersurface in $\mathbb{C}^k$ (or $\mathbb{R}^k$).
Definition 5 $V(p)$ is said to be the spectrum of $\mathcal{V}$ in the basis $V_1, \ldots, V_k$ of $\mathcal{V}$.

It is noteworthy that the spectrum is basis dependent. However, if $W_1, \ldots, W_k$ is another basis of $\mathcal{V}$, then there exists a matrix $M \in \mathbb{C}^{k\times k}$ such that $MV(p)$ is the spectrum in this new basis.

Problem 8 Let $\mathcal{V}$ be the set of upper triangular matrices. Find the spectrum in some basis of $\mathcal{V}$.
Problem 9 Suppose $W_l = \sum_{j=1}^{k} m_{lj}V_j$. Show that $M = \{m_{lj}\}$ yields the matrix for transforming the spectrum into the new basis $W_1, \ldots, W_k$.
Example 8 In the eigenvalue problem for a given $A \in \mathbb{C}^{n\times n}$ one chooses, without exception, $V_1 = I$ and $V_2 = A$.

For $\dim \mathcal{V} > 2$ computing the spectrum is a tough problem in general. In the case $\dim \mathcal{V} = 2$ it is handled with equivalence transformations.
Definition 6 Matrix subspaces $\mathcal{V}$ and $\mathcal{W}$ are said to be equivalent if there exist invertible matrices $X, Y$ such that $\mathcal{W} = X\mathcal{V}Y^{-1}$.

With the spectrum it is easy to see that if there are nonsingular elements in $\mathcal{V}$, then most elements of $\mathcal{V}$ are nonsingular.
Proposition 1 Suppose there are nonsingular elements in a matrix subspace $\mathcal{V}$. Then the set of nonsingular elements is open and dense.

Proof. Because of the isomorphism (18), $\mathcal{V}$ can be identified with $\mathbb{C}^k$ (or $\mathbb{R}^k$). It is clear that $V(p)$ is a closed set in $\mathbb{C}^k$ (or $\mathbb{R}^k$). It cannot have interior points either. Namely, suppose $z = (z_1, \ldots, z_k)$ were an interior point, and take a point $w$ with $p(w) \ne 0$, which exists by assumption. Regard $q(t) = p(z + t(w - z))$ as a polynomial in the one variable $t$. It is not identically zero since $q(1) = p(w) \ne 0$, so it has a finite number of zeros. Thereby an arbitrarily small perturbation of $z$ yields a nonzero value of $p$, a contradiction. End of proof.
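Proposition 1 can be observed numerically. A NumPy sketch using the Toeplitz matrices of Example 3 (which contain $I$, hence form a nonsingular matrix subspace) in the basis of unit diagonals; the helper `p` evaluates the polynomial defining (19), and the sample size is arbitrary:

```python
import numpy as np


def p(z, basis):
    """p(z_1, ..., z_k) = det(z_1 V_1 + ... + z_k V_k), cf. (19)."""
    return np.linalg.det(sum(zj * Vj for zj, Vj in zip(z, basis)))


n = 4
# Toeplitz matrices (Example 3): one basis matrix per diagonal, offsets -(n-1)..n-1.
basis = [np.eye(n, k=d) for d in range(-(n - 1), n)]
k = len(basis)                          # 2n - 1 = 7

rng = np.random.default_rng(9)
values = [p(rng.standard_normal(k), basis) for _ in range(1000)]
min_abs_det = min(abs(v) for v in values)   # random elements: invertible w.p. 1

# Yet singular elements exist, e.g. z = e_2, which selects the nilpotent shift
# np.eye(n, k=-(n - 2)).
shift_det = p(np.eye(k)[1], basis)
print(min_abs_det, shift_det)
```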
There are two types of nonsingular matrix subspaces. To see this, we start by recalling how the LU factorization is found.

Suppose $A \in \mathbb{C}^{n\times n}$ is invertible and can be LU factored. To this end, Gaussian elimination constructs a lower triangular matrix $L_1$ such that

$$L_1A = U \tag{20}$$

is an upper triangular matrix. Then the fact that the inverse of a nonsingular lower triangular matrix is lower triangular is used to conclude that this actually yields an LU factorization of $A$, namely $A = L_1^{-1}U$.
To generalize this, for a nonsingular matrix subspace $\mathcal{V}$ set

$$\operatorname{Inv}(\mathcal{V}) = \{V^{-1} : V \in \mathcal{V},\ \det V \ne 0\} \tag{21}$$

and call it the set of inverses of $\mathcal{V}$. It is not easy to characterize this set in general. The structure present in the lower triangular case is of the following type.

Definition 7 A matrix subspace $\mathcal{V}$ is said to be invertible if

$$\operatorname{Inv}(\mathcal{V}) = \{W \in \mathcal{W} : \det W \ne 0\}$$

for a nonsingular matrix subspace $\mathcal{W}$.

This means that the set of inverses is a matrix subspace, aside from those elements which are singular. In other words, the closure of $\operatorname{Inv}(\mathcal{V})$ is a matrix subspace. Clearly, $\mathcal{W}$ is unique; it is denoted by $\mathcal{V}^{-1}$ and called the inverse of $\mathcal{V}$.
The set of lower (upper) triangular matrices is an invertible matrix subspace by the fact that it is a subalgebra of $\mathbb{C}^{n\times n}$ containing invertible elements. Its inverse is the set of lower (upper) triangular matrices.

Example 9 Let $\mathcal{V}$ be the set of Hermitian matrices. Use (8) to conclude that $\mathcal{V}$ is invertible with $\mathcal{V}^{-1} = \mathcal{V}$.
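Invertibility is preserved under equivalence, i.e., if $\mathcal{W} = X\mathcal{V}Y^{-1}$ and $\mathcal{V}$ is invertible, then so is $\mathcal{W}$, and

$$\mathcal{W}^{-1} = Y\mathcal{V}^{-1}X^{-1}$$

holds.

Problem 10 Let $\mathcal{V}^T = \{V^T : V \in \mathcal{V}\}$. Show that $\mathcal{V}$ is invertible if and only if $\mathcal{V}^T$ is. (For the Hermitian transposition an analogous claim holds.)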
The following notion is of importance for matrix subspaces as well as for iterative methods considered later.

Definition 8 Let $A \in \mathbb{C}^{n\times n}$. Then the minimal polynomial⁸ of $A$ is the monic polynomial $p$ of the least degree satisfying

$$p(A) = 0. \tag{22}$$

Problem 11 Show that similar matrices have the same minimal polynomial. (Recall that two matrices $A$ and $B$ are similar if there exists an invertible matrix $X$ such that $A = XBX^{-1}$.) Show also that the minimal polynomial is unique.
The Jordan canonical form can be used to show that the characteristic polynomial annihilates $A$ (the Cayley-Hamilton theorem). Hence the degree of the minimal polynomial is at most $n$. The characteristic polynomial need not be the minimal polynomial, though.

Problem 12 Suppose $A \in \mathbb{C}^{n\times n}$ is diagonalizable. Determine the degree of the minimal polynomial in terms of the distinct eigenvalues of $A$.
⁸ A monic polynomial $c_kz^k + c_{k-1}z^{k-1} + \cdots + c_2z^2 + c_1z + c_0$ has the leading coefficient $c_k = 1$.
Let $p(z) = z^k + c_{k-1}z^{k-1} + \cdots + c_2z^2 + c_1z + c_0$ be the minimal polynomial of $A$. Then $A$ is invertible if and only if $c_0 \ne 0$. Namely, if $c_0 \ne 0$, then

$$A\left(-\frac{1}{c_0}A^{k-1} - \frac{c_{k-1}}{c_0}A^{k-2} - \cdots - \frac{c_1}{c_0}I\right) = I. \tag{23}$$

In particular, this also shows that the inverse of $A$ is a polynomial in $A$. For the converse, if $c_0 = 0$, then

$$A\left(A^{k-1} + c_{k-1}A^{k-2} + \cdots + c_1I\right) = 0,$$

and since $A^{k-1} + c_{k-1}A^{k-2} + \cdots + c_1I \ne 0$ by the minimality of $p$, $A$ cannot be invertible.
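A NumPy sketch of (23). It uses the characteristic polynomial from `np.poly` in place of the minimal polynomial; this substitution is an assumption of the sketch, harmless here because the derivation of (23) works for any monic annihilating polynomial with $c_0 \ne 0$, and the characteristic polynomial annihilates $A$. The function name is hypothetical, and this is not a recommended way to invert matrices:

```python
import numpy as np


def inverse_as_polynomial(A):
    """A^{-1} = -(1/c_0)(A^{n-1} + c_{n-1} A^{n-2} + ... + c_1 I), cf. (23),
    with the characteristic polynomial in place of the minimal polynomial."""
    n = A.shape[0]
    c = np.poly(A)            # [1, c_{n-1}, ..., c_1, c_0], highest degree first
    c0 = c[-1]
    if np.isclose(c0, 0.0):
        raise ValueError("c_0 = 0: A is singular")
    # Horner's rule for A^{n-1} + c_{n-1} A^{n-2} + ... + c_1 I.
    Q = np.eye(n, dtype=np.result_type(A, c))
    for coeff in c[1:-1]:
        Q = A @ Q + coeff * np.eye(n)
    return -Q / c0


A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
A_inv = inverse_as_polynomial(A)
print(np.allclose(A @ A_inv, np.eye(3)))   # True
```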
Example 10 The set of complex symmetric matrices is an invertible matrix subspace. Its inverse is the set of complex symmetric matrices. To see this, assume $A$ is complex symmetric and invertible. If $p$ is a polynomial, then $p(A)^T = p(A^T) = p(A)$, i.e., any polynomial in $A$ is complex symmetric. Since the inverse of $A$ is a polynomial in $A$ by (23), it is complex symmetric.

In this example we used the following property. A matrix subspace $\mathcal{V}$ over $\mathbb{C}$ (or $\mathbb{R}$) is said to be polynomially closed if $V \in \mathcal{V}$ implies $p(V) \in \mathcal{V}$ for any polynomial $p$ with complex (real) coefficients.
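A quick NumPy check of Examples 9 and 10 on random matrices (data hypothetical):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 5
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

S = M + M.T           # complex symmetric: S^T = S, although S^* != S
H = M + M.conj().T    # Hermitian: H^* = H

S_inv = np.linalg.inv(S)
H_inv = np.linalg.inv(H)
print(np.allclose(S_inv, S_inv.T), np.allclose(H_inv, H_inv.conj().T))   # True True
```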
Proposition 2 Suppose a nonsingular matrix subspace $\mathcal{V}$ is equivalent to a matrix subspace which is polynomially closed. Then $\mathcal{V}$ is invertible.
Problem 13 Let $A \in \mathbb{C}^{n\times n}$. For iterative methods the matrix subspaces

$$\mathcal{K}_j(A; I) = \operatorname{span}\{I, A, \ldots, A^{j-1}\}$$

are very important, for $j = 1, 2, \ldots$. Are these nonsingular matrix subspaces? For what values of $j$ is $\mathcal{K}_j(A; I)$ invertible?
5 Factoring algorithmically
Assume we are given two nonsingular matrix subspaces $\mathcal{V}_1$ and $\mathcal{V}_2$, one of which is invertible. Let us suppose $\mathcal{V}_2$ is invertible with the inverse $\mathcal{W}$. Suppose $A \in \mathbb{C}^{n\times n}$ is nonsingular and the task is to determine whether $A \in \mathcal{V}_1\mathcal{V}_2$.

Clearly, $A = V_1V_2$ holds if and only if

$$AW = V_1 \tag{24}$$

for some nonsingular $W \in \mathcal{W}$. This latter problem is linear and thereby completely solvable. (Once done, we have $A = V_1W^{-1}$.) To this end we will use projections.
Recall that a linear operator $P$ on a vector space is a projection if $P^2 = P$. A projector moves points onto its range and acts like the identity operator on the range. Such operators are of importance in numerical computations and approximation. In the so-called dimension reduction approximation, the task is to find a projector which is good in some sense and which is used to replace the original problem with a problem of smaller dimension: the problem is projected onto the range of $P$. Observe that if $P$ is a projector, then so is $I - P$.
It is preferable to use orthogonal projectors, since they take the shortest path while moving points to the range. We say that $P$ is an orthogonal projector if

$$R(P) \perp R(I - P),$$

i.e., the range of $P$ is orthogonal to the range of $I - P$. Orthogonality requires an inner product, so we use (13) on $\mathbb{C}^{n\times n}$. (Recall that then $\mathbb{C}^{n\times n}$ is isometrically isomorphic with $\mathbb{C}^{n^2}$.)
The standard way of constructing an orthogonal projector onto a given subspace is to take an orthonormal basis $q_1, \ldots, q_k$ of it. Then set

$$Px = \sum_{j=1}^{k} q_j\,(x, q_j).$$
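A NumPy sketch of this construction on $\mathbb{C}^n$ (function name hypothetical): the orthonormal basis comes from a reduced QR factorization, and the matrix of $P$ is $\sum_j q_jq_j^*$:

```python
import numpy as np


def orthogonal_projector(vectors):
    """Matrix of P x = sum_j q_j (x, q_j), where q_1, ..., q_k is an orthonormal
    basis of span(vectors) obtained from a reduced QR factorization."""
    Q, _ = np.linalg.qr(np.column_stack(vectors))
    return Q @ Q.conj().T              # sum_j q_j q_j^*


rng = np.random.default_rng(11)
vs = [rng.standard_normal(6) + 1j * rng.standard_normal(6) for _ in range(2)]
P = orthogonal_projector(vs)
x = rng.standard_normal(6) + 1j * rng.standard_normal(6)

print(np.allclose(P @ P, P),                          # P^2 = P
      np.isclose(np.vdot(P @ x, x - P @ x), 0),       # R(P) orthogonal to R(I - P)
      np.isclose(np.linalg.norm(P, 2), 1.0))          # ||P|| = 1
# True True True
```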
Occasionally orthogonal projectors onto familiar matrix subspaces are readily available. The orthogonal projector on $\mathbb{C}^{n\times n}$ onto the set of Hermitian matrices is given by

$$PA = \frac{1}{2}(A + A^*). \tag{25}$$

(Because the set of Hermitian matrices is a subspace over $\mathbb{R}$, use (14).) This is the so-called Hermitian part of $A$. Similarly, onto the set of complex symmetric matrices the orthogonal projector acts according to

$$PA = \frac{1}{2}(A + A^T). \tag{26}$$

These are simple to apply.
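The following NumPy sketch (helper names hypothetical, data random) checks that $A - PA$ is orthogonal to the respective subspace. For the Hermitian part this holds only in the real inner product (14): with (13) the corresponding inner product is purely imaginary but typically nonzero:

```python
import numpy as np


def hermitian_part(A):
    """Orthogonal projector (25); the Hermitian matrices form a subspace over R."""
    return 0.5 * (A + A.conj().T)


def symmetric_part(A):
    """Orthogonal projector (26) onto the complex symmetric matrices."""
    return 0.5 * (A + A.T)


def inner13(A, B):
    """(A, B) = trace B^* A, cf. (13); its real part is (14)."""
    return np.trace(B.conj().T @ A)


rng = np.random.default_rng(12)
A, X, Y = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
           for _ in range(3))
H = hermitian_part(X)              # an arbitrary Hermitian matrix
S = symmetric_part(Y)              # an arbitrary complex symmetric matrix

res_h = inner13(A - hermitian_part(A), H)   # purely imaginary: Re part (14) vanishes
res_s = inner13(A - symmetric_part(A), S)   # vanishes in (13) itself
print(abs(res_h.real) < 1e-10, abs(res_s) < 1e-10)   # True True
```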
Problem 14 Suppose $P$ is a nonzero projection. Show that $P$ is an orthogonal projection if and only if the operator norm $\|P\| = 1$, if and only if $\|x\|^2 = \|(I - P)x\|^2 + \|Px\|^2$ holds for every $x$.
A matrix is called standard if exactly one of its entries equals 1 while the other entries equal zero. A matrix subspace $\mathcal{V}$ is called standard if it has a basis consisting of standard matrices. This simply means that there are no interdependencies between the entries of the elements of $\mathcal{V}$. In this case the orthogonal projector $P$ onto $\mathcal{V}$ acts such that $PA$ simply replaces with zeros those entries of $A$ which are outside the sparsity structure of $\mathcal{V}$. Other entries of $A$ are kept intact.
Definition 9 The sparsity structure of a matrix subspace $\mathcal{V}$ means the locations of those entries which are nonzero for some $V \in \mathcal{V}$.
Consider now solving the problem (24). Denote by $P_1$ the orthogonal projector onto $\mathcal{V}_1$. Define a linear map

$$W \mapsto (I - P_1)AW \tag{27}$$

from $\mathcal{W}$ to $\mathbb{C}^{n\times n}$. Since $I - P_1$ is the orthogonal projector onto the orthogonal complement of $\mathcal{V}_1$, the solutions can be computed by finding the null space of (27). Namely, if $W$ is in the null space, then $(I - P_1)AW = 0$. Thereby we have $AW = V_1 \in \mathcal{V}_1$.

Of course, since (27) is linear, the null space is computable. (As we know, there are even finite step algorithms to this end.)
It is a classical result that every matrix $A \in \mathbb{C}^{n\times n}$ is the product of two complex symmetric matrices. So far the demonstrations of this result have been quite ad hoc. Let us see how it follows routinely with the above method.
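Example 11 Let us factor $A = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}$ into the product of two symmetric matrices. With $P$ the orthogonal projector (26) onto the complex symmetric matrices, we have

$$(I - P)A\begin{bmatrix} s_1 & s_2 \\ s_2 & s_3 \end{bmatrix} = (I - P)\begin{bmatrix} s_1 + 2s_2 & s_2 + 2s_3 \\ s_1 + s_2 & s_2 + s_3 \end{bmatrix} = \begin{bmatrix} 0 & -s_1/2 + s_3 \\ s_1/2 - s_3 & 0 \end{bmatrix}.$$

Thereby, whenever $s_1 = 2s_3$, we have a symmetric $AS = S_1$, and then $A = S_1S^{-1}$ with $S^{-1}$ complex symmetric by Example 10. The null space is clearly nonsingular. Moreover, its dimension is two, so that the factorization is certainly not unique.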
This is no coincidence. It can be shown that, when factoring any $A \in \mathbb{C}^{n\times n}$ into the product of two complex symmetric matrices, the null space is nonsingular and at least of dimension $n$.

Because of Proposition 1, the dimension of the null space of (27) expresses how nonunique the factorization is. Of course, in practice it typically suffices to compute a single factorization.
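The method can be carried out numerically by writing the map (27) as a matrix in a basis of $\mathcal{W}$ (here the complex symmetric matrices, with $P$ as in (26)) and computing its null space with the SVD. A NumPy sketch with hypothetical helper names and a hypothetical rank tolerance, applied to the matrix of Example 11:

```python
import numpy as np


def symmetric_basis(n):
    """Basis E_jk + E_kj (j <= k) of the complex symmetric n-by-n matrices."""
    basis = []
    for j in range(n):
        for k in range(j, n):
            E = np.zeros((n, n))
            E[j, k] = E[k, j] = 1.0
            basis.append(E)
    return basis


def skew_part(M):
    """(I - P)M, where P is the orthogonal projector (26) onto symmetric matrices."""
    return 0.5 * (M - M.T)


def symmetric_nullspace(A, tol=1e-10):
    """Symmetric matrices spanning the null space of W -> (I - P)AW, cf. (27)."""
    basis = symmetric_basis(A.shape[0])
    # Column j of the matrix of the map is vec((I - P) A E_j).
    M = np.column_stack([skew_part(A @ E).ravel() for E in basis])
    _, s, Vh = np.linalg.svd(M)
    rank = int(np.sum(s > tol * max(s[0], 1.0)))
    return [sum(c * E for c, E in zip(row.conj(), basis)) for row in Vh[rank:]]


A = np.array([[1.0, 2.0], [1.0, 1.0]])        # the matrix of Example 11
null = symmetric_nullspace(A)
print(len(null))                               # 2

rng = np.random.default_rng(13)
S = sum(rng.standard_normal() * N for N in null)   # generic element, invertible w.p. 1
S1 = A @ S                                     # symmetric by construction
S_inv = np.linalg.inv(S)                       # symmetric by Example 10
print(np.allclose(S1, S1.T), np.allclose(S1 @ S_inv, A))   # True True
```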
Observe that if $\mathcal{V}_1$ had been invertible instead of $\mathcal{V}_2$, then we could proceed by transposing the problem, i.e., then the question reads whether $A^T \in \mathcal{V}_2^T\mathcal{V}_1^T$ holds. Another option is to proceed by replacing (27) with

$$W \mapsto (I - P_2)WA, \tag{28}$$

where $\mathcal{W}$ is now the inverse of $\mathcal{V}_1$ and $P_2$ is the orthogonal projector onto $\mathcal{V}_2$. Then we look for the null space of this linear map. If $(I - P_2)WA = 0$, then $WA = V_2 \in \mathcal{V}_2$. This relates directly to (20), i.e., to how the LU factorization can be computed.
Example 12 In the standard method for an LU factorization, one computes (20) by constructing a single element $L_1$ by performing row operations such that $L_1A$ is upper triangular. Let us see how we solve the problem completely with (28). Then $\mathcal{W}$ is the set of lower triangular matrices and $\mathcal{V}_2$ the set of upper triangular matrices. They are both standard matrix subspaces. Let an invertible $A = \{a_{jk}\}$ be given. Denote by $W = \{w_{jk}\}$ the lower triangular matrix of variables. Now $I - P_2$ projects onto the set of strictly lower triangular matrices. Therefore finding the null space of (28) gives us the equations, collected row-wise from the strictly lower triangular part,

$$\begin{aligned}
a_{11}w_{21} + a_{21}w_{22} &= 0\\
a_{11}w_{31} + a_{21}w_{32} + a_{31}w_{33} &= 0\\
a_{12}w_{31} + a_{22}w_{32} + a_{32}w_{33} &= 0\\
&\ \,\vdots
\end{aligned}$$

and so forth. There are no equations for $w_{11}$. One equation binds $w_{21}$ and $w_{22}$. Two equations bind $w_{31}$, $w_{32}$ and $w_{33}$. And so forth. It is noteworthy that here the transpose of $A$ starts appearing row-wise, instead of $A$. In the generic case (= what happens with probability one) the dimension of the null space of (28) is now $n$. This then means that any two different factors $L$ of an LU factorization differ by a scaling. That is, $A = LU = LDD^{-1}U = \tilde{L}\tilde{U}$ for a diagonal matrix $D$.
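The same recipe applied to (28) recovers the LU factorization. A NumPy sketch (helper names and rank tolerance hypothetical) for a random, hence generically LU-factorizable, matrix: the null space dimension comes out as $n$, and a nonsingular element $W$ of the null space gives $L = W^{-1}$ and $U = WA$:

```python
import numpy as np


def lower_basis(n):
    """Standard basis E_jk (j >= k) of the lower triangular n-by-n matrices."""
    basis = []
    for j in range(n):
        for k in range(j + 1):
            E = np.zeros((n, n))
            E[j, k] = 1.0
            basis.append(E)
    return basis


def strictly_lower(M):
    """(I - P_2)M: P_2 keeps the upper triangle, so I - P_2 keeps the rest."""
    return np.tril(M, k=-1)


rng = np.random.default_rng(14)
n = 4
A = rng.standard_normal((n, n))
basis = lower_basis(n)

# Matrix of the linear map (28), W -> (I - P_2) W A, in the basis above.
M = np.column_stack([strictly_lower(E @ A).ravel() for E in basis])
_, s, Vh = np.linalg.svd(M)
rank = int(np.sum(s > 1e-10 * s[0]))
null_dim = len(basis) - rank
print(null_dim)                                   # generically n, here 4

# A generic element W of the null space: W A is upper triangular.
coeffs = Vh[rank:].T @ rng.standard_normal(null_dim)
W = sum(c * E for c, E in zip(coeffs, basis))
U = W @ A
L = np.linalg.inv(W)                              # lower triangular
print(np.allclose(L @ U, A), np.allclose(U, np.triu(U)))   # True True
```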
Definition 10 Let $A \in \mathbb{C}^{n\times n}$. Multiplying $A$ from the right (left) by an invertible diagonal matrix is called a right (left) scaling of $A$.
Observe that it may not be necessary to invert the factor $W$ at all. This is the case, for example, in Gaussian elimination applied to the linear system (1). Then one has $Ux = c$, where $U = L^{-1}A$ and $c = L^{-1}b$. In such a case, it suffices that $W$ be nonsingular, since we do not need to understand what $\operatorname{Inv}(\mathcal{W})$ is like. In preconditioning (considered later in connection with iterative methods) one has such a situation.
In practice it may be realistic to compute only an approximate factorization. The problem is challenging if we consider

$$\inf_{V_1 \in \mathcal{V}_1,\ V_2 \in \mathcal{V}_2} \|A - V_1V_2\| \tag{29}$$

to this end. First, there does not seem to exist any direct way of solving this problem. Second, we have to accept dealing with the infimum instead of a minimization problem. Numerically such a situation indicates possibly severe problems.
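Example 13 Let

$$A = \begin{bmatrix} \epsilon & 1 \\ 1 & 1 \end{bmatrix}$$

and suppose $\mathcal{V}_1$ is the set of lower and $\mathcal{V}_2$ the set of upper triangular matrices. Then the value of (29) is zero for any $\epsilon$. For $\epsilon = 0$ the minimum is not attained, since $A$ is not LU factorizable. Otherwise we have

$$\begin{bmatrix} \epsilon & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1/\epsilon & 1 \end{bmatrix}\begin{bmatrix} \epsilon & 1 \\ 0 & 1 - 1/\epsilon \end{bmatrix}.$$

For very small $\epsilon$, the factors are huge. Moreover, in finite precision arithmetic, $1 - 1/\epsilon$ is replaced with $-1/\epsilon$. This has a dramatic effect, since

$$\begin{bmatrix} 1 & 0 \\ 1/\epsilon & 1 \end{bmatrix}\begin{bmatrix} \epsilon & 1 \\ 0 & -1/\epsilon \end{bmatrix} - A = \begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix},$$

which is not small at all.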
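Example 13 can be reproduced in double precision with $\epsilon = 10^{-20}$. A NumPy sketch (function name hypothetical); swapping the rows first, as partial pivoting would do, removes the problem:

```python
import numpy as np


def lu_2x2_no_pivoting(A):
    """LU factors of a 2-by-2 matrix without pivoting; requires A[0, 0] != 0."""
    l21 = A[1, 0] / A[0, 0]
    L = np.array([[1.0, 0.0], [l21, 1.0]])
    U = np.array([[A[0, 0], A[0, 1]], [0.0, A[1, 1] - l21 * A[0, 1]]])
    return L, U


eps = 1e-20
A = np.array([[eps, 1.0], [1.0, 1.0]])

L, U = lu_2x2_no_pivoting(A)
print(U[1, 1] == -1.0 / eps)       # True: 1 - 1/eps has been rounded to -1/eps
residual = L @ U - A               # approximately [[0, 0], [0, -1]]

# Swapping the rows first (as partial pivoting would) keeps the factors small.
P = np.array([[0.0, 1.0], [1.0, 0.0]])
Lp, Up = lu_2x2_no_pivoting(P @ A)
residual_pivoted = Lp @ Up - P @ A    # zero up to rounding
```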
Numerically one option is to look at

$$\min_{W \in \mathcal{W}} \|(I - P_1)AW\| \tag{30}$$

while imposing additional constraints on $W$ (without them, $W = 0$ would be a trivial minimizer). These could be some sort of norm conditions. As a result, we have $(I - P_1)AW \approx 0$, i.e., $AW \approx V_1$ with $V_1 = P_1AW$. Then it remains to invert $W$, if required.
In approximations with (30), one should keep in mind that
$$\frac{\|A - V_1W^{-1}\|}{\|W^{-1}\|} \le \|AW - V_1\| \le \|A - V_1W^{-1}\|\,\|W\|.$$
The ratio of the upper and lower bounds is
$$\kappa(W) = \|W\|\,\|W^{-1}\|, \tag{31}$$
the condition number of $W$, which scales the accuracy of the approximations. This is no accident: the condition number often appears in assessing the accuracy of numerical linear algebra computations. From the SVD of $W$ one obtains $\kappa(W) = \sigma_1/\sigma_n$ (in the operator 2-norm).
6 Computing the LU factorization with partial pivoting
In practice, when factoring, one typically computes a single element of the nullspace (27) or (28). This should be done as fast as possible without sacrificing the accuracy of the numerical results, so one tries to benefit from the properties of the problem as much as possible. For the LU factorization this means computing the standard LU factorization not for $A$ but for a matrix which has the rows of $A$ reordered, i.e.,
$$PA = LU, \tag{32}$$
where $P$ is a permutation matrix chosen to make the computations numerically more stable. This permutation is not known in advance. Finding $P$ is part of the algorithm called the LU factorization with partial pivoting.
Problem 15 Show that the complexity of the standard Gaussian elimination for $A \in \mathbb{C}^{n\times n}$ (when all the pivots are nonzero) is $\frac{2}{3}n^3$ flops.⁹
We work out a low-dimensional example to see what is going on.¹⁰ For the matrix $A$ below, the standard Gaussian row operations give
$$A = \begin{bmatrix}2&1&1&0\\ 4&3&3&1\\ 8&7&9&5\\ 6&7&9&8\end{bmatrix} = \begin{bmatrix}1&0&0&0\\ 2&1&0&0\\ 4&3&1&0\\ 3&4&1&1\end{bmatrix}\begin{bmatrix}2&1&1&0\\ 0&1&1&1\\ 0&0&2&2\\ 0&0&0&2\end{bmatrix} = LU.$$
In the $L$ factor we get a hint of the catastrophic behaviour of Example 13: large entries appear away from the diagonal. To avoid this, one must pivot partially in between the row operations. The resulting LU factorization with partial pivoting has replaced the standard LU factorization since the 1950s. (To such an extent that an LU factorization of $A$ typically means the LU factorization (32).)
The rule is simple: the pivot is chosen as the entry of largest modulus among the entries of the current column on or below the diagonal. This is achieved by permuting rows. For the first column we have
$$P_1A = \begin{bmatrix}0&0&1&0\\ 0&1&0&0\\ 1&0&0&0\\ 0&0&0&1\end{bmatrix}\begin{bmatrix}2&1&1&0\\ 4&3&3&1\\ 8&7&9&5\\ 6&7&9&8\end{bmatrix} = \begin{bmatrix}8&7&9&5\\ 4&3&3&1\\ 2&1&1&0\\ 6&7&9&8\end{bmatrix}.$$
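Then the row operations for the first column, expressed as a matrix-matrix product, read
$$L_1P_1A = \begin{bmatrix}1&0&0&0\\ -\frac12&1&0&0\\ -\frac14&0&1&0\\ -\frac34&0&0&1\end{bmatrix}\begin{bmatrix}8&7&9&5\\ 4&3&3&1\\ 2&1&1&0\\ 6&7&9&8\end{bmatrix} = \begin{bmatrix}8&7&9&5\\ 0&-\frac12&-\frac32&-\frac32\\ 0&-\frac34&-\frac54&-\frac54\\ 0&\frac74&\frac94&\frac{17}{4}\end{bmatrix}.$$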
⁹ By a flop is meant a floating point operation: a sum, difference, product or quotient of two complex numbers.
¹⁰ This example is from [4].
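Then
$$P_2L_1P_1A = \begin{bmatrix}1&0&0&0\\ 0&0&0&1\\ 0&0&1&0\\ 0&1&0&0\end{bmatrix}\begin{bmatrix}8&7&9&5\\ 0&-\frac12&-\frac32&-\frac32\\ 0&-\frac34&-\frac54&-\frac54\\ 0&\frac74&\frac94&\frac{17}{4}\end{bmatrix} = \begin{bmatrix}8&7&9&5\\ 0&\frac74&\frac94&\frac{17}{4}\\ 0&-\frac34&-\frac54&-\frac54\\ 0&-\frac12&-\frac32&-\frac32\end{bmatrix}$$
and
$$L_2P_2L_1P_1A = \begin{bmatrix}1&0&0&0\\ 0&1&0&0\\ 0&\frac37&1&0\\ 0&\frac27&0&1\end{bmatrix}\begin{bmatrix}8&7&9&5\\ 0&\frac74&\frac94&\frac{17}{4}\\ 0&-\frac34&-\frac54&-\frac54\\ 0&-\frac12&-\frac32&-\frac32\end{bmatrix} = \begin{bmatrix}8&7&9&5\\ 0&\frac74&\frac94&\frac{17}{4}\\ 0&0&-\frac27&\frac47\\ 0&0&-\frac67&-\frac27\end{bmatrix}.$$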
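Then, similarly, with
$$P_3 = \begin{bmatrix}1&0&0&0\\ 0&1&0&0\\ 0&0&0&1\\ 0&0&1&0\end{bmatrix} \quad\text{and}\quad L_3 = \begin{bmatrix}1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&-\frac13&1\end{bmatrix}$$
we complete the process to have
$$L_3P_3L_2P_2L_1P_1A = U = \begin{bmatrix}8&7&9&5\\ 0&\frac74&\frac94&\frac{17}{4}\\ 0&0&-\frac67&-\frac27\\ 0&0&0&\frac23\end{bmatrix}.$$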
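The elimination steps above can be sketched in Python as follows. This is an illustrative textbook implementation, and the function name `lu_partial_pivot` is ours rather than a library API. Exact rational arithmetic via `fractions.Fraction` reproduces the fractions of the hand computation. Rows of $L$ are swapped together with those of $U$, which is one way to carry the permutations through the multipliers found so far.

```python
from fractions import Fraction

def lu_partial_pivot(A):
    """Gaussian elimination with partial pivoting: returns (perm, L, U) with PA = LU.

    perm[i] is the row of A that ends up as row i of PA. Entries may be floats
    or Fractions; Fractions reproduce the hand computation exactly.
    Assumes A is nonsingular, so every pivot is nonzero."""
    n = len(A)
    U = [list(row) for row in A]
    L = [[0] * n for _ in range(n)]
    perm = list(range(n))
    for j in range(n - 1):
        # Pivot: the entry of largest modulus in column j, on or below the diagonal.
        p = max(range(j, n), key=lambda i: abs(U[i][j]))
        if p != j:
            U[j], U[p] = U[p], U[j]
            L[j], L[p] = L[p], L[j]          # permute the multipliers found so far
            perm[j], perm[p] = perm[p], perm[j]
        for i in range(j + 1, n):
            L[i][j] = U[i][j] / U[j][j]      # |L[i][j]| <= 1 thanks to the pivoting
            for k in range(j, n):
                U[i][k] -= L[i][j] * U[j][k]
    for i in range(n):
        L[i][i] = 1
    return perm, L, U

A = [[Fraction(v) for v in row] for row in [[2, 1, 1, 0], [4, 3, 3, 1], [8, 7, 9, 5], [6, 7, 9, 8]]]
perm, L, U = lu_partial_pivot(A)
print(perm)                       # [2, 3, 1, 0]: rows 3, 4, 2, 1 of A form PA
print([str(u) for u in U[3]])     # ['0', '0', '0', '2/3'], the last row of U
```

The same code runs with ordinary `float` entries. It costs $O(n^3)$ operations and is meant only to mirror the hand computation.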
Problem 16 The Gaussian row operation matrices (also called Gauss transforms) can be expressed as
$$L_j = I + l_je_j^* \tag{33}$$
with $l_j \in \mathbb{C}^n$ having its first $j$ entries equal to zero. Show that $L_j^{-1} = I - l_je_j^*$. (Here $e_j$ denotes the $j$th standard basis vector.) Using this, show that the inverse of $L_{n-1}\cdots L_1$ is $I - \sum_{j=1}^{n-1}l_je_j^*$.
This certainly looks complicated. However, the sought factorization $PA = LU$ is hidden here. To see this, there holds
$$L_3P_3L_2P_2L_1P_1 = L_3'L_2'L_1'P_3P_2P_1$$
once we set
$$L_3' = L_3,\quad L_2' = P_3L_2P_3^{-1} \quad\text{and}\quad L_1' = P_3P_2L_1P_2^{-1}P_3^{-1}.$$
We have $P = P_3P_2P_1$ and $L^{-1} = L_3'L_2'L_1'$.
The matrices $L_j'$ are easily found. Because of (33),
$$L_j' = I + P_{n-1}\cdots P_{j+1}\,l_je_j^*\,P_{j+1}^{-1}\cdots P_{n-1}^{-1}.$$
Observe that when $P_l$ is applied to a vector, it does not permute the first $l-1$ entries. Since permutation matrices are unitary, i.e., $P_l^{-1} = P_l^*$, we therefore have $e_j^*P_{j+1}^{-1}\cdots P_{n-1}^{-1} = (P_{n-1}\cdots P_{j+1}e_j)^* = e_j^*$. Moreover, the first $j$ entries of $l_j' = P_{n-1}\cdots P_{j+1}l_j$ are zeros while the remaining entries are just those of $l_j$ permuted.
If $A \in \mathbb{C}^{n\times n}$ is nonsingular, then Gaussian elimination with partial pivoting always yields a factorization (32). The reason is that the permutations and Gauss transforms are invertible, so the transformed matrix remains invertible. (If some column had only zeros on and below the diagonal, so that no $P_l$ could bring a nonzero entry to the $l$th diagonal position, then the first $l$ columns would necessarily be linearly dependent.)
Problem 17 In assessing the accuracy of finite precision computations, the norm often used on $\mathbb{C}^n$ is
$$\|x\|_\infty = \max_{1\le j\le n}|x_j|.$$
This is the so-called max norm. The corresponding norm of a matrix $A \in \mathbb{C}^{n\times n}$ is defined as
$$\|A\|_\infty = \max_{\|x\|_\infty = 1}\|Ax\|_\infty.$$
Show that in (32) produced with partially pivoted Gaussian elimination we have $\|L\|_\infty \le n$.
Problem 18 In Problem 17, a moderate linear growth with respect to the dimension was shown to bound the max norm of the $L$ factor. Unfortunately, the max norm of the $U$ factor can grow exponentially. Show that $\|U\|_\infty$ can exceed $\|A\|_\infty$ by a factor growing exponentially in $n$ (of the order of $2^n$) by considering $A = \{a_{ij}\}$ with
$$a_{ij} = \begin{cases}1, & \text{when } i = j \text{ or } j = n,\\ -1, & \text{when } i > j,\\ 0, & \text{else.}\end{cases}$$
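Example 14 Consider Example 13. For $A = \begin{bmatrix}\epsilon & 1\\ 1 & 1\end{bmatrix}$ partially pivoted Gaussian elimination yields $P_1 = \begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix}$, so that
$$P_1A = \begin{bmatrix}1 & 0\\ \epsilon & 1\end{bmatrix}\begin{bmatrix}1 & 1\\ 0 & 1 - \epsilon\end{bmatrix}.$$
In finite precision arithmetic, $1 - \epsilon$ is replaced with $1$. This does not have a dramatic effect at all since, with $L$ and $U$ now denoting the computed factors,
$$P_1^{-1}LU - A = \begin{bmatrix}0 & \epsilon\\ 0 & 0\end{bmatrix}.$$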
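Here is a minimal Python sketch of Example 14 in IEEE double precision. As an assumption we take $\epsilon = 2^{-60}$. The residual $P_1^{-1}LU - A$ is evaluated exactly with `fractions.Fraction`, which makes the single rounding of $1 - \epsilon$ to $1$ visible. The helper names are ours.

```python
from fractions import Fraction

def lu_2x2_partial_pivot(a):
    """Partially pivoted elimination on a 2-by-2 matrix in floating point.

    Returns (perm, L, U) with P A = L U, where row i of P A is row perm[i] of A."""
    perm = [1, 0] if abs(a[1][0]) > abs(a[0][0]) else [0, 1]   # larger pivot first
    pa = [a[perm[0]], a[perm[1]]]
    l21 = pa[1][0] / pa[0][0]          # multiplier epsilon, |l21| <= 1
    u22 = pa[1][1] - l21 * pa[0][1]    # 1 - epsilon, rounded to 1.0
    return perm, [[1.0, 0.0], [l21, 1.0]], [[pa[0][0], pa[0][1]], [0.0, u22]]

def exact_residual_unpermuted(perm, L, U, A):
    """Return P^{-1} L U - A exactly (as Fractions) from the stored floats."""
    res = [[None, None], [None, None]]
    for i in range(2):                 # row i of L U belongs to row perm[i] of A
        for j in range(2):
            lu_ij = sum(Fraction(L[i][k]) * Fraction(U[k][j]) for k in range(2))
            res[perm[i]][j] = lu_ij - Fraction(A[perm[i]][j])
    return res

eps = 2.0 ** -60
A = [[eps, 1.0], [1.0, 1.0]]
perm, L, U = lu_2x2_partial_pivot(A)
print(perm, U[1][1])                  # [1, 0] 1.0
print(exact_residual_unpermuted(perm, L, U, A) == [[0, Fraction(eps)], [0, 0]])  # True
```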
Regarding the cost of partial pivoting, in choosing $P_j$ one needs to compute the absolute values of $n - j + 1$ entries of a column of the $U$ matrix computed so far. In all, this amounts to computing $O(n^2)$ absolute values, which is negligible in comparison with the $\frac23n^3$ flops consumed in row operations.
So-called complete pivoting means applying permutations from the left and right as $PAQ$, whose LU factorization is then computed. The corresponding algorithm is significantly more costly because $O(n^3)$ absolute values now need to be evaluated in constructing $P$ and $Q$. Therefore the LU factorization with complete pivoting is rarely used. So far, partial pivoting has been sufficient to deal with practical computational problems.
Because of Examples 13 and 14, we must say a few words about the floating point arithmetic computers rely on. The current standard is IEEE double precision arithmetic, which practically all computer manufacturers have adopted. Its purpose is to discretize $\mathbb{R}$ by replacing it with a finite set of rational numbers. Numbers can be in absolute value between $2.23\cdot10^{-308}$ and $1.79\cdot10^{308}$; otherwise there will be underflow or overflow. This is usually not something to worry about. Since only a finite set of rational numbers is in use, there are gaps between the numbers, unlike in $\mathbb{R}$. To describe this, the interval $[1,2] \subset \mathbb{R}$ is represented by
$$1,\ 1 + 2^{-52},\ 1 + 2\cdot2^{-52},\ 1 + 3\cdot2^{-52},\ \ldots,\ 2. \tag{34}$$
For other intervals this representation is scaled such that the interval $[2^j, 2^{j+1}]$ is represented by the rational numbers (34) multiplied by $2^j$. This is done until overflow is reached (and similarly for underflow). For example, the interval $[2,4] \subset \mathbb{R}$ is represented by
$$2,\ 2 + 2^{-51},\ 2 + 2\cdot2^{-51},\ 2 + 3\cdot2^{-51},\ \ldots,\ 4.$$
This means that the gaps are of the same size in the relative sense.
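A few lines of Python show the spacing (34) on an actual machine. This assumes CPython, whose `float` is an IEEE double on practically all platforms. Note that `math.ulp` requires Python 3.9 or later.

```python
import math
import sys

# Gap between consecutive doubles in [1, 2] is 2^-52; in [2, 4] it is 2^-51.
print(sys.float_info.epsilon == 2.0 ** -52)    # True
print(math.ulp(1.0) == 2.0 ** -52)             # True
print(math.ulp(2.0) == 2.0 ** -51)             # True: same gap in the relative sense
print(1.0 + 2.0 ** -52 > 1.0)                  # True: 1 + 2^-52 is the successor of 1
print(1.0 + 2.0 ** -53 == 1.0)                 # True: the midpoint rounds back to 1
print(sys.float_info.min, sys.float_info.max)  # 2.2250738585072014e-308 1.7976931348623157e+308
```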
In modelling the IEEE double precision standard, one assumes that there is neither underflow nor overflow, and that zero is included. Designate this set by $F$. Then $fl : \mathbb{R} \to F$ maps real numbers onto $F$ such that
$$fl(x) = x(1+\delta),$$
where $|\delta| \le \epsilon_{\text{machine}}$, with $\epsilon_{\text{machine}} = 2^{-53} \approx 1.11\cdot10^{-16}$ for the IEEE double precision standard. This is how the computer is regarded to round real numbers; it retains about 16 correct digits.
Once numbers are rounded, we can perform elementary arithmetic operations with them: sum, difference, product and quotient. Let $\circledast$ designate such an operation. If $x, y \in F$, it is assumed that the floating point arithmetic for computing $x \circledast y \in \mathbb{R}$ yields
$$(x \circledast y)(1+\delta) \in F,$$
where $|\delta| \le \epsilon_{\text{machine}}$.
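The rounding model for the four elementary operations can be checked empirically. Here is a minimal Python sketch, assuming IEEE double `float`s with round-to-nearest. The exact result on the stored operands is computed with `fractions.Fraction` and compared with the hardware result. Random sampling only illustrates the bound; it does not prove it.

```python
import random
from fractions import Fraction

EPS_MACHINE = Fraction(1, 2 ** 53)   # 2^-53 for IEEE double precision

def rel_error(x, y, op):
    """|delta| with fl(x op y) = (x op y)(1 + delta), evaluated exactly."""
    exact = op(Fraction(x), Fraction(y))     # exact result for the stored doubles
    computed = Fraction(op(x, y))            # result rounded by the hardware
    return abs(computed - exact) / abs(exact)

ops = [lambda a, b: a + b, lambda a, b: a - b, lambda a, b: a * b, lambda a, b: a / b]
random.seed(1)
worst = Fraction(0)
for _ in range(2000):
    x, y = random.uniform(0.1, 1.0), random.uniform(2.0, 10.0)   # x != y, so x - y != 0
    worst = max(worst, max(rel_error(x, y, op) for op in ops))
print(float(worst), worst <= EPS_MACHINE)    # a value below 1.12e-16, then True
```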
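Using this model, it can be shown that if $\tilde{L}$ and $\tilde{U}$ are the factors computed in finite precision arithmetic, then the standard LU factorization satisfies
$$\tilde{L}\tilde{U} = A + \delta A, \quad\text{where}\quad \frac{\|\delta A\|}{\|L\|\,\|U\|} = O(\epsilon_{\text{machine}}).$$
The partially pivoted LU factorization satisfies
$$\tilde{L}\tilde{U} = PA + \delta A, \quad\text{where}\quad \frac{\|\delta A\|}{\|A\|} = O(\rho\,\epsilon_{\text{machine}}).$$
Here $\rho = \dfrac{\max_{i,j}|u_{ij}|}{\max_{i,j}|a_{ij}|}$ is the so-called growth factor.¹¹ Examples 13 and 14 are clearly in line with these results.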
7 Using the structure: Cholesky factorization, Sylvester equation and FFT
We now know that a linear system (1) can be solved reliably at the cost of $O(n^3)$ flops, with $n^2$ numbers stored. This is the worst case in the sense that it can only be improved upon. (With a linear system one should always ask: do I really need the LU factorization, or is there a better way?) An improvement requires that the linear system has some special structure, which is often the case in applications. We illustrate this with three very different examples: the Cholesky factorization, the Sylvester equation and the use of the FFT.
The Cholesky factorization replaces the LU factorization if the matrix is positive definite. It is simply the LU factorization computed by keeping in mind that $A$ is positive definite.
Definition 11 A matrix $A \in \mathbb{C}^{n\times n}$ is positive definite if it is Hermitian and satisfies
$$(Ax, x) > 0 \tag{35}$$
for any nonzero $x \in \mathbb{C}^n$.
Problems related to energy minimization typically involve positive definite matrices. (In physics, there is no lack of such problems!)
Problem 19 Show that a Hermitian matrix is positive definite if and only if its eigenvalues are strictly positive. Moreover, show that if $A$ is positive definite and $M$ is invertible, then $MAM^*$ is positive definite.
¹¹ See Numerical Linear Algebra, L.N. Trefethen and D. Bau, III, SIAM, 1997.
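Suppose a positive definite $A \in \mathbb{C}^{n\times n}$ is split as $A = \begin{bmatrix}a_{11} & a^*\\ a & B\end{bmatrix}$ with $a \in \mathbb{C}^{n-1}$. Use condition (35) with $x = e_1$ to conclude that $a_{11} > 0$. Then we have
$$A = \begin{bmatrix}\alpha & 0\\ a/\alpha & I\end{bmatrix}\begin{bmatrix}1 & 0\\ 0 & B - aa^*/a_{11}\end{bmatrix}\begin{bmatrix}\alpha & a^*/\alpha\\ 0 & I\end{bmatrix} = R_1A_1R_1^* \tag{36}$$
with $\alpha = \sqrt{a_{11}}$. The middle factor $A_1$ is positive definite by Problem 19. Therefore $B - aa^*/a_{11}$ is positive definite as well. To see this, use condition (35) for $A_1$ with vectors $x$ whose first entry is zero. Consequently, the trick of (36) can be repeated with the block $B - aa^*/a_{11}$. Once completed, we have
$$A = R_1R_2\cdots R_n\,I\,R_n^*\cdots R_2^*R_1^* = RR^*,$$
the so-called Cholesky factorization of $A$. Since only the triangular matrix $R$ is needed (lower triangular with the construction above, so that $R^*$ is upper triangular), the Cholesky factorization requires storing about $n^2/2$ numbers.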
Problem 20 Show that the Cholesky factorization can be computed twice as fast as the standard LU factorization, i.e., that it requires $\frac13n^3$ flops to compute the Cholesky factorization of a positive definite $A \in \mathbb{C}^{n\times n}$.
It is quite remarkable that no partial pivoting is needed in computing the Cholesky factorization. The reason is that the operator norm of the factor $R$ can be elegantly controlled in terms of that of $A$. Namely, if $R = U\Sigma V^*$ is the SVD of $R$, then $A = U\Sigma^2U^*$ is the SVD of $A$, so that $\|R\|^2 = \|A\|$ in the operator 2-norm. (Compare with Example 13, where the matrix is not positive definite for small $\epsilon$.)
Next we consider solving the so-called Sylvester equation, which appears in control theory and stability analysis. To this end, assume $A, B, C \in \mathbb{C}^{n\times n}$ are given. The task is to find a matrix $X \in \mathbb{C}^{n\times n}$ solving
$$AX - XB = C, \tag{37}$$
which is called the Sylvester equation. The linear operator associated with the Sylvester equation is
$$X \mapsto AX - XB \tag{38}$$
from $\mathbb{C}^{n\times n}$ to $\mathbb{C}^{n\times n}$. Hence (37) could be written as a standard linear system of size $n^2$-by-$n^2$. Then the LU factorization would consume $\frac23n^6$ flops to solve the problem, which is not very attractive. It turns out that the problem can be solved numerically reliably using only $O(n^3)$ flops, without factoring the linear operator (38). To this end we need a similarity transformation called the Schur decomposition.
Theorem 2 Let $A \in \mathbb{C}^{n\times n}$. Then there exist a unitary $Q \in \mathbb{C}^{n\times n}$ and an upper triangular $T \in \mathbb{C}^{n\times n}$ such that $A = QTQ^*$.
Proof. The following construction is not algorithmic.
Because the characteristic polynomial has (by the fundamental theorem of algebra) at least one zero, $A$ has at least one eigenvector $q_1 \in \mathbb{C}^n$. Suppose it is of unit length and $Q_1 \in \mathbb{C}^{n\times n}$ is unitary having $q_1$ as its first column. Then $A = Q_1T_1Q_1^*$, where the first column of $T_1$ consists of zeros except that the first entry is the eigenvalue $\lambda_1$ corresponding to $q_1$.
This idea is repeated with the lower right $(n-1)$-by-$(n-1)$ block of $T_1$. We obtain $A = Q_1Q_2T_2Q_2^*Q_1^*$, where the first column of $T_2$ equals the first column of $T_1$. The second column of $T_2$ consists of zeros except for the first two entries.
Continue this construction to have
$$A = Q_1\cdots Q_{n-1}\,T\,Q_{n-1}^*\cdots Q_1^*,$$
where $T = T_{n-1}$ and $Q = Q_1\cdots Q_{n-1}$. End of proof.
Observe that the eigenvalues of $A$, denoted by $\sigma(A)$, can be found on the diagonal of $T$.
If $A$ is Hermitian, then $T$ is diagonal with real entries. More generally, if $T$ is diagonal, then $A$ is said to be normal.
By the same reasoning, there exists a factorization of $A$ as $A = QTQ^*$ with a unitary $Q$ and a lower triangular $T$. (Or, use the theorem for $A^*$ and then Hermitian transpose the result.)
The above construction is not algorithmic since computing the zeros of a polynomial is numerically a very tough problem. Still, the Schur decomposition of a matrix can be computed numerically reliably using $O(n^3)$ flops. The algorithm is the so-called QR iteration, and this is how the eigenvalue problem is solved in practice.
Problem 21 It seems that finding the zeros of the characteristic polynomial gives you the eigenvalues. In reality it goes the other way. To see this, consider a monic polynomial $p(z) = z^3 + a_2z^2 + a_1z + a_0$. How are its zeros related to the eigenvalues of the matrix
$$A = \begin{bmatrix}0 & 0 & -a_0\\ 1 & 0 & -a_1\\ 0 & 1 & -a_2\end{bmatrix},$$
called the companion matrix of $p$? From this you can guess what the companion matrix of $p(z) = z^m + a_{m-1}z^{m-1} + \cdots + a_1z + a_0$ is.
To solve the Sylvester equation numerically reliably, we assume having computed the Schur decompositions of $A$ and $B$ as
$$A = Q_1TQ_1^* \quad\text{and}\quad B = Q_2SQ_2^*,$$
where $T$ is lower and $S$ is upper triangular. Then (37) is equivalent to
$$TY - YS = D \tag{39}$$
with $Y = Q_1^*XQ_2$ and $D = Q_1^*CQ_2$. Since $T$ and $S$ are lower and upper triangular, the system can be solved by substitution. This proceeds column by column, starting from the first column:
$$\begin{aligned}(t_{11} - s_{11})y_{11} &= d_{11}\\ (t_{22} - s_{11})y_{21} + t_{21}y_{11} &= d_{21}\\ (t_{33} - s_{11})y_{31} + t_{32}y_{21} + t_{31}y_{11} &= d_{31}\\ &\ \,\vdots\end{aligned}$$
These have a unique solution for any first column of $D$ if and only if the eigenvalue $s_{11}$ of $B$ is not an eigenvalue of $A$. Having these solved, move on to the second column:
$$\begin{aligned}(t_{11} - s_{22})y_{12} - s_{12}y_{11} &= d_{12}\\ (t_{22} - s_{22})y_{22} - s_{12}y_{21} + t_{21}y_{12} &= d_{22}\\ (t_{33} - s_{22})y_{32} - s_{12}y_{31} + t_{32}y_{22} + t_{31}y_{12} &= d_{32}\\ &\ \,\vdots\end{aligned}$$
These have a unique solution for any second column of $D$ if and only if the eigenvalue $s_{22}$ of $B$ is not an eigenvalue of $A$.
These arguments can be repeated to obtain the following result.
Theorem 3 Let $A, B \in \mathbb{C}^{n\times n}$. Then (37) has a unique solution for any $C \in \mathbb{C}^{n\times n}$ if and only if $\sigma(A) \cap \sigma(B) = \emptyset$.
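The substitution for (39) can be sketched in Python as below. This covers only the triangular solve. It assumes the triangular factors $T$ (lower) and $S$ (upper) and the right-hand side $D$ are already available, for example from a library Schur routine that is not shown; the solution of (37) is then $X = Q_1YQ_2^*$. Each $y_{ij}$ comes from $(t_{ii} - s_{jj})y_{ij} = d_{ij} - \sum_{k<i}t_{ik}y_{kj} + \sum_{k<j}y_{ik}s_{kj}$, which is the pattern of the column equations above. The function name is ours.

```python
def solve_triangular_sylvester(T, S, D):
    """Solve T Y - Y S = D, with T lower and S upper triangular (equation (39)).

    Column-by-column substitution in O(n^3) operations; requires
    t_ii != s_jj for all i, j, i.e. disjoint spectra as in Theorem 3."""
    n = len(T)
    Y = [[0j] * n for _ in range(n)]
    for j in range(n):                 # columns of Y, left to right
        for i in range(n):             # rows, top to bottom
            rhs = D[i][j]
            rhs -= sum(T[i][k] * Y[k][j] for k in range(i))   # earlier rows, same column
            rhs += sum(Y[i][k] * S[k][j] for k in range(j))   # same row, earlier columns
            Y[i][j] = rhs / (T[i][i] - S[j][j])
    return Y

T = [[1, 0, 0], [2, 3, 0], [1, -1, 5]]        # lower triangular, eigenvalues 1, 3, 5
S = [[-1, 2, 1], [0, -2, 4], [0, 0, -4]]      # upper triangular, eigenvalues -1, -2, -4
D = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Y = solve_triangular_sylvester(T, S, D)
```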
Problem 22 Assume $A$ and $B$ are diagonalizable and you manage to diagonalize them numerically reliably as $A = X_1D_1X_1^{-1}$ and $B = X_2D_2X_2^{-1}$. How can you use this to solve the Sylvester equation (37)?
The third example we deal with is the fast Fourier transform (FFT), originating from numerical Fourier analysis. It is an algorithm encompassing many ideas and principles which are important more generally in matrix computations. Recall that the Fourier coefficients of a sufficiently regular function $f : [0,1] \to \mathbb{C}$ are defined by
$$c_j = \int_0^1 f(t)e^{-2\pi jti}\,dt \tag{40}$$
for $j \in \mathbb{Z}$. Then $f(t) = \sum_{j=-\infty}^{\infty}c_je^{2\pi jti}$ holds (in some sense). Initially, the Fourier transform was used in solving partial differential equations. Since then it (and its many variants) has found use in many applications, most notably in signal processing.
In practice, only a finite number of the Fourier coefficients can be computed, and the computation of a single Fourier coefficient relies on numerical integration. (Only rarely does one have an $f$ which allows integration in closed form. Also, it is possible that $f$ can only be sampled at a finite set of points.) Since the computation of the Fourier coefficients is linear,¹² the numerical problem turns into a problem in matrix computations.
In the numerical computation of (40), the interval $[0,1]$ is divided into $n$ subintervals. Thereafter the integral is approximated with a Riemann sum. These approximations can be expressed in terms of a matrix-vector product. For $n = 1, 2, 3, \ldots$ the associated matrices are
$$F_1 = [1],\quad F_2 = \begin{bmatrix}1 & 1\\ 1 & -1\end{bmatrix},\quad F_3 = \begin{bmatrix}1 & 1 & 1\\ 1 & w_3 & w_3^2\\ 1 & w_3^2 & w_3^4\end{bmatrix},\ \ldots,$$
where $w_n = e^{-2\pi i/n}$, i.e., an $n$th complex root of 1. For $F_n = \{f_{jk}\} \in \mathbb{C}^{n\times n}$ the $(j,k)$-entry is
$$f_{jk} = e^{-2\pi i(j-1)(k-1)/n}.$$
This is called the Fourier matrix. Applying $\frac1nF_n$ to a vector $x \in \mathbb{C}^n$ corresponds to computing a numerical approximation to the Fourier coefficients $c_0, \ldots, c_{n-1}$ of $f$, where $x$ consists of the values of $f$ sampled at the grid points $0, 1/n, \ldots, (n-1)/n$. The operation is called the discrete Fourier transform (DFT) of $x$.
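The Riemann-sum approximation can be sketched directly from the definition of $F_n$. Here is a minimal Python sketch with indices running from 0, so that $f_{jk} = w_n^{jk}$. The test function $f(t) = 0.5 + e^{2\pi\cdot3ti}$ is our choice; its only nonzero Fourier coefficients are $c_0 = 0.5$ and $c_3 = 1$.

```python
import cmath

def dft(x):
    """Return F_n x, where f_jk = w_n^(j k), w_n = exp(-2 pi i / n), indices from 0.

    Direct matrix-vector product: about 2 n^2 flops."""
    n = len(x)
    w = cmath.exp(-2j * cmath.pi / n)
    return [sum(w ** (j * k) * x[k] for k in range(n)) for j in range(n)]

# Sample f(t) = 0.5 + exp(2 pi i 3 t) at t = 0, 1/n, ..., (n-1)/n.
n = 8
x = [0.5 + cmath.exp(2j * cmath.pi * 3 * k / n) for k in range(n)]
c = [v / n for v in dft(x)]     # Riemann-sum approximations of c_0, ..., c_{n-1}
# c[0] is close to 0.5 and c[3] close to 1; all other entries are close to 0
```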
Problem 23 The Fourier matrix is a so-called Vandermonde matrix, defined as follows. Let $z_0, \ldots, z_{n-1} \in \mathbb{C}$ be distinct. Then the associated Vandermonde matrix is
$$V(z_0, \ldots, z_{n-1}) = \begin{bmatrix}1 & z_0 & z_0^2 & \cdots & z_0^{n-1}\\ 1 & z_1 & z_1^2 & \cdots & z_1^{n-1}\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & z_{n-1} & z_{n-1}^2 & \cdots & z_{n-1}^{n-1}\end{bmatrix}.$$
Consider an interpolation problem: Let $c_0, \ldots, c_{n-1} \in \mathbb{C}$ be given. Find a polynomial $p$ of degree at most $n-1$ such that $p(z_j) = c_j$ for $j = 0, \ldots, n-1$. How can you solve this problem with the help of $V(z_0, \ldots, z_{n-1})$?
Hence, $F_n = V(1, w_n, w_n^2, \ldots, w_n^{n-1})$.
¹² We have $\int_0^1(\alpha f(t) + \beta g(t))e^{-2\pi jti}\,dt = \alpha\int_0^1 f(t)e^{-2\pi jti}\,dt + \beta\int_0^1 g(t)e^{-2\pi jti}\,dt$ for any $\alpha, \beta \in \mathbb{C}$.
Proposition 3 Let $F_n$ be the Fourier matrix. Then $F_n$ is complex symmetric and satisfies $F_n^*F_n = nI$.
Proof. It is clear that $F_n$ is complex symmetric. To show that $F_n^*F_n = nI$, index the rows and columns from $0$ to $n-1$. The $(j,k)$-entry of $F_n^*F_n$ is
$$s_{jk} = \sum_{l=0}^{n-1}\overline{w_n^{jl}}\,w_n^{kl} = \sum_{l=0}^{n-1}w_n^{(k-j)l}.$$
For $j = k$ we have $s_{jk} = n$. Otherwise, compute the geometric sum in the standard way to have
$$s_{jk} - w_n^{k-j}s_{jk} = 1 - w_n^{(k-j)n} = 0,$$
since $w_n^{(k-j)n} = 1$. Because $w_n^{k-j} \ne 1$ for $0 < |k-j| < n$, this gives $s_{jk} = 0$. End of proof.
This means that $\frac{1}{\sqrt n}F_n$ is unitary. (Because of this, $\frac{1}{\sqrt n}F_n$ is sometimes called the Fourier matrix.) And $F_n^{-1} = \frac1nF_n^* = \frac1n\overline{F_n}$ because of complex symmetry. The Fourier matrix deserves all this special attention only because it appears so often in applications.
The FFT is an algorithm to perform matrix-vector products rapidly with the Fourier matrix when $n = 2^l$ for $l \in \mathbb{N}$. In other words, the DFT, i.e., the numerical computation of the Fourier coefficients, can be done very fast. Bear in mind that for an arbitrary matrix $A \in \mathbb{C}^{n\times n}$, the matrix-vector product can be expected to require about $2n^2$ flops ($n^2$ multiplications and $n(n-1)$ sums). For certain values of $n$, this can be done much faster with $F_n$.
Assume thus $n = 2^l$ and denote $m = \frac n2$. Then
$$F_n = \begin{bmatrix}I & D\\ I & -D\end{bmatrix}\begin{bmatrix}F_m & 0\\ 0 & F_m\end{bmatrix}P, \tag{41}$$
where $D$ is a diagonal matrix and $P$ a permutation matrix defined as follows. We have $D = \operatorname{diag}(1, w_n, \ldots, w_n^{m-1})$ and $P$ acts as
$$P\,[x_0\ \cdots\ x_{n-1}]^T = [x_0\ x_2\ x_4\ \cdots\ x_{n-2}\ x_1\ x_3\ x_5\ \cdots\ x_{n-1}]^T,$$
i.e., $P$ collects first the even components and thereafter the odd components of $x$. Before showing that (41) holds, let us do the operation count for a matrix-vector product with the right-hand side of (41).
We assume that forming $Px$ is free. Thereafter applying $F_m$ twice costs $2\left(2\left(\frac n2\right)^2\right) = n^2$ flops. Then the last matrix-vector product costs $2n$ flops when applied taking into account its structure, i.e., zeros are ignored in applying the diagonal matrices in the blocks. In all, this gives $n^2 + 2n$ flops, which is of course less than $2n^2$.
This idea can be used again. There is nothing we can do about these $2n$ flops. However, applying the same trick to both copies of $F_m$ replaces $n^2$ with $\frac12n^2 + 2n$. This can be repeated $\log_2 n$ times. Thereby we consume just $2n\log_2 n = 2nl$ flops in all. This is much less than $2n^2$, the cost of computing $F_nx$ by performing the matrix-vector product directly.¹³
¹³ Sometimes only multiplications are counted, since on a computer they take longer than sums. Then there are other constants instead of 2 in front.
It remains to show (41). Let $y = F_nx$. Then, by separating the components of $x$ into even and odd,
$$y_j = \sum_{k=0}^{n-1}w_n^{kj}x_k = \sum_{k=0}^{m-1}w_n^{2kj}x_{2k} + \sum_{k=0}^{m-1}w_n^{(2k+1)j}x_{2k+1}.$$
Since $w_n^2 = w_m$, we thus have
$$y_j = \sum_{k=0}^{m-1}w_m^{kj}x_k' + w_n^j\sum_{k=0}^{m-1}w_m^{kj}x_k'' \tag{42}$$
by denoting $x' = [x_0\ x_2\ x_4\ \cdots\ x_{n-2}]^T$ and $x'' = [x_1\ x_3\ x_5\ \cdots\ x_{n-1}]^T$. Set $y' = F_mx'$ and $y'' = F_mx''$. Then (42) can be written as
$$y_j = y_j' + w_n^jy_j'' \tag{43}$$
and
$$y_{j+m} = y_j' - w_n^jy_j'' \tag{44}$$
for $j = 0, \ldots, m-1$. (Use $w_m^{km} = 1$ and $w_n^m = -1$ to have (44).) This is the factorization (41) written componentwise.
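The recursion in (43) and (44) translates into a short recursive FFT. Here is a minimal Python sketch, assuming $n$ is a power of 2 (this is not checked). Production code would use an iterative in-place variant or a library FFT. The inverse is applied through $F_n^{-1}x = \frac1n\overline{F_n\bar{x}}$, so it costs the same as the forward transform.

```python
import cmath

def fft(x):
    """Radix-2 FFT: returns F_n x for len(x) = n a power of 2, via (43) and (44)."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]                 # F_1 = [1]
    m = n // 2
    y_even = fft(x[0::2])                      # y'  = F_m x'  (even-indexed entries)
    y_odd = fft(x[1::2])                       # y'' = F_m x'' (odd-indexed entries)
    y = [0j] * n
    for j in range(m):
        t = cmath.exp(-2j * cmath.pi * j / n) * y_odd[j]   # w_n^j y''_j
        y[j] = y_even[j] + t                   # (43)
        y[j + m] = y_even[j] - t               # (44)
    return y

def inverse_fft(y):
    """F_n^{-1} y = (1/n) conj(F_n conj(y)), at the same O(n log n) cost."""
    n = len(y)
    return [v.conjugate() / n for v in fft([complex(u).conjugate() for u in y])]
```

Comparing `fft(x)` with the direct sum $\sum_k w_n^{jk}x_k$ for a short vector is a simple check of the recursion.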
Observe that the inverse of the Fourier matrix can be applied with the same speed by the fact that
$$F_n^{-1}x = \frac1n\overline{F_n}\,x = \frac1n\overline{F_n\bar{x}}.$$
Aside from numerical Fourier analysis, the Fourier matrix appears in connection with Toeplitz matrices. (See Example 3.) This is best illustrated with the circulant matrices, a subset of Toeplitz matrices. Circulant matrices are conveniently defined with the help of the permutation matrix
$$P = \begin{bmatrix}0 & 0 & \cdots & 0 & 1\\ 1 & 0 & \cdots & 0 & 0\\ 0 & 1 & \cdots & 0 & 0\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & 1 & 0\end{bmatrix}. \tag{45}$$
Problem 24 If you have a Hermitian matrix, you know its eigenvalues are
located on the real axis. (You can see this by using the Schur decomposition.)
Since P is a permutation, it is also unitary. Using the Schur decomposition,
where are the eigenvalues of P located now?
Definition 12 Let $P$ be the permutation (45). Then $K_{n-1}(P; I)$ is the set of circulant matrices.
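Let $p(z) = \sum_{j=0}^{n-1}c_jz^j$ be a polynomial. Then the polynomial $p$ in a matrix $A \in \mathbb{C}^{n\times n}$ is defined to be the matrix $p(A) = \sum_{j=0}^{n-1}c_jA^j$. In the case of $P$ we have
$$C = p(P) = \sum_{j=0}^{n-1}c_jP^j = \begin{bmatrix}c_0 & c_{n-1} & c_{n-2} & \cdots & c_1\\ c_1 & c_0 & c_{n-1} & \cdots & c_2\\ c_2 & c_1 & c_0 & \cdots & c_3\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ c_{n-1} & c_{n-2} & \cdots & c_1 & c_0\end{bmatrix}, \tag{46}$$
illustrating the matrix structure of circulant matrices.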
The columns of the Fourier matrix are eigenvectors of circulant matrices.
Problem 25 Let $F_n$ be the Fourier matrix and $P \in \mathbb{C}^{n\times n}$ the permutation matrix (45). Find the diagonal matrix $\Lambda$ satisfying
$$PF_n = F_n\Lambda, \tag{47}$$
i.e., determine the eigenvalues of $P$ by hand.
Denote by $Q$ the unitary matrix $Q = \frac{1}{\sqrt n}F_n$ and consider the circulant matrix $C$ in (46). Because of (47), $C$ is normal, i.e., unitarily similar to a diagonal matrix. This follows from
$$Q^*CQ = \sum_{j=0}^{n-1}c_jQ^*P^jQ = \sum_{j=0}^{n-1}c_j(Q^*PQ)^j = \sum_{j=0}^{n-1}c_j\Lambda^j. \tag{48}$$
Since the eigenvalues of $P$ are known (see Problem 25), this yields an $O(n^2)$ algorithm to determine the eigenvalues of a circulant matrix $C$: evaluate the corresponding polynomial $p$ at the eigenvalues of $P$. (This assumes that computing $p(\lambda)$ for any eigenvalue $\lambda$ costs $O(n)$ flops, which is realistic because the eigenvalues are so special.)
Problem 26 The polynomial trick (48) generalizes. That is, show that if $A \in \mathbb{C}^{n\times n}$ and $p$ is a polynomial, then
$$\sigma(p(A)) = p(\sigma(A)),$$
where $p(\sigma(A)) = \{p(\lambda) : \lambda \in \sigma(A)\}$. (Hint: Again the Schur decomposition.)
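If $n = 2^l$, then there is a much faster way to find the eigenvalues and hence diagonalize $C$. This relies on the FFT as follows. Write $\Lambda_C = \sum_{j=0}^{n-1}c_j\Lambda^j = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ for the diagonal matrix in (48). From the identity
$$F_n^{-1}C = \Lambda_CF_n^{-1},$$
consider the first columns on both sides. Since the first column of $F_n^{-1} = \frac1n\overline{F_n}$ is $\frac1n[1\ \cdots\ 1]^T$, we obtain
$$\overline{F_n}\,[c_0\ c_1\ \cdots\ c_{n-1}]^T = [\lambda_1\ \lambda_2\ \cdots\ \lambda_n]^T.$$
Consequently, the eigenvalues require computing a single matrix-vector product with $\overline{F_n}$, i.e., with $F_n$ up to complex conjugation. Hence, by invoking the FFT, the eigenvalues of $C$ can be found at the cost of $O(n\log_2 n)$ flops. Moreover, a linear system
$$Cx = b \tag{49}$$
with $b \in \mathbb{C}^n$ can then be solved in $O(n\log_2 n)$ flops by using the diagonalization $C = F_n\Lambda_CF_n^{-1}$ and the FFT. That is, $x = F_n\Lambda_C^{-1}F_n^{-1}b$.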
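Here is a minimal Python sketch of the fast circulant solver. It assumes $n = 2^l$ and that $C$ is nonsingular, so no eigenvalue is zero. The circulant matrix is represented only by its first column $c = [c_0\ \cdots\ c_{n-1}]^T$. The eigenvalues are computed as $\overline{F_n\bar c}$, which equals $\overline{F_n}c$ and orders them like the columns of $F_n$. The solution is then $x = F_n\Lambda^{-1}F_n^{-1}b$. The helper names are ours.

```python
import cmath

def fft(x):
    """Radix-2 FFT: returns F_n x with w_n = exp(-2 pi i / n); n a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * j / n) * odd[j] for j in range(n // 2)]
    return [even[j] + tw[j] for j in range(n // 2)] + [even[j] - tw[j] for j in range(n // 2)]

def conj_all(v):
    return [z.conjugate() for z in v]

def circulant_eigenvalues(c):
    """Eigenvalues of C = sum_j c_j P^j, ordered like the columns of F_n."""
    return conj_all(fft(conj_all(c)))                 # conj(F_n) c

def solve_circulant(c, b):
    """Solve C x = b via C = F_n Lambda F_n^{-1}, with F_n^{-1} = conj(F_n)/n."""
    n = len(c)
    lam = circulant_eigenvalues(c)
    z = [v / n for v in conj_all(fft(conj_all(b)))]   # F_n^{-1} b
    return fft([z[k] / lam[k] for k in range(n)])     # F_n Lambda^{-1} F_n^{-1} b

c, b = [4, 1, 0, 2], [1, 2, 3, 4]
x = solve_circulant(c, b)   # O(n log n): three FFTs and n divisions
```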
Problem 27 Suppose $T \in \mathbb{C}^{k\times k}$ is a Toeplitz matrix. Devise a method to perform matrix-vector products with $T$ fast, in two steps. First construct a circulant matrix $C \in \mathbb{C}^{n\times n}$, with $n = 2^l$ large enough, containing $T$ as a block. Then perform matrix-vector products appropriately with $C$ to obtain matrix-vector products with $T$.
There are also fast algorithms for solving linear systems involving Toeplitz matrices [1].
8 Eigenvalue problems and functions of matrices
Next we are concerned with the eigenvalue problem
$$Ax = \lambda x \tag{50}$$
for a given $A \in \mathbb{C}^{n\times n}$. We also say something about the generalized eigenvalue problem (16).
Compared with solving linear systems, the eigenvalue problem is of a different nature since it is not solvable through a finite computation. This follows from the fact that finding eigenvalues and finding zeros of polynomials are equivalent problems. (See Problem 21.) Abel's theorem on finding zeros of polynomials states that, in general, the zeros cannot be expressed exactly with radicals¹⁴ in terms of the coefficients of the polynomial if the degree of the polynomial is five or larger.
Because of this negative result, all numerical methods for solving the eigenvalue problem are iterative and are stopped once the approximations achieve some degree of accuracy. Of course, for computers, IEEE double precision accuracy would certainly suffice. However, if there are eigenvalues of very different magnitude, it is unlikely to be attained for all the eigenvalues.
¹⁴ Repeatedly forming sums, differences, products, quotients, and radicals ($n$th roots, for some integer $n$) of previously obtained numbers.
In practice, there are two types of eigenvalue problems. Either all the eigenvalues and perhaps the corresponding eigenvectors need to be computed, or only the eigenvalues (and perhaps the corresponding eigenvectors) located in some region of the complex plane are of interest. Techniques for the latter problem are described in connection with iterative methods.
Example 15 Why might only a few eigenvalues be of interest? In quantum mechanics one deals with Hermitian operators. The energy levels of a system are given by the eigenvalues. One is often interested in the lowest energy levels, i.e., in finding the smallest eigenvalues of the matrix obtained after discretizing the problem.
Before outlining how the full eigenvalue problem is very successfully solved with the so-called QR iteration, we mention some inclusion results. An inclusion region for the eigenvalues is a subset of $\mathbb{C}$ which is known to include $\sigma(A)$. Such results were more important in the early days of numerical analysis, before reliable numerical methods for the eigenvalue problem existed. Their importance seems marginal at present.
Example 16 The eigenvalues of a Hermitian matrix are contained in the
real axis. The eigenvalues of a skew-Hermitian matrix are contained in the
imaginary axis.
The so-called Gershgorin discs of a matrix $A \in \mathbb{C}^{n\times n}$ are defined as follows. For $j = 1, \ldots, n$, set $R_j = \sum_{l\ne j}|a_{jl}|$. Then define
$$D_j = \{z \in \mathbb{C} : |a_{jj} - z| \le R_j\}. \tag{51}$$
Theorem 4 Let $A \in \mathbb{C}^{n\times n}$. Then there holds
$$\sigma(A) \subset \bigcup_{j=1}^{n}D_j,$$
where $D_j$ is the Gershgorin disc of $A$ defined in (51).
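Proof. Suppose $\lambda \in \sigma(A)$, so that (50) holds for a nonzero vector $x \in \mathbb{C}^n$. Choose $j$ such that $|x_j| \ge |x_l|$ for all $l = 1, \ldots, n$; then $x_j \ne 0$. The $j$th row of (50) reads $(\lambda - a_{jj})x_j = \sum_{l\ne j}a_{jl}x_l$, so that
$$|a_{jj} - \lambda| \le \sum_{l\ne j}|a_{jl}|\,\frac{|x_l|}{|x_j|} \le \sum_{l\ne j}|a_{jl}|,$$
and therefore $\lambda \in D_j$. End of proof.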
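Computing the discs (51) is straightforward. Here is a minimal Python sketch; the helper names are ours. The $2\times2$ example has eigenvalues $5$ and $2$, the roots of $z^2 - 7z + 10$.

```python
def gershgorin_discs(A):
    """Return (center, radius) of each disc D_j in (51): center a_jj and
    radius R_j = sum of |a_jl| over l != j (off-diagonal row sums)."""
    n = len(A)
    return [(A[j][j], sum(abs(A[j][l]) for l in range(n) if l != j)) for j in range(n)]

def in_union(z, discs):
    """True if z lies in at least one disc; by Theorem 4 every eigenvalue does."""
    return any(abs(z - c) <= r for c, r in discs)

A = [[4, 1], [2, 3]]
discs = gershgorin_discs(A)
print(discs)                                   # [(4, 1), (3, 2)]
print(in_union(5, discs), in_union(2, discs))  # True True
```

The eigenvalue $2$ lies only in the second disc. This illustrates that the theorem says nothing about which disc contains which eigenvalue.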
First, we do not claim that every $D_j$ contains an eigenvalue. Second, this so-called Gershgorin theorem is basis dependent. (That is, the linear operator corresponding to $A$, represented in another basis, gives rise to different Gershgorin discs.) This can be exploited. Before there were numerical methods to solve the eigenvalue problem, various tricks using similarity transformations were invented to squeeze more information out of Gershgorin's theorem. The goal: sharp information is obtained in case $A$ is diagonal.
Another inclusion region for the eigenvalues is the so-called field of values of $A \in \mathbb{C}^{n\times n}$, defined as
$$F(A) = \{x^*Ax : \|x\| = 1\}.$$
Occasionally, the field of values is also called the numerical range of $A$. It contains the eigenvalues of $A$ (easy) and is convex (not so easy). It also has some other uses. However, the computation of $F(A)$ is tedious and based on using the following two observations repeatedly. First, $F(\alpha A) = \alpha F(A)$ for any $\alpha \in \mathbb{C}$. Second, we have $A = H + iK$ with Hermitian
$$H = \tfrac12(A + A^*) \quad\text{and}\quad K = \tfrac{1}{2i}(A - A^*).$$
Then $x^*Ax = x^*Hx + i\,x^*Kx$ gives the real and imaginary parts of $x^*Ax$. In particular, finding $\min_{\|x\|=1}x^*Hx$ and $\max_{\|x\|=1}x^*Hx$ gives the left and right vertical extremes of $F(A)$. (This means finding the two extreme eigenvalues of $H$.) Repeat this with $e^{i\theta}A$ for a finite number of $\theta \in [0, 2\pi)$.
To get an idea of how the Schur decomposition is computed, consider first the power method to approximate an eigenvalue of $A$ of largest modulus. Starting with an initial guess $q^{(0)}$, the so-called power method proceeds as
$$\begin{aligned}&\textbf{for } k = 1, 2, \ldots\\ &\qquad z^{(k)} = Aq^{(k-1)}\\ &\qquad q^{(k)} = z^{(k)}/\|z^{(k)}\|\\ &\qquad \lambda^{(k)} = (Aq^{(k)}, q^{(k)})\\ &\textbf{end}\end{aligned} \tag{52}$$
Hence, $A$ is repeatedly applied to $q^{(0)}$ and the result is scaled to keep the iterates of unit length.
To see when the power method can be expected to converge, assume $A$ is diagonalizable as $A = X\Lambda X^{-1}$ with $X = [x_1\ x_2\ \cdots\ x_n]$. Suppose $|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n|$. We also assume the starting vector to be such that $a_1 \ne 0$ in the expansion
$$q^{(0)} = a_1x_1 + \cdots + a_nx_n. \tag{53}$$
(If $q^{(0)}$ is randomly chosen, then this holds with probability one.) Then, without scaling, we have after $k$ steps
$$A^kq^{(0)} = a_1\lambda_1^k\Big(x_1 + \sum_{j=2}^{n}\Big(\frac{\lambda_j}{\lambda_1}\Big)^k\frac{a_j}{a_1}x_j\Big).$$
Hence, $q^{(k)}$ expanded in the basis $x_1, \ldots, x_n$ as $q^{(0)}$ in (53) is dominantly in the direction of $x_1$, in the sense that the remaining components are $O\big((|\lambda_j|/|\lambda_1|)^k\big)$.
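The iteration (52) and the convergence estimate can be tried out in a few lines. Here is a minimal Python sketch for real matrices, where the inner product is the ordinary dot product. The symmetric tridiagonal example is our choice; it has eigenvalues $2 - \sqrt2$, $2$ and $2 + \sqrt2$, so the error in $q^{(k)}$ decays like $\big(2/(2+\sqrt2)\big)^k \approx 0.59^k$.

```python
import math

def matvec(A, x):
    """Return A x for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def power_method(A, q0, steps=100):
    """Power method (52) for a real matrix; returns (lambda^(k), q^(k))."""
    q = list(q0)
    lam = 0.0
    for _ in range(steps):
        z = matvec(A, q)                                    # z^(k) = A q^(k-1)
        norm = math.sqrt(sum(v * v for v in z))
        q = [v / norm for v in z]                           # q^(k) = z^(k) / ||z^(k)||
        lam = sum(a * b for a, b in zip(matvec(A, q), q))   # (A q^(k), q^(k))
    return lam, q

A = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
lam, q = power_method(A, [1.0, 0.0, 0.0])   # e_1 has a nonzero component a_1
print(lam, 2 + math.sqrt(2))                # both approximately 3.41421356237
```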
The Schur decomposition is computed, roughly, by executing the power method simultaneously with all the columns of $A$.
The computation of the Schur decomposition consists of two phases. Since the ideas involved are fundamental, we describe the main ingredients of the scheme. Both phases use Householder transformations
$$H = I - \frac{2}{v^*v}vv^*, \tag{54}$$
where $v \in \mathbb{C}^n$ is chosen in such a way that $Hx$ is in the direction of $e_1$ for a given $x \in \mathbb{C}^n$ (or an obvious modification of this). This is achieved by making an educated guess
$$v = x + \alpha e_1, \tag{55}$$
where $\alpha \in \mathbb{C}$ must be chosen so that $H$ has the desired property. This leads to $\alpha = \frac{(x,e_1)}{|(x,e_1)|}\|x\|$ if $(x,e_1) \ne 0$, and $\alpha = \|x\|$ if $(x,e_1) = 0$. Then $Hx = -\alpha e_1$. Note that $H$ is unitary and Hermitian.
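The choice of $\alpha$ in (55) can be sketched in Python as follows; the helper names are ours. $H$ is never formed explicitly. It is applied as $Hy = y - \frac{2}{v^*v}v(v^*y)$. Since $\alpha$ has the same phase as $x_1$, computing the first entry $x_1 + \alpha$ of $v$ involves no cancellation.

```python
import math

def householder_vector(x):
    """Return (v, alpha) with v = x + alpha e_1 as in (55), so that
    H = I - 2 v v^* / (v^* v) maps x to -alpha e_1. Assumes x != 0."""
    norm_x = math.sqrt(sum(abs(t) ** 2 for t in x))
    x1 = x[0]
    alpha = (x1 / abs(x1)) * norm_x if x1 != 0 else norm_x
    v = list(x)
    v[0] = x1 + alpha
    return v, alpha

def apply_householder(v, y):
    """Compute H y = y - 2 v (v^* y) / (v^* v) without forming H."""
    vy = sum(vi.conjugate() * yi for vi, yi in zip(v, y))
    vv = sum(abs(vi) ** 2 for vi in v)
    return [yi - 2 * vi * vy / vv for vi, yi in zip(v, y)]

v, alpha = householder_vector([3, 4])
print(alpha, apply_householder(v, [3, 4]))   # 5.0 [-5.0, 0.0]
```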
All this may sound simple. In a sense it is. However, the Householder trans-
formations are fundamental building blocks for more complicated algorithms.
The first phase of the computation of the Schur decomposition consists of the construction of $n-1$ Householder transformations to have a unitary matrix
$$Q_0 = H_{n-1}H_{n-2}\cdots H_1 \tag{56}$$
such that $Q_0AQ_0^*$ is a Hessenberg matrix.
Definition 13 A matrix $H \in \mathbb{C}^{n\times n}$ is a Hessenberg matrix if $h_{jk} = 0$ for $j \ge k+2$.
Hence, a Hessenberg matrix is almost upper triangular: it can have just one extra nonzero diagonal right below the main diagonal.
To obtain $H_1$ in (56), take $x = [0\ a_{21}\ a_{31}\ \cdots\ a_{n1}]^T$ in (55) and replace $e_1$ with $e_2$. Then form $H_1$ accordingly. As a result, once we form $H_1AH_1^* = \tilde A$, this matrix has its first column of the correct form. (An application of $H_1^* = H_1$ from the right does not affect the first column: denote $M = H_1A$. Then $MH_1 = M - \frac{2}{v^*v}Mvv^*$, and the first component of $v$ is zero.) Next, in (55) take $x = [0\ 0\ \tilde a_{32}\ \tilde a_{42}\ \cdots\ \tilde a_{n2}]^T$ and replace $e_1$ with $e_3$. Then form $H_2$ accordingly. As a result, once we form $H_2H_1AH_1^*H_2^* = \hat A$, this matrix has its first two columns of the correct form.
Once this is complete, we have the unitary matrix (56) such that $Q_0AQ_0^* = H$ is a Hessenberg matrix.
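Here is a minimal Python sketch of the first phase, assuming a square matrix given as nested lists. Step by step it overwrites a copy of $A$ with $H_j\cdots H_1AH_1^*\cdots H_j^*$, and it returns only the Hessenberg matrix, not $Q_0$. It uses $n-2$ reflectors. The last transformation in (56) acts on a single entry and is not needed for the Hessenberg structure.

```python
import math

def hessenberg(A):
    """Reduce A to Hessenberg form Q0 A Q0^* with Householder reflectors
    (as in (55), with e_1 replaced by e_{j+2}); returns the Hessenberg matrix."""
    n = len(A)
    M = [[complex(v) for v in row] for row in A]
    for j in range(n - 2):
        x = [M[i][j] for i in range(j + 1, n)]         # column j below the diagonal
        norm_x = math.sqrt(sum(abs(t) ** 2 for t in x))
        if norm_x == 0:
            continue                                    # nothing to eliminate
        alpha = (x[0] / abs(x[0])) * norm_x if x[0] != 0 else norm_x
        v = x[:]
        v[0] += alpha                                   # trailing part of v = x + alpha e_{j+2}
        vv = sum(abs(t) ** 2 for t in v)
        r = range(len(v))
        for k in range(n):                              # left: M <- H_j M (rows j+1..n-1)
            s = sum(v[i].conjugate() * M[j + 1 + i][k] for i in r)
            for i in r:
                M[j + 1 + i][k] -= 2 * v[i] * s / vv
        for row in M:                                   # right: M <- M H_j (columns j+1..n-1)
            s = sum(row[j + 1 + i] * v[i] for i in r)
            for i in r:
                row[j + 1 + i] -= 2 * s * v[i].conjugate() / vv
    return M

Hm = hessenberg([[4, 1, 2, 3], [2, 5, 1, 0], [1, 3, 6, 2], [3, 0, 2, 7]])
```

Since only unitary similarities are applied, the trace and the Frobenius norm of the matrix are preserved. This gives a cheap sanity check.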
Problem 28 Explain how at most $n-1$ Householder transformations are needed to compute the QR factorization of $A \in \mathbb{C}^{n\times n}$. (This is the numerically reliable way to compute the QR factorization.)
Next comes the second phase, whose purpose is to construct unitary similarity transformations (plus various shifts alongside to speed up the convergence and achieve $O(n^3)$ complexity) with the aim of making the subdiagonal arbitrarily small. The result is then an upper triangular matrix plus a perturbation so small that it can be discarded. This is the actual QR iteration. Once this is achieved, the iteration is stopped.
The method can be motivated by generalizing the power method. With $Q^{(0)} = I$, iterate according to

$$\begin{array}{l} \text{for } k = 1, 2, \ldots \\ \quad Z^{(k)} = H Q^{(k-1)} \\ \quad Z^{(k)} = Q^{(k)} R^{(k)} \\ \text{end} \end{array} \qquad (57)$$
This is a way to generalize the power method to matrices. Let us emphasize that the QR factorization of $Z^{(k)}$ is needed. Mere scaling by the norms of the columns would amount to running the same power method $n$ times. Recall that the QR factorization is the same as the Gram-Schmidt process started from the leftmost column. This means that everything in the directions of the columns orthogonalized so far is projected away as the algorithm proceeds. This is the key idea that makes different columns converge to different eigenvalues.
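Note that if we put

$$H_{k-1} = Q^{(k-1)*} H Q^{(k-1)} = Q^{(k-1)*}\big(HQ^{(k-1)}\big) = \big(Q^{(k-1)*}Q^{(k)}\big)R^{(k)},$$

then

$$H_k = Q^{(k)*} H Q^{(k)} = \big(Q^{(k)*} H Q^{(k-1)}\big)\big(Q^{(k-1)*}Q^{(k)}\big) = R^{(k)}\big(Q^{(k-1)*}Q^{(k)}\big),$$

where the last equality uses $Q^{(k)*} H Q^{(k-1)} = Q^{(k)*} Z^{(k)} = R^{(k)}$.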
This means that one can interpret the iteration (57) as follows: one computes the QR decomposition of $H_{k-1}$, and then $H_k$ is obtained by multiplying the factors in reverse order. Hence we can run the following QR iteration

$$\begin{array}{l} \text{for } k = 0, 1, 2, \ldots \\ \quad H_k = \tilde{Q}^{(k)} \tilde{R}^{(k)} \\ \quad H_{k+1} = \tilde{R}^{(k)} \tilde{Q}^{(k)} \\ \text{end} \end{array} \qquad (58)$$

instead. (Here we have $H_0 = H$.) It is this sequence $H_k$ which converges to an upper triangular matrix, under sufficient assumptions. In high quality mathematical software, there are many tricks to speed up the convergence of this basic version of the QR iteration. For the convergence, see [1, Chapter 7.3].
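The basic iteration (58) takes only a few lines. This sketch applies it without shifts or deflation and, for brevity, skips the Hessenberg phase (it works on any square matrix, only more expensively per step). The function name and the test matrix, symmetric with eigenvalues $5, 4, 3, 2, 1$, are our choices; well separated eigenvalue moduli are what make the unshifted iteration converge.

```python
import numpy as np

def unshifted_qr_iteration(H, num_iter=300):
    """Basic QR iteration (58): H_k = Q R, then H_{k+1} = R Q. No shifts, no deflation."""
    Hk = np.array(H, dtype=float)
    for _ in range(num_iter):
        Q, R = np.linalg.qr(Hk)
        Hk = R @ Q          # equals Q^T H_k Q, a unitary similarity
    return Hk
```

With ratio $|\lambda_{i+1}/\lambda_i| \le 4/5$, the subdiagonal decays roughly like $(4/5)^k$; shifts are what make this fast in practice.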
Problem 29 Suppose you have computed $Q$ and $T$ in the Schur decomposition of $A \in \mathbb{C}^{n\times n}$. At this point you know $\sigma(A)$. How do you compute an eigenvector associated with a given $\lambda_j \in \sigma(A)$?
In Problem 29 the eigenvectors are immediately available if $A$ is normal, i.e., when $T$ is a diagonal matrix. Then the eigenvalue problem can be solved numerically reliably, at least when the eigenvalues are distinct. In the nonnormal case, the eigenvalue problem can be extremely tough in finite precision arithmetic.
Problem 30 A reliable computation of the eigenvalues can be tough. Suppose you have performed the first phase of the QR iteration and computed $H$. Suppose your $H$ looks like $P$ in (45) except that the $(1, n)$ entry is $\epsilon$. You are not sure whether your $\epsilon$ is a result of a rounding error. If so, perhaps it should be replaced with zero. Compute the eigenvalues (with the help of Problem 21) to see whether it really makes any difference.
The generalized eigenvalue problem (16) is solved by computing the so-called generalized Schur decomposition.

Theorem 5 Let $A, B \in \mathbb{C}^{n\times n}$. Then there exist unitary $Q, Z \in \mathbb{C}^{n\times n}$ and upper triangular $T, S \in \mathbb{C}^{n\times n}$ such that $A = QTZ^*$ and $B = QSZ^*$.
Proof. The following construction is not algorithmic. It is a generalization of the proof of the existence of the Schur decomposition.

Consider the polynomial $p(\lambda) = \det(A - \lambda B)$. By the fundamental theorem of algebra, $p$ has at least one zero $\lambda$. Hence, there exists at least one generalized eigenvector $z_1 \in \mathbb{C}^n$ such that $Az_1 = \lambda Bz_1$. Suppose it is of unit length and $Z_1 \in \mathbb{C}^{n\times n}$ is unitary having $z_1$ as its first column. Let $Q_1 \in \mathbb{C}^{n\times n}$ be unitary having $q_1 = Bz_1/\|Bz_1\|$ as its first column. (If $Bz_1 = 0$, then take any unit vector to be $q_1$.) Note that $Az_1$ is in the direction of $q_1$.
Then $A = Q_1 T_1 Z_1^*$, where $T_1$ has the first column consisting of zeros except the first entry. Similarly, $B = Q_1 S_1 Z_1^*$, where $S_1$ has the first column consisting of zeros except the first entry.
This idea is repeated with the lower right $(n-1)$-by-$(n-1)$ blocks of $T_1$ and $S_1$. After orthonormalizations, we obtain $A = Q_1Q_2T_2Z_2^*Z_1^*$, where the first column of $T_2$ equals the first column of $T_1$. The second column of $T_2$ consists of zeros except for the first two entries. We obtain $B = Q_1Q_2S_2Z_2^*Z_1^*$, where the first column of $S_2$ equals the first column of $S_1$. The second column of $S_2$ consists of zeros except for the first two entries.
Continue this construction to have the claim. End of proof.
Now if we have the generalized Schur decomposition available, then

$$Ax - \lambda Bx = 0$$

holds if and only if

$$Q(T - \lambda S)Z^*x = 0. \qquad (59)$$

The generalized eigenvalues of the problem $Ty = \lambda Sy$ are recovered immediately from the diagonal entries: $t_{jj} - \lambda s_{jj} = 0$ for $j = 1, \ldots, n$. For the eigenvectors, the techniques of Problem 29 apply.
The algorithm to compute the generalized Schur decomposition is called the QZ iteration [1]. Observe that there are also ways to transform a generalized eigenvalue problem into a standard eigenvalue problem. For instance, if $B$ is invertible, then the generalized eigenvalue problem (16) is equivalent to solving the standard eigenvalue problem

$$Mx = \lambda x$$

with $M = B^{-1}A$. A possible source of numerical problems in equivalence transformations like this is the possible ill-conditioning of $B$. If the condition number

$$\kappa(B) = \|B\|\,\|B^{-1}\|$$

of $B$ is very large, then $B$ is said to be ill-conditioned. As opposed to this, in (59) all the appearing equivalence transformations are unitary matrices.
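To see (59) and the reduction to $M = B^{-1}A$ side by side, the sketch below builds a pencil whose generalized Schur form is known by construction, $A = QTZ^*$ and $B = QSZ^*$ with random unitary $Q$, $Z$ (a test setup of our own, not the QZ algorithm). The generalized eigenvalues are read off as $t_{jj}/s_{jj}$ and compared with the eigenvalues of $B^{-1}A$; $\kappa(B)$ is also computed, since a large value warns that the second route may lose accuracy.

```python
import numpy as np

def random_unitary(n, rng):
    """Unitary Q from the QR factorization of a random complex matrix (helper of ours)."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(G)
    return Q

rng = np.random.default_rng(5)
n = 5
Q, Z = random_unitary(n, rng), random_unitary(n, rng)
T = np.triu(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
S = np.triu(rng.standard_normal((n, n)))
S[np.diag_indices(n)] = 1 + rng.random(n)      # s_jj in [1, 2): B is invertible
A = Q @ T @ Z.conj().T
B = Q @ S @ Z.conj().T

lam_schur = np.diag(T) / np.diag(S)            # from t_jj - lambda s_jj = 0
lam_via_M = np.linalg.eigvals(np.linalg.solve(B, A))   # M = B^{-1} A, no explicit inverse
kappa_B = np.linalg.cond(B)                    # 2-norm condition number of B
```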
Problem 31 Let $A, B \in \mathbb{C}^{n\times n}$. Give an algorithm for computing unitary $Q, Z$ such that $Q^*AZ$ is a Hessenberg matrix and $Q^*BZ$ is upper triangular.
Since everything in numerical computations is subject to perturbations, it is informative to have a quantitative estimate of how eigenvalues then behave. The distance of a point $x$ from a set $Y$ is defined as $\delta(x, Y) = \inf_{y \in Y} \|x - y\|$. Then the distance of the set $X$ from $Y$ is defined as

$$\delta(X, Y) = \sup_{x \in X} \delta(x, Y).$$

The distance between $X$ and $Y$ is $d(X, Y) = \max\{\delta(X, Y), \delta(Y, X)\}$. The following result is called the Bauer-Fike theorem.
Theorem 6 Suppose $A \in \mathbb{C}^{n\times n}$ is diagonalizable as $A = X\Lambda X^{-1}$. Then

$$\delta(\sigma(A + E), \sigma(A)) \leq \kappa(X)\,\|E\|.$$
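Proof. Suppose $\lambda \in \sigma(A + E)$. If $\lambda \in \sigma(A)$, then the claim holds. So let us assume $\lambda \notin \sigma(A)$. Then

$$(\lambda I - \Lambda)^{-1} X^{-1}(\lambda I - A - E)X = I - (\lambda I - \Lambda)^{-1} X^{-1} E X$$

is singular. This forces¹⁵

$$1 \leq \|(\lambda I - \Lambda)^{-1} X^{-1} E X\| \leq \|(\lambda I - \Lambda)^{-1}\|\,\|X^{-1}\|\,\|E\|\,\|X\|.$$

Since $\lambda I - \Lambda$ is diagonal, we obtain

$$\|(\lambda I - \Lambda)^{-1}\| = \max_j \frac{1}{|\lambda - \lambda_j(A)|} = \frac{1}{\min_j |\lambda - \lambda_j(A)|},$$

which yields the claim. End of proof.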
This means that the best stability results can be expected in the normal case, i.e., when $X$ can be chosen to be unitary. Otherwise, $\kappa(X)$ is a good measure of the reliability of computations. Of course, a nice thing about eigenvalue computations is that you can always compute the so-called residual $\|A\hat{x} - \hat{\lambda}\hat{x}\|$ to check how accurate the eigenpair approximations $\hat{\lambda}$ and $\hat{x}$ you have generated are.
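A small numerical check of Theorem 6 in the 2-norm, which satisfies $\|D\| = \max_j |d_j|$ for diagonal $D$ as used in the proof. The random test matrices and the perturbation size $10^{-6}$ are arbitrary choices of ours. The residual of a computed eigenpair is evaluated as well.

```python
import numpy as np

def spectral_distance(mu, lam):
    """delta(M, L): largest distance from a point of M to the set L."""
    return max(np.min(np.abs(m - lam)) for m in mu)

rng = np.random.default_rng(2)
n = 6
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
lam = rng.standard_normal(n) + 1j * rng.standard_normal(n)
A = X @ np.diag(lam) @ np.linalg.inv(X)               # diagonalizable, sigma(A) = lam
E = 1e-6 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

mu, V = np.linalg.eig(A + E)
dist = spectral_distance(mu, lam)                     # delta(sigma(A+E), sigma(A))
bound = np.linalg.cond(X, 2) * np.linalg.norm(E, 2)   # kappa(X) ||E||

# Residual ||(A+E) x - lambda x|| of one computed eigenpair (columns of V have unit norm)
residual = np.linalg.norm((A + E) @ V[:, 0] - mu[0] * V[:, 0])
```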
There are also perturbation results for the non-diagonalizable case, but the
bounds are very weak. The reason for this can be seen in Problem 30.
Problem 32¹⁶ The trace of $A \in \mathbb{C}^{n\times n}$ is defined as $\operatorname{tr}(A) = \sum_{j=1}^n a_{jj}$. Show that

$$p(\lambda) = \det(A - \lambda I) = (-1)^n \lambda^n + (-1)^{n-1}\operatorname{tr}(A)\,\lambda^{n-1} + \cdots + \det(A).$$

Conclude (by using the Schur decomposition) that $\operatorname{tr}(A)$ is the sum of the eigenvalues of $A$. Consequently, the sum of the eigenvalues depends continuously on $A$ in a very nice way.
Once there is a way to solve the eigenvalue problem, it allows computing functions of matrices economically. So far we have dealt with polynomials in a matrix $A$ defined as

$$p(A) = \sum_{j=0}^k a_j A^j \qquad (60)$$

for any polynomial $p(z) = \sum_{j=0}^k a_j z^j$. Polynomials in matrices are of importance in connection with iterative methods, although they are not formed explicitly.

¹⁵ Recall that if $\|M\| < 1$, then $(I - M)^{-1}$ exists since $(I - M)^{-1} = \sum_{k=0}^\infty M^k$. This is called the Neumann series of $M$.

¹⁶ Although we have not shown it, the eigenvalues depend continuously on the matrix. This problem deals with a closely related result.
For more complicated functions, the most important example is the exponential of $A$, defined analogously by replacing the variable $z$ with $A$ to have

$$e^A = \sum_{j=0}^\infty \frac{A^j}{j!}. \qquad (61)$$

Typically the exponential appears in solving differential equations with time dependence. Then it is used by applying it as

$$t \mapsto e^{At}b, \qquad (62)$$

where $t > 0$ and $b \in \mathbb{C}^n$. It solves the initial value problem $x' = Ax$ with $x(0) = b$. When resulting from discretizing time dependent PDEs, $n$ is very large. This is important for the following reason. In (62), the exponential itself is not needed, just matrix-vector products with it as $t$ varies. Therefore, you do not want to compute the exponential $e^{At}$ if it can somehow be avoided. (Compare with solving a linear system. Then $A^{-1}$ is not needed, only $A^{-1}b$.)
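To illustrate that $e^{At}b$ can be approximated using only products with $A$, here is a sketch that truncates the series (61) applied to $b$. This is our own simplified illustration, reasonable only when $\|At\|$ is modest; it is not the method used by production codes, and the number of terms is an arbitrary default.

```python
import numpy as np

def expm_times_vector_taylor(A, b, t, num_terms=30):
    """Approximate e^{At} b by sum_{j < num_terms} (tA)^j b / j!, never forming e^{At}."""
    term = np.array(b, dtype=complex)
    result = term.copy()
    for j in range(1, num_terms):
        term = (t / j) * (A @ term)     # term = (tA)^j b / j!, one product with A per step
        result += term
    return result
```

For $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ and $b = e_1$, the solution of $x' = Ax$ is $x(t) = (\cos t, \sin t)$, which the sketch reproduces for $t = 1$.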
The most general definition is given by using complex analysis and the Cauchy integral formula as follows. If $f$ is analytic in a neighborhood of $\sigma(A)$, then

$$f(A) = \frac{1}{2\pi i}\oint_\Gamma f(\lambda)(\lambda I - A)^{-1}\,d\lambda, \qquad (63)$$

where $\Gamma$ is a simple Jordan curve surrounding the eigenvalues counterclockwise (and included in the domain of definition of $f$). This is easy to accept by considering a diagonalizable matrix $A = X\Lambda X^{-1}$. Then we can factor using the similarity to have

$$f(A) = X\left(\frac{1}{2\pi i}\oint_\Gamma f(\lambda)(\lambda I - \Lambda)^{-1}\,d\lambda\right)X^{-1}. \qquad (64)$$

For the integral this means using standard complex analysis on the diagonal, i.e., the resulting diagonal matrix is $D(f) = \operatorname{diag}(f(\lambda_1), \ldots, f(\lambda_n))$. Thereafter perform the similarity to get $f(A) = XD(f)X^{-1}$. In particular, we see that if $f$ is a polynomial or the exponential function, then $f(A)$ agrees with our earlier definitions (60) and (61). The usage of (64) requires that $A$ can be reliably diagonalized. This is not always a realistic assumption.
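The two definitions (63) and (64) can be compared numerically. The sketch below evaluates (64) through NumPy's eigendecomposition, and approximates (63) on a circle $|z - c| = R$ with the trapezoidal rule (our choice of quadrature; it converges quickly for integrands analytic near the circle). The circle must enclose $\sigma(A)$; the function names and parameter values are ours.

```python
import numpy as np

def f_of_A_diag(f, A):
    """f(A) = X D(f) X^{-1} as in (64); assumes A is (well) diagonalizable."""
    lam, X = np.linalg.eig(A)
    return X @ np.diag(f(lam)) @ np.linalg.inv(X)

def f_of_A_contour(f, A, center=0.0, radius=1.0, num_nodes=64):
    """Trapezoidal rule for (63) on the circle z = center + radius e^{i theta}."""
    n = A.shape[0]
    I = np.eye(n)
    F = np.zeros((n, n), dtype=complex)
    for k in range(num_nodes):
        w = radius * np.exp(2j * np.pi * k / num_nodes)   # z_k - center
        z = center + w
        # (1/(2 pi i)) dz = w d(theta)/(2 pi), and d(theta) = 2 pi / num_nodes
        F += f(z) * np.linalg.solve(z * I - A, I) * w
    return F / num_nodes
```

For the upper triangular test matrix with diagonal $(a, d) = (1, -0.5)$ and $(1,2)$ entry $b = 2$, the exact exponential has off-diagonal entry $b(e^a - e^d)/(a - d)$, and both routes agree with it.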
The above arguments prove the spectral mapping theorem

$$\sigma(f(A)) = f(\sigma(A))$$

in the diagonalizable case.
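Example 17 Let $A \in \mathbb{C}^{n\times n}$ be skew-Hermitian, i.e., $A^* = -A$. Then $(I - A)(I + A)^{-1}$ is the so-called Cayley transform of $A$. It follows from (64) that the Cayley transform is unitary. (Here $A$ is normal, so $X$ can be chosen unitary, and each eigenvalue $iy$ with $y \in \mathbb{R}$ is mapped to $(1 - iy)/(1 + iy)$, which has modulus one.)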
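A quick numerical confirmation of Example 17, with a random skew-Hermitian test matrix of our own. $I + A$ is invertible because $\sigma(A)$ lies on the imaginary axis.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = G - G.conj().T                       # skew-Hermitian: A^* = -A
I = np.eye(n)
C = (I - A) @ np.linalg.inv(I + A)       # Cayley transform of A
```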
Example 18 Let $A \in \mathbb{C}^{n\times n}$ be skew-Hermitian. Then in the numerical solving of the time dependent Schrödinger equation one needs to apply $e^{At}$ with $t > 0$ to a vector. Here, too, $e^{At}$ is unitary.
In practice, how $f(A)$ should be computed depends on $f$. For the exponential function, the method of choice is the so-called scaling and squaring algorithm [1]. (Any high quality mathematical software, such as Matlab, has an implementation of this algorithm.) The scaling and squaring algorithm is not based on diagonalizing $A$.
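The idea behind scaling and squaring is $e^A = (e^{A/2^s})^{2^s}$: scale $A$ until its norm is small, approximate the exponential of the scaled matrix, then square $s$ times. The sketch below is a simplified version under our own choices: it uses a truncated Taylor series where production codes use Padé approximants with carefully chosen parameters, and it picks $s$ so that $\|A/2^s\|_1 \le 1/2$.

```python
import numpy as np

def expm_scaling_squaring(A, num_terms=16):
    """Simplified scaling and squaring: e^A = (e^{A / 2^s})^(2^s)."""
    A = np.asarray(A, dtype=complex)
    norm = np.linalg.norm(A, 1)
    # smallest s with ||A||_1 / 2^s <= 1/2 (s = 0 for a small or zero A)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / 2**s
    n = A.shape[0]
    term = np.eye(n, dtype=complex)
    E = term.copy()
    for j in range(1, num_terms):
        term = term @ B / j              # B^j / j!
        E += term
    for _ in range(s):
        E = E @ E                        # undo the scaling by repeated squaring
    return E
```

For the skew-symmetric $A = \begin{bmatrix} 0 & -t \\ t & 0 \end{bmatrix}$ the result is the rotation $\begin{bmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{bmatrix}$, a unitary matrix as in Example 18.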
Problem 33 (This problem is from [1, Chapter 11].) Let $A \in \mathbb{C}^{n\times n}$ be Hermitian positive definite. (a) Show that there exists a unique Hermitian positive definite $X$ such that $A = X^2$. (b) Show that if $X_0 = I$ and $X_{k+1} = (X_k + AX_k^{-1})/2$, then $X_k \to \sqrt{A}$ with quadratic speed, where $\sqrt{A}$ denotes the matrix $X$ of (a).
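Before proving (b), one may want to watch the iteration numerically. The sketch below is only an experiment, not a proof. The plain iteration is known to be sensitive to rounding errors when $A$ is ill-conditioned, so the test matrix (our choice, with eigenvalues $1, \ldots, 5$) is deliberately well conditioned.

```python
import numpy as np

def newton_sqrt(A, num_iter=25):
    """Iterate X_{k+1} = (X_k + A X_k^{-1}) / 2 from X_0 = I."""
    X = np.eye(A.shape[0])
    for _ in range(num_iter):
        X = (X + A @ np.linalg.inv(X)) / 2
    return X
```

In exact arithmetic every $X_k$ is a rational function of $A$, so each $X_k$ commutes with $A$. The eigenvalues then follow the scalar Newton iteration $x_{k+1} = (x_k + a/x_k)/2$ for $\sqrt{a}$, which converges quadratically.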
9 Iterative methods
Next we consider ways to solve the linear system

$$Ax = b \qquad (65)$$

for an invertible $A \in \mathbb{C}^{n\times n}$ and $b \in \mathbb{C}^n$ given when $n$ is large. By large is meant that direct methods such as the LU factorization, with $O(n^3)$ complexity and $O(n^2)$ storage, are not acceptable. Thus $n$ is of order $10^4$ at least. It could be of order $10^7$ or even $10^8$.
Practically all the problems considered so far in these lecture notes also arise with large $n$. The large scale eigenvalue problem is encountered often. So is the task of computing the SVD (or, rather, some singular values and possibly the related singular vectors). The problem of applying the exponential of a very large matrix to a vector also arises in practice.
Instead of direct methods, in large scale problems one executes iterative methods. Iterative methods require a different mindset compared with direct methods. By iterative methods we mean algorithms which are not based on factoring the coefficient matrix $A$. In iterative methods one uses information based on matrix-vector products and then constructs approximations to the solution of (65). A rule of thumb is that a single iteration step should not cost more than $O(n)$ or $O(n \log n)$ floating point operations. The approximations (hopefully) improve step by step until sufficient accuracy is reached. Very seldom does one need approximations in full machine precision. Something like four or six correct digits may well suffice.
Since a single iteration step should not cost more than $O(n)$ or $O(n \log n)$ floating point operations, none of the performed matrix-vector products can cost more than this. It means that $A$ is either sparse, or has some very specific structure which makes this possible (like some sort of Toeplitz-like structure). Sparse means that $A$ has only $O(n)$ nonzero entries. Sparse matrices are not stored like full matrices, i.e., zeros are not stored. The locations and the values of the nonzero entries are stored instead.¹⁷ Matrix-vector products are programmed such that no multiplications by zero are performed.
If there is an option to model a problem either with a differential equation or with an integral equation, the former gives rise to sparse matrices after discretization.
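One common way to store "the locations and the values" is the compressed sparse row (CSR) format. The sketch below is our own minimal version, not a library API: it builds the tridiagonal matrix $\operatorname{tridiag}(-1, 2, -1)$ of a 1D finite difference discretization directly in CSR form, and multiplies by it in $O(\text{nnz})$ operations without touching any zeros.

```python
import numpy as np

def laplacian_1d_csr(n):
    """CSR arrays (values, column indices, row pointers) for tridiag(-1, 2, -1).

    The dense matrix is never formed; only the 3n - 2 nonzeros are stored.
    """
    values, col_idx, row_ptr = [], [], [0]
    for i in range(n):
        for j, a in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))      # row i occupies values[row_ptr[i]:row_ptr[i+1]]
    return np.array(values), np.array(col_idx, dtype=int), np.array(row_ptr, dtype=int)

def csr_matvec(values, col_idx, row_ptr, x):
    """y = A x using only the stored nonzeros of A."""
    n = len(row_ptr) - 1
    y = np.zeros(n, dtype=np.result_type(values, x))
    for i in range(n):
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] = values[start:end] @ x[col_idx[start:end]]
    return y
```

The row-wise layout makes $Ax$ cheap, but a product with $A^*$ would have to traverse the data column-wise, in line with footnote 17.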
Iterative methods are not guaranteed to converge in a reasonable time, i.e., it may take far too many iterations to achieve the required accuracy. In other words, although a single iteration step is assumed to be inexpensive, we cannot afford to take them indefinitely. In fact, in serious applications, the basic iterative methods typically do not converge fast enough. Fortunately, there are ways to speed up the convergence by so-called preconditioning. It can be said that success in solving a large scale problem depends on how well the problem can be preconditioned.
In large scale problems one still very much needs direct methods, although more indirectly. While the iteration is executed, direct methods are used for very small subproblems at every iteration step. With the so-called optimal iterative methods one needs to solve least squares problems.
In the least squares problem one is concerned with a linear system

$$My = c \qquad (66)$$

for $M \in \mathbb{C}^{k\times j}$ and $c \in \mathbb{C}^k$ given such that $k \geq j$. It is assumed that the columns of $M$ are linearly independent. Since the problem is overdetermined, i.e., there are more equations than variables, (66) is rarely solvable. (Clearly, the problem is solvable if and only if $c$ is in the column space of $M$.¹⁸) Therefore it is of interest to solve the problem in the least squares sense by solving

$$\min_{y \in \mathbb{C}^j} \|My - c\|^2 \qquad (67)$$

instead. This is easily accomplished with the help of the orthogonal projector $P$ onto the column space of $M$. This is achieved by computing the reduced QR factorization $M = QR$ of $M$, where $Q \in \mathbb{C}^{k\times j}$ has orthonormal columns spanning the column space of $M$ and $R \in \mathbb{C}^{j\times j}$ is upper triangular.¹⁹ (Do this either with the help of Problem 3 or Problem 28. For numerical stability, the latter alternative is recommended.) Then $P = QQ^*$. Consequently, since $My - QQ^*c$ lies in the column space of $M$ and $(I - QQ^*)c$ is orthogonal to it, the Pythagorean theorem shows that (67) equals

$$\min_{y \in \mathbb{C}^j} \|My - QQ^*c\|^2 + \|(I - QQ^*)c\|^2. \qquad (68)$$

Since $\|(I - QQ^*)c\|$ is a constant, it suffices to concentrate on

$$\min_{y \in \mathbb{C}^j} \|My - QQ^*c\|^2 = \min_{y \in \mathbb{C}^j} \|QQ^*My - QQ^*c\|^2 = \min_{y \in \mathbb{C}^j} \|Q(Ry - Q^*c)\|^2. \qquad (69)$$

Since the columns of $M$ were assumed to be linearly independent, $R$ is invertible, and the value of this latter minimization problem is zero by choosing $y = R^{-1}Q^*c$. This also solves (67). (Gauss invented a way to solve the least squares problem. The above is a numerically correct way of doing it.)

¹⁷ Sparse matrix storage means that $A$ is located in memory in a nonstandard way. This has some consequences. For example, performing matrix-vector products with $A^*$ can be very costly, i.e., slow.

¹⁸ Recall that the column space of $M$ is the subspace $\{My : y \in \mathbb{C}^j\}$ of $\mathbb{C}^k$, i.e., the set of all linear combinations of the columns of $M$.
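The recipe $y = R^{-1}Q^*c$ in NumPy: `np.linalg.qr` with `mode='reduced'` returns the $k \times j$ factor $Q$ and the $j \times j$ factor $R$ (it is Householder based, via LAPACK). The function name is ours, and a triangular solver could replace the general `solve` used here for brevity.

```python
import numpy as np

def least_squares_qr(M, c):
    """Solve min ||M y - c|| via the reduced QR factorization M = QR, y = R^{-1} Q^* c."""
    Q, R = np.linalg.qr(M, mode='reduced')      # Q: k x j orthonormal columns, R: j x j
    return np.linalg.solve(R, Q.conj().T @ c)
```

At the minimizer, the residual $My - c = -(I - QQ^*)c$ is orthogonal to the columns of $M$, which gives an easy check.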
Least squares problems appear frequently, and not just in connection with iterative methods.
Problem 34 In Problem 23 you had square Vandermonde matrices because you did interpolation. Explain what kind of rectangular Vandermonde matrices you obtain when you have to fit a polynomial of degree $j - 1$ in the least squares sense to go approximately through the points $(x_1, y_1), \ldots, (x_k, y_k)$. Find the best cubic least squares fit to
(0, 0.486), (0.15, 1.144), (0.30, 1.166), (0.45, 1.095),
(0.60, 1.099), (0.75, 1.117), (0.9, 1.38), (1.05, 1.857).
Use Matlab (and its built-in reduced QR factorization function) and plot your curve against the points.
With this tool available, we can now introduce the so-called GMRES method (generalized minimal residual method) for iteratively solving the linear system (65). The reasoning behind the GMRES method is largely based on the equality (23), i.e., we know there exists a polynomial $p$ such that

$$Ap(A)b = b$$

and therefore $x = p(A)b$. Since $n$ is very large, constructing this polynomial is completely unrealistic. Instead, one uses lower degree polynomials to get approximations by solving the arising minimization problems. This is where optimality enters: at the $j$th step one solves

$$\min_{\deg(p_{j-1}) \leq j-1} \|Ap_{j-1}(A)b - b\| \qquad (70)$$

to have the corresponding approximation $x_j = p_{j-1}(A)b$. This is realistic only for $j \ll n$. Here $\deg(p)$ denotes the degree of a polynomial $p$. Of course, one cannot expect the minimum in (70) to be zero since $j$ is small.

¹⁹ This is the reduced QR factorization of $M$. The other choice is $M = \hat{Q}\hat{R}$ with $\hat{Q} \in \mathbb{C}^{k\times k}$ unitary and $\hat{R} = [R^T\; 0]^T \in \mathbb{C}^{k\times j}$.
Example 19 Suppose $A$ can be split as $A = I - B$ with $\|B\| < 1$. Then, by using the truncation $\sum_{k=0}^{j-1} B^k = \sum_{k=0}^{j-1}(I - A)^k$ of the Neumann series

$$A^{-1} = (I - B)^{-1} = \sum_{k=0}^\infty B^k, \qquad (71)$$

we have

$$\min_{\deg(p_{j-1}) \leq j-1} \|Ap_{j-1}(A)b - b\| \leq \|A - I\|^j\,\|b\|,$$

since $A\sum_{k=0}^{j-1} B^k = I - B^j$. (Note that it is easy to implement the truncation of the Neumann series as an iterative method. It can be applied in an abstract setting such as on Banach spaces. Like the Neumann series, however, it cannot be expected to converge in general.)
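The truncated Neumann series as an iterative method, in a minimal form of our own: with $x_0 = 0$ and $x_{k+1} = b + (I - A)x_k$, the $j$th iterate is $x_j = \sum_{k=0}^{j-1} (I - A)^k b$. Each step costs one product with $A$. The test matrix below is scaled (our choice) so that $\|B\|_2 = 1/2$, which makes the bound of Example 19 checkable.

```python
import numpy as np

def neumann_iteration(A, b, j):
    """Return x_j = sum_{k<j} (I - A)^k b using x_{k+1} = b + (I - A) x_k, x_0 = 0."""
    x = np.zeros_like(b, dtype=float)
    for _ in range(j):
        x = b + x - A @ x        # (I - A) x computed as x - A x
    return x
```

Here the residual is exactly $Ax_j - b = -B^jb$, so its norm is at most $\|B\|^j\|b\|$.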
In realistic problems, it is very rare that A could be split so that the Neumann
series would be of any use. Therefore one needs a numerically reliable way
to solve (70) to have the GMRES method.
Observe that the minimization problem (70) is equivalent to solving (67) with

$$M = A[b\;\; Ab\;\; \cdots\;\; A^{j-1}b] = [Ab\;\; A^2b\;\; \cdots\;\; A^jb] \in \mathbb{C}^{n\times j}$$

and $c = b$. Then the approximation is $x_j = \sum_{k=0}^{j-1} y_{k+1}A^kb$. Numerically this is an unstable approach since the columns of $M$ can be expected to be almost linearly dependent. (Recall what happens in the power method.)
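How fast the columns $b, Ab, A^2b, \ldots$ become nearly dependent can be seen from the condition number of the Krylov matrix. In this sketch (test data of our own: $A = \operatorname{diag}(1, \ldots, 10)$ sampled at 100 points, random $b$), every column is normalized to unit length. Normalization alone does not cure the problem: the condition number grows roughly exponentially with $j$.

```python
import numpy as np

def krylov_matrix(A, b, j):
    """Columns b, Ab, ..., A^{j-1} b, each normalized to unit length."""
    K = np.empty((b.shape[0], j))
    v = b / np.linalg.norm(b)
    for i in range(j):
        K[:, i] = v
        v = A @ v
        v = v / np.linalg.norm(v)
    return K

rng = np.random.default_rng(6)
n = 100
A = np.diag(np.linspace(1, 10, n))
b = rng.standard_normal(n)
conds = [np.linalg.cond(krylov_matrix(A, b, j)) for j in (2, 5, 10, 20)]
```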
To proceed numerically reliably, define the associated Krylov subspace

$$\mathcal{K}_j(A; b) = \operatorname{span}\{b, Ab, A^2b, \ldots, A^{j-1}b\} \qquad (72)$$

of $A$ at $b$. To compute an orthonormal basis of $\mathcal{K}_j(A; b)$ in a numerically stable way, proceed as follows. Set $q_1 = b/\|b\|$. Then compute $Aq_1$ and orthonormalize this vector against $q_1$ to have

$$h_{21}q_2 = Aq_1 - h_{11}q_1,$$

where $h_{11} = (Aq_1, q_1)$ and $h_{21} = \|Aq_1 - h_{11}q_1\|$. The unit vector $q_2$ is orthogonal to $q_1$. This is done by executing the modified Gram-Schmidt process.²⁰ Then compute $Aq_2$ and orthonormalize this vector against $q_1$ and $q_2$ to have

$$h_{32}q_3 = Aq_2 - h_{12}q_1 - h_{22}q_2.$$

The coefficients $h_{12}$, $h_{22}$ and $h_{32}$ are computed by executing the modified Gram-Schmidt process. The unit vector $q_3$ is orthogonal to $q_1$ and $q_2$.
Hence, the $k$th basis vector is computed according to

$$h_{k,k-1}q_k = Aq_{k-1} - \sum_{l=1}^{k-1} h_{l,k-1}q_l, \qquad (73)$$

where the coefficients $h_{l,k-1}$ are computed by executing the modified Gram-Schmidt process such that $Aq_{k-1}$ is orthogonalized against the orthonormal vectors $q_1, \ldots, q_{k-1}$ computed so far. After $j$ steps we have computed an orthonormal basis $q_1, \ldots, q_j$ of $\mathcal{K}_j(A; b)$, assuming its dimension is $j$. The algorithm described is the so-called Arnoldi method.
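The computations satisfy the matrix identity

$$AQ_j = Q_{j+1}\underline{H}_j, \qquad (74)$$

where $Q_j = [q_1\; q_2\; \cdots\; q_j] \in \mathbb{C}^{n\times j}$ has orthonormal columns and $\underline{H}_j = \{h_{st}\} \in \mathbb{C}^{(j+1)\times j}$ has a Hessenberg-like structure. The entries of the $(k-1)$th column of $\underline{H}_j$ are obtained from the identity (73). Using this means that (70) can be written as

$$\min_{y_j \in \mathbb{C}^j} \|AQ_jy_j - b\| = \min_{y_j \in \mathbb{C}^j} \|Q_{j+1}\underline{H}_jy_j - b\| = \min_{y_j \in \mathbb{C}^j} \|\underline{H}_jy_j - \beta e_1\|, \qquad (75)$$

where $\beta = \|b\|$. (The last equality holds since $b = \beta Q_{j+1}e_1$ and $Q_{j+1}$ has orthonormal columns.) Solving this (small) least squares problem is straightforward. Once done, we obtain the approximate solution as $x_j = Q_jy_j$.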
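Putting (73)-(75) together gives a compact GMRES sketch without restarting. It assumes the inner product $(x, y) = y^*x$, so $h_{l,k-1} = q_l^*Aq_{k-1}$, and it stops early when the Krylov subspace stops growing as in (76); the relative breakdown tolerance is our own choice. Storing all of $Q$ is exactly the linear growth in storage discussed below. The function names are ours, not a library's, and the small problem (75) is solved with `np.linalg.lstsq`.

```python
import numpy as np

def arnoldi(A, b, j):
    """Arnoldi with modified Gram-Schmidt, cf. (73)-(74).

    Returns Q (orthonormal q_1, q_2, ...) and Hbar with A Q[:, :m] = Q[:, :m+1] Hbar,
    where m = Hbar.shape[1] <= j (m < j only on breakdown).
    """
    n = b.shape[0]
    Q = np.zeros((n, j + 1), dtype=complex)
    Hbar = np.zeros((j + 1, j), dtype=complex)
    Q[:, 0] = b / np.linalg.norm(b)
    for k in range(j):
        w = A @ Q[:, k]
        w_norm = np.linalg.norm(w)
        for l in range(k + 1):                     # modified Gram-Schmidt
            Hbar[l, k] = np.vdot(Q[:, l], w)       # (w, q_l) = q_l^* w
            w = w - Hbar[l, k] * Q[:, l]
        Hbar[k + 1, k] = np.linalg.norm(w)
        if Hbar[k + 1, k].real <= 1e-12 * w_norm:  # invariant subspace, cf. (76)
            return Q[:, :k + 2], Hbar[:k + 2, :k + 1]
        Q[:, k + 1] = w / Hbar[k + 1, k]
    return Q, Hbar

def gmres_basic(A, b, j):
    """Up to j GMRES steps: solve the small problem (75), return x_j = Q_j y_j."""
    Q, Hbar = arnoldi(A, b, j)
    m = Hbar.shape[1]
    rhs = np.zeros(m + 1, dtype=complex)
    rhs[0] = np.linalg.norm(b)                     # beta e_1
    y, *_ = np.linalg.lstsq(Hbar, rhs, rcond=None)
    return Q[:, :m] @ y
```

The test below runs it on a matrix whose spectrum clusters around 4, away from the origin, which is the favorable situation described after (79).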
Problem 35 How many inner products are required to have the factorization (74)?
Problem 36 Suppose at the $j$th step the dimension of the Krylov subspaces ceases to grow, i.e.,

$$j = \dim \mathcal{K}_j(A; b) = \dim \mathcal{K}_{j+1}(A; b). \qquad (76)$$

Show that then (74) can be written as $AQ_j = Q_jH_j$, where $H_j \in \mathbb{C}^{j\times j}$ is a Hessenberg matrix. If $j = n$ and $A$ is Hermitian, what can you say about $H_n$? When this is taken into account, how many inner products are then needed to have the factorization (74)?

²⁰ The modified Gram-Schmidt process is a more accurate version of the standard Gram-Schmidt process. In exact arithmetic the results coincide.
Problem 37 If (76) holds, then $\mathcal{K}_j(A; b)$ is said to be an invariant subspace of the matrix $A$. This means that

$$A\mathcal{K}_j(A; b) \subset \mathcal{K}_j(A; b)$$

holds. How are the eigenvalues and corresponding eigenvectors of $H_j$ related to those of $A$?
Denote by $r_j = b - Ax_j$ the residual at the $j$th step. Since $\mathcal{K}_j(A; b) \subset \mathcal{K}_{j+1}(A; b)$, we have

$$\|r_j\| \geq \|r_{j+1}\| \qquad (77)$$

for every $j$. In this sense the GMRES method is optimal: the residual norm never increases from one step to the next.
Regarding the computational cost of the GMRES method, the Arnoldi iteration, i.e., the construction of an orthonormal basis of $\mathcal{K}_j(A; b)$, dominates the complexity. It requires computing $j - 1$ matrix-vector products. In addition, the Gram-Schmidt process requires computing inner products. The cost of solving (75) is negligible.
The storage required by the GMRES method grows linearly with the iteration number. It consists, in essence, of the need to store the orthonormal basis $q_1, \ldots, q_j$. (Of course, $A$ must be stored as well.) The storage required by $\underline{H}_j$ is negligible.
It is primarily the storage constraints which force $j$ to be small. This means that the GMRES method must be restarted after a certain number of steps. This gives rise to the so-called restarted GMRES($j$) method. It is based on the observation that we may well use an initial approximation $x_0$ to the solution $x$ of (65). Then consider the residual $r_0 = b - Ax_0$. If the residual is not small enough, execute the GMRES method to solve the linear system

$$Aw = r_0. \qquad (78)$$

Clearly, with any approximation $w_j$ to $w$ we have

$$Aw_j - r_0 = Aw_j - (b - Ax_0) = A(x_0 + w_j) - b.$$

Therefore we are actually solving the original problem (65). In the restarted GMRES($j$) method one always takes $x_0$ to be the best approximation computed so far, i.e., $x_0$ is replaced by $x_0 + w_j$ after $j$ steps of the standard GMRES method with the previous $r_0$ have been executed.
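A self-contained sketch of GMRES($j$): each cycle runs $j$ Arnoldi/GMRES steps on (78) and then replaces $x_0$ by $x_0 + w_j$, so storage stays bounded by $j + 1$ basis vectors. The stopping tolerance, the cycle limit and the function names are our own choices. Restarting can stall for unfavorable spectra (compare Problem 39), so `max_cycles` guards the loop.

```python
import numpy as np

def gmres_cycle(A, r0, j):
    """Run j GMRES steps on A w = r0, cf. (78); return the approximation w_j."""
    n = r0.shape[0]
    beta = np.linalg.norm(r0)
    Q = np.zeros((n, j + 1), dtype=complex)
    Hbar = np.zeros((j + 1, j), dtype=complex)
    Q[:, 0] = r0 / beta
    m = j
    for k in range(j):
        w = A @ Q[:, k]
        w_norm = np.linalg.norm(w)
        for l in range(k + 1):                   # modified Gram-Schmidt
            Hbar[l, k] = np.vdot(Q[:, l], w)
            w = w - Hbar[l, k] * Q[:, l]
        h = np.linalg.norm(w)
        Hbar[k + 1, k] = h
        if h <= 1e-14 * w_norm:                  # breakdown: the Krylov subspace is invariant
            m = k + 1
            break
        Q[:, k + 1] = w / h
    rhs = np.zeros(m + 1, dtype=complex)
    rhs[0] = beta                                # beta e_1 in (75)
    y, *_ = np.linalg.lstsq(Hbar[:m + 1, :m], rhs, rcond=None)
    return Q[:, :m] @ y

def restarted_gmres(A, b, j, tol=1e-10, max_cycles=200):
    """GMRES(j): after each cycle set x_0 <- x_0 + w_j and restart from the new residual."""
    x = np.zeros(b.shape[0], dtype=complex)
    for _ in range(max_cycles):
        r0 = b - A @ x
        if np.linalg.norm(r0) <= tol * np.linalg.norm(b):
            break
        x = x + gmres_cycle(A, r0, j)
    return x
```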
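For the speed of convergence of the GMRES method, assume $A$ is diagonalizable as $A = X\Lambda X^{-1}$. Then

$$\min_{\deg(p_{j-1}) \leq j-1} \|Ap_{j-1}(A)b - b\| = \min_{\deg(p_{j-1}) \leq j-1} \|X(\Lambda p_{j-1}(\Lambda) - I)X^{-1}b\| \leq \min_{\deg(p_{j-1}) \leq j-1} \|\Lambda p_{j-1}(\Lambda) - I\|\,\kappa(X)\,\|b\|.$$

Since $\Lambda p_{j-1}(\Lambda) - I$ is a diagonal matrix, the appearing minimization problem can be written as²¹

$$\min_{\deg(p_{j-1}) \leq j-1} \|\Lambda p_{j-1}(\Lambda) - I\| = \min_{\deg(p_{j-1}) \leq j-1}\; \max_{\lambda \in \sigma(A)} |\lambda p_{j-1}(\lambda) - 1|. \qquad (79)$$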
This is a polynomial approximation problem in the complex plane. Its behaviour describes well how fast the GMRES method converges, assuming $\kappa(X)$ is not large. Of course, one hardly has any experience with such approximation problems. Still, we may immediately conclude that if a diagonalizable $A$ has at most $j$ distinct eigenvalues, the GMRES method yields the exact solution $x = p(A)b$ after $j$ steps. That is, (79) equals zero, which is seen by choosing $p_{j-1}(\lambda_k) = 1/\lambda_k$ for $k = 1, \ldots, j$, where $\lambda_1, \ldots, \lambda_j$ denote the distinct eigenvalues. (You can find $p_{j-1}$ by Lagrange interpolation, for example.)
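The interpolation argument can be checked directly. In this sketch, a test setup of ours, $A$ is diagonalizable with only $j = 3$ distinct eigenvalues $1, 2, 4$ (repeated to fill $n = 8$). `np.polyfit` with degree $j - 1$ through $j$ points yields the interpolating polynomial with $p(\lambda_k) = 1/\lambda_k$, and $p(A)b$ is evaluated by Horner's rule using only products with $A$.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 8
distinct = np.array([1.0, 2.0, 4.0])                 # j = 3 distinct eigenvalues
lam = np.repeat(distinct, [3, 3, 2])
X = rng.standard_normal((n, n))
A = X @ np.diag(lam) @ np.linalg.inv(X)              # diagonalizable test matrix
b = rng.standard_normal(n)

# Degree j-1 polynomial with p(lambda_k) = 1/lambda_k (exact interpolation, j points)
coeffs = np.polyfit(distinct, 1.0 / distinct, len(distinct) - 1)

def poly_times_vector(coeffs, A, b):
    """Evaluate p(A) b by Horner's rule (coeffs ordered from highest degree down)."""
    y = coeffs[0] * b
    for c in coeffs[1:]:
        y = A @ y + c * b
    return y

x = poly_times_vector(coeffs, A, b)                  # should solve A x = b
```

Since $1 - \lambda p(\lambda)$ vanishes at every eigenvalue, $Ap(A) = I$ here; GMRES, which minimizes over all such polynomials, therefore finds $x$ in at most three steps.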
Problem 38 Show that for a normal $A$ whose eigenvalues have real parts greater than or equal to the imaginary parts, the restarted GMRES(2) method converges. (This problem is from [2, p. 47].)
How is (79) then of use? A rule of thumb is as follows. If a set $X$ of a small number $l$ of points is such that the distance $d(X, \sigma(A))$ is small (in other words, $\sigma(A)$ is well approximated by a small number of points), then one can expect GMRES to converge fast. Conversely, if $\sigma(A)$ is widely spread and, in particular, surrounds the origin, very slow convergence can be expected. In the latter case one needs to precondition the linear system.
One should be aware that if $\kappa(X)$ is very large (in the extreme case, $A$ is not diagonalizable at all), then it is not an easy task to estimate the speed of convergence of the GMRES method in terms of some properties of $A$.
Problem 39 Suppose $A = P$, where $P$ is the permutation (45). Recall that you know the eigenvalues of $A$. Let $b = (1, 0, 0, \ldots, 0)$. Compute by hand the GMRES approximations $x_j$ for this problem. (This is the worst case behaviour for the GMRES method.)
The fact that there are $k - 1$ sum terms on the right-hand side of (73) means that the so-called length of the recurrence of the GMRES method grows linearly with the iteration number. This is an equivalent way of expressing that the work and storage grow linearly with the iteration number. So the question arises: is such a growth avoidable? The answer is yes if $A$ is Hermitian (or, more generally, normal). To see this, we have

$$h_{l,k-1} = (Aq_{k-1}, q_l) = (q_{k-1}, Aq_l)$$

if $A$ is Hermitian. Now $Aq_l$ is a linear combination of $q_1, \ldots, q_{l+1}$. Therefore, if $l + 1 < k - 1$, i.e., $l < k - 2$, we have zeros in the $(k-1)$th column of $\underline{H}_j$. This means that

$$h_{k,k-1}q_k = Aq_{k-1} - \sum_{l=1}^{k-1} h_{l,k-1}q_l = Aq_{k-1} - \sum_{l=k-2}^{k-1} h_{l,k-1}q_l \qquad (80)$$

in exact arithmetic. Therefore $\underline{H}_j$ can be computed by storing only the three most recently computed orthonormal basis vectors.²² In particular, $\underline{H}_j$ is tridiagonal. In this special case the Arnoldi method is called the Hermitian Lanczos method.

²¹ Often this is expressed as $\min_{\deg(p) \leq j,\; p(0) = 1} \max_{\lambda \in \sigma(A)} |p(\lambda)|$.
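A sketch of the Hermitian Lanczos recurrence (80). With $\beta_k = h_{k+1,k}$ real and positive, Hermitian symmetry gives $h_{k-1,k} = \beta_{k-1}$, so each step needs only the two previous basis vectors. The full basis $Q$ is kept here only so that $AQ_j = Q_{j+1}\underline{T}_j$ can be checked; a memory-saving implementation would drop it. Breakdown ($\beta_k = 0$) and loss of orthogonality in finite precision are not handled, and the names are ours.

```python
import numpy as np

def hermitian_lanczos(A, b, j):
    """j steps of (80) for Hermitian A.

    Returns alpha (diagonal h_kk), beta (subdiagonal h_{k+1,k}) and the basis Q,
    stored only for verification; the recurrence itself uses q_{k-1} and q_k.
    """
    n = b.shape[0]
    alpha = np.zeros(j)
    beta = np.zeros(j)
    Q = np.zeros((n, j + 1), dtype=complex)
    q_prev = np.zeros(n, dtype=complex)
    q = b / np.linalg.norm(b)
    Q[:, 0] = q
    beta_prev = 0.0
    for k in range(j):
        w = A @ q - beta_prev * q_prev      # subtract h_{k-1,k} q_{k-1} = beta_{k-1} q_{k-1}
        alpha[k] = np.vdot(q, w).real       # real because A is Hermitian
        w = w - alpha[k] * q
        beta[k] = np.linalg.norm(w)
        q_prev, q = q, w / beta[k]
        beta_prev = beta[k]
        Q[:, k + 1] = q
    return alpha, beta, Q
```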
How can this then be used in computing the GMRES approximation $x_j = Q_jy_j$ when only the last three columns of $Q_j$ are kept stored? The solution lies in

$$\min_{\deg(p_{j-1}) \leq j-1} \|Ap_{j-1}(A)b - b\| = \min_{\deg(p_{j-1}) \leq j-1} \|p_{j-1}(A)Ab - b\|, \qquad (81)$$

which holds since $A$ and $p(A)$ commute. The latter minimization problem can be solved by executing the Arnoldi method with the starting vector $q_1 = Ab/\|Ab\|$. The minimum is attained with the orthogonal projection of $b$ onto $\mathcal{K}_j(A; Ab)$,

$$v_j = \sum_{l=1}^j (b, q_l)q_l. \qquad (82)$$
This can be computed such that at each step $k$, for $k = 2, 3, \ldots$, an update is made according to

$$v_k = v_{k-1} + (b, q_k)q_k,$$

with $v_1 = (b, q_1)q_1$. Then, because of (80), we obtain (82) by storing only the three most recently computed basis vectors $q_{k-2}$, $q_{k-1}$ and $q_k$. The right-hand side of (81) is realized with $v_j$, i.e., it equals $\|v_j - b\| = \|AA^{-1}v_j - b\|$. The GMRES approximation is hence

$$x_j = A^{-1}v_j = \sum_{l=1}^j (b, q_l)A^{-1}q_l.$$
Denote $\tilde{q}_l = A^{-1}q_l$. Of course, we are not allowed to apply the inverse, and therefore the iteration must be devised in another way. This is possible because the starting vector was $q_1 = Ab/\|Ab\|$. Thus, save $\tilde{q}_1 = A^{-1}q_1 = b/\|Ab\|$ instead. We have in the above Arnoldi method

$$h_{21}q_2 = Aq_1 - h_{11}q_1 = Aq_1 - (Aq_1, q_1)q_1$$

or, equivalently,

$$h_{21}\tilde{q}_2 = q_1 - h_{11}\tilde{q}_1 = q_1 - (Aq_1, q_1)\tilde{q}_1$$

with $q_1 = A\tilde{q}_1$. Hence, we compute $\tilde{q}_2$ instead. Then $q_2$ is recovered by $q_2 = A\tilde{q}_2$. At the $k$th step this means computing

$$h_{k,k-1}\tilde{q}_k = q_{k-1} - \sum_{l=k-2}^{k-1} h_{l,k-1}\tilde{q}_l \qquad (83)$$

by using the three most recently computed $\tilde{q}_{k-2}$, $\tilde{q}_{k-1}$ and $\tilde{q}_k$. In these computations use $q_{k-2} = A\tilde{q}_{k-2}$, $q_{k-1} = A\tilde{q}_{k-1}$, and $q_k = A\tilde{q}_k$ to have $h_{l,k-1} = (Aq_{k-1}, q_l)$. As a result, the approximation $x_j$ is obtained by making an update according to

$$x_k = x_{k-1} + (b, q_k)\tilde{q}_k \qquad (84)$$

at each step $k$, for $k = 2, 3, \ldots$, starting with $x_1 = (b, q_1)\tilde{q}_1$. When the GMRES method is implemented in this way, the algorithm obtained is called the MINRES method. It can be executed when $A$ is Hermitian.

²² This is quite remarkable. The appearing iterates are orthogonal to practically all the vectors computed so far, except three.
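A direct transcription of (81)-(84) into NumPy, as a sketch: it orthonormalizes $\mathcal{K}_j(A; Ab)$ with the short recurrence, carries $\tilde{q}_l = A^{-1}q_l$ alongside without ever forming $A^{-1}$, and updates $x_k$ as in (84). One deviation, stated plainly: instead of recovering $q_k = A\tilde{q}_k$ with an extra product, it updates $q_k$ by the same recurrence, which is equivalent in exact arithmetic. This is not the Paige-Saunders MINRES implementation found in libraries, and breakdown is not handled. The test compares it with a direct minimization over $\mathcal{K}_j(A; b)$ for a small $j$.

```python
import numpy as np

def minres_sketch(A, b, j):
    """j steps of the minimal residual method for Hermitian A, following (81)-(84)."""
    Ab = A @ b
    nrm = np.linalg.norm(Ab)
    q = [Ab / nrm]                     # q_1 = Ab / ||Ab||
    qt = [b / nrm]                     # qt_1 = A^{-1} q_1 = b / ||Ab||
    x = np.vdot(q[0], b) * qt[0]       # x_1 = (b, q_1) qt_1
    for _ in range(1, j):
        w = A @ q[-1]                  # A q_{k-1}
        qt_new = q[-1].copy()          # invariant: qt_new = A^{-1} w
        for l in range(len(q)):        # only the two latest vectors are needed, cf. (80)
            h = np.vdot(q[l], w)       # h_{l,k-1} = (A q_{k-1}, q_l)
            w = w - h * q[l]
            qt_new = qt_new - h * qt[l]
        h_next = np.linalg.norm(w)     # h_{k,k-1}
        q.append(w / h_next)           # q_k (equals A qt_k in exact arithmetic)
        qt.append(qt_new / h_next)     # qt_k as in (83)
        q, qt = q[-2:], qt[-2:]        # discard everything older
        x = x + np.vdot(q[-1], b) * qt[-1]   # update (84)
    return x
```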
Problem 40 Consider $B = \alpha I + \beta A$ with $\alpha \in \mathbb{C}$ and nonzero $\beta \in \mathbb{C}$. Then show that

$$\mathcal{K}_j(A; b) = \mathcal{K}_j(B; b)$$

for every $j$. (This can be used to show that MINRES actually extends to matrices which are translations and rotations of Hermitian matrices.)
The conjugate gradient (CG) method can be derived similarly. The CG method is applicable when $A$ is Hermitian positive definite. It is concerned with the error of the approximation, by solving

$$\min_{\deg(p_{j-1}) \leq j-1} \|p_{j-1}(A)b - A^{-1}b\|_A^2, \qquad (85)$$

where $\|x\|_A = (x, Ax)^{1/2}$ denotes the so-called A-norm of a vector $x \in \mathbb{C}^n$.
Since $\|p_{j-1}(A)b - A^{-1}b\|_A^2 = \|Ap_{j-1}(A)b - b\|_{A^{-1}}^2$, this minimization problem converts into one involving the residual, which can be written as

$$\min_{\deg(p_{j-1}) \leq j-1} \|p_{j-1}(A)Ab - b\|_{A^{-1}}^2 \qquad (86)$$

by repeating the trick (81). This means executing the Arnoldi method with respect to the inner product

$$(x, y)_{A^{-1}} = (x, A^{-1}y)$$

for $x, y \in \mathbb{C}^n$. The point again is that the inverse of $A$ is never explicitly needed, since the starting vector for solving (86) is

$$q_1 = \frac{Ab}{\|Ab\|_{A^{-1}}} = \frac{Ab}{(Ab, b)^{1/2}}.$$
Thereby everything proceeds as with the MINRES method, i.e., first $\tilde{q}_1 = A^{-1}q_1 = b/(Ab, b)^{1/2}$ is stored instead. Then, completely analogously, the vectors $\tilde{q}_l = A^{-1}q_l$ are stored instead. This means that (82) becomes

$$v_j = \sum_{l=1}^j (b, q_l)_{A^{-1}}\,q_l = \sum_{l=1}^j (b, \tilde{q}_l)\,q_l, \qquad (87)$$

so that the CG approximation is $x_j = A^{-1}v_j = \sum_{l=1}^j (b, \tilde{q}_l)\,\tilde{q}_l$. Now it just remains to verify that this is realizable with a three-term recurrence.
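For comparison, here is the classical Hestenes-Stiefel formulation of CG, which is how the method is usually implemented. It is not derived in these notes, but it produces the iterates minimizing the A-norm error (85) over the Krylov subspace, using only short recurrences and one product with $A$ per step. The sketch assumes real symmetric positive definite $A$; the tolerance and iteration cap are our own choices.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Classical CG for real symmetric positive definite A, starting from x_0 = 0."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x                      # residual
    p = r.copy()                       # search direction
    rr = r @ r
    max_iter = max_iter or 10 * len(b)
    for _ in range(max_iter):
        if np.sqrt(rr) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)          # exact line search in the A-norm
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p      # new direction, A-conjugate to the previous ones
        rr = rr_new
    return x
```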
Iterative methods which can be realized in terms of a short-term recurrence have been studied a lot. The CG (conjugate gradient) method was the earliest one. The above MINRES method (or one implementation of it) dates from the 1970s. These methods can be applied more generally when (65) is multiplied from the left by $A^*$ to have the linear system

$$A^*Ax = A^*b. \qquad (88)$$

Now the coefficient matrix is $A^*A$, which is Hermitian positive definite since $A$ was assumed to be invertible. These are the so-called normal equations. The problem with this is that the CG and MINRES methods can be expected to converge very slowly for the normal equations in realistic problems; one reason is that $\kappa(A^*A) = \kappa(A)^2$ in the 2-norm. (And, of course, we must be able to perform matrix-vector products with $A^*$ inexpensively.) Therefore, as a rule, using the normal equations should be avoided, at least when differential equations are being discretized.
Iterative methods applied to (65) when $A$ has no special properties cannot be realized in terms of a short-term recurrence if (77) should hold for the corresponding residuals. Methods such as BiCGSTAB rely on a short-term recurrence at the cost of possibly very irregular behaviour of the residual norms. We do not consider any of these methods in this course.
For more details on iterative methods, see [2].
10 Preconditioning
Typically different iterative methods for solving the linear system (65) behave quite similarly. To put it in other words, if an iterative method converges slowly, it is unlikely that another iterative method would differ drastically in speed. (Bear in mind also what happened in Problem 39.) In situations like this it is necessary to speed up the iterations, i.e., to precondition the linear system. This is needed in most cases.
Preconditioning means the construction of an invertible matrix $M \in \mathbb{C}^{n\times n}$ which multiplies the original linear system (65) from the left so as to have

$$MAx = Mb = c. \qquad (89)$$

This is called left preconditioning. There is an analogous way to precondition from the right. The purpose of preconditioning is to obtain a linear system (with the same solution as the original one) for which iterative methods converge substantially faster than for the original one.
Constructing $M$ is a balancing act. The goal is certainly clear: $M$ should approximate the inverse of $A$. However, the cost and storage required to have $M = A^{-1}$ are overwhelming and therefore completely out of the question. One should thus somehow compute an inexpensive approximation to the inverse of $A$.
In practice, $M$ may not appear explicitly. The reason for this is that one only needs to be able to perform matrix-vector products with it. (Recall that the purpose of the partially pivoted LU factorization is not to compute $A^{-1}$ explicitly, since only matrix-vector products with $A^{-1}$ are needed.) For these reasons, one does not explicitly form the product $MA$ either.
For a Hermitian $A$ one would also like to devise preconditioning in such a way that one could still use the CG or MINRES method for the preconditioned problem. (Note that if $A$ is Hermitian, then $MA$ typically is not.) Example 21 below shows how this can be achieved.
The simplest (and first) approach to preconditioning consists of extracting a part of A which is readily invertible.²³ This can be expressed in terms of splitting A as
$$A = M_1 + (A - M_1) = M_1 + M_2, \qquad (90)$$
where $M_1$ is readily invertible. Then we have $MA = I + M_1^{-1}M_2$ with $M = M_1^{-1}$. In particular, if $M_1^{-1}M_2$ is small in norm, we can estimate the speed of convergence of iterations; see Example 19. Some classical choices for M follow.
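The splitting (90) leads directly to a stationary iteration. The following is a minimal sketch assuming NumPy and dense storage for brevity; the names `splitting_iteration` and `solve_M1` are hypothetical. Since $A = M_1 + M_2$, the update $x \leftarrow x + M_1^{-1}(b - Ax)$ equals $x \leftarrow M_1^{-1}(b - M_2x)$, and its error is multiplied by $-M_1^{-1}M_2$ at each step. The routine touches $M_1$ only through a user-supplied solve, so neither $M$ nor $MA$ is formed. The choice $M_1 = \operatorname{diag}(A)$ in the usage part is only for illustration.

```python
import numpy as np

def splitting_iteration(A, b, solve_M1, iters=50):
    """Stationary iteration for a splitting A = M1 + M2 (a sketch).

    solve_M1(r) must return M1^{-1} r. Neither M = M1^{-1} nor MA is formed.
    """
    x = np.zeros(b.shape, dtype=np.result_type(A, b, float))
    residual_norms = []
    for _ in range(iters):
        # x + M1^{-1}(b - A x) = M1^{-1}(b - M2 x), since A = M1 + M2
        x = x + solve_M1(b - A @ x)
        residual_norms.append(np.linalg.norm(b - A @ x))
    return x, residual_norms

# Hypothetical test problem: tridiagonal A with M1 = diag(A)
n = 20
A = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
d = np.diag(A).copy()
x, res = splitting_iteration(A, b, lambda r: r / d)

# Spectral radius of M1^{-1} M2, formed here only for this check
M1 = np.diag(d)
rho = np.max(np.abs(np.linalg.eigvals(np.linalg.solve(M1, A - M1))))
```

Here the spectral radius is slightly below 1/2, so the residual shrinks roughly by that factor per step; the smaller $M_1^{-1}M_2$ is, the faster the iteration.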
The Jacobi preconditioner is defined by
$$M_1 = \operatorname{diag}(A) + \alpha I$$
with some $\alpha \in \mathbb{C}$ such that $\alpha = 0$ if $\operatorname{diag}(A)$ is invertible. (Here $\operatorname{diag}(A)$ denotes the diagonal matrix obtained by extracting the diagonal of A.) The Jacobi preconditioner is too simple to lead to any significant speed-ups.
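One way to see why Jacobi preconditioning rarely helps much is to try it on a matrix with a constant diagonal. The sketch below assumes NumPy; `jacobi_preconditioner` and `laplacian_1d` are hypothetical helpers, and the tridiagonal matrix $\operatorname{tridiag}(-1, 2, -1)$ is chosen only as a test case. The preconditioner is returned as a function applied to vectors, in line with the remark that M need not appear explicitly.

```python
import numpy as np

def jacobi_preconditioner(A, alpha=0.0):
    """Return r -> M r with M = (diag(A) + alpha I)^{-1} (a sketch).

    alpha = 0 when diag(A) is invertible; a shift is only needed otherwise.
    """
    d = np.diag(A) + alpha
    if np.any(d == 0):
        raise ValueError("diag(A) + alpha I is singular; choose another alpha")
    return lambda r: r / d

def laplacian_1d(n):
    """Tridiagonal matrix tridiag(-1, 2, -1), a hypothetical test matrix."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

results = {}
for n in (10, 100):
    A = laplacian_1d(n)
    apply_M = jacobi_preconditioner(A)
    # Apply M to each column of A; MA is formed here only for inspection
    MA = np.apply_along_axis(apply_M, 0, A)
    # Spectral radius of M1^{-1} M2 = MA - I (symmetric here)
    rho = np.max(np.abs(np.linalg.eigvalsh(MA - np.eye(n))))
    results[n] = (rho, np.linalg.cond(A), np.linalg.cond(MA))
```

For this matrix the spectral radius of $M_1^{-1}M_2$ equals $\cos(\pi/(n+1))$, which tends to one as n grows, and since MA is just $A/2$, the condition number does not change at all.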
The Gauss-Seidel preconditioner refers to a splitting (90) in which $M_1$ is lower triangular and $M_2$ is upper triangular (or vice versa). This choice is unique aside from the diagonal, which can be split in several ways.²⁴
Applying M relies on forward substitution. If A is sparse with O(n) nonzero entries, the cost is O(n) when implemented appropriately. Again, the speed of convergence depends on $M_1^{-1}M_2$.

²³ By readily invertible we mean that matrix-vector products with the inverse are inexpensive.
²⁴ Any particular choice, such as the so-called successive over-relaxation (SOR), is simply a variant of the Gauss-Seidel method.
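To make the O(n) cost claim concrete, here is a minimal sketch of applying the Gauss-Seidel preconditioner $M = M_1^{-1}$, with $M_1$ the lower triangle of A including the diagonal, by forward substitution over compressed sparse row (CSR) arrays. It assumes NumPy, a real matrix and nonzero diagonal entries; `to_csr` and `apply_gauss_seidel` are hypothetical helpers, not a library API. Each stored entry of A is visited once, so the cost is proportional to the number of nonzeros.

```python
import numpy as np

def to_csr(A):
    """Compressed sparse row arrays (indptr, indices, data) of a dense matrix."""
    indptr, indices, data = [0], [], []
    for row in np.asarray(A, dtype=float):
        cols = np.flatnonzero(row)
        indices.extend(cols.tolist())
        data.extend(row[cols].tolist())
        indptr.append(len(indices))
    return np.array(indptr), np.array(indices, dtype=int), np.array(data)

def apply_gauss_seidel(indptr, indices, data, r):
    """Return M r = M1^{-1} r, with M1 the lower triangle of A (incl. diagonal).

    Forward substitution visiting each stored entry of A once: O(nnz(A)).
    Entries of the strictly upper part (M2) are skipped.
    """
    n = len(indptr) - 1
    y = np.zeros(n)
    for i in range(n):
        s, diag = r[i], 0.0
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                s -= data[k] * y[j]      # y[j] is already known for j < i
            elif j == i:
                diag = data[k]
        y[i] = s / diag
    return y

# Hypothetical test problem
n = 8
A = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
r = np.arange(1.0, n + 1)
y = apply_gauss_seidel(*to_csr(A), r)
```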
Problem 41 For any matrix $B \in \mathbb{C}^{n\times n}$ it can be shown that
$$\lim_{j\to\infty} \|B^j\|^{1/j}$$
exists and equals the spectral radius of B, i.e., the maximum of the absolute values of the eigenvalues of B.²⁵ Consider a preconditioned matrix $I - B$ with $B = -M_1^{-1}M_2$. Explain why the Neumann series (see Example 19) can still be used to obtain the inverse of $I - B$ if the spectral radius of B is less than one. Give an example of a matrix B such that $\|B\| > 1$ and the spectral radius of B is less than one.
Problem 42 In the early era of iterative methods, one used the truncated Neumann series to solve the preconditioned linear system $(I - B)x = c$. (This meant that the spectral radius of B had to be less than one.) How much storage does this require when executed by the technique of updating the approximation?
Problem 43 Show that if A is strictly diagonally dominant, then the standard Gauss-Seidel iteration converges.
As a rule, the preconditioning ideas described above are based on a description of a matrix subspace $\mathcal{V}$ whose invertible elements allow a rapid application of the inverse. Then one solves
$$\min_{V \in \mathcal{V}} \|A - V\|, \qquad (91)$$
or some reasonable variant of it, and takes $V^{-1}$ to be the preconditioner M.
Example 20 A practical extension of diagonal matrices consists of taking $\mathcal{V} \subset \mathbb{C}^{n\times n}$ to be the set of block-diagonal matrices. The blocks are of size O(1) and therefore $V^{-1}$ can be computed, possibly in parallel, for any invertible $V \in \mathcal{V}$.
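For the block-diagonal subspace of Example 20, problem (91) in the Frobenius norm has an explicit solution: $\|A - V\|_F^2$ is a sum over entries, so V matches A inside the diagonal blocks and is zero elsewhere. The sketch below assumes NumPy, dense storage and a fixed block size; the helper names are hypothetical. The inverse is applied block by block, each solve being independent of the others.

```python
import numpy as np

def nearest_block_diagonal(A, block_size):
    """Solve (91) in the Frobenius norm over block-diagonal matrices (a sketch).

    Inside the diagonal blocks V can match A exactly and outside them V must
    be zero, so the minimizer keeps the diagonal blocks of A.
    """
    n = A.shape[0]
    return [A[s:s + block_size, s:s + block_size].copy()
            for s in range(0, n, block_size)]

def apply_block_inverse(blocks, r):
    """Return V^{-1} r; the small solves are independent (parallelizable)."""
    pieces, s = [], 0
    for B in blocks:
        k = B.shape[0]
        pieces.append(np.linalg.solve(B, r[s:s + k]))
        s += k
    return np.concatenate(pieces)

def to_dense(blocks):
    """Assemble the block-diagonal V from its blocks (for checking only)."""
    n = sum(B.shape[0] for B in blocks)
    V = np.zeros((n, n), dtype=blocks[0].dtype)
    s = 0
    for B in blocks:
        k = B.shape[0]
        V[s:s + k, s:s + k] = B
        s += k
    return V

# Hypothetical test matrix with n = 10 and blocks of size 3 (last block 1-by-1)
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10)) + 5 * np.eye(10)
blocks = nearest_block_diagonal(A, 3)
V = to_dense(blocks)
```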
Problem 44 Let $\mathcal{V}$ be the set of circulant matrices. How do you solve (91) in the Frobenius norm?
Next we suppose $A \in \mathbb{C}^{n\times n}$ is sparse. Let $\mathcal{W}$ and $\mathcal{V}$ be two standard sparse nonsingular matrix subspaces of $\mathbb{C}^{n\times n}$. (Recall Definition 4.) A matrix subspace is said to be sparse if the union of the sparsity patterns of its elements is sparse. Standard means that the sparsity patterns determine the dimensions of $\mathcal{W}$ and $\mathcal{V}$. It is assumed that the nonsingular elements of $\mathcal{V}$ allow a rapid application of the inverse.

²⁵ This is the so-called Gelfand formula.
Under these assumptions, consider the minimization problem (30). Regarding constraints, we consider
$$\min_{W \in \mathcal{W},\, V \in \mathcal{V}} \|AW - V\|_F \qquad (92)$$
with the columns of either W or V constrained to have a fixed norm. The task is thus to precondition A with W from the right, with the aim of having a matrix AW which can be well approximated by an easily invertible matrix V. Then rapid convergence of iterations can be expected if V is reasonably well conditioned. Unlike in the usual paradigm of preconditioning, iterations are not expected to converge rapidly for AW. They should converge rapidly for $AWV^{-1}$.
Consider the minimization problem (92) under the assumption that the columns of V are constrained to be unit vectors, i.e., of norm one. Based on the sparsity structure of W and the corresponding columns of A, the aim is to first choose V optimally. Thereafter W is determined optimally.
To describe the method, denote by $w_j$ and $v_j$ the jth columns of W and V.
The column $v_j$ is computed first as follows. Assume that $k_j \ll n$ nonzero entries can appear in $w_j$ at prescribed positions, and denote by $A_j \in \mathbb{C}^{n\times k_j}$ the matrix with the corresponding columns of A extracted. Compute the sparse QR factorization
$$A_j = Q_j R_j \qquad (93)$$
of $A_j$. Sparse QR factorization means that since $A_j$ has just a few nonzero entries per column, $Q_j$ is also sparse by the fact that $k_j$ is small. (Why is $Q_j$ also sparse? Recall that the lth column of $Q_j$ is a linear combination of the first l columns of $A_j$.)
Remember that the columns of $Q_j$ form an orthonormal basis of the column space of $A_j$. Assume that $l_j \ll n$ nonzero entries can appear in $v_j$ at prescribed positions, and denote by $M_j \in \mathbb{C}^{k_j\times l_j}$ the matrix with the corresponding columns of $Q_j^*$ extracted. Then a unit vector $v_j$, regarded as a vector in $\mathbb{C}^{l_j}$, is computed satisfying
$$\|M_j v_j\| = \|M_j\|, \qquad (94)$$
i.e., $v_j$ is chosen in such a way that its component in the column space of $A_j$ is as large as possible. It can be found by computing the singular value decomposition of $M_j$: $v_j$ is a right singular vector corresponding to the largest singular value. (The computational cost is completely marginal since $M_j$ is only a $k_j$-by-$l_j$ matrix.)
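The computation of a single column $v_j$ via (93) and (94) can be sketched as follows, assuming NumPy and dense storage (a sparse QR code would be used in practice). The function name `compute_v_column` and the index lists `w_pattern`, `v_pattern` are hypothetical. The component of $v_j$ in the column space of $A_j$ has norm $\|Q_j^*v_j\| = \|M_jv_j\|$, which the top right singular vector of $M_j$ maximizes.

```python
import numpy as np

def compute_v_column(A, w_pattern, v_pattern):
    """Compute v_j via (93)-(94) in dense storage (a sketch).

    w_pattern: the k_j prescribed nonzero positions of w_j
    v_pattern: the l_j prescribed nonzero positions of v_j
    Returns v_j as a unit vector of length n and ||M_j||.
    """
    Aj = A[:, w_pattern]                  # n-by-k_j, tall and skinny
    Qj, _ = np.linalg.qr(Aj)              # reduced QR (93): Q_j is n-by-k_j
    Mj = Qj.conj().T[:, v_pattern]        # k_j-by-l_j submatrix of Q_j^*
    _, s, Vh = np.linalg.svd(Mj)
    v_small = Vh[0].conj()                # right singular vector of sigma_max
    vj = np.zeros(A.shape[0], dtype=np.result_type(A.dtype, v_small.dtype))
    vj[v_pattern] = v_small               # embed into C^n
    return vj, s[0]

# Hypothetical example: tridiagonal A, both patterns {j-1, j, j+1} with j = 4
n = 10
A = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
pattern = [3, 4, 5]
vj, norm_Mj = compute_v_column(A, pattern, pattern)
```

Since $Q_j$ has orthonormal columns, $\|M_j\| \le 1$, with equality when some admissible unit vector lies entirely in the column space of $A_j$.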
Suppose the column $v_j$ has been computed as just described for $j = 1, \ldots, n$. Then solve the least squares problems
$$\min_{w_j \in \mathbb{C}^{k_j}} \|A_j w_j - v_j\|_2 \qquad (95)$$
to obtain the columns $w_j$ of W.
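Given the QR factorization (93), problem (95) reduces to a small triangular system: with $A_j = Q_jR_j$ the minimizer is $w_j = R_j^{-1}Q_j^*v_j$, assuming $A_j$ has full column rank. A minimal sketch with NumPy follows; `solve_w_column` is a hypothetical name, and the random data only serve as a test.

```python
import numpy as np

def solve_w_column(Aj, vj):
    """Solve (95), min_w ||A_j w - v_j||_2, with a reduced QR of A_j (a sketch).

    Assumes A_j has full column rank. np.linalg.solve ignores the triangular
    structure of R_j; a dedicated triangular solve would be used in practice.
    """
    Qj, Rj = np.linalg.qr(Aj)
    return np.linalg.solve(Rj, Qj.conj().T @ vj)

# Hypothetical data: a tall and skinny A_j and a unit vector v_j
rng = np.random.default_rng(1)
Aj = rng.standard_normal((12, 3))
vj = rng.standard_normal(12)
vj /= np.linalg.norm(vj)
wj = solve_w_column(Aj, vj)
```

The $k_j$ entries of the result are then placed into column j of W at the prescribed positions.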
For each pair $v_j$ and $w_j$ of columns, the computational cost consists of computing the sparse QR factorization (93) and, by using it, solving (94) and (95). Codes are available for the sparse QR factorization. (Here $A_j$ has the special property of being very tall and skinny.)
For simplicity, consider the case $\mathcal{V} = \mathbb{C}I$. Moreover, let $\mathcal{W}$ be a standard matrix subspace having only O(1) nonzero entries in each column. Then, by constraining the columns of V to be of unit length, the problem reduces to
$$\min_{W \in \mathcal{W}} \|AW - I\|_F^2, \qquad (96)$$
for which freely available software exists [3]. The resulting matrix W is called a sparse approximate inverse of A, i.e., an approximation to $A^{-1}$. It is instructive to compare this with (91), which yields an approximation to A.
Since W is sparse, with each column containing only O(1) entries at prescribed positions, the problem separates into minimizing
$$\sum_{j=1}^{n} \|Aw_j - e_j\|_2^2 = \sum_{j=1}^{n} \|A_j w_j - e_j\|_2^2. \qquad (97)$$
Here $A_j \in \mathbb{C}^{n\times k_j}$ consists of those columns of A at which the column $w_j$ is allowed to have nonzero entries (on the right-hand side, $w_j$ is regarded as a vector in $\mathbb{C}^{k_j}$). A single minimization problem
$$\min_{w_j \in \mathbb{C}^{k_j}} \|A_j w_j - e_j\|_2 \qquad (98)$$
is solved by computing the sparse QR factorization $A_j = Q_j R_j$ of $A_j$.

Observe that one is free to choose the sparsity structure of W. This is the tough part of this approach: how should it be chosen wisely?
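The column-by-column structure of (97) and (98) can be sketched as below, assuming NumPy and dense storage; `sparse_approximate_inverse` is a hypothetical name and is not the interface of the SPAI software [3]. Giving $w_j$ the sparsity pattern of the jth column of A is only one simple heuristic, used here for illustration. Because diagonal matrices belong to this pattern, the result can be no worse in the Frobenius norm than Jacobi scaling, which the test checks.

```python
import numpy as np

def sparse_approximate_inverse(A, patterns):
    """Minimize ||A W - I||_F over W with prescribed column patterns (a sketch).

    patterns[j] lists the rows where column w_j may be nonzero. Each column
    is the independent least squares problem (98), solved with a reduced QR
    of A_j; the loop over j could run in parallel. Dense storage for brevity.
    """
    n = A.shape[0]
    W = np.zeros((n, n), dtype=A.dtype)
    for j, J in enumerate(patterns):
        Aj = A[:, J]                            # n-by-k_j
        Qj, Rj = np.linalg.qr(Aj)
        ej = np.zeros(n, dtype=A.dtype)
        ej[j] = 1
        W[J, j] = np.linalg.solve(Rj, Qj.conj().T @ ej)
    return W

# Hypothetical choice: give w_j the sparsity pattern of the jth column of A
n = 30
A = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
patterns = [np.flatnonzero(A[:, j]) for j in range(n)]
W = sparse_approximate_inverse(A, patterns)
```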
Problem 45 Suppose there are exactly 5 nonzero entries in each column of $A \in \mathbb{C}^{n\times n}$. Suppose each column of W is allowed to have 7 nonzero entries at some prescribed positions. How sparse is the last column of $Q_j$? Assume you manage to program the solution of (98) such that no zero entries are used in the computations. How much does it cost to compute W?
The computation of a sparse approximate inverse parallelizes very well, since the minimization problems (98) can be solved independently of each other. Good parallelizability is likely to become increasingly important in the future. A drawback of the approach stems from the observation that the inverse of a sparse matrix can be expected to be dense in general. This means that the approach is not entirely feasible. The denseness is related to the fact that the inverse of A is a polynomial in A, and each power $A^j$ is typically denser than the preceding powers of A.
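The growth of nonzeros can be observed on a small example. The sketch below assumes NumPy and uses a tridiagonal test matrix (a hypothetical choice): the bandwidth of $A^j$ grows by one with each power, and the inverse has no zero entries at all, even though A has only $3n - 2$ nonzeros. The threshold `tol` for counting an entry as nonzero is an assumption of the sketch.

```python
import numpy as np

def count_nonzeros(X, tol=1e-14):
    """Number of entries with magnitude above tol."""
    return int(np.count_nonzero(np.abs(X) > tol))

# Hypothetical example: a tridiagonal matrix with 3n - 2 nonzero entries
n = 12
A = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

# The bandwidth of A^j grows with j, so each power is denser than the last
power_nnz = [count_nonzeros(np.linalg.matrix_power(A, j)) for j in range(1, 5)]
# The inverse is completely dense
inverse_nnz = count_nonzeros(np.linalg.inv(A))
```

With n = 12 the powers $A, A^2, A^3, A^4$ have 34, 54, 72 and 88 nonzeros, while the inverse has all 144 entries nonzero.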
Next we consider a preconditioning technique based on approximating an algorithm²⁶ instead of approximating a matrix (such as A or its inverse).

Preconditioning by the so-called incomplete factorizations means simplifying standard algorithms, such as Gaussian elimination for the LU factorization, to get factorizations which are inexpensive to compute and require little storage. The resulting factorizations are approximate. However, unlike the preconditioners introduced above, they do not satisfy any optimality conditions. Rather, they are based on approximately executing the original algorithm by replacing it with a simpler one.

For the basic incomplete LU factorization, also called ILU(0), this means the following change in Gaussian elimination. Nonzero entries in the L and U factors are accepted only at those positions where there are nonzeros in the sparsity structure of A.²⁷ Other entries that appear are discarded, i.e., set to zero. The reason for this approach is that even if A is sparse, the (partially pivoted) L and U factors are considerably denser, typically unacceptably so.
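²⁶ Algorithm here means a given function of the entries of A. "Approximation" is not quite the correct expression here. Rather, the function is simply replaced with a simpler one, somewhat ad hoc.
²⁷ Diagonal entries are also included, in case there are zero entries on the diagonal of A.

The method is best illustrated with a small example. Let
$$A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & 0 & 3 \\ 1 & 0 & -1 & 0 \\ 0 & 0 & -4 & 1 \end{bmatrix}.$$
Then
$$L_1A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & 0 & 3 \\ 1 & 0 & -1 & 0 \\ 0 & 0 & -4 & 1 \end{bmatrix} = A_1 + R_1 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & -1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 0 & -4 & 1 \end{bmatrix} + R_1.$$
Discard $R_1$. Thereafter
$$L_2A_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & -1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 0 & -4 & 1 \end{bmatrix} = A_2 + R_2 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -2 & 0 \\ 0 & 0 & -4 & 1 \end{bmatrix} + R_2.$$
Discard $R_2$. Finally
$$L_3A_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -2 & 0 \\ 0 & 0 & -4 & 1 \end{bmatrix} = A_3 + R_3 = U + R_3 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} + R_3.$$
This yields the approximate upper triangular factor U. Discard $R_3$ and set
$$L = (L_3L_2L_1)^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 0 & 2 & 1 \end{bmatrix}$$
to obtain the approximate lower triangular factor L. The union of their sparsity structures coincides with that of A.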
This was ILU(0). Allowing more fill-in in the approximate factors leads to better (but more costly) ILU preconditioners. ILU preconditioning techniques have been developed to a high level of sophistication, and high-quality software is available for computing ILU preconditioners. Their poor parallelizability is the most serious drawback of these algorithms.
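The elimination-with-dropping procedure can be sketched in a few lines. This is a minimal dense-storage version assuming NumPy and no pivoting; `ilu0` is a hypothetical name, not the interface of any particular ILU package. An update is kept only where A is nonzero or on the diagonal, which reproduces the column-by-column steps of the example: the multipliers end up in L and everything outside the pattern is dropped.

```python
import numpy as np

def ilu0(A):
    """ILU(0) in dense storage (a sketch).

    Gaussian elimination without pivoting in which an update is kept only at
    positions where A is nonzero or on the diagonal; all other fill-in is
    discarded. Returns a unit lower triangular L and an upper triangular U.
    """
    n = A.shape[0]
    F = np.array(A, dtype=np.result_type(A, float))    # working copy
    pattern = (F != 0) | np.eye(n, dtype=bool)           # sparsity structure of A
    for k in range(n - 1):
        if F[k, k] == 0:
            raise ZeroDivisionError("zero pivot encountered in ILU(0)")
        for i in range(k + 1, n):
            if not pattern[i, k]:
                continue
            F[i, k] /= F[k, k]                            # multiplier l_ik
            for j in range(k + 1, n):
                if pattern[i, j]:                         # fill-in elsewhere is dropped
                    F[i, j] -= F[i, k] * F[k, j]
    return np.tril(F, -1) + np.eye(n), np.triu(F)

# A 4-by-4 test matrix with the same sparsity pattern as the example above
A = np.array([[1.0, 1, 1, 1],
              [2, 1, 0, 3],
              [1, 0, -1, 0],
              [0, 0, -4, 1]])
L, U = ilu0(A)
```

A characteristic property of ILU(0) is that $A - LU$ vanishes at every position where A is nonzero; the discrepancy sits only at the dropped positions.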
Example 21 Regarding the structure, for Hermitian problems one certainly tries to devise preconditioners which retain the structure. Then ILU(0) yields an incomplete Cholesky factorization $LL^*$. As a result, the original linear system (65) converts into
$$L^{-1}AL^{-*}\tilde{x} = L^{-1}b. \qquad (99)$$
(Of course, the inversions are not performed explicitly.) Now $L^{-1}AL^{-*}$ is still Hermitian since A is. Approximations $\tilde{x}_j$ to the solution of (99) can still be computed with the CG or MINRES method. Then $x_j = L^{-*}\tilde{x}_j$ is an approximation to the solution of the original problem.
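The symmetric preconditioning (99) can be sketched as follows, assuming NumPy, a real symmetric positive definite A (as CG requires) and dense storage. The factor is computed directly by an incomplete Cholesky IC(0) routine rather than via ILU(0); `ic0`, `cg` and the 2D Laplacian test matrix are hypothetical choices for illustration. CG only ever sees the operator $\tilde{x} \mapsto L^{-1}AL^{-T}\tilde{x}$, applied through triangular solves, and the solution of the original problem is recovered as $x = L^{-T}\tilde{x}$.

```python
import numpy as np

def ic0(A):
    """Incomplete Cholesky IC(0) of a real symmetric positive definite A (a sketch).

    Returns lower triangular L with the sparsity pattern of tril(A) such that
    (L L^T)_ij = a_ij at every position where A is nonzero.
    """
    n = A.shape[0]
    L = np.tril(np.array(A, dtype=float))
    pattern = L != 0
    for k in range(n):
        L[k, k] = np.sqrt(L[k, k])
        for i in range(k + 1, n):
            if pattern[i, k]:
                L[i, k] /= L[k, k]
        for j in range(k + 1, n):                 # update the trailing part,
            for i in range(j, n):                 # dropping fill-in
                if pattern[i, j]:
                    L[i, j] -= L[i, k] * L[j, k]
    return L

def cg(apply_op, rhs, tol=1e-10, maxiter=500):
    """Plain conjugate gradients for a symmetric positive definite operator."""
    x = np.zeros_like(rhs)
    r = rhs.copy()
    p = r.copy()
    rr = r @ r
    for it in range(maxiter):
        if np.sqrt(rr) <= tol * np.linalg.norm(rhs):
            return x, it
        Ap = apply_op(p)
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x, maxiter

# Hypothetical test problem: 2D Laplacian on an 8-by-8 grid
m = 8
T = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
A = np.kron(np.eye(m), T) + np.kron(T, np.eye(m))
b = np.ones(m * m)

L = ic0(A)
# y -> L^{-1} A L^{-T} y via two triangular solves (np.linalg.solve for brevity)
apply_op = lambda y: np.linalg.solve(L, A @ np.linalg.solve(L.T, y))
x_tilde, iterations = cg(apply_op, np.linalg.solve(L, b))
x = np.linalg.solve(L.T, x_tilde)     # x = L^{-T} x_tilde solves A x = b
```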
The preconditioners discussed above are algebraic in the sense that they require only A itself. No information about where A originates from is used. For PDEs in particular, there are so-called multigrid and domain decomposition methods for producing preconditioners. These techniques use properties such as how the PDE has been discretized; see, e.g., [2]. Although they are important, these methods are not covered in this course.
References
[1] G.H. Golub and C.F. Van Loan, Matrix Computations, 3rd ed., The Johns Hopkins University Press, 1996.
[2] A. Greenbaum, Iterative Methods for Solving Linear Systems, SIAM, Philadelphia, 1997.
[3] SPAI, http://www.computational.unibas.ch/software/spai/
[4] L.N. Trefethen and D. Bau, III, Numerical Linear Algebra, SIAM, Philadelphia, 1997.