
Chapter 5

Algebraic Eigenvalue
Problems
Eigenvalue analysis is an important practice in many fields of engineering and physics. We have also seen in this course that eigenvalue analysis plays an important role in the performance of many numerical algorithms. A general eigenvalue problem is stated as follows:

Definition 5.0.1 Given n × n matrices A and B, find numbers λ such that the equation

    Ax = λBx        (5.1)

is satisfied for some nontrivial vector x ≠ 0.
If B is invertible, then (5.1) can be reduced to

    Cx = λx,   where C := B^{-1}A.        (5.2)

Even if both A and B are real-valued, it is likely that λ and x are complex-valued. Finding the solution of eigensystems is a fairly complicated procedure. It is at least as difficult as finding the roots of polynomials. Therefore, any numerical method for solving eigenvalue problems is expected to be iterative in nature. Algorithms for solving eigenvalue problems include the power method, subspace iteration, the QR algorithm, the Jacobi method, the Arnoldi method and the Lanczos algorithm.
Some major references in this field are given at the end of this note.

5.1 Perturbation Theory

Given A = (a_ij) ∈ R^{n×n}, define r_i := Σ_{j≠i} |a_ij| for all i. Let C_i := {x ∈ C : |x − a_ii| ≤ r_i}. Then
Theorem 5.1.1 (Gerschgorin) Every eigenvalue of A belongs to one of the circles C_i. Moreover, if m of the circles form a connected set S disjoint from the remaining n − m circles, then S contains exactly m eigenvalues.
(pf): Let λ be an eigenvalue of A and let x be the corresponding eigenvector, i.e. Ax = λx. Let k be the index for which ‖x‖_∞ = |x_k|. Then Σ_j a_kj x_j = λ x_k implies (λ − a_kk) x_k = Σ_{j≠k} a_kj x_j. It follows that |λ − a_kk| ≤ r_k, i.e., λ ∈ C_k.
We know that eigenvalues are roots of the characteristic polynomial p(λ) = det(λI − A), whose coefficients are polynomials in the a_ij. We also know that the zeros of a polynomial depend continuously on its coefficients. So we conclude that the eigenvalues of a matrix depend continuously on its elements.
Consider A(ε) := D + ε(A − D) where D is the diagonal of A and 0 ≤ ε ≤ 1. When ε = 0, all circles degenerate into points since A(0) = D. As ε increases, all circles expand. Note A(1) = A. As ε is changed, each eigenvalue λ(ε) varies continuously in the complex plane and marks out a path from λ(0) to λ(1). Circles that are disjoint for ε = ε₀ are necessarily disjoint for all ε < ε₀. By the continuity of λ_i(ε) in ε, eigenvalues cannot go from a connected set to another disjoint set. The theorem is proved.
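The theorem is easy to check numerically. Below is a minimal numpy sketch (the test matrix and the function name are illustrative, not from these notes) that collects the disks C_i and verifies that every computed eigenvalue of A lies in at least one of them.

    import numpy as np

    def gerschgorin_disks(A):
        """Return the pairs (a_ii, r_i) defining the disks C_i = {x : |x - a_ii| <= r_i}."""
        A = np.asarray(A)
        radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))   # r_i = sum_{j != i} |a_ij|
        return list(zip(np.diag(A), radii))

    A = np.array([[ 4.0, 1.0,  0.0],
                  [ 0.5, 2.0,  0.5],
                  [ 0.0, 1.0, -1.0]])
    disks = gerschgorin_disks(A)
    for lam in np.linalg.eigvals(A):
        assert any(abs(lam - center) <= radius + 1e-12 for center, radius in disks)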

Theorem 5.1.2 (Bauer-Fike) Suppose μ is an eigenvalue of A + E ∈ C^{n×n} and P^{-1}AP = D = diag{λ_1, ..., λ_n}. Then

    min_i |λ_i − μ| ≤ κ(P) ‖E‖        (5.3)

where ‖·‖ denotes any induced norm and κ(P) := ‖P‖ ‖P^{-1}‖.
(pf): Consider (A + E)x = μx. Then (μI − A)x = Ex and (μI − D)P^{-1}x = (P^{-1}EP)P^{-1}x. Assuming μ ≠ λ_i for all i, it follows that P^{-1}x = (μI − D)^{-1}(P^{-1}EP)P^{-1}x. If an induced norm is used, then ‖(μI − D)^{-1}‖ = max_i 1/|μ − λ_i| = 1/min_i |μ − λ_i|. Taking norms gives ‖P^{-1}x‖ ≤ ‖(μI − D)^{-1}‖ κ(P) ‖E‖ ‖P^{-1}x‖, and hence min_i |μ − λ_i| ≤ κ(P) ‖E‖. The theorem is proved.





Example. Let

    A = [ 101  −90 ]        E = [ −ε  −ε ]
        [ 110  −98 ],           [  0   0 ].

The eigenvalues of A are {1, 2}. The eigenvalues of A + E are {(3 − ε ± √(1 − 838ε + ε²))/2}. When ε = 0.001, the eigenvalues are approximately {1.298, 1.701}. Thus min |λ − μ| ≈ 0.298. This example shows that a small perturbation E can lead to a relatively large perturbation in the eigenvalues of A.
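As a quick numerical check of this example (a sketch; the signs of the entries are as reconstructed above), one can compute the perturbed eigenvalues directly with numpy:

    import numpy as np

    eps = 1e-3
    A = np.array([[101.0, -90.0],
                  [110.0, -98.0]])
    E = np.array([[-eps, -eps],
                  [ 0.0,  0.0]])
    print(np.linalg.eigvals(A))        # approximately {1, 2}
    print(np.linalg.eigvals(A + E))    # approximately {1.298, 1.701}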
Remark. When A is a normal matrix, i.e., when AA* = A*A (this class of matrices includes symmetric matrices, orthogonal matrices, hermitian matrices, etc.), it is known that P may be chosen to be a unitary matrix (i.e., P*P = I). If we use the L_2-norm, then ‖P‖_2 = ‖P^{-1}‖_2 = 1. In this case, we have min_i |λ_i − μ| ≤ ‖E‖_2. This implies that eigenvalue problems for normal matrices are well-conditioned.
Remark. When talking about the perturbation of eigenvectors, some caution should be taken.
(1) Eigenvectors are unique only up to multiplicative constants. Continuity should be discussed only under the assumption that all vectors are normalized in the same way.

(2) There are difficulties associated with multiple eigenvalues. For example, the matrix

    [ 1 + ε   0 ]
    [   0     1 ]

has eigenvectors [1, 0]^T and [0, 1]^T, whereas the matrix I has eigenvectors everywhere in R².
Examples. (1) Consider the matrix

    A = [ 1 1 0 0 0 ]
        [ 0 1 1 0 0 ]
        [ 0 0 1 1 0 ]
        [ 0 0 0 1 1 ]
        [ 0 0 0 0 1 ]

which is already in Jordan canonical form. It is not diagonalizable and has λ = 1 as an n-fold eigenvalue. Consider a slightly perturbed matrix

    A + E = [ 1 1 0 0 0 ]
            [ 0 1 1 0 0 ]
            [ 0 0 1 1 0 ]
            [ 0 0 0 1 1 ]
            [ ε 0 0 0 1 ].

The eigenvalues of A + E are roots of p(λ) = (1 − λ)^n − (−1)^n ε = 0. So λ_i − 1 = (ε^{1/n})_i, where (ε^{1/n})_i is the i-th of the n-th roots of ε. Note that |λ_i(ε) − λ_i| = |ε|^{1/n}. With n = 10 and ε = 10^{-10}, we find that |λ_i(ε) − λ_i| = 0.1. Note also that the matrix A has only one eigenvector whereas A + E has a complete set of eigenvectors.
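A small numpy sketch (using the n = 10, ε = 10^{-10} case mentioned above; the construction of A and E is illustrative) shows the ε^{1/n} sensitivity directly:

    import numpy as np

    n, eps = 10, 1e-10
    A = np.eye(n) + np.diag(np.ones(n - 1), 1)   # n-by-n Jordan block with eigenvalue 1
    E = np.zeros((n, n))
    E[n - 1, 0] = eps                            # perturb the (n, 1) entry
    lams = np.linalg.eigvals(A + E)
    print(np.max(np.abs(lams - 1.0)))            # approximately eps**(1/n) = 0.1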

(2) Consider the matrix

    A = [ 2  −10^{10} ]
        [ 0      2    ].

A is not diagonalizable. The exact eigenvalues of A are {2, 2}. Suppose that μ = 1 is an approximate eigenvalue with approximate eigenvector x = [1, 10^{-10}]^T. Then we find the residual r := Ax − μx = [0, 10^{-10}]^T. Obviously, the small residual is misleading: μ = 1 is nowhere near an eigenvalue of A.
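The misleading residual can be checked in the same way (signs as reconstructed above):

    import numpy as np

    A = np.array([[2.0, -1e10],
                  [0.0,  2.0]])
    x = np.array([1.0, 1e-10])
    mu = 1.0
    print(A @ x - mu * x)   # [0., 1e-10]: a tiny residual, although mu = 1 is far from the true eigenvalue 2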

5.2 Power Method and Its Variants

Given a matrix A ∈ C^{n×n} and x^(0) ∈ C^n, define the iteration x^(k+1) := Ax^(k) for k = 0, 1, .... What will happen to the sequence {x^(k)}? If A is convergent (i.e., A^k → 0), then x^(k) = A^k x^(0) → 0. If A is not convergent, then x^(k) may grow without bound. Either case is undesirable. So we modify the process as follows.
Algorithm 5.2.1 (Power Method)
Given x^(0) ∈ C^n arbitrary,
For k = 1, 2, ..., do
    w^(k) := A x^(k−1)
    x^(k) := w^(k) / ‖w^(k)‖_∞
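A minimal numpy implementation of Algorithm 5.2.1 (using the 2-norm instead of the sup-norm; the Rayleigh-quotient eigenvalue estimate and the stopping rule are illustrative choices, not prescribed by the notes) might look as follows.

    import numpy as np

    def power_method(A, x0, maxiter=1000, tol=1e-10):
        """Power method: approximate dominant eigenpair of A."""
        x = x0 / np.linalg.norm(x0)
        lam = 0.0
        for _ in range(maxiter):
            w = A @ x                          # w^(k) := A x^(k-1)
            x = w / np.linalg.norm(w)          # x^(k) := w^(k) / ||w^(k)||
            lam_new = x @ A @ x                # eigenvalue estimate for lambda_1
            if abs(lam_new - lam) < tol:
                return lam_new, x
            lam = lam_new
        return lam, x

    A = np.array([[2.0, 1.0], [1.0, 3.0]])
    lam, x = power_method(A, np.array([1.0, 0.0]))
    print(lam)    # approximately 3.618, the dominant eigenvalue of A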

There is nothing special about doing a sup-norm normalization. In fact, any kind of normalization will be fine. We now examine the convergence properties of the power method.
For illustration purposes, we shall assume A is diagonalizable with eigenvalues |λ_1| > |λ_2| ≥ ... ≥ |λ_n|. Let the corresponding eigenvectors be denoted by x_1, ..., x_n. Detailed discussion of other cases can be found in [1]. Suppose x^(0) = Σ_{i=1}^n α_i x_i. Then

    Ax^(0) = Σ_{i=1}^n α_i λ_i x_i,

    A^k x^(0) = Σ_{i=1}^n α_i λ_i^k x_i = λ_1^k ( α_1 x_1 + Σ_{i=2}^n α_i (λ_i/λ_1)^k x_i ).
Assume α_1 ≠ 0 (this is guaranteed if x^(0) is selected randomly). We observe

    x^(k) = A x^(k−1) / ‖A x^(k−1)‖ = A (w^(k−1)/‖w^(k−1)‖) / ‖A (w^(k−1)/‖w^(k−1)‖)‖ = A² x^(k−2) / ‖A² x^(k−2)‖ = ... = A^k x^(0) / ‖A^k x^(0)‖.

It is clear that as k → ∞, the vector A^k x^(0) behaves like α_1 λ_1^k x_1 in the sense that the contributions from x_2, ..., x_n become less and less significant. Normalization makes x^(k) ≈ (λ_1^k / |λ_1|^k) x_1/‖x_1‖. That is, x^(k) converges (in direction) to an eigenvector associated with the eigenvalue λ_1. Also, w^(k+1) = A x^(k) ≈ λ_1 x^(k). So

    (A x^(k))_j / (x^(k))_j → λ_1.

Remark. It is clear that the rate of convergence of the power method depends on the ratio |λ_2/λ_1|.
Remark. The eigenvalues of the matrix A − bI for a scalar b are λ_i − b if and only if the λ_i are eigenvalues of A. With this shift in mind, we can work on the matrix A − bI instead of A, with the hope that the ratio of the first two dominant eigenvalues λ_i − b will become smaller. This is called a shifted power method. It is not hard to find that its applicability is very limited. For example, assume all eigenvalues are real and are distributed as follows:

    [Figure: eigenvalues marked on the real line at −2, −1 and 4, with the ranges of the shift b annotated "converges to λ_1" and "converges to λ_3".]

Then for any choice of b the shifted power method will converge to either λ_1 or λ_3, but not to any other eigenvalue. The idea is useful, however, in the following setting.


Suppose A is nonsingular, then A1 has eigenvalues
the power method to A1 , i.e., we dene

5
1
1
1 , . . . , n .

We apply

Algorithm 5.2.2 (Inverse Power Method)
Given x^(0) ∈ C^n arbitrary,
For k = 1, 2, ..., do
    Solve A w^(k) = x^(k−1)
    x^(k) := w^(k) / ‖w^(k)‖

If we assume |λ_1| ≥ ... ≥ |λ_{n−1}| > |λ_n|, then {x^(k)} converges to an eigenvector associated with λ_n at a rate depending on |λ_n/λ_{n−1}|. Now we incorporate the shift idea to obtain the following algorithm.
Algorithm 5.2.3 (Shifted Inverse Power Method)
Given x^(0) ∈ C^n arbitrary,
For k = 1, 2, ..., do
    Solve (A − bI) w^(k) = x^(k−1)
    x^(k) := w^(k) / ‖w^(k)‖

The eigenvalues of (A − bI)^{-1} are 1/(λ_1 − b), ..., 1/(λ_n − b). So whenever b is chosen close enough to the eigenvalue λ_i, the sequence {x^(k)} produced by the shifted inverse power method converges to an eigenvector associated with the eigenvalue 1/(λ_i − b), i.e., to an eigenvector of A associated with λ_i.
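A numpy sketch of Algorithm 5.2.3 (the fixed iteration count and the final Rayleigh-quotient readout are illustrative additions; a practical code would factor A − bI once and reuse the factors):

    import numpy as np

    def shifted_inverse_power(A, b, x0, iters=50):
        """Shifted inverse power method: eigenpair of A with eigenvalue closest to b."""
        n = A.shape[0]
        x = x0 / np.linalg.norm(x0)
        for _ in range(iters):
            w = np.linalg.solve(A - b * np.eye(n), x)   # (A - bI) w^(k) = x^(k-1)
            x = w / np.linalg.norm(w)                   # x^(k) := w^(k) / ||w^(k)||
        lam = x @ A @ x / (x @ x)                       # Rayleigh quotient of the limit vector
        return lam, x

    A = np.diag([1.0, 3.0, 7.0])
    print(shifted_inverse_power(A, 2.9, np.ones(3))[0])   # approximately 3.0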
Example. Suppose the eigenvalues are distributed as follows:

    [Figure: eigenvalues marked on the real line (at −2, −1, ...), with longer vertical bars drawn between adjacent eigenvalues.]

The longer vertical bars separate the regions of b from which the shifted inverse power method will find, respectively, the different eigenvalues.
Remark. There are several ways to choose the shift b:
(1) If some estimate of λ_i has been found, we may use it for b.
(2) We may generate b^(0) at random. Then define

    b^(k+1) := (w^(k))* A w^(k) / (w^(k))* w^(k)        (5.4)

successively. This is a variable-shift inverse power method, and is known as the Rayleigh Quotient Iteration. It can be shown that the rate of convergence is cubic [2].
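A sketch of the Rayleigh Quotient Iteration (assuming A is symmetric, the setting in which the cubic rate of [2] is usually stated; the handling of a numerically singular shift is an illustrative detail):

    import numpy as np

    def rayleigh_quotient_iteration(A, x0, maxiter=20):
        """Inverse iteration with the Rayleigh quotient as a variable shift."""
        n = A.shape[0]
        x = x0 / np.linalg.norm(x0)
        b = x @ A @ x                                    # b^(0): Rayleigh quotient of x^(0)
        for _ in range(maxiter):
            try:
                w = np.linalg.solve(A - b * np.eye(n), x)
            except np.linalg.LinAlgError:                # A - bI numerically singular: b is (almost) an eigenvalue
                break
            x = w / np.linalg.norm(w)
            b = x @ A @ x                                # update the shift, cf. (5.4)
        return b, x

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 4.0]])
    b, x = rayleigh_quotient_iteration(A, np.array([1.0, 0.0, 0.0]))
    print(b, np.linalg.norm(A @ x - b * x))              # an eigenvalue of A and a tiny residual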

Remark. In the inverse power method, we have to solve a linear system

    (A − bI) w^(k) = x^(k−1).        (5.5)

When b is close to some eigenvalue λ_i, the matrix A − bI is nearly singular. We expect that small errors in x^(k−1) would lead to large errors in the computation of w^(k). Fortunately, this error occurs mainly in the direction of x^(k). Thus, after normalization, the method is still useful in practice. (See [2] for a justification.)

5.3 The QR Algorithm

The basic idea behind the QR algorithm is to form a sequence of matrices {A_i} that are isospectral to the original matrix A_1 := A and that converge to a special form from which the eigenvalues can readily be read off. For simplicity, we shall restrict our discussion to real matrices in the sequel.
Algorithm 5.3.1 (Basic QR algorithm)
Given A ∈ R^{n×n}, define A_1 := A.
For k = 1, 2, ..., do
    Calculate the QR decomposition A_k = Q_k R_k,
    Define A_{k+1} := R_k Q_k.
Remark. We observe that A_{k+1} = R_k Q_k = Q_k^T A_k Q_k. So A_{k+1} is orthogonally similar to A_k. By induction, the sequence {A_k} is isospectral to A.
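The basic algorithm is only a few lines of numpy (a sketch; the fixed iteration count and the use of numpy's built-in qr are illustrative choices):

    import numpy as np

    def basic_qr(A, iters=200):
        """Unshifted QR iteration: every iterate is orthogonally similar to A."""
        Ak = np.array(A, dtype=float)
        for _ in range(iters):
            Q, R = np.linalg.qr(Ak)   # A_k = Q_k R_k
            Ak = R @ Q                # A_{k+1} := R_k Q_k = Q_k^T A_k Q_k
        return Ak

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 1.0]])
    print(np.round(basic_qr(A), 6))   # nearly upper triangular; the diagonal approximates the eigenvalues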
Definition 5.3.1 We say a matrix A ∈ R^{n×n} is upper Hessenberg if and only if A_ij = 0 for all i ≥ j + 2.
The QR algorithm is expensive because of the large number of computations that must be performed at each step. To reduce this overhead, the matrix A is usually first simplified by orthogonal similarity transformations to an upper Hessenberg matrix. This can be accomplished as follows.
Given A, let U_2 be the unitary matrix

    U_2 = [ 1   0 ... 0 ]
          [ 0           ]
          [ :    Û_2    ]
          [ 0           ]

where Û_2 is formed by the Householder transformation for the column vector [a_21, ..., a_n1]^T. Thus

    U_2 A = [ x x ... x ]   (first row unchanged)
            [ x x ... x ]
            [ 0 x ... x ]
            [ :         ]
            [ 0 x ... x ].

It is clear that the right multiplication in U_2 A U_2^T does not change the first column of U_2 A. Continuing this procedure, A is transformed by orthogonal similarity transformations into an upper Hessenberg matrix.
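A numpy sketch of this reduction (returning only the Hessenberg matrix; the sign convention in the Householder vector is a standard stability detail, not something fixed by the notes):

    import numpy as np

    def to_hessenberg(A):
        """Reduce A to upper Hessenberg form by Householder similarity transformations."""
        H = np.array(A, dtype=float)
        n = H.shape[0]
        for k in range(n - 2):
            x = H[k + 1:, k]
            v = x.copy()
            v[0] += np.copysign(np.linalg.norm(x), x[0])   # Householder vector for the trailing part of column k
            if np.linalg.norm(v) == 0:
                continue                                   # column already in the desired form
            v /= np.linalg.norm(v)
            H[k + 1:, k:] -= 2.0 * np.outer(v, v @ H[k + 1:, k:])   # left multiplication by U_{k+1}
            H[:, k + 1:] -= 2.0 * np.outer(H[:, k + 1:] @ v, v)     # right multiplication by U_{k+1}^T
        return H

    A = np.random.rand(5, 5)
    H = to_hessenberg(A)
    print(np.allclose(np.tril(H, -2), 0))   # True: H is upper Hessenberg
    print(np.allclose(np.sort(np.linalg.eigvals(H)), np.sort(np.linalg.eigvals(A))))   # similar matrices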

Definition 5.3.2 An orthogonal matrix of the form

    [ 1    0     0   0    0     0 ]
    [ 0  cos θ   0   0  sin θ   0 ]   ← j-th row
    [ 0    0     1   0    0     0 ]
    [ 0    0     0   1    0     0 ]
    [ 0 −sin θ   0   0  cos θ   0 ]   ← k-th row
    [ 0    0     0   0    0     1 ]

is called a rotation matrix in the (j, k)-plane and is denoted by Ω_jk.


Example. Consider the 2 × 2 matrix A = [ a_11, a_12; a_21, a_22 ]. Then

    [  c  s ] [ a_11  a_12 ]   [  c a_11 + s a_21    c a_12 + s a_22 ]
    [ −s  c ] [ a_21  a_22 ] = [ −s a_11 + c a_21   −s a_12 + c a_22 ].

Thus if we choose c := a_11/√(a_11² + a_21²) and s := a_21/√(a_11² + a_21²), then the (2, 1)-position of the product becomes zero. The matrix [ c, s; −s, c ] amounts to a rotation of the coordinate axes by an angle θ determined by tan θ = s/c = a_21/a_11. This idea has been incorporated into the so-called Givens method for solving eigenvalue problems.
Suppose A is an upper Hessenberg matrix. Then the QR decomposition of A can be calculated by a sequence of plane rotations. This is demonstrated as follows:

    A_1 = [ x x x x ]      Ω_12 A_1 = A_2 = [ x x x x ]
          [ x x x x ]                       [ 0 x x x ]
          [ 0 x x x ]                       [ 0 x x x ]
          [ 0 0 x x ]                       [ 0 0 x x ]

    Ω_23 A_2 = A_3 = [ x x x x ]      Ω_34 A_3 = A_4 = [ x x x x ]
                     [ 0 x x x ]                       [ 0 x x x ]
                     [ 0 0 x x ]                       [ 0 0 x x ]
                     [ 0 0 x x ]                       [ 0 0 0 x ] = R.

Working with upper Hessenberg matrices, the QR decomposition takes only O(n²) flops, while for general matrices the QR decomposition requires O(n³) flops.
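A numpy sketch of one complete QR step on an upper Hessenberg matrix using these plane rotations (forming R first and then R Q, so the whole step costs O(n²); the function name and the random test matrix are illustrative):

    import numpy as np

    def hessenberg_qr_step(H):
        """One QR step A -> RQ for upper Hessenberg H, via n-1 Givens rotations."""
        H = np.array(H, dtype=float)
        n = H.shape[0]
        rotations = []
        for j in range(n - 1):                        # left rotations Omega_{j,j+1}: annihilate H[j+1, j]
            a, b = H[j, j], H[j + 1, j]
            r = np.hypot(a, b)
            c, s = (1.0, 0.0) if r == 0 else (a / r, b / r)
            rotations.append((c, s))
            G = np.array([[c, s], [-s, c]])
            H[j:j + 2, j:] = G @ H[j:j + 2, j:]       # H moves one step closer to R
        for j, (c, s) in enumerate(rotations):        # right multiplications: R Q = R Omega_12^T ... Omega_{n-1,n}^T
            Gt = np.array([[c, -s], [s, c]])
            H[:j + 2, j:j + 2] = H[:j + 2, j:j + 2] @ Gt
        return H                                      # again upper Hessenberg and similar to the input

    H0 = np.triu(np.random.rand(5, 5), -1)            # a random upper Hessenberg matrix
    H1 = hessenberg_qr_step(H0)
    print(np.allclose(np.tril(H1, -2), 0))            # True
    print(np.allclose(np.sort(np.linalg.eigvals(H1)), np.sort(np.linalg.eigvals(H0))))   # isospectral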
Remark. It is important to note that if A_1 = A is upper Hessenberg, then the orthogonal matrix Q_1 in the QR decomposition A_1 = Q_1 R_1 is also upper Hessenberg (prove this!). Therefore, A_2 = R_1 Q_1 remains upper Hessenberg. It follows that the upper Hessenberg structure is maintained throughout the QR algorithm.
We have seen that

    A_{k+1} = Q_k^T A_k Q_k = Q_k^T ... Q_1^T A Q_1 ... Q_k.

Let us call

    P_k := Q_1 ... Q_k,        (5.6)
    U_k := R_k ... R_1.        (5.7)

Note that P_k is still orthogonal and U_k upper triangular. Moreover,

    P_k U_k = (Q_1 ... Q_k)(R_k ... R_1) = (Q_1 ... Q_{k−1}) A_k (R_{k−1} ... R_1)
            = (Q_1 ... Q_{k−1})(Q_{k−1}^T ... Q_1^T A Q_1 ... Q_{k−1})(R_{k−1} ... R_1)
            = A P_{k−1} U_{k−1} = ... = A^k.        (5.8)

In other words, we have found that P_k U_k is the QR decomposition of the matrix A^k.
Theorem 5.3.1 Suppose A ∈ R^{n×n} and suppose |λ_1| > ... > |λ_n| > 0. Then the sequence {A_k} converges to an upper triangular matrix.
(pf): From the assumption, we know all eigenvalues of A are real and distinct. So A is diagonalizable. Let P be the nonsingular matrix such that P^{-1}AP = D := diag{λ_1, ..., λ_n}. Then A^k = P D^k P^{-1}. Let P^{-1} = LU be the LU decomposition of P^{-1} with value 1 along the diagonal of L (we assume this LU decomposition exists, with some prior pivoting if necessary). Then A^k = P D^k L U = P (D^k L D^{-k}) D^k U. Consider D^k L D^{-k}. The (i, j)-component is (λ_i/λ_j)^k l_ij. Since L is lower triangular, only the lower triangular portion, i.e., i ≥ j, needs to be considered.
Let D^k L D^{-k} := I + E_k. By the ordering of the λ_i, we see that E_k → 0 as k → ∞. Let P := QR. Then

    A^k = QR(D^k L D^{-k}) D^k U
        = QR(I + E_k) D^k U = Q(I + R E_k R^{-1}) R D^k U
        = Q(Q̃_k R̃_k) R D^k U = (Q Q̃_k)(R̃_k R D^k U),        (5.9)

where Q̃_k R̃_k is the QR decomposition of I + R E_k R^{-1}.
We recall the issue of uniqueness of the QR decomposition of a matrix. Suppose a nonsingular matrix B has two QR decompositions, say B = QR = Q̃R̃. Then Q^T Q̃ = R R̃^{-1}. But Q^T Q̃ is orthogonal whereas R R̃^{-1} is upper triangular. An orthogonal and upper triangular matrix is diagonal with absolute value 1 along its diagonal. If we call R R̃^{-1} := D, then Q̃ = QD and R̃ = D^{-1}R. So the QR decomposition of B is unique if the diagonal elements of R are required to be positive.

We already know that A^k = P_k U_k. Thus A^k has two QR decompositions. It follows that

    P_k = Q Q̃_k D̃_k,
    U_k = D̃_k^{-1} R̃_k R D^k U,

where D̃_k is a diagonal matrix with elements either 1 or −1. Now

    A_{k+1} = P_k^T A P_k = (Q Q̃_k D̃_k)^T A (Q Q̃_k D̃_k) = D̃_k Q̃_k^T (Q^T A Q)(Q̃_k D̃_k)
            = D̃_k Q̃_k^T (R P^{-1} A P R^{-1})(Q̃_k D̃_k)
            = D̃_k Q̃_k^T (R D R^{-1})(Q̃_k D̃_k).        (5.10)

Note that R D R^{-1} is an upper triangular matrix with diagonal elements λ_1, ..., λ_n. Observe also that I + R E_k R^{-1} converges to I as k → ∞ since E_k → 0. So Q̃_k R̃_k = I + R E_k R^{-1} → I as k → ∞. Since R̃_k is upper triangular with positive diagonal elements, it follows that R̃_k → I and Q̃_k → I as k → ∞. Thus we see from (5.10) that A_{k+1} converges to an upper triangular matrix with λ_1, ..., λ_n along the diagonal.

Remark. From the above proof, we see that the rate of convergence depends on the ratio max_j |λ_{j+1}/λ_j|. It is therefore desirable to use the shift technique to accelerate the convergence.
Algorithm 5.3.2 (QR algorithm with origin shift)(Explicit shift)
Given A ∈ R^{n×n}, define A_1 := A.
For k = 1, 2, ..., do
    Select a shift factor c_k;
    Calculate the QR decomposition
        A_k − c_k I = Q_k R_k;        (5.11)
    Define
        A_{k+1} := R_k Q_k + c_k I.        (5.12)
Remark. We note that R_k = Q_k^T (A_k − c_k I). So A_{k+1} = Q_k^T (A_k − c_k I) Q_k + c_k I = Q_k^T A_k Q_k. That is, A_{k+1} and A_k are orthogonally similar.
Definition 5.3.3 Suppose A_k is upper Hessenberg. The shift factor c_k may be determined from the eigenvalues, say μ_k and ν_k, of the bottom 2 × 2 submatrix of A_k. If both μ_k and ν_k are real, we take c_k to be μ_k or ν_k according to whether |μ_k − a_nn^(k)| or |ν_k − a_nn^(k)| is smaller. If μ_k = ν̄_k, then we choose c_k = μ_k. Such a choice of shift factor is known as the Wilkinson shift.
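A numpy sketch of the explicitly shifted QR iteration with a shift of this type and with deflation (real eigenvalues are assumed; the complex-pair branch simply falls back to a real shift here, whereas a production code would use the implicit double shift described below; tolerances and names are illustrative):

    import numpy as np

    def shifted_qr_eigs(A, tol=1e-12, maxiter=1000):
        """Explicitly shifted QR iteration with deflation (sketch, real eigenvalues)."""
        H = np.array(A, dtype=float)
        n = H.shape[0]
        eigs = []
        for _ in range(maxiter):
            if n == 1:
                break
            a, b, c, d = H[n-2, n-2], H[n-2, n-1], H[n-1, n-2], H[n-1, n-1]
            disc = (a - d) ** 2 / 4 + b * c            # discriminant of the trailing 2x2 block
            if disc >= 0:
                mu1 = (a + d) / 2 + np.sqrt(disc)
                mu2 = (a + d) / 2 - np.sqrt(disc)
                ck = mu1 if abs(mu1 - d) < abs(mu2 - d) else mu2   # the eigenvalue closer to a_nn
            else:
                ck = d                                 # complex pair: crude real fallback
            Q, R = np.linalg.qr(H[:n, :n] - ck * np.eye(n))        # A_k - c_k I = Q_k R_k
            H[:n, :n] = R @ Q + ck * np.eye(n)                     # A_{k+1} = R_k Q_k + c_k I
            if abs(H[n-1, n-2]) < tol * (abs(H[n-2, n-2]) + abs(H[n-1, n-1])):
                eigs.append(H[n-1, n-1])               # converged eigenvalue: deflate
                n -= 1
        if n == 1:
            eigs.append(H[0, 0])
        return sorted(eigs)

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 1.0]])
    print(shifted_qr_eigs(A))   # approximately the three eigenvalues of A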
Theorem 5.3.2 Given any matrix A and the relationship B = Q^T A Q, suppose B is upper Hessenberg with positive subdiagonal elements and suppose Q is orthogonal. Then the matrices Q and B are uniquely determined by the first column of Q.


(pf): This is a homework problem.
Remark. Let A be any given matrix and let B = Q^T A Q be an upper Hessenberg matrix. Let P^T be an orthogonal matrix whose first column agrees with that of Q. Suppose P A P^T is reduced to upper Hessenberg form, that is, suppose U_1, ..., U_{n−2} are orthogonal matrices such that

    B̃ := U_{n−2} ... U_1 (P A P^T) U_1^T ... U_{n−2}^T

becomes an upper Hessenberg matrix (recall how this is done!). Let Q̃ := P^T U_1^T ... U_{n−2}^T. Then we have Q̃^T A Q̃ = B̃. Note that the first column of Q̃ is the same as that of Q. From Theorem 5.3.2, we conclude that B̃ = B.
Suppose A_k is upper Hessenberg. There is an easier way to compute A_{k+1} = Q_k^T A_k Q_k from A_k, based on the above theorem and remark. Since A_k is upper Hessenberg, the first column of Q_k is known; it is the column [c, s, 0, ..., 0]^T with

    c := (a_11^(k) − c_k) / √((a_11^(k) − c_k)² + (a_21^(k))²),
    s := a_21^(k) / √((a_11^(k) − c_k)² + (a_21^(k))²).

So the first column of P^T is known, and P^T can be chosen to be the matrix

    P^T := [ c −s 0 0 0 ]
           [ s  c 0 0 0 ]
           [ 0  0 1 0 0 ]
           [ 0  0 0 1 0 ]
           [ 0  0 0 0 1 ]

(illustrated here for n = 5). It is easy to see that P A_k P^T is at most a matrix of the form

    P A_k P^T = [ x x x x x ]
                [ x x x x x ]
                [ x x x x x ]
                [ 0 0 x x x ]
                [ 0 0 0 x x ].

So we can make a rotation in the (2, 3)-plane to eliminate the nonzero element at the (3, 1)-position. In doing so, the (4, 2)-position becomes nonzero. We continue to chase the nonzero elements by n − 2 rotations to produce A_{k+1}.
Suppose we have applied one QR step with shift c_1 to obtain A_2 = Q_1^T A_1 Q_1 and another QR step with shift c_2 to obtain A_3 = Q_2^T A_2 Q_2 = Q_2^T Q_1^T A_1 Q_1 Q_2. Then we may as well use the first column of Q_1 Q_2 to compute A_3 from A_1, by Theorem 5.3.2 and the above remark. We recall that A_1 − c_1 I = Q_1 R_1, A_2 = R_1 Q_1 + c_1 I, A_2 − c_2 I = Q_2 R_2 and A_3 = R_2 Q_2 + c_2 I. It follows that R_2 = (A_3 − c_2 I) Q_2^T = (Q_2^T Q_1^T A_1 Q_1 Q_2 − c_2 I) Q_2^T = Q_2^T Q_1^T (A_1 − c_2 I) Q_1. Thus R_2 R_1 = Q_2^T Q_1^T (A_1 − c_2 I) Q_1 R_1, or equivalently Q_1 Q_2 R_2 R_1 = (A_1 − c_2 I)(A_1 − c_1 I) = A_1² − (c_1 + c_2) A_1 + c_1 c_2 I. That is, the first column of Q_1 Q_2 is seen to be the normalized first column of the matrix A_1² − (c_1 + c_2) A_1 + c_1 c_2 I. This is the implicit double shift.
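Only this first column is needed to start an implicit double-shift step, and for an upper Hessenberg A_1 it has at most three nonzero entries; a minimal sketch (the function name is illustrative):

    import numpy as np

    def double_shift_first_column(A, c1, c2):
        """First column of (A - c2 I)(A - c1 I) = A^2 - (c1 + c2) A + c1 c2 I."""
        n = A.shape[0]
        e1 = np.zeros(n)
        e1[0] = 1.0
        s, p = np.real(c1 + c2), np.real(c1 * c2)   # for a conjugate pair c2 = conj(c1) these are real anyway
        v = A @ (A @ e1) - s * (A @ e1) + p * e1
        return v / np.linalg.norm(v)                # for upper Hessenberg A only v[0], v[1], v[2] are nonzero

For a complex conjugate pair of shifts, the coefficients c_1 + c_2 and c_1 c_2 are real, which is why the whole double step can be carried out in real arithmetic.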

5.4 Exercises

1. Let p be a chosen integer satisfying 1 ≤ p ≤ n. Given an n × p matrix Q_0 with orthonormal columns, consider the iteration

    Z_k = A Q_{k−1},
    Q_k R_k = Z_k (QR factorization)

for k = 1, 2, .... Explain why this iteration can usually be used to compute the p largest eigenvalues of A in absolute value. How then should you modify the iteration when the p smallest eigenvalues of A in absolute value are needed?
2. Prove that the upper Hessenberg structure is maintained throughout the QR algorithm. That is, if A_k = Q_k R_k is an upper Hessenberg matrix, show that A_{k+1} = R_k Q_k is also upper Hessenberg.
3. Prove Theorem 5.3.2.
4. Compute a QR step with the matrix

    A = [ 2  ε ]
        [ ε  1 ]

(a) without shift;
(b) with shift c_k = 1.
5. Assume A = diag(λ_1, λ_2) with λ_1 > λ_2. Assume

    x_k = [ c_k ]
          [ s_k ]

with c_k² + s_k² = 1 is an approximate eigenvector.
(a) Compute the Rayleigh quotient of x_k.
(b) Compute, by pen and paper, the next Rayleigh quotient iterate x_{k+1}.
(c) Comparing x_k and x_{k+1}, give an explanation why the convergence of the Rayleigh quotient iteration is ultimately cubic.


Bibliography

[1] D. K. Faddeev and V. N. Faddeeva, Computational Methods of Linear Algebra, W. H. Freeman and Co., San Francisco, 1963.
[2] B. N. Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, Englewood Cliffs, N.J., 1980.
[3] B. T. Smith et al., Matrix Eigensystem Routines: EISPACK Guide, Lecture Notes in Computer Science, Vol. 6, Springer-Verlag, Berlin, 1974.
[4] G. W. Stewart and Ji-guang Sun, Matrix Perturbation Theory, Academic Press, Boston, 1990.
[5] D. S. Watkins, Understanding the QR algorithm, SIAM Review, 24 (1982), 427-440.
[6] J. H. Wilkinson and C. Reinsch, Linear Algebra, Handbook for Automatic Computation, Vol. II, ed. F. L. Bauer, Springer-Verlag, Berlin, 1971.
[7] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, 1988.

