We consider the minimization problem

(1.1)  $x^* = \arg\min_{x\in\mathbb{R}^n} \varphi(x)$, where $\varphi(x) = \frac12 x^T A x - b^T x$,

where $A\in\mathbb{R}^{n\times n}$ is a symmetric positive definite (SPD) matrix and $b\in\mathbb{R}^n$ is a given vector. Clearly, we have

(1.2)  $\nabla\varphi(x) = Ax - b$, $\nabla^2\varphi = A$, (the Hessian is independent of $x$).

Since the Hessian $A$ is SPD, from well known conditions for a minimum of a function, we may conclude that there is a unique minimizer $x^*$ of $\varphi(\cdot)$. Moreover, (1.2) implies that $x^*$ is the unique solution of the linear system $Ax = b$.

A line search method generates, from an initial guess $x_0$ and a set of direction vectors $\{p_j\}_{j=0}^{k}$, the iterates $x_{j+1} = x_j + \alpha_j p_j$, where $\alpha_j$ minimizes $\alpha\mapsto\varphi(x_j+\alpha p_j)$. Since $\varphi$ is quadratic, for any $x, p\in\mathbb{R}^n$ and $\alpha\in\mathbb{R}$ we have the exact expansion

(1.3)  $\varphi(x+\alpha p) = \varphi(x) + \alpha\, p^T\nabla\varphi(x) + \frac{\alpha^2}{2}\, p^T A p$.
From the relation (1.3) and (1.2) we immediately find that

(1.4)  $\alpha_j = -\dfrac{p_j^T r_j}{p_j^T A p_j}$, where $r_j = \nabla\varphi(x_j) = A x_j - b$.

We note here that $r_j$ is oftentimes referred to as the residual vector.
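The step size formula (1.4) is easy to check numerically. The following sketch (assuming NumPy is available; the 2x2 matrix and vector are made-up illustrative data) performs one exact line search step for the quadratic functional:

```python
import numpy as np

def phi(A, b, x):
    """The quadratic functional phi(x) = 1/2 x^T A x - b^T x."""
    return 0.5 * x @ A @ x - b @ x

def line_search_step(A, b, x, p):
    """One exact line search step: alpha = -p^T r / (p^T A p), with r = Ax - b."""
    r = A @ x - b
    alpha = -(p @ r) / (p @ A @ p)
    return x + alpha * p

# Made-up 2x2 SPD example, for illustration only.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x0 = np.zeros(2)
p0 = -(A @ x0 - b)          # steepest descent direction at x0
x1 = line_search_step(A, b, x0, p0)
```

Because $\alpha$ minimizes $\varphi$ along the line, the new gradient $Ax_1 - b$ is orthogonal to the direction $p_0$.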
We will need the following definition and notation.

Definition 1.1. We say that the set of directions $\{p_j\}_{j=0}^{k-1}$ is a conjugate set of directions, iff $p_j^T A p_i = 0$ for all $i = 0, \dots, (k-1)$, $j = 0, \dots, (k-1)$, $i \neq j$.

By symmetry this definition can also be stated as: the set of directions $\{p_j\}_{j=0}^{k-1}$ is a conjugate set of directions, iff $p_j^T A p_i = 0$ for all $i$ and $j$ satisfying $0 \le i < j \le (k-1)$.
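The pairwise condition in Definition 1.1 can be tested directly. A minimal sketch (NumPy assumed; the matrix is illustrative): the eigenvectors of a symmetric matrix form one familiar conjugate set, since $v_i^T A v_j = \lambda_j v_i^T v_j = 0$ for $i\neq j$.

```python
import numpy as np

def is_conjugate_set(A, dirs, tol=1e-10):
    """Check Definition 1.1: p_i^T A p_j = 0 for all i != j."""
    k = len(dirs)
    return all(abs(dirs[i] @ A @ dirs[j]) < tol
               for i in range(k) for j in range(k) if i != j)

# Made-up SPD matrix, for illustration only.
A = np.array([[4.0, 1.0], [1.0, 3.0]])

# Eigenvectors of a symmetric matrix are A-conjugate.
_, V = np.linalg.eigh(A)
eig_dirs = [V[:, 0], V[:, 1]]

# The coordinate vectors are generally NOT A-conjugate when A
# has nonzero off-diagonal entries.
coord_dirs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
```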
We introduce now the following vector spaces and affine spaces (for $k = 1, \dots$):

(1.5)  $W_k := \operatorname{span}\{p_0, \dots, p_{k-1}\}$, $U_k := x_0 + W_k = \{z\in\mathbb{R}^n \mid z = x_0 + w_k,\ w_k\in W_k\}$.
Lemma 1.2. Assume that $\{p_j\}_{j=0}^{i}$ is a conjugate set of directions and that the iterates $\{x_j\}_{j=0}^{i}$ are obtained via the line search algorithm. Then the following identity holds:

(1.6)  $p_i^T r_i = p_i^T [\nabla\varphi(y)]$, for all $y\in U_i$.
Proof. We first note that since $\{x_j\}_{j=0}^{i}$ are obtained via the line search algorithm we have that $x_i\in U_i$. If we take $y\in U_i$, from the definition of $U_i$ it follows that $x_i - y\in W_i$ and hence $p_i^T A(x_i - y) = 0$ (because $p_i^T A w = 0$ for all $w\in W_i = \operatorname{span}\{p_0, \dots, p_{i-1}\}$). The proof of the identity (1.6) then is as follows:

$p_i^T\bigl(r_i - [\nabla\varphi(y)]\bigr) = p_i^T(Ax_i - b - Ay + b) = p_i^T A(x_i - y) = 0$.
Theorem 2.1. If $\{x_j\}_{j=0}^{k}$ are the iterates obtained after $k$ steps of the line search algorithm with a conjugate set of directions, then

(2.1)  $x_j = \arg\min_{x\in U_j} \varphi(x)$, for all $1\le j\le k$.
Proof. The proof is by induction. For $k = 1$, the result follows from the definition of $x_1$ as a minimizer on $U_1$. Assume that for $k = i$,

$x_j = \arg\min_{y\in U_j} \varphi(y)$, for all $1\le j\le i$.

To prove the statement of the theorem then, we need to show that

if $x_{i+1} = x_i + \alpha_i p_i$ then $x_{i+1} = \arg\min_{x\in U_{i+1}} \varphi(x)$.

By the definition of $U_{i+1}$, any $x\in U_{i+1}$ can be written as $x = y + \alpha p_i$, where $\alpha\in\mathbb{R}$ and $y\in U_i$. Applying (1.3) and then Lemma 1.2 leads to

$\varphi(x) = \varphi(y + \alpha p_i) = \varphi(y) + \alpha\, p_i^T[\nabla\varphi(y)] + \frac{\alpha^2}{2}\, p_i^T A p_i = \varphi(y) + \Bigl(\alpha\, p_i^T[\nabla\varphi(x_i)] + \frac{\alpha^2}{2}\, p_i^T A p_i\Bigr)$.

Note that we have arrived at a decoupled functional, since the first term does not depend on $\alpha$ and the second term does not depend on $y$. Thus,

$\min_{x\in U_{i+1}} \varphi(x) = \min_{y\in U_i} \varphi(y) + \min_{\alpha\in\mathbb{R}} \Bigl(\alpha\, p_i^T r_i + \frac{\alpha^2}{2}\, p_i^T A p_i\Bigr)$.

The right side is minimized when $y = x_i$ and $\alpha = \alpha_i = -\dfrac{p_i^T r_i}{p_i^T A p_i}$, and hence the left side is minimized exactly for $x_{i+1} = x_i + \alpha_i p_i$, which concludes the proof.
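Theorem 2.1 can be verified experimentally: with a conjugate set of directions, the line search iterate $x_j$ coincides with the minimizer of $\varphi$ over the affine space $U_j$, which we can compute directly from the normal equations of the reduced problem. A sketch, with NumPy and random illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)          # random SPD matrix (illustrative)
b = rng.standard_normal(n)

# An A-conjugate set of directions: eigenvectors of A.
_, V = np.linalg.eigh(A)
P = [V[:, j] for j in range(n)]

# Exact line search along the conjugate directions.
x0 = np.zeros(n)
iterates = [x0]
x = x0
for p in P:
    r = A @ x - b
    x = x - (p @ r) / (p @ A @ p) * p
    iterates.append(x)

def minimizer_over_Uj(j):
    """Direct minimizer of phi over U_j = x0 + span{p_0, ..., p_{j-1}},
    via the normal equations of the reduced j-dimensional problem."""
    W = np.column_stack(P[:j])
    c = np.linalg.solve(W.T @ A @ W, W.T @ (b - A @ x0))
    return x0 + W @ c
```

In particular, after $n$ steps $U_n = \mathbb{R}^n$, so the final iterate solves $Ax = b$.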
3. The Conjugate Gradient algorithm
The conjugate gradient method is an algorithm that explores the result in
Theorem 2.1 and constructs conjugate directions. Here is the rationale of
what we plan to do in this section:
We rst give a general recurrence relation that generates a set of
conjugate directions (Lemma 3.1).
We then show that this recurrence relation can be reduced to a much
simpler expression (see Lemma 3.2(iv)).
As a result, we will get a line search method, which uses conjugate
set of directions and is known as the CG method (see Algorithm 3.3).
CG FOR LINEAR SYSTEMS, 555, S09
We begin with a result, which could be obtained by Gram-Schmidt orthogonalization, with respect to the $A$-inner product, of the residual vectors $\{r_j\}_{j=0}^{k}$.
Lemma 3.1. Let $p_0 = -r_0$ and let for $k = 1, 2, \dots$

(3.1)  $p_k = -r_k + \displaystyle\sum_{j=0}^{k-1} \frac{p_j^T A r_k}{p_j^T A p_j}\, p_j$.

Then $p_j^T A p_m = 0$ for all $0\le m < j\le k$.
Proof. We show by induction that the relation (3.1) gives conjugate directions. For $k = 1$ one directly checks that $p_1^T A p_0 = 0$. Assume that for $k = i$ the vectors $\{p_j\}_{j=0}^{i}$ are pairwise conjugate. We then need to show that $p_{i+1}^T A p_m = 0$ for all $m\le i$. Let $m\le i$. Then we have

$p_{i+1}^T A p_m = -r_{i+1}^T A p_m + \displaystyle\sum_{j=0}^{i} \frac{p_j^T A r_{i+1}}{p_j^T A p_j}\, p_j^T A p_m = -r_{i+1}^T A p_m + \frac{p_m^T A r_{i+1}}{p_m^T A p_m}\, p_m^T A p_m = 0$.
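The recurrence (3.1) can be exercised numerically: advance the iterate by exact line search, form each new direction by the full Gram-Schmidt-type sum, and check pairwise conjugacy. A sketch with NumPy and random illustrative data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)      # random SPD matrix (illustrative)
b = rng.standard_normal(n)

# Build directions by the full recurrence (3.1), advancing the iterate
# by exact line search (step size from (1.4)) at every step.
x = np.zeros(n)
r = A @ x - b
P = [-r]                          # p_0 = -r_0
for k in range(1, n):
    p = P[-1]
    x = x - (p @ r) / (p @ A @ p) * p          # line search step
    r = A @ x - b                              # new residual r_k
    p_new = -r + sum((pj @ A @ r) / (pj @ A @ pj) * pj for pj in P)
    P.append(p_new)

# Largest off-diagonal entry of the matrix (p_i^T A p_j): should vanish.
max_offdiag = max(abs(P[i] @ A @ P[j]) for i in range(n) for j in range(i))
```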
The next lemma, among other things, shows that the sum in (3.1) contains only one term.

Lemma 3.2. Let $\{p_j\}_{j=0}^{k}$ be directions obtained via (3.1). Then

(i) $W_k = \operatorname{span}\{r_0, \dots, r_{k-1}\}$;
(ii) $r_m^T r_j = 0$, for all $0\le j < m\le k$;
(iii) $p_k^T r_j = -r_k^T r_k$, for all $0\le j\le k$;
(iv) the direction vector $p_k$ satisfies

$p_k = -r_k + \beta_{k-1} p_{k-1}$, where $\beta_{k-1} = \dfrac{r_k^T r_k}{r_{k-1}^T r_{k-1}}$.
Proof. The first item follows directly from (3.1) and a simple induction argument, since $p_0 = -r_0$.

To prove (ii), we first use (i) to conclude that for $0\le j < m\le k$, and any $t\in\mathbb{R}$ we have that

$r_j\in W_{j+1}\subset W_m$ and hence $(x_m + t r_j)\in U_m$.

Further, from Theorem 2.1, since $x_m$ is the unique minimizer of $\varphi(\cdot)$ over $U_m$, it follows that $t = 0$ is the unique minimizer of $g(t) = \varphi(x_m + t r_j)$. Hence we have

$0 = \left.\dfrac{d\varphi(x_m + t r_j)}{dt}\right|_{t=0} = [\nabla\varphi(x_m)]^T r_j = r_m^T r_j$,

and this proves (ii).
To show that (iii) holds, we first show the identity in (iii) for $j = k$. Indeed, from (i) and (ii) it follows that $r_k$ is orthogonal to each $p_l$ for $l < k$. Hence, if we take the inner product of (3.1) with $r_k$, the second term on the right side of (3.1) would vanish, and this is exactly the identity in (iii) for $j = k$. If $j < k$, then we have that $(x_k - x_j)\in W_k$, and hence $p_k^T A(x_k - x_j) = 0$. Therefore,

$p_k^T(r_k - r_j) = p_k^T A(x_k - x_j) = 0$.
To show (iv) we write $p_k\in W_{k+1}$ as a linear combination of $\{r_j\}_{j=0}^{k}$ (which form an orthogonal basis), and then apply (iii). This leads to

$p_k = \displaystyle\sum_{j=0}^{k} \frac{p_k^T r_j}{r_j^T r_j}\, r_j = -\sum_{j=0}^{k} \frac{r_k^T r_k}{r_j^T r_j}\, r_j = -r_k - \frac{r_k^T r_k}{r_{k-1}^T r_{k-1}} \sum_{j=0}^{k-1} \frac{r_{k-1}^T r_{k-1}}{r_j^T r_j}\, r_j = -r_k + \beta_{k-1} \sum_{j=0}^{k-1} \frac{p_{k-1}^T r_j}{r_j^T r_j}\, r_j = -r_k + \beta_{k-1} p_{k-1}$.
We now can write the conjugate gradient algorithm, using the much shorter recurrence relation for the direction vectors $p_k$, which is provided by Lemma 3.2(iv). We denote below $\|y\|^2 = y^T y$ and $\|y\|_A^2 = y^T A y$ for a vector $y\in\mathbb{R}^n$.
Algorithm 3.3 (Conjugate Gradient). Let $x_0$ be a given initial guess. Set $r_0 = Ax_0 - b$ and $p_0 = -r_0$, $k = 0$.
While $r_k \neq 0$ do
  $\alpha_k = \dfrac{\|r_k\|^2}{\|p_k\|_A^2}$  [from Lemma 3.2(iii)]
  $x_{k+1} = x_k + \alpha_k p_k$
  $r_{k+1} = r_k + \alpha_k A p_k$  [because $Ax_{k+1} - b = Ax_k - b + \alpha_k A p_k$]
  $\beta_k = \dfrac{\|r_{k+1}\|^2}{\|r_k\|^2}$
  $p_{k+1} = -r_{k+1} + \beta_k p_k$  [from Lemma 3.2(iv)]
  Set $k = k + 1$
endWhile
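Algorithm 3.3 translates line by line into code. A minimal sketch with NumPy, using the conventions of these notes ($r = Ax - b$, $p_0 = -r_0$) and a made-up SPD test problem:

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-12):
    """Algorithm 3.3 with the convention r = Ax - b, p_0 = -r_0.
    Returns the approximate solution and the iteration count."""
    x = x0.astype(float).copy()
    r = A @ x - b
    p = -r
    k = 0
    while np.linalg.norm(r) > tol and k < len(b):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)        # |r_k|^2 / |p_k|_A^2
        x = x + alpha * p
        r_new = r + alpha * Ap            # r_{k+1} = r_k + alpha_k A p_k
        beta = (r_new @ r_new) / (r @ r)  # |r_{k+1}|^2 / |r_k|^2
        p = -r_new + beta * p             # two-term recurrence, Lemma 3.2(iv)
        r = r_new
        k += 1
    return x, k

# Random SPD test problem (illustrative data only).
rng = np.random.default_rng(2)
n = 8
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)
b = rng.standard_normal(n)
x, iters = conjugate_gradient(A, b, np.zeros(n))
```

In exact arithmetic the iteration terminates in at most $n$ steps, which is why the loop is capped at `len(b)` iterations.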
4. Convergence rate of the Conjugate Gradient method
In this section we will present an estimate for the convergence rate of the CG algorithm. The convergence rate estimate given here is rather general and does not take into account knowledge of the distribution of the eigenvalues of $A$. There are estimates that are more refined in this regard. We refer to Luenberger [3] for further reading.

4.1. Krylov subspaces and error reduction. To analyze the error we first prove the following result:
Lemma 4.1. The following relation holds:

(4.1)  $W_l = \operatorname{span}\{r_0, A r_0, \dots, A^{l-1} r_0\}$.
Proof. The case $l = 1$ being clear, we assume that the relation holds for $l = i$, and we would like to show that the same relation holds for $l = (i+1)$. From Lemma 3.2(i), this would be equivalent to showing that $r_i\in \operatorname{span}\{r_0, \dots, A^{i} r_0\}$. By the induction assumption, we can write

$W_i \ni r_{i-1} = R_{i-1}(A) r_0$, and $W_i \ni p_{i-1} = P_{i-1}(A) r_0$,

where $R_{i-1}(\cdot)$ and $P_{i-1}(\cdot)$ are polynomials of degree less than or equal to $(i-1)$. We then have

$r_i = r_{i-1} + \alpha_{i-1} A p_{i-1} = R_{i-1}(A) r_0 + \alpha_{i-1} A P_{i-1}(A) r_0 \in \operatorname{span}\{r_0, \dots, A^{i} r_0\}$,

which concludes the proof.
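Lemma 4.1 can be checked numerically by comparing subspaces via matrix ranks: after a few CG steps, the directions, the residuals, and the Krylov vectors should all span the same space. A sketch with NumPy and random illustrative data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 7
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)      # random SPD matrix (illustrative)
b = rng.standard_normal(n)

# Run a few CG steps (conventions r = Ax - b, p_0 = -r_0),
# recording residuals r_j and directions p_j.
x = np.zeros(n)
r = A @ x - b
p = -r
residuals, directions = [r], [p]
for _ in range(3):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x = x + alpha * p
    r_new = r + alpha * Ap
    p = -r_new + (r_new @ r_new) / (r @ r) * p
    r = r_new
    residuals.append(r)
    directions.append(p)

def same_span(U, V):
    """True if the vectors in U and V span the same subspace."""
    MU = np.column_stack([u / np.linalg.norm(u) for u in U])
    MV = np.column_stack([v / np.linalg.norm(v) for v in V])
    rk = np.linalg.matrix_rank(MU)
    return (rk == np.linalg.matrix_rank(MV)
            and rk == np.linalg.matrix_rank(np.hstack([MU, MV])))

l = 4
krylov = [np.linalg.matrix_power(A, j) @ residuals[0] for j in range(l)]
```

Here `same_span(directions[:l], krylov)` tests (4.1) and `same_span(residuals[:l], krylov)` tests it combined with Lemma 3.2(i).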
We now present a general error estimate relating $\|x^* - x_l\|_A$ and $\|x^* - x_0\|_A$.

Lemma 4.2. The following estimate holds:

$\|x^* - x_l\|_A = \inf_{P\in\mathcal{P}_l;\ P(0)=1} \|P(A)(x^* - x_0)\|_A$.
Proof. Since $r_l$ is orthogonal to $W_l$, we have

$(x^* - x_l)^T A y = -r_l^T y = 0$, for all $y\in W_l$.

Denoting for a moment $w_l = (x_l - x_0)\in W_l$ and $e_0 = x^* - x_0$, the relation above implies that

$0 = (x^* - x_l)^T A y = (e_0 - w_l)^T A y$ for all $y\in W_l$.

Therefore, $w_l = (x_l - x_0)$ is an $A$-orthogonal projection of $e_0 = (x^* - x_0)$ on $W_l$. Thus,

$\|e_0 - w_l\|_A = \min_{w\in W_l} \|e_0 - w\|_A$.

But from Lemma 4.1 we know that $w = Q_{l-1}(A) r_0$, for a polynomial $Q_{l-1}\in\mathcal{P}_{l-1}$. Also, $A e_0 = -r_0$ and $e_0 - w = (I + Q_{l-1}(A) A) e_0$, and hence

(4.2)  $\|x^* - x_l\|_A = \|e_0 - w_l\|_A = \min_{P_l\in\mathcal{P}_l;\ P_l(0)=1} \|P_l(A) e_0\|_A$.

This completes the proof.
To obtain a qualitative estimate on the right hand side of (4.2), we observe that for any polynomial $P_l(\cdot)$ we have

$\|x^* - x_l\|_A = \min_{P_l\in\mathcal{P}_l;\ P_l(0)=1} \|P_l(A) e_0\|_A \le \min_{P_l\in\mathcal{P}_l;\ P_l(0)=1} \rho(P_l(A))\, \|e_0\|_A$,

where $\rho(P_l(A))$ is the spectral radius of $P_l(A)$. Since both $A$ and $P_l(A)$ have the same eigenvectors, we may conclude that

$\|x^* - x_l\|_A \le \min_{P_l\in\mathcal{P}_l;\ P_l(0)=1} \max_{1\le j\le n} |P_l(\lambda_j)|\, \|e_0\|_A =: c_l(\lambda_1, \dots, \lambda_n)\, \|e_0\|_A$,

where $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$ are the eigenvalues of $A$.
In the next section, we will derive a somewhat pessimistic upper bound on $c_l$ by first estimating

$c_l(\lambda_1, \dots, \lambda_n) \le \min_{P_l\in\mathcal{P}_l;\ P_l(0)=1} \|P_l\|_{\infty,[\lambda_1,\lambda_n]}$,

and then, with the help of a construction based on the Chebyshev polynomials, we will find the value of the right side of the above inequality in terms of $\lambda_1$ and $\lambda_n$.
4.2. Chebyshev polynomials and a convergence rate estimate. The Chebyshev polynomials of first kind on $[-1, 1]$ are defined as

$T_l(\xi) = \cos(l \arccos(\xi))$, $l = 0, 1, \dots$

Using a simple trigonometric identity (with $\theta = \arccos(\xi)$) shows that

$T_{l+1}(\xi) + T_{l-1}(\xi) = \cos((l+1)\theta) + \cos((l-1)\theta) = 2(\cos\theta)\cos(l\theta)$.

Hence,

(4.3)  $T_{l+1}(\xi) = 2\xi\, T_l(\xi) - T_{l-1}(\xi)$.

This proves that $T_l$ are indeed polynomials, because $T_0(\xi) = 1$ and $T_1(\xi) = \xi$.
The form (4.3) defines $T_l(\xi)$ for all $\xi\in\mathbb{R}$. Another form of the Chebyshev polynomials, which will be useful in the convergence rate estimate given below in Theorem 4.4, is derived as follows: from the relation (4.3) for fixed $\xi$ we observe that

$T_l(\xi) = c_1 [\mu_1(\xi)]^l + c_2 [\mu_2(\xi)]^l$, $l = 0, 1, \dots$,

where $\mu_1(\xi)$ and $\mu_2(\xi)$ are the roots of the characteristic equation

$\mu^2 - 2\xi\mu + 1 = 0$.

The constants $c_1$ and $c_2$ are easily computed from the initial conditions $T_0(\xi) = 1$ and $T_1(\xi) = \xi$, and hence

(4.4)  $T_l(\xi) = \frac12\Bigl[\bigl(\xi + \sqrt{\xi^2 - 1}\bigr)^l + \bigl(\xi - \sqrt{\xi^2 - 1}\bigr)^l\Bigr]$.
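The two forms can be cross-checked numerically. A plain-Python sketch (illustrative): the recurrence (4.3) works for any real argument, while the closed form (4.4) has a real square root only for $|\xi|\ge 1$, which is the regime used in Theorem 4.4.

```python
import math

def cheb_recurrence(l, xi):
    """T_l(xi) via the three-term recurrence (4.3),
    starting from T_0 = 1 and T_1 = xi."""
    t_prev, t = 1.0, xi
    if l == 0:
        return t_prev
    for _ in range(l - 1):
        t_prev, t = t, 2.0 * xi * t - t_prev
    return t

def cheb_closed_form(l, xi):
    """T_l(xi) via the closed form (4.4); the square root is real
    only for |xi| >= 1."""
    s = math.sqrt(xi * xi - 1.0)
    return 0.5 * ((xi + s) ** l + (xi - s) ** l)
```

On $[-1, 1]$ the recurrence also agrees with the defining formula $\cos(l\arccos\xi)$.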
We further have that $|T_l(\xi)| \le 1$ for all $\xi\in[-1, 1]$ and that

(4.5)  if $\xi_m = \cos\bigl(\frac{m\pi}{l}\bigr)$, then $T_l(\xi_m) = (-1)^m$, $m = 0, 1, \dots, l$.
Define now

$S_l(\xi) = \Bigl[T_l\Bigl(\dfrac{\lambda_n + \lambda_1}{\lambda_n - \lambda_1}\Bigr)\Bigr]^{-1} T_l\Bigl(\dfrac{\lambda_n + \lambda_1 - 2\xi}{\lambda_n - \lambda_1}\Bigr)$.

Note that

$\|S_l\|_{\infty,[\lambda_1,\lambda_n]} = \Bigl[T_l\Bigl(\dfrac{\lambda_n + \lambda_1}{\lambda_n - \lambda_1}\Bigr)\Bigr]^{-1}$.
The next lemma shows that $S_l$ is a polynomial with minimum max-norm, that is,

$\|S_l\|_{\infty,[\lambda_1,\lambda_n]} = \min_{P_l\in\mathcal{P}_l;\ P_l(0)=1} \|P_l\|_{\infty,[\lambda_1,\lambda_n]}$.
Lemma 4.3. For any $P_l\in\mathcal{P}_l$ with $P_l(0) = 1$,

$\|S_l\|_{\infty,[\lambda_1,\lambda_n]} \le \|P_l\|_{\infty,[\lambda_1,\lambda_n]}$.
Proof. Denote

$t^* = \Bigl[T_l\Bigl(\dfrac{\lambda_n + \lambda_1}{\lambda_n - \lambda_1}\Bigr)\Bigr]^{-1}$.

Let $\eta_m = \dfrac{\lambda_1 - \lambda_n}{2}\,\xi_m + \dfrac{\lambda_n + \lambda_1}{2}$, where $\xi_m$ are defined in (4.5). Note that

$S_l(\eta_m) = (-1)^m t^*$, $m = 0, \dots, l$,

and also that $\eta_m\in[\lambda_1, \lambda_n]$. Assume that there exists $P_l\in\mathcal{P}_l$ with $P_l(0) = 1$, such that

$|P_l(\xi)| < |t^*|$, for all $\xi\in[\lambda_1, \lambda_n]$.

This in particular implies that

$-|t^*| < P_l(\eta_m) < |t^*|$, $m = 0, 1, \dots, l$.

If $\operatorname{sign}(t^*) > 0$ then

$P_l(\eta_m) - S_l(\eta_m) < 0$, for $m$ even, and $P_l(\eta_m) - S_l(\eta_m) > 0$, for $m$ odd.

The case $\operatorname{sign}(t^*) < 0$ is analogous. In either case, the difference $P_l - S_l$ has a zero in every interval $(\eta_m, \eta_{m+1})$. There are $l$ such intervals. But we also have that $P_l(0) - S_l(0) = 0$. Since $P_l - S_l$ is a polynomial of degree at most $l$, it follows that $P_l \equiv S_l$, which is a contradiction.
Clearly, from this lemma it follows that

(4.6)  $\|x^* - x_l\|_A \le \|S_l\|_{\infty,[\lambda_1,\lambda_n]}\, \|x^* - x_0\|_A$.
In the next Theorem 4.4 we obtain this estimate in terms of the condition number of $A$, by calculating $\|S_l\|_{\infty,[\lambda_1,\lambda_n]}$.
Theorem 4.4. The error after $l$ iterations of the CG algorithm can be bounded as follows:

(4.7)  $\|x^* - x_l\|_A \le \dfrac{2}{\Bigl(\frac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1}\Bigr)^l + \Bigl(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\Bigr)^l}\, \|x^* - x_0\|_A \le 2\Bigl(\dfrac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\Bigr)^l \|x^* - x_0\|_A$,

where $\kappa = \kappa(A) = \lambda_n/\lambda_1$ is the condition number of $A$.
Proof. We aim to calculate $\|S_l\|_{\infty,[\lambda_1,\lambda_n]} = \Bigl[T_l\Bigl(\frac{\lambda_n+\lambda_1}{\lambda_n-\lambda_1}\Bigr)\Bigr]^{-1}$. From (4.4), for

$\xi = \dfrac{\lambda_n + \lambda_1}{\lambda_n - \lambda_1} = \dfrac{\kappa + 1}{\kappa - 1}$,

we obtain $\xi^2 - 1 = \dfrac{4\kappa}{(\kappa - 1)^2}$, and hence

$\xi + \sqrt{\xi^2 - 1} = \dfrac{\kappa + 1}{\kappa - 1} + \dfrac{2\sqrt{\kappa}}{\kappa - 1} = \dfrac{\kappa + 2\sqrt{\kappa} + 1}{\kappa - 1} = \dfrac{(\sqrt{\kappa} + 1)^2}{(\sqrt{\kappa} - 1)(\sqrt{\kappa} + 1)} = \dfrac{\sqrt{\kappa} + 1}{\sqrt{\kappa} - 1}$.

Thus,

$T_l\Bigl(\dfrac{\lambda_n + \lambda_1}{\lambda_n - \lambda_1}\Bigr) = \dfrac12\Bigl[\Bigl(\dfrac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1}\Bigr)^l + \Bigl(\dfrac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\Bigr)^l\Bigr]$.

Finally,

$\|S_l\|_{\infty,[\lambda_1,\lambda_n]} = \dfrac{2}{\Bigl(\frac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1}\Bigr)^l + \Bigl(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\Bigr)^l} \le 2\Bigl(\dfrac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\Bigr)^l$.

The proof is completed by substituting the above expression in (4.6).
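The bound (4.7) can be checked experimentally. The sketch below (NumPy, with a made-up SPD matrix of prescribed spectrum, so $\kappa$ is known exactly) runs CG in the conventions of these notes ($r = Ax - b$, $p_0 = -r_0$) and compares the $A$-norm of the error at step $l$ with $2\bigl(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\bigr)^l \|x^* - x_0\|_A$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
# SPD matrix with prescribed eigenvalues (illustrative): kappa = 100.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.linspace(1.0, 100.0, n)
A = Q @ np.diag(lam) @ Q.T
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)

def a_norm(v):
    return np.sqrt(v @ A @ v)

kappa = lam[-1] / lam[0]
rho = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)

# CG iteration, recording the A-norm of the error at every step.
x = np.zeros(n)
r = A @ x - b
p = -r
errors = [a_norm(x_star - x)]
for _ in range(15):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x = x + alpha * p
    r_new = r + alpha * Ap
    p = -r_new + (r_new @ r_new) / (r @ r) * p
    r = r_new
    errors.append(a_norm(x_star - x))

# Theoretical bounds 2 * rho^l * ||x* - x_0||_A from (4.7).
bounds = [2.0 * rho**l * errors[0] for l in range(len(errors))]
```

In practice the observed error typically sits well below the bound, since (4.7) ignores the actual distribution of the eigenvalues.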
References

[1] Douglas N. Arnold. A concise introduction to numerical analysis. Lecture Notes, Penn State, MATH 597I Numerical Analysis, Fall 2001.
[2] Magnus R. Hestenes and Eduard Stiefel. Methods of conjugate gradients for solving linear systems. J. Research Nat. Bur. Standards, 49:409-436 (1953), 1952.
[3] David G. Luenberger. Linear and nonlinear programming. Kluwer Academic Publishers, Boston, MA, second edition, 2003.