Академический Документы
Профессиональный Документы
Культура Документы
Abstract. We discuss the method of Cyclic Reduction for solving special systems
of linear equations that arise when discretizing partial differential equations. In
connection with parallel computations the method has become very important.
Introduction
II
2
2.1
Classical Algorithm
Scalar Cyclic Reduction
d1 f1
e2 d2 f2
.. ..
.
..
A=
.
.
.
(1)
The fundamental operation in Cyclic Reduction is the simultaneous elimination of odd-indexed unknowns. This operation may be described as a block
LU-factorization of a permutation of the rows and columns of the matrix A.
Let S be the permutation matrix such that
S(1, 2, . . . , n)T = (1, 3, . . . , |2, 4, . . .)T
The permuted matrix SAS T becomes:
d1
d3
T
SAS = e2 f2
e4
f1
e3 f3
..
.. ..
.
. .
.
d2
..
.
d4
..
..
.
.
d1
f1
0
1
..
e3 f3
. . d3
.
.
..
.. ..
.
. .
1
0
(1) (1)
(2)
SAS T = l1 m1
0
.
1
d1 f1
.
.
(1) (1) .
.. ...
l2 . .
e2 d2 . .
..
.. ..
.
1
. .
0
For n even, this decomposition is computed for i = 1, . . . , n2 1 as
mi = f2i /d2i+1 ,
li = e2i /d2i1 ,
(1)
= mi f2i+1 ,
(1)
fi
di
ln/2 = en /dn1
(1)
III
Since the Schur complement is tridiagonal we may iterate and again eliminate
the odd-indexed unknowns of the reduced system. G. Golub recognized (cf.
[10]) that Cyclic Reduction is therefore equivalent to Gaussian elimination
without pivoting on a permuted system (P AP T )(P x) = P v. The permutation matrix P reorders the vector (1, 2, . . . , n) such that the odd multiples of
20 come first, followed by the odd multiples of 21 , the odd multiples of 22 ,
etc.; e.g. for n = 7 we get the permuted system
d1
f1
x1
v1
d3
x3 v3
e
f
3
3
d5
f5 e5
x5 v5
d7
e7
(3)
x7 = v7 .
e2 f2
v2
d
x
2
2
e6 f6
d6 x6 v6
e4 f4
d4
x4
v4
The connection to Gaussian elimination allows us to conclude that if elimination with diagonal pivots is stable then Cyclic Reduction will also be stable. If
A is strictly diagonal dominant or symmetric positive definite then this holds
for P AP T . No pivoting is necessary in these cases and therefore Cyclic Reduction is well defined and stable. However, fill-in is generated in the process
and this increases the operation count. Cyclic Reduction requires about 2.7
times more operations than Gaussian elimination and thus has a redundancy
of 2.7 [2].
2.2
We use the notation of Heller [8] and consider a block tridiagonal linear
system Ax = v, i.e.
Ej xj1 + Dj xj + Fj xj+1 = vj ,
j = 1, . . . , n,
E1 = Fn = 0.
(4)
rhs
v2j1
v2j
v2j+1
IV
where
Ej0 = E2j (D2j1 )1 E2j1
Dj0
Fj0
vj0
= F2j (D2j+1 )
(5)
1
E2j+1
(6)
F2j+1
1
(7)
1
v2j+1 .
(8)
Eliminating in this way the odd indexed unknowns, we obtain a block tridiagonal reduced system. The elimination can of course only be performed if the
diagonal block matrices are non-singular. Instead of inverting these diagonal
blocks (or better solving the corresponding linear systems) it is sometimes
possible to eliminate the odd unknowns by matrix multiplications: Multiply equation #(2j 1) with E2j D2j+1 , equation #(2j) with D2j1 D2j+1 ,
equation #(2j + 1) with F2j D2j1 and add. If
D2j1 D2j+1 E2j = E2j D2j+1 D2j1 and
D2j1 D2j+1 F2j = F2j D2j1 D2j+1 ,
then the matrices of the unknowns with odd indices vanish. However, if the
matrices do not commute then we cannot eliminate by matrix multiplications.
In case of the discretized Poisson equation (20) the matrices do commute.
Eliminating the unknowns with odd indices thus results in a new tridiagonal system for the even unknowns. The new system is half the size of the
original system. We can iterate the procedure and eliminate again the odd
unknowns in the new system.
2.3
If we reorder Equation (4) for the first step of Cyclic Reduction by separating
the unknowns with odd and even indices we obtain:
vo
xo
D1 F
(9)
=
ve
xe
G D2
where D1 and D2 are diagonal block-matrices and F and G are blockbidiagonal. Eliminating the unknowns with odd indices xo = (x1 , x3 , . . .)T
means computing the reduced system of equations for the unknowns with
even indices xe = (x2 , x4 , . . .)T by forming the Schur complement:
(D2 GD11 F )xe = ve GD11 vo .
(10)
It is also possible to eliminate the unknowns with even indices, thus forming
the equivalent system:
D1 F D21 G
0
0
D2 GD11 F
xo
xe
=
vo F D21 ve
ve GD11 vo
.
(11)
Notice that by this operation the system (11) splits in two independent
block-tridiagonal systems for the two sets of unknowns xo and xe . They can
be solved independently and this process can be seen as the first step of a
divide and conquer algorithm. The two smaller systems can be split again
in the same way in two subsystems which can be solved independently on
different processors.
2.4
a1
1 a
v1
x1
x v
2 2
. .
. = . .
. .
1 xn1 vn1
a
xn
vn
1
.. ..
. .
..
. a
1
(12)
Such systems have to be solved in the Fourier algorithm discussed in Section 3.4. Cyclic Reduction reduces this system in step k to A(k) x(k) = v (k)
where
VI
A(k)
ak bk
bk ak bk
.
.
.
.
= bk . .
.
.
.. a b
k k
bk ak
(13)
sign (a)/2
k1
(14)
.
(15)
2. If |a|>2 the diagonal elements produced by the reduction phase are given
by
p
(16)
ak = sign (a) a2 4 coth |y0 |2k ,
where
y0 =
ln a2 4 + a ln 2
ln a2 4 a + ln 2
if a > 2
if a < 2 .
(17)
and it is possible to take advantage of that fact. One can predict, depending on a and the machine precision, the number of Cyclic Reduction-steps
until ak may be assumed to be constant. E.g. for a = 2.5 the 14 leading
decimal digits of ak are constant for for k 5 [3]. Thus we can apply
incomplete Cyclic Reduction to solve the special system. Furthermore
by making use of the explicit expressions for ak the computation can be
vectorized.
Poissons Equation
(18)
ba
,
m+1
y =
dc
n+1
(19)
VII
be the step sizes of the grid and denote by uij the approximation for the
function value u(xi , yj ) at xi = a + ix and yj = c + jy. For simplicity, we
shall assume hereafter that x = y. If we use central difference approximations for the Laplace operator and if we assume Dirichlet zero boundary
conditions then we obtain the linear system
Tu = g
(20)
A I
I A ...
Rnmnm ,
T =
.
.
.. .. I
I A
(21)
4 1
1 4 . . .
Rmm .
A=
.
.
.. .. 1
1 4
(22)
The unknown vector u contains the rows of the matrix uij i.e.
u1j
u1
u2
u2j
u = . Rnm where uj = . Rm .
.
.
.
.
umj
un
The right hand side is partitioned similarly:
g1j
g2j
g1
g2
and g = .
..
(23)
Rnm .
gm
Elimination of the unknowns with odd indices can be done by matrix multiplications since for the system (20) the matrices Dj = A and Fj = Ej = I
do commute. As pointed out in [5], the block matrices of the system of equations one obtains when discretizing Poissons equation in a rectangle using
the nine-point formula do also commute. Thus in this case elimination can
also be done with matrix multiplication.
VIII
rhs
g2j1
g2j
g2j+1 .
(r)
g2j1
(24)
(r)
A(r) g2j
(r)
g2j+1 ,
k+1r
j = 1, . . . , 2
1.
(25)
Thus after r reduction steps the remaining system of equation has the size
(2k+1r 1) (2k+1r1 ):
(r)
j = 1, . . . , 2k+1r 1.
(26)
A(k) u2k = g1 .
(27)
Computational Simplifications
IX
(28)
(29)
(30)
almost the same recurrence as for P2r . Multiplying (30) by 2 and substituting k = 2r , we obtain
2T2r+1 (t) = 2 (2T2r (t))2 = 2 (2T2r (t))2 .
Thus the polynomials 2T2r (t) obey the same recurrence relation as P2r .
However, since P1 (x) = x and 2T1 (t) = 2t we conclude that x = 2t and
therefore
x
P2r (x) = 2T2r
, r 0.
(31)
2
The properties of the Chebychev polynomials are well known and by (31),
we can translate them to P2r (x). The zeros of P2r are
2i 1
(k)
, i = 1, 2, . . . , 2r
i = 2 cos
2r+1
and since for r 1 the leading coefficient of P2r (x) is 1, the factorization
is
2r
2r
Y
Y
(k)
(k)
(r)
P2r (x) = (x i ), thus A = (A i I).
(32)
i=1
i=1
zi = Azi1 zi2 ,
Then u = z2r = P2r (A)v = A(r) v. In [5] it is proved that the resulting
algorithm cyclic odd-even reduction and factorization (CORF) based on
this procedure is numerically unstable.
3. Using the product of linear factors (32) we compute z0 = v and
(k)
i = 1, 2, . . . , 2r
zi = (A i I)zi1 ,
i = 1, 2, . . . , 2r
(A i I)zi = zi1 ,
(33)
(r)
X c
1
1
i
= Q 2r
=
,
(k)
(k)
P2r (x)
i=1 x i
i=1 (x i )
where
(r)
ci
1
(k)
P20r (i )
2
Y
j=1
j6=i
(k)
(k)
1
XI
v= A
(r)
1
u = (P2r (A))
u=
2
X
(r)
(k)
ci (A i I)1 u
i=1
(k)
Bunemans Algorithm
It has been observed that block Cyclic Reduction using the recurrence relation
(25)
(r+1)
gj
(r)
(r)
(r)
j = 1, . . . , 2k+1r 1
gj
(r)
(r)
(0)
pj
and qj
= 0 and
(0)
qj
(r)
(r1)
(r1)
(r1)
= pj
(r)
2. pj
3. qj
(r1)
(r1)
(r)
= A(r) pj + qj
for v
v
(r1)
(r)
(r)
(r)
XII
3.4
Fourier Algorithm
2 1
1 2 . . .
Rmm
B=
.
.
. . . . 1
1 2
has the eigenvalues i eigenvectors Q with
r
qij =
2
ij
sin
,
m+1
m+1
i = 2 2 cos
i
.
m+1
(34)
a b
b a ...
.
C=
.
.
.
.
. . b
b a
Since C = bB + (a + 2b)I we conclude that the eigenvectors are the same
i = bi + a + 2b. The eigenvalues of A
and that the eigenvalues of C are
are obtained for b = 1 and a = 4.
Thus we can decompose the matrix A (22) as A = QQT . Introducing
this decomposition in the system (20) and multiplying each block equation
from the left by QT we obtain
QT uj1 + QT uj + QT uj+1 = QT gj ,
j = 1, . . . , n.
j = 1, . . . , n.
XIII
and by permuting the equations in the same way, the system (20) is transformed into m decoupled tridiagonal systems of equations
g1
T1
u
1
u
T2
2 g2
=
(35)
.
.
..
.. ..
.
Tm
i 1
u
m
gm
1 i . . .
Rn .
Ti =
... ... 1
1 i
Summarizing we obtain the following algorithm to solve Equation (20):
1. Compute the eigen-decomposition A = QQT using the explicit expression (34).
2. Transform the right hand side gj = QT gj , j = 1, . . . , n.
3. Permute equations and unknowns and solve the m decoupled equations
(35) in parallel.
4. Compute the solution by the back-transformation uj = Q
uj .
Notice that for the transformations in the second and forth step the fast
Fourier transform may be used [9].
For solving the special decoupled linear systems we have the choice of several methods. First they may be solved efficiently using Gaussian elimination
in parallel on different processors. Because of the special structure [1, k , 1]
of the tridiagonal matrix the elements of the LU decomposition converge and
may be assigned their limit [3].
A second possibility is to compute the eigen-decomposition of [1, k , 1] =
T . The eigenvectors are the same for all the decoupled systems, so only
kQ
QD
one decomposition has to be computed. However, this method is more expen 1 Q
T
sive than Gaussian elimination. To solve one system by applying QD
k
2
to the right hand side we need O(n ) operations while Gaussian elimination
only needs O(n).
Finally we may solve these special systems in parallel using the incomplete
Cyclic Reduction described in Section 2.4. The operation count is higher than
for Gaussian elimination but still of O(n). The advantage of this method is
that one can make use of a vector arithmetic unit if it is available on a single
processor [3].
Conclusions
XIV
XV
References
1. Amodio, P., Paprzycki, M.: A Cyclic Reduction Approach to the Numerical
Solution of Boundary Value ODEs. SIAM J. Sci. Comput. Vol. 18, No. 1,
(1997) 5668
2. Arbenz, P., Hegland M.: The Stable Parallel Solution of General Banded Linear
Systems. Technical Report #252, Computer Science Dept. ETH Z
urich, 1996.
3. Bondeli, S., Gander, W.: Cyclic Reduction for Special Tridiagonal Matrices.
SIAM J. for Matrix Analysis Vol. 15 (1994)
4. Buneman, O.: A Compact Non-iterative Poisson Solver. Rep. 294, Stanford
University, Institute for Plasma Research, Stanford, Calif. (1969)
5. Buzbee, B. L., Golub, G. H., Nielson, C. W.: On Direct Methods for Solving
Poissons Equations. SIAM J. Numerical Analysis, 7 (4) (1970) 627656
6. Calvetti, D., Gallopoulos, E., Reichel, L.: Incomplete partial fractions for parallel evaluation of rational matrix functions. Journal of Computational and
Applied Mathematics 59 (1995) 349380
7. Golub, G. H., Ortega, J. M.: Scientific Computing. An Introduction with Parallel Computing. Academic Press, Inc., (1993)
8. Heller, D.: Some Aspects of the Cyclic Reduction Algorithm for Block Tridiagonal Linear Systems. SIAM J. Numer. Anal., 13 (4) (1976) 484496
9. Hockney, R.W.: A Fast Direct Solution of Poissons Equation Using Fourier
Analysis. J. Assoc. Comput. Mach. 12 (1965) 95113
10. Lambiotte, J., Voigt, R.: The Solution of Tridiagonal Linear Systems on the
CDC Star100 Computer. ACM Trans. Math. Softw. 1 (1975) 308329
11. Sweet, R. A.: , A Parallel and Vector Variant of the Cyclic Reduction Algorithm.
SIAM J. Sci. and Statist. Computing 9 (4) (1988) 761765