Mikko Honkala
This licentiate thesis has been submitted for official examination for the
degree of Licentiate of Science in Technology in Espoo on March 27, 2002.
The local and global convergence of the methods have been studied, and a
convergence-aiding strategy for the multilevel methods is proposed.
Simulation examples are presented and the results are discussed.
Contents
Abstract iii
Preface iii
Contents iv
1 Introduction 1
2 Background 2
2.1 Hierarchical analysis . . . . . . . . . . . . . . . . . . . . . . . . 2
2.2 Parallel processing in circuit simulation . . . . . . . . . . . . . . 3
2.3 APLAC circuit simulation and design tool . . . . . . . . . . . . 4
5 Convergence 24
5.1 Local Convergence . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.2 Global Convergence . . . . . . . . . . . . . . . . . . . . . . . . . 27
6 Implementation in APLAC 30
6.1 Parallel processing . . . . . . . . . . . . . . . . . . . . . . . . . 30
6.2 Hierarchical analyzer . . . . . . . . . . . . . . . . . . . . . . . . 31
6.3 Iteration models . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.3.1 Iterative model . . . . . . . . . . . . . . . . . . . . . . . 32
6.3.2 Incremental model . . . . . . . . . . . . . . . . . . . . . 33
6.4 Aiding the convergence . . . . . . . . . . . . . . . . . . . . . . . 34
7 Simulation examples 35
7.1 Transistor circuits . . . . . . . . . . . . . . . . . . . . . . . . . . 35
7.2 Operationalamplifier circuit . . . . . . . . . . . . . . . . . . . . 44
7.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
8 Conclusion 46
References 47
List of Symbols and Abbreviations
Symbols
(·)i ith subcircuit variable/function
(·)E External variable/function
(·)Ei External variable/function of ith subcircuit
(·)k,j Variable at kth outer and jth inner iteration
A Matrix block of Jacobian matrix
b Right hand side vector of linear matrix equation
B Matrix block of Jacobian matrix
B̂ Matrix block of upper triangular matrix
B Ball
C Matrix block of Jacobian matrix
Ĉ Matrix block of lower triangular matrix
D Matrix block of Jacobian matrix
DSub Jacobian matrix of subcircuit
e Error of iteration
E Voltage source
f Function
g Nodal conductance
G Conductance
Gmin Minimum conductance
G Conductance matrix
i Subcircuit index
i Current
iD Diode current
I Set of all subcircuit indices i
j Inner iteration index
j Nodal current source vector
J Jacobian matrix
J Maximum number of inner iterations
k Outer iteration index
K, K̂, K̄ Constants
K Auxiliary matrix
L Lower triangular matrix
L Constant
m Number of subcircuits
M Constant
n Number of nodes
n Number of controlling voltages
N Constant
p Vector of variables
P Constant
Q Transistor
R Resistance
S1 , S2 Speedups
S Preconditioner
u Controlling voltage
u Vector of controlling voltages
U Upper triangular matrix
tserial Time of serial DC analysis
thierarchical Time of hierarchical DC analysis
tparallel Time of parallel DC analysis
v Nodal voltages
v Vector of node voltages
Vcc Supply voltage
w Vector of variables
x Variable
x Vector of variables
x∗ Solution vector
xaux Auxiliary vector of variables
α Variable
β Variable
δ Range
∆x Newton–Raphson update
ε Maximum error
η Constant
λ Damping factor
ρ Variable
Ω Open set
τ Maximum error of inner iteration
Abbreviations
AC Alternating Current
APLAC Formerly, Analysis Program for Linear Active Circuits
or Analysis and design Program for mixed
Lumped And distributed Circuits.
Nowadays, APLAC is not an acronym but the name
of the circuit simulator and design tool.
BBD Bordered Block Diagonal
CGNE Conjugate Gradient for Normal Equations
DC Direct Current
GN Gauss–Newton
HB Harmonic Balance
ITA Iterated Timing Analysis
LS Least Squares
LU Lower–Upper
MEMS Microelectromechanical Systems
MIMD Multiple Instruction stream / Multiple Data stream
MLNA Multilevel Newton Analysis
MLNR Multilevel Newton–Raphson
MNA Modified Nodal Analysis
MPI Message Passing Interface
NR Newton–Raphson
NRGN Newton–Raphson and Gauss–Newton
NRNR Newton–Raphson and Newton–Raphson
PVM Parallel Virtual Machine
RF Radio Frequency
VCCS Voltage-Controlled Current Source
WR Waveform Relaxation
1 Introduction
The need for fast and accurate circuit-simulation tools is obvious. During
the design process, computationally demanding numerical simulations must be
performed to verify the functionality of the circuit under design.
Therefore, ever faster computers and simulation programs are necessary.
In order to fully utilize the possibilities of existing computer hardware,
sophisticated and fine-tuned programs are required.
One of the most effective ways to reduce the computing time is to use parallel
processing. A necessary requirement for parallel processing is parallel
hardware. Traditionally, parallel processing is performed on supercomputers
with multiple processors, but these computers are usually very expensive.
Thus, "the poor man's supercomputers", namely, networks of workstations
(NoW), are utilized as parallel computers. In networked parallel processing,
each serial (or parallel) computer is used as a processing unit and data is
transferred via a local area network, such as Ethernet.
This thesis is the first step toward a fully parallel APLAC circuit simulation
and design tool. The parallel processing of a NoW is applied to APLAC's DC
analysis. The results obtained in this thesis can also be utilized for
other analyses in the future.
The main contribution of this thesis is the implementation of parallel
hierarchical analysis methods based on circuit decomposition in APLAC, as well
as their fine-tuning such that they are suitable for networked processing and,
of course, for demanding DC analysis. The need for minimal communication
between computers in the network requires the utilization of multilevel
iteration methods, and because of problems encountered in the convergence of
the DC iteration, the methods are further improved such that their convergence
properties are enhanced.
The methods have been implemented in APLAC, and detailed experiments
have been performed for the evaluation of the proposed methods.
2 Background
2.1 Hierarchical analysis
Concepts like diakoptics and tearing were introduced for hierarchical analysis
in the 1970s [15, 33, 35, 36, 114]. In the 1990s, the term domain
decomposition has been connected to these methods [53]. In these methods, the
linear or linearized circuit equations are ordered into bordered block diagonal
(BBD) form, which can be decomposed into separately solved submatrices.
The equations are solved using hierarchical LU factorization and forward-backward
substitution. The BBD ordering of the matrix can even be done
recursively on multiple levels of hierarchy. These methods have been efficiently
utilized for parallel computation in DC and transient analysis [11,12,28,34,47,
53,65,66,90,97,103,104,108,109,113]. Other, theoretical and practical studies
of these methods can be found in Refs. [10, 60, 67, 84, 93, 111]. Even utilization
of diakoptics in large change sensitivity analysis has been studied in Ref. [99].
The partitioning of circuits has been continuously under study, and algorithms
can be found, e.g., in Refs. [21, 27–29, 33, 37, 46, 92, 121, 124]. The main
idea is to divide the circuit automatically such that the partitioning is optimal
for the hierarchical analysis methods.
However, it has been pointed out [67, 69] that the decomposition methods
for solving the linearized circuit equations belong to the 1970’s and that they
cannot be compared to modern sparse matrix equation solvers.
In the methods above, the decomposition is performed on the linear-equation
level; but if the circuit is partitioned before linearization, then, on the nonlinear-equation
level, nonlinear analysis methods like Multilevel Newton–Raphson
(MLNR) methods (or Multilevel Newton Analysis, MLNA [77]) can be applied.
These methods have been applied to DC and transient analysis to solve the
system of (discretized) nonlinear equations [8, 9, 17, 28, 41, 56, 57, 113, 125, 126].
They can also be used in the Harmonic-Balance (HB) method [83], as well
as in the simulation of microelectromechanical systems (MEMS) [2, 3, 96] and
mixed circuit/device systems [64, 85]. The multilevel methods can be effectively
parallelized [8, 9, 28, 41, 113, 123, 125, 126].
The third possibility of performing a transient analysis hierarchically is to
decompose the circuit on the algebraic differential-equation level and analyze
the circuit using Waveform Relaxation (WR) [59], which has also been used in
parallel circuit simulation [44, 68, 71–74, 90, 91, 98, 112, 117, 122, 124].
A powerful property of the hierarchical methods is the possibility to utilize
time-domain latency [19, 61, 75–78, 86], which usually reduces computation in
the transient analysis of, at least, digital circuits because they usually are very
modular and latent. Analog circuits are mostly tightly coupled, and for them
the latency property is not as significant.
An example of a modern approach to hierarchical analysis (different from
parallel processing) is the HSIM simulator [110]. It is designed to simulate
nanoscale circuits, which consist of millions of transistors and many repetitive
subcircuits. It uses hierarchical storage for similar circuits and, thus,
minimizes memory consumption. Modelorder reduction methods are utilized
to reduce the size of linear subcircuits. HSIM uses hierarchical equation solving
methods that exploit the similarity of circuit conditions and waveforms.
One may say that, for mixed analog/digital circuit simulation [89], the endpoint
of the evolution of hierarchical analysis is a method whereby the digital part
is separated from the analog part and analyzed using efficient port-level or
behavioral-level analysis methods, while the analog part is simulated using
standard transistor-level methods.
– in linear-equation solving with iterative methods (like linear relaxation
methods) [23, 42, 54, 63, 100–102, 127],
– in nonlinear-equation solving with MLNR methods [8, 9, 28, 41, 113,
123, 125, 126],
– in nonlinear-equation solving with Iterated Timing Analysis (ITA)
[43, 79, 90], or
– in algebraic differential-equation solving with WR [44, 68, 71–74, 90,
91, 98, 112, 117, 122, 124],
• Periodic steady-state analysis with the HB method [80–82] and with the
shooting method [49].
The above list does not include references to logic or high-level
description-language simulation algorithms and programs. The list is not complete,
and some references could be included in many categories but are, nevertheless,
included in only one category.
Notice that the list above contains only references where parallel processing
algorithms are used in the context of circuit simulation; of course,
there are numerous books and articles about parallel algorithms themselves
that can be applied in circuit simulation, e.g., Ref. [24].
Networked circuit simulation has been studied, e.g., in Refs. [4,28,53,70,74,
82,98,113]. MLNR methods especially have proven to be efficient for transient
analysis in networked computing [28, 53, 113].
3 Parallel hierarchical Newton–Raphson method
3.1 Introduction
DC analysis is the basis of all circuit simulation. Before AC analysis, the
operating point has to be found, and the DC solution is the initial condition
for the transient analysis. Moreover, the DC characteristics themselves are
sometimes of interest.
DC analysis solves the steady-state behavior of the circuit variables under
DC excitation. Usually, the nonlinear circuit equations are solved iteratively
using the Newton–Raphson (NR) method. Unfortunately, the iteration
often lacks a good initial guess and convergence is not guaranteed. Therefore,
convergence-aiding methods are needed.
The parallelization of DC analysis can be performed in many ways; e.g.,
existing parallel sparse-matrix packages can be utilized for solving the
linearized equations.
In the following subsections, the theory of the parallel DC analysis based
on hierarchical circuit decomposition is described. The equation formulation is
presented in Section 3.2 and the parallel NR method with convergence aiding
is summarized in Section 3.5.
f(x) = 0, (1)
where x ∈ R^n are the nodal voltages (or currents) of the circuit and f : R^n → R^n
has the Jacobian matrix J. The nodal equations are typically solved using the
NR method, which is a sequence of iterations

x^(k+1) = x^k + ∆x^k, (2)

where k is the iteration index. The NR update ∆x^k is solved from the linear
equation

J(x^k) ∆x^k = −f(x^k). (3)
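As an illustration, the NR iteration just described can be sketched in Python; the scalar equation below is a hypothetical example, not a circuit from this thesis:

```python
import numpy as np

def newton_raphson(f, jacobian, x0, tol=1e-10, max_iter=50):
    """Solve f(x) = 0 by the NR iteration:
    J(x^k) dx^k = -f(x^k),  x^{k+1} = x^k + dx^k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = np.linalg.solve(jacobian(x), -f(x))   # NR update
        x = x + dx                                  # NR step
        if np.linalg.norm(dx) < tol:
            return x
    raise RuntimeError("NR iteration did not converge")

# Hypothetical scalar equation f(v) = v^2 - 4 = 0 with J(v) = 2v
f = lambda v: np.array([v[0] ** 2 - 4.0])
J = lambda v: np.array([[2.0 * v[0]]])
v = newton_raphson(f, J, [3.0])                     # converges to v = 2
```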
Consider a circuit that has n nodes and that can be decomposed into m
subcircuits, the ith of which has ni internal nodes and nEi external connection
nodes. Fig. 1 presents a circuit having two subcircuits.
[Fig. 1: A circuit decomposed into two subcircuits; internal nodes 1,1 and 1,2 (Subcircuit 1) and 2,1 (Subcircuit 2), external nodes E1 and E2, and main-circuit elements G and J.]
The nonlinear system of nodal equations for the internal and external nodes
can be written as

fi(xi, xE) = 0,
fE(x1, . . . , xm, xE) = 0, (4)
where

Ai := ∂fi/∂xi ∈ R^(ni×ni), (6)
Bi := ∂fi/∂xE ∈ R^(ni×nE), (7)
Ci := ∂fE/∂xi ∈ R^(nE×ni), (8)
D := ∂fE/∂xE ∈ R^(nE×nE). (9)
fE, as well as D, can be further decomposed into parts that contain the
contributions of the circuit elements of the main circuit and of each subcircuit:

fE = fE0 + Σ_{i=1}^{m} fEi, (10)

D = DE0 + Σ_{i=1}^{m} Di. (11)
Example 1
[Fig. 2: Example circuit: diodes D1 and D2 inside Subcircuits 1 and 2, internal nodes 1,1 and 2,1, common external node E, conductances G1, G2, and G3, and current source J.]

f1(v1,1, vE) = G1 v1,1 − iD1(vE, v1,1) = 0
f2(v2,1, vE) = G2 v2,1 − iD2(vE, v2,1) = 0
fE (v1,1 , v2,1 , vE ) = −J + G3 vE + iD1 (vE , v1,1 ) + iD2 (vE , v2,1 ) = 0
where iD1 and iD2 are the diode currents and v1,1, v2,1, and vE the node
voltages of nodes 1,1, 2,1, and E, respectively. The Jacobian takes the form

    J = ⎡ G1 − ∂iD1/∂v1,1    0                   −∂iD1/∂vE                ⎤
        ⎢ 0                  G2 − ∂iD2/∂v2,1     −∂iD2/∂vE                ⎥ .
        ⎣ ∂iD1/∂v1,1         ∂iD2/∂v2,1          G3 + ∂iD1/∂vE + ∂iD2/∂vE ⎦
DE0 = G3,
D1 = ∂iD1/∂vE,
D2 = ∂iD2/∂vE.
fE can be decomposed similarly.
The parallel solution of the linear equations (12) can be performed using
hierarchical LU factorization (see Section 3.3). During the solution process,
contributions of the subcircuits (which can be interpreted as Norton's
equivalent-circuit representations of the subcircuits) have to be computed in
order to solve the variables of the main circuit. Solving the subcircuit part of
Eq. (12), we obtain the update equations of the subcircuit variables

∆xi = −Ai^(-1) fi − Ai^(-1) Bi ∆xE, (13)

and, substituting this into the main-circuit part of Eq. (12), the main-circuit
equation is then
( DE0 + Σ_{i=1}^{m} DSub,i ) ∆xE = −fE0 + Σ_{i=1}^{m} fSub,i, (14)

where
[Fig. 3: The circuit of Fig. 1 with the subcircuit connection nodes separated: each subcircuit i has its own connection nodes Ei,1 and Ei,2 with short-circuit currents ii,1 and ii,2 to the main-circuit nodes E0,1 and E0,2.]
The formulation is essential for the new MLNR methods presented in Section 4.
The idea of this formulation is that each subcircuit then has independent
short-circuit currents ii ∈ R^(nEi) and external variables xEi ∈ R^(nEi), and the
only connection is at the main-circuit level. Thus, the subcircuits do not have
common variables. The nodal equations for the internal nodes, the subcircuit
connection nodes, and the main-circuit nodes take the form

fi(xi, xEi) = 0,
fEi(xi, xEi) + ii = 0, (17)
fE0(xE0, xEI, iI) = 0,

respectively, where i = 1, . . . , m, subscript I denotes all i, fEi : R^(ni) × R^(nEi) →
R^(nEi), and fE0 : R^(nE0) × R^(nEI) × R^(nEI) → R^(nE0).
Eqs. (13) and (17) show that only ∆xE and ii have to be transferred
into the subcircuits during the communication between the main circuit and
the subcircuits.
The idea of separation of the subcircuits is not completely new [126], but
a new way of taking advantage of this formulation is presented in Section 5.2.
Example 2
Now, we may continue the previous example. By adding the short-circuit
currents (see Fig. 4) and applying the MNA formulation, the nonlinear equations
become
f1(v1,1, vE1) = G1 v1,1 − iD1(vE1, v1,1) = 0 (18)
f2(v2,1, vE2) = G2 v2,1 − iD2(vE2, v2,1) = 0 (19)
fE1(v1,1, vE1) − i1 = iD1(vE1, v1,1) − i1 = 0 (20)
fE2(v2,1, vE2) − i2 = iD2(vE2, v2,1) − i2 = 0 (21)
fE0,1(vE1, vE) = vE − vE1 = 0 (22)
fE0,2(vE2, vE) = vE − vE2 = 0 (23)
[Fig. 4: The example circuit with separated connection nodes E1 and E2 and short-circuit currents i1 and i2 between them and the main-circuit node E.]
Formerly, hierarchical LU factorization has been used (on serial machines)
as a sparse-matrix equation solver. The efficiency of hierarchical LU factorization
is discussed in detail in Refs. [35, 69, 114]. But even if the decomposition
is efficient, the cost of hierarchical analysis is at least (m + α)n + m^3/3 + m^2
(α is the sparsity factor, n is the number of nodes, and m is the number of
subcircuits), while for good sparse-matrix code, it is only αn [69, pp. 383–384].
Typically, the complexity of the sparse-matrix computation is n^(1.1–1.5) [58]. As
a conclusion, we may say that hierarchical decomposition does not necessarily
improve the speed of linear-equation solving. The cost of the symbolic
reordering of a sparse matrix depends more on the size of the matrix — at least in
APLAC (see Section 7) — and can be reduced with efficient decomposition.
However, hierarchical LU factorization is a way to perform the circuit analysis
in parallel. If there is only a need to solve linear equations in parallel, it
might be useful to apply an existing parallel solver, e.g., SuperLU [20]. However,
the shared-memory version of the SuperLU algorithm has been found to be
inefficient for circuit matrices [6]. Moreover, hierarchical LU factorization is a
way to produce Norton's equivalent circuits of the subcircuits.
Hierarchical linear-equation solving is considered in the following and
summarized in Algorithms 1–4. Here, a two-level hierarchy is treated because the
practical implementation in APLAC (see Section 6) uses only two levels of
hierarchy. Ref. [109] presents the multilevel algorithm for LU factorization.
The linear BBD equations (12) can be decomposed into separate systems
of subcircuit equations

⎡ Ai  Bi ⎤ ⎡ xi  ⎤   ⎡ bi  ⎤
⎢        ⎥ ⎢     ⎥ = ⎢     ⎥ . (26)
⎣ Ci  Di ⎦ ⎣ xEi ⎦   ⎣ bEi ⎦
For convenience, the unknowns are denoted by x and the RHS by b. In NR
equations, x consists of the NR updates and b = −f(x).
Hierarchical LU factorization starts from the partial LU factorization. The
factorization is stopped before accessing Di:

⎡ Ai  Bi ⎤     ⎡ Li\Ui  B̂i ⎤
⎢        ⎥  →  ⎢           ⎥ , (27)
⎣ Ci  Di ⎦     ⎣ Ĉi     Di ⎦

where

Ĉi = Ci Ui^(-1), (28)
B̂i = Li^(-1) Bi. (29)
Because U and L are triangular matrices, the calculation of Ĉi and B̂i is a
straightforward task.
The next step is to compute the contributions of the subcircuit Jacobian
matrices
DSub,i = Di − Ĉi B̂i (30)
for the main-level equations. Notice that the matrix is the same as
in Eq. (15). Algorithm 1 describes the LU factorization and the calculation of
DSub,i.
Algorithm 1 LU factorize(circuit)
1. LU factorize Ai: Ai = Li Ui.
2. Ĉi = Ci Ui^(-1)
3. B̂i = Li^(-1) Bi
The solving continues on the subcircuit level with forward substitution, i.e.,
solving the auxiliary vector zi from the equation

Li zi = bi, (31)

and computing the subcircuit contribution

bSub,i = bEi − Ĉi zi, (32)

which is the same as fSub,i in Eq. (16). Algorithm 2 performs the forward
substitution and the computation of bSub,i. The vector bSub,i and matrix DSub,i
are transferred to the main-level equation solver for solving the external
(main-circuit-level) variables.
Algorithm 2 Forward(circuit)
1. Solve Li zi = bi.
2. bSub,i = bEi − Ĉi zi
Before solving the inner variables of the subcircuits, the main-level equations
have to be constructed from the subcircuit contributions and from the
main-level circuit:

( DE0 + Σ_{i=1}^{m} DSub,i ) xE = bE0 + Σ_{i=1}^{m} bSub,i. (33)
From the circuit-theoretical point of view, the resulting matrix equation (33)
is the linearized circuit equation of the main circuit such that the subcircuits are
expressed as Norton's equivalent circuits. The external voltages xE can be
solved using LU factorization, but there is no restriction on the main-level
equation solver.
Then, the values of the external voltages are transferred into the subcircuits,
and the terms zi − B̂i xEi are constructed for the backward substitution. After
that, it is trivial to solve the equation

Ui xi = zi − B̂i xEi. (34)
Algorithm 3 Backward(circuit)
1. Solve Ui xi = zi − B̂i xEi.

Algorithm 4 Hierarchical solve(circuit)
1. Forall: LU factorize(subcircuit)
2. Forall: Forward(subcircuit)
3. Solve the main-level equation (33).
4. Forall: Backward(subcircuit)
We can verify that the inner variables obtained from the substitutions (31)
and (34),

xi = Ui^(-1) ( zi − B̂i xEi )
   = Ui^(-1) Li^(-1) bi − Ui^(-1) Li^(-1) Bi xEi (35)
   = Ai^(-1) bi − Ai^(-1) Bi xEi,

are the same as those given by Eq. (13).
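The whole two-level solution process can be checked numerically. The following NumPy sketch uses random, hypothetical matrix blocks (the subcircuit blocks Di are taken as zero, so they are folded into DE0): the subcircuit contributions are the Schur-complement terms, the main-level equation (33) is solved, the subcircuit variables are back-substituted as in Eq. (35), and the result agrees with a direct solve of the assembled BBD matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, nE = 3, 4, 2                          # internal and external sizes

# Hypothetical BBD blocks (diagonally shifted to keep them nonsingular)
A1 = rng.normal(size=(n1, n1)) + 5 * np.eye(n1)
A2 = rng.normal(size=(n2, n2)) + 5 * np.eye(n2)
B1, B2 = rng.normal(size=(n1, nE)), rng.normal(size=(n2, nE))
C1, C2 = rng.normal(size=(nE, n1)), rng.normal(size=(nE, n2))
DE0 = rng.normal(size=(nE, nE)) + 5 * np.eye(nE)
b1, b2, bE0 = rng.normal(size=n1), rng.normal(size=n2), rng.normal(size=nE)

# Subcircuit contributions (Norton equivalents), i.e. Schur-complement terms;
# with the Di blocks taken as zero, DSub,i = -Ci Ai^(-1) Bi
DSub1 = -C1 @ np.linalg.solve(A1, B1)
DSub2 = -C2 @ np.linalg.solve(A2, B2)
bSub1 = -C1 @ np.linalg.solve(A1, b1)
bSub2 = -C2 @ np.linalg.solve(A2, b2)

# Main-level equation, cf. Eq. (33)
xE = np.linalg.solve(DE0 + DSub1 + DSub2, bE0 + bSub1 + bSub2)

# Back substitution into the subcircuits, cf. Eq. (35)
x1 = np.linalg.solve(A1, b1 - B1 @ xE)
x2 = np.linalg.solve(A2, b2 - B2 @ xE)

# Reference: direct solve of the assembled BBD system
M = np.block([[A1, np.zeros((n1, n2)), B1],
              [np.zeros((n2, n1)), A2, B2],
              [C1, C2, DE0]])
x_full = np.linalg.solve(M, np.concatenate([b1, b2, bE0]))
```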
3.4 Aiding the convergence
The convergence of the NR iteration is guaranteed only sufficiently near the
solution [52]. Due to a poor initial guess, the iteration may diverge or be extremely
slow. Exponential characteristics, which are very common in electrical circuits,
easily lead to numerical overflow. Therefore, some additional aid is needed to
guarantee the convergence.
NR-iteration convergence-aiding methods can be roughly divided into three
different groups:
• Heuristic step-size adjusting methods that calculate the step sizes of the NR
iteration according to some heuristic rules. Typically, a priori knowledge of
the diode current characteristics is exploited.
• Norm-reduction methods, which exploit the fact that the NR step is
in a descent direction of the norm ‖f‖ and that the step size can be
adjusted such that the norm reduces monotonically at every iteration.
Algorithm 5 Parallel DC analysis(circuit)
1. Set k = 0.
3. Begin iteration:
4. End iteration.
4 Multilevel Newton–Raphson methods
In the previous section, the hierarchical NR method was presented. What
next? The only improvement that we can obtain is the speedup from parallelization
(and sometimes from the decomposition itself, which is mostly due
to the partitioning of the symbolic analysis). In order to improve the speed
further, iteration methods other than NR iteration have to be utilized. Digital
circuits are usually modular, latent, and unidirectional, in other words,
loosely coupled. Because block, waveform, and nonlinear relaxation methods
utilize these properties, they have been found suitable for this group of circuits.
For analog circuits, which usually are tightly coupled, the capabilities
of these methods cannot be fully exploited, but it might be worthwhile trying
to use MLNR methods because they have been effectively applied in parallel
processing [28, 113, 125, 126].
where j is the inner iteration index. The inner iteration is stopped at some
error level τ = min(τ′, ‖∆xE‖²) (τ′ is the maximum allowed error level), which
is needed for quadratic convergence of the outer-level iteration [77]. The initial
guess x_i^(k,0) for the inner variables can be the same at every inner
iteration, or it may be the ending values of the previous iteration.
The main-circuit variables are iterated using the subcircuits as macromodels.
The MLNA [77] is presented in Algorithm 6.
Algorithm 6 MLNA(circuit)
1. Set xE^0, ε, and τ′.
3. (a) Solve Ai ∆xi^(k,j) = −fi(xi^(k,j), xE^k).
(b) Set xi^(k,j+1) = xi^(k,j) + ∆xi^(k,j).
(c) Set j = j + 1.
(d) If ‖∆xi^(k,j)‖ > τ, go to Step 3 (a).
6. τ = min(τ′, ‖∆xE‖²).
7. Set k = k + 1.
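The macromodel idea can be sketched on a toy decomposed nonlinear system (invented for illustration; scalar "subcircuits", with the macromodel sensitivities playing the role of the DSub contributions): the inner NR loops solve the subcircuits with the external variable frozen, and the outer NR step iterates the main-circuit variable through the macromodels.

```python
def inner_solve(g, dg, x, tol=1e-12, max_j=30):
    """Inner NR loop: solve g(x) = 0 for one subcircuit variable
    while the external variable is held constant."""
    for _ in range(max_j):
        dx = -g(x) / dg(x)
        x += dx
        if abs(dx) < tol:
            break
    return x

# Toy decomposed system (hypothetical, for illustration only):
#   f1(x1, xE) = x1 + x1**3 - xE = 0        (subcircuit 1)
#   f2(x2, xE) = 2*x2 + x2**3 - xE = 0      (subcircuit 2)
#   fE(x1, x2, xE) = xE + x1 + x2 - 1 = 0   (main circuit)
x1 = x2 = 0.0
xE = 0.5
for k in range(20):                                   # outer iteration
    # Inner iterations: subcircuits solved with xE fixed
    x1 = inner_solve(lambda x: x + x**3 - xE, lambda x: 1 + 3 * x**2, x1)
    x2 = inner_solve(lambda x: 2 * x + x**3 - xE, lambda x: 2 + 3 * x**2, x2)
    # Macromodel sensitivities dxi/dxE = -(dfi/dxE)/(dfi/dxi)
    s1 = 1.0 / (1 + 3 * x1**2)
    s2 = 1.0 / (2 + 3 * x2**2)
    fE = xE + x1 + x2 - 1.0
    dxE = -fE / (1.0 + s1 + s2)                       # outer NR step
    xE += dxE
    if abs(dxE) < 1e-12:
        break

residual = max(abs(x1 + x1**3 - xE),
               abs(2 * x2 + x2**3 - xE),
               abs(xE + x1 + x2 - 1.0))
```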
In the following, two new MLNR methods are presented. The main
emphasis is on the convergence of the DC analysis: how to improve the convergence
of the MLNR methods and, thus, the speed of the analysis.
Algorithm 7 MLNR(circuit)
1. Set x^(0,0), ε, and J.
(e) If ‖∆x‖ > ε and k < K, go to Step 2 (a).
Algorithm 8 InnerNR(circuit)
1. Begin iteration: Set j = 0.
(c) Set j = j + 1.
(d) If j < J and ‖∆xi‖ > ε, go to Step 1 (a).
4.2.2 Combined NRGN method
In order to take the connection nodes into account in the inner iteration,
some additional connection-node equations have to be solved together with
the other subcircuit equations. The resulting system of nonlinear equations
is overdetermined and is solved in the Least-Squares (LS) sense using GN
iteration. Moreover, the GN-iteration step is in a descent direction of the
residual [94]. Thus, the step sizes can be damped such that the total residual
norm decreases monotonically at every inner iteration.
The squared norm of the left-hand side of Eq. (17) is

‖f‖₂² = ‖f1‖₂² + ‖fE1 + i1‖₂² + · · · + ‖fm‖₂² + ‖fEm + im‖₂² + ‖fE0‖₂². (38)
During the inner iterations, xEi and ii are kept constant and only the internal
variables xi are iterated. If we use the NR method, the inner NR step is
in a descent direction of ‖fi‖₂². However, it is not guaranteed that
‖fEi + ii‖₂² decreases. But if we use GN iteration in the inner loop, i.e., solve
the equations

fi(xi) = 0, (39)
fEi(xi) + ii = 0, (40)

in the LS sense, the step is in a descent direction of the norm
‖fi‖₂² + ‖fEi + ii‖₂², and thus in a descent direction of the total norm.
The linearized GN equations of the nonlinear subcircuit equations (39) and
(40) take the form
Ri ∆xi = bi , (41)
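As a sketch of the inner GN iteration (an undamped illustration on a hypothetical overdetermined subcircuit system, not a circuit from this thesis; numpy.linalg.lstsq computes the LS step of Eq. (41)):

```python
import numpy as np

# Hypothetical overdetermined inner system: internal equations plus a
# connection-node equation, stacked into the residual r(x); R = dr/dx.
def r(x):
    return np.array([x[0] + x[0] ** 3 - 1.0,   # internal equation
                     x[1] - 0.5,               # internal equation
                     x[0] + x[1] - 1.2])       # connection-node equation

def R(x):
    return np.array([[1 + 3 * x[0] ** 2, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])

x = np.array([0.5, 0.0])
for _ in range(25):
    # GN step: solve R dx = -r in the LS sense, cf. Eq. (41)
    dx, *_ = np.linalg.lstsq(R(x), -r(x), rcond=None)
    x = x + dx
    if np.linalg.norm(dx) < 1e-12:
        break
```

At the GN fixed point the gradient of the squared residual norm, Rᵀr, vanishes, which is what makes each step a descent direction for the total norm.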
Algorithm 9 InnerGN(circuit)
1. Begin inner iteration: Set j = 0.
(c) If j = J or ‖∆xi‖ < ε, end inner iteration; else set j = j + 1 and
go to Step 1 (a).
2. End inner iteration.
Eq. (55) has to be factored in order to obtain L22 and U22.
The LS solution can be obtained using forward-backward substitution from
the equation

        ⎡ x    ⎤
LLS ULS ⎢      ⎥ = b. (56)
        ⎣ xaux ⎦

Algorithm 10 summarizes the discussion.
Algorithm 10 LSLU()
1. LU factorize Ai = L11 U11.
2. L21 = C U11^(-1)
3. K = L21 L11^(-1)
4. U12 = −L11^(-1) K^T
The other possibility to obtain the LS solution is to use CGNE. CGNE is
an iterative Krylov-subspace method for the normal equations, and the algorithm
is as follows:

Algorithm 11 CGNE()
1. Set the initial guess x^0; s^0 = b − R x^0; r^0 = R^T s^0; p^0 = r^0;
ρ0 = (r^0)^T r^0; p^(−1) = 0; β_(−1) = 0.
2. For i = 0, 1, 2, . . .
(h) ρ_(i+1) = (r^(i+1))^T r^(i+1)
(i) βi = ρ_(i+1) / ρi
3. End.
and, because we have the LU factors of the matrix A, the preconditioned
matrix is

           ⎡ I               ⎤   ⎡ I        ⎤
R A^(-1) = ⎢                 ⎥ = ⎢          ⎥ . (59)
           ⎣ C U^(-1) L^(-1) ⎦   ⎣ Ĉ L^(-1) ⎦
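A complete sketch of the CGNE iteration, with the loop body written out using the standard conjugate-gradient-on-normal-equations recurrences (the intermediate steps here are an assumption, not copied from Algorithm 11; the initialization and steps (h)–(i) match the listing above):

```python
import numpy as np

def cgne(Rm, b, x0, tol=1e-12, max_iter=200):
    """CG on the normal equations R^T R x = R^T b; variable names
    follow the thesis notation (s: residual of R x = b, r = R^T s)."""
    x = x0.copy()
    s = b - Rm @ x
    r = Rm.T @ s
    p = r.copy()
    rho = r @ r
    for _ in range(max_iter):
        q = Rm @ p
        alpha = rho / (q @ q)          # step length along p
        x = x + alpha * p
        s = s - alpha * q              # update residual of R x = b
        r = Rm.T @ s                   # residual of the normal equations
        rho_new = r @ r
        if np.sqrt(rho_new) < tol:
            break
        beta = rho_new / rho           # step (i): beta_i = rho_{i+1}/rho_i
        p = r + beta * p
        rho = rho_new
    return x

rng = np.random.default_rng(1)
Rm = rng.normal(size=(6, 4))           # hypothetical overdetermined system
b = rng.normal(size=6)
x = cgne(Rm, b, np.zeros(4))
x_ref, *_ = np.linalg.lstsq(Rm, b, rcond=None)
```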
4.2.3 DC sweep
In a DC sweep, a voltage, a model parameter, or the temperature is swept, and
the DC analysis is repeated several times. The solution of the previous analysis
is then a good initial guess for the next analysis, and thus the convergence is
improved. If, e.g., a voltage source whose voltage is swept is inside a subcircuit,
then, in the first iteration of the next DC analysis, the inner iteration needs
to be performed only for the subcircuit that contains this source. If the source
is at the main level, there is no need to perform the inner iterations at all, and
the outer iteration step is taken immediately.
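The warm-start effect can be sketched as follows (a hypothetical diode-resistor nodal equation with invented parameter values, not an APLAC model): each DC analysis of the sweep starts from the previous solution, and the total NR iteration count does not exceed that of restarting from zero every time.

```python
import numpy as np

def nr_solve(f, df, x0, tol=1e-12, max_iter=100):
    """Plain NR with an iteration count, to show the warm-start effect."""
    x, n = x0, 0
    while n < max_iter:
        dx = -f(x) / df(x)
        x += dx
        n += 1
        if abs(dx) < tol:
            break
    return x, n

# Hypothetical diode-resistor node equation: g*v + Is*(exp(v/Vt) - 1) = J
g, Is, Vt = 1e-3, 1e-12, 0.0259
f = lambda v, J: g * v + Is * (np.exp(v / Vt) - 1.0) - J
df = lambda v: g + (Is / Vt) * np.exp(v / Vt)

cold_iters, warm_iters = [], []
v_prev = 0.0
for J in np.linspace(1e-4, 1e-3, 10):                       # swept source
    _, n_cold = nr_solve(lambda v: f(v, J), df, 0.0)        # cold start
    v_prev, n_warm = nr_solve(lambda v: f(v, J), df, v_prev)  # warm start
    cold_iters.append(n_cold)
    warm_iters.append(n_warm)
```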
5 Convergence
The convergence of the NR and MLNR methods is considered in this section.
For applications, it is crucial to know whether the iteration method converges
or not. Usually, nonlinear iteration methods converge provided that the initial
guess is close to the solution; however, the iteration may diverge due to a
poor initial guess. Therefore, the treatment of convergence is divided into
two sections. First, the local convergence of the methods is discussed and,
then, the global convergence.
• A solution x∗ ∈ Ω exists.
• J(x∗ ) is nonsingular.
Lemma 1 If Assumption 1 holds, then there exist K > 0 and δ > 0 such that,
if x^k ∈ B(δ), the NR iterate satisfies

‖e^(k+1)‖ ≤ K ‖e^k‖²,

where e = x − x∗.
The proof can also be found, e.g., in Ref. [52].
Next, the convergence properties of the NRNR method will be shown. The
assumptions for the forthcoming Lemma are presented.
Lemma 1 holds for all j by induction. □
Now, when the effect of the inner iteration on the error is known, it
is straightforward to show the quadratic convergence of the outer iterations
({x^(0,0), x^(1,0), . . .}) of the NRNR method. The idea of the proof is to show that,
if we are sufficiently close to the solution, the inner iteration cannot disturb the
iteration too much and the outer iteration converges quadratically.

Theorem 2 If Assumptions 1 and 2 hold on Ω, there is δ > 0 such that, if
x^(0,0) ∈ B(δ), the outer iteration of the NRNR method converges quadratically.
Proof. Let δ be small enough that B(K̂δ) ⊂ Ω. Reduce δ, if needed,
such that K K̂² δ = η < 1. If x^(k,0) ∈ B(δ), then Lemma 2 implies that after J
inner iterations x^(k,J) ∈ B(K̂δ) ⊂ Ω, and we can continue the iteration. Lemma
1 implies that, after the outer iteration,

‖e^(k+1,0)‖ ≤ K ‖e^(k,J)‖² ≤ K K̂² ‖e^(k,0)‖² ≤ K K̂² δ ‖e^(k,0)‖ = η ‖e^(k,0)‖ < ‖e^(k,0)‖, (69)

and x^(k+1,0) ∈ B(ηδ) ⊂ B(δ). Since x^(0,0) ∈ B(δ), x^(k,0) converges to x∗
quadratically. □
The NRGN method converges under slightly different assumptions.

Assumption 3 There is an L such that

‖f(x) − f(x∗)‖ ≤ L ‖x − x∗‖, (70)

and, for some P,

‖(R^(k,j))†‖ ≤ P. (71)

The first assumption is the same as in Assumption 2, but a bound is
assumed for the norm of the pseudoinverse R† instead of A^(-1).

Lemma 3 From Assumption 3, it follows that, for all j = 1, 2, . . .,

‖e^(k,j)‖ ≤ K̄ ‖e^(k,0)‖, (72)

where K̄ = (1 + P M)^j.

Proof. The proof is similar to the proof of Lemma 2. The only exception
is that A is now replaced by R. □
Similarly, the outer-level convergence of the NRGN method can be proven.

Theorem 3 If Assumptions 1 and 3 hold on Ω, there is δ > 0 such that, if
x^(0,0) ∈ B(δ), the outer iteration of the NRGN method converges quadratically.

Proof. The proof is similar to the proof of Theorem 2. □
5.2 Global Convergence
There are numerous convergence-aiding methods for the conventional
NR method (they were briefly discussed in Section 3.4), but there have been
only a few reported attempts to improve the convergence of the multilevel
NR method [123, 126]. In Ref. [77], it is mentioned that, during transient
analysis, it may sometimes happen that the iteration does not converge, and
that the convergence problem can then be handled by halving the time step.
However, this approach does not solve the convergence problems in DC (and,
e.g., in harmonic balance) analysis.
Refs. [125, 126] present a two-level line-search (norm-reduction) scheme
which finds an inner solution such that the outer iteration is in a descent
direction, and a line search can be performed between the points x^(k+1,0)
and x^(k,0). Ref. [123] applies non-monotone line-search methods to the MLNR
methods.
In the following, it will be shown how the convergence of the new MLNR
methods can be aided by using any existing step-size adjusting method (normally
used together with the conventional NR method), such that every
inner and outer iteration reduces the total error norm ‖f‖ monotonically. The
proposed strategy does not limit the utilization of step-size adjusting methods
to some specific norm-reduction methods but allows a wider range of methods
to be applied.
The general goal of the step-size adjusting methods is to find a damping
factor λ such that the error norm is reduced:

‖f(x^(k,j) + λ ∆x^(k,j))‖ < ‖f(x^(k,j))‖. (73)

If condition (73) is not satisfied during the inner iteration, the iteration should
be stopped at the point where the iteration last reduced the norm.
If GN iteration is applied as the inner iteration, the direction of the iteration
step is always a descent direction, and a λ can be found such that the norm is
reduced. However, for numerical reasons, it may happen that a λ satisfying the
condition cannot be found. Again, the inner iteration should then be stopped
at the point where the error norm was last reduced. In Fig. 5, the damping of
the inner iteration step is shown, while the external voltages xE (and short-circuit
currents) are kept constant.
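A minimal sketch of this damping strategy (plain backtracking on condition (73), applied to a hypothetical scalar exponential characteristic where the full NR step would overflow):

```python
import numpy as np

def damped_step(f, x, dx, lam_min=1e-12):
    """Backtracking: halve lambda until the norm condition (73) holds,
    i.e. ||f(x + lam*dx)|| < ||f(x)||; return None if no lambda is found."""
    fx = np.linalg.norm(f(x))
    lam = 1.0
    while lam > lam_min:
        with np.errstate(over='ignore'):       # exp may overflow to inf
            if np.linalg.norm(f(x + lam * dx)) < fx:
                return x + lam * dx
        lam *= 0.5
    return None                                # no norm-reducing step found

# Hypothetical exponential characteristic: a full NR step would overflow
f = lambda x: np.array([np.exp(x[0]) - 2.0])
J = lambda x: np.array([[np.exp(x[0])]])

x = np.array([-8.0])
for _ in range(60):
    dx = np.linalg.solve(J(x), -f(x))          # NR direction
    x_new = damped_step(f, x, dx)
    if x_new is None or np.linalg.norm(x_new - x) < 1e-14:
        break                                  # stop at last reducing point
    x = x_new
```

The iteration converges to the solution x = ln 2 even though the first undamped NR step would jump to a point where the exponential overflows.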
[Fig. 5: Damping of the inner iteration step, shown on norm contours in the (xE, xi) plane: the damped inner step from x^(0,0) to x^(0,1) = x^(0,0) + λ∆x^(0,0) is taken while xE is kept constant.]
After J inner iterations, the outer-iteration step starts from the point where
the error is smaller than or equal to that of the starting point of the inner
iteration. Because the outer iteration step is a normal NR step, it is in the
direction of steepest descent and any step-size adjusting method can be used.
The damping of the outer iteration is presented in Fig. 6.
[Figure 6: Damping of the outer iteration step in the (x_E, x_i) plane. Starting from x^{0,0}, the inner iterations reach x^{0,J}, from which the outer step gives x^1 = x^{0,J} + ∆x^{0,J}.]
be ill-conditioned and produce bad directions. The direction may not be
a descent direction, or such small step sizes are needed that norm-reduction
methods are not able to find them. Especially, controlled sources have a very
undirective nature [120] and produce bad directions.
In addition, continuation or homotopy methods can be applied at the outer-iteration
level, similarly to the normal NR method. For example, if the outer
iteration fails to improve the iterate, source stepping can be started.
6 Implementation in APLAC
The proposed methods have been implemented in the in-house development
version of the APLAC circuit simulation and design tool.
The parallel hierarchical analysis is implemented such that one main (or
master) APLAC executes parallel APLACs on each machine in the computer
network. One computer is used for both a subcircuit and the main circuit,
because the computational cost of solving the main circuit is so small that it is
not worthwhile to devote one machine to be the main-circuit solver. The other
computers are used for subcircuits only.
Because APLAC is programmed in an object-oriented way [38, 107], the
hierarchical analysis is also implemented using the same object-oriented
approach. The analysis in each parallel APLAC is performed by the hierarchical
analyzer object, which uses the hierarchical sparse matrix object, the parallel
message passing object, and the subcircuit interface object. The main analyzer
controls the DC analysis and calls the slave analyzers (in the parallel APLACs)
when necessary.
The decomposition is done such that the user specifies the subcircuits with
DefModel, APLAC’s subcircuit modeling component, and then each parallel
APLAC interprets the input file and creates its own subcircuit and analyzer.
Thus, there is no need to prepartition the input file into separate input
files.
Section 6.1 explains the implementation of parallel processing in APLAC,
and Section 6.2 the structure of the hierarchical analyzer in more detail.
Because APLAC creates the equations using the iteration-model approach,
some aspects of it are discussed in Section 6.3. Section 6.4 presents the
convergence-aiding methods in APLAC that have been applied in the parallel
methods.
PVM creates a virtual machine, a multicomputer, which has multiple
machines and a software backplane, Pvmd, to coordinate the operation. Pvmd
is the PVM daemon, a process that serves as a message router and virtual
machine coordinator.
\( i(u^{k+1}) \), or
\[
j(u) = i(u^k) - \sum_{i=1}^{n} g_i u_i^k. \tag{78}
\]
When all nonlinear VCCSs have been linearized, the nodal matrix equation
can be constructed:
\[
G v = j, \tag{79}
\]
where G, v, and j are the nodal conductance matrix, the nodal voltage vector,
and the current source vector, respectively. It can easily be shown that if
Eq. (77) is used, the linear matrix equation is equal to the NR matrix equation.
The first model is called the incremental model, because the solution of the
linear equation is the increment of the iterated variables, \( \Delta x^k \). The second
model is called the iterative model, because the solution consists of the new
iterates of the variables, \( x^{k+1} \).
Hierarchical LU factorization produces the iteration model of the subcircuits
(\( D_{\text{Sub}} \) and \( b_{\text{Sub}} \)) during the solving process.
The inner variables can be solved from the upper part of Eq. (82) by using
forward and backward substitutions (31) and (34):
\[
\begin{aligned}
x_i^{k,j+1} &= U_i^{-1}\left( z_i^{k,j} - \hat{B}_i x_E^{k,j+1} \right) \\
&= U_i^{-1} L_i^{-1}\left( -f_i^{k,j} + A_i x_i^{k,j} + B_i x_E^{k,j} \right)
   - U_i^{-1} L_i^{-1} B_i x_E^{k,j+1} \\
&= -A_i^{-1} f_i^{k,j} + x_i^{k,j} + A_i^{-1} B_i x_E^{k,j}
   - A_i^{-1} B_i x_E^{k,j+1}.
\end{aligned}
\tag{83}
\]
The substitutions reduce to inner NR iteration, because \( x_E^{k,j+1} = x_E^{k,j} \) in this
case. The drawback of the model is that, instead of simply solving the equation
\( A_i \Delta x_i = -f_i(x_i) \), there is a need to perform the unnecessary forward-backward
computation. Thus, the iterative-model approach is not the most effective in
terms of computational cost. Moreover, in order to perform the GN iteration,
\( B_i \) and \( D_i \) have to be removed from the conductance matrix. The Jacobian
matrix is easy to construct due to the block structure, but the extra manipulation
of the source vector is unavoidable. The extra terms have to be removed
from the source vector as follows:
\[
\begin{bmatrix} A \\ C \end{bmatrix} x_i^{k+1}
= - \begin{bmatrix} f_i^k \\ f_E^k \end{bmatrix}
+ \begin{bmatrix} A & B \\ C & D \end{bmatrix}
  \begin{bmatrix} x_i^k \\ x_E^k \end{bmatrix}
- \begin{bmatrix} B \\ D \end{bmatrix} x_E^k. \tag{84}
\]
Due to these problems, the iterative model was chosen to be implemented.
• A maximum step-size limiting method ensures that the NR step size
does not exceed a specified level. If the step size is larger than the maximum
allowed, e.g., 1 V for voltages, it is limited to that level. This
improves convergence, because the linearization is a good approximation
only close to the linearization point, and the method prevents too
large steps.
7 Simulation examples
Three different circuits were simulated using the hierarchical DC analysis
methods: two different-sized transistor circuits and one operational-amplifier
circuit.
The simulations were performed in a local area network containing three
PCs (one Pentium III 650 MHz and two AMD 800 MHz). Ethernet was used as
the connection network. The network is not dedicated to these experiments only
and, therefore, the results may be slightly disturbed.
There are many different formulas [24] for computing the efficiency of the
parallelization, but what really counts for the user is the real time (wall-clock
time) used for the simulations after he/she has pushed the button. Therefore,
only the resulting wall-clock times of the parallel simulations are presented.
For the one-processor hierarchical NR method, CPU times are presented, too.
[Figure: basic stage of the transistor circuits, with supply voltage Vcc, resistors R and R2, and a BJT Q.]
1080. They are then divided into two or three subcircuits according to the
number of parallel computers or hierarchical analyzers.
Table 1: CPU times (s) of the hierarchical analysis of the moderate transistor
circuit. The total wall-clock times are given in parentheses.
Table 2: CPU times (s) of the hierarchical analysis of the large transistor
circuit. The total wall-clock times are given in parentheses.
The parallel simulation runs were performed with the parallel NR (NRpar),
NRNR, and NRGN with LSLU (NRGN LU) and CGNE (NRGN CGNE) methods.
In the multilevel methods, J was varied between 1 and 4.
Tables 3–6 present the numbers of outer iterations in the sweep points.
Figs. 10 and 11 present the simulation times of the moderate circuit with two
and three processors, and Figs. 12 and 13 those of the large circuit.
The calculated speedups
\[
S_1 = \frac{t_{\text{serial}}}{t_{\text{parallel}}} \tag{85}
\]
versus serial (non-hierarchical) APLAC are presented in Figs. 14 and 15. The
speedups of the hierarchical analysis versus the number of subcircuits are
shown in the same figures.
The improvements obtained from parallelization with respect to the hierarchical
analysis,
\[
S_2 = \frac{t_{\text{hierarchical}}}{t_{\text{parallel}}}, \tag{86}
\]
are presented in Figs. 16 and 17. The times of the parallel analysis are
compared to those of hierarchical NR with the same number of subcircuits
as parallel processors.
The behaviour of the error norm \( \|f\| \) of the outer iteration in the first sweep
points is presented in Figs. 18 and 19. The maximum number of inner
iterations, J, is 4.
Table 3: Number of outer iterations of the moderate transistor circuit with
two processors
J NRNR NRGN LU
0 17+8+8+8+8+8
1 11+5+5+5+5+5 11+3+3+3+3+3
2 5+3+3+3+3+3 6+2+2+2+2+2
3 6+3+3+3+3+3 5+2+2+2+2+2
4 5+3+3+3+3+3 4+2+2+2+2+2
Table 4: Number of outer iterations of the moderate transistor circuit with
three processors
J NRNR NRGN LU
0 16+8+8+8+8+8
1 11+5+5+5+5+5 11+3+3+3+3+3
2 5+3+3+3+3+3 6+2+2+2+2+2
3 6+3+3+3+3+3 5+2+2+2+2+2
4 5+3+3+3+3+3 4+2+2+2+2+2
Table 5: Number of outer iterations of the large transistor circuit with two
processors
J NRNR NRGN LU
0 17+8+8+8+8+8
1 12+5+5+5+5+5 12+3+3+3+3+3
2 6+3+3+3+3+3 6+2+2+2+2+2
3 6+3+3+3+3+3 5+2+2+2+2+2
4 5+3+3+3+3+3 4+2+2+2+2+2
Table 6: Number of outer iterations of the large transistor circuit with three
processors
J NRNR NRGN LU
0 16+8+8+8+8+8
1 11+5+5+5+5+5 11+3+3+3+3+3
2 5+3+3+3+3+3 6+2+2+2+2+2
3 6+3+3+3+3+3 5+2+2+2+2+2
4 5+3+3+3+3+3 4+2+2+2+2+2
Figure 10: Simulation times (s) of the moderate transistor circuit vs. the maximum
number of inner iterations J. Two processors were used. NRNR (—o),
NRGN LU (—∗), and NRGN CGNE (—□); the case of 0 inner iterations is the
NR method.
Figure 11: Simulation times (s) of the moderate transistor circuit vs. the maximum
number of inner iterations J. Three processors were used. NRNR (—o),
NRGN LU (—∗), and NRGN CGNE (—□); the case of 0 inner iterations is the
NR method.
Figure 12: Simulation times (s) of the large transistor circuit vs. the maximum
number of inner iterations J. Two processors were used. NRNR (—o),
NRGN LU (—∗), and NRGN CGNE (—□); the case of 0 inner iterations is the
NR method.
Figure 13: Simulation times (s) of the large transistor circuit vs. the maximum
number of inner iterations J. Three processors were used. NRNR (—o),
NRGN LU (—∗), and NRGN CGNE (—□); the case of 0 inner iterations is the
NR method.
Figure 14: Speedup of the simulation of the moderate transistor circuit using
NRpar (—), NRNR (—o), NRGN LU (—∗), and NRGN CGNE (—□) with 2 inner
iterations vs. the number of processors. Speedup of the hierarchical analysis (—×)
vs. the number of subcircuits.
Figure 15: Speedup of the simulation of the large transistor circuit using
NRpar (—), NRNR (—o), NRGN LU (—∗), and NRGN CGNE (—□) with 2 inner
iterations vs. the number of processors. Speedup of the hierarchical analysis (—×)
vs. the number of subcircuits.
Figure 16: Speedups of the simulation of the moderate transistor circuit obtained
from parallelization: NRpar (—), NRNR (—o), NRGN LU (—∗), and
NRGN CGNE (—□) with 2 inner iterations vs. the number of processors, with
respect to the hierarchical analysis.
Figure 17: Speedups of the simulation of the large transistor circuit obtained
from parallelization: NRpar (—), NRNR (—o), NRGN LU (—∗), and
NRGN CGNE (—□) with 2 inner iterations vs. the number of processors, with
respect to the hierarchical analysis.
Figure 18: Error norm of the outer iteration of the NRpar (—), NRNR (—o), and
NRGN (—∗) methods.
Figure 19: Error norm of the outer iteration of the NRpar (—), NRNR (—o), and
NRGN (—∗) methods.
7.2 Operational-amplifier circuit
The operational-amplifier circuit is a series connection of 9 inverting operational-amplifier
configurations (Fig. 20). The operational amplifiers consist of 12
BJTs and 7 resistors, as presented in Ref. [95, p. 438]. In the inverting configuration,
R = 1 kΩ. The input node is connected to ground and the output node
is floating. The supply voltages are ±2.5 V. The simulation goal was to find
the DC operating point of the circuit.
The CPU times of the hierarchical analysis are presented in Table 7. They show
how the time of the iteration part dominates the analysis and slows down the
DC analysis.
Fig. 21 presents the behaviour of the error norm (J = 3). As can be seen,
the NRGN iteration diverges after 4 outer iterations. This is the point where
the step-size limiting methods fail, and the source-stepping method had to be
used.
7.3 Discussion
The results show that the non-parallel hierarchical analysis was able to reduce
the time used for the symbolic analysis, but only in one example was it able
to improve the iteration part of the analysis. In the operational-amplifier
example, the speedup obtained from the symbolic analysis is totally buried
Figure 21: Error norm of the outer iteration of the NRpar (—), NRNR (—o), and
NRGN (—∗) methods.
by the iteration part. These results are in agreement with the discussion in
Ref. [69].
The speedup of the parallel hierarchical NR method was not magnificent.
The large transistor circuit example with two processors was even slower than
the non-parallel simulation. The MLNR methods slightly improve the situation.
However, the speedups of the MLNR methods are not as good as Refs. [28, 113]
lead us to expect; it might be that the number of computers used was so
small that reducing the network communication does not matter much.
The non-ideal implementation of the inner iterations also affects the results.
The results also show that a suitable default value for the maximum number
of inner iterations, J, is two. This is in agreement with the experimental results
of Refs. [125, 126].
In the transistor examples, the GN iteration reduces the number of outer
iterations, but the operational-amplifier example shows that the NRGN method
is not always superior. It seems that solving the inner iteration in the LS
sense can bring the solution to a state from which the outer iteration is no
longer able to recover.
As a final remark, we may say that the conventional NR method is quite
fast and robust, but using extra inner iterations we may sometimes improve
the nonlinear iteration.
8 Conclusion
The parallel hierarchical approach to DC analysis was presented. In hierarchical
analysis, the circuit is decomposed into subcircuits. This allows the
utilization of hierarchical analysis methods, like NR iteration with hierarchical
LU factorization.
It was studied how to implement DC analysis in a NoW such that only little
communication is needed. Therefore, multilevel methods were considered
as well and implemented in APLAC.
The main emphasis was on aiding convergence, which is the critical part
of DC analysis. If a method does not converge, it cannot be fast, either.
Two new MLNR methods were presented. With the new NRNR and NRGN
methods, convergence-aiding methods can be applied. In addition, quadratic
local convergence was proven for both methods.
The simulation results, however, showed that with a small number of processors
the results were not spectacular, and the multilevel methods bring
about only small improvements. Network communication, which the MLNR
methods reduce, was not such a bottleneck as presented in the literature.
The study of the parallel methods can be applied to other analyses, too,
and the work done serves as an introduction to parallel circuit simulation in
APLAC.
References
[1] Message Passing Interface Forum, “MPI: A Message-Passing Interface
Standard,” Int. J. Supercomputer Applications, vol. 8, no. 3/4.
[6] W. Bomhof, Iterative and Parallel Methods for Linear Systems with Applications
in Circuit Simulation. PhD thesis, Utrecht University, 2001.
[7] W. Bomhof and H. A. van der Vorst, “A Parallel Linear System Solver
for Circuit Simulation Problems,” Numer. Linear Algebra Appl., vol. 7,
pp. 649–665, 2000.
[11] M.-C. Chang and I. Hajj, “iPRIDE: A Parallel Integrated Circuit Simulator
Using Direct Method,” Digest of Technical Papers of ICCAD’88,
pp. 304–307, 1988.
[12] C.-C. Chen and Y. H. Hu, “Parallel LU Factorization for Circuit Simulation
on a MIMD Computer,” Proceedings of the 1988 IEEE International
Conference on Computer Design ICCD’88, pp. 129–132, 1988.
[13] C.-C. Chen and Y. H. Hu, “A Practical Scheduling Algorithm for Parallel
LU Factorization in Circuit Simulation,” Proc. Int. Symp. Circuits and
Systems, vol. 3, pp. 1788–1791, 1989.
[15] L. O. Chua and L.-K. Chen, “Diakoptic and Generalized Hybrid Analysis,”
IEEE Trans. Circuits Syst., vol. CAS-23, pp. 694–705, December
1976.
[16] P.-Y. Chung and I. N. Hajj, “Parallel Solution of Sparse Linear Systems
on a Vector Multiprocessor Computer,” Proc. Int. Symp. Circuits and
Systems, vol. 2, pp. 1577–1580, 1990.
[23] V. B. Dmitriev-Zdorov, N. I. Merezin, V. P. Popov, and R. A. Dougal,
“Stability of Real-Time Modular Simulation of Analog System,” Proc.
of the 7th Workshop on Computers in Power Electronics (COMPEL),
pp. 263–267, 2000.
[24] J. J. Dongarra, I. S. Duff, D. C. Sorensen, and H. A. Van der Vorst, Numerical
Linear Algebra for High-Performance Computers. Philadelphia:
SIAM, 1998.
[25] K.-M. Eickhoff and W. L. Engl, “Levelized Incomplete LU Factorization
and Its Application to Large-Scale Circuit Simulation,” IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems,
vol. 14, pp. 720–727, June 1995.
[26] W. L. Engl, R. Laur, and H. K. Dirks, “MEDUSA — A Simulator for
Modular Circuits,” IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, vol. CAD-1, pp. 85–93, April 1982.
[27] N. Fröhlich, V. Glöckel, and J. Fleischmann, “A New Partitioning Method
for Parallel Simulation of VLSI Circuits on Transistor Level,” Proceedings
of the Design, Automation and Test in Europe Conference and Exhibition,
pp. 679–684, 2000.
[28] N. Fröhlich, B. M. Riess, U. A. Wever, and Q. Zheng, “A New Approach
for Parallel Simulation of VLSI Circuits on a Transistor Level,” IEEE
Trans. Circuits Syst. I, vol. 45, pp. 601–613, June 1998.
[29] N. Fröhlich, R. Schlagenhaft, and J. Fleischmann, “A New Approach
for Partitioning VLSI Circuits on Transistor Level,” Proceedings of 11th
Workshop on Parallel and Distributed Simulation, pp. 64–67, 1997.
[30] D. A. Gates, P. K., and D. O. Pederson, “Mixed-Level Circuit and Device
Simulation on a Distributed-Memory Multicomputer,” Proceedings
of the IEEE 1993 Custom Integrated Circuits Conference, pp. 851–854, 1993.
[31] H. Gaunholt, P. Heikkilä, K. Mannersalo, V. Porra, and M. Valtonen,
“Gyrator Transformation — A Better Way for Modified Nodal
Approach,” Proceedings of the European Conference on Circuit Theory and
Design, vol. 2, pp. 864–872, July 1991.
[32] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam,
PVM: Parallel Virtual Machine, A Users’ Guide and Tutorial
for Networked Parallel Computing. The MIT Press, 1994.
[33] G. Guardabassi and A. Sangiovanni-Vincentelli, “A Two Level Algorithm
for Tearing,” IEEE Trans. Circuits Syst., vol. CAS-23, pp. 783–791, December
1976.
[34] K. Hachiya, T. Saito, T. Nakata, and N. Tanabe, “Enhancement of Parallelism
for Tearing-based Circuit Simulation,” Proceedings of the Asia
and South Pacific Design Automation Conference, pp. 493–498, 1997.
[36] H. H. Happ, Diakoptics and Networks. New York: Academic Press, 1971.
Theme: Frontiers Computer Technology (TENCON’94), vol. 2, pp. 832–
836, 1994.
[57] Z. M. Kovàcs-V. and A. Benedetti, “MUSIC: A Novel MUltilevel Simulator
for MOS Integrated Circuits,” Proceedings of the European Conference
on Circuit Theory and Design, pp. 559–600, 1993.
[60] F. Li and P.-Y. Woo, “A New Concept, the ‘Virtual Circuit’, and Its Application
in Large-Scale Network Analysis with Tearing,” International
Journal of Circuit Theory and Applications, vol. 27, pp. 283–291, 1999.
[68] P. Odent, L. Claesen, and H. De Man, “Combined Waveform Relaxation
– Waveform Relaxation Newton Algorithm for Efficient Parallel Circuit
Simulation,” Proceedings of the European Design Automation Conference,
pp. 244–248, 1990.
[79] C. V. Ramamoorthy and V. Vij, “CMCIM: A Parallel Circuit Simulator
on a Distributed Memory Multiprocessor,” Proceedings of the 7th International
Conference on VLSI Design, pp. 39–44, January 1994.
[85] F. M. Rotela, Mixed Circuit and Device Simulation for Analysis, Design,
and Optimization of Optoelectronic, Radio Frequency, and High Speed
Semiconductor Devices. PhD thesis, Stanford University, April 2000.
[91] R. A. Saleh and J. K. White, “Accelerating Relaxation Algorithms for
Circuit Simulation Using Waveform-Newton and Step-Size Refinement,”
IEEE Transactions on Computer-Aided Design, vol. 9, pp. 951–958,
September 1990.
[101] R. Suda, New Iterative Linear Solvers for Parallel Circuit Simulation.
PhD thesis, Department of Information Sciences, University of Tokyo,
1996.
[102] R. Suda and Y. Oyanagi, “Implementation of sparta, a Highly Parallel
Circuit Simulator by the Preconditioned Jacobi Method, on a Distributed
Memory Machine,” Proc. International Conference on Supercomputing,
(Barcelona), pp. 209–217, 1995.
[103] N. Tanaka and H. Asai, “Large Scale Circuit Simulation System with
Dedicated Parallel Processor SMASH,” The Transactions of the IEICE,
vol. E73, pp. 1957–1963, December 1990.
[104] N. Tanaka and H. Asai, “Architecture for Simulation System with Consideration
of Circuit Partition,” Proceedings of the IEEE International Symposium
on Circuits and Systems, pp. 2689–2692, 1991.
[109] M. Vlach, “LU Decomposition Algorithms for Parallel and Vector Computation,”
Analog Methods for Computer-Aided Circuit Analysis and Design
(T. Ozawa, ed.), pp. 37–64, Marcel Dekker, Inc., 1988.
[112] Y.-C. Wen, K. Gallivan, and R. Saleh, “Improving Parallel Circuit Simulation
Using High-Level Waveforms,” Proc. Int. Symp. Circuits and Systems,
vol. 1, pp. 728–731, 1995.
[113] U. Wever and Q. Zheng, “Parallel Transient Analysis for Circuit Simulation,”
Proceedings of the 29th Annual Hawaii International Conference
on System Sciences, pp. 442–447, 1996.
[114] F. F. Wu, “Solution of Large-Scale Networks by Tearing,” IEEE Trans.
Circuits Syst., vol. CAS-23, pp. 706–713, December 1976.
[125] X. Zhang, “Dynamic and Static Load Balancing for Block Bordered
System Circuit Equations on Multiprocessors,” IEEE Transactions on
ComputerAided Design, vol. 11, pp. 1086–1094, September 1992.