
HELSINKI UNIVERSITY OF TECHNOLOGY

Department of Electrical and Communications Engineering

Mikko Honkala

PARALLEL HIERARCHICAL DC ANALYSIS

This licentiate thesis has been submitted for official examination for the
degree of Licentiate of Science in Technology in Espoo on March 27, 2002.

Supervisor of the thesis

Prof. Martti Valtonen

Instructor of the thesis

D.Sc. (Tech.) Janne Roos


HELSINKI UNIVERSITY OF TECHNOLOGY ABSTRACT OF
LICENTIATE THESIS

Author: Mikko Honkala

Name of the Thesis: Parallel Hierarchical DC Analysis

Date: March 27, 2002 Number of pages: 58


Department: Electrical and Communications Engineering

Professorship: Circuit Theory


Supervisor: Prof. Martti Valtonen
Instructor: D.Sc. (Tech.) Janne Roos

During the design process, there is a need to perform computationally
demanding numerical simulations to verify the functionality of the circuit
under design. One of the most effective ways to reduce the computing time
is to use parallel processing.

Parallel hierarchical DC analysis methods based on circuit decomposition
are presented. A parallel Newton–Raphson method and parallel multilevel
Newton–Raphson methods suitable for distributed computing in networks
of workstations have been implemented in APLAC.

The local and global convergence of the methods has been studied, and a
convergence-aiding strategy for the multilevel methods is proposed.

Simulation examples are presented and the results are discussed.

KEYWORDS: Parallel Processing, Circuit Simulation, Newton–Raphson,
Multilevel Newton–Raphson
Preface
This work was done at the Circuit Theory Laboratory of Helsinki University
of Technology. The work started in 1999 as the implementation of a hierarchical
analysis method in APLAC, but quickly changed to parallelization. It was then
followed by the development of parallel multilevel methods. I wish to
thank Prof. Martti Valtonen for allowing all this to happen.
The NRGN method would not have been created without Ville Karanko.
He suggested the use of Gauss–Newton iterations and showed how to perform
the linear operations of the method efficiently. While Ville Karanko studied
the linear part of the method, I studied the nonlinear part, convergence and
speed. I thank him for ”nonviolent communication” during the development
process.
In addition, I want to thank my instructor D.Sc. (Tech.) Janne Roos for
guidance and cooperation in writing the conference papers, Sakari Aaltonen for
correcting my English, Luis Costa for helping with LaTeX, and Jarmo Virtanen
for exposing APLAC’s internal life.
Especially, I wish to thank my wife Sanna for her support during these
years.

Espoo, March 27, 2002


Mikko Honkala

Contents
Abstract
Preface
Contents
List of Symbols and Abbreviations

1 Introduction

2 Background
2.1 Hierarchical analysis
2.2 Parallel processing in circuit simulation
2.3 APLAC circuit simulation and design tool

3 Parallel hierarchical Newton–Raphson method
3.1 Introduction
3.2 Circuit equation formulation
3.3 Hierarchical LU factorization and linear equation solving
3.4 Aiding the convergence
3.5 Summary of NR method

4 Multilevel Newton–Raphson methods
4.1 Conventional multilevel methods
4.2 New multilevel methods
4.2.1 NRNR method
4.2.2 Combined NRGN method
4.2.3 DC sweep

5 Convergence
5.1 Local Convergence
5.2 Global Convergence

6 Implementation in APLAC
6.1 Parallel processing
6.2 Hierarchical analyzer
6.3 Iteration models
6.3.1 Iterative model
6.3.2 Incremental model
6.4 Aiding the convergence

7 Simulation examples
7.1 Transistor circuits
7.2 Operational-amplifier circuit
7.3 Discussion

8 Conclusion

References

List of Symbols and Abbreviations
symbols
(·)i ith subcircuit variable/function
(·)E External variable/function
(·)Ei External variable/function of ith subcircuit
(·)k,j Variable at kth outer and jth inner iteration
A Matrix block of Jacobian matrix
b Right hand side vector of linear matrix equation
B Matrix block of Jacobian matrix
B̂ Matrix block of upper triangular matrix
B Ball
C Matrix block of Jacobian matrix
Ĉ Matrix block of lower triangular matrix
D Matrix block of Jacobian matrix
DSub Jacobian matrix of subcircuit
e Error of iteration
E Voltage source
f Function
g Nodal conductance
G Conductance
Gmin Minimum conductance
G Conductance matrix
i Subcircuit index
i Current
iD Diode current
I Set of all subcircuit indices i
j Inner iteration index
j Nodal current source vector
J Jacobian matrix
J Maximum number of inner iterations
k Outer iteration index
K, K̂, K̄ Constants
K Auxiliary matrix
L Lower triangular matrix
L Constant
m Number of subcircuits
M Constant
n Number of nodes
n Number of controlling voltages

N Constant
p Vector of variables
P Constant
Q Transistor
R Resistance
S1 , S2 Speed-ups
S Preconditioner
u Controlling voltage
u Vector of controlling voltages
U Upper triangular matrix
tserial Time of serial DC analysis
thierarchical Time of hierarchical DC analysis
tparallel Time of parallel DC analysis
v Nodal voltages
v Vector of node voltages
Vcc Supply voltage
w Vector of variables
x Variable
x Vector of variables
x∗ Solution vector
xaux Auxiliary vector of variables
α Variable
β Variable
δ Range
∆x Newton–Raphson update
ε Maximum error
η Constant
λ Damping factor
ρ Variable
Ω Open set
τ Maximum error of inner iteration

Abbreviations
AC Alternating Current
APLAC Formerly, Analysis Program for Linear Active Circuits
or Analysis and design Program for mixed
Lumped And distributed Circuits.
Nowadays, APLAC is not an acronym but name
of the circuit simulator and design tool.
BBD Bordered Block Diagonal
CGNE Conjugate Gradient for Normal Equations
DC Direct Current
GN Gauss–Newton
HB Harmonic Balance
ITA Iterated Timing Analysis
LS Least Squares
LU Lower-Upper
MEMS Microelectromechanical Systems
MIMD Multiple Instruction stream / Multiple Data stream
MLNA Multilevel Newton Analysis
MLNR Multilevel Newton–Raphson
MNA Modified Nodal Analysis
MPI Message Passing Interface
NR Newton–Raphson
NRGN Newton–Raphson and Gauss–Newton
NRNR Newton–Raphson and Newton–Raphson
PVM Parallel Virtual Machine
RF Radio Frequency
VCCS Voltage-Controlled Current Source
WR Waveform Relaxation

1 Introduction
The need for fast and accurate circuit-simulation tools is obvious. During
the design process, there is a need to perform computationally demanding
numerical simulations to verify the functionality of the circuit under design.
Therefore, faster and faster computers and simulation programs are neces-
sary. In order to fully utilize the possibilities of existing computer hardware,
sophisticated and fine-tuned programs are required.
One of the most effective ways to reduce the computing time is to use par-
allel processing. The necessary requirement for parallel processing is parallel
hardware. Traditionally, the parallel processing is performed in supercomput-
ers with multiple processors, but these computers are usually very expensive.
Thus, ”the poor man’s supercomputers”, namely, networks of workstations
(NoW) are utilized as parallel computers. In networked parallel processing,
each serial (or parallel) computer is used as a processing unit and data is
transferred via a local area network, like Ethernet.
This thesis is the first step toward a fully parallel APLAC circuit simulation
and design tool. The parallel processing of a NoW is applied in APLAC’s DC
analysis. The results obtained in this thesis can be utilized in the future for
other analyses, too.
The main contribution of this thesis is the implementation of parallel
hierarchical analysis methods based on circuit decomposition in APLAC, as well
as their fine-tuning such that they are suitable for networked processing and,
of course, for demanding DC analysis. The need for minimal communication
between computers in the network requires the use of multilevel iteration
methods, and because of problems encountered in the convergence of
DC iteration, the methods are further improved such that their convergence
properties are enhanced.
The methods have been implemented in APLAC and detailed experiments
have been performed for the evaluation of the proposed methods.

2 Background
2.1 Hierarchical analysis
Concepts such as diakoptics and tearing were introduced for hierarchical
analysis in the 1970s [15, 33, 35, 36, 114]. In the 1990s, the term domain
decomposition became associated with these methods [53]. In these methods, the
linear or linearized circuit equations are ordered into bordered block diagonal
(BBD) form, which can be decomposed into separately solved submatrices.
The equations are solved by using hierarchical LU factorization and forward-
backward substitution. The BBD ordering of the matrix can be done even
recursively on multiple levels of hierarchy. These methods have been efficiently
utilized for parallel computation in DC and transient analysis [11,12,28,34,47,
53,65,66,90,97,103,104,108,109,113]. Other, theoretical and practical studies
of these methods can be found in Refs. [10, 60, 67, 84, 93, 111]. Even utilization
of diakoptics in large change sensitivity analysis has been studied in Ref. [99].
The partitioning of circuits has been continuously under study and algo-
rithms can be found, e.g., in Refs. [21, 27–29, 33, 37, 46, 92, 121, 124]. The main
idea is to divide the circuit automatically such that it is optimal for hierarchical
analysis methods.
However, it has been pointed out [67, 69] that the decomposition methods
for solving the linearized circuit equations belong to the 1970’s and that they
cannot be compared to modern sparse matrix equation solvers.
In the methods above, the decomposition is performed on the linear equation
level, but if the circuit is partitioned before linearization, then, on the nonlinear
equation level, nonlinear analysis methods like Multilevel Newton–Raphson
(MLNR) methods (or Multilevel Newton Analysis, MLNA [77]) can be applied.
These methods have been applied to DC and transient analysis to solve the
system of (discretized) nonlinear equations [8, 9, 17, 28, 41, 56, 57, 113, 125, 126].
They can be used also in the Harmonic-Balance (HB) method [83] as well
as in the simulation of microelectromechanical systems (MEMS) [2, 3, 96] and
mixed circuit/device systems [64,85]. The multilevel methods can be effectively
parallelized [8, 9, 28, 41, 113, 123, 125, 126].
The third possibility of performing a transient analysis hierarchically is to
decompose the circuit on an algebraic differential equation level and analyze
the circuit using Waveform Relaxation (WR) [59], which has also been used in
parallel circuit simulation [44, 68, 71–74, 90, 91, 98, 112, 117, 122, 124].
A powerful property of the hierarchical methods is the possibility to utilize
time-domain latency [19, 61, 75–78, 86], which usually reduces computation in
the transient analysis of, at least, digital circuits because they usually are very
modular and latent. Analog circuits are mostly tightly coupled and the latency
property is not so significant.
An example of a modern approach to hierarchical analysis (different from
parallel processing) is the HSIM simulator [110]. It is designed to simulate
nanoscale circuits, which consist of millions of transistors and many repetitive
subcircuits. It uses hierarchical storage of circuits for similar circuits and, thus,
minimizes memory consumption. Model-order reduction methods are utilized
to reduce the size of linear subcircuits. HSIM uses hierarchical equation solving
methods that exploit the similarity of circuit conditions and waveforms.
One may say that, for mixed analog/digital circuit simulation [89], the end
of the evolution of hierarchical analysis is a method whereby the digital part
is separated from the analog part and analyzed using efficient port level or
behavioral level analysis methods, while the analog part is simulated using
standard transistor-level methods.

2.2 Parallel processing in circuit simulation


There are many ways of utilizing parallel processing in circuit simulation.
Mainly, the approach depends on the computer architecture used. There are
several types of architectures, but here we concentrate on MIMD (Multiple
Instruction stream / Multiple Data stream) type architectures, where multiple
instruction streams are executed in parallel [24]. Most multiprocessor com-
puters belong to this class. The programming models can be divided into two
classes: message passing and shared memory.
In the message-passing programming model, each processor has its own
local memory and message passing is used to deliver data between processors.
In the shared-memory model, the processors have shared data [24]. Programming
packages exist for both message passing (e.g., PVM [32], MPI [1]) and
shared data (e.g., OpenMP, www.OpenMP.org).
Parallel processing can be performed on hardware where single- or
multiprocessor computers are connected in a network, and a software backplane
is used to control the processing. This system is treated as a single
multiprocessor computer (Virtual Machine). This kind of computing is called
networked computing or distributed computing.
Parallel circuit simulation has been widely studied in the literature. The
research done in the field of parallel (analog) circuit simulation can be
divided roughly into the following categories:

• DC and transient analysis on the transistor level (or in some cases on
the port level), where parallel processing is applied

– in linear-equation solving with direct methods (like LU factorization)
[6, 7, 11–14, 16, 18, 25, 28, 30, 34, 45, 47, 50, 51, 53–55, 62, 65, 66, 88,
90, 97, 103, 104, 106, 108, 109, 113, 115, 116, 118, 119],
– in linear-equation solving with iterative methods (like linear relaxation
methods) [23, 42, 54, 63, 100–102, 127],
– in nonlinear-equation solving with MLNR methods [8, 9, 28, 41, 113,
123, 125, 126],
– in nonlinear-equation solving with Iterated Timing Analysis (ITA)
[43, 79, 90], or
– in algebraic differential-equation solving with WR [44, 68, 71–74, 90,
91, 98, 112, 117, 122, 124].

• Periodic steady-state analysis with the HB method [80–82] and with the
shooting method [49].

• Optimization and statistical design [70].

The above list does not include references to logic or high-level descrip-
tion language simulation algorithms and programs. The list is not complete,
and some references could be included in many categories but are, however,
included only in one category.
Notice that in the list above, there are only references where parallel pro-
cessing algorithms are used in the context of circuit simulation, and of course,
there are numerous books and articles about parallel algorithms themselves
that can be applied in circuit simulation, e.g., in Ref. [24].
Networked circuit simulation has been studied, e.g., in Refs. [4,28,53,70,74,
82,98,113]. MLNR methods especially have proven to be efficient for transient
analysis in networked computing [28, 53, 113].

2.3 APLAC circuit simulation and design tool


The APLAC circuit simulation and design tool has been continuously devel-
oped since the 70’s. Nowadays, it includes numerous, different linear and
nonlinear simulation methods, lumped and distributed circuit models, and
also different optimization methods. Besides analog and RF circuit simulation
methods, there are system-level and electromagnetic methods in APLAC. Even
parallelization of the optimization methods in APLAC has been studied [70].
Because the development still continues, there is little point in listing all
the capabilities of APLAC here. More recent information about APLAC can be
found at www.aplac.hut.fi/aplac.

3 Parallel hierarchical Newton–Raphson method
3.1 Introduction
DC analysis is the basis of all circuit simulation. Before AC analysis, the
operating point has to be found, and the DC solution is the initial condition
for the transient analysis. Moreover, the DC characteristics themselves are
sometimes of interest.
DC analysis solves the steady-state behavior of the circuit variables under
DC excitation. Usually, the nonlinear circuit equations are solved
iteratively using the Newton–Raphson (NR) method. Unfortunately, the iteration
often lacks a good initial guess and convergence is not guaranteed. Therefore,
convergence-aiding methods are needed.
The parallelization of DC analysis can be performed in many ways, e.g.,
existing parallel sparse-matrix packages can be utilized for the linearized equa-
tion solving.
In the following subsections, the theory of the parallel DC analysis based
on hierarchical circuit decomposition is described. The equation formulation is
presented in Section 3.2 and the parallel NR method with convergence aiding
is summarized in Section 3.5.

3.2 Circuit equation formulation


In a hierarchical analysis, the circuit is partitioned into subcircuits and the
main circuit consists of the connections between the subcircuits, which can be
arbitrary circuit elements or connection nodes only.
The decomposition can be performed using specific partitioning algorithms
or it can be chosen manually by the user. Here, we do not consider how
the decomposition is performed but, as a guideline, a good partition
has only a few connections compared with the size of the subcircuit. The
following explains how to formulate the NR or MLNR iteration equations for
the decomposed circuit. The practical implementation in APLAC is presented
in Section 6.
The circuit equations can be created, e.g., using a nodal formulation or
modified nodal analysis (MNA) [39]. In this thesis, MNA is used to formulate
the circuit equations, and the variables are nodal voltages or currents. In
addition, the nodal voltages can describe, e.g., temperatures or mechanical
quantities. However, for convenience, only nodal voltages are considered in
the following sections. (In APLAC, the nonlinear MNA equations are reduced
to the nonlinear nodal formulation using the gyrator transformation [31].)
The system of nonlinear (modified) nodal equations can be written as

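\[ \mathbf{f}(\mathbf{x}) = \mathbf{0}, \tag{1} \]

where $\mathbf{x} \in \mathbb{R}^{n}$ are nodal voltages (or currents) of the
circuit and $\mathbf{f} : \mathbb{R}^{n} \to \mathbb{R}^{n}$ has a Jacobian
matrix $\mathbf{J}$. The nodal equations are, typically, solved using the
NR method, which is a sequence of iterations

\[ \mathbf{x}^{k+1} = \mathbf{x}^{k} + \Delta\mathbf{x}^{k}, \tag{2} \]

where $k$ is the iteration index. The NR update $\Delta\mathbf{x}^{k}$ is
solved from the linear equation

\[ \mathbf{J}^{k} \Delta\mathbf{x}^{k} = -\mathbf{f}(\mathbf{x}^{k}). \tag{3} \]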
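
As a minimal illustration of the iteration (2)–(3), the following Python sketch applies NR to a single nodal equation (n = 1), so that the Jacobian is a scalar. The circuit (a voltage source behind a series resistor feeding a diode), the Shockley diode model, the names `E_SRC`, `R`, `I_S`, `V_T`, and all numerical values are assumptions chosen for illustration; they are not taken from the thesis or from APLAC.

```python
import math

# Assumed element values (illustrative only).
E_SRC, R = 5.0, 1.0e3          # source voltage [V], series resistance [ohm]
I_S, V_T = 1.0e-14, 0.02585    # diode saturation current [A], thermal voltage [V]

def f(v):
    """Nodal equation f(v) = 0: resistor current plus diode current at the node."""
    return (v - E_SRC) / R + I_S * (math.exp(v / V_T) - 1.0)

def jacobian(v):
    """Scalar Jacobian df/dv."""
    return 1.0 / R + I_S / V_T * math.exp(v / V_T)

def newton_raphson(x0, eps=1e-12, max_iter=50):
    """Iterate x^{k+1} = x^k + dx^k with J^k dx^k = -f(x^k), Eqs. (2)-(3)."""
    x = x0
    for k in range(max_iter):
        dx = -f(x) / jacobian(x)   # Eq. (3) for n = 1
        x = x + dx                 # Eq. (2)
        if abs(dx) < eps:
            return x, k + 1
    raise RuntimeError("NR did not converge")

v_dc, n_iter = newton_raphson(x0=0.7)
print(f"v = {v_dc:.6f} V after {n_iter} iterations")
```

The initial guess of 0.7 V is deliberately close to the solution; with a poor guess the same loop can overshoot badly, which is the motivation for the convergence aids of Section 3.4.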

Consider a circuit which has $n$ nodes and which can be decomposed into $m$
subcircuits consisting of $n_i$ internal nodes and $n_{Ei}$ external connection
nodes. Fig. 1 presents a circuit having two subcircuits.

Figure 1: Main circuit and two subcircuits. (Schematic not reproduced here;
its labels are Subcircuit 1, Subcircuit 2, the nodes 1,1, 1,2, 2,1, E1, and E2,
and the elements G and J.)

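The nonlinear system of nodal equations for internal and external nodes
can be written as

\[
\begin{aligned}
\mathbf{f}_i(\mathbf{x}_i, \mathbf{x}_E) &= \mathbf{0}, \\
\mathbf{f}_E(\mathbf{x}_1, \dots, \mathbf{x}_m, \mathbf{x}_E) &= \mathbf{0},
\end{aligned}
\tag{4}
\]

respectively, where $i = 1, \dots, m$, $\mathbf{x}_i \in \mathbb{R}^{n_i}$ are
the internal nodal voltages of the subcircuits,
$\mathbf{x}_E \in \mathbb{R}^{n_E}$ are the voltages of the external connection
nodes of the subcircuits,
$\mathbf{f}_i : \mathbb{R}^{n_i} \times \mathbb{R}^{n_E} \to \mathbb{R}^{n_i}$, and
$\mathbf{f}_E : \mathbb{R}^{n_1} \times \dots \times \mathbb{R}^{n_m} \times \mathbb{R}^{n_E} \to \mathbb{R}^{n_E}$.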
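The Jacobian matrix $\mathbf{J}$ has a bordered block diagonal (BBD) form [109]:

\[
\mathbf{J} =
\begin{bmatrix}
\mathbf{A}_1 & & & & \mathbf{B}_1 \\
 & \mathbf{A}_2 & & & \mathbf{B}_2 \\
 & & \ddots & & \vdots \\
 & & & \mathbf{A}_m & \mathbf{B}_m \\
\mathbf{C}_1 & \mathbf{C}_2 & \cdots & \mathbf{C}_m & \mathbf{D}
\end{bmatrix},
\tag{5}
\]

where

\[ \mathbf{A}_i := \frac{\partial \mathbf{f}_i}{\partial \mathbf{x}_i} \in \mathbb{R}^{n_i \times n_i}, \tag{6} \]
\[ \mathbf{B}_i := \frac{\partial \mathbf{f}_i}{\partial \mathbf{x}_E} \in \mathbb{R}^{n_i \times n_E}, \tag{7} \]
\[ \mathbf{C}_i := \frac{\partial \mathbf{f}_E}{\partial \mathbf{x}_i} \in \mathbb{R}^{n_E \times n_i}, \tag{8} \]
\[ \mathbf{D} := \frac{\partial \mathbf{f}_E}{\partial \mathbf{x}_E} \in \mathbb{R}^{n_E \times n_E}. \tag{9} \]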
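$\mathbf{f}_E$, as well as $\mathbf{D}$, can be further decomposed into parts
that contain the contributions of the circuit elements of the main circuit and
each subcircuit:

\[ \mathbf{f}_E = \mathbf{f}_{E0} + \sum_{i=1}^{m} \mathbf{f}_{Ei}, \tag{10} \]
\[ \mathbf{D} = \mathbf{D}_{E0} + \sum_{i=1}^{m} \mathbf{D}_i. \tag{11} \]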

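Thus, the linearized BBD equations are

\[
\begin{bmatrix}
\mathbf{A}_1^{k} & & & & \mathbf{B}_1^{k} \\
 & \mathbf{A}_2^{k} & & & \mathbf{B}_2^{k} \\
 & & \ddots & & \vdots \\
 & & & \mathbf{A}_m^{k} & \mathbf{B}_m^{k} \\
\mathbf{C}_1^{k} & \mathbf{C}_2^{k} & \cdots & \mathbf{C}_m^{k} & \mathbf{D}^{k}
\end{bmatrix}
\begin{bmatrix}
\Delta\mathbf{x}_1^{k} \\ \Delta\mathbf{x}_2^{k} \\ \vdots \\ \Delta\mathbf{x}_m^{k} \\ \Delta\mathbf{x}_E^{k}
\end{bmatrix}
= -
\begin{bmatrix}
\mathbf{f}_1(\mathbf{x}_1^{k}, \mathbf{x}_E^{k}) \\
\mathbf{f}_2(\mathbf{x}_2^{k}, \mathbf{x}_E^{k}) \\
\vdots \\
\mathbf{f}_m(\mathbf{x}_m^{k}, \mathbf{x}_E^{k}) \\
\mathbf{f}_E(\mathbf{x}_1^{k}, \mathbf{x}_2^{k}, \dots, \mathbf{x}_E^{k})
\end{bmatrix}.
\tag{12}
\]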

Example 1

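Figure 2: Main circuit and two subcircuits. (Schematic not reproduced here;
its labels are Subcircuit 1 with diode D1, node 1,1, and conductance G1,
Subcircuit 2 with diode D2, node 2,1, and conductance G2, and the main-circuit
node E with conductance G3 and current source J.)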

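As an example, consider the circuit in Fig. 2. The nonlinear nodal equations
are as follows:

\[
\begin{aligned}
f_1(v_{1,1}, v_E) &= G_1 v_{1,1} - i_{D1}(v_E, v_{1,1}) = 0, \\
f_2(v_{2,1}, v_E) &= G_2 v_{2,1} - i_{D2}(v_E, v_{2,1}) = 0, \\
f_E(v_{1,1}, v_{2,1}, v_E) &= -J + G_3 v_E + i_{D1}(v_E, v_{1,1}) + i_{D2}(v_E, v_{2,1}) = 0,
\end{aligned}
\]

where $i_{D1}$ and $i_{D2}$ are the diode currents and $v_{1,1}$, $v_{2,1}$, and
$v_E$ are the node voltages of the nodes 1,1, 2,1, and E, respectively. The
Jacobian takes the form

\[
\mathbf{J} =
\begin{bmatrix}
G_1 - \dfrac{\partial i_{D1}}{\partial v_{1,1}} & 0 & -\dfrac{\partial i_{D1}}{\partial v_E} \\
0 & G_2 - \dfrac{\partial i_{D2}}{\partial v_{2,1}} & -\dfrac{\partial i_{D2}}{\partial v_E} \\
\dfrac{\partial i_{D1}}{\partial v_{1,1}} & \dfrac{\partial i_{D2}}{\partial v_{2,1}} & G_3 + \dfrac{\partial i_{D1}}{\partial v_E} + \dfrac{\partial i_{D2}}{\partial v_E}
\end{bmatrix}.
\]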

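The submatrix $\mathbf{D}$ can be decomposed as follows:

\[
D_{E0} = G_3, \qquad
D_1 = \frac{\partial i_{D1}}{\partial v_E}, \qquad
D_2 = \frac{\partial i_{D2}}{\partial v_E}.
\]

$\mathbf{f}_E$ can be decomposed similarly. □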
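
The structure of the Jacobian in Example 1 can be cross-checked numerically. The sketch below evaluates the residuals and the analytic Jacobian and compares the latter with a central finite-difference approximation. The thesis does not specify the diode model or the element values at this point, so the Shockley model and all numbers (`G1`, `G2`, `G3`, `J_SRC`, `I_S`, `V_T`, the test point) are assumptions made only for this illustration.

```python
import math

# Assumed element values and diode parameters (illustrative only).
G1, G2, G3, J_SRC = 1.0e-3, 1.0e-3, 1.0e-3, 1.0e-3
I_S, V_T = 1.0e-14, 0.02585

def diode(v_anode, v_cathode):
    """Assumed Shockley diode: returns current and conductance d i_D / d v_anode."""
    e = math.exp((v_anode - v_cathode) / V_T)
    return I_S * (e - 1.0), I_S / V_T * e

def residual(x):
    """[f_1, f_2, f_E] of Example 1 for x = [v_11, v_21, v_E]."""
    v11, v21, vE = x
    i1, _ = diode(vE, v11)
    i2, _ = diode(vE, v21)
    return [G1 * v11 - i1, G2 * v21 - i2, -J_SRC + G3 * vE + i1 + i2]

def jacobian(x):
    """Analytic BBD Jacobian; d i_D/d v_E = g and d i_D/d v_{i,1} = -g."""
    v11, v21, vE = x
    _, g1 = diode(vE, v11)
    _, g2 = diode(vE, v21)
    return [[G1 + g1, 0.0,     -g1],
            [0.0,     G2 + g2, -g2],
            [-g1,     -g2,     G3 + g1 + g2]]

def fd_jacobian(x, h=1e-6):
    """Central finite-difference Jacobian, used only as a cross-check."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = residual(xp), residual(xm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2.0 * h)
    return J

x_test = [0.10, 0.05, 0.60]   # arbitrary test point [v_11, v_21, v_E]
J_an, J_fd = jacobian(x_test), fd_jacobian(x_test)
max_err = max(abs(J_an[i][j] - J_fd[i][j]) for i in range(3) for j in range(3))
print(f"largest analytic/finite-difference mismatch: {max_err:.2e}")
```

The zero off-diagonal blocks (`J_an[0][1]` and `J_an[1][0]`) are what make the matrix bordered block diagonal: the two subcircuits interact only through the border row and column belonging to node E.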

The parallel solution of the linear equations (12) can be performed using
hierarchical LU factorization (see Section 3.3). During the solution process,
contributions of the subcircuits (which can be interpreted as Norton’s
equivalent-circuit representations of the subcircuits) have to be computed in
order to solve the variables of the main circuit. Solving the subcircuit part
of Eq. (12), we obtain the update equations of the subcircuit variables

\[ \Delta\mathbf{x}_i = -\mathbf{A}_i^{-1}\mathbf{f}_i - \mathbf{A}_i^{-1}\mathbf{B}_i \Delta\mathbf{x}_E, \tag{13} \]

and, substituting this into the main circuit part of Eq. (12), the main circuit
equation is then

\[
\left( \mathbf{D}_{E0} + \sum_{i=1}^{m} \mathbf{D}_{\mathrm{Sub},i} \right) \Delta\mathbf{x}_E
= -\mathbf{f}_{E0} - \sum_{i=1}^{m} \mathbf{f}_{\mathrm{Sub},i},
\tag{14}
\]

where

\[ \mathbf{D}_{\mathrm{Sub},i} := \mathbf{D}_i - \mathbf{C}_i \mathbf{A}_i^{-1} \mathbf{B}_i, \tag{15} \]
\[ \mathbf{f}_{\mathrm{Sub},i} := \mathbf{f}_{Ei} - \mathbf{C}_i \mathbf{A}_i^{-1} \mathbf{f}_i, \tag{16} \]

are the contributions of Norton’s equivalent circuits.
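
The reduction (13)–(16) can be checked on a small linearized system. The sketch below uses scalar blocks (one internal node per subcircuit and one external node, as in Example 1), forms the Norton contributions, solves the main-circuit equation, and then verifies that the result satisfies every row of Eq. (12). The main-circuit right-hand side is assembled as $-(f_{E0} + \sum_i f_{\mathrm{Sub},i})$, which is what substituting (13) into the last block row of (12) yields. All numerical values are made up for illustration.

```python
# Linearized system with scalar blocks; all numbers are made up.
subs = [
    {"A": 1.1e-3, "B": -1.0e-4, "C": -1.0e-4, "D": 1.0e-4, "f": -2.5e-6, "fE": 2.5e-6},
    {"A": 1.7e-3, "B": -7.0e-4, "C": -7.0e-4, "D": 7.0e-4, "f": -1.0e-5, "fE": 1.8e-5},
]
D_E0, f_E0 = 1.0e-3, -4.0e-4   # main-circuit contributions, Eqs. (10)-(11)

# Norton contributions of each subcircuit, Eqs. (15)-(16).
for s in subs:
    s["D_sub"] = s["D"] - s["C"] * s["B"] / s["A"]
    s["f_sub"] = s["fE"] - s["C"] * s["f"] / s["A"]

# Main-circuit equation, Eq. (14).
dxE = -(f_E0 + sum(s["f_sub"] for s in subs)) / (D_E0 + sum(s["D_sub"] for s in subs))

# Subcircuit updates, Eq. (13).
for s in subs:
    s["dx"] = -(s["f"] + s["B"] * dxE) / s["A"]

# Residuals of the full BBD system, Eq. (12): J * dx + f should vanish.
sub_res = [s["A"] * s["dx"] + s["B"] * dxE + s["f"] for s in subs]
main_res = (sum(s["C"] * s["dx"] for s in subs)
            + (D_E0 + sum(s["D"] for s in subs)) * dxE
            + f_E0 + sum(s["fE"] for s in subs))
print(max(abs(r) for r in sub_res + [main_res]))
```

Only the scalars `D_sub` and `f_sub` travel from a subcircuit to the main level, and only `dxE` travels back, which is the property that keeps the communication volume small.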


The above formulation is enough for the parallel NR method, but these
equations can be modified such that we take the short-circuit currents flowing
from the main circuit to the subcircuits as additional variables (see Fig. 3).

Figure 3: Main circuit and two subcircuits with short-circuit currents.
(Schematic not reproduced here.)

The formulation is essential for the new MLNR methods presented in Section
4.
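The idea of this formulation is that, this way, each subcircuit has independent
short-circuit currents $\mathbf{i}_i \in \mathbb{R}^{n_{Ei}}$ and external
variables $\mathbf{x}_{Ei} \in \mathbb{R}^{n_{Ei}}$, and the only connection is
at the main circuit level. Thus, the subcircuits do not have common variables.
The nodal equations for the internal nodes, the subcircuit connection nodes,
and the main circuit nodes take the form

\[
\begin{aligned}
\mathbf{f}_i(\mathbf{x}_i, \mathbf{x}_{Ei}) &= \mathbf{0}, \\
\mathbf{f}_{Ei}(\mathbf{x}_i, \mathbf{x}_{Ei}) + \mathbf{i}_i &= \mathbf{0}, \\
\mathbf{f}_{E0}(\mathbf{x}_{E0}, \mathbf{x}_{EI}, \mathbf{i}_I) &= \mathbf{0},
\end{aligned}
\tag{17}
\]

respectively, where $i = 1, \dots, m$, subscript $I$ denotes all $i$,
$\mathbf{f}_{Ei} : \mathbb{R}^{n_i} \times \mathbb{R}^{n_{Ei}} \to \mathbb{R}^{n_{Ei}}$, and
$\mathbf{f}_{E0} : \mathbb{R}^{n_{E0}} \times \mathbb{R}^{n_{EI}} \times \mathbb{R}^{n_{EI}} \to \mathbb{R}^{n_{E0}}$.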
Eqs. (13) and (17) show that only ∆xE and ii have to be transferred
into subcircuits during the communication between the main circuit and the
subcircuits.
The idea of separation of the subcircuits is not completely new [126], but
a new way of taking advantage of this formulation is presented in Section 5.2.

Example 2
Now, we may continue the previous example. By adding the short-circuit
currents (see Fig. 4) and applying the MNA formulation, the nonlinear equations
become

\begin{align}
f_1(v_{1,1}, v_{E1}) &= G_1 v_{1,1} - i_{D1}(v_{E1}, v_{1,1}) = 0, \tag{18} \\
f_2(v_{2,1}, v_{E2}) &= G_2 v_{2,1} - i_{D2}(v_{E2}, v_{2,1}) = 0, \tag{19} \\
f_{E1}(v_{1,1}, v_{E1}) - i_1 &= i_{D1}(v_{E1}, v_{1,1}) - i_1 = 0, \tag{20} \\
f_{E2}(v_{2,1}, v_{E2}) - i_2 &= i_{D2}(v_{E2}, v_{2,1}) - i_2 = 0, \tag{21} \\
f_{E0,1}(v_{E1}, v_E) &= v_E - v_{E1} = 0, \tag{22} \\
f_{E0,2}(v_{E2}, v_E) &= v_E - v_{E2} = 0, \tag{23} \\
f_{E0,3}(v_E, i_1, i_2) &= -J + G_3 v_E + i_1 + i_2 = 0. \tag{24}
\end{align}

Figure 4: Main circuit and two subcircuits with short-circuit currents.
(Schematic not reproduced here.)


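As we can see, the subcircuit equations $f_1$ and $f_{E1}$ are independent of
$f_2$ and $f_{E2}$. Now the Jacobian matrix is

\[
\mathbf{J} =
\begin{bmatrix}
G_1 - \dfrac{\partial i_{D1}}{\partial v_{1,1}} & 0 & -\dfrac{\partial i_{D1}}{\partial v_{E1}} & 0 & 0 & 0 & 0 \\
0 & G_2 - \dfrac{\partial i_{D2}}{\partial v_{2,1}} & 0 & -\dfrac{\partial i_{D2}}{\partial v_{E2}} & 0 & 0 & 0 \\
\dfrac{\partial i_{D1}}{\partial v_{1,1}} & 0 & \dfrac{\partial i_{D1}}{\partial v_{E1}} & 0 & -1 & 0 & 0 \\
0 & \dfrac{\partial i_{D2}}{\partial v_{2,1}} & 0 & \dfrac{\partial i_{D2}}{\partial v_{E2}} & 0 & -1 & 0 \\
0 & 0 & -1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & -1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 & G_3
\end{bmatrix}.
\tag{25}
\]

In realistic simulations, $n_i \gg n_{Ei}$. □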
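
As a consistency check (a short derivation added for illustration, not part of the original example), eliminating the auxiliary variables from Eqs. (18)–(24) recovers the equations of Example 1:

\begin{align*}
(22),\ (23) &\;\Rightarrow\; v_{E1} = v_{E2} = v_E, \\
(20),\ (21) &\;\Rightarrow\; i_1 = i_{D1}(v_E, v_{1,1}), \quad i_2 = i_{D2}(v_E, v_{2,1}), \\
(24) &\;\Rightarrow\; -J + G_3 v_E + i_{D1}(v_E, v_{1,1}) + i_{D2}(v_E, v_{2,1}) = 0,
\end{align*}

which is $f_E$ of Example 1, while (18) and (19) reduce to $f_1$ and $f_2$. The extended system therefore has the same solution for $v_{1,1}$, $v_{2,1}$, and $v_E$. The price is four additional unknowns ($v_{E1}$, $v_{E2}$, $i_1$, $i_2$), all of which belong to the connection nodes, so the overhead stays small when the subcircuits have many more internal nodes than external ones.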

3.3 Hierarchical LU factorization and linear equation solving

The sparse linear equations arising from circuit equations can be solved in
very different ways. The most common ones are LU factorization with
forward–backward substitutions and iterative methods like GMRES [87]. Sparse
matrix techniques [58] are superior in speed to dense matrix techniques because
they exploit the sparsity of the matrix and perform only the necessary
computations.
Before LU factorization, symbolic sparse-matrix reordering is performed.
The purpose of the reordering is to reorder the matrix such that a minimum
number of fill-ins [58] is produced (in APLAC, the Minimum Local Fill-in
algorithm [58] is used).
Hierarchical LU factorization is the basis of all hierarchical analysis, not
only for the NR method where the linearized equations of the subcircuits are
solved independently, but also for MLNR methods.

Formerly, hierarchical LU factorization has been used (on serial machines)
as a sparse matrix equation solver. The efficiency of hierarchical LU
factorization is discussed in detail in Refs. [35, 69, 114]. But even if the
decomposition is efficient, the cost of hierarchical analysis is at least
$(m + \alpha)n + m^3/3 + m^2$ ($\alpha$ is the sparsity factor, $n$ is the
number of nodes, and $m$ is the number of subcircuits), while for good sparse
matrix code, it is only $\alpha n$ [69, pp. 383–384]. Typically, the complexity
of the sparse matrix computation is $n^{1.1}$ to $n^{1.5}$ [58]. In conclusion,
hierarchical decomposition does not necessarily improve the speed of linear
equation solving. The cost of the symbolic reordering of a sparse matrix
depends more on the size of the matrix (at least in APLAC, see Section 7) and
can be reduced with efficient decomposition.
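
To get a feeling for the two cost expressions quoted above, the following Python sketch simply evaluates them side by side. The values chosen for the number of nodes and for the sparsity factor are assumptions for illustration only; the thesis does not give typical values at this point.

```python
def hierarchical_cost(n, m, alpha):
    """Lower bound quoted in the text: (m + alpha) n + m^3/3 + m^2."""
    return (m + alpha) * n + m**3 / 3.0 + m**2

def sparse_cost(n, alpha):
    """Cost quoted for a good sparse-matrix code: alpha n."""
    return alpha * n

n_nodes, alpha = 10_000, 3.0    # assumed values
for m in (2, 4, 8, 16):
    ratio = hierarchical_cost(n_nodes, m, alpha) / sparse_cost(n_nodes, alpha)
    print(f"m = {m:2d}: hierarchical / sparse >= {ratio:.2f}")
```

Because of the $mn$ term, the bound exceeds the sparse-solver cost for every $m \geq 1$ and grows with the number of subcircuits, which is consistent with the conclusion that decomposition by itself does not speed up serial linear-equation solving.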
However, hierarchical LU factorization is a way to perform the circuit
analysis in parallel. If there is only a need to solve linear equations in
parallel, it might be useful to apply an existing parallel solver, e.g.,
SuperLU [20]. However, the shared-memory version of the SuperLU algorithm has
been found to be inefficient for circuit matrices [6]. Moreover, hierarchical
LU factorization is a way to produce Norton’s equivalent circuits of the
subcircuits.
Hierarchical linear-equation solving is considered in the following, and sum-
marized in Algorithms 1–4. Here, two-level hierarchy is treated because the
practical implementation in APLAC (see Section 6) uses only two levels of
hierarchy. Ref. [109] presents the multilevel algorithm for LU factorization.
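The linear BBD equations (12) can be decomposed into separate systems
of subcircuit equations

\[
\begin{bmatrix} \mathbf{A}_i & \mathbf{B}_i \\ \mathbf{C}_i & \mathbf{D}_i \end{bmatrix}
\begin{bmatrix} \mathbf{x}_i \\ \mathbf{x}_{Ei} \end{bmatrix}
=
\begin{bmatrix} \mathbf{b}_i \\ \mathbf{b}_{Ei} \end{bmatrix}.
\tag{26}
\]

For convenience, the unknowns are denoted by $\mathbf{x}$ and the RHS by
$\mathbf{b}$. In NR equations, $\mathbf{x}$ consists of the NR updates and
$\mathbf{b} = -\mathbf{f}(\mathbf{x})$.
Hierarchical LU factorization starts from the partial LU factorization. The
factorization is stopped before accessing $\mathbf{D}_i$:

\[
\begin{bmatrix} \mathbf{A}_i & \mathbf{B}_i \\ \mathbf{C}_i & \mathbf{D}_i \end{bmatrix}
\;\to\;
\begin{bmatrix} \mathbf{L}_i \backslash \mathbf{U}_i & \hat{\mathbf{B}}_i \\ \hat{\mathbf{C}}_i & \mathbf{D}_i \end{bmatrix},
\tag{27}
\]

where

\[ \hat{\mathbf{C}}_i = \mathbf{C}_i \mathbf{U}_i^{-1}, \tag{28} \]
\[ \hat{\mathbf{B}}_i = \mathbf{L}_i^{-1} \mathbf{B}_i. \tag{29} \]

Because $\mathbf{U}_i$ and $\mathbf{L}_i$ are triangular matrices, the
calculation of $\hat{\mathbf{C}}_i$ and $\hat{\mathbf{B}}_i$ is a
straightforward task.
The next step is to compute the contributions of the subcircuit Jacobian
matrices

\[ \mathbf{D}_{\mathrm{Sub},i} = \mathbf{D}_i - \hat{\mathbf{C}}_i \hat{\mathbf{B}}_i \tag{30} \]

for the main level equations. Notice that the Jacobian matrix is the same as
in Eq. (15). Algorithm 1 describes the LU factorization and the calculation of
$\mathbf{D}_{\mathrm{Sub},i}$.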

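Algorithm 1 LU factorize(circuit)
1. LU factorize $\mathbf{A}_i$: $\mathbf{A}_i = \mathbf{L}_i \mathbf{U}_i$.

2. $\hat{\mathbf{C}}_i = \mathbf{C}_i \mathbf{U}_i^{-1}$

3. $\hat{\mathbf{B}}_i = \mathbf{L}_i^{-1} \mathbf{B}_i$

4. $\mathbf{D}_{\mathrm{Sub},i} = \mathbf{D}_i - \hat{\mathbf{C}}_i \hat{\mathbf{B}}_i$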

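The solving continues on the subcircuit level with forward substitution, i.e.,
solving the auxiliary vector $\mathbf{z}_i$ from the equation

\[ \mathbf{L}_i \mathbf{z}_i = \mathbf{b}_i. \tag{31} \]

The vector $\mathbf{z}_i$ is needed for the calculation of
$\mathbf{b}_{\mathrm{Sub},i}$:

\[ \mathbf{b}_{\mathrm{Sub},i} = \mathbf{b}_{Ei} - \hat{\mathbf{C}}_i \mathbf{z}_i, \tag{32} \]

which corresponds to $\mathbf{f}_{\mathrm{Sub},i}$ in Eq. (16) under the sign
convention $\mathbf{b} = -\mathbf{f}$. Algorithm 2 performs the forward
substitution and the computation of $\mathbf{b}_{\mathrm{Sub},i}$. The vector
$\mathbf{b}_{\mathrm{Sub},i}$ and the matrix $\mathbf{D}_{\mathrm{Sub},i}$
are transferred to the main level equation solver for solving the external
(main circuit level) variables.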

Algorithm 2 Forward(circuit)
1. Solve $\mathbf{L}_i \mathbf{z}_i = \mathbf{b}_i$.

2. $\mathbf{b}_{\mathrm{Sub},i} = \mathbf{b}_{Ei} - \hat{\mathbf{C}}_i \mathbf{z}_i$

Before solving the inner variables of the subcircuits, the main level
equations have to be constructed from the subcircuit contributions and from the
main level circuit:

\[
\left( \mathbf{D}_{E0} + \sum_{i=1}^{m} \mathbf{D}_{\mathrm{Sub},i} \right) \mathbf{x}_E
= \mathbf{b}_{E0} + \sum_{i=1}^{m} \mathbf{b}_{\mathrm{Sub},i}.
\tag{33}
\]

From the circuit-theoretical point of view, the resulting matrix equation (33)
is the linearized circuit equation of the main circuit such that the
subcircuits are expressed as Norton’s equivalent circuits. The external
voltages $\mathbf{x}_E$ can be solved using LU factorization, but there is no
restriction on the main level equation solver.
Then, the values of the external voltages are transferred to the subcircuits
and the terms $\mathbf{z}_i - \hat{\mathbf{B}}_i \mathbf{x}_{Ei}$ are
constructed for the backward substitution. After that, it is trivial to solve
the equation

\[ \mathbf{U}_i \mathbf{x}_i = \mathbf{z}_i - \hat{\mathbf{B}}_i \mathbf{x}_{Ei}. \tag{34} \]

Algorithm 3 Backward(circuit)
1. Solve $\mathbf{U}_i \mathbf{x}_i = \mathbf{z}_i - \hat{\mathbf{B}}_i \mathbf{x}_{Ei}$.

Algorithm 4 describes the whole linear-equation solving process constructed
from Algorithms 1–3. The notation "Forall" means that the following procedure
can be performed in parallel.

Algorithm 4 Solve linear equation(circuit)
1. Forall: LU-factorize(subcircuit)

2. Forall: Forward(subcircuit)

3. Solve, using sparse matrix LU factorization,
$\left( \mathbf{D}_{E0} + \sum_{i=1}^{m} \mathbf{D}_{\mathrm{Sub},i} \right) \mathbf{x}_E = \mathbf{b}_{E0} + \sum_{i=1}^{m} \mathbf{b}_{\mathrm{Sub},i}$.

4. Forall: Backward(subcircuit)
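
Algorithms 1–4 can be sketched in a few dozen lines of Python. The sketch below is only an illustration under simplifying assumptions: dense blocks stored as nested lists, LU factorization without pivoting or reordering, a single external node (so that $\mathbf{D}_i$, $\mathbf{x}_E$, and the main-level solve are scalar), and a thread pool standing in for the "Forall" steps. The function names and the test data are made up; they do not correspond to APLAC routines.

```python
from concurrent.futures import ThreadPoolExecutor

def lu_nopivot(A):
    """Doolittle LU without pivoting (assumes nonzero pivots): A = L U."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [list(map(float, row)) for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U

def solve_lower(L, b):
    """Forward substitution, L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][j] * z[j] for j in range(i))) / L[i][i])
    return z

def solve_upper(U, z):
    """Backward substitution, U x = z."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def factorize_and_forward(sub):
    """Algorithms 1 and 2 for one subcircuit with a single external node."""
    L, U = lu_nopivot(sub["A"])
    Ut = [list(col) for col in zip(*U)]        # U^T is lower triangular
    C_hat = solve_lower(Ut, sub["C"])          # C_hat = C U^{-1}, Eq. (28)
    B_hat = solve_lower(L, sub["B"])           # B_hat = L^{-1} B, Eq. (29)
    z = solve_lower(L, sub["b"])               # L z = b, Eq. (31)
    return {"U": U, "B_hat": B_hat, "z": z,
            "D_sub": sub["D"] - dot(C_hat, B_hat),   # Eq. (30)
            "b_sub": sub["bE"] - dot(C_hat, z)}      # Eq. (32)

def backward(work, xE):
    """Algorithm 3: U x = z - B_hat xE, Eq. (34)."""
    rhs = [zi - bi * xE for zi, bi in zip(work["z"], work["B_hat"])]
    return solve_upper(work["U"], rhs)

def solve_hierarchical(subs, D_E0, b_E0):
    """Algorithm 4; the two pool.map calls play the role of 'Forall'."""
    with ThreadPoolExecutor() as pool:
        work = list(pool.map(factorize_and_forward, subs))
        xE = ((b_E0 + sum(w["b_sub"] for w in work))
              / (D_E0 + sum(w["D_sub"] for w in work)))   # Eq. (33), scalar
        xs = list(pool.map(lambda w: backward(w, xE), work))
    return xs, xE

# Made-up test data: two subcircuits with two internal nodes each.
subs = [
    {"A": [[4.0, 1.0], [2.0, 5.0]], "B": [-1.0, 0.0], "C": [-1.0, 0.0],
     "D": 1.0, "b": [1.0, 2.0], "bE": 0.5},
    {"A": [[3.0, 1.0], [1.0, 2.0]], "B": [0.0, -2.0], "C": [0.0, -2.0],
     "D": 2.0, "b": [0.0, 1.0], "bE": -1.0},
]
D_E0, b_E0 = 1.0, 3.0
xs, xE = solve_hierarchical(subs, D_E0, b_E0)

# Residuals of the full BBD system, used as a self-check.
sub_res = [dot(row, x) + bcol * xE - rhs
           for s, x in zip(subs, xs)
           for row, bcol, rhs in zip(s["A"], s["B"], s["b"])]
main_res = (sum(dot(s["C"], x) for s, x in zip(subs, xs))
            + (D_E0 + sum(s["D"] for s in subs)) * xE
            - (b_E0 + sum(s["bE"] for s in subs)))
print(max(abs(r) for r in sub_res + [main_res]))
```

Threads are used here only to keep the example self-contained; in a network of workstations each call to `factorize_and_forward` and `backward` would run on a separate machine, with `D_sub`, `b_sub`, and `xE` as the only data exchanged.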

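We can verify that the inner variables obtained from the substitutions (31)
and (34),

\[
\begin{aligned}
\mathbf{x}_i &= \mathbf{U}_i^{-1} \left( \mathbf{z}_i - \hat{\mathbf{B}}_i \mathbf{x}_{Ei} \right) \\
&= \mathbf{U}_i^{-1} \mathbf{L}_i^{-1} \mathbf{b}_i - \mathbf{U}_i^{-1} \mathbf{L}_i^{-1} \mathbf{B}_i \mathbf{x}_{Ei} \\
&= \mathbf{A}_i^{-1} \mathbf{b}_i - \mathbf{A}_i^{-1} \mathbf{B}_i \mathbf{x}_{Ei},
\end{aligned}
\tag{35}
\]

are the same as in Eq. (13).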


The full NR method includes, in addition to the linear-equation solving,
sparse matrix reordering, and control of NR iteration with convergence-aiding
methods.

3.4 Aiding the convergence
The convergence of NR iteration is guaranteed only sufficiently near the
solution [52]. Due to a poor initial guess, the iteration may diverge or be
extremely slow. Exponential characteristics, which are very common in
electrical circuits, easily lead to numerical overflow. Therefore, some
additional aid is needed to guarantee the convergence.
NR iteration convergence-aiding methods can be roughly divided into three
different groups:
• Heuristic step-size adjusting methods that calculate the step sizes of NR
iteration according to heuristic rules. Typically, a priori knowledge of
diode current characteristics is exploited.

• Norm reduction methods, which exploit the fact that the NR step is
in the descent direction of the norm $\|\mathbf{f}\|$ and that the step size
can be adjusted such that the norm reduces monotonically at every iteration.

• Continuation or homotopy methods [69, 105], which use a continuous
deformation from an easy problem to a hard problem (the original problem
in question). The solution of the previous problem is an initial guess
for the next problem. This way, the solution is dragged toward the final
solution. Well-known methods like source stepping and model
damping (or $G_{\min}$-damping or conductance damping) are rough examples
of discrete homotopy methods.
The goal of step-size limiting methods is to adjust the step size such that
the error norm $\|f\|$ reduces at every iteration. Step-size limiting is done by
calculating a damping factor $\lambda$ such that

    \| f(x^k + \lambda \Delta x^k) \| < \| f(x^k) \|.    (36)
In practice, it has been found to be effective to use heuristic step-size limiting
methods first and if this does not prevent the error from increasing, then norm-
reduction is performed until a smaller error norm is found.
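As an illustration of norm reduction according to Eq. (36), the following Python sketch damps the NR step by repeated halving of $\lambda$ until the residual decreases. The halving rule, the circuit (a voltage source, a series resistor, and a diode), and all component values are assumptions chosen for the example; they are not the rules used in APLAC.

```python
import math

def damped_newton(f, dfdx, x0, tol=1e-12, max_iter=100, max_halvings=30):
    """Scalar NR iteration with norm reduction: halve lambda until |f| decreases."""
    x = x0
    for k in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x, k
        dx = -fx / dfdx(x)              # full NR step
        lam = 1.0
        for _ in range(max_halvings):   # search for a lambda satisfying Eq. (36)
            if abs(f(x + lam * dx)) < abs(fx):
                break
            lam *= 0.5
        x = x + lam * dx
    return x, max_iter

# Hypothetical test circuit: source V_S, series resistor R, diode to ground.
V_S, R, I_S, V_T = 5.0, 1e3, 1e-14, 0.02585

def f_diode(v):
    """Sum of currents leaving the diode node."""
    return (v - V_S) / R + I_S * (math.exp(v / V_T) - 1.0)

def df_diode(v):
    """Derivative of f_diode with respect to the node voltage."""
    return 1.0 / R + I_S / V_T * math.exp(v / V_T)

v_dc, iterations = damped_newton(f_diode, df_diode, 0.0)
```

Starting from 0 V, the undamped first step would land at about 5 V, where the exponential diode current makes the residual far larger than at the starting point; the damping keeps the iterate in a region where the linearization is useful.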

3.5 Summary of NR method


Now, equation formulation, nonlinear and linear equation solving and conver-
gence aiding have been discussed, and it is time to summarize the NR method.
The convergence-aiding methods used in APLAC are presented in Section
6. In Algorithm 5, it is only shown when to use step-size limiting methods.
The homotopy methods are not shown in Algorithm 5.
Parameter ε is used in the stopping criterion of the iteration, and K is the
maximum number of iterations. The parameters are specified beforehand by
the user.

Algorithm 5 Parallel DC analysis(circuit)
1. Set k = 0.

2. Forall: Symbolic analysis (reordering of matrices)

3. Begin iteration:

(a) Forall: Linearize circuits and construct matrices.


(b) Call: Solve linear equation(circuit)
(c) Forall: Find $\lambda_i$ using heuristic step-size limiting rules.
(d) Calculate overall damping factor: $\lambda = \min\{\lambda_i, \lambda_0\}$.
(e) Forall: Damp the solution $x_i^{k+1} = x_i^k + \lambda \Delta x_i^k$.
(f) Forall: Calculate error norm $\|f_i\|$.

(g) Norm reduction: if $\|f(x^k + \lambda \Delta x^k)\| \ge \|f(x^k)\|$, compute new $\lambda$
and go to step 3 (f).
(h) Set $k = k + 1$.
(i) If $\|\Delta x\| > \varepsilon$ and $k < K$ then go to step 3 (a).

4. End iteration.

4 Multilevel Newton–Raphson methods
In the previous section, the hierarchical NR method was presented. What
next? The only improvement obtained so far is the speed-up from
parallelization (and sometimes from the decomposition itself, mostly due
to the partitioning of the symbolic analysis). In order to improve the speed
further, iteration methods other than NR iteration have to be utilized. Digital
circuits are usually modular, latent, and unidirectional, in other words,
loosely coupled. Because block, waveform, and nonlinear relaxation methods
utilize these properties, they have been found suitable for this group of
circuits. For analog circuits, which are usually tightly coupled, the capabilities
of these methods cannot be fully exploited, but it might be worthwhile trying
MLNR methods because they have been applied effectively in parallel
processing [28, 113, 125, 126].

4.1 Conventional multilevel methods


One of the first MLNR methods, Multilevel Newton Analysis (MLNA) [77],
performs, instead of global NR steps as in Eq. (2), the iterations on multiple
levels. Between outer iterations, the external variables are kept constant and
only inner variables of subcircuits are iterated:

    x_i^{k,j+1} = x_i^{k,j} - ( A_i^{k,j} )^{-1} f_i( x_i^{k,j}, x_{Ei}^{k} ),    (37)

where $j$ is the inner iteration index. The inner iteration is stopped at some
error level $\tau = \min(\tau', \|\Delta x_E\|^2)$ ($\tau'$ is the maximum allowed error level), which
is needed for quadratic convergence of the outer-level iteration [77]. The initial
guess for the inner variables, $x_i^{k,0}$, can be the same at every inner
iteration, or it may be the final values of the previous inner iteration.
The main-circuit variables are iterated using subcircuits as macromodels.
The MLNA [77] is presented in Algorithm 6.

Algorithm 6 MLNA(circuit)
1. Set $x_E^0$, $\varepsilon$ and $\tau'$.

2. Begin outer iteration: Set $k = 0$.

3. Forall: Begin inner iterations for subsystems $i$: Set $j = 0$ and $x_i^{k,0}$.

(a) Solve $A_i^{k,j} \Delta x_i^{k,j} = -f_i(x_i^{k,j}, x_{Ei}^k)$.
(b) Set $x_i^{k,j+1} = x_i^{k,j} + \Delta x_i^{k,j}$.
(c) Set $j = j + 1$.
(d) If $\|\Delta x_i^{k,j}\| > \tau$ go to Step 3 (a).

4. End inner iteration.

5. Solve $\left( D_{E0}^k + \sum_{i=1}^{m} D_{Sub,i}^k \right) \Delta x_E^k = -f_{E0}^k + \sum_{i=1}^{m} f_{Sub,i}^k$.

6. $\tau = \min(\tau', \|\Delta x_E\|^2)$

7. Set $k = k + 1$.

8. End outer iteration if $\|\Delta x_E\| < \varepsilon$.

It is not certain or even common that the overall computational cost of the
MLNR method is lower than that of the conventional NR method. Moreover, the
MLNR method is always a disturbed NR method, i.e., the total convergence
of all variables is slower than with the NR method. The often-mentioned
fast outer convergence does not mean anything if the overall convergence and
computational cost are worse.
So, why use MLNR methods? One reason is the possibility to take ad-
vantage of time domain latency that, especially with digital circuits, reduces
the need of inner iterations during the transient analysis and, thus, saves com-
putation. Another useful property is the reduced communication in paral-
lel processing. The communication is needed only during the outer iteration
which reduces the time needed. Particularly, in the distributed computation in
NoWs, the reduction of communication affects computational efficiency dras-
tically.
Some experimental results on using MLNR methods have been presented in
Refs. [125, 126]. Recently, Refs. [28, 113] have presented very convincing
results of using MLNR for networked simulation.
Of the other multilevel methods, we may mention the block relaxation
method [22], which is nothing more than an MLNR method with only one inner
iteration. Such a method is used, e.g., in MEDUSA [26]. In Ref. [64], however, the
block relaxation method has been shown to have poor convergence properties.
Refs. [125, 126] present the implicit method, where the number of inner
iterations is limited to some constant (the experiments [126] show that 3 is
the optimal number) instead of continuing the iteration to some error level.
This way, load balancing (balancing the computational load between different
processors) can be better controlled than in the MLNA method.
In the corrected explicit method [125, 126], the inner variables are corrected
with a correction term $\delta = -A^{-1} B \Delta x_E$ after the outer iteration step. The
same kind of correction idea is used in the modified MLNR method [64], where
the inner iterations are corrected with Forward-Euler type corrections. The
corrected methods have been shown to be more reliable in convergence than
the uncorrected ones.

In the following, two new MLNR methods are presented. The main emphasis
is on the convergence of the DC analysis: how to improve the convergence
of the MLNR methods and, thus, the speed of the analysis.

4.2 New multilevel methods


In the new multilevel methods [40, 41], the idea of correction [125, 126] in the outer
iteration is extended such that a full NR step (i.e., all the variables, instead
of only the external variables, are iterated) is taken instead of an outer iteration and
correction. As a matter of fact, one inner iteration and an outer iteration with the
correction $\delta = -A^{-1} B \Delta x_E$ is the NR step in the case of linear $f_E$. Because
the Norton's equivalent circuits are produced using forward substitution, there
is no reason to make the correction other than by using backward substitution
after updating the external variables. This leads to the full NR step also in the case
of nonlinear $f_E$. This full-step property is used in the control of the iteration.
Computation of the global NR step is more expensive than the outer it-
eration step where only external variables are updated because of calculation
of backward substitution and communication between processors, but it is
compensated by faster convergence of the total iteration.
The specific circuit-equation formulation with short-circuit currents is uti-
lized in the convergence-control mechanism of the first new method and it is an
essential part of the second method. The first new method is called the com-
bined Newton–Raphson and Newton–Raphson method (NRNR) [41], while the
second method is called the combined Newton–Raphson and Gauss–Newton
(NRGN) method [40] because it utilizes Gauss–Newton (GN) iteration in order
to obtain an even better possibility to control convergence.
The core of both new methods is presented in Algorithm 7. The
inner iteration depends on the method used and iterates the inner variables
from $x^{k,0}$ to $x^{k,J}$, but it may end before reaching the $J$th iteration. For simplicity,
the iteration is treated as if the inner iteration always reached the end. The
outer iteration takes the full NR steps from $x^{k,J}$ to $x^{k+1,0}$.

Algorithm 7 MLNR(circuit)
1. Set $x^{0,0}$, $\varepsilon$ and $J$.

2. Begin outer iteration: Set $k = 0$.

(a) Forall: Inner iteration(subcircuit).
(b) Call: Solve linear equation(circuit).
(c) Take full NR step: $x^{k+1,0} = x^{k,J} + \Delta x^{k,J}$.
(d) Set $k = k + 1$.
(e) If $\|\Delta x\| > \varepsilon$ and $k < K$ go to step 2 (a).

3. End outer iteration.

Typically, the optimal number of inner iterations is something between 1


and 5. If there is a larger number of inner iterations, the extra inner iterations
do not improve the situation anymore, but on the other hand, there have to
be some inner iterations in order to reduce the number of outer iterations.
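The structure of Algorithm 7, with NR as the inner iteration, can be tried out on a toy problem. The sketch below assumes one subcircuit with a single inner variable $x$ and a single external variable $x_E$; the two equations are hypothetical and were chosen only because they are smooth and easy to check, so the code illustrates the control flow rather than the APLAC implementation.

```python
import numpy as np

# Hypothetical two-variable test problem (not from the thesis):
#   inner equation     f_i(x, x_E) = x + x^3 - x_E
#   external equation  f_E(x, x_E) = 1.5 x_E - 0.5 x - 1
def f_inner(x, x_E):
    return x + x**3 - x_E

def dfi_dx(x):
    return 1.0 + 3.0 * x**2

def f_ext(x, x_E):
    return 1.5 * x_E - 0.5 * x - 1.0

def inner_nr(x, x_E, J, eps):
    """At most J inner NR iterations with the external variable frozen."""
    for _ in range(J):
        dx = -f_inner(x, x_E) / dfi_dx(x)
        x += dx
        if abs(dx) <= eps:
            break
    return x

def mlnr(x, x_E, J=2, eps=1e-12, K=50):
    """Outer loop: inner iterations followed by a full NR step on all variables."""
    for k in range(K):
        x = inner_nr(x, x_E, J, eps)                        # step 2 (a)
        jac = np.array([[dfi_dx(x), -1.0],
                        [-0.5, 1.5]])                       # full Jacobian
        rhs = -np.array([f_inner(x, x_E), f_ext(x, x_E)])
        dx, dx_E = np.linalg.solve(jac, rhs)                # step 2 (b)
        x, x_E = x + dx, x_E + dx_E                         # step 2 (c)
        if max(abs(dx), abs(dx_E)) <= eps:                  # step 2 (e)
            return x, x_E, k + 1
    return x, x_E, K

x_sol, xE_sol, outer_iterations = mlnr(0.0, 0.0)
```

Changing `J` in this sketch gives a feeling for the trade-off described above: more inner iterations tend to save outer iterations only up to a point.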

4.2.1 NRNR method


In the NRNR method, the main new idea is to modify the iteration such that
the global convergence can be easily controlled. By taking full NR steps on
the outer iteration level, i.e., by iterating all variables instead of updating only
the external variables, local quadratic convergence of the new method with only $J$
inner iterations is achieved (see Section 5.1). Thus, the iteration is sped up.
If the short-circuit currents are added (which is not necessary for the NRNR
method), we can check whether the inner iteration step is in
the descent direction of the norm $\|f\|$, and if it is, the NR step can be adjusted
such that the norm decreases monotonically (see Section 5.2).
The inner iteration of the NRNR method is summarized in Algorithm 8.

Algorithm 8 InnerNR(circuit)
1. Begin iteration: Set $j = 0$.

(a) Solve $A_i^{k,j} \Delta x_i^{k,j} = -f_i(x_i^{k,j}, x_{Ei}^k)$.
(b) Set $x_i^{k,j+1} = x_i^{k,j} + \Delta x_i^{k,j}$.
(c) Set $j = j + 1$.
(d) If $j < J$ and $\|\Delta x_i\| > \varepsilon$ go to step 1 (a).

2. End inner iteration.

In the case of a linear fE , the NRNR method is equivalent to the nonaided


corrected explicit method [125, 126]. The difference is in the control of the it-
eration: where the corrected explicit method with convergence aiding reduces
the error norm between every outer iteration, the NRNR method monotoni-
cally reduces the norm at every inner and outer iteration. In addition, if fE is
nonlinear, the iteration goes differently due to the different correction.

4.2.2 Combined NRGN method
In order to take the connection nodes into account in the inner iteration,
some additional connection-node equations have to be solved together with
the other subcircuit equations. The resulting system of nonlinear equations
is overdetermined and is solved in the sense of Least-Squares (LS) using GN
iteration. Moreover, the GN-iteration step is in the descending direction of the
residual [94]. Thus, the step sizes can be damped such that the total residual
norm decreases monotonically at every inner iteration.
The squared norm of the left-hand side of Eq. (17) is

    \|f\|_2^2 = \|f_1\|_2^2 + \|f_{E1} + i_1\|_2^2 + \ldots + \|f_m\|_2^2 + \|f_{Em} + i_m\|_2^2 + \|f_{E0}\|_2^2.    (38)

During the inner iterations, $x_{Ei}$ and $i_i$ are kept constant and only the internal
variables, $x_i$, are iterated. If we use the NR method, the inner
NR step is in the descent direction of $\|f_i\|_2^2$. However, it is not guaranteed that
$\|f_{Ei} + i_i\|_2^2$ decreases. But if we use GN iteration in the inner loop, i.e., solve
the equations

    f_i(x_i) = 0,       (39)
    f_{Ei}(x_i) = 0,    (40)

in the sense of LS, the step is in the descent direction of the norm
$\|f_i\|_2^2 + \|f_{Ei} + i_i\|_2^2$, and thus in the descent direction of the total norm.
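The linearized GN equations of the nonlinear subcircuit equations (39) and
(40) take the form

    R_i \Delta x_i = b_i,    (41)

where $\Delta x_i \in \mathbb{R}^{n_i}$ is the GN update and

    R_i := \begin{bmatrix} A_i \\ C_i \end{bmatrix} \in \mathbb{R}^{(n_i + n_{Ei}) \times n_i},    (42)

    b_i := -\begin{bmatrix} f_i \\ f_{Ei} \end{bmatrix} \in \mathbb{R}^{n_i + n_{Ei}}.    (43)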
The inner GN iteration is described in Algorithm 9.

Algorithm 9 InnerGN(circuit)
1. Begin inner iteration: Set $j = 0$.

(a) Solve $R_i^{k,j} \Delta x_i^{k,j} = b_i^{k,j}$.
(b) Set $x_i^{k,j+1} = x_i^{k,j} + \lambda \Delta x_i^{k,j}$ and find $\lambda$ such that
$\|f_i^{k,j+1}\|_2^2 + \|f_{Ei}^{k,j+1} + i_i^k\|_2^2 < \|f_i^{k,j}\|_2^2 + \|f_{Ei}^{k,j} + i_i^k\|_2^2$.
(c) If $j = J$ or $\|\Delta x_i\| < \varepsilon$, end inner iteration, else set $j = j + 1$ and
go to step 1 (a).

2. End inner iteration.

The overdetermined linear equation can be solved by means of the normal
equations

    R^T R \Delta x = -R^T f,    (44)

and the solution is

    \Delta x = -(R^T R)^{-1} R^T f.    (45)

Now, we can define the pseudoinverse

    R^{\dagger} := (R^T R)^{-1} R^T,    (46)

which is needed in the convergence proof in Section 5.1. In practice, the LS
solution is not computed by using the normal equations.
Because $R$ is nearly square ($n_{Ei} \ll n_i$), the computation of the LS solution
can be performed using a specific sparse-matrix algorithm that utilizes the
sparse-matrix LU factorization [5], or using the Conjugate Gradient Normal
Equations (CGNE) method [5].
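In Ref. [48], it has been shown that by solving the augmented system

    \begin{bmatrix} A_i & -(A_i^{-1})^T C_i^T \\ C_i & I \end{bmatrix}
    \begin{bmatrix} x \\ x_{aux} \end{bmatrix} = b,    (47)

where $I$ is the identity matrix, we get the LS solution vector $x$ and an auxiliary vector
$x_{aux} \in \mathbb{R}^{n_{Ei}}$.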
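That the augmented system of Eq. (47) really yields the LS solution can be checked numerically. The sketch below is a minimal dense NumPy illustration with a hypothetical function name (`ls_via_augmented`); a real implementation would exploit the sparse LU factors of $A_i$ instead of forming $(A_i^{-1})^T C_i^T$ with a dense solve.

```python
import numpy as np

def ls_via_augmented(A, C, b):
    """Solve min || [A; C] x - b ||_2 through the augmented system of Eq. (47)."""
    n, n_E = A.shape[0], C.shape[0]
    top_right = -np.linalg.solve(A.T, C.T)       # -(A^{-1})^T C^T, shape (n, n_E)
    M = np.block([[A, top_right],
                  [C, np.eye(n_E)]])
    sol = np.linalg.solve(M, b)
    return sol[:n], sol[n:]                      # LS solution x and auxiliary vector

# Small random example; sizes and values are arbitrary.
rng = np.random.default_rng(0)
A_demo = rng.normal(size=(5, 5)) + 5.0 * np.eye(5)
C_demo = rng.normal(size=(2, 5))
b_demo = rng.normal(size=7)
x_ls, x_aux = ls_via_augmented(A_demo, C_demo, b_demo)
```

Multiplying the first block row by $A_i^T$ and eliminating $x_{aux}$ with the second block row reproduces the normal equations, which is why the first part of the solution vector agrees with a standard least-squares solver.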
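The LU algorithm efficiently uses sparse LU factorization code for the
factorization of $A_i$. The algorithm constructs LU factors

    L_{LS} = \begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix} \in \mathbb{R}^{(n_i + n_{Ei}) \times (n_i + n_{Ei})},    (48)

    U_{LS} = \begin{bmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{bmatrix} \in \mathbb{R}^{(n_i + n_{Ei}) \times (n_i + n_{Ei})}.    (49)

It follows that

    L_{11} = L_i,                      (50)
    U_{11} = U_i,                      (51)
    L_{21} = C_i U_{11}^{-1},          (52)
    K := L_{21} L_{11}^{-1},           (53)
    U_{12} = -L_{11}^{-1} K^T,         (54)
    L_{22} U_{22} = I + K K^T.         (55)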

Eq. (55) has to be factored in order to obtain $L_{22}$ and $U_{22}$.
The LS solution can be computed using forward-backward substitution from the
equation

    L_{LS} U_{LS} \begin{bmatrix} x \\ x_{aux} \end{bmatrix} = b.    (56)
Algorithm 10 summarizes the discussion.

Algorithm 10 LSLU()
1. LU factorize $A_i = L_{11} U_{11}$.

2. $L_{21} = C_i U_{11}^{-1}$

3. $K = L_{21} L_{11}^{-1}$

4. $U_{12} = -L_{11}^{-1} K^T$

5. $L_{22} U_{22} = I + K K^T$.

6. Solve, using forward-backward substitution, $L_{LS} U_{LS} \begin{bmatrix} x \\ x_{aux} \end{bmatrix} = b$.

The other possibility for computing the LS solution is to use CGNE. CGNE is
an iterative Krylov-subspace method for the normal equations, and the algorithm
is as follows:

Algorithm 11 CGNE()
1. Set initial guess $x^0$; $s^0 = b - R x^0$; $r^0 = R^T s^0$; $p^0 = r^0$; $\rho_0 = (r^0)^T r^0$;
$p^{-1} = 0$; $\beta_{-1} = 0$

2. For $i = 0, 1, 2, \ldots$

(a) $p^i = r^i + \beta_{i-1} p^{i-1}$
(b) $w^i = R p^i$
(c) $\alpha_i = \rho_i / ((w^i)^T w^i)$
(d) $x^{i+1} = x^i + \alpha_i p^i$
(e) $s^{i+1} = s^i - \alpha_i w^i$
(f) $r^{i+1} = R^T s^{i+1}$
(g) If $x^{i+1}$ is accurate enough then quit.
(h) $\rho_{i+1} = (r^{i+1})^T r^{i+1}$
(i) $\beta_i = \rho_{i+1} / \rho_i$

3. End.
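The steps of Algorithm 11 translate almost line by line into NumPy. The following is a dense, unpreconditioned sketch; the stopping rule (relative size of $\|R^T s\|$), the default iteration limit, and the test matrix are assumptions made for the example, since the algorithm above only asks for a solution that is "accurate enough".

```python
import numpy as np

def cgne(R, b, x0=None, rtol=1e-12, max_iter=None):
    """Conjugate gradients on the normal equations R^T R x = R^T b."""
    n = R.shape[1]
    limit = 10 * n if max_iter is None else max_iter
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    s = b - R @ x                      # residual of the overdetermined system
    r = R.T @ s                        # residual of the normal equations
    rho = r @ r
    rho0 = rho
    if rho0 == 0.0:
        return x, 0
    p = np.zeros(n)                    # p^{-1} = 0
    beta = 0.0                         # beta_{-1} = 0
    for i in range(limit):
        p = r + beta * p               # (a)
        w = R @ p                      # (b)
        alpha = rho / (w @ w)          # (c)
        x = x + alpha * p              # (d)
        s = s - alpha * w              # (e)
        r = R.T @ s                    # (f)
        rho_new = r @ r                # (h), computed early for the test in (g)
        if rho_new <= rtol**2 * rho0:  # (g) assumed accuracy test
            return x, i + 1
        beta = rho_new / rho           # (i)
        rho = rho_new
    return x, limit

# Nearly square test matrix R = [A; C]; sizes and values are arbitrary.
rng = np.random.default_rng(1)
A_demo = 0.5 * rng.normal(size=(6, 6)) + 6.0 * np.eye(6)
C_demo = rng.normal(size=(2, 6))
R_demo = np.vstack([A_demo, C_demo])
b_demo = rng.normal(size=8)
x_cg, n_iter = cgne(R_demo, b_demo)
```

The matrix $R^T R$ is never formed; each iteration needs only one product with $R$ and one with $R^T$, which is what makes the method attractive for sparse problems.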

In order to speed up the rate of convergence, CGNE has to be preconditioned.
Then, we are solving the problem

    \min \| R S^{-1} y - b \|_2,  \quad y = S x,    (57)

where $S$ is the preconditioner and $y = Sx \in \mathbb{R}^{n_i}$. It is possible to use
$A$ as the preconditioner:

    \min \| R A^{-1} y - b \|_2,    (58)

and, because we have the LU factors of the matrix $A$, the preconditioned
matrix is

    R A^{-1} = \begin{bmatrix} I \\ C U^{-1} L^{-1} \end{bmatrix} = \begin{bmatrix} I \\ \hat{C} L^{-1} \end{bmatrix}.    (59)

4.2.3 DC sweep
In a DC sweep, a voltage, model parameter, or temperature is swept and the
DC analysis is done several times repetitively. Then, the previous analysis is a
good initial guess for the next analysis, and thus the convergence is improved.
If, e.g., a voltage source whose voltage is swept is inside a subcircuit, then,
in the first iteration of the next DC analysis, there is a need to perform the
inner iteration only for the subcircuit that contains this source. If the source
is at the main level, there is no need to perform the inner iterations at all, and
the outer iteration step is taken immediately.
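The benefit of reusing the previous solution in a DC sweep can be seen even on a single-node example. The sketch below sweeps the source voltage of a hypothetical resistor-diode circuit and starts each analysis either from the previous solution or from zero; the circuit, the component values, and the simple halving rule are all assumptions made for illustration.

```python
import math

I_S, V_T, R = 1e-14, 0.02585, 1e3   # hypothetical diode and resistor values

def solve_dc(v_src, v0, tol=1e-12, max_iter=200):
    """Damped NR for (v - v_src)/R + I_S*(exp(v/V_T) - 1) = 0."""
    def f(v):
        return (v - v_src) / R + I_S * (math.exp(v / V_T) - 1.0)
    v = v0
    for k in range(max_iter):
        fv = f(v)
        if abs(fv) < tol:
            return v, k
        dv = -fv / (1.0 / R + I_S / V_T * math.exp(v / V_T))
        lam = 1.0
        # Halve the step until the residual decreases (norm reduction).
        while lam > 1e-9 and abs(f(v + lam * dv)) >= abs(fv):
            lam *= 0.5
        v += lam * dv
    return v, max_iter

def dc_sweep(sources, warm_start=True):
    """Run one DC analysis per source value; optionally reuse the last solution."""
    v, results, total_iterations = 0.0, [], 0
    for v_src in sources:
        v, iters = solve_dc(v_src, v if warm_start else 0.0)
        results.append(v)
        total_iterations += iters
    return results, total_iterations

sweep = [1.0, 2.0, 3.0, 4.0, 5.0]
v_warm, n_warm = dc_sweep(sweep, warm_start=True)
v_cold, n_cold = dc_sweep(sweep, warm_start=False)
```

Both variants reach the same operating points; comparing `n_warm` with `n_cold` shows how much work the initial guess from the previous sweep point saves in this particular example.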

5 Convergence
The convergence of the NR and MLNR methods is considered in this section.
For applications, it is crucial to know whether the iteration method converges
or not. Usually, nonlinear iteration methods converge provided that the initial
guess is close to the solution; however, the iteration may diverge due to a
poor initial guess. Therefore, the treatment of convergence is divided into
two sections. First, the local convergence of the methods is discussed and, then,
the global convergence.

5.1 Local Convergence


In this section, the convergence theorem of the NR method is briefly reviewed
and, after that, the local quadratic convergence of the MLNR methods is
shown.
Let $f$ be differentiable in an open set $\Omega \subset \mathbb{R}^n$. The standard assumptions
for the solution of the circuit equations and the Jacobian are presented next.

Assumption 1 The standard assumptions are

• A solution $x^* \in \Omega$ exists.

• $J : \Omega \to \mathbb{R}^{n \times n}$ is Lipschitz continuous.

• $J(x^*)$ is nonsingular.

The assumptions lead to the following lemma. First, let us define the error
$e = x - x^*$ and the ball $B(\cdot)$ as follows:

    B(r) = \{ x \mid \|e\| < r \}.    (60)

Lemma 1 If Assumption 1 holds, then there exist $K > 0$ and $\delta > 0$, such that
if $x^k \in B(\delta)$, then the NR iterate satisfies

    \|e^{k+1}\| \le K \|e^k\|^2,    (61)

where $e = x - x^*$.

The proof can be found, e.g., in Ref. [52].


Clearly, Lemma 1 does not prove the convergence. It only shows the bounds
of the NR iterates. Now, we can continue and present the well-known result
of NR convergence analysis.

Theorem 1 Under Assumption 1, if we start the NR iteration "sufficiently
close" to the solution, $x^0 \in B(\delta)$, the NR iteration converges quadratically.

The proof can also be found, e.g., in Ref. [52].
Next, convergence properties of the NRNR method will be shown. The
assumptions for the forthcoming Lemma are presented.

Assumption 2 There exists an $L > 0$ such that

    \|f(x) - f(x^*)\| \le L \|x - x^*\|,    (62)

and that for some $M > 0$

    \|(A^{k,j})^{-1}\| \le M.    (63)

If Assumption 2 is true (note that we do not need Assumption 1), we can
show the effect of inner iterations on the total error norm $\|f\|$. The following
Lemma is valid for any multilevel method that uses the NR iteration as the
inner iteration.

Lemma 2 From Assumption 2, it follows that for all $j = 1, 2, \ldots$

    \|e^{k,j}\| \le \hat{K} \|e^{k,0}\|,    (64)

where $\hat{K} = (1 + LM)^j$.

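Proof. (By induction.) $j = 1$: The inner iteration (37) can be rewritten,
by adding $e_E^k$ and $f_E^k$, as follows:

    e^{k,1} = \begin{bmatrix} e_I^{k,1} \\ e_E^k \end{bmatrix}
            = \begin{bmatrix} e_I^{k,0} \\ e_E^k \end{bmatrix}
              - \begin{bmatrix} (A^{k,0})^{-1} & 0 \\ 0 & 0 \end{bmatrix}
                \begin{bmatrix} f_I^{k,0} \\ f_E^k \end{bmatrix}.    (65)

Then

    \left\| \begin{bmatrix} e_I^{k,1} \\ e_E^k \end{bmatrix} \right\|
        = \left\| \begin{bmatrix} e_I^{k,0} \\ e_E^k \end{bmatrix}
          - \begin{bmatrix} (A^{k,0})^{-1} & 0 \\ 0 & 0 \end{bmatrix}
            \begin{bmatrix} f_I^{k,0} \\ f_E^k \end{bmatrix} \right\|
        \le \|e^{k,0}\| + \|(A^{k,0})^{-1}\| \, \|f^{k,0}\|    (66)
        \le \|e^{k,0}\| + ML \|e^{k,0}\|
        = (1 + LM) \|e^{k,0}\|,

and thus Eq. (64) is true for $j = 1$. The inductive hypothesis is that

    \|e^{k,j}\| \le (1 + LM)^j \|e^{k,0}\|.    (67)

Using the hypothesis for $j + 1$,

    \|e^{k,j+1}\| \le \|e^{k,j}\| + \|(A^{k,j})^{-1}\| \, \|f^{k,j}\|
                  \le \|e^{k,j}\| + LM \|e^{k,j}\|    (68)
                  = (1 + LM) \|e^{k,j}\|
                  \le (1 + LM)^{j+1} \|e^{k,0}\|.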

Lemma 2 holds for all $j$ by induction. □
Now, when the effect of the inner iteration on the error is known, it
is straightforward to show the quadratic convergence of the outer iterations
($\{x^{0,0}, x^{1,0}, \ldots\}$) of the NRNR method. The idea of the proof is to show that
if we are sufficiently close to the solution, the inner iteration cannot disturb the
iteration too much and the outer iteration converges quadratically.
Theorem 2 If Assumptions 1 and 2 hold on $\Omega$, there is $\delta > 0$ such that if
$x^{0,0} \in B(\delta)$, then the outer iteration of the NRNR method converges quadratically.
Proof. Let $\delta$ be small enough so that $B(\hat{K}\delta) \subset \Omega$. Reduce $\delta$, if needed,
such that $K \hat{K}^2 \delta = \eta < 1$. If $x^{k,0} \in B(\delta)$, then Lemma 2 implies that after $J$
inner iterations $x^{k,J} \in B(\hat{K}\delta) \subset \Omega$ and we can continue the iteration. Lemma
1 implies that, after the outer iteration,

    \|e^{k+1,0}\| \le K \|e^{k,J}\|^2 \le K \hat{K}^2 \|e^{k,0}\|^2 \le K \hat{K}^2 \delta \|e^{k,0}\| = \eta \|e^{k,0}\| < \|e^{k,0}\|    (69)

and $x^{k+1,0} \in B(\eta\delta) \subset B(\delta)$. Since $x^{0,0} \in B(\delta)$, $x^{k,0}$ converges to $x^*$
quadratically. □
The NRGN method converges under slightly different assumptions.
Assumption 3 There is an $L$ such that

    \|f(x) - f(x^*)\| \le L \|x - x^*\|,    (70)

and that for some $P$

    \|(R^{k,j})^{\dagger}\| \le P.    (71)

The first assumption is the same as in Assumption 2, but a bound is assumed
for the norm of the pseudoinverse $R^{\dagger}$ instead of $A^{-1}$.
Lemma 3 From Assumption 3, it follows that for all $j = 1, 2, \ldots$

    \|e^{k,j}\| \le \bar{K} \|e^{k,0}\|,    (72)

where $\bar{K} = (1 + LP)^j$.
Proof. The proof is similar to the proof of Lemma 2. The only exception
is that $A^{-1}$ is now replaced by $R^{\dagger}$, and thus the bound $M$ by $P$. □
Similarly, the outer-level convergence of the NRGN method can be proven.
Theorem 3 If Assumptions 1 and 3 hold on $\Omega$, there is $\delta > 0$ such that if
$x^{0,0} \in B(\delta)$, then the outer iteration of the NRGN method converges quadratically.
Proof. The proof is similar to the proof of Theorem 2. □

5.2 Global Convergence
There are numerous different convergence-aiding methods for the conventional
NR method (they were briefly discussed in Section 3.4) but there have only
been a few reported attempts to improve the convergence of the multilevel
NR method [123, 126]. In Ref. [77], it is mentioned that during transient
analysis, it may sometimes happen that the iteration does not converge and
that the convergence problem can be handled such that the time step is halved.
However, this approach does not solve the convergence problems in DC (and,
e.g., in harmonic balance) analysis.
Refs. [125, 126] present a two-level line search (norm-reduction) scheme
which finds such an inner solution that the direction of the outer iteration is in
the descent direction and a line search can be performed between points xk+1,0
and xk,0 . Ref. [123] applies nonmonotone line search methods to the MLNR
methods.
In the following, it will be shown how the convergence of the new MLNR
methods can be aided by using any existing step-size adjusting methods (which
are normally used together with the conventional NR method), such that every
inner and outer iteration reduces the total error norm $\|f\|$ monotonically. The
proposed strategy does not limit the utilization of step-size adjusting methods
to some specific norm-reduction methods but allows a wider range of methods
to be applied.
The general goal of the step-size adjusting methods is to find a damping
factor, $\lambda$, such that the error norm is reduced:

    \| f(x^{k,j} + \lambda \Delta x^{k,j}) \| < \| f(x^{k,j}) \|.    (73)

If NR iteration is used as the inner iteration, the NR steps can be adjusted
using heuristic step-size adjusting methods. In practice, line-search methods
can also be used within the inner iteration, but there is no theoretical
justification for this because the inner NR step is not necessarily in
the descent direction of the total error function. If the condition

    \|f_i^{k,j+1}\|_2^2 + \|f_{Ei}^{k,j+1} + i_i^k\|_2^2 < \|f_i^{k,j}\|_2^2 + \|f_{Ei}^{k,j} + i_i^k\|_2^2    (74)

is not satisfied during the inner iteration, the iteration should be stopped at
the point where the iteration has last reduced the norm.
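The stopping rule based on condition (74) amounts to remembering the best point seen so far. A minimal scalar sketch is given below; the function name `guarded_inner_nr`, the callable interface, and the arctangent test function are hypothetical and serve only to show the control logic.

```python
import math

def guarded_inner_nr(x0, f_i, df_i, f_E, i_sc, J):
    """Inner NR iteration that stops as soon as condition (74) is violated.

    Returns the last iterate that reduced ||f_i||^2 + ||f_E + i||^2 and the
    number of accepted inner iterations (scalar variables for simplicity).
    """
    def merit(x):
        return f_i(x) ** 2 + (f_E(x) + i_sc) ** 2

    x_best, m_best = x0, merit(x0)
    x = x0
    for j in range(J):
        x = x - f_i(x) / df_i(x)      # undamped inner NR step
        m = merit(x)
        if m >= m_best:               # condition (74) not satisfied
            return x_best, j          # fall back to the last accepted point
        x_best, m_best = x, m
    return x_best, J

def d_atan(x):
    """Derivative of atan(x)."""
    return 1.0 / (1.0 + x * x)

def zero(x):
    """Placeholder for an external equation that does not depend on x."""
    return 0.0

# NR on atan(x) overshoots from x0 = 1.5 but converges from x0 = 0.5.
x_bad, j_bad = guarded_inner_nr(1.5, math.atan, d_atan, zero, 0.0, J=3)
x_ok, j_ok = guarded_inner_nr(0.5, math.atan, d_atan, zero, 0.0, J=3)
```

In the first call the very first step increases the merit function, so the starting point is returned unchanged and the outer iteration would proceed from there.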
If GN iteration is applied as the inner iteration, the direction of the iteration
step is always descending and a λ can be found such that the norm is reduced.
However, due to numerical reasons, it may happen that a λ that satisfies the
condition cannot be found. Again, the inner iteration should be stopped at
the point where the error norm was reduced. In Fig. 5, the damping of the
inner iteration step is shown, while the external voltages, xE , (and short-circuit
currents) are kept constant.

[Figure 5: plot with horizontal axis $x_E$ and vertical axis $x_i$, showing the damped inner step from $x^{0,0}$ to $x^{0,1}$ at constant $x_E$.]

Figure 5: Step-size adjusting in inner iteration, while external voltages, $x_E$,
are constant.

After $J$ iterations, the outer-iteration step starts from the point where
the error is smaller than or equal to that of the starting point of the inner
iteration. Because the outer iteration step is a normal NR step, it is in a
descent direction of the norm and any step-size adjusting methods can be used.
The damping of the outer iteration is presented in Fig. 6.
[Figure 6: plot with horizontal axis $x_E$ and vertical axis $x_i$, showing the points $x^{0,0}$ and $x^{0,J}$ and the damped outer step from $x^{0,J}$ to $x^{1,0}$.]

Figure 6: Step-size adjusting in outer iteration that is a full NR iteration.

The full outer NR step brings an additional improvement. If there are
diodes at the connection nodes in the subcircuits and they are not taken into
account when limiting the step size in the outer iteration, then the connection-node
voltages can change too much and cause convergence problems due to
the exponential characteristics of the diodes. By taking full NR steps, we can
avoid this problem.
Even if the NR step is theoretically in the descent direction, the system of equations may
be ill-conditioned and produce bad directions. The computed direction may not be a
descent direction, or such small step sizes may be needed that norm-reduction
methods are not able to find them. In particular, controlled sources have a very
undirective nature [120] and produce bad directions.
In addition, continuation or homotopy methods can be applied on the outer
iteration level similar to the normal NR method. For example, if the outer
iteration fails to improve the iteration, source stepping can be started.

6 Implementation in APLAC
The proposed methods have been implemented in the in-house development
version of the APLAC circuit simulation and design tool.
The parallel hierarchical analysis is implemented such that one main (or
master) APLAC executes parallel APLACs in each machine in the computer
network. One computer is used for both a subcircuit and the main circuit,
because the computational cost of main-circuit solving is so small that it is
not worthwhile to devote one machine to be the main-circuit solver. The other
computers are only used for subcircuits.
Because APLAC is programmed in an object-oriented way [38, 107], the
hierarchical analysis is also implemented using the same object-oriented
approach. The analysis in each parallel APLAC is performed by the hierarchical
analyzer object, which uses the hierarchical sparse matrix object, the parallel
message passing object (see footnote 2), and the subcircuit interface object. The main
analyzer controls the DC analysis and calls slave analyzers (in parallel APLACs)
when necessary.
The decomposition is done such that the user specifies the subcircuits with
DefModel, APLAC’s subcircuit modeling component, and then each parallel
APLAC interprets the input file and creates its own subcircuit and analyzer.
So, there is no need for prepartitioning of the input file into separate input
files.
Section 6.1 explains the implementation of parallel processing in APLAC,
and Section 6.2, the structure of the hierarchical analyzer in more detail.
Because APLAC creates the equations using the iteration-model approach,
some aspects of it are discussed in Section 6.3. Section 6.4 presents the
convergence-aiding methods in APLAC which have been applied in the parallel
methods.

6.1 Parallel processing


As already mentioned, the hierarchical analysis uses the parallel message pass-
ing object for distributing the data between parallel analyzers.
The object interface is programmed such that it can be used in any parallel-
processing implementation. For example, optimization methods in APLAC
can be implemented in parallel using this object.
The message passing object uses the Parallel Virtual Machine (PVM)
package [32]. PVM is a byproduct of the Heterogeneous Distributed Computing
research project at Oak Ridge National Laboratory and the University of
Tennessee, and it has gained wide acceptance in the high-performance scientific
computing community.

Footnote 2: The parallel message passing is different from the message passing between objects in
APLAC. If there is a need to refer to the latter, the term APLAC message passing is used.

The PVM creates a virtual machine, multicomputer, which has multiple
machines and a software backplane, Pvmd, to coordinate the operation. Pvmd
is the PVM daemon, a process that serves as message router and virtual ma-
chine coordinator.

6.2 Hierarchical analyzer


In addition to the parallel message passing object, several other objects are
needed in implementation:

• Hierarchical circuit-analyzer object that performs the DC analysis. There
are two types of analyzer objects, masters and slaves. The master
analyzer handles the control of DC analysis and main circuit operations.
The slave analyzers are for subcircuit analysis.

• Hierarchical sparse-matrix object that performs the matrix operations
like LU factorization, linear equation solving, as well as computation of
values of Norton's equivalent circuit elements. It performs the permutation
into the block form (26) where the $A_i$ matrix is sparse. The $B_i$, $C_i$, and
$D_i$ matrices are dense. This allows the utilization of the existing sparse
matrix solver for $A_i$ without large modifications in it.

• Subcircuit-interface object that handles the interface between subcircuits
and the main circuit. It creates the Norton's equivalent circuits at the
main-circuit level and collects the external voltages and currents for the
subcircuit analyzer.

6.3 Iteration models


APLAC uses the iteration-model approach for the formulation of circuit
equations. Each nonlinear and linear component is mapped into a voltage-controlled
current source (VCCS). The current $i$ of the VCCS is a function of controlling
voltages $u$ that are differences of two node voltages. Each nonlinear VCCS
builds an iteration model by linearizing the nonlinear current characteristics
at $u^k$:

    i(u^{k+1}) \approx i(u^k) + \sum_{i=1}^{n} \frac{\partial i(u^k)}{\partial u_i} ( u_i^{k+1} - u_i^k ).    (75)

The topology of the iteration model is illustrated in Fig. 7.

[Figure 7: schematic with controlling voltages $u_1^{k+1}, \ldots, u_n^{k+1}$, current $i(u^{k+1})$, controlled sources $g_1(u^k) u_1^{k+1}, \ldots, g_n(u^k) u_n^{k+1}$, and independent source $j(u^k)$.]

Figure 7: Iteration model.

The (trans)conductances are

    g_i = \frac{\partial i(u^k)}{\partial u_i}    (76)

and the independent current source may be

    j(u) = i(u^k)    (77)

or

    j(u) = i(u^k) - \sum_{i=1}^{n} g_i u_i^k.    (78)

When all nonlinear VCCSs have been linearized, the nodal matrix equation
can be constructed:

    G v = j,    (79)

where $G$, $v$, and $j$ are the nodal conductance matrix, nodal voltage vector, and current
source vector, respectively. It can be easily shown that if Eq. (77) is used,
the linear matrix equation is equal to the NR matrix equation

    J \Delta x^k = -f(x^k),    (80)

or, if Eq. (78) is used, the linear matrix equation is equal to

    J x^{k+1} = -f(x^k) + J x^k.    (81)

The first model is called the incremental model because the solution of the
linear equation is the increment to the variables iterated, $\Delta x^k$. The second
model is called the iterative model, because the solution is the new iterates of
the variables, $x^{k+1}$.
Hierarchical LU factorization produces the iteration model of the subcir-
cuits (DSub and bSub ) during the solving process.
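The difference between the two source definitions, Eqs. (77) and (78), can be illustrated with a minimal sketch. It is not APLAC code: the one-node circuit (a source E in series with a resistor R feeding a diode to ground), the diode parameters, and all function names are assumptions chosen only for illustration. Both models should produce the same NR iterate; they differ only in what the linear solve returns.

```python
import math

# Assumed illustrative values (not taken from the thesis).
IS, VT = 1e-14, 0.025      # diode saturation current (A), thermal voltage (V)
E, R = 5.0, 1000.0         # source voltage (V), series resistance (ohm)


def diode_model(u, incremental):
    """Linearize i(u) = IS*(exp(u/VT) - 1) at u and return (g, j)."""
    i = IS * (math.exp(u / VT) - 1.0)
    g = IS / VT * math.exp(u / VT)           # conductance, cf. Eq. (76)
    j = i if incremental else i - g * u      # source, Eq. (77) or Eq. (78)
    return g, j


def nr_step_incremental(v):
    """Solve for the increment dv and add it to the old iterate, cf. Eq. (80)."""
    g, j = diode_model(v, incremental=True)
    g_r = 1.0 / R
    j_r = g_r * (v - E)      # here the linear resistor needs a source term too
    dv = -(j + j_r) / (g + g_r)
    return v + dv


def nr_step_iterative(v):
    """Solve directly for the new iterate, cf. Eq. (81)."""
    g, j = diode_model(v, incremental=False)
    g_r = 1.0 / R
    return (E * g_r - j) / (g + g_r)         # the resistor contributes only E/R


def solve(step, v0=0.7, iterations=20):
    """Apply the given NR step function a fixed number of times."""
    v = v0
    for _ in range(iterations):
        v = step(v)
    return v
```

Note that in the incremental variant the linear resistor has to supply the term `j_r` at every iteration, whereas in the iterative variant its contribution `E * g_r` is constant.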

6.3.1 Iterative model


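If we use the iterative model in hierarchical analysis, the subcircuit
matrix equation has the form

$$ \begin{bmatrix} \mathbf{A}_i & \mathbf{B}_i \\ \mathbf{C}_i & \mathbf{D}_i \end{bmatrix} \begin{bmatrix} \mathbf{x}_i^{k,j+1} \\ \mathbf{x}_E^{k,j+1} \end{bmatrix} = -\begin{bmatrix} \mathbf{f}_i^{k,j} \\ \mathbf{f}_E^{k,j} \end{bmatrix} + \begin{bmatrix} \mathbf{A}_i & \mathbf{B}_i \\ \mathbf{C}_i & \mathbf{D}_i \end{bmatrix} \begin{bmatrix} \mathbf{x}_i^{k,j} \\ \mathbf{x}_E^{k,j} \end{bmatrix}. \tag{82} $$

The inner variables can be solved from the upper part of Eq. (82) by using
forward and backward substitutions (31) and (34):

$$ \begin{aligned} \mathbf{x}_i^{k,j+1} &= \mathbf{U}_i^{-1}\left(\mathbf{z}_i^{k,j} - \hat{\mathbf{B}}_i \mathbf{x}_E^{k,j+1}\right) \\ &= \mathbf{U}_i^{-1}\mathbf{L}_i^{-1}\left(-\mathbf{f}_i^{k,j} + \mathbf{A}_i \mathbf{x}_i^{k,j} + \mathbf{B}_i \mathbf{x}_E^{k,j}\right) - \mathbf{U}_i^{-1}\mathbf{L}_i^{-1}\mathbf{B}_i \mathbf{x}_E^{k,j+1} \\ &= -\mathbf{A}_i^{-1}\mathbf{f}_i^{k,j} + \mathbf{x}_i^{k,j} + \mathbf{A}_i^{-1}\mathbf{B}_i \mathbf{x}_E^{k,j} - \mathbf{A}_i^{-1}\mathbf{B}_i \mathbf{x}_E^{k,j+1}. \end{aligned} \tag{83} $$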
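The substitutions reduce to the inner NR iteration because
$\mathbf{x}_E^{k,j+1} = \mathbf{x}_E^{k,j}$ in this case. The drawback of the
model is that, instead of simply solving the equation
$\mathbf{A}_i \Delta\mathbf{x}_i = -\mathbf{f}_i(\mathbf{x}_i)$, the
unnecessary forward-backward computation has to be performed. Thus, the
iterative-model approach is not the most efficient in terms of
computational cost. Moreover, in order to perform the GN iteration,
$\mathbf{B}_i$ and $\mathbf{D}_i$ have to be removed from the conductance
matrix. The Jacobian matrix is easy to construct due to the block
structure, but extra manipulation of the source vector is unavoidable. The
extra terms have to be removed from the source vector as follows:

$$ \begin{bmatrix} \mathbf{A} \\ \mathbf{C} \end{bmatrix} \mathbf{x}_i^{k+1} = -\begin{bmatrix} \mathbf{f}_i^k \\ \mathbf{f}_E^k \end{bmatrix} + \begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix} \begin{bmatrix} \mathbf{x}_i^k \\ \mathbf{x}_E^k \end{bmatrix} - \begin{bmatrix} \mathbf{B} \\ \mathbf{D} \end{bmatrix} \mathbf{x}_E^k. \tag{84} $$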
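The two simplifications used in this subsection can be written out explicitly. The following is a short worked derivation, assuming only the block structure of Eq. (82) and that the external variables are held fixed during the inner iteration. It is a sketch of the reasoning, not a quotation from the thesis.

```latex
% Upper block row of Eq. (82) with x_E^{k,j+1} = x_E^{k,j}:
% the B_i terms on both sides cancel, leaving the inner NR step.
\begin{align*}
\mathbf{A}_i \mathbf{x}_i^{k,j+1} + \mathbf{B}_i \mathbf{x}_E^{k,j+1}
  &= -\mathbf{f}_i^{k,j} + \mathbf{A}_i \mathbf{x}_i^{k,j}
     + \mathbf{B}_i \mathbf{x}_E^{k,j} \\
\Longrightarrow\quad
\mathbf{A}_i \left(\mathbf{x}_i^{k,j+1} - \mathbf{x}_i^{k,j}\right)
  &= -\mathbf{f}_i^{k,j}.
\end{align*}

% Source vector of Eq. (84): subtracting the B and D columns times x_E^k
% leaves only the contribution of the inner variables.
\begin{align*}
-\begin{bmatrix} \mathbf{f}_i^k \\ \mathbf{f}_E^k \end{bmatrix}
+ \begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix}
  \begin{bmatrix} \mathbf{x}_i^k \\ \mathbf{x}_E^k \end{bmatrix}
- \begin{bmatrix} \mathbf{B} \\ \mathbf{D} \end{bmatrix} \mathbf{x}_E^k
= -\begin{bmatrix} \mathbf{f}_i^k \\ \mathbf{f}_E^k \end{bmatrix}
+ \begin{bmatrix} \mathbf{A} \\ \mathbf{C} \end{bmatrix} \mathbf{x}_i^k .
\end{align*}
```

The first result is the plain inner NR step. The second shows that the manipulated source vector depends on the inner variables only through the tall block column formed by A and C.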

6.3.2 Incremental model


Using the incremental-model approach, we can avoid the drawbacks of the
iterative model. The source vector is in a form that is optimal for inner
iterations, but the incremental model has several drawbacks of its own:

• The linear-model problem: In the iterative-model approach, linear
component values are computed only once and they contribute only to the
conductance matrix or the source vector. In the incremental-model approach,
every component in a circuit has to have an iteration model, the linear
ones too. For example, each conductance has a conductance value and a nodal
current source $j^k = g u^k$ that has to be updated in every iteration. In
the APLAC implementation, every component has to be called by using APLAC
message passing, and this mechanism slows down the computation.

• The initial-guess problem: The initial guess for the controlling voltages
$\mathbf{u}^{0,0}$ can either be calculated from the node-voltage guess
$\mathbf{v}^{0,0}$ or be specified independently of the node voltages, so
the node-voltage and controlling-voltage guesses may contradict each other.
For example, if two parallel VCCSs have the same controlling-voltage nodes
but different controlling-voltage guesses, there is no way to calculate
consistent node voltages. The increment $\Delta\mathbf{x}^0$ is then
useless, because there are no node voltages to which it could be added. In
practice, it is suggested to use zero initial conditions for both the node
voltages and the controlling voltages.

Due to these problems, the iterative model was chosen for the
implementation.

6.4 Aiding the convergence


In the following, the convergence-aiding methods in APLAC are presented.
The methods are used with the NR method and with the MLNR methods.
Three step-size limiting methods are used in the implementation:

• Maximum step-size limiting method that ensures that the NR step size
does not exceed a specified level. If the step size is larger than maximum
allowed, e.g., 1 V for voltages, the step size is limited to that level. This
improves the convergence because the linearization is a good approxima-
tion only close to the linearization point, and the method prevents too
large steps.

• Diode damping that uses a priori knowledge of the exponential diode
characteristic. The heuristic rules presented in Ref. [107] are used to
calculate a likely solution of the diode voltage. Each diode suggests a
damping-factor candidate such that the value of the controlling voltage
does not exceed the point calculated by the rules. The smallest candidate
from the diode damping and maximum step-size limiting methods is used as
the global damping factor.

• Norm reduction, a parabolic line-search method. In APLAC, if the error is
still too large after maximum step-size limiting and diode damping, a line
search is performed. The exact rules are presented in Ref. [107].
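How the global damping factor is selected from the candidates can be sketched as follows. This is a minimal illustration, not the APLAC implementation: the function names are hypothetical, and the diode-damping candidates are simply passed in as numbers because the heuristic rules of Ref. [107] are not reproduced here. The parabolic line search is likewise left out.

```python
def max_step_damping(dx, max_step=1.0):
    """Damping factor that keeps the largest step component within max_step."""
    largest = max((abs(d) for d in dx), default=0.0)
    return 1.0 if largest <= max_step else max_step / largest


def global_damping(dx, diode_candidates=(), max_step=1.0):
    """Smallest candidate among the step-size limit and the diode suggestions."""
    return min([max_step_damping(dx, max_step), *diode_candidates])


def damped_update(x, dx, alpha):
    """Apply the damped NR step x + alpha*dx component-wise."""
    return [xi + alpha * di for xi, di in zip(x, dx)]
```

For example, with an NR step of `[0.5, -4.0]` volts and a 1 V limit, the factor is 0.25, so the largest component is scaled back to exactly 1 V. A smaller diode candidate would override it.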

In addition to the step-size adjusting methods, two well-known
continuation-like methods are used:

• Source-stepping method. After an unsuccessful iteration, the values of
the independent sources are lowered according to the rules presented in
Ref. [107]. Then, if the iteration converges, the values are brought
gradually back to the original ones such that the solution of the previous
iteration is the initial guess for the next one. If the iteration does not
converge, the values are lowered again.

• Model damping (conductance stepping or Gmin stepping), where additional
conductances are placed in parallel with nonlinear elements and from every
node to the ground. After a successful iteration, the value of the
conductance is lowered and the solution is used as the initial solution
for the next analysis. This is repeated until the final solution is found.
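The source-stepping idea can be sketched on a one-node example. Everything concrete here is an assumption made for illustration: the circuit (a source in series with a resistor feeding a diode to ground), the parameter values, the function names, and above all the rule of halving the step after a failure, which stands in for the rules of Ref. [107] that are not reproduced here. Conductance stepping would follow the same pattern with a shunt conductance as the continuation parameter.

```python
import math

# Assumed illustrative circuit: source e, series resistor R, diode to ground.
IS, VT, R = 1e-14, 0.025, 1000.0


def newton(e, v0, max_iter=20, tol=1e-9):
    """Undamped NR on f(v) = (v - e)/R + IS*(exp(v/VT) - 1); returns (ok, v)."""
    v = v0
    for _ in range(max_iter):
        try:
            ex = math.exp(v / VT)
        except OverflowError:
            return False, v
        f = (v - e) / R + IS * (ex - 1.0)
        if abs(f) < tol:
            return True, v
        v -= f / (1.0 / R + IS / VT * ex)
    return False, v


def source_stepping(e_target, v0=0.0):
    """Ramp the source towards e_target, reusing each solution as next guess."""
    lam, step, v = 0.0, 1.0, v0          # lam is the fraction of the source
    while lam < 1.0:
        trial = min(1.0, lam + step)
        ok, v_new = newton(trial * e_target, v)
        if ok:
            lam, v = trial, v_new        # accept and continue from here
        else:
            step /= 2.0                  # assumed rule: halve the increment
            if step < 1e-6:
                raise RuntimeError("source stepping failed")
    return v
```

With these values, the plain iteration started from 0 V does not converge within the iteration limit for a 10 V source, whereas the stepped version reaches the operating point.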

7 Simulation examples
Three different circuits were simulated using the hierarchical DC analysis
methods: two different-sized transistor circuits and one operational amplifier
circuit.
The simulations were performed in a local area network containing three
PCs (one Pentium III 650 MHz and two AMD 800 MHz). Ethernet was used as
the connection network. The network was not dedicated to these experiments
and, therefore, the results may be slightly disturbed.
There are many different formulas [24] for computing the efficiency of the
parallelization, but what really counts for the user is the real time
(wall-clock time) that elapses after the simulation has been started.
Therefore, only the resulting wall-clock times of the parallel simulations
are presented. For the one-processor hierarchical NR method, CPU times are
presented, too.

7.1 Transistor circuits


The example circuits are analog cascade amplifiers. They have been built up
hierarchically from smaller pieces, the simple transistor amplifiers shown
in Fig. 8.

Figure 8: Transistor amplifier.

The model of the transistor Q is a simple Ebers-Moll model with APLAC's
default parameters, R = 1 kΩ, R2 = 1 Ω, and Vcc = 10 V.
With the amplifiers, two cascade chains are constructed as shown in Fig. 9.
The number of amplifiers is 540 in the moderate circuit and 1080 in the
large circuit. The chains are then divided into two or three subcircuits
according to the number of parallel computers or hierarchical analyzers.

Figure 9: Amplifier chain (Amp. 1, Amp. 2, ..., Amp. n).

First, the speed of the nonparallel hierarchical analysis is studied. A DC
sweep is performed such that the input voltage E is swept from 0 V to 5 V
in 1 V steps. Tables 1 and 2 present the CPU times and the total wall-clock
times used for the DC analysis. The DC analysis is divided into three
phases. The preprocessing phase is the reading and interpretation of the
input file, the symbolic analysis phase consists of the symbolic reordering
of the sparse matrices, and the iteration phase is the NR iteration at the
6 sweep points.
The results show that the time used in symbolic analysis is reduced
drastically as the number of subcircuits increases. With the moderate
circuit, the iteration time was also reduced, but with the large circuit
the decomposition did not improve the iteration time.

Table 1: CPU times (s) of the hierarchical analysis of the moderate
transistor circuit. The total wall-clock times are in parentheses.

Subs   Preprocessing   Symbolic   Iter   Total
0      0.22            0.24       6.88   7.34 (8.51)
2      0.23            0.08       6.78   7.09 (8.18)
3      0.23            0.04       6.09   6.36 (7.56)

Table 2: CPU times (s) of the hierarchical analysis of the large transistor
circuit. The total wall-clock times are in parentheses.

Subs   Preprocessing   Symbolic   Iter   Total
0      10.18           13.46      5.54   29.18 (30.59)
2      10.40           4.58       8.97   23.95 (25.24)
3      10.19           2.31       8.18   20.68 (21.99)

The parallel simulation runs were performed with parallel NR (NRpar),
NRNR, NRGN with LSLU (NRGN LU) and CGNE (NRGN CGNE) methods.
In the multilevel methods, J was varied between 1 and 4.
Tables 3–6 present the numbers of outer iterations in the sweep points.
Figs. 10 and 11 present the simulation times of the moderate circuit with two
and three processors, Figs. 12 and 13 those of the large circuit.
The calculated speed-ups

$$ S_1 = \frac{t_{\mathrm{serial}}}{t_{\mathrm{parallel}}} \tag{85} $$

versus serial (not hierarchical) APLAC are presented in Figs. 14 and 15. The
speed-ups of the hierarchical analysis versus the number of subcircuits are
shown in the same figures.
The improvements obtained from parallelization with respect to hierarchical
analysis,

$$ S_2 = \frac{t_{\mathrm{hierarchical}}}{t_{\mathrm{parallel}}}, \tag{86} $$

are presented in Figs. 16 and 17. The times of the parallel analysis are
compared to the times of hierarchical NR with the same number of subcircuits
as parallel processors.
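As a small worked example of these ratios, the speed-up of the nonparallel hierarchical analysis can be computed from the wall-clock times in Tables 1 and 2. The sketch assumes that the row with 0 subcircuits is the serial, non-hierarchical reference, and the function and variable names are chosen here only for illustration. The parallel times are given only graphically, so $S_1$ and $S_2$ of the parallel methods are not recomputed.

```python
def speed_up(t_reference, t_compared):
    """Ratio of the reference time to the compared time, as in Eqs. (85)-(86)."""
    return t_reference / t_compared


# Total wall-clock times (s) from Tables 1 and 2, keyed by subcircuit count.
wall_clock = {
    "moderate": {0: 8.51, 2: 8.18, 3: 7.56},
    "large": {0: 30.59, 2: 25.24, 3: 21.99},
}


def hierarchical_speed_ups(times):
    """Speed-up of each decomposition relative to the 0-subcircuit run."""
    t_serial = times[0]
    return {subs: round(speed_up(t_serial, t), 2)
            for subs, t in times.items() if subs != 0}
```

With these numbers the moderate circuit gives about 1.04 and 1.13 for two and three subcircuits, and the large circuit about 1.21 and 1.39.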
The behaviour of the error $\|\mathbf{f}\|$ of the outer iteration in the
first sweep points is presented in Figs. 18 and 19. The maximum number of
inner iterations J is 4.

Table 3: Number of outer iterations of the moderate transistor circuit with
two processors

J   NRNR            NRGN LU
0   17+8+8+8+8+8
1   11+5+5+5+5+5    11+3+3+3+3+3
2   5+3+3+3+3+3     6+2+2+2+2+2
3   6+3+3+3+3+3     5+2+2+2+2+2
4   5+3+3+3+3+3     4+2+2+2+2+2

Table 4: Number of outer iterations of the moderate transistor circuit with
three processors

J   NRNR            NRGN LU
0   16+8+8+8+8+8
1   11+5+5+5+5+5    11+3+3+3+3+3
2   5+3+3+3+3+3     6+2+2+2+2+2
3   6+3+3+3+3+3     5+2+2+2+2+2
4   5+3+3+3+3+3     4+2+2+2+2+2

Table 5: Number of outer iterations of the large transistor circuit with two
processors

J   NRNR            NRGN LU
0   17+8+8+8+8+8
1   12+5+5+5+5+5    12+3+3+3+3+3
2   6+3+3+3+3+3     6+2+2+2+2+2
3   6+3+3+3+3+3     5+2+2+2+2+2
4   5+3+3+3+3+3     4+2+2+2+2+2

Table 6: Number of outer iterations of the large transistor circuit with
three processors

J   NRNR            NRGN LU
0   16+8+8+8+8+8
1   11+5+5+5+5+5    11+3+3+3+3+3
2   5+3+3+3+3+3     6+2+2+2+2+2
3   6+3+3+3+3+3     5+2+2+2+2+2
4   5+3+3+3+3+3     4+2+2+2+2+2
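Each table entry lists the outer-iteration counts of the six sweep points joined by plus signs. To compare the methods over the whole sweep, the entries can be summed, as in this small sketch. The helper name is hypothetical, and the strings are copied from Table 3.

```python
def total_outer_iterations(entry):
    """Sum the per-sweep-point counts of an entry such as '17+8+8+8+8+8'."""
    return sum(int(count) for count in entry.split("+"))


# Entries from Table 3 (moderate circuit, two processors).
table3 = {
    "NR (J = 0)": "17+8+8+8+8+8",
    "NRNR (J = 2)": "5+3+3+3+3+3",
    "NRGN LU (J = 2)": "6+2+2+2+2+2",
}

totals = {method: total_outer_iterations(e) for method, e in table3.items()}
```

For Table 3 this gives 57 outer iterations for the NR method, 20 for NRNR, and 16 for NRGN LU with two inner iterations.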

Figure 10: Simulation times (s) of the moderate transistor circuit vs.
maximum number of inner iterations J. Two processors were used. NRNR (o),
NRGN LU (∗), and NRGN CGNE (□); 0 inner iterations corresponds to the NR
method.

Figure 11: Simulation times (s) of the moderate transistor circuit vs.
maximum number of inner iterations J. Three processors were used. NRNR (o),
NRGN LU (∗), and NRGN CGNE (□); 0 inner iterations corresponds to the NR
method.

Figure 12: Simulation times (s) of the large transistor circuit vs. maximum
number of inner iterations J. Two processors were used. NRNR (o),
NRGN LU (∗), and NRGN CGNE (□); 0 inner iterations corresponds to the NR
method.

Figure 13: Simulation times (s) of the large transistor circuit vs. maximum
number of inner iterations J. Three processors were used. NRNR (o),
NRGN LU (∗), and NRGN CGNE (□); 0 inner iterations corresponds to the NR
method.

Figure 14: Speed-up of the simulation of the moderate transistor circuit
using NRpar, NRNR (o), NRGN LU (∗), and NRGN CGNE (□) with 2 inner
iterations vs. number of processors. Speed-up of hierarchical analysis
(- - ×) vs. number of subcircuits.

Figure 15: Speed-up of the simulation of the large transistor circuit using
NRpar, NRNR (o), NRGN LU (∗), and NRGN CGNE (□) with 2 inner iterations vs.
number of processors. Speed-up of hierarchical analysis (- - ×) vs. number
of subcircuits.

Figure 16: Speed-ups of the simulation of the moderate transistor circuit
obtained from parallelization. NRpar, NRNR (o), NRGN LU (∗), and
NRGN CGNE (□) with 2 inner iterations versus number of processors with
respect to hierarchical analysis.

Figure 17: Speed-ups of the simulation of the large transistor circuit
obtained from parallelization. NRpar, NRNR (o), NRGN LU (∗), and
NRGN CGNE (□) with 2 inner iterations versus number of processors with
respect to hierarchical analysis.

Figure 18: Error norm of the outer iteration of NRpar, NRNR (o), and
NRGN (∗) methods.

Figure 19: Error norm of the outer iteration of NRpar, NRNR (o), and
NRGN (∗) methods.

7.2 Operational-amplifier circuit
The operational-amplifier circuit is a series connection of 9 inverting operational-
amplifier configurations (Fig. 20). The operational amplifiers consist of 12
BJTs and 7 resistors presented in Ref. [95, p. 438]. In the inverting configura-
tion, R = 1 kΩ. The input node is connected to ground and the output node
is floating. The supply voltages are ±2.5 V. The simulation goal was to find
the DC operating point of the circuit.

Figure 20: Inverting operational amplifier.

The CPU times of hierarchical analysis are presented in Table 7. They show
how the time of the iteration part dominates the analysis and slows down the
DC analysis.

Table 7: CPU times (ms) of hierarchical analysis of the
operational-amplifier circuit

Subs   Preprocessing   Symbolic   Iter   Total
0      27              3          390    419
3      29              2          399    429

Fig. 21 presents the behaviour of the error norm (J = 3). As we can see,
the NRGN iteration diverges after 4 outer iterations. This is the point
where the step-size limiting methods fail and the source-stepping method
has to be used.

Figure 21: Error norm of the outer iteration of NRpar, NRNR (o), and
NRGN (∗) methods.

7.3 Discussion
The results show that the nonparallel hierarchical analysis was able to
reduce the time used for the symbolic analysis, but only in one example was
it able to improve the iteration part of the analysis. In the
operational-amplifier example, the speed-up obtained from the symbolic
analysis is totally buried by the iteration part. These results are in
agreement with the discussion in Ref. [69].
The speed-up of the parallel hierarchical NR method was modest. The large
transistor circuit example with two processors was even slower than the
nonparallel simulation. The MLNR methods slightly improve the situation.
However, the speed-ups of the MLNR methods are not as good as Refs.
[28, 113] lead us to expect. It may be that the number of computers used
was so small that the reduction of the network communication no longer
matters. The nonideal implementation of the inner iterations also affects
the results.
The results also show that a suitable default value for the maximum number
of inner iterations, J, is two. This is in agreement with the experimental
results of Refs. [125, 126].
In the transistor examples, the GN iteration reduces the number of outer
iterations, but the operational-amplifier example shows that the NRGN
method is not always superior. It seems that solving the inner problem in
the LS sense brings the solution to a state from which the outer iteration
is no longer able to recover.
As a final remark, we may say that the conventional NR method is quite
fast and robust, but extra inner iterations may sometimes improve the
nonlinear iteration.

8 Conclusion
The parallel hierarchical approach to DC analysis was presented. In
hierarchical analysis, the circuit is decomposed into subcircuits. This
allows the utilization of hierarchical analysis methods, like NR iteration
with hierarchical LU factorization.
The implementation of DC analysis in a network of workstations (NoW), where
low communication is needed, was studied. Therefore, multilevel methods
were also considered and implemented in APLAC.
The main emphasis was on aiding convergence, which is the critical part of
DC analysis. If a method does not converge, it cannot be fast, either.
Two new MLNR methods were presented. With the new NRNR and NRGN methods,
convergence-aiding methods can be applied. In addition, quadratic local
convergence was proven for both methods.
The simulation results, however, showed that with a small number of
processors, the results were not spectacular and the multilevel methods
bring about only small improvements. Network communication, which the MLNR
methods reduce, was not such a bottleneck as presented in the literature.
The study of the parallel methods can be applied in other analyses, too,
and the work done serves as an introduction to parallel circuit simulation
in APLAC.

References
[1] “Message Passing Interface Forum. MPI: A Message-Passing Interface
Standard,” Int. J. Supercomputer Applications, vol. 8, no. 3/4.

[2] N. R. Aluru and J. White, “A Multi-level Newton Method for Static and
Fundamental Frequency Analysis of Electromechanical Systems,”
International Conference on Simulation of Semiconductor Processes and
Devices SISPAD’97, pp. 125–128, 1997.

[3] N. R. Aluru and J. White, “A Multilevel Newton Method for Mixed-Energy
Domain Simulation of MEMS,” Journal of Microelectromechanical Systems,
vol. 8, pp. 299–308, September 1999.

[4] D. Atef, A. Salem, and H. Baraka, “An Architecture of Distributed
Co-Simulation,” 42nd Midwest Symposium on Circuits and Systems, vol. 2,
pp. 855–858, 1999.

[5] A. Björck and J. Y. Yuan, “Preconditioners for Least Squares Problems
by LU Factorization,” Electronic Transactions on Numerical Analysis,
vol. 8, pp. 26–35, 1999.

[6] W. Bomhof, Iterative and Parallel Methods for Linear Systems with
Applications in Circuit Simulation. PhD thesis, Utrecht University, 2001.

[7] W. Bomhof and H. A. van der Vorst, “A Parallel Linear System Solver for
Circuit Simulation Problems,” Numer. Linear Algebra Appl., vol. 7,
pp. 649–665, 2000.

[8] J. Borchhardt, F. Grund, and D. Horn, “Parallelized Numerical Methods
for Large Systems of Differential-Algebraic Equations in Industrial
Applications,” Surveys on Mathematics for Industry, vol. 8, pp. 201–211,
1999.

[9] J. Borchhardt, F. Grund, D. Horn, and M. Uhle, MAGNUS – Mehrstufige
Analyse Großer Netzwerke und Systeme, Tech. Report 9, WIAS, Berlin, 1994.

[10] D. Bukat, G. Centkowski, and J. Ogrodzki, “Optima 1.1 – a Hierarchical
Decomposition Based Analyzer Including User Defined Models,” Proceedings
of European Conference on Circuit Theory and Design, pp. 360–370, 1991.

[11] M.-C. Chang and I. Hajj, “iPRIDE: A Parallel Integrated Circuit
Simulator Using Direct Method,” Digest of Technical Papers of ICCAD’88,
pp. 304–307, 1988.

[12] C.-C. Chen and Y. H. Hu, “Parallel LU Factorization for Circuit
Simulation on a MIMD Computer,” Proceedings of the 1988 IEEE International
Conference on Computer Design ICCD’88, pp. 129–132, 1988.

[13] C.-C. Chen and Y. H. Hu, “A Practical Scheduling Algorithm for
Parallel LU Factorization in Circuit Simulation,” Proc. Int. Symp. Circuits
and Systems, vol. 3, pp. 1788–1791, 1989.

[14] R. M. M. Chen, W. C. Siu, and A. M. Layfield, “Running SPICE in
Parallel,” Proc. Int. Symp. Circuits and Systems, vol. 2, pp. 880–883,
1991.

[15] L. O. Chua and L.-K. Chen, “Diakoptic and Generalized Hybrid
Analysis,” IEEE Trans. Circuits Syst., vol. CAS-23, pp. 694–705, December
1976.

[16] P.-Y. Chung and I. N. Hajj, “Parallel Solution of Sparse Linear
Systems on a Vector Multiprocessor Computer,” Proc. Int. Symp. Circuits and
Systems, vol. 2, pp. 1577–1580, 1990.

[17] C. Cocchi, A. Benedetti, and Z. M. Kovàcs-V., “A New Subcircuit
Ordering Algorithm for a Multilevel Circuit Simulator,” Proceedings of
European Conference on Circuit Theory and Design, pp. 1059–1062, 1995.

[18] P. F. Cox, R. G. Burch, D. E. Hocevar, P. Yang, and B. D. Epler,
“Direct Circuit Simulation Algorithms for Parallel Processing,” IEEE
Transactions on Computer-Aided Design, vol. 10, pp. 714–725, June 1991.

[19] P. F. Cox, R. G. Burch, P. Yang, and D. E. Hocevar, “New Implicit
Integration Method for Efficient Latency Exploitation in Circuit
Simulation,” IEEE Transactions on Computer-Aided Design, vol. 8,
pp. 1051–1064, October 1989.

[20] J. W. Demmel, J. R. Gilbert, and X. S. Li, “An Asynchronous Parallel
Supernodal Algorithm for Sparse Gaussian Elimination,” SIAM Journal on
Matrix Analysis and Applications, vol. 20, no. 3, pp. 915–952, 1999.

[21] A.-C. Deng, “On Network Partitioning Algorithm of Large-Scale CMOS
Circuits,” IEEE Trans. Circuits Syst., vol. 36, pp. 294–299, February
1989.

[22] M. P. Desai and I. N. Hajj, “On the Convergence of Block Relaxation
Methods for Circuit Simulation,” IEEE Trans. Circuits Syst., vol. 36,
pp. 948–958, July 1989.

[23] V. B. Dmitriev-Zdorov, N. I. Merezin, V. P. Popov, and R. A. Dougal,
“Stability of Real-Time Modular Simulation of Analog System,” Proc. of the
7th Workshop on Computers in Power Electronics (COMPEL), pp. 263–267, 2000.

[24] J. J. Dongarra, I. S. Duff, D. C. Sorensen, and H. A. Van der Vorst,
Numerical Linear Algebra for High-Performance Computers. Philadelphia:
SIAM, 1998.

[25] K.-M. Eickhoff and W. L. Engl, “Levelized Incomplete LU Factorization
and Its Application to Large-Scale Circuit Simulation,” IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems, vol. 14,
pp. 720–727, June 1995.

[26] W. L. Engl, R. Laur, and H. K. Dirks, “MEDUSA – A Simulator for
Modular Circuits,” IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, vol. CAD-1, pp. 85–93, April 1982.

[27] N. Fröhlich, V. Glöckel, and J. Fleischmann, “A New Partitioning
Method for Parallel Simulation of VLSI Circuits on Transistor Level,”
Proceedings of the Design, Automation and Test in Europe Conference and
Exhibition, pp. 679–684, 2000.

[28] N. Fröhlich, B. M. Riess, U. A. Wever, and Q. Zheng, “A New Approach
for Parallel Simulation of VLSI Circuits on a Transistor Level,” IEEE
Trans. Circuits Syst. I, vol. 45, pp. 601–613, June 1998.

[29] N. Fröhlich, R. Schlagenhaft, and J. Fleischmann, “A New Approach for
Partitioning VLSI Circuits on Transistor Level,” Proceedings of 11th
Workshop on Parallel and Distributed Simulation, pp. 64–67, 1997.

[30] D. A. Gates, P. K., and D. O. Pederson, “Mixed-Level Circuit and
Device Simulation on a Distributed-Memory Multicomputer,” Proceedings of
IEEE 1993 Custom Integrated Circuits Conference, pp. 851–854, 1993.

[31] H. Gaunholt, P. Heikkilä, K. Mannersalo, V. Porra, and M. Valtonen,
“Gyrator Transformation – A Better Way for Modified Nodal Approach,”
Proceedings of European Conference on Circuit Theory and Design, vol. 2,
pp. 864–872, July 1991.

[32] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and
V. Sunderam, PVM: Parallel Virtual Machine, A Users’ Guide and Tutorial for
Networked Parallel Computing. The MIT Press, 1994.

[33] G. Guardabassi and A. Sangiovanni-Vincentelli, “A Two Level Algorithm
for Tearing,” IEEE Trans. Circuits Syst., vol. CAS-23, pp. 783–791,
December 1976.

[34] K. Hachiya, T. Saito, T. Nakata, and N. Tanabe, “Enhancement of
Parallelism for Tearing-based Circuit Simulation,” Proceedings of the Asia
and South Pacific Design Automation Conference, pp. 493–498, 1997.

[35] I. N. Hajj, “Sparsity Considerations in Network Solution by Tearing,”
IEEE Trans. Circuits Syst., vol. CAS-27, pp. 357–366, May 1980.

[36] H. H. Happ, Diakoptics and Networks. New York: Academic Press, 1971.

[37] M. M. Hassoun and P. M. Lin, “An Efficient Partitioning Algorithm for
Large-Scale Circuits,” Proc. Int. Symp. Circuits and Systems, vol. 3,
pp. 2405–2408, 1990.

[38] P. Heikkilä, Object-Oriented Approach to Numerical Circuit Analysis.
Doctoral thesis, Helsinki University of Technology, Department of
Electrical Engineering, 1992.

[39] C.-W. Ho, A. E. Ruehli, and P. A. Brennan, “The Modified Nodal
Approach to Network Analysis,” IEEE Trans. Circuits Syst., vol. 22,
pp. 504–509, June 1975.

[40] M. Honkala, V. Karanko, and J. Roos, “Improving the Convergence of
Combined Newton–Raphson and Gauss–Newton Multilevel Iteration Method,”
Proc. Int. Symp. Circuits and Systems, 2002. Accepted to be published.

[41] M. Honkala, J. Roos, and M. Valtonen, “New Multilevel Newton–Raphson
Method for Parallel Circuit Simulation,” Proceedings of European
Conference on Circuit Theory and Design, vol. 2, (Espoo, Finland),
pp. 113–116, 2001.

[42] G. G. Hung, K. Gallivan, and R. Saleh, “Parallel Circuit Simulation
Using Hierarchical Relaxation,” Proceedings of 27th ACM/IEEE Design
Automation Conference, pp. 394–399, 1990.

[43] G. G. Hung, K. Gallivan, and R. Saleh, “Parallel Circuit Simulation
Based on Nonlinear Relaxation Methods,” Proc. Int. Symp. Circuits and
Systems, vol. 4, pp. 2284–2287, 1991.

[44] G.-G. Hung, Y.-C. Wen, K. A. Gallivan, and R. A. Saleh, “Improving the
Performance of Parallel Relaxation-Based Circuit Simulation,” IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems,
vol. 12, pp. 1762–1774, November 1993.

[45] X. D. Jia and R. M. M. Chen, “A New Matrix Solution Technique for
Large-Scale Circuit Simulation on Multi-processor System,” Proceedings of
1994 IEEE Region 10’s Ninth Annual International Conference. Theme:
Frontiers Computer Technology (TENCON’94), vol. 2, pp. 832–836, 1994.

[46] X. D. Jia, R. M. M. Chen, and A. M. Layfield, “Circuit Partitioning
for Multiprocessor Spice,” Proceedings of IEEE Region 10 Conference on
20000 Computer, Communication, Control and Power Engineering (TENCON’93),
pp. 1186–1189, 1993.

[47] T. Kage, F. Kawafuji, and J. Niitsuma, “A Circuit Partitioning
Approach for Parallel Circuit Simulation,” IEICE Trans. Fundamentals,
vol. E77-A, pp. 461–466, March 1994.

[48] V. Karanko and M. Honkala, “Least Squares Solution of Nearly Square
Overdetermined Sparse Linear Systems,” Proc. Int. Symp. Circuits and
Systems, 2002.

[49] T. Kato, H. Minato, and M. Tanaka, “Parallel Analysis of a Power
Electronic Circuit,” Proc. of Power Electronics and Motion Conference
(PIEMC), vol. 2, pp. 654–659, 2000.

[50] T. J. Kazmierski and Y. Bouchlaghem, “Hierarchical Solution of Linear
Algebraic Equations on Transputer Trees for Circuit Simulation,”
Proceedings of European Conference on Circuit Theory and Design,
pp. 42–45, 1989.

[51] J. Keller, T. Rauber, and B. Rederlechner, “Conservative Circuit
Simulation on Shared-Memory Multiprocessors,” Proc. Tenth Workshop on
Parallel and Distributed Simulation (PADS’96), pp. 125–134, 1996.

[52] C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations.
Philadelphia: SIAM, 1995.

[53] U. Kleis, O. Wallat, U. Wever, and Q. Zheng, “Domain Decomposition
Methods for Circuit Simulation,” Proceedings of the 8th Workshop on
Parallel and Distributed Simulation, pp. 183–184, 1994.

[54] V. Klinger, “DiPaCS: A New Concept for Parallel Circuit Simulation,”
Proceedings of the 28th Annual Simulation Symposium, pp. 32–41, 1995.

[55] D. P. Koester, Parallel Block-Diagonal-Bordered Sparse Linear Solvers
for Power System Applications. PhD thesis, Syracuse University, October
1995.

[56] Z. M. Kovàcs-V, A. Benedetti, S. Graffi, and G. Masetti, “A New
Multilevel Simulator for MOS Integrated Circuits,” International Workshop
on VLSI Process and Device Modelling, pp. 160–161, 1993.

[57] Z. M. Kovàcs-V. and A. Benedetti, “MUSIC: A Novel MUltilevel
Simulator for MOS Integrated Circuits,” Proceedings of European Conference
on Circuit Theory and Design, pp. 559–600, 1993.

[58] K. S. Kundert, “Sparse Matrix Techniques,” Circuit Analysis,
Simulation and Design (A. E. Ruehli, ed.), pp. 281–324, Elsevier Science
Publishers B. V., 1986.

[59] E. Lelarasmee, A. E. Ruehli, and A. L. Sangiovanni-Vincentelli, “The
Waveform Relaxation Method for Time-domain Analysis of Large Scale
Integrated Circuits,” IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, vol. 1, pp. 131–145, July 1982.

[60] F. Li and P.-Y. Woo, “A New Concept the ’Virtual Circuit’ and Its
Application in Large-Scale Network Analysis with Tearing,” International
Journal of Circuit Theory and Applications, vol. 27, pp. 283–291, 1999.

[61] P. Linardis, K. G. Nichols, and E. J. Zaluska, “Network Partitioning
and Latency Exploitation in Time-Domain Analysis of Nonlinear Electronic
Circuits,” Proc. Int. Symp. Circuits and Systems, pp. 510–514, 1978.

[62] A. Mahmood, Y. Chu, and T. Sobh, “Parallel Sparse-Matrix Solution for
Direct Circuit Simulation on a Transputer Array,” IEE Proc. Circuits
Devices Syst., vol. 144, pp. 335–342, December 1996.

[63] K. Mayaram, P. Yang, J. Chern, R. Burch, L. Arledge, and P. Cox, “A
Parallel Block-Diagonal Preconditioned Conjugate-Gradient Solution
Algorithm for Circuit and Device Simulations,” Digest of Technical Papers.
IEEE International Conference on Computer-Aided Design (ICCAD-90),
pp. 446–449, 1990.

[64] K. Mayaram and D. O. Pederson, “Coupling Algorithms for Mixed-Level
Circuit and Device Simulation,” IEEE Transactions on Computer-Aided
Design, vol. 11, pp. 1003–1012, August 1992.

[65] M. Nishigaki, N. Tanaka, and H. Asai, “Hierarchical Decomposition for
Circuit Simulation by Direct Method,” The Transactions of the IEICE,
vol. E 73, pp. 1957–1963, December 1990.

[66] M. Nishigaki, N. Tanaka, and H. Asai, “Hierarchical Decomposition
System and Its Availability for Network Solution,” Proceedings of IEEE
International Symposium on Circuits and Systems, pp. 884–887, 1991.

[67] M. Nitescu and F. Constantinescu, “The Efficiency of Some Numerical
Methods for Hierarchical Circuit Analysis,” Rev. Roum. Sci.
Techn.-Electrotechn. et Energ., vol. 42, pp. 345–353, 1997.

[68] P. Odent and H. L. Claesen, De Man, “Combined Waveform Relaxation
– Waveform Relaxation Newton Algorithm for efficient parallel Circuit
Simulation,” Proceedings of the European Design Automation Confer-
ence, pp. 244–248, 1990.

[69] J. Ogrodzki, Circuit Simulation Methods and Algorithms. CRC Press,


1994.

[70] E. Pajarre, T. Ritoniemi, and H. Tenhunen, “PAR-APLAC: Parallel


Circuit Analysis and Optimization,” Proceedings of Design Automation
Conference (EURO-DAC’92), pp. 584–589, 1992.

[71] L. Peterson and S. Mattisson, “The Design and Implementation of a Con-


current Circuit Simulation Program for Multicomputers,” IEEE Trans-
actions on Computer-Aided Design of Integrated Circuits and Systems,
vol. 12, pp. 1004–1014, July 1993.

[72] L. Peterson and S. Mattisson, “Circuit Partitioning and Iteration Scheme


for Waveform Relaxation on Multicomputers,” Proc. Int. Symp. Circuits
and Systems, vol. 1, pp. 570–573, 1989.

[73] L. Peterson and S. Mattisson, “Dynamic Partitioning for Concurrent


Waveform Relaxation-Based Circuit Simulation,” Proc. Int. Symp. Cir-
cuits and Systems, vol. 3, pp. 1639–1642, 1993.

[74] C. R. Pon, R. Saleh, and T. Kwasniewski, “Distributed Circuit Simula-
tion Using Waveform Relaxation in a Slotted-Ring Architecture,” IEEE
Region 10’s Ninth Annual International Conference. Theme: Frontiers
of Computer Technology (TENCON’94), vol. 2, pp. 545–548, 1994.

[75] N. B. Rabbat and H. Y. Hsieh, “A Latent Macromodular Approach to
Large-scale Sparse Networks,” Proc. Int. Symp. Circuits and Systems,
pp. 271–274, 1976.

[76] N. B. Rabbat and H. Y. Hsieh, “Concepts of Latency in the Time-
Domain Solution of Nonlinear Differential Equations,” Proc. Int. Symp.
Circuits and Systems, pp. 813–825, 1978.

[77] N. B. G. Rabbat, A. L. Sangiovanni-Vincentelli, and H. Y. Hsieh, “A
Multilevel Newton Algorithm with Macromodeling and Latency for the
Analysis of Large-Scale Nonlinear Circuits in the Time Domain,” IEEE
Trans. Circuits Syst., vol. CAS-26, pp. 733–741, September 1979.

[78] N. B. Rabbat and H. Y. Hsieh, “A Latent Macromodular Approach to
Large Scale Sparse Networks,” IEEE Trans. Circuits Syst. I, vol. CAS-
23, pp. 745–752, December 1976.

[79] C. V. Ramamoorthy and V. Vij, “CM-CIM: A Parallel Circuit Simulator
on a Distributed Memory Multiprocessor,” Proceedings of 7th Interna-
tional Conference on VLSI Design, pp. 39–44, January 1994.

[80] D. L. Rhodes and A. Gerasoulis, “Scalable Parallelization of Harmonic
Balance Simulation,” Lecture Notes in Computer Science, Irregular Con-
ference 99, 1999.

[81] D. L. Rhodes and A. Gerasoulis, “A Scheduling Approach to Parallel
Harmonic Balance Simulation,” Concurrency: Practice and Experience,
vol. 12, pp. 175–187, June 2000.

[82] D. L. Rhodes and B. S. Perlman, “Parallel Computation for Microwave
Circuit Simulation,” IEEE Transactions on Microwave Theory and Tech-
niques, vol. 45, pp. 587–592, May 1997.

[83] V. Rizzoli, F. Mastri, and D. Masotti, “A Hierarchical Harmonic-Balance
Technique for the Efficient Simulation of Large Size Nonlinear Microwave
Circuits,” Proc. of 25th European Microwave Conf. (Bologna), pp. 615–
619, 1995.

[84] R. A. Rohrer, “Circuit Partitioning Simplified,” IEEE Trans. Circuits
Syst., vol. 35, pp. 2–5, January 1988.

[85] F. M. Rotella, Mixed Circuit and Device Simulation for Analysis, Design,
and Optimization of Opto-electronic, Radio Frequency, and High Speed
Semiconductor Devices. PhD thesis, Stanford University, April 2000.

[86] A. E. Ruehli, N. B. Rabbat, and H. Y. Hsieh, “Macromodular Latent So-
lution of Digital Networks Including Interconnections,” Proc. Int. Symp.
Circuits and Systems, pp. 515–521, 1978.

[87] Y. Saad and M. H. Schultz, “GMRES: A Generalized Minimal Residual
Algorithm for Solving Nonsymmetric Linear Systems,” SIAM Journal
on Scientific and Statistical Computing, vol. 7, pp. 856–869, 1986.

[88] P. Sadayappan and V. Visvanathan, “Circuit Simulation on Shared-
Memory Multiprocessors,” IEEE Transactions on Computers, vol. 37,
pp. 1634–1642, December 1988.

[89] R. Saleh, S.-J. Jou, and A. R. Newton, Mixed-Mode Simulation and
Analog Multilevel Simulation. Kluwer Academic Publishers, 1994.

[90] R. A. Saleh, K. A. Gallivan, M.-C. Chang, I. N. Hajj, D. Smart, and T. N.
Trick, “Parallel Circuit Simulation on Supercomputers,” Proceedings of
the IEEE, vol. 77, pp. 1915–1931, December 1989.

[91] R. A. Saleh and J. K. White, “Accelerating Relaxation Algorithms for
Circuit Simulation Using Waveform-Newton and Step-Size Refinement,”
IEEE Transactions on Computer-Aided Design, vol. 9, pp. 951–958,
September 1990.

[92] A. Sangiovanni-Vincentelli, L.-K. Chen, and L. O. Chua, “An Efficient
Heuristic Cluster Algorithm for Tearing Large-Scale Networks,” IEEE
Trans. Circuits Syst., vol. CAS-24, pp. 709–717, December 1977.

[93] A. Sangiovanni-Vincentelli, L.-K. Chen, and L. O. Chua, “Three
Decomposition-Based Solution Methods for Solving a Large System of
Linear Equations,” Proc. Int. Symp. Circuits and Systems, pp. 582–586,
1978.

[94] H. R. Schwarz, Numerical Analysis: A Comprehensive Introduction.
Chichester, New York, Brisbane, Toronto, Singapore: John Wiley &
Sons, 1988.

[95] A. S. Sedra and K. C. Smith, Microelectronic Circuits. Oxford
University Press, 1991.

[96] S. D. Senturia, N. Aluru, and J. White, “Simulating the Behavior of
MEMS Devices: Computational Methods and Needs,” IEEE Computa-
tional Science and Engineering, vol. 4, pp. 30–43, January 1997.

[97] S. Skelboe, “Parallel Algorithm for a Direct Circuit Simulator,” Proceed-
ings of European Conference on Circuit Theory and Design, pp. 314–323,
1991.

[98] C. P. Soto, R. Saleh, and T. Kwasniewski, “Time Warping-Waveform Re-
laxation in a Distributed Circuit Simulation Environment,” 38th Midwest
Symposium on Circuits and Systems, vol. 1, pp. 338–341, 1996.

[99] G. N. Stenbakken and J. A. Starzyk, “Diakoptic and large change sen-
sitivity analysis,” IEE Proceedings-G Circuits, Devices and Systems,
vol. 139, no. 1, pp. 625–630, 1992.

[100] R. Suda, “Large scale circuit analysis by preconditioned relaxation meth-
ods,” Proc. of Matrix Analysis and Parallel Computing (PCG ’94),
pp. 189–197, April 1994.

[101] R. Suda, New Iterative Linear Solvers for Parallel Circuit Simulation.
PhD thesis, Department of Information Sciences, University of Tokyo,
1996.

[102] R. Suda and Y. Oyanagi, “Implementation of sparta, a Highly Paral-
lel Circuit Simulator by the Preconditioned Jacobi Method, on a Dis-
tributed Memory Machine,” Proc. International Conference on Super-
computing, (Barcelona), pp. 209–217, 1995.

[103] N. Tanaka and H. Asai, “Large Scale Circuit Simulation System with
Dedicated Parallel Processor SMASH,” The Transactions of the IEICE,
vol. E 73, pp. 1957–1963, December 1990.

[104] N. Tanaka and H. Asai, “Architecture for Simulation System with Con-
sideration of Circuit Partition,” Proceedings of IEEE International Sym-
posium on Circuits and Systems, pp. 2689–2692, 1991.

[105] L. Trajković, R. C. Melville, and S.-C. Fang, “Finding DC Operat-
ing Points of Transistor Circuits Using Homotopy Methods,” Proc. Int.
Symp. Circuits and Systems, pp. 758–761, 1991.

[106] J. A. Trotter and P. Agrawal, “Multiprocessor Architecture for Circuit
Simulation,” Proceedings of International Conference on Computer De-
sign: VLSI in Computers and Processors (ICCD’91), pp. 621–625, 1991.

[107] M. Valtonen, P. Heikkilä, H. Jokinen, and T. Veijola, “APLAC – Object-
oriented Circuit Simulator and Design Tool,” Low-power HF Microelec-
tronics: a Unified Approach (G. A. S. Machado, ed.), pp. 333–372, IEE,
1996.

[108] C. M. Vidallon, “Multipurpose Asynchronous Parallel Process Circuit
Solver,” Proceedings of European Conference on Circuit Theory and De-
sign, pp. 1161–1170, 1991.

[109] M. Vlach, “LU Decomposition Algorithms for Parallel and Vector Com-
putation,” Analog Methods for Computer-Aided Circuit Analysis and De-
sign (T. Ozawa, ed.), pp. 37–64, Marcel Dekker Inc., 1988.

[110] S. Wang and A.-C. Deng, “Delivering a Full-Chip Hierarchical Circuit
Simulation Analysis Solution for Nanometer Design,” 2001.

[111] E. Wehrhahn, “Hierarchical Circuit Analysis,” Proc. Int. Symp. Circuits
and Systems, pp. 701–704, 1989.

[112] Y.-C. Wen, K. Gallivan, and R. Saleh, “Improving Parallel Circuit Simu-
lation using High-Level Waveforms,” Proc. Int. Symp. Circuits and Sys-
tems, vol. 1, pp. 728–731, 1995.

[113] U. Wever and Q. Zheng, “Parallel Transient Analysis for Circuit Simu-
lation,” Proceedings of the 29th Annual Hawaii International Conference
on System Sciences, pp. 442–447, 1996.

[114] F. F. Wu, “Solution of Large-Scale Networks by Tearing,” IEEE Trans.
Circuits Syst., vol. CAS-23, pp. 706–713, December 1976.

[115] K. Y. Wu, P. K. H. Ng, X. D. Jia, R. M. M. Chen, and A. M. Lay-
field, “Performance Tuning of a Multiprocessor Sparse Matrix Equation
Solver,” Proceedings of the Twenty-Eighth Hawaii International Confer-
ence on System Sciences, pp. 4–13, 1995.

[116] K. Y. Wu, P. K. H. Ng, X. D. Jia, R. M. M. Chen, and A. M. Layfield,
“A Parallel Direct Method Circuit Simulator Based on Sparse Matrix
Partitioning,” Computers & Electrical Engineering, vol. 24, pp. 385–404,
1998.

[117] E. Z. Xia and R. A. Saleh, “Parallel Waveform-Newton Algorithms
for Circuit Simulation,” IEEE Transactions on Computer-Aided Design,
vol. 11, pp. 432–442, April 1992.

[118] F. Yamamoto and S. Takashi, “Vectorized LU Decomposition Al-
gorithms for Large-Scale Circuit Simulation,” IEEE Transactions on
Computer-Aided Design, vol. CAD-4, pp. 232–239, July 1985.

[119] G.-C. Yang, “PARASPICE: A Parallel Circuit Simulator for Shared-
Memory Multiprocessors,” Proceedings of 27th ACM/IEEE Design Au-
tomation Conference, pp. 400–405, 1990.

[120] H. R. Yeager and R. W. Dutton, “Improvement in Norm-Reducing New-
ton Methods for Circuit Simulation,” IEEE Transactions on Computer-
Aided Design, vol. 8, pp. 538–546, May 1989.

[121] D. C. Yeh and V. B. Rao, “Partitioning Issues in Circuit Simulation
on Multiprocessors,” IEEE International Conference on Computer-Aided
Design (ICCAD-91), Digest of Technical Papers, pp. 300–303, 1988.

[122] H. Yoshida, S. Kumagai, I. Shirakawa, and S. Kodama, “A Parallel Im-
plementation of Large-Scale Circuit Simulation,” Proc. Int. Symp. Cir-
cuits and Systems, vol. 1, pp. 321–324, 1988.

[123] G. Zanghirati, “Global convergence of nonmonotone strategies in parallel
methods for block-bordered nonlinear systems,” J. Appl. Math. Comp.,
vol. 107, pp. 137–168, January 2000.

[124] A. I. Zecevic and N. Gacic, “A Partitioning Algorithm for the Parallel
Solution of Differential-Algebraic Equations by Waveform Relaxation,”
IEEE Trans. Circuits Syst. I, vol. 46, pp. 421–434, April 1999.

[125] X. Zhang, “Dynamic and Static Load Balancing for Block Bordered
System Circuit Equations on Multiprocessors,” IEEE Transactions on
Computer-Aided Design, vol. 11, pp. 1086–1094, September 1992.

[126] X. Zhang, R. H. Byrd, and R. B. Schnabel, “Parallel Methods for Solv-
ing Nonlinear Block Bordered System of Equations,” SIAM Journal on
Scientific and Statistical Computing, vol. 13, pp. 841–859, July 1992.

[127] D. J. Zukowski and T. A. Johnson, “Efficient Parallel Circuit Simula-
tion Using Bounded-Chaotic Relaxation,” Proc. Int. Symp. Circuits and
Systems, vol. 2, pp. 911–914, 1992.
