Вы находитесь на странице: 1из 14

616 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO.

5, OCTOBER 2001

On Gate Level Power Optimization Using


Dual-Supply Voltages
Chunhong Chen, Member, IEEE, Ankur Srivastava, Student Member, IEEE, and Majid Sarrafzadeh, Fellow, IEEE

Abstract—In this paper, we present an approach for applying the subthreshold current [9], [20]. In either case, an implicit as-
two supply voltages to optimize power in CMOS digital circuits sumption is that the reduced supply voltages are uniformly ap-
under the timing constraints. Given a technology-mapped net- plied to all logic modules. Instead, the entire circuit performance
work, we first analyze the power/delay model and the timing slack
distribution in the network. Then a new strategy is developed does not necessarily get worse if the reduced supply voltages are
for timing-constrained optimization issues by making full use of applied only to the logic modules on the noncritical paths. The
slacks. Based on this strategy, the power reduction is translated reason is that these modules typically have high timing slack
into the polynomial-time-solvable maximal-weighted-indepen- which may allow the increase of their delay without violating
dent-set problem on transitive graphs. Since different supply the timing constraints. Experience shows that, for most circuits,
voltages used in the circuit lead to totally different power con-
sumption, we propose a fast heuristic approach to predict the logic modules (or gates, at logic level) on critical paths only ac-
optimum dual-supply voltages by looking at the lower bound of count for a small portion of all modules. In Fig. 1, for example,
power consumption in the given circuit. To deal with the possible we plot the average distribution of gates with different slack for
power penalty due to the level converters at the interface of 16 MCNC91 benchmarks after technology mapping (note that
different supply voltages, we use a “constrained F-M” algorithm the slack value has been normalized to the longest path delay). It
to minimize the number of level converters. We have implemented
our approach under SIS environment. Experiment shows that the can be seen from this figure that, the number of gates on critical
resulting lower bound of power is tight for most circuits and that paths (i.e., with zero or close-to-zero slack) accounts for only
the predicted “optimum” supply voltages are exactly or very close about 14% of total gates, while more than 60% of gates have
to the best choice of actual ones. The total power saving of up to their slack larger than 0.2. This potentially provides much room
26% (average of about 20%) is achieved without degrading the for power reduction using the reduced supply voltage.
circuit performance, compared to the average power improve-
ment of about 7% by gate sizing technique based on a standard In general, power reduction technique with the reduced
cell library. Our technique provides the power-delay tradeoff voltage (or variable voltage) is called voltage scaling. De-
by specifying different timing constraints in circuits for power pending on how many supply voltages of different value are
optimization. available, voltage scaling may be classified as multiple voltage
Index Terms—Dual-supply voltages, gate level, maximal- approach or dual-voltage approach. Prior work on multiple
weighted-independent-set, power optimization, timing con- voltage approach includes [4], [12], [13], and [25]. Most of
straints. them basically focus on the scheduling problem for low power
at the behavioral level. For example, [12] proposes an optimal
I. INTRODUCTION scheduling algorithm to reduce The systems power while
meeting the timing constraint. In [13], a dynamic programming

W ITH the increasing demand for low-power applications,


power optimization has been a major goal in designing
digital circuits. Since the dynamic power dissipation of CMOS
technique is used to minimize the average energy consumption.
However, there are some practical hurdles that need to be over-
come before the use of multiple supply voltages. Among them
circuits is proportional to the square of supply voltage, reducing are constrained physical design problems and the resulting
the supply voltage promises to be one of the most effective area/delay/power penalty due to level converters (LCs), which
ways to achieve low power. Unfortunately, the reduced supply are required at the interface of different voltages. In contrast,
voltage leads to the speed loss of the logic modules. This sit- these issues can be eased if only two supply voltages are used
uation deteriorates especially as the supply voltage approaches [2], [3], [5], [15], [26]. In [3], for example, a layout scheme
the threshold voltage of the device [1]. To compensate for the using two voltages is discussed together with its application to
reduced speed, one can use parallel and pipelined architectures a media processor chip design. In [15], dual-supply voltages
with the expensive hardware overhead [8]. Alternatively, the were used successfully to design a chip of MPEG4 codec
speed degradation problem can be overcome by using the low- core at Toshiba Corporation. These show the feasibility of
threshold voltage which, however, results in a rapid increase in implementing dual-voltage circuits.
In [2], Usami and Horowitz first proposed a dual-voltage ap-
Manuscript received August 4, 1999; revised February 13, 2001. This work proach based on the so-called cluster-voltage scaling (CVS).
was supported in part by the National Science Foundation (NSF) under Grant
MIP-9527389, by DARPA, and by a Grant from Motorola. The idea is to use the depth-first search from primary outputs
C. Chen is with the Department of Electrical and Computer Engineering, Uni- to find gates which may operate at low supply voltage without
versity of Windsor, ON, Canada. violating the timing constraints. As a result, two clusters are
A. Srivastava and M. Sarrafzadeh are with the Computer Science Department,
University of California at Los Angeles, Los Angeles, CA 90095-1596 USA. created. One is the cluster consisting of gates with low supply
Publisher Item Identifier S 1063-8210(01)03643-5. voltage ( ) on the side of primary outputs, and the other is the
1063–8210/01$10.00 © 2001 IEEE
CHEN et al.: ON GATE LEVEL POWER OPTIMIZATION USING DUAL-SUPPLY VOLTAGES 617

mance of the circuit, we relate the power optimization to the


maximal-weighted-independent set (MWIS) problem and pro-
pose a fast heuristic algorithm to predict the optimum supply
voltage [21]. Then, based on predicted supply voltages, we de-
velop an effective algorithm to allow as many gates as pos-
sible working at . Considering the possible power penalty
of the LCs at the interface of and , we target min-
imizing the number of LCs by using what we call the “con-
strained Fiduccia–Mattheyses” [6] algorithm. By specifying dif-
ferent timing constraints, the proposed technique is also able
to provide the power-delay tradeoff with two supply voltages
[22]. The design flow of power optimization with dual voltages
is shown in Fig. 2 (other layout structures can be found in [16]).
Fig. 1. The average distribution of gates with different slack for 16 benchmark
circuits.
II. BACKGROUND
A technology-mapped network can be represented as a di-
cluster of gates with high-supply voltage ( ) on the side of rected acyclic graph . A node corresponds
primary inputs. Thus, no LCs are required. In this cluster struc- to a gate in the network (The terms “gate” and “node” will be
ture, however, any gate may be selected to operate at only
used interchangeably throughout the paper). The existence of
after all its transitive fanouts have been selected to do so. Thus, a directed edge implies that node is an imme-
some part of the circuit with high slack may be left to operate diate fanin of node (or, node is an immediate fanout of node
unnecessarily at , limiting the potential of further power re- ). The set of all immediate fanins (fanouts) of is denoted by
duction. To avoid this problem, an extended-CVS structure was
( ). If there is a directed path from node to node
first proposed in [5] using the so-called level sort technique and in , ( ) is said to be a transitive fanin (fanout) of ( ).
was improved recently in [18]. In this structure, gates may The set of all transitive fanins (fanouts) of node is denoted by
be scattered among the gates. However, because of lack
( ). Each node is associated with a delay
of a global view, the method may not be effective especially
, where is the intrinsic delay, is a con-
when the given timing constraints are tight. In addition, these stant dependent on the driving ability of the node, and is
techniques did not consider the switching activity information the loading capacitance at the output of node . The arrival time
within the circuit. Clearly, as long as the switching activity of
and required time of node are recursively given by
all nodes in the circuit is available, the more power saving can
be reached by allowing logic gates with high switching activity
to operate at . In this sense, the CVS-based structure is not (1)
preferable for power reduction, since the nodes near primary
outputs typically have low switching activity (on average) [23].
More recently, a linear programming approach is presented in respectively, and the slack time of node is defined to be
[19] to address dual-voltage problem. However, it is based on .
the so-called delay balanced graph whose number increases ex- With good accuracy [1], the node delay at supply voltage
ponentially in terms of problem size. is proportional to , where is the threshold
No matter what specific algorithm is to be used, different voltage, and is a constant (note that the slew effect is not con-
value of two supply voltages can lead to totally different power sidered here). The delay of node at any supply voltage can
consumption. This gives rise to another important problem with be expressed as
dual-voltage approach: How to select the value of two supply
voltages for maximum power reduction? Existing techniques do (2)
not deal with this problem. Since the optimum supply voltage
depends on specific circuits, it is straightforward to try a variety
where is the delay of node at . The delay of node
of possible voltages for each given circuit and select the best
at is expressed as . The delay
one of them. However, this exhaustive strategy is computation-
increase due to the voltage reduction from to is thus
ally expensive. A fast prediction algorithm is thus desirable to
. We call the delay gap of node .
provide a good starting point toward power optimization with
In our two-voltage approach, each node works at either
dual voltages.
or . If node works at , in (1) is replaced
In this paper, we address the problem of reducing the total
with for the calculation of arrival time, required time and
power consumption by using two supply voltages ( and
slack time.
) at gate level. Given a technology-mapped network, the
In CMOS circuits, the average dynamic power consumption
power consumption can be modeled more accurately. We an-
is given by
alyze the timing slack distribution and explore how to take full
advantage of slacks in a given circuit. In order to get the max- (3)
imum power reduction while maintaining the timing perfor-
618 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 5, OCTOBER 2001

Fig. 2. Design flow with dual-voltage technique.

where timing constraints, any node may be selected to work at for


clock frequency; power reduction only if it is a feasible node. In general, however,
again the loading capacitance; not all feasible nodes in can work at . The reason is that,
switching activity at the output of gate . once a node is selected to work at , the slack of others (such
In general, switching activity inside a circuit not only depends as the transitive fanins/fanouts of the node) may be reduced.
on the topologic structure and input patterns of the circuit, but As a result, some feasible nodes may no longer be feasible (see
may vary with gate delay which introduces glitching transi- Example 2.1).
tions. Here, however, does not account for glitch power. Example 2.1: Fig. 3 shows an example with six nodes, where
The reason is twofold. First, measuring the glitches requires an and ( ) are the slack and delay gap of node
event-driven simulation and an explicit knowledge of the inputs , respectively. Initially, all nodes are feasible. If node is
waveforms. The later hypothesis is unrealistic, and updating the selected to work at , the new slack values for all nodes will
amount of glitching with simulation iteratively is computation- be , and , making node
ally too expensive. Secondly, as we will see in the next sec- and nodes nonfeasible. Instead, if we first select node
tion, the proposed technique tends to reach path balancing by to work at , the new slack for all nodes turns out to be:
reducing the slack without changing the circuit topology. This , , and , leaving
helps eliminate glitching to some extent. In any case, glitch , and still feasible. These feasible nodes can be further
power can be taken into account during validation by power sim- considered for the possibility of working at . In this sense,
ulation. is a better choice than .
Reducing supply voltage results in power saving of the gate. This example shows that the order of selecting nodes should
For gate , changing its supply voltage from to any supply be considered carefully so that their slacks can be fully exploited
voltage generates the power reduction for more power saving. In the next section, we will discuss this
problem in more details. For any given circuit, our goal is to
(4) select a subset of gates working at such that the timing
constraints are satisfied and the total power gain is maximized.
When , the power reduction is
. We call the power gain of gate .
For the purpose of further discussion in the following, we give III. TIMING SLACK ANALYSIS
some definitions first. In this section, we explore how to take full advantage of slacks
Definition 2.1: A circuit (or graph ) is safe if for in a circuit by examining the interaction between the node delay
each node . Otherwise, the circuit is unsafe, meaning that and slack changes. First, we provide some definitions.
it violates the timing constraints. Definition 3.1: Given a graph : 1) an edge
Definition 2.2: A node is called a feasible node if is called sensitive if either
, where is the delay gap of node . Otherwise, the or . Intuitively, a sensitive edge im-
node is called a nonfeasible node. plies that the slack of and is sensitive to each others delay
It is assumed that initially each node works at change; 2) a directed path is called sensitive if: a) the path con-
and its slack (i.e., the circuit is safe). Under the given sists of only sensitive edges and b) the slack of all nodes on the
CHEN et al.: ON GATE LEVEL POWER OPTIMIZATION USING DUAL-SUPPLY VOLTAGES 619

Fig. 3. Part of a safe circuit with six feasible nodes.


Fig. 4. Example graph with the arrival, required and slack time shown as a
tripletf g
, assuming d(v ) = 2, d(u ) = 1, i = 1; 2; 3, and d(w ) = 1,
a; r; s
j = 1; 2 .
path is monotonously distributed1 in the direction of the path;
3) two nodes are called slack sensitive if there exists a
sensitive path from to or from to in . Otherwise, they
are called slack insensitive.
Definition 3.2: The sensitive transitive closure graph
of is a directed graph such that there is an edge
if and only if there is a directed sensitive path
from to in .
We clarify the above definitions using the following example.
Example 3.1: Fig. 4 shows an example, where the delay of
node is 2, while the delay of the rest is assumed to be 1. The ar-
rival, required and slack time for each node are represented by a
triplet in the figure. From Definition 3.1, all edges ex- Fig. 5. The sensitive transitive closure graph, G of Fig. 4.
cept the directed edge are sensitive. The sensitive paths
include , , and . Nodes , , and a) is true when is an immediate fanout of . In this case,
are pair-wise slack sensitive, while , are slack insensi- we argue . [Otherwise, we have both
tive. The sensitive transitive closure graph of Fig. 4 is shown in and . This implies
Fig. 5. Note that there is no edge from to in Fig. 5 since . This is a contradiction.] Therefore, increasing the
in Fig. 4 is not a sensitive path. Instead, there is an ad- delay of by results in the increase of both and
ditional directed edge in Fig. 5 since there is a directed by , while both and remain unchanged. This means
sensitive path, , from to in Fig. 4. both and will be reduced by . Similarly, one can
Suppose the delay of node increases by a small positive prove that Case a) still holds true if .
amount , then its slack is reduced by exactly . Ob- Let us verify the above lemma using the following example.
viously, the slack of any node does not Example 3.2: Consider Fig. 4. Assume that the delay of node
change. Instead, the slack for any node , , increases by . Since , both
may or may not change, depending on and the and are reduced to 0.2, as indicated in Case a). From Case
slack sensitivity of and . Consider three cases as shown in b), decreases to 0.2. Since , are slack insensitive,
Lemma 3.1. remains unchanged, as can be expected from Case c). In
Lemma 3.1: Let be a transitive fanin/fanout node of any contrast, by using , we have
node , and be a small positive constant. We have the . In this case, increasing by does not affect the value
following. of , even though nodes , are slack sensitive.
Case a) If , then will decrease by as long To make full use of timing slacks, it is desirable to keep the
as and are slack sensitive. minimum number of nodes whose slacks are reduced by the in-
Case b) If , then will decrease creased delay of a node. In this sense, Case c) is the best case.
by as long as and are slack sensitive. This motivates us to first select the set of nodes with max-
Case c) If either or , are slack insensi- imum slack (denoted by ) in and then construct an in-
tive, then remains unchanged. duced subgraph, , of (see Definition 3.2)
Proof: For simplicity, we give proof of Case a) only. on such that there is an edge if is in .
Without loss of generality, assuming in Case We use to denote the set of neighbors of node in .
a). Since and are slack sensitive and , we In Fig. 4, for example, we have and .
have , where is a node lying at the sensitive of Fig. 5 is shown in Fig. 6, where . There
path from to . Thus, we only need to prove that Case are two important properties of , as given in Lemma 3.2.
Lemma 3.2:
f
1Assuming a directed path consists of v ; v ; . . . ; v g . The monotonic
slack distribution on the path implies that if s(v )  s(v ), then s(v ) Property 1) If the delay of any node increases by
s(v ); if s(v )  s(v ), then s(v )  s(v 0
), where i = 1; . . . ; m 1. , then the slack of any node decreases by .
620 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 5, OCTOBER 2001

Fig. 6. The induced graph G of Fig. 4.

Property 2) The delay increase of any node by


does not affect the slack of any node as
long as is small enough. Fig. 7. The updated G .
Proof: 1) Since , it follows that
and nodes , are slack-sensitive. From Case a), is reduced first-order gradient approach, where the sum of incremental de-
by as is. 2) Since , we have either lays is the objective function to be optimized, the MINS corre-
or . If , then sponds to a fastest way that the objective is increased, and is
. Let . If , then the step size used for searching the best solution. To speed up the
nodes and are slack-insensitive. In either case, remains process, one may wish to solve the problem in one shot by in-
unchanged according to Case c). creasing the delay for all nodes in the graph. This can however
One can demonstrate Lemma 3.2 using in Figs. 4–6. prevent us from analyzing/formulating the slack-sensitivity be-
It can be seen that is independent of , and Lemma 3.2 holds tween nodes, as described previously in this section. Since slack
true if , where is the second sensitivity explains how the nodes interact in terms of their delay
largest slack of all nodes in . Therefore, a reasonable value for and slack changes, it is generally not realistic to find an exact so-
is . From Lemma 3.1, in a given graph , in- lution in one shot where the slack sensitivity information is un-
creasing the delay of any node may reduce the slacks of its tran- available.
sitive fanin or fanout nodes, depending on their slack-sensitivity.
As will be seen in Section IV, the power consumption of a IV. LOWER BOUND OF POWER CONSUMPTION
node can be represented as a monotonic function of its delay. In the previous section, we obtained the basic scheme of se-
Therefore, it is generally desirable to maximize the sum of pos- lecting nodes for delay increase by analyzing the relation be-
sible increased delay of all nodes in without violating the tween the slack and delay of nodes. The goal is to maximize
timing constraints. To this end, one can select nodes in the max- the sum of all possible delay increase for all nodes in the cir-
imum independent node set2 (MINS) of for the delay in- cuit while maintaining the timing performance. However, max-
crease by . Each time the delay of all nodes in MINS increases imizing the sum of all possible delay increase does not neces-
by , we update , and , and find MINS again. This sarily lead to maximum power reduction. The reason is two-
process continues until . Assuming that the circuit fold. First, the voltage reduction and, hence, power reduction
contains groups of nodes with different slacks, the whole under a unit penalty of node delay can be different for different
MINS-based process can be completed by passes. Also, nodes. Second, under dual-voltage environment, only two dis-
choosing guarantees that keeps safe at crete supply voltages are available for all nodes. In this sec-
each pass. This is because , , tion, we deal with the former issue by introducing the notion of
and from Case a)–c) of Lemma 3.1, , weight function and transforming the delay increase into power
. Example 3.3 illustrates the process. reduction. The latter will be discussed in Section V.
Example 3.3: Consider Fig. 4 again. Initially, we have
and . The MINS of (see Fig. 6) A. Lower-Bound Algorithm
is (or ). Let MINS , and increase the delay To tackle low-power design using the slack and delay of
of node by . We then update nodes, let us examine how the node delay affects the power
, and as: , and consumption. First, assume (for the purpose of establishing a
. is also updated, as shown in Fig. 7 lower bound) that initially the given circuit is safe and that the
where MINS is or . Let MINS supply voltage can take any continuous value. From (2) and (4)
and increase the delay of both node and node by at any supply voltage , the power reduction of node under
. The process terminates with unit delay penalty is given by
the final maximum slack . As a result, the sum of
increased delay of all nodes is thus , which is
the maximum achievable under the given timing constraints.
The above process of increasing delay is done by many itera-
tions, depending on the number of nodes with different slacks in
the graph. From the optimization point of view, this constitutes a (5)
2A maximum independent node set of a graph is a set of independent nodes
(i.e., no two nodes are connected by an edge) such that the number of nodes in We call the weight function of node . Note that
this set is maximum. not only depends on and , but also varies with the
CHEN et al.: ON GATE LEVEL POWER OPTIMIZATION USING DUAL-SUPPLY VOLTAGES 621

supply voltage and, hence, the delay of node . This leads all edges incident with at least one of the nodes) from . This
to the following considerations. process repeats until . An exact and polynomial-time
1) Since the weight function denotes the power reduction by algorithm can be found in [11] and [14], respectively.
increasing a unit delay on a node, we need to select nodes
in MWIS3 instead of MINS for the delay increase of so B. Prediction of “Optimum” Supply Voltages
that the maximum power reduction can be obtained. It is interesting to look at the effect of low supply voltage
2) Since the weight function is related to the node delay, a on the delay and power reduction. For each gate, the reduced
small value of is generally preferable. Theoretically, is promises more power saving at the cost of increased delay
required to approach zero for the maximum power saving. of the gate. Under the same timing constraints, using the lower
In the real world, however, smaller is not always better. means that the fewer gates are permitted to work with.
First, too small value of can lead to prohibitively ex- Therefore, one can expect there is a specific “optimal” value
pensive computation cost. Second, since MWIS depends of supply voltage for the total power reduction, depending on
on the relative weight of nodes in , the small change the given circuits. Unfortunately, the existing dual-voltage al-
of weight for specific nodes will not necessarily affect gorithms work with the fixed supply voltages. These algorithms
MWIS. This is true especially when the weights of nodes cannot help if the given supply voltages are far from their op-
in are distributed in a wider range. In our experiment timum. Also, as mentioned in Section I, an exhaustive approach
part, we will show that for most circuits, the results are for selecting optimum supply voltages is computationally ex-
reasonably good when using . pensive. This motivates us to find a fast prediction of “optimal”
In the above discussion, we assumed the supply voltage is dual-supply voltages.
a continuous variable. Instead, when only dual-discrete supply When the lower-bound algorithm terminates, the supply
voltages are available, the actual power reduction is higher than voltage is obtained for each node in . In order to
it is with continuous voltage. From this point of view, our ap- meet the given timing constraints, the supply voltage, ,
proach provides a lower bound of power consumption with dual selected for node has to be kept greater than . Under
voltages. The procedure of our algorithm is outlined below: dual-voltage environment, either or can be used for
each node. It is reasonable to select .
The “optimal” value of can be estimated by finding
Lower-Bound-Algorithm {
input:
output: supply voltages for all nodes and
lower bound of the total power (6)
if
Calculate node delay, slack and weight where
function for each node in under if
using (1), (2) and (5); This can be done by calculating the specific summation in
Identify and in and let (6) for different values of ranging from
to and selecting the maximum sum. Equation
while { (6) cannot take its maximum value unless is equal to
Find and construct ; for a specific node . We prove this below by way of
Find MWIS of ; contradiction. Without loss of generality, suppose we have
Increase node delay by for each
node in MWIS; and (6) takes its maximum value
Update the node slack, voltage,
weight and ;
}
Calculate the final supply voltage,
, for each node in using (2); when , where . Note
Obtain the lower bound of power con- that for , and for . On the other
sumption using (4); hand, when , the value of (6) becomes
}

It should be noted that the MWIS problem is NP-complete on


general graphs [11]. It is, however, polynomial-time solvable for
transitive graphs [11]. A fast heuristic for finding MWIS is as Since , we have . This contradicts the
follows. Initially MWIS is set to be . Then choose a node previous hypothesis that is the maximum value.
with maximum among all nodes in . Add The estimate given by (6) is optimal in the sense that any devi-
node to MWIS and delete and all its neighbors (along with ation from it will either increase power consumption or violate
3A maximal weighted independent set of a graph is a set of independent nodes
the timing constraints. When the optimum and are
(i.e., no two nodes are connected by an edge) such that the sum of their weights available, we modify lower-bound algorithm for dual-voltage
is maximum. power optimization as follows. Each time the delay of nodes in
622 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 5, OCTOBER 2001

MWIS increases by , we check if some of these nodes are able Step 2: Let and ;
to work at . If yes, select them to work at , and update Step 3: while ( ) {
their weight to zero. The reason is that, once a node works at Find of the induced sub-
, no further increase of its delay is needed. By doing so, graph, , of on a set of nodes with
more delay budget can be provided for other nodes. The formal maximum slack;
description of power optimization algorithm based on predicted Increase node delay by for all
optimum supply voltages will be given in the next section. nodes in MWIS;
if the delay of node is so large
V. POWER OPTIMIZATION WITH DUAL-SUPPLY VOLTAGES that it is able to work at ,
, then
In this section, we describe an effective algorithm for power
;
optimization using dual-supply voltages. Since a level converter
Update the delay, slack , and the
is required at the interface of two voltages, we also discuss the
weight function of node , ;
reduction of the level converters’ power penalty.
endif
}
A. Power Optimization Algorithm Using Dual-Supply Voltages
}
Before presenting our algorithm, we take a look at the perfor-
mance of existing techniques for power optimization with two We now apply the dual-voltage power optimization (DVPO)
supply voltages. Example 5.1 shows how these techniques work algorithm to Fig. 3, as shown in Example 5.2. For comparison,
on Fig. 3. it is assumed that the supply voltages are the same as in CVS
Example 5.1: One of existing techniques is CVS scheme as and ZSA algorithms and, hence, the prediction of and
mentioned in Section I. Assuming, in Fig. 3, that is a primary is not included in this example.
output and the power gain, , of is as fol- Example 5.2: Consider Fig. 3 again. The weight functions
lows: , , , for all nodes are approximated as ,
and . The CVS algorithm [2] starts 1 6. We have , , ,
with selecting the primary output . Once works at , , and . In the first iteration
the slack of its immediate predecessor, , changes from three of DVPO algorithm, , , ,
to zero. To meet the timing constraint, cannot work at , , MWIS , and
and the algorithm terminates. The power reduction is two. An- , . No nodes are put into since
other technique available is zero-slack-algorithm (ZSA) [10]. , . The slack values of all nodes
The basic idea behind ZSA is that, at each step, it first finds are updated as: , and .
the nodes with minimum positive slack and then selects some In the second iteration, , , ,
of them to work at such that their slacks become zero (or , MWIS , and
small enough in this application). At the first step, since node , . The delay of nodes , and
has maximum power gain of three, is selected to work at are large enough to be put into , i.e., .
with the power gain of three. At the second step, nodes , The new slacks are: , ,
and have the minimum positive slack of two. Because three and , and are updated to be zero.
of them are nonfeasible, the algorithm stops with the final power In the third iteration, , , ,
reduction of three. , MWIS , and
As a matter of fact, however, the optimal solution of Fig. 3 is , . Thus, node is added into .
to select , , , and to work at . That will result in the Finally, the algorithm terminates with . As a result, we get
maximum power reduction of six while keeping the circuit safe. the optimal solution of with maximum
Both CVS and ZSA break down because they select the nodes power reduction of six.
locally (i.e., in a very greedy fashion) and ignore the interaction The time complexity of DVPO algorithm is analyzed as fol-
between the delay and slack changes of nodes. Thus, a more lows. The time complexity of lower-bound-algorithm is basically
effective algorithm, with a global view, is required to tackle this the same as that in Step 3 of DVPO. Finding the maximum value
problem. of requires time, where is the number of
Based on the discussions in Section IV, we next outline our nodes in . Estimation of the “optimal” value of takes
algorithm for power optimization. time, since at most values need to be tried for finding the max-
imum sum in equation (6). Therefore, the prediction of optimum
Dual-Voltage-Power-Optimization (DVPO) supply voltages just needs time. Calculating the delay,
Algorithm { slack and weight function for all nodes takes time. Finding
input: also needs time. The MWIS of can be obtained in
output: The set of gates, time because . Since MWIS , and
Step 1: Predict the “optimum” supply linear time is required to update the slacks and weight functions
voltages by calling lower-bound-algo- of all nodes in and for each node MWIS,
rithm ( ), unless they have been given each iteration in Step 3 can be completed in time. There-
by the users; fore, assuming the number of iterations is , the time complexity
CHEN et al.: ON GATE LEVEL POWER OPTIMIZATION USING DUAL-SUPPLY VOLTAGES 623

Fig. 8. The comparison ofK n K


and for 16 benchmarks. ( —the number of
Fig. 9. The conventional level converter.
n
groups of nodes with different slack; —the number of nodes.)
self moving from to . Thus, the problem of maximizing
power reduction can be solved by constrained CF–M algorithm.
of the whole DVPO algorithm is . When selecting
The constrained means that a move is accepted only if it does
at each iteration, we have , where
not violate the given timing constraints. Although more effec-
is the number of groups of nodes with different slack. Our expe-
tive cluster-based F–M (like hMetis [24]) algorithms are avail-
rience shows for most circuits, making the computation
able for the general partitioning purposes, they cannot be used
efficient. Fig. 8 is the comparison of and for 16 MCNC’91
in this application where nodes are required to move individu-
benchmark circuits, where , on average.
ally without violating the timing constraints. Our algorithm for
minimizing LCs power penalty is described in the following.
B. Level Converter Power Minimization Algorithm
When using two supply voltages, the circuit requires level
Level-Converter-Power-Minimization (LCPM)
converters (LCs) at the interface of gates and gates
Algorithm {
to block the static current which occurs if a gate drives a
input: Circuit with the initial parti-
gate [2]. A conventional LC circuit is shown in Fig. 9. In
tion of and
the above discussion, we did not account for the effect of the LC.
output: Power optimized circuit with
Since an LC generates the additional area, delay, and dynamic
final partition
power to the circuit, the number of LCs should be minimized.
repeat
From Fig. 9, the LC consists of 6 transistors and, hence, its area
; gain_sum ;
can be seen as the area of a three-input NAND gate. In the DVPO
for each gate do
algorithm, the delay of LC (denoted by ) can easily be taken
compute gain ;
into consideration by replacing with as
choose such that
the delay gap of node if an LC is needed at the output of .
gain (gain );
In general, can be seen to be a constant dependent on the
if the tentative move of from
specific technology (in our experiment, the of 0.5 ns is used
to does not violate the timing con-
as in [2]). After the DVPO, we have two partitions of gates: one
straints then
with gates (denoted by ) and the other with gates
;
(denoted by ). Assuming that the input capacitance of an LC
gain_sum gain_sum
is and that the switching activity at node where the LC is
gain ;
inserted is , the additional power consumption due to the
endif
LC can be approximated by .
endfor
Now we examine the problem of reducing the total power
Find such that
consumption by moving a gate from to or from to .
gain_sum (gain_sum );
Initially, because of DVPO, it is less likely to make any move of
Move the first gates from to
a gate from to which may violate the timing constraints
if
due to the increased delay of the gate. Let us first look into
gain_sum ;
the possibility of moving a gate from to . Without loss of
until gain_sum ;
generality, consider an -type gate with -type fanins and
}
-type fanouts. The possible power reduction resulting from
the move of node from to is given by
After the process of gate move from to is complete, a
similar algorithm is applied to the possible move of gate from
to where the cost function is modified as
(7)

where the first term is the reduced power due to LCs and the
(8)
second term accounts for the increased power of the node it-
624 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 5, OCTOBER 2001

TABLE I
THE LOWER BOUND OF POWER CONSUMPTION

Fig. 10. Examples of gate moves. (a) From L to H. (b) From H to L.


and CVS is presented in the third part, and the power-delay
tradeoff is provided in the fourth part. Finally, we show our
Fig. 10 shows typical examples of the possible gate moves. In
power optimization results with different input statistics in the
Fig. 10(a), gate L is to be moved from to so that the number
fifth part.
of LCs is minimized [see (7)]. In Fig. 10(b), in order to reduce
the number of LCs, gate H is to be moved from to if it A. Lower Bound of Power and “Optimum” Supply Voltages
does not violate the timing constraints [see (8)]. Obviously, the
time complexity of LCPM algorithm is the same as that of the Table I shows the lower bound of power consumption using
traditional F–M algorithm [6] which needs time for each and its comparison with actual minimum
pass, where is the number of terminals in the circuit. power. For individual circuits, the lower bound ranges from
Since the power overhead of LCs is not considered in lower- 54.5% to 77.9% of actual power consumption. On average, the
bound algorithm in Section IV, our lower bound estimate of power lower bound accounts for about 66% of the actual power, com-
dissipation may not be tight, depending upon specific circuits. pared to 71.5% for actual minimum power. To test the “op-
Ideally, one has to try every possible set of dual voltages exhaus- timum” voltage estimation, we optimized the same circuits with
tively for power optimization before deriving the real optimum DVPO algorithm using different values of . Table II lists
voltages. Obviously, this is prohibitively costly. Also, it is too these test data. Depending on specific circuits, the “optimum”
complicated, if not impossible, to deal with LCs and best supply value of ranges from 2.2 V to 3.5 V (here only “optimum”
voltages in an integrated problem formulation. Hence, we have was given because is 5 V under the minimum delay
divided this difficult problem into two subproblems which can constraints). It can be seen that the maximum power saving was
be solved easier. We first estimate “optimum” voltages by lower- achieved at a specific value of (as shown in shaded box of
bound algorithm assuming no power overhead for LCs, and then Table II) which is exactly, or very close to, the estimated “op-
minimize the number of LCs by LCPM algorithm. As we will see timum” voltage (as shown in the last column of Table II). Al-
in the next section, the lower bound of power consumption de- though, for a few circuits (e.g., circuit frg2), the estimate and
viates the actual minimum power by about 6% on average (the actual optimum voltage are much different, the resulting power
readers are referred to Table I in Section VI). reduction is nevertheless always very similar. Note that, in some
cases (e.g., circuit 9symml), the lower bound of power can be
much smaller than the actual minimum power consumption. The
VI. EXPERIMENTAL RESULTS AND DISCUSSIONS
reason is that, for this circuit, the final supply voltages ( ) re-
We implemented our dual-voltage approach in the environ- sulting from lower bound algorithm scatter in a wider range be-
ment of SIS package [7], as shown in Fig. 2, and tested it on cause of the very nature of the circuit.
a set of MCNC91 benchmark circuits. The technology-mapped To look at how the value of affects the performance of the
network was obtained using SIS under the delay mode. We used lower-bound algorithm, we tested it on different fixed values
the longest path delay of the minimum delay implementation as of (note that implies it varies dynam-
the timing constraints in our algorithm. The original power con- ically). Fig. 11 shows the resulting lower bound of power and
sumption was estimated at supply voltage of 5 V and clock fre- CPU time (on a SUN SPARCstation 5 with 32 MB RAM) for
quency of 20 MHz under 1 m technology with the threshold the same circuits as above. For most circuits (except one), as
voltage of 0.6 V. Our experiments are divided into five parts. In shown in Fig. 11(a), the lower bound varies within the range of
the first part, we estimated the lower bound of power consump- 3% for different , indicating its insensitivity to . In contrast,
tion and optimum supply voltages, and compare the predicted CPU time increases with decreased , as shown in Fig. 11(b).
optimum voltage with the actual one. In the second part, we Our experiment shows the further reduction of results in too
compare our dual-voltage approach with gate sizing technique expensive computation cost only with almost the same lower
(using a standard cell library). The performance of our approach bound estimate. We confirm that it is reasonable to choose
CHEN et al.: ON GATE LEVEL POWER OPTIMIZATION USING DUAL-SUPPLY VOLTAGES 625

TABLE II
THE OPTIMUM SUPPLY VOLTAGE AND POWER REDUCTION USING DIFFERENT VALUES OF V (THE SHADED ENTRIES CORRESPOND TO THE ACTUAL VOLTAGE
WITH MAXIMUM POWER REDUCTION)

(a) (a)

(b) (b)
Fig. 11. Performance of lower-bound algorithm on a set of circuits. (a) Lower Fig. 12. The estimated lower bound of power consumption and “optimum”
bound on power consumption. (b) CPU time. supply voltages under different timing constraints. (a) Lower bound on power
consumption. (b) Optimum supply voltage.
in terms of both accuracy and efficiency of the
algorithm. This is because the timing penalty increases quickly when the
Also, we tested our algorithm under different timing con- supply voltage of a gate is reduced to the smaller value.
straints. Fig. 12 shows the lower bound of power and optimum
supply voltage given the timing constraints of , B. Comparison of Our Dual-Voltage Approach and Gate
and , where is the minimum delay of the circuit. Sizing Technique
As can be expected, the lower bound of power decreases as the In order to compare our dual-voltage approach with gate
timing constraints are relaxed. The “optimum” , however, sizing technique, we implemented a gate sizing technique [17]
basically stays at 5 V under various timing constraints. What is which is also based on the MWIS. Table III summarizes the
interesting is that looser timing constraints do not always lead results (with V and V) using a standard
to the lower “optimum” value of , as shown in Fig. 12(b). cell library. The average power reduction over tested circuits
626 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 5, OCTOBER 2001

TABLE III slack drastically increases after optimization. For circuit i3, in
POWER REDUCTION (%) COMPARISON OF DUAL-VOLTAGE TECHNIQUE AND contrast, only 6 out of 310 gates work at and, hence, the
GATE SIZING TECHNIQUE
slack change of the gates is much smaller, resulting in very lim-
ited power saving (only 0.1%).

D. Power-Delay Tradeoff
To provide the power-delay tradeoff, we specified the dif-
ferent timing constraints on the given circuits for our power
optimization. As an example, the power-delay tradeoff curves
for two circuits (C880 and apex6) are shown in Fig. 14, where
and are the two extreme points. corresponds to the
minimum timing constraint with relatively less power reduc-
tion, while stands for relatively loose timing constraint with
the possible maximum of power reduction which is reached
when all gates of the circuit work at . Using V
and V, the maximal percentage of power reduction
is 6.9% for gate sizing and 19.6% for dual-voltage approach. should be theoretically %. However,
When using the optimum supply voltage, more power saving since the primary inputs always work at , the actual min-
is possible with dual-voltage technique. However, it should imum is higher. As shown in Fig. 14, the actual minimum power
be pointed out that for dual-voltage technique, there will be for circuit C880 and apex6 are 62.8% and 66.1% of the initial
some power overheads at physical level, which could not be total power, respectively, and the power consumption no longer
accounted for at gate level. More recently, an approach that decreases when the delay (i.e., the given timing constraint) in-
uses gate sizing and dual-voltage techniques simultaneously creases further beyond point .
for low power has also been reported [17], [27]. In the above discussions, we are assuming the capacitive
loading of a gate remains the same whether it works at
C. Performance of Our Approach Compared With CVS or . In the real design, however, the gate with has
less parasitic capacitance than that with , indicating that
For comparison with CVS which is the existing dual-voltage our power estimation for gates tends to be pessimistic. In
approach, we also implemented CVS under SIS environment this sense, the actual power saving can be a little higher than
and tested it on benchmarks. Table IV shows the comparison of estimated here.
our algorithm and CVS ( V and V were
used for this experiment). Columns 2, 3, and 4 of this table are
the number of gates in the circuit, the circuit delay and initial E. Power Savings Under Different Input Statistics
power consumption (i.e., with supply voltage of 5 V), respec- In the above experiments we assume that the input activity
tively, before running the algorithms. With our algorithm, the is 0.5 for all circuits. Considering the fact that the actual input
number of gates working at in the circuit, on average, ac- conditions may not be known, it is interesting to look at how
counts for 65.7% of the total number of gates. Individual per- sensitive our optimization results are to input statistics. While
centage ranges from 34.6% to 86.3%, depending on the specific the proposed technique does not require any specific input pat-
circuits. The average power penalty due to the LCs is about 5%. terns for optimization, the weight function given by (5) does
Our algorithm achieves a total power reduction of 19.6%, on av- vary with switching activity, , which depends on the input
erage, with the maximum reduction of 26.3% for circuit C880. statistics. Thus, input activity affects the optimal dual voltages,
In contrast, only 11.4% average power reduction is obtained by final assignment of nodes to or and, hence, the power
CVS. A typical instance is circuit 9symml which has only one optimization results. For a case study, we use the random input
primary output. Under the timing constraints of minimum cir- activity ranging from zero to one for circuit 9symml, and run our
cuit delay, the slack of the primary output is zero and no power dual-voltage optimization algorithms. The results are plotted
reduction is possible by using CVS. Instead, our algorithm gets in Fig. 15. Fig. 15(a) shows the curve of initial power con-
15.4% power improvement for this circuit. Note that all results sumption versus random input activity, and Fig. 15(b) shows
in Table IV were achieved without degrading the timing perfor- the power saving using our approach with random input ac-
mance of the circuit. This demonstrates the great potential ob- tivity. From Fig. 15(a), the power consumption of this circuit
tained by our dual-voltage power optimization. The last column changes a lot due to different input patterns, ranging from 295.1
of Table IV shows the CPU time running on a SUN SPARCsta- to 1367.2. From Fig. 15(b), however, the power reduction varies
tion 5 with 32 MB RAM. from 15.9% to 17.6%, showing that percentage power saving is
To see how the excessive slacks in the circuit contribute to not sensitive to input statistics. This is a desirable feature of our
power reduction, we conducted experiments on the slack change approach. Our experiments with other benchmarks showed the
of all gates resulting from our algorithm. Fig. 13 shows the dis- similar feature. We believe that percentage power reduction is
tribution histogram of gates over the slack value for two circuits: dominated by other factors (such as timing constraints and cir-
c8 and i3. For circuit c8 whose longest path delay is 13.2 ns, 45 cuit topology which affect the slack distribution) instead of input
out of 130 gates work at . The number of gates with small statistics.
CHEN et al.: ON GATE LEVEL POWER OPTIMIZATION USING DUAL-SUPPLY VOLTAGES 627

TABLE IV
PERFORMANCE OF OUR ALGORITHM AND CVS ALGORITHM ON A SET OF BENCHMARK CIRCUITS

(a)
(a)

(b) (b)

Fig. 13. Gate distribution over the slack value for (a) circuit c8 and (b) circuit Fig. 14. Power-delay tradeoff for (a) circuit C880 and (b) circuit apex6.
i3.

VII. CONCLUSION AND FURTHER WORK connection delay cannot be reasonably estimated until the phys-
In this paper, we targeted gate level power optimization with ical design phase, while the slew effect depends strongly on the
dual-supply voltages. We have shown that dual-voltage approach specific value of dual-supply voltages applied. In addition, the
can achieve significant power saving without degrading timing delay given by (2) is based on the assumption that the input
performance of the circuit. We demonstrated that it is important to voltage of a gate at supply voltage is also . However,
predict optimum supply voltages before using dual-voltage tech- when a gate drives a gate, the delay of the gate
nique and that the power penalty due to level converters can be is more accurately proportional to instead
wellcontrolled.Byspecifyingdifferenttimingconstraints,weob- of . Therefore, in this case, (2) tends to over-
tained a power-delay tradeoff with dual voltages. estimate the delay. With these issues in mind, further work is
In this work, we used a simpler delay model which does not required to develop more accurate delay model with dual volt-
account for interconnection delay and signal slew effect. Inter- ages. Besides, there is the additional noise due to cross-coupling
628 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 5, OCTOBER 2001

[3] M. Igarashi et al., “A low power design method using multiple supply
voltages,” in Proc. Int. Symp. Low Power Electronics and Design
(ISLPED), Monterey, CA, Aug. 1997, pp. 36–41.
[4] S. Raje and M. Sarrafzadeh, “Variable voltage scheduling,” in Proc. Int.
Symp. Low Power Design (ISLPD), Dara Point, CA, Apr. 1995, pp. 9–14.
[5] K. Usami et al., “Automated low power technique exploiting multiple
supply voltage applied to a media processor,” in Proc. Custom Integrated
Circuit Conf. (CICC), Santa Clara, CA, May 1997, pp. 131–134.
[6] C. M. Fiduccia and R. M. Mattheyses, “A linear time heuristic for im-
proving network partitions,” in Proc. IEEE/ACM Design Automation
Conference (DAC), Las Vegas, NV, June 1982, pp. 175–181.
[7] E. M. Sentovish et al., “SIS: A system for sequential circuit synthesis,”
Univ. California, Berkeley, Tech. Rep. UCB/ERL M92/41, May 1992.
[8] A. P. Chandrakasan and R. W. Brodersen, Low-Power CMOS Digital
Design. Norwell, MA: Kluwer, 1995.
(a) [9] S. Mutoh et al., “1-V power supply high-speed digital circuit technology
with multithreshold voltage CMOS,” IEEE J. Solid-State Circuits, vol.
30, pp. 847–853, Aug. 1995.
[10] R. Nair, C. L. Berman, P. S. Hauge, and E. J. Yoffa, “Generation of per-
formance constraints for layout,” IEEE Trans. Comput.-Aided Design,
vol. 8, pp. 860–874, Aug. 1989.
[11] R. H. Mohring, Graphs and Orders: The Role of Graphs in the Theory
of Ordered Sets and Its Application, I. Rival, Ed. New York: D. Reidel,
May 1984, pp. 41–101.
[12] S. Raje and M. Sarrafzadeh, “Scheduling with multiple voltages,” Inte-
gration VLSI J. 23, pp. 37–59, 1997.
[13] J. M. Chang and M. Pedram, “Energy minimization using multiple
supply voltages,” IEEE Trans. VLSI Syst., vol. 5, pp. 1–8, Dec. 1997.
[14] D. Kagaris and S. Tragoudas, “Maximum independent sets on transi-
tive graphs and their applications in testing and CAD,” in Proc. Int.
Conf. Computer-Aided Design (ICCAD), San Jose, CA, Nov. 1997, pp.
736–740.
(b)
[15] K. Usami et al., “Design methodology of ultra low-power MPEG4 codec
Fig. 15. Power consumption for circuit 9symml under different input statistics. core exploiting voltage scaling techniques,” in Proc. ACM/IEEE Design
(a) Initial power versus random input activity. (b) Power savings versus random Automation Conf. (DAC), San Francisco, CA, June 1998, pp. 483–488.
input activity. [16] C. Yeh, Y. Kang, S. Shieh, and J. Wang, “Layout techniques supporting
the use of dual supply voltages for cell-based designs,” in Proc.
ACM/IEEE Design Automation Conf. (DAC), New Orleans, LA, June
between signals with different voltages. Noise checks are thus 1999, pp. 62–67.
[17] C. Chen and M. Sarrafzadeh, “Power reduction by simultaneous voltage
increasingly necessary during layout. scaling and gate sizing,” in Proc. Asia and South Pacific Design Automa-
More recently, we are combining gate level design with phys- tion Conf. (ASPDAC), Yokohama, Japan, Jan. 2000, pp. 333–338.
ical-level for power optimization with dual-supply voltages. [18] C. Yeh, M. Chang, S. Chang, and W. Jone, “Gate-level design ex-
Since some important parameters (such as wiring capacitance) ploiting dual supply voltages for power-driven applications,” in Proc.
ACM/IEEE Design Automation Conf. (DAC), New Orleans, LA, June
cannot be available at gate level, some kind of feedback 1999, pp. 68–71.
between logic and physical levels is required in order to know [19] V. Sundararajan and K. K. Parhi, “Synthesis of low power CMOS VLSI
the exact effect of dual-voltage approach on physical level. circuits using dual supply voltages,” in Proc. ACM/IEEE Design Au-
tomation Conf. (DAC), New Orleans, LA, June 1999, pp. 72–75.
In the logic level, dual-voltage technique gives an estimate
[20] L. Wei, Z. Chen, K. Roy, M. C. Johnson, Y. Ye, and V. K. De, “Design
of gate power reduction. In the physical level, we can use and optimization of dual-threshold circuits for low-voltage low-power
simulated annealing with additional constraints for placement applications,” IEEE Trans. VLSI Syst., vol. 7, pp. 16–23, Mar. 1999.
and routing followed by some post processing with an ultimate [21] C. Chen and M. Sarrafzadeh, “Provably good algorithm for low power
consumption with dual supply voltages,” in Proc. Int. Conf. Comput.-
goal of reducing power consumption due to both gates and Aided Design (ICCAD), San Jose, CA, Nov. 1999, pp. 76–79.
interconnections [26]. Further work is under way. [22] C. Chen and M. Sarrafzadeh, “An effective algorithm for gate-level
power-delay tradeoff using two voltages,” in Proc. Int. Conf. Computer
Design (ICCD), Austin, TX, Oct. 1999, pp. 222–227.
[23] M. Nemani and F. N. Najm, “Toward a high level power estimation capa-
ACKNOWLEDGMENT bility,” IEEE Trans. Comput.-Aided Design, vol. 15, pp. 588–598, June
1996.
The authors would like to thank the anonymous reviewers for [24] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar, “Multilevel
their comments, which improved the quality of the paper. hypergraph partitioning: Applications in VLSI design,” in Proc.
ACM/IEEE Design Automation Conf. (DAC), Anaheim, CA, June 1997,
pp. 526–529.
[25] M. Sarrafzadeh and S. Raje, “Scheduling with multiple voltages under
REFERENCES resource constraints,” in Proc. Int. Symp. Circuits Syst. (ISCAS), Or-
lando, FL, May 1999, pp. 350–353.
[1] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, “Low-power [26] A. Nayak, P. Banerjee, C. Chen, and M. Sarrafzadeh, “Power optimiza-
CMOS digital design,” J. Solid-State Circuits, vol. 27, no. 4, pp. tion issues in dual voltage design,” in Proc. Int. Conf. Chip Design Au-
473–484, April 1992. tomation (ICDA), Beijing, China, Aug. 2000, pp. 99–105.
[2] K. Usami and M. Horowitz, “Cluster voltage scaling technique for low- [27] A. Nayak, M. Haldar, P. Banerjee, C. Chen, and M. Sarrafzadeh, “Power
power design,” in Proc. Int. Symp. Low Power Design (ISLPD), Dara optimization of delay constrained circuits,” J. VLSI Design—Special
Point, CA, Apr. 1995, pp. 3–8. Issue Low Power System Design, to be published.
CHEN et al.: ON GATE LEVEL POWER OPTIMIZATION USING DUAL-SUPPLY VOLTAGES 629

Chunhong Chen (M’99) received the B.S. and M.S. degrees in electrical engi- Majid Sarrafzadeh (M’87–SM’92–F’96) received the B.S., M.S., and Ph.D.
neering from Tianjin University, Tianjin, China, and the Ph.D. degree in elec- degrees in 1982, 1984, and 1987, respectively, from the University of Illinois at
trical engineering from Fudan University, Shanghai, China. Urbana-Champaign in electrical and computer engineering.
From 1986 to 1996, he was with Zhejiang University of Technology, He joined Northwestern University, Evanston, IL, as an Assistant Professor
Hangzhou, China, as an Assistant/Associate Professor. From 1997 to 1998, in 1987. In 2000, he joined the Computer Science Department at University of
he was a Research Associate at the Hong Kong University of Science and California at Los Angeles (UCLA). He has collaborated with many industries in
Technology, Hong Kong. From 1999 to 2001, he was a Postdoctoral Fellow, the past ten years, including IBM and Motorola and many CAD industries. He
first at Northwestern University, Evanston, IL, and then at the University of contributed to Theory and Practice of VLSI Design. He has published approx-
California, Los Angeles. Since then, he has been an Assistant Professor at the imately 200 papers, is a Co-Editor of Algorithm Aspects of VLSI Layout (Sin-
University of Windsor, ON, Canada. His current research interests include gapore: World Scientific, 1994), coauthor of An Introduction to VLSI Physical
physical layout, logic synthesis, timing analysis, and power optimization for Design (New York: McGraw Hill, 1996), and the author of an invited chapter in
integrated circuits. the Encyclopedia of Electrical and Electronics Engineering in the area of VLSI
circuit layout. He is on the editorial board of the VLSI Design Journal and an
Associate Editor of ACM Transaction on Design Automation (TODAES). His re-
cent research interests include embedded and reconfigurable computing, VLSI
CAD, and design and analysis of algorithms.
Ankur Srivastava (S’00) received the Bachelor of Technology degree in elec- Dr. Sarrafzadeh received the NSF Engineering Initiation award, two Distin-
trical engineering from the Indian Institute of Technology, Delhi, India, in 1998 guished Paper Awards in the International Conference on Computer-Aided De-
and the M.S. degree in computer engineering from Northwestern University, sign (ICCAD), and the Best Paper Award in the Design Automation Conference
Evanston, IL, in 2000. He is currently working toward the Ph.D. degree at the (DAC) for his work in the area of physical design. He has served on the tech-
Computer Science Department, University of California, Los Angeles, under nical program committee of numerous conferences in the area of VLSI Design
the guidance of Prof. M. Sarrafzadeh. and CAD. He has served as committee chair of a number of these conferences,
He has served as an Intern at Fujitsu Laboratories of America, Sunnyvale, including International Conference on CAD and International Symposium on
CA, during the summer of 1999. His primary research interests include delay Physical Design. He was the General Chair of the 1998 International Sympo-
and power optimization in logic circuits, reconfigurable computing, and power sium on Physical Design. He is an Associate Editor of IEEE TRANSACTIONS ON
issues in sensor networks. COMPUTER-AIDED DESIGN.

Вам также может понравиться