0-7803-7147-X/01/$10.00 ©2001 IEEE
3 Program Analysis

With the execution tracing tool Shade [18], we have analyzed run-time features of operations and operands in 32-bit Mediabench programs [19]. We have the following observations. First, about 70% of the total executed instructions involve addition steps. This is because additions/subtractions, load/store memory address calculation and branches all require addition operations. Moreover, most instruction executions have a PC incrementing step. In program djpeg, for example, the distribution of instruction types is: additions/subtractions 36.08%, mult/div 3.26%, shift and logic 21.02%, load/store 30.49%, branches 7.44%. Second, most arithmetic operations have precisions much smaller than the datapath hardware width. In djpeg, 86% of additions/subtractions have a precision of 20 bits or less, and 57% have a precision of 13 bits or less, as shown in Fig. 1. Third, the precision difference between two operands is significant. In djpeg, the average precision difference between the two operands of an addition/subtraction is 7 bits, while the difference between the two operands in data memory address calculation is 13 bits. Fourth, the SP of each bit is less than 0.5 in most cases, and the TR is often not equal to $2 \times SP \times (1 - SP)$ because of the correlation. These observations have motivated this work.

Figure 1: Precision distribution of Add/Sub.

4 Signal Gating for Symmetric Adders

[...] the general case. In our design, input data of an adder are stored in two registers. Upon each clock rising edge, new data are loaded into the registers and the addition works on the loaded data. Signal gating is applied to both input registers and combinational addition logic. In many cases, signal gating with one gating boundary cannot fully utilize dynamic data precisions. Adders with multiple gating boundaries may be designed for more energy saving. Here we only discuss signal gating with one boundary.

4.1 Signal Gating with One Boundary

Suppose the upper G-bit portion of the adder is identified as the candidate to be gated. The general structure of a signal-gated adder is illustrated in Fig. 2. IDL is identity detection logic and GCTL is gating control logic, as shown in Fig. 3. The behavior of signal gating is as follows. The first step is to detect $|E_x|$ and $|E_y|$, the sign-extension widths of X and Y. Two leading bits from the lower (N - G) part are involved in testing. This is necessary because we need the result of lower-part computation to be in a correct form without overflow. The OR signal of the upper (G + 2) bits in each operand indicates if the upper G bits can be viewed as extension bits of '0'. The NAND signal of the upper (G + 2) bits indicates if the upper G bits can be viewed as extension bits of '1'. To protect the clock signal from glitches and ensure correct timing, g is latched to be g' before controlling the clock and is registered to be g'' before controlling the combinational circuit. If $|E_x| > G$ and $|E_y| > G$, the G-bit portion is gated by disabling the clock of input registers and blocking the carry signal. The adder works as a short-precision adder and the result is then restored to the full width. Otherwise, the adder works as a normal full-width adder.

[Figure 2: signal-gated adder; operand labels X: G bits, Y: G bits, X: (N-G) bits, Y: (N-G) bits]

Figure 3: Identical Detection and Gating Control Logic. (a) IDL (b) GCTL

4.2 Overhead Analysis

To justify the signal gating technique, the energy overhead should not exceed the power reductions.
Moreover, the area and delay overhead need to be considered. We assume tree structures of 2-input gates are used to implement IDL. To implement a w-input NAND, a tree structure will have $\log_2 w$ levels and $(w - 1)$ gates. For the structure in Fig. 2, we estimate the area overhead as

$$A_{oh} = (4G + 8) \cdot A_{nand2} + Q \cdot A_{latch} + G \cdot A_{mux21}$$

[Figure: Power Distribution in 32-bit Adder/Subtractor]
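The detection rule of Section 4.1 and the tree-cost estimate above can be illustrated with a small behavioural model. This is only a sketch in Python, not the authors' hardware description: the function names (`to_signed`, `is_extension`, `gated_add`, `tree_cost`), the default widths N = 32 and G = 16, and the use of a ceiling for tree depths when w is not a power of two are all assumptions introduced here for illustration.

```python
def to_signed(value, n):
    """Interpret the low n bits of value as an n-bit two's-complement integer."""
    value &= (1 << n) - 1
    return value - (1 << n) if value >> (n - 1) else value


def is_extension(value, n, g):
    """Identity detection for one operand (hypothetical helper).

    True when the upper g+2 bits of the n-bit word are all '0'
    (their OR is 0) or all '1' (their NAND is 0).
    """
    top = (value & ((1 << n) - 1)) >> (n - g - 2)
    return top == 0 or top == (1 << (g + 2)) - 1


def gated_add(x, y, n=32, g=16):
    """Return (sum mod 2**n, gated?) for two n-bit two's-complement words."""
    mask = (1 << n) - 1
    x &= mask
    y &= mask
    if not (is_extension(x, n, g) and is_extension(y, n, g)):
        return (x + y) & mask, False  # normal full-width addition
    low = n - g
    low_mask = (1 << low) - 1
    # Short-precision addition; the carry out of the lower part is blocked.
    low_sum = ((x & low_mask) + (y & low_mask)) & low_mask
    # Restore the result to full width by sign extension.
    return to_signed(low_sum, low) & mask, True


def tree_cost(w):
    """(levels, 2-input gates) of a balanced tree over w inputs.

    Uses ceil(log2 w) levels (an assumption for w not a power of two)
    and w - 1 gates.
    """
    return (w - 1).bit_length(), w - 1


if __name__ == "__main__":
    print(gated_add(5, -3))          # both operands are short: gated
    print(gated_add(0x12345678, 1))  # wide operand: full-width addition
    print(tree_cost(16 + 2))         # one detection tree for G = 16
```

Because the test covers G + 2 bits rather than G, each operand fits in N - G - 1 bits, so the (N - G)-bit partial sum cannot overflow and sign extension alone restores the full-width result.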
X: s d d d d d d d d d | d d d d d d
Y: s s s s s s s s s s | d d d d d d
(Y: Sign Portion)

Table 1: Power comparison (nW/MHz)

Schemes    baseline    G11      G18     2G
           195.7       140.7    99.2    109.2
           30.6        46.2     58.9    57.6
           120.8       90.51    81.1    76.0
[...] to reduce the number of bits to be processed, which is out of our scope here. Based on the simulation data in Table 1, each unit with dual signal gating is assumed to consume 33% less power on average when gated. Table 2 lists the power reduction percentage in pipeline blocks of two gating schemes compared to the baseline DLX pipeline with no gating.

Schemes    PC+1     BTA      EX       MEM      WB       REG
W          0        0        9.17     69.5     9.17     26.33
W+P        44.53    25.00    39.42    79.65    39.42    37.77

Only those units with power changing are listed in the table. W is the whole-unit gating scheme. W+P is the combination of whole-unit gating and portion-unit gating. It can be seen that W+P achieves another 10-45% reduction compared to W. With respect to the gating overhead, there is little cost in the W scheme because there is no precision detection and the gating control signals are also pipeline control signals. The W+P scheme has precision detection in both ALU and MEM outputs. The total overhead may offset the benefit in the EX stage, judged from Table 1. Considering that ALU and MEM are gated when idle, the overhead would be much less.

7 Conclusions

Signal gating schemes for low-power adder design have been studied in this paper. The program analysis indicates that there is a large number of short-precision additions and the precision difference between two operands is large. Based on the precision features, signal gating is developed for two types of adders: symmetric adders and asymmetric adders. The effect of signal gating is studied by treating a signal-gated adder as a separate unit as well as incorporating signal gating into a RISC pipeline. Experimental results indicate 10-45% power saving in the pipeline units compared to previous work.

References

[1] T.K. Callaway and E.E. Swartzlander, Jr., “Estimating the power consumption of CMOS adders,” in Proc. IEEE 11th Symp. Computer Arithmetic, pp.210-216, 1993.
[2] C. Nagendra, M.J. Irwin, and R.M. Owens, “Power-delay characteristics of CMOS adders,” IEEE Trans. VLSI Systems, vol.2, no.3, Sept. 1994.
[3] C. Nagendra, M.J. Irwin, and R.M. Owens, “Area-time-power tradeoffs in parallel adders,” IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, vol.43, no.10, Oct. 1996.
[4] R.A. Freking and K.K. Parhi, “Theoretical estimation of power consumption in binary adders,” in Proc. IEEE Int. Symp. Circuits and Systems (ISCAS’98), vol.2, pp.453-457, 1998.
[5] M.D. Ercegovac and T. Lang, “Reducing transition counts in arithmetic circuits,” in IEEE Symp. Low Power Electronics, pp.64-65, 1994.
[6] M.D. Ercegovac and T. Lang, “Low-power accumulator (correlator),” in IEEE Symp. Low Power Electronics, pp.30-31, Oct. 1995.
[7] C.A. Fabian and M.D. Ercegovac, “Input synchronization in low power CMOS arithmetic circuit design,” in Proc. 30th Asilomar Conf. Signals, Systems and Computers, pp.172-176, Nov. 1996.
[8] Y. Wang and K.K. Parhi, “New low power adders based on new representations of carry signals,” in Proc. 35th Asilomar Conf. Signals, Systems and Computers, Nov. 2000.
[9] B. Bishop, T.P. Kelliher, and M.J. Irwin, “A detailed analysis of MediaBench,” in IEEE Workshop on Signal Processing Systems (SiPS’99), pp.448-455, 1999.
[10] D. Brooks and M. Martonosi, “Value-based clock gating and operation packing: dynamic strategies for improving processor power and performance,” ACM Trans. Computer Systems, vol.18, no.2, pp.89-126, May 2000.
[11] M. Stephenson, J. Babb, and S. Amarasinghe, “Bitwidth analysis with application to silicon compilation,” ACM SIGPLAN Notices, vol.35, no.5, pp.108-20, May 2000.
[12] R. Canal, A. Gonzalez, and J.E. Smith, “Very low power pipelines using significance compression,” in Proc. 33rd Annual IEEE/ACM Int. Symp. on Microarchitecture, pp.181-190, 2000.
[13] M.K. Gowan, L.L. Biro, and D.B. Jackson, “Power considerations in the design of the Alpha 21264 microprocessor,” in Proc. 35th Design and Automation Conf., pp.726-731, 1998.
[14] W. Ye and M.J. Irwin, “Power analysis of gated pipeline registers,” in 12th Annual IEEE Int. ASIC/SOC Conf., pp.281-285, 1999.
[15] M. Bhardwaj, R. Min, and A. Chandrakasan, “Power-aware systems,” in Proc. 35th Asilomar Conf. Signals, Systems and Computers, vol.2, pp.1695-1701, Nov. 2000.
[16] J. Choi, J. Jeon, and K. Choi, “Power minimization of functional units by partially guarded computation,” in Proc. Int. Symp. Low Power Electronics and Design, pp.131-136, Jul. 2000.
[17] J.L. Hennessy and D.A. Patterson, Computer Architecture: A Quantitative Approach, 2nd Edition, Morgan Kaufmann Publishers, Inc., 1996.
[18] Sun Microsystems, Shade User’s Manual, 1993.
[19] C. Lee, M. Potkonjak, and W.H. Mangione-Smith, “MediaBench: a tool for evaluating and synthesizing multimedia and communications systems,” in Proc. 30th Annual IEEE/ACM Int. Symp. Microarchitecture, pp.330-335, Dec. 1997.