net/publication/4139548
All content following this page was uploaded by Lars Wanhammar on 30 May 2014.
Abstract

In several previous papers the area, delay, and power consumption of various carry-propagation adders have been compared. However, for high-throughput applications it may be necessary to introduce pipelining into the adder. The number of stages to be inserted and the width of the pipelining registers differ between adder structures. In this work we focus on the power consumption of adder structures when pipelining is used to increase the throughput. Four adder structures with varying wordlengths and pipeline levels are implemented using standard cells and their power consumption is compared. The results show that the Kogge-Stone parallel prefix adder gives the lowest power consumption for a given throughput most of the time.

1. Introduction

The operation of adding two numbers in a non-redundant representation to obtain another non-redundant number is a key operation in most DSP algorithms. As a non-redundant output is required, a carry signal must be propagated from the least significant bit of the input words to the most significant bit of the output word. This introduces a bound on the adder delay that increases with the wordlength.

Over the years numerous adder structures have been proposed, essentially to speed up the carry propagation. Different adder structures have been compared in terms of area, delay, and power consumption [1]–[3].

For high-throughput applications, these adder structures may not be enough to obtain the required throughput. Instead, pipelining must be introduced. This may be the case for, e.g., high-speed FIR filters based on carry-save arithmetic, or heavily pipelined multipliers, where a carry-propagation adder is used at the output of the filter (often referred to as a vector merging adder). In Fig. 1, a transposed direct form FIR filter using carry-save arithmetic is shown.

Figure 1. High throughput FIR filter utilizing carry-save arithmetic. Bold lines correspond to data in carry-save representation.

The aim of this work is to compare the power consumption of different adder structures implemented using standard cells, when pipelining is used to meet the throughput requirements. The number of pipelining stages will differ between adder structures for the same throughput requirements. Furthermore, the width of the pipelining registers will also be different. As the main application for this work is DSP, we consider all possible wordlengths, not only powers of two as in much of the previous work. This also has some effect, as the number of stages in most of the high-speed adder structures increases with every power of two input bits. Hence, generally, a 17-bit and a 32-bit adder will have the same logic depth for the high-speed adders.

In the next section, the considered adder structures are reviewed. In Section 3 the implementation is discussed. Then, in Section 4, the results are presented. Finally, in Section 5 some conclusions are drawn.

2. Adder Structures

In this work four different adder structures are considered: the ripple-carry adder, the conditional sum adder, and the Brent-Kung and Kogge-Stone parallel prefix adders.

Among the adder structures not considered is, e.g., the carry lookahead adder. This structure was left out as it was not possible to introduce pipelining in a regular way inside the lookahead blocks. There are also several more parallel prefix adder structures that could be considered, but Brent-Kung and Kogge-Stone were selected. Furthermore, carry-select and carry-increment adders are also left out, due to irregularity in the pipelining.

We also restrict ourselves to introducing pipelining into a given adder structure. An alternative would be to divide the addition into several shorter additions, where all shorter additions are performed using a certain adder structure. It would also be possible to introduce pipelining within the shorter additions.

To conclude, there are a multitude of different cases that could have been included, but we selected the cases that are, in our opinion, the most important.

2.1. Ripple-Carry Adder
The ripple-carry adder (RCA) is the most straightforward implementation of a carry-propagation adder. It is constructed from cascaded full adder gates, where each gate has inputs for the two bits to add and one incoming carry from the lower significance full adder. The outputs of the full adder gate are a sum bit and an outgoing carry to the next higher significance full adder. In Fig. 2 a seven bit ripple-carry adder is shown.

Figure 2. Ripple-carry adder with seven-bit inputs. Dashed lines indicate possible pipeline cuts.

Figure 3. Conditional sum adder with seven-bit inputs. Dashed lines indicate possible pipeline cuts.

Figure 4. Brent-Kung parallel prefix adder with seven-bit inputs. Dashed lines indicate possible pipeline cuts.

This work was supported by the Swedish Research Council.
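As a rough illustration of the ripple-carry structure of Fig. 2, the following Python sketch models the chain of full adders at the bit level. It is a behavioural model only, not the VHDL used in this work; the function names `full_adder` and `ripple_carry_add` and the default wordlength of seven bits are assumptions made for the example.

```python
def full_adder(x, y, c_in):
    """One full adder cell: returns (sum bit, outgoing carry)."""
    s = x ^ y ^ c_in
    c_out = (x & y) | (c_in & (x ^ y))
    return s, c_out


def ripple_carry_add(x, y, width=7, c_in=0):
    """Add two unsigned integers with a chain of full adders, LSB first.

    Returns (sum modulo 2**width, carry out of the most significant cell).
    """
    total = 0
    carry = c_in
    for i in range(width):
        # Each cell receives the carry produced by the cell below it.
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


if __name__ == "__main__":
    print(ripple_carry_add(90, 55))
```

With seven-bit inputs, `ripple_carry_add(90, 55)` gives `(17, 1)`, since 90 + 55 = 145 = 128 + 17. The loop makes the serial dependency explicit: cell i cannot finish before cell i-1 has produced its carry, which is why the delay grows with the wordlength.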
[Figure: PDP [µW/MHz] as a function of wordlength and number of pipeline stages, panels (a) and (b).]

As for the Brent-Kung adder, a parallel prefix computation is used for the Kogge-Stone (KS) adder [6]. However, for the Kogge-Stone adder the fanout for a dot operator is at each level limited to two. This leads to a smaller capacitive load, resulting in higher speed, as will be seen in the results section. This comes at a cost of an increased number of dot operators. The Kogge-Stone adder also has the smallest possible number of cascaded dot operators assuming only two-input dot operators are used. A seven bit Kogge-Stone adder is shown in Fig. 5.
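The parallel prefix computation described above can be sketched in Python as follows. This is a behavioural illustration under stated assumptions, not the generated VHDL: the names `dot`, `prefix_levels`, and `kogge_stone_add` are hypothetical, and the carry input is merged after the prefix tree for simplicity. Each level combines every (generate, propagate) pair with the pair a fixed distance below it, and the distance doubles from level to level.

```python
def dot(hi, lo):
    """Dot operator: merge the (G, P) pair of an upper group with a lower one."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def prefix_levels(width):
    """Number of cascaded dot-operator levels for the given wordlength."""
    levels, distance = 0, 1
    while distance < width:
        distance *= 2
        levels += 1
    return levels


def kogge_stone_add(x, y, width=7, c_in=0):
    """Return (sum modulo 2**width, carry out) using a Kogge-Stone prefix tree."""
    xb = [(x >> i) & 1 for i in range(width)]
    yb = [(y >> i) & 1 for i in range(width)]
    p = [a ^ b for a, b in zip(xb, yb)]            # bit propagate
    gp = [(a & b, a ^ b) for a, b in zip(xb, yb)]  # (generate, propagate) pairs
    distance = 1
    while distance < width:
        # All positions of one level are computed from the previous level.
        gp = [dot(gp[i], gp[i - distance]) if i >= distance else gp[i]
              for i in range(width)]
        distance *= 2
    # gp[i] now spans bits i..0; carries[i] is the carry into bit i.
    carries = [c_in] + [g | (pp & c_in) for g, pp in gp]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[width]


if __name__ == "__main__":
    print(kogge_stone_add(90, 55), prefix_levels(17), prefix_levels(32))
```

In this model `prefix_levels(17)` and `prefix_levels(32)` both evaluate to 5, while `prefix_levels(16)` is 4, which matches the observation that the logic depth of the high-speed adders steps up at every power of two.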
3. Implementation

A program was implemented that can generate synthesizable VHDL for any of the four adder types with an arbitrary wordlength and an arbitrary degree of pipelining. The VHDL netlist was then mapped to a 0.35 µm standard cell library using Leonardo Spectrum. Registers are included on all inputs and outputs. A clock tree was inserted to take the different number of flip-flops more

[Figure, continued: PDP [µW/MHz] as a function of wordlength and number of pipeline stages, panel (c).]
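To give a feeling for how the degree of pipelining affects the register count, the sketch below cuts a ripple-carry adder at chosen bit positions, in the spirit of the dashed lines in Fig. 2. The register-width model is an assumption made for this example and is not taken from the generator used in this work: at a cut after bit k, the register is taken to hold the k finished sum bits, one carry, and the 2(n - k) operand bits not yet consumed. The function name `pipelined_rca` and its `cuts` parameter are hypothetical.

```python
def pipelined_rca(x, y, width, cuts):
    """Evaluate a ripple-carry adder split into pipeline stages.

    cuts: sorted bit indices at which a pipeline register is placed,
          e.g. [3, 5] puts bits 0-2 in stage 0, bits 3-4 in stage 1,
          and the remaining bits in stage 2.
    Returns (sum modulo 2**width, carry out, list of register widths).
    """
    bounds = [0] + list(cuts) + [width]
    total, carry = 0, 0
    register_widths = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        # Full adders belonging to this pipeline stage.
        for i in range(lo, hi):
            a, b = (x >> i) & 1, (y >> i) & 1
            total |= (a ^ b ^ carry) << i
            carry = (a & b) | (carry & (a ^ b))
        if hi < width:
            # Assumed register contents at this cut: hi sum bits,
            # one carry, and both operands' remaining (width - hi) bits.
            register_widths.append(hi + 1 + 2 * (width - hi))
    return total, carry, register_widths


if __name__ == "__main__":
    print(pipelined_rca(90, 55, 7, [3, 5]))
```

For a seven-bit adder with cuts after bits 3 and 5, this model gives register widths of 12 and 10 bits. A different adder structure would need a different set of intermediate signals at each cut, which is one reason the register width, and hence the power consumption, differs between structures.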
Figure 7. Power consumption as a function of maximal clock frequency using different degrees of pipelining for ripple-carry (RCA), conditional sum (COSA), Brent-Kung (BK), and Kogge-Stone (KS) adders. Panels: 16-, 17-, 20-, 24-, 28-, 32-, 48-, and 64-bit adders. Horizontal axis: clock frequency [MHz]. Vertical axis: power consumption [mW].
namely, ripple-carry, conditional sum, Brent-Kung, and Kogge-Stone adders. Simulations showed that the Kogge-Stone adder generally has the lowest power consumption for a given throughput. The Brent-Kung adder gave similar results, but with a lower throughput.

6. References

[1] T. K. Callaway and E. E. Swartzlander Jr., “Estimating the power consumption of CMOS adders,” in Proc. IEEE Symp. Computer Arithmetic, 1993, pp. 210–216.

[2] C. Nagendra, M. J. Irwin, and R. M. Owens, “Area-time-power tradeoffs in parallel adders,” IEEE Trans. Circuits Syst.–II, vol. 43, no. 10, pp. 689–702, Oct. 1996.

[3] R. Zimmermann, Binary Adder Architectures for Cell-Based VLSI and their Synthesis, PhD Diss., Swiss Federal Institute of Technology (ETH) Zurich, Switzerland, 1998.

[4] J. Sklansky, “Conditional-sum addition logic,” IRE Trans. Electronic Computers, vol. 9, pp. 226–231, June 1960.

[5] R. P. Brent and H. T. Kung, “A regular layout for parallel adders,” IEEE Trans. Computers, vol. 31, pp. 260–264, March 1982.

[6] P. M. Kogge and H. S. Stone, “A parallel algorithm for the efficient solution of a general class of recurrence equations,” IEEE Trans. Computers, vol. 22, no. 8, pp. 786–793, Aug. 1973.

[7] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective, Prentice-Hall, 1996.