www.elsevier.com/locate/mejo
Abstract
This paper presents a highly area-efficient CMOS carry-select adder (CSA) with a regular, iterative structure of shared transistors that is very suitable for VLSI implementation. The adder is based on a static and compact multi-output carry-lookahead (CLA) circuit and a very simple select circuit. Comparisons with other representative 32-bit CSAs show that the proposed adder reduces the area by between 16 and 25%, the number of transistors by between 30 and 43%, and the dynamic power consumption by between 16 and 35%, while maintaining high speed.
© 2004 Elsevier Ltd. All rights reserved.
1. Introduction
Addition is one of the fundamental arithmetic operations. A number of fast adder architectures have been proposed in the long history of computer arithmetic [1–3] in pursuit of three basic characteristics: a regular structure, fast logic evaluation and a compact circuit layout. Table 1 shows the asymptotic time and area requirements of the most important types of adders. The carry-ripple adder (CRA) is the simplest approach. However, the carry-lookahead adder (CLA) and its fast version, the parallel-prefix CLA, are the schemes of choice for time-critical applications, at a considerable cost in silicon area and power dissipation. The CSA provides a compromise between a CRA and a CLA. Hybrid adders combine elements of different approaches to obtain adders with higher performance, reduced area and low power consumption. The CLA/CSA hybrid adder is the most popular for high-speed applications. Implementations of fast CLA/CSA hybrid adders based on Manchester carry-lookahead chains of fixed [5] or variable [6] length, mux-based CLA circuits [7] and Ling's carry [8] have been recently reported.
In the classical scheme, the CSA [4] is divided into k-bit blocks, each of which performs two additions in parallel (generally by means of two carry-ripple adders), one assuming a carry-in of 0 and the other a carry-in of 1, as shown in Fig. 1. When the carry-out of the preceding block (Cpblock) is finally known, the correct sum (which has been precomputed) is simply selected. Cpblock must drive many multiplexers, which is one of the factors that limit the size of the CSA blocks and the speed. One of the most typical applications of the CSA is as the final adder of parallel multipliers [9].
Different implementations of the CSA have been proposed. Tyagi [10] shows that the classical CSA can be replaced by a variable parallel-prefix block and a carry-selection circuit, reducing the area by about 23% and the delay by 7% in comparison with similar fast adders. The area-efficient CSA proposed in [11] uses an add-one circuit to replace one of the carry-ripple adders, resulting in 29.2% fewer transistors but with a speed loss of 5.9% for length n = 64. This CSA is improved in [12] with a substantial reduction in area. Other CSAs with pipeline structure [13], for self-timed applications [14,15] and for FPGA technology [16,17] have been proposed.
Recently, new methods to minimize the power-delay
product in CSAs have been presented in [18].
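To make the classical scheme of Fig. 1 concrete, the following Python sketch models it at the bit level. It is only a functional model (no timing or transistor detail), and the function names, the 32-bit width and the 4-bit block size are illustrative assumptions rather than details taken from any of the cited designs.

```python
from typing import List, Tuple


def ripple_add(a: List[int], b: List[int], cin: int) -> Tuple[List[int], int]:
    """Ripple-carry addition of two equal-length, LSB-first bit lists."""
    s: List[int] = []
    c = cin
    for ai, bi in zip(a, b):
        s.append(ai ^ bi ^ c)
        c = (ai & bi) | (c & (ai ^ bi))
    return s, c


def carry_select_add(x: int, y: int, n: int = 32, k: int = 4) -> int:
    """Classical CSA: each k-bit block precomputes its sum for carry-in 0 and
    for carry-in 1; the carry-out of the preceding block (Cpblock) selects one."""
    a = [(x >> i) & 1 for i in range(n)]
    b = [(y >> i) & 1 for i in range(n)]
    out: List[int] = []
    c_pblock = 0  # carry into the first block
    for lo in range(0, n, k):
        blk_a, blk_b = a[lo:lo + k], b[lo:lo + k]
        s0, c0 = ripple_add(blk_a, blk_b, 0)  # adder assuming carry-in 0
        s1, c1 = ripple_add(blk_a, blk_b, 1)  # adder assuming carry-in 1
        # 2:1 multiplexers driven by Cpblock choose the precomputed result
        out += s1 if c_pblock else s0
        c_pblock = c1 if c_pblock else c0
    # Assemble the n-bit sum plus the final carry-out at bit position n
    return sum(bit << i for i, bit in enumerate(out)) | (c_pblock << n)


print(carry_select_add(0xFFFFFFFF, 1) == 1 << 32)  # True
```

Both ripple adders in a block work before Cpblock arrives; only the final multiplexing waits for it, which is why Cpblock's fan-out to many multiplexers becomes the limiting factor mentioned above.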
This paper presents a highly area-efficient CSA based on a static and compact multi-output CLA. The select circuit is
* Corresponding author. Fax: +34 942 201402.
E-mail address: ruizrg@unican.es (G.A. Ruiz).
0026-2692/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.mejo.2004.09.002
940 G.A. Ruiz, M. Granda / Microelectronics Journal 35 (2004) 939–944
$$C_4^1 = G_4 + P_4G_3 + P_4P_3G_2 + P_4P_3P_2G_1 + P_4P_3P_2P_1 = C_4^0 + P_4P_3P_2P_1 \qquad (4)$$
The authors indicate that this proposed 1-Kb CSA has a 20% size advantage over a conventional 1-Kb CSA with the same critical path.
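Eq. (4) states that a 4-bit block's carry-out for carry-in 1 is its carry-out for carry-in 0 ORed with the group-propagate term P4P3P2P1. The Python sketch below checks this exhaustively for all operand pairs. It assumes Pi = Ai ⊕ Bi and Gi = AiBi, the definitions used with Eq. (13) later in the paper; the function name is illustrative.

```python
from itertools import product


def block_carries(a: int, b: int) -> tuple:
    """Carry-outs C4^0 and C4^1 of a 4-bit block, via the expansion of Eq. (4)."""
    A = [(a >> i) & 1 for i in range(4)]  # A1..A4 stored at indices 0..3
    B = [(b >> i) & 1 for i in range(4)]
    G1, G2, G3, G4 = (ai & bi for ai, bi in zip(A, B))  # generate G_i = A_i B_i
    P1, P2, P3, P4 = (ai ^ bi for ai, bi in zip(A, B))  # propagate P_i = A_i xor B_i
    c0 = G4 | (P4 & G3) | (P4 & P3 & G2) | (P4 & P3 & P2 & G1)  # carry-in 0
    c1 = c0 | (P4 & P3 & P2 & P1)  # Eq. (4): add the group-propagate term
    return c0, c1


# Exhaustive check against integer addition for all 16 x 16 operand pairs
ok = all(
    block_carries(a, b) == ((a + b) >> 4, (a + b + 1) >> 4)
    for a, b in product(range(16), repeat=2)
)
print(ok)  # True
```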
Fig. 2. Bit-slice selection circuits proposed in (a) [10], (b) [7], (c) [11] and (d) [12].
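The add-one selection of [11] and its refinement in [12], given as Eqs. (9)–(11) in the text that follows, together with the equivalence of Eq. (12), can be checked with a short Python sketch. It models only the logic functions, not the circuits of Fig. 2, and it assumes Pi = Ai ⊕ Bi as in Eq. (13); `add_one_select` and `check_block` are hypothetical names.

```python
from itertools import product
from typing import List


def add_one_select(s0: List[int], c_pblock: int) -> List[int]:
    """Eq. (11): S_i = S_i^0 XOR (SS^0_{i-1} AND Cpblock), bits LSB-first.

    SS^0_{i-1} is the running AND of S_1^0..S_{i-1}^0 (Eq. (10)); the empty
    product for i = 1 is 1, which gives S_1 = S_1^0 XOR Cpblock.
    """
    out: List[int] = []
    ss = 1
    for bit in s0:
        out.append(bit ^ (ss & c_pblock))
        ss &= bit
    return out


def bits(x: int, n: int) -> List[int]:
    return [(x >> i) & 1 for i in range(n)]


def check_block(n: int = 4) -> bool:
    """Exhaustively check the add-one selection and Eq. (12) for n-bit blocks."""
    mask = (1 << n) - 1
    for a, b in product(range(1 << n), repeat=2):
        s0 = bits((a + b) & mask, n)      # sum from the adder with carry-in 0
        s1 = bits((a + b + 1) & mask, n)  # sum the block must give for carry-in 1
        if add_one_select(s0, 0) != s0 or add_one_select(s0, 1) != s1:
            return False
        ss = pp = 1
        for si, pi in zip(s0, (x ^ y for x, y in zip(bits(a, n), bits(b, n)))):
            ss &= si  # SS_i^0
            pp &= pi  # PP_i = P_1 ... P_i
            if ss != pp:  # Eq. (12) would be violated
                return False
    return True


print(check_block(4))  # True
```

Eq. (12) holds because a run of sum bits S_1^0..S_i^0 that are all 1 can only occur when no carry was generated or killed in those positions, that is, when every Pj in the run is 1.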
The CSA of [11] is based on a carry-ripple adder with carry-in 0, which produces the sum bits $S_i^0$, plus additional logic such that
$$
S_i =
\begin{cases}
S_1^0 \oplus C_{pblock}, & i = 1\\
S_i^0\,\overline{C}_{pblock} + \left(S_i^0 \oplus SS_{i-1}^0\right) C_{pblock}, & i > 1
\end{cases}
\qquad (9)
$$
where
$$SS_i^0 = \prod_{j=1}^{i} S_j^0 \qquad (10)$$
This CSA is highly area-efficient: compared with the classical CSA, it requires 29.2% fewer transistors, at the cost of a 5.9% speed loss for length n = 64. Its select circuit (Fig. 2c) is improved in [12], resulting in the circuit shown in Fig. 2d, which implements the following expressions
$$
S_i =
\begin{cases}
S_1^0 \oplus C_{pblock}, & i = 1\\
S_i^0\,\overline{SS_{i-1}^0 C_{pblock}} + \overline{S_i^0}\,SS_{i-1}^0 C_{pblock} = S_i^0 \oplus \left(SS_{i-1}^0 C_{pblock}\right), & i > 1
\end{cases}
\qquad (11)
$$
This CSA reduces the number of transistors by 29% in comparison with [11], with negligible speed loss.
All of the above CSAs are functionally equivalent, fulfilling the relation
$$SS_i^0 = PP_i \qquad (12)$$
3. Compact multi-output CLA
Of all the fast binary adders available for VLSI implementation, by far the most popular are those based on the CLA, mainly because they reduce the carry delay by computing the carries of all stages in parallel. The carry-generation logic of these parallel adders uses fast and efficient structures developed from techniques in the domain of both logic structure and basic circuit principles, whichever are the most suitable for a given technology.
Table 2
Full adder carry-out
Ai   Bi   Ci     C̄i     Pi
0    0    0      1      0
0    1    Ci−1   C̄i−1   1
1    0    Ci−1   C̄i−1   1
1    1    1      0      0
Fig. 3. Static and compact implementation of carry-out: (a) cell for generation of Ci, (b) cell for generation of C̄i, and (c) 4-bit CLA based on cell (b).
In a carry-ripple adder of n bits, stage i uses three inputs to implement a 1-bit addition: two input data bits (Ai, Bi) and a carry input (Ci−1) from the previous stage. The speed of this adder depends to a large extent on the carry-propagation time through its stages. Table 2 shows the truth table of the full-adder carry-out and its complement. When Ai = Bi = 0 or Ai = Bi = 1, the carry-out (0 or 1, respectively) is generated at the ith stage, independently of the incoming carry. The Pi term indicates when the ith stage passes the incoming carry Ci−1 (C̄i−1) to the next higher stage. The carry-out and its complement [19] can be expressed as follows
$$
\begin{aligned}
C_i &= A_iB_i + P_iC_{i-1} = G_i + P_iC_{i-1}\\
\overline{C}_i &= \overline{A}_i\,\overline{B}_i + P_i\overline{C}_{i-1} = N_i + P_i\overline{C}_{i-1}
\end{aligned}
\qquad (13)
$$
where $G_i = A_iB_i$ and $N_i = \overline{A}_i\,\overline{B}_i$. Note that GiPi = 0, NiPi = 0 and NiGi = 0; that is, these signals are mutually exclusive. Fig. 3a and b show an efficient static CMOS implementation of the basic cells that produce the Ci and C̄i carry-outs. The cell of Fig. 3b is more suitable since it does not require the input data (Ai, Bi) to be complemented. Moreover, the noise-margin problem of both cells can be eliminated by means of output level-restoring inverters. This problem arises because the high level is lower than the supply voltage by the threshold voltage of the NMOS pass-transistors (Pi). These inverters also provide electrical isolation of the cell and increase its
fan-out. Fig. 3c shows the structure of a regular, simple, multi-output 4-bit CLA unit made up of several cells from Fig. 3b connected in cascade; the optimum number of cells may be determined for a given technology by simulation. To ensure full swing when a high level is transmitted via the Pi pass-transistors, the level-restoring inverter displayed in Fig. 3c (marked with a large dot) is required. This inverter has a weak feedback PMOS transistor, and its N- and P-transistors should be sized to balance the rising and falling output times.
4. Compact CSA based on CLA and comparisons
The area and power efficiency of the proposed CSA rests on the CLA of Fig. 3 and on the following expressions, derived from Eqs. (7) and (13):
$$\overline{C}_i = \overline{C}_i^{\,0} + PP_i\,\overline{C}_{pblock}, \qquad S_i = P_i \oplus C_{i-1} \qquad (14)$$
This CSA, as shown in Fig. 4, is basically made up of a 4-bit CLA circuit and a multi-output AND gate that generates the PPi signals. The select circuit is formed by NMOS pass-transistors and is far simpler than those shown in Fig. 2. In this circuit, when PPi = 1, the carry-in is transmitted directly and in parallel to all C̄j (j ≤ i) nodes. Otherwise, if PPi = 0, C̄i is generated in the CLA circuit (note that $\overline{C}_i^{\,0}\,PP_i = 0$). Because this adder uses NMOS pass-transistors, the level-restoring inverters are necessary.
Fig. 4. New static and compact 4-bit CSA. All transistor dimensions are in µm. L = 0.35 µm, except for the weak transistors, where L = 0.6 µm.
To compare the proposed adder with the most representative CSAs, described in [7,10,12], four 32-bit CSAs (those three and the one shown in Fig. 4) were designed in 4-bit groups using a standard 0.6 µm two-metal p-well CMOS technology. The electrical circuit was extracted from the layout, including a very precise extraction of parasitics, and simulated with HSPICE at VDD = 3.3 V and CL = 0.1 pF on each output. Table 3 lists the area, number of transistors, dynamic power consumption at 50 MHz and timing characteristics of these adders.
The proposed adder occupies 6608 µm² and reduces the area by between 16 and 25% in comparison with the other CSAs. More significant is the decrease in the number of transistors, which reaches 43% with respect to the adder proposed in [7] and 30% with respect to the others. These reductions in size and in number of transistors are achieved through the efficient and compact CLA structure and through the ease with which the carry-in can be propagated
in parallel using simple NMOS pass transistors. As a result, the dynamic power consumption is reduced by 16% with respect to [10], 19% with respect to [7] and 35% with respect to [12].
Table 3
Simulation results for 32-bit CSAs
In a CSA, the critical path is defined by the carry chain of the first block and by the selection circuit of each block. In the final block, the worst delay is in the selection circuit of the final sum bits. Fig. 5 shows the transient waveforms of the proposed 32-bit CSA for the worst-case delay path. C4 to C28 are the carry-out signals of the different blocks and S32 is the output sum. Therefore, the critical-path delay tp of this CSA, made up of N identical blocks, can be written as
$$t_p = t_i + (N - 2)\,t_g + t_e \qquad (15)$$
where ti is the time to create the carry signals out of the first block, tg is the delay of the carry selection circuit, and te is the delay of the sum selection circuit. These times, shown in Table 3 and highlighted in Fig. 5, show that the proposed adder has a tp similar to that of [10] and only slightly slower than that of [7], which is the fastest. Since the proposed adder has the lowest tg, the carry-out propagation through the blocks is minimal, which leads to a low tp even for large N.
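The block behaviour of Eqs. (13) and (14) and the delay model of Eq. (15) can be sketched together in Python. This is a functional model only, not a transistor-level or timing simulation. It assumes that $\overline{C}_i^{\,0}$ denotes the CLA chain output with no carry-in contribution (only the Nj terms), which is consistent with the note that $\overline{C}_i^{\,0}\,PP_i = 0$. The names `proposed_block`, `proposed_csa` and `critical_path` are hypothetical, and the delay values in the usage line are arbitrary units, not the measured data of Table 3.

```python
from typing import List, Tuple


def proposed_block(a: List[int], b: List[int], c_pblock: int) -> Tuple[List[int], int]:
    """One block of the proposed CSA, modelled functionally from Eqs. (13)-(14).

    a, b are LSB-first bit lists; returns (sum bits, block carry-out).
    Assumption: Cbar_i^0 is the CLA chain output with no carry-in term,
    i.e. only the N_j ("kill") terms, so that Cbar_i^0 * PP_i = 0.
    """
    p = [ai ^ bi for ai, bi in zip(a, b)]               # P_i
    nk = [(1 - ai) & (1 - bi) for ai, bi in zip(a, b)]  # N_i = ~A_i ~B_i
    cbar0, pp = 0, 1
    carries: List[int] = []
    for pi, ni in zip(p, nk):
        cbar0 = ni | (pi & cbar0)             # Eq. (13) chain, carry-in term dropped
        pp &= pi                              # multi-output AND gate: PP_i
        cbar = cbar0 | (pp & (1 - c_pblock))  # Eq. (14): Cbar_i = Cbar_i^0 + PP_i Cbar_pblock
        carries.append(1 - cbar)              # C_i
    c_prev = [c_pblock] + carries[:-1]        # C_{i-1} for each bit
    s = [pi ^ ci for pi, ci in zip(p, c_prev)]  # Eq. (14): S_i = P_i xor C_{i-1}
    return s, carries[-1]


def proposed_csa(x: int, y: int, n: int = 32, k: int = 4) -> int:
    """Chain the blocks; each block's carry-out is the next block's Cpblock."""
    a = [(x >> i) & 1 for i in range(n)]
    b = [(y >> i) & 1 for i in range(n)]
    out: List[int] = []
    c = 0
    for lo in range(0, n, k):
        s, c = proposed_block(a[lo:lo + k], b[lo:lo + k], c)
        out += s
    return sum(bit << i for i, bit in enumerate(out)) | (c << n)


def critical_path(t_i: float, t_g: float, t_e: float, n_blocks: int) -> float:
    """Eq. (15): t_p = t_i + (N - 2) t_g + t_e, for N >= 2 identical blocks."""
    if n_blocks < 2:
        raise ValueError("Eq. (15) assumes at least two blocks")
    return t_i + (n_blocks - 2) * t_g + t_e


print(proposed_csa(0x12345678, 0x9ABCDEF0) == 0x12345678 + 0x9ABCDEF0)  # True
print(critical_path(1.0, 0.5, 2.0, 8))  # 6.0 (arbitrary units)
```

With 4-bit groups, a 32-bit adder has N = 8 blocks, so Eq. (15) contains six tg terms; this is why a small tg keeps tp low as N grows.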