
Microelectronics Journal 35 (2004) 939–944

www.elsevier.com/locate/mejo

An area-efficient static CMOS carry-select adder based on a compact carry look-ahead unit
G.A. Ruiz*, M. Granda
Depto. de Electrónica y Computadores, Facultad de Ciencias, Universidad de Cantabria, Avda. de Los Castros s/n, 39005 Santander, Spain
Received 13 January 2004; received in revised form 23 August 2004; accepted 2 September 2004

Abstract
This paper presents a highly area-efficient CMOS carry-select adder (CSA) with a regular and iterative-shared transistor structure that is well suited to implementation in VLSI. The adder is based on a static and compact multi-output carry look-ahead (CLA) circuit and a very simple select circuit. Comparisons with other representative 32-bit CSAs show that the proposed adder reduces the area by between 16 and 25%, the number of transistors by between 30 and 43%, and the dynamic power consumption by between 16 and 35%, while maintaining a high speed.
© 2004 Elsevier Ltd. All rights reserved.

Keywords: Addition; Carry-select adder; Carry-look ahead; Computer arithmetic circuits

* Corresponding author. Fax: +34 942 201402. E-mail address: ruizrg@unican.es (G.A. Ruiz).
0026-2692/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.mejo.2004.09.002

1. Introduction

Addition is one of the fundamental arithmetic operations. A number of fast adder architectures have been proposed in the long history of computer arithmetic [1–3] in pursuit of three basic characteristics: a regular structure, fast logic evaluation and a compact circuit layout. Table 1 shows the asymptotic time and area requirements for the most important types of adders. The carry-ripple adder (CRA) is the simplest approach. However, the carry-lookahead adder (CLA) and its fast version, the parallel-prefix CLA, are the schemes of choice for time-critical applications, at a considerable cost in terms of silicon area and power dissipation. The CSA provides a compromise between a CRA and a CLA. Hybrid adders combine elements of different approaches to obtain adders with higher performance, reduced area and low power consumption. The CLA/CSA hybrid adder is the most popular for high-speed applications. Implementations of fast CLA/CSA hybrid adders based on Manchester carry-lookahead chains of fixed [5] or variable [6] length, mux-based CLA circuits [7] and Ling's carry [8] have been recently reported.

In the classical scheme, the CSA [4] is divided into k-bit blocks, where each one performs (generally by means of two ripple-carry adders) two additions in parallel, one assuming carry-in 0 and the other assuming carry-in 1, as shown in Fig. 1. When the carry-out of the preceding block ($C_{pblock}$) is finally known, the correct sum (which has been precomputed) is simply selected. $C_{pblock}$ must drive many multiplexers, which is one of the factors that limits the size of the CSA blocks and the speed. One of the most typical applications of CSAs is their use in the final adder of parallel multipliers [9].

Different implementations of the CSA have been proposed. Tyagi [10] shows that the classical CSA can be replaced by a variable parallel prefix block and a carry selection circuit, reducing the area by about 23% and the delay by 7% in comparison with similar fast adders. The area-efficient CSA proposed in [11] is based on an add-one circuit that replaces one carry-ripple adder, resulting in 29.2% fewer transistors but with a speed loss of 5.9% for length $n = 64$. This CSA is improved in [12] with a substantial reduction in area. Other CSAs with pipeline structure [13], for self-timed applications [14,15] and for FPGA technology [16,17] have been proposed. Recently, new methods to minimize the power-delay product in CSAs have been presented in [18].

This paper presents a highly area-efficient CSA based on a static and compact multi-output CLA. The select circuit is made up of NMOS pass-transistors with a simpler structure than that proposed in other CSAs. Hence, this adder has a regular and iterative-shared transistor structure that is well suited to implementation in VLSI. Comparison with similar 32-bit CSAs shows a significant reduction in area and power, while maintaining the same speed.

Table 1
Asymptotic time and area requirements for different types of adders

Adder                  Time              Area
CRA                    $O(n)$            $O(n)$
CLA                    $O(\log n)$       $O(n \log n)$
Parallel-prefix CLA    $O(2 \log n)$     $O(2n \log n)$
CSA                    $O(\sqrt{n})$     $O(n)$

2. Circuit selection for CSAs

Let $A_i$ and $B_i$ be the $i$th bits of the input data and $C_{i-1}$ the carry-in for stage $i$. The usual method for computing the carry-out $C_i$ and sum $S_i$ in an adder is

$$C_i = G_i + P_i C_{i-1}, \qquad S_i = P_i \oplus C_{i-1} \tag{1}$$

where

$$P_i = A_i \oplus B_i, \qquad G_i = A_i B_i \tag{2}$$

are the carry propagate signal and the carry generate signal, respectively. For one block of the classical CSA, the following equations can be defined

$$S_i = S_i^0\,\overline{C}_{pblock} + S_i^1\,C_{pblock}, \qquad C_i = C_i^0 + C_i^1\,C_{pblock} \tag{3}$$

where $C_{pblock}$ is the carry-out of the preceding block, $S_i^0$ is the sum output and $C_i^0$ is the carry output of the adder for carry-in 0, and $S_i^1$ and $C_i^1$ are the corresponding outputs for carry-in 1. For example, for $i = 4$, we get

$$\begin{aligned}
C_4^0 &= G_4 + P_4 G_3 + P_4 P_3 G_2 + P_4 P_3 P_2 G_1 \\
C_4^1 &= G_4 + P_4 G_3 + P_4 P_3 G_2 + P_4 P_3 P_2 G_1 + P_4 P_3 P_2 P_1 \\
      &= C_4^0 + P_4 P_3 P_2 P_1
\end{aligned} \tag{4}$$

From Eq. (4), it can be deduced that

$$C_i = C_i^0 + PP_i\,C_{pblock} = C_i^0 + C_i^1\,C_{pblock} \tag{5}$$

$C_i^0\,PP_i$ being 0 since $P_i G_i = 0$, and where

$$PP_i = \prod_{j=1}^{i} P_j \tag{6}$$

Different CSAs with more area-efficient structures or with an improved critical-path delay have been proposed. These CSAs can be roughly classified according to the type of select circuit in the CSA: those based on carry selection [7,10], whose main objective is high speed, and those based on adder selection [11,12], where the main objective is to reduce area.

The CSA proposed by Tyagi [10] reduces the area by about 23% and the delay by 7% in comparison with other competitive adders. For this purpose, it uses a variable parallel prefix block to generate the $C_{i-1}^0$ terms and the bit-slice selection circuit of Fig. 2a. The sum is obtained by combining the following equations

$$C_{i-1} = C_{i-1}^0 + PP_{i-1}\,C_{pblock}, \qquad S_i = P_i \oplus C_{i-1} \tag{7}$$

In Ref. [7] it is demonstrated that $G_i + P_i = G_i \oplus P_i$, so that the following relation can be established: $S_i^1 = S_i^0 \oplus PP_{i-1}$. The $C_{i-1}^0$ are obtained from a mux-based carry look-ahead circuit, and the select circuit in Fig. 2b allows the sum $S_i^0 = P_i \oplus C_{i-1}^0$ to be generated and, from it, $S_i^1$. The final sum is defined as

$$\begin{cases}
S_1 = P_1 \oplus C_{pblock} \\
S_i = S_i^0\,\overline{C}_{pblock} + (S_i^0 \oplus PP_{i-1})\,C_{pblock} & \text{for } i > 1
\end{cases} \tag{8}$$

The authors indicate that this proposed 1Kb CSA has a 20% size advantage over a 1Kb conventional CSA with the same critical path.

Fig. 1. Classical carry select adder divided in k-bit blocks.
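The classical scheme of Fig. 1 can be sketched behaviourally as follows. This is only an illustrative model, not the authors' circuit: the function names `ripple_add` and `carry_select_add`, the LSB-first bit lists and the default sizes (32 bits, 4-bit blocks) are assumptions made for the example. Each block is evaluated twice with ripple-carry logic per Eqs. (1) and (2), and the carry-out of the preceding block then selects between the two precomputed results, as in Eq. (3).

```python
def ripple_add(a_bits, b_bits, carry_in):
    """Ripple-carry addition on LSB-first bit lists, following Eqs. (1)-(2)."""
    sums, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        p, g = a ^ b, a & b          # propagate and generate, Eq. (2)
        sums.append(p ^ carry)       # S_i = P_i xor C_{i-1}
        carry = g | (p & carry)      # C_i = G_i + P_i C_{i-1}
    return sums, carry


def carry_select_add(a, b, n=32, k=4, carry_in=0):
    """Classical carry-select addition of two n-bit integers in k-bit blocks.

    Hypothetical helper for illustration; returns (sum mod 2**n, carry-out).
    """
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    result, c_pblock = 0, carry_in
    for lo in range(0, n, k):
        blk_a, blk_b = a_bits[lo:lo + k], b_bits[lo:lo + k]
        s0, c0 = ripple_add(blk_a, blk_b, 0)   # precomputed for carry-in 0
        s1, c1 = ripple_add(blk_a, blk_b, 1)   # precomputed for carry-in 1
        # Eq. (3): the carry-out of the preceding block selects the result.
        chosen = s1 if c_pblock else s0
        for i, bit in enumerate(chosen):
            result |= bit << (lo + i)
        c_pblock = c1 if c_pblock else c0
    return result, c_pblock
```

For instance, `carry_select_add(0xFFFFFFFF, 1)` wraps around to a zero sum with a carry-out of 1. The model says nothing about delay or area; it only mirrors the select-after-precompute structure.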



Fig. 2. Bit-slice selection circuits proposed in (a) [10], (b) [7], (c) [11] and (d) [12].
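The select relations of Eqs. (5), (6) and (8), which underlie the carry-selection circuits of Fig. 2a and b, can be checked exhaustively for a small block. The sketch below is an assumption-laden illustration: `block_signals` and `check_select_relations` are hypothetical names, and the block is modelled at the logic level only, using just the carry-in-0 results plus the $PP_i$ products to rebuild the sum and carry for either value of $C_{pblock}$.

```python
from itertools import product


def block_signals(a_bits, b_bits, carry_in):
    """Per-bit sums and carries of a ripple block (LSB first), Eq. (1)."""
    sums, carries, carry = [], [], carry_in
    for a, b in zip(a_bits, b_bits):
        p = a ^ b
        sums.append(p ^ carry)
        carry = (a & b) | (p & carry)
        carries.append(carry)
    return sums, carries


def check_select_relations(k=4):
    """Exhaustively compare Eqs. (5) and (8) against a direct addition."""
    for bits in product((0, 1), repeat=2 * k + 1):
        a_bits, b_bits, c_pblock = bits[:k], bits[k:2 * k], bits[2 * k]
        s0, c0 = block_signals(a_bits, b_bits, 0)            # carry-in 0 results
        ref_s, ref_c = block_signals(a_bits, b_bits, c_pblock)
        pp = 1                                               # PP_0 (empty product)
        for i in range(k):
            # Eq. (8): select S0_i, or S0_i xor PP_{i-1} when C_pblock = 1.
            selected = (s0[i] ^ pp) if c_pblock else s0[i]
            if selected != ref_s[i]:
                return False
            pp &= a_bits[i] ^ b_bits[i]                      # PP_i, Eq. (6)
            # Eq. (5): C_i = C0_i + PP_i C_pblock
            if (c0[i] | (pp & c_pblock)) != ref_c[i]:
                return False
    return True
```

Calling `check_select_relations(4)` walks through all 512 input combinations of a 4-bit block. Treating $PP_0$ as 1 is a convenience of the sketch that lets the first bit, $S_1 = P_1 \oplus C_{pblock}$, share the same code path as the others.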

The CSA of [11] is based on a ripple carry adder with carry-in 0 to obtain $S_i^0$ and additional logic so that

$$\begin{cases}
S_1 = S_1^0 \oplus C_{pblock} \\
S_i = S_i^0\,\overline{C}_{pblock} + (S_i^0 \oplus SS_{i-1}^0)\,C_{pblock} & \text{for } i > 1
\end{cases} \tag{9}$$

where

$$SS_i^0 = \prod_{j=1}^{i} S_j^0 \tag{10}$$

This CSA is highly area-efficient, resulting in 29.2% fewer transistors but with a speed loss of 5.9% for length $n = 64$ in comparison with the classical CSA. Its select circuit (Fig. 2c) is improved in [12], resulting in the circuit shown in Fig. 2d, which implements the following expressions

$$\begin{cases}
S_1 = S_1^0 \oplus C_{pblock} \\
S_i = S_i^0\,\overline{(SS_{i-1}^0\,C_{pblock})} + \overline{S_i^0}\,(SS_{i-1}^0\,C_{pblock}) = S_i^0 \oplus (SS_{i-1}^0\,C_{pblock}) & \text{for } i > 1
\end{cases} \tag{11}$$

This CSA reduces the number of transistors by 29% in comparison with [11] with negligible speed loss.

All of the above CSAs are functionally equivalent, fulfilling the following relation

$$SS_i^0 = PP_i \tag{12}$$

3. Compact multi-output CLA

Of all the choices of fast binary adders available for implementation in VLSI, by far the most popular are adders based on CLA, mainly because they improve the carry delay by calculating the carries of each stage in parallel. The carry generation logic of these parallel adders uses fast and efficient structures developed from techniques, both in the domain of logic structure and in that of basic circuit principles, whichever are the most suitable for a given technology.

In a carry-ripple adder of n bits, stage $i$ uses three inputs to implement a 1-bit addition: two input data bits ($A_i$, $B_i$) and a carry input ($C_{i-1}$) from the previous stage. The speed of this adder depends to a large extent on the carry propagation time through its stages. Table 2 shows the truth table of the full adder carry-out and its complement. When $A_i = B_i = 0$ or $A_i = B_i = 1$, the carry-out is generated at the $i$th stage. The $P_i$ term indicates when the $i$th stage will pass the incoming carry $C_{i-1}$ ($\overline{C}_{i-1}$) to the next higher stage. The carry-out and its complement [19] can be expressed as follows

$$\begin{aligned}
C_i &= A_i B_i + P_i C_{i-1} = G_i + P_i C_{i-1} \\
\overline{C}_i &= \overline{A}_i\,\overline{B}_i + P_i \overline{C}_{i-1} = N_i + P_i \overline{C}_{i-1}
\end{aligned} \tag{13}$$

where $G_i = A_i B_i$ and $N_i = \overline{A}_i\,\overline{B}_i$. Note that $G_i P_i = 0$, $N_i P_i = 0$ and $N_i G_i = 0$, that is, these signals are mutually exclusive. Fig. 3a and b show an efficient implementation in static CMOS logic of the basic cells that enable the $C_i$ and $\overline{C}_i$ carry-outs to be obtained. The cell of Fig. 3b is more suitable since it is not necessary to complement the input data ($A_i$, $B_i$). Moreover, the noise margin problem presented by both cells can be eliminated by means of output restoring inverters. This problem is due to the fact that the high level is lower than the supply voltage level by the threshold voltage of the NMOS pass-transistors ($P_i$). These inverters also provide electrical isolation from the cell and increase its fan-out. Fig. 3c shows the structure of a regular, simple and multi-output 4-bit CLA unit made up of several cells from Fig. 3b connected in cascade; the optimum number of cells may be calculated for a given technology by simulation. To ensure full swing when a high level is transmitted via the $P_i$ pass-transistors, the level restoring inverter displayed in Fig. 3c (marked with a large dot) is required. This inverter has a weak feedback PMOS transistor, and the N- and P-transistors should be sized to balance the rising and falling output times.

Table 2
Full adder carry-out

$A_i$   $B_i$   $C_i$       $\overline{C}_i$       $P_i$
0       0       0           1                      0
0       1       $C_{i-1}$   $\overline{C}_{i-1}$   1
1       0       $C_{i-1}$   $\overline{C}_{i-1}$   1
1       1       1           0                      0

Fig. 3. Static and compact implementation of carry-out: (a) cell for generation of $C_i$, (b) cell for generation of $\overline{C}_i$, and (c) 4-bit CLA based on cell (b).

4. Compact CSA based on CLA and comparisons

The area and power efficiency of the proposed CSA is based on the CLA of Fig. 3 and the following expressions derived from Eqs. (7) and (13)

$$\overline{C}_i = \overline{C}_i^{\,0} + PP_i\,\overline{C}_{pblock}, \qquad S_i = \overline{P_i \oplus \overline{C}_{i-1}} \tag{14}$$

Here $\overline{C}_i^{\,0}$ is the complemented carry produced inside the block by the $N_j$ terms of Eq. (13), so that $\overline{C}_i^{\,0}\,PP_i = 0$, and the sum is equivalent to $P_i \oplus C_{i-1}$ in Eq. (1).

This CSA, as shown in Fig. 4, is basically made up of a 4-bit CLA circuit and a multi-output AND gate which generates the $PP_i$ signals. The select circuit is formed by NMOS pass-transistors and is far simpler than those shown in Fig. 2. In this circuit, when $PP_i = 1$, the carry-in is directly transmitted in parallel to all $\overline{C}_j$ ($j \le i$) nodes. Otherwise, if $PP_i = 0$, then $\overline{C}_i$ is generated in the CLA circuit (note that $\overline{C}_i^{\,0}\,PP_i = 0$). This adder uses NMOS pass-transistors and thus the level restoring inverters are necessary.

Fig. 4. New static and compact 4-bit CSA. All dimensions of transistors are in µm. L = 0.35 µm, except for weak transistors, where L = 0.6 µm.

To compare the most representative CSAs described in [7,10,12] with the one shown in Fig. 4, four 32-b CSAs implemented in 4-bit groups were designed using a standard 0.6 µm two-metal p-well CMOS technology. The electrical circuit was extracted from the layout, including very precise extraction of parasitics, and simulated with HSPICE at $V_{DD} = 3.3$ V and $C_L = 0.1$ pF for each output. Table 3 lists the area, number of transistors, dynamic power consumption at 50 MHz and timing characteristics of these adders.

The proposed adder occupies 6608 µm² and reduces the area by between 16 and 25% in comparison with the other CSAs. More significant is the decrease in the number of transistors, which reaches 43% with respect to that proposed in [7] and 30% with respect to the others. These reductions in size and in number of transistors are achieved through the efficient and compact CLA structure and through the ease with which the carry-in can be propagated in parallel using simple NMOS pass transistors. As a result, the dynamic power consumption is reduced by 16% with respect to [10], 19% with respect to [7] and 35% with respect to [12].

In a CSA the critical path is defined by the carry chain of the first block and by the selection circuit of each block. In the final block the worst delay is in the selection circuit of the final sum bits. Fig. 5 shows the transient waveforms of the proposed 32-b CSA for the worst-case delay path. $C_4$ to $C_{28}$ are the carry-out signals of the different blocks and $S_{32}$ is the output sum. Therefore, for a CSA made up of $N$ identical blocks, the critical-path delay $t_p$ can be written as

$$t_p = t_i + (N - 2)\,t_g + t_e \tag{15}$$

where $t_i$ is the time to create the carry signals out of the first block, $t_g$ is the delay of the carry selection circuit, and $t_e$ is the delay of the sum selection circuit. These times, shown in Table 3 and highlighted in Fig. 5, demonstrate that the proposed adder presents a $t_p$ similar to that of [10] and slightly longer than that of [7], which is the fastest. Since it has the lowest $t_g$, the critical path of carry-out propagation through the blocks is minimal, which will lead to a low $t_p$ even for high $N$.

Table 3
Simulation results for 32-bit CSAs

Adder      Area (µm²)   No. trans.   Power at 50 MHz (mW)   tp (ns)   ti (ns)   tg (ns)   te (ns)
[10]       7690         169          5.05                   5.65      1.2       0.61      0.79
[7]        8277         210          5.25                   5.23      1.28      0.53      0.77
[12]       7746         173          6.57                   7.33      1.1       0.89      0.89
Proposed   6608         120          4.52                   5.6       1.77      0.5       0.83
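As a quick consistency check, Eq. (15) can be evaluated with the timing columns of Table 3. The sketch below assumes $N = 8$ blocks (32 bits in 4-bit groups, as stated in the text); the dictionary `TABLE_3` and the function `critical_path_delay` are names introduced here only for illustration.

```python
# Timing components from Table 3, in ns: (ti, tg, te, reported tp).
TABLE_3 = {
    "[10]": (1.20, 0.61, 0.79, 5.65),
    "[7]": (1.28, 0.53, 0.77, 5.23),
    "[12]": (1.10, 0.89, 0.89, 7.33),
    "Proposed": (1.77, 0.50, 0.83, 5.60),
}


def critical_path_delay(ti, tg, te, n_blocks):
    """Eq. (15): tp = ti + (N - 2) tg + te for N identical blocks."""
    return ti + (n_blocks - 2) * tg + te


if __name__ == "__main__":
    for name, (ti, tg, te, reported) in TABLE_3.items():
        # 32 bits in 4-bit groups gives N = 8 blocks.
        tp = critical_path_delay(ti, tg, te, n_blocks=8)
        print(f"{name:>8}: model {tp:.2f} ns, Table 3 {reported:.2f} ns")
```

The same function can be called with other values of `n_blocks` to explore longer adders, but that extrapolation assumes the per-block times stay unchanged, which the simulations reported here do not establish.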
Fig. 5. Transients of 32-b CSA.
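The logic of Eqs. (13) and (14) can also be sketched at the behavioural level. This is a hedged illustration rather than a model of the transistor circuit in Fig. 4: the function name `proposed_block` is hypothetical, and the sketch assumes that the complemented carry with superscript 0 in Eq. (14) is the complemented carry produced inside the block when the complemented carry-in is 0, which is the reading under which its product with $PP_i$ is always 0.

```python
def proposed_block(a_bits, b_bits, c_pblock):
    """Behavioural sketch of one block following Eqs. (13) and (14).

    Inputs are LSB-first bit sequences; returns (sum bits, carry-out).
    """
    nc_pblock = 1 - c_pblock            # complemented carry-in of the block
    nc0 = 0                             # complemented carry from the N_j terms
    pp = 1                              # PP_0 (empty product)
    nc_prev = nc_pblock                 # complemented carry into the current bit
    sums = []
    for a, b in zip(a_bits, b_bits):
        p = a ^ b                       # P_i
        n = (1 - a) & (1 - b)           # N_i = not(A_i) and not(B_i)
        sums.append(1 - (p ^ nc_prev))  # S_i = P_i xor C_{i-1}, via the complement
        nc0 = n | (p & nc0)             # Eq. (13) with complemented carry-in 0
        pp &= p                         # PP_i, Eq. (6)
        nc_prev = nc0 | (pp & nc_pblock)    # Eq. (14)
    return sums, 1 - nc_prev
```

When every $P_j$ up to bit $i$ is 1, `pp` stays at 1 and the complemented block carry-in passes straight through, which corresponds to the parallel transmission through the pass-transistors described in the text. Otherwise the locally generated term `nc0` determines the carry.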

5. Conclusions

The CSA presented in this paper is made up of a compact CLA and a very simple selection circuit in order to obtain a highly area-efficient CMOS circuit. The CLA has a compact, static, regular and multi-output structure with a low number of transistors. The select circuit is based on NMOS pass-transistors and is far simpler than those proposed in other CSAs (Fig. 2). Comparisons with other representative 32-bit CSAs show a large reduction in area, number of transistors and dynamic power, while maintaining a low critical-path delay.

References

[1] K. Hwang, Computer Arithmetic: Principles, Architecture, and Design, Wiley, New York/ChiChester/Brisbane/Toronto/Singapore, 1979.
[2] B. Parhami, Computer Arithmetic, Algorithms and Hardware, Oxford University Press, New York, Oxford, 2000.
[3] M.D. Ercegovac, T. Lang, Digital Arithmetic, Morgan Kaufmann Publishers, San Francisco, CA, 2004.
[4] N.H.E. Weste, K.E. Eshraghian, Principles of CMOS VLSI Design: a Systems Perspective, Addison Wesley, NY, 1992.
[5] T. Lynch, E.E. Swartzlander, A spanning tree carry lookahead adder, IEEE Transactions on Computers 41 (8) (1992) 931–939.
[6] V. Kantabruta, A recursive carry-lookahead/carry-select hybrid adder, IEEE Transactions on Computers 42 (12) (1993) 1495–1499.
[7] H. Morinaka, H. Makino, Y. Nakase, H. Suzuki, K. Mashiko, A 64 bit carry lookahead CMOS adder using Modified Carry Select, Proceedings of the IEEE Custom Integrated Circuits Conference, New York (USA), 1995 pp. 585–588.
[8] Y. Wang, C. Pai, X. Song, The design of hybrid carry-lookahead/carry-select adders, IEEE Transactions on Circuit and Systems-II: Analog and Digital Signal Processing 49 (1) (2000) 16–24.
[9] V.G. Oklobdzija, D. Villeger, S.S. Liu, A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach, IEEE Transactions on Computers 45 (3) (1996) 294–306.
[10] A. Tyagi, A reduced-area scheme for carry-select adders, IEEE Transactions on Computers 42 (10) (1993) 1163–1170.
[11] T.Y. Chang, M.J. Hsiao, Carry-select adder using single ripple-carry adder, Electronics Letters 34 (22) (1998) 2101–2103.
[12] Y. Kim, L.S. Kim, 64-bit carry-select adder with reduced area, Electronics Letters 37 (10) (2001) 614–615.
[13] Y. Kim, K.H. Sung, L.S. Kim, 1.67 GHz 32-bit pipelined carry-select adder using complementary scheme, IEEE International Symposium on Circuits and Systems, Piscataway (USA), 2002 pp. I-461–464.
[14] P. Corsonello, S. Perri, G. Cocorullo, Hybrid carry-select statistical carry look-ahead adder, Electronics Letters 35 (7) (1999) 549–551.
[15] A. de Gloria, M. Oliveri, Completion-detecting carry select addition, IEEE Proceedings on Computer and Digital Techniques 147 (2) (2000) 93–100.
[16] R. Hshermian, An algorithm and design procedure for high speed carry select adders using FPGA technology, Proceedings of 37th Midwest Symposium on Circuits and Systems, New York (USA), 1994 pp. 257–260.
[17] R. Hshermian, A new design for high speed and high-density carry select adders, Proceedings of 43rd IEEE Midwest Symposium on Circuits and Systems, Lansing, MI, 2000 pp. 1300–1303.
[18] A. Nève, H. Schettler, T. Ludwig, D. Flandre, Power-delay product minimization in high-performance 64-bit carry-select adders, IEEE Transactions on VLSI 12 (3) (2004) 235–243.
[19] G.A. Ruiz, Evaluation of three 32-bit CMOS adders in DCVS logic for self-timed circuits, IEEE Journal of Solid-State Circuits 33 (4) (1998) 604–613.
