This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2015.2441711, IEEE Transactions on Computers
IEEE TRANSACTIONS ON COMPUTERS
A Modified Partial Product Generator for Redundant Binary Multipliers
Xiaoping Cui, Weiqiang Liu, Senior Member, IEEE, Xin Chen, Earl E. Swartzlander, Jr., Life Fellow, IEEE and Fabrizio Lombardi, Fellow, IEEE
Abstract—Due to its high modularity and carryfree addition, a redundant binary (RB) representation can be used when designing high performance multipliers. The conventional RB multiplier re quires an additional RB partial product (RBPP) row, because an errorcorrecting word (ECW) is generated by both the radix4 Mod ified Booth encoding (MBE) and the RB encoding. This incurs in an additional RBPP accumulation stage for the MBE multiplier. In this paper, a new RB modified partial product generator (RBMPPG) is proposed; it removes the extra ECW and hence, it saves one RBPP accumulation stage. Therefore, the proposed RBMPPG generates fewer partial product rows than a conventional RB MBE multiplier. Simulation results show that the proposed RBMPPG based designs significantly improve the area and power consumption when the word length of each operand in the multiplier is at least 32 bits; these reductions over previous NB multiplier designs incur in a modest delay increase (approximately 5%). The powerdelay prod uct can be reduced by up to 59% using the proposed RB multipliers when compared with existing RB multipliers. Index Terms—Redundant binary, modified Booth encoding, RB partial product generator, RB multiplier.
—————————— _{} ——————————
1
INTRODUCTION
D igital multipliers are widely used in arithmetic
units of microprocessors, multimedia and digital signal processors. Many algorithms and architectures have been proposed to design highspeed and low power multipliers [113]. A normal binary (NB) multipli cation by digital circuits includes three steps. In the first step, partial products are generated; in the second step, all partial products are added by a partial product re duction tree until two partial product rows remain. In the third step, the two partial product rows are added by a fast carry propagation adder. Two methods have been used to perform the second step for the partial product
reduction. A first method uses 42 compressors, while a second method uses redundant binary (RB) numbers [5 6]. Both methods allow the partial product reduction tree to be reduced at a rate of 2:1. The redundant binary number representation has been introduced by Avizienis [1] to perform signeddigit arithmetic; the RB number has the capability to be
————————————————
• X. Cui, W. Liu and X. Chen are with the College of Electronic Informa tion and Engineering, Nanjing University of Aeronautics and Astro nautics, Nanjing, Jiangsu, China, 210016. Email: {wnhcxp, liuwei qiang, xin_chen}@nuaa.edu.cn.
• E. E. Swartzlander, Jr. is with Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX, 78712. E mail: eswartzla@aol.com
• F. Lombardi is with the Department of Electrical and Comput er Engi neering, Northeastern University, Boston, MA 02115. Email: lombar di@ece.neu.edu.
xxxxxxxx/0x/$xx.00 © 201x IEEE
represented in different ways. Fast multipliers can be designed using redundant binary addition trees [23]. The redundant binary representation has also been ap plied to a floatingpoint processor and implemented in VLSI [4]. High performance RB multipliers have become popular due to the advantageous features, such as high modularity and carryfree addition [59].
A RB multiplier consists of a RB partial product (RBPP) generator, a RBPP reduction tree and a RBNB converter.
A Radix4 Booth encoding or a modified Booth encoding
(MBE) is usually used in the partial product generator of
parallel multipliers to reduce the number of partial product rows by half [56] [1013]. A RBPP row can be obtained from two adjacent NB partial product rows by inverting one of the pair rows [56]; an Nbit convention al RB MBE (CRBBE2) multiplier requires /4 RBPP rows. An additional errorcorrecting word (ECW) is also required by both the RB and the Booth encoding [56] [14]; therefore, the number of RBPP accumulation stages (NRBPPAS) required by a poweroftwo wordlength (i.e., 2 ^{} bit) multiplier is given by:
NRBPPAS = log _{} ( /4 + 1) ,
= n − 1 , if = 2 ^{}
(1)
If the additional ECW can be removed, an RBPP accu
mulation stage is saved, so resulting in improvements of complexity and critical path delay for a RB multiplier. For example, a conventional 32bit RB multiplier has 4 RBPP accumulation stages; if the ECW is removed, then the number of RBPP accumulation stages is reduced to 3, i.e., the stage count is decreased by 25%. Note that the
problem of extra ECW does not exist in standard signifi cand size (i.e., 24×24bit and 54×54bit) RB multipliers as used in floating pointarithmetic units [56]. Alternatively, a highradix Booth encoding technique can reduce the number of partial products. However, the number of expensive hard multiples (i.e., a multiple that
is not a power of two and the operation cannot be per
formed by simple shifting and/or complementation) in creases too [1416]. Besli et al. [16] noticed that some hard multiples can be obtained by the differences of two sim
ple poweroftwo multiplies. A new radix16 Booth en coding (RBBE4) technique without ECW has been pro posed in [14]; it avoids the issue of hard multiples. A radix16 RB Booth encoder can be used to overcome the hard multiple problem and avoid the extra ECW, but at
the cost of doubling the number of RBPP rows. There fore, the number of radix16 RBPP rows is the same as in the radix4 MBE. However, the RBPP generator based on
a radix16 Booth encoding has a complex circuit struc ture and a lower speed compared with the MBE partial
product generator [10] when requiring the same number
of partial products. This paper focuses on the RBPP generator for design
Published by the IEEE Computer Society
00189340 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2015.2441711, IEEE Transactions on Computers
2
ing a 2 ^{} bit RB multiplier with fewer partial product rows by eliminating the extra ECW. A new RB modified partial product generator based on MBE (RBMPPG2) is proposed. In the proposed RBMPPG2, the ECW of each row is moved to its next neighbor row. Furthermore, the extra ECW generated by the last partial product row is combined with both the two most significant bits (MSBs) of the first partial product row and the two least signifi cant bits (LSBs) of the last partial product row by logic simplification. Therefore, the proposed method reduces the number of RBPP rows from /4 + 1 to /4 , i.e., a RBPP accumulation stage is saved. The proposed me
thod is applied to 8×8bit, 16×16bit, 32×32bit, and
64×64bit RB multiplier designs; the designs are synthe sized using the NanGate 45nm Open Cell Library. The proposed designs achieve significant reductions in area and power consumption compared with existing multip liers when the word length of each of the operands is at least 32 bits. While a modest increase in delay is encoun tered (approximately 5%), the powerdelay product (PDP) at word lengths of at least 32 bits confirms that the proposed designs are the best also by this figure of merit. This paper is organized as follows. Section 2 intro duces radix4 Booth encoding. The design of the conven tional RBPP generator is also reviewed. Section 3 presents the proposed RBMPPG. This section also de monstrates the adoption of the proposed RBMPPG into various wordlength RB multipliers. Section 4 provides the evaluation results of the new RB multipliers using the proposed RBMPPG for different word lengths and compares them to previous best designs found in the technical literature. The conclusion is provided in Sec tion 5.
2 REVIEW OF BOOTH ENCODING AND RB PARTIAL PRODUCT GENERATOR
2.1 Radix4 Booth Encoding
Booth encoding has been proposed to facilitate the
multiplication of two's complement binary numbers [17]. It was revised as modified Booth encoding (MBE) or ra dix4 Booth encoding [18]. The MBE scheme is summa rized in Table I, where = _{}_{}_{} _{}_{}_{} … _{} _{} _{} stands
for the
for the multiplier. The multiplier bits are grouped in sets
of three adjacent bits. The two side bits are overlapped with neighboring groups except the first multiplier bits group in which it is {b1, b0, 0}. Each group is decoded by selecting the partial product shown in Table I, where 2A indicates twice the multiplicand, which can be obtained by left shifting. Negation operation is achieved by in verting each bit of A and adding ‘1’ (defined as correc tion bit) to the LSB [1013]. Methods have been proposed to solve the problem of correction bits for NB radix4
multiplicand, and = _{}_{}_{} _{}_{}_{} … _{} _{} _{} stands
IEEE TRANSACTIONS ON COMPUTERS
Booth encoding (NBBE2) multipliers. However, this problem has not been solved for RB MBE multipliers.
2.2 RB Partial Product Generator
As two bits are used to represent one RB digit, then a RBPP is generated from two NB partial products [16]. The addition of two Nbit NB partial products X and Y using two’s complement representation can be expressed as follows [6]:
+ = − − 1
=
− _{} 2 ^{} + _{} 2 ^{} − − _{} 2 ^{} + _{} 2 ^{} − 1
= − _{} − _{} 2 ^{} + ( _{} − _{} ) 2 ^{} − 1
= , − 1
(2)
where is the inverse of , and the same convention is used in the rest of the paper. The composite number
, can be interpreted as a RB number. The RBPP is generated by inverting one of the two NB partial prod ucts and adding 1 to the LSB. Each RB digit _{} belongs
to the set 1, 0, 1 ; this is coded by two bits as the pair
( _{} ^{} , _{} ^{} ). Note that 1 = −1 . RB numbers can be coded in several ways. Table II shows one specific RB encoding [6], where the RB digit is obtained by performing _{} ^{} − _{} ^{} .
TABLE I
MBE SCHEME
b2i+1, b2i, b2i1 
Operation 
000 
0 
001 
+A 
010 
+A 
011 
+2A 
100 
2A 
101 
A 
110 
A 
111 
0 
TABLE II
RB ENCODING USED IN THIS WORK [6]


RB digit ( _{} ) 
0 
0 
0 
0 
1 
1 
1 
0 
1 
1 
1 
0 
Both MBE and RB coding schemes introduce errors and two correction terms are required: 1) when the NB number is converted to a RB format, 1 must be added to the LSB of the RB number; 2) when the multiplicand is multiplied by 1 or 2 during the Booth encoding, the number is inverted and +1 must be added to the LSB of the partial product. A single ECW can compensate errors from both the RB encoding and the radix4 Booth recod ing. The conventional partial product architecture of an
00189340 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2015.2441711, IEEE Transactions on Computers
X. CUI ET AL: A NEW MODIFIED PARTIAL PRODUCT GENERATOR FOR HIGHSPEED REDUNDANT BINARY MULTIPLIERS
8bit MBE multiplier [56] is shown in Fig. 1, where b_p represents the bit position, _{}_{} or _{}_{} and is generated by using an encoder and decoder (Fig. 2) [10]. An Nbit CRBBE2 multiplier includes N/4 RBPP rows and one ECW; the ECW takes the form as follows:
= _{} _{} _{/} _{} _{} _{} 0 _{} _{} _{/} _{} _{} _{} … 0 _{}_{} 0 _{}_{} … 0 _{}_{} 0 _{}_{} , (3) where i represents the i ^{t}^{h} row of the RBPPs, _{}_{} ∈ 0,1
and _{}_{} ∈ 0, 1 . In _{}_{} , a 1 correction term is always re quired by RB coding. If _{}_{} also corrects the errors from the MBE recoding, then the correction term cancels out to 0. That is to say that if the multiplicand digit is in verted and added to 1, then _{}_{} is 0, otherwise _{}_{} is 1. The errorcorrecting digit _{}_{} is determined only by the Booth encoding:
_{} _{=} _{} 0, no negative encoding 1, negative encoding
^{}
(4)
Fig. 1 Conventional RBPP architecture for an 8bit MBE multiplier.
As shown in Fig. 1 the first RBPP row, i.e. PP _{} , consists of the first partial product row PP _{} ^{} and the second par
and _{} ^{} =
are the sign extension
tial product row PP _{} ^{} i.e., _{} ^{} = _{}_{}
^{} ^{}
…
^{} ^{}
…
,
where, _{}_{}
and _{}_{}
bits, so
^{}
^{}
=
= _{} _{} ∙ 0 + _{} _{} ∙ _{} + _{} _{} ∙ _{} + _{} _{} ∙ _{}
= _{} _{} ∙ _{} + _{} _{}
(5)
(6)
According to Eq. (2), the sign extension bit _{}_{} is also
the inverse of _{}_{} . The _{}_{} in PP _{} ^{} and the _{}_{} in PP _{} ^{} are
also negated as _{}_{} and _{}_{} . Eq. (5) and Eq. (6) are further used in the next section when presenting the proposed modified RBPP generator. For a 2 ^{} bit CRBBE2 multiplier, one additional RBPP accumulation stage is required due to the ECW. For a 64 bit RB multiplier, there are 5 RBPP accumulation stages; therefore, the number of RBPP accumulation stages can be reduced by 20% when eliminating the ECW in a 64bit RB multiplier, which improves both the complexity and
the critical path delay.
3 PROPOSED RB PARTIAL PRODUCT GENERATOR
A new RB modified partial product generator based on MBE (RBMPPG2) is presented in this section; in this design, ECW is eliminated by incorporating it into both the two MSBs of the first partial product row ( _{} ^{} ) and the two LSBs of the last partial product row ( _{(} _{} _{/} _{} _{)} ).
3
3.1 Proposed RBMPPG2
Fig. 3 illustrates the proposed RBMPPG2 scheme for
an 8x8bit multiplier. It is different from the scheme in Fig. 1, where all the errorcorrecting terms are in the last row. ECW1 is generated by PP _{} and expressed as
ECW _{} = 0 E _{}_{} 0 F _{}_{} .
The ECW2 generated by PP _{} (also defined as an extra
(7)
ECW) is left as the last row and it is expressed as:
ECW _{} = 0 _{}_{} 0 _{}_{} _{.}
(8)
To eliminate a RBPP accumulation stage, ECW 2 needs to be incorporated into PP _{} and PP _{} . As discussed in Sec tion 2.2 for _{}_{} and as per Table I, F20 is determined by _{} _{} _{} as follows:
_{} _{=} _{} −1, _{} _{} _{} = 000,001,010,011, or 111
^{}
0,
_{} _{} _{} = 100,101, or 110
(9)
As per Table I, when
_{} _{} _{} = 111 , −0 = 0 can be
used. Therefore, _{}_{} can be expressed as follows:
_{}_{} =
−1,
0,
_{} _{} _{} = 000,001,010, or _{} _{} _{} = 100,101, 110, or
b p
_ 15 14 13 12 11 10
9
8
7
6
5
4
3
2
1
011
_{1}_{1}_{1} ^{}
0
(10)
PP
1
+
PP
1
−
+
PP
2
PP
−
2
_
b p
p
+
29
p
−
27
p
p
+
28
−
26
p
+
27
p
−
25
p
p
+
19
p
+
18
p
+
17
p
+
16
p
+
15
p
p
+
14
p
+
13
p
+
12
p
+
11
p
+
10
−
17
p
−
16
p
−
15
p
−
14
p
−
13
p
−
12
p
p
−
11
p
−
10
+
26
−
24
ECW
(a)
15 14 13 12 11 10
9
8
7
6
5
4
3
2
1
0
PP
1
+
PP
1
−
+ p
PP
2
+
29
PP
− −
2
p
27
p
p
+
28
−
26
p
p
+
27
−
25
p
p
+
26
−
24
p
+
19
−
17
p
p
+
25
p
−
23
p
+
18
p
−
16
p
+
24
p
−
22
+
17
p
−
15
p
p
+
23
p
−
21
(b)
p
p
p
+
16
−
14
+
22
p
−
20
+
15
p
p
+
14
p
+
13
p
−
13
p
−
12
p
−
11
p
+
21
q
−
2( − 1)
p
+
20
0
−
2( − 2)
q
p
+
12
p
−
10
p
+
11
p
+
10
_
b p
15 14 13 12 11 10
2( − 1)
4
3
2
1
0
PP
1
+
PP
1
−
+ p
PP
2
+
29
PP
− −
2
p
27
p
p
+
28
−
26
p
p
+
27
−
25
p
+
14
p
−
12
p
+
20
q
p
+
13
p
−
11
0
−
2( − 2)
p
+
12
p
−
10
p
+
11
p
+
10
(c)
Fig. 3 (a) The first new RBMPPG2 architecture for an 8bit MBE
multiplier; (b) The further revised RBMPPG2 architecture by re
; (c) The final proposed
RBMPPG2 architecture by totally eliminating ECW2 and further
placing E22 and F20 with E2, _{} _{(} _{}_{} _{)}
^{}
^{,} ^{a}^{n}^{d} ^{} ( )
00189340 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2015.2441711, IEEE Transactions on Computers
4
combing E2 into _{}_{}
, ,
,
and _{}_{} .
By setting PP _{} ^{} to all ones and adding +1 to the LSB of the partial product, F20 can then be determined only by _{} as follows:
_{} _{=} _{} −1, _{} = 0 0, _{} = 1
^{}
(11)
A modified radix4 Booth encoding and a decoding circuit for the partial product _{} ^{} are proposed here (Fig. 4); an extra 3input OR gate is then added to the design of [10] (Fig. 2). The three inputs of the additional OR gate
are _{} , _{} ,
is clear that
_{} _{} _{} = 000, _{}_{} = 1 , and _{} ^{} is set to all ones. So, E22 and F20 in ECW2 are now determined by _{} _{} _{} without _{} , _{} . Although the complexity is slightly increased compared with the previous design (Fig. 2), the delay stage remains the same.
are used to
represent the modified partial products (i.e., replacing
to
^{p}
represent the additional partial products that are deter
and _{} . When _{} _{} _{} = 111 ,
it
In
this work, Q
and p _{}_{}
,
Q
,
Q
^{)}^{.} ^{q} ( )
,
and Q
, p , p
,
and q ( )
are
used
mined by F _{}_{} . As 1 can be coded as 111 in RB format,
E22 and F20 can be represented by E2, _{} _{(} _{}_{} _{)} (b)) as follows:
, (Fig. 3
^{}
^{,} ^{} ( )
_{} =
^{} ^{,}
^{}
= 0
_{=} _{−}_{1} ^{}
0, _{}_{} = 0
_{}_{} − 1, _{}_{}
As per Eq. (11) and Eq. (13), _{} _{(} _{}_{} _{)}
^{} ( ) ^{=} ^{} ( )
=
1, _{}_{} _{=} _{−}_{1} ^{}
, and ( )
(12)
^{(}^{1}^{3}^{)}
can also
be expressed as follows:
^{} ( ) ^{=} ^{} ( )
= b
(14)
This is further explained by the truth table of E22, F20
and E2, _{} _{(} _{}_{} _{)}
E2 and _{} ∈ 0, 1, 1 ; E2 can be incorporated into the mod
(Table III). Now ECW2 only includes
^{,} ^{} ( )
ified partial products _{}_{}
in the shortest path (Fig. 3(c)). From
the truth table, E2 can be determined by _{} _{} _{} as fol
lows:
and _{}_{} by replacing
^{} ^{,} ^{}
,
^{}
,
and _{}_{}
,
_{} = −1, 1,
_{} _{} _{} = 000, or 010 _{} _{} _{} = 101 ^{}
0, _{} _{} _{} = 001,011,100, 110, or 111
(15)
So the following three cases can be distinguished: 1)
and _{}_{} remain unchanged as:
When
. 3) When _{} = −1 , a 1
When E2=0, _{}_{}
^{}
, ,
=
, = , =
and _{}_{}
=
.
2)
_{} = 1 , a 1 is added to _{}_{}
^{} ^{} ^{} ^{}
is subtracted from _{}_{}
^{} ^{} ^{} ^{}
.
IEEE TRANSACTIONS ON COMPUTERS
The relationships between _{}_{}
,
and _{}_{} ,
are summarized in Table IV. As the two
^{} ^{,} ^{}
,
,
^{}
,
MSBs of _{} ^{} i.e., _{}_{} and _{}_{} take complementary values as shown in Eq. (4), the operations of adding or subtract
ing a 1 will never incur in an overflow. Therefore, as per
,
Eq. (15) and Table IV, the logic functions of _{}_{} and _{}_{} can be expressed as follows:
^{} ^{,} ^{}
,
^{} = ( _{} ⊕ _{} + _{} _{} _{} ) ⋅ _{}_{} + _{} _{} ⋅ ( _{}_{}
^{}
+ +
⊕
^{}
) + _{} _{} _{} ⋅ ( _{}_{}
^{} ^{}
⊕
)
(16)
^{}
= ( _{} ⊕ _{} + _{} _{} _{} ) ⋅ _{}_{} + _{} _{} ⋅ ( _{}_{}

+ ⊕ )


+ _{} _{} _{} ⋅ ( _{}_{} ^{} 

⊕
) 
(17) 

^{} 
= ( _{} ⊕ _{} + _{} _{} _{} ) ⋅ _{}_{} + _{} _{} ⋅ _{}_{}

⊕


^{} 
+ _{} _{} _{} ⋅ _{}_{}
⊕
= ( _{} ⊕ _{} + _{} _{} _{} ) ⋅ _{}_{} + _{} _{} ⋅ _{}_{}

(18) 

+ _{} _{} _{} ⋅ _{}_{}

(19) 

TABLE III − − 
− 
− 

TRUTH TABLE OF E2, _{2}_{(}_{−}_{2}_{)} ^{,} ^{} 2(−1) 
AND _{2}_{1} 
^{,} ^{} 20 
. 






_{} _{} _{} 
E22F20 
_{} _{} _{(} _{} _{} _{)} ^{} ( ) 
^{} 
^{} 

000 
0 
1 
111 
0 
0 

001 
0 
0 
000 
_{} _{} 
_{} 
_{} 

010 
0 
1 
111 
_{} 

_{} 

011 
0 
0 
000 
_{} _{} 
_{0} 

100 
_{1} 
_{1} 
011 

1 

101 
1 
0 
_{1}_{0}_{0} 
_{} _{} 


110 
_{1} 
_{1} 
011 
_{} 

111 
0 
0 
_{0}_{0}_{0} 
0 
0 
TABLE IV
THE TRUTH TABLE OF _{}_{}
, , ,
^{} ^{} ^{} ^{}
^{} ^{} ^{} ^{}
^{} ^{} ^{} ^{}
^{} ^{} ^{} ^{}
(E2=0) 
(E2=1) 
(E2=－1) 

0100 
0100 
0101 
0011 
0101 
0101 
0110 
0100 
0110 
0110 
0111 
0101 
0111 
0111 
1000 
0110 
1000 
1000 
1001 
0111 
1001 
1001 
1010 
1000 
1010 
1010 
1011 
1001 
1011 
1011 
1100 
1010 
The delay of the RBMPPG2 can be further reduced
directly from the multip
licand A and the multiplier B. The relationships between
by generating _{}_{}
,
, ,
^{}
,
and A, B have been discussed in Section 2.2 as Eq.
(5) and Eq. (6). The relationships between _{}_{}
B are also shown in Table III according to the MBE
, pressed as follows by replacing _{}_{}
scheme. Therefore, Q
, and _{}_{} with
ex
and A,
,
, Q
, Q
and Q
,
can be
^{} ^{,} ^{}
the multiplicand bits (ai) and the multiplier bits (bi) after
00189340 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2015.2441711, IEEE Transactions on Computers
X. CUI ET AL: A NEW MODIFIED PARTIAL PRODUCT GENERATOR FOR HIGHSPEED REDUNDANT BINARY MULTIPLIERS
^{}
^{}
^{}
= _{} _{} _{} + _{} _{} ⋅ _{} ⋅ _{} + _{} _{} + _{} _{} + _{} _{} _{} _{} _{}
+ _{} _{} _{} + _{} _{} ⋅ [ _{} ⋅ _{} + _{} _{} + _{} _{} +
_{} ∙ _{} _{} _{} _{} ]
(21)
= _{} ⋅ ( _{} ⋅ _{} _{} _{} + _{} _{} _{} + _{} ⋅ _{} _{} + _{} _{} ) +
_{} ⋅ ( _{} ⋅ _{} _{} + _{} _{} + _{} ⋅ _{} + _{} _{} + _{} _{} ) (22)
(23)
= _{} _{} + _{} _{}
The circuit diagrams of the modified partial product
variables _{}_{}
,
and _{}_{} are shown in Fig. 5. It is clear
that _{}_{} has the longest delay path. It is well known that the inverter, the 2input NAND gate and the transmis
sion gate (TG) are faster than other gates. So, it is desira ble to use TGs when designing the multiplexer [56]. As shown in Fig. 5 (a), the critical path delay (the dash line) consists of a 1stage ANDORInverter gate, a 1stage inverter, and 2stage TGs. Therefore, RBMPPG2 just increases the TG delay by 1stage compared with the MBE partial product of Fig. 2. The above discussion is only an example; the above technique can be applied to design any 2 ^{} bit RB mul tipliers. It eliminates the extra ECWN/4 and saves one RBPP accumulation stage, i.e., three XOR gate delays, while only slightly increasing the delay of the partial product generation stage. In general, an Nbit RB multip lier has /4 RBPP rows using the proposed RBMPPG2.
The partial product variables _{} _{(} _{}_{}_{} _{)}
and
Q
needs additional 3input OR gates (Fig. 4). Therefore, the extra ECWN/4 is removed by the transformation of 4 par
tial product variables Q
partial product row is saved in RB multipliers with any poweroftwo wordlength.
and one
. The radix4 Booth decoding of a PPR ( PP _{} _{/} _{} )
and
^{}
^{,} ^{} ^{,} ^{} ( / )
, Q
/ )
(
,
^{} ( / )
( / )
can be replaced by Q
( )
, Q , Q
, Q
( )
/ )
(
, Q
/ )
(
3.2 Design of RBMPPG2based HighSpeed RB Multipliers
The proposed RBMPPG2 can be applied to any 2 ^{}  bit RB multipliers with a reduction of a RBPP accumula tion stage compared with conventional designs. Al though the delay of RMPPG2 increases by 1stage of TG delay, the delay of one RBPP accumulation stage is sig nificantly larger than a 1stage TG delay. Therefore, the delay of the entire multiplier is reduced. The improved complexity, delay and power consumption are very at tractive for the proposed design. A 32bit RB MBE multiplier using the proposed RBPP generator is shown in Fig. 6. The multiplier consists of the proposed RBMPPG2, three RBPP accumulation stages, and one RBNB converter. Eight RBBE2 blocks generate the RBPP ( _{} ^{} , _{} ^{} ); they are summed up by the RBPP reduction tree that has three RBPP accumulation stages. Each RBPP accumulation block contains RB full adders (RBFAs) and half adders (RBHAs) [7]. The 64bit
5
RBNB converter converts the final accumulation results into the NB representation, which uses a hybrid parallel prefix/carry select adder [25] (as one of the most efficient
fast parallel adder designs).
(b)
Fig. 5 The circuit diagram of the modified partial product vari
ables: (a) Q
and Q
, (b) Q
.
There are 4 stages in a conventional 32bit RB MBE multiplier architecture; however, by using the proposed RBMPPG2, the number of RBPP accumulation stages is reduced from 4 to 3 (i.e., a 25% reduction). These are sig nificant savings in delay, area as well as power con sumption. The improvements in delay, area and power consumption are further demonstrated in the next sec tion by simulation. Table V compares the number of RBPP accumulation stages in different 2 ^{} bit RB multipliers, i.e., 8×8bit, 16×16bit, 32×32bit, 64×64bit multipliers. For a 64bit multiplier, the proposed design has 4 RBPP accumulation stages; it reduces the partial product accumulation delay time by 20% compared with CRBBE 2 multipliers. Although both the proposed design and RBBE4 have the same number of RBPP accumulation stages, RBBE4 is more complex, because it uses radix16 Booth encoding [14].
00189340 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2015.2441711, IEEE Transactions on Computers
6
IEEE TRANSACTIONS ON COMPUTERS
TABLE V
COMPARISON OF RBPP ACCUMULATION STAGES IN RBPP REDUCTION TREE
Methods
CRBBE2
RBBE4 [14]
Proposed
64×64
32×32
16×16
8×8
5
4
4
4
3
3
3
2
2
2
1
1
4
PERFORMANCE EVALUATION
The performance of various 2 ^{} bit RB multipliers using the proposed RBMPPG2 is assessed; the results are compared with NBBE2, CRBBE2 and RBBE4 [14] mul tipliers that are the latest and best designs found in the technical literature. All designs of RB multipliers use the RBFA and RBHA of [7]. An RBNB converter is required in the final stage of the RB multiplier to convert the summation result in RB form to a two’s complement number. It has been shown that the constanttime converter in [7] does not exist [1921]. However, there is a carryfree multiplier that uses redundant adders in the reduction of partial products by applying onthefly conversion [22] in paral lel with the reduction and generates the product without a carrypropagate adder [2324]. A hybrid parallelprefix/carryselect adder [25] is used for the final RBNB converter. The NBBE2 multip lier design uses the same encoder and decoder as shown in Fig. 2. 42 compressors [2628] are used in the partial product reduction tree. The extra ECW in the NB multip lier designs is also modified as proposed in [11]. The multiplier designs are described at gate level in Verilog HDL and verified by Synopsys VCS using ran domly generated input patterns; the designs are synthe
sized by the Synopsys Design Compiler using the Nan Gate45nm Open Cell Library. In the simulation of each design, a supply voltage of 1.25V and room temperature are assumed. Standard buffers of a 2X strength are used for both the input drive and the output load. The option for logic structuring is turned off to prevent the tool from changing the structure of the unit cells. The aver age power consumption is found using the Synopsys Power Compiler with back annotated switching activity files generated from 2500 random input vectors. Table VI summarizes the delay, area, power and pow erdelay product (PDP) of the NB and RB multiplier de signs; the delay, area, power and PDP metrics are com pared separately.
TABLE VI
DESIGN RESULTS OF RB MULTIPLIERS (USING NANGATE 45NM OPEN CELL LIBRARY)
Nbits 
NB and RB Multipliers 
Delay 
Area( ^{} ) 
^{P}^{o}^{w}^{e}^{r} 
PDP 

(ns) 
( 
) 
(pJ) 

8 
NBBE2 
0.95 
1210 
301 
0.285 

CRBBE2 
1.20 
1322 
485 
0.582 

RBBE4 [14] 
1.32 
1071 
546 
0.721 

Proposed 
1.00 
1258 
496 
0.496 

16 
NBBE2 
1.20 
4055 
1128 
1.353 

CRBBE2 
1.48 
4165 
1549 
2.293 

RBBE4 [14] 
1.62 
3897 
2498 
4.047 

Proposed 
1.26 
4004 
1500 
1.890 

32 
NBBE2 
1.51 
14420 
4215 
6.364 

CRBBE2 
1.79 
13925 
3227 
5.776 

RBBE4 [14] 
2.09 
14454 
5745 
12.007 

Proposed 
1.57 
13589 
3090 
4.851 

64 
NBBE2 
1.92 
54120 
16047 
30.810 

CRBBE2 
2.29 
48624 
11852 
27.141 

RBBE4 [14] 
2.47 
55119 
20517 
50.677 

Proposed 
2.05 
47903 
11199 
22.958 
00189340 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2015.2441711, IEEE Transactions on Computers
X. CUI ET AL: A NEW MODIFIED PARTIAL PRODUCT GENERATOR FOR HIGHSPEED REDUNDANT BINARY MULTIPLIERS
Consider the delay first (Fig. 7); compared with CRBBE2, the proposed designs can reduce the delay (for example up to 16.6% for the case of 8×8bit multiplier; for all cases of wordlength, the delay is reduced by at least 10%. Compared with RBBE4, the proposed designs can reduce the delay by up to 24.8% for the case of 32×32bit and the delay is reduced by at least 17% for all cases of wordlength. The delay improvement is achieved by the reduced critical path due to the elimina tion of one RBPP accumulation stage. The delay of the proposed RB multipliers is slightly larger (approximately 5%) compared with the best NB multiplier, i.e., NBBE2. However, its area and power are significantly lower than NBBE2 for large word length designs (32bit and 64bit), as discussed next.
Fig. 7 Delay comparison of the NB and RB MBE multipliers at different wordlengths.
Compared with CRBBE2, the RB multiplier using the proposed RBMPPG2 has the smallest area for all cases (Fig. 8). For 8×8bit and 16×16bit multipliers, the area of RBBE4 RB multipliers is smaller than that of the pro posed RB multipliers because RBBE4 based designs don't require extra ECW, while the area is slightly in creased by the modified partial product in the proposed RB multipliers. Compared with NBBE2 and RBBE4, the proposed designs can reduce the area by up to 11.5% and 13.0%, respectively, for the case of a 64×64bit mul tiplier and it is especially pronounced for large size de signs, thus confirming the area efficiency of the pro posed approach. Power consumptions of NB and RB multipliers are al so considered and compared (Fig. 9). The proposed de signs can reduce the power for a 64×64bit multiplier by up to 30.2%, 5.5% and 45.4%, respectively, compared with NBBE2, CRBBE2, and RBBE4. PDP is a commonly used metric for combined per formance in terms of delay and power consumption. In Fig. 10, the RB multipliers using the proposed RBMPPG 2 have the smallest PDP under all cases of RB multipliers.
7
Compared with CRBBE2, the proposed designs can re duce the PDP by over 14% for all cases. Compared with RBBE4, the proposed designs can reduce the PDP by up to 59.6% for the case of a 32×32bit multiplier, and over all cases the proposed designs can reduce the PDP by over 30%. Thus, these results confirm the proposed RBMPP2 can be very useful for designing area and PDP efficient RB multipliers.
Fig. 8 Area comparison of the NB and RB MBE multipliers at dif ferent wordlengths.
Fig. 10 PDP comparison of the NB and RB MBE multipliers at differ ent wordlengths.
Although the PDP of the proposed design is larger than that of NBBE2 for small sizes, it is much better for large word lengths. Compared with NBBE2, the pro
00189340 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2015.2441711, IEEE Transactions on Computers
8
posed designs reduce the PDP by 23.8% and 25.5% for 32×32bit and 64×64bit multipliers, respectively.
5
CONCLUSIONS
A new modified RBPP generator has been proposed in this paper; this design eliminates the additional ECW that is introduced by previous designs. Therefore, a RBPP accumulation stage is saved due to the elimination of ECW. The new RB partial product generation tech nique can be applied to any 2 ^{} bit RB multipliers to re duce the number of RBPP rows from /4 + 1 to /4 . Simulation results have shown that the performance of RB MBE multipliers using the proposed RBMPPG2 is improved significantly in terms of delay and area. The proposed designs achieve significant reductions in area and power consumption when the word length is at least 32 bits. The PDP can be reduced by up to 59% using the proposed RB multipliers when compared with existing RB multipliers. Hence, the proposed RBPP generation method is a very useful technique when designing area and PDP efficient poweroftwo RB MBE multipliers.
ACKNOWLEDGMENT
This work is supported by grants from the National Natural Science Foundation of China (No. 61401197 and No. 61106029). The corresponding author of this paper is Weiqiang Liu.
REFERENCES
[1] A. Avizienis, “Signeddigit number representations for fast
parallel arithmetic,” IRE Trans. Electron. Computers, vol. EC10, pp. 389–400, 1961. N. Takagi, H. Yasuura, and S. Yajima, “Highspeed VLSI mul
[2]
tiplication algorithm with a redundant binary addition tree,” IEEE Trans. Computers, vol. C34, pp. 789796, 1985. [3] Y. Harata, Y. Nakamura, H. Nagase, M. Takigawa, and N. Takagi, “A high speed multiplier using a redundant binary
adder tree,” IEEE J. SolidState Circuits, vol. SC22, pp. 2834,
1987.
[4] 
H. Edamatsu, T. Taniguchi, T. Nishiyama, and S. Kuninobu, “A 
[5] 
33 MFLOPS floating point processor using redundant binary representation,” in Proc. IEEE Int. SolidState Circuits Conf. (ISSCC), pp. 152–153, 1988. H. Makino, Y. Nakase, and H. Shinohara, “A 8.8ns 54x54bit 
[6] 
multiplier using new redundant binary architecture,” in Proc. Int. Conf. Comput. Design (ICCD), pp. 202205, 1993. H. Makino, Y. Nakase, H. Suzuki, H. Morinaka, H. Shinohara, 
[7] 
and K. Makino, “An 8.8ns 54×54bit multiplier with high speed redundant binary architecture,” IEEE J. SolidState Cir cuits, vol. 31, pp. 773783, 1996. Y. Kim, B. Song, J. Grosspietsch, and S. Gillig, “A carryfree 
[8] 
54b×54b multiplier using equivalent bit conversion algorithm,” IEEE J. SolidState Circuits, vol. 36, pp. 1538–1545, 2001. Y. He and C. Chang, “A powerdelay efficient hybrid carry lookahead carryselect based redundant binary to two’s com plement converter,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, pp. 336–346, 2008. 
IEEE TRANSACTIONS ON COMPUTERS
[9] G. Wang and M. Tull, “A new redundant binary number to 2's complement number converter,” in Proc. Region 5 Conference:
Annual Technical and Leadership Workshop, pp. 141143, 2004. [10] W. Yeh and C. Jen, “Highspeed Booth encoded parallel mul tiplier design,” IEEE Trans. Computers, vol. 49, pp. 692701,
2000.
[11] S. Kuang, J. Wang, and C. Guo, “Modified Booth multiplier with a regular partial product array,” IEEE Trans. Circuits Syst. II, vol. 56, pp. 404408, 2009.
[12] J. Kang and J. Gaudiot,“A simple highspeed multiplier de sign,” IEEE Trans. Computers, vol. 55, pp.12531258, 2006. [13] F. Lamberti, N. Andrikos, E. Antelo, and P. Montuschi, “Re
ducing the computation time in (short bitwidth) two’s com plement multipliers,” IEEE Trans. Computers, vol. 60, pp. 148 156, 2011. [14] Y. He and C. Chang, “A new redundant binary Booth encoding for fast 2 ^{} bit multiplier design,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, pp. 1192–1199, 2009. [15] Y. He, C. Chang, J. Gu, and H. Fahmy, “A novel covalent re dundant binary Booth encoder,” in Proc. IEEE Int. Symp. Cir cuits Syst. (ISCAS), vol. 1, pp. 69–72, 2005. [16] N. Besli and R. Deshmukh, “A novel redundant binary signed digit (RBSD) Booth’s encoding,” in Proc. IEEE Southeast Conf., pp. 426–431, 2002. [17] A. Booth, “A signed binary multiplication technique,” The Quarterly J. Mechanical and Applied Math., vol. 4, pp. 236240,
1951.
[18] O. MacSorley, “Highspeed arithmetic in binary computers,” IRE Proc., vol. 49, pp. 67–91, 1961. [19] Y. Kim, B. Song, J. Grosspietsch, and S. Gillig, “Correction to ‘a carryfree 54b×54b multiplier using equivalent bit conversion algorithm’,” IEEE J. SolidState Circuits, vol. 38, p. 159, 2003.
[20] W. Rulling, “A remark on carryfree binary multiplication,” IEEE J. SolidState Circuits, vol. 38, pp. 159–160, 2003.
[21] M. Ercegovac and T. Lang, “Comments on ‘a
carryfree
54b×54b multiplier using equivalent bit conversion algorithm’,” IEEE J. SolidState Circuits, vol. 38, pp. 160–161, 2003.
[22] M. Ercegovac and T. Lang, “Onthefly conversion of redun dant to conventional representations,” IEEE Trans. Computers, vol. C36, pp. 895897, 1987.
[23] M. Ercegovac and T. Lang, “Fast multiplication without carry propagate addition,” IEEE Trans. Computers, vol. C39, pp. 13851390, 1990.
[24] L. Ciminiera and P. Montuschi, “Carrysave multiplication schemes without final addition,” IEEE Trans. Computers, vol. 45, pp. 10501055,1996.
[25] G. Dimitrakopoulos and D. Nikolos, “Highspeed parallel prefix VLSI Ling adders,” IEEE Trans. Computers, vol. 54, pp. 225231, 2005. [26] S. Hsiao, M. Jiang, and J. Yeh, “Design of highspeed low power 3–2 counter and 4–2 compressor for fast multipliers,” Electron. Lett., vol. 34, pp. 341–343, Feb. 1998. [27] C. Chang, J. Gu, and M. Zhang, “Ultra lowvoltage lowpower CMOS 42 and 52 compressors for fast arithmetic circuits,”
IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, pp. 1985–1997,
2004.
[28] D. Radhakrishnan and A. Preethy, “Low power CMOS pass logic 42 compressor for highspeed multiplication,” in Proc. IEEE Midwest Symp. Circuits Syst., vol. 3, pp. 1296–1298, 2000.
00189340 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Гораздо больше, чем просто документы.
Откройте для себя все, что может предложить Scribd, включая книги и аудиокниги от крупных издательств.
Отменить можно в любой момент.