
2608 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 7, JULY 2016

A New Fast and Area-Efficient Adder-Based Sign Detector for RNS $\{2^n - 1, 2^n, 2^n + 1\}$

Sachin Kumar and Chip-Hong Chang

Abstract: The moduli set $\{2^n - 1, 2^n, 2^n + 1\}$ has been widely used in residue number system (RNS)-based computations. Its sign extraction problem, albeit fundamentally important in magnitude comparison and other difficult algorithms in RNS, has received considerably less attention than its scaling and reverse conversion problems. This brief presents a new algorithm for the design of a fast adder-based sign detector. The circuit is greatly simplified by shrinking the dynamic range to eliminate large modulo operations with the help of the new Chinese remainder theorem. Our synthesis results with the 65-nm CMOS standard cell library show that the proposed design outperforms all the existing adder-based sign detectors reported for this moduli set in area and speed for $n$ ranging from 5 to 25 in steps of 5.

Index Terms: Chinese remainder theorem (CRT), computer arithmetic, residue number system (RNS), RNS scaling, sign detection.

Manuscript received September 17, 2015; revised December 2, 2015; accepted January 6, 2016. Date of publication January 26, 2016; date of current version June 23, 2016.
The authors are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798 (e-mail: sachinku001@e.ntu.edu.sg; echchang@ntu.edu.sg).
Digital Object Identifier 10.1109/TVLSI.2016.2516522
1063-8210 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

Residue number system (RNS) is gaining increasing popularity in the VLSI implementation of application-specific digital signal processors (DSPs). This is in part due to its ability to accelerate and to reduce the power consumption of crucial and frequently used data path operations by subword-level parallelism and modularity, and in part due to the ease of realizing modulo operations using the moduli of the forms $2^n$ and $2^n \pm 1$. Modular $2^n \pm 1$ arithmetic properties have been exploited with arithmetic structures, such as diminished-1, sparse carry chain, Kogge–Stone adder, and so on, in [1]–[4] to reduce the implementation complexity of modulo addition, subtraction, and multiplication for these special moduli to an extent that is comparable with their two's complement number system counterparts. These advancements have given rise to the extensive use and continual successes in exploiting the balanced three moduli set $\{2^n - 1, 2^n, 2^n + 1\}$ for the implementation of many new and existing DSP algorithms, including fast Fourier transform, discrete wavelet transform, finite and infinite impulse response filters, and digital image processing [5]–[9]. In fact, the difficulties associated with the implementation of nonmodular operations, such as scaling and reverse conversion from residue-to-binary representation, have largely been resolved for this three moduli set [10]–[12].

Notwithstanding the hardware efficiency of its individual residue arithmetic operations, as well as its forward and reverse converters, some fundamental operations, such as sign detection, magnitude comparison, and overflow detection, for this moduli set remain slow and expensive. These operations are difficult to parallelize as they require the combination of multiple residue values to compute. To reduce the computational complexity, lookup tables are often used to store the precomputed orthogonal projections of the numbers of interest. Unfortunately, memory-based approaches are difficult to pipeline. The size and number of lookup tables, as well as their access time, also grow with the size of the moduli. Among these operations, sign detection is an example of a less focused problem for this moduli set. Despite its importance as a preprocessing operation and an integral component of other intermodulo operations like magnitude comparison and overflow detection [10], [13], only a handful of solutions are found in the literature.

A general theorem for sign detection in the residue domain is presented in [5], where the magnitude of an integer is first decoded from its residue representation by converting the residues into their equivalent binary representation to find the halfway point of the dynamic range. The large modulo operation in the reverse conversion is reduced in [14] by the mixed radix conversion (MRC) and in [15] to a modulo-two sum by using the fractional binary representation. Both implementations are based on ROMs, which suffer from the aforementioned deficiencies. The most recent RNS sign detectors [16], [17] were designed for $\{2^n - 1, 2^n, 2^{n+1} - 1\}$. Although they are very efficient standalone, their use is limited because the efficient reverse converter and scaler required for a complete system implementation have not been reported for this new moduli set. The only adder-based sign detection circuits for $\{2^n - 1, 2^n, 2^n + 1\}$ detect the sign by either full [12] or partial reverse conversion into the binary domain using the new Chinese remainder theorem (CRT)-I [18] and CRT-II [19], or through the mixed radix coefficients of MRC-II [20] and MRC [21].

In this brief, an alternative efficient sign detection algorithm for $\{2^n - 1, 2^n, 2^n + 1\}$ is proposed. The proposed technique exploits the new CRT-I to greatly simplify the $2^{2n} - 1$ scaling of the residue representation into addends that can be readily obtained by circular left shifted residues at no logic cost. The reduced dynamic range enables the sign of an integer to be computed directly from the most significant bit (MSB) of the scaled residues with a heavily stripped-down version of a reverse converter. A side benefit of our proposed sign detector is that it can also be used as a scaler. The rest of this brief is organized as follows. Section II presents the fundamentals of signed integer representation in RNS and the notations used. The proposed sign detection algorithm for $\{2^n - 1, 2^n, 2^n + 1\}$ is introduced in Section III. Its architectural simplification and circuit implementation are detailed in Section IV. The circuit complexity is analyzed, and its synthesis results are compared against existing adder-based sign detectors for the same moduli set of different dynamic ranges in Section V. The brief is concluded in Section VI.

II. PRELIMINARIES AND NOTATIONS

RNS is characterized by a set of $N$ coprime numbers, known as the moduli set $\{m_1, m_2, \ldots, m_N\}$, i.e., $\mathrm{GCD}(m_i, m_j) = 1$ for all $i \neq j$. Any integer $X$ can be represented by an $N$-tuple $(x_1, x_2, \ldots, x_N)$ in this moduli set. Each residue $x_i$ is the least nonnegative remainder computed by dividing $X$ by the modulus $m_i$, which can be expressed mathematically as $x_i = |X|_{m_i}$ for $i = 1, 2, \ldots, N$. The product of all moduli is called the dynamic range $M$, i.e., $M = \prod_{i=1}^{N} m_i$. Any integer $X$ that lies within $0 \le X < M$ will have a unique residue representation.

An integer $X$ within the dynamic range can be recovered from its residue representation $(x_1, x_2, \ldots, x_N)$ by applying the CRT [5]

$$X = \left| \sum_{i=1}^{N} M_i \left| \left| M_i^{-1} \right|_{m_i} x_i \right|_{m_i} \right|_M \tag{1}$$

where $M_i = M/m_i$ and $|M_i^{-1}|_{m_i}$ is the multiplicative inverse of $|M_i|_{m_i}$.

Fig. 1. Mapping of the half ranges of integer $X$ in $[0, M)$ to the half ranges of its scaled integer $Y$ in $[0, M')$.
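The reconstruction in (1) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names `to_residues` and `crt` are hypothetical and do not come from the brief, and the moduli and the integer 21 575 are the values of Example 1 for $n = 5$.

```python
from math import prod


def to_residues(x, moduli):
    """Residue representation x_i = |X|_{m_i} (hypothetical helper name)."""
    return tuple(x % m for m in moduli)


def crt(residues, moduli):
    """Recover X from its residues following (1). Hypothetical helper name."""
    M = prod(moduli)                      # dynamic range
    total = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        inv = pow(M_i, -1, m_i)           # |M_i^{-1}|_{m_i}, needs Python 3.8+
        total += M_i * ((inv * x_i) % m_i)
    return total % M


n = 5
moduli = (2**n - 1, 2**n, 2**n + 1)       # {31, 32, 33}
residues = to_residues(21575, moduli)
print(residues, crt(residues, moduli))
```

The sketch works on plain integers, so it shows only the arithmetic of (1), not a hardware reverse converter.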
To represent a signed integer $\tilde{X}$ in RNS, $M$ is divided into two symmetrical half ranges for the representation of positive and negative integers. When $M$ is even, the range of signed integers that can be unambiguously represented in RNS is $[-M/2, M/2 - 1]$. Similarly, for odd $M$, the range of unambiguously representable signed integers in RNS is $[-(M - 1)/2, (M - 1)/2]$. The signed integer $\tilde{X}$ can be represented using the same residue representation as an unsigned integer $X$ for the same moduli set. The relationship between $\tilde{X}$ and $X$ is given as follows:

$$\tilde{X} = \left| X + \left\lfloor \frac{M}{2} \right\rfloor \right|_M - \left\lfloor \frac{M}{2} \right\rfloor. \tag{2}$$

When $\tilde{X} \ge 0$, the residue representation of $\tilde{X}$ can be mapped to that of $X$ in the range of $[0, M/2 - 1]$ if $M$ is even and $[0, (M - 1)/2]$ if $M$ is odd. In a similar way, when $\tilde{X} < 0$, the residue representation of $\tilde{X}$ can be mapped to that of $X$ in the range of $[M/2, M - 1]$ if $M$ is even and $[(M + 1)/2, M - 1]$ if $M$ is odd [22]. Thus, the sign of $\tilde{X}$ can be detected as follows.

When $M$ is even

$$\operatorname{sign}(\tilde{X}) = \begin{cases} 0, & \text{if } X \in [0, (M/2) - 1] \\ 1, & \text{if } X \in [M/2, M - 1]. \end{cases} \tag{3}$$

When $M$ is odd

$$\operatorname{sign}(\tilde{X}) = \begin{cases} 0, & \text{if } X \in [0, (M - 1)/2] \\ 1, & \text{if } X \in [(M + 1)/2, M - 1]. \end{cases} \tag{4}$$

Properties 1 and 2 [5] are employed in order to simplify some arithmetic operations in the derivation of our proposed sign detection circuit for RNS $\{2^n - 1, 2^n, 2^n + 1\}$.

Property 1: The modulo $2^n - 1$ multiplication of an $n$-bit binary number $x$ and the $r$th power of two is equivalent to a circular left shift (CLS) of the binary bits of $x$ by $r$ positions

$$|2^r x|_{2^n - 1} = \mathrm{CLS}_n(x, r) \tag{5}$$

where $\mathrm{CLS}_n(x, r)$ represents the circular shift of an $n$-bit binary number $x$ by $r$ bits to the left.

Property 2: As a corollary of Property 1

$$|-2^r x|_{2^n - 1} = |2^r (2^n - 1 - x)|_{2^n - 1} = |2^r \bar{x}|_{2^n - 1} = \mathrm{CLS}_n(\bar{x}, r) \tag{6}$$

where $\bar{x}$ is the ones' complement of integer $x$.

III. PROPOSED SIGN DETECTION ALGORITHM FOR RNS $\{2^n - 1, 2^n, 2^n + 1\}$

Let $(x_1, x_2, x_3)$ be the residue representation of an integer $X$ with respect to the moduli set $\{m_1, m_2, m_3\} = \{2^n - 1, 2^n, 2^n + 1\}$. Since the dynamic range $M$ of this moduli set can be factored into $2^n$ and $2^{2n} - 1$, the sizes of the modulo operations required for detecting the sign of $\tilde{X}$ from its equivalent residue representation of $X$ can be substantially reduced by scaling $(x_1, x_2, x_3)$ in the residue domain by $2^{2n} - 1$. This will map the lower half range $[0, 2^{3n-1} - 2^{n-1})$ of $X$ to the lower half range $[0, 2^{n-1})$ of the scaled integer $Y$ and the upper half range $[2^{3n-1} - 2^{n-1}, 2^{3n} - 2^n)$ of $X$ to the upper half range $[2^{n-1}, 2^n)$ of $Y$, as shown in Fig. 1. By shrinking the dynamic range from $M = 2^{3n} - 2^n$ to $M' = 2^n$, its half range can be easily detected from the MSB of the scaled integer $Y$.

This new concept of sign detection in $\{2^n - 1, 2^n, 2^n + 1\}$ can be made very efficient provided that scaling by $2^{2n} - 1$ as well as the reverse conversion of the scaled residues into $Y$ can be computed efficiently from the residues $x_1$, $x_2$, and $x_3$. As only the MSB of $Y$ is needed for the sign detection of $\tilde{X}$, a full reverse conversion from $(x_1, x_2, x_3)$ is not required.

To simplify the scaling by $2^{2n} - 1$ in the residue domain, the new CRT [23], also known as CRT-I, is used to convert $X$ into a weighted sum of its residues modulo $2^{2n} - 1$. According to CRT-I

$$X = x_3 + m_3 \left| k_1 (x_1 - x_3) + k_2 m_1 (x_2 - x_1) \right|_{m_1 m_2} \tag{7}$$

where $|k_1 m_3|_{m_1 m_2} = 1$ and $|k_2 m_3 m_1|_{m_2} = 1$.

With $m_1 = 2^n - 1$, $m_2 = 2^n$, and $m_3 = 2^n + 1$, we have

$$X = x_3 + (2^n + 1) \left| k_1 (x_1 - x_3) + k_2 (2^n - 1)(x_2 - x_1) \right|_{2^n (2^n - 1)}. \tag{8}$$

It can be proved that the multiplicative inverses of $|2^n + 1|_{2^n (2^n - 1)}$ and $|2^{2n} - 1|_{2^n}$ are given by $k_1 = 2^{2n-1} - (2^n - 1)$ and $k_2 = -1$, respectively. These closed form expressions of $k_1$ and $k_2$ are proved as follows.

Proof of $k_1 = 2^{2n-1} - (2^n - 1)$:

$$\begin{aligned}
|k_1 (2^n + 1)|_{2^n (2^n - 1)} &= \left| [2^{2n-1} - (2^n - 1)](2^n + 1) \right|_{2^n (2^n - 1)} \\
&= \left| 2^{2n-1}(2^n + 1) - (2^{2n} - 1) \right|_{2^n (2^n - 1)} \\
&= \left| 2^{3n-1} - 2^{2n-1} + 1 \right|_{2^n (2^n - 1)} \\
&= \left| 2^{2n-1}(2^n - 1) + 1 \right|_{2^n (2^n - 1)} = 1.
\end{aligned}$$

Proof of $k_2 = -1$:

$$|k_2 (2^{2n} - 1)|_{2^n} = |-1 \cdot (2^{2n} - 1)|_{2^n} = |-2^{2n} + 1|_{2^n} = 1.$$

Substituting the values of $k_1$ and $k_2$ into (8), we have

$$\begin{aligned}
X &= x_3 + (2^n + 1) \left| \{2^{2n-1} - (2^n - 1)\}(x_1 - x_3) - (2^n - 1)(x_2 - x_1) \right|_{2^n (2^n - 1)} \\
&= x_3 + (2^n + 1) \left| 2^{2n-1}(x_1 - x_3) - (2^n - 1)(x_2 - x_3) \right|_{2^n (2^n - 1)}.
\end{aligned} \tag{9}$$

By scaling $X$ by $2^{2n} - 1$, the scaled integer $Y$ can be obtained by

$$Y = \left\lfloor \frac{X}{2^{2n} - 1} \right\rfloor = \left\lfloor \frac{x_3}{2^{2n} - 1} + \frac{(2^n + 1) Z}{2^{2n} - 1} \right\rfloor \tag{10}$$

where $Z = |2^{2n-1}(x_1 - x_3) - (2^n - 1)(x_2 - x_3)|_{2^n (2^n - 1)}$.

Since $x_3 \in [0, 2^n]$, $x_3 < 2^{2n} - 1$. Therefore $\lfloor x_3 / (2^{2n} - 1) \rfloor = 0$, and $Y$ can be written as

$$Y = \left\lfloor \frac{(2^n + 1) Z}{2^{2n} - 1} \right\rfloor = \left\lfloor \frac{Z}{2^n - 1} \right\rfloor = \left\lfloor \frac{\left| 2^{2n-1}(x_1 - x_3) - (2^n - 1)(x_2 - x_3) \right|_{2^n (2^n - 1)}}{2^n - 1} \right\rfloor. \tag{11}$$
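The closed forms of $k_1$ and $k_2$ and the steps from (9) to (11) lend themselves to a brute-force numerical check. The Python sketch below is only such a check for small $n$; the function names `crt1_constants`, `z_term`, and `check_scaling` are hypothetical and are not part of the brief.

```python
def crt1_constants(n):
    """Return (k1, k2) and confirm they are the required inverses."""
    m1, m2, m3 = 2**n - 1, 2**n, 2**n + 1
    k1 = 2**(2 * n - 1) - (2**n - 1)
    k2 = -1
    assert (k1 * m3) % (m1 * m2) == 1        # |k1 m3|_{m1 m2} = 1
    assert (k2 * m3 * m1) % m2 == 1          # |k2 m3 m1|_{m2} = 1
    return k1, k2


def z_term(x1, x2, x3, n):
    """Z as defined below (10)."""
    m1, m2 = 2**n - 1, 2**n
    return (2**(2 * n - 1) * (x1 - x3) - m1 * (x2 - x3)) % (m1 * m2)


def check_scaling(n):
    """Exhaustively compare (9) and (11) with direct integer arithmetic."""
    m1, m2, m3 = 2**n - 1, 2**n, 2**n + 1
    for X in range(m1 * m2 * m3):
        x1, x2, x3 = X % m1, X % m2, X % m3
        Z = z_term(x1, x2, x3, n)
        assert X == x3 + m3 * Z              # (9)
        assert Z // m1 == X // (2**(2 * n) - 1)   # (11) equals floor(X / (2^{2n} - 1))
    return True


for n in range(2, 6):
    crt1_constants(n)
    check_scaling(n)
print("k1, k2, (9) and (11) hold for n = 2..5")
```

Python's `%` operator returns a nonnegative result for a positive modulus, which matches the least nonnegative residue convention $|\cdot|_m$ used in the text.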

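As $\lfloor |x|_{m_1 m_2} / m_1 \rfloor = |\lfloor x / m_1 \rfloor|_{m_2}$ from [11], (11) can be rewritten as

$$\begin{aligned}
Y &= \left| \left\lfloor \frac{2^{2n-1}(x_1 - x_3) - (2^n - 1)(x_2 - x_3)}{2^n - 1} \right\rfloor \right|_{2^n} \\
&= \left| \left\lfloor \frac{2^{2n-1}(x_1 - x_3)}{2^n - 1} \right\rfloor + x_3 - x_2 \right|_{2^n}.
\end{aligned} \tag{12}$$

Let $H = 2^{2n-1}(x_1 - x_3)$. Since $H = m \lfloor H/m \rfloor + |H|_m$ for any integer $H$ and $m$, we have

$$H = (2^n - 1) \left\lfloor \frac{H}{2^n - 1} \right\rfloor + |H|_{2^n - 1}. \tag{13}$$

Taking the mod $2^n$ operation on both sides of (13), we have

$$|H|_{2^n} = \left| (2^n - 1) \left\lfloor \frac{H}{2^n - 1} \right\rfloor + \left| |H|_{2^n - 1} \right|_{2^n} \right|_{2^n}. \tag{14}$$

Since $|H|_{2^n} = |2^{2n-1}(x_1 - x_3)|_{2^n} = 0$ and $|2^n - 1|_{2^n} = |-1|_{2^n}$

$$\left| \left\lfloor \frac{H}{2^n - 1} \right\rfloor \right|_{2^n} = \left| |H|_{2^n - 1} \right|_{2^n}. \tag{15}$$

Substituting (15) into (12), we have

$$\begin{aligned}
Y &= \left| |H|_{2^n - 1} + x_3 - x_2 \right|_{2^n} \\
&= \left| |2^{2n-1}(x_1 - x_3)|_{2^n - 1} + x_3 - x_2 \right|_{2^n}.
\end{aligned} \tag{16}$$

If $Y \in [0, 2^{n-1})$, $X$ falls in the lower half range of $M$ and $(x_1, x_2, x_3)$ represents a positive integer, i.e., $\tilde{X} \ge 0$. Otherwise, if $Y \in [2^{n-1}, 2^n)$, $X$ falls in the upper half range of $M$ and $(x_1, x_2, x_3)$ represents a negative integer, i.e., $\tilde{X} < 0$.

Fig. 2. Generation of carry-in signal $c_{in}$.

Fig. 3. Example of the generation of the carry signal $C_1$ and $v$ for $n = 8$.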
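As an arithmetic illustration of (16), the Python sketch below computes $Y$ directly from the residues and compares its MSB with the half-range test of (3). It is a minimal software model under stated assumptions, not the adder-based circuit; the names `scaled_y`, `sign_from_residues`, and `reference_sign` are hypothetical.

```python
def scaled_y(x1, x2, x3, n):
    """Y from (16): | |2^(2n-1) (x1 - x3)|_{2^n - 1} + x3 - x2 |_{2^n}."""
    h = (2**(2 * n - 1) * (x1 - x3)) % (2**n - 1)
    return (h + x3 - x2) % 2**n


def sign_from_residues(x1, x2, x3, n):
    """Sign bit = MSB of the n-bit scaled integer Y (1 means negative)."""
    return scaled_y(x1, x2, x3, n) >> (n - 1)


def reference_sign(X, n):
    """Half-range test of (3); M is even for this moduli set."""
    M = 2**n * (2**(2 * n) - 1)
    return 0 if X < M // 2 else 1


# Exhaustive comparison for a few small word lengths.
for n in range(2, 6):
    m1, m2, m3 = 2**n - 1, 2**n, 2**n + 1
    for X in range(m1 * m2 * m3):
        assert sign_from_residues(X % m1, X % m2, X % m3, n) == reference_sign(X, n)

print(scaled_y(30, 7, 26, 5), sign_from_residues(30, 7, 26, 5))
```

For the residues (30, 7, 26) with $n = 5$, the intermediate value is $|2^9 \cdot 4|_{31} = 2$, so $Y = |2 + 26 - 7|_{32} = 21$ and the sign bit is 1.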

IV. HARDWARE IMPLEMENTATIONS

The residues $x_1$, $x_2$, and $x_3$ can be represented in a binary form as $x_1 = x_{1,n-1} x_{1,n-2} \ldots x_{1,0}$, $x_2 = x_{2,n-1} x_{2,n-2} \ldots x_{2,0}$, and $x_3 = x_{3,n} x_{3,n-1} \ldots x_{3,0}$, respectively, where $x_{i,j}$ denotes the $j$th bit of the residue $x_i$. The binary vectors of $x_1$ and $x_2$ are of $n$ bits but the binary vector of $x_3$ is of $n + 1$ bits. In (16), one of the terms in the modulo $2^n - 1$ sum involves the operation $|-2^{2n-1} x_3|_{2^n - 1}$, which cannot be directly implemented by Property 2, since $x_3$ has $n + 1$ bits. To apply the CLS property on the ones' complement of $x_3$ as in (6), $x_3$ is expressed as $x_3 = 2^n x_{3,n} + x_{3,n-1} x_{3,n-2} \ldots x_{3,0}$. Since $|2^n x_{3,n}|_{2^n - 1} = x_{3,n}$, the MSB $x_{3,n}$ of $x_3$ can be logically ORed with $x_{3,0}$ to form an $n$-bit binary vector $x'_3 = |x_3|_{2^n - 1} = |x_{3,n-1} x_{3,n-2} \ldots x'_{3,0}|_{2^n - 1}$, where $x'_{3,0} = x_{3,0} \vee x_{3,n}$ and $\vee$ denotes a logical OR operator.

$|H|_{2^n - 1}$ in (16) can then be implemented using the CLS operations of Properties 1 and 2 to obtain

$$Y = \left| |u_1 + u_2|_{2^n - 1} + x_3 - x_2 \right|_{2^n} \tag{17}$$

where

$$u_1 = |2^{2n-1} x_1|_{2^n - 1} = \mathrm{CLS}_n(x_1, 2n - 1) = x_{1,0}\, x_{1,n-1} \ldots x_{1,1} \tag{18}$$

$$u_2 = |-2^{2n-1} x_3|_{2^n - 1} = \mathrm{CLS}_n(\bar{x}'_3, 2n - 1) = \bar{x}'_{3,0}\, \bar{x}_{3,n-1} \ldots \bar{x}_{3,1}. \tag{19}$$

The term $|u_1 + u_2|_{2^n - 1}$ can be expressed as

$$|u_1 + u_2|_{2^n - 1} = \begin{cases} |u_1 + u_2 + 1|_{2^n}, & \text{if } u_1 + u_2 \ge 2^n - 1 \\ u_1 + u_2, & \text{otherwise.} \end{cases} \tag{20}$$

Hence, $\left| |u_1 + u_2|_{2^n - 1} \right|_{2^n} = |u_1 + u_2 + c_{in}|_{2^n}$, where $c_{in} \in \{0, 1\}$. As $|-x_2|_{2^n} = 2^n - x_2 = \bar{x}_2 + 1$, (17) can be written as

$$Y = |u_1 + u_2 + c_{in} + x_3 + \bar{x}_2 + 1|_{2^n}. \tag{21}$$

The generation of the carry-in signal $c_{in}$ is shown in Fig. 2. The condition $u_1 + u_2 \ge 2^n$ is detected by $C_1 = 1$ and the signal $C_1$ can be generated by parallel prefix operators [2]. As an example, the carry signal $C_1$ for $n = 8$ can be generated by the circuit shown in Fig. 3. The condition $u_1 + u_2 = 2^n - 1 = \underbrace{11 \ldots 11}_{n}$ can be detected by $C_2 = 1$. $C_2$ is generated by $\bar{w} \wedge v$, where $w = \bigvee_{i=0}^{n-1} g_i$ and $v = p_{n-1:0} = \bigwedge_{i=0}^{n-1} p_i$, where $\wedge$ denotes a logical AND operator. The signals $g_i$ and $p_{n-1:0}$ have already been generated in the computation of $C_1$. Consequently, the condition $u_1 + u_2 \ge 2^n - 1$ for $c_{in} = 1$ can be detected by

$$c_{in} = C_1 \vee C_2. \tag{22}$$

The two addends, $u_2$ and $x_3$, in (21) can be further simplified as follows:

$$\begin{aligned}
|u_2 + x_3|_{2^n} &= \left| |2 u_2|_{2^n} - u_2 + |x_3|_{2^n} \right|_{2^n} \\
&= \left| \bar{x}_{3,n-1} \bar{x}_{3,n-2} \ldots \bar{x}_{3,1} 0 + |x_3|_{2^n} - u_2 \right|_{2^n} \\
&= \left| \bar{x}_{3,n-1} \bar{x}_{3,n-2} \ldots \bar{x}_{3,1} \bar{x}_{3,0} - \bar{x}_{3,0} + |x_3|_{2^n} - u_2 \right|_{2^n} \\
&= \left| |\bar{x}_3|_{2^n} + |x_3|_{2^n} - u_2 - \bar{x}_{3,0} \right|_{2^n} \\
&= \left| 2^n - 1 - u_2 - \bar{x}_{3,0} \right|_{2^n} = \left| \bar{u}_2 - \bar{x}_{3,0} \right|_{2^n} \\
&= \left| x'_{3,0}\, x_{3,n-1} x_{3,n-2} \ldots x_{3,1} - \bar{x}_{3,0} \right|_{2^n}.
\end{aligned} \tag{23}$$
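The operand formation in (18) and (19) and the end-around carry of (20)-(22) can be sanity-checked with a small bit-level Python model of (21). This is a sketch under stated assumptions rather than the circuit itself: the helper names `cls`, `operands`, `carry_in`, and `y_eq21` are hypothetical, and $C_1$ and $C_2$ are evaluated here by integer comparison instead of parallel prefix operators.

```python
def cls(x, r, n):
    """Circular left shift of the n-bit value x by r positions (Property 1)."""
    r %= n
    mask = (1 << n) - 1
    return ((x << r) | (x >> (n - r))) & mask


def operands(x1, x3, n):
    """u1 and u2 of (18) and (19)."""
    mask = (1 << n) - 1
    u1 = cls(x1, 2 * n - 1, n)
    x3_folded = (x3 & mask) | (x3 >> n)       # x'_3: OR the MSB x_{3,n} into bit 0
    u2 = cls(~x3_folded & mask, 2 * n - 1, n)  # CLS of the ones' complement
    return u1, u2


def carry_in(u1, u2, n):
    """c_in = C1 OR C2 as in (22)."""
    s = u1 + u2
    c1 = int(s >= 2**n)          # carry out of the n-bit addition
    c2 = int(s == 2**n - 1)      # sum is all ones
    return c1 | c2


def y_eq21(x1, x2, x3, n):
    """Y from (21), with -x2 replaced by its ones' complement plus one."""
    mask = (1 << n) - 1
    u1, u2 = operands(x1, x3, n)
    x2_bar = ~x2 & mask
    return (u1 + u2 + carry_in(u1, u2, n) + x3 + x2_bar + 1) & mask


# Compare against direct scaling by 2^(2n) - 1 for small n.
for n in range(2, 6):
    m1, m2, m3 = 2**n - 1, 2**n, 2**n + 1
    for X in range(m1 * m2 * m3):
        assert y_eq21(X % m1, X % m2, X % m3, n) == X // (2**(2 * n) - 1)

print(operands(30, 26, 5), carry_in(15, 18, 5), y_eq21(30, 7, 26, 5))
```

For $n = 5$ and residues (30, 7, 26), the model gives $u_1 = 01111_2$, $u_2 = 10010_2$, $c_{in} = 1$, and $Y = 21$.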

Fig. 4. Proposed sign detection architecture for $\{2^n - 1, 2^n, 2^n + 1\}$.

When $x_{3,n} = 0$, $x'_{3,0} = x_{3,0} \vee 0 = x_{3,0}$. Then

$$|u_2 + x_3|_{2^n} = \left| x_{3,0}\, x_{3,n-1} x_{3,n-2} \ldots x_{3,1} - \bar{x}_{3,0} \right|_{2^n}. \tag{24}$$

When $x_{3,n} = 1$, since $x_3 \in [0, 2^n]$, $x_{3,n-1} x_{3,n-2} \ldots x_{3,0} = 00 \ldots 0$. Hence, $x'_{3,0} = x_{3,0} \vee x_{3,n} = 1$ and

$$\begin{aligned}
|u_2 + x_3|_{2^n} &= \left| \underbrace{100 \ldots 0}_{n} - 1 \right|_{2^n} = \left| \underbrace{011 \ldots 1}_{n} \right|_{2^n} \\
&= \left| \underbrace{x_{3,0}\, x_{3,n} x_{3,n} \ldots x_{3,n}}_{n} - \bar{x}_{3,0} + x_{3,n} \right|_{2^n} \\
&= \left| \underbrace{x_{3,n}\, 00 \ldots 0}_{n} - \bar{x}_{3,0} \right|_{2^n}.
\end{aligned} \tag{25}$$

To satisfy both (24) and (25)

$$|u_2 + x_3|_{2^n} = |u_3 - \bar{x}_{3,0}|_{2^n} \tag{26}$$

where the $n$-bit binary vector $u_3$ is given by

$$u_3 = (x_{3,0} \vee x_{3,n})\, x_{3,n-1} x_{3,n-2} \ldots x_{3,1}. \tag{27}$$

Substituting (26) into (21), we have

$$Y = |u_1 + u_3 + \bar{x}_2 + c_{in} + 1 - \bar{x}_{3,0}|_{2^n}. \tag{28}$$

If $x_{3,0} = 1$, $1 - \bar{x}_{3,0} = 1$, and if $x_{3,0} = 0$, $1 - \bar{x}_{3,0} = 0$. Hence, the term $1 - \bar{x}_{3,0}$ in (28) can be replaced by $x_{3,0}$ and

$$Y = |u_1 + u_3 + \bar{x}_2 + c_{in} + x_{3,0}|_{2^n}. \tag{29}$$

The sign of $\tilde{X}$ can be detected by the MSB of $Y$. An $n$-bit carry save adder (CSA) can be used to add the three $n$-bit operands, $u_1$, $u_3$, and $\bar{x}_2$, to produce an $n$-bit sum $A = a_{n-1} a_{n-2} \ldots a_0$ and an $n$-bit carry vector $B = b_{n-1} b_{n-2} \ldots b_1 0$. Due to the modulo $2^n$ addition, the final carry output bit $b_n$ of the CSA need not be generated. As $b_0 = 0$, it can be replaced by $x_{3,0}$ of (29) before the MSB of $Y$ is computed by a simplified parallel prefix adder of $A$ and $B$ with the input carry bit $c_{-1} = c_{in}$. The prefix adder is simplified by keeping only the carry generation network for the computation of the carry signal $c_{n-1}$, from which the sign of $\tilde{X}$ can be detected by $\operatorname{sign}(\tilde{X}) = a_{n-1} \oplus b_{n-1} \oplus c_{n-1}$. The architecture of the proposed sign detector is shown in Fig. 4, where the circuit diagram for the simplified prefix adder is depicted in Fig. 5 for $n = 8$.

Fig. 5. Simplified prefix adder for $n = 8$.

Example 1: For $n = 5$, $\{m_1, m_2, m_3\} = \{31, 32, 33\}$, $M = 31 \times 32 \times 33 = 32\,736$, and $M/2 = 16\,368$. The signed integer $\tilde{X} = -11\,161$ can be represented by the residue representation $(x_1, x_2, x_3) = (30, 7, 26)$ corresponding to the unsigned integer $X = 21\,575$ in the same moduli set. The binary representations of the residues are $x_1 = 11110_2$, $x_2 = 00111_2$, and $x_3 = 011010_2$. According to (18), (19), and (27), $u_1 = 01111_2$, $u_2 = 10010_2$, and $u_3 = 01101_2$. Also, $x_{3,0} = 0$. Since $u_1 + u_2 = 01111 + 10010 = 33 > 32$, $C_1 = 1$. Since $33 \neq 31$, $C_2 = 0$. According to (22), $c_{in} = C_1 \vee C_2 = 1$. The computation of $Y$ in (29) is illustrated in Fig. 6. Since the MSB of $Y$ is 1, the integer $\tilde{X}$ represented by (30, 7, 26) is negative.

Fig. 6. Computation of $Y$ for Example 1.

V. EVALUATION AND COMPARISON

The proposed architecture in Fig. 4 consists of an $n$-bit CSA, a $c_{in}$ generator, and a simplified prefix adder. Based on the unit-gate model [24], the area and time complexity of our design are estimated to be $18n - 7$ and $2 \log_2 n + 9$ units of equivalent two-input gates, respectively, which indicates that the area increases linearly while the delay increases logarithmically with the size $n$ of its input moduli. Due to the potentially high error margin of the simplistic gate count and gate delay model, for a more accurate comparison with other designs, all the designs in comparison are described in Verilog HDL and synthesized for minimum delay by Synopsys Design Compiler version E-2010.12-SP4 using the same STM CMOS 65-nm LPGP 0.9 V standard cell library. The designs in comparison include the CRT-I-based [18], CRT-II-based [19], MRC-II-based [20], MRC-based [21], and the most efficient reverse converter-based [12] sign detectors for $\{2^n - 1, 2^n, 2^n + 1\}$. The latter is reduced by discarding all logic gates except those that contribute to the calculation of the sign bit. The synthesized areas in μm² and delays in ns for $n$ = 5, 10, 15, 20, and 25 are tabulated in Table I. The results show that our sign detector is the most area–time efficient. On average, it consumes 36.5%, 8.6%, 14.7%, 54.9%, and 39.6% less area than [12] and [18]–[21], respectively. It is also 32.8%, 7.8%, 8.6%, 52.1%, and 35.3% faster than [12] and [18]–[21], respectively.

Monte Carlo power simulation [25] is also performed on each design by Synopsys PrimeTime PX, where randomly generated input vectors are applied until the error in the calculated average power is

bounded below 3% with a 99% confidence interval. The clock rate for the input data is chosen based on the slowest design for each value of $n$. The average leakage power (LP) and total power (TP) are compared in Table II. The proposed design saves 29.3%, 5.2%, 9.7%, 58.3%, and 38.3% of power over [12] and [18]–[21], respectively.

TABLE I: Comparison of Area (μm²) and Delay (ns) of Sign Detection Circuits for $\{2^n - 1, 2^n, 2^n + 1\}$

TABLE II: Comparison of Average Leakage (LP) in μW and TP in mW of Sign Detection Circuits for $\{2^n - 1, 2^n, 2^n + 1\}$

VI. CONCLUSION

In this brief, a new sign detection algorithm for RNS $\{2^n - 1, 2^n, 2^n + 1\}$ is proposed, which leads to a high-speed and area-efficient adder-based implementation. Our experimental results show that the proposed circuit is on average 8.6%, 14.7%, 54.9%, 39.6%, and 36.5% smaller, 7.8%, 8.6%, 52.1%, 35.3%, and 32.8% faster, and 5.2%, 9.7%, 58.3%, 38.3%, and 29.3% more power efficient than the latest CRT-I, CRT-II, MRC-II, MRC, and stripped-down reverse converter-based sign detectors, respectively.

ACKNOWLEDGMENT

The authors would like to thank Prof. L. Sousa for sharing his VHDL codes, as well as the helpful clarification and valuable discussion with regards to the synthesis results of their design with a different cell library in our laboratory.

REFERENCES

[1] R. Muralidharan and C.-H. Chang, "Area-power efficient modulo $2^n - 1$ and modulo $2^n + 1$ multipliers for $\{2^n - 1, 2^n, 2^n + 1\}$ based RNS," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 59, no. 10, pp. 2263–2274, Oct. 2012.
[2] H. T. Vergos and G. Dimitrakopoulos, "On modulo $2^n + 1$ adder design," IEEE Trans. Comput., vol. 61, no. 2, pp. 173–186, Feb. 2012.
[3] H. T. Vergos, "A family of area-time efficient modulo $2^n + 1$ adders," in Proc. IEEE Comput. Soc. Annu. Symp. VLSI, Lixouri, Kefalonia, Greece, Jul. 2010, pp. 442–443.
[4] L.-S. Didier and L. Jaulmes, "Fast modulo $2^n - 1$ and $2^n + 1$ adder using carry-chain on FPGA," in Proc. Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, USA, Nov. 2013, pp. 1155–1159.
[5] N. S. Szabó and R. I. Tanaka, Residue Arithmetic and Its Applications to Computer Technology. New York, NY, USA: McGraw-Hill, 1967.
[6] R. Conway and J. Nelson, "Improved RNS FIR filter architectures," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 51, no. 1, pp. 26–28, Jan. 2004.
[7] E. Vassalos, D. Bakalis, and H. T. Vergos, "RNS assisted image filtering and edge detection," in Proc. IEEE 18th Int. Conf. Digit. Signal Process., Fira, Santorini, Greece, Jul. 2013, pp. 1–6.
[8] J.-C. Bajard, L.-S. Didier, and T. Hilaire, "ρ-direct form transposed and residue number systems for filter implementations," in Proc. IEEE 54th Int. Midwest Symp. Circuits Syst., Seoul, Korea, Aug. 2011, pp. 1–4.
[9] Y. Liu and E. M.-K. Lai, "Design and implementation of an RNS-based 2-D DWT processor," IEEE Trans. Consum. Electron., vol. 50, no. 1, pp. 376–385, Feb. 2004.
[10] T. F. Tay, C.-H. Chang, and J. Y. S. Low, "Efficient VLSI implementation of $2^n$ scaling of signed integer in RNS $\{2^n - 1, 2^n, 2^n + 1\}$," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 10, pp. 1936–1940, Oct. 2013.
[11] C.-H. Chang and J. Y. S. Low, "Simple, fast, and exact RNS scaler for the three-moduli set $\{2^n - 1, 2^n, 2^n + 1\}$ set," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 11, pp. 2686–2697, Nov. 2011.
[12] Z. Wang, G. A. Jullien, and W. C. Miller, "An improved residue-to-binary converter," IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 47, no. 9, pp. 1437–1440, Sep. 2000.
[13] H. Brönnimann, I. Z. Emiris, V. Y. Pan, and S. Pion, "Computing exact geometric predicates using modular arithmetic with single precision," in Proc. 13th Annu. Symp. Comput. Geometry, Nice, France, Jun. 1997, pp. 174–182.
[14] Z. D. Ulman, "Sign detection and implicit-explicit conversion of numbers in residue arithmetic," IEEE Trans. Comput., vol. 32, no. 6, pp. 590–594, Jun. 1983.
[15] T. Van Vu, "Efficient implementations of the Chinese remainder theorem for sign detection and residue decoding," IEEE Trans. Comput., vol. 34, no. 7, pp. 646–651, Jul. 1985.
[16] M. Xu, Z. Bian, and R. Yao, "Fast sign detection algorithm for the RNS moduli set $\{2^{n+1} - 1, 2^n - 1, 2^n\}$," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 2, pp. 379–383, Feb. 2015.
[17] C. V. Niras and Y. Kong, "Fast sign-detection algorithm for residue number system moduli set $\{2^n - 1, 2^n, 2^{n+1} - 1\}$," IET Comput. Digit. Techn., pp. 1–5, Sep. 2015. DOI: 10.1049/iet-cdt.2015.0050
[18] M. Xu, R. Yao, and F. Luo, "Low-complexity sign detection algorithm for RNS $\{2^n - 1, 2^n, 2^n + 1\}$," IEICE Trans. Electron., vol. E95-C, no. 9, pp. 1552–1556, Sep. 2012.
[19] T. Tomczak, "Fast sign detection for RNS $(2^n - 1, 2^n, 2^n + 1)$," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 6, pp. 1502–1511, Jul. 2008.
[20] M. Akkal and P. Siy, "Optimum RNS sign detection algorithm using MRC-II with special moduli set," J. Syst. Archit., vol. 54, no. 10, pp. 911–918, Oct. 2008.
[21] L. Sousa and P. Martins, "Efficient sign identification engines for integers represented in RNS extended 3-moduli set $\{2^n - 1, 2^{n+k}, 2^n + 1\}$," Electron. Lett., vol. 50, no. 16, pp. 1138–1139, Jul. 2014.
[22] C.-H. Chang and S. Kumar, "Area-efficient and fast sign detection for four-moduli set RNS $\{2^n - 1, 2^n, 2^n + 1, 2^{2n} + 1\}$," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Melbourne, VIC, Australia, Jun. 2014, pp. 1540–1543.
[23] Y. Wang, "New Chinese remainder theorems," in Proc. Conf. Rec. 32nd Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, USA, Nov. 1998, pp. 165–171.
[24] A. Tyagi, "A reduced-area scheme for carry-select adders," IEEE Trans. Comput., vol. 42, no. 10, pp. 1163–1170, Oct. 1993.
[25] R. Burch, F. N. Najm, P. Yang, and T. N. Trick, "A Monte Carlo approach for power estimation," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 1, no. 1, pp. 63–71, Mar. 1993.
