Authorized licensed use limited to: UNIVERSIDAD DE ALICANTE . Downloaded on February 18,2020 at 23:38:54 UTC from IEEE Xplore. Restrictions apply.
HUANG AND WANG: HIGH-PERFORMANCE AND POWER-EFFICIENT CMOS COMPARATORS 255
256 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 38, NO. 2, FEBRUARY 2003
Note that although the above design concept is similar to that described in [1], the implementation details are quite different. We will elaborate these details shortly.

III. BASIC DESIGN TECHNIQUES

When implementing the comparator in CMOS technology, we found that the priority encoder and those AND gates in Fig. 2 can be merged into a functional block, called the magnitude decision module (MDM). With the MDM, the block diagram of a 4-b comparator is revised as shown in Fig. 3(a). The circuit for generating EQUAL will not be shown hereafter because it is not in the critical path. The MDM implements the functions listed below, and it is designed as the circuit shown in Fig. 3(b).

[…] (2)

The circuit in Fig. 3(b) is derived from the priority encoder we proposed in [6]. We also adapt the MODL style [7] to reduce circuit complexity and increase operating speed.

The circuit in Fig. 3(b) operates as follows. When the clock signal clk goes low, the circuit enters the precharging phase and the output nodes […] and […] are precharged to 0. When clk goes high, the circuit enters the evaluation phase. For […], the priority descends from […] to […]. For example, if […], then the signal […] will be used to turn off the discharging paths for […] and […]. Therefore, outputs […] and […] will be kept at “0.” The values of […] and […] depend on […] and […]. For example, if […], nodes […] and […] will be evaluated as logic 0 and 1, while nodes […] and […] will be kept at logic 1 and 0, respectively. On the other hand, if […], neither node […] nor node […] have discharging path because transistor […] is turned off. Then, outputs […] and […] stay in the precharged state. At the same time, […] relinquishes the control and the rest of the circuit functions as if there are only three inputs, […], […], and […].

The schematic of the 4-b comparator [Fig. 3(a)] is shown in Fig. 4. The circuit follows the domino logic style [9] and, hence, the necessary inversion function is moved to the input terminal and implemented via static CMOS circuits. On the other hand, the OR function is implemented by a dynamic NOR gate plus a NOT gate, and placed after the dynamic MDM circuit.

Although we can derive an MDM with more than four inputs in the same way as (2), the circuit becomes too complicated to achieve high speed. Thus, instead, we employ the concept of multilevel lookahead proposed for the priority encoder [6] to design comparators with more than four input bits. The concept of multilevel lookahead is illustrated with the aid of the block diagram of a 16-b comparator in Fig. 5(a), and the schematic diagram of the modified 4-b comparator macro PEBCLA4b is shown in Fig. 5(b).

Fig. 5. (a) Block diagram of a 16-b comparator. (b) Schematic diagram of the macro PEBCLA4b.

In addition to the input/output (I/O) signals shown in Fig. 4, the new 4-b comparator macro needs an extra input look-ahead signal […] and an extra output look-ahead signal […]. As illustrated in Fig. 5(a), the […] in the […]th macro is connected to the […] in the […]th macro, except that the […] in the least significant macro should be tied to […] directly. The following equations describe the functions of Fig. 5(b).

[…]

As described in [6], […] and […] in (3) realize the first-level look-ahead mechanism because all these functions are flattened without iteration and finished with one gate delay. On the other hand, the circuits enclosed in the gray areas of Fig. 5(b) realize the second-level look-ahead mechanism because the signal […] is generated only with a domino-gate delay. The look-ahead signals are used to connect different macros to shorten the critical path.
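The comparison principle described in this section (find the most significant bit position where the operands differ, then let the operand bit at that position decide the result) can be sketched behaviorally in Python. This is only a functional model under stated assumptions, not the circuit of Fig. 3(b) or Fig. 5(b): the names `to_bits`, `macro`, `compare`, and `look_ahead` are hypothetical, and the sketch chains an "all higher groups equal" flag from the most significant group downward, which may not match the signal naming or direction used in Fig. 5(a).

```python
from typing import List, Tuple


def to_bits(value: int, width: int) -> List[int]:
    """Bits of `value`, most significant bit first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def macro(a_bits: List[int], b_bits: List[int],
          look_ahead_in: int) -> Tuple[int, int, int]:
    """Behavioral stand-in for one comparator macro (hypothetical interface).

    Returns (gt, lt, look_ahead_out).
    """
    gt = lt = 0
    decided = 0
    for a, b in zip(a_bits, b_bits):      # priority descends from the MSB
        if (a ^ b) and not decided:       # first differing bit wins
            gt, lt = a, b                 # priority selection merged with the AND
            decided = 1
    gt &= look_ahead_in                   # masked if a higher group already differed
    lt &= look_ahead_in
    look_ahead_out = look_ahead_in & (1 - decided)
    return gt, lt, look_ahead_out


def compare(a: int, b: int, width: int = 16,
            group: int = 4) -> Tuple[int, int, int]:
    """Return (a_gt_b, a_lt_b, equal) for two unsigned `width`-bit values."""
    assert width % group == 0
    a_bits, b_bits = to_bits(a, width), to_bits(b, width)
    look_ahead = 1                        # the first group has nothing above it
    a_gt_b = a_lt_b = 0
    for start in range(0, width, group):
        gt, lt, look_ahead = macro(a_bits[start:start + group],
                                   b_bits[start:start + group],
                                   look_ahead)
        a_gt_b |= gt                      # final OR across the macros
        a_lt_b |= lt
    return a_gt_b, a_lt_b, look_ahead     # look_ahead stays 1 only if all groups match
```

The loop evaluates the groups one after another, so it models only the logic function. The look-ahead circuitry described in the text exists precisely to avoid such a serial dependency in hardware.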
1) The first and second pipeline stages of the 64-b comparator utilize the same 8-b macro PEB8b. However, the macros in the first pipeline stage accept the clock signal […], but the macro in the second pipeline stage accepts the clock signal […]. Therefore, when […] goes high, the macro cells in the first pipeline stage enter the evaluation phase and the macro cell in the second pipeline stage enters the precharge phase.
2) When […] goes low, the macro cells in the first pipeline stage enter the precharge phase and the evaluated results are latched in the N-C²MOS latches. These outputs are also fed into the corresponding inputs of the macro in the second pipeline stage for obtaining the final comparison result.
3) Both stages have the same critical path, i.e., the 8-b comparator. Because the critical paths of both stages are shortened and balanced, the operation speed of the comparator is improved significantly.

V. PERFORMANCE EVALUATION AND EXPERIMENTAL RESULTS

In order to verify the proposed techniques, a two-stage pipelined 64-b comparator is realized. To minimize the layout effort and layout area, we give all N-type transistors in the pull-down network the same transistor width instead of using a ratioed design. We also enlarge the width of these transistors up to 5 μm to reduce the pull-down delay. For example, the channel widths of the transistors […]–[…] in Fig. 7 are all 5 μm. The design is implemented based on a 3-V 0.6-μm CMOS technology [10], which is the same as that used in the ANT comparator [3]. The 64-b comparator based on Wang et al.’s approach is also resimulated with the transistor sizes reported in [3]. However, for comparison purposes, the performance comparison is based on the results of post-layout simulations run at a 3-V supply voltage. The layouts of both designs are shown in Fig. 8, and the complexity information is listed in Table I. The transistor count of the new design is lower than that of the conventional design, and the layout area of the new design is only about half that of the conventional design. This is mainly because the transistor size used in the new design is typically much smaller than that used in the previous design.

Fig. 8. Layouts of (a) Wang et al.’s comparator [3]. (b) Proposed comparator.

TABLE I
COMPLEXITY COMPARISON OF TWO 64-b COMPARATORS

Before reporting the timing information, timing characterization methods for both designs will be described. For the new design, both stages have the same critical path, i.e., the 8-b comparator. Then, we only need to characterize the critical path delay […] of the 8-b comparator macro, which is the summation of the delay […] of the static XOR gate and the evaluation delay […] of the dynamic gate. Note that the output of the static XOR gate must be stable before the dynamic gate enters the evaluation phase. This means that […] can be viewed as the setup time of the dynamic circuit. The minimal cycle time will be twice […], and the maximal operating frequency will be […]. Analysis shows that we can apply the pattern ([…], […]) to trigger the longest signal propagation path. The timing chart of the 8-b macro PEB8b is illustrated in Fig. 9.

As mentioned above, the new comparator finishes each comparison in just one clock cycle, while the conventional 64-b comparator takes three clock cycles to finish the task. Similar to the new design, all stages in the conventional design also have the same critical path, but each pipeline stage is a 2-b comparator in this case. Then, we only need to characterize the critical path delay […] of the 2-b comparator macro. For a fair comparison, we define the equivalent total delay time […] for each operation to be six times […], and the equivalent maximal operating frequency is defined to be […]. The pattern ([…]) is applied to trigger the longest signal propagation path.

Post-layout simulation results are summarized in Table II. Power consumption listed in Table II is evaluated at the maximum clock frequency. The results show that the proposed comparator is 16% faster and consumes 79% less power as compared with Wang et al.’s comparator [3]. For the new design, it is possible to trade the layout area and the power consumption for more speed advantages.

TABLE II
POST-LAYOUT SIMULATION RESULTS OF TWO DIFFERENT 64-b COMPARATORS

The proposed 64-b comparator has been fabricated for performance verification. Fig. 10 shows the test chip architecture used to measure the delay time […] of the dynamic circuit. This measurement method is commonly used in measuring the delay time of dynamic circuits [13], [14]. The input clock signal goes through the clock buffer first, and then proceeds along two paths. One goes through the comparator core and the output buffer, and reaches the output pad. The other one only goes through the output buffer to reach the output pad. The only difference between these two paths is the comparator core. Therefore, we can measure the time between the clock output signal Clk and the comparator output […] and get the delay time […]. The photograph of the test chip is shown in Fig. 11(a) and measured waveforms with 160- and 50-MHz clocks are shown
in Fig. 11(b) and (c), respectively. Measured chip features and post-layout simulation results are summarized in Table III. The measured waveforms indicate that the delay time […] of the dynamic gate in the 8-b macro is 2.2 ns no matter which clock rate is used, which completely matches the simulation result. We cannot measure […] directly on the chip because it is the set-up time in nature. However, according to the above measurement result, we have confidence that the experimental result is very close to the simulation result. The maximal operating frequency is measured at around 180 MHz (not shown), which again agrees with the simulation. The measured power consumption is also very close to the simulated result.

VI. CONCLUSION

Design techniques for high-performance and power-efficient CMOS comparators are proposed. The design is based on the priority-encoding algorithm and utilizes the dynamic CMOS circuit technique to result in a compact comparator with high performance. In implementation, the priority-encoding function and the subsequent AND function are merged as an MDM, which is realized in the MODL. Such a design not only improves the operating speed due to the reduced logic depth, but also makes the circuit compact and power efficient because fewer transistors are used. To efficiently shorten the critical path that lies in the MDM, the multilevel look-ahead technique is adopted. To enhance the operating speed further, the circuit is realized with a latch-based two-stage pipelined structure, and the logic functions are partitioned into two parts, with each part executed in half of the clock cycle in a delay-balanced manner. Post-layout simulation results show that a 64-b comparator designed with the proposed techniques in a 3-V 0.6-μm CMOS technology is 16% faster, 50% smaller, and 79% more power efficient as compared with the fastest conventional design. Measurement results of the test chip agree with the simulation results and prove the feasibility of the proposed techniques.
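The timing characterization of Section V reduces to simple arithmetic, sketched below in Python. The names `t_xor_ns`, `t_dyn_ns`, and `t_2b_ns` are placeholders, because the original symbols did not survive extraction, and the sketch assumes that the quantity doubled to obtain the minimal cycle time is the full critical path delay of the 8-b macro. Of the numbers used in the usage lines, only the 2.2-ns dynamic-gate delay comes from the text; the XOR delay and the 2-b macro delay are made-up inputs for illustration, not results from the paper.

```python
from typing import Tuple


def proposed_timing(t_xor_ns: float, t_dyn_ns: float) -> Tuple[float, float]:
    """Minimal cycle time (ns) and maximal frequency (MHz) of the new design.

    Assumes the macro delay is the static XOR delay plus the dynamic
    evaluation delay, and that the cycle time is twice that macro delay.
    """
    t_macro_ns = t_xor_ns + t_dyn_ns
    cycle_ns = 2.0 * t_macro_ns
    return cycle_ns, 1e3 / cycle_ns       # 1000 / period in ns gives MHz


def conventional_equivalent_timing(t_2b_ns: float) -> Tuple[float, float]:
    """Equivalent total delay (ns) and equivalent frequency (MHz).

    The text defines the equivalent total delay as six times the
    critical path delay of the 2-b comparator macro.
    """
    total_ns = 6.0 * t_2b_ns
    return total_ns, 1e3 / total_ns


# Usage: 2.2 ns is the measured dynamic-gate delay quoted in the text;
# 0.5 ns and 1.0 ns are hypothetical values chosen only for illustration.
cycle_ns, f_max_mhz = proposed_timing(t_xor_ns=0.5, t_dyn_ns=2.2)
total_ns, f_eq_mhz = conventional_equivalent_timing(t_2b_ns=1.0)
```

Expressing both designs as a time per completed comparison is what makes the speed figures comparable, since the conventional design needs three clock cycles per operation while the new design needs one.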
REFERENCES
[1] M. M. Mano, Digital Design. Englewood Cliffs, NJ: Prentice-Hall, 1991, ch. 5.
[2] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design. Reading, MA: Addison-Wesley, 1993, ch. 8.
[3] C.-C. Wang, C.-F. Wu, and K.-C. Tsai, “1-GHz 64-b high-speed comparator using ANT dynamic logic with two-phase clocking,” Proc. Inst. Elect. Eng. Comput. Digital Techn., vol. 145, no. 6, pp. 433–436, Nov. 1998.
[4] R. X. Gu and M. I. Elmasry, “All-N-Logic high-speed true-single-phase dynamic CMOS logic,” IEEE J. Solid-State Circuits, vol. 31, pp. 221–229, Feb. 1996.
[5] S. Furber, ARM System Architecture. Reading, MA: Addison-Wesley, 1997.
[6] J.-S. Wang and C.-H. Huang, “High-speed and low-power CMOS priority encoders,” IEEE J. Solid-State Circuits, vol. 35, pp. 1511–1514, Oct. 2000.
[7] I. S. Hwang and A. L. Fisher, “Ultrafast compact 32-b CMOS adders in multiple-output domino logic,” IEEE J. Solid-State Circuits, vol. 24, pp. 358–369, Apr. 1989.
[8] J.-S. Wang and C.-S. Huang, “A high-speed single-phase-clocked CMOS priority encoder,” in Proc. IEEE Int. Symp. Circuit and Systems, vol. 5, May 2000, pp. 537–540.
[9] R. W. Krambeck, C. M. Lee, and H.-F. S. Law, “High-speed compact circuits with CMOS,” IEEE J. Solid-State Circuits, vol. SC-17, pp. 614–619, June 1982.
[10] “0.6-μm CMOS ASIC process digests,” Taiwan Semiconductor Manufacturing Corp., Hsinchu, Taiwan, R.O.C., 1996.
[11] J. Park, H. C. Ngo, J. A. Silberman, and S. H. Dhong, “470 ps 64-b parallel binary adder [for CPU chip],” in Symp. VLSI Circuits Dig. Tech. Papers, 2000, pp. 192–193.
[12] S. Naffziger, “A sub-nanosecond 0.5-μm 64-b adder design,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 1996, pp. 362–363.
[13] R. Woo, S.-J. Lee, and H.-J. Yoo, “A 670-ps 64-b dynamic low-power adder design,” in Proc. IEEE Int. Symp. Circuit and Systems, vol. 1, May 2000, pp. 28–31.
[14] G. Yee and C. Sechen, “Clock-delayed domino for dynamic circuit design,” IEEE Trans. VLSI Syst., vol. 8, pp. 425–430, Aug. 2000.