Article in Circuits and Systems II: Express Briefs, IEEE Transactions on · March 2006
DOI: 10.1109/TCSII.2005.855734 · Source: IEEE Xplore
All content following this page was uploaded by Shing Tenqchen on 09 January 2014.
INTRODUCTION
It is well known [1-3] that, over the past two decades, research into asynchronous circuits has shown that they can achieve better average-case performance than other circuit types by removing the unnecessary overhead of a globally synchronizing clock signal. This non-synchronizing property makes asynchronous circuits an attractive alternative to synchronous circuits for certain applications. Circuit designers may investigate asynchronous solutions for local function blocks whose computing results show large timing divergences, or for local pipeline systems in which timing considerations are severely restricted. The Intel research group experimentally implemented an asynchronous instruction-length decoder, which outperformed the original synchronous circuit in the Pentium II by threefold and is famous in CPU development history as RAPPID (Revolving Asynchronous Pentium Processor Instruction Decoder) [1].

The concept of globally asynchronous locally synchronous (GALS) circuits extends the application of asynchronous circuits to VLSI systems [2]. Advocates promote implementing an entire VLSI system with isolated asynchronous modules, which are self-controlled by local internal clocks and communicate with other modules via a data-driven controller. Their asynchronous handshaking protocol not only keeps the system robust and free from clock-skew problems, but also brings many potential advantages. Since the asynchronous modules are isolated by their own data-driven controllers with various clock timings, wrapped

Figure 1. Pipelined protocols: (a) two-phase, (b) four-phase.

The two-phase handshaking protocol uses a transition on the request signal to signify “data ready to send”, as shown in Figure 1(a). The receiver then uses a transition on the acknowledge signal to indicate that capture is complete. “Two-phase” refers to the fact that rising and falling transitions carry the same meaning; thus the two-phase protocol is also called “transition signaling”.

The four-phase handshaking protocol treats only rising transitions on the request signal as “data ready”, and only rising transitions on the acknowledge signal as “data captured”. Falling control signal transitions in the four-phase protocol are used to pre-charge the computing module, or to indicate that the asynchronous module is ready for the next computation. Normally, the four-phase protocol is easy to implement in GALS systems or modular synchronous systems, since most latches and decision cells used in asynchronous systems are level-sensitive and there is an extra period in which to reset or pre-charge modules. Thus, the implied timing concern in the four-phase protocol is not so severe. By contrast, the two-phase protocol may be more efficient, since it eliminates the unnecessary control overhead of waiting for return transitions on the control bus, particularly for asynchronous modules without large timing variations between common and rare cases. The key components in two-phase pipeline systems are therefore the latches and flip-flops that capture data on both rising and falling control signal transitions.
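The difference in control overhead between the two protocols can be made concrete by counting control-signal transitions per transfer. The following is a minimal Python sketch, not taken from the paper: the function names are illustrative assumptions, and the model ignores the data wires and all timing.

```python
# Behavioral sketch (assumed names): count request/acknowledge transitions
# needed per data transfer in each handshaking protocol.

def two_phase_transfer(req, ack):
    """One transfer in transition signaling: one req edge, then one ack edge."""
    events = []
    req ^= 1                      # any transition on req means "data ready"
    events.append(("req", req))
    ack ^= 1                      # any transition on ack means "data captured"
    events.append(("ack", ack))
    return req, ack, events

def four_phase_transfer(req, ack):
    """One transfer in the four-phase protocol: both wires return to zero."""
    if (req, ack) != (0, 0):
        raise ValueError("four-phase transfers start from req = ack = 0")
    events = [
        ("req", 1),               # rising req: data ready
        ("ack", 1),               # rising ack: data captured
        ("req", 0),               # falling req: reset / pre-charge phase
        ("ack", 0),               # falling ack: ready for the next transfer
    ]
    return 0, 0, events

def count_transitions(transfer, n_transfers):
    """Total control-signal transitions for n_transfers consecutive transfers."""
    req = ack = 0
    total = 0
    for _ in range(n_transfers):
        req, ack, events = transfer(req, ack)
        total += len(events)
    return total
```

In this model each two-phase transfer costs two transitions and each four-phase transfer costs four, which is the return-to-zero overhead the text refers to.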
transitions, as shown in Fig. 5. In fact, the width-to-length ratio of the input inverter determines whether our XNOR-based DETDFF works correctly, and how low its supply voltage can be.

Figure 4. Alternative XOR/XNOR based on the capture pass (transmission path) concept.

In deciding whether to obtain complementary pulses from a single output of the XNOR gate with an added inverter, or from a pair of differential outputs from XOR and XNOR, we chose the single output and removed Mn3. Thus, we cut off an undesired current path and made the width-to-length ratio of the input inverter fit our design criterion, as mentioned above. We also added inverters as output buffers to obtain better driving capacity. The remaining issue was the mismatch between the “clk” and the “Iclk”, since the “Iclk” signal follows the “clk” signal. Noting the result discussed in [12], we enlarged the width-to-length ratio of the “Iclk” inverter, which reduced the mismatch between the clocks due to different driving capacities and different loads.

We also needed a transparent latch with small input capacitance. The latch used in our proposed DET flip-flop is a common transparent latch made up of a latching inverter pair and an input selector (multiplexer), as shown in Fig. 6(a). Normally, the time required to break the latched state held by the inverter pair dominates the entire transient time of a transparent latch. To overcome the opposing drive of the weak inverter, the channel resistance of the input selector must be much smaller than that of the inverter. In our design, we set the width-to-length ratio of the weak inverter to one-fifth of that of the input-selection gates. However, as noted in [12], the larger gate and diffusion capacitances incurred by longer channel lengths slow the response. Thus, we modified our transparent latch by implementing active loads on the weak inverter to reduce propagation delays during transitions. This active-load concept is also discussed in [12-13] as “skinny current limit” and “pre-charging method”. That is, the pre-charged active load provides the major channel resistance and lets the smallest possible portion of the original weak inverter perform the latch function, as shown in Fig. 6(b). We also connected the input line to Mp1 and Mn2 of the selector shown in Fig. 6(b), so that the gate and channel capacitors of Mp1 and Mn2 are pre-charged before the control signal arrives; the timing requirement of asynchronous logic is that the input data be ready for the next stage when the control signal arrives.

Figure 6. Transparent latch: (a) functional diagram, (b) transistor-level.

TWO-PHASE PIPELINE SYSTEM

The simplified system

The two-phase asynchronous control system used in our simulation was modified from Sutherland’s micro-pipeline, which is discussed in detail in [7, 14], and is shown in Fig. 7. Only five logic components are used in this system: C-element, delay buffer, inverter, XNOR gate, and transparent latch. The XNOR gate and transparent latch together work as a double edge-triggered flip-flop (DET-FF), as discussed above. In our simulation, we treated the logic cells between stages as a simple signal path, which means the entire system worked as a simple FIFO (first-in-first-out) buffer. In practical applications, those logic cells may be asynchronous modules or computing cells.

A transition of the “request” signal from the left stage means that the left-stage data is ready, and it enables the C-element to fire the DETDFF to capture the data once the acknowledge signal from the right C-element has arrived. In the meantime, the firing signal sent from this C-element resets the left-stage C-element so that it waits for the next “request” signal. After passing the delay buffer, this firing signal also serves as the “request” signal for the right C-element. Obviously, there may be racing problems or synchronizing failures in such a simple two-phase control system. If the delay time provided by the delay buffer is not long enough to allow the logic cells to reflect the new data before enabling the transparent latch of the next stage, the next stage may capture
unpredictable logic states, even the previous logic state.

falling times of a simple inverter. Because the originally designed width-to-length ratio of the MOS-style DET-FF [8] did not work correctly in our simulated CMOS process, we doubled the NMOS width-to-length ratio to obtain a sufficiently low logic-low level for its weak inverters.
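The double-edge behavior of an XNOR-plus-transparent-latch flip-flop can be illustrated with a discrete-time model. This is a behavioral sketch under simplifying assumptions, not the transistor-level circuit: the function name `det_ff`, the one-step delay between “clk” and “Iclk”, and the choice that the latch is transparent while the XNOR output is low are all assumptions made for illustration.

```python
# Behavioral sketch (assumed model): "Iclk" is a delayed copy of "clk".
# XNOR(clk, Iclk) drops low for `delay` time steps after every clk edge,
# and that pulse opens a transparent latch, so data is captured on both
# rising and falling edges.

def det_ff(clk, data, delay=1):
    """Return the latch output for sampled clk/data waveforms (lists of 0/1)."""
    q = 0                                   # assumed initial latch state
    out = []
    for t in range(len(clk)):
        iclk = clk[t - delay] if t >= delay else clk[0]
        xnor = 1 - (clk[t] ^ iclk)          # 1 when clk and Iclk agree
        if xnor == 0:                       # pulse: latch is transparent
            q = data[t]
        out.append(q)                       # otherwise the latch holds
    return out
```

With `clk = [0,0,1,1,1,0,0,0,1,1]` the model updates its output only at the steps where `clk` changes (indices 2, 5, and 8), regardless of edge direction. Changes on `data` at other times are ignored.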
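The request/acknowledge interplay between neighboring C-elements in a two-phase pipeline can be sketched with a zero-delay behavioral model. The sketch below follows the textbook Sutherland-style control chain [14] rather than the modified circuit used in the simulations. It does not model the delay buffers or the data path, so it cannot reproduce the racing problem; the names `c_element` and `settle` are hypothetical.

```python
# Behavioral sketch (assumed, zero-delay): a chain of Muller C-elements.
# Stage i sees the request from its left neighbor and the inverted
# acknowledge from its right neighbor.

def c_element(a, b, prev):
    """Muller C-element: output follows the inputs when they agree, else holds."""
    return a if a == b else prev

def settle(stages, req_in, ack_out):
    """Update the C-element outputs in place until no output changes."""
    changed = True
    while changed:
        changed = False
        last = len(stages) - 1
        for i in range(len(stages)):
            left = req_in if i == 0 else stages[i - 1]
            right = ack_out if i == last else stages[i + 1]
            new = c_element(left, 1 - right, stages[i])
            if new != stages[i]:
                stages[i] = new          # this stage "fires" its latch
                changed = True
    return stages
```

In this model a stage holds unconsumed data when its output differs from that of its right neighbor (or from `ack_out` for the last stage). Starting from `[0, 0, 0]`, one request toggle moves a token to the last stage. A second toggle without an acknowledge stops one stage earlier, which is the FIFO behavior described in the text.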
two phenomena is that channel charge-release plays a major part in the MOS-style DET-FF since the pre-stored data relies on the input capacitance of the main inverter. Because we consider the worst-case control overhead as the timing assessment in an asynchronous control system, the power-delay performance shown in Fig. 10(c) is based on the worst case for each DET-FF. We must especially emphasize that the propagation delay shown in our simulation results is the time delay from the control signal transition to the output data signal transition, and “power dissipation” refers to the power consumption of all control circuits, including the input and output buffers, because the driving current for the transmission paths comes from the output of the previous stage.

[Figure legend: × power dissipation of XNOR based; □ power dissipation of conventional; △ power dissipation of MOS-style]
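The worst-case assessment described above can be written out as a short calculation. The Python sketch below is only an illustration of the bookkeeping: the function names and the unit convention (mW times ns gives pJ) are assumptions, and no measured values from the simulations are reproduced here.

```python
# Sketch (assumed names and units): power-delay product evaluated at the
# worst-case control-to-output delay of a flip-flop.

def worst_case_delay(delays_ns):
    """Largest control-transition-to-output-transition delay, in ns."""
    return max(delays_ns)

def power_delay_product(power_mw, delays_ns):
    """Power (mW) times worst-case delay (ns), giving energy in pJ."""
    return power_mw * worst_case_delay(delays_ns)
```

For example, with purely hypothetical inputs of 2.0 mW and delays of 0.25, 0.5, and 0.375 ns over the different transition cases, the worst case is 0.5 ns and the product is 1.0 pJ.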
decision stage. Even the double-size decision-N-MOSFET took a long time to pull down the input of the output stage.

Figure 11. Output waveforms from the DET-FFs in pipeline simulation (traces: 3rd and 4th stages of the XNOR-based, conventional, and MOS-style designs).

CONCLUSION

The concept of wrapped asynchronous modules opens a new possibility of using various power systems and operating frequencies in isolated computing modules in system-on-chip (SOC) and other single-chip applications. The two-phase pipelined control system removes the unnecessary overhead of waiting for signal transitions on the control bus. The key component in an efficient two-phase pipelined system is a dedicated edge-triggered flip-flop that can latch data on both rising and falling control signal transitions. We have presented a different approach to implementing XNOR-based double-edge-triggered flip-flops. We re-used an alternative capture-pass XOR/XNOR gate and took advantage of its potential weakness, sensitivity to the driving capacity of the previous stage, to generate a pair of stable clock pulses for a simple transparent latch. Our approach allows the good properties of level-sensitive latches reported in other studies to be reproduced in a two-phase pipelined control system. Simulation results show that the proposed XNOR-based double-edge-triggered flip-flop is faster than the pseudo-static DET-FF and consumes less power. Although the MOS-style DET-FF demonstrated significant speed performance, the trade-off for the active load and simple latching mechanism is more power dissipation and a disappointing output spike. Generally, our proposed XNOR-based DET-FF had stable performance, even when the supply voltage was reduced to 1.8 V, 72% of the rated supply voltage. This reliable performance was also evident in the two-phase pipelined system simulation.

REFERENCES

[1] S. Rotem, K. Stevens, R. Ginosar, P. Beerel, C. Myers, K. Yun, R. Kol, C. Dike, M. Roncken, and B. Agapiev, “RAPPID: An asynchronous instruction length decoder,” in Proc. Int. Symp. Advanced Research in Asynchronous Circuits and Systems, Apr. 1999, pp. 60-70.
[2] D. M. Chapiro, “Globally-asynchronous locally-synchronous systems,” Ph.D. dissertation, Stanford University, Oct. 1984.
[3] J. Carlsson, W. Li, T. Njolstad, K. Palmkvist, L. Wanhammar, and S. Zhuang, “A modular asynchronous wrapper,” National Conf. Radio Science (RVK), Stockholm, Sweden, June 10-13, 2002.
[4] Paul Day and J. Viv Woods, “Investigation into micropipeline latch design styles,” IEEE Transactions on VLSI Systems, 3(2):264-272, June 1995.
[5] M. Afghahi and J. Yuan, “Double edge-triggered D-flip-flops for high-speed CMOS circuits,” IEEE Journal of Solid-State Circuits, 26(8), August 1991.
[6] R. Hossain, L. Wronski, and A. Albicki, “Low power design using double edge-triggered flip-flops,” IEEE Transactions on VLSI Systems, vol. 2, no. 2, pp. 261-265, June 1994.
[7] K. Y. Yun, P. A. Beerel, and J. Arceo, “High-performance two-phase micropipeline building blocks: double edge-triggered latches and burst-mode select and toggle circuits,” IEE Proceedings, Circuits, Devices and Systems, 143(5):282-288, October 1996.
[8] Pradeep Varma, B. S. Panwar, Ashutosh Chakraborty, and Dheeraj Kapoor, “A MOS approach to CMOS DET flip-flop design,” IEEE Transactions on Circuits and Systems I, vol. 49, no. 7, July 2002.
[9] J. M. Wang, S. C. Fang, and W. S. Feng, “New efficient designs for XOR and XNOR functions on the transistor level,” IEEE Journal of Solid-State Circuits, vol. 29, pp. 780-786, July 1994.
[10] Nan Zhuang and Haomin Wu, “A new design of the CMOS full adder,” IEEE Journal of Solid-State Circuits, vol. 27, no. 5, pp. 840-844, May 1992.
[11] Abdellatif Bellaouar and Mohamed I. Elmasry, “Low-power digital VLSI design: Circuits and Systems,” Kluwer Academic Publishers, MA, 1995.
[12] Maitham Shams, Jo C. Ebergen, and Mohamed I. Elmasry, “Modeling and comparing CMOS implementations of the C-element,” IEEE Transactions on VLSI Systems, vol. 6, no. 4, Dec. 1998.
[13] Marc Renaudin, Bachar El Hassan, and Alain Guyot, “A new asynchronous pipeline scheme: Application to the design of self-timed ring divider,” IEEE Journal of Solid-State Circuits, vol. 31, no. 7, pp. 1001-1013, July 1996.
[14] Ivan E. Sutherland, “Micropipelines,” Communications of the ACM, 32(6):720-738, June 1989.
[15] K. V. Berkel, “Beware the isochronic fork,” Integration, The VLSI J., vol. 13, pp. 103-128, June 1992.
[16] J. C. Ebergen, J. Segers, and I. Benko, “Parallel progress and asynchronous circuit design,” in Asynchronous Digital Circuit Design, New York: Springer-Verlag, 1995, pp. 51-103.
[17] W. A. Clark, “Macromodular computer systems,” in Proc. AFIPS Conf. 1967 Spring Joint Comput. Conf., Atlantic City, NJ, vol. 30, pp. 335-336.