
OVERVIEW OF OVERSAMPLING CLOCK AND DATA RECOVERY CIRCUITS

S. I. Ahmed
Carleton University
Department of Electronics
Ottawa ON K1S 5B6
email: siahmed@doe.carleton.ca

Tad A. Kwasniewski
Carleton University
Department of Electronics
Ottawa ON K1S 5B6
email: tak@doe.carleton.ca

CCECE/CCGEI, Saskatoon, May 2005
0-7803-8886-0/05/$20.00 ©2005 IEEE

Abstract

Phase-Locked Loop (PLL) based Clock and Data Recovery (CDR) circuits use 2x-oversampling (2XO) of the incoming Non-Return to Zero (NRZ) data stream to recover the data. As an extension of the idea, 3x-oversampling (3XO) CDR circuits provide improved performance in the presence of total asymmetric jitter. This paper presents an overview of oversampling CDR circuits with an emphasis on digital architectures. These include, but are not limited to, the jitter-tolerant variable-interval 3XO architecture, the 3XO eye-tracking architecture, and the blind oversampling architecture. We propose a modified architecture that utilizes multiple rotating phases to improve the performance of the 3XO eye-tracking architecture.

Keywords: CDR, PLL, DR, Oversampling, Jitter Tolerance

1. INTRODUCTION

This paper concentrates on CDR architectures used at the receiver end of a Serializer-Deserializer (SERDES) chain. A CDR circuit has to counter the amplitude and phase degradations induced by the transmitter, the channel and the receiver as it recovers the clock and retimes the data [1]. The channel can be a satellite link, a fiber-optic cable, a coaxial cable, a backplane copper trace or an integrated circuit (IC) interconnect over a semiconductor substrate, each with its own challenges and limitations [2]. With IC technologies undergoing constant feature-size reduction, it is in the interest of portability and reduced time-to-market that digital, adaptive and automatically synthesizable CDR architectures be used and re-used.

We will briefly discuss the filter-type CDR circuits to elaborate the point that a PLL is not necessarily required to implement a CDR circuit. Nevertheless, PLLs provide a cost-effective and well-studied CDR implementation alternative despite the numerous circuit- and system-design challenges associated with them. As shown in Fig. 1, one of the clock edges is aligned with the data edges using a PLL. The other clock edge samples the incoming data to implement a 2XO-CDR circuit with a maximum timing margin of 0.5 Unit Intervals peak-to-peak (UI p-p). A logical extension of this idea would be to acquire more samples per period in order to improve CDR performance. We will review the mixed-signal, digital and all-digital implementations of 3XO-CDR circuits that have been presented in the literature during the last decade. We will present a modified CDR architecture that improves one aspect of an existing 3XO design. Simulation results are presented that show the equivalence of the new architecture with the existing one. Using this result, some other research possibilities are also mentioned.

Fig. 1. Conventional 2x-oversampling concept. (Figure labels: NRZ data, symbol period = bit period T; recovered clock; falling edge aligned with data transitions; rising clock edge makes a decision.)

2. FILTER-TYPE CDR CIRCUIT

Fig. 2. Filter-type CDR circuit [3]. (Figure labels: NRZ data + jitter; variable delay; D flip-flop with D, Q and CK; retimed data; d/dt; x^2; BPF; recovered clock.)

The main task of CDR circuits is to recover the clock that is not transmitted with the NRZ data in order to save power and avoid skew at the transmitter end. The block diagram of a filter-style or a microwave-style CDR is shown in Fig. 2 [3]. The clock component has to be created in the received data spectrum using an edge-detector block (d/dt) followed by a non-linearity (x^2) and, subsequently, to be isolated using a high-Q Bandpass Filter (BPF) or an external Surface-Acoustic Wave (SAW) filter. A high-Q on-chip filter that is implemented using a PLL with a narrow loop bandwidth is also being actively researched in conjunction with novel receiver architectures [4]. A filter-type CDR circuit using MESFET-based off-the-shelf chips has been reported [5] with a bit rate of 35.1 Gb/s. Used as a simulation technique, however, the filter-type model allows a large number of CDR circuits in a system to be replaced by black boxes, which reduces simulation times.
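The role of the edge detector and the non-linearity in the filter-type CDR can be illustrated numerically. The following Python sketch is not taken from [3]. It assumes an ideal rectangular NRZ waveform, a circular first difference as a stand-in for d/dt, a squarer as the non-linearity, and a single-bin DFT in place of the high-Q BPF. All function names are hypothetical. Under these assumptions the NRZ waveform itself has no spectral component at the bit rate, whereas the squared edge signal does.

```python
import cmath
import random

def nrz_waveform(bits, sps):
    """Map bits to a +/-1 NRZ waveform with `sps` samples per bit."""
    return [1.0 if b else -1.0 for b in bits for _ in range(sps)]

def edge_detect_and_square(x):
    """Discrete stand-in for d/dt followed by x^2 (circular first difference, squared)."""
    return [(x[i] - x[i - 1]) ** 2 for i in range(len(x))]

def dft_bin(x, k):
    """Magnitude of the k-th DFT bin of x; plays the role of an ideal narrow filter."""
    n = len(x)
    return abs(sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(x)))

random.seed(1)
n_bits, sps = 256, 8                      # assumed record length and oversampling
bits = [random.randint(0, 1) for _ in range(n_bits)]
x = nrz_waveform(bits, sps)

bit_rate_bin = n_bits                     # the bit rate completes n_bits cycles per record
nrz_line = dft_bin(x, bit_rate_bin)       # spectral null of rectangular NRZ data
clk_line = dft_bin(edge_detect_and_square(x), bit_rate_bin)

# Number of data transitions, counted circularly to match the circular difference.
transitions = sum(bits[i] != bits[i - 1] for i in range(n_bits))
print(f"NRZ component at bit rate: {nrz_line:.3e}")
print(f"Edge-detected component at bit rate: {clk_line:.1f} ({transitions} transitions)")
```

In this idealized model `nrz_line` is zero to within rounding error, while `clk_line` is proportional to the number of data transitions, so a record with long run-lengths yields a weaker clock component for the filter to isolate.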
3. PLL-BASED CDR ARCHITECTURE

In a conventional PLL-based CDR circuit shown in Fig. 3, the clock is recovered using a PLL. The PLL acts like a high-Q bandpass filter that can switch from one frequency to another electronically, if there exists a frequency divider (not shown) in the feedback loop. A D-type flipflop forms the Data Recovery (DR) or the decision circuit. The operating principle for this CDR was introduced earlier in Fig. 1. In practice, one would have to carefully analyze jitter, systematic skews, the effect of long run-lengths and lock-acquisition problems [6]. PLL-type CDR circuits are roughly classified according to the type of Phase Detector (PD) used. The most important ones are the linear and the non-linear CDR structures. A linear CDR circuit uses a Hogge's Phase Detector [7] or one of its variants [8], whereas a non-linear CDR circuit uses a Bang-Bang Phase Detector (BBPD). It is also known as 'Alexander PD' [9] or an 'Early-Late PD'. Bang-Bang PDs provide high gain and therefore require neither a charge-pump (CP) nor a limiting pre-amplifier. They also offer automatic compensation of timing offsets (since these PDs use sampled data), the possibility of multi-bit sampling in one clock cycle to implement a reduced-rate architecture, and freedom from frequency drift during long run-lengths [10]. A sample-and-hold style PD that combines the advantages of a linear PD and a BBPD has been reported in [11]. For more information on modern PD architectures, the reader can refer to [12].

Fig. 3. Linear PLL-based CDR architecture. (Figure labels: NRZ data + jitter; D flip-flop with D, Q and CK; retimed data; timing recovery PLL with phase detector, CP, LPF and VCO; recovered clock.)

3.1. Disadvantages

Traditional PLL-based CDR circuits suffer from device speed limitations with increasing data rates, degradation of on-chip Q for inductors (if an LC-VCO is used), 50 percent duty-cycle problems, data feedthrough, increased VCO jitter (due to high VCO gain resulting from supply voltage reduction) and poor performance in the presence of asymmetric jitter. In order to achieve high data rates while maintaining an acceptable performance, reduced-rate architectures are employed [13]. A novel 1/8th-rate PD implementation is reported in [14].

3.2. Reduced-Rate CDR Architectures

In order to sample the full-rate data stream with a half-rate VCO, one has to use both the edges of the recovered clock as shown in Fig. 4 and later multiplex the two data streams labelled D+ve and D-ve. The PD could either be linear [15] or non-linear [16]. If further reduction in clock rate is sought, one can use additional equi-distant phases provided by a locked oscillator. The main problems in such a case become the generation of equi-distant phases, and the modeling and capacitance issues associated with multiplexer and de-multiplexer circuit design.

Fig. 4. Half-rate PLL-type CDR concept. (Figure labels: full-rate NRZ data, symbol period = bit period T; recovered clock uses both edges; D+ve and D-ve to be multiplexed.)

4. VARIABLE-INTERVAL OVERSAMPLING CDR ARCHITECTURE

Fig. 5. Jitter pdf tracking 3XO concept [17]

A mixed-signal, variable-interval 3XO CDR circuit has been proposed in [17][18]. This 3XO concept [17] is shown in Fig. 5. Using a VCO and a DLL combination, the architecture tracks the data edges and then puts the sampling strobe exactly in the centre of the two data edges. In this quarter-rate architecture operating at 5 Gb/s (shown in Fig. 6), four NRZ data bits
are recovered every clock cycle. The ring oscillator locks to 1.25 GHz using a 'reference loop'. Once the frequency-lock is signaled by the 'lock-detect' circuit, the control voltages for the DLL and the PLL become independent of each other. The DLL tracks the jitter probability density function (pdf) of the incoming data edges using the 'eye-measuring loop' and puts the sampling clock edge in the centre of the two edge-clocks. The BER is improved and a high-frequency jitter tolerance of 0.65 UI p-p in the presence of asymmetric jitter is achieved.

Fig. 6. Variable Interval Oversampling Architecture block diagram [18]

As opposed to the 2XO architectures, this 3XO circuit does not acquire equidistant samples except at the end of the tracking stage, so it is not strictly a 3XO CDR circuit. The authors call it a 'variable-interval' oversampling circuit. It is possible to discard the extra information (or rather not collect it at all) due to the presence of three loops that track the data edges, so that three samples are collected only where these are the most relevant.

5. PHASE-PICKING OR BLIND OVERSAMPLING CDR ARCHITECTURE

The blind oversampling [19] (also called 'phase-picking' [20]) concept is shown in Fig. 7. This oversampling concept has been around since the early 1970s when it was originally used with Motorola's 16x-oversampling Universal Synchronous and Asynchronous Transceiver (UART) chips and later used in other similar industrial derivatives and variants [21]. An odd number of samples (three or five) are acquired per bit. The data edges are detected using an XOR gate (and the missing ones are interpolated digitally). The sample picked using the center-phase is declared as the correct data bit. Majority-voting can also be employed but is inferior to center-picking [20]. The block diagram of the architecture appears in Fig. 8. Its main advantage is the all-digital nature and the main disadvantage is the extra power and increased latency due to the DSP core and the algorithm. This operation was named 'phase-picking' [20], although a data bit acquired by a particular phase is picked and not the phase itself. The ambiguity can be resolved by the context of the discussion.

Fig. 7. Blind oversampling concept. (Figure rows: NRZ data; 3XO clock; edge detect; data picking, with picked bits 1 0 0 1 1 0 1 0 1 0 1.)

Fig. 8. Blind Oversampling Architecture block diagram. (Figure labels: NRZ data; 3XO clock; sampler; buffer (FIFO); edge-detect / DSP control block; interpolate missing edges; pick data.)

6. EYE-TRACKING DR ARCHITECTURE

Fig. 9. Eye-Tracking DR circuit block diagram [22]

An all-digital DR circuit core has been presented in [22] as shown in Fig. 9. This is a 3XO architecture that tracks the data eye instead of the data edges. The edge-detection interval is realized using CMOS style delay elements in the BBPD. The decisions are accumulated in a serial shift register and a rotating clock phase pointer either accelerates or decelerates the phase in addition to providing an unlimited phase-range to the DR circuit. The paper provides excellent design information and equations. Fig. 10 shows the 3XO concept for this DR circuit. One has to detect the edges and keep the recovered clock away from these edges. The sampling occurs at the centre of the eye and provides a low BER and a high-frequency jitter tolerance of 0.7

UI p-p. The architecture is compliant with the SFI-5 specification. Of all the architectures reviewed, this consumes the least power and area as well.

Fig. 10. Eye-Tracking 3XO concept

Fig. 11. Block Diagram of the Modified Architecture. (Figure labels: Din; BBPD with three D flip-flops clocked by CKD_Early, CKD_Center and CKD_Late; DR circuit; Dout; INC/DEC; phase switcher; N-phases; DLL-based phase generator; local fref.)
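The eye-tracking principle of Section 6 (detect the edges, then keep the sampling phase away from them) can be sketched behaviourally. The Python model below is an illustration under stated assumptions and is not the circuit of [22]. It assumes jitter-free NRZ data, 8 phase steps per UI, early and late samples 2 steps (0.25 UI) on either side of the center sample, a simple up/down vote counter in place of the serial shift register, and a one-step pointer update every 16 bits. The function and constant names are hypothetical.

```python
import random

N_PHASES = 8          # phase steps per UI (assumed)
EDGE_WINDOW = 2       # early/late offset from the center sample, in phase steps (0.25 UI)
UPDATE_INTERVAL = 16  # bits accumulated before the pointer may move (assumed)

def data_at(bits, t, edge_offset):
    """NRZ value at time t (in phase steps); bit m spans [m*N + offset, (m+1)*N + offset)."""
    m = (t - edge_offset) // N_PHASES
    return bits[m % len(bits)]

def track_eye(bits, start_phase, edge_offset):
    """Return the unwrapped pointer position after each update interval."""
    phase = start_phase
    votes = 0
    history = []
    for k in range(1, len(bits) - 1):
        center = k * N_PHASES + phase
        early = data_at(bits, center - EDGE_WINDOW, edge_offset)
        mid = data_at(bits, center, edge_offset)
        late = data_at(bits, center + EDGE_WINDOW, edge_offset)
        if early != mid:
            votes += 1    # an edge lies just before the center sample: rotate later
        if mid != late:
            votes -= 1    # an edge lies just after the center sample: rotate earlier
        if k % UPDATE_INTERVAL == 0:
            if votes > 0:
                phase += 1
            elif votes < 0:
                phase -= 1
            votes = 0
            history.append(phase)
    return history

random.seed(7)
bits = [random.randint(0, 1) for _ in range(2000)]

# Start with the center sample sitting on the data edge, then just before the next edge.
from_edge = track_eye(bits, start_phase=0, edge_offset=0)
from_late = track_eye(bits, start_phase=7, edge_offset=0)
pointer = from_edge[-1] % N_PHASES   # the physical pointer wraps around the N phases
print("settled at", from_edge[-1], "and", from_late[-1], "steps after the data edge")
```

In this sketch the pointer moves one step at a time until neither the early nor the late sample straddles a data edge, and then stays inside that dead zone. The modulo operation on the pointer is what gives the unlimited phase range; a real design would also have to handle the bit slip that occurs when the pointer wraps, which is omitted here.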
7. OTHER OVERSAMPLING CDR ARCHITECTURES

A true phase-picking or phase-selection CDR architecture would pick and use one of the phases directly [23][24] or feed it back to another PLL in order to achieve further dithering of the jitter [23]. Some of the other important publications in this field are [25][26][27][28]. One noteworthy entry is a quad 3.125 Gb/s/channel transceiver chip featuring an analog phase rotator [29]. Due to limited space, we now limit this overview and present our modified DR architecture.

8. MODIFIED DR ARCHITECTURE

In our previous paper [30], we swept three critical Digital PLL (DPLL) parameters for the eye-tracking architecture [22], i.e., the edge-detection interval of the BBPD, the number of clock phases and the phase update interval, in order to investigate their effects on the jitter tolerance of the DR circuit. The edge-detection interval for the DR circuit is a critical parameter of this design. Its value has a conflicting influence on the high- and low-frequency jitter tolerance of the DR circuit. If the edge-detection window is too narrow, data edges cannot be detected effectively and the low-frequency tracking ability of the DPLL suffers. If the edge-detection window is too wide, the high-frequency jitter tolerance deteriorates but the tracking improves. The simulations were performed in Matlab/Simulink.

Our modified architecture ameliorates this problem by removing the CMOS style delay blocks in the BBPD altogether. Instead we rely on the presence of a DLL-based phase generator [31] that can provide equidistant phases with much less Process, Voltage and Temperature (PVT) dependence, perhaps for many on-chip data lanes. In addition, we use not one rotating phase as shown in Fig. 12(a) [22], but three rotating phases, a fixed distance apart (as seen in Fig. 12(b)). The difference between this circuit and a conventional VCO type circuit would be the unlimited phase range due to three rotating phases and the fact that the phases generated by the DLL are not synchronized to the data stream, although they have to be close to the data frequency. The BBPD architecture shown in [22] (also see Fig. 9) is retained, but the delay elements are removed and the Early, Centre and Late Phases as shown in Fig. 11 are used to clock the three front-end flipflops. This implements a 3XO all-digital CDR architecture that doesn't suffer from PVT induced jitter tolerance shifts due to CMOS type delay blocks.

Fig. 12. Phase Rotation Concept (a) from [22] (b) The modified architecture. (Figure labels: jitter frequency; phases I(N-1), I0, I1, I2; panel (a) recovered clock phase; panel (b) early, center and late phases.)

The phase resolution would depend on the speed of the technology and the design requirements for the DLL. For achieving

finer resolutions with slower CMOS technologies, a phase-interpolator could be used. With the improvement in the speed of the technology, the entire design could be implemented using digital-style delay cells in the DLL with sufficient resolution.

8.1. Simulation results

Matlab simulation results available at the time were reported in [30]. In order to maintain continuity, we report the comparison of the jitter tolerance for our version of the reference architecture and our modified architecture using the Matlab/Simulink platform. The number of clock phases is eight. The phase update interval is 16 bits and the clock phases maintain a 0.25 UI separation. Fig. 13 shows that the two approaches produce equivalent results for the jitter tolerance. A typical simulation of 2 µs is shown in Fig. 14 (5000 bits are not shown). The DR circuit uses three clock phases, a fixed distance (0.25 UI) apart, and comfortably tracks a 0.5 MHz, 8.5 UI p-p jitter sinusoid as it recovers the NRZ data at 2.5 Gb/s. The simulation is self-explanatory, except that any spike reaching a +1 threshold would have meant a bit error in Fig. 14(e) [30].

Fig. 13. Jitter Tolerance Comparison: Reference Design vs Modified design

Fig. 14. Typical Simulation Results (a) Early Clock Phase (b) Center Clock Phase (c) Late Clock Phase (d) Amplitude of added sinusoidal jitter (e) Residual Phase Error

8.2. Future Research

If we take a closer look, the eye-tracking architecture and the jitter-tolerant variable-oversampling architecture can be merged using the modified architecture. One would have to make two significant adjustments. These would be:
• The static distance between the early, center and late phases could become dynamic, similar to the one presented in [17], limited by the available phase resolution from the DLL circuitry. This would allow one to measure the eye-opening digitally. This information can be used to control equalizers in the overall SERDES architecture.
• The full-rate architecture could be changed to a half-rate or quarter-rate architecture with a multiplicity of flip-flops in the BBPD clocked by DLL-generated phases. The multiple phases would still keep rotating.
This requires the addition of some digital circuitry and would produce a DR architecture that is all-digital and measures the eye-opening digitally with low power dissipation.

9. CONCLUSIONS

• Following the basic clock recovery concept behind a filter-type CDR circuit, a brief review of PLL-style 2x-oversampling CDR architectures was presented. Several 3x-oversampling CDR architectures were reviewed including the eye-tracking, jitter-tolerant variable-oversampling and the blind oversampling architectures.
• A modified architecture was presented that minimizes the shifts in jitter tolerance performance of the eye-tracking architecture using a 3XO architecture, with three rotating

phases instead of one rotating phase, and utilizing a DLL-based phase generator.
• The two architectures were found equivalent with respect to their jitter tolerance performance.
• Some possibilities about work in progress were also mentioned. Current work is being done using the Verilog-A platform, the added benefits of which will be described in another publication along with a pertinent comparison of the simulation results.

Acknowledgements

The authors would like to thank the Ministry of Science and Technology, Government of Ontario Canada, NSERC, Altera Canada and Carleton University for their financial support. Thanks to the many colleagues and friends for sharing their expertise and for their patience during long discussions.

References

[1] B. Razavi, in Monolithic Phase-Locked Loops and Clock Recovery Circuits, Theory and Design, IEEE Press 1996
[2] M. Horowitz, C-K. K. Yang, S. Sidiropoulos, “High-Speed Electrical Signaling: Overview and Limitations,” IEEE Micro, 1998, pp. 12-24
[3] R. Walker, “Clock and Data Recovery for Serial Digital Communications,” ISSCC Short Course, Feb. 2002
[4] C. DeVries, R. Mason, “A 0.18 µm CMOS 900 MHz receiver front-end using RF Q-enhanced filters,” ISCAS 2004, May 2004, vol. 4, pp. IV-325-328
[5] K. Murata, T. Otsuji, “A Novel Clock Recovery Circuit for Fully Monolithic Integration,” IEEE Trans. on Microwave Theory and Tech., vol. 47, no. 12, Dec. 1999, pp. 2528-2533
[6] B. Razavi, “Challenges in the Design of High-Speed Clock and Data Recovery Circuits,” IEEE Communications Magazine, Aug. 2002, pp. 94-101
[7] C. R. Hogge, “A Self-Correcting Clock Recovery Circuit,” IEEE J. Lightwave Tech., vol. 3, Dec. 1985, pp. 1312-1314
[8] T. H. Lee, J. F. Bulzacchelli, “A 155-MHz Clock Recovery Delay- and Phase-Locked Loop,” IEEE J. Solid-State Circuits, vol. 27, no. 12, Dec. 1992, pp. 1736-1745
[9] J. D. H. Alexander, “Clock Recovery from Random Binary Data,” Elect. Lett., vol. 11, Oct. 1975, pp. 541-542
[10] J. Lee, K. S. Kundert, B. Razavi, “Analysis and Modeling of Bang-Bang Clock and Data Recovery Circuits,” IEEE J. Solid State Circuits, vol 39, no. 4, Apr. 2004, pp. 613-621
[11] S. B. Anand, B. Razavi, “A CMOS Clock Recovery Circuit for 2.5Gb/s NRZ Data,” IEEE J. Solid-State Circuits, vol. 36, no. 3, Mar. 2001, pp. 432-439
[12] S. Soliman, F. Yuan, K. Raahemifar, “An Overview Of Design Techniques For CMOS Phase Detectors,” ISCAS 2002, vol. 5, pp V-457 - V460, May 2002
[13] J. Lee, B. Razavi, “A 40-Gb/s clock and data recovery circuit in 0.18 µm CMOS technology,” IEEE J. Solid-State Circuits, vol. 38, no. 12, Dec 2003, pp. 2181-2190
[14] A. Seedher, G. E. Sobelman, “Fractional-rate phase detectors for clock and data recovery,” Proc. IEEE SoC Conf., 17-20 Sept. 2003, pp. 313 - 316
[15] J. Savoj, B. Razavi, “A 10-Gb/s CMOS clock and data recovery circuit with a half-rate linear phase detector,” IEEE J. Solid-State Circuits, vol. 36, no. 5, May 2001, pp. 761-768
[16] J. Savoj, B. Razavi, “A 10-Gb/s CMOS Clock and Data Recovery Circuit with a half-rate binary Phase/Frequency Detector,” IEEE J. Solid-State Circuits, vol. 38, no. 1, Jan. 2003, pp. 13-21
[17] S-H. Lee, M.-S. Hwang et al, “A 5-Gb/s 0.25-µm CMOS Jitter-Tolerant Variable-Interval Oversampling Clock/Data Recovery Circuit,” IEEE J. Solid-State Circuits, vol. 37, no. 12, Dec. 2002, pp. 1822-1830
[18] S-H. Lee, M.-S. Hwang et al, “A 5-Gb/s 0.25-µm CMOS Jitter-Tolerant Variable Interval Oversampling Clock/Data Recovery Circuit,” ISSCC Dig., 2002, paper 15.5
[19] J. Kim, D-K. Jeong, “Multi-Gigabit-Rate Clock and Data Recovery Based on Blind Oversampling,” IEEE Communications Mag., Dec. 2003, pp. 68-74
[20] C-K. K. Yang, “Design of High-Speed Serial Links in CMOS”, Ph.D. Dissertation, CSL-TR-98-775, December 1998, Stanford University, California
[21] R. Wagner, DAN 108, Data Comm. Application Note, EXAR Corporation, California
[22] Y. Miki et. al, “A 50-mW/ch 2.5-Gb/s/ch Data Recovery Circuit for the SFI-5 Interface with Digital Eye-Tracking,” IEEE J. Solid-State Circuits, vol. 39, no. 4, Apr. 2004, pp 613-621
[23] P. Larsson, “A 2-1600 MHz CMOS Clock Recovery PLL with Low-Vdd Capability,” IEEE J. Solid-State Circuits, vol. 34, no. 12, Dec. 1999, pp. 1951-1960
[24] J. M. Khoury, K. R. Lakshmikumar, “High-Speed Serial Transceivers for Data Communication Systems,” IEEE Comms. Mag., Jul. 2001, pp. 160-165
[25] Y. Choi, D-K. Jeong, W. Kim, “Jitter Transfer Analysis of Tracked Oversampling Techniques for Multigigabit Clock and Data Recovery,” IEEE Trans. Circuits and Systems II, Analog and Digital Signal Processing, vol. 50, no. 11, Nov. 2003
[26] Y-H. Lin, S.H.-L. Tu, “A pipelined serial data receiver with oversampling techniques for high-speed data communications,” IEEE Conf. Electron Devices and Solid-State Circuits, 16-18th Dec. 2003
[27] S. Kim, K. Lee, D-K. Jeong, D. D. Lee, A. G. Nowatzyk, “An 800 Mbps multi-channel CMOS serial link with 3x oversampling,” IEEE Proc. CICC ‘95, pp. 451-455
[28] C-K. K. Yang, R. Farjad-Rad, M. A. Horowitz, “A 0.5-µm CMOS 4.0-Gbit/s serial link transceiver with data recovery using oversampling,” IEEE J. Solid-State Circuits, vol. 33, no. 5, May 1998, pp. 713-722
[29] D. Zheng, X. Jin, E. Cheung, M. Rana, G. Song, Y. Jiang, Y-H. Sutu, B. Wu, “A Quad 3.125 Gb/s/Channel Transceiver with Analog Phase Rotators,” ISSCC Dig. Tech. papers, 2002, paper 4.2
[30] S. I. Ahmed, T. A. Kwasniewski, “An All-Digital Clock and Data Recovery Circuit Optimization Using Matlab/Simulink,” IEEE ISCAS, May 2005 (Accepted)
[31] R. Zhang, G. S. La Rue, “Clock and Data Recovery Circuits with Fast Acquisition and Low Jitter,” IEEE Workshop on Microelectronics and Electron Devices, 2004, pp. 48-51
