
A Circuit for All Seasons

Behzad Razavi

The Decision-Feedback Equalizer

Digital Object Identifier 10.1109/MSSC.2017.2745939
Date of publication: 16 November 2017

The decision-feedback equalizer (DFE) dates back to the 1960s [1] and began to appear in high-speed wireline communication systems in the early 2000s. In this article, we study the properties of this circuit and describe its “analog” implementations.

The Need for Equalization

As high-speed random data propagates through a medium with a limited bandwidth (also called a “lossy” medium), it is dispersed. That is, the data edges become slower, possibly disallowing full transition if a 010 or 101 sequence occurs [Figure 1(a)]. This sluggishness of the channel also makes the zero crossing times of the data a function of the bit amplitudes, causing significant jitter. Both degradations increase the bit-detection error rate. In the frequency domain, the channel attenuates the high-frequency content of the data [Figure 1(b)].

Figure 1: (a) The dispersion of random data in a lossy channel, (b) the channel frequency response showing attenuation of high-frequency components, and (c) the use of an HPF to equalize the channel.

The frequency-dependent channel loss depicted in Figure 1(b) can be undone by means of a circuit having the inverse response, i.e., a high-pass filter (HPF). As illustrated in Figure 1(c), if subjected to such a response, the received data assumes its original, undispersed shape and more easily lends itself to detection. This HPF exemplifies a “linear” equalizer as it can be approximated by a finite impulse response filter incorporating only linear stages (delay units and scaling coefficients). We say the equalizer provides a high-frequency “boost” to compensate for the channel loss.

The Need for Decision-Feedback Equalization

While intuitively appealing, linear equalization faces three issues. First, since it requires a large amount of boost for very lossy channels, it significantly amplifies high-frequency noise, corrupting the data. Second, a high boost demands multiple stages, each one inevitably limiting the bandwidth and consuming considerable power. Third, the inverse response provided by a linear equalizer does not suffice in most practical cases. Specifically, a typical channel introduces, in the signal path, impedance discontinuities (mismatches) resulting from connectors and other physical interfaces between boards, cables, etc. Such discontinuities manifest themselves as deep notches in the channel’s frequency response (Figure 2) that would be difficult to compensate by a linear equalizer.

To appreciate the beauty of DFEs, we first return to the time domain and view the data waveform as the superposition of random steps shifted in time by integer multiples of $T_b$ [Figure 3(a)].
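The noise penalty of a linear equalizer can be illustrated with a rough discrete-time sketch. It assumes a hypothetical two-sample channel response `[1.0, 0.5]` and a two-tap FIR equalizer `[1.0, -0.5]`; these values and the helper name `convolve` are illustrative assumptions and do not come from the article.

```python
def convolve(a, b):
    """Discrete convolution of two short coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# Hypothetical bit-spaced channel response: main cursor plus one postcursor.
channel = [1.0, 0.5]
# Assumed two-tap FIR equalizer (one delay unit, two scaling coefficients).
equalizer = [1.0, -0.5]

# Overall response seen by the data after the equalizer.
combined = convolve(channel, equalizer)

# White noise entering the equalizer has its power scaled by the
# sum of the squared tap coefficients.
noise_power_gain = sum(w * w for w in equalizer)

print(combined)          # [1.0, 0.0, -0.25]
print(noise_power_gain)  # 1.25
```

In this toy case the first postcursor is removed, but a smaller residual appears two bits later and the noise power grows by 25%. Adding taps to shrink the residual would raise the noise gain further, which is the first issue listed above.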


Figure 2: Notch in frequency response due to impedance discontinuity.

Figure 3: (a) Random data viewed as superposition of steps and (b) the impulse response of a lossy channel.
[Panel (b) labels the main cursor $h_0$ and the postcursors $h_1$ and $h_2$ at $T_b$ and $2T_b$.]

Due to the channel imperfections, each output bit is broadened, exhibiting a tail that interferes with the next bit(s). Called intersymbol interference (ISI), this phenomenon is more clearly seen in the impulse response of the channel [Figure 3(b)]. We observe that the tail values at $T_b$, $2T_b$, etc., (called the “postcursors”) respectively represent the ISI introduced in the next bit, the bit after it, and so on. In general, energy removal from a signal’s spectrum causes ISI, whether it occurs as wideband loss [Figure 1(b)] or as narrowband rejection (Figure 2). We surmise that ISI can be suppressed if we reconstruct the tail values and subtract them from the next bit(s).

Let us implement this idea for canceling the first postcursor in Figure 3(b). We must delay the present bit by one bit period, scale this result by a factor equal to $h_1$, and subtract it from the next bit. Figure 4(a) shows such an arrangement.

If we consider $D_{in}$ as the present bit, $D_{out}$ holds the previous bit and $D_F$ a scaled copy thereof. Thus, $D_{in} - D_F$ is free from the first postcursor, whether it is created by wideband loss or impedance discontinuities. With a linear delay element, this loop is still a “linear” equalizer. A side effect is that the amplitude noise at the output is scaled by a factor of $h_1$ and added to the input data, degrading the signal-to-noise ratio of each bit.

This issue can be remedied if the delay element is followed by a limiter, also known as a “slicer,” so as to remove the amplitude noise [Figure 4(b)]. The loop thus stops the noise from circulating and acts as a nonlinear equalizer. For robust operation, we replace the delay stage and the slicer with a flipflop (FF) [Figure 4(c)], recognizing that typical FFs provide both a one-period delay and limiting action on the amplitude. We can say the loop feeds the FF’s decision back to the input, hence the term DFE. As illustrated in Figure 4(d), the summer output, $D_{sum}$, is free from the first postcursor, making greater voltage excursions with less jitter and allowing better detection. Sensed by the flipflop, this output is the most critical node in the circuit.

In the DFE loop of Figure 4(c), two parameters must be adjusted to reach optimum performance. First, the clock sampling edges must occur at the peaks of $D_{sum}$, necessitating a clock recovery circuit. Second, the first “tap” value, $h_1$, must be chosen according to the actual channel response. This is typically accomplished by monitoring the eye diagram at the summer output and adjusting $h_1$ to maximize its height.

Higher-order postcursors can also be removed by a DFE. Depicted in Figure 5, a two-tap realization returns scaled copies of the last two bits to the input.
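The postcursor cancellation described above can be sketched behaviorally as follows. This is a minimal bit-spaced model, not a circuit-level implementation: the cursor values `[1.0, 0.75, 0.5]`, the bit pattern, and the function names are assumptions chosen for illustration, the channel is noise-free, and the clock is assumed to sample at the ideal instant.

```python
def slicer(v):
    """Ideal limiter: map an analog value to a +1 or -1 decision."""
    return 1 if v >= 0 else -1

def channel_output(bits, h):
    """Sample the channel once per bit: main cursor h[0] plus postcursors h[1:]."""
    out = []
    for n in range(len(bits)):
        out.append(sum(h[k] * bits[n - k] for k in range(len(h)) if n - k >= 0))
    return out

def dfe(received, taps):
    """Subtract scaled copies of past decisions, then slice, one bit at a time."""
    history = [0] * len(taps)      # past decisions, most recent first
    decisions, sums = [], []
    for r in received:
        d_sum = r - sum(t * d for t, d in zip(taps, history))
        d = slicer(d_sum)          # the FF's decision
        sums.append(d_sum)
        decisions.append(d)
        history = [d] + history[:-1]
    return decisions, sums

# Assumed main cursor and two postcursors (the postcursors alone close the eye).
h = [1.0, 0.75, 0.5]
bits = [1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, -1, 1]

received = channel_output(bits, h)

# Slicing the received samples directly, with no equalization.
raw_errors = sum(slicer(r) != b for r, b in zip(received, bits))

# Two-tap DFE with the taps set equal to the postcursors.
decisions, d_sum = dfe(received, h[1:])
dfe_errors = sum(d != b for d, b in zip(decisions, bits))

print(raw_errors, dfe_errors)
```

With the taps equal to the postcursors and all earlier decisions correct, each value of `d_sum` reduces to the main cursor times the present bit. A wrong decision would instead add ISI to the following bits, which this noise-free sketch does not exercise.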

Figure 4: (a) A feedback loop canceling the first postcursor, (b) the addition of a slicer to suppress amplitude noise, (c) the use of an FF as a delay element and a slicer, and (d) the resulting waveforms.

Figure 5: A two-tap DFE architecture.

Slicer Design Issues

In addition to clock phase alignment and proper setting of the feedback tap, the DFE shown in Figure 4(c) must also deal with the total loop delay. We predict that, at a sufficiently high data and clock rate, the circuit begins to incur errors.

To study the DFE speed limitations, consider the differential topology in Figure 6 and suppose the slave latch in the FF enters the sense mode on the falling edge of the clock, at $t = t_1$. The slave output requires a certain amount of time to change state, called the “clock-to-Q” delay, $T_{CK\text{-}Q} = t_2 - t_1$. This transition propagates through the scaling stage and causes a change at the summing node. This node has a finite time constant, introducing its own delay, $T_{FB} = t_3 - t_2$. When CK goes high, the master latch enters the sense mode and must change its output according to the new value of $D_{sum}$ before CK goes low again. The necessary time for this change is the setup time of the FF, $T_{setup} = t_5 - t_4$. Thus, $T_{CK\text{-}Q} + T_{FB} + T_{setup}$ must not exceed one clock cycle and hence one bit period:

$$T_{CK\text{-}Q} + T_{FB} + T_{setup} \le T_b \tag{1}$$
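As a quick numerical illustration of (1), the sketch below computes the highest bit rate that a given loop delay allows. The delay values (15 ps, 10 ps, and 15 ps) are hypothetical and the function names are illustrative; times are in picoseconds and rates in Gb/s.

```python
def loop_delay_ps(t_ck_q_ps, t_fb_ps, t_setup_ps):
    """Total DFE loop delay: clock-to-Q plus feedback settling plus setup time."""
    return t_ck_q_ps + t_fb_ps + t_setup_ps

def max_bit_rate_gbps(t_ck_q_ps, t_fb_ps, t_setup_ps):
    """Bit rate at which the loop delay exactly equals one bit period."""
    return 1000.0 / loop_delay_ps(t_ck_q_ps, t_fb_ps, t_setup_ps)

def meets_budget(bit_rate_gbps, t_ck_q_ps, t_fb_ps, t_setup_ps):
    """True if the loop delay does not exceed the bit period, as in (1)."""
    t_b_ps = 1000.0 / bit_rate_gbps
    return loop_delay_ps(t_ck_q_ps, t_fb_ps, t_setup_ps) <= t_b_ps

# Hypothetical delays, for illustration only.
print(max_bit_rate_gbps(15, 10, 15))   # 25.0
print(meets_budget(20.0, 15, 10, 15))  # True
print(meets_budget(28.0, 15, 10, 15))  # False
```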

Similar speed limitations exist in other variants of this architecture as well (see the “DFE Variants” section).

Figure 6: A DFE with differential signal paths.

Three other nonidealities affect the performance of the DFE. First, the data input port of the summer in Figure 6 cannot be arbitrarily nonlinear because the dispersed data’s amplitude carries information about the channel and must not experience significant limiting. We can see intuitively that, if the $D_{in}$ waveform in Figure 4(d) is greatly amplified and sliced, then all of the bits exhibit a full swing but the jitter introduced by the channel remains. As a guideline, we choose the 1-dB compression point of this port to be greater than the main cursor amplitude [2] so that the nonlinearity negligibly increases the ISI.

Figure 7: An unrolled DFE architecture.

Second, the input offset of the FF, $V_{OS}$, shifts the net voltage sensed at the summing node, $D_{sum}$, equivalently degrading the voltage margin for the negative or positive data values sampled by the FF. Third, the total noise in $D_{sum}$, $V_n$, yields a finite bit error rate. This noise includes that produced by the summer and the stages preceding the DFE and the input-referred noise of the FF. As a rule of thumb, we ensure that $8(4V_{OS} + V_{n,rms})$ remains less than the peak data swing of $D_{sum}$. The factor of eight is chosen to ensure error rates on the order of $10^{-12}$ and the factor of four represents the four-sigma variance of the offset.
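The rule of thumb for offset and noise can be checked with a few lines of arithmetic. The sketch below only restates the inequality given above; the offset and noise values (5 mV and 1 mV rms) and the function names are hypothetical, and all quantities are in millivolts.

```python
def required_swing_mv(v_os_mv, v_n_rms_mv):
    """Lower bound on the peak data swing: 8 * (4 * V_OS + V_n,rms)."""
    return 8 * (4 * v_os_mv + v_n_rms_mv)

def swing_is_sufficient(peak_swing_mv, v_os_mv, v_n_rms_mv):
    """True if the peak swing exceeds the rule-of-thumb bound."""
    return required_swing_mv(v_os_mv, v_n_rms_mv) < peak_swing_mv

# Hypothetical FF offset and total rms noise at the summing node.
print(required_swing_mv(5, 1))         # 168
print(swing_is_sufficient(200, 5, 1))  # True
print(swing_is_sufficient(150, 5, 1))  # False
```

In this example the offset term accounts for 160 mV of the 168 mV bound, which suggests why offset is often the first quantity to address in such a budget.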


DFE Variants

A multitude of DFE architectures have been proposed to ease the design tradeoffs. We study some here.

Figure 8: Half-rate DFE architectures with (a) two summers and (b) one summer and one multiplexer.

It is possible to transform the feedback loop of Figure 4(c) to a predictive or “unrolled” topology. Suppose $D_{in}$ and $D_{out}$ swing between $-1$ and $+1$. Since we wish to compute $D_{in} - h_1 D_{out}$, we can equivalently consider $D_{in} - h_1$ and $D_{in} + h_1$ as the only two possible levels that must reach the FF. The selection between these two values can be made by the previous bit. Figure 7 shows the resulting “unrolled DFE” [3]. Here, the previous bit available at $D_{out}$ decides whether $D_{in} - h_1$ or $D_{in} + h_1$ must travel through the multiplexer and be sliced by the FF. We note that the summing nodes lie outside the feedback loop, which is the principal advantage of this arrangement. The timing budget is now given by $T_{CK\text{-}Q} + T_{setup} + T_{MUX} < T_b$, where $T_{MUX}$ denotes the delay from the select input of the multiplexer to its output. In some cases, $T_{MUX}$ is less than $T_{FB}$ in (1). However, the $D_{out}$ signal must be level shifted and/or amplified to properly switch the multiplexer, leading to additional delay.
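The equivalence between the direct loop and the unrolled form can be sketched behaviorally as below. The sketch assumes a noise-free channel with a single postcursor, hypothetical values $h_0 = 1$ and $h_1 = 0.75$, and a known $+1$ bit sent before the data so that both loops start from the same state; the function names are illustrative.

```python
def slicer(v):
    """Ideal limiter: map an analog value to a +1 or -1 decision."""
    return 1 if v >= 0 else -1

def direct_dfe(received, h1, first_prev=1):
    """Feedback form: subtract h1 times the previous decision, then slice."""
    prev, out = first_prev, []
    for r in received:
        prev = slicer(r - h1 * prev)
        out.append(prev)
    return out

def unrolled_dfe(received, h1, first_prev=1):
    """Unrolled form: form both candidate levels, let the previous bit select one."""
    prev, out = first_prev, []
    for r in received:
        level_if_prev_high = r - h1   # summing happens outside the loop
        level_if_prev_low = r + h1
        selected = level_if_prev_high if prev > 0 else level_if_prev_low
        prev = slicer(selected)       # only the multiplexer and FF are in the loop
        out.append(prev)
    return out

# Assumed cursor values and data pattern.
h0, h1 = 1.0, 0.75
bits = [1, -1, -1, 1, 1, -1, 1, -1, -1, -1, 1, 1]

# Channel samples, with an assumed +1 bit preceding the data.
received, prev_bit = [], 1
for b in bits:
    received.append(h0 * b + h1 * prev_bit)
    prev_bit = b

direct = direct_dfe(received, h1)
unrolled = unrolled_dfe(received, h1)
print(direct == unrolled == bits)
```

Both forms compute the same quantity for each bit. They differ in which operations sit inside the feedback path, and therefore in the timing budget.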
At very high speeds, it is desirable to drive the DFE with a half-rate clock, $CK_{1/2}$, which is simpler to generate and distribute. Figure 8(a) shows a half-rate DFE [4], where the FFs are clocked by $CK_{1/2}$ and $\overline{CK}_{1/2}$, thereby demultiplexing the data by a factor of two. Each output bit lasts for $2T_b$ seconds and, after subtraction from $D_{in}$, is fed to the FF in the other branch. This topology nonetheless does not relax the loop timing budget given by (1). It also consumes about twice as much power as the full-rate DFE of Figure 4(c).

Another half-rate DFE architecture is depicted in Figure 8(b) [5]. Here, the half-rate outputs are multiplexed so as to reconstruct the full-rate data, with the result serving as the feedback signal. While using only one summer, this method adds the multiplexer delay to $T_{CK\text{-}Q} + T_{FB} + T_{setup}$, degrading the speed.

At very high speeds, the summing node and the FFs can incorporate inductive peaking for a greater bandwidth and a smaller loop delay. This improvement comes at the cost of a more complex layout and signal distribution difficulties.

Questions for the Reader

1) Can the delay stage and the slicer in Figure 4(b) be realized as a single limiting differential pair?
2) Can the unrolled DFE of Figure 7 accommodate a second tap?

Figure 9: A comparator input stage based on two differential pairs.

Answers to Last Issue’s Questions

1) In Figure 9, why can we not apply $V_{in1}$ and $V_{in2}$ to $M_1$ and $M_2$ and $V_{r1}$ and $V_{r2}$ to $M_3$ and $M_4$?
In such a case, each differential pair can experience a large input difference even when the comparator is making a critical decision. As a result, the transconductance of the two pairs falls considerably, making the offsets of the subsequent stages significant.

2) How does the characteristic shown in Figure 10(b) change if the front-end comparator has an offset equal to 1.5 least-significant bits (LSBs)?
In the ideal case, we have $V_F^+ - V_F^- = V_{in}^+ - V_{in}^-$ if $V_{in}^+ - V_{in}^- > 0$ and $V_F^+ - V_F^- = -(V_{in}^+ - V_{in}^-)$ if $V_{in}^+ - V_{in}^- < 0$. With a comparator offset of 1.5 LSBs, the former holds if $V_{in}^+ - V_{in}^- > 1.5$ LSBs and the latter, if $V_{in}^+ - V_{in}^- < 1.5$ LSBs. That is, the circuit negates the differential input even for values reaching 1.5 LSBs. The resulting characteristic is shown in Figure 10(c).


SSCS Vice President Bram Nauta Inducted to the Royal Dutch Academy of Arts and Sciences

IEEE Solid-State Circuits Society (SSCS) Vice President Bram Nauta was inducted into the Royal Dutch Academy of Arts and Sciences in June. Nauta is a professor at the University of Twente, heading the Integrated Circuits Design group. His current research interests are high-speed analog complementary metal-oxide-semiconductor circuits, software-defined radio, cognitive radio, and beamforming.

[Photo: Bram Nauta]

Academy membership is a great honor in The Netherlands. The academy appoints a maximum of 16 new members every year. Membership is awarded based on an individual’s scientific and scholarly achievements. Once appointed, individuals are members for life. Members meet and discuss issues of interest to science, scholarship, and society. Academy members represent a wide spectrum of scientific and scholarly disciplines, giving all members the opportunity to embrace new fields in science and scholarship.

Nauta was inducted as a result of the work he performed throughout his career and it was a great honor. “It was a surprise for me,” Nauta said, “especially because I’m an electrical engineer working on the application side of science.”

He hopes his induction will open new doors for him, especially outside his own scientific field.

For more information about the Royal Dutch Academy of Arts and Sciences, visit https://www.knaw.nl/nl.

—Abira Sengupta

Digital Object Identifier 10.1109/MSSC.2017.2746193
Date of publication: 16 November 2017

Figure 10: (a) The flash stage preceded by a polarity detector, (b) the resulting characteristic, and (c) the characteristic in the presence of comparator offset.
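The folding characteristic discussed in the answer to the second question can be sketched behaviorally as follows. The function name and the choice of LSB units are assumptions made for illustration; the model covers only the polarity decision and ignores all other nonidealities.

```python
def folded_output(v_in_lsb, comparator_offset_lsb=0.0):
    """Differential output of the polarity-detector stage, in LSBs.

    The input is passed unchanged when the front-end comparator sees it
    as positive (above its offset) and is negated otherwise.
    """
    if v_in_lsb > comparator_offset_lsb:
        return v_in_lsb
    return -v_in_lsb

# Ideal comparator: the output is the magnitude of the input.
print(folded_output(2.0), folded_output(-2.0))            # 2.0 2.0

# Offset of 1.5 LSBs: a positive input of 1 LSB is wrongly negated.
print(folded_output(1.0, 1.5), folded_output(2.0, 1.5))   # -1.0 2.0
```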

References

[1] M. E. Austin, “Decision-feedback equalization for digital communication over dispersive channels,” Tech. Rep. 437, Lincoln Laboratory, Aug. 1967.
[2] J. Jung and B. Razavi, “A 25 Gb/s 5.8 mW CMOS equalizer,” IEEE J. Solid-State Circuits, vol. 50, pp. 515–526, Feb. 2015.
[3] S. Kasturia and J. H. Winters, “Techniques for high-speed implementation of nonlinear cancellation,” IEEE J. Selected Areas Commun., vol. 9, pp. 711–717, June
[4] J. F. Bulzacchelli, M. Meghelli, S. V. Rylov, W. Rhee, A. Rylyakov, H. A. Ainspan, B. D. Parker, M. P. Beakes, A. Chung, T. J. Beukema, P. K. Pepeljugoski, L. Shan, Y. H. Kwark, S. Gowda, and D. J. Friedman, “A 10-Gb/s 5-tap DFE/4-tap FFE transceiver in 90-nm CMOS technology,” IEEE J. Solid-State Circuits, vol. 41, no. 12, pp. 2885–2900, Dec. 2006.
[5] R. Payne, P. Landman, B. Bhakta, S. Ramaswamy, S. Wu, J. D. Powers, M. U. Erdogan, A. Yee, R. G. L. Wu, Y. Xie, B. Parthasarathy, K. Brouse, W. Mohammed, K. Heragu, V. Gupta, L. Dyson, and W. Lee, “A 6.25-Gb/s binary transceiver in 0.13-um CMOS for serial data transmission across high loss legacy backplane channels,” IEEE J. Solid-State Circuits, vol. 40, no. 12, pp. 2646–2657, Dec. 2005.