06454433

Low Complexity Hardware Architectural Design for
Adaptive Decision Feedback Equalizer using Distributed Arithmetic
Surya Prakash M
Department of Electronics and Electrical Engineering
Indian Institute of Technology Guwahati
Assam, India 781039
Email: surya@iitg.ac.in

Rafi Ahamed Shaik
Department of Electronics and Electrical Engineering
Indian Institute of Technology Guwahati
Assam, India 781039
Email: rafiahamed@iitg.ac.in
Abstract: In this paper, we propose a low-complexity architectural design for the adaptive decision feedback equalizer (ADFE).
We recast the ADFE equations using distributed arithmetic (DA), which enables the ADFE to be implemented without
multipliers. It is further shown that the high-order
filters required to implement high-speed ADFEs can be
realized using only look-up tables (LUTs) and shift-accumulate
operations. A novel approach is proposed that replaces the feed
forward and feedback filters of the ADFE with a single DA unit.
It is also shown that, with proper initialization, a low-complexity
ADFE architecture can be obtained.
Index Terms: Inter Symbol Interference (ISI), Adaptive Decision Feedback Equalizer (ADFE), Distributed Arithmetic (DA),
Least Significant Bit (LSB), Most Significant Bit (MSB), slicer,
quantization.
I. INTRODUCTION
In digital communications, transmitted symbols are prone to
distortion introduced by the channel. The distortion arises both
from the additive noise in the channel and from the overlapping
of symbols caused by multipath propagation of the signal. Further,
the distortion is not the same at every instant because of the
time-varying nature of the channel. Adaptive equalizers try to
nullify the effect of the channel by estimating the inverse of its
mathematical model. Linear adaptive equalizers seldom provide a
good means of channel equalization, because approximating the
channel inverse increases the noise power at the spectral nulls of
the channel. Adaptive decision feedback equalizers (ADFEs) are
widely used to eliminate the inter symbol interference (ISI)
introduced by the channel. The idea is to eliminate the pre-cursor
and post-cursor components of ISI based on the decisions made on
the past symbols. The block diagram of an ADFE is shown in Fig. 1.
The feed forward filter (FFF) removes some of the ISI from the
received signal, but leaves some of the post-cursor ISI on the
signal. The feedback filter (FBF) estimates the residual ISI from
the past decisions and subtracts it from the feed forward filter
output. The low noise enhancement of the DFE is due to the fact
that, assuming no decision errors, the decision device removes all
the noise present in the signal. As long as the inputs of the
feedback filter have no noise, the outputs of the feedback filter
have no noise either. Also, the feed forward filter has the less
complex problem of removing only the pre-cursor components. This
results in better performance for the feed forward filter compared
with the linear equalizer. However, the assumption that correct
symbol decisions are made at the output of the decision device may
not hold in all practical cases; when it fails, error propagation
occurs and the performance of the equalizer is degraded.
ADFEs are difficult to implement at high speed. Fine-grain
pipelining of adaptive DFEs is difficult because the ADFE has a
non-linear device (a quantizer) in the decision-feedback loop. The
adaptation loop makes pipelining more difficult still. Hence,
previous work [1], [2], [3] on high-speed ADFE architectures has
exclusively adopted parallelization. The algorithms in [2], [3]
result in performance loss and coding loss due to incorrect
initialization of the FFF and FBF, respectively. A new method of
pipelining algorithms with quantizer loops was proposed by Parhi in
[4], where loops containing nonlinear devices are transformed to
equivalent forms which contain no nonlinear operation using
look-ahead computation. The drawback of this approach is that the
hardware grows with the number of levels in the quantizer or the
order of the predictor in algorithms with quantizer loops, and with
the number of linear segments in piecewise-linear recursive
systems. Later, K. K. Parhi et al. successfully applied look-ahead
and relaxed look-ahead [5] in order to pipeline adaptive
equalizers. In [6], [7], [8], several efficient block and frequency
domain based techniques have been proposed. In [6], Berberidis et
al. presented a new block ADFE that is mathematically equivalent to
the conventional LMS-based sample-by-sample DFE but with
considerably reduced computational load. A new block ADFE
implemented in the frequency domain was later proposed by the same
author in [7]. In [8], Rontogiannis et al. subsequently presented a
new efficient DFE appropriate for channels with long and sparse
impulse response. It is shown that, in cases of sparse channels,
the FFF and FBF have a particular structure, which can be exploited
to derive an efficient implementation of the DFE, provided that the
time delays of the channel impulse response multipath components
are known. A multiplexer loop based adaptive DFE has been proposed
in [9]. More recently [10], two approaches, namely a partial
pre-computation scheme and a two-stage pre-computation scheme, have
been proposed, where the former is used to trade off complexity
against computational speed and the latter to reduce the hardware
overhead and iteration bound. So far, the benefits of DA [11] for
implementing the ADFE have not been studied; in this paper we
formulate the ADFE equations in a manner which can be realized by
DA.
The paper is organized as follows: in Section II, DA and the
ROM decomposition technique are presented, and Section III
presents the mathematical formulation based on DA. Based on
this mathematical formulation, a new ADFE architecture
is developed and presented in Section IV.
Fig. 1. Block diagram of an Adaptive Decision Feedback Equalizer. (Blocks: FFF, FBF, slicer, unit delay $z^{-1}$; signals: $x(n)$, $s(n)$, $s_q(n)$, $v(n)$, $e(n)$.)
978-1-4673-5157-7/13/$31.00 © 2013 IEEE
II. BACKGROUND
DA is a bit-by-bit serial operation [11] that computes the
inner product of two vectors in a fixed number of clock cycles
regardless of the length of the vectors. In other words, it
implements a series of sum-of-product operations (or MAC
operation) regardless of the number of products to be summed
up. The bit-serial operation from the sum-of-product operation
of vectors can be obtained as follows.
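The sum-of-products of two length-$K$ vectors $\mathbf{c}$ and $\mathbf{x}$ is
given as
$$y = \sum_{k=0}^{K-1} c_k x_k \tag{1}$$
If each $x_k$ is represented by its 2's-complement form as follows
$$x_k = -b_{k,B-1} + \sum_{j=1}^{B-1} b_{k,B-1-j}\, 2^{-j} \tag{2}$$
then substituting (2) in (1) and re-arranging we get
$$y = -\left[\sum_{k=0}^{K-1} c_k b_{k,B-1}\right] + \sum_{j=1}^{B-1} \left[\sum_{k=0}^{K-1} c_k b_{k,B-1-j}\right] 2^{-j} \tag{3}$$
Defining
$$C_{B-1-j} = \begin{cases} \sum_{k=0}^{K-1} c_k b_{k,B-1-j}, & j \neq 0 \\ -\sum_{k=0}^{K-1} c_k b_{k,B-1}, & j = 0 \end{cases}$$
Then
$$y = \sum_{j=0}^{B-1} C_{B-1-j}\, 2^{-j} \tag{4}$$
Fig. 2. Architecture for computing inner product of two 4-length vectors using distributed arithmetic. (A $2^4$-entry LUT addressed by $a_3 a_2 a_1 a_0 = b_{3,j} b_{2,j} b_{1,j} b_{0,j}$, with entries $0, c_0, c_1, c_1 + c_0, c_2, \ldots, c_3 + c_2 + c_1 + c_0$, followed by a sign-bit-controlled adder/subtractor, an accumulating register and a $2^{-1}$ shifter.)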
It can be observed that each bracketed sum $\sum_{k=0}^{K-1} c_k b_{k,j}$ in (3) takes one out of $2^K$
possible values, which may be stored in a look-up table
(LUT). Hence, accessing the required value from the LUT and
then shift-accumulating it computes the output $y$. The
LUT stores $2^K$ values, and the entry of the LUT addressed by $a$
is given as
$$\mathrm{LUT}(a) = \sum_{k=0}^{K-1} c_k\, a^{(k)}, \qquad a = 0, 1, \ldots, 2^K - 1 \tag{5}$$
where $a^{(k)}$ is the $k$th bit in the $K$-bit representation of address
$a$. Specifically,
$$a = \sum_{k=0}^{K-1} a^{(k)}\, 2^{k} \tag{6}$$
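As an illustration of (1) to (6), the following Python sketch builds the LUT of (5) and evaluates the inner product bit-serially, LSB first, with one LUT access and one shift-accumulate per bit. It is a software model under stated assumptions, not the hardware of Fig. 2: the helper names (`build_lut`, `quantize`, `da_inner_product`) and the test vectors are hypothetical, the inputs are assumed to lie in $[-1, 1)$, and floating-point numbers stand in for the fixed-point accumulator.

```python
# Software model of the DA inner product of Section II.
# All function names and test values are hypothetical, chosen for illustration.

def build_lut(c):
    """Entry a holds the sum of c[k] over the set bits a^(k) of address a, as in (5)."""
    K = len(c)
    return [sum(c[k] for k in range(K) if (a >> k) & 1) for a in range(1 << K)]


def quantize(x, B):
    """B-bit two's-complement bit pattern of x, assumed to lie in [-1, 1)."""
    q = int(round(x * (1 << (B - 1))))
    q = max(-(1 << (B - 1)), min((1 << (B - 1)) - 1, q))  # saturate to B bits
    return q & ((1 << B) - 1)


def da_inner_product(lut, codes, B):
    """Bit-serial evaluation: one LUT read and one shift-accumulate per bit."""
    acc = 0.0
    for j in range(B):  # j = 0 is the LSB, j = B - 1 is the sign bit
        addr = 0
        for k, code in enumerate(codes):
            addr |= ((code >> j) & 1) << k  # bit j of x_k drives address bit k
        term = lut[addr]
        # Halve the running sum (the 2^-1 shifter), then add the LUT output;
        # the LUT output is subtracted on the sign-bit cycle.
        acc = acc / 2 + (-term if j == B - 1 else term)
    return acc


c = [0.5, -0.25, 0.125, 0.75]    # illustrative coefficients
x = [0.5, -0.75, 0.25, -0.125]   # illustrative inputs, exactly representable in 8 bits
B = 8
y_da = da_inner_product(build_lut(c), [quantize(v, B) for v in x], B)
y_direct = sum(ck * xk for ck, xk in zip(c, x))
print(y_da, y_direct)
```

For these inputs both values equal 0.375. The loop runs $B$ times whatever the vector length $K$ is; only the LUT grows, as $2^K$ entries.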
The architecture for the computation of the inner product of
two 4-length vectors using DA is shown in Fig. 2. The look-up
table stores all the possible combinations (partial products)
of the elements of vector $\mathbf{c}$. The bits in the 2's-complement
representation of each element of $\mathbf{x}$, i.e., $b_{k,j}$, become the
address bits of the LUT. Starting from the least significant
bit (LSB), i.e., $b_{k,0}$, the bits are supplied serially until the
most significant bit (MSB), i.e., $b_{k,B-1}$, arrives. Hence the
computation of the inner product takes $B$ clock cycles. This
makes DA a preferable method to compute the inner product
of two vectors of very large size, since the time it takes
to compute the output is independent of the length of the
vectors. When $K$ is very large, the size of the LUT becomes
very large and hence the practical implementation would be
difficult. However, using the ROM decomposition technique,
the LUT can be split into multiple LUTs storing the partial
products, at the cost of a linear increase in the number of adders. In other
words, if $K$ is split as $K = PM$, then $P$ LUTs of size $2^M$ can be used instead of a single $2^K$-sized
LUT. The architecture for the computation of the inner product
of vectors $\mathbf{c}$ and $\mathbf{x}$ with the ROM decomposition technique
can be modified as shown in Fig. 3.
Fig. 3. Architecture for computing the inner product of two 4-length vectors using DA ROM decomposition. (Two $2^2$-entry LUTs: one addressed by $a_1 a_0 = b_{1,j} b_{0,j}$ with entries $0, c_0, c_1, c_1 + c_0$, the other addressed by $a_1 a_0 = b_{3,j} b_{2,j}$ with entries $0, c_2, c_3, c_3 + c_2$; their outputs are added and passed to a sign-bit-controlled adder/subtractor, an accumulating register and a $2^{-1}$ shifter.)
III. MATHEMATICAL FORMULATION
Consider an ADFE as shown in Fig. 1 that processes an input
signal $x(n)$ and generates the output decision $s_q(n)$ as per the
following:
$$s_q(n) = Q[s(n)] \tag{7}$$
$$s(n) = y_f(n) - y_b(n) \tag{8}$$
$$y_f(n) = \sum_{k=0}^{N_f-1} w_{f,k}(n)\, x(n-k) \tag{9}$$
$$y_b(n) = \sum_{k=0}^{N_b-1} w_{b,k}(n)\, v(n-k), \qquad v(n) = s_q(n-1) \tag{10}$$
where $Q[\cdot]$ represents the quantization performed by the slicer, $y_f(n)$ and $y_b(n)$
are the FFF and FBF outputs, and $N_f$ and $N_b$ are the lengths of the FFF and FBF
coefficient vectors, respectively.
Now,
$$s(n) = \sum_{k=0}^{N_f-1} w_{f,k}(n)\, x(n-k) - \sum_{k=0}^{N_b-1} w_{b,k}(n)\, v(n-k) = \sum_{k=0}^{N_f-1} w_{f,k}(n)\, x(n-k) + \sum_{k=0}^{N_b-1} \{-w_{b,k}(n)\}\, v(n-k)$$
Therefore,
$$s(n) = \mathbf{w}^T \mathbf{z} \tag{11}$$
where $N = N_f + N_b$, and $\mathbf{w}$ and $\mathbf{z}$ are given as
$$\mathbf{w} = [w_{f,0}(n), w_{f,1}(n), \ldots, w_{f,N_f-1}(n), -w_{b,0}(n), -w_{b,1}(n), \ldots, -w_{b,N_b-1}(n)]^T \tag{12}$$
$$\mathbf{z} = [x(n), x(n-1), \ldots, x(n-N_f+1), v(n), v(n-1), \ldots, v(n-N_b+1)]^T \tag{13}$$
Hence,
$$s(n) = \sum_{k=0}^{N-1} w_k\, z(n-k) \tag{14}$$
where $w_k$ and $z(n-k)$ denote the $k$th elements of $\mathbf{w}$ and $\mathbf{z}$.
Now, if each $z(n-k)$ is represented by its 2's-complement
form as $z(n-k) = -b_{k,0} + \sum_{j=1}^{B-1} b_{k,j}\, 2^{-j}$, then
$$s(n) = -\left[\sum_{k=0}^{N-1} w_k\, b_{k,0}\right] + \sum_{j=1}^{B-1} \left\{\sum_{k=0}^{N-1} w_k\, b_{k,j}\right\} 2^{-j} \tag{15}$$
It can be observed that (15) is similar to (3), and hence the FFF
and FBF can be combined and implemented using a single DA
processing unit by proper initialization of the FFF and FBF coefficients. Choosing the LMS criterion to update the coefficients of the FFF and FBF, the weight update equations are
$$w_{f,k}(n+1) = w_{f,k}(n) + \mu_f\, e(n)\, x(n-k) \tag{16}$$
$$w_{b,k}(n+1) = w_{b,k}(n) + \mu_b\, e(n)\, v(n-k) \tag{17}$$
where $\mu_f$ and $\mu_b$ are the step sizes of the FFF and FBF, respectively, and $e(n)$ is the error signal, which is given as $e(n) = s_q(n) - s(n)$.
Replacing $w_{b,k}(n+1)$ by its negative,
$$-w_{b,k}(n+1) = -w_{b,k}(n) + \{-\mu_b\, e(n)\}\, v(n-k) \tag{18}$$
Multiplying (16) and (18) with $b_{k,j}$ and $b_{N_f+k,j}$, respectively, and
introducing the summation, we get
$$\sum_{k=0}^{N_f-1} b_{k,j}\, w_{f,k}(n+1) = \sum_{k=0}^{N_f-1} b_{k,j}\, w_{f,k}(n) + \mu_f\, e(n) \sum_{k=0}^{N_f-1} b_{k,j}\, x(n-k) \tag{19}$$
$$-\sum_{k=0}^{N_b-1} b_{N_f+k,j}\, w_{b,k}(n+1) = -\sum_{k=0}^{N_b-1} b_{N_f+k,j}\, w_{b,k}(n) + \{-\mu_b\, e(n)\} \sum_{k=0}^{N_b-1} b_{N_f+k,j}\, v(n-k) \tag{20}$$
where $j = [0, 1, \ldots, B-1]$.
The left-hand sides of (19) and (20) describe the entries of the
LUT in the current iteration, while the first terms on the right-hand
side describe the entries of the LUT in the previous iteration.
Further, it can be seen that in order to update the entries of the LUT
from time to time, one needs to maintain one more look-up
table storing the partial products of the input samples and
equalized samples, as described by the second terms on the right-hand
side of (19) and (20).
Fig. 4. ADFE architecture based on DA formulation. (Blocks: buffers for the input samples and the decision outputs, control signal and address generator, $2^{N_f}$-entry and $2^{N_b}$-entry LUTs for the FFF and FBF coefficients together with their corresponding LUTs, accumulator, slicer; outputs: equalized output and decision outputs.)
Fig. 5. Auxiliary-LUT at time $n-1$ and $n$.
IV. ADFE ARCHITECTURE DESIGN AND ITS
PERFORMANCE ANALYSIS
Based on the above formulation, we derive the architecture
for a low-complexity implementation of the ADFE based on DA.
From (15) it can be observed that the FFF and FBF can be combined and replaced by a single DA processing unit. However,
even if the weights of the FFF and FBF can be combined and
replaced with a single vector, the input symbols to each
of these filters differ. Further, if the scheme in [12] is to
be employed, an LUT which stores the combinations of both
the input symbols and the decision outputs has to be
maintained, for which the LUT update becomes impossible.
Hence, the ROM decomposition technique has been employed,
where the partial products of the coefficients of the FFF and FBF
are stored individually in two separate LUTs, in which case
the signal $s(n)$ becomes
$$s(n) = -\left[\sum_{k=0}^{N_f-1} w_k\, b_{k,0}\right] + \sum_{j=1}^{B-1} \left[\sum_{k=0}^{N_f-1} w_k\, b_{k,j}\right] 2^{-j} - \left[\sum_{k=N_f}^{N_f+N_b-1} w_k\, b_{k,0}\right] + \sum_{j=1}^{B-1} \left[\sum_{k=N_f}^{N_f+N_b-1} w_k\, b_{k,j}\right] 2^{-j} \tag{21}$$
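A minimal Python sketch of the split-LUT evaluation in (21) is given below as a numerical check. It is a software model under stated assumptions, not the proposed hardware: the helper names (`build_lut`, `to_code`, `address`, `adfe_output`) and all numerical values are hypothetical, the samples and decisions are assumed to lie in $[-1, 1)$ and to be exactly representable in $B = 8$ bits, and bit index $j = 0$ is taken as the sign bit, as in (15). The FBF table is filled with the negated FBF coefficients, so the two LUT outputs are simply added for every bit plane.

```python
# Software model of the split-LUT evaluation in (21).
# All names and numerical values are hypothetical, chosen for illustration.

def build_lut(w):
    """Entry a holds the sum of w[k] over the set bits of address a."""
    K = len(w)
    return [sum(w[k] for k in range(K) if (a >> k) & 1) for a in range(1 << K)]


def to_code(x, B):
    """B-bit two's-complement pattern of x in [-1, 1), assumed exactly representable."""
    return int(round(x * (1 << (B - 1)))) & ((1 << B) - 1)


def address(codes, j, B):
    """Address formed by bit j of every sample; j = 0 is the sign bit, as in (15)."""
    return sum(((code >> (B - 1 - j)) & 1) << k for k, code in enumerate(codes))


def adfe_output(lut_f, lut_b, x_codes, v_codes, B):
    """s(n) from two filtering LUTs: FFF partial products and negated FBF ones."""
    def plane(j):
        # One extra adder merges the two LUT outputs for bit plane j.
        return lut_f[address(x_codes, j, B)] + lut_b[address(v_codes, j, B)]
    return -plane(0) + sum(plane(j) * 2.0 ** -j for j in range(1, B))


B = 8
w_f = [0.5, -0.25, 0.125]     # FFF coefficients (illustrative)
w_b = [0.25, 0.5]             # FBF coefficients (illustrative)
x_win = [0.5, -0.75, 0.25]    # x(n), x(n-1), x(n-2)
v_win = [0.5, -0.5]           # v(n), v(n-1): past decisions

lut_f = build_lut(w_f)
lut_b = build_lut([-w for w in w_b])  # negated FBF coefficients, as in (12)

s_da = adfe_output(lut_f, lut_b,
                   [to_code(x, B) for x in x_win],
                   [to_code(v, B) for v in v_win], B)
s_direct = (sum(w * x for w, x in zip(w_f, x_win))
            - sum(w * v for w, v in zip(w_b, v_win)))
print(s_da, s_direct)
```

Both values equal 0.59375 for these inputs. The two tables hold $2^{N_f} + 2^{N_b}$ entries in total, against $2^{N_f + N_b}$ for a single combined table.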
The first two terms in (21) represent the partial products
of the FFF coefficients, while the other two terms represent the
partial products of the FBF coefficients. The architecture based
on the mathematical formulation is shown in Fig. 4. The
architecture is similar to the architecture in [12]. It consists of
two LUTs storing the partial products of the FFF and FBF
coefficients, respectively, along with two corresponding LUTs
that store the partial products of the input samples and decision
outputs. The LUTs that store the partial products of the FFF and
FBF coefficients are called filtering LUTs, and those that store
the partial products of the input samples and decision outputs are
called auxiliary-LUTs or DA-A-LUTs. The weight update scheme for
the LUT storing the FFF coefficients is as follows. One by one,
each entry of the auxiliary-LUT is multiplied by the term
$\mu_f e(n)$, the result is added to the same-addressed entry of the
filtering LUT, and the sum is stored back in the same location of
the filtering LUT. The weight update scheme for the FBF
coefficients is the same as that of the FFF, except that each entry
of the auxiliary-LUT is multiplied by the term $-\mu_b e(n)$,
because the negated partial products of the FBF coefficients are
stored, unlike those of the FFF coefficients. The auxiliary-LUT
update scheme is the same as the scheme in [12] and is as follows.
The upper half entries of the auxiliary-LUT in the current
iteration are mapped on to the even-addressed locations, and the
remaining half are updated by adding the newest sample to the value
in the preceding even-addressed location. Instead of rewriting all
the entries of the auxiliary-LUT, its external address is
circularly shifted in every iteration. The entries of the DA-A-LUTs
at time $n-1$ and $n$ are shown in Fig. 5. With $B$ as the
input bit-width, the comparison of the hardware complexity
of the proposed DFE design with the best of the recent existing
designs is given in Table I. It can be observed that the proposed
design uses fewer adders and multiplexers and
no hardware multipliers.
TABLE I
COMPARISON OF HARDWARE COMPLEXITY.
Implementations | Number of adders | Number of multipliers | Number of 2-to-1 MUXs
Existing scheme 1 [10]
2+2 +
1
2+2 1
Existing scheme 2 [10]
2() /2
Proposed
2 + 2
2() /2
2
2
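The two table updates described in this section can be sketched in Python as follows. This is a software model under stated assumptions, not the scheme of [12] or the proposed hardware: the function names and numerical values are hypothetical, address bit $k$ is assumed to select the sample delayed by $k$, and the auxiliary table is copied explicitly rather than re-addressed by a circular shift. Under this bit ordering it is the half of the previous table that excludes the oldest sample which lands on the even addresses; which half that is depends on the address-bit ordering chosen.

```python
# Software model of the filtering-LUT and auxiliary-LUT updates.
# All names and numerical values are hypothetical, chosen for illustration.

def build_lut(values):
    """Entry a holds the sum of values[k] over the set bits of address a."""
    K = len(values)
    return [sum(values[k] for k in range(K) if (a >> k) & 1) for a in range(1 << K)]


def update_filtering_lut(filt_lut, aux_lut, mu_e):
    """Add mu_e times each auxiliary entry to the same-addressed filtering entry.

    mu_e stands for mu_f * e(n) for the FFF table and -mu_b * e(n) for the FBF table.
    """
    return [f + mu_e * a for f, a in zip(filt_lut, aux_lut)]


def shift_aux_lut(aux_lut, newest):
    """Auxiliary LUT for the next sample window, built from the previous table."""
    half = len(aux_lut) // 2
    new = [0.0] * len(aux_lut)
    for m in range(half):
        new[2 * m] = aux_lut[m]               # old entries without the oldest sample
        new[2 * m + 1] = aux_lut[m] + newest  # even-addressed neighbour plus newest sample
    return new


w = [0.5, -0.25, 0.125]        # FFF coefficients at time n
window = [0.25, -0.5, 0.75]    # x(n), x(n-1), x(n-2)
mu_f, e = 0.125, 0.5           # step size and error sample

filt = build_lut(w)
aux = build_lut(window)

# Updating every LUT entry matches rebuilding the LUT from updated coefficients.
filt_next = update_filtering_lut(filt, aux, mu_f * e)
w_next = [wk + mu_f * e * xk for wk, xk in zip(w, window)]
filt_rebuilt = build_lut(w_next)

# Sliding the window by one sample needs one addition per odd address.
x_new = -0.125
aux_next = shift_aux_lut(aux, x_new)
aux_rebuilt = build_lut([x_new] + window[:-1])
print(filt_next == filt_rebuilt, aux_next == aux_rebuilt)
```

The update touches all $2^K$ entries of a filtering LUT, which is one reason to keep the FFF and FBF tables separate and small rather than maintaining a single $2^{N_f + N_b}$-entry table.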
V. CONCLUSION
A new distributed arithmetic formulation of the ADFE is
presented. Based on the mathematical formulation, a low-complexity ADFE architecture is designed. The feed forward
filter and feedback filter are combined and replaced by a
single distributed arithmetic processing unit, and by employing the ROM decomposition technique the LUTs are made
well suited to the weight-update operation. The resultant
architecture is multiplierless and uses fewer adders
and multiplexers, at the cost of an increased number of memory locations.
The computational load and hardware overhead can be further reduced using other algorithm-architecture transformation
techniques.
REFERENCES
[1] G. Davidson, D. Falconer, and A. Sheikh, "An investigation of block-adaptive decision feedback equalization for frequency selective fading
channels," in IEEE Int. Conf. on Comm., 1988. ICC '88. Digital
Technology - Spanning the Universe. Conference Record, Jun. 1988,
pp. 360-365 vol. 1.
[2] A. Gatherer and T.-Y. Meng, "High sampling rate adaptive decision
feedback equalizers," in Proc. IEEE Int. Conf. Acoust., Speech, Signal
Process., Apr. 1990, pp. 909-912 vol. 2.
[3] K. Raghunath and K. Parhi, "Parallel adaptive decision feedback equalizers," IEEE Trans. Signal Process., vol. 41, no. 5, pp. 1956-1961,
May 1993.
[4] K. Parhi, "Pipelining in algorithms with quantizer loops," IEEE Trans.
Circuits Syst., vol. 38, no. 7, pp. 745-754, Jul. 1991.
[5] N. Shanbhag and K. Parhi, "Pipelined adaptive DFE architectures using
relaxed look-ahead," IEEE Trans. Signal Process., vol. 43, no. 6, pp.
1368-1385, Jun. 1995.
[6] K. Berberidis, T. Rontogiannis, and S. Theodoridis, "Efficient block implementation of the decision feedback equalizer," IEEE Signal Process.
Lett., vol. 5, no. 6, pp. 129-131, Jun. 1998.
[7] K. Berberidis and P. Karaivazoglou, "An efficient block adaptive decision feedback equalizer implemented in the frequency domain," IEEE
Trans. Signal Process., vol. 50, no. 9, pp. 2273-2285, Sept. 2002.
[8] A. Rontogiannis and K. Berberidis, "Efficient decision feedback equalization for sparse wireless channels," IEEE Trans. Wireless Commun.,
vol. 2, no. 3, pp. 570-581, May 2003.
[9] K. Parhi, "Design of multigigabit multiplexer-loop-based decision feedback equalizers," IEEE Trans. Very Large Scale Integr. (VLSI) Syst.,
vol. 13, no. 4, pp. 489-493, Apr. 2005.
[10] C.-H. Lin, A.-Y. Wu, and F.-M. Li, "High-Performance VLSI Architecture of Decision Feedback Equalizer for Gigabit Systems," IEEE Trans.
Circuits Syst. II, Exp. Briefs, vol. 53, no. 9, pp. 911-915, Sept. 2006.
[11] S. White, "Applications of distributed arithmetic to digital signal processing: a tutorial review," IEEE ASSP Mag., vol. 6, no. 3, pp. 4-19,
Jul. 1989.
[12] D. Allred, H. Yoo, V. Krishnan, W. Huang, and D. Anderson, "LMS
adaptive filters using distributed arithmetic for high throughput," IEEE
Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 7, pp. 1327-1337, Jul.
2005.
