Abstract - Adaptive filtering techniques are widely used in signal processing and
communication, for example in echo/noise cancellation and speech/image coding.
Adaptive filters usually must process signals in real time. This paper presents a
high-speed, flexible VLSI architecture for a digital adaptive finite impulse response
(FIR) filter based on the delayed error least mean square (DELMS) algorithm. The
architecture has a hardware utilization efficiency (HUE) of 100%, and the filter can
easily be scaled without reducing the throughput rate. Timing simulation results
demonstrate the effectiveness of the architecture. The chip was implemented in a
0.6 µm CMOS SPTM standard-cell technology.
INTRODUCTION
Applications of adaptive systems usually involve long filter lengths and high
signal clock rates. Because a systolic architecture provides high throughput, we
design an adaptive system with a systolic architecture based on the Delayed Error
LMS algorithm.
ADAPTIVE ALGORITHM
$$y(n) = \sum_{k=0}^{N-1} w_k(n)\, x(n-k) = W^T(n)\, X(n),$$
$$e(n) = p(n) - y(n),$$
$$w_k(n+1) = w_k(n) + \mu\, e(n)\, x(n-k), \quad 0 \le k \le N-1,$$
where n is the time index, $w_k(n)$ is the k-th coefficient, x(n) is the input sample,
y(n) is the output, p(n) is the desired signal, e(n) is the estimation error at time n,
$W(n) = [w_0(n)\ w_1(n)\ \ldots\ w_{N-1}(n)]^T$ is the coefficient vector,
$X(n) = [x(n)\ x(n-1)\ \ldots\ x(n-N+1)]^T$ is the input vector, the superscript T
denotes vector transposition, and N is the filter length.
The LMS adaptive filter contains a feedback loop: the current recursion must be
completed before the next one can begin, so the computation cannot be pipelined.
This recurrent dependence prevents the system from being used in high-speed
real-time applications and is called the latency problem.
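The recursion can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard LMS update $w_k(n+1) = w_k(n) + \mu\, e(n)\, x(n-k)$ with $e(n) = p(n) - y(n)$; the function names, the test system `unknown`, and the step size are hypothetical choices, not values from the paper.

```python
import random


def lms_filter(x, p, num_taps, mu):
    """Run the LMS recursion; return (y, e, w) with w the final coefficients."""
    w = [0.0] * num_taps        # w[k] holds w_k(n)
    taps = [0.0] * num_taps     # taps[k] holds x(n-k)
    y, e = [], []
    for x_n, p_n in zip(x, p):
        taps = [x_n] + taps[:-1]                        # shift in the new sample
        y_n = sum(wk * xk for wk, xk in zip(w, taps))   # y(n) = W^T(n) X(n)
        e_n = p_n - y_n                                 # e(n) = p(n) - y(n)
        # The update needs e(n), so it cannot start before y(n) is finished.
        w = [wk + mu * e_n * xk for wk, xk in zip(w, taps)]
        y.append(y_n)
        e.append(e_n)
    return y, e, w


def fir(h, x):
    """Reference FIR filter used here to synthesize the desired signal p(n)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


rng = random.Random(0)
unknown = [0.5, -0.3, 0.2, 0.1, -0.05]   # hypothetical system to identify
x = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
_, e, w = lms_filter(x, fir(unknown, x), num_taps=5, mu=0.05)
print("final coefficients:", [round(wk, 4) for wk in w])
```

The coefficient update inside the loop consumes the error computed in the same iteration, which is the dependence that blocks pipelining.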
To overcome the latency problem, the algorithm has been modified to enable a
pipelined implementation. The modified algorithm is named Delayed LMS [2] and
can be described as follows:
$$w_k(n+1) = w_k(n) + \mu\, e(n-M)\, x(n-k-M), \quad 0 \le k \le N-1,$$
where N is the number of filter taps, $\mu$ is the step size, which governs the
stability and the rate of convergence, and M is the number of delay units inserted
in the coefficient adaptation block. The delayed LMS (DLMS) algorithm solves the
latency problem by using the delayed input and the delayed residual error signal to
update the filter coefficients. One major disadvantage of this approach, however, is
that its convergence performance is worse than that of the LMS algorithm.
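A small Python sketch may make the role of the M delay units concrete. It assumes the commonly used DLMS form $w_k(n+1) = w_k(n) + \mu\, e(n-M)\, x(n-k-M)$; the function name, the test system, and the parameter values are hypothetical and are not taken from the paper.

```python
from collections import deque
import random


def dlms_filter(x, p, num_taps, mu, delay):
    """Delayed LMS: the error and the inputs used in the update are `delay` samples old."""
    w = [0.0] * num_taps
    taps = [0.0] * (num_taps + delay)   # taps[j] holds x(n-j); extra stages feed the update
    pending = deque([0.0] * delay)      # errors waiting in the M-sample delay line
    y, e = [], []
    for x_n, p_n in zip(x, p):
        taps = [x_n] + taps[:-1]
        y_n = sum(w[k] * taps[k] for k in range(num_taps))
        e_n = p_n - y_n
        pending.append(e_n)
        e_old = pending.popleft()       # e(n-M); equals e(n) when delay == 0
        # w_k(n+1) = w_k(n) + mu * e(n-M) * x(n-k-M)
        w = [w[k] + mu * e_old * taps[k + delay] for k in range(num_taps)]
        y.append(y_n)
        e.append(e_n)
    return y, e, w


rng = random.Random(0)
unknown = [0.5, -0.3, 0.2, 0.1, -0.05]   # hypothetical system to identify
x = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
p = [sum(unknown[k] * x[n - k] for k in range(5) if n - k >= 0) for n in range(len(x))]
_, _, w = dlms_filter(x, p, num_taps=5, mu=0.05, delay=3)
print("coefficients after 3000 samples:", [round(wk, 4) for wk in w])
```

Because the update only reads values that are `delay` samples old, the filtering and adaptation steps no longer have to finish within one sample period. The cost in this sketch is the longer input history (`num_taps + delay` entries).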
The delay imposed on the input samples serves only to align them in time with the
delayed error samples in the DLMS coefficient-updating rule; it does not help
increase the sample rate. Accordingly, J. Thomas [3] presents a
Delayed Error Least Mean Square (DELMS) adaptation rule that can be
described as follows:
$$w_k(n+1) = w_k(n) + \mu\, e(n-M)\, x(n-k), \quad 0 \le k \le N-1,$$
where M is the error delay in the error feedback path. This rule suggests a
possible realization that convolves delayed error samples with undelayed input
samples. If the delay element unit satisfies $D \le (k-1)$, the DELMS algorithm has
better convergence behavior than can be obtained with the DLMS technique.
Therefore, DELMS is a proper choice for a pipelined architecture.
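The DELMS rule can be written out as a short Python sketch. It follows the update literally as printed, $w_k(n+1) = w_k(n) + \mu\, e(n-M)\, x(n-k)$, and makes no claim about convergence speed; the function name and the three-sample trace are hypothetical illustrations.

```python
from collections import deque


def delms_filter(x, p, num_taps, mu, error_delay):
    """DELMS as printed above: delayed error e(n-M), undelayed inputs x(n-k)."""
    w = [0.0] * num_taps
    taps = [0.0] * num_taps                # taps[k] holds x(n-k)
    pending = deque([0.0] * error_delay)   # models the M delays in the error feedback path
    y, e = [], []
    for x_n, p_n in zip(x, p):
        taps = [x_n] + taps[:-1]
        y_n = sum(wk * xk for wk, xk in zip(w, taps))
        e_n = p_n - y_n
        pending.append(e_n)
        e_old = pending.popleft()          # e(n-M)
        # w_k(n+1) = w_k(n) + mu * e(n-M) * x(n-k): no extra input delay line
        w = [wk + mu * e_old * xk for wk, xk in zip(w, taps)]
        y.append(y_n)
        e.append(e_n)
    return y, e, w


# Tiny hand-checkable trace: 2 taps, M = 1.
y, e, w = delms_filter([1.0, 2.0, 3.0], [1.0, 0.0, 0.0], num_taps=2, mu=0.5, error_delay=1)
print(y, e, w)
```

In this sketch only the error passes through a delay line. The input history stays at `num_taps` entries, because no additional delayed copies of the input are needed for the update.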
According to J. Thomas [5], the signal flow graph (SFG) of an N-tap DELMS
adaptive FIR filter can be divided into two blocks: the tap coefficient adaptation
block and the FIR filter output block. The output block computes the sum of the
products $w_k(n)\, x(n-k)$, $0 \le k \le N-1$. Fig. 2 shows the SFG for the N-tap
DELMS adaptive FIR filter.
Because the architecture contains two time-scaling factors, the utilization of the
circuits in the systolic array is only 50%. To restore the 100% efficiency of the
systolic array, we fold the upper MAC operation modules in the coefficient
adaptation block into the lower MAC operation modules in the filter output block
for each stage of the adaptive filter. The folded systolic array for the DELMS
algorithm is shown in Fig. 4. We add a multiplexer at the output of the filter to
obtain the valid output samples in the folded architecture. The system consists of
N identical processing elements (PEs) connected in cascade. Each processing
element performs all of the computations associated with a single coefficient of
the adaptive filter. The proposed system thus provides a significant computational
speedup over a single-processor LMS filter. Fig. 5 shows the processing element
(PE) used in Fig. 4.
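The utilization figures quoted above can be checked with a toy schedule model in Python. This is only a sketch under stated assumptions: it supposes that, before folding, each adaptation MAC and each output MAC does useful work on alternate clock cycles and that the two groups use opposite cycles, so one shared MAC per tap can serve both. The paper does not give the schedule in this form, and all names here are hypothetical.

```python
def utilization(schedule):
    """Fraction of (unit, cycle) slots in which a MAC unit does useful work."""
    slots = [busy for unit in schedule for busy in unit]
    return sum(slots) / len(slots)


def unfolded_schedule(num_taps, num_cycles):
    # Assumption: adaptation MACs are busy on even cycles, output MACs on odd cycles.
    adapt = [[cycle % 2 == 0 for cycle in range(num_cycles)] for _ in range(num_taps)]
    output = [[cycle % 2 == 1 for cycle in range(num_cycles)] for _ in range(num_taps)]
    return adapt + output          # 2 * num_taps MAC units in total


def folded_schedule(num_taps, num_cycles):
    # One shared MAC per tap takes the adaptation work on even cycles and the
    # output work on odd cycles, so it is busy on every cycle.
    return [[True] * num_cycles for _ in range(num_taps)]


print("unfolded HUE:", utilization(unfolded_schedule(5, 8)))
print("folded HUE:  ", utilization(folded_schedule(5, 8)))
```

In this model, folding halves the number of MAC units while the total amount of work stays the same, which raises the utilization from 50% to 100%.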
EXPERIMENTAL RESULTS
We used the Verilog-XL tool to run the timing simulation and a 0.6 µm CMOS
SPTM standard-cell technology to implement the chip. The simulation results
demonstrate the effectiveness of the architecture in Figure 3. We set the tap
length of the DELMS adaptive filter to N = 5. An adaptation delay of MD =
(N+1)D/2 = 3D was adopted in the systolic architecture. The main features of
the chip are summarized in Table 1. The whole chip layout is shown in Figure 6.
CONCLUSIONS
References
[8] H. Herzberg, R. Haimi-Cohen, and Y. Be’ery, “A systolic array realization of
an LMS adaptive filter and the effects of delayed adaptation,” IEEE Trans.
Signal Processing, vol. 40, no. 11, pp. 2799-2803, Nov. 1992.
[9] G. Long, F. Ling, and J. G. Proakis, “Corrections to ‘The LMS algorithm
with delayed coefficient adaptation’,” IEEE Trans. Acoust., Speech, Signal
Processing, vol. 40, no. 1, pp. 230-232, Jan. 1992.
Figure legend: x(n): input sequence; y(n): output; p(n): desired signal; e: error
between the desired signal and the output. Block label: Adaptation Algorithm.
Fig. 5 The i-th processing element (PE) for the pipelined DELMS adaptive filter in
Fig. 4
Table 1 Main features of the chip
Die size             1720 × 1720 µm²
Total area with      3120 × 3120 µm²
Pin count            40
Transistor gates     6950
Clock rate           23 MHz