
INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)

ISSN 0976-6464 (Print), ISSN 0976-6472 (Online), Volume 4, Issue 2, March-April 2013, pp. 348-356. IAEME: www.iaeme.com/ijecet.asp. Journal Impact Factor (2013): 5.8896 (Calculated by GISI), www.jifactor.com

International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976-6464 (Print), ISSN 0976-6472 (Online), Volume 4, Issue 2, March-April (2013), IAEME


IMPLEMENTATION AND VALIDATION OF MULTIPLIER LESS FPGA BASED DIGITAL FILTER


Jaya Koshta1, Vineeta Saxena (Nigam)2, Rakesh K. Arya3

1 (Electronics & Communication Department, BIRTS, Bhopal, India)
2 (Electronics & Communication Department, UIT-RGPV, Bhopal, India)
3 (Senior Resource Scientist, MPCST, Bhopal, India)

ABSTRACT

Finite impulse response (FIR) filters are commonly used in digital signal processing applications and have traditionally been implemented using ASICs or DSP processors. Field programmable gate array (FPGA) technology is now widely used in digital signal processing because an FPGA-based solution can achieve high speed through its parallel structure and configurable logic, which also provide flexibility and reliability during design and later maintenance. However, the limited resources on an FPGA (logic blocks and flip-flops) and its high routing delays require compact circuit implementations. Hence, the FIR filter is implemented using the distributed arithmetic (DA) technique, which uses a look-up table with offset binary coding. This paper describes an approach for implementing an FIR filter using distributed arithmetic on FPGAs. The experimental results show that a low-pass FIR filter implemented using DA with offset binary coding uses fewer FPGA resources than one implemented using the conventional multiply-and-accumulate (MAC) technique. The advantages of the FPGA approach to FIR filter implementation include higher sampling rates than are available from traditional DSP chips, lower costs than an ASIC for moderate-volume applications, and more flexibility than the alternative approaches.

Keywords: Binary offset coding, distributed arithmetic, FIR filter, FPGA, look-up table


1. INTRODUCTION

Digital filters are generally divided into two categories: finite impulse response (FIR) and infinite impulse response (IIR). FIR filters are widely applied in digital signal processing because they provide linear phase and guaranteed stability. Compared with IIR filters, FIR filters have simple and regular structures that are easy to implement in hardware. However, FIR filters require more taps than IIR filters to meet the same frequency specification. FIR filter implementation on an FPGA requires special attention because area, power, and speed constraints have to be satisfied. A number of filter architectures for FPGA implementation exist. Of these, the distributed arithmetic (DA) architecture yields a better trade-off among area, power, and speed. A discrete-time linear FIR filter generates the output y(n) as a sum of delayed and scaled input samples via the equation

$$
y = \sum_{k=0}^{N-1} w_k x_k \tag{1}
$$

where $w_k$ are the filter coefficients and $x_k = x(n-k)$ are the delayed input samples.
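As a point of reference for the comparison made later, equation (1) can be evaluated directly with one multiply-accumulate per tap. The following is a minimal software sketch, not the authors' hardware design; the function names `fir_mac` and `fir_filter` and the zero initial state are assumptions made for illustration.

```python
def fir_mac(weights, samples):
    """Evaluate equation (1): y = sum_k w_k * x_k, one MAC per tap."""
    acc = 0
    for w, x in zip(weights, samples):
        acc += w * x  # one multiply-accumulate operation
    return acc


def fir_filter(weights, signal):
    """Filter a whole signal, assuming samples before the start are zero."""
    n_taps = len(weights)
    out = []
    for n in range(len(signal)):
        # window[k] holds x(n - k), the k-th delayed sample
        window = [signal[n - k] if n - k >= 0 else 0 for k in range(n_taps)]
        out.append(fir_mac(weights, window))
    return out


if __name__ == "__main__":
    # A unit impulse reproduces the coefficients at the output.
    print(fir_filter([1, 2, 3], [1, 0, 0, 0]))
```

Each output sample costs N multiplications here, which is the cost that distributed arithmetic removes.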

A typical digital implementation requires N multiply-and-accumulate (MAC) operations, which are expensive in hardware in terms of logic complexity, area, and throughput. Alternatively, the MAC operations may be replaced by a series of look-up table (LUT) accesses and summations. Such an implementation of the filter, known as distributed arithmetic (DA), achieves higher throughput and lower logic complexity at the cost of increased memory usage. Recent advances in memory design technology have resulted in smaller memories, making this trade-off attractive. DA is a very efficient solution that is especially suited to LUT-based FPGA architectures. This technique, first proposed by Croisier et al. [1], is a multiplier-less architecture based on an efficient partition of the function into partial terms using the 2's complement binary representation of the data. The partial terms can be precomputed and stored in LUTs. The flexibility of this algorithm on FPGAs permits everything from bit-serial implementations to pipelined or fully parallel versions of the scheme, which can greatly improve design performance. The main problem with DA is that the required memory/LUT capacity increases exponentially with the order of the filter, since DA implementations need $2^K$ words (K being the number of taps of the filter). That constitutes a first obstacle for high-order FIR filters. In this paper, the FIR filter is implemented using distributed arithmetic with offset binary coding, so that the memory size is reduced by a factor of 2, to $2^{K-1}$ words. The FPGA resource utilization of the FIR filter implemented using DA with offset binary coding is also compared with that of an FIR filter implemented using the conventional MAC technique.

2. DISTRIBUTED ARITHMETIC

Distributed arithmetic (DA) is a bit-serial operation that computes the inner product of two vectors (one of which is a constant) in parallel. DA eliminates the need for multiply operations by using look-up tables (LUTs). The right balance among the bit-serial, pipelined, and fully parallel versions is tied to the specifications of a given application, and basically depends on the requirements in terms of hardware cost and throughput. In each case, the designer has to trade bandwidth for area.

Conventional Distributed Arithmetic

Consider a discrete N-tap FIR filter with constant coefficients, and input samples coded as B-bit two's complement numbers with only the sign bit to the left of the binary point:

$$
x(n-k) = -x_{k0} + \sum_{j=1}^{B-1} x_{kj}\, 2^{-j} \tag{2}
$$
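The bit decomposition in equation (2) can be checked numerically. The sketch below assumes each sample is held as a signed integer X with value X / 2^(B-1); the helper names `to_bits` and `from_bits` are hypothetical and exact fractions are used so that the round trip has no rounding error.

```python
from fractions import Fraction


def to_bits(value, B):
    """Return [x_0, ..., x_{B-1}] for a signed integer; x_0 is the sign bit."""
    if not -(1 << (B - 1)) <= value < (1 << (B - 1)):
        raise ValueError("value does not fit in B bits")
    raw = value & ((1 << B) - 1)  # two's complement bit pattern
    return [(raw >> (B - 1 - j)) & 1 for j in range(B)]


def from_bits(bits):
    """Equation (2): x = -x_0 + sum_{j=1}^{B-1} x_j * 2^-j."""
    fraction_part = sum(Fraction(b, 1 << j) for j, b in enumerate(bits) if j >= 1)
    return -bits[0] + fraction_part


if __name__ == "__main__":
    bits = to_bits(-3, 4)          # -3 / 8 as a 4-bit fractional number
    print(bits, from_bits(bits))
```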

Using (1) to compute the FIR output gives


$$
y(n) = -\sum_{k=0}^{N-1} w_k x_{k0} + \sum_{j=1}^{B-1} \left[ \sum_{k=0}^{N-1} w_k x_{kj} \right] 2^{-j} \tag{3}
$$

With $C_j = \sum_{k=0}^{N-1} w_k x_{kj}$ for $j = 1$ to $B-1$, and $C_0 = -\sum_{k=0}^{N-1} w_k x_{k0}$, equation (3) can be rewritten as

$$
y(n) = \sum_{j=0}^{B-1} C_j\, 2^{-j} \tag{4}
$$

Since the term $C_j$ depends only on the bits $x_{k,j}$ and has only $2^N$ possible values, these values can be precomputed and stored in a look-up table or in read-only memory [2], [3]. An input set of N bits $(x_{0,j}, x_{1,j}, \ldots, x_{N-1,j})$ is used as an address to retrieve the corresponding $C_j$ value. These intermediate results are accumulated over B clock cycles to produce one y value. This leads to a multiplier-free realization of the FIR filter. Table I shows the contents of the look-up table for N = 4. Fig. 1 shows a typical architecture for an FIR filter using conventional distributed arithmetic. The shift-accumulator is a bit-parallel carry-propagate adder that adds the LUT content to the previously accumulated result. The inverter and the MUX invert the output of the LUT for the sign-bit term, which enters equation (3) with a negative sign; the control signal S is 1 when j = B-1 and 0 otherwise. The computation runs from j = 0 to j = B-1, and the result is available in bit-parallel format after B clock cycles. In this description of the hardware, j counts clock cycles starting from the least significant bit, so the sign bit is processed in the last cycle. This approach corresponds to bit-serial arithmetic. However, the main problem with DA is that the required memory/LUT capacity increases exponentially with the order of the filter, since a DA implementation needs $2^N$ words for an N-tap filter. That constitutes a first obstacle for high-order FIR filters. Therefore offset binary coding is introduced, which reduces the LUT size by a factor of 2, to $2^{N-1}$ words.
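The LUT-and-accumulate procedure described above can be sketched in software as follows. This is an illustrative model under stated assumptions rather than the authors' HDL: samples are signed integers X_k interpreted as X_k / 2^(B-1), coefficients are integers, the loop index j follows equation (4) (j = 0 is the sign bit), and the names `build_lut` and `da_output` are hypothetical.

```python
from fractions import Fraction


def build_lut(weights):
    """Table I: the entry at address (x_0 ... x_{N-1}) sums the selected weights."""
    n = len(weights)
    lut = []
    for addr in range(1 << n):
        # Tap k is selected by the k-th most significant address bit.
        lut.append(sum(w for k, w in enumerate(weights) if (addr >> (n - 1 - k)) & 1))
    return lut


def da_output(weights, samples, B):
    """Equation (4): accumulate one LUT word per bit position, no multipliers."""
    lut = build_lut(weights)
    mask = (1 << B) - 1
    y = Fraction(0)
    for j in range(B):  # j = 0 addresses the LUT with the sign bits
        addr = 0
        for X in samples:
            addr = (addr << 1) | (((X & mask) >> (B - 1 - j)) & 1)
        c_j = lut[addr]
        # The sign-bit term is subtracted; the others are weighted by 2^-j.
        y += -c_j if j == 0 else Fraction(c_j, 1 << j)
    return y


if __name__ == "__main__":
    w = [3, -1, 4, 2]
    X = [5, -8, 7, -3]  # 4-bit samples, i.e. 5/8, -1, 7/8, -3/8
    print(da_output(w, X, 4), Fraction(sum(a * b for a, b in zip(w, X)), 8))
```

Both printed values agree, which shows that B table look-ups reproduce the N-multiplication result of equation (1).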


Table I. Content of LUT (N = 4)

x0,j  x1,j  x2,j  x3,j   Content of LUT
 0     0     0     0     0
 0     0     0     1     w3
 0     0     1     0     w2
 0     0     1     1     w2 + w3
 0     1     0     0     w1
 0     1     0     1     w1 + w3
 0     1     1     0     w1 + w2
 0     1     1     1     w1 + w2 + w3
 1     0     0     0     w0
 1     0     0     1     w0 + w3
 1     0     1     0     w0 + w2
 1     0     1     1     w0 + w2 + w3
 1     1     0     0     w0 + w1
 1     1     0     1     w0 + w1 + w3
 1     1     1     0     w0 + w1 + w2
 1     1     1     1     w0 + w1 + w2 + w3

Fig. 1 Implementation of conventional distributed arithmetic FIR filter

3. SUGGESTED METHODOLOGY FOR FIR FILTER IMPLEMENTATION

In the suggested methodology for FIR filter implementation, offset binary coding (OBC) is applied to distributed arithmetic to reduce the look-up table size by a factor of 2, to $2^{N-1}$ words. The look-up table can also be partitioned to increase the speed of the FIR filter. The sample $x(n-k)$ of equation (2) can be written as

$$
x(n-k) = \frac{1}{2}\left\{ x(n-k) - \left[ -x(n-k) \right] \right\} \tag{5}
$$


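In two's complement, the negated sample is obtained by complementing every bit and adding one least significant bit:

$$
-x(n-k) = -\bar{x}_{k0} + \sum_{j=1}^{B-1} \bar{x}_{kj}\, 2^{-j} + 2^{-(B-1)} \tag{6}
$$

where $\bar{x}_{kj}$ denotes the complement of bit $x_{kj}$.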

Substituting (2) and (6) into (5),


$$
x(n-k) = \frac{1}{2}\left[ -\left( x_{k0} - \bar{x}_{k0} \right) + \sum_{j=1}^{B-1} \left( x_{kj} - \bar{x}_{kj} \right) 2^{-j} - 2^{-(B-1)} \right] \tag{7}
$$

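By defining $D_{kj} = x_{kj} - \bar{x}_{kj}$, so that $D_{kj} \in \{-1, +1\}$, the output of the FIR filter can be written as

$$
\begin{aligned}
y(n) &= \sum_{k=0}^{N-1} \frac{w_k}{2} \left[ -D_{k0} + \sum_{j=1}^{B-1} D_{kj}\, 2^{-j} - 2^{-(B-1)} \right] \\
     &= -\sum_{k=0}^{N-1} \frac{w_k D_{k0}}{2} + \sum_{j=1}^{B-1} \left[ \sum_{k=0}^{N-1} \frac{w_k D_{kj}}{2} \right] 2^{-j} - \sum_{k=0}^{N-1} \frac{w_k}{2}\, 2^{-(B-1)}
\end{aligned}
\tag{8}
$$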
Defining $E_j = \sum_{k=0}^{N-1} \frac{w_k D_{kj}}{2}$ and $E_{extra} = \sum_{k=0}^{N-1} \frac{w_k}{2}$, equation (8) can be rewritten as

$$
y(n) = -E_0 + \sum_{j=1}^{B-1} E_j\, 2^{-j} - E_{extra}\, 2^{-(B-1)} \tag{9}
$$

Equations (5)-(9) characterize the OBC scheme. Table II shows the content of the look-up table.

Table II. Content of LUT with OBC coding (N = 4)

x0,j  x1,j  x2,j  x3,j   Content of LUT
 0     0     0     0     -(w0 + w1 + w2 + w3) / 2
 0     0     0     1     -(w0 + w1 + w2 - w3) / 2
 0     0     1     0     -(w0 + w1 - w2 + w3) / 2
 0     0     1     1     -(w0 + w1 - w2 - w3) / 2
 0     1     0     0     -(w0 - w1 + w2 + w3) / 2
 0     1     0     1     -(w0 - w1 + w2 - w3) / 2
 0     1     1     0     -(w0 - w1 - w2 + w3) / 2
 0     1     1     1     -(w0 - w1 - w2 - w3) / 2
 1     0     0     0      (w0 - w1 - w2 - w3) / 2
 1     0     0     1      (w0 - w1 - w2 + w3) / 2
 1     0     1     0      (w0 - w1 + w2 - w3) / 2
 1     0     1     1      (w0 - w1 + w2 + w3) / 2
 1     1     0     0      (w0 + w1 - w2 - w3) / 2
 1     1     0     1      (w0 + w1 - w2 + w3) / 2
 1     1     1     0      (w0 + w1 + w2 - w3) / 2
 1     1     1     1      (w0 + w1 + w2 + w3) / 2
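The OBC look-up table and equation (9) can be modelled in the same way as the conventional scheme. The sketch below is a software illustration under assumptions (integer coefficients, samples held as signed integers X_k with value X_k / 2^(B-1), j = 0 denoting the sign bit as in equation (9)); `build_obc_lut` and `obc_output` are hypothetical names.

```python
from fractions import Fraction


def build_obc_lut(weights):
    """Table II: E = sum_k (w_k / 2) * D_k, with D_k = +1 for bit 1, -1 for bit 0."""
    n = len(weights)
    lut = []
    for addr in range(1 << n):
        total = 0
        for k, w in enumerate(weights):
            bit = (addr >> (n - 1 - k)) & 1
            total += w if bit else -w
        lut.append(Fraction(total, 2))
    return lut


def obc_output(weights, samples, B):
    """Equation (9): y = -E_0 + sum_j E_j 2^-j - E_extra 2^-(B-1)."""
    lut = build_obc_lut(weights)
    mask = (1 << B) - 1
    e_extra = Fraction(sum(weights), 2)
    y = -e_extra / (1 << (B - 1))  # constant correction term
    for j in range(B):
        addr = 0
        for X in samples:
            addr = (addr << 1) | (((X & mask) >> (B - 1 - j)) & 1)
        e_j = lut[addr]
        y += -e_j if j == 0 else e_j / (1 << j)
    return y


if __name__ == "__main__":
    w = [3, -1, 4, 2]
    lut = build_obc_lut(w)
    # Complementing the address negates the stored word (the mirror property).
    print(all(lut[a] == -lut[a ^ 0b1111] for a in range(16)))
    print(obc_output(w, [5, -8, 7, -3], 4))
```

The mirror check is what allows half of the table to be dropped, since every word in one half is the negative of a word in the other half.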


The $E_j$ values in Table II are mirrored about the line between the 8th and 9th rows: each entry in the lower half is the negative of an entry in the upper half. In other words, the magnitude of $E_j$ has only $2^{N-1}$ distinct values depending on the $x_{k,j}$ bits. Therefore it is possible to reduce the LUT size by a factor of 2 [4], [5]. Table III shows the content of the reduced LUT. Fig. 2 shows the implementation of the FIR filter using OBC coding. The computation starts from the least significant bit of $x_i$, i.e., j = 0; as in Fig. 1, j here counts clock cycles from the LSB. The XOR gates are used to form the LUT address, the MUX with the constant $E_{extra}$ provides the initial value to the shift-accumulator, and the MUX after the LUT is used to invert the output of the LUT when j = B-1. Two control signals, S1 and S2, are required: S1 is 1 when j = B-1 and 0 otherwise, and S2 is 1 when j = 0 and 0 otherwise.

Table III. Content of reduced LUT for N = 4

x1,j  x2,j  x3,j   Content of LUT
 0     0     0     -(w0 + w1 + w2 + w3) / 2
 0     0     1     -(w0 + w1 + w2 - w3) / 2
 0     1     0     -(w0 + w1 - w2 + w3) / 2
 0     1     1     -(w0 + w1 - w2 - w3) / 2
 1     0     0     -(w0 - w1 + w2 + w3) / 2
 1     0     1     -(w0 - w1 + w2 - w3) / 2
 1     1     0     -(w0 - w1 - w2 + w3) / 2
 1     1     1     -(w0 - w1 - w2 - w3) / 2

Fig. 2 Implementation of distributed arithmetic FIR filter using OBC coding
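One way to read the reduced table is sketched below. It assumes that the XOR gates combine $x_{0,j}$ with each remaining bit to form the $2^{N-1}$-word address, and that the stored word is negated when $x_{0,j} = 1$; this addressing rule is an interpretation of the mirror property, not a detail confirmed by the figure, and the names `build_reduced_lut` and `lookup_e` are hypothetical.

```python
from fractions import Fraction


def build_reduced_lut(weights):
    """Table III: keep only the half of Table II where x_0 = 0."""
    n = len(weights)
    lut = []
    for addr in range(1 << (n - 1)):
        total = -weights[0]  # x_0 = 0 corresponds to D_0 = -1
        for k in range(1, n):
            bit = (addr >> (n - 1 - k)) & 1
            total += weights[k] if bit else -weights[k]
        lut.append(Fraction(total, 2))
    return lut


def lookup_e(reduced_lut, bits):
    """Recover E_j for address bits [x_0, ..., x_{N-1}] from the half-size table."""
    x0 = bits[0]
    addr = 0
    for b in bits[1:]:
        addr = (addr << 1) | (b ^ x0)  # XOR with x_0 complements the address
    value = reduced_lut[addr]
    return -value if x0 else value  # mirrored half: negate the stored word


if __name__ == "__main__":
    w = [3, -1, 4, 2]
    reduced = build_reduced_lut(w)
    print(len(reduced), lookup_e(reduced, [1, 0, 1, 1]))
```

For N = 4 the table shrinks from 16 to 8 words, at the cost of N-1 XOR gates and a conditional negation.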


The size of the look-up table (LUT) in distributed arithmetic increases exponentially with N. LUT access time can become a bottleneck for the speed of the filter, especially when the LUT is large. Therefore, reducing the LUT size is very important [6]. One possible solution is to divide the N address bits of the LUT into N/K groups of K bits. The LUT of size $2^{N-1}$ can then be decomposed into N/K LUTs of size $2^K$, whose outputs are added using a multi-input accumulator.
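The effect of partitioning can be illustrated with a short sketch. For simplicity it uses the conventional (non-OBC) table contents of Table I and assumes that N is a multiple of K; the names `build_partitioned_luts` and `partitioned_lookup` are hypothetical.

```python
def build_lut(weights):
    """Conventional DA table: each address bit selects one weight."""
    n = len(weights)
    return [sum(w for k, w in enumerate(weights) if (addr >> (n - 1 - k)) & 1)
            for addr in range(1 << n)]


def build_partitioned_luts(weights, K):
    """Split the N taps into N/K groups and build one 2^K-word table per group."""
    if len(weights) % K:
        raise ValueError("number of taps must be a multiple of K")
    return [build_lut(weights[i:i + K]) for i in range(0, len(weights), K)]


def partitioned_lookup(luts, bits, K):
    """Add the outputs of the sub-tables, as a multi-input accumulator would."""
    total = 0
    for g, lut in enumerate(luts):
        addr = 0
        for b in bits[g * K:(g + 1) * K]:
            addr = (addr << 1) | b
        total += lut[addr]
    return total


if __name__ == "__main__":
    w = [3, -1, 4, 2, 7, -5, 6, 1]          # 8 illustrative coefficients
    luts = build_partitioned_luts(w, 4)
    print(sum(len(t) for t in luts), "words instead of", 1 << len(w))
```

With N = 8 and K = 4, two 16-word tables replace a single 256-word table, so memory grows linearly rather than exponentially with the number of taps, at the cost of one extra addition per group.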

4. RESULTS

The proposed methodology is implemented for an 8-tap low-pass FIR filter. The FIR filter is simulated and synthesized using Xilinx ISE on a Spartan board. The filter coefficients are truncated to four decimal places, scaled to signed integers, and represented in 2's complement form. The precision used for the inputs and the coefficients is 8 and 12 bits, respectively. The results for the distributed arithmetic FIR filter with offset binary coding are compared with those for the conventional multiply-and-accumulate implementation. Fig. 3 shows the simulation results for the FIR filter implemented using the MAC technique. Fig. 4 shows the simulation results for the FIR filter implemented using DA with offset binary coding.
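The coefficient preparation described above can be sketched as follows. The scale factor is not stated in the text, so the sketch assumes scaling by 2^(bits-1) with truncation toward zero and saturation; the coefficient values in the example and the function name `quantize_coefficient` are purely illustrative.

```python
from decimal import Decimal, ROUND_DOWN


def quantize_coefficient(value, bits=12, places=4):
    """Truncate to `places` decimals, scale to a signed integer, return 2's complement."""
    step = Decimal(10) ** -places                      # 0.0001 for four places
    truncated = Decimal(str(value)).quantize(step, rounding=ROUND_DOWN)
    scaled = int(truncated * (1 << (bits - 1)))        # assumed scale: 2^(bits-1)
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    scaled = max(lo, min(hi, scaled))                  # saturate to the signed range
    pattern = scaled & ((1 << bits) - 1)               # two's complement bit pattern
    return scaled, format(pattern, f"0{bits}b")


if __name__ == "__main__":
    for c in (0.25, -0.12345):                         # hypothetical coefficients
        print(c, quantize_coefficient(c))
```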

Fig. 3 Simulation result for FIR filter implemented using the MAC technique.

Fig. 4 Simulation result for FIR filter implemented using DA with offset binary coding.


Table IV compares the resource utilization of FIR filter on FPGA implemented using both MAC technique and DA with offset binary coding technique.

Table IV. Device utilization summary for FIR filter designed using MAC and DA with offset binary coding

Selected device: Spartan 2, xc2s200-5pq208

Resource                                  MAC                DA
Number of Slice Flip Flops                101 out of 4704    68 out of 4704
Number of 4 input LUTs                    319 out of 2352    250 out of 2352
Number of occupied Slices                 291 out of 4704    205 out of 4704
Number of GCLKs                           1 out of 4         1 out of 4
Total equivalent gate count for design    5423               2423

Table IV shows that the DA-based implementation of the FIR filter requires fewer FPGA resources than the MAC-based implementation. The DA-based filter also exhibits a lower gate count than its MAC counterpart because it does not require multipliers.
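The savings can be quantified directly from the figures in Table IV. The short calculation below uses the values as listed in that table; the helper name `reduction_percent` is hypothetical.

```python
def reduction_percent(mac, da):
    """Relative saving of the DA design with respect to the MAC design."""
    return 100.0 * (mac - da) / mac


# (MAC, DA) pairs copied from Table IV
table_iv = {
    "Slice flip-flops": (101, 68),
    "4-input LUTs": (319, 250),
    "Occupied slices": (291, 205),
    "Equivalent gate count": (5423, 2423),
}

if __name__ == "__main__":
    for name, (mac, da) in table_iv.items():
        print(f"{name}: {reduction_percent(mac, da):.1f}% fewer with DA")
```

The reductions come to roughly 33% for flip-flops, 22% for LUTs, 30% for slices, and 55% for the equivalent gate count.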

5. CONCLUSION

Distributed arithmetic has proved to be an area-efficient technique for FIR filter implementation. When using it, special care is required to contain the exponential growth of the LUT size. Partitioning the LUT into smaller LUTs of the desired size gives an effective solution, particularly for high-order filter designs. FIR filters implemented in FPGAs give the designer considerable flexibility in the number of filter taps and in changing existing coefficients.


REFERENCES
[1] Croisier, D. J. Esteban, M. E. Levilion, and V. Rizo, "Digital Filter for PCM Encoded Signals," U.S. Patent No. 3,777,130, issued April 1973.
[2] C. S. Burrus, "Digital filter structures described by distributed arithmetic," IEEE Trans. on Circuits and Systems, Dec. 1977.
[3] A. Peled and B. Liu, "A new hardware realization of digital filters," IEEE Trans. Acoustics, Speech and Signal Processing, vol. ASSP-22, no. 6, pp. 456-462, Dec. 1974.
[4] J. Choi, S. Shin, and J. Chung, "Efficient ROM size reduction for distributed arithmetic," in Proceedings of the IEEE ISCAS, Geneva, Switzerland, May 2000, vol. 2, pp. 61-64.
[5] H. Yoo and D. V. Anderson, "Hardware-Efficient Distributed Arithmetic Architecture for High-Order Digital Filters," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 125-128, 2005.
[6] Shanthala S. and S. Y. Kulkarni, "High Speed and Low Power FPGA Implementation of FIR Filter for DSP Applications," European Journal of Scientific Research, 2009.
[7] Martinez-Peiro, J. Valls, T. Sansaloni, A. P. Pascual, and E. I. Boemo, "A Comparison between Lattice, Cascade and Direct Form FIR Filter Structures by using a FPGA Bit-Serial DA Implementation," in Proc. IEEE International Conference on Electronics, Circuits and Systems, 1999, vol. 1, pp. 241-244.
[8] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York: Wiley, 1999.
[9] S. A. White, "Applications of distributed arithmetic to digital signal processing: A tutorial review," IEEE Acoust. Speech Signal Processing Mag., vol. 6, pp. 4-19, July 1989.
[10] M. A. Majed and C. S. Khandelwal, "Efficient Dynamic System Implementation of FPGA Based PID Control Algorithm for Temperature Control System," International Journal of Electrical Engineering & Technology (IJEET), Volume 3, Issue 2, 2012, pp. 306-312, ISSN Print: 0976-6545, ISSN Online: 0976-6553.
[11] G. Prasad and N. Vasantha, "Design and Implementation of Multi Channel Frame Synchronization in FPGA," International Journal of Electronics and Communication Engineering & Technology (IJECET), Volume 4, Issue 1, 2013, pp. 189-199, ISSN Print: 0976-6464, ISSN Online: 0976-6472.
[12] Sriadibhatla Sridevi, Ravindra Dhuli, and P. L. H. Varaprasad, "FPGA Implementation of Low Complexity Linear Periodically Time Varying Filter," International Journal of Electronics and Communication Engineering & Technology (IJECET), Volume 3, Issue 1, 2012, pp. 130-138, ISSN Print: 0976-6464, ISSN Online: 0976-6472.
