
Design and Implementation of LUT Multiplier for DSP-Based Applications

By Dinesh Alapati (08J41A0483) S. M. Himanshu (08J41A0488) P. Vishnu (08J41A04D1)

o Objective
o Introduction
o Digital Signal Processing (DSP)
o Field Programmable Gate Array (FPGA)
o Distributed Arithmetic (DA) Architecture
o New approach to LUT design of FIR digital filter
o Simulation Results
o Advantages
o Applications
o Conclusion
o Future scope
o References

Digital filters are a very important part of DSP. Advances in implementing digital signal processing functions on FPGAs have driven considerable effort into designing efficient architectures for these functions. A conventional digital FIR filter, implemented directly as a K-tap filter, requires K multiply-and-accumulate (MAC) blocks. We first present distributed arithmetic (DA), which is a multiplier-less architecture. Its drawback is that the memory size grows exponentially with the order of the filter. The proposed architecture uses a new approach to the LUT-based multiplier in which the memory is reduced to half.

o The size of each LUT is fixed.
o No multiplier units are needed, so complexity is reduced.
o Power consumption is low.
o Performance, and hence speed, increases.
o The multipliers are fast and efficient.
o The multipliers can be cascaded with each other or with CLB logic for larger or more complex functions.
o The memory used for implementing the LUT multiplier is exactly half of that of the DA architecture.
o The number of decoders, adders, registers, and latches used to implement the multiplier is reduced.

o Digital Signal Processing (DSP) is one of the most active areas in VLSI applications.
o Traditionally, DSP algorithms are implemented either on general-purpose DSP processors (low speed, less expensive, flexible) or on ASICs (high speed, expensive, less flexible).
o FPGAs provide solutions that retain the advantages of both the DSP-processor approach and the ASIC approach.
o DSP applications include multiply-and-accumulate (MAC) blocks, which require an efficient architecture.
o Implementing multipliers in the logic fabric of the FPGA is costly.
o An alternative method for computing multiplication is to decompose the MAC operations into a series of LUT accesses.
o We present the Distributed Arithmetic (DA) FIR filter architecture on FPGA.
o We propose a new approach to LUT design and memory-based realization of FIR digital filters.

Comparison of Different Programmable Logic Devices

FIR Filters

FPGA Generic Structure

A Field-Programmable Gate Array (FPGA) is a semiconductor device that can be configured by the designer after manufacturing, hence the name "field-programmable".

FPGA building blocks:
o Programmable logic blocks: implement combinatorial and sequential logic
o Programmable interconnect: wires to connect inputs and outputs to logic blocks
o Programmable I/O blocks: special logic blocks at the periphery of the device for external connections

Fig : FPGA generic structure (labels: Logic Block, Interconnection Switch)

FPGA Basic Logic Element

o LUT to implement combinatorial logic
o Register for sequential circuits

Additional logic (not shown):
o Carry logic for arithmetic functions
o Expansion logic for functions requiring more than 4 inputs

Fig : Basic logic element (inputs A, B, C, D; output Out)



Look-Up Table (LUT)
A look-up table makes it possible to store binary data within solid-state devices. The storage "cells" within solid-state memory devices are easily addressed by driving the "address" lines of the device with the proper binary value(s). A look-up table with N inputs can be used to implement any combinatorial function of N inputs.

The basic features of a LUT are:
o Complete times table of all possible input combinations
o One address bit for each bit in each input
o Table size grows exponentially
o Very limited use
o Fast: the result is just a memory access away
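
The idea that an N-input LUT can hold any combinatorial function of N inputs can be sketched in a few lines of Python. This is only an illustration: the helper names and the example function (a "3 of 4" majority vote) are assumptions, not part of the original slides.

```python
def make_lut(func, n_inputs):
    """Precompute a 1-bit function for every n_inputs-bit address (2**n_inputs entries)."""
    return [func(addr) & 1 for addr in range(2 ** n_inputs)]


def at_least_three_of_four(addr):
    # Hypothetical example function: 1 when three or more of the four input bits are 1.
    return 1 if bin(addr).count("1") >= 3 else 0


def lut_read(lut, a, b, c, d):
    """Drive the four address lines A, B, C, D and read the stored bit."""
    addr = (a << 3) | (b << 2) | (c << 1) | d
    return lut[addr]


lut4 = make_lut(at_least_three_of_four, 4)
print(len(lut4), lut_read(lut4, 1, 1, 1, 0), lut_read(lut4, 1, 0, 1, 0))
```

The table has 2^4 = 16 entries; adding one more input would double it, which is the exponential growth listed above.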

LUT Based Multiplier

In many DSP circuits, multipliers always have one constant input, the coefficient Ci.


For the above multiplier, y[n] purely depends on x[n]. Thus, a look-up table (LUT) can be used to implement the multiplier.

For example, a 256 x 16-bit memory can be used to implement an 8-bit multiplier if one of its inputs is always constant.
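
A minimal sketch of such a constant-coefficient multiplier, assuming an unsigned 8-bit input and an arbitrarily chosen 8-bit constant (`COEFF` is a made-up value for illustration):

```python
COEFF = 0xB5  # hypothetical 8-bit constant coefficient

# 256 words, one per possible 8-bit input; each product fits in 16 bits.
PRODUCT_LUT = [COEFF * x for x in range(256)]


def lut_multiply(x):
    """Return COEFF * x by a single table read instead of a multiplication."""
    if not 0 <= x < 256:
        raise ValueError("x must be an unsigned 8-bit value")
    return PRODUCT_LUT[x]


print(lut_multiply(100) == COEFF * 100)
```

Because the largest product of two 8-bit values is below 2^16, a 16-bit word is wide enough for every entry.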

Distributed Arithmetic Architecture

LUT Technique for Distributed Arithmetic: Distributed Arithmetic (DA) is a well-known method of implementing FIR filters. DA solves the computation of the inner-product equation when the coefficients are known in advance, as happens in FIR filters. A K-tap FIR filter is described as:

$$ y[n] = \sum_{k=0}^{K-1} h_k \, x[n-k] $$

In this equation, the h_k are the fixed coefficients, K is the number of filter taps, and the x[n-k] are the input data words, which use a standard fixed-point number format. The digital filter is implemented with this arithmetic using registers, memory resources, and a scaling accumulator.

Distributed Arithmetic (contd)


The original LUT-based DA implementation of a 4-tap (K=4) FIR filter is shown in the figure. The DA architecture includes three units: the shift register unit, the DA-LUT unit, and the adder/shifter unit.
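
The three units can be modelled behaviourally as follows. This is a sketch under stated assumptions, not the circuit from the figure: inputs are taken as unsigned words of `bits` bits (the slides do not specify sign handling), the shift register is represented by a list with `samples[k]` holding x[n-k], and the function names are invented for illustration.

```python
def build_da_lut(coeffs):
    """DA-LUT: entry at address a holds the sum of coeffs[k] for every bit k set in a."""
    taps = len(coeffs)
    return [
        sum(coeffs[k] for k in range(taps) if (addr >> k) & 1)
        for addr in range(2 ** taps)
    ]


def da_fir_output(samples, lut, bits):
    """Compute one output sample bit-serially; samples[k] is x[n-k], unsigned."""
    acc = 0
    for b in range(bits):
        # Bit b of every sample forms the LUT address (one bit per tap).
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        # Adder/shifter unit: weight the partial sum by 2**b and accumulate.
        acc += lut[addr] << b
    return acc


coeffs = [3, -1, 4, 2]      # hypothetical 4-tap coefficients
samples = [5, 12, 7, 9]     # hypothetical 4-bit inputs x[n], x[n-1], x[n-2], x[n-3]
lut = build_da_lut(coeffs)
print(da_fir_output(samples, lut, bits=4))
```

For K = 4 the table has 2^4 = 16 entries, and each extra tap doubles it, which is the exponential memory growth the proposed approach aims to reduce.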

New Approach to LUT Design of FIR Digital Filter

FIR digital filters are widely used in signal processing applications. The order of the filter depends directly on the width of the transition band: a narrower transition band requires a higher order, and the number of MAC operations required increases accordingly. The DA architecture, in which the memory elements store all the possible values of products of the filter coefficients, can be an area-efficient way to implement an FIR filter. There is also a basic variant of the memory-based technique in which the multiplication itself is computed by a LUT. This work addresses the design of the LUT for such a LUT-based multiplier, reducing the memory size to nearly half of that of the conventional approach.

LUT Design for Memory-Based Multiplication

The conventional memory-based multiplier is depicted in the following figure:

Fig : Conventional Memory-Based Multiplier

o Let A be a fixed coefficient and X be an input word to be multiplied with A.
o If X is an unsigned binary number of word length L, there can be 2^L possible values of X, and accordingly there can be 2^L possible values of the product C = A.X.
o Therefore, for the conventional implementation of memory-based multiplication, a memory unit of 2^L words is required as the look-up table, holding the pre-computed product values corresponding to all possible values of X.

LUT Design for Memory-Based Multiplication (contd)

In this work, the basic principle of the proposed memory-based multiplication is as follows:
o In the proposed memory-based multiplication, the memory used is exactly half of that used in the conventional memory-based multiplication.
o Although the 2^L possible values of X correspond to 2^L possible values of C = A.X, we have recently shown that only the 2^L/2 words corresponding to the odd multiples of A need to be stored in the LUT.
o All the remaining (2^L/2) - 1 non-zero values are even multiples of A, which can be derived by left-shift operations on one of the odd multiples of A.
o We illustrate this in the following table for L=4.

LUT Design for Memory-Based Multiplication (contd)

Table: LUT words and product values for input word length L=4

LUT Design for Memory-Based Multiplication (contd)

We illustrate this approach in the table for L=4.
o At 8 memory locations, the 8 odd multiples A*(2i+1) are stored as Pi for i = 0, 1, 2, ..., 7.
o The even multiples 2A, 4A, and 8A are derived by left-shift operations on A.
o Similarly, 6A and 12A are derived by left-shifting 3A, while 10A and 14A are derived by left-shifting 5A and 7A, respectively.
o The address X = (0000) corresponds to A.X = 0, which can be obtained by resetting the LUT output.
o Similarly, for an input multiplicand of word size L, only the 2^L/2 odd multiples need to be stored in the memory core of the LUT, while the other (2^L/2) - 1 non-zero values can be derived by left-shift operations on the stored values.
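
The scheme described above can be checked with a short behavioural sketch. The function names and the coefficient value are assumptions made for illustration; only the decomposition of X into an odd multiple and a left shift comes from the slides.

```python
def build_odd_lut(a, word_len):
    """Store only the odd multiples A*(2i+1), i = 0 .. 2**(word_len-1) - 1."""
    return [a * (2 * i + 1) for i in range(2 ** (word_len - 1))]


def decompose(x):
    """For x > 0 return (i, shifts) such that x == (2*i + 1) << shifts."""
    shifts = 0
    while x % 2 == 0:
        x >>= 1
        shifts += 1
    return (x - 1) // 2, shifts


def multiply(odd_lut, x):
    """Return A*x using the half-size LUT plus left shifts."""
    if x == 0:
        return 0  # models resetting the LUT output for X = (0000)
    i, shifts = decompose(x)
    return odd_lut[i] << shifts


A = 13  # hypothetical coefficient
odd_lut = build_odd_lut(A, word_len=4)
for x in range(1, 16):
    i, shifts = decompose(x)
    print(f"X={x:04b}  word=P{i}  shifts={shifts}  product={multiply(odd_lut, x)}")
```

For L = 4 the loop prints one row per non-zero input, showing which stored word Pi is read and how many left shifts are applied to it.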

Proposed LUT-Based Multiplier for 4-Bit Input
The proposed LUT-based multiplier for input word size L = 4 is shown in the following figure:

Fig : Proposed LUT design for multiplier

LUT-Based Multiplier for 4-Bit Input (contd)
The various modules included in the above block diagram are:
o 4-to-3 Bit Address Encoder:

The 4-to-3 bit input encoder is shown in Fig. 3(b). It receives a four-bit input word (x3x2x1x0) and maps that onto the three-bit address word, according to the logical relations.
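
The mapping performed by the encoder can be described behaviourally: discard the trailing zeros of the input to obtain its odd part 2i+1, and use i as the address. The sketch below assumes exactly that behaviour. It is not the gate-level logic of the figure, and the function name is hypothetical.

```python
def encode_address(x):
    """Map a non-zero 4-bit input word to the 3-bit address of its odd part."""
    if not 1 <= x <= 15:
        raise ValueError("x must be a non-zero 4-bit value")
    while x & 1 == 0:   # strip trailing zeros (these become left shifts later)
        x >>= 1
    return x >> 1       # odd part is 2*i + 1, so i = (odd part - 1) / 2


for x in range(1, 16):
    print(f"{x:04b} -> {encode_address(x):03b}")
```

The input X = (0000) is excluded here because it is handled separately by resetting the LUT output.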

LUT-Based Multiplier for 4-Bit Input (contd)
o 3-to-8 Line Address Decoder: The decoder takes the 3-bit address from the input encoder and generates 8 word-select signals to select the referenced word from the memory array.

Fig: 3 to 8 decoder
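
A small behavioural model of the decoder and the word selection, written as a sketch with hypothetical names and memory contents:

```python
def decode_3to8(addr):
    """Return 8 word-select signals with exactly one line high."""
    if not 0 <= addr <= 7:
        raise ValueError("addr must be a 3-bit value")
    return [1 if line == addr else 0 for line in range(8)]


def read_word(memory, addr):
    """Gate each stored word with its select line and OR the results together."""
    out = 0
    for select, word in zip(decode_3to8(addr), memory):
        if select:
            out |= word
    return out


memory = [13 * m for m in (1, 3, 5, 7, 9, 11, 13, 15)]  # hypothetical odd multiples
print(decode_3to8(5), read_word(memory, 5))
```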

LUT-Based Multiplier for 4-Bit Input (contd)
o Control Circuit:

The number of shifts required to be performed on the output of the LUT and the control bits for different values of X are shown in the table. The control circuit generates the control bits accordingly.

LUT-Based Multiplier for 4-Bit Input (contd)
o Barrel Shifter: The LUT output must be shifted one location to the left when the input operand X is one of the values (0 0 1 0), (0 1 1 0), (1 0 1 0), or (1 1 1 0). Two left shifts are required if X is either (0 1 0 0) or (1 1 0 0). Only when the input word is (1 0 0 0) are three shifts required. For all other possible input operands, no shifts are required. Since the maximum number of left shifts required on the stored word is three, a two-stage logarithmic barrel shifter is adequate to perform the necessary left-shift operations.
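
The shift counts above and a two-stage logarithmic shifter can be sketched as follows. The control-bit names `s1` and `s0` and the convention "shifts = 2*s1 + s0" are assumptions made for this illustration, as the slides do not reproduce the control-bit equations.

```python
def control_bits(x):
    """Return (s1, s0) for a non-zero 4-bit input; the shift count is 2*s1 + s0."""
    if not 1 <= x <= 15:
        raise ValueError("x must be a non-zero 4-bit value")
    shifts = 0
    while x & 1 == 0:   # the shift count equals the number of trailing zeros
        x >>= 1
        shifts += 1
    return (shifts >> 1) & 1, shifts & 1


def barrel_shift(word, s1, s0):
    """Two-stage logarithmic left shifter: stage 1 shifts by 1, stage 2 by 2."""
    stage1 = word << 1 if s0 else word
    stage2 = stage1 << 2 if s1 else stage1
    return stage2


for x in (0b0010, 0b0100, 0b1000, 0b0101):
    s1, s0 = control_bits(x)
    print(f"X={x:04b}  s1={s1} s0={s0}  5 shifted -> {barrel_shift(5, s1, s0)}")
```

Two stages cover shift counts 0 to 3, which is why no third stage is needed for L = 4.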

LUT-Based Multiplier for 4-Bit Input (contd)
o NOR Cell:

o The RESET bit is fed to one input of every NOR gate, and the other inputs of the 8 NOR gates of the NOR cell are fed with the 8 bits of the LUT output in parallel.
o When RESET = 1, the output is 0.
o When RESET = 0, the outputs of the NOR gates are just the complement of the LUT output bits.
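
The behaviour of the NOR cell can be modelled on an 8-bit word as below. This is a sketch only: `WIDTH` follows the 8 NOR gates mentioned above, and the function name is hypothetical.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def nor_cell(lut_word, reset):
    """Bitwise NOR of each LUT output bit with the shared RESET bit."""
    reset_bus = MASK if reset else 0          # RESET fans out to all 8 gates
    return ~(lut_word | reset_bus) & MASK     # NOR = NOT(OR), kept to 8 bits


print(f"{nor_cell(0b10110010, 1):08b}")  # RESET = 1 forces all outputs low
print(f"{nor_cell(0b10110010, 0):08b}")  # RESET = 0 gives the complement
```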

Applications of LUT multiplier

The applications of the LUT multiplier are:
o Digital filters (such as FIR and IIR). Finite Impulse Response (FIR) filters using the LUT multiplier approach are widely used to implement pulse-shaping filters.
o Digital phase-locked loop (PLL) frequency synchronizers.
o Discrete cosine transform (DCT) cores.
o Digital communication: channel equalization and frequency channelization.
o Speech processing (adaptive noise cancellation).

Traditionally, direct implementation of a K-tap FIR filter requires K multiply-and-accumulate (MAC) blocks, which are expensive to implement in an FPGA due to logic complexity and resource usage. An alternative to computing the multiplication is to decompose the MAC operations into a series of lookup table (LUT) accesses and summations.

The advantage of this method is that the LUTs readily available in FPGAs can be utilized efficiently. This work presents DA architectures for FIR filters, i.e., multiplier-less architectures. As a result, complexity is reduced, power consumption is lower, and performance and speed increase.

Future Scope
The future scope of this project is to improve the architecture of the distributed arithmetic FIR filter so that it uses the hardware resources of the latest FPGAs.

In the Virtex-5 and Virtex-6 FPGA families, 6-input LUTs were introduced. Future work includes changing the architecture to use 6-input LUTs for storing coefficient sums and SRL (shift register logic) macros for the shift operations, so that the total number of slices used is reduced.

References:
o John G. Proakis, Dimitris G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications
o Nagoor Kani, Digital Signal Processing
o R. P. Jain, Switching Theory and Logic Design
o Wang Sen, Tang Bin, Zhu Jun, Distributed Arithmetic for FIR Filter Design on FPGA

Websites:
o www.wikipedia.org/wiki/FIR
o www.wikipedia.org/wiki/daFIR
o www./ipcores/distributedarithmeticFIRd.cfm
o www.daFIR.cfm