Design and Implementation of An Efficient Lut System in Memory Based Computation PDF

DESIGN AND IMPLEMENTATION OF AN EFFICIENT LUT SYSTEM IN MEMORY BASED COMPUTATION
D. Jahnavi (1), Y. Ravikiran Varma (2)
(1) M.Tech scholar, janu9080@gmail.com, E.C.E, Sreenivasa Institute of Technology and Management Studies, Chittoor
(2) Assistant Professor, yrkvarma@gmail.com, E.C.E, Sreenivasa Institute of Technology and Management Studies, Chittoor
ABSTRACT: This project presents the implementation of a multiplier with an enhanced LUT technique. Antisymmetric product coding (APC) and odd-multiple-storage (OMS) are techniques for lookup-table (LUT) design in memory-based multipliers used in digital signal processing applications. Each of these two techniques separately reduces the LUT size by half. This project presents a different form of APC and a modified OMS scheme so that the two can be combined for efficient memory-based multiplication. The proposed combined approach reduces the LUT size to one-fourth of the conventional LUT. A simple technique for selective sign reversal is also suggested for use in the proposed design. It is shown that the proposed LUT design for small input sizes can be used for efficient implementation of high-precision multiplication by input operand decomposition.

Keywords: Memory-based computation, antisymmetric product coding, odd-multiple-storage, lookup-table, digital signal processing

INTRODUCTION: Field Programmable Gate Arrays (FPGAs) are an attractive hardware design option, making technology mapping for FPGAs an important EDA problem. For an excellent overview of the classical and recent work on FPGA technology mapping, focusing on area, delay, and power minimization, the reader is referred to [2]. The recent advanced algorithms for FPGA mapping, such as [2][12][16][23], focus on area minimization under delay constraints. If delay constraints are not given, the optimum delay for the given logic structure is found first and then area is minimized without changing delay.

In terms of the algorithms employed, mappers are divided into structural and functional. Structural mappers consider the circuit graph as a given and find a covering of the graph with K-input subgraphs corresponding to LUTs. The functional approaches perform Boolean decomposition of the logic functions of the nodes into subfunctions of limited support size realizable by individual LUTs. Since functional mappers explore a larger solution space, they tend to be time-consuming, which limits their use to small designs. In practice, FPGA mapping for large designs is done using structural mappers, whereas functional mappers are used for resynthesis after technology mapping. In this paper, we consider the recent work on DAOmap [2] as representative of advanced structural technology mapping for LUT-based FPGAs, refer to it as the previous work, and discuss several ways of improving it.

LUT for Multipliers: Multiplications can be computationally expensive in most hardware and software implementations. Various approaches have been proposed in the literature to alleviate this overhead, usually at the cost of multiplication accuracy. One such example is the conversion of multiplication coefficients to dyadic fractions, which can be computed with a minimal sequence of bit shifts and additions.
However, such approaches have proved limiting, requiring considerable hand-tweaking to simultaneously minimize the complexity of the calculation and the deviation from the desired result. Instead, a table-based lookup scheme to implement the multiplication steps is proposed. Whenever a multiplication result is needed, the system simply looks up the correct result in a precomputed table, with no arithmetic at run time. This greatly simplifies the transform and inverse calculations.

It is possible to store binary data within solid-state devices. The storage "cells" within solid-state memory devices are easily addressed by driving the "address" lines of the device with the proper binary values. A ROM can be written, or programmed, with data such that its address lines serve as inputs and its data lines serve as outputs, generating the characteristic response of a particular logic function.

Table lookup can replace any coefficient multiplication or unary operation. Although table lookup is often simpler than the actual calculation, the table size grows exponentially with the input word length. However, for image and video applications, most signals are unsigned 8-bit values, which require only 256 possible cases, so the table-based approach can be implemented at a reasonable cost. Consider a coefficient multiplication where the coefficient is 0.6834. To avoid using a multiplier, traditional lossless transforms approximate the given coefficient with a dyadic fraction. The coefficient multiplication can then be implemented using shifts and additions. Unlike the dyadic fraction case, table-based multiplication yields a much more accurate approximation of the original coefficient.

Literature Survey:
a. The efficient memory-based VLSI array designs for DFT and DCT (Guo, J.-I.; Liu, C.-M.; Jen, C.-W., Nat. Chiao Tung Univ., Hsinchu): Efficient memory-based VLSI arrays and a new design approach for the discrete Fourier transform and discrete cosine transform are presented.
The DFT and DCT are formulated as cyclic convolution forms and mapped into linear arrays characterized by small numbers of I/O channels and low I/O bandwidth.
b. On the design automation of the memory-based VLSI architectures for FIR filters (Lee, H.-R.; Jen, C.-W.; Liu, C.-M., Dept. of Electron. Eng., Nat. Chiao Tung Univ., Hsinchu): An approach to automating the design of memory-based VLSI architectures for FIR filters has been developed. The automation is based on the exploration of the design space and schemes for efficient memory replacement, algorithm formulation, architecture design, and evaluation method.
c. A memory-efficient realization of cyclic convolution and its application to discrete cosine transform: This work presents a memory-efficient design for realizing the cyclic convolution and its application to the discrete cosine transform. It adopts the method of distributed arithmetic computation and exploits the symmetry property of DCT coefficients to merge the elements in the matrix of the DCT kernel, then separates the kernel into two perfect cyclic forms to facilitate an efficient realization of the 1-D N-point DCT using (N-1)/2 adders or subtractors, one small ROM module, a barrel shifter, and (N-1)/2 + 1 accumulators.
d. Memory-based hardware for resource-constraint digital signal processing systems: Current trends in memory technology indicate reasonable scope for efficient memory-based computing systems as a promising alternative to conventional logic-only computing, in order to meet the stringent constraints and growing requirements of digital signal processing systems in widely varying application environments. Several algorithms and architectures have been proposed in the literature to reduce the area and time complexities of commonly encountered computation-intensive cores of DSP functions by memory-based computing, but many more novel algorithms and architectures need to be developed to design flexible area-delay-power-efficient systems for various DSP applications.

PROPOSED TECHNIQUE: LUT optimization is the key factor in this project for reducing power and area. The following techniques are implemented in the LUT to obtain these qualities:
1. Antisymmetric product coding (APC)
2. Modified odd-multiple storage (OMS)
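As a baseline for the two techniques listed above, the conventional table-lookup multiplication described under LUT for Multipliers can be sketched in Python. This is a minimal sketch and not the authors' implementation: the names COEFF, TABLE and lut_multiply are hypothetical, and rounding each product to the nearest integer is an assumption. Only the coefficient 0.6834 and the 256-entry unsigned 8-bit input range come from the text.

```python
# Conventional LUT multiplier: one precomputed product per possible input.
COEFF = 0.6834  # coefficient quoted in the text

# 256 entries, one for every unsigned 8-bit input value.
# Rounding to the nearest integer is an assumption made for this sketch.
TABLE = [round(COEFF * x) for x in range(256)]


def lut_multiply(x: int) -> int:
    """Return the precomputed approximation of COEFF * x by lookup only."""
    if not 0 <= x <= 255:
        raise ValueError("x must be an unsigned 8-bit value")
    return TABLE[x]
```

The table holds one word for each of the 2^8 inputs, which is the size that the APC and OMS schemes aim to reduce.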
This project addresses the reduction of look-up-table (LUT) size in memory-based multipliers to be used in digital signal processing applications. It is shown that by simple sign-bit exclusion, the LUT size is reduced by half at the cost of a marginal area overhead. Moreover, a novel antisymmetric product coding (APC) scheme is proposed to reduce the LUT size by a further half, where the LUT output is added to or subtracted from a fixed value. It is shown that the optimized LUTs for small input widths can be used for efficient implementation of high-precision LUT-multipliers, where the total contribution of all such fixed offsets can be added to the final result or used to initialize successive accumulations. The proposed optimized LUT-multiplier is found to involve less area and less multiplication time than the existing LUT-multipliers.
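The idea of building a high-precision multiplier from LUTs of small input width can be illustrated with a short Python sketch of input operand decomposition. It is an illustration under stated assumptions rather than the proposed hardware: the operand is taken to be unsigned, the segment width of 4 bits is arbitrary, a plain (unoptimized) LUT is used for each segment, and the names build_lut and multiply_by_decomposition are hypothetical.

```python
def build_lut(a: int, seg_bits: int) -> list:
    """Plain LUT holding a * x for every seg_bits-wide segment value x."""
    return [a * x for x in range(1 << seg_bits)]


def multiply_by_decomposition(lut: list, x: int, width: int, seg_bits: int) -> int:
    """Multiply by the LUT's coefficient using only lookups, shifts and adds.

    x is an unsigned width-bit operand (an assumption of this sketch). It is
    split into seg_bits-wide segments, each segment is looked up in the small
    LUT, and the partial products are shifted to their weight and accumulated.
    """
    if not 0 <= x < (1 << width):
        raise ValueError("x does not fit in the given width")
    mask = (1 << seg_bits) - 1
    total = 0
    for shift in range(0, width, seg_bits):
        segment = (x >> shift) & mask      # next seg_bits-wide slice of x
        total += lut[segment] << shift     # partial product at its weight
    return total


# Usage: a 16-bit operand handled with a 16-word LUT instead of 65536 words.
lut_23 = build_lut(23, 4)
product = multiply_by_decomposition(lut_23, 0xABCD, 16, 4)
```

With this arrangement the LUT size depends on the segment width rather than on the full operand width, at the cost of one lookup and one addition per segment.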
The proposed APC-OMS combined design of the LUT for L = 5 and for any coefficient width W is shown in Fig. 3. It consists of an LUT of nine words of (W + 4)-bit width, a four-to-nine-line address decoder, a barrel shifter, an address generation circuit, and a control circuit for generating the RESET signal and the control word (s1s0) for the barrel shifter. The precomputed values of A × (2i + 1) are stored as Pi, for i = 0, 1, 2, . . . , 7, at the eight consecutive locations of the memory array, as specified in Table II, while 2A is stored for input X = (00000) at LUT address 1000, as specified in Table III. The decoder takes the 4-bit address from the address generator and generates nine word-select signals, i.e., {wi, for 0 ≤ i ≤ 8}, to select the referenced word from the LUT. The 4-to-9-line decoder is a simple modification of a 3-to-8-line decoder, as shown in Fig. 4(a). The control bits s0 and s1, which the barrel shifter uses to produce the desired number of shifts of the LUT output, are generated by the control circuit from the input bits.

Proposed System Architecture: A new approach to LUT design is presented, where only the odd multiples of the fixed coefficient are required to be stored; this is referred to as the odd-multiple-storage scheme in this brief. In addition, by the antisymmetric product coding approach, the LUT size can also be reduced to half, where the product words are recoded as antisymmetric pairs.

APC:
Step 2: Calculate the APC word of X.
Step 3: If X(4) = 1 then Output <= 16A + APC word(X), else Output <= 16A - APC word(X).

OMS:
Step 1: Take the last four bits of X.
Step 2: Calculate s0, s1 and the address.
Step 3: Depending on s0 and s1, the LUT output is shifted and stored into the final output.
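The APC and OMS steps can be combined in a small behavioral model for L = 5. The Python sketch below is an assumption-laden illustration and not the authors' circuit: the function names are hypothetical, the shift count is taken to be the number of trailing zeros of the 4-bit APC address (which follows from storing only odd multiples), and the sign convention (add the APC word to 16A when the most significant bit of X is 1, subtract it otherwise) is the one for which the result equals A × X for every 5-bit X.

```python
def build_apc_oms_lut(a: int) -> list:
    """Nine-word LUT: odd multiples A*(2i+1) for i = 0..7, then 2A."""
    return [a * (2 * i + 1) for i in range(8)] + [2 * a]


def apc_oms_multiply(a: int, x: int) -> int:
    """Behavioral model of A * X for a 5-bit unsigned X using the 9-word LUT."""
    if not 0 <= x <= 31:
        raise ValueError("x must be a 5-bit value")
    lut = build_apc_oms_lut(a)
    x4 = (x >> 4) & 1          # most significant bit selects add or subtract
    low = x & 0xF              # last four bits of X
    # APC address: the low bits as-is when x4 = 1, else their 4-bit two's complement.
    d = low if x4 else (-low) & 0xF

    if x == 0:
        word = lut[8] << 3     # 2A shifted three places gives 16A, so 16A - 16A = 0
    elif d == 0:
        word = 0               # X = 10000: output is reset, the result is 16A
    else:
        shifts = 0
        while d % 2 == 0:      # strip factors of two: d = (odd multiple) * 2**shifts
            d >>= 1
            shifts += 1
        word = lut[(d - 1) // 2] << shifts   # barrel shift of the stored odd multiple

    offset = 16 * a            # fixed value about which the products are antisymmetric
    return offset + word if x4 else offset - word
```

Nine stored words replace the 32 words of a conventional LUT for a 5-bit input, at the cost of the shifter and the final add or subtract.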
Fig: Architecture of present method

If the input bit size is 5, the conventional LUT has 2^5 = 32 locations and the APC LUT stores 2^5/2 = 16 locations, which is a reduction in LUT size by a factor of 2.

Hardware Environment: FPGA Implementation

FPGA stands for field programmable gate array, a device that can be configured by the customer or designer after manufacturing. Field programmable gate arrays are so called because, rather than having a structure similar to a PAL or other programmable device, they are structured very much like a gate array ASIC. This makes FPGAs well suited to prototyping ASICs, or to places where an ASIC will eventually be used. For example, an FPGA may be used in a design that needs to get to market quickly regardless of cost. Later, an ASIC can be used in place of the FPGA when the production volume increases, in order to reduce cost. FPGAs are programmed using a logic circuit diagram or source code in an HDL to specify how the chip will work.

FPGAs contain programmable logic components called "logic blocks" and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together". The programmable logic blocks are called configurable logic blocks and the reconfigurable interconnects are called switch boxes. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.
Fig: Flow chart of proposed technique

SIMULATION RESULT:

APPLICATIONS OF LUT OPTIMIZATION: The applications of LUT optimization for memory-based computation are:
1. Bio-medical: Total-body wireless operation systems have nano components such as nano cameras and CROs. Nano cameras have to be designed with low area occupancy in order to be embedded into the human body, so LUTs play a vital role in the design of these nano devices.

CONCLUSION: An efficient LUT-based multiplier has been designed with a reduced-area LUT and a barrel shifter. This improves throughput and makes the design suitable for a wide range of applications. This type of LUT implementation plays a vital role in applications such as biomedical, telecommunication, and military systems.

REFERENCES:
[1] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, Prentice Hall, 1989.
[2] S. A. White, "Applications of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review", IEEE ASSP Magazine, July 1989, pp. 4-19.
[3] M. Mehendale, S. D. Sherlekar and G. Venkatesh, "Area-Delay Tradeoff in Distributed Arithmetic based Implementation of FIR Filters", VLSI Design 97, pp. 134-129.
[4] S. Wolter, A. Schubert, H. Matz and R. Laur, "On the Comparison between Architectures for the Implementation of Distributed Arithmetic", ISCAS 93, pp. 1829-1832.
[5] K. Nourji and N. Demassieux, "Optimization of Real-Time VLSI Architectures for Distributed Arithmetic based Algorithms: Application to HDTV Filters", ISCAS 94, vol. 4, pp. 223-226.
[6] E. M. Sentovich et al., "SIS: A System for Sequential Circuit Synthesis", Memorandum No. UCB/ERL M92/41.
[8] V. S. Rosa, E. Costa and S. Bampi, "A High Performance Parallel FIR Filters Generation Tool", in Iberchip, San Jose, Costa Rica, 2006.
[9] Altera Corporation, 101 Innovation Drive, San Jose, California 95134, USA. http://www.altera.com
[10] Xilinx, Inc. http://www.xilinx.com
[11] R. W. Hamming, Digital Filters, Prentice Hall, 3rd ed., 1989.
