
National Conference On Electrical Sciences -2012 (NCES-12)    ISBN: 978-93-81583-72-2

New Reconfigurable Architectures for Implementing FIR Filters with Low Complexity

S. Poojitha, Dept. of ECE, SVPCE, Visakhapatnam, India, jithamahi@gmail.com
Prof. M. Murali, Dept. of ECE, SVPCE, Visakhapatnam, India, muralitejas@gmail.com

Abstract: Reconfigurability and low complexity are the two key requirements of finite impulse response (FIR) filters employed in multistandard wireless communication systems. In this paper, two new reconfigurable architectures of low complexity FIR filters are proposed, namely constant shifts method and programmable shifts method. The proposed FIR filter architecture is capable of operating for different word length filter coefficients without any overhead in the hardware circuitry. We show that dynamically reconfigurable filters can be efficiently implemented by using common subexpression elimination algorithms. Design examples show that the proposed architectures offer good area and power reductions and speed improvement compared to the best existing reconfigurable FIR filter implementations in the literature.

Index Terms: Channelizer, common subexpression elimination, FIR filter, high-level synthesis, reconfigurability.

I. Introduction

Recent advances in mobile computing and communication applications demand low power and high speed VLSI digital signal processing (DSP) systems. One of the most important operations in DSP is finite impulse response (FIR) filtering. FIR digital filters find extensive use in mobile communication systems for applications such as channelization, channel equalization, matched filtering, and pulse shaping, due to their absolute stability and linear phase properties. The filters employed in mobile systems must be realized to consume less power and operate at high speed. Recently, with the advent of software defined radio (SDR) technology, FIR filter research has been focused on reconfigurable realizations. The fundamental idea of an SDR is to replace most of the analog signal processing in the transceivers with digital signal processing in order to provide the advantage of flexibility through reconfiguration. This will enable different air-interfaces to be implemented on a single generic hardware platform to support multistandard wireless communications [1]. Reconfigurability of the receiver to work with different wireless communication standards is another key requirement in an SDR.

The most computationally intensive part of an SDR receiver is the channelizer since it operates at the highest sampling rate [6]. It extracts multiple narrowband channels from a wideband signal using a bank of FIR filters, called channel filters. However, due to the stringent adjacent channel attenuation specifications of wireless communication standards, higher order filters are required for channelization and consequently the complexity and power consumption of the receiver will be high. As the ultimate aim of the future multistandard wireless communication receiver is to realize its functionalities in mobile handsets, where its full utilization is possible, low power and low area implementation of FIR channel filters is inevitable.

The complexity of FIR filters is dominated by the complexity of coefficient multipliers. Moreover, we note that there is sufficient scope for more work on complexity reduction in reconfigurable filters, especially for wireless communication applications where higher order filters are often required to meet the stringent adjacent channel attenuation specifications.

Fig. 1. Transposed direct form of an FIR filter.

In this paper, we propose two architectures that integrate reconfigurability and low complexity to realize FIR filters. The FIR filter architectures proposed are called constant shifts method (CSM) and programmable shifts method (PSM). We have presented the preliminary design of these architectures in a recent conference paper [7]. The proposed architectures consider coefficients as constants (as they are stored in look-up tables, LUTs) and the input signal as variable. The coefficient multiplication in such a case is known as multiple constant multiplications (MCM), i.e., multiplication of one variable (input signal) with multiple constants (filter coefficients). The MCM is then optimized for eliminating redundancy using our recently proposed BCSE algorithm [2] to minimize the filter complexity. The proposed CSM focuses on

Department Of EEE, Annamacharya Institute Of Technology & Sciences, Rajampet 172



implementing FIR filters by partitioning the filter coefficients into fixed groups. The PSM has a pre-analysis part which eliminates the redundancy in filter coefficients using the BCSE algorithm. The advantage of CSM is that it produces high-speed filters at the cost of a slight increase in area and power consumption. On the contrary, the PSM produces filters with low area and power consumption at the cost of a slight increase in delay. Another advantage of PSM is that the word length of the filter coefficients can be dynamically changed without any modification in the hardware.

II. Review of BCSE Method

The BCSE technique focuses on eliminating redundant computations in coefficient multipliers by reusing the most common binary bit patterns (BCSs) present in coefficients. An $n$-bit binary number can form $2^n - (n + 1)$ BCSs among themselves. For example, a 3-bit binary representation can form four BCSs, which are [0 1 1], [1 0 1], [1 1 0], and [1 1 1]. These BCSs can be expressed as

$$[0\,1\,1] = x_2 = 2^{-1}x + 2^{-2}x$$
$$[1\,0\,1] = x_3 = x + 2^{-2}x$$
$$[1\,1\,0] = x_4 = x + 2^{-1}x$$
$$[1\,1\,1] = x_5 = x + 2^{-1}x + 2^{-2}x$$

where $x$ is the input signal. Note that other BCSs such as [0 0 1], [0 1 0], and [1 0 0] do not require any adder for implementation as they have only one nonzero bit. A straightforward realization of the above BCSs would require five adders. However, $x_2$ can be obtained from $x_4$ by a right shift operation (without using any extra adders):

$$x_2 = 2^{-1}x + 2^{-2}x = 2^{-1}(x + 2^{-1}x) = 2^{-1}x_4.$$

Also, $x_5$ can be obtained from $x_4$ using an adder:

$$x_5 = x + 2^{-1}x + 2^{-2}x = x_4 + 2^{-2}x.$$

Thus, only three adders are needed to realize the BCSs $x_2$ to $x_5$. The number of adders required for all the possible $n$-bit binary subexpressions is $2^{n-1} - 1$ [2].

The proposed FIR filter architecture is based on the transposed direct form as shown in Fig. 1. In the transposed direct form, the coefficient multipliers (shown as dotted outline in Fig. 1) share the same input and hence are commonly known as the multiplier block (MB). The MB reduces the complexity of FIR filter implementations by exploiting the redundancy in MCM. Thus, redundant computations (partial product additions in the multiplier) are eliminated using BCSE. In the case of channel filters for SDR receivers, the coefficients need to be changed as the filter specification changes with the communication standard. Therefore, reconfigurability is a necessary requirement for SDR channel filters. In the next section, we propose two architectures that incorporate reconfigurability into the BCSE-based low complexity filter architecture.

III. Proposed Filter Architectures

In this section, the architecture of the proposed FIR filter is presented. Our architecture is based on the transposed direct form FIR filter structure as shown in Fig. 1. The dotted portion in Fig. 1 represents the MB. In Fig. 1, PE-i represents the processing element corresponding to the ith coefficient. PE-i performs the coefficient multiplication operation with the help of a shift and add unit. The architecture of PE is different for the proposed CSM and PSM. In the CSM, the filter coefficients are partitioned into fixed groups and hence the PE architecture involves constant shifters. But in the PSM, the PE consists of programmable shifters (PS). The FIR filter architecture can be realized in a serial way in which the same PE is used for generation of all partial products by convolving the coefficients with the input signal ($h * x[n]$). The serial realization is used when power consumption and area are of prime concern. The basic architecture of the PE (dotted portion) is shown in Fig. 2. The functions of different blocks of the PE are explained below.

Fig. 2. Architecture of proposed method.

1) Shift and Add Unit: It is well known that one of the efficient ways to reduce the complexity of multiplication operation is to realize it using shift and add operations. In contrast to conventional shift and add units used in previously proposed reconfigurable filter architectures, we use the BCSs-based shift and add unit in our proposed CSM and PSM architectures. The architecture of the shift and add unit is shown in Fig. 3. The shift and add unit is used to realize all the 3-bit BCSs of the input signal ranging from [0 0 0] to [1 1 1]. In Fig. 3, x>>k represents the input x shifted right by k units. All the 3-bit BCSs [0 1 1], [1 0 1], [1 1 0], and [1 1 1] of a 3-bit number are generated using only three adders. Since the shifts to obtain the BCSs are known beforehand, PS are not required. All these eight BCSs (including [0 0 0]) are then fed to the multiplexer unit. In both the architectures (CSM and PSM) proposed in this paper, we use the same shift and add unit.
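As a rough illustration of the shift and add unit, the following Python sketch produces all eight 3-bit BCS outputs of an input x with three additions. The function name `shift_add_unit`, the dictionary keyed by bit pattern, and the use of exact fractions in place of fixed-point hardware words are assumptions made for this example only.

```python
from fractions import Fraction


def shift_add_unit(x):
    """Return the eight 3-bit BCS outputs of x, keyed by the pattern (b0, b1, b2).

    Pattern (b0, b1, b2) stands for b0*x + b1*2^-1*x + b2*2^-2*x.
    Right shifts are modelled as exact divisions by powers of two.
    """
    x = Fraction(x)
    x_shr1 = x / 2        # x >> 1, wiring only
    x_shr2 = x / 4        # x >> 2, wiring only
    x4 = x + x_shr1       # adder 1: [1 1 0]
    x3 = x + x_shr2       # adder 2: [1 0 1]
    x5 = x4 + x_shr2      # adder 3: [1 1 1], reuses x4
    x2 = x4 / 2           # [0 1 1] is x4 >> 1, no adder needed
    return {
        (0, 0, 0): Fraction(0),
        (0, 0, 1): x_shr2,
        (0, 1, 0): x_shr1,
        (0, 1, 1): x2,
        (1, 0, 0): x,
        (1, 0, 1): x3,
        (1, 1, 0): x4,
        (1, 1, 1): x5,
    }


outputs = shift_add_unit(8)
selected = outputs[(1, 0, 1)]  # a 3-bit coefficient group acts as the select signal
```

Looking up the dictionary with a 3-bit group of coefficient bits plays the role that the multiplexer select plays in hardware.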

Fig. 3. Architecture of shift and add unit.

2) Multiplexer Unit: The multiplexer units are used to select the appropriate output from the shift and add unit. All the multiplexers will share the outputs of the shift and add unit. The inputs to the multiplexers are the 8/4 inputs from the shift and add unit and hence 8:1/4:1 multiplexer units are employed in the architecture. The select signals of the multiplexers are the filter coefficients, which are previously stored in a look-up table (LUT). The CSM and PSM architectures basically differ in the way filter coefficients are stored in the LUT. In the CSM, the coefficients are directly stored in LUTs without any modification, whereas in PSM, the coefficients are stored in a coded format. The number of multiplexers will also be different for PSM and CSM. In CSM, the number of multiplexers will be dependent on the number of groups after the partitioning of the filter coefficient into fixed groups. The number of multiplexers in the PSM is dependent on the number of nonzero operands in the coefficient for the worst case after the application of the BCSE algorithm.

3) Final Shifter Unit: [...] With the help of the multiplexer unit, the final shifter unit will perform the shift operations $2^{-4}$ [...] in (2). The PSM and CSM architectures also differ in the nature of [...].

4) Final Adder Unit: This unit will compute the sum of all the intermediate additions [...] as in (2). Compared to the [...] addition). Thus, the same hardware architecture can [...] the necessary reconfigurability. Moreover, the [...] addition operations and hence offers hardware [...].

A. Architecture of CSM

In the CSM architecture, the coefficients are stored directly in the LUT. These coefficients are partitioned into groups of three bits and are used as select signals for the multiplexers. The number of multiplexer units required is $\lceil n/3 \rceil$, where $n$ is the word length of the filter coefficients. The CSM can be explained with the help of an 8-bit coefficient $h$ whose bits are all ones. This $h$ is the worst-case 8-bit coefficient since all the bits are nonzero and hence needs a maximum number of additions and shifts. In this case, $n = 8$, and therefore the number of multiplexers required is 3. The output $y = h \cdot x$ is expressed as

$$y = 2^{-1}x + 2^{-2}x + 2^{-3}x + 2^{-4}x + 2^{-5}x + 2^{-6}x + 2^{-7}x + 2^{-8}x. \tag{3}$$

By partitioning (3) into groups of three bits from the most significant bit (MSB), we obtain

$$y = 2^{-1}\left(x + 2^{-1}x + 2^{-2}x + 2^{-3}x + 2^{-4}x + 2^{-5}x + 2^{-6}x + 2^{-7}x\right) \tag{4}$$
$$\phantom{y} = 2^{-1}\left(x + 2^{-1}x + 2^{-2}x + 2^{-3}\left(x + 2^{-1}x + 2^{-2}x\right) + 2^{-6}\left(x + 2^{-1}x\right)\right). \tag{5}$$

Note that the terms $x + 2^{-1}x + 2^{-2}x$ and $x + 2^{-1}x$ can be obtained from the shift and add unit. The shifts in (5) are $2^{-1}$, $2^{-6}$, and $2^{-3}$. Since these shifts are always constant irrespective of the coefficients, programmable shifters are not required and these shifts can be hardwired. The final adder unit will compute the sum of all the intermediate sums to obtain $h \cdot x[n]$.

The architecture of PE for CSM is shown in Fig. 4. The coefficient word length is considered as 16 bits. The filter coefficients are stored in sign-magnitude form with the MSB reserved for the sign [...] an 18-bit value in LUTs. Each row in the LUT corresponds to one coefficient. Note that only half the number of coefficients needs to be stored as FIR filter coefficients are symmetric. The coefficient values corresponding to $2^{0}$ to $2^{-15}$ are partitioned into groups of three bits and are used as select signals [...]. The set of bits weighted $2^{0}$ to $2^{-2}$ forms the select signal to Mux1 and so on.

Fig. 4. Architecture of PE for CSM.
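The 8-bit worked example for the CSM can be checked numerically. The sketch below is an illustration only: it assumes an unsigned fractional coefficient given as a list of bits (MSB first, weight 2^-1), pads the last group with zeros instead of using a narrower multiplexer, and accumulates the groups in a simple chain rather than reproducing the adder arrangement of Fig. 4. The names `bcs_output` and `csm_multiply` are hypothetical.

```python
from fractions import Fraction


def bcs_output(group, x):
    """Value a multiplexer would select for a 3-bit group (b0, b1, b2)."""
    b0, b1, b2 = group
    return b0 * x + b1 * x / 2 + b2 * x / 4


def csm_multiply(bits, x):
    """Multiply x by the fraction 0.b1 b2 ... bn using 3-bit groups.

    Every shift is a constant: 2^-3 between consecutive groups and a
    final 2^-1, so none of them depends on the coefficient value.
    """
    x = Fraction(x)
    padded = list(bits) + [0] * (-len(bits) % 3)   # zero-pad to a multiple of 3
    groups = [padded[i:i + 3] for i in range(0, len(padded), 3)]
    acc = Fraction(0)
    for group in reversed(groups):                 # least significant group first
        acc = bcs_output(group, x) + acc / 8       # constant 2^-3 shift
    return acc / 2                                 # final constant 2^-1 shift


# Worst-case 8-bit coefficient (all ones): three groups [111], [111], [110].
y = csm_multiply([1] * 8, 1)   # equals 255/256 = 2^-1 + 2^-2 + ... + 2^-8
```

Because the group boundaries are fixed by the word length alone, changing the coefficient only changes which shift and add output each group selects.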

Since there are 3 bits in each group, eight combinations are possible and hence Mux1 to Mux5 are 8:1 multiplexers. The value corresponding to $2^{-15}$ forms the select to a 2:1 multiplexer, Mux6. The output from the ith multiplexer is denoted as $r_i$. Note that even though we are taking coefficients with values up to a precision of 16 bits, the shifting of $2^{-1}$ is done finally as shown in (4) and (5) and hence the maximum shift will be $2^{-15}$. Mux7 determines whether the output needs to be complemented based on the sign bit of the filter coefficient and hence it is a 2:1 multiplexer. In Fig. 4, the shifts are obtained as follows. Let $r_1$ to $r_6$ denote the outputs of Mux1 to Mux6, respectively. Then

$$y = 2^{-1}r_1 + 2^{-4}r_2 + 2^{-7}r_3 + 2^{-10}r_4 + 2^{-13}r_5 + 2^{-16}r_6. \tag{6}$$

The shifts are obtained by partitioning the 16-bit coefficient into groups of 3 bits. By partitioning (6),

$$y = 2^{-1}\left[(r_1 + 2^{-3}r_2) + 2^{-6}\left[(r_3 + 2^{-3}r_4) + 2^{-6}(r_5 + 2^{-3}r_6)\right]\right]. \tag{7}$$

Substituting $(r_1 + 2^{-3}r_2)$, $(r_3 + 2^{-3}r_4)$, and $(r_5 + 2^{-3}r_6)$ by $r_7$, $r_8$, and $r_9$, respectively, we get

$$y = 2^{-1}\left[r_7 + 2^{-6}(r_8 + 2^{-6}r_9)\right]. \tag{8}$$

By substituting $(r_8 + 2^{-6}r_9)$ by $r_{10}$,

$$y = 2^{-1}(r_7 + 2^{-6}r_{10}). \tag{9}$$

By substituting $(r_7 + 2^{-6}r_{10})$ by $r_{11}$,

$$y = 2^{-1}(r_{11}). \tag{10}$$

The expressions (6)-(10) are represented in Fig. 4. The main advantage of the CSM architecture is that all the shifts are constants irrespective of the coefficients and hence can be hardwired, resulting in high speed operation of the filter.

B. Architecture of PSM

The PSM is based on the BCSE algorithm presented in our previous work [2]. The PSM architecture presented in this section incorporates reconfigurability into BCSE. The PSM has a pre-analysis part in which the filter coefficients are analyzed using the BCSE algorithm in [2]. Thus, the redundant computations (additions) are eliminated using the BCSs and the resulting coefficients in a coded format are stored in the LUT. The coding format is explained in the latter part of this section. The shift and add unit is identical for both PSM and CSM. The number of multiplexer units required can be obtained from the filter coefficients after the application of BCSE [2]. The number of multiplexers is selected after considering the number of nonzero operands (BCSs and unpaired bits) in each of the coefficients after the application of the BCSE algorithm. The number of multiplexers will correspond to the number of nonzero operands for the worst-case coefficient (the worst-case coefficient being defined as the coefficient that has the maximum number of nonzero operands).

The architecture of PE for PSM is shown in Fig. 5. The coefficient word length is fixed as 16 bits. Based on our statistical analysis, we have fixed the number of multiplexers as 5 (same as the number of nonzero operands). The LUT consists of two rows of 18 bits for each coefficient, of the form SDDDDXXDDDDXXMMMML and DDDDXXDDDDXXDDDDXX, where S represents the sign bit, DDDD represents the shift values from $2^{0}$ to $2^{-15}$, and XX represents the input $x$ or the BCSs obtained from the shift and add unit. In the coded format, XX = 01 represents $x$, 10 represents $x + 2^{-1}x$, 11 represents $x + 2^{-2}x$, and 00 represents $x + 2^{-1}x + 2^{-2}x$, respectively. Thus, the two rows can store up to five operands, which is the worst-case number of operands for a 16-bit coefficient. In most of the practical coefficients, the number of operands is less than the worst-case number of operands, 5. In that case MMMML can be used to avoid unnecessary additions. The values MMMM will be given as select signal to Mux6 and L to Mux8. MMMML indicates the presence of five operands. A 1 in each position indicates the presence of each operand. Thus, all operands being present will be indicated by MMMML = 11111. This means that Mux6 will select the output of adder A4 and Mux8 will select the output of adder A2. If only the first operand is present, MMMML = 10000. This means that Mux8 will select the output of PS, shr4, and Mux6 will select the output of PS, shr1. As a result of this, none of the adders shr1 to shr4 will be loaded, saving a significant amount of dynamic power.

Fig. 5. Architecture of PE for PSM.

The coding can be explained as given below. Consider the positive coefficient

$$h = [1010011001010011]. \tag{11}$$



By using the BCSE [2] and substituting 2 = [1 1] and 3 = [1 0 1], (11) becomes

$$h = [3000020003000020]. \tag{12}$$

Then (12) will be stored in the LUT as 000001101011011110 and 100111111010000000. It must be noted that as (12) has only four operands, the fifth operand values DDDDXX are substituted as 000000 and MMMML as 11110. The XX values are given as select signals for Mux1 to Mux5. The values of DDDD are fed to the corresponding PS. The multiplexers Mux6 and Mux8 will select the appropriate output in case the number of operands after BCSE is less than 5. The use of Mux6 and Mux8 reduces the number of adders utilized by selecting the output from the appropriate adder, as all the adders in the PE are not always needed. For example, in (12), as only four operands occur, the output can be taken from the output of PS, shr4, without using adder, shr2. Mux8 will do this and hence the adder shr2 is not loaded and consumes zero current and power. The select signals of Mux6 and Mux8 have five bits and hence $2^5$ different control signals are possible, which adds a lot of flexibility to the architecture that can be employed in future if required. Mux7 is used to complement the output in case of a negative coefficient and its select signal is the sign bit S of the coefficient.

The PSM architecture has two advantages: first, it guarantees a reduced number of additions compared to CSM, and second, it offers the flexibility of changing the word length of coefficients. The same PSM architecture designed for 16-bit coefficients is capable of operating for any coefficient word length less than 16 bits. This means that if the word length is reduced, the format of the LUT can be changed if required. The main advantage of reducing the precision is that some of the adders in the PSM architecture will be unloaded, resulting in zero dynamic power. To the best of our knowledge, the PSM architecture is the first approach toward a programmable coefficient word length FIR filter architecture.
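The coded LUT format can be made concrete with a short Python sketch that packs a 16-bit coefficient into the two 18-bit rows and unpacks it again. Several things here are assumptions rather than the authors' method: the operand search is a plain greedy MSB-first scan over 3-bit windows (not the BCSE algorithm of [2]), the final scaling by 2^-1 is inferred from the worked example, where the operand at weight 2^-1 is stored with DDDD = 0000, and the names `encode_psm` and `decode_psm` are hypothetical.

```python
from fractions import Fraction

# XX codes as read from the coded format: operand value as a multiple of x.
XX_VALUE = {"01": Fraction(1), "10": Fraction(3, 2),
            "11": Fraction(5, 4), "00": Fraction(7, 4)}
# 3-bit window starting at a nonzero bit -> XX code.
WINDOW_TO_XX = {"100": "01", "110": "10", "101": "11", "111": "00"}
MAX_OPERANDS = 5


def encode_psm(bits, sign=0):
    """Pack a 16-bit magnitude string into SDDDDXXDDDDXXMMMML / DDDDXX x3."""
    operands = []
    i = 0
    while i < len(bits):
        if bits[i] == "0":
            i += 1
            continue
        window = bits[i:i + 3].ljust(3, "0")       # pad past the LSB with zeros
        operands.append((i, WINDOW_TO_XX[window]))  # (shift, XX code)
        i += 3
    if len(operands) > MAX_OPERANDS:
        raise ValueError("coefficient needs more than five operands")
    unused = MAX_OPERANDS - len(operands)
    fields = [f"{shift:04b}{xx}" for shift, xx in operands] + ["000000"] * unused
    mask = "1" * len(operands) + "0" * unused       # MMMML presence bits
    row1 = str(sign) + fields[0] + fields[1] + mask
    row2 = fields[2] + fields[3] + fields[4]
    return row1, row2


def decode_psm(row1, row2):
    """Recover the coefficient value from the two coded rows."""
    sign, mask = row1[0], row1[13:18]
    fields = [row1[1:7], row1[7:13], row2[0:6], row2[6:12], row2[12:18]]
    h = Fraction(0)
    for present, field in zip(mask, fields):
        if present == "1":                          # absent operands add nothing
            h += XX_VALUE[field[4:]] / 2 ** int(field[:4], 2)
    h /= 2                                          # assumed final 2^-1 shift
    return -h if sign == "1" else h


rows = encode_psm("1010011001010011")
value = decode_psm(*rows)
```

For the coefficient in (11), `rows` reproduces the two LUT words quoted in the text, and `value` equals the binary fraction 0.1010011001010011, which is a useful consistency check on the reading of the DDDD, XX, and MMMML fields.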
The coefficient word length of the proposed PSM architecture can therefore be changed dynamically without any change in hardware.

C. Comparison Between CSM and PSM

The CSM architecture results in faster coefficient multiplication at the cost of a few extra adders compared to the PSM architecture, whereas the PSM architecture results in fewer additions and thus less area and power consumption compared to the CSM architecture. Another advantage of PSM is that it is independent of the word length of the filter coefficients. For the PSM


architecture, the number of multiplexers is fixed based on the number of BCSs present in a given coefficient set (worst-case coefficient of the set). Thus, even if the word length changes, it hardly affects the architecture of PSM. In [11], it was pointed out that for many filter taps, the highest coefficient precision is not required. Valuable hardware resources will be wasted if all taps are implemented with the highest precision. The proposed PSM can be implemented for dynamically varying coefficient precision as it is word length independent.

One of the limitations of the PSM architecture is that it requires pre-analysis of filter coefficients and hence on-the-fly reconfigurability is not always feasible. But this restriction does not impose constraints on popular reconfigurable filter applications like wireless communications. This is because in such applications, we have a distinct filter for each communication standard and the coefficients of the filter are fixed for a specific standard. In other words, when the communication system is operating on a particular wireless standard, the filter coefficients do not change, i.e., the filter is not required to be an adaptive filter. When the system changes its mode of operation to a different wireless communication standard (as in the case of a multistandard transceiver), the coefficient set corresponding to the specification of the new standard is loaded (replacing the current filter coefficients). Note that the coefficients of the new standard are known beforehand (pre-stored) and therefore the pre-analysis can be done offline and the problem with reconfigurability can be solved.

V. Experimental Results

In this section, the synthesis and design results of the proposed CSM and PSM architectures are presented and compared.

A. Synthesis Results

We have used Xilinx ISE 8.1i for synthesis. The synthesis has been done on Xilinx's Virtex-II 2v3000ff1152-4 FPGA. Table I shows the synthesis results of the CSM and PSM architectures for a 20-tap FIR filter that has a coefficient word length of 16 bits.

TABLE I
Synthesis Results for an FIR Filter with 20 Taps and Coefficient Word Length of 16 Bits

                            Proposed PSM    Proposed CSM
Gate count                  22 581          22 956
Sampling frequency (MHz)    24              26
Data arrival time (ns)      33.64           26.824

We have done the implementation of filters with different passband edge ($\omega_p$) and stopband edge ($\omega_s$) specifications

given by: 1) $\omega_p = 0.1\pi$, $\omega_s = 0.12\pi$; 2) $\omega_p = 0.15\pi$, $\omega_s = 0.2\pi$; 3) $\omega_p = 0.2\pi$, $\omega_s = 0.22\pi$; and 4) $\omega_p = 0.2\pi$, $\omega_s = 0.3\pi$, respectively. Even though the proposed architectures are reconfigurable, the usage of adders and shifters is dependent on the filter coefficient values. Some of the adders may not be used by the multiplexers. As a result of this, they are unloaded and do not consume any dynamic power. Hence, the power and speed of the synthesis results are dependent on the filter coefficient values, and we have therefore considered an average of the synthesis results in all the tables in this paper.

From the comparison it is evident that the CSM requires 475 gates more than the PSM, whereas the PSM requires 6.82 ns more for the data to arrive at the output compared to the CSM. Thus, the CSM results in higher speed whereas the PSM results in lower area. The lower speed of the PSM is due to the presence of programmable shifters, and its smaller area is due to the elimination of redundant additions by using the BCSE algorithm.

We have also analyzed the effect of the MB for different filter coefficient word lengths of 8, 12, and 16 bits for the PSM architecture. The results are shown in Table II. It can be noted that as the precision of the coefficient is made high, the area consumption is increased and the speed of operation is reduced. Thus, by choosing the appropriate filter coefficient word length, it is possible to obtain reduced area and power as well as increased speed for the PSM architecture.

TABLE II
Synthesis Results for MB (PSM) with Different Coefficient Word Lengths

Word length                 8-bit    12-bit    16-bit
Gate count                  2878     3532      3771
Sampling frequency (MHz)    35       20        24
Data arrival time (ns)      7.96     8.84      9.92

VII. Conclusion

We have proposed two new approaches, namely CSM and PSM, for implementing reconfigurable higher order filters with low complexity. The proposed CSM and PSM methods make use of architectures with a fixed number of multiplexers, and the reduction in complexity is achieved by applying the BCSE algorithm. The CSM architecture results in high speed filters and the PSM architecture results in low area and thus low power filter implementations. The PSM also provides the flexibility of changing the filter coefficient word lengths dynamically. The proposed reconfigurable architectures can be easily modified to employ any CSE (MCM) method. Thus, our method is a general approach for low complexity reconfigurable channel filters.

References

[1] T. Hentschel and G. Fettweis, "Software radio receivers," in CDMA Techniques for Third Generation Mobile Systems. Dordrecht, The Netherlands: Kluwer Academic, 1999, pp. 257-283.
[2] R. Mahesh and A. P. Vinod, "A new common subexpression elimination algorithm for realizing low complexity higher order digital filters," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no. 2, pp. 217-219, Feb. 2008.
[3] A. P. Vinod and E. M.-K. Lai, "On the implementation of efficient channel filters for wideband receivers by optimizing common subexpression elimination methods," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 24, no. 2, pp. 295-304, Feb. 2005.
[4] A. Chandrakasan, M. Potkonjak, R. Mehra, J. Rabaey, and R. W. Brodersen, "Optimizing power using transformations," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 14, no. 1, pp. 12-31, Jan. 1995.
[5] K. H. Chen and T. D. Chiueh, "A low-power digit-based reconfigurable FIR filter," IEEE Trans. Circuits Syst. II, vol. 53, no. 8, pp. 617-621, Aug. 2006.
[6] J. Mitola, "Object-oriented approaches to wireless system engineering," in Software Radio Architecture. New York: Wiley, 2000.
[7] R. Mahesh and A. P. Vinod, "Reconfigurable low complexity FIR filters for software radio receivers," in Proc. 17th IEEE Int. Symp. Personal Indoor Mobile Radio Commun. (PIMRC), Helsinki, Finland, Sep. 2006, pp. 1-5.
[8] M. Thenmozhi and N. Kirthika, "Analysis of efficient architectures for FIR filters using common subexpression elimination algorithm," 2007, pp. 1-8.
[9] N. Moreano, E. Borin, C. de Souza, and G. Araujo, "Efficient data-path merging for partially reconfigurable architectures," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 24, no. 7, pp. 969-980, Jul. 2005.
[10] M. Potkonjak, M. B. Srivastava, and A. P. Chandrakasan, "Multiple constant multiplications: Efficient and versatile framework and algorithms for exploring common subexpression elimination," IEEE Trans. Comput.-Aided Design, vol. 15, no. 2, pp. 151-165, Feb. 1996.
