Final Papper JIOS T 674

Journal of Information & Optimization Sciences
ISSN 0252-2667 (Print), ISSN 2169-0103 (Online)

DOI : 10.1080/02522667.2019.1578087
New approach to low-area, low-latency memory-based systolic architecture for FIR filters

C.S. Vinitha *
R.K. Sharma †
Ambedkar Institute of Advanced Communication Technologies & Research
Guru Gobind Singh Indraprastha University
Dwarka, Sector 16-C
Delhi 110078
India

Abstract
A new approach to memory-based systolic architecture for higher-order FIR filters is proposed. The memory-based multiplier used in this systolic FIR filter design differs from earlier memory-based multipliers. The design targets low-area, low-latency and high-throughput FIR filters. Both 1-D and 2-D systolic computing structures for higher-order FIR filters are derived and implemented on a Xilinx Virtex-7 XC7vx330tffg1157 FPGA using VHDL. Performance metrics, namely the number of slices, the latency and the maximum frequency of operation of the filter, are compared for different filter orders and input sample lengths, for both the 1-D and the 2-D systolic structures. For a 128-order filter with a 32-bit input, the proposed 2-D structure at eighth-level decomposition occupies 89.8% less area than the 1-D structure. The latency of a 128-order filter is reduced from 127.5 ns to 3.5 ns using the proposed decomposed structure.

Keywords: Systolic Architectures, Memory-based Computation, FIR filter, Transposed Structure, Low-latency Architecture.

*E-mail: vinithavinod1996@gmail.com (Corresponding Author)
†E-mail: 21.ravindra@gmail.com

1. Introduction

Real-time digital signal processing (DSP) is required in many applications, such as remote control or processing of satellite signals, radar, medical image processing, military applications, software-defined radio, speech signal processing and video surveillance. Some of these applications require dedicated systems built around a core Application Specific Integrated Circuit (ASIC); hence, the performance of such a DSP system depends on the efficient design of the ASIC deployed in it [1]. Some applications require high speed of operation and low power consumption, while others demand less silicon area on the IC. The ASIC must therefore always be designed to meet the specific demands of the application.
Digital filters are almost always part of a DSP system [2], [3]; efficient filter design is therefore an important part of DSP ASIC design. In most DSP applications the digital filters are of higher order [4], [5], which increases the complexity of the DSP system. Textbooks such as [1] present several techniques that improve performance by reducing the complexity of digital filters: pipelining, retiming, parallel processing, folding and unfolding. Pipelining and retiming shorten the critical path of the filter, thereby increasing its sampling speed. Parallel processing and unfolding increase the throughput of the system at the cost of additional hardware. Folding reduces the area of the filter at the cost of reduced speed of operation. One of these transformation techniques is selected according to the requirements of the application.
Systolic architectures are an attractive design choice for computation-intensive DSP algorithms. They suit VLSI-based design because of their modular, regular and simple structure with high throughput [6]. A systolic architecture is an array of Processing Elements (PEs). Many algorithms and architectures using systolic arrays for FIR filters and discrete sinusoidal transforms are suggested in [7] to [16]. The PEs of a filter consist of multipliers and adders, and the multipliers dominate the area of the filter because they occupy the most area. Memory-based computation is now becoming popular [17], [18]. Replacing the conventional multiplier with a memory-based multiplier in the PEs of a systolic array results in a memory-based systolic architecture [19]. In an FIR filter, one input to each multiplier, the filter coefficient, is fixed, and the other, the input sample, is variable. This property gives rise to the concept of memory-based computation in filters [20]. There are two types of memory-based multiplier: Direct Memory based and Distributed Arithmetic (DA) based [21]. In a Direct Memory based multiplier the product terms are stored in memory, whereas in a DA based multiplier the inner-product terms are stored. Both Direct Memory based and DA based implementations of FIR filters are suggested in [22] to [25].
For the transposed FIR filter structure, the Direct Memory based structure involves less hardware complexity than the DA structure [22]. The latency of a transposed FIR filter increases with the order of the filter for Direct Memory based computation, but with the size of the input sample for DA based computation. In [19] the author proposed a Direct Memory based systolic architecture for FIR filters that uses the conventional memory-based multiplier.
In this article we present a new approach to memory-based systolic architecture for FIR filters that employs a new memory-based multiplication strategy. The approach reduces the hardware complexity of the FIR filter by simplifying the multiplier configuration, and this work is an enhanced and detailed application of our earlier work [25]. Further, to reduce the latency and the area of the filter, the FIR structure is decomposed and a 2-D memory-based systolic architecture is realized. It is then implemented on a Xilinx Virtex-7 XC7vx330tffg1157 FPGA using VHDL for verification and performance evaluation. The rest of the paper is organized as follows. In Section 2, the basic memory-based systolic architecture for the FIR filter is derived from the data flow graph and the memory cell of the structure is explained in detail. In Section 3, the 2-D systolic structure of the filter is derived from the concurrent recursive equation of the filter output response, which reduces the latency of the filter; the area and latency of the 1-D and 2-D structures are also compared. Section 4 concludes the work and states the advantages of the structure.

2. FIR filter using Memory based Systolic Structure

In this section the memory-based systolic structure for an N-tap FIR filter is derived. The filter structure is mapped onto the PEs of the systolic array. The memory cells used in the PEs are then described, covering both the conventional memory-based PE and our proposed memory-based PE in detail.

2.1 Derivation of memory based systolic architecture

The input-output relation [2] of an FIR filter in the time domain is given in Eq. (1) as follows

$$ y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k) \tag{1} $$

where y(n) is the output, x(n) is the input and h(n) is the impulse response of the filter. Eq. (2) is the Z-transform of this relation.

$$ Y(z) = X(z)\, H(z) \tag{2} $$

where Y(z), X(z) and H(z) are the Z-transforms of the output, the input and the impulse response of the filter. The output response Y(z) is expressed in recursive form in Eq. (3) as

$$ Y(z) = X(z)\,[h(0) + z^{-1}[h(1) + z^{-1}[h(2) + \cdots + z^{-1}[h(N-2) + z^{-1} h(N-1)] \cdots ]]] \tag{3} $$

because the transfer function H(z) is given as

$$ H(z) = \sum_{n=0}^{N-1} h(n)\, z^{-n} \tag{4} $$

The transposed-form data flow graph (DFG) of the recursive equation of the filter is given in Fig. 1. The memory-based systolic structure is derived as follows. Each multiplier, delay element and adder of the DFG is mapped onto the PEs of the systolic structure. The conventional multipliers are replaced by memory-based multipliers, in which the product terms are stored in memory for the different combinations of the input sample bits. Because the input is common to all the multipliers in the transposed-form structure, all the multipliers can be replaced by a single memory module. This memory module has multiple memory cells, one for each multiplier in the DFG. The systolic array structure for the DFG is given in Fig. 2. Apart from the memory module, there is a delay cell (D) for the leftmost delay in the DFG of the filter. Cell A replaces the adder followed by the delay element in the DFG of the filter. Thus the DFG is completely mapped onto the memory-based systolic architecture.

Figure 1
Transposed form data flow graph of N-tap FIR filter

Figure 2
(a) Systolic architecture of FIR filter. (b) Structure of the memory module.

Figure 3
Structure of Memory cell
(a) Conventional Multiplier. (b) Proposed Multiplier.

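The transposed-form mapping described above can be checked with a small behavioural model. The following Python sketch is illustrative only and is not the VHDL used in this work: the function names fir_direct and fir_transposed are hypothetical, the input is assumed to be zero before n = 0, and ordinary integer multiplication stands in for the memory module. Each register plays the role of the delay inside one cell A.

```python
def fir_direct(x, h):
    """Reference output y(n) = sum_k h(k) x(n-k), assuming x(n) = 0 for n < 0."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def fir_transposed(x, h):
    """Transposed form: one shared input feeding a chain of add-and-delay cells."""
    taps = len(h)
    regs = [0] * (taps - 1)  # regs[i] is the delay between tap i and tap i + 1
    out = []
    for sample in x:
        # Every tap sees the same input sample (the role of the memory module).
        products = [coeff * sample for coeff in h]
        out.append(products[0] + (regs[0] if regs else 0))
        # Each register loads its tap product plus the register behind it.
        regs = [
            products[i + 1] + (regs[i + 1] if i + 1 < taps - 1 else 0)
            for i in range(taps - 1)
        ]
    return out


if __name__ == "__main__":
    x, h = [1, 2, 3, 4, 5], [1, 2, 3]
    print(fir_transposed(x, h) == fir_direct(x, h))
```

Because all products are formed from the same input sample in each cycle, a single lookup structure indexed by that sample can serve every tap, which is the property the memory module relies on.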
2.2 Memory cell of Conventional and proposed Multiplier

In an FIR filter the coefficients are fixed and the only variable quantity is the input sample. This property makes memory-based multipliers possible. Since one of the inputs to the multiplier is fixed, the product terms can be pre-calculated for every value of the input and stored in memory. The size of the memory increases with the length of the input sample: a W-bit input requires a memory with 2^W locations. The number of memory locations can be decreased by using dual-port memory. In this case the input data is split into two equal parts and the corresponding product terms are accessed in parallel in the dual-port memory. The two product terms are then combined in a shift-adder to give the final product term. Hence, using dual-port memory reduces the number of memory locations to 2^(W/2). In our proposed memory cell the number of memory locations is further reduced; the proposed multiplier has already been published in [25]. In the conventional memory-based multiplier both even and odd product terms are stored in memory, whereas in our proposed multiplier only the even product terms are stored and the odd product terms are derived from them by an external control circuit. The external circuit consists of an adder and a multiplexer. The adder derives the odd product term by adding 1·H to the even product term, where H is the value of the filter coefficient. The multiplexer selects the even or the odd product term according to the value of the least significant bit (LSB) of the input sample. The structure of the memory cell is given in Fig. 3(a) for the conventional multiplier and in Fig. 3(b) for the proposed multiplier.

FIR filters using systolic structures were coded in VHDL and implemented on a Virtex-7 FPGA device with both the conventional and the proposed memory-based multiplier. The number of slices (S), which indicates the area occupied by the filter, the latency of the filter (L) and the maximum frequency of operation (Fmax) were determined for various filter orders using the Xilinx tool. The parameters of the filter with both multiplier designs are compared in Table 1. From the table, the proposed multiplier-based filter occupies fewer slices in six of the twelve configurations and achieves a higher maximum frequency of operation in seven. As the input sample length and the order of the filter increase, the proposed memory-based systolic architecture becomes the better alternative to the basic memory-based systolic filter architecture: for the 128-order filter with 16-bit and 32-bit inputs it uses fewer slices and runs at a higher Fmax. The latency of the filter depends on the order of the filter; hence it remains the same for a given order at all input lengths.

Table 1
Comparison of various parameters of the FIR filter for different input sample size and different order of the filter.
(S: number of slices; L: latency; Fmax: maximum frequency of operation)

Order  Parameter   8-bit         8-bit     16-bit        16-bit    32-bit        32-bit
                   Conventional  Proposed  Conventional  Proposed  Conventional  Proposed
16     S           48            100       287           290       726           597
16     L (ns)      15.5          15.5      15.5          15.5      15.5          15.5
16     Fmax (MHz)  460.61        357       278.3         260.4     187.02        212.6
32     S           263           238       680           617       982           1041
32     L (ns)      31.5          31.5      31.5          31.5      31.5          31.5
32     Fmax (MHz)  255.49        301.932   210.7         253.4     188.9         158.9
64     S           999           957       1484          1538      2345          2360
64     L (ns)      63.5          63.5      63.5          63.5      63.5          63.5
64     Fmax (MHz)  199.16        179.66    168.23        169.8     138.14        141.1
128    S           3096          3545      4398          3765      6064          5755
128    L (ns)      127.5         127.5     127.5         127.5     127.5         127.5
128    Fmax (MHz)  121.44        108.47    135.5         140.5     100.3         111.11

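The memory cells of Section 2.2 can be illustrated with a short behavioural sketch. The Python below is a minimal model under stated assumptions and is not the circuit of [25]: input samples are treated as unsigned W-bit integers, the function names are hypothetical, and how the even-only storage is combined with the dual-port split is not modelled. It shows the conventional table with 2^W entries, the split lookup with 2^(W/2) entries and a shift-add, and an even-only table whose odd products are formed by an adder and selected by the sample LSB.

```python
def build_full_lut(coeff, width):
    """Conventional cell: one stored product for each of the 2**width inputs."""
    return [coeff * value for value in range(2 ** width)]


def multiply_split(lut_half, sample, width):
    """Dual-port style lookup: two half-width reads combined by shift and add."""
    half = width // 2
    high, low = sample >> half, sample & ((1 << half) - 1)
    return (lut_half[high] << half) + lut_half[low]


def build_even_lut(coeff, width):
    """Even-only cell: products of even inputs, indexed by sample >> 1."""
    return [coeff * (2 * index) for index in range(2 ** (width - 1))]


def multiply_even_lut(even_lut, coeff, sample):
    """Adder forms even product + coeff; the sample LSB drives the multiplexer."""
    even_product = even_lut[sample >> 1]
    odd_product = even_product + coeff
    return odd_product if sample & 1 else even_product


if __name__ == "__main__":
    H, W = 13, 8  # illustrative coefficient and input width
    full = build_full_lut(H, W)
    half = build_full_lut(H, W // 2)
    even = build_even_lut(H, W)
    print(len(full), len(half), len(even))
    print(all(multiply_even_lut(even, H, s) == full[s] for s in range(2 ** W)))
```

In this model the even-only table holds half as many entries as the full table for the same index width, at the cost of one adder and one multiplexer per cell.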
3. 2-D Systolic structure for FIR filter for Reducing Latency and Area

As the order of the filter increases, the size of the memory increases, which increases the memory access time. The filter structure can then be decomposed to reduce the memory access time. The output response of the FIR filter is decomposed into concurrent recursive form, and this recursive algorithm is then mapped onto an implementable systolic architecture.

Figure 4
2-D DFG of an N-order FIR filter.

Figure 5
2-D Systolic structure of a 16-order filter for q=p=4.

3.1 Derivation of systolic architecture for a 2-D DFG of FIR filter

If the order N of the filter is a composite number, N = pq, then Eq. (3) can be expressed as the sum of q partial equations. In partial form [2], Eq. (3) can be written as follows.

$$ Y(z) = \sum_{m=0}^{q-1} y_m(z)\, z^{-mp} \tag{5} $$

where y_m(z) in Eq. (5) is given as follows

$$ y_m(z) = \left[ \sum_{n=0}^{p-1} h(mp+n)\, z^{-n} \right] X(z) \tag{6} $$

and the recursive form of each y_m(z) is given in Eq. (7)

$$ y_m(z) = X(z)\,[h(mp) + z^{-1}(h(mp+1) + z^{-1}(h(mp+2) + \cdots + z^{-1}(h(mp+p-1)) \cdots ))] \tag{7} $$

for m = 0, 1, 2, ..., q-1. The 2-D DFG for Eq. (5) is given in Fig. 4. Each row in the DFG represents Eq. (7), that is, a 1-D structure of the filter with p taps. All the multiplier nodes in a row are therefore replaced by a memory module, and each adder and delay pair is replaced by cell A of the systolic structure shown in Fig. 2. The 2-D systolic structure thus consists of q rows of 1-D systolic structures, whose outputs are added in a Pipelined Adder Tree (PAT) to give the output of the filter.

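The decomposition of Eqs. (5) to (7) can be verified numerically. The sketch below is a behavioural check in Python, not a model of the hardware timing: fir_direct and fir_decomposed are hypothetical names, the input is assumed to be zero before n = 0, and the delay z^(-mp) of row m is modelled as an index shift before the row outputs are summed (the role played by the PAT).

```python
def fir_direct(x, h):
    """Reference output y(n) = sum_k h(k) x(n-k), assuming x(n) = 0 for n < 0."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def fir_decomposed(x, h, q):
    """Split an N-tap filter into q rows of p = N / q taps and sum the rows."""
    if len(h) % q:
        raise ValueError("filter order must be divisible by q")
    p = len(h) // q
    total = [0] * len(x)
    for m in range(q):
        # Row m is a p-tap filter with coefficients h(mp) ... h(mp + p - 1).
        row = fir_direct(x, h[m * p:(m + 1) * p])
        # Delay the row output by m * p samples, then accumulate.
        for n in range(m * p, len(x)):
            total[n] += row[n - m * p]
    return total


if __name__ == "__main__":
    h = list(range(1, 17))                      # illustrative 16-tap filter
    x = [(7 * i) % 11 - 5 for i in range(40)]   # illustrative input
    print(all(fir_decomposed(x, h, q) == fir_direct(x, h) for q in (1, 2, 4, 8, 16)))
```

The check passes for every q that divides N, which reflects the fact that Eq. (5) is only a regrouping of the terms of Eq. (4).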
3.2 Decomposed structure of a 16-order filter for q=4 and q=8

The decomposed structure of a 16-order filter for q=4 is given in Fig. 5. For a 16-order filter with q=4, p is also equal to 4. Because the number of adders in a row is smaller than in the 1-D structure, the width of the adders used in cell A of the 2-D structure is reduced. The width of the adder increases as we proceed from the first cell A to the last cell in a row. For this example, with an 8-bit input sample, the last adder cell of the 1-D structure must be 23 bits wide, whereas the adder in the last cell of a row of the 2-D structure is 11 bits wide. The PAT block contains a series of adders. For q=4 there are two levels of adders in the PAT block, so the final adder in the PAT block is 13 bits wide. The reduction in adder width also reduces the width of the register used in cell A of the structure. Thus the overall area of the decomposed structure is reduced.

Figure 6
2-D Systolic structure of a 16-order filter for q=8 and p=2.

Similarly, for the 2-D structure of a 16-order filter with q=8 and p=2, the last adder in the PAT block is 12 bits wide. This structure reduces the area of the filter further than the decomposition level q=4. The 2-D structure for q=8 and p=2 is given in Fig. 6. The area for q=4 and q=8, for different filter orders and input sizes, is given in Table 2, which shows how much the area of the decomposed filter is reduced relative to the area occupied by the 1-D filter. For example, for a 128-order filter with a 32-bit input, the decomposed filter (q=8) gives an 89.8% area reduction compared with the 1-D (q=1) filter. Also, comparing the area (in terms of the number of occupied slices) of our proposed systolic architecture based filter with that of the basic systolic architecture based filter, we obtain a 5 to 7 percent reduction in area.
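The adder widths quoted in this section (23, 11, 13 and 12 bits) are consistent with a simple growth model, sketched below in Python. The model is an assumption inferred from those figures and is not stated explicitly in the text: the first adder in a row is taken to be as wide as the input sample, each further adder in a row adds one bit, and each PAT level adds one more bit. The function name adder_widths is hypothetical.

```python
def adder_widths(order, q, sample_bits):
    """Return (last row adder width, final PAT adder width) in bits.

    Assumed model: width grows by one bit per adder along a row of
    p = order / q taps, then by one bit per PAT level (log2(q) levels).
    """
    if q < 1 or q & (q - 1) or order % q:
        raise ValueError("q must be a power of two that divides the order")
    p = order // q
    row_last = sample_bits + (p - 1)
    pat_levels = q.bit_length() - 1  # log2(q) for a power of two
    return row_last, row_last + pat_levels


if __name__ == "__main__":
    for q in (1, 4, 8):
        print(q, adder_widths(16, q, 8))
```

Under this assumed model the 16-order, 8-bit example gives 23 bits for q=1, 11 and 13 bits for q=4, and a 12-bit final PAT adder for q=8, matching the values quoted in the text.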
The maximum frequency of operation of the filter for different orders and input lengths is also compared; the details are given in Table 3. From the table, Fmax for q=8 is higher than for q=4 in every case except the proposed 64-order filter with an 8-bit input. Decomposition level q=8 is therefore considered the best level of decomposition, as it achieves the smallest area and, of the two decomposed structures, the higher frequency of operation.

Table 2
Comparison of Area in terms of number of slices (S) of the filter for different input sample size and different order of the filter, at different decomposition levels.

Order  Level  8-bit         8-bit     16-bit        16-bit    32-bit        32-bit
              Conventional  Proposed  Conventional  Proposed  Conventional  Proposed
32     q=1    263           238       680           617       982           1041
32     q=4    54            53        138           129       195           180
32     q=8    02            02        47            29        74            65
64     q=1    999           957       1484          1538      2345          2360
64     q=4    119           75        201           250       572           490
64     q=8    54            61        123           132       273           246
128    q=1    3096          3545      4398          3765      6064          5755
128    q=4    293           253       588           603       905           840
128    q=8    133           114       344           316       623           587

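The area figures of Table 2 can be turned into percentage reductions with a few lines of arithmetic. The Python sketch below uses only the slice counts of the 128-order, 32-bit row group of Table 2; the dictionary layout and the function name reduction_percent are illustrative choices.

```python
# Slice counts for the 128-order filter with 32-bit input, copied from Table 2.
SLICES_128_32BIT = {
    "conventional": {1: 6064, 4: 905, 8: 623},
    "proposed": {1: 5755, 4: 840, 8: 587},
}


def reduction_percent(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    proposed = SLICES_128_32BIT["proposed"]
    conventional = SLICES_128_32BIT["conventional"]
    # Decomposed (q=8) against 1-D (q=1) for the proposed filter.
    print(round(reduction_percent(proposed[1], proposed[8]), 1))
    # Proposed against conventional at each decomposition level.
    for q in (1, 4, 8):
        print(q, round(reduction_percent(conventional[q], proposed[q]), 1))
```

For this configuration the first figure reproduces the 89.8% reduction quoted for q=8, and the proposed filter uses between about 5 and 7 percent fewer slices than the conventional one at each decomposition level.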

3.3 Structure of Pipelined adder tree for various decomposition levels

The detailed structures of the PAT for q=4 and q=8 are given in Fig. 7(a) and (b) respectively. For decomposition level q=4 the PAT has two levels of adders; including the first level of adders in the filter block, there are three levels of adders. These adder levels determine the latency of the filter. Similarly, for q=8 there are four levels of adders: three in the PAT block and one in the filter block. The latency of the filter for different orders and decomposition levels is given in Table 4, where it is also compared with the latency of the conventional memory-based systolic FIR filter given in [19]. The latency of our proposed filter is lower than that of the filter structure given in [19] at all decomposition levels. Moreover, in the proposed filter the PAT block remains the same for all filter orders, so the latency of the decomposed filter at a given decomposition level is the same for all filter orders. This is a further advantage of the proposed structure over the filter structure proposed in [19]. The proposed decomposed structure can therefore be used for higher-order FIR filters with low area complexity, reduced latency and a high maximum frequency of operation.

Figure 7
Structure of PAT for different decomposition levels.
(a) q=4 (b) q=8

Table 3
Comparison of Fmax (MHz) of the filter for different input length and different order of the filter, at different decomposition levels.

Order  Level  8-bit         8-bit     16-bit        16-bit    32-bit        32-bit
              Conventional  Proposed  Conventional  Proposed  Conventional  Proposed
32     q=1    255.4         301.9     210.7         253       188.9         158.9
32     q=4    291.8         256.5     139.4         106.4     101.7         105.3
32     q=8    709.7         709.7     395.2         322.1     286.8         288
64     q=1    199.1         179.6     168.2         169.8     138.1         142.4
64     q=4    121.9         134       101           100.4     71.2          74.3
64     q=8    182.9         130       122           114.6     100           101.1
128    q=1    121.4         108.4     138.6         140.5     100.3         111.1
128    q=4    92.8          99.9      65.5          64.4      50.6          49
128    q=8    124.5         128.3     100.2         101.3     77.6          80.4

Table 4
Comparison of Latency of the filter for different order and different decomposition level.

Level  Order 16           Order 16  Order 32           Order 32  Order 64           Order 64  Order 128          Order 128
       Conventional [19]  Proposed  Conventional [19]  Proposed  Conventional [19]  Proposed  Conventional [19]  Proposed
q=1    16                 15.5      32                 31.5      64                 63.5      128                127.5
q=4    6                  2.5       10                 2.5       18                 2.5       34                 2.5
q=8    5                  3.5       7                  3.5       11                 3.5       19                 3.5

4. Conclusion

Memory-based 1-D and 2-D systolic architectures for FIR filters are derived from the data flow graph of the filter. The conventional memory-based multiplier used in the filter is replaced by our proposed memory-based multiplier. The proposed filter is implemented on a Xilinx Virtex-7 FPGA device using VHDL. Key performance metrics, namely the number of slices, the latency and the maximum frequency of operation, are compared with those of the conventional memory-based systolic FIR filter architecture given in [19]. Because of the PAT structure used in the 2-D structure of this work, the same latency is achieved for all filter orders. The latency of the proposed 2-D structure is much lower at every decomposition level than that of the structure proposed in [19]. The decomposed structure proposed for q=8 is a hardware-efficient structure that combines the smallest area with low latency and a high frequency of operation.

References

[1] K. K. Parhi. VLSI Digital Signal Processing Systems: Design and Implementation. New York: John Wiley & Sons, Inc (1999).
[2] S. K. Mitra. Digital Signal Processing: a Computer Based Approach. Boston: McGraw-Hill (2006).
[3] J. G. Proakis and D. G. Manolakis. Digital Signal Processing: Principles, Algorithms and Applications. Upper Saddle River, NJ: Prentice-Hall (1996).
[4] G. Mirchandani, R. L. Zinser, Jr., and J. B. Evans, “A new adaptive noise cancellation scheme in the presence of crosstalk [speech signals],” IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, 39(10), 681–694 (1995).
[5] D. Xu and J. Chiu, “Design of a high-order FIR digital filtering and variable gain ranging seismic data acquisition system,” In IEEE Proceedings Southeastcon’93, p. 6 (1993).
[6] H. T. Kung, “Why systolic architectures?,” IEEE Computer, 15(1), 37–45 (1982).
[7] R. Wyrzykowski and S. Ovramenko, “Flexible systolic architecture for VLSI FIR filters,” IEEE Proceedings-Computers and Digital Techniques, 139(2), 170–172 (1992).
[8] B. K. Mohanty and P. K. Meher, “Cost-effective novel flexible cell level systolic architecture for high throughput implementation of 2-D FIR filters,” IEEE Proceedings-Computers and Digital Techniques, 143(5), 436–439 (1996).
[9] B. K. Mohanty and P. K. Meher, “Novel flexible systolic mesh architecture for parallel VLSI implementation of finite digital convolution,” IETE Journal of Research, 44(6), 261–266 (1988).
[10] P. K. Meher, “High throughput and low-latency implementation of bit-level systolic architecture for 1D and 2D digital filters,” IEEE Proceedings-Computers and Digital Techniques, 146(2), 91–99 (1999).
[11] C. G. Caraiscos and K. Z. Pekmestzi, “Low-latency bit-parallel systolic VLSI implementation of FIR digital filters,” IEEE Trans. Circuits and Systems-II: Analog and Digital Signal Processing, 43(7), 529–534 (1996).
[12] P. K. Meher, T. Srikanthan, and J. C. Patra, “Scalable and modular memory-based systolic architectures for discrete Hartley transform,” IEEE Trans. Circuits Syst. I: Regular Papers, 53(5), 1065–1077 (2006).
[13] P. K. Meher. “Unified systolic-like architecture for DCT and DST using distributed arithmetic”, IEEE Trans. Circuits Syst. I, Regular, 53(5), 2656–2663 (2006).
[14] D. F. Chiper. “A systolic array algorithm for an efficient unified memory-based implementation of the inverse discrete cosine transform”, In IEEE Conf. Image Process, pp. 764–768 (1999).
[15] D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and T. Stouraitis. “Systolic algorithms and a memory-based design approach for a unified architecture for the computation of DCT/DST/IDCT/IDST”, IEEE Trans. Circuits Syst. I, Regular, 52(6), 1125–1137 (2005).
[16] P. K. Meher and M. N. S. Swamy. “New systolic algorithm and array architecture for prime-length discrete sine transform”, IEEE Trans. Circuits Syst. II, Exp. Briefs, 54(3), 262–266 (2007).
[17] P. K. Meher. “Memory-based hardware for resource-constraint digital signal processing systems”, In Proceedings of IEEE 6th International Conference on Information, Communications & Signal, pp. 1-4 (2007).
[18] C. S. Vinitha and R. K. Sharma. “Memory-Based VLSI Architectures for Digital Filters: A Survey”, In Proceedings of IEEE International Conference UPCON-2016, pp. 98-101 (2016).
[19] P. K. Meher. “Low-latency hardware-efficient memory-based design for large-order FIR digital filters”, In Proceedings of IEEE International Conference on Information, Communications & Signal (2007).
[20] S. A. White. “Applications of the distributed arithmetic to digital signal processing: A tutorial review”, IEEE ASSP Magazine, 6(3), 5–19 (1989).
[21] H.R. Lee, Jen and C.M. Liu. “On the design automation of the memory-based VLSI architectures for FIR filters”, IEEE Trans. Consumer Electronics, 39(3), 619–629 (1993).
[22] P. K. Meher. “New approach to look-up-table design and memory-based realization of FIR digital filter”, IEEE Transactions on Circuits and Systems I: Regular Papers, 57(3), 592-603 (2010).
[23] P. K. Meher. “LUT optimization for memory-based computation”, IEEE Transactions on Circuits and Systems II: Express Briefs, 57(4), 285-9 (2010).
[24] P. K. Meher, S. Chandrasekaran, and A. Amira. “FPGA realization of FIR filters by efficient and flexible systolization using distributed arithmetic”, IEEE Trans. Signal Process., 56(7), 3009–3017 (2008).
[25] C. S. Vinitha and R. K. Sharma. “A Novel Technique to optimize the LUT used in Memory based filter”, In Proceedings of IEEE International Conference in Electrical, Electronics, Computers, Communication, Mechanical and Computing (2018).
