
FPGA implementation of Discrete Fractional Fourier Transform

M.V.N.V. Prasad†1, K. C. Ray†2 and A. S. Dhar‡

† Department of Electronics and Communication Engineering, Indian Institute of Information Technology, Allahabad, Uttar Pradesh, India.
Email: 1 mvnvprasad@gmail.com, 2 kcr@iiita.ac.in

‡ Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur, West Bengal, India.
Email: asd@ece.iitkgp.ernet.in

Abstract– The fractional Fourier transform has attracted considerable attention for decades in signal and image processing applications. As the fractional Fourier transform and its discrete form have evolved, real-time computation of the discrete fractional Fourier transform has become essential in those applications. In this context, we propose a new hardware architecture for implementing the Discrete Fractional Fourier Transform (DFrFT) that requires a hardware complexity of O(4N), where N is the transform order. The proposed architecture has been simulated and synthesized using Verilog HDL, targeting an FPGA device (XLV5LX110T). The simulation results are very close to the results obtained using MATLAB. The results show that the architecture can operate at a maximum frequency of 217 MHz.

Keywords– Discrete Fractional Fourier Transform, Hardware Architecture, CORDIC and FPGA.

I. INTRODUCTION

The fractional Fourier transform [1], [2], [3] is an emerging mathematical tool with a wide range of signal [4] and image processing applications, such as biomedical signal detection [6], image registration [7], image encryption [5], security of registration data of fingerprint images [8], broadband beamforming of LFM signals [9], and moving target detection and location in spaceborne SAR.

Unlike the Discrete Fourier Transform (DFT), the Discrete Fractional Fourier Transform (DFrFT) has many definitions, such as the direct form, improved sampling type, linear combination type, eigenvector decomposition type [10], group theory type and impulse train type. Among these definitions, the eigenvector decomposition type is regarded as a legitimate definition [11] because it satisfies all the required properties: unitarity, index additivity, reduction to the DFT when the fractional value is one, and approximation of the continuous fractional Fourier transform.

To the authors' knowledge of the evolution of the fractional Fourier transform and its applications, no hardware architecture other than [12] is available for real-time implementation of the DFrFT. In this paper, a new hardware architecture for implementing the DFrFT based on eigenvector decomposition is proposed and implemented on an FPGA device for real-time applications.

The rest of this paper is organized as follows. Section II presents a brief review of the fractional Fourier transform and its discrete form. Section III describes the proposed hardware architecture. The results of the proposed implementation are discussed in Section IV. Finally, Section V concludes the paper with the future scope of this work.

II. FRACTIONAL FOURIER TRANSFORM

A. Continuous Fractional Fourier Transform

The generalized Fourier transform rotates the signal f(u) in the time-frequency plane [1] by the rotation angle α = aπ/2 (where 'a' is the fractional value) and is given by the following equation:

$$f_\alpha(v) = \int_{-\infty}^{\infty} f(u)\, K_\alpha(u,v)\, du \qquad (1)$$

where

$$K_\alpha(u,v) = \begin{cases} \sqrt{\dfrac{1 - j\cot\alpha}{2\pi}}\; e^{\,j\frac{u^2+v^2}{2}\cot\alpha \,-\, j u v \csc\alpha} & \text{if } \alpha \text{ is not a multiple of } \pi \\ \delta(u-v) & \text{if } \alpha \text{ is a multiple of } 2\pi \\ \delta(u+v) & \text{if } \alpha+\pi \text{ is a multiple of } 2\pi \end{cases}$$

Here 'v' is the variable in the ath-order fractional domain and 'u' is the variable in the zeroth-order fractional domain.

The kernel Kα(u,v) is decomposed as given in equation (2) in terms of Hermite-Gaussian functions [2], which are eigenfunctions of the Fourier transform. The decomposed kernel is

$$K_\alpha(u,v) = \sum_{k=0}^{\infty} \psi_k(v)\, e^{-j\alpha k}\, \psi_k(u) \quad \text{and} \quad \psi_k(u) = \frac{2^{1/4}}{\sqrt{2^k k!}}\, H_k\!\left(\sqrt{2\pi}\,u\right) e^{-\pi u^2} \qquad (2)$$

where ψk(u) is the kth-order Hermite-Gaussian function and Hk is the kth-order Hermite polynomial.

B. Discrete Fractional Fourier Transform

The discrete fractional Fourier transform was proposed in [10] using discrete Hermite-Gaussian functions; for N points it is given in equation (3).

$$F^\alpha[m,n] = \sum_{k=0}^{N-1} u_k[m]\, e^{-j\alpha k}\, u_k[n] \qquad (3)$$

where uk[n] is the kth discrete Hermite-Gaussian function. The discrete values of the continuous Hermite-Gaussian function ψk(v) are approximated by the eigenvectors of the commuting matrix S in [10]. The N-point DFrFT matrix for rotation angle α is defined [3] as
$$F^\alpha = \sum_{\substack{k=0 \\ k \neq N \text{ for } N \text{ odd} \\ k \neq N-1 \text{ for } N \text{ even}}}^{N} u_k[n]\, e^{-j\alpha k}\, u_k^T[n] = U E U^T \qquad (4)$$

where U is the discrete Hermite-Gaussian matrix, which consists of the discrete Hermite-Gaussian functions as in the following equation:

$$U = \begin{bmatrix} u_0[1] & u_1[1] & \cdots & u_{N-2}[1] & u_M[1] \\ u_0[2] & u_1[2] & \cdots & u_{N-2}[2] & u_M[2] \\ \vdots & \vdots & & \vdots & \vdots \\ u_0[N] & u_1[N] & \cdots & u_{N-2}[N] & u_M[N] \end{bmatrix}, \quad M = \begin{cases} N-1 & \text{for } N \text{ odd} \\ N & \text{for } N \text{ even} \end{cases} \qquad (5)$$

and 'E' is a diagonal matrix whose diagonal elements are the eigenvalues e^(-j0α), e^(-j1α), e^(-j2α), ..., e^(-j(N-2)α), e^(-jMα) of the DFrFT matrix Fα.

The response of an N-point DFrFT, fα[n], to N input samples f[n] with rotation angle α can be calculated as fα[n] = Fα f[n], i.e. fα(N×1) = U(N×N)*(E(N×N)*(U^T(N×N)*f(N×1))). Here * indicates matrix multiplication. For the proposed architecture, the matrix E is replaced with a column matrix C that contains the eigenvalues of the DFrFT for the given input angle α, and the middle matrix multiplication is replaced by an array multiplication. The modified expression is fα(N×1) = U(N×N)*(C(N×1)×(U^T(N×N)*f(N×1))), where '×' indicates array (element-wise) multiplication.

III. PROPOSAL OF DFRFT ARCHITECTURE

The proposed architecture is composed of three levels. The input data to be processed flows through all three serially connected levels, as shown in Fig. 1.

[Fig. 1: Block diagram of the DFrFT. The rotation angle (α) and input 'f' enter Level-I (blocks C and U1), followed by Level-II (block E) and Level-III (block U2), which outputs the rotated input 'fα'.]

Level-I performs two mathematical operations: calculation of the eigenvalues for the given input rotation angle, and calculation of the response of matrix U^T to the input samples f. These two operations are carried out by two blocks of Level-I named C and U1. This level passes the two computed results, matrix C and U^T*f, to Level-II, which multiplies the eigenvalues with the response of the U1 block and feeds the product C×U^T*f to Level-III. Level-III produces the rotated input samples fα = U*C×U^T*f as its output by multiplying its input with the Hermite-Gaussian matrix U.

If the input samples are complex (f = a+jb), the response of the U1 block has to be calculated separately for the real and imaginary parts, so two U1 blocks are needed. Similarly, for any type of input samples f, two U2 blocks are required to process the real and imaginary inputs of Level-III separately. For this reason the blocks U1 and U2 are drawn as multi-blocks in Fig. 1. Fig. 4 gives the data flow between these blocks in detail. The time period between two successive input samples f is the same as the time period between two successive output results fα. The rest of this section describes each level of the proposed architecture in detail.

Level-I:

In an N-point DFrFT, Level-I is partitioned into two parts. The first part calculates the eigenvalues for the given rotation angle (α) using a block named C, as shown in Fig. 2. This block receives an angle every N clock cycles and computes the corresponding N complex-conjugated eigenvalues. The results of block C for a given angle α are e^(j0α), e^(j1α), e^(j2α), ..., e^(j(N-2)α), e^(jMα), where M = N-1 for N odd and M = N for N even.

[Fig. 2: Calculation of eigenvalues. A counter (counting 0 to N-1 if N is odd; 0 to N-2, N if N is even) and the rotation angle α (through register R1) feed a multiplier; register R2 feeds the pipelined CORDIC, which calculates the sine and cosine values; registers R31 to RN1 and R32 to RN2 carry the real and imaginary parts to the outputs.]

The architecture for calculating the eigenvalues requires two clocks: clock1 (Clk), whose frequency is the same as the sampling frequency, and clock2 (Clkn), whose frequency is 1/N of that of clock1. With an active-high enable signal, the counter counts in the sequence ..., 0, 1, 2, ..., N-2, M, 0, 1, 2, .... The counter output is connected to a multiplier that takes the rotation angle as its other input through a register 'R1' clocked by clock2. The multiplier results 0, α, 2α, ..., (N-2)α, Mα (M = N-1 for N odd, M = N for N even) are fed to the pipelined CORDIC (CO-ordinate Rotation DIgital Computer) through another register 'R2'.

The CORDIC [15] calculates the cosine and sine values of its input angles, which are the real and imaginary parts of the complex-conjugated eigenvalues for the given rotation angle. The real and imaginary parts of the computed results pass to the real-part and imaginary-part output ports, respectively, through a set of registers as shown in Fig. 2. The need for these registers is explained at the end of the Level-I description.

The block 'U1', the second part of Level-I, multiplies the input values f with the matrix U^T. This part consists of a mod-N counter, N ROMs with N address locations each, N multipliers, N accumulators, one N-to-1 multiplexer and a set of buffers. The data flow in this part is shown in Fig. 3. Like block 'C', the 'U1' block operates with two clocks, clock1 (clk) and clock2 (clkn). The N rows of the matrix U^T are stored in the N ROMs. The arrangement of the rows of matrix U^T in the ROMs is shown in Table-I.
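The Level-I operations and the overall expression fα = U*(C×(U^T*f)) can be modelled in a few lines of Python. This is a behavioural sketch only, not the authors' Verilog design. It assumes NumPy, replaces the CORDIC with direct cosine and sine calls, and uses a random orthogonal matrix as a stand-in for the discrete Hermite-Gaussian matrix U (constructing U from the commuting matrix S of [10] is left out). The function names and the angle value are illustrative.

```python
import numpy as np

def eigen_indices(n):
    """Counter sequence 0, 1, ..., N-2, M with M = N-1 (N odd) or M = N (N even)."""
    m = n - 1 if n % 2 == 1 else n
    return np.array(list(range(n - 1)) + [m])

def block_c(alpha, n):
    """Model of block C: conjugated eigenvalues e^{+j k alpha} from cos and sin."""
    angles = eigen_indices(n) * alpha            # counter output times rotation angle
    return np.cos(angles) + 1j * np.sin(angles)  # CORDIC replaced by cos/sin calls

def block_u1(u, f):
    """Model of block U1: N multiply-accumulate units, one input sample per clock."""
    n = len(f)
    acc = np.zeros(n, dtype=complex)
    for t in range(n):           # clock1 cycle t delivers sample f[t]
        acc += u[t, :] * f[t]    # unit k multiplies f[t] by U^T[k, t] = U[t, k]
    return acc                   # equals U^T @ f after N cycles

def dfrft(u, f, alpha):
    """f_alpha = U * (C x (U^T * f)), with an element-wise product in the middle."""
    c = np.conj(block_c(alpha, len(f)))   # Level-II conjugates the output of block C
    return u @ (c * block_u1(u, f))

# Stand-in for U: a real orthogonal matrix (assumption, not the Hermite-Gaussian basis).
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
f = np.array([11 + 3j, 9 + 2j, 7 + 4j, 8 + 2j])   # test vector used in Section IV
alpha = 0.3                                       # arbitrary angle for illustration

E = np.diag(np.exp(-1j * eigen_indices(4) * alpha))
reference = U @ E @ U.T @ f      # direct evaluation of U E U^T f
result = dfrft(U, f, alpha)
```

Comparing `result` with `reference` checks that replacing the diagonal matrix E by the column C and an element-wise product leaves the output unchanged, which is the simplification the architecture relies on.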
TABLE-I
ARRANGEMENT OF THE ELEMENTS OF MATRIX "U^T" IN ROMS

| Address Location | ROM 1 | ROM 2 | . . | ROM N-1 | ROM N |
|---|---|---|---|---|---|
| 0 | U^T[R+1,1] | U^T[R+2,1] | . . | U^T[R-1,1] | U^T[R,1] |
| 1 | U^T[R+1,2] | U^T[R+2,2] | . . | U^T[R-1,2] | U^T[R,2] |
| : | : | : | . . | : | : |
| N-1 | U^T[R+1,N] | U^T[R+2,N] | . . | U^T[R-1,N] | U^T[R,N] |

U^T[k,l] indicates the element in the kth row and lth column of the U^T matrix; R is the value of N/2 rounded towards zero.

The ROMs are accessed with a ring counter with an active-high enable signal, as shown in Fig. 3.

[Fig. 3: The data flow for part II of Level-I (block U1). A counter addresses N ROMs (addresses 0 to N-1); N multipliers combine the ROM outputs with the input f; registers R21 to R2N feed N accumulators; registers R31 to R3N (Clkn) feed an N-to-1 MUX, followed by registers R4, R5, ..., R(Ci+1), producing f(n)*U^T.]

The data at corresponding address locations of the N ROMs are passed to N multipliers, which take the sampled input f[n] as their other input. At every clock1 cycle, the N multipliers multiply the sampled input with the output values of the corresponding N ROMs and forward the results to the N accumulators through registers, as shown in Fig. 3. Each of the N accumulators adds its input to its output value on every clock1 cycle. Once the accumulators have added their N sets of inputs, they send the resulting data to the next stage and are cleared to add N fresh sets of inputs. The N accumulator outputs pass through the N-to-1 multiplexer to a set of registers that operate on clock1 (clk). The multiplexer selection line is connected to the counter output. The multiplexer inputs are connected to the N accumulator outputs in such a way that the 1st, 2nd, 3rd, ..., Nth valid output values of the N-to-1 multiplexer are the 1st, 2nd, 3rd, ..., Nth accumulator output values.

Level-II has to execute a mathematical operation between the outputs of block C and block U1, so the computed results of block C and block U1 must reach Level-II at the same time. However, the latency of block C varies with the number of pipeline stages used in the CORDIC, and the latency of block U1 depends on the value of N. To keep the same latency for both blocks, a set of registers must be inserted in either block C or block U1. The number of registers depends on the values of N and Ci, where Ci is the number of pipeline stages used in the CORDIC. If N > Ci-1, then N+1-Ci registers have to be added in block C, no register set is needed in block U1, and the latency is L = N+3. If N < Ci-1, then Ci-(N+1) registers have to be added at the output of the multiplexer in block U1, no register set is needed in block C, and the latency is L = Ci+2. If N = Ci-1, the latency of both blocks is the same, no register set is required in either block, and the latency is L = N+3 = Ci+2. The data flow from this block to the next blocks is shown in Fig. 4.

[Fig. 4: Data flow diagram of the DFRFT. Real(f) and Imag.(f) feed two 'U1' blocks and the rotation angle α feeds block 'C'; their outputs (Real1, Imag.1, Real2, Imag.2) enter block E, whose outputs r1 to rN and i1 to iN feed the 'U2' blocks for the real and imaginary parts, producing Real(fα[n]) and Imag.(fα[n]).]

Level-II:

Level-II has a complex multiplier followed by two serial-in parallel-out shift registers and a set of 2N registers. This level receives the real and imaginary parts of the complex-conjugated eigenvalues from block C through its Real2(R2) and Imag.2(I2) ports, respectively, and the response of block U1 to the input samples 'f' through its other two input ports, Real1(R1) and Imag.1(I1). The block diagram is shown in Fig. 5.

[Fig. 5: Complex multiplier with shifting operation. Inputs Real1(R1), Real2(R2), Imag.2(I2) and Imag.1(I1) drive four multipliers with registers R1 to R4, followed by an adder (Out1) and a subtractor (Out2), two serial-in parallel-out shift registers on Clk, and 2N registers on Clkn that deliver the real-part and imaginary-part outputs.]

The complex multiplier of this block differs from an ordinary complex-number multiplier. It multiplies the eigenvalues with the results of block U1 by taking the complex conjugate of the eigenvalues and the results of block U1 as inputs. For every clock cycle the complex
multiplier multiplies a new pair of complex numbers. The outputs out1 and out2 of the complex multiplier deliver the results (R1×R2) + (I1×I2) and (R2×I1) - (R1×I2), respectively.

The two resulting outputs, one the real part and the other the imaginary part, are connected to two serial-in parallel-out shift registers. Each shift register requires N-1 registers. Every N-1 clock cycles, the complex multiplier passes N-1 results to each serial-in parallel-out shift register. The shift registers are followed by a set of 2N registers. The first and (N+1)th registers are connected to the real and imaginary outputs of the complex multiplier, respectively. The remaining registers, 2 to N and N+2 to 2N, are connected to the N-1 outputs of the first shift register (corresponding to out1) and the N-1 outputs of the second shift register (corresponding to out2), respectively, as shown in Fig. 5. These registers operate on clock2, unlike the shift registers, which operate on clock1.

Level-III:

Level-III performs another matrix multiplication on the outputs of Level-II. The signal flow graph is shown in Fig. 6.

[Fig. 6: Signal flow graph for Level-III (block U2). The count input addresses N ROMs (addresses 0 to N-1); the data output of each ROM is multiplied by the corresponding input (Input 1 to Input N) and the products are summed in an adder.]

This level has N ROMs, each of which stores a column of the N×N matrix U. Because all ROMs are accessed using the counter output of block U1, the matrix elements are arranged in the ROMs as follows to keep the ROM output values synchronous with the multiplier input values. Address 0 of all ROMs contains the rth row of matrix U, where r is the remainder of (N+L)/N. Address 1 of the N ROMs stores the next row of the matrix, and the remaining locations follow the same sequence. Following this sequence, the (r-1)th memory location stores the first row of the matrix. When the counter counts k, the data in the kth memory locations of the N ROMs are multiplied with the output values of Level-II, as shown in Fig. 6. The N resulting multiplier outputs are added using N-1 adders and sent out as the input samples rotated in the time-frequency plane by the given angle α.

IV. RESULT AND DISCUSSION

The architecture discussed in the previous section was designed in Verilog HDL for N equal to four. The design was simulated using the Xilinx simulator with random input samples f(n) = [11+3i, 9+2i, 7+4i, 8+2i] as the test vector. For simplicity, and to make the outputs of the design easy to interpret, integer input values were chosen and represented with five bits (one sign bit and four integer bits). The internal precision of each block was chosen to minimize truncation error. The outputs are given in a 16-bit format (one sign bit, four integer bits and eleven fractional bits). Similarly, the format for the fractional value α uses the binary weights [-π, π/2^1, π/2^2, π/2^3, ..., π/2^(b-1)]; in this case b = 16. The hardware complexity of the proposed design for an Nth-order DFrFT is summarized in Table-II. The design is pipelined; hence it requires a latency period of L+N+1, where L is the latency of the CORDIC.

TABLE-II
HARDWARE REQUIREMENT FOR N-POINT DFRFT

| Component Name | Number of Components |
|---|---|
| N×16N-bit ROM | 2 |
| Multipliers | 4N+5 |
| Adders/Subtractors | 4N + adders in CORDIC |
| N to 1 Multiplexers | 2 |
| Counters | 2 |
| Registers | 10N+6+Ci+2×(\|N+1-Ci\|) |

The simulation output of ISE 10.1i is presented in Table-III, which shows that the Verilog HDL simulation results of the proposed design are close to the MATLAB simulation outputs. The simulated output with timing is shown in Fig. 7. It shows that the proposed architecture has a latency of 19 clock cycles (14 clock cycles for Level-I and 5 clock cycles for Level-II and Level-III together, as discussed in the previous section). Finally, the proposed design was synthesized using the Xilinx XST tool, targeting an FPGA device (XLV5LX110T) [15]. The synthesis results for the hardware are presented in Table-IV.
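The 16-bit output format described in this section (one sign bit, four integer bits, eleven fractional bits) and the multiplier count of Table-II can be checked with a short Python sketch. The two hexadecimal words are the first output sample as reported in Table-III. The helper names are illustrative, and two's-complement encoding of negative values is an assumption, since the paper does not state how the sign bit is used.

```python
def q4_11_to_float(word):
    """Decode a 16-bit word with 1 sign, 4 integer and 11 fractional bits."""
    if word & 0x8000:        # assumed two's-complement negative value
        word -= 1 << 16
    return word / 2**11      # eleven fractional bits

def multipliers(n):
    """Multiplier count for an N-point DFrFT, from Table-II."""
    return 4 * n + 5

# First output sample from Table-III: 5658 + 1CF1i (hexadecimal).
real = q4_11_to_float(0x5658)
imag = q4_11_to_float(0x1CF1)
print(real, imag)

# Multiplier counts for the synthesized N = 4 design and the N = 1024 comparison.
print(multipliers(4), multipliers(1024))
```

The decoded values agree with the decimal values listed next to the hexadecimal words in Table-III to within the printed precision, and the two multiplier counts match the figures reported in Table-IV and Table-V.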

Fig. 7: The simulation results of the proposed DFRFT architecture using the 'Xilinx ISE' simulator

TABLE-III
COMPARISON OF MATLAB AND XILINX-ISE SIMULATION RESULTS

| MATLAB Simulation Results, Decimal values | Xilinx-ISE Simulation Results of Proposed Architecture, Decimal (Hexadecimal) values |
|---|---|
| 10.5406+4.2159i | 10.7929+3.6176i (5658+1CF1i) |
| 9.0500+2.2435i | 9.0234+2.1074i (4830+10DCi) |
| 7.0371+3.6100i | 7.0151+3.8052i (381F+1E71i) |
| 8.0513+2.1945i | 8.0234+2.1074i (4030+10DCi) |

TABLE-IV
HDL SYNTHESIS REPORT - MACRO STATISTICS

| Component Name | Number of Components |
|---|---|
| 4×64-bit ROM | 2 |
| Multipliers | 21 |
| Adders/Subtractors | 56 |
| 4 to 1 Multiplexers | 2 |
| Counters | 2 |
| Registers | 71 |
| Accumulators | 9 |

The synthesis report in this table shows that the synthesized hardware requirement is approximately the same as the theoretical estimate. The timing report of this implementation shows that the proposed design can operate at a maximum frequency of 217 MHz. The proposed architecture has been compared with the architecture presented in [12] for N = 1024. The comparison of hardware and timing is highlighted in Table-V.

TABLE-V
COMPARISON OF PROPOSED ARCHITECTURE WITH [12] FOR 1024-POINT DFRFT

Hardware requirement (number of components)

| Component Name | Architecture in [12] | Proposed Architecture |
|---|---|---|
| Multipliers | 1048576 | 4101 |
| Adders/Subtractors | 1048576 | 4144 |
| Registers | 5242880 | 12280 |
| Multiplexers | 3072 (2:1 Mux), 3072 (4:1 Mux) | 2 (1024:1 Mux) |
| Counters | Not Mentioned | 2 |

Timing details

| Parameter | Architecture in [12] | Proposed Architecture |
|---|---|---|
| Maximum speed | 99.58 MHz | 217.39 MHz |
| Sampling frequency | 33.00 MHz | 217.39 MHz |

This shows that the proposed design is better in terms of hardware complexity and timing than the architecture presented in [12].

V. CONCLUSION

In this paper, a new hardware architecture for computing the DFrFT has been proposed. The architecture has been described in Verilog HDL, synthesized and implemented on the targeted FPGA device (XLV5LX110T). The simulation results are verified against MATLAB, and the implementation results are compared with the theoretical results and with the existing architecture presented in [12]. The implementation results show that the proposed design is suitable for most signal processing, image processing and communication systems. The proposed architecture and its implementation are fixed in terms of the transform length N, which restricts them to specific applications. A flexible architecture is required to meet the demands of all applications. In this context, the authors are working on a unified architecture suitable for all applications.

REFERENCES

[1] L. B. Almeida, "The Fractional Fourier Transform and Time-Frequency Representations", IEEE Trans. on Sig. Process., vol. 42, pp. 3084-3090, November 1994.
[2] V. Namias, "The Fractional Order Fourier Transform and its Application to Quantum Mechanics", Inst. Math. Appl., vol. 25, pp. 241-265, August 1980.
[3] S. C. Pei, C. C. Tseng, M. H. Yeh, and J. J. Shyu, "Discrete fractional Hartley and Fourier transforms", IEEE Trans. Circuits Syst. II, vol. 45, pp. 665-675, 1998.
[4] H. M. Ozaktas, B. Barshan, D. Mendlovic, L. Onural, "Convolution, filtering, and multiplexing in fractional Fourier domains and their relation to chirp and wavelet transform", J. Opt. Soc. Am. A, vol. 11, pp. 547-559, February 1994.
[5] N. Zhou, T. Dong, "Optical image encryption scheme based on multiple parameter random fractional Fourier transform", 2009 Second Int. Symposium on Electronic Commerce and Security, pp. 48-51, 2009.
[6] Y. Zhang, Q. Zhang, Shaohua Wu, "Biomedical signal detection based on Fractional Fourier Transform", IEEE, ITAB 2008, pp. 349-352, May 2008.
[7] W. Pan, K. Qin, Y. Chen, "An Adaptable-Multilayer Fractional Fourier Transform Approach for Image Registration", IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 31, March 2009.
[8] R. Iwai, H. Yoshimura, "Security of registration data of fingerprint image with a server by use of the fractional Fourier transform", IEEE, ICSP2008 Proceedings, pp. 2070-2073, 2008.
[9] Wu Hai-zhai, Tao Ran, "Broadband Beamforming of LFM signal based on Fractional Fourier Transform", ICSP2008 Proceedings, pp. 296-298, 2008.
[10] C. Candan, M. A. Kutay, H. M. Ozaktas, "The Discrete Fractional Fourier Transform", IEEE Trans. on Sig. Process., vol. 48, pp. 1329-1337, May 2000.
[11] T. Ran, Z. Feng and W. Yue, "Research progress on discretization of fractional Fourier transform", Springer, Sci. China Ser. F-Inf. Sci., pp. 859-880, July 2008.
[12] P. Sinha, S. Sarkar, A. Sinha, D. Basu, "Architecture of a configurable Centered Discrete Fractional Fourier Transform Processor", IEEE Circuits and Systems, MWSCAS 2007, 50th Midwest Symposium, pp. 329-332, 2007.
[13] S. C. Pei, W. L. Hsue, J. J. Ding, "Discrete Fractional Fourier Transform Based on New Nearly Tridiagonal Commuting Matrices", IEEE Trans. on Signal Processing, vol. 54, pp. 3815-3828, October 2006.
[14] K. C. Ray and A. S. Dhar, "CORDIC-based unified VLSI architecture for implementing window functions for real time spectral analysis", IEE Proc.-Circuits Devices Syst., vol. 153, pp. 539-544, December 2006.
[15] Xilinx, "Virtex-5 FPGA User Guide", UG190 (v4.7), May 1, 2009.
