
Circuits, Systems, and Signal Processing

https://doi.org/10.1007/s00034-019-01136-8

50 Years of FFT Algorithms and Applications

G. Ganesh Kumar1 · Subhendu K. Sahoo1 · Pramod Kumar Meher2

Received: 12 October 2018 / Revised: 3 May 2019 / Accepted: 4 May 2019


© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract
The fast Fourier transform (FFT) algorithm was developed by Cooley and Tukey
in 1965. It reduced the computational complexity of the discrete Fourier transform
significantly, from O(N^2) to O(N log_2 N). The invention of the FFT is considered a
landmark development in the field of digital signal processing (DSP), since it
expedited DSP algorithms to the point that real-time digital signal processing became
possible. During the past 50 years, many researchers have contributed to
advancements in FFT algorithms to make them faster and more efficient in order
to match the requirements of various applications. In this article, we present a
brief overview of the key developments in FFT algorithms along with some popular
applications in speech and image processing, signal analysis, and communication
systems.

Keywords DFT · FFT algorithms · Computational complexity · Digital signal processing

1 Introduction

The discrete Fourier transform (DFT) is the most widely used tool in digital signal processing (DSP) systems. It has an indispensable role in many applications such as speech, audio, and image processing, signal analysis, communication systems, and many others. It maps a time domain sequence to a frequency domain sequence of the same length, while the inverse discrete Fourier transform (IDFT) performs the opposite mapping.


The brute-force computation of the DFT of length N requires O(N^2) multiplications,
and we need to use DFTs of length 128 or more for many practical applications.
The computational requirement in that case becomes more than 10^4 operations. Due to such a high
computational requirement, brute-force computation of the DFT could not be used
for real-time and online DSP applications until 1965, when Cooley and
Tukey [19] developed the famous fast Fourier transform (FFT) algorithm. It reduced
the operation count of the DFT from O(N^2) to O(N log_2 N) for a DFT
of length N. During the last 50 years, innovations in algorithms and architectures
have made remarkable progress in the efficiency of FFT computation.
The main objective of this article is to provide a brief review of the developments in
FFT algorithms. There is a very large number of research articles on the FFT available in
the literature, and it is not possible to cover all of that work here. Therefore, we focus on the
important developments in FFT algorithms. In this article, we have classified the FFT
algorithms into complex-valued FFTs (CFFTs) and real-valued FFTs (RFFTs) according
to the input values, along with special cases of FFTs. Finally, we discuss some important
applications of the FFT and the choice of FFT algorithms for specific applications.
The rest of this article is organized as follows. In Sect. 2, we discuss the significance of
the DFT, its computational complexity, and the historical perspective of the development of
the FFT from Gauss to the Cooley–Tukey algorithm. The key developments in FFT algorithms
are discussed in Sect. 3. The applications of the FFT in signal and image processing,
communications, discrete trigonometric transformations, and integer arithmetic are
discussed briefly in Sect. 4. The conclusions are presented in Sect. 5.

2 A Historical Perspective

In this section, we discuss the significance of the DFT and the computational complexity
(multiplications and additions) involved in its brute-force computation.
At the end of this section, we discuss the divide-and-conquer approach, which can
be used to reduce the computational complexity of the DFT.

Discrete Fourier Transform:

The N-point DFT and IDFT are, respectively, calculated as

X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk},   k = 0, 1, 2, ..., N - 1,   (1)

and

x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) W_N^{-nk},   n = 0, 1, 2, ..., N - 1,   (2)

Fig. 1 A sine wave (amplitude versus time index)

where n is the time index and k is the frequency index. The twiddle factor W_N^{nk} can
be represented as:

W_N^{nk} = e^{-j 2\pi nk/N} = \cos\left(\frac{2\pi nk}{N}\right) - j \sin\left(\frac{2\pi nk}{N}\right)   (3)

In Eqs. (1) and (2), the data sequence x(n) may be complex, and the kth spectral
component X(k) is always complex. These two equations differ only in the sign of
the exponent of the twiddle factor W_N and in the scale factor 1/N. Therefore, the algorithms
for efficient computation of the DFT can be applied to the efficient computation of the
IDFT with simple and straightforward modifications.

Significance of the DFT

To illustrate the significance of the DFT, let us consider a 4-point DFT of samples of a
10 Hz sinusoidal signal, which is expressed as:

x(t) = \sin(2\pi \cdot 10 \cdot t)   (4)

For this sine wave, the fundamental period is T_0 = 0.1 s, as shown in Fig. 1. Let us take
the sample rate f_s = 40 Hz, i.e., the input is sampled every 1/f_s = T = 0.025 s.
Because N = 4, we need four input sample values, which are obtained as follows:

x(n) = x(nT) = \sin(2\pi \cdot 10 \cdot nT) = \sin\left(\frac{n\pi}{2}\right)

at n = 0,  x(0) = \sin(0) = 0
at n = 1,  x(1) = \sin(\pi/2) = 1
at n = 2,  x(2) = \sin(\pi) = 0
at n = 3,  x(3) = \sin(3\pi/2) = -1

Fig. 2 Illustration of the DFT for N = 4 of a sine wave. a Finite-length sequence x(n), b DFT magnitude, c DFT phase

The finite-length sequence x(n) is shown in Fig. 2a, where the x-axis represents the
values of n and the y-axis represents the amplitude. The twiddle factors for N = 4 are
defined as:

W_4^{nk} = \cos\left(\frac{2\pi nk}{4}\right) - j \sin\left(\frac{2\pi nk}{4}\right)   (5)

where nk = 0 to N − 1, i.e., 0 to 3. From Eq. (5), the W_4^{nk} values are: W_4^0 = 1,
W_4^1 = −j, W_4^2 = −1, W_4^3 = j.

The general equation for the 4-point DFT can be written as

X(k) = \sum_{n=0}^{3} x(n) W_4^{kn}
     = x(0) W_4^{(k)(0)} + x(1) W_4^{(k)(1)} + x(2) W_4^{(k)(2)} + x(3) W_4^{(k)(3)},   0 \le k \le 3   (6)

The DFT output values obtained for k = 0, 1, 2, 3 are

X(k) = [0, -2j, 0, 2j]

From the DFT output values, the sinusoidal signal can be represented in terms of its
magnitude and phase as shown in Fig. 2b, c, respectively. The value X(k) is said to
provide information about the kth frequency bin.
The frequency resolution can be obtained as:

\Delta f = \frac{1}{T_0} = \frac{1}{NT} = \frac{f_s}{N}   (7)

Since the fundamental period of the sinusoidal signal is 0.1 s, the frequency resolution
is 10 Hz. From Eq. (7), one can observe that to increase the frequency resolution, the
number of data points N must be increased [79].
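The 4-point example above can be checked numerically. The following Python/NumPy sketch (the array values and sample rate follow the example; the helper name dft_direct is ours) computes the DFT directly from Eq. (1) and prints the magnitude, phase, and frequency resolution:

```python
import numpy as np

def dft_direct(x):
    """Brute-force DFT of Eq. (1): X(k) = sum_n x(n) * W_N^{nk}."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape((N, 1))
    W = np.exp(-2j * np.pi * k * n / N)        # twiddle factors W_N^{nk}
    return W @ x

fs, N = 40.0, 4                                # sample rate and DFT length from the example
n = np.arange(N)
x = np.sin(2 * np.pi * 10 * n / fs)            # samples of the 10 Hz sine: [0, 1, 0, -1]

X = dft_direct(x)
print(np.round(X, 6))                          # approximately [0, -2j, 0, +2j]
print(np.abs(X), np.angle(X))                  # magnitude and phase per frequency bin
print("frequency resolution:", fs / N)         # 10 Hz, as in Eq. (7)
```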

Computational Complexity of DFT

Computation of each DFT component directly according to (1) requires N complex
multiplications and (N − 1) complex additions. Therefore, computing all N DFT
components requires a total of N^2 complex multiplications and N(N − 1) complex
additions.
The DFT of an N-point complex-valued input sequence x(n) can be expressed as

X(k) = X_R(k) + j X_I(k)
     = \sum_{n=0}^{N-1} [x_R(n) + j x_I(n)] \left[ W_{R,N}^{kn} + j W_{I,N}^{kn} \right]
     = \sum_{n=0}^{N-1} \left( x_R(n) W_{R,N}^{kn} - x_I(n) W_{I,N}^{kn} \right) + j \left( x_R(n) W_{I,N}^{kn} + x_I(n) W_{R,N}^{kn} \right)   (8)

where k = 0, 1, 2, ..., N − 1. Assuming that each complex multiplication in Eq. (8)
is realized by four real multiplications and two real additions, while each complex
addition is realized by two real additions, the direct computation of Eq. (8) requires
4N^2 real multiplications and 2N(2N − 1) real additions [24]. Moreover,
the computation of the DFT also requires a number of indexing and addressing operations
to fetch the input values, intermediate results, and complex coefficients W_N^{kn} and to
store the final results. For large values of N, the arithmetic complexity of the DFT is very
high. Therefore, different algorithms have been proposed to reduce the arithmetic
complexity for fast and efficient computation of the DFT.

A Historical Perspective of Fast Computation of DFT

The computational complexity of the DFT is substantially reduced by using the following
trigonometric symmetry and periodicity properties of the twiddle factor W_N^{kn}:

W_N^{k + N/2} = -W_N^{k}   (Symmetry Property)   (9)
W_N^{k + N} = W_N^{k}   (Periodicity Property)   (10)

These properties were known long before the inception of digital computation.
Heideman et al. [42] traced the first appearance of the FFT back to Gauss in the
year 1805. Gauss developed an algorithm to calculate the DFT which is equivalent
to one form of the Cooley–Tukey algorithm. However, Gauss never published his
algorithm outside his collected works. Later, Danielson and Lanczos [21] referred to
Runge [73] for the doubling algorithm they used in X-ray scattering problems. Their
algorithm showed how to reduce a DFT on 2N points to two DFTs on N points at the
cost of only slightly more than N additional operations. The complexity of these
algorithms was much less than N^2 but more than N log_2 N.
These early discoveries of the FFT went unnoticed until the publication of Cooley and
Tukey's article in 1965 [19]. This article presented an efficient algorithm based
on the divide-and-conquer approach to compute the DFT. The divide-and-conquer
approach is applied to the DFT recursively, such that a DFT of any size N = N_1 N_2
is computed in terms of smaller DFTs of sizes N_1 and N_2. If N can be factored into
N = N_1 N_2, the indices n and k in (1) for the N-point DFT can be rewritten as:

n = N_2 n_1 + n_2,   0 \le n_1 \le N_1 - 1,   0 \le n_2 \le N_2 - 1   (11a)
k = N_1 k_2 + k_1,   0 \le k_1 \le N_1 - 1,   0 \le k_2 \le N_2 - 1   (11b)

The index representation of (11) can be used in (1) to write X(k) as:

X(k) = \sum_{n_2=0}^{N_2-1} \left[ \left( \sum_{n_1=0}^{N_1-1} x(N_2 n_1 + n_2) W_{N_1}^{n_1 k_1} \right) W_N^{n_2 k_1} \right] W_{N_2}^{n_2 k_2}   (12)

where 0 ≤ k_1 ≤ N_1 − 1 and 0 ≤ k_2 ≤ N_2 − 1. The inner sum is an N_1-point DFT, W_N^{n_2 k_1} is the twiddle factor, and the outer sum over n_2 is an N_2-point DFT.

The calculation of X(k) according to (12) can be carried out in three steps: (i)
compute the N_1-point DFTs, (ii) multiply by the twiddle factors, and (iii) finally compute
the N_2-point DFTs. This three-step procedure can be applied successively until
N_1 = 2. The computational complexity of the DFT by this recursive divide-and-conquer
approach is reduced from O(N^2) to O(N log_2 N) operations [19]. This was
the major turning point for real-time DSP applications of the DFT.
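The three-step procedure of (12) can be illustrated with a short Python sketch. It is a minimal, unoptimized transcription of the index maps (11) and the decomposition (12) for one factorization N = N_1 N_2 (the function names are ours), and it can be checked against a direct DFT:

```python
import numpy as np

def dft(x):
    """Direct DFT, used only as a reference."""
    N = len(x)
    n = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(n, n) / N) @ x

def cooley_tukey_step(x, N1, N2):
    """One divide-and-conquer step of Eq. (12) for N = N1 * N2."""
    N = N1 * N2
    X = np.zeros(N, dtype=complex)
    for k1 in range(N1):
        for k2 in range(N2):
            acc = 0.0 + 0.0j
            for n2 in range(N2):
                # (i) N1-point DFT of the subsequence x(N2*n1 + n2)
                inner = sum(x[N2 * n1 + n2] * np.exp(-2j * np.pi * n1 * k1 / N1)
                            for n1 in range(N1))
                # (ii) twiddle factor, (iii) N2-point DFT over n2
                acc += inner * np.exp(-2j * np.pi * n2 * k1 / N) \
                             * np.exp(-2j * np.pi * n2 * k2 / N2)
            X[N1 * k2 + k1] = acc              # output index k = N1*k2 + k1, Eq. (11b)
    return X

x = np.random.randn(12) + 1j * np.random.randn(12)
print(np.allclose(cooley_tukey_step(x, 3, 4), dft(x)))   # True
```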

3 Advancements in FFT Algorithms

The basic principle of the divide-and-conquer approach leads to a variety of efficient
algorithms. As these algorithms improve performance in terms of computation
time, they are known as fast algorithms or fast Fourier transform algorithms. In
this section, we discuss three major classes of FFT algorithms. The first class consists of the
complex-valued FFTs, where the input sequence is complex valued. In the second class,
the input sequence is real valued; these are known as real-valued FFTs. At the end of
this section, we discuss special cases of the FFT.

3.1 Complex-Valued FFT Algorithms

In this subsection, we discuss some popular FFT algorithms that reduce the computational
complexity and hardware implementation complexity, starting with the basic radix-2
algorithms and followed by algorithms that improve the computational speed and reduce
the hardware complexity.

3.1.1 Radix-2 FFT Algorithms

The basic FFT algorithms are the decimation-in-time (DIT) and decimation-in-frequency
(DIF) radix-2 algorithms. These algorithms are applicable for computing DFTs whose
lengths are integer powers of two.

(i) Decimation-in-Time Radix-2 FFT Algorithm

This algorithm decomposes the time domain sequence {x(n)} into successively
smaller subsequences; therefore, it is called the decimation-in-time algorithm [18].
The principle of the radix-2 DIT FFT algorithm is illustrated in the following by considering
N = 2^M, where M = 1, 2, 3, .... Since N is an even integer, the N-point
input data can be split into two (N/2)-point subsequences {x_1(n)} and {x_2(n)}, which
correspond to the even- and odd-indexed samples of the input {x(n)}, respectively,
that is,

x_1(n) = x(2n)
x_2(n) = x(2n + 1),   n = 0, 1, 2, ..., N/2 - 1   (13)

Fig. 3 Length-4, DIT radix-2 FFT

Now the N-point DFT can be derived from two half-length DFTs by the decimation-in-time
process as follows:

X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk}
     = \sum_{n=0}^{N/2-1} x(2n) W_N^{2nk} + \sum_{n=0}^{N/2-1} x(2n+1) W_N^{(2n+1)k}
     = \sum_{n=0}^{N/2-1} x(2n) W_{N/2}^{nk} + W_N^{k} \sum_{n=0}^{N/2-1} x(2n+1) W_{N/2}^{nk}   (14)

Similarly, the (N/2)-point DFTs can be computed from pairs of (N/4)-point DFTs.
The decimation process is continued until only two-point DFTs remain. For power-of-two
length sequences, the decomposition of the N-point DFT into 2-point DFTs is
completed in M = log_2 N steps of decimation.
Figure 3 shows the decomposition of the 4-point radix-2 DIT FFT using simplified
butterflies; it involves two stages with two butterflies per stage. The input
data are in bit-reversed order and the DFT output is in normal order.
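A compact recursive transcription of the DIT recursion (14) is sketched below in Python (a didactic sketch, not an optimized implementation; the function name fft_dit is ours). The even- and odd-indexed halves are transformed separately and combined with the twiddle factors W_N^k:

```python
import numpy as np

def fft_dit(x):
    """Recursive radix-2 DIT FFT of Eq. (14); len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return np.asarray(x, dtype=complex)
    even = fft_dit(x[0::2])                      # (N/2)-point DFT of even samples
    odd = fft_dit(x[1::2])                       # (N/2)-point DFT of odd samples
    k = np.arange(N // 2)
    t = np.exp(-2j * np.pi * k / N) * odd        # twiddle factors W_N^k applied to odd DFT
    return np.concatenate([even + t, even - t])  # butterfly: X(k) and X(k + N/2)

x = np.random.randn(16) + 1j * np.random.randn(16)
print(np.allclose(fft_dit(x), np.fft.fft(x)))    # True
```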

(ii) Decimation-in-Frequency Radix-2 FFT Algorithm

This algorithm computes the DFT by decomposing the sequence of DFT coefficients
X(k) into smaller subsequences; hence, it is called the decimation-in-frequency
algorithm [18].
In the case of the radix-2 DIF FFT, the DFT computation is split into two parts such that
the first part involves the first N/2 data points and the second part involves the remaining
N/2 data points, as follows:

X(k) = \sum_{n=0}^{N/2-1} x(n) W_N^{nk} + \sum_{n=N/2}^{N-1} x(n) W_N^{nk}   (15)

Since W_N^{nk} = e^{-j 2\pi nk/N} and W_N^{kN/2} = (-1)^k, (15) is simplified as:

X(k) = \sum_{n=0}^{N/2-1} \left[ x(n) + (-1)^k \cdot x\left(n + \frac{N}{2}\right) \right] \cdot W_N^{nk}   (16)

The radix-2 DIF algorithm rearranges (16) into even-indexed and odd-indexed frequency
bins as

X(2k) = \sum_{n=0}^{N/2-1} \left[ x(n) + x\left(n + \frac{N}{2}\right) \right] \cdot W_{N/2}^{nk}   (17)

X(2k+1) = \sum_{n=0}^{N/2-1} \left[ x(n) - x\left(n + \frac{N}{2}\right) \right] \cdot W_N^{n} \cdot W_{N/2}^{nk}   (18)

According to (17) and (18), the even-indexed and odd-indexed frequency outputs
X(k) can be computed by a pair of N/2-length DFTs. The entire process involves
M = log_2 N stages of decimation, where each stage involves N/2 butterflies. Figure 4
shows the flow graph of the radix-2 DIF decomposition of a 4-point DFT computation.
In this flow graph, the input is in normal order and the DFT output is in bit-reversed
order. Computing the 4-point DFT in this way requires four complex multiplications and eight
complex additions.

Fig. 4 Length-4, DIF radix-2 FFT
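A recursive Python sketch of the DIF split in (17) and (18) is given below (a didactic sketch; the function name fft_dif is ours). Each stage forms the sum and the twiddled difference of the two halves of the input and recurses on sequences of half the length:

```python
import numpy as np

def fft_dif(x):
    """Recursive radix-2 DIF FFT of Eqs. (17)-(18); len(x) must be a power of two."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    n = np.arange(N // 2)
    a = x[:N // 2] + x[N // 2:]                                   # feeds X(2k), Eq. (17)
    b = (x[:N // 2] - x[N // 2:]) * np.exp(-2j * np.pi * n / N)   # feeds X(2k+1), Eq. (18)
    X = np.empty(N, dtype=complex)
    X[0::2] = fft_dif(a)                                          # even-indexed outputs
    X[1::2] = fft_dif(b)                                          # odd-indexed outputs
    return X

x = np.random.randn(8) + 1j * np.random.randn(8)
print(np.allclose(fft_dif(x), np.fft.fft(x)))                     # True
```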
The computation of an N-point DFT via the DIF or DIT FFT algorithm requires
(N/2) log_2 N complex multiplications and N log_2 N complex additions. For a radix-2
algorithm, the operation count can be further reduced by realizing each complex
multiplication with three real multiplications and three real additions (a 3/3 algorithm) [13].
When the 3/3 algorithm is used for the complex multiplications, the arithmetic complexity
of the radix-2 FFT is given by:

R_M = \frac{3N}{2} \log_2 N - 5N + 8   (19)

R_A = \frac{7N}{2} \log_2 N - 5N + 8   (20)

where R_M and R_A are the numbers of real multiplications and real additions, respectively,
required to compute an N-point DFT.

3.1.2 Radix-4 FFT Algorithm

This algorithm can be used when the DFT length N is a power of four (i.e., N = 4^M). Unlike
the radix-2 FFT algorithm, in the radix-4 algorithm the decimation at every step is
carried out by a factor of 4 [31].
In the first step of the radix-4 DIT FFT, the N-point input data are split into four
subsequences x(4n), x(4n+1), x(4n+2), and x(4n+3), where n = 0, 1, ..., N/4 − 1.
Then,


X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk}
     = \sum_{n=0}^{N/4-1} x(4n) W_{N/4}^{nk} + W_N^{k} \sum_{n=0}^{N/4-1} x(4n+1) W_{N/4}^{nk}
       + W_N^{2k} \sum_{n=0}^{N/4-1} x(4n+2) W_{N/4}^{nk} + W_N^{3k} \sum_{n=0}^{N/4-1} x(4n+3) W_{N/4}^{nk}   (21)

As the FFT length for radix-4 is N = 4^M, the algorithm requires M = log_4 N = (log_2 N)/2
stages of decimation, where each stage involves N/4 butterflies. The radix-4 FFT
butterfly structure is shown in Fig. 5. The decimation process of each stage is similar
to that of the radix-2 algorithm. Since W_N^0 = 1, each radix-4 butterfly involves three complex
multiplications and eight complex additions [83]. Therefore, the number of complex
multiplications is (3N/4) log_4 N. Compared with the radix-2 approach, this requires fewer
complex multiplications, although it uses the same number of complex
additions. The total operation count for the N-point radix-4 FFT is [24]:

R_M = \frac{9N}{8} \log_2 N - \frac{43N}{12} + \frac{16}{3}   (22)

R_A = \frac{25N}{8} \log_2 N - \frac{43N}{12} + \frac{16}{3}   (23)

Fig. 5 Radix-4 FFT butterfly

3.1.3 Radix-2^i and Higher Radix FFT Algorithms

The twiddle factor multiplicative complexity can be reduced by using higher radices
like radix-8 [9] or radix-16 [82], but the implementation complexity grows as the
radix becomes higher. In 1996, He and Torkelson [41] discussed the radix-2^2 and
radix-2^3 FFT algorithms. These algorithms have the same number of nontrivial¹
multiplications as the radix-4 and radix-8 algorithms, respectively. However, these
algorithms differ in the twiddle factors at different FFT stages while maintaining the
butterfly structure of the radix-2 algorithm. Following He and Torkelson [41], several
radix-2^i algorithms [29] were developed for higher radices, including the radix-2^4 [47],
modified radix-2^4 [47], radix-2^5 [16], and modified radix-2^5 [16] algorithms. The idea
of these radix-2^i algorithms is to obtain a simpler butterfly structure with lower
multiplicative complexity. The following subsection explains the derivation of the
radix-2^2 algorithm, which can be extended to higher radices.

(i) Radix-2^2 Algorithm

In [41], the authors proposed a radix-2^2 algorithm using an index decomposition
technique. To illustrate the derivation of this algorithm, the time and frequency indices
for i = 2 are decomposed as follows:

n = \frac{N}{2} n_1 + \frac{N}{4} n_2 + n_3,   n_1, n_2 = 0, 1,   n_3 = 0, 1, ..., \frac{N}{4} - 1

k = k_1 + 2 k_2 + 4 k_3,   k_1, k_2 = 0, 1,   k_3 = 0, 1, ..., \frac{N}{4} - 1   (24)

¹ Twiddle factor multiplications by 1, −1, j, and −j are trivial; other multiplications, such as by W_8^1 = 0.707 − j0.707 or W_16^1 = 0.923 − j0.382, are nontrivial.

Substituting (24) in (1), we get the following expression:

X(k_1 + 2k_2 + 4k_3) = \sum_{n_3=0}^{N/4-1} \sum_{n_2=0}^{1} \sum_{n_1=0}^{1} x\left(\frac{N}{2} n_1 + \frac{N}{4} n_2 + n_3\right) W_N^{\left(\frac{N}{2} n_1 + \frac{N}{4} n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)}
                     = \sum_{n_3=0}^{N/4-1} \sum_{n_2=0}^{1} B_{N/2}^{k_1}\left(\frac{N}{4} n_2 + n_3\right) W_N^{\left(\frac{N}{4} n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)}   (25)

where

B_{N/2}^{k_1}\left(\frac{N}{4} n_2 + n_3\right) = x\left(\frac{N}{4} n_2 + n_3\right) + (-1)^{k_1} x\left(\frac{N}{4} n_2 + n_3 + \frac{N}{2}\right)   (26)

The decomposition of the common twiddle factor in (25) is the key difference from the
decomposition of the radix-2 algorithm, and can be expressed as

W_N^{\left(\frac{N}{4} n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)} = (-j)^{n_2 (k_1 + 2k_2)} \, W_N^{n_3 (k_1 + 2k_2)} \, W_{N/4}^{n_3 k_3}   (27)

Substituting (27) in (25), the components of the N-point DFT can be obtained from
four DFTs of length N/4 as follows:

X(k_1 + 2k_2 + 4k_3) = \sum_{n_3=0}^{N/4-1} \left[ B_{N/4}^{k_1 k_2}(n_3) \, W_N^{n_3 (k_1 + 2k_2)} \right] W_{N/4}^{n_3 k_3}   (28)

where

B_{N/4}^{k_1 k_2}(n_3) = B_{N/2}^{k_1}(n_3) + (-1)^{k_2} (-j)^{k_1} B_{N/2}^{k_1}\left(n_3 + \frac{N}{4}\right)   (29)

An N-point DFT is thus decomposed into four length-(N/4) DFTs according to (28).
Each DFT of length N/4 can be further decomposed in the same way until
length-2 or length-4 DFTs are reached.
Figure 6 shows the flow graph of a 16-point radix-2^2 DIF FFT. It requires trivial
multiplications by W_16^4 = −j in the first and third stages, whereas it requires
nontrivial multiplications in the second stage. This flow graph differs from that
of the radix-2 algorithm, in which nontrivial twiddle factors are needed at the outputs
of every stage (except the last one). This algorithm has a great structural advantage
compared to the other algorithms (radix-2 and radix-4) when they are implemented in
pipeline architectures [47].

Fig. 6 Signal flow graph of 16-point radix-2^2 DIF FFT

(ii) Higher Radix Algorithms

The linear index decomposition scheme of the radix-2^2 algorithm can be extended to higher
radices, e.g., radix-2^3, radix-2^4, modified radix-2^4 (radix-M.2^4), radix-2^5, and
modified radix-2^5 (radix-M.2^5). The N-point FFT computation with a radix-2^i algorithm
involves log_2 N stages. Table 1 shows the twiddle factor at each stage used to compute
the N-point FFT for various radix-2^i algorithms (the number of stages shown in Table 1
is limited to eight; the pattern extends to log_2 N stages).
Table 1 Twiddle factor of different stages for N-point FFT flow graph

Algorithm      Stage 1   Stage 2   Stage 3   Stage 4   Stage 5   Stage 6   Stage 7   Stage 8
Radix-2^2      W_4       W_N       W_4       W_{N/4}   W_4       W_{N/16}  W_4       W_{N/64}
Radix-2^3      W_4       W_8       W_N       W_4       W_8       W_{N/8}   W_4       W_8
Radix-2^4      W_4       W_8       W_16      W_N       W_4       W_8       W_16      W_{N/16}
Radix-M.2^4    W_4       W_16      W_4       W_N       W_4       W_16      W_4       W_{N/16}
Radix-2^5      W_4       W_8       W_16      W_32      W_N       W_4       W_8       W_16
Radix-M.2^5    W_4       W_8       W_32      W_4       W_N       W_4       W_16      W_4

These algorithms have the same butterfly structure, but the twiddle factor multiplication
structure varies with the exponent i. The twiddle factor multiplications are classified as
trivial (W_4, which is multiplication by −j) or nontrivial [29].
From Table 1, one can observe that the radix-2^3 algorithm [41] requires a trivial
multiplication at the first stage and nontrivial multiplications at the second and third stages.
This sequence repeats every three stages to obtain the radix-2^3 algorithm. Radix-2^4
includes a trivial multiplication at the first stage and nontrivial multiplications in the next
three stages. In [47], a modified radix-2^4 algorithm has been proposed, which requires
fewer multiplications; in this modified algorithm, the twiddle factor W_16 of the third
stage is transferred to the second stage. A modified radix-2^5 algorithm is suggested
in [16], which is a combination of two decomposition methods of the radix-2^5 algorithm.
In [29], the authors have presented a decomposition of the radix-2^i algorithm which
further reduces the number of multiplications. The radix-2^i algorithms have the advantages
of lower multiplicative complexity and a structural advantage when used in pipeline
architectures.

3.1.4 Split-Radix FFT

The split-radix FFT (SRFFT) algorithm was introduced in [92], but it was first clearly
described in [23].
The split-radix algorithm decomposes an N-point DFT into one N/2-point DFT
and two N/4-point DFTs as:

X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk}
     = \sum_{n=0}^{N/2-1} x(2n) W_{N/2}^{nk} + W_N^{k} \sum_{n=0}^{N/4-1} x(4n+1) W_{N/4}^{nk} + W_N^{3k} \sum_{n=0}^{N/4-1} x(4n+3) W_{N/4}^{nk}

This algorithm makes use of both radix-2 and radix-4 (radix-2/4) decompositions
simultaneously, on the upper and lower halves of the signal flow graph, as shown in Fig. 7.
The arithmetic complexity of the SRFFT algorithm is given by [23]:

R_M = N \log_2 N - 3N + 4   (30)

R_A = 3N \log_2 N - 3N + 4   (31)

The SRFFT algorithm [23] requires fewer multiplications and additions than the
radix-2 and radix-4 algorithms. Following the SRFFT algorithm of [23], many
split-radix algorithms [22,81,94] were suggested to further reduce the number of
complex multiplications and additions over the radix-2, radix-4, or any higher
radix-based algorithms.
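The L-shaped split of the decomposition above can be written as a short recursion. The Python sketch below (a didactic sketch; the function name srfft and the recombination grouping are ours) computes one N/2-point sub-DFT on the even samples and two N/4-point sub-DFTs on the odd samples, and then combines them with the twiddle factors W_N^k and W_N^{3k}:

```python
import numpy as np

def srfft(x):
    """Recursive split-radix FFT; len(x) must be a power of two."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    if N == 2:
        return np.array([x[0] + x[1], x[0] - x[1]])
    U = srfft(x[0::2])                               # N/2-point DFT of even samples
    Z = srfft(x[1::4])                               # N/4-point DFT of x(4n+1)
    Zp = srfft(x[3::4])                              # N/4-point DFT of x(4n+3)
    k = np.arange(N // 4)
    t1 = np.exp(-2j * np.pi * k / N) * Z             # W_N^k  * Z(k)
    t2 = np.exp(-2j * np.pi * 3 * k / N) * Zp        # W_N^3k * Z'(k)
    X = np.empty(N, dtype=complex)
    X[k] = U[k] + (t1 + t2)
    X[k + N // 2] = U[k] - (t1 + t2)
    X[k + N // 4] = U[k + N // 4] - 1j * (t1 - t2)
    X[k + 3 * N // 4] = U[k + N // 4] + 1j * (t1 - t2)
    return X

x = np.random.randn(32) + 1j * np.random.randn(32)
print(np.allclose(srfft(x), np.fft.fft(x)))          # True
```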

Fig. 7 Split-radix FFT

Table 2 Number of real multiplications to compute a length-N complex DFT

N      Radix-2   Radix-4/Radix-2^2   Split-radix
16     24        20                  20
32     88        –                   68
64     264       208                 196
128    712       –                   516
256    1800      1392                1284
512    4360      –                   3076
1024   10,248    7856                7172

Table 3 Number of real additions to compute a length-N complex DFT

N      Radix-2   Radix-4/Radix-2^2   Split-radix
16     152       148                 148
32     408       –                   388
64     1032      976                 964
128    2504      –                   2308
256    5896      5488                5380
512    13,566    –                   12,292
1024   30,728    28,336              27,652

3.1.5 Computational Complexity for Complex-Valued FFT Algorithms

Tables 2 and 3 show a comparison of the number of real multiplications and real
additions required to compute an N-point DFT. From these tables, one can observe that the
split-radix FFT requires fewer arithmetic operations than the other
algorithms. However, the flow graph of this algorithm has an irregular structure
due to the mix of FFTs of different lengths in different parts.

3.2 Real-Valued FFT Algorithms

When the input sequence {x(n)} is real valued, the DFT components exhibit conjugate
symmetry, i.e., X(k) = X*(N − k). Therefore, we need to compute
only half the number of DFT components in this case. However, the FFT algorithms for the
computation of complex-valued input cannot be used directly to reduce the computational
complexity to half when we want to compute the DFT of real-valued input.
The FFT of real-valued data and the FFT of complex-valued data are generally referred to as the
real-valued FFT (RFFT) and the complex-valued FFT (CFFT), respectively.²
Efficient realization of the RFFT has received great attention due to its several
important and emerging applications in the areas of biomedical engineering and
healthcare, audio and video processing, time series analysis, and many others [80].
Several algorithms have therefore been proposed for RFFT computation. Real-valued
FFTs [9] provide area and speed improvements over CFFTs. The RFFT algorithms
are generally tailored for real-valued data by using the trigonometric symmetries and
periodicities [9]. In the following subsections, we first discuss different approaches
for the computation of the FFT of real-valued data.

3.2.1 Computation of the RFFT Using the CFFT

The simplest way of using the CFFT algorithm to compute the RFFT is to place the
real-valued sequence in the real part of the complex-valued input and to set the imaginary
part of the input values to zero [79]. This approach does not provide significant computational
savings over the CFFT, since the intermediate results become complex valued
just after the first stage, when the complex twiddle factors are multiplied. Therefore,
the doubling algorithm and the packing algorithm have been proposed to compute the RFFT [79].

(i) Doubling Algorithm

In this algorithm, a pair of real-valued input sequences is processed at a time [27]. The first
real-valued data sequence is used as the real part and the second as
the imaginary part of the complex-valued input sequence of the CFFT. The complex
input values thus obtained are expressed as:

x(n) = p(n) + j \cdot q(n)   (32)

where p(n) and q(n) are elements of the two real-valued data sequences. The N-point
CFFT of the complex input {x(n)} is then obtained as:

X(k) = P(k) + j \cdot Q(k)   (33)

² The FFT components in the case of the RFFT are also complex.



Since p(n) and q(n) are real-valued data, the following symmetry holds:

P^*(N - k) = P(k)
Q^*(N - k) = Q(k)   (34)

and hence the output sequence follows as:

X^*(N - k) = P(k) - j \cdot Q(k)   (35)

By using Eqs. (33) and (35), P(k) and Q(k) can be obtained as:

P(k) = \frac{1}{2}\left[ X(k) + X^*(N - k) \right]

Q(k) = \frac{j}{2}\left[ X^*(N - k) - X(k) \right]   (36)

In order to separate P(k) and Q(k) according to (36), 2(N − 1) extra additions over the
normal complex FFT are required. Using the 3/3 algorithm for the complex multiplications,
the RFFT requires (1/2)MN − (3/2)N + 2 multiplications and (3/2)MN − (1/2)N additions,
where N = 2^M [65]. This
algorithm requires almost half the arithmetic complexity of the CFFT algorithm.
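A minimal NumPy sketch of the doubling algorithm, assuming two real sequences of the same power-of-two length (the function name rfft_pair_doubling is ours), packs them into one complex CFFT and separates P(k) and Q(k) according to (36):

```python
import numpy as np

def rfft_pair_doubling(p, q):
    """DFTs of two real sequences p and q from a single complex FFT, per Eq. (36)."""
    N = len(p)
    X = np.fft.fft(p + 1j * np.asarray(q))           # one N-point CFFT of x(n) = p(n) + j q(n)
    Xc = np.conj(X[(-np.arange(N)) % N])             # X*(N - k), with k = 0 wrapping to X*(0)
    P = 0.5 * (X + Xc)                                # P(k) = [X(k) + X*(N-k)] / 2
    Q = 0.5j * (Xc - X)                               # Q(k) = j [X*(N-k) - X(k)] / 2
    return P, Q

p = np.random.randn(16)
q = np.random.randn(16)
P, Q = rfft_pair_doubling(p, q)
print(np.allclose(P, np.fft.fft(p)), np.allclose(Q, np.fft.fft(q)))   # True True
```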

(ii) Packing Algorithm

This is another approach, which computes the N-point FFT of a real-valued input using an
N/2-point CFFT [65]. It uses the even- and odd-indexed samples of the N-point real-valued input
sequence to form an (N/2)-point complex data sequence. This is called the packing algorithm, since
it packs the N-point real-valued sequence into an (N/2)-point complex-valued sequence.
The real-valued data can be represented in the form of complex data as:

z(n) = x(2n) + j \cdot x(2n + 1)   (37)

where n = 0, 1, 2, ..., N/2 − 1.
Let p(n) = x(2n) and q(n) = x(2n + 1); then their DFTs can be separated from the CFFT
output as in the doubling algorithm. Therefore, this approach also requires 2(N − 1) extra
additions to separate the outputs of the CFFT, as in the case of the doubling algorithm.
Moreover, it requires an additional stage to compute the outputs of the RFFT. The
corresponding RFFT requires (1/2)MN − (5/4)N multiplications and (3/2)MN − (1/4)N − 4
additions using the 3/3 algorithm for the complex multiplications [65].
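A sketch of the packing approach, assuming N is a power of two (the function name rfft_packing and the recombination step are ours, following the standard even/odd recombination with the twiddle factor e^{-j2πk/N}), computes the first N/2 + 1 bins of the RFFT from one N/2-point CFFT; the remaining bins follow from conjugate symmetry:

```python
import numpy as np

def rfft_packing(x):
    """First N/2+1 DFT bins of a real sequence x via an N/2-point complex FFT."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    Z = np.fft.fft(x[0::2] + 1j * x[1::2])               # N/2-point CFFT of packed data
    Zc = np.conj(Z[(-np.arange(N // 2)) % (N // 2)])     # Z*(N/2 - k)
    Ze = 0.5 * (Z + Zc)                                  # DFT of even samples
    Zo = -0.5j * (Z - Zc)                                # DFT of odd samples
    k = np.arange(N // 2)
    Xk = Ze + np.exp(-2j * np.pi * k / N) * Zo           # recombination stage, k = 0..N/2-1
    XN2 = Ze[0] - Zo[0]                                  # bin k = N/2
    return np.concatenate([Xk, [XN2]])

x = np.random.randn(16)
print(np.allclose(rfft_packing(x), np.fft.fft(x)[:9]))   # True
```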

3.2.2 FFT of Real-Valued Data

A reduction in the arithmetic complexity can be obtained by using specific algorithms,
such as the DIT FFT algorithm, for the computation of the RFFT. This can be achieved
by applying the conjugate symmetry property and computing only one-half of the
intermediate outputs in each stage, since the others can be obtained by conjugating
those intermediate values. This results in a lower arithmetic complexity for the radix-2
DIT FFT algorithm [80]. Assuming a 3/3 algorithm, the radix-2 DIT FFT for a
real-valued sequence requires (3/4)MN − (5/2)N + 4 multiplications and (7/4)MN − (7/2)N + 6
additions.
The radix-4 and higher radix algorithms [8] for real-valued inputs can be obtained
in a way similar to that of the radix-2 DIT FFT. The split-radix algorithm is more
efficient in terms of arithmetic complexity than the higher radix algorithms. It requires
only (1/2)MN − (3/2)N + 2 multiplications and (3/2)MN − (5/2)N + 4 additions [80]. However,
these algorithms are not valid for the DIF decomposition of the FFT because it is
not possible to apply the conjugate symmetry at each stage. In [75], an alternative
algorithm is proposed to obtain the same savings for the DIF decomposition. In [28],
the authors have proposed a modified radix-2 algorithm for the computation of the
RFFT which resolves the irregularities of the RFFT. This approach is valid for both
DIT and DIF decompositions and can be generalized to any power-of-two number of points.
In [6], the computation of the RFFT is based on a modified
radix-2 algorithm which removes the redundant operations from the flow graph. This
modified flow graph contains only real data paths instead of the complex data paths in a
regular flow graph. In [59], a mathematical formulation was presented for removing
the redundancies in the radix-2 DIT RFFT. This formulation regularizes the flow graph
in order to compute the folded RFFT with a simple control unit.

3.2.3 Fast Hartley Transform-Based Algorithm

The DFT of real-valued data can be computed from the Discrete Hartley Transform
(DHT) [12] of the same data. The DHT of a real-valued input sequence is defined as:

X(k) = \sum_{n=0}^{N-1} x(n) \left[ \cos\left(\frac{2\pi kn}{N}\right) + \sin\left(\frac{2\pi kn}{N}\right) \right]   (38)

for k = 0, 1, 2, ..., N − 1.
Unlike the DFT, the DHT takes real-valued input and provides real-valued output.
The absence of complex arithmetic makes the DHT faster than the DFT. Algorithms
similar to the radix-based FFTs can also be applied to the DHT computation; these are
called fast Hartley transform (FHT) algorithms. Generally, FHT algorithms involve the same
number of multiplications and (N − 2) more additions than the corresponding FFT algorithms.
The split-radix FHT algorithm requires (2N/3) log_2 N − 19N/9 + 3 + (−1)^M/9 multiplications
and (4N/3) log_2 N − 14N/9 + 3 + 5(−1)^M/9 additions.
An N-point DFT of real-valued data can be computed from the DHT of the same
data as follows:

Re(DFT(k)) = \frac{DHT(k) + DHT(N - k)}{2}

Im(DFT(k)) = \frac{DHT(k) - DHT(N - k)}{2}   (39)

Fig. 8 Quick discrete Fourier transform

3.2.4 Quick Discrete Fourier Transform

This algorithm computes the DFT via the Discrete Cosine Transform (DCT) and the Discrete
Sine Transform (DST). It decomposes the N-point DFT into an (N/2 + 1)-point DCT
and an (N/2 − 1)-point DST. The Quick DFT for 16-point data is shown in Fig. 8. It
computes the DCT and the DST separately, moving the complex operations to the last
stage. The arithmetic operations required by this algorithm to compute the N-point
DFT are as follows:

R_M = \frac{N}{2} \log_2 N - \frac{11}{8} N + 1   (40)

R_A = \frac{7}{4} N \log_2 N - 3N + 2   (41)

3.2.5 Computational Complexity for Real-Valued FFT Algorithms

Although most FFT algorithms are developed for complex-valued inputs, by taking
advantage of redundancies and trigonometric symmetries, the computational
complexity is reduced in all of these RFFT algorithms. The numbers of real multiplications
and real additions required for real-valued inputs are shown in Tables 4
and 5, respectively. If a CFFT is used directly for real inputs, it requires a higher arithmetic
complexity. The packing and doubling algorithms involve more additions than
the split-radix RFFT algorithm [80] for a real-valued input.
The split-radix FHT requires fewer multiplications and additions than the radix-2
RFFT for N greater than 16. However, it requires more multiplications and additions
than the split-radix RFFT.

Table 4 Number of real multiplications to compute the DFT for a real-valued input

N      CFFT direct   CFFT packing   CFFT doubling   Radix-2 RFFT   Split-radix RFFT   Split-radix FHT   Quick DFT
16     20            12             10              12             10                 12                11
32     68            40             34              44             34                 42                37
64     196           112            98              132            98                 124               105
128    516           288            258             356            258                330               273
256    1284          704            642             900            642                828               673
512    3076          1664           1538            2180           1538               1994              1601
1024   7172          3840           3586            5124           3586               4668              3713

Table 5 Number of real additions to compute the DFT for a real-valued input

N      CFFT direct   CFFT packing   CFFT doubling   Radix-2 RFFT   Split-radix RFFT   Split-radix FHT   Quick DFT
16     148           88             88              62             60                 64                66
32     388           228            224             170            164                166               186
64     964           556            544             442            420                416               482
128    2308          1308           1280            1082           1028               998               1186
256    5308          3004           2944            2586           2436               2336              2818
512    12,292        6780           6656            5978           5636               5350              6530
1024   27,652        15,100         14,848          13,658         12,804             12,064            14,580

The Quick DFT algorithm requires more real multiplications than the doubling algorithm.

3.3 Special Cases of the FFT Algorithms

The FFT algorithm can be optimized for some special cases, e.g., when only a part
of the output is desired, when there is a large number of zeros in the input, when the
input length is not a power of two, or when the input is multidimensional. In this subsection, we
discuss some special cases of FFT algorithms that are useful for specific applications
like Third Generation Partnership Project (3GPP) Long-Term Evolution (LTE) [66],
modern microscopy [77], and radar signal processing [1].

3.3.1 FFT Pruning

If the data sequence contains 2^l nonzero data points out of 2^m data points, where
m > l, then the corresponding FFT can be computed by means of the pruned FFT,
which saves time. A slight modification of the radix-2 DIT algorithm allows
a time saving of approximately (m − l)/m when 2^m points are transformed, of which only
2^l are nonzero [56].

Fig. 9 FFT pruning

The FFT pruning for l = 2, m = 3 is shown in Fig. 9. There are four nonzero data
points and three stages. Pruning is applied to the first stage, but the second and third stages
cannot be pruned [56]. When pruning is applicable, we compute only partial
butterflies instead of entire butterflies. In general, if there are 2^l nonzero data points
in a set of 2^m data points, then pruning can be applied in (m − l) stages.
FFT pruning is used when a large number of zeros in the input are
known in advance. However, it requires the number of nonzero input points to be a power of two.
The asymptotic run time of the pruned FFT is O(N log z), where N is the FFT length and z is the
number of nonzero inputs. The main drawback of pruning is that the data sequence must
be known in advance, so that one can find the nonzero input values.

3.3.2 Goertzel Algorithm

The Goertzel algorithm [35] is useful for computing only a few selected frequency
components. For example, specific applications like frequency shift keying demodulation
and recognition of dual-tone multi-frequency (DTMF) tones require only a
few DFT frequencies. The Goertzel algorithm can be derived by converting the DFT of
Eq. (1) into an equivalent form as a convolution, which can be efficiently implemented
as a digital filter, as shown in Fig. 10.

Fig. 10 Direct-form realization of the Goertzel algorithm (recursive coefficient 2cos(2πk/N), feed-forward coefficient −e^{-j2πk/N})

In the Goertzel algorithm, it is not required to evaluate X(k) at all N values of k as in
Eq. (1). It can evaluate X(k) for any S values of k, with each DFT value being computed
by a recursive system of the form of Fig. 10 with appropriate coefficients. In this case,
the total computation is proportional to NS. This algorithm is preferable when
S is small; however, when all N values of X(k) are required, the FFT algorithm is more
efficient than the Goertzel algorithm.
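A minimal sketch of the Goertzel recursion for a single bin k is given below (a standard second-order recursion; the function name goertzel is ours). The recursive part uses only the real coefficient 2cos(2πk/N); a single complex multiplication is applied once at the end:

```python
import numpy as np

def goertzel(x, k):
    """Goertzel recursion for a single DFT bin X(k) of the sequence x."""
    N = len(x)
    w = 2.0 * np.pi * k / N
    coeff = 2.0 * np.cos(w)                  # the only coefficient in the recursive part
    s_prev, s_prev2 = 0.0, 0.0
    for sample in x:                         # s(n) = x(n) + 2cos(w) s(n-1) - s(n-2)
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # one complex multiplication at the end: X(k) = e^{jw} s(N-1) - s(N-2)
    return np.exp(1j * w) * s_prev - s_prev2

x = np.random.randn(64)
print(np.allclose(goertzel(x, 5), np.fft.fft(x)[5]))   # True
```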

3.3.3 Fast Fourier Transform of Sparse Input

The computation time of the DFT generally grows with its size N. However, in many
applications, such as spectrum sensing and radar signal processing, only a few selected
outputs of the FFT are used. An algorithm that computes only those coefficients of the Fourier
transform is called the sparse FFT (SFFT) [39], whose runtime is sublinear in the signal
size N.
In [51], the first sublinear algorithm for the DFT was presented, which was followed
by several other sublinear algorithms [34,44]. Generally, the SFFT proceeds in three
steps: (1) identifying the frequency locations of the principal elements with large
magnitude (frequency bucketization); (2) estimating the coefficients of the elements
identified in the first step (frequency estimation); (3) removing the contribution of the Fourier
result computed in the first two steps from the original signal (collision resolution).
These three steps are repeated until all the sparse elements are found. Instead of
estimating all large coefficients, Hassanieh et al. [40] identified and estimated the k
largest coefficients and set the others to zero. The complexity is further reduced by
subtracting the reconstructed partial sparsity from the subsampled signals, such that
the complexity of the inverse FFT is also minimized [39]. A different scheme called
SFFT-DT (SFFT with downsampling in the time domain) was proposed in [43], which
downsamples the source signal first; all subsequent operations are then conducted
on the downsampled signals.
There are several versions of SFFT algorithms described in [39]. A hardware implementation
of an SFFT algorithm was recently published in [1]; however, it is implemented
for a specific signal size. Therefore, in [2], the authors have presented the hardware
implementation of a million-point SFFT design that provides configurable parameters.
The Robust Sparse Fourier Transform (RSFT), a modification of the Sparse
Fourier Transform (SFT), is presented in [89]; it extends the SFT advantages to
short-range radar signal processing. It is shown that the RSFT is robust in detecting
frequencies when exact knowledge of the signal sparsity is not available. The trade-off
between detection performance and computational complexity has also been investigated [89].

3.3.4 Algorithms for Non-Power-of-Two Lengths

In general, power-of-two FFT algorithms are traditionally used for most signal
processing applications. However, for certain applications like LTE and multimedia,
the DFT length is not a power of two. Such applications require fast
algorithms with low hardware complexity. In this subsection, we discuss the popular
Winograd Fourier transform algorithm (WFTA) and the prime factor algorithm (PFA),
which are used for non-power-of-two FFT lengths.

(i) Winograd Fourier Transform Algorithm

The WFTA requires the least number of multiplications among practical FFT algorithms
for moderate-length DFTs [78]. The WFTA uses Good's [36] mapping to convert a
length-(N_1 N_2) 1-D DFT into an N_1 × N_2 2-D DFT. The structure of the small-length
algorithms is used to nest all the multiplications at the center of the overall
algorithm. Unlike the Cooley–Tukey algorithm, the WFTA first computes the pre-additions
of all sub-DFTs, then computes all multiplications of the sub-DFTs, and finally computes
the post-additions of all sub-DFTs. Winograd showed that the DFT can be computed
with only O(N) multiplications. However, this algorithm requires more additions
and a large memory for longer lengths. Thus, the WFTA may only be efficient for
implementing small-size DFTs.

(ii) Prime Factor Algorithm

The twiddle factor multiplications that interconnect two decomposed DFTs can be
eliminated by using the prime factor algorithm for the computation of the
FFT. This is also called the Good–Thomas FFT algorithm [36]. This algorithm uses
a similar divide-and-conquer approach, where the sizes of the decomposed DFTs
are relatively prime. Two numbers are relatively prime, or co-prime, if their only common
divisor is one. In such cases, a special index mapping based on the Chinese remainder
theorem [50] is used to connect the decomposed DFTs. If the sequence length
can be factored into two mutually prime factors N = N_1 N_2, the DFT of Eq. (1) can
be written as:

X(k) = \sum_{n_1=0}^{N_1-1} \left[ \sum_{n_2=0}^{N_2-1} x(N_2 n_1 + N_1 n_2) W_{N_2}^{n_2 k_2} \right] W_{N_1}^{n_1 k_1}   (42)

where the inner sum is an N_2-point DFT and the outer sum is an N_1-point DFT. There is no
twiddle factor multiplication to interconnect the two decomposed DFTs.
This algorithm can be applied recursively as long as the decomposed DFTs can be
further factored into co-prime lengths [14,84].

3.3.5 Scaled DFTs

In certain applications like orthogonal frequency division multiplexing demodulation
and modern microscopy, the DFT length is q · 2^M, where q is an odd
number. However, fast algorithms for such sequence lengths generally require complex
computational structures and are less efficient than those for power-of-two length DFTs.
The zero-padding technique was often used to get the DFT of such sequence lengths; however,
this technique requires more computations. Therefore, a scaled DFT has been
proposed in [10], which can be flexibly used for length-q · 2^M DFTs. Several algorithms
[11,52] have been proposed thereafter to further reduce the arithmetic
complexity of the scaled DFT computation.

3.3.6 Multidimensional FFTs

The multidimensional (M-D) fast Fourier transforms (FFTs in 2D or more dimensions)
are used in many applications such as image processing and applied physics. These
applications require a large amount of computation.
The general form of the multidimensional DFT is as follows:

X(u_1, u_2, ..., u_m) = \sum_{v_1=0}^{N_1-1} \sum_{v_2=0}^{N_2-1} \cdots \sum_{v_m=0}^{N_m-1} W_{N_1}^{u_1 v_1} W_{N_2}^{u_2 v_2} \cdots W_{N_m}^{u_m v_m} x(v_1, v_2, ..., v_m)   (43)

where W_{N_k} = \exp(-2\pi j / N_k), u_k = 0, 1, ..., N_k − 1, N_k is the length of the kth dimension,
k = 1, 2, ..., m, and x(v_1, v_2, ..., v_m) is the complex input data sequence.
Equation (43) can be converted into m sets of one-dimensional FFTs as follows:

X(u_1, u_2, ..., u_m) = \sum_{v_1=0}^{N_1-1} W_{N_1}^{u_1 v_1} \sum_{v_2=0}^{N_2-1} W_{N_2}^{u_2 v_2} \cdots \sum_{v_m=0}^{N_m-1} W_{N_m}^{u_m v_m} x(v_1, v_2, ..., v_m)   (44)

This provides the simplest algorithm, where each one-dimensional FFT can be computed
by the Cooley–Tukey FFT [19]; this algorithm is known as the row–column
algorithm [87].
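For the two-dimensional case, the row–column algorithm simply applies one-dimensional FFTs along each dimension in turn. A short NumPy sketch (the function name fft2_row_column is ours) illustrates this:

```python
import numpy as np

def fft2_row_column(x):
    """2-D DFT by the row-column method: 1-D FFTs over rows, then over columns."""
    X = np.fft.fft(x, axis=1)     # 1-D FFT of every row
    X = np.fft.fft(X, axis=0)     # 1-D FFT of every column of the result
    return X

x = np.random.randn(8, 16) + 1j * np.random.randn(8, 16)
print(np.allclose(fft2_row_column(x), np.fft.fft2(x)))   # True
```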
Several algorithms have been proposed for multidimensional FFTs, such as the
vector-radix algorithms, the polynomial transform algorithms, and the split vector-radix
algorithms [38]. These algorithms reduce the complexity over the row–column algorithm.
In [15], a fast algorithm has been derived based on vector coding for multidimensional
integral points. This algorithm reduces the multiplication complexity and the
number of recursive stages without increasing the number of additions. However, the
most popular among these algorithms is the row–column decomposition algorithm,
due to its simple structure and ease of programming.

3.3.7 Quantum Fourier Transform

Moore's law [68] has held for several decades, but sustaining the pace
of scaling has become increasingly difficult in recent years. To meet the performance
and power requirements of exascale systems, quantum computers may be one of
the alternatives, as they could possibly offer exponential speedup for certain types of
calculations.
The Quantum Fourier Transform (QFT) is used in quantum computers and is
similar to the FFT [55], but the QFT operates on quantum bits instead of
vector elements. If 2^p elements are considered for both transforms, computing the FFT
and the QFT takes p·2^p operations and p(p + 1)/2 operations, respectively.
Comparing the operation counts, the QFT is more efficient than
the FFT. Nowadays, significant research attention is given to implementing QFT
algorithms [5,49].
Basic quantum computers have been developed in many labs across the world. Companies
such as Microsoft, IBM, and Google have all developed their own prototypes [67].
However, these prototypes are very simple, with only a small number of qubits.
Quantum hardware emulation is also critical for developing practical QFT algorithms
before large-scale quantum computers become viable. Therefore, a comprehensive
methodology to perform accurate mapping of quantum algorithms for FPGA emulation
purposes has been demonstrated through the emulation of QFT hardware in [5,49].

4 Applications of FFT

The domains of application of the FFT are very wide and diverse, and many new
applications are emerging. Here, we outline only a few important and popular applications.
We briefly discuss the use of the FFT, while detailed descriptions of the applications
and their implementations, along with the theoretical background,
examples, and limitations, are available in the cited references.
i. Digital Signal Processing Applications
Spectral analysis of signals is one of the most important and core applications of the FFT.
Figure 11 shows the block diagram of an FFT-based spectrum analyzer. The FFT analyzer
transforms time domain data to frequency domain data. By analyzing the spectra
of input signals in the frequency domain, unknown parameters such as the frequency,
amplitude, power spectrum, and phase of a signal can be observed that are
not easily detectable in the time domain waveform.
A physical interpretation of the spectrum [64] is obtained with the amplitude and
phase spectra as

Fig. 11 Block diagram of FFT spectrum analyzer (input signal → low-pass filter → sampler → analog-to-digital converter → FFT analyzer → output/display)



Fig. 12 Block diagram of an audio encoder (input time signal → FFT-based MDCT → scale factors → quantization → entropy coding → compressed output signal, with a perceptual model driving the scale factors)

  
|X(k)| = \sqrt{\mathrm{Re}(X(k))^2 + \mathrm{Im}(X(k))^2},   \arg(X(k)) = \arctan\left(\frac{\mathrm{Im}(X(k))}{\mathrm{Re}(X(k))}\right)   (45)

The amplitude spectrum indicates the signal power at different frequency bins. The power
spectrum of a random signal can be computed by squaring the absolute magnitude
of the complex DFT components of a segment of the signal as

P(k) = \frac{1}{N} |X(k)|^2   (46)

The power spectral estimation according to (46) is known as the periodogram
method [64]. However, it is not a good estimator, as its variance does not reduce to zero.
Therefore, to get a better estimate, the signal is divided into shorter segments and
the associated periodograms are averaged [90]. An alternative approach is suggested
in [48], where a window is applied to the autocorrelation estimates followed by the
DFT, which results in a smoother periodogram.
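A hedged sketch of the averaged-periodogram idea (segment the signal, compute the periodogram of Eq. (46) for each segment, and average; the function name averaged_periodogram and the non-overlapping, unwindowed segmentation are our simplifying assumptions) is given below:

```python
import numpy as np

def averaged_periodogram(x, seg_len):
    """Average of per-segment periodograms P(k) = |X(k)|^2 / N, Eq. (46)."""
    n_seg = len(x) // seg_len
    segs = np.reshape(x[:n_seg * seg_len], (n_seg, seg_len))
    X = np.fft.fft(segs, axis=1)                      # DFT of each segment
    return np.mean(np.abs(X) ** 2 / seg_len, axis=0)  # averaged power spectrum estimate

fs = 1000.0
t = np.arange(8192) / fs
x = np.sin(2 * np.pi * 125 * t) + 0.5 * np.random.randn(len(t))   # 125 Hz sine in noise
P = averaged_periodogram(x, 256)
print(np.argmax(P[:128]) * fs / 256)                  # 125.0, the sine frequency
```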
In [71], an FFT-based maximum likelihood estimation (MLE) was presented to
estimate the frequency and phase parameters. However, FFT-based MLE is
not efficient for real-time applications that require low latency and low complexity.
To overcome this problem, a linear regression-based maximum likelihood
estimator called the Zero-Crossing Phase and Frequency Estimator has been proposed
in [54].
The brute-force computation of convolution involves O(N^2) multiplications and
additions, which is huge for large values of the convolution length N. Convolution of
signals using the FFT involves significantly less complexity compared to the brute-force
approach. In the FFT-based approach, convolution is performed in the following three
steps: (i) the FFTs of the pair of input sequences are computed; (ii) the FFT outputs of
the pair of input sequences are multiplied point by point; and finally, (iii) the inverse FFT
of the product sequence is computed to obtain the convolved output.
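The three steps above translate directly into a few lines of NumPy (a sketch assuming linear convolution, so the sequences are zero-padded to the full output length before the transforms; the function name fft_convolve is ours):

```python
import numpy as np

def fft_convolve(a, b):
    """Linear convolution via FFT: pad, transform, multiply point-by-point, inverse transform."""
    L = len(a) + len(b) - 1                 # length of the linear convolution
    A = np.fft.fft(a, n=L)                  # (i) FFTs of the zero-padded inputs
    B = np.fft.fft(b, n=L)
    return np.real(np.fft.ifft(A * B))      # (ii) point-wise product, (iii) inverse FFT

a = np.random.randn(100)
b = np.random.randn(30)
print(np.allclose(fft_convolve(a, b), np.convolve(a, b)))   # True
```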
FFT-based spectrum analysis also plays an important role in speech processing and
speech recognition. The FFT and IFFT are used for vocal-tract characterization and
also for the extraction of information relating to excitation. In the audio encoder [33]
shown in Fig. 12, the input signal is processed by an FFT-based modified discrete
cosine transform (MDCT), and the transform coefficients are scaled by scale factors
that are determined by the masking threshold estimated according to the perceptual
model. The scaled coefficients are quantized and then entropy coded. Finally, a
compressed audio signal is obtained as the output of the encoder.

Fig. 13 Block diagram of FFT-based OFDM system (transmitter: serial-to-parallel conversion, constellation mapper, IFFT, parallel-to-serial conversion, and DAC; the signal then passes through the channel; receiver: ADC, serial-to-parallel conversion, FFT, constellation demapper, and parallel-to-serial conversion)

ii. Applications to Communication
The FFT is an important functional block in modern communication systems, specifically
for applications in Orthogonal Frequency Division Multiplexing (OFDM)
systems, such as digital broadcasting [45], Worldwide Interoperability for Microwave
Access (WiMAX) [26], the IEEE 802.11 standards [16], and Long-Term Evolution [66]. The
basic block diagram of an FFT-based OFDM system [25] is shown in Fig. 13.
The serial input data stream (d_0, d_1, ..., d_{n−1}) is converted into a parallel data
stream and mapped to symbols by the constellation mapper (Binary Phase Shift
Keying/Quadrature Phase Shift Keying/Quadrature Amplitude Modulation). Suppose the
symbols are mapped using Binary Phase Shift Keying and the output is represented
as X_0, X_1, ..., X_{n−1}. These symbols are fed to the IFFT, which generates
a digital OFDM symbol with N orthogonal subcarriers. The output of the IFFT is
x_0, x_1, ..., x_{n−1}. This output is then serialized and converted to an analog signal using
a digital-to-analog converter (DAC). The complete OFDM symbol x(t) is transmitted
through the channel. On the receiver side, the OFDM symbol is converted to a digital
signal using an analog-to-digital converter (ADC) and converted back to a parallel stream. An
FFT is used to decode the OFDM subcarriers, which are then fed to the constellation demapper.
Finally, the received signal is serialized to get the output data (d̂_0, d̂_1, ..., d̂_{n−1}).
The main benefits of OFDM in wireless communication systems are high bandwidth
efficiency, resistance to RF interference, and robustness to multipath fading.
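A toy baseband sketch of this signal path (our simplifying assumptions: BPSK mapping, an ideal noiseless channel, and no cyclic prefix or equalization) shows the role of the IFFT at the transmitter and the FFT at the receiver:

```python
import numpy as np

N = 64                                         # number of orthogonal subcarriers
bits = np.random.randint(0, 2, N)              # serial input data d_0 ... d_{N-1}
symbols = 2 * bits - 1                         # BPSK constellation mapping: 0 -> -1, 1 -> +1

tx = np.fft.ifft(symbols)                      # IFFT builds the time-domain OFDM symbol
rx = tx                                        # ideal channel (no noise, no fading)

est = np.fft.fft(rx)                           # FFT recovers the subcarrier symbols
bits_hat = (np.real(est) > 0).astype(int)      # BPSK demapping
print(np.array_equal(bits, bits_hat))          # True
```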
IFFT and FFT algorithms are used to efficiently implement the modulation and
demodulation in OFDM. In [17], the hardware implementation of a single-chip 2048
complex-point FFT for a digital audio broadcasting system has been presented. The
entire FFT chip [17] is designed with five stages of radix-4 and one stage
of radix-2 for the 2048-point FFT. Based on the radix-8 FFT algorithm, a high-performance
8K-point FFT processor architecture has been developed for OFDM as used in a Digital
Video Broadcasting–Terrestrial receiver [45]. A pipelined single-path delay feedback
radix-2^2 DIF FFT processor has been proposed in [20] for wireless local area network
applications, achieving the minimum of the area–power cost function.

Input
image A 2D-FFT

2D- Output
POC
IFFT
Input
image B 2D-FFT

Fig. 14 Block diagram of phase-only correlation

applications, that achieves the minimum of the area-power cost function. A modified
radix-25 512-point FFT processor has been presented in [16] that gives better per-
formance in terms of speed. An eight data-path pipelined approach has been used in
this FFT processor for high rate wireless personal area network (WPAN) applications.
A variable-length FFT processor has been proposed in [93] for LTE and WiMAX
systems, using radix-2/4/8 DIF and radix-3 FFT. The selection of the algorithm for
communication applications mainly depends on the size of the DFT.
iii. Image Processing Applications
The FFT is used in medical imaging [37] for image filtering, image analysis, and image
reconstruction. In the Fourier representation of images using the FFT, the spectral magnitude
and phase tend to play different roles. In [63], Oppenheim and Lim demonstrated the
importance of the phase spectrum over the magnitude spectrum. Further, in [62], it is shown
that the phase information plays a more important role than the magnitude. A block
diagram for the implementation of phase-only correlation (POC) using the 2D-FFT and
2D-IFFT [70] is shown in Fig. 14. Correlation between phase-only versions of the two
images to be aligned is used for image matching. Some important applications
based on FFT-based image matching include face recognition, iris recognition,
palm print recognition, fingerprint matching, and waveform matching. In [61], a new
image quality assessment algorithm based on the phase and magnitude of the two-dimensional
DFT has been proposed, which pointed out that both the amplitude and phase
spectra are required for perfect image reconstruction.
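A minimal sketch of POC-based image matching for estimating a translation between two images (our simplifying assumptions: a cyclic shift, no windowing, and a small epsilon to avoid division by zero) is given below:

```python
import numpy as np

def phase_only_correlation(a, b, eps=1e-12):
    """Phase-only correlation surface of two equally sized images a and b."""
    A = np.fft.fft2(a)
    B = np.fft.fft2(b)
    cross = A * np.conj(B)                              # cross spectrum
    poc = np.fft.ifft2(cross / (np.abs(cross) + eps))   # keep phase only, then 2D-IFFT
    return np.real(poc)

img = np.random.rand(64, 64)
shifted = np.roll(img, shift=(5, 12), axis=(0, 1))      # cyclically shifted copy
peak = np.unravel_index(np.argmax(phase_only_correlation(shifted, img)), (64, 64))
print(peak)                                             # peak at (5, 12): the applied shift
```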
Frigo et al. [27] developed the Fastest Fourier Transform in the West (FFTW), an FFT
library written in C. This library is used for computing the DFT in one or more
dimensions, for various input sizes, and for both real- and complex-valued input
data. Image processing applications make use of FFTW. However, today's applications
need to process huge data sets, which demands fast computation of the FFT. The
sparse FFT algorithm [32] addresses this problem by providing a sublinear complexity
that exploits sparsity in the Fourier domain (considering only the frequencies carrying
information). The fast computation makes the sparse FFT a promising tool for many
data-intensive applications such as 4D light fields [76] and 2D Magnetic Resonance
Spectroscopy (MRS) [91].
iv. Other Applications
A. FFT-based Long Integer Multiplication:
Long integer multiplication is popularly used in public-key cryptography algorithms
such as RSA (Rivest, Shamir, and Adleman) [72], which employ arithmetic with integers
having hundreds of digits. Long integer multiplication is also used in several other
applications where high-precision calculations are required and overflow is not desirable.
The schoolbook method of multiplication involves a time complexity of O(n^2)
to multiply an n-bit integer 'a' with an m-bit integer 'b'. In [74], Schonhage and
Strassen used a fast polynomial multiplication algorithm based on the FFT to compute
integer products in time O(n log_2 n log(log n)). The data flow of the Schonhage–Strassen
algorithm is shown in Fig. 15.

Fig. 15 Data flow of the Schonhage–Strassen algorithm using FFT and IFFT (the integers a and b are encoded as polynomial coefficient vectors p and q, zero-padded to p' and q', transformed by the FFT to P and Q, multiplied pointwise to give R, inverse transformed to r, and decoded to the product c = a × b)
The pair of integers a and b in polynomial representation can be written as

p(x) = a0 + a1 x + a2 x 2 + · · · + an−1 x n−1 (47)


q(x) = b0 + b1 x + b2 x 2 + · · · + bm−1 x m−1 (48)

where x is the base integer.


The above two equations can be represent in vector format as

p = [a0 , a1 , a2 , . . . , an−1 ] (49)


q = [b0 , b1 , b2 , . . . , bm−1 ] (50)

When the vectors are of different lengths, the shorter one is zero-padded until both have the
same length. In order to use a radix-2 FFT, the length must also be a power of two; therefore,
when the common length is not a power of two, further zeros are appended until the length
reaches the next power-of-two integer. Let p' and q' denote the vectors obtained by
zero-padding p and q, respectively.
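The complete flow of Fig. 15 can be illustrated with a short floating-point sketch, assuming NumPy; the function name, the decimal base, and the digit encoding are illustrative, and the actual Schönhage–Strassen algorithm performs the transforms with exact modular (number-theoretic) arithmetic rather than floating-point FFTs. The digit vectors are zero-padded to a power-of-two length large enough to hold all n + m − 1 product coefficients, multiplied point-wise in the transform domain, and the inverse FFT output is decoded back to an integer by carry propagation.

```python
import numpy as np

def fft_multiply(a, b, base=10):
    """Multiply two non-negative integers via FFT-based polynomial multiplication."""
    p = [int(d) for d in str(a)[::-1]]        # digits, least significant first
    q = [int(d) for d in str(b)[::-1]]
    n = 1
    while n < len(p) + len(q):                # next power of two >= n + m - 1
        n *= 2
    P = np.fft.fft(p, n)                      # zero-padded FFTs of both vectors
    Q = np.fft.fft(q, n)
    r = np.rint(np.real(np.fft.ifft(P * Q)))  # coefficients of the product polynomial
    result, carry = 0, 0
    for i, c in enumerate(r):                 # decode: propagate carries digit by digit
        c = int(c) + carry
        result += (c % base) * base ** i
        carry = c // base
    return result + carry * base ** len(r)

assert fft_multiply(123456789, 987654321) == 123456789 * 987654321
```

For operands with many thousands of digits, floating-point rounding eventually becomes unreliable, which is one reason the exact algorithm of [74] relies on number-theoretic transforms instead.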
Polynomial multiplication corresponds to the linear convolution of the coefficient vectors,
whereas the FFT computes circular convolution. Linear convolution can be obtained from
circular convolution either by choosing a transform length of at least n + m − 1 or by the
overlap-add or overlap-save approach. Based on modified overlap-save and overlap-add methods,
a new approach for linear convolution of real signals using the FFT has been proposed by
Narasimha [60].
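For long sequences, the frequency-domain convolution is usually applied block-wise. The following overlap-add sketch, assuming NumPy (the function name and block length are arbitrary illustrative choices, and this is the textbook overlap-add method rather than the modified algorithms of [60]), convolves each zero-padded block with the filter in the frequency domain and adds the overlapping tails to obtain the linear convolution.

```python
import numpy as np

def overlap_add_convolve(x, h, block_len=256):
    """Linear convolution of a long signal x with a short filter h via block FFTs."""
    m = len(h)
    n_fft = 1
    while n_fft < block_len + m - 1:          # FFT length >= L + M - 1 avoids circular wrap-around
        n_fft *= 2
    H = np.fft.rfft(h, n_fft)                 # filter spectrum, computed once
    y = np.zeros(len(x) + m - 1)
    for start in range(0, len(x), block_len):
        block = x[start:start + block_len]
        Y = np.fft.rfft(block, n_fft) * H     # point-wise multiplication in the frequency domain
        seg = np.fft.irfft(Y, n_fft)[:len(block) + m - 1]
        y[start:start + len(seg)] += seg      # overlapping tails add up to the linear result
    return y

x = np.random.rand(1000)
h = np.random.rand(32)
assert np.allclose(overlap_add_convolve(x, h), np.convolve(x, h))
```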
B. Applications of FFT to Large Dataset Input:
A large number of signal processing applications, especially in the areas of next-generation
wireless systems, radar signal processing, and spectrum sensing for cognitive radio, require
the real-time computation of FFTs involving large data sets (1K to 1024K points). For the
1K-point FFT, a new type of FFT hardware architecture called the serial commutator (SC) is
presented in [30]. The SC architecture uses a novel data management scheme based on circuits
for bit-dimension permutation of serial data. The
second-generation digital video broadcasting-terrestrial (DVB-T2) standard requires a
variable-length (1K to 32K) FFT core, and various FFT architectures for DVB-T2 receivers
have been proposed in [85,86].
In [7], the authors have presented a novel fully systolic VLSI FFT architecture for 4K complex
points, based on the combination of three consecutive radix-4 stages; the architecture is
extended to accomplish FFT computations of 16K, 64K, and 256K points. In [53], the authors have
discussed a very large FFT (VLFFT) that implements one-dimensional complex single-precision
floating-point FFTs of 16K to 1024K samples on 1, 2, 4, and 8 DSP cores of the TMS320C6678.
In [3], the authors have presented a hardware implementation of a million-point sparse FFT
design, which has been parameterized and developed in a modular fashion, enabling its use in a
wide variety of sparse FFT applications. In applications like radar target detection, the radar
returns are typically sparse in the target parameter space [88], which motivates the application
of the sparse FFT in radar signal processing.

C. Computation of Discrete Trigonometric Transformations:


The FFT also plays an important role in computing Discrete Trigonometric Transformations
such as the Discrete Sine Transform (DST), the Discrete Cosine Transform (DCT), the
Modified DCT (MDCT), and the Fast Hartley Transform. These transformations take real data
as input and produce real data as output.
DSTs are widely used in solving partial differential equations by spectral methods [57].
The DST of (N − 1) real-valued samples can be accomplished by computing a real FFT of
length 2N, which can be implemented by an FFT algorithm for real-valued data. In a similar
way, the DCT can also be computed using the FFT. Block processing with the conventional
DCT [4] over non-overlapping adjacent time windows affects the quality of the reconstructed
signal. In [69], a modified version of the DCT (the MDCT) has been presented; it uses
overlapping windows with time-domain aliasing cancellation and still provides perfect
reconstruction. In [46,58], N-point MDCTs are realized using various FFT algorithms
(split-radix, mixed-radix). These MDCTs are mainly used for signal compression.
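As an illustration of how such real transforms map onto FFTs of twice the length, the sketch below, assuming NumPy, computes an unnormalized DCT-II, X[k] = sum_{n=0}^{N-1} x[n] cos(pi (n + 1/2) k / N), from a single length-2N FFT of the symmetrically extended input; the mirror-extension formulation and the function name are illustrative assumptions, not a construction taken from the cited references.

```python
import numpy as np

def dct2_via_fft(x):
    """Unnormalized DCT-II of a real length-N sequence via one length-2N FFT."""
    N = len(x)
    y = np.concatenate([x, x[::-1]])                     # symmetric (mirror) extension
    Y = np.fft.fft(y)                                    # length-2N FFT
    k = np.arange(N)
    return 0.5 * np.real(np.exp(-1j * np.pi * k / (2 * N)) * Y[:N])

# Quick check against the direct O(N^2) definition
x = np.random.rand(8)
n = np.arange(8)
direct = np.array([np.sum(x * np.cos(np.pi * (n + 0.5) * k / 8)) for k in range(8)])
assert np.allclose(dct2_via_fft(x), direct)
```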

5 Conclusion

The beauty of the FFT lies in its efficient computation of the DFT: the direct computation of
an N-point DFT requires operations of the order of O(N^2), whereas the FFT involves only
O(N log2 N) operations. Several FFT algorithms have been developed during the last 50 years
and used in various applications in the broad areas of signal processing, communication,
and image processing. In this article, we have presented a brief overview of the important
advancements in FFT algorithms and their applications.

References
1. O. Abari, E. Hamed, H. Hassanieh, A. Agarwal, D. Katabi, A.P. Chandrakasan, V. Stojanovic, 27.4
A 0.75-million-point Fourier-transform chip for frequency-sparse signals, in 2014 IEEE International
Solid-State Circuits Conference Digest of Technical Papers (ISSCC) (IEEE, 2014), pp. 458–459
2. A. Agarwal, H. Hassanieh, O. Abari, E. Hamed, D. Katabi et al., High-throughput implementation of a
million-point sparse Fourier Transform, in 2014 24th International Conference on Field Programmable
Logic and Applications (FPL) (IEEE, 2014a), pp. 1–6
3. A. Agarwal, H. Hassanieh, O. Abari, E. Hamed, D. Katabi et al., High-throughput implementation of a
million-point sparse Fourier Transform, in 2014 24th International Conference on Field Programmable
Logic and Applications (FPL) (IEEE, 2014b), pp. 1–6
4. N. Ahmed, T. Natarajan, K.R. Rao, Discrete cosine transform. IEEE Trans. Comput. 100(1), 90–93
(1974)
5. M. Aminian, M. Saeedi, M.S. Zamani, M. Sedighi, FPGA-based circuit model emulation of quantum
algorithms, in 2008 IEEE Computer Society Annual Symposium on VLSI (IEEE, 2008), pp. 399–404
6. M. Ayinala, Y. Lao, K.K. Parhi, An in-place FFT architecture for real-valued signals. IEEE Trans.
Circuits Syst. II Express Briefs 60(10), 652–656 (2013)
7. K. Babionitakis, V.A. Chouliaras, K. Manolopoulos, K. Nakos, D. Reisis, N. Vlassopoulos, Fully
systolic FFT architecture for giga-sample applications. J. Signal Process. Syst. 58(3), 281–299 (2010)
8. G. Bergland, A radix-eight fast Fourier transform subroutine for real-valued series. IEEE Trans. Audio
Electroacoustics 17(2), 138–144 (1969)
9. G.D. Bergland, Numerical analysis: a fast Fourier transform algorithm for real-valued series. Commun.
ACM 11(10), 703–710 (1968)
10. G. Bi, Y.Q. Chen, Fast DFT algorithms for length N = q × 2^m. IEEE Trans. Circuits Syst. II Analog
Digit. Signal Process. 45(6), 685–690 (1998)
11. S. Bouguezel, M.O. Ahmad, M.S. Swamy, A new radix-2/8 FFT algorithm for length-q × 2^m DFTs.
IEEE Trans. Circuits Syst. I Regul. Pap. 51(9), 1723–1732 (2004)
12. R.N. Bracewell, Discrete Hartley transform. JOSA 73(12), 1832–1835 (1983)
13. O. Buneman, Inversion of the Helmholtz (or Laplace-Poisson) operator for slab geometry. J. Comput.
Phys. 12(1), 124–130 (1973)
14. C. Burrus, P. Eschenbacher, An in-place, in-order prime factor FFT algorithm. IEEE Trans. Acoust.
Speech Signal Process. 29(4), 806–817 (1981)
15. Z. Chen, L. Zhang, Vector coding algorithms for multidimensional discrete Fourier transform. J.
Comput. Appl. Math. 212(1), 63–74 (2008)
16. T. Cho, H. Lee, J. Park, C. Park, A high-speed low-complexity modified radix-2^5 FFT processor for
gigabit WPAN applications, in 2011 IEEE International Symposium on Circuits and Systems (ISCAS)
(2011), pp. 1259–1262. https://doi.org/10.1109/ISCAS.2011.5937799
17. J.-R. Choi, S.-B. Park, D.-S. Han, S.-H. Park, A 2048 complex point FFT architecture for digital audio
broadcasting system, in The 2000 IEEE International Symposium on Circuits and Systems, 2000.
Proceedings. ISCAS 2000 Geneva, vol. 5 (IEEE, 2000), pp. 693–696
18. W.T. Cochran, J.W. Cooley, D.L. Favin, H.D. Helms, R.A. Kaenel, W.W. Lang, G.C. Maling Jr., D.E.
Nelson, C.M. Rader, P.D. Welch, What is the fast Fourier transform? Proc. IEEE 55(10), 1664–1674
(1967)
19. J. Cooley, J. Tukey, An algorithm for the machine calculation of complex Fourier series. Math. Comput.
19(90), 297–301 (1965)
20. A. Cortés, I. Vélez, J. Sevillano, M. Turrillas, Fast Fourier Transform Processors: Implementing FFT
and IFFT Cores for OFDM Communication Systems (INTECH Open Access Publisher, London, 2012)
21. G.C. Danielson, C. Lanczos, Some improvements in practical Fourier analysis and their application to
X-ray scattering from liquids. J. Franklin Inst. 233(4), 365–380 (1942)
22. P. Duhamel, Implementation of “Split-radix” FFT algorithms for complex, real, and real-symmetric
data. IEEE Trans. Acoust. Speech Signal Process. 34(2), 285–295 (1986). https://doi.org/10.1109/
TASSP.1986.1164811
23. P. Duhamel, H. Hollmann, Split radix. FFT algorithm. Electron. Lett. 20(1), 14–16 (1984). https://doi.
org/10.1049/el:19840012
24. P. Duhamel, M. Vetterli, Fast Fourier transforms: a tutorial review and a state of the art. Signal Process.
19(4), 259–299 (1990)
25. V.K. Dwivedi, P. Kumar, G. Singh, A novel blind frequency offset estimation method for OFDM
systems, in International Conference on Recent Advances in Microwave Theory and Applications,
2008. MICROWAVE 2008 (IEEE, 2008), pp. 668–675
26. C.-P. Fan, M.-S. Lee, G.-A. Su, A low multiplier and multiplication costs 256-point FFT implementation
with simplified radix-2^4 SDF architecture, in IEEE Asia Pacific Conference on Circuits and Systems,
2006. APCCAS 2006 (2006), pp. 1935–1938. https://doi.org/10.1109/APCCAS.2006.342239
27. M. Frigo, S.G. Johnson, FFTW: an adaptive software architecture for the FFT, in Proceedings of the
1998 IEEE International Conference on Acoustics, Speech and Signal Processing, 1998, vol. 3 (IEEE,
1998), pp. 1381–1384
28. M. Garrido, K.K. Parhi, J. Grajal, A pipelined FFT architecture for real-valued signals. IEEE Trans.
Circuits Syst. I Regul. Pap. 56(12), 2634–2643 (2009)
29. M. Garrido, J. Grajal, M. Sanchez, O. Gustafsson, Pipelined radix-2^k feedforward FFT architec-
tures. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 21(1), 23–32 (2013). https://doi.org/10.1109/
TVLSI.2011.2178275
30. M. Garrido, S.-J. Huang, S.-G. Chen, O. Gustafsson, The serial commutator FFT. IEEE Trans. Circuits
Syst. II Express Briefs 63(10), 974–978 (2016)
31. W.M. Gentleman, G. Sande, Fast Fourier transforms: for fun and profit, in Proceedings of the November
7–10, 1966, Fall Joint Computer Conference (ACM, 1966), pp. 563–578
32. B. Ghazi, H. Hassanieh, P. Indyk, D. Katabi, E. Price, L. Shi, Sample-optimal average-case sparse
Fourier transform in two dimensions. arXiv preprint arXiv:1303.1209 (2013)
33. J.D. Gibson, Speech compression. Information 7(2), 32 (2016)
34. A.C. Gilbert, S. Muthukrishnan, M. Strauss, Improved time bounds for near-optimal sparse Fourier
representations, in Optics and Photonics 2005 (International Society for Optics and Photonics, 2005)
35. G. Goertzel, An algorithm for the evaluation of finite trigonometric series. Am. Math. Mon. 65, 34–35
(1958)
36. I.J. Good, The interaction algorithm and practical Fourier analysis. J. R. Stat. Soc. Ser. B (Methodol.)
20, 361–372 (1958)
37. M.N. Haque, M.S. Uddin, M. Abdullah-Al-Wadud, Y. Chung, Fast reconstruction technique for medical
images using graphics processing unit, in Signal Processing, Image Processing and Pattern Recognition
(2011), pp. 300–309
38. D. Harris, J.H. McClellan, D. Chan, H. Schuessler, Vector radix fast Fourier transform, in IEEE
International Conference on Acoustics, Speech, and Signal Processing, ICASSP’77, vol. 2 (IEEE,
1977), pp. 548–551
39. H. Hassanieh, P. Indyk, D. Katabi, E. Price, Nearly optimal sparse Fourier transform. CoRR
arXiv:1201.2501 (2012a)
40. H. Hassanieh, P. Indyk, D. Katabi, E. Price, Simple and practical algorithm for sparse Fourier transform,
in Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms (SIAM,
2012b), pp. 1183–1194
41. S. He, M. Torkelson, A new approach to pipeline FFT processor, in Proceedings of IPPS ’96, The 10th
International Parallel Processing Symposium, 1996 (1996), pp. 766–770. https://doi.org/10.1109/
IPPS.1996.508145
42. M.T. Heideman, D.H. Johnson, C.S. Burrus, Gauss and the history of the fast Fourier transform. Arch.
Hist. Exact Sci. 34(3), 265–277 (1985)
43. S.-H. Hsieh, C.-S. Lu, S.-C. Pei, Sparse fast Fourier transform by downsampling, in 2013 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP) (IEEE, 2013), pp.
5637–5641
44. M.A. Iwen, Combinatorial sublinear-time Fourier algorithms. Found. Comput. Math. 10(3), 303–338
(2010)
45. R.M. Jiang, An area-efficient FFT architecture for OFDM digital video broadcasting. IEEE Trans.
Consum. Electron. 53(4), 1322–1326 (2007)
46. S.G. Johnson, M. Frigo, A modified split-radix FFT with fewer arithmetic operations. IEEE Trans.
Signal Process. 55(1), 111–119 (2007)
47. O. Jung-Yeol, L. Myoung-Seob, New radix-2 to the 4th power pipeline FFT processor. IEICE Trans.
Electron. 88(8), 1740–1746 (2005)
48. S.M. Kay, S.L. Marple, Spectrum analysis: a modern perspective. Proc. IEEE 69(11), 1380–1419 (1981)
49. M. Khalil-Hani, Y. Lee, M. Marsono, An accurate FPGA-based hardware emulation on quantum
Fourier transform. Quantum 1, a1b3 (2015)
50. D. Kolba, T. Parks, A prime factor FFT algorithm using high-speed convolution. IEEE Trans. Acoust.
Speech Signal Process. 25(4), 281–294 (1977). https://doi.org/10.1109/TASSP.1977.1162973
51. E. Kushilevitz, Y. Mansour, Learning decision trees using the Fourier spectrum. SIAM J. Comput.
22(6), 1331–1348 (1993)
52. K. Li, W. Zheng, K. Li, A fast algorithm with less operations for length-N = q × 2^m DFTs. IEEE Trans. Signal
Process. 63(3), 673–683 (2015)
53. X. Li, E. Blinka, Very Large FFT for TMS320C6678 Processors (Texas Instruments, Dallas, 2015)
54. Y. Liao, Phase and Frequency Estimation–High-Accuracy and Low-Complexity Techniques. Ph.D.
thesis, Worcester Polytechnic Institute (2011)
55. H.-K. Lo, T. Spiller, S. Popescu, Introduction to Quantum Computation and Information (World Sci-
entific, Singapore, 1998)
56. J. Markel, FFT pruning. IEEE Trans. Audio Electroacoustics 19(4), 305–311 (1971). https://doi.org/
10.1109/TAU.1971.1162205
57. S.A. Martucci, Symmetric convolution and the discrete sine and cosine transforms. IEEE Trans. Signal
Process. 42(5), 1038–1051 (1994)
58. P.K. Meher, Efficient systolic implementation of DFT using a low-complexity convolution-like formu-
lation. IEEE Trans. Circuits Syst. II Express Briefs 53(8), 702–706 (2006)
59. P.K. Meher, B.K. Mohanty, S.K. Patel, S. Ganguly, T. Srikanthan, Efficient VLSI architecture for
decimation-in-time fast Fourier transform of real-valued data. IEEE Trans. Circuits Syst. I Regul.
Pap. 62(12), 2836–2845 (2015)
60. M.J. Narasimha, Modified overlap-add and overlap-save convolution algorithms for real signals. IEEE
Signal Process. Lett. 13(11), 669–671 (2006)
61. M. Narwaria, W. Lin, I.V. McLoughlin, S. Emmanuel, L.-T. Chia, Fourier transform-based scalable
image quality measure. IEEE Trans. Image Process. 21(8), 3364–3377 (2012)
62. X.S. Ni, X. Huo, Statistical interpretation of the importance of phase information in signal and image
reconstruction. Stat. Probab. Lett. 77(4), 447–454 (2007)
63. A.V. Oppenheim, J.S. Lim, The importance of phase in signals. Proc. IEEE 69(5), 529–541 (1981)
64. A.V. Oppenheim, R.W. Schafer, J.R. Buck et al., Discrete-Time Signal Processing, vol. 2 (Prentice-Hall,
Englewood Cliffs, 1989)
65. E. Oran Brigham, The Fast Fourier Transform and Its Applications (Prentice Hall, Englewood Cliffs,
1988)
66. S.-Y. Peng, K.-T. Shr, C.-M. Chen, Y.-H. Huang, Energy-efficient 128∼2048/1536-point FFT processor
with resource block mapping for 3GPP-LTE system, in 2010 International Conference on Green
Circuits and Systems (ICGCS) (IEEE, 2010), pp. 14–17
67. G. Popkin, Quest for qubits. Science 354(6316), 1090–1093 (2016). https://doi.org/10.1126/science.
354.6316.1090
68. G.E. Moore, Cramming more components onto integrated circuits. Read. Comput. Archit. 56 (2000)
69. J. Princen, A. Bradley, Analysis/synthesis filter bank design based on time domain aliasing cancellation.
IEEE Trans. Acoust. Speech. Signal Process. 34(5), 1153–1161 (1986)
70. K.R. Rao, D.N. Kim, J.J. Hwang, Fast Fourier Transform-Algorithms and Applications (Springer,
Berlin, 2011)
71. D. Rife, R. Boorstyn, Single tone parameter estimation from discrete-time observations. IEEE Trans.
Inf. Theory 20(5), 591–598 (1974)
72. R.L. Rivest, A. Shamir, L. Adleman, A method for obtaining digital signatures and public-key cryp-
tosystems. Commun. ACM 21(2), 120–126 (1978)
73. C. Runge, Zeit. f. Math. u. Phys 48, 443–456 (1903)
74. A. Schönhage, V. Strassen, Schnelle Multiplikation großer Zahlen. Computing 7(3–4), 281–292 (1971)
75. B.R. Sekhar, K. Prabhu, Radix-2 decimation-in-frequency algorithm for the computation of the real-
valued FFT. IEEE Trans. Signal Process. 47(4), 1181–1184 (1999)
76. L. Shi et al., Imaging Applications of the Sparse FFT. Ph.D. thesis, Massachusetts Institute of Tech-
nology (2013)
77. H. Shu, X. Bao, C. Toumoulin, L. Luo, Radix-3 algorithm for the fast computation of forward and
inverse MDCT. IEEE Signal Process. Lett. 14(2), 93–96 (2007)
78. H. Silverman, An introduction to programming the Winograd Fourier transform algorithm (WFTA).
IEEE Trans. Acoust. Speech Signal Process. 25(2), 152–165 (1977)
79. W.W. Smith, J.M. Smith, Handbook of Real-Time Fast Fourier Transforms, vol. 55 (IEEE Press, New
York, 1995)
80. H.V. Sorensen, D. Jones, M. Heideman, C. Burrus, Real-valued fast Fourier transform algorithms.
IEEE Trans. Acoust. Speech Signal Process. 35(6), 849–863 (1987)
81. D. Takahashi, An extended split-radix FFT algorithm. IEEE Signal Process. Lett. 8(5), 145–147 (2001)
82. D. Takahashi, A radix-16 FFT algorithm suitable for multiply-add instruction based on Goedecker
method, in International Conference on Multimedia and Expo, ICME’03, vol. 2 (IEEE, 2003), pp.
II–845
83. F.J. Taylor, G. Papadourakis, A. Skavantzos, A. Stouraitis, A radix-4 FFT using complex RNS arith-
metic. IEEE Trans. Comput. 100(6), 573–576 (1985)
84. C. Temperton, Implementation of a self-sorting in-place prime factor FFT algorithm. J. Comput. Phys.
58(3), 283–299 (1985)
85. M. Turrillas, A. Cortés, I. Vélez, J.F. Sevillano, A. Irizar, An FFT core for DVB-T2 receivers, in
16th IEEE International Conference on Electronics, Circuits, and Systems, 2009. ICECS 2009 (IEEE,
2009), pp. 120–123
86. M. Turrillas, A. Cortés, I. Vélez, J.F. Sevillano, A. Irizar, An area-efficient radix-2^8 FFT algorithm for
DVB-T2 receivers. Microelectron. J. 45(10), 1311–1318 (2014)
87. C. Van Loan, Computational Frameworks for the Fast Fourier Transform (SIAM, Philadelphia, 1992)
88. S. Wang, V.M. Patel, A. Petropulu, An efficient high-dimensional sparse Fourier transform. arXiv
preprint arXiv:1610.01050 (2016)
89. S. Wang, V.M. Patel, A. Petropulu, The robust sparse Fourier transform (RSFT) and its application in
radar signal processing. IEEE Trans. Aerosp. Electron. Syst. 53(6), 2735–2755 (2017)
90. P. Welch, The use of fast Fourier transform for the estimation of power spectra: a method based on
time averaging over short, modified periodograms. IEEE Trans. Audio Electroacoustics 15(2), 70–73
(1967)
91. L.P. Yaroslavsky, Fast transforms in image processing: compression, restoration, and resampling. Adv.
Electr. Eng. (2014). https://doi.org/10.1155/2014/276241
92. R. Yavne, An economical method for calculating the discrete Fourier transform, in Proceedings of the
December 9–11, 1968, Fall Joint Computer Conference, Part I (ACM, 1968), pp. 115–125
93. C. Yu, M.-H. Yen, Area-efficient 128- to 2048/1536-point pipeline FFT processor for LTE and mobile
WiMAX systems. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 23(9), 1793–1800 (2015)
94. W. Zheng, K. Li, K. Li, A fast algorithm based on SRFFT for length-N = q × 2^m DFTs. IEEE Trans. Circuits Syst.
II Express Briefs 61(2), 110–114 (2014)

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.
