Академический Документы
Профессиональный Документы
Культура Документы
Abstract—Year 2009 marks the completion of 50 years of the applications. CORDIC-based computing received increased at-
invention of CORDIC (COordinate Rotation DIgital Computer) tention in 1971, when John Walther [3], [4] showed that, by
by Jack E. Volder. The beauty of CORDIC lies in the fact that varying a few simple parameters, it could be used as a single
by simple shift-add operations, it can perform several computing
tasks such as the calculation of trigonometric, hyperbolic and
algorithm for unified implementation of a wide range of ele-
logarithmic functions, real and complex multiplications, division, mentary transcendental functions involving logarithms, expo-
square-root, solution of linear systems, eigenvalue estimation, nentials, and square roots along with those suggested by Volder
singular value decomposition, QR factorization and many others. [1]. During the same time, Cochran [5] benchmarked various al-
As a consequence, CORDIC has been utilized for applications in gorithms, and showed that CORDIC technique is a better choice
diverse areas such as signal and image processing, communication for scientific calculator applications.
systems, robotics and 3-D graphics apart from general scientific
and technical computation. In this article, we present a brief
The popularity of CORDIC was very much enhanced there-
overview of the key developments in the CORDIC algorithms and after primarily due to its potential for efficient and low-cost
architectures along with their potential and upcoming applica- implementation of a large class of applications which include:
tions. the generation of trigonometric, logarithmic and transcendental
elementary functions; complex number multiplication, eigen-
Index Terms—Arithmetic circuits, CORDIC, CORDIC algo-
rithms, digital signal processing chip, VLSI. value computation, matrix inversion, solution of linear systems
and singular value decomposition (SVD) for signal processing,
image processing, and general scientific computation. Some
I. INTRODUCTION other popular and upcoming applications are:
1) direct frequency synthesis, digital modulation and coding
for speech/music synthesis and communication;
OORDINATE Rotation DIgital Computer is abbreviated
C as CORDIC. The key concept of CORDIC arithmetic is
based on the simple and ancient principles of two-dimensional
2) direct and inverse kinematics computation for robot ma-
nipulation;
3) planar and three-dimensional vector rotation for graphics
geometry. But the iterative formulation of a computational algo-
and animation.
rithm for its implementation was first described in 1959 by Jack
Although CORDIC may not be the fastest technique to per-
E. Volder [1], [2] for the computation of trigonometric func-
form these operations, it is attractive due to the simplicity of
tions, multiplication and division. This year therefore marks the
its hardware implementation, since the same iterative algorithm
completion of 50 years of the CORDIC algorithm. Not only
could be used for all these applications using the basic shift-add
a wide variety of applications of CORDIC have emerged in
operations of the form .
the last 50 years, but also a lot of progress has been made in
Keeping the requirements and constraints of different ap-
the area of algorithm design and development of architectures
plication environments in view, the development of CORDIC
for high-performance and low-cost hardware solutions of those
algorithm and architecture has taken place for achieving high
throughput rate and reduction of hardware-complexity as well
Manuscript received August 22, 2008; revised November 26, 2008 and April as the latency of implementation. Some of the typical ap-
10, 2009. First published June 19, 2009; current version published September
02, 2009. This paper was recommended by Associate Editor V. Öwall.
proaches for reduced-complexity implementation are focussed
P. K. Meher is with the Department of Communication Systems, Institute for on minimization of the complexity of scaling operation and the
Infocomm Research, Singapore 138632 (e-mail: pkmeher@i2r.a-star.edu.sg). complexity of barrel-shifter in the CORDIC engine. Latency
J. Valls is with Instituto de Telecomunicaciones y Aplicaciones Multimedia, of implementation is an inherent drawback of the conventional
Universidad Politécnica de Valencia, 46730 Grao de Gandia, Spain (e-mail:
jvalls@eln.upv.es). CORDIC algorithm. Angle recoding schemes, mixed-grain
T.-B. Juang is with the Department of Computer Science and Information rotation and higher radix CORDIC have been developed for
Engineering, National Pingtung Institute of Commerce, Pingtung City, Taiwan reduced latency realization. Parallel and pipelined CORDIC
900 (e-mail: tsobing@npic.edu.tw).
K. Sridharan is with the Department of Electrical Engineering, Indian Insti- have been suggested for high-throughput computation. The
tute of Technology Madras, Chennai 600036, India (e-mail: sridhara@iitm.ac. objective of this article is not to present a detailed survey of
in). the developments of algorithms, architectures and applications
K. Maharatna is with the School of Electronics and Computer Sci-
ence, University of Southampton, Southampton, SO17 1BJ, U.K. (e-mail:
of CORDIC, which would require a few doctoral and masters
km3@ecs.soton.ac.uk). level dissertations. Rather we aim at providing the key develop-
Digital Object Identifier 10.1109/TCSI.2009.2025803 ments in algorithms and architectures along with an overview
1549-8328/$26.00 © 2009 IEEE
Authorized licensed use limited to: UNIVERSITY OF SOUTHAMPTON. Downloaded on September 15, 2009 at 13:19 from IEEE Xplore. Restrictions apply.
1894 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 56, NO. 9, SEPTEMBER 2009
A. The CORDIC Algorithm that satisfies the CORDIC convergence theorem [3]:
As shown in Fig. 1, the rotation of a two-dimensional vector . But,
through an angle , to obtain a rotated vector the decomposition according to (4) could be used only for
could be performed by the matrix product , (called the “convergence range”)
where is the rotation matrix: since . Therefore, the angular decom-
position of (4) is applicable for angles in the first and fourth
(1) quadrants. To obtain on-the-fly decomposition of angles into
the discrete base , one may otherwise use the nonrestoring
By factoring out the cosine term in (1), the rotation matrix decomposition [6]
can be rewritten as
and (5)
(2)
with if and otherwise, where the
rotation matrix for the th iteration corresponding to the selected
and can be interpreted as a product of a scale-factor angle is given by
with a pseudorotation matrix ,
given by
(6)
(3)
being the scale-factor, and the pseudoro-
The pseudorotation operation rotates the vector by an angle tation matrix
and changes its magnitude by a factor , to produce
a pseudo-rotated vector . (7)
To achieve simplicity of hardware realization of the rotation,
the key ideas used in CORDIC arithmetic are to (i) decompose
Note that the pseudo-rotation matrix for the th itera-
the rotations into a sequence of elementary rotations through
tion alters the magnitude of the rotated vector by a scale-factor
predefined angles that could be implemented with minimum
during the th microrotation, which is in-
hardware cost; and (ii) to avoid scaling, that might involve arith-
dependent of the value of (direction of microrotation) used in
metic operation, such as square-root and division. The second
the angle decomposition.
idea is based on the fact the scale-factor contains only the magni-
tude information but no information about the angle of rotation. 1All angles are measured in radian unless otherwise stated.
Authorized licensed use limited to: UNIVERSITY OF SOUTHAMPTON. Downloaded on September 15, 2009 at 13:19 from IEEE Xplore. Restrictions apply.
MEHER et al.: 50 YEARS OF CORDIC: ALGORITHMS, ARCHITECTURES AND APPLICATIONS 1895
TABLE I
GENERALIZED CORDIC ALGORITHM
Authorized licensed use limited to: UNIVERSITY OF SOUTHAMPTON. Downloaded on September 15, 2009 at 13:19 from IEEE Xplore. Restrictions apply.
1896 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 56, NO. 9, SEPTEMBER 2009
CORDIC computation is inherently sequential due to two where and the elementary angles
main bottlenecks: 1) the micro-rotation for any iteration is per- . The scale-factor for the th iteration
formed on the intermediate vector computed by the previous . In order to preserve the norm of the
iteration and 2) the th iteration could be started only vector the output of micro-rotations is required to be scaled by
after the completion of the th iteration, since the value of a factor
which is required to start the th iteration could be known
only after the completion of the th iteration. To alleviate the (16)
second bottleneck some attempts have been made for evalua-
tion of values corresponding to small micro-rotation angles
[9], [10]. However, the CORDIC iterations could not still be To have -bit output precision, the radix-4 CORDIC algorithm
performed in parallel due to the first bottleneck. A partial par- requires micro-rotations, which is half that of radix-2 al-
allelization has been realized in [11] by combining a pair of gorithm. However, it requires more computation time for each
conventional CORDIC iterations into a single merged iteration iteration and involves more hardware compared to the radix-2
which provides better area-delay efficiency. But the accuracy CORDIC to select the value of out of five different possi-
is slightly affected by such merging and cannot be extended to bilities. Moreover, the scale-factor, given by (16), also varies
a higher number of conventional CORDIC iterations since the with the rotation angles since it depends on which could have
Authorized licensed use limited to: UNIVERSITY OF SOUTHAMPTON. Downloaded on September 15, 2009 at 13:19 from IEEE Xplore. Restrictions apply.
MEHER et al.: 50 YEARS OF CORDIC: ALGORITHMS, ARCHITECTURES AND APPLICATIONS 1897
any of the five different values. Some techniques have there- TABLE II
fore been suggested for scale-factor compensation through iter- ANGLE RECODING ALGORITHM
ative shift-add operations [16], [17]. A high-radix CORDIC al-
gorithm in vectoring mode is also suggested in [18], which can
be used for reduced latency operation at the cost of larger size
tables for storing the elementary angles and pre-scaling factors
than the radix-2 and radix-4 implementation.
Authorized licensed use limited to: UNIVERSITY OF SOUTHAMPTON. Downloaded on September 15, 2009 at 13:19 from IEEE Xplore. Restrictions apply.
1898 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 56, NO. 9, SEPTEMBER 2009
C. Hybrid or Coarse-Fine Rotation CORDIC 2) Implementation of Hybrid CORDIC: To derive the effi-
ciency of hybrid CORDIC, the coarse and fine rotations are per-
Based on the radix-2 decomposition, any rotation angle
formed by separate circuits as shown in Fig. 5. The coarse ro-
with -bit precision could be expressed as a linear combina-
tation phase is performed by the CORDIC processor-I and the
tion of angles from the set , given
fine rotation phase is performed by CORDIC processor-II.
by , where , explicitly specifies whether
To have fast implementation, processor-I performs a pair of
there is need of a micro-rotation or not. But, radix-2 decom-
ROM look-up operations followed by addition to realize the ro-
position is not used in the conventional CORDIC because that
tation through angle . Since could be expressed as a linear
would not lead to simplicity of hardware realization. Instead,
combination of angels of small enough magnitude , where
arctangents of the corresponding values of radix-2 based set are
, the computation of fine rotation phase can
used as the elementary-angle-set with a view to implement the
be realized by a sequence of shift-and-add operations. For im-
CORDIC operations only by shift-add operations. The key idea
plementation of the fine rotation phase, no computations are in-
underlying the coarse-fine angular decomposition is that for the
volved to decide the direction of micro-rotation, since the need
fine values of , (i.e., when ),
of a micro-rotation is explicit in the radix-2 representation of
could be replaced by in the radix-set for expan-
. The radix-2 representation could also be recoded to express
sion of , since when is sufficiently large.
where as shown in [9]. Since the
1) Coarse–Fine Angular Decomposition: In the coarse-fine
direction of micro-rotations are explicit in such a representation
angular decomposition, the elementary-angle-set contains
of , it would be possible to implement the fine rotation phase
the arctangents of power-of-two for more-significant part
in parallel for low-latency realization.
while the less significant part contains the power-of-two
The hybrid decomposition could be used for reducing the la-
values, such that the radix-set is given by ,
tency by ROM-based realization of coarse operation. This can
where and
also be used for reducing the hardware complexity of fine rota-
, and is assumed
tion phase since there is no need to find the direction of micro-
to be sufficiently large such that [10]. For
rotation. Several options are however possible for the implemen-
the hybrid decomposition scheme, the rotation angle could be
tation of these two stages. A form of hybrid CORDIC is sug-
partitioned into two terms expressed as
gested in [23] for very-high precision CORDIC rotation where
(19) the ROM size is reduced to nearly bits. The coarse rota-
tions could be implemented as conventional CORDIC through
where and are said to be the coarse and fine subangles, shift-add operations of micro-rotations if the latency is tolerable.
respectively, given by 3) Shift-Add Implementation of Coarse Rotation: Using
the symmetry properties of the sine and cosine functions in
(20a) different quadrants, the rotation through any arbitrary angle
could be mapped from the full range to the first half
the first quadrant . The coarse-fine partition could be
(20b) applied thereafter for reducing the number of micro-rotations
necessary for fine rotations. To implement the course rotations
Authorized licensed use limited to: UNIVERSITY OF SOUTHAMPTON. Downloaded on September 15, 2009 at 13:19 from IEEE Xplore. Restrictions apply.
MEHER et al.: 50 YEARS OF CORDIC: ALGORITHMS, ARCHITECTURES AND APPLICATIONS 1899
Authorized licensed use limited to: UNIVERSITY OF SOUTHAMPTON. Downloaded on September 15, 2009 at 13:19 from IEEE Xplore. Restrictions apply.
1900 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 56, NO. 9, SEPTEMBER 2009
TABLE III
DIFFERENTIAL CORDIC ALGORITHM
Authorized licensed use limited to: UNIVERSITY OF SOUTHAMPTON. Downloaded on September 15, 2009 at 13:19 from IEEE Xplore. Restrictions apply.
MEHER et al.: 50 YEARS OF CORDIC: ALGORITHMS, ARCHITECTURES AND APPLICATIONS 1901
the case of higher-radix CORDIC and most of the optimized “actual” rotation where the norm of the vector is preserved at
CORDIC algorithms. The techniques for scaling compensation every micro-rotation.
for each such algorithms have been studied extensively for min- However, the problem with this formulation is that the overall
imizing the scaling overhead. In case of conventional CORDIC, range of angles for which it can be used is very small, because,
as given by (8), after sufficiently large number of iterations, the for 16-bit wordlength, the largest such angle is
scale-factor converges to 1.6467605, which leads to con- , which obviously is quite small compared to the entire
stant factor scaling since the scale factor remains the same for coordinate space. To overcome this problem, argument reduc-
all the angle of rotations. Constant factor scaling could be ef- tion is performed through “domain folding” [58] by mapping
ficiently implemented in a dedicated scaling unit designed by the target rotation-angles into the range . Besides, the
canonical signed digit (CSD)-based technique [47] and common elementary rotations are carried out in an adaptive manner to
sub-expression elimination (CSE) approach [48], [49]. When enhance the rate of convergence so as to force the approxima-
the sum of the output of more than one independent CORDIC tion error of final angle below a specified limit [59]. But, the
operations are to be evaluated, one can perform only one scaling domain-folding in some cases, involves a rotation through
of the output sum [50] in the case of constant factor scaling. In which demands a scaling by a factor of . Besides, the target
the following subsections, we briefly discuss some interesting range is still much larger than the range of convergence
developments on implementation of on-line scaling and real- of the scaling-free realization. The formulation of (28), there-
ization of scaling-free CORDIC. Besides, we outline here the fore, could be effectively used when a rotation through
sources of error that may arise in a CORDIC design and their is not required and angles of rotations could be folded to the
impact on implementation. range . Generalized algorithms, and their corre-
sponding architectures to perform the scale-factor compensa-
A. Implementation of Mixed-Scaling-Rotation tion in parallel with the CORDIC iterations, for both rotation
and vectoring modes are proposed in [60], where the compen-
Dewilde et al. [51] have suggested the on-line scaling where
sation overhead is reduced to a couple of iterations. It is shown
shift-add operations for scaling and micro-rotations are in-
in [61] that since the scale-factor is known in advance, one can
terleaved in the same circuit. This approach has been used in
perform the minimal recoding of the bits of scaling-factor, and
[52] and improved further in [53]. In the mixed-scaling-rota-
implement the multiplication thereafter by a Wallace tree. It is
tion (MSR) approach, pioneered by Wu et al. [54]–[56], the
a good solution of low-latency scaling particularly for pipelined
micro-rotation and scaling phases are merged into a unified
CORDIC architectures.
vector rotational model to minimize the overhead of the scaling
operation [54]–[56]. The MSR-CORDIC can be applied to
DSP applications, in which the rotation angles are usually C. Quantization and Numerical Accuracy
known a priori, e.g., the twiddle factor in fast Fourier transform Errors in CORDIC are mainly of two types: 1) the angle ap-
(FFT) and kernel components in other sinusoidal transforms. proximation error which originates from quantization of rota-
It is shown in [55] that the MSR technique can significantly tion angle represented by a linear combination of finite numbers
reduce the total iteration count so as to improve the speed of elementary angles and 2) the finite wordlength of the datapath
performance and enhance the signal-to-quantization-noise resulting in the rounding/truncation of output that increases cu-
ratio (SQNR) performance by controlling the internal dy- mulatively through the successive iterations of micro-rotations.
namic range. The MSR-CORDIC scheme has been applied A third source of error that also comes into the picture results
to a variable-length FFT processor design [29], and found to from the scaling of pseudo-rotated outputs. The scaling error is,
result in significant hardware reduction in the implementation however, also due to the use of finite wordlength in the scaling
of twiddle-factor multiplications. Although, the interleaved circuitry and is predominantly a rounding/truncation error. A
scaling and MSR-CORDIC provide hardware reduction, they detailed discussion on rounding error due to fixed and floating
also lead to the reduction of throughput. For high-throughput point implementations is available in [62]. In his earlier work,
implementation, one should implement the micro-rotations and Walther [3] concluded that the errors in the CORDIC output
scaling in two separate pipelined stages. are bounded, and extra bits are required in the datap-
aths to take care of the errors. Hu [62] has provided more pre-
B. Low-Complexity Scaling cise error bounds due to the angle approximation error for dif-
When the elementary angles pertaining to a rotation are “suf- ferent CORDIC modes for fixed point as well as floating-point
ficiently small”, defined by , and the rota- implementations. The error bound resulting for fixed point rep-
tions are only in one direction, the CORDIC rotation is given by resentation of arctangents is further analyzed by Kota and Cav-
the representation [57] allaro [63] and its impact on practical implementation has been
discussed.
Authorized licensed use limited to: UNIVERSITY OF SOUTHAMPTON. Downloaded on September 15, 2009 at 13:19 from IEEE Xplore. Restrictions apply.
1902 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 56, NO. 9, SEPTEMBER 2009
TABLE IV
COMPUTATIONS USING CORDIC ALGORITHM IN DIFFERENT CONFIGURATIONS
[63]. The hardware requirement therefore increases accord- CORDIC to signal and image processing, digital communica-
ingly with the desired accuracy. Floating-point implementation tion, robotics and 3-D graphics.
naturally gives higher accuracy than its fixed-point counter-
part, but at the cost of more complex hardware. To minimize A. Matrix Computation
the angle approximation error, the smallest elementary angle 1) QR Decomposition: QR decomposition of a matrix can
needs to be as small as possible [62]. This consequently be performed through Givens rotation [66] that selectively in-
demands more number of right-shifts and more hardware for troduces zeros into the matrix. Givens rotation is an orthogonal
the barrel-shifters and adders. Besides, to have better angle transformation of the form
approximation, more number of iterations are required which
increases the latency. The additional accuracy resulting from
floating-point implementation or better angle approximation (29)
may not, however, be necessary in many applications. Thus,
there is a need for trade-off between hardware-cost, latency where and
and numerical accuracy subject to a particular application. . The QR decomposition requires two types
Therefore, the designer has to check how much numerical of iterative operations to obtain an upper-triangular matrix
accuracy is needed along with area and speed constraints for using orthogonal transformations. Those are: (i) to calculate
the particular application; and can accordingly decide on fixed the Givens rotation angle, and (ii) to apply the calculated angle
or floating-point implementation and should set the wordlength of rotation to the rest of the rows. Circular coordinate CORDIC
and optimal number of iterations. is a good choice to implement both these Givens rotations,
where the first operation is performed by a VM CORDIC
and the second one is performed by an RM CORDIC. The
V. APPLICATIONS OF CORDIC CORDIC-based QR decomposition can be implemented in
VLSI with suitable area-time trade-off using a systolic trian-
CORDIC technique is basically applied for rotation of a
gular array, a linear array or a single CORDIC processor that is
vector in circular, hyperbolic or linear coordinate systems,
reconfigurable for rotation and vectoring modes of operations.
which in turn could also be used for generation of sinusoidal
A detail explanation of these architectures are available in [64],
waveform, multiplication and division operations, and evalua-
[67].
tion of angle of rotation, trigonometric functions, logarithms,
2) Singular Value Decomposition and Eigenvalue Estima-
exponentials and squareroot [6], [64], [65]. Table IV shows
tion: Singular value decomposition of a matrix is given by
some elementary functions and operations that can be directly
where and are orthogonal matrices and
implemented by CORDIC. The table also indicates whether
is a diagonal matrix of singular values. For CORDIC-based
the coordinate system is circular (CC), linear (LC), or hyper-
implementation of SVD, it is decomposed into 2 2 SVD prob-
bolic (HC), and whether the CORDIC operates in rotation
lems, and solved iteratively. To solve each 2 2 SVD problem,
mode (RM) or vectoring mode (VM), the initialization of the
two-sided Givens rotation is applied to each of the 2 2 ma-
CORDIC and the necessary pre- or postprocessing step to
trices to nullify the off-diagonal elements, as described in the
perform the operation. The scale factors are, however, obviated
following:
in Table IV for simplicity of presentation. In this Section, we
discuss how CORDIC is used for some basic matrix problems
like QR decomposition and singular-value decomposition.
Moreover, we make a brief presentation on the applications of (30)
Authorized licensed use limited to: UNIVERSITY OF SOUTHAMPTON. Downloaded on September 15, 2009 at 13:19 from IEEE Xplore. Restrictions apply.
MEHER et al.: 50 YEARS OF CORDIC: ALGORITHMS, ARCHITECTURES AND APPLICATIONS 1903
for
(31) Fig. 8. CORDIC-based direct digital synthesizer. = [
1 ]
f .
Authorized licensed use limited to: UNIVERSITY OF SOUTHAMPTON. Downloaded on September 15, 2009 at 13:19 from IEEE Xplore. Restrictions apply.
1904 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 56, NO. 9, SEPTEMBER 2009
be used to detect FM and FSK signals and to estimate phase parallel by computing pseudo-inverse through singular value
and frequency parameters [91]. A single VM-CORDIC can be decomposition. Collision detection is another area where
used to perform these computations for the implementation of CORDIC has been applied to robotics [107]. A CORDIC-based
a slicer for a high-order constellation like the 32-APSK used in highly parallel solution for collision detection between a robot
DVB-S2. manipulator and multiple obstacles in the workspace is sug-
CORDIC circuits operating in both modes are also required gested in [107]. The collision detection problem is formulated
in digital receivers for the synchronization stage to perform a as one that involves a number of coordinate transformations.
phase or frequency estimation followed by a correction stage. CORDIC-based processing elements are used to efficiently
This can be done by using two different CORDIC units, to meet perform the coordinate transformations by shift-add operations.
the high speed requirement in Costas loop for phase recovery 4) CORDIC for 3-D Graphics: The processing in graphics
in a QAM modulation [92], [93]. On the other hand the burst- such as 3-D vector rotation, lighting and vector interpolation are
based communication system that needs a preamble for syn- computation-intensive and are geometric in nature. CORDIC
chronization purposes, e.g., in case of IEEE 802.11a WLAN- architecture is therefore a natural candidate for cost-effective
OFDM receivers, can use a single CORDIC unit configurable implementation of these geometric computations in graphics.
for both operating modes since the estimation and correction A systematic formulation to represent 3D computer graphics
are not performed simultaneously [94], [95]. Apart from these, operations in terms of CORDIC-type primitives is provided in
the CORDIC-based QR decomposition has been used in multi- [108]. An efficient stream processor based on CORDIC-type
input-multi-output (MIMO) systems to implement V-BLAST modules to implement the graphic operations is also suggested
detectors [96]–[98], and to implement a recursive-least-square in [108]. 3-D vector interpolation is also an important function
(RLS) adaptive antenna beamformer [67], [99], [100]. in graphics which is required for good-quality shading [109] for
graphic rendering. It is shown that the variable-precision capa-
D. Applications of CORDIC to Robotics and Graphics bility of CORDIC engine could be utilized to realize a power-
aware implementation of the 3-D vector interpolator [110].
Two of the key problems where CORDIC provides area and
power-efficient solutions are: 1) direct kinematics and 2) inverse
kinematics of serial robot manipulators. How CORDIC is ap- VI. CONCLUSION
plied in these applications is discussed below. The beauty of CORDIC is its potential for unified solution
1) Direct Kinematics Solution (DKS) for Serial Robot Ma- for a large set of computational tasks involving the evaluation
nipulators: A robot manipulator consists of a sequence of links, of trigonometric and transcendental functions, calculation of
connected typically by either revolute or prismatic joints. For an multiplication, division, square-root and logarithm, solution of
-degrees-of-freedom manipulator, there are joint-link pairs linear systems, QR-decomposition, and SVD, etc. Moreover,
with link 0 being the supporting base and the last link is attached CORDIC is implemented by a simple hardware through re-
with a tool. The joints and links are numbered outwardly from peated shift-add operations. These features of CORDIC has
the base. The coordinates of the points on the th link repre- made it an attractive choice for a wide variety of applications.
sented by change successively for In the last fifty years, several algorithms and architectures
due to successive rotations and translations of the links. The have been developed to speed up the CORDIC by reducing
translation operations are realized by simple additions of coor- its iteration counts and through its pipelined implementation.
dinate values while the new coordinates of any point due to ro- Moreover, its applications in several diverse areas including
tation are computed by RM-CORDIC circuits. signal processing, image processing, communication, robotics
2) Inverse Kinematics for Robot Manipulators: The inverse and graphics apart from general scientific and technical compu-
kinematics problem involves determination of joint variables for tations have been explored. Latency of computation, however,
a desired position and orientation for the tool. The CORDIC ap- continues to be the major drawback of the CORDIC algorithm,
proach is valuable to find the inverse kinematic solution when since we do not have efficient algorithms for its parallel im-
a closed form solution is possible (when, in particular, the de- plementation. But, CORDIC on the other hand is inherently
sired tool tip position is within the robot’s work envelope and suitable for pipelined designs, due to its iterative behavior, and
when joint angle limits are not violated). The authors in [101] small cycle time compared with the conventional arithmetic.
present a maximum pipelined CORDIC-based architecture for For high-throughput applications, efficient pipelined-archi-
efficient computation of the inverse kinematics solution. It is tectures with multiple-CORDIC units could be developed to
also shown [101], [102] that up to 25 CORDIC processors are take the advantage of pipelineability of CORDIC, because the
required for the computation of the entire inverse kinematics so- digital hardware is getting cheaper along with the progressive
lution for a six-link PUMA-type robotic arm. Apart from imple- device-scaling. Research on fast implementation of shift-ac-
mentation of rotation operations, CORDIC is used in the eval- cumulation operation, exploration of new number systems
uation of trigonometric functions and square root expressions for CORDIC, optimization of CORDIC for constant rotation
involved in the inverse kinematics problems [103]. have scope for further reduction of its latency. Another way
3) CORDIC for Other Robotics Applications: CORDIC has to use CORDIC efficiently, is to transform the computational
also been applied to robot control [104], [105], where CORDIC algorithm into independent segments, and to implement the
circuits serve as the functional units of a programmable CPU individual segments by different CORDIC processors. With
co-processor. Another application of CORDIC is for kinematics enhancement of its throughput and reduction of latency, it is
of redundant manipulators [106]. It is shown in [106] that the expected that CORDIC would be useful for many high-speed
case of inverse kinematics can be implemented efficiently in and real-time applications. The area-delay-accuracy trade-off
Authorized licensed use limited to: UNIVERSITY OF SOUTHAMPTON. Downloaded on September 15, 2009 at 13:19 from IEEE Xplore. Restrictions apply.
MEHER et al.: 50 YEARS OF CORDIC: ALGORITHMS, ARCHITECTURES AND APPLICATIONS 1905
for different advanced algorithms may be investigated in detail [25] C.-Y. Chen and W.-C. Liu, “Architecture for CORDIC algorithm real-
and compared with in future work. ization without ROM lookup tables,” in Proc. 2003 Int. Symp. on Cir-
cuits Syst., ISCAS’03, May 2003, vol. 4, pp. 544–547.
[26] D. Fu and A. N. Willson, Jr., “A two-stage angle-rotation architecture
REFERENCES and its error analysis for efficient digital mixer implementation,” IEEE
Trans.Circuits Syst. I: Reg. Papers, vol. 53, no. 3, pp. 604–614, Mar.
[1] J. E. Volder, “The CORDIC trigonometric computing technique,” IRE 2006.
Trans. Electron. Computers, vol. EC-8, pp. 330–334, Sept. 1959. [27] S. Ravichandran and V. Asari, “Implementation of unidirectional
[2] J. E. Volder, “The birth of CORDIC,” J. VLSI Signal Process., vol. 25, CORDIC algorithm using precomputed rotation bits,” in 45th Midwest
pp. 101–105, 2000. Symp. on Circuits Syst., 2002. MWSCAS 2002, Aug. 2002, vol. 3, pp.
[3] J. S. Walther, “A unified algorithm for elementary functions,” in 453–456.
Proc. 38th Spring Joint Computer Conf., Atlantic City, NJ, 1971, pp. [28] C.-Y. Chen and C.-Y. Lin, “High-resolution architecture for CORDIC
379–385. algorithm realization,” in Proc. Int. Conf. on Commun., Circuits Syst.,
[4] J. S. Walther, “The story of unified CORDIC,” J. VLSI Signal Process., ICCCS’06, June 2006, vol. 1, pp. 579–582.
vol. 25, no. 2, pp. 107–112, June 2000. [29] D. D. Hwang, D. Fu, and A. N. Willson, Jr., “A 400-MHz processor
[5] D. S. Cochran, “Algorithms and accuracy in the HP-35,” Hewlett- for the conversion of rectangular to polar coordinates in 0.25-=mu-m
Packard J., pp. 1–11, Jun. 1972. CMOS,” IEEE J. Solid-State Circuits, vol. 38, no. 10, pp. 1771–1775,
[6] J.-M. Muller, Elementary Functions: Algorithms and Implementa- Oct. 2003.
tion. Boston, MA: Birkhauser Boston, 2006. [30] S.-W. Lee, K.-S. Kwon, and I.-C. Park, “Pipelined cartesian-to-polar
[7] S.-F. Hsiao and J.-M. Delosme, “Householder CORDIC algorithms,” coordinate conversion based on SRT division,” IEEE Trans. Circuits
IEEE Trans. Computers, vol. 44, no. 8, pp. 990–1001, Aug. 1995. Syst. II: Express Briefs, vol. 54, no. 8, pp. 680–684, Aug. 2007.
[8] E. Antelo, J. Villalba, and E. L. Zapata, “A low-latency pipelined 2D [31] S.-F. Hsiao, Y.-H. Hu, and T.-B. Juang, “A memory-efficient and
and 3D CORDIC processors,” IEEE Trans. Computers, vol. 57, no. 3, high-speed sine/cosine generator based on parallel CORDIC rota-
pp. 404–417, Mar. 2008. tions,” IEEE Signal Process. Lett., vol. 11, no. 2, 2004.
[9] D. Timmermann, H. Hahn, and B. J. Hosticka, “Low latency time [32] T.-B. Juang, S.-F. Hsiao, and M.-Y. Tsai, “Para-CORDIC: Parallel
CORDIC algorithms,” IEEE Trans. Computers, vol. 41, no. 8, pp. CORDIC rotation algorithm,” IEEE Trans. Circuits Syst. I: Regular Pa-
1010–1015, Aug. 1992. pers, vol. 51, no. 8, 2004.
[10] S. Wang, V. Piuri, and J. E. E. Swartzlander, “Hybrid CORDIC algo- [33] T.-B. Juang, “Area/delay efficient recoding methods for parallel
rithms,” IEEE Trans. Computers, vol. 46, no. 11, pp. 1202–1207, Nov. CORDIC rotations,” in IEEE Asia Pacific Conf. on Circuits Syst.,
1997. APCCAS’06, Dec. 2006, pp. 1539–1542.
[11] S. Wang and E. E. Swartzlander, Jr., “Merged CORDIC algorithm,” [34] M. D. Ercegovac and T. Lang, “Redundant and on-line CORDIC:
in IEEE Int. Symp. on Circuits Syst. (ISCAS’95), 1995, vol. 3, pp. Application to matrix triangularization and SVD,” IEEE Trans. Com-
1988–1991. puters, vol. 39, no. 6, pp. 725–740, June 1990.
[12] B. Gisuthan and T. Srikanthan, “Pipelining flat CORDIC based [35] N. Takagi, T. Asada, and S. Yajima, “Redundant CORDIC methods
trigonometric function generators,” Microelectron. J., vol. 33, pp. with a constant scale factor for sine and cosine computation,” IEEE
77–89, 2002. Trans. Computers, vol. 40, no. 9, pp. 989–995, Sept. 1991.
[13] S. Suchitra, S. Sukthankar, T. Srikanthan, and C. T. Clarke, “Elimina- [36] J. Duprat and J.-M. Muller, “The CORDIC algorithm: New results for
tion of sign precomputation in flat CORDIC,” in IEEE Int. Symp. on fast VLSI implementation,” IEEE Trans. Computers, vol. 42, no. 2, pp.
Circuits Syst., ISCAS’05, May 2005, vol. 4, pp. 3319–3322. 168–178, Feb. 1993.
[14] E. Deprettere, P. Dewilde, and R. Udo, “Pipelined CORDIC architec- [37] J.-A. Lee and T. Lang, “Constant-factor redundant CORDIC for angle
tures for fast VLSI filtering and array processing,” in IEEE Int. Conf. calculation and rotation,” IEEE Trans. Computers, vol. 41, no. 8, pp.
on Acoust., Speech, Signal Process., ICASSP’84, Mar. 1984, vol. 9, pp. 1016–1025, Aug. 1992.
250–253. [38] N. D. Hemkumar and J. R. Cavallaro, “Redundant and on-line
[15] H. Kunemund, S. Soldner, S. Wohlleben, and T. Noll, “CORDIC pro- CORDIC for unitary transformations,” IEEE Trans. Computers, vol.
cessor with carry save architecture,” in Proc. ESSCIRC 90, Sept. 1990, 43, no. 8, pp. 941–954, Aug. 1994.
pp. 193–196. [39] J. Valls, M. Kuhlmann, and K. K. Parhi, “Evaluation of CORDIC algo-
[16] E. Antelo, J. Villalba, J. D. Bruguera, and E. L. Zapatai, “High perfor- rithms for FPGA design,” J. VLSI Signal Process. Syst., vol. 32, no. 3,
mance rotation architectures based on the radix-4 CORDIC algorithm,” pp. 207–222, Nov. 2002.
IEEE Trans. Computers, vol. 46, no. 8, pp. 855–870, Aug. 1997. [40] D. E. Metafas and C. E. Goutis, “A floating point pipeline CORDIC
[17] P. R. Rao and I. Chakrabarti, “High-performance compensation tech- processor with extended operation set,” in IEEE Int. Symp. on Circuits
nique for the radix-4 CORDIC algorithm,” Proc. IEE Comput. and Dig- Syst., ISCAS’91, June 1991, vol. 5, pp. 3066–3069.
ital Techn., vol. 149, no. 5, pp. 219–228, Sep. 2002. [41] Z. Feng and P. Kornerup, “High speed DCT/IDCT using a pipelined
[18] E. Antelo, T. Lang, and J. D. Bruguera, “Very-high radix circular CORDIC algorithm,” in 12th Symp. on Computer Arithmetic, July
CORDIC: Vectoring and unified rotation/vectoring,” IEEE Trans. 1995, pp. 180–187.
Computers, vol. 49, no. 7, pp. 727–739, July 2000. [42] M. Jun, K. K. Parhi, G. J. Hekstra, and E. F. Deprettere, “Efficient
[19] Y. H. Hu and S. Naganathan, “An angle recoding method for CORDIC implementations of pipelined CORDIC based IIR digital filters using
algorithm implementation,” IEEE Trans. Comput., vol. 42, no. 1, pp. fast orthonormal -rotations,” IEEE Trans. Signal Process., vol. 48, no.
99–102, Jan. 1993. 9, 2000.
[20] Y. H. Hu and H. H. M. Chern, “A novel implementation of CORDIC al- [43] M. Chakraborty, A. S. Dhar, and M. H. Lee, “A trigonometric formu-
gorithm using backward angle recoding (BAR),” IEEE Trans. Comput., lation of the LMS algorithm for realization on pipelined CORDIC,”
vol. 45, no. 12, pp. 1370–1378, Dec. 1996. IEEE Trans. Circuits Syst. II, Express Briefs, vol. 52, no. 9, 2005.
[21] C.-S. Wu, A.-Y. Wu, and C.-H. Lin, “A high-performance/low-latency [44] E. I. Garcia, R. Cumplido, and M. Arias, “Pipelined CORDIC design
vector rotational CORDIC architecture based on extended elementary on FPGA for a digital sine and cosine waves generator,” in Int. Conf.
angle set and trellis-based searching schemes,” IEEE Trans. Circuits on Electr. Electron. Eng., ICEEE’06, Sept. 2006, pp. 1–4.
Syst. II: Anal. Digital Signal Process., vol. 50, no. 9, pp. 589–601, Sep. [45] H. Dawid and H. Meyr, “The differential CORDIC algorithm: Constant
2003. scale factor redundant implementation without correcting iterations,”
[22] T. K. Rodrigues and E. E. Swartzlander, “Adaptive CORDIC: IEEE Trans. Computers, vol. 45, no. 3, pp. 307–318, Mar. 1996.
Using parallel angle recoding to accelerate CORDIC rotations,” in [46] C. Y. Kang and E. E. Swartzlander, Jr., “Digit-pipelined direct digital
40th Asilomar Conf. on Signals, Syst. and Computers, ACSSC’06, frequency synthesis based on differential CORDIC,” IEEE Trans. Cir-
Oct.–Nov. 2006, pp. 323–327. cuits Syst. I, Reg. Papers, vol. 53, no. 5, pp. 1035–1044, May 2006.
[23] M. Kuhlmann and K. K. Parhi, “P-CORDIC: A precomputation based [47] R. I. Hartley, “Subexpression sharing in filters using canonic signed
rotation CORDIC algorithm,” EURASIP J. Appl. Signal Process., vol. digit multipliers,” IEEE Trans. Circuits Syst. II: Analog Digital Signal
2002, no. 9, pp. 936–943, 2002. Process., vol. 43, no. 10, pp. 677–688, Oct. 1996.
[24] D. Fu and A. N. Willson, Jr., “A high-speed processor for digital sine/ [48] O. Gustafsson, A. G. Dempster, K. Johansson, M. D. Macleod, and L.
cosine generation and angle rotation,” in Conf. Rec. 32nd Asilomar Wanhammar, “Simplified design of constant coefficient multipliers,” J.
Conf. on Signals, Syst. & Computers, Nov. 1998, vol. 1, pp. 177–181. Circuits, Syst., Signal Process., vol. 25, no. 2, pp. 225–251, Apr. 2006.
Authorized licensed use limited to: UNIVERSITY OF SOUTHAMPTON. Downloaded on September 15, 2009 at 13:19 from IEEE Xplore. Restrictions apply.
1906 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 56, NO. 9, SEPTEMBER 2009
[49] G. Gilbert, D. Al-Khalili, and C. Rozon, “Optimized distributed [73] L.-W. Chang and S.-W. Lee, “Systolic arrays for the discrete
processing of scaling factor in CORDIC,” in 3rd Int. IEEE-NEWCAS Hartley transform,” IEEE Trans. Signal Process., vol. 39, no. 11, pp.
Conf., June 2005, pp. 35–38. 2411–2418, Nov. 1991.
[50] A. M. Despain, “Fourier transform computers using CORDIC itera- [74] P. K. Meher and G. Panda, “Novel recursive algorithm and highly com-
tions,” IEEE Trans. Computers, vol. 23, no. C-10, pp. 993–1001, Oct. pact semisystolic architecture for high throughput computation of 2-D
1974. DHT,” Electron. Lett., vol. 29, no. 10, pp. 883–885, May 1993.
[51] P. Dewilde, E. F. Deprettere, and R. Nouta, “Parallel and pipelined [75] W.-H. Chen, C. H. Smith, and S. C. Fralick, “A fast computational
VLSI implementation of signal processing algorithms,” in VLSI and algorithm for the discrete cosine transform,” IEEE Trans. Commun.,
Modern Signal Processing, S. Y. Kung, H. J. Whitehouse, and T. vol. 25, no. 9, pp. 1004–1009, Sep. 1977.
Kailath, Eds. Englewood Cliffs, NJ: Prentice-Hall, 1995. [76] B. Das and S. Banerjee, “Unified CORDIC-based chip to realise DFT/
[52] K. J. Jones, “High-throughput, reduced hardware systolic solution to DHT/DCT/DST,” Proc. IEE Computers and Digital Techniques, vol.
prime factor discrete Fourier transform algorithm,” Proc.IEE Com- 149, no. 4, pp. 121–127, July 2002.
puters and Digital Techn., vol. 137, no. 3, pp. 191–196, May 1990. [77] J.-H. Hsiao, L.-G. Ghen, T.-D. Chiueh, and C.-T. Chen, “High
[53] P. K. Meher, J. K. Satapathy, and G. Panda, “Efficient systolic solution throughput CORDIC-based systolic array design for the discrete
for a new prime factor discrete Hartley transform algorithm,” Proc. IEE cosine transform,” IEEE Trans. Circuits Syst. Video Technol., vol. 5,
Circuits, Devices & Syst., vol. 140, no. 2, pp. 135–139, Apr. 1993. no. 3, pp. 218–225, June 1995.
[54] Z.-X. Lin and A.-Y. Wu, “Mixed-scaling-rotation CORDIC [78] D. C. Kar and V. V. B. Rao, “A CORDIC-based unified systolic archi-
(MSR-CORDIC) algorithm and architecture for scaling-free high-per- tecture for sliding window applications of discrete transforms,” IEEE
formance rotational operations,” in IEEE Int. Conf. on Acoust., Speech, Trans. Signal Process., vol. 44, no. 2, pp. 441–444, Feb. 1996.
Signal Process., ICASSP’03, Apr. 2003, vol. 2, pp. 653–656. [79] Y. H. Hu and S. Naganathan, “A novel implementation of chirp
[55] C.-H. Lin and A.-Y. Wu, “Mixed-scaling-rotation CORDIC (MSR- Z-transform using a CORDIC processor,” IEEE Trans. Acoust.,
CORDIC) algorithm and architecture for high-performance vector ro- Speech, Signal Process., vol. 38, no. 2, pp. 352–354, Feb. 1990.
tational DSP applications,” IEEE Trans. Circuits Syst. I, Reg. Papers, [80] J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Prin-
vol. 52, no. 11, pp. 2385–2396, Nov. 2005. ciples, Algorithms and Applications. Upper Saddle River, NJ: Pren-
[56] C.-L. Yu, T.-H. Yu, and A.-Y. Wu, “On the fixed-point properties of tice-Hall, 1996.
mixed-scaling-rotation CORDIC algorithm,” in IEEE Workshop on [81] R. C. Gonzalez, Digital Image Processing, 3rd ed. Upper Saddle
Signal Process. Syst., Oct. 2007, pp. 430–435. River, N.J.: Prentice Hall, 2008.
[57] A. S. Dhar and S. Banerjee, “An array architecture for fast computation [82] N. Guil, J. Villalba, and E. L. Zapata, “A fast Hough transform for
of discrete hartley transform,” IEEE Trans. Circuits Syst., vol. 38, no. segment detection,” IEEE Trans. Image Process., vol. 4, no. 11, pp.
9, pp. 1095–1098, Sep. 1991. 1541–1548, Nov. 1995.
[58] K. Maharatna, A. Troya, S. Banerjee, and E. Grass, “New virtually [83] S. M. Bhandakar and H. Yu, “VLSI implementation of real-time
scaling free adaptive CORDIC rotator,” Proc. IEE Computers and Dig- image rotation,” in Int. Conf. on Image Process., Sept. 1996, vol. 2,
ital Techn., vol. 151, no. 6, pp. 448–456, Nov. 2004. pp. 1015–1018.
[59] K. Maharatna, S. Banerjee, E. Grass, M. Krstic, and A. Troya, “Modi- [84] S. Sathyanarayana, S. R. Kumar, and S. Thambipillai, “Unified
fied virtually scaling free adaptive CORDIC rotator algorithm and ar- CORDIC based processor for image processing,” in 15th Int. Conf. on
chitecture,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 11, Digital Signal Process., July 2007, pp. 343–346.
pp. 1463–1474, Nov. 2005. [85] J. Valls, T. Sansaloni, A. Perez-Pascual, V. Torres, and V. Almenar,
[60] J. Villalba, T. Lang, and E. Zapata, “Parallel compensation of scale “The use of CORDIC in software defined radios: A tutorial,” IEEE
factor for the CORDIC algorithm,” J. VLSI Signal Process., vol. 19, Commun. Mag., vol. 44, no. 9, 2006.
no. 3, pp. 227–241, Aug. 1998. [86] L. Cordesses, “Direct digital synthesis: A tool for periodic wave gen-
[61] D. Timmermann, H. Hahn, B. J. Hosticka, and B. Rix, “A new addition eration (part 1),” IEEE Signal Process. Mag., vol. 21, no. 4, 2004.
scheme and fast scaling factor compensation methods for CORDIC al- [87] J. Vankka, Digital Synthesizers and Transmitters for Software Radio.
gorithms,” Integration, the VLSI J., vol. 11, no. 1, pp. 85–100, Mar. Dordrecht, Netherlands: Springer, 2005.
1991. [88] J. Vankka, “Methods of mapping from phase to sine amplitude in di-
[62] Y. H. Hu, “The quantization effects of the CORDIC algorithm,” IEEE rect digital synthesis,” in 50th IEEE International Frequency Control
Trans. Signal Process., vol. 40, no. 4, pp. 834–844, Apr. 1992. Symposium, Jun. 1996, pp. 942–950.
[63] K. Kota and J. R. Cavallaro, “Numerical accuracy and hardware [89] F. Cardells-Tormo and J. Valls-Coquillat, “Optimisation of direct dig-
tradeoffs for CORDIC arithmetic for special-purpose processors,” ital frequency synthesisers based on CORDIC,” Electron. Lett., vol. 37,
IEEE Trans. Computers, vol. 42, no. 7, pp. 769–779, July 1993. no. 21, 2001.
[64] Y. H. Hu, “CORDIC-based VLSI architectures for digital signal pro- [90] M. E. Frerking, Digital Signal Processing in Communication Sys-
cessing,” IEEE Signal Process. Mag., vol. 9, no. 3, pp. 16–35, July tems. New York: Van Nostrand Reinhold, 1994.
1992. [91] H. Meyr, M. Moeneclaey, and S. A. Fechtel, Digital Communication
[65] F. Angarita, A. Perez-Pascual, T. Sansaloni, and J. Vails, “Efficient Receivers: Synchronization, Channel Estimation, and Signal Pro-
FPGA implementation of cordic algorithm for circular and linear co- cessing. New York: Wiley, 1998.
ordinates,” in Int. Conf. on Field Programmable Logic and Appl., Aug. [92] F. Cardells, J. Valls, V. Almenar, and V. Torres, “Efficient FPGA-based
2005, pp. 535–538. QPSK demodulation loops: Application to the DVB standard,” Lec-
[66] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Bal- tures Notes on Computer Sci., vol. 2438, pp. 102–111, 2002.
timore, MD: Johns Hopkins Univ. Press, 1996. [93] C. Dick, F. Harris, and M. Rice, “FPGA implementation of carrier syn-
[67] G. Lightbody, R. Woods, and R. Walke, “Design of a parameterizable chronization for QAM receivers,” J. VLSI Signal Process., vol. 36, pp.
silicon intellectual property core for QR-based RLS filtering,” IEEE 57–71, 2004.
Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 4, pp. 659–678, [94] M. J. Canet, F. Vicedo, V. Almenar, and J. Valls, “FPGA implementa-
Aug. 2003. tion of an IF transceiver for OFDM-based WLAN,” in IEEE Workshop
[68] J. R. Cavallaro and F. T. Luk, “CORDIC arithmetic for a SVD pro- on Signal Process. Syst., SIPS’04, 2004, pp. 227–232.
cessor,” J. Parallel and Distributed Computing, vol. 5, pp. 271–290, [95] A. Troya, K. Maharatna, M. Krstic, E. Grass, U. Jagdhold, and R.
1988. Kraemer, “Low-power VLSI implementation of the inner receiver for
[69] J. M. Delosme, “A processor for two-dimensional symmetric eigen- OFDM-based WLAN systems,” IEEE Trans. Circuits Syst. I, Reg. Pa-
value and singular value arrays,” in IEEE 21th Asilomar Conf. on Cir- pers, vol. 55, no. 2, pp. 672–686, Mar. 2008.
cuits, Syst., and Computers, Nov. 1987, pp. 217–221. [96] Z. Guo and P. Nilsson, “A VLSI architecture of the square root algo-
[70] Z. Liu, K. Dickson, and J. V. McCanny, “Application-specific instruc- rithm for V-BLAST detection,” J. VLSI Signal Process. Syst., vol. 44,
tion set processor for SoC implementation of modern signal processing no. 3, pp. 219–230, Sept. 2006.
algorithms,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 4, [97] Z. Khan, T. Arslan, J. S. Thompson, and A. T. Erdogan, “Analysis and
pp. 755–765, Apr. 2005. implementation of multiple-input, multiple-output VBLAST receiver
[71] K. J. Jones, “2D systolic solution to discrete Fourier transform,” Proc. from area and power efficiency perspective,” IEEE Trans. Very Large
IEE Computers and Digital Techn., vol. 136, no. 3, pp. 211–216, May Scale Integr. (VLSI) Syst., vol. 14, no. 11, pp. 1281–1286, Nov. 2006.
1989. [98] F. Sobhanmanesh and S. Nooshabadi, “Parametric minimum hardware
[72] T.-Y. Sung, “Memory-efficient and high-speed split-radix FFT/IFFT QR-factoriser architecture for V-BLAST detection,” Proc. IEE Cir-
processor based on pipelined CORDIC rotations,” Proc. IEE Vision, cuits, Devices and Syst., vol. 153, no. 5, pp. 433–441, Oct. 2006.
Image Signal Process., vol. 153, no. 4, pp. 405–410, Aug. 2006.
Authorized licensed use limited to: UNIVERSITY OF SOUTHAMPTON. Downloaded on September 15, 2009 at 13:19 from IEEE Xplore. Restrictions apply.
MEHER et al.: 50 YEARS OF CORDIC: ALGORITHMS, ARCHITECTURES AND APPLICATIONS 1907
[99] C. M. Rader, “VLSI systolic arrays for adaptive nulling [radar],” IEEE Javier Valls (M’02) received the telecommunication
Signal Process. Mag., vol. 13, no. 4, pp. 29–49, July 1996. engineering degree from the Universidad Politec-
[100] R. Hamill, J. V. McCanny, and R. L. Walke, “Online CORDIC algo- nica de Cataluna, Spain, and the Ph.D. degree in
rithm and VLSI architecture for implementing QR-array processors,” telecommunication engineering from the Univer-
IEEE Trans. Signal Process., vol. 48, no. 2, pp. 592–598, Feb. 2000. sidad Politecnica de Valencia, Spain, in 1993 and
[101] C. Lee and P. Chang, “A maximum pipelined CORDIC architecture for 1999, respectively.
inverse kinematic position computation,” IEEE Trans. Robot. Autom., He is with the Department of Electronics at Uni-
vol. RA-3, no. 5, pp. 445–458, 1987. versidad Politecnica de Valencia, Valencia, Spain,
[102] R. Harber, J. Li, X. Hu, and S. Bass, “The application of bit-serial since 1993, where he currently is an Associate
CORDIC computational units to the design of inverse kinematics pro- Professor. His current research interests include the
cessors,” in Proc. IEEE Int. Conf. on Robot. and Autom., 1988, pp. design of FPGA-based systems, computer arith-
1152–1157. metic, VLSI signal processing, and digital communications.
[103] C. Krieger and B. Hosticka, “Inverse kinematics computations with
modified CORDIC iterations,” Proc. IEE Computers and Digital
Techn., vol. 143, pp. 87–92, Jan. 1996.
[104] Y. Wang and S. Butner, “A new architecture for robot control,” in Proc.
Tso-Bing Juang (M’99) received the Ph.D. degree
IEEE Int. Conf. on Robot. Autom., 1987, pp. 664–670.
in computer science and engineering from National
[105] Y. Wang and S. Butner, “RIPS: A platform for experimental real-time
Sun Yat-sen University, Kaoshiung, Taiwan, in 2004.
sensory-based robot control,” IEEE Trans. Syst., Man, Cybern., vol. 19,
Since 2006, he has been with the Department of Com-
pp. 853–860, 1989.
[106] I. Walker and J. Cavallaro, “Parallel VLSI architectures for real-time puter Science and Information Engineering, National
kinematics of redundant robots,” in Proc. IEEE Int. Conf. on Robot. Pingtung Institute of Commerce, Taiwan, and is cur-
Autom., 1993, pp. 870–877. rently an Assistant Professor. His research interests
[107] M. Kameyama, T. Amada, and T. Higuchi, “Highly parallel collision include computer arithmetic and VLSI systems.
detection processor for intelligent robots,” IEEE J. Solid-State Circuits, Dr. Juang received the Best Thesis Award from
vol. 27, pp. 300–306, 1992. Taiwan Xerox’s foundation in 1995. Currently he
[108] T. Lang and E. Antelo, “High-throughput CORDIC-based geometry serves as the reviewers of IEEE TRANSACTIONS
ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, IEEE TRANSACTIONS
operations for 3D computer graphics,” IEEE Trans. Computers, vol.
ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, IEEE TRANSACTIONS ON
54, no. 3, pp. 347–361, Mar. 2005.
[109] B. Phong, “Illumination for computer generated pictures,” Commun. COMPUTERS, and IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION
ACM, pp. 311–317, June 1975. (VLSI) SYSTEMS.
[110] J. Euh, J. Chittamuru, and W. Burleson, “CORDIC vector interpolator
for power-aware 3D computer graphics,” in IEEE Workshop on Signal
Process. Syst., SIPS’02, Oct. 2002, pp. 240–245.
K. Sridharan (S’84-M’96-SM’01) received the
Ph.D. degree from Rensselaer Polytechnic Institute,
Troy, NY, in 1995.
He was an Assistant Professor at the Indian
Institute of Technology (IIT) Guwahati, India, from
1996 to 2001. Since June 2001, he has been with
IIT Madras, Chennai, India, where he is presently an
Associate Professor. He was a visiting staff member
at NTU, Singapore, in 2000–2001 and 2006–2008.
He has supervised three Ph.D. candidates, and holds
one joint patent. He has published more than 60
papers in various international journals and conferences.
Dr. Sridharan received the Computer Engineering Division Prize for a paper
published in the Journal of I.E. (India) in 2002. He was also a joint recipient of
the IEEE Vincent Bendix Award.
Pramod Kumar Meher (SM’03) received the M.Sc.
and M.Phil. degrees in physics, in 1978 and 1981, re-
spectively, and the Ph.D. degree in science, in 1996,
all from Sambalpur University, Sambalpur, India. Koushik Maharatna (M’02) received the M.Sc. de-
Currently, he is a Senior Scientist with the Institute gree in electronic science from Calcutta University,
for Infocomm Research, Singapore. Prior to this as- in 1995 and the Ph.D. degree from Jadavpur Univer-
signment he was a visiting faculty with the School of sity, Calcutta, India, in 2002.
Computer Engineering, Nanyang Technological Uni- From 2000 to 2003, he was a Research Scientist
versity, Singapore. From 1997 to 2002, he was a Pro- with IHP, Frankfurt (Oder), Germany, where, he
fessor of Computer Application with Utkal Univer- was mainly involved in the design of a single-chip
sity, Bhubaneswar, India, and from 1993 to 1997, he modem for the IEEE 802.11a standard. He is with
was a Reader in electronics with Berhampur University, Berhampur, India. His the Electronics Systems and Devices Group at the
research interest mainly includes the design and optimization of dedicated and School of Electronics and Computer Science of
reconfigurable architectures for computation-intensive algorithms, arithmetic the University of Southampton, U.K., as a Senior
circuits, and algorithm-architecture co-design for digital signal processing and Lecturer since 2006. His research interests include development of VLSI
communication. architectures for DSP and communication applications, computer arithmetic,
Dr. Meher is a Fellow of IET (UK) and of IETE (India). He is currently low-power digital design, analog signal processing, and CNN.
serving as Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND Dr. Maharatna has served as session chair, reviewer and review committee
SYSTEMS—II: EXPRESS BRIEFS, IEEE TRANSACTIONS ON VERY LARGE SCALE management member in several IEEE conferences and journals. He is currently
INTEGRATION (VLSI) SYSTEMS, and The Journal of Circuits, Systems, and a member of the Engineering and Physical Research Council (EPSRC) college
Signal Processing. He was the recipient of Samanta Chandrasekhar Award for in the U.K. He is also a member of VLSI System Application (VSA) Technical
excellence in research in engineering and technology for the year 1999. Committee of IEEE Circuits and Systems Society.
Authorized licensed use limited to: UNIVERSITY OF SOUTHAMPTON. Downloaded on September 15, 2009 at 13:19 from IEEE Xplore. Restrictions apply.