
2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT)

An Approach to Design a Matrix Inversion Hardware Module using FPGA

Gurrala Ajay Kumar1, Thadigotla Venkata Subbareddy1, and Bommepalli Madhava Reddy1
1 Department of ECE, School of EEE, SASTRA University, Thanjavur, Tamilnadu, India

N Raju2 and V Elamaran2
2 Asst. Professor, Dept. of ECE, School of EEE, SASTRA University, Thanjavur, Tamilnadu, India
elamaran@ece.sastra.ed

978-1-4799-4190-2/14/$31.00 ©2014 IEEE

Abstract- This study aims to design and test a hardware module that inverts a matrix in a short time. The approach is developed for 3×3 matrix inversion. Of the many mathematical methods available for matrix inversion, a suitable one, the Adjoint Matrix method, is selected by analysing the computational requirements. The mathematical procedure for calculating the inverse is then converted into VHDL code. The code is first simulated using a set of test matrices. After the simulation results are verified against the test inputs, the code is tested for synthesizability. Once synthesizability is verified, the design is finally verified in hardware by downloading it to an FPGA. Altera's DE1 board, which carries a Cyclone-II series FPGA (EP2C20F484C7), is used for this study. The test inputs can be fed in using the on-board GPIO, the UI or SRAM. The outputs are taken the same way, through GPIO or the UI or written to SRAM, and are then verified by comparing them with the actual results.

Keywords- Adjoint method, Altera DE1, FPGA, Matrix inversion, VHDL.

I. INTRODUCTION

Matrix inversion is a common operation in many signal processing problems, including most block adaptation methods used in adaptive communication systems. If the time needed to compute the inverse of the matrix is too high, the performance of the signal processing algorithm may degrade. In general, inverse matrix calculation is tedious compared to the addition, subtraction, multiplication and division of matrices [1].

Although there are many algorithms available for matrix inversion, each has its own advantages and disadvantages, which may also vary with the order of the matrix. In addition, realizing these algorithms in hardware is a challenging task. Although many DSP processors can handle inversion by using special functional modules available within them, the time required to generate the inverse is sometimes large [2].

In this scenario there is a demand for dedicated hardware that generates the inverse of a given matrix within a short time and with accurate results. To address this need, an attempt is made to design hardware that can generate the inverse of a 3×3 matrix in a very short time. Of the available matrix inversion algorithms, two that can generate the inverse in a small number of steps, the Adjoint Matrix method and the Cayley-Hamilton method, are analysed and compared for their computational requirements. The Adjoint method is chosen and transformed into VHDL code that is used to realize the hardware design [3] [7].

II. THE ADJOINT MATRIX METHOD

This section deals with the approach of the Adjoint Matrix method to calculate the inverse of a matrix.

A. Definition

Let A be a non-singular n × n matrix. Then the inverse of A is given by

$$A^{-1} = \frac{1}{\det(A)}\,\operatorname{adj}A \tag{1}$$

Adjoint Matrix: If A is an n × n matrix and C is the associated matrix of cofactors, the transpose $C^{T}$ of the matrix of cofactors is called the adjoint of A and is written adj A.

B. Inversion of 3x3 matrices

A computationally efficient 3x3 matrix inversion is given by

$$A^{-1} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}^{-1} = \frac{1}{\det(A)} \begin{bmatrix} A & B & C \\ D & E & F \\ G & H & K \end{bmatrix}^{T} = \frac{1}{\det(A)} \begin{bmatrix} A & D & G \\ B & E & H \\ C & F & K \end{bmatrix} \tag{2}$$

where the determinant of A can be computed by applying the rule of Sarrus as follows,

$$\det(A) = a(ei - fh) - b(id - fg) + c(dh - eg) \tag{3}$$

If the determinant is non-zero, the matrix is invertible, with the elements of the cofactor matrix above given by,


A = (ei - fh)   B = (fg - di)   C = (dh - eg)
D = (ch - bi)   E = (ai - cg)   F = (gb - ah)
G = (bf - ce)   H = (cd - af)   K = (ae - bd)

C. Computational Requirements

For an n×n matrix this method involves 2n² + n multiplications, n² divisions and n² + n - 1 other arithmetic operations (additions and subtractions). For a 3×3 matrix inversion we have n = 3, and hence we need 21 multipliers, 9 dividers and 11 arithmetic blocks: 18 multiplications and 9 subtractions for the cofactors, 3 multiplications and 2 additions or subtractions for the determinant, and 9 divisions.

These requirements are compared with those of the Cayley-Hamilton method, which needs 45 multipliers, 9 dividers and 20 arithmetic blocks (analysis not presented here). The Adjoint method has the smaller requirements of the two, and hence we have proceeded with this method.

The above method of calculating the matrix inverse basically requires calculation of the adjoint matrix elements, the determinant of the matrix, and division of the adjoint matrix elements by the determinant. In addition, to facilitate the data flow in and out, we have included Mux and De-Mux modules [4] [5]. We will describe these implementations in brief.

A. The Division Module (Core Module)

The Adjoint Matrix method requires dividing the elements of the adjoint matrix by the determinant of the given matrix. Since the FPGA does not natively support floating point operations, there is a need for a division module that gives an output similar to floating point division without actually performing floating point division. Hence this is treated as the core module of the design [6] [7]. A simple technique is used to achieve this.

The technique can be put into the following steps:
i) Convert the dividend and divisor to positive numbers.
ii) If the divisor is zero, convert it to one to avoid a divide-by-zero error.
iii) Declare variables fq_r and p_r, where r = 1, 2, ...
iv) Divide the dividend by the divisor by calling the division function from the library.
v) Assign the division result to fq_r.
vi) Obtain the remainder of the division by calling the modulo function from the library.
vii) Assign this value to p_r.
viii) If the value of p_r is less than the divisor, multiply p_r by 10.
ix) Repeat steps iv to viii as many times as the required precision, incrementing r every time (the scaled remainder serves as the next dividend).
x) After performing the above task, convert all the outputs to bit vectors and concatenate the outputs of division in proper order, with r in increasing order from left to right.
xi) Then concatenate a sign bit at the leftmost position to indicate the sign of the output, which is decided according to the signs of the inputs.

The above steps are suitably translated into a VHDL code segment [4] [8]. For example, let us take the case of 8-bit signed data as inputs to our module. With a divisor of 8, a dividend of 10 and a precision of 3 digits, the output would be 1.250. The output of the module would be in the following form:

0 0000001 0010 0101 0000

Sign bit ('1' if minus): 0
Value of integer part in binary: 0000001
Fraction part in BCD form: 0010 0101 0000

B. The Adjoint Matrix Elements

From the description given of the Adjoint Matrix method, it can be seen that the elements of the adjoint matrix, i.e., the transpose of the cofactor matrix, can be directly obtained by the criss-cross method.

Recalling the process of calculating the elements of the adjoint matrix, we have

A = (ei - fh)   B = (fg - di)   C = (dh - eg)
D = (ch - bi)   E = (ai - cg)   F = (gb - ah)
G = (bf - ce)   H = (cd - af)   K = (ae - bd)

where $\begin{bmatrix} A & B & C \\ D & E & F \\ G & H & K \end{bmatrix}$ is the cofactor matrix and $\begin{bmatrix} A & D & G \\ B & E & H \\ C & F & K \end{bmatrix}$ is the adjoint matrix obtained by transposing it. This calculation of the adjoint matrix elements is suitably translated into VHDL code [6] [8].

C. The Interface Model

Since we are operating on a 3×3 matrix, we have to give nine elements as the input and take nine elements at the output. If all the inputs were given directly and all the outputs taken directly, we would need at least 261 pins for data I/O alone. In addition, we may need further pins for user control signals.

Using this many I/O pins is not a desirable feature for the design. Hence an alternative way of giving the inputs and taking the outputs is implemented.

In this method we use a de-multiplexer at the input port, which distributes and latches the input into the internal architecture, and at the output we use a multiplexer, which selects from the available outputs of the internal architecture and gives it out.
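The data path just described can be pictured with a small behavioural model. The Python sketch below is only an illustration and not the authors' VHDL: the class `InputDemux`, the function `output_mux`, the row-major register order and the enable arguments are all assumed names and conventions.

```python
class InputDemux:
    """Hypothetical model: routes one shared data port to nine latched registers."""

    def __init__(self, n_elements=9):
        self.registers = [0] * n_elements  # latched matrix elements, row-major

    def write(self, select, data, input_enable=True):
        # Latch `data` into the register addressed by `select`.
        # The write is ignored when disabled or when `select` is out of range.
        if input_enable and 0 <= select < len(self.registers):
            self.registers[select] = data


def output_mux(results, select, output_enable=True):
    """Hypothetical model: put one selected result on the shared output port."""
    if output_enable and 0 <= select < len(results):
        return results[select]
    return None  # nothing is driven on the port


# Load a 3x3 matrix one element at a time through the single input port.
demux = InputDemux()
for index, value in enumerate([1, 2, 3, 0, 1, 4, 5, 6, 0]):
    demux.write(select=index, data=value)

# Read back element 5 (row 2, column 3) through the single output port.
element = output_mux(demux.registers, select=5)
print(element)
```

Only one data word crosses the port per access, so the port width is set by the word length rather than by the number of matrix elements.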


This arrangement greatly reduces the consumption of I/O pins: for 8-bit signed data it uses only 15 pins at the input side and 21 pins at the output side. VHDL implementations of the Mux and De-Mux are used in the design [6] [10].
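The pin counts quoted in this subsection can be checked with a few lines of arithmetic. In the sketch below the 21-bit output word is taken from the stated output pin count, while the split of the 15 input pins into 8 data, 4 select and 3 enable lines is an assumption made for illustration (it mirrors the switch assignment of the reduced 4-bit design), not a breakdown given by the authors.

```python
ELEMENTS = 9        # entries in a 3x3 matrix
INPUT_BITS = 8      # signed input word length
OUTPUT_BITS = 21    # output word length, taken from the stated output pin count

# Direct (non-multiplexed) data I/O: every element gets its own pins.
direct_pins = ELEMENTS * INPUT_BITS + ELEMENTS * OUTPUT_BITS

# Multiplexed I/O. The 8 + 4 + 3 breakdown is an assumption, see the lead-in.
input_pins = INPUT_BITS + 4 + 3     # data lines + select lines + enables
output_pins = OUTPUT_BITS
shared_pins = input_pins + output_pins

print(direct_pins, input_pins, output_pins, shared_pins)  # 261 15 21 36
```

Under these assumptions the direct scheme reproduces the 261 data pins mentioned earlier, and the multiplexed total of 36 agrees with the I/O pin count reported in Table I.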

D. The Complete Design

To control the operations of data input and data output, and to prevent improper results that may occur due to incompletely loaded inputs, separate enable signals are used to enable data input, data output and the processing of the given data. In addition, a signal is used to indicate the occurrence of a singular matrix; it goes high when the determinant is zero. This is how the mathematical model is transformed into a VHDL design [7] [11]. We will describe the simulation, synthesizability and hardware testing in the next sections.

The VHDL code is simulated using the Modelsim simulation tool, and synthesis results are obtained using Quartus II software for the Altera DE1 board's EP2C20F484C7 FPGA.

A. Simulation using Modelsim

The individual modules are simulated using a random choice of inputs, while the complete design is tested using well-defined input forms: the identity matrix, a singular matrix and a non-singular matrix. We present here some screenshots of the simulation waveforms of the various modules separately and of the complete design. The division module, the de-multiplexer module and the multiplexer module results are shown in Fig. 1, 2 and 3 respectively.

Fig. 1. The Division Module

Fig. 3. The Multiplexer module

To account for the major cases, three test inputs are considered for testing the complete design: i) an identity matrix, ii) a singular matrix and iii) a non-singular matrix. The final results are shown in Fig. 4.

Fig. 4. Simulation results of final module
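Simulation outputs for test cases of this kind could be cross-checked against a software reference model. The Python sketch below is one possible model and not the authors' code. `divide_fixed` follows the division steps i) to xi), under three assumptions: the scaled remainder becomes the next dividend, the integer field is 7 bits wide (chosen to match the 1.250 example), and a zero dividend gives a positive sign. `adjoint_3x3` uses the cofactor formulas of the Adjoint method. All function names are hypothetical.

```python
def divide_fixed(dividend, divisor, digits=3, int_bits=7):
    """Return 'sign integer-bits BCD-digits' for dividend / divisor."""
    negative = dividend != 0 and (dividend < 0) != (divisor < 0)
    n, d = abs(dividend), abs(divisor)          # step i
    if d == 0:                                  # step ii
        d = 1
    quotients = []
    for _ in range(digits + 1):                 # steps iii to ix
        quotients.append(n // d)                # quotient fq_r
        n = (n % d) * 10                        # remainder p_r, scaled by 10
    if quotients[0] >= 1 << int_bits:
        raise OverflowError("integer part does not fit the assumed field width")
    fields = ["1" if negative else "0", format(quotients[0], f"0{int_bits}b")]
    fields += [format(q, "04b") for q in quotients[1:]]   # one BCD digit each
    return " ".join(fields)                     # steps x and xi


def adjoint_3x3(m):
    """Return (adjoint matrix, determinant, singular flag) for a 3x3 integer matrix."""
    (a, b, c), (d, e, f), (g, h, i) = m
    A, B, C = e * i - f * h, f * g - d * i, d * h - e * g
    D, E, F = c * h - b * i, a * i - c * g, g * b - a * h
    G, H, K = b * f - c * e, c * d - a * f, a * e - b * d
    det = a * A + b * B + c * C
    return [[A, D, G], [B, E, H], [C, F, K]], det, det == 0


def invert(m):
    """Formatted inverse, or None when the singular flag is raised."""
    adj, det, is_singular = adjoint_3x3(m)
    if is_singular:
        return None
    return [[divide_fixed(x, det) for x in row] for row in adj]


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
singular_case = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
regular = [[2, 0, 0], [0, 4, 0], [0, 0, 8]]   # inverse is diag(0.5, 0.25, 0.125)

print(divide_fixed(10, 8))        # 0 0000001 0010 0101 0000 (the 1.250 example)
print(invert(identity)[0][0])     # 0 0000001 0000 0000 0000
print(invert(singular_case))      # None
print(invert(regular)[2][2])      # 0 0000000 0001 0010 0101
```

The model truncates rather than rounds the last fraction digit, which is what repeated integer division produces; whether the hardware does the same would need to be confirmed against the waveforms.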

Fig. 2. The De-multiplexer module

B. Synthesis Results using Quartus-II

Not every design that passes simulation is feasible for hardware realization. To test whether the design can be realised in hardware, we need to test for synthesizability. Altera's Quartus-II is one such tool that can be used to test the synthesizability of a design [11] [12]. The synthesis results are presented below for the 8-bit (integral input data) design. Table I shows the compilation report along with the resource utilization summary for the Altera DE1 board's EP2C20F484C7 FPGA [13] [14].


TABLE I. Compilation with Utility Summary Report

Resources               Utility
Logic elements          5,501 / 18,752 (29%)
Logic registers         80 / 18,752 (<1%)
I/O Pins                36 / 315 (11%)
Embedded Multipliers    24 / 52 (46%)

C. Implementation and Verification of Design in Altera DE-1 Board

Though the on-board SRAM can be used to store the data, to read input from and to write output to, toggle switches, LEDs and push buttons are used to make the verification process easier. To implement the design with these alone, the original design is reduced to a new form by changing only the data length, keeping all other parameters as they are.

The new design uses a data length of 4 bits for input and 13 bits for output. The user interface is defined as follows:
i) 4 toggle switches as data inputs: d3(SW3) d2(SW2) d1(SW1) d0(SW0)
ii) 4 toggle switches as select lines: s3(SW7) s2(SW6) s1(SW5) s0(SW4)
iii) 2 toggle switches as I/O enables: ie(SW9) oe(SW8)
iv) 1 push button as process enable: pe(KEY[0])
v) 14 LEDs as output display: 6 Green LEDs (G5 downto G0) and 8 Red LEDs (R9 downto R2)

G3 to G0 represent the 4 bits of the value to the left of the decimal point; R9 to R2 represent the value to the right of the decimal point; G4 represents the sign bit; and G5 represents the singular matrix indicator. A set of 3 test inputs is fed as input and the corresponding outputs are verified.

V. MAJOR LIMITATIONS AND CONCLUSIONS

The Adjoint method of matrix inversion is a very efficient method for lower-order matrices. As the order of the matrix increases, the computational requirements and time increase considerably, and hence the method is not recommended for higher-order matrices. The present design can take only integral data input in signed 2's complement form, whereas the output is defined in another format; hence the property $(A^{-1})^{-1} = A$ cannot be verified, in the sense that this design can operate only on integral data.

The output format may have to be changed to other required formats, and this may lead to the use of additional hardware. As the data length of the input increases, the data length of the output increases by the same number of bits, i.e., if n bits are added at the input there should be an increase of n bits at the output also.

The present study is an approach to designing hardware for matrix inversion and can serve that purpose to a certain extent. However, it provides a good example of designing dedicated hardware for such operations, which are used in many signal processing algorithms as well as in both adaptive and conventional communication systems. The inverse of a matrix is particularly useful in applications such as adaptive signal processing, and more specifically in Wiener filters, in which the matrix inversion consumes a large share of the computational work.

ACKNOWLEDGEMENTS

This project study was done under the guidance of Prof. Nagarajan Raju, Dept. of E.C.E., School of Electrical and Electronics Engineering, SASTRA University, and all the software and hardware requirements were made available in the Altera Lab of the same Department.

REFERENCES

[1] Dr. B. S. Grewal, "Higher Engineering Mathematics", 4th Edition, Khanna Publishers, India, 2007.
[2] Alan Jeffrey, "Advanced Engineering Mathematics", Harcourt Press, USA, 2005.
[3] Erwin Kreyszig, "Advanced Engineering Mathematics", 9th Edition, Wiley India, 2011.
[4] Gary K. Yeap, "Practical Low Power Digital VLSI Design", Kluwer Academic Publishers, 1998.
[5] Robert F. Pierret, "Semiconductor Device Fundamentals", Pearson Education, 2008.
[6] Neil Weste, "CMOS VLSI Design: A Circuits and Systems Perspective", 2nd Edition, Addison-Wesley, 2004.
[7] John P. Uyemura, "Introduction to VLSI Circuits and Systems", John Wiley & Sons, 2002.
[8] J. Bhasker, "A VHDL Primer", 3rd Edition, B.S. Publications, India, 2010.
[9] Volnei A. Pedroni, "Digital Electronics and Design with VHDL", Morgan Kaufmann Publishers, 2008.
[10] Randall L. Geiger, Phillip E. Allen and Noel R. Strader, "VLSI Design Techniques for Analog and Digital Circuits", Tata McGraw-Hill, 2010.
[11] V. Elamaran, Kalagarla Abhiram and Narredi Bhanu Prakash Reddy, "FPGA Implementation of Audio Enhancement using Xilinx System Generator", Journal of Applied Sciences, vol. 14, no. 17, pp. 1972-1977, 2014.
[12] Stephen Brown and Zvonko Vranesic, "Fundamentals of Digital Logic with VHDL Design", TMH, 2002.
[13] V. Elamaran, R. Vaishnavi, A. Maxel Rozario, Slitta Maria Joseph and Aylwin Cherian, "CIC for Decimation and Interpolation using Xilinx System Generator", Proc. of the International Conference on Communication and Signal Processing (ICCSP), 2013, pp. 622-626.
[14] V. Elamaran, Kalagarla Abhiram and Narredi Bhanu Prakash Reddy, "FPGA Implementation of Audio Enhancement using Xilinx System Generator", Journal of Applied Sciences, vol. 14, no. 17, pp. 1972-1977, 2014.
