
Universal Journal of Electrical and Electronic Engineering 6(1): 1-13, 2019 http://www.hrpub.org
DOI: 10.13189/ujeee.2019.060101

An Optimization Design Strategy for Arithmetic Logic Unit
Jitesh R. Shinde 1,*, Shilpa J. Shinde 2

1 Department of Electronics and Communication Engineering, Vaagdevi College of Engineering, India
2 M.Tech in Electronics Engineering, Nagpur, India

Copyright©2019 by authors, all rights reserved. Authors agree that this article remains permanently open access under
the terms of the Creative Commons Attribution License 4.0 International License

Abstract The work in this paper presents a step-by-step optimization approach for the Arithmetic Logic Unit (ALU) at the logic circuit level. The concept of resource sharing (viz. operator sharing, functionality sharing) and the concept of optimized arithmetic expressions (viz. arranging expression trees for minimum delay, sharing common subexpressions, merging cascaded adders with carry) have been used to optimize the combinational blocks in the ALU. The work also shows how simple tools like Deeds Digital Circuit Simulator (open source) or Aldec’s Active HDL, in combination with a synthesis tool, can be used as an effective teaching resource for the concepts of digital circuit design, and thereby gives beginners a vision of how to start a project in the VLSI digital domain and bring it to a successful end.

Keywords Arithmetic Unit, Logical Unit, Arithmetic Logical Unit (ALU), Resource Sharing, Operator Sharing

1. Introduction

An arithmetic logic unit (ALU) is a combination of various digital circuits merged together to execute data processing instructions (i.e. arithmetic & logical) in the central processing unit (CPU) of any processor, microcontroller or computer. An ALU is basically a multi-function combinational digital logic circuit which requires one or two operands upon which it operates and produces the result.

The timing response of an ALU depends on the complexity of the circuit & the manner in which it is designed. There are various ways by which a circuit can be implemented in HDL (Hardware Description Language). Two such ways are resource sharing & optimized arithmetic expressions.

Resource sharing diminishes the amount of hardware required to implement HDL operations. Without resource sharing, each HDL operation is built with separate circuitry. For example, every ‘+’ with noncomputable operands causes a new adder to be built. This repetition of hardware increases the area of the design. In contrast, with resource sharing, several VHDL ‘+’ operations can be implemented with a single adder to reduce the amount of hardware required. Also, different operations such as ‘+’ and ‘–’ can be assigned to a single adder or subtracter to reduce a design’s circuit area further [11, 12]. The resource sharing techniques used in the optimization of the ALU in this paper are operator sharing & functionality sharing.

Operator sharing is a resource sharing technique that reduces the overall size of the synthesized hardware. If the same operator is used in several different expressions, it can be shared. The sharing is done by routing the proper data to or from this particular operator via multiplexing circuits. Operators can be shared in mutually exclusive branches by proper routing of the input operands and/or result. Sharing is more beneficial for complex operators. The merit of sharing and the degree of saving depend on the relative complexity of the multiplexing circuit and the operator. However, sharing normally forces the Boolean expressions and the operators to be evaluated in cascade, and this may introduce extra propagation delay.

Functionality sharing is also a resource sharing technique. In a large, complex digital system, such as a processor, an array of functions is needed. Some functions or operations may be interlinked or may have some common functionality. If such common functions are implemented by a common circuit, the approach is referred to as functionality sharing. An example of functionality sharing is the implementation of a subtractor block using an adder block with the concept of 2’s complement arithmetic [1, 2].

The optimized arithmetic expressions methodology uses the properties of arithmetic operators (such as the commutative & associative properties of addition) to rearrange an expression so that it results in an optimized implementation. The three forms of arithmetic optimization are arranging expression trees for minimum delay, merging cascaded adders with a carry, and sharing common sub-expressions [1].

Since the ALU is a simple entity for any beginner working in the VLSI domain to understand, it has been selected as a case study in the work presented in this paper.

2. ALU Design & Implementation: Type I

Before starting with any circuit design we have to think about the formal specification of the circuit. In other words, how a general or traditionally designed ALU in a CPU works, what functions or instructions it executes, how many inputs and outputs it should have & what basic building blocks are needed in the circuit to execute the given instructions [2]. The basics of ALU design are available in the literature mentioned in the references [3, 4, 5, 6, 7 & 8].

Basically, an ALU should be capable of executing the arithmetic & logical functions listed in tables 2.1 & 2.2.

Table 2.1 suggests that one logical block having some logical gates (four) with a 4:1 multiplexer will be required to implement the logical operations AND, OR, XOR and NOT in the ALU to get the output of the logical block.

Table 2.2 suggests that one arithmetic block consisting of 8 instances of a one-bit full adder block with two 8:1 multiplexers will be required to implement the operations Addition, Subtraction, Increment, Decrement and Passing of Input signal, and thereby get the output of the arithmetic block. The input logic for the carry-in ‘Cin’ & ‘B’ input terminals of each full adder is decided as per table 2.2. Depending on the input combination available on the select lines (i.e. S2, S1 & S0) of the 8:1 multiplexers, only the selected full adder outputs, viz. sum & carry-out, will be available at the outputs of the multiplexers. For example, if S2S1S0Cin = “0000”, then the ALU will execute the transfer ‘A’ operation (table 2.2). Herein the first full adder output (leftmost full adder block) will be available at the output ‘Y’ (figure 2.1). For A = ‘0’ the output ‘Y’ = ‘0’ and for A = ‘1’ the output ‘Y’ = ‘1’.

In addition to this, one 2:1 multiplexer will also be required to decide, at a given instance of time, whether the ALU is implementing an arithmetic or a logical function. If the select bit ‘Si’ (S3 in figure 2.1) of this multiplexer equals ‘1’, then the ALU will perform a logical operation; otherwise it will perform an arithmetic operation.

Thus, the combinational digital blocks needed to implement an ALU in the traditional approach are two 8:1 multiplexers, one 2:1 multiplexer, 8 one-bit full adder blocks, 4 two-input logic gates & two NOT gates, along with some constants (logic high or logic low), to get the final output ‘Y’ of the one-bit ALU.

The logical circuit diagram was implemented in the Deeds circuit simulator & using Aldec’s Active HDL tool with Altera’s (Intel) Quartus synthesis tool.

Table 2.1. Listing logical operations performed by ALU

Function Select       Output         Operation
S1   S0   Cin         Y
0    0    X           Y=A OR B       OR
0    1    X           Y=A AND B      AND
1    0    X           Y=A XOR B      XOR
1    1    X           Y=A’           NOT
Note: A’ indicates Abar & Cin = X i.e. don’t care

Table 2.2. Listing arithmetic functions performed by ALU

Function Select     Inputs to Adder     Output          Operation
S2   S1   S0        Cin   B             Y
0    0    0         0     0             Y=A             Transfer A
0    0    1         1     0             Y=A+1           Increment A
0    1    0         0     B             Y=A+B           Add B to A
0    1    1         1     B             Y=A+B+1         Add B to A plus 1
1    0    0         0     B’            Y=A+B’          Add 1’s complement of B to A
1    0    1         1     B’            Y=A+B’+1        Add 2’s complement of B to A
1    1    0         0     1             Y=A-1= A’       Decrement A
1    1    1         1     1             Y=A             Transfer A
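
The Type I structure described above can be sketched behaviourally in Python. This is only an illustrative model under stated assumptions, not the authors' Deeds or HDL design: the function names `full_adder` and `alu_type1` are hypothetical, the select line S3 chooses between the logical and arithmetic blocks as described in the text, and the carry-out returned in logical mode is assumed to be 0 because the paper does not specify it.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def alu_type1(s3, s2, s1, s0, a, b):
    """Hypothetical model of the Type I 1-bit ALU; returns (Y, carry_out)."""
    nb = b ^ 1  # 1-bit complement of B
    # Eight separate full adders, one per row of Table 2.2: (A, B input, Cin).
    adders = [
        full_adder(a, 0, 0),    # 000: transfer A
        full_adder(a, 0, 1),    # 001: increment A
        full_adder(a, b, 0),    # 010: add B to A
        full_adder(a, b, 1),    # 011: add B to A plus 1
        full_adder(a, nb, 0),   # 100: add 1's complement of B
        full_adder(a, nb, 1),   # 101: add 2's complement of B
        full_adder(a, 1, 0),    # 110: decrement A
        full_adder(a, 1, 1),    # 111: transfer A
    ]
    # The two 8:1 multiplexers pick the sum and carry of the selected adder.
    arith = adders[(s2 << 2) | (s1 << 1) | s0]
    # Four gates and a 4:1 multiplexer, ordered as in Table 2.1.
    logic = [a | b, a & b, a ^ b, a ^ 1][(s1 << 1) | s0]
    if s3 == 1:
        return logic, 0  # assumption: no carry-out in logical mode
    return arith


if __name__ == "__main__":
    for sel in range(8):
        s2, s1, s0 = (sel >> 2) & 1, (sel >> 1) & 1, sel & 1
        print(f"S2S1S0={sel:03b}", [alu_type1(0, s2, s1, s0, a, 1) for a in (0, 1)])
```

In this model all eight adders are evaluated for every input and seven of the results are discarded, which is the hardware repetition that resource sharing is meant to remove.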

Figure 2.1. Circuit schematic of 1-bit ALU (Deeds DCS)

Figure 2.2. Flow summary of entity ‘alugen’



Figure 2.3. Propagation delay summary of entity ‘alugen’ (delay in nsec)

Figure 2.4. Power consumption of entity ‘alugen’

Figure 2.5. Output waveform of entity ‘alugen’



3. ALU Design & Implementation: Type II

In this approach, the concept of resource sharing, specifically operator sharing, has been used to optimize the ALU. Only one full adder block is used to implement the arithmetic operations given in table 2.2. The input logic for the carry-in terminal of the full adder (as per table 2.2) is decided by an 8:1 multiplexer; for example, for S2S1S0 = “000”, the carry-in of the full adder block is connected to logic ‘0’. Similarly, the input logic for input terminal ‘B’ of the full adder (as per table 2.2) is decided by an 8:1 multiplexer; for example, for S2S1S0 = “000”, input terminal ‘B’ of the full adder block is connected to logic ‘0’. The rest of the working of the circuit is the same as discussed in the previous section.
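
As a rough illustration of operator sharing, the sketch below models the arithmetic block both ways: with eight replicated full adders followed by an output multiplexer, and with a single full adder whose ‘B’ and carry-in terminals are fed by 8:1 multiplexers. It then checks exhaustively that the two agree. The function names are hypothetical and the model is behavioural only; it assumes the (B input, Cin) values listed in table 2.2 and says nothing about the area or delay of the real circuits.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def mux8(inputs, s2, s1, s0):
    """8:1 multiplexer: selects inputs[S2S1S0]."""
    return inputs[(s2 << 2) | (s1 << 1) | s0]


def arith_replicated(s2, s1, s0, a, b):
    """Eight adders computed in parallel, result multiplexed at the output."""
    nb = b ^ 1
    rows = [(0, 0), (0, 1), (b, 0), (b, 1), (nb, 0), (nb, 1), (1, 0), (1, 1)]
    outputs = [full_adder(a, b_in, c_in) for b_in, c_in in rows]
    return mux8(outputs, s2, s1, s0)


def arith_shared(s2, s1, s0, a, b):
    """One shared adder; the multiplexers route its B and Cin inputs."""
    nb = b ^ 1
    b_in = mux8([0, 0, b, b, nb, nb, 1, 1], s2, s1, s0)
    c_in = mux8([0, 1, 0, 1, 0, 1, 0, 1], s2, s1, s0)
    return full_adder(a, b_in, c_in)


def shared_adder_matches_replicated():
    """Exhaustive check over all select and operand combinations."""
    bits = (0, 1)
    return all(
        arith_shared(s2, s1, s0, a, b) == arith_replicated(s2, s1, s0, a, b)
        for s2 in bits for s1 in bits for s0 in bits for a in bits for b in bits
    )


if __name__ == "__main__":
    print("shared == replicated:", shared_adder_matches_replicated())
```

The row S2S1S0 = “101” also illustrates functionality sharing: the same adder performs subtraction by adding the complement of B with a carry-in of 1.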

Figure 3.1. Circuit schematic of entity ‘alugenopt’

Figure 3.2. Flow summary of entity ‘alugenopt’



Figure 3.3. Propagation delay summary of entity ‘alugenopt’ (delay in nsec)

Figure 3.4. Power consumption summary of entity ‘alugenopt’

Figure 3.5. Output waveform of entity ‘alugenopt’

4. ALU Design & Implementation: Type III

The general ALU architecture that can be inferred from section 2 is given in figure 4.1.

The strategy used in type III ALU optimization is based on the concept of resource sharing and the concept of optimized arithmetic expressions.

In the type III strategy, all arithmetic as well as logical instructions are realized using one full adder block only.

The approach used in the realization of the final optimized logic circuit shown in figure 4.6 is illustrated with the help of figures 4.1 to 4.5 and tables 4.1 to 4.6.

Arithmetic Unit Realization

In table 4.1, input literal ‘X’ of the full adder block is kept fixed. Then, depending on the operations that the arithmetic block needs to execute, input literal ‘Y’ is decided. Finally, the SOP (sum of products) logical equation for literal ‘Y’ is realized in an optimized way using a 2:1 multiplexer (figure 4.4). The process is illustrated diagrammatically in figures 4.2, 4.3 and 4.4.

Logical Unit Realization

To find the equation for input literal ‘X’ of the full adder block, first the logic equation for literal ‘X’ for each gate is computed using the concept of the K-map, wherein the SOP (sum of products) equation for literal ‘X’ is computed as a function of the literals ‘A’ & ‘B’. This process is illustrated in tables 4.3 to 4.6, from which table 4.2 has been realized. Accordingly, the logical unit shown in figure 4.5 has been realized. The K-map is used here to find the logical equation for literal ‘X’ for the corresponding gates and then the final SOP (sum of products) equation for ‘X’, as shown in figure 4.5.

The final block diagram of an optimized 1-bit ALU is shown in figure 4.6.

The logical circuit diagram implemented in the Deeds DCS circuit simulator, and the waveform, flow summary, propagation delay report & power consumption analysis report obtained from Altera’s (Intel) Quartus tool, are shown in figures 4.7 to 4.11.

Figure 4.1. Block diagram of 1 bit ALU

Figure 4.2. Arithmetic unit logic in an optimized 1-bit ALU using 4:1 multiplexer

Figure 4.3. Arithmetic unit logic in an optimized 1-bit ALU using 2:1 multiplexer

Figure 4.4. Arithmetic unit logic in an optimized 1-bit ALU using 2:1 multiplexer

Table 4.1. Truth Table for Arithmetic Unit for an optimized 1-bit ALU

S2 S1 S0 Cin X Y Operation
0 0 0 0 A 0 Transfer A
0 0 0 1 A 0 Increment A
0 0 1 0 A B Add B to A
0 0 1 1 A B Add B to A plus 1
0 1 0 0 A B’ Add 1’s complement of B to A
0 1 0 1 A B’ Add 2’s complement of B to A
0 1 1 0 A 1 Decrement A
0 1 1 1 A 1 Transfer A
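
Reading the ‘Y’ column of table 4.1 as a function of S1, S0 and B gives Y = 0, B, B’, 1 for S1S0 = 00, 01, 10, 11. The sketch below checks that this 4:1 multiplexer form equals the sum-of-products Y = S0·B + S1·B’, which in turn equals a 2:1 multiplexer that uses B as its select line. This derivation is an assumption made from the table alone, since figures 4.2 to 4.4 are not reproduced here; the function names are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def y_mux4(s1, s0, b):
    """Y column of Table 4.1 as a 4:1 multiplexer on S1S0."""
    return [0, b, b ^ 1, 1][(s1 << 1) | s0]


def y_sop(s1, s0, b):
    """Assumed sum-of-products form: Y = S0.B + S1.B'."""
    return (s0 & b) | (s1 & (b ^ 1))


def y_mux2(s1, s0, b):
    """Assumed 2:1 multiplexer form: B selects S0 (B=1) or S1 (B=0)."""
    return s0 if b else s1


def arith_unit(s1, s0, cin, a, b):
    """Arithmetic unit of Table 4.1 (S2 = 0): X = A, Y from the 2:1 mux."""
    return full_adder(a, y_mux2(s1, s0, b), cin)


def y_forms_agree():
    bits = (0, 1)
    return all(
        y_mux4(s1, s0, b) == y_sop(s1, s0, b) == y_mux2(s1, s0, b)
        for s1 in bits for s0 in bits for b in bits
    )


if __name__ == "__main__":
    print("Y forms agree:", y_forms_agree())
```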

Table 4.2. Truth Table for Logical Unit

S2 S1 S0 Cin X Y Operation
1 0 0 x A+B 0 OR
1 0 1 x A’B B AND
1 1 0 x A’ B’ XOR
1 1 1 x B 1 NOT

Table 4.3. Truth Table for finding input ‘X’ for ORing operation i.e. X (A, B) = A +B

A B X Y=0 Operation
0 0 0 0 0
0 1 1 0 1
1 0 1 0 1
1 1 1 0 1

Table 4.4. Truth Table to find input ‘X’ for ANDing operation i.e. X (A, B) = A’ B

A B X Y=B Operation
0 0 0 0 0
0 1 1 1 0
1 0 0 0 0
1 1 0 1 1

Table 4.5. Truth Table to find input ‘X’ for XORing operation i.e. X (A, B) = A’

A B X Y=B’ Operation
0 0 1 1 0
0 1 1 0 1
1 0 0 1 1
1 1 0 0 0

Table 4.6. Truth Table to find input ‘X’ for NOT operation i.e. X (A, B) = B

A B X Y=1 Operation
0 0 0 1 1
0 1 1 1 0
1 0 0 1 1
1 1 1 1 0

Figure 4.5. An optimized logical block for 1-bit ALU



Figure 4.6. An optimized 1-bit ALU

Figure 4.7. Circuit schematic of 1 bit optimized ALU ‘aluopt’

Figure 4.8. Flow summary of 1 bit optimized ALU implemented on Intel /Altera DE0 (Quartus 2)

Figure 4.9. Propagation Delay Summary of 1 bit ALU implemented on Intel /Altera DE0 (Quartus 2)

Figure 4.10. Power consumption summary of 1 bit ALU implemented on Intel /Altera DE0 (Quartus 2)

Figure 4.11. Output waveform of 1bit optimized ALU



5. Optimized Approach Verification

The comparison of the flow summaries (figures 2.2, 3.2 & 4.8) indicates that the number of elements required in the implementation can be reduced in entity ‘aluopt’ in comparison to entities ‘alugen’ and ‘alugenopt’ if a higher-order ALU is being realized.

On observing the propagation delay reports of entity ‘alugen’ (figure 2.3), entity ‘alugenopt’ (figure 3.3) & entity ‘aluopt’ (figure 4.9), it was found that the propagation delay is considerably less in ‘aluopt’ in comparison to ‘alugen’ and ‘alugenopt’. The highest propagation delay in ‘alugen’ is 6.12 nsec, in ‘alugenopt’ it is 6.273 nsec, while in ‘aluopt’ it is 8.270 nsec.

The power analyzer summary comparison (figures 2.4, 3.4 & 4.10) shows that considerable power savings can be obtained in ‘aluopt’ in comparison to ‘alugen’. The power readings obtained here depend on the input test vector combination taken.

No benchmark work with a similar formal specification of a one-bit ALU is available. So, a comparison of the design strategy suggested in this paper with other work cannot be done.

Table 5.1. Power comparison of ALU (Power in milliWatt)

Entity Name                               ALUgen     ALUGen opt     Aluopt
Total Thermal Power Dissipation           105.42     101.46         103.99
Core Dynamic Thermal Power Dissipation    0.76       0.63           0.37
Core Static Thermal Power Dissipation     51.8       51.8           51.8
I/O Thermal Power Dissipation             52.85      49.04          51.82

6. Conclusions

The empirical measurements quantifying the gap between the VLSI implementations of ‘alugen’, ‘alugenopt’ & ‘aluopt’ have been presented in this paper.

In the ‘alugen’ entity no concept of resource sharing has been used. Hence, one full adder block is required for the execution of each arithmetic instruction in the ALU block. On the other hand, in entities ‘alugenopt’ and ‘aluopt’, because of the concept of resource sharing, i.e. operator sharing and functionality sharing, only one full adder block along with a few multiplexers is required for the implementation of the arithmetic instructions in the ALU.

Moreover, due to the use of the concept of optimized arithmetic expressions (viz. arranging expression trees for minimum delay, sharing common subexpressions, merging cascaded adders with carry) along with resource sharing, entity ‘aluopt’ is better in terms of area requirement compared to entity ‘alugenopt’.

On comparing the results obtained at the frontend VLSI design level, it is found that better savings in area or resource utilization, delay & power can be obtained through the approach suggested in this paper.

Further, the same approach, if used for implementing a higher-order ALU, can result in an efficient realization at the VLSI frontend level.

Further, optimization of the full adder block using a fast adder approach such as a carry look-ahead adder, carry select adder or carry skip adder may achieve better results & hence a better realization of the ALU, and hence of the Central Processing Unit (CPU), in any processor design [9, 10, 11].

Also, the ALU design approach suggested in this paper, if implemented at the VLSI backend level, may provide further optimized readings in terms of area, power & delay [8, 9 & 10].

Today, there are various tools like Aldec’s Active HDL available in the market & a few open-source ones like Deeds-DCS which diminish the need for a thorough knowledge of HDL (Hardware Description Language). All that is needed is a knowledge of digital circuit design.

The results and conclusions presented in this paper give a line of sight to beginners working in the VLSI domain about how to think about the formal specification of any VLSI-based project & accordingly what methodology & hence which tool should be selected for design & implementation.

Conflicts of Interest

The authors have no conflicts of interest to declare.

REFERENCES

[1] cseweb.ucsd.edu/~hepeng/cse143-w08/labs/VHDLReference/09.pdf.

[2] Sabih Gerez, “Algorithms for VLSI Design Automation”, John Wiley & Sons, Reprint 2000 edition; Clark T. Merkel, “A Matlab-Based Teaching Tool for Digital Logic”, Mechanical Engineering, Rose-Hulman Institute of Technology, Proceedings of the 2004 American Society for Engineering Education Annual Conference & Exposition, American Society for Engineering Education, 2004.

[3] A.P. Godse, D.A. Godse, “Digital Electronics”, Third Revised Edition, 2008.

[4] Greenfield, Joseph, “Practical Digital Design Using ICs”, pub. by J. Wiley & Sons.

[5] Jan M. Rabaey, “Digital Integrated Circuits”, Upper Saddle River, NJ: Prentice Hall, 1996.

[6] William Stallings, “Computer Design and Architecture”, Upper Saddle River, NJ: Prentice Hall, 1996.

[7] Douglas Perry, “VHDL Programming by Example”, Fourth Edition, Tata McGraw-Hill, Eighth Reprint, 2006.

[8] N.H. Weste, David Harris & A. Banerjee, “CMOS VLSI Design, A Circuit & System Perspective”, Third Edition, Pearson Education.

[9] N. Ravindran, R. Mary Lourde, “An Optimum VLSI Design of a 16-bit ALU”, IEEE International Conference on Information & Communication Technology Research (ICTRC), DOI: 10.1109/ICTRC.2015.7156419, Abu Dhabi, United Arab Emirates, July 2015.

[10] Prashant Gurjar, Rashmi Solanki & Pooja Kansliwal, “VLSI Implementation of adders for high speed ALU”, IEEE Annual India Conference Indicon, Hyderabad, India, DOI: 10.1109/INDCON.2011.6139396, December 2011.

[11] Sanjeev Sharma, “FPGA Implementation of 1-bit ALU”, International Conference on Information, Communication and Embedded System (ICICES), Chennai, India, 25-26 February 2012.
