project for m.tech with electrical & electronics

CSLA IMPLEMENTATION TECHNIQUE TO MINIMIZE THE AREA, POWER AND DELAY

A Project Report

submitted in partial fulfillment of the requirements for the award of the degree of

MASTER OF TECHNOLOGY

in

VLSI & EMBEDDED SYSTEMS

by

G.BHAGYA SRI (13MK1D6805)

under the esteemed guidance of

Prof. P.BALA MURALI KRISHNA

SRI MITTAPALLI INSTITUTE OF TECHNOLOGY FOR WOMEN

(Approved by AICTE, New Delhi & Affiliated to JNTU, Kakinada)

NH-5, TUMMALAPALEM, GUNTUR-522233, A.P.

2013-2015

CERTIFICATE

This is to certify that the project report entitled CSLA IMPLEMENTATION TECHNIQUE TO MINIMIZE THE AREA, POWER AND DELAY is being submitted by GUTTIKONDA BHAGYA SRI (13MK1D6805) in partial fulfillment of the requirements for the award of the degree of Master of Technology in VLSI & EMBEDDED SYSTEMS to Jawaharlal Nehru Technological University, Kakinada, during the year 2013-2015 at SRI MITTAPALLI INSTITUTE OF TECHNOLOGY FOR WOMEN, GUNTUR.

PROJECT GUIDE

Prof. P.Bala Murali Krishna

Professor

Department of ECE

SMITW

G. Suseelamma

Associate Professor

Department of ECE

SMITW

EXTERNAL EXAMINER

ACKNOWLEDGEMENT

I express my sincere thanks to M.B.V. Satyanarayana, Secretary and Correspondent of Sri Mittapalli Institute of Technology for Women, Guntur, for providing the facilities to carry out this project.

It is an honor to express my deep sense of gratitude to our principal and project guide, Prof. P. Bala Murali Krishna, Department of ECE, Sri Mittapalli Institute of Technology for Women, Guntur, for his valuable guidance, constant encouragement, and for every scientific and personal concern throughout the course of the investigation and the successful completion of this work.

I wish to extend my sincere thanks to G. Suseelamma, Head of the Department of ECE, Sri Mittapalli Institute of Technology for Women, Guntur, for her constant support and encouragement, and for enabling me to do a work of this magnitude.

My sincere thanks to the teaching and non-teaching staff members of the ECE Department of Sri Mittapalli Institute of Technology for Women, Guntur.

Lastly, I bow to my affectionate parents for their love and blessings, which have sustained me in completing this project work successfully.

BY

G. BHAGYA SRI

(13MK1D6805)

CONTENTS

ABSTRACT

LIST OF FIGURES

LIST OF TABLES

CHAPTER 1: INTRODUCTION

1.2 Objective

2.1 CMOS Technology

3.2 Operation

3.5.5 Multiplexer

4.1 Performance Evaluation

4.4 Applications

4.5 Advantages

5.1 Conclusion

REFERENCES

ABSTRACT

With the advancements in semiconductor technology, there has been an increased emphasis on low-power design techniques over the last few decades. Reversible computing has been proposed by several researchers as a possible alternative to address the energy dissipation problem. This project describes the design of the Mach-Zehnder Interferometer and reviews its applications in emerging optical communication networks. The Mach-Zehnder Interferometer is used to measure the relative phase shift between two collimated beams from a coherent light source. Using this basic principle, a number of devices have been designed; a few of these, such as optical sensors, all-optical switches, optical add-drop multiplexers and an implementation of the sum function, are discussed in this project.

LIST OF FIGURES

(b) The logic operations of the RCA shown in split form

Fig. 3.5 Structure of the BEC-based CSLA; n is the input operand bit-width

Fig. 3.6 (a) Proposed CS adder design, where n is the input operand bit-width

Fig. 3.8 A Carry Select Adder with 1 level using n/2-bit RCA

Fig. 3.10 Proposed SQRT-CSLA for n = 16. All intermediate and output

Fig. 4.1 (a) Simulation Waveform Result of 8-bit Ripple Carry Adder

Fig. 4.4 (a) Simulation Waveform Result of 16-bit Ripple Carry Adder

Fig. 4.7 (a) Simulation Waveform Result of 32-bit Ripple Carry Adder

LIST OF TABLES

CHAPTER 1

INTRODUCTION

This chapter introduces VLSI and presents the objective, the existing system, the proposed system and the project outline.

VLSI Design presents state-of-the-art papers in VLSI design, computer aided design,

design analysis, design implementation, simulation and testing. Its scope also includes

papers that address technical trends, pressing issues, and educational aspects in VLSI

Design. The Journal provides a dynamic high quality international forum for original

papers and tutorials by academic, industrial, and other scholarly contributors in VLSI

Design.

The development of microelectronics spans a time shorter than the average human life expectancy, and yet it has seen as many as four generations. The early 1960s saw the low-density fabrication processes classified under Small Scale Integration (SSI), in which the transistor count was limited to about 10. This rapidly gave way to Medium Scale Integration (MSI) in the late 1960s, when around 100 transistors could be placed on a single chip. It was the time when the cost of research began to decline and private firms started entering the competition, in contrast to the earlier years when the main burden was borne by the military. Transistor-Transistor Logic (TTL), offering higher integration densities, outlasted other IC families like ECL and became the basis of the first integrated circuit revolution. It was the production of this family that gave impetus to semiconductor giants like Texas Instruments, Fairchild and National Semiconductor.

The early 1970s marked the growth of the transistor count to about 1000 per chip, called Large Scale Integration (LSI). By the mid-1980s, the transistor count on a single chip had already far exceeded 1000, and hence came the age of Very Large Scale Integration (VLSI). Though many improvements have been made and the transistor count is still rising, further generation names like ULSI are generally avoided. It was during this time that TTL lost the battle to the MOS family, owing to the same problems that had pushed vacuum tubes into obsolescence: power dissipation and the limit it imposed on the number of gates that could be placed on a single die. The second age of the integrated circuit revolution started with the introduction of the first microprocessor, the 4004, by Intel in 1971, followed by the 8080 in 1974. Today many companies such as Texas Instruments, Infineon, Alliance Semiconductors, Cadence, Synopsys, Celox Networks, Cisco, Micron Tech, National Semiconductor, ST Microelectronics, Qualcomm, Lucent, Mentor Graphics, Analog Devices, Intel, Philips, Motorola and many other firms are dedicated to the various fields in VLSI, such as Programmable Logic Devices, Hardware Description Languages, design tools and embedded systems.

The term VLSI is a holdover from the 1980s, part of a now outdated taxonomy of integration levels that was evidently influenced by the naming of frequency bands (HF, VHF and UHF). Sources disagree on what is measured (gates or transistors):

SSI: Small-Scale Integration (0 to 10^2)

MSI: Medium-Scale Integration (10^2 to 10^3)

LSI: Large-Scale Integration (10^3 to 10^5)

VLSI: Very Large-Scale Integration (10^5 to 10^7)

ULSI: Ultra Large-Scale Integration (>= 10^7)
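The ranges listed above can be expressed as a small lookup. The following Python sketch is only illustrative: it reads the bounds as powers of ten (10^2, 10^3, 10^5, 10^7), treats each lower bound as inclusive (an assumption, since sources disagree on the exact boundaries and on whether gates or transistors are counted), and the name integration_level is hypothetical.

```python
# Upper bounds (exclusive) of each class, read from the list above as powers of ten.
# Assumption: a count exactly on a boundary belongs to the higher class.
LEVELS = [
    (10**2, "SSI"),
    (10**3, "MSI"),
    (10**5, "LSI"),
    (10**7, "VLSI"),
]


def integration_level(device_count: int) -> str:
    """Return the integration class name for a given device count."""
    if device_count < 0:
        raise ValueError("device count must be non-negative")
    for upper_bound, name in LEVELS:
        if device_count < upper_bound:
            return name
    return "ULSI"  # 10^7 devices or more


if __name__ == "__main__":
    for count in (10, 500, 20_000, 1_000_000, 1_400_000_000):
        print(count, integration_level(count))
```

Under this reading, a device with around 1.4 billion transistors falls into the ULSI class, although in practice such chips are still simply called VLSI.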

VLSI Technology Inc. was a company which designed and manufactured custom

and semi-custom ICs. The company was based in Silicon Valley, with headquarters

at 1109 McKay Drive in San Jose, California. Along with LSI Logic, VLSI

Technology defined the leading edge of the application-specific integrated circuit

(ASIC) business, which accelerated the push of powerful embedded systems into

affordable products. The company was founded in 1979 by a trio from Fairchild

Semiconductor by way of Synertek - Jack Balletto, Dan Floyd, Gunnar Wetlesen and by Doug Fairbairn of Xerox PARC and Lambda (later VLSI Design) magazine.

Alfred J. Stein became the CEO of the company in 1982. Subsequently VLSI built

its first fab in San Jose; eventually a second fab was built in San Antonio, Texas.

VLSI had its initial public offering in 1983, and was listed on the stock market as

(NASDAQ: VLSI). The company was later acquired by Philips and survives to this

day as part of NXP Semiconductors.

The first semiconductor chips held two transistors each. Subsequent advances

added more and more transistors, and, as a consequence, more individual functions

or systems were integrated over time. The first integrated circuits held only a few

devices, perhaps as many as ten diodes, transistors, resistors and capacitors, making

it possible to fabricate one or more logic gates on a single device. Now known

retrospectively as small-scale integration (SSI), improvements in technique led to

devices with hundreds of logic gates, known as medium-scale integration (MSI).

Further improvements led to large scale integration (LSI), i.e. systems with at least a

thousand logic gates. Current technology has moved far past this mark and today's

microprocessors have many millions of gates and billions of individual transistors.

At one time, there was an effort to name and calibrate various levels of large-scale

integration above VLSI. Terms like ultra-large-scale integration (ULSI) were used.

But the huge number of gates and transistors available on common devices has

rendered such fine distinctions moot. Terms suggesting greater than VLSI levels of

integration are no longer in widespread use.

As of early 2008, billion transistor processors are commercially available. This is

expected to become more commonplace as semiconductor fabrication moves from

the current generation of 65nm processes to the next 45nm generations (while

experiencing new challenges such as increased variation across process corners). A

notable example is Nvidia's 280 series GPU. This GPU is unique in the fact that

almost all of its 1.4 billion transistors are used for logic, in contrast to the Itanium,

whose large transistor count is largely due to its 24MB L3 cache. Current designs, as

opposed to the earliest devices, use extensive design automation and automated logic

synthesis to lay out the transistors, enabling higher levels of complexity in the

resulting logic functionality. Certain high performance logic blocks like the SRAM

(Static Random Access Memory) cell, however, are still designed by hand to ensure

the highest efficiency (sometimes by bending or breaking established design rules to

obtain the last bit of performance by trading stability).

What is VLSI?

VLSI stands for "Very Large Scale Integration". This is the field which involves

packing more and more logic devices into smaller and smaller areas.

Put simply, an integrated circuit is many transistors on one chip.

It involves the design and manufacturing of extremely small, complex circuitry using modified semiconductor material.

An integrated circuit (IC) may contain millions of transistors, each a few µm in size.

Why VLSI?

Integration improves the design: lower parasitics mean higher speed and lower power consumption, and the resulting circuit is physically smaller. Integration also reduces manufacturing cost, since (almost) no manual assembly is needed. The course will cover basic theory and techniques of digital VLSI design in CMOS technology. Topics include: CMOS devices and circuits, fabrication processes, static and dynamic logic structures, chip layout, simulation and testing, low power techniques, design tools and methodologies, and VLSI architecture.

We use full custom techniques to design basic cells and regular structures such as

data path and memory. There is an emphasis on modern design issues in

interconnect and clocking. We will also use several case studies to explore recent

real world VLSI designs (e.g. Pentium, Alpha, PowerPC, StrongARM, etc.) and

papers from the recent research literature. On-campus students will design small test

circuits using various CAD tools. Circuits will be verified and analyzed for

performance with various simulators. Some final project designs will be fabricated

and returned to students the following semester for testing.

Very-large-scale-integration (VLSI) is the process of creating integrated circuits

by combining thousands of transistor based circuits into a single chip. VLSI began in

the 1970s when complex semiconductor and communication technologies were

being developed. The microprocessor is a VLSI device. The term is no longer as

common as it once was, as chips have increased in complexity into the hundreds of

millions of transistors.

Applications of VLSI

I. Electronic system in cars.

II. Digital electronics control VCRs.

III. Transaction processing system, ATM.

IV. Personal computers and Workstations.

V. Medical electronic systems.

I. Electronic systems now perform a wide variety of tasks in daily life. Electronic

systems in some cases have replaced mechanisms that operated mechanically,

hydraulically, or by other means; electronics are usually smaller, more flexible, and

easier to service. In other cases, electronic systems have created totally new

applications. Electronic systems perform a variety of tasks, some of them visible,

some more hidden: Personal entertainment systems such as portable MP3 players

and DVD players perform sophisticated algorithms with remarkably little energy.

Electronic systems in cars operate stereo systems and displays; they also control fuel

injection systems, adjust suspensions to varying terrain, and perform the control

functions required for anti-lock braking systems.

II. Digital electronics compress and decompress video, even at high definition data

rates, on the fly in consumer electronics. Low cost terminals for Web browsing still

require sophisticated electronics, despite their dedicated function.

III. Personal computers and workstations provide word-processing, financial

analysis, and games. Computers include both central processing units and special-purpose hardware for disk access, faster screen display, etc.

IV. Medical electronic systems measure bodily functions and perform complex

processing algorithms to warn about unusual conditions. The availability of these

complex systems, far from overwhelming consumers, only creates demand for even

more complex systems.

The growing sophistication of applications continually pushes the design and manufacturing of integrated circuits and electronic systems to new levels of complexity. Perhaps the most amazing characteristic of this collection of systems is its variety: as systems become more complex, we build not a few general-purpose computers but an ever wider range of special-purpose systems. Our ability to do so is a testament to our growing mastery of both integrated circuit manufacturing and design, but the increasing demands of customers continue to test the limits of design and manufacturing.

1.2 Objective

The main objective of this study is to identify redundant logic operations and data dependencies so as to provide a parallel path for carry propagation, which helps to reduce the overall adder delay. The CSLA has two units: 1) the sum and carry generator (SCG) unit and 2) the sum and carry selection unit. Accordingly, we remove all redundant logic operations and sequence the remaining logic operations based on their data dependency.

In digital adders, the speed of addition is limited by the time required to propagate a carry through the adder. The sum for each bit position in an elementary adder is generated sequentially, only after the previous bit position has been summed and a carry propagated into the next position. Earlier, the carry look-ahead adder was used to overcome this delay; it produces all the carries at the same time but requires more circuitry. It was later replaced by carry select adders using dual RCAs.

The Ripple Carry Adder (RCA) provides the most compact design but takes the longest computing time. For an N-bit RCA, the delay is linearly proportional to N, so for large values of N the RCA has the highest delay of all adders. The Carry Look-Ahead Adder (CLA) gives fast results but consumes a large area. For a higher number of bits, the CLA gives a higher delay than other adders due to the large fan-in and the large number of logic gates. The Carry Select Adder (CSA) provides a compromise between the small-area but longer-delay RCA and the large-area but shorter-delay CLA. In the rapidly growing mobile industry, faster units are not the only concern; smaller area and lower power have also become major concerns in the design of digital circuits.
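The linear delay of the ripple carry adder can be seen in a bit-level behavioural model. The Python sketch below is an illustration only, not the hardware description used in this project; the function names are hypothetical, and the returned stage count is a simple stand-in for the carry-chain length, not a measured delay.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ cin
    carry_out = (a & b) | (cin & (a ^ b))
    return sum_bit, carry_out


def ripple_carry_add(x, y, n, cin=0):
    """Add two n-bit integers one bit at a time.

    Returns (sum, carry_out, stages), where stages counts the full adders
    the carry has to pass through in sequence.
    """
    carry = cin
    total = 0
    stages = 0
    for i in range(n):
        a = (x >> i) & 1
        b = (y >> i) & 1
        # Bit i cannot be finalized before the carry from bit i-1 is known.
        sum_bit, carry = full_adder(a, b, carry)
        total |= sum_bit << i
        stages += 1
    return total, carry, stages


if __name__ == "__main__":
    print(ripple_carry_add(0b1011, 0b0110, 4))
    print(ripple_carry_add(255, 1, 8))
```

The loop body depends on the carry from the previous iteration, so the number of sequential stages equals the bit-width n, which is the behaviour the paragraph above describes.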

In this technique, a single ripple carry adder is used instead of dual ripple carry adders to improve the area, power and delay. A carry-select adder can be implemented using a single ripple carry adder and an add-one circuit instead of dual ripple-carry adders. This approach uses a new add-one circuit, based on a first-zero finding circuit and multiplexers, to reduce the area and power with no speed penalty. For bit length n = 64, this new carry-select adder requires 38 percent fewer transistors than the dual ripple-carry carry-select adder and 29 percent fewer transistors than Chang's carry-select adder using a single ripple carry adder.
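The idea behind an add-one circuit built on first-zero detection can be sketched behaviourally: adding 1 to a binary word inverts every bit from the least significant bit up to and including the first 0, and leaves the higher bits unchanged. The Python sketch below illustrates only this principle; it is not the transistor-level circuit from the cited work, the function names are hypothetical, and ordinary integer addition stands in for the single ripple carry adder.

```python
def add_one_first_zero(s, n):
    """Return ((s + 1) mod 2**n, carry_out) by flipping bits up to the first zero."""
    result = s
    for i in range(n):
        result ^= 1 << i              # invert bit i
        if (s >> i) & 1 == 0:         # bit i of the input was the first zero
            return result, 0
    return result, 1                  # input was all ones: result is 0, carry out is 1


def select_block(x, y, n, cin):
    """One carry-select block built from a single adder plus the add-one step."""
    mask = (1 << n) - 1
    raw = x + y                       # stand-in for the single RCA with Cin = 0
    sum0, carry0 = raw & mask, raw >> n
    sum1, inc_carry = add_one_first_zero(sum0, n)
    carry1 = carry0 | inc_carry       # the two carries can never both be 1
    # The multiplexer picks the pre-computed result once the real carry-in is known.
    return (sum1, carry1) if cin else (sum0, carry0)


if __name__ == "__main__":
    print(add_one_first_zero(0b0111, 4))
    print(select_block(0b1001, 0b0110, 4, 1))
```

The Cin = 1 result is derived from the Cin = 0 result without a second adder, which is where the transistor saving described above comes from.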

The project report is organized into 5 chapters, namely Introduction, Literature Review, Proposed Concept, Simulation and Synthesis Result Analysis, and Conclusion and Future Scope. Chapter 2 contains the details of the introduction to VLSI and the literature review. Chapter 3 describes the logic formulation, the proposed adder design and the analysis of adders. Chapter 4 explains the simulation and synthesis results. Chapter 5 presents the conclusion and the scope for enhancing the project in the future.

CHAPTER 2

LITERATURE REVIEW

Adders are of fundamental importance in a wide variety of digital systems. Several types of fast adders exist, but fast addition with low area and power is still challenging. In digital adders, the speed of addition is limited by the time required to propagate a carry through the adder, so the CSLA is used in many computational systems to alleviate the problem of carry propagation delay. Many papers have been published on this topic, with several examples of such adders and many efficient implementations.

A number of modifications have been suggested by researchers to improve the performance of the carry select adder. Reference [1] proposes a logic formulation for the CSLA that removes all redundant logic operations from the conventional CSLA design. In this design, the carry select (CS) operation is scheduled before the calculation of the final sum. Reference [3] presents various CSLA architectures and analyzes them for speed and area. A power-area efficient gate-level modified design is implemented in [15, 4, 8] by minimizing the logic operations in comparison with the conventional CSLA design. An analysis of the 16-bit conventional CSLA and the Binary to Excess-1 Converter (BEC) CSLA is presented in [7], and a D-latch based CSLA architecture is proposed in this project. An area-delay optimized architecture of 16-bit, 32-bit and 64-bit CSLA is proposed and analyzed in [5, 6]. Reference [16] presents the simulation and performance evaluation of a 16-bit modified architecture of the Square-Root CSLA (SQRT-CSLA). An area-delay-power based simulation of a redundant-logic-optimized modified CSLA design with respect to the conventional CSLA design is shown in [9, 10, 11, 12]. A modified design for 16-bit, 32-bit and 64-bit CSLA that does not use a multiplexer architecture is proposed in [19]; that paper also gives a comparative analysis of the proposed architecture against the conventional architecture. A logic converter unit (LCU) based modified adder architecture is proposed in [20] for optimized area-delay-power parameters. The modified architectures find applications in high-performance VLSI system architectures in the development of modern electronic devices and gadgets. An efficient adder architecture essentially improves the overall performance of complex systems.

The different sections of the proposed work are arranged as follows: Section II presents the architecture of the 64-bit CSLA and the design of its building blocks using gate-level logic. Section III presents the simulation and synthesis results, and also shows the comparative analysis of the design for dynamic power consumption on different FPGAs. Section IV presents the conclusion based on the present design simulation analysis. Finally, this paper concludes with the acknowledgement and the references.

In 1962, O.J. Bedrij [1] described the extremely fast digital adder with sum

selection and multiple-radix carry. He compared the amount of hardware and the

logical delay for a 100-bit ripple-carry adder and a carry-select adder. The problem

of carry-propagation delay was overcome by independently generating multiple-radix carries and using these carries to select between simultaneously generated

sums. In this adder system, the addend and augend were divided into subaddend and

subaugend sections that were added twice to produce two sub sums. One addition

was done with a carry digit forced into each section, and the other addition

combined the operands without the forced carry digit. The selection of the correct

sub sum from each of the adder sections depended upon whether or not there

actually was a carry into that adder section.
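The scheme described above, in which each section is added twice and the correct sub-sum is then selected, can be modelled behaviourally. The Python sketch below is an illustration under simplifying assumptions: fixed-size sections of 4 bits by default (the original adder used its own sectioning), integer addition in place of the section adders, and hypothetical function and variable names.

```python
def carry_select_add(x, y, n, block=4, cin=0):
    """Add two n-bit integers section by section, carry-select style.

    Returns (sum, carry_out).
    """
    total = 0
    carry = cin
    for lo in range(0, n, block):
        width = min(block, n - lo)            # the last section may be narrower
        mask = (1 << width) - 1
        sub_a = (x >> lo) & mask              # sub-addend
        sub_b = (y >> lo) & mask              # sub-augend
        # Both candidate sub-sums depend only on the operands, so in hardware
        # they are produced in parallel for every section.
        sum_no_carry = sub_a + sub_b
        sum_forced_carry = sub_a + sub_b + 1
        # Selection: the carry actually entering the section picks one of them.
        chosen = sum_forced_carry if carry else sum_no_carry
        total |= (chosen & mask) << lo
        carry = chosen >> width
    return total, carry


if __name__ == "__main__":
    print(carry_select_add(100, 55, 8))
    print(carry_select_add(0xFFFF, 1, 16))
```

Only the selection step is sequential from section to section, which is why the carry-propagation delay is shorter than in a plain ripple carry adder at the cost of duplicated adder hardware.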

Bedrij (1962) proposed [1] that the problem of carry propagation delay be overcome by independently generating multiple-radix carries and using these carries to select between simultaneously generated sums. Ramkumar et al. (2010) proposed a BEC method to reduce the maximum delay of carry propagation in the final stage of the carry save adder [2]. Ramkumar and Harish (2011) [7] proposed the BEC technique, a simple and efficient gate-level modification that significantly reduces the area and power of the square root CSLA.

There are many carry select adder approaches available but most of them use

ripple carry adder. T.Y. Chang and M.J. Hsiao [3] suggested that instead of using

dual ripple carry adders, a carry select adder scheme using an add one circuit to

replace one ripple carry adder requires 29.2% fewer transistors with a speed penalty

of 5.9% for bit length n=64. If speed was important for this 64-bit adder, then two of

carry-select adder blocks could be substituted by the proposed scheme with a 6.3%

area saving and the same speed.

Youngjoon Kim and Lee-Sup Kim [4] suggested that a carry-select adder could be implemented using a single ripple carry adder and an add-one circuit instead of dual ripple-carry adders. They proposed a new add-one circuit using a first-zero finding circuit and multiplexers to reduce the area and power with no speed penalty. For n = 64 bits, this new carry-select adder requires 38% fewer transistors than the dual ripple-carry carry-select adder and 29% fewer transistors than Chang's carry-select adder using a single ripple carry adder. This new 64-bit adder, using a 0.25 µm CMOS technology, had a 3.45 ns delay time at a 2.5 V power supply. Behnam Amelifard et al. [6] suggested a new adder called the carry select adder with sharing (CSAS), which was area efficient but had a larger delay. M. Alioto et al. [5] suggested variable block sizing depending on the multiplexer delay.

B. Ramkumar, H.M. Kittur and P.M. Kannan [7] suggested a very simple approach to improve the speed of addition. Based on this approach, 16-, 32- and 64-bit adder architectures were developed and compared with conventional fast adder architectures. In many parallel multipliers, a CLA arranged in the form of a Carry Select Adder (CSLA) was used to speed up the final addition. However, due to its structure the CSLA occupied more chip area, because it uses multiple pairs of RCAs to generate the partial sum and carry by considering Cin = 0 and Cin = 1. Thus the complexity of the final adder structure was high. They therefore replaced the RCA (CLA) with Cin = 1 by BEC logic, which reduced the area and delay in the final adder structure.
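A Binary to Excess-1 Converter simply adds 1 to its input using a few XOR and AND gates, so it can stand in for the second adder that assumes Cin = 1. The Python sketch below models a 4-bit BEC with the Boolean expressions commonly given for it in the BEC-based CSLA literature; it is an illustration, the function name bec4 is hypothetical, and the carry-out bit that a complete CSLA group would also need is left out for brevity.

```python
def bec4(b):
    """4-bit Binary to Excess-1 Converter: returns (b + 1) mod 16."""
    b0 = b & 1
    b1 = (b >> 1) & 1
    b2 = (b >> 2) & 1
    b3 = (b >> 3) & 1

    x0 = b0 ^ 1                  # X0 = NOT B0
    x1 = b0 ^ b1                 # X1 = B0 XOR B1
    x2 = b2 ^ (b0 & b1)          # X2 = B2 XOR (B0 AND B1)
    x3 = b3 ^ (b0 & b1 & b2)     # X3 = B3 XOR (B0 AND B1 AND B2)

    return x0 | (x1 << 1) | (x2 << 2) | (x3 << 3)


if __name__ == "__main__":
    for value in range(16):
        print(f"{value:04b} -> {bec4(value):04b}")
```

Each output bit needs at most one XOR and a short AND chain, which is far less logic than a full adder per bit; this is the source of the area saving reported for the BEC-based design.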

The Carry Select Adder (CSLA) is one of the fastest adders used in many data-processing processors to perform fast arithmetic functions. From the structure of the CSLA, it is clear that there is scope for reducing its area and power consumption. This work uses a simple and efficient transistor-level modification in the BEC-1 converter to significantly reduce the area and power of the CSLA. Based on this modification, a 16-b square root CSLA (SQRT CSLA) architecture has been developed and compared with the SQRT CSLA architecture using the ordinary BEC-1 converter. The proposed design has reduced area and power as compared with the SQRT CSLA using the ordinary BEC-1 converter, with only a slight increase in delay. This work evaluates the performance of the proposed designs in terms of delay, area and power by hand with logical effort and through Cadence Virtuoso. The analysis of the results shows that the proposed CSLA structure is better than the SQRT CSLA with the ordinary BEC-1 converter.

Another study focuses on the VLSI design of a carry look-ahead adder (CLAA) based 32-bit unsigned integer multiplier and of a carry select adder (CSLA) based 32-bit unsigned integer multiplier. Both designs multiply two 32-bit unsigned integer values and give a 64-bit product. The CLAA based multiplier takes a delay of 99 ns to perform the multiplication, and the CSLA based multiplier takes nearly the same delay. However, the area needed for the CLAA multiplier is reduced to 31% by the CSLA based multiplier. These multipliers are implemented using Altera Quartus II, and the timing diagrams are viewed through AvanWaves.

A further work compares the design of the 8T adder based Carry Select Adder (CSA) with that of the 10T adder based CSA. Using both adder designs, a 4-bit CSA architecture has been developed and compared with the 28T adder based 4-bit CSA. The 10T CSA design has reduced delay, power and area as compared with the 28T CSA, with a slight tradeoff in area as compared to the 8T CSA. The analysis shows that the 10T CSA is better than both the 8T adder based CSA and the 28T CSA. This work evaluates the performance of the 10T CSA design in terms of power, delay and area using a 180 nm CMOS process technology with the Cadence Virtuoso tool and the Spectre simulator.

It has also been observed that the Carry Select Adder (CSLA) provides a good compromise between cost and performance in carry propagation adder design. A Square Root Carry Select Adder using RCA has been introduced, but it incurs some speed penalty. Moreover, the conventional CSLA is still area-consuming due to its dual ripple carry adder structure. In the proposed work, as is usual in a Wallace multiplier, the partial products are reduced as soon as possible and a carry select adder is used in the final carry propagation path. In this work, the modification is done at gate level to reduce area and power consumption. The Modified Square Root Carry Select Adder (MCSLA) is designed using Common Boolean Logic, compared with the respective regular CSLA architectures, and implemented in a Wallace tree multiplier. This gives a reduced area compared to the normal Wallace tree multiplier. Finally, an area-efficient Wallace tree multiplier is designed using the Common Boolean Logic based square root carry select adder.

Another work presents a study in the field of communication and signal processing applications. Every such application demands higher-throughput arithmetic operations. One of the key arithmetic operations is multiplication, which takes the maximum execution time. The development of efficient multipliers has been a subject of interest for decades, so there is a need for an efficient multiplier that achieves higher performance in real-time signal processing applications. The work presents a modular design of a Vedic multiplier using a carry select adder. The delay of the proposed multiplier is reduced due to the high-speed carry select adder. The proposed multiplier is applied to a parallel FIR filter, and it is observed that the combinational delay is reduced compared to the existing architecture.

Ramkumar and Harish (2011) [4] proposed the BEC technique, which is a simple and efficient gate-level modification to significantly reduce the area and power of the square root CSLA. Veena Nair (2013) suggested a new approach in which a D-latch with an enable signal is used instead of the BEC [6]. Based on this approach, 16-, 32- and 64-bit adder architectures were developed and compared with conventional fast adder architectures. The new structure reduces the delay.

In the present decade, the chips being designed are made using CMOS technology. CMOS stands for Complementary Metal Oxide Semiconductor, and it consists of both NMOS and PMOS transistors. To understand CMOS better, we first need to know about the MOS transistor.

MOS Transistor

MOS stands for Metal Oxide Semiconductor; the MOS field effect transistor (MOSFET) is the basic element in the design of a large scale integrated circuit. It is a voltage-controlled device. These transistors are formed as a "sandwich" consisting of a semiconductor layer, usually a slice, or wafer, from a single crystal of silicon; a layer of silicon dioxide (the oxide); and a layer of metal. These layers are patterned in a manner which permits transistors to be formed in the semiconductor material (the "substrate"). The MOS transistor consists of three regions: source, drain and gate.

The source and drain regions are quite similar, and are labeled depending on what

they are connected to. The source is the terminal, or node, which acts as the

source of charge carriers; charge carriers leave the source and travel to the drain. In

the case of an N channel MOSFET (NMOS), the source is the more negative of the

terminals; in the case of a P channel device (PMOS), it is the more positive of the

terminals. The area under the gate oxide is called the "channel". Below is figure of a

MOS Transistor.

The transistor needs a gate voltage to be applied before a channel can

form. When no channel is formed, the transistor is said to be in the cut-off

region. The gate voltage at which the transistor starts conducting (a channel begins to

form between the source and the drain) is called the threshold voltage. Above the threshold, and

while the drain-to-source voltage is small, the transistor is said to be in the linear region. The transistor goes into the

saturation region when the drain-to-source voltage is raised until the channel pinches off at the drain,

after which the drain current no longer increases appreciably. CMOS technology is made up of both NMOS and PMOS transistors.

Complementary Metal Oxide Semiconductors (CMOS) logic devices are the most

common devices used today in the high density, large number transistor count

circuits found in everything from complex microprocessor integrated circuits to

signal processing and communication circuits.

CMOS is popular because of its low power

requirements, high operating clock speed, and ease of implementation at the

transistor level. The complementary p-channel and n-channel transistor networks are

used to connect the output of the logic device to either the VDD or VSS power

supply rail for a given input logic state. The MOSFETs can be treated as

switches that open or close the conduction path

between the source and drain terminals. In CMOS, there is only one driver, but the

gate can drive as many gates as possible. In CMOS technology, the output always

drives another CMOS gate input. The charge carriers in PMOS transistors are holes

and the charge carriers in NMOS transistors are electrons. The mobility of electrons is about

twice that of holes. Because of this, the output rise and fall times differ. To

make them equal, the W/L ratio of the PMOS transistor is made about twice that of the

NMOS transistor. This way, the PMOS and NMOS transistors have the same

drive strength. In a standard cell library, the length L of a transistor is always

constant. The width W is varied to obtain different drive strengths for

each gate. The channel resistance is proportional to L/W. Therefore, increasing the

width decreases the resistance.
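As a worked illustration of the sizing argument above, each transistor can be treated as a resistor whose on-resistance scales as L/(μW). This is only a first-order sketch: the symbols R_n, R_p, μ_n and μ_p are notation introduced here, and the factor of two is the approximate mobility ratio quoted in the text, not a measured value.

```latex
R_{\text{on}} \propto \frac{L}{\mu W}
\qquad\Longrightarrow\qquad
R_p = R_n
\iff \frac{L}{\mu_p W_p} = \frac{L}{\mu_n W_n}
\iff \frac{W_p}{W_n} = \frac{\mu_n}{\mu_p} \approx 2
```

With equal channel length L, making the PMOS roughly twice as wide as the NMOS therefore balances the pull-up and pull-down resistances, and hence the rise and fall times.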

Power Dissipation in CMOS ICs: A large percentage of the power dissipation in

CMOS ICs is due to the charging and discharging of capacitances. Reducing power

dissipation is the main issue in most low-power CMOS IC designs.

The main sources of power dissipation are:

a. Dynamic Switching Power

This is due to the charging and discharging of circuit capacitances. A low-to-high output

transition draws energy from the power supply. A high-to-low transition

dissipates the energy stored on the circuit capacitance.

b. Short Circuit Current

It occurs when the rise/fall time at the input of the gate is larger than the output

rise/fall time, because the PMOS and NMOS networks then conduct simultaneously for part of the transition.

c. Leakage Current

It has two causes, one of which is reverse-bias diode leakage on the transistor drains. This

happens in a CMOS design when one transistor is off and the active transistor

charges up/down the drain using the bulk potential of the other transistor.
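The dynamic switching power described in item a is commonly estimated with the first-order relation P = α·C·V²·f. The sketch below assumes that standard textbook model; the function name, the parameter names and the example numbers are illustrative and do not come from this report.

```python
def dynamic_power(alpha: float, c_load: float, vdd: float, freq: float) -> float:
    """First-order dynamic switching power in watts.

    alpha  : switching activity factor (fraction of cycles with a 0->1 transition)
    c_load : switched capacitance in farads
    vdd    : supply voltage in volts
    freq   : clock frequency in hertz
    """
    return alpha * c_load * vdd ** 2 * freq


if __name__ == "__main__":
    # Hypothetical operating point: 10% activity, 1 pF, 1.2 V, 100 MHz.
    print(dynamic_power(0.1, 1e-12, 1.2, 100e6))
```

Because the supply voltage enters squared, halving it cuts the dynamic power to a quarter in this model, which is why supply scaling is such an effective low-power technique.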

A PMOS transistor is connected in parallel with an NMOS transistor to form a

transmission gate. The transmission gate simply passes the value at its input to its

output. It uses both an NMOS and a PMOS transistor because the PMOS transistor passes a

strong 1 and the NMOS transistor passes a strong 0.

The advantages of using a Transmission Gate are:

It shows better characteristics than a switch.

The resistance of the circuit is reduced, since the transistors are connected in

parallel.

The starting material is silicon of extremely high purity, which is chemically purified and then grown into

large crystals. The crystals are sliced into wafers. Wafer diameters are

currently 150 mm, 200 mm and 300 mm, the wafer thickness is less than 1 mm, and the surface

is polished to optical smoothness. The wafer is then ready for processing. Each wafer

yields many chips, and the chip die size varies from about 5 mm x 5 mm to

15 mm x 15 mm.

A whole wafer is processed at a time. Different parts of each die are made

P-type or N-type by intentionally introducing small amounts of other atoms (doping

implants). Interconnections are made with metal. The insulation used is typically SiO2;

SiN is also used, and new materials (low-k dielectrics) are being investigated. CMOS

fabrication uses a p-well process, an n-well process or a twin-tub process. All the devices on

the wafer are made at the same time. After the circuitry has been placed on the chip,

the chip is overglassed (with a passivation layer) to protect it. Only those areas which

connect to the outside world are left uncovered (the pads). The wafer finally

passes to a test station, where test probes send test signal patterns to the chip and monitor

the output of the chip. The yield of a process is the percentage of dies which pass this

testing. The wafer is then scribed and separated into the individual chips. These

are then packaged, and the chips are binned according to their performance.

The designer facing a design problem must go through a series of steps between

initial ideas and final hardware. This series of steps is commonly referred to as the

design flow. First, after all the requirements have been spelled out, a proper digital

design phase must be carried out. It should be stressed that the tools supplied by the

different FPGA vendors to target their chips do not help the designer in this phase.

They only enter the scene once the designer is ready to translate a given design into

working hardware. The most common flow nowadays used in the design of FPGAs

involves the following subsequent phases:

Design entry: This step consists in transforming the design ideas into some form of

computerized representation. This is most commonly accomplished using Hardware

Description Languages (HDLs). The two most popular HDLs are Verilog and the

Very High Speed Integrated Circuit HDL (VHDL) [2]. It should be noted that an

HDL, as its name implies, is only a tool to describe a design that pre-existed in the

mind, notes, and sketches of a designer. It is not a tool to design electronic circuits.

Another point to note is that HDLs differ from conventional software programming

languages in the sense that they do not support the concept of sequential execution of

statements in the code. This is easy to understand if one considers the alternative

schematic representation of an HDL file: what one sees in the upper part of the

schematic cannot be said to happen before or after what one sees in the lower part.

Synthesis: The synthesis tool receives HDL and a choice of FPGA vendor and

model. From these two pieces of information, it generates a net list which uses the

primitives proposed by the vendor in order to satisfy the logic behavior specified in

the HDL files. Most synthesis tools go through additional steps such as logic

optimization, register load balancing, and other techniques to enhance timing

performance, so the resulting net list can be regarded as a very efficient

implementation of the HDL design.

Place and route: The placer takes the synthesized net list and chooses a place for

each of the primitives inside the chip. The router's task is then to interconnect all

these primitives together satisfying the timing constraints. The most obvious

constraint for a design is the frequency of the system clock, but there are more


involved constraints one can impose on a design using the software packages

supported by the vendors.

Bit stream generation: FPGAs are typically configured at

power up time from some sort of external permanent storage device, typically a flash

memory. Once the place and route process is finished, the resulting choices for the

configuration of each programmable element in the FPGA chip, be it logic or

interconnect, must be stored in a file to program the flash. Of these four phases, only

the first one is human labor intensive. Somebody has to type in the HDL code,

which can be tedious and error prone for complicated designs involving, for

example, lots of digital signal processing. This is the reason for the appearance, in

recent years, of alternative flows which include a preliminary phase in which the

user can draw blocks at a higher level of abstraction and rely on the software tool for

the generation of the HDL. Some of these tools also include the capability of

simulating blocks which will become HDLs with other blocks which provide stimuli

and processing to make the simulation output easier to interpret. The concept of

hardware co-simulation is also becoming widely used. In co-simulation, stimuli are

sent to a running FPGA hosting the design to be tested and the outputs of the design

are sent back to a computer for display (typically through a Joint Test Action Group

(JTAG), or Ethernet connection). The advantage of co-simulation is that one is

testing the real system, therefore suppressing all possible misinterpretations present

in a pure simulator. In other cases, co-simulation may be the only way to simulate a

complex design in a reasonable amount of time.

The standard FPGA design flow starts with design entry using schematics or a

hardware description language (HDL), such as Verilog HDL or VHDL. In this step,

you create the digital circuit that is implemented inside the FPGA. The flow then

proceeds through compilation, simulation, programming, and verification in the

FPGA hardware. We first define the relevant terminology in the field and then

describe the recent evolution of field-programmable devices (FPDs). The three main categories of FPDs are

delineated: Simple PLDs (SPLDs), Complex PLDs (CPLDs) and Field-Programmable Gate Arrays (FPGAs).

While the headline performance increase offered by FPGAs is often very large

(>100 times for some algorithms), it is important to consider a number of factors

before adopting them. First, is it

practical to implement the whole application on an FPGA? The answer to this is

likely to be no, particularly for floating-point intensive applications which tend to

swallow up a large amount of logic. If it is either impractical or impossible to

implement the whole application on an FPGA, the next best option is to implement

those kernels within the application that are responsible for the majority of the run

time, which may be determined by profiling. Next, the real speedup of the whole

application must be estimated once the kernel has been implemented in an FPGA.

Even if that kernel was originally responsible for 90% of the run time, the total speedup that you can achieve for your application cannot exceed 10 times (even if you

achieve a 1000 times speedup for the kernel). This is an example of Amdahl's law, that

long-time irritant of the HPC software engineer. Once such an estimate has been

made, one must decide if the potential gain is worthwhile given the complexity of

instantiating the algorithm on an FPGA.
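The 90% example above can be checked numerically with Amdahl's law, speedup = 1 / ((1 - p) + p / s), where p is the fraction of run time spent in the accelerated kernel and s is the kernel speedup. The sketch below is a minimal illustration; the function and parameter names are chosen here and are not part of any tool flow.

```python
def amdahl_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Overall application speedup predicted by Amdahl's law.

    accelerated_fraction : share of the original run time spent in the kernel (0..1)
    kernel_speedup       : speedup achieved for that kernel alone (> 0)
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    serial_part = 1.0 - accelerated_fraction
    return 1.0 / (serial_part + accelerated_fraction / kernel_speedup)


if __name__ == "__main__":
    # The case from the text: 90% of the run time, kernel made 1000 times faster.
    print(amdahl_speedup(0.9, 1000.0))
```

For the case in the text the result stays just below 10, since the untouched 10% of the run time bounds the overall gain no matter how fast the kernel becomes.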

In general terms FPGAs are best at tasks that use short word length integer or

fixed point data, and exhibit a high degree of parallelism, but they are not so good at

high precision floating-point arithmetic (although they can still outperform

conventional processors in many cases). The implications of shipping data to the

FPGA from the CPU and vice versa must also come under consideration, for if that

outweighs any improvement in the kernel then implementing the algorithm in an

FPGA may be an exercise in futility. FPGAs are best suited to integer arithmetic.

Unfortunately, the vast majority of scientific codes rely heavily on 64 bit IEEE

floating point arithmetic (often referred to as double precision floating point

arithmetic). It is not unreasonable to suggest that in order to get the most out of

FPGAs computational scientists must perform a thorough numerical analysis of their

code, and ideally reimplement it using fixed point arithmetic or lower precision

floating-point arithmetic. Scientists who have been used to continual performance

increases provided by each new generation of processor are not easily convinced that

the large amount of effort required for such an exercise will be sufficiently

rewarded. That said, the recent development of efficient floating point cores has gone

some way towards encouraging scientists to use FPGAs.
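To make the idea of re-expressing a computation in fixed point concrete, the following sketch represents real numbers as integers scaled by 2^frac_bits, so that multiplication needs only an integer multiply and a shift. The helper names and the choice of 8 fractional bits in the example are assumptions made for illustration.

```python
def to_fixed(value: float, frac_bits: int) -> int:
    """Quantize a real value to an integer with frac_bits fractional bits."""
    return round(value * (1 << frac_bits))


def from_fixed(raw: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a real value."""
    return raw / (1 << frac_bits)


def fixed_mul(a: int, b: int, frac_bits: int) -> int:
    """Multiply two fixed-point values that share the same format.

    The raw product carries 2 * frac_bits fractional bits, so it is shifted
    right to return to the original format (this truncates toward minus infinity).
    """
    return (a * b) >> frac_bits


if __name__ == "__main__":
    half = to_fixed(0.5, 8)
    quarter = to_fixed(0.25, 8)
    print(from_fixed(fixed_mul(half, quarter, 8), 8))
```

A numerical analysis of the kind the text recommends would decide how many integer and fractional bits each variable needs so that the quantization error stays acceptable.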


real world applications, then the wider acceptance of FPGAs will move a step closer.

At present there is very little performance data available for 64-bit floating-point

intensive algorithms on FPGAs. To give an indication of expected performance we

have therefore used data taken from the Xilinx floating point cores (v3) datasheet.

It is useful to measure the area, performance and power consumption gap between field

programmable gate arrays (FPGAs) and standard cell application-specific integrated

circuits (ASICs) for the following reasons:

I. In the early stages of system design, when system architects choose their

implementation medium, they often choose between FPGAs and ASICs. Such

decisions are based on the differences in cost (which is related to area),

performance and power consumption between these implementation media, but to

date there have been few attempts to quantify these differences. A system

architect can use these measurements to assess whether implementation in an

FPGA is feasible.

II. These measurements can also be useful for those building ASICs that contain

programmable logic, by quantifying the impact of leaving part of a design to be

implemented in the programmable fabric.

III. FPGA makers seeking to improve FPGAs can gain insight from quantitative

measurements of these metrics, particularly when it comes to understanding the

benefit of less programmable (but more efficient) hard heterogeneous blocks such as

block memory, multipliers/accumulators and multiplexers that modern FPGAs often

employ.

The most common FPGA architecture consists of an array of configurable logic

blocks (CLBs), I/O pads, and routing channels. Generally, all the routing channels

have the same width (number of wires). Multiple I/O pads may fit into the height of

one row or the width of one column in the array. An application circuit must be

mapped into an FPGA with adequate resources. While the number of CLBs and

I/Os required is easily determined from the design, the number of routing tracks

needed may vary considerably even among designs with the same amount of logic.

(For example, a crossbar switch requires much more routing than a systolic array

with the same gate count.) Since unused routing tracks increase the cost (and

decrease the performance) of the part without providing any benefit, FPGA

manufacturers try to provide just enough tracks so that most designs that will fit in


terms of LUTs and IOs can be routed. This is determined by estimates such as those

derived from Rent's rule or by experiments with existing designs.
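Rent's rule, mentioned above, is usually written as T = t · g^p, where g is the number of logic blocks, t is the average number of terminals per block and p is the Rent exponent. The sketch below only evaluates that relation; the parameter values in the example are hypothetical and are not taken from any specific FPGA family.

```python
def rent_terminals(terminals_per_block: float, num_blocks: int, rent_exponent: float) -> float:
    """Estimate the number of external terminals of a region using Rent's rule.

    terminals_per_block : average terminals (pins) per logic block, t
    num_blocks          : number of logic blocks in the region, g
    rent_exponent       : Rent exponent p, between 0 and 1
    """
    if num_blocks < 1:
        raise ValueError("num_blocks must be at least 1")
    return terminals_per_block * num_blocks ** rent_exponent


if __name__ == "__main__":
    # Hypothetical design: 1024 blocks, 4 terminals each, exponent 0.5.
    print(rent_terminals(4.0, 1024, 0.5))
```

A larger exponent models a design with more global wiring, such as the crossbar switch in the example above, and therefore predicts a need for more routing tracks at the same gate count.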

To define the behavior of the FPGA, the user provides a hardware description

language (HDL) or a schematic design. The HDL form might be easier to work with

when handling large structures because it's possible to just specify them numerically

rather than having to draw every piece by hand. On the other hand, schematic entry

can allow for easier visualization of a design. Then, using an electronic design

automation tool, a technology-mapped netlist is generated. The netlist can then be

fitted to the actual FPGA architecture using a process called place-and-route, usually

performed by the FPGA company's proprietary place-and-route software. The user

will validate the map, place and route results via timing analysis, simulation, and

other verification methodologies. Once the design and validation process is

complete, the binary file generated (also using the FPGA company's proprietary

software) is used to (re)configure the FPGA. The source files are fed to a software

suite from the FPGA/CPLD vendor that through different steps will produce a file.

This file is then transferred to the FPGA/CPLD via a serial interface or to an

external memory device like an EEPROM.

The most common HDLs are VHDL and Verilog, although in an attempt to reduce

the complexity of designing in HDLs, which have been compared to the equivalent

of assembly languages, there are moves to raise the abstraction level through the

introduction of alternative languages.

Using Hardware Description Languages (HDLs) to design high-density

FPGA devices has the following advantages:

I. Top-Down Approach for Large Projects: Designers use HDLs to create

complex designs. The top-down approach to system design works well for

large HDL projects that require many designers working together. After the

design team determines the overall design plan, individual designers can

work independently on separate code sections.

II. Functional Simulation Early in the Design Flow: You can verify design

functionality early in the design flow by simulating the HDL description.

Testing your design decisions before the design is implemented at the

Register Transfer Level (RTL) or gate level allows you to make any

necessary changes early on.

III. Synthesis of HDL Code to Gates: Synthesizing your hardware description to

target the FPGA implementation:

Decreases design time by allowing a higher-level design specification, rather

than specifying the design from the FPGA base elements.

Reduces the errors that can occur during a manual translation of a hardware

description to a schematic design.

Allows you to apply the automation techniques used by the synthesis tool

(such as machine encoding styles and automatic I/O insertion) during

optimization to the original HDL code. This results in greater optimization

and efficiency.

HDLs also allow you to test

different design implementations early in the design flow. Use the synthesis tool to

perform the logic synthesis and optimization into gates. Additionally, Xilinx FPGA

devices allow you to implement your design at your computer. Since the synthesis

time is short, you have more time to explore different architectural possibilities at

the Register Transfer Level (RTL). You can reprogram Xilinx FPGA devices to test

several design implementations. You can retarget RTL code to new FPGA devices

with minimum recoding.

Both VHDL and Verilog are well established hardware description languages.

They have the advantage that the user can define high-level algorithms and low-level

optimizations (gate-level and switch-level) in the same language. A basic example of

VHDL code, the evaluation of the Fibonacci series, is shown below, and it is a good

example of the points made above. The code itself is reasonably straightforward for

a software programmer to understand, provided that he/she understands that this is a

truly parallel language and all lines are executing at once. It is also straightforward

to simulate a simple design of this nature. However, it is surprisingly difficult to


implement it in hardware, and this difficulty is a direct result of I/O issues. As noted

above, for a design to work in hardware, access is required to resources that are

external to the FPGA, such as memory, and an FPGA is, by its very nature, unaware

of the components to which it is connected. If you want to retrieve a value from

main memory and use it on the FPGA then you need to instantiate a memory

controller. While systems such as the Cray XD1 provide cores for communicating

with memory, such cores are still complex and unfamiliar to software programmers.

Our early experiences with VHDL have indicated that it should only be used for

FPGA development if you are in a position to work closely with experienced

hardware designers throughout the development process.
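The remark that all lines of an HDL description execute at once can be illustrated with a small software model of a Fibonacci circuit. This is a Python sketch, not VHDL, and it assumes a hypothetical design with two registers that are both updated on every clock edge from the current state. A second function shows what a software programmer would get by wrongly reading the same two assignments as sequential statements.

```python
from typing import List


def fibonacci_registers(cycles: int) -> List[int]:
    """Model two registers that update simultaneously on each clock edge."""
    a, b = 0, 1  # assumed reset values of the two registers
    outputs = []
    for _ in range(cycles):
        outputs.append(a)
        # Both next-state values are computed from the CURRENT state and
        # committed together, as concurrent signal assignments would be.
        next_a = b
        next_b = a + b
        a, b = next_a, next_b
    return outputs


def sequential_misreading(cycles: int) -> List[int]:
    """Same two assignments executed one after the other, as in software."""
    a, b = 0, 1
    outputs = []
    for _ in range(cycles):
        outputs.append(a)
        a = b      # a is overwritten first ...
        b = a + b  # ... so this line sees the NEW a, which changes the result
    return outputs


if __name__ == "__main__":
    print(fibonacci_registers(8))
    print(sequential_misreading(5))
```

Only the first function produces the Fibonacci series; the second doubles its output each cycle, which is the kind of mistake that arises when hardware descriptions are read as ordinary sequential code.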


CHAPTER-3

DESIGN APPROACH

Low-Power, area-efficient, and high-performance VLSI systems are increasingly

used in portable and mobile devices, multi standard wireless receivers, and

biomedical instrumentation [1], [2]. An adder is the main component of an

arithmetic unit. A complex digital signal processing (DSP) system involves several

adders. An efficient adder design essentially improves the performance of a complex

DSP system. A ripple carry adder (RCA) uses a simple design, but carry propagation

delay (CPD) is the main concern in this adder. Carry look-ahead and carry select

(CS) methods have been suggested to reduce the CPD of adders. A conventional

carry select adder (CSLA) is an RCA-RCA configuration that generates a pair of

sum words and output-carry bits corresponding to the anticipated input carry (cin = 0

and 1) and selects one out of each pair for the final sum and final output carry [3]. A

conventional CSLA has less CPD than an RCA, but the design is not attractive since

it uses a dual RCA. A few attempts have been made to avoid the dual use of RCA in

CSLA design. Kim and Kim [4] used one RCA and one add-one circuit instead of

two RCAs, where the add-one circuit is implemented using a multiplexer (MUX).

He et al. [5] proposed a square-root (SQRT)-CSLA to implement large bit-width

adders with less delay. In a SQRT CSLA, CSLAs with increasing size are connected

in a cascading structure. The main objective of SQRT-CSLA design is to provide a

parallel path for carry propagation that helps to reduce the overall adder delay. A

binary to excess-1 converter (BEC)-based CSLA was suggested in [6]. The BEC-based CSLA involves less logic

resources than the conventional CSLA, but it has marginally higher delay. A CSLA

based on common Boolean logic (CBL) is also proposed in [7] and [8]. The CBL-based CSLA of [7] involves significantly less logic resource than the conventional

CSLA but it has longer CPD, which is almost equal to that of the RCA. To

overcome this problem, a SQRT-CSLA based on CBL was proposed in [8].

However, the CBL-based SQRT-CSLA design of [8] requires more logic resource

and delay than the BEC-based SQRT-CSLA of [6]. We observe that logic

optimization largely depends on availability of redundant operations in the

formulation, whereas adder delay mainly depends on data dependence. In the

existing designs, logic is optimized without giving any consideration to the data

dependence. We have analyzed the logic operations involved in the

conventional and BEC-based CSLAs to study the data dependence and to identify

redundant logic operations. Based on this analysis, we have proposed a logic

formulation for the CSLA.

The main contribution in this brief is logic formulation based on data dependence

and optimized carry generator (CG) and CS design. Based on the proposed logic

formulation, we have derived an efficient logic design for CSLA. Due to optimized

logic units, the proposed CSLA involves significantly less area-delay product (ADP) than the existing

CSLAs. We have shown that the SQRT-CSLA using the proposed CSLA design

involves nearly 32% less ADP and consumes 33% less energy than that of the

corresponding SQRT-CSLA.

The carry-select adder generally consists of two ripple carry adders and a

multiplexer. Adding two n-bit numbers with a carry select adder is done with two

adders (therefore two ripple carry adders) in order to perform the calculation twice,

one time with the assumption of the carry being zero and the other assuming one.

After the two results are calculated, the correct sum, as well as the correct carry, is

then selected with the multiplexer once the correct carry is known. The number of

bits in each carry select block can be uniform, or variable. In the uniform case, the

optimal delay occurs for a block size of $\sqrt{n}$. When variable, the block size should

have a delay, from addition inputs A and B to the carry out, equal to that of the

multiplexer chain leading into it, so that the carry out is calculated just in time. The

$O(\sqrt{n})$ delay is derived from uniform sizing, where the ideal number of full-adder

elements per block is equal to the square root of the number of bits being added,

since that will yield an equal number of MUX delays.
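The square-root block size can be recovered from a simple first-order delay model. The derivation below assumes that the first block is a plain ripple carry adder, that every later block contributes one multiplexer delay, and that t_FA and t_MUX denote the full-adder and multiplexer delays; these symbols are introduced here for illustration.

```latex
T(k) = k\,t_{FA} + \left(\frac{n}{k} - 1\right) t_{MUX},
\qquad
\frac{dT}{dk} = t_{FA} - \frac{n\,t_{MUX}}{k^{2}} = 0
\;\Longrightarrow\;
k_{\text{opt}} = \sqrt{\frac{n\,t_{MUX}}{t_{FA}}}
```

When t_FA = t_MUX this gives k_opt = \sqrt{n}, and the total delay T(k_opt) grows as O(\sqrt{n}). For n = 16 and k = 4 the model gives four full-adder delays plus three MUX delays.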

However, the carry select adder is not area efficient because it uses multiple pairs

of Ripple Carry Adders to generate partial sums and carries for both possible values of the carry input;

the final sum and carry are then selected by the multiplexers (mux). To

overcome this problem, the CSLA is modified by using n-bit Binary to

Excess-1 code converters (BEC) to improve the speed of addition. The logic can be

implemented with any type of adder to further improve the speed. We use the Binary

to Excess-1 Converter (BEC) instead of the ripple carry adder with cin = 1 in the regular CSLA to

achieve lower area and power consumption. The main advantage of this BEC logic

comes from its smaller number of logic gates compared with the Full Adder (FA) structure. The

modified design has reduced area and power as compared with the regular

SQRT-CSLA, with an increase in the delay. Therefore, an improved CSLA was

designed with a D-latch replacing the BEC in the modified CSLA. This design

efficiently reduces the delay, thereby increasing the speed and making it a high-speed

Carry Select Adder.

The factors which are desirable in adders are as follows:

High speed, Low power consumption

Area efficient

Robustness and noise stability

Insensitivity to process variables

Less internal activity when activity is low

Depending on the requirements of the adder, the designer has to consider all these

parameters while choosing a structure. What makes this decision even

harder is that most of these parameters are not independent of each other.

The trade-offs between the desired parameters make this decision a multi-dimensional

optimization problem. For a high-performance system, a multi-dimensional optimization

problem for a non-linear system that usually has hundreds of variables is

unfortunately impossible to solve within the limited design time.

The idea for this thesis is to explore the area, power consumption and time delay

of different adder structures. This gives a good understanding of the different

structures and makes the decision easier for designers.

The Ripple Carry Adder (RCA) provides the most compact design but takes

longer computing time. For an N-bit RCA, the delay is linearly proportional to N.

Thus for large values of N the RCA gives the highest delay of all adders. The Carry

Look Ahead Adder (CLA) gives fast results but consumes a large area. For an N-bit adder, the CLA is fast for N ≤ 4, but for large values of N its delay increases more

than that of other adders. So for a higher number of bits, the CLA gives a higher delay than other

adders due to the large fan-in and the large number of logic gates.


The Carry Select Adder (CSA) provides a compromise between the small-area but

longer-delay RCA and the large-area but shorter-delay CLA. In the rapidly growing

mobile industry, faster units are not the only concern; smaller area and less

power have also become major concerns for the design of digital circuits. In mobile electronics,

reducing area and power consumption are key factors in increasing portability and

battery life. Even in servers and desktop computers power dissipation is an

important design constraint. The design of area- and power-efficient high-speed data path

logic systems is one of the most substantial areas of research in VLSI system

design.

In the present work, the designs of 8-bit adder topologies such as the ripple carry adder,

carry look-ahead adder, carry skip adder, carry select adder, carry increment adder,

carry save adder and carry bypass adder are presented. Performance issues like area, power dissipation and

propagation delay for all the adders are analyzed in a 0.12 µm, 6-metal-layer CMOS

technology using the Microwind tool, which tightly integrates mixed-signal

implementation with digital implementation, circuit simulation, transistor-level

extraction and verification. In digital adders, the speed of addition is limited by the time required

to propagate a carry through the adder. The sum for each bit position in an

elementary adder is generated sequentially only after the previous bit position has

been summed and a carry propagated into the next position. The CSLA is used in

many computational systems to alleviate the problem of carry propagation delay by

independently generating multiple carries and then selecting a carry to generate the sum

[1].
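The sequential carry dependence described above can be seen in a bit-level model of a ripple carry adder. The sketch below is a behavioural Python model written for illustration, with bit lists ordered least significant bit first; the helper names are not taken from the project's HDL code.

```python
from typing import List, Tuple


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits: List[int], b_bits: List[int], cin: int = 0) -> Tuple[List[int], int]:
    """Add two equal-length bit lists (LSB first) by rippling the carry."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # Each stage has to wait for the carry produced by the previous stage.
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value: int, width: int) -> List[int]:
    """Integer to LSB-first bit list."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: List[int]) -> int:
    """LSB-first bit list to integer."""
    return sum(bit << i for i, bit in enumerate(bits))
```

The loop body runs once per bit and each iteration consumes the carry of the previous one, which is why the worst-case delay of this structure grows linearly with the word length.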

However, the CSLA is not area efficient because it uses multiple pairs of Ripple

Carry Adders (RCA) to generate partial sum and carry by considering carry input

Cin = 0 and Cin = 1; the final sum and carry are then selected by the multiplexers

(mux). The basic idea of this work is to use a simple combinational circuit instead of

the RCA with cin = 1 and multiplexer in the regular CSLA to achieve lower area and

power. The main advantage of this logic is its lower power compared with the

n-bit Full Adder (FA) structure. The SQRT CSLA has been developed using the

simple combinational circuit and compared with the regular SQRT CSLA. A regular

CSLA uses two copies of the carry evaluation blocks, one with the block carry input set to

zero and the other with the block carry input set to one. The regular CSLA suffers from the

disadvantage of occupying more chip area. The modified CSLA reduces the area and

power when compared to the regular CSLA, with an increase in delay, by the use of the Binary

to Excess-1 converter. This project proposes a scheme which reduces the delay, area

and power compared with the regular and modified CSLAs by the use of D-latches.
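The Binary to Excess-1 Converter mentioned above simply adds one to its input, so it can stand in for the second adder that assumes a carry-in of 1. The sketch below models that behaviour at bit level using the usual XOR and AND formulation (X0 = NOT B0, Xi = Bi XOR (B0 AND ... AND Bi-1)); it is an illustrative Python model under that assumption, not the gate netlist used in this project.

```python
from typing import List


def bec(bits: List[int]) -> List[int]:
    """Binary to Excess-1 conversion of an LSB-first bit list (adds 1, modulo 2^n)."""
    out = []
    all_lower_ones = 1  # makes the first output bit the inverse of the input LSB
    for b in bits:
        out.append(b ^ all_lower_ones)  # a bit flips only if every lower bit is 1
        all_lower_ones &= b
    return out


def to_bits(value: int, width: int) -> List[int]:
    """Integer to LSB-first bit list."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: List[int]) -> int:
    """LSB-first bit list to integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def result_for_cin1(sum0_bits: List[int], cout0: int) -> List[int]:
    """Derive the cin = 1 result (sum bits plus carry) from the cin = 0 result."""
    return bec(sum0_bits + [cout0])
```

Feeding the sum and carry-out computed for cin = 0 through an (n+1)-bit BEC yields the result for cin = 1, which is how the modified CSLA avoids a second ripple carry adder at the cost of some extra delay.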

3.2 Operation

The Carry Select Adder (CSA) is one of the fastest adders, used in many data-processing processors to perform fast arithmetic functions. The carry-select adder

partitions the adder into several groups, each of which performs two additions in

parallel. Therefore, two copies of ripple-carry adder act as carry evaluation block per

select stage. One copy evaluates the carry chain assuming the block carry-in is zero,

while the other assumes it to be one. Once the carry signals are finally computed, the

correct sum and carry-out signals will be simply selected by a set of multiplexers.

The 4-bit adder block is an RCA.

The carry-select adder generally consists of two ripple carry adders and a

multiplexer. Adding two n-bit numbers with a carry-select adder is done with two

adders (therefore two ripple carry adders) in order to perform the calculation twice,

one time with the assumption of the carry being zero and the other assuming one.

After the two results are calculated, the correct sum, as well as the correct carry, is

then selected with the multiplexer once the correct carry is known. The number of

bits in each carry select block can be uniform, or variable. In the uniform case, the

optimal delay occurs for a block size of $\sqrt{n}$. When variable, the block size should have a

delay, from addition inputs A and B to the carry out, equal to that of the

multiplexer chain leading into it, so that the carry out is calculated just in time. The

$O(\sqrt{n})$ delay is derived from uniform sizing, where the ideal number of full-adder elements

per block is equal to the square root of the number of bits being added, since that

will yield an equal number of MUX delays. Two 4-bit ripple carry adders are

multiplexed together, where the resulting carry and sum bits are selected by the

carry-in. Since one ripple carry adder assumes a carry-in of 0, and the other assumes

a carry-in of 1, selecting which adder had the correct assumption via the actual

carry-in yields the desired result. A 16-bit carry-select adder with a uniform block

size of 4 can be created with three of these blocks and a 4-bit ripple carry adder.

Since carry-in is known at the beginning of computation, a carry select block is not

needed for the first four bits. The delay of this adder will be four full adder delays, plus three MUX delays. A 16-bit carry-select adder with variable block sizes can be created similarly. Consider an adder with block sizes of 2-2-3-4-5. This break-up is ideal when the full-adder delay is equal to the MUX delay, which is unlikely. The total delay is two full adder delays, and four MUX delays.
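The uniform 16-bit arrangement described above can be sketched behaviourally as follows. This is only an illustrative software model, not the report's HDL; the function names (`ripple_carry_add`, `carry_select_add`) and the LSB-first bit-list representation are assumptions made for the example.

```python
def ripple_carry_add(a_bits, b_bits, cin):
    """Add two equal-length LSB-first bit lists with a ripple-carry chain."""
    sum_bits, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry


def to_bits(value, width):
    """Convert an integer to an LSB-first list of bits."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Convert an LSB-first list of bits back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def carry_select_add(a, b, cin=0, width=16, block=4):
    """Uniform carry-select adder: the first block is a plain RCA,
    every later block computes both candidates and selects one."""
    a_bits, b_bits = to_bits(a, width), to_bits(b, width)
    result, carry = [], cin
    for start in range(0, width, block):
        a_blk = a_bits[start:start + block]
        b_blk = b_bits[start:start + block]
        if start == 0:
            # carry-in is known at the start, so no select block is needed
            s, carry = ripple_carry_add(a_blk, b_blk, carry)
        else:
            s0, c0 = ripple_carry_add(a_blk, b_blk, 0)  # assumes carry-in 0
            s1, c1 = ripple_carry_add(a_blk, b_blk, 1)  # assumes carry-in 1
            s, carry = (s1, c1) if carry else (s0, c0)  # the MUX
        result.extend(s)
    return from_bits(result), carry
```

In hardware the two candidate results of every block are produced in parallel; the loop here only models which result is selected.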

Addition is the heart of computer arithmetic, and the arithmetic unit is often the workhorse of a computational circuit. Adders are a necessary component of a data path, e.g. in microprocessors or signal processors. There are many ways to design an adder. The Ripple Carry Adder (RCA) provides the most compact design but takes a longer computing time. For an N-bit RCA, the delay is linearly proportional to N; thus, for large values of N, the RCA gives the highest delay of all adders. The Carry Look Ahead Adder (CLA) gives fast results but consumes a large area. An N-bit CLA is fast for N ≤ 4, but for large values of N its delay increases more than that of other adders because of the large fan-in and the large number of logic gates. The Carry Select Adder (CSA) provides a compromise between the small-area but longer-delay RCA and the large-area but shorter-delay CLA.

In the rapidly growing mobile industry, faster units are not the only concern; smaller area and lower power have also become major concerns in the design of digital circuits. In mobile electronics, reducing area and power consumption are key factors in increasing portability and battery life. Even in servers and desktop computers, power dissipation is an important design constraint. The design of area- and power-efficient high-speed data path logic systems is one of the most substantial areas of research in VLSI system design. In digital adders, the speed of addition is limited by the time required to propagate a carry through the adder. The sum for each bit position in an elementary adder is generated sequentially only after the previous bit position has been summed and a carry propagated into the next position. Among various adders, the CSA is intermediate regarding speed and area.

Regular CSLA has two ripple carry adders (RCAs) in each module for performing addition depending on the carry.

Using two RCAs in each module increases the number of transistors.

An increase in the number of transistors leads to an increase in area and power consumption.

The second RCA in each module can be replaced by a binary to excess-1 converter (BEC), which performs the same operation with fewer transistors. This leads to the modified CSLA, which is area efficient and consumes less power.

Code converters are essential in digital systems. Here we give the truth table for the binary to excess-1 converter. The excess-1 code is obtained by adding one to the binary value. The detailed structures of the 5-bit BEC without carry (BEC) and with carry (BECWC) are shown in Fig. 3.3. The BEC takes n inputs and generates n outputs; the BECWC takes n inputs and generates n+1 outputs, the extra bit being the carry output that serves as the selection input of the next-stage mux used in the final adder design. The function tables of the BEC and BECWC are shown in Table 3.1.
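The difference between the two converters can be illustrated with a small bit-level model. This is a sketch under the assumption that "adding one" is modelled as a chain of half adders with an initial carry of 1; the function names `bec` and `becwc` are chosen here for illustration only.

```python
def bec(bits):
    """n-bit BEC without carry: n inputs, n outputs (input + 1, modulo 2**n).
    Bits are given LSB first."""
    out, carry = [], 1          # adding one is modelled as an initial carry of 1
    for b in bits:
        out.append(b ^ carry)   # half-adder sum
        carry = b & carry       # half-adder carry
    return out


def becwc(bits):
    """n-bit BEC with carry: n inputs, n + 1 outputs.
    The extra (last) bit is the carry output."""
    out, carry = [], 1
    for b in bits:
        out.append(b ^ carry)
        carry = b & carry
    return out + [carry]
```

For the 5-bit input 11111, `bec` wraps around to 00000, while `becwc` additionally reports the carry in its sixth output bit.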

Large bit-sized multipliers require multiple BECs, and each of them requires the selection input from the carry output of the preceding BEC.

Fig. 3.3 Structure of the 5-bit BEC: (a) BEC (without carry), (b) BECWC (with carry).

The CSLA has two units: 1) the sum and carry generator unit (SCG) and 2) the sum and carry selection unit [9]. The SCG unit consumes most of the logic resources of the CSLA and contributes significantly to the critical path. Different logic designs have been suggested for efficient implementation of the SCG unit. We studied the logic designs suggested for the SCG unit of the conventional and BEC-based CSLAs of [6] by means of suitable logic expressions. The main objective of this study is to identify redundant logic operations and data dependence. Accordingly, we remove all redundant logic operations and sequence the logic operations based on their data dependence.

Fig. 3.4 (a) Conventional CSLA; n is the input operand bit-width. (b) The logic

operations of the RCA are shown in split form, where HSG, HCG, FSG, and

FCG represent half-sum generation, half-carry generation, full-sum generation,

and full-carry generation, respectively.

The SCG unit of the conventional CSLA, as shown in Fig. 3.4 (a) [3], is composed of two n-bit RCAs, where n is the adder bit-width. The logic operation of the n-bit RCA shown in Fig. 3.4 (b) is performed in four stages: half-sum generation (HSG), half-carry generation (HCG), full-sum generation (FSG), and full-carry generation (FCG).

Suppose two n-bit operands are added in the conventional CSLA; then RCA-1 and RCA-2 generate the n-bit sums ($s^0$ and $s^1$) and output-carries ($c^0_{out}$ and $c^1_{out}$) corresponding to input-carry ($c_{in} = 0$ and $c_{in} = 1$), respectively. The logic expressions of RCA-1 and RCA-2 of the SCG unit of the n-bit CSLA are given as

$$\begin{aligned}
s_0^0(i) &= A(i) \oplus B(i), \qquad c_0^0(i) = A(i) \cdot B(i) \\
s_1^0(i) &= s_0^0(i) \oplus c_1^0(i-1) \\
c_1^0(i) &= c_0^0(i) + s_0^0(i) \cdot c_1^0(i-1), \qquad c_{out}^0 = c_1^0(n-1) \\
s_0^1(i) &= A(i) \oplus B(i), \qquad c_0^1(i) = A(i) \cdot B(i) \\
s_1^1(i) &= s_0^1(i) \oplus c_1^1(i-1) \\
c_1^1(i) &= c_0^1(i) + s_0^1(i) \cdot c_1^1(i-1), \qquad c_{out}^1 = c_1^1(n-1)
\end{aligned} \tag{1}$$

where $c_1^0(-1) = 0$, $c_1^1(-1) = 1$, and $0 \le i \le n-1$.

As stated above, the main idea of this work is to use a BEC instead of the RCA with Cin = 1 in order to reduce the area and power consumption of the regular CSLA. To replace the n-bit RCA, an (n+1)-bit BEC is required.

As shown in Fig. 3.2, the RCA calculates the n-bit sum $s_1^0$ and $c_{out}^0$ corresponding to $c_{in} = 0$. The BEC unit receives $s_1^0$ and $c_{out}^0$ from the RCA and generates the (n+1)-bit excess-1 code. The most significant bit (MSB) of the BEC represents $c_{out}^1$, and the n least significant bits (LSBs) represent $s_1^1$:

$$\begin{aligned}
s_1^1(i) &= s_1^0(i) \oplus c_1^1(i-1) \\
c_1^1(i) &= s_1^0(i) \cdot c_1^1(i-1) \\
c_{out}^1 &= c_1^0(n-1) + c_1^1(n-1)
\end{aligned} \tag{2}$$

for $0 \le i \le n-1$, with $c_1^1(-1) = 1$.
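To make the BEC-based scheme concrete, the following sketch models one n-bit block at bit level: an RCA with carry-in 0, a BEC that increments its result, and a final selection by cin. It is a hedged illustration, not the report's implementation; the function name `bec_csla` is hypothetical, and the BEC sum bit is taken as an XOR with an initial BEC carry of 1, which is what an increment requires.

```python
def bec_csla(a, b, cin, n=8):
    """Behavioural model of one n-bit BEC-based CSLA block.
    Returns (sum, carry_out)."""
    A = [(a >> i) & 1 for i in range(n)]
    B = [(b >> i) & 1 for i in range(n)]

    # RCA with carry-in 0
    s10, carry = [], 0
    for i in range(n):
        half_sum, half_carry = A[i] ^ B[i], A[i] & B[i]
        s10.append(half_sum ^ carry)
        carry = half_carry | (half_sum & carry)
    cout0 = carry

    # BEC: increment the RCA sum (assumed initial BEC carry of 1)
    s11, bec_carry = [], 1
    for i in range(n):
        s11.append(s10[i] ^ bec_carry)
        bec_carry = s10[i] & bec_carry
    cout1 = cout0 | bec_carry

    # selection by the actual carry-in
    s_bits, cout = (s11, cout1) if cin else (s10, cout0)
    return sum(bit << i for i, bit in enumerate(s_bits)), cout
```

The output-carry for cin = 1 is formed with an OR here; the RCA carry and the BEC carry can never both be 1, so an XOR would give the same result.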

The selected carry word is added with the half-sum (s0) to generate the final-sum

(s). Using this method, one can have three design advantages:

1. Calculation of

2. The n-bit select unit is required instead of the (n+1) bit; and

3. Small output-carry delay.

All these features result in an area-delay and energy-efficient design for the CSLA. We have removed all the redundant logic operations of (2) and rearranged the logic expressions of (2) based on their dependence. The proposed logic formulation for the CSLA is given as

$$\begin{aligned}
s_0(i) &= A(i) \oplus B(i), \qquad c_0(i) = A(i) \cdot B(i) \\
c_1^0(i) &= c_1^0(i-1) \cdot s_0(i) + c_0(i) \quad \text{for } c_1^0(-1) = 0 \\
c_1^1(i) &= c_1^1(i-1) \cdot s_0(i) + c_0(i) \quad \text{for } c_1^1(-1) = 1 \\
c(i) &= c_1^0(i) \quad \text{if } c_{in} = 0 \\
c(i) &= c_1^1(i) \quad \text{if } c_{in} = 1
\end{aligned} \tag{3}$$

Fig. 3.5 Structure of the BEC-based CSLA; n is the input operand bit-width.

The proposed CSLA is based on the logic formulation given in (3), and its structure is shown in Fig. 3.6 (a). It consists of one HSG unit, one FSG unit, one CG unit, and one CS unit. The CG unit is composed of two CGs (CG0 and CG1) corresponding to input-carry 0 and 1. The HSG receives two n-bit operands (A and B) and generates the half-sum word s0 and the half-carry word c0, of width n bits each. Both CG0 and CG1 receive s0 and c0 from the HSG unit and generate two n-bit full-carry words $c_1^0$ and $c_1^1$ corresponding to input-carry 0 and 1, respectively.

Fig. 3.6 (a) Proposed CS adder design, where n is the input operand bit-width, and [*] represents delay (in the unit of inverter delay), η = max(t, 3.5n + 2.7). (b) Gate-level design of the HSG. (c) Gate-level optimized design of (CG0) for input-carry = 0. (d) Gate-level optimized design of (CG1) for input-carry = 1. (e) Gate-level design of the CS unit. (f) Gate-level design of the final sum generation (FSG) unit.

The logic diagram of the HSG unit is shown in Fig. 3.6 (b). The logic circuits of

CG0 and CG1 are optimized to take advantage of the fixed input-carry bits. The

optimized designs of CG0 and CG1 are shown in Fig. 3.6 (c) and (d), respectively.

The CS unit selects one final carry word from the two carry words available at its input line using the control signal cin. It selects $c_1^0$ when cin = 0; otherwise, it selects $c_1^1$. The CS unit can be implemented using an n-bit 2-to-1 MUX. However, we find from the truth table of the CS unit that the carry words $c_1^0$ and $c_1^1$ follow a specific bit pattern. If $c_1^0(i) = 1$, then $c_1^1(i) = 1$, irrespective of s0(i) and c0(i), for 0 ≤ i ≤ n − 1. This feature is used for logic optimization of the CS unit. The optimized design of the CS unit is shown in Fig. 3.6 (e), which is composed of n AND-OR gates. The final carry word c is obtained from the CS unit. The MSB of c is sent to the output as cout, and its (n − 1) LSBs are XORed with the (n − 1) MSBs of the half-sum (s0) in the FSG [shown in Fig. 3.6 (f)] to obtain the (n − 1) MSBs of the final-sum (s). The LSB of s0 is XORed with cin to obtain the LSB of s.
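The four units described above (HSG, CG, CS and FSG) can be modelled in a few lines. This is a behavioural sketch only; the function name `proposed_csla` is hypothetical, and the AND-OR form of the CS unit, c(i) = c0-word(i) OR (cin AND c1-word(i)), is an assumption derived from the bit pattern stated in the text rather than taken from the figure.

```python
def proposed_csla(a, b, cin, n=8):
    """Behavioural model of the single-stage proposed CSLA.
    Returns (sum, carry_out)."""
    A = [(a >> i) & 1 for i in range(n)]
    B = [(b >> i) & 1 for i in range(n)]

    # HSG: half-sum and half-carry words
    s0 = [x ^ y for x, y in zip(A, B)]
    c0 = [x & y for x, y in zip(A, B)]

    # CG0 / CG1: full-carry words for a fixed input-carry of 0 or 1
    def carry_word(initial_carry):
        word, carry = [], initial_carry
        for i in range(n):
            carry = (carry & s0[i]) | c0[i]
            word.append(carry)
        return word

    c10, c11 = carry_word(0), carry_word(1)

    # CS: one AND-OR gate per bit, valid because c10[i] = 1 implies c11[i] = 1
    c = [c10[i] | (cin & c11[i]) for i in range(n)]

    # FSG: final sum from the half-sum and the selected carry word
    s = [s0[0] ^ cin] + [s0[i] ^ c[i - 1] for i in range(1, n)]
    return sum(bit << i for i, bit in enumerate(s)), c[n - 1]
```

Note that no full-sum is computed before the carry word is selected; only carries are duplicated, which is where the saving over the RCA-pair structure comes from.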

We have considered all the gates to be made of 2-input AND, 2-input OR, and inverter (AOI). A 2-input XOR is composed of 2 AND, 1 OR, and 2 NOT gates. The area and delay of the 2-input AND, 2-input OR, and NOT gates are taken from the Synopsys Armenia Educational Department (SAED) 90-nm standard cell library datasheet for theoretical estimation. The area and delay of a design are calculated using the following relations:

$$\begin{aligned}
A &= a \cdot N_a + r \cdot N_o + i \cdot N_i \\
T &= n_a \cdot T_a + n_o \cdot T_o + n_i \cdot T_i
\end{aligned} \tag{4}$$

where (Na, No, Ni) and (na, no, ni), respectively, represent the (AND, OR, NOT) gate counts of the total design and of its critical path, and (a, r, i) and (Ta, To, Ti), respectively, represent the area and delay of one (AND, OR, NOT) gate. We have calculated the AOI gate counts of each design for area and delay estimation. The area and delay of each design are calculated from the AOI gate counts (Na, No, Ni), (na, no, ni), and the cell details. The delay of each intermediate and output signal of the proposed n-bit CSLA design of Fig. 3.6 is shown in square brackets against each signal. We can observe that the proposed n-bit single-stage CSLA involves 6n fewer AOI gates than the CSLA of [6] and takes 2.7 and 6.6 units less delay to calculate the final-sum and output-carry. Compared with the CBL-based CSLA of [7], the proposed CSLA design involves n more AOI gates, and it takes (n − 4.7) units less delay to calculate the output-carry.
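The estimation relations in (4) amount to two weighted sums, which can be sketched as below. The cell areas and delays in `CELLS` are placeholder numbers invented for the example, not values from the SAED 90-nm datasheet, and the names `Gate`, `estimate_area` and `estimate_delay` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    area: float   # area of one gate (arbitrary units)
    delay: float  # delay of one gate (arbitrary units)


def estimate_area(total_counts, cells):
    """A = a*Na + r*No + i*Ni, using gate counts of the whole design."""
    return sum(total_counts[g] * cells[g].area for g in ("and", "or", "not"))


def estimate_delay(path_counts, cells):
    """T = na*Ta + no*To + ni*Ti, using gate counts on the critical path."""
    return sum(path_counts[g] * cells[g].delay for g in ("and", "or", "not"))


# Placeholder cell data (assumed values, for illustration only)
CELLS = {"and": Gate(2.0, 1.5), "or": Gate(2.0, 1.5), "not": Gate(1.0, 1.0)}

# A 2-input XOR built from 2 AND, 1 OR and 2 NOT gates
XOR_TOTAL = {"and": 2, "or": 1, "not": 2}
# Its longest path passes through one NOT, one AND and one OR
XOR_PATH = {"and": 1, "or": 1, "not": 1}
```

With these placeholder cells, `estimate_area(XOR_TOTAL, CELLS)` gives 8.0 and `estimate_delay(XOR_PATH, CELLS)` gives 4.0; real figures depend entirely on the library data used.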

In this work the following adder structures are used:

Ripple Carry Adder

Carry Save Adder

Carry Look-Ahead Adder

Carry Increment adder

Carry Skip Adder

Carry Bypass Adder

Carry Select Adder

The ripple carry adder is constructed by cascading full adder (FA) blocks in series. One full adder is responsible for the addition of two binary digits at any stage of the ripple carry. The carry-out of one stage is fed directly to the carry-in of the next stage. Even though this is a simple adder and can be used to add numbers of unrestricted bit length, it is not very efficient when numbers with a large bit length are used. One of the most serious drawbacks of this adder is that the delay increases linearly with the bit length. The worst-case delay of the RCA occurs when a carry signal transition ripples through all stages of the adder chain from the least significant bit to the most significant bit, and is approximated by:

t = (n-1) tc + ts

where tc is the carry delay through one full adder stage and ts is the sum delay of the last stage.

The well-known adder architecture, ripple carry adder is composed of cascaded

full adders for n-bit adder, as shown in figure 3.7. It is constructed by cascading full

adder blocks in series. The carry out of one stage is fed directly to the carry-in of the

next stage. For an n-bit parallel adder it requires n full adders.
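The cascade of full adders and the worst-case delay expression can be written down directly. The sketch below is a plain software model with hypothetical function names; the delay arguments are whatever unit the designer chooses.

```python
def full_adder(a, b, cin):
    """One full adder stage: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_adder(a, b, cin=0, n=8):
    """n-bit RCA: the carry-out of each stage feeds the next stage."""
    total, carry = 0, cin
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


def rca_worst_case_delay(n, t_carry, t_sum):
    """t = (n - 1) * tc + ts for a carry rippling from LSB to MSB."""
    return (n - 1) * t_carry + t_sum
```

Doubling n roughly doubles the value returned by `rca_worst_case_delay`, which is the linear growth the text refers to.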

Not very efficient when numbers with a large bit length are used.

Delay increases linearly with bit length.

In the carry-select adder scheme, blocks of bits are added in two ways: one assuming a carry-in of 0 and the other a carry-in of 1. This results in two precomputed sum and carry-out signal pairs ($s^0_{i-1:k}, c^0_i$; $s^1_{i-1:k}, c^1_i$); later, as the block's true carry-in ($c_k$) becomes known, the correct signal pair is selected. Generally, multiplexers are used to propagate carries.

Fig. 3.8 A Carry Select Adder with 1 level using n/2-bit RCA

Because of the multiplexers, a larger area is required.

It has a lesser delay than the Ripple Carry Adder (half the delay of the RCA).

Hence the Carry Select Adder is preferred when working with a smaller number of bits.

The Carry Look Ahead Adder can produce carries faster because the carry bits are generated in parallel by additional circuitry whenever the inputs change. This technique uses carry bypass logic to speed up the carry propagation.

Let $a_i$ and $b_i$ be the augend and addend inputs, $c_i$ the carry input, and $s_i$ and $c_{i+1}$ the sum and carry-out of the ith bit position. The auxiliary functions $p_i$ and $g_i$, called the propagate and generate signals, and the sum and carry outputs are defined as follows.

$$p_i = a_i \oplus b_i, \qquad g_i = a_i \cdot b_i, \qquad s_i = p_i \oplus c_i, \qquad c_{i+1} = g_i + p_i \cdot c_i$$

As we increase the number of bits in the Carry Look Ahead adder, the complexity increases because the number of gates in the expression for $c_{i+1}$ increases. So in practice it is not desirable to use the traditional CLA shown above, because it increases both the space required and the power.

Instead, we use smaller Carry Look Ahead adders arranged in levels to create a larger CLA. Commonly the smaller CLA is taken to be a 4-bit CLA, so we can define carry look-ahead over a group of 4 bits. Hence we now redefine carry generate as the group generated carry g[i, i+3] and carry propagate as the group propagated carry p[i, i+3], which are defined below.

$$g[i, i+3] = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i$$

$$p[i, i+3] = p_{i+3} \cdot p_{i+2} \cdot p_{i+1} \cdot p_i$$
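The grouping into 4-bit look-ahead blocks can be sketched as follows, using the standard textbook propagate and generate definitions (p = a XOR b, g = a AND b). The function names `cla4` and `cla16` are hypothetical, and chaining the groups through their group generate and propagate signals is one possible arrangement, assumed here for illustration.

```python
def cla4(a, b, cin):
    """4-bit carry look-ahead block.
    Returns (sum, group_generate, group_propagate)."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # propagate
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]  # generate

    # every carry is expressed directly in terms of p, g and cin
    c0 = cin
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)

    group_g = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    group_p = p[3] & p[2] & p[1] & p[0]

    s = (p[0] ^ c0) | ((p[1] ^ c1) << 1) | ((p[2] ^ c2) << 2) | ((p[3] ^ c3) << 3)
    return s, group_g, group_p


def cla16(a, b, cin=0):
    """16-bit adder built from four 4-bit CLA groups."""
    total, carry = 0, cin
    for k in range(4):
        s, gg, gp = cla4((a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF, carry)
        total |= s << (4 * k)
        carry = gg | (gp & carry)  # group carry-out from group G and P
    return total, carry
```

The carry expressions inside `cla4` grow with every bit position, which illustrates why a single wide look-ahead block becomes impractical and grouping is used instead.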

The main idea of this work is to use a BEC instead of the RCA with Cin = 1 in order to reduce the area and power consumption of the regular CSLA. To replace the n-bit RCA, an (n+1)-bit BEC is required. The function table of a 4-bit BEC is given in Table 3.2. The basic function of the CSLA is obtained by using the 4-bit BEC together with the mux. One input of the 2:1 mux gets (B3, B2, B1, and B0) as its input, and the other input of the mux is the BEC output. This produces the two possible partial results in parallel, and the mux is used to select either the BEC output or the direct inputs according to the control signal Cin. The importance of the BEC logic stems from the large silicon area reduction when a CSLA with a large number of bits is designed. The Boolean expressions of the 4-bit BEC are listed below (note the functional symbols: ~ NOT, & AND, ^ XOR).

X0 = ~B0

X1 = B0 ^ B1

X2 = B2 ^ (B0 & B1)

X3 = B3 ^ (B0 & B1 & B2)

In the 4-bit BEC with a 2:1 multiplexer, one input of the MUX is the output of the 4-bit BEC and the other is the output of the 4-bit full adder with input carry equal to zero. The selection line is the carry of the previous stage, which selects one of the inputs as the output: if Cin = 1, the output is the 4-bit BEC output.

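Table 3.2 Functional table of the 4-bit BEC

B3 B2 B1 B0    X3 X2 X1 X0
 0  0  0  0     0  0  0  1
 0  0  0  1     0  0  1  0
 0  0  1  0     0  0  1  1
 0  0  1  1     0  1  0  0
 0  1  0  0     0  1  0  1
 0  1  0  1     0  1  1  0
 0  1  1  0     0  1  1  1
 0  1  1  1     1  0  0  0
 1  0  0  0     1  0  0  1
 1  0  0  1     1  0  1  0
 1  0  1  0     1  0  1  1
 1  0  1  1     1  1  0  0
 1  1  0  0     1  1  0  1
 1  1  0  1     1  1  1  0
 1  1  1  0     1  1  1  1
 1  1  1  1     0  0  0  0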
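The gate-level behaviour of the 4-bit BEC and its use with a 2:1 mux can be checked with a short model. This is a sketch only: the names `bec4` and `bec_mux` are hypothetical, and the expression used for X3 is the standard increment logic, stated here as an assumption.

```python
def bec4(b3, b2, b1, b0):
    """Gate-level 4-bit BEC: returns (X3, X2, X1, X0) = B + 1 (mod 16)."""
    x0 = b0 ^ 1                  # ~B0 for a single bit
    x1 = b0 ^ b1
    x2 = b2 ^ (b0 & b1)
    x3 = b3 ^ (b0 & b1 & b2)     # assumed standard expression for X3
    return x3, x2, x1, x0


def bec_mux(cin, b3, b2, b1, b0):
    """2:1 mux: passes the direct inputs when cin = 0,
    the BEC output when cin = 1."""
    return bec4(b3, b2, b1, b0) if cin else (b3, b2, b1, b0)


def function_table():
    """Enumerate all 16 rows as ((B3, B2, B1, B0), (X3, X2, X1, X0))."""
    rows = []
    for value in range(16):
        bits = tuple((value >> k) & 1 for k in (3, 2, 1, 0))
        rows.append((bits, bec4(*bits)))
    return rows
```

Calling `function_table()` enumerates every input pattern together with its excess-1 output, which is a convenient way to check a hand-written table.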

3.5.5 Multiplexer

In electronics, a multiplexer (or MUX) is a device that selects one of several analog or digital input signals and forwards the selected input into a single line. A multiplexer of 2^n inputs has n select lines, which are used to select which input line to send to the output. Multiplexers are mainly used to increase the amount of data that can be sent over the network within a certain amount of time and bandwidth. A multiplexer is also called a data selector. An electronic multiplexer makes it possible for several signals to share one device or resource, for example one A/D converter or one communication line, instead of having one device per input signal.

In digital circuit design, the selector wires are of digital value. In the case of a 2-to-1 multiplexer, a logic value of 0 would connect the first input (A) to the output while a logic value of 1 would connect the second input (B) to the output. In larger multiplexers, the number of selector pins is equal to $\lceil \log_2 n \rceil$, where n is the number of inputs. A 2-to-1 multiplexer has the Boolean equation $Z = (A \cdot \overline{S}) + (B \cdot S)$, where A and B are the two inputs, S is the selector input, and Z is the output.
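A 2-to-1 multiplexer and the relation between input count and select lines can be sketched as below. The Boolean form used, Z = (A AND NOT S) OR (B AND S), is the standard one; the function names and the signal names a, b, s are assumptions made for the example.

```python
def mux2(a, b, s):
    """2-to-1 multiplexer on single bits: returns a when s = 0, b when s = 1."""
    return (a & (s ^ 1)) | (b & s)


def select_lines(num_inputs):
    """Number of select lines needed for num_inputs inputs,
    i.e. the ceiling of log2(num_inputs), computed with integers only."""
    return (num_inputs - 1).bit_length()


def mux2_word(a_word, b_word, s, width):
    """Bitwise 2-to-1 mux over two width-bit words, one mux2 per bit."""
    out = 0
    for i in range(width):
        out |= mux2((a_word >> i) & 1, (b_word >> i) & 1, s) << i
    return out
```

`mux2_word` corresponds to the n-bit selection stage used in the carry-select blocks, where the select input is the carry from the previous stage.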

microprocessor, digital signal processor, especially digital computers. Also, it serves as a building block for the synthesis of all other arithmetic operations. Therefore, regarding the efficient implementation of an arithmetic unit, the binary adder structure becomes a very critical hardware unit. Any book on computer arithmetic shows that a large number of circuit architectures with different performance characteristics exist and are widely used in practice. Although much research dealing with binary adder structures has been done, studies based on their comparative performance analysis are few.

are given. Among the large number of adders, we wrote VHDL (VHSIC Hardware Description Language) code for the ripple-carry, carry-select and carry look-ahead adders to emphasize the common performance properties belonging to their classes. In the following section, we give a brief description of the studied adder architectures. With respect to asymptotic delay time and area complexity, the binary adder architectures can be categorized into four primary classes as given in Table 3.3. The results given in the table are the highest-order terms of the exact formulas, which are very complex for high operand bit lengths.

The first class consists of the very slow ripple-carry adder with the smallest area.

In the second class, the carry-skip, carry-select adders with multiple levels have

small area requirements and shortened computation times. From the third class, the

carry-look ahead adder and from the fourth class, the parallel prefix adder represents

the fastest addition schemes with the largest area complexities.


versatile hardware synthesis are rudiments for high productivity in ASIC design. In the majority of digital signal processing (DSP) applications, the critical operations are addition, multiplication and accumulation. Addition is an indispensable operation for any digital system, DSP or control system. Therefore, the fast and accurate operation of a digital system is greatly influenced by the performance of the resident adders. Adders are also a very significant component in digital systems because of their widespread use in other basic digital operations such as subtraction, multiplication and division. Hence, improving the performance of the digital adder would extensively advance the execution of binary operations inside a circuit composed of such blocks. Many different adder architectures for speeding up binary addition have been studied and proposed over the last decades. For cell-based design techniques they can be well characterized with respect to circuit area and speed as well as suitability for logic optimization and synthesis. The Ripple Carry Adder (RCA) [1] [2] is the simplest but slowest adder, with O(n) area and O(n) delay, where n is the operand size in bits. Carry Look-Ahead (CLA) adders [3] [4] have O(n log n) area and O(log n) delay, but typically suffer from irregular layout.

Addition, one of the most frequently used arithmetic operations, is employed to build advanced operations such as multiplication and division. Theoretical research has found that the lower bound on the critical path delay of the adder has complexity O(log n), where n is the adder width. The design of high-performance adders has been extensively studied [10] [15], and several adders have achieved logarithmic delays. Whereas theoretical bounds indicate that no traditional adder can achieve sub-logarithmic delay, it has been shown that speculative adders can achieve sub-logarithmic delays by neglecting rare input patterns that exercise the critical paths [2, 11, 13]. Furthermore, by augmenting speculative adders with error detection and recovery, one can construct reliable variable-latency adders whose average performance is very close to that of speculative adders [3, 6, 12, 17].

Speculative adders are built upon the observation that the critical path is rarely

activated in traditional adders. In traditional adders, each output depends on all

previous (lower or equal significance) bits. In particular, the most significant output

depends on all the n bits, where n is the adder width. In contrast, in speculative

adders [2, 6, 11, 13, 17], each output only depends on the previous k bits rather than

all previous bits, where k is much smaller than n. However, the cumulative error

grows linearly with the adder width since each speculative output can independently

be in error. Moreover, the calculation of each speculative output requires an

individual k-bit adder; hence, such designs also incur large area overhead and large

fanout at the primary inputs. Techniques such as effective sharing [17] can mitigate

but not eliminate fanout and area problems. Although the speculative adder in [18]

can mitigate the area problem, it incurs a fairly high error rate that limits its

application.

For applications where errors cannot be tolerated, a reliable variable latency adder

can be built upon the speculative adder by adding error detection and recovery [3, 6,

12, 17]. For the vast majority of input combinations, the speculative adder produces

correct results; when error detection flags an error, error recovery provides correct

results in one or more extra cycles. Ideally, the average performance of the variable

latency adder should be similar to the speculative one. However, existing variable

latency adders have several drawbacks. When error detection indicates no error, the

actual delay is the longer of the speculative adder and error detection. The delay of

error detection is always longer than the speculative adder [6] [17]. Hence, the

benefit of speculation is limited by the delay of error detection [3] [12]. Besides, the

circuitry for error detection and recovery incurs nontrivial area overhead. Finally,

variable latency adders are mostly restricted for random inputs [3, 12, and 17]. This


thesis first describes a novel function speculation technique, called speculative carry

select addition (SCSA). The key idea is to segment the chain of propagate signals in

addition into blocks of the same size. Specifically, the input bits of addends are

segmented into blocks, and the carry bits between blocks are selectively truncated to

0. SCSA is less susceptible to errors, since it is only applied for blocks instead of

individual outputs.

each output, which mitigates the area overhead problem. An analytical model to

determine the error rate of SCSA is formulated, and the accurate relation between

the block size and output error is developed. A high performance speculative adder

design is presented for low error rates (e.g. 0.01% and 0.25%). Secondly, this thesis

describes a reliable variable latency adder design that augments the speculative

adder with error detection and recovery. The speculative adder produces correct

results in a single cycle in most cases, and error recovery provides correct results in

an extra cycle in worst cases. The performance of the variable latency adder is close

to that of the speculative adder. This approach has two advantages. First, the critical

path delay of the error detection block is lower or comparable to that of the

speculative adder. Second, the error detection and recovery circuitry incurs low area

overhead by using intermediate results from the speculative adder.

Finally, the previous variable latency and speculative adders are mainly designed

for unsigned random inputs, so this thesis proposes the modified variable latency

and speculative adders suitable for both random and Gaussian inputs. With modified

speculative adder and error detection block, the variable latency adder still achieves

high performance when 2's complement Gaussian inputs present. This shows that the

variable latency adder design is feasible for practical applications.

In the present work, the designs of 8-bit adder topologies such as the ripple carry adder, carry look-ahead adder, carry skip adder, carry select adder, carry increment adder, carry save adder and carry bypass adder are presented. It tightly integrates mixed-signal implementation with digital implementation, circuit simulation, transistor-level extraction and verification. Performance issues like area, power dissipation and propagation delay for all the adders are analyzed at 0.12 µm 6-metal-layer CMOS

follows.

The design of area- and power-efficient high-speed data path logic systems is one of the most substantial areas of research in VLSI system design. In digital adders, the speed of addition is limited by the time required to propagate a carry through the adder. The sum for each bit position in an elementary adder is generated sequentially only after the previous bit position has been summed and a carry propagated into the next position. The CSLA is used in many computational systems to alleviate the problem of carry propagation delay by independently generating multiple carries and then selecting a carry to generate the sum [1].

However, the CSLA is not area efficient because it uses multiple pairs of Ripple Carry Adders (RCA) to generate partial sum and carry by considering carry input Cin = 0 and Cin = 1; the final sum and carry are then selected by the multiplexers (mux). The basic idea of this work is to use a simple combinational circuit instead of the RCA with Cin = 1 and the multiplexer in the regular CSLA to achieve lower area and power. The main advantage of this logic is its lower power consumption compared with the n-bit Full Adder (FA) structure. The SQRT CSLA has been developed using this simple combinational circuit and compared with the regular SQRT CSLA.

A regular CSLA uses two copies of the carry evaluation blocks, one with a block carry input of zero and the other with a block carry input of one. The regular CSLA suffers from the disadvantage of occupying more chip area. The modified CSLA reduces the area and power when compared to the regular CSLA, with an increase in delay, by the use of a Binary to Excess-1 converter. This project proposes a scheme which reduces the delay, area and power compared with the regular and modified CSLA by the use of D-latches.

In our project we compared three different adders: Ripple Carry Adders, Carry Select Adders and Carry Look Ahead Adders. The basic purpose of our experiment was to know the time and power trade-offs between different adders, which gives us a clear picture of which adder suits which type of situation during the design process. Hence below we present both the theoretical and practical comparisons of all three adders which were taken into consideration.

The multipath carry propagation feature of the CSLA is fully exploited in the

SQRT-CSLA [5], which is composed of a chain of CSLAs. CSLAs of increasing

size are used in the SQRT-CSLA to extract the maximum concurrence in the carry

propagation path. Using the SQRT-CSLA design, large-size adders are implemented

with significantly less delay than a single-stage CSLA of same size. However, carry

propagation delay between the CSLA stages of SQRT-CSLA is critical for the

overall adder delay.

Fig. 3.10 Proposed SQRT-CSLA for n = 16. All intermediate and output signals

are labeled with delay

feature, the proposed CSLA design is more favorable than the existing CSLA designs for area-delay [10] efficient implementation of SQRT-CSLA. A 16-bit SQRT-CSLA design using the proposed CSLA is shown in Fig. 3.10, where the 2-bit RCA, 2-bit CSLA, 3-bit CSLA, 4-bit CSLA, and 5-bit CSLA are used. We have considered the cascaded configuration of (2-bit RCA and 2-bit, 3-bit, 4-bit, 6-bit, 7-bit, and 8-bit CSLAs) and (2-bit RCA and 2-bit, 3-bit, 4-bit, 6-bit, 7-bit, 8-bit, 9-bit, 11-bit, and 12-bit CSLAs), respectively, for the 32-bit SQRT-CSLA and the 64-bit SQRT-CSLA to optimize adder delay. To demonstrate the advantage of the proposed CSLA design in SQRT-CSLA, we have estimated the area and delay of SQRT-CSLA using the proposed CSLA design, the BEC-based CSLA of [6], and the CBL-based CSLA of [7] for bit-widths 16 and 32.
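The cascaded block configurations listed above can be captured in a small behavioural model. This is a sketch, not the synthesized design: each block is modelled arithmetically rather than gate by gate, and the names `SQRT_CSLA_BLOCKS`, `block_add` and `sqrt_csla` are hypothetical.

```python
# Block widths, LSB block first; the first entry is the plain RCA
SQRT_CSLA_BLOCKS = {
    16: [2, 2, 3, 4, 5],
    32: [2, 2, 3, 4, 6, 7, 8],
    64: [2, 2, 3, 4, 6, 7, 8, 9, 11, 12],
}


def block_add(a, b, cin, width):
    """Behavioural adder for one block: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def sqrt_csla(a, b, cin=0, n=16):
    """SQRT-CSLA as a chain of carry-select blocks of increasing size."""
    result, carry, offset = 0, cin, 0
    for index, width in enumerate(SQRT_CSLA_BLOCKS[n]):
        mask = (1 << width) - 1
        a_blk, b_blk = (a >> offset) & mask, (b >> offset) & mask
        if index == 0:
            s, carry = block_add(a_blk, b_blk, carry, width)
        else:
            candidate0 = block_add(a_blk, b_blk, 0, width)  # carry-in 0
            candidate1 = block_add(a_blk, b_blk, 1, width)  # carry-in 1
            s, carry = candidate1 if carry else candidate0
        result |= s << offset
        offset += width
    return result, carry
```

The widths in each configuration add up to the adder width (16, 32 and 64), and the later blocks are wider because their select signal arrives later, leaving more time for their internal carries to settle.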


CHAPTER 4

RESULTS ANALYSIS

In this section, the synthesis and simulation results of the proposed method are reported. The proposed system is implemented using Xilinx software, and the simulation waveforms of each module are shown below.

Fig 4.1 (a) Simulation Waveform Result of 8-bit Ripple Carry Adder

Table 4.1 Device Utilization summary of 8-bit Ripple Carry Adder

Device utilization summary
Selected Device          : 3s1600efg484-4
Number of Slices         : 9 out of 14752
Number of 4 input LUTs   : 16 out of 29504
Number of IOs            : 26
Number of bonded IOBs    : 26 out of 376    6%

Table 4.2 Device Utilization summary of 8-bit CSA

Device utilization summary
Selected Device          : 3s1600efg484-4
Number of Slices         : 9 out of 14752
Number of 4 input LUTs   : 16 out of 29504
Number of IOs            : 26
Number of bonded IOBs    : 26 out of 376    6%

Table 4.3 Synthesis Report of 8-bit Proposed CSA

HDL Synthesis Report
Macro Statistics
# Xors                   : 9
  1-bit xor2             : 8
  8-bit xor2             : 1

Advanced HDL Synthesis Report
Macro Statistics
# Xors                   : 9
  1-bit xor2             : 8
  8-bit xor2             : 1

Optimizing unit <prop8>...
Mapping all equations...
Building and optimizing final netlist...
Found area constraint ratio of 100 (+ 5) on block prop8, actual ratio is 0.

Device utilization summary
Selected Device          : 3s1600efg484-4
Number of Slices         : 14 out of 14752
Number of 4 input LUTs   : 26 out of 29504
Number of IOs            : 26
Number of bonded IOBs    : 26 out of 376    6%

Fig 4.4 (a) Simulation Waveform Result of 16-bit Ripple Carry Adder

Table 4.5 Device Utilization summary of 16-bit Ripple Carry Adder

Device utilization summary
Selected Device          : 3s1600efg484-4
Number of Slices         : 18 out of 14752
Number of 4 input LUTs   : 32 out of 29504
Number of IOs            : 50
Number of bonded IOBs    : 50 out of 376    13%

Table 4.6 Device Utilization summary of 16-bit CSA

Device utilization summary
Selected Device          : 3s1600efg484-4
Number of Slices         : 18 out of 14752
Number of 4 input LUTs   : 32 out of 29504
Number of IOs            : 50
Number of bonded IOBs    : 50 out of 376    13%

Table 4.7 Synthesis Report of 16-bit Proposed CSA

HDL Synthesis Report
Macro Statistics
# Xors                   : 17
  1-bit xor2             : 16
  16-bit xor2            : 1

Advanced HDL Synthesis Report
Macro Statistics
# Xors                   : 17
  1-bit xor2             : 16
  16-bit xor2            : 1

Optimizing unit <prop16>...
Mapping all equations...
Building and optimizing final netlist...
Found area constraint ratio of 100 (+ 5) on block prop16, actual ratio is 0.
Final Macro Processing...

Device utilization summary
Selected Device          : 3s1600efg484-4
Number of Slices         : 34 out of 14752
Number of 4 input LUTs   : 63 out of 29504
Number of IOs            : 50
Number of bonded IOBs    : 50 out of 376    13%

Fig 4.7 (a) Simulation Waveform Result of 32-bit Ripple Carry Adder

Table 4.9 Device Utilization summary of 32-bit Ripple Carry Adder

Device utilization summary
Selected Device          : 3s1600efg484-4
Number of Slices         : 37 out of 14752
Number of 4 input LUTs   : 64 out of 29504
Number of IOs            : 98
Number of bonded IOBs    : 98 out of 376    26%

Table 4.10 Device Utilization summary of 32-bit CSA

Device utilization summary
Selected Device          : 3s1600efg484-4
Number of Slices         : 50 out of 14752
Number of 4 input LUTs   : 91 out of 29504
Number of IOs            : 98
Number of bonded IOBs    : 98 out of 376    26%

Table 4.11 Synthesis Report of 32-bit Proposed CSA

HDL Synthesis Report
Macro Statistics
# Xors : 33
  1-bit xor2 : 32
  32-bit xor2 : 1

Advanced HDL Synthesis Report
Macro Statistics
# Xors : 33
  1-bit xor2 : 32
  32-bit xor2 : 1

Optimizing unit <prop32>...
Mapping all equations...
Building and optimizing final netlist...
Found area constraint ratio of 100 (+ 5) on block prop32, actual ratio is 0.

Device utilization summary
Selected Device: 3s1600efg484-4
Number of Slices: 76 out of 14752
Number of 4 input LUTs: 140 out of 29504
Number of IOs: 98
Number of bonded IOBs: 98 out of 376 (26%)


The simulated V files are imported into the synthesis tool, and the corresponding delay and area values are noted. The synthesis reports contain the area and delay values for adders of different sizes. The same design flow is followed for both the regular and the modified SQRT CSLA of each size.

In the 32-bit carry select adder, the transistor count of the proposed area-efficient design is very close to that of the ripple carry adder, whereas the transistor count of the conventional carry select adder is nearly double that of the proposed design. This result shows that sharing the common Boolean logic term does reduce the transistor count. As the input width of the conventional carry select adder increases to 32 bits, its power consumption becomes 3.3 times larger than that of the proposed area-efficient carry select adder.

The delay of the 8-bit, 16-bit, 32-bit, and 64-bit proposed SQRT CSLA is reduced by 4.6%, 49.3%, 44.5%, and 59.08%, respectively, compared to the regular SQRT CSLA. The power reduction of the proposed design compared to the regular SQRT CSLA is 10.8%, 17.73%, 20.01%, and 21.9% for the 8-bit, 16-bit, 32-bit, and 64-bit adders, respectively.


Functional verification (simulation) and synthesis (conversion of the high-level description into RTL) are performed for all the adders, and the results are summarized. After the observation of simulation waveforms, synthesis is performed for calculation of delay and area; from these, the speed and power of the CSLAs are calculated, and the regular, modified and improved CSLAs are compared in terms of delay, area and power.

In this section, the proposed method is compared with the 32-bit ripple carry adder. Here, we show the results for power delay and critical path delay. The area indicates the total cell area of the design, and the total power is the sum of the leakage power, internal power and switching power. The percentage reductions in the cell area, total power, power-delay product and area-delay product are shown as a function of the bit size.

The synthesis report shows the design summary, including the number of LUTs, I/O buffers, slice registers and flip-flop pairs, together with the theoretical and layout estimations.

Area and delay comparison: Table 4.2 gives the simulation results of both CSLA structures in terms of delay, area and power. Table 4.3 shows the device utilization summary. Table 4.4 shows that the proposed SQRT CSLA has fewer gates and hence less area.


Design               Width (n)  Delay (ns)  Area (um2)  ADP (um2·us)  EADP (1%)
SQRT-CSLA [6]        16         7.38        1813.71     13.39         161.41
SQRT-CSLA [6]        32         14.58       3627.42     52.89         280.64
SQRT-CSLA [6]        64         28.98       7254.84     210.25        436.55
SQRT-CSLA (CBL) [7]  16         3.0         1706.80     5.12          --
SQRT-CSLA (CBL) [7]  32         3.85        3608.98     13.89         --
SQRT-CSLA (CBL) [7]  64         5.27        7435.46     39.18         --
SQRT-CSLA proposed   16         2.0         1574.54     4.10          --
SQRT-CSLA proposed   32         2.75        2989.99     11.89         --
SQRT-CSLA proposed   64         4.28        6553.24     32.10         --
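The ADP column is derived from the delay and area columns. A minimal sketch in Python, assuming ADP is the plain product of delay and area converted from ns to us (the helper name `adp_um2_us` is illustrative and not part of the report), recomputes it for the two reference designs:

```python
def adp_um2_us(delay_ns: float, area_um2: float) -> float:
    """Area-delay product in um^2*us, from delay in ns and area in um^2."""
    return delay_ns * area_um2 / 1000.0  # 1 us = 1000 ns


# (width, delay in ns, area in um^2, ADP as listed in the table)
ROWS_REF6 = [(16, 7.38, 1813.71, 13.39),
             (32, 14.58, 3627.42, 52.89),
             (64, 28.98, 7254.84, 210.25)]
ROWS_CBL = [(16, 3.0, 1706.80, 5.12),
            (32, 3.85, 3608.98, 13.89),
            (64, 5.27, 7435.46, 39.18)]

for width, delay, area, listed in ROWS_REF6 + ROWS_CBL:
    # Print the recomputed value next to the tabulated one for comparison.
    print(width, round(adp_um2_us(delay, area), 2), listed)
```

The same helper can be applied to any other row once its delay and area are known.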

Design               Width (n)  Delay (ns)  Area (um2)  Power (uW)
SQRT-CSLA (conv)     16         5.61        2890.52     30.5673
SQRT-CSLA (conv)     32         6.56        6100.34     60.2537
SQRT-CSLA (conv)     64         8.37        12613.2     113.6457
SQRT-CSLA (BEC) [7]  16         10.45       1722.96     12.8662
SQRT-CSLA (BEC) [7]  32         18.72       2765.38     17.7900
SQRT-CSLA (BEC) [7]  64         35.10       5530.56     91.1744
SQRT-CSLA (CBL)      16         5.55        1813.45     19.6652
SQRT-CSLA (CBL)      32         6.59        3735.36     38.1886
SQRT-CSLA (CBL)      64         8.35        7603.89     70.62442
SQRT-CSLA proposed   16         5.55        1813.45     19.6652
SQRT-CSLA proposed   32         6.59        3735.36     38.1886
SQRT-CSLA proposed   64         8.35        7603.89     70.62442

Logic utilization                                  Used  Available  Utilization
Number of 4 input LUTs                             91    29504      1%
Number of occupied slices                          51    14752      1%
Number of slices containing only related logic     51    51         100%
Number of slices containing only unrelated logic   --    --         0%
Number of bonded IOBs                              97    376        1%

Word size  Adder          Delay (ns)  Area  Leakage power (uW)  Switching power (uW)  Total power (uW)  Power-delay product (10^-15)  Area-delay product (10^-25)
8 bit      Regular CSLA   1.719       991   0.007               101.9                 203.9             350.5                         1703.5
8 bit      Modified CSLA  1.958       895   0.006               94.2                  188.4             368.8                         1752.5
16 bit     Regular CSLA   2.775       2272  0.017               263.7                 527.4             1463.8                        6304.8
16 bit     Modified CSLA  3.048       1929  0.013               235.9                 471.8             1438.0                        5879.6
32 bit     Regular CSLA   5.137       4783  0.036               563.6                 1127.3            5790.9                        24570.2
32 bit     Modified CSLA  5.482       3985  0.027               484.9                 969.9             5316.9                        21848.5
64 bit     Regular CSLA   9.174       9916  0.075               1212.4                2425.0            22245.9                       90969.3
64 bit     Modified CSLA  9.519       8183  0.057               1025.0                2050.1            19514.9                       77893.9
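The two product columns of the regular versus modified CSLA table can be recomputed from the delay, area and total power columns. The sketch below assumes that the power-delay product is total power times delay and that the area-delay product is area times delay, both in the table's own scaled units; the names `ROWS`, `pdp` and `adp` are illustrative only.

```python
# (word size, adder, delay ns, area, total power uW, listed PDP, listed ADP)
ROWS = [
    (8, "regular", 1.719, 991, 203.9, 350.5, 1703.5),
    (8, "modified", 1.958, 895, 188.4, 368.8, 1752.5),
    (16, "regular", 2.775, 2272, 527.4, 1463.8, 6304.8),
    (16, "modified", 3.048, 1929, 471.8, 1438.0, 5879.6),
    (32, "regular", 5.137, 4783, 1127.3, 5790.9, 24570.2),
    (32, "modified", 5.482, 3985, 969.9, 5316.9, 21848.5),
    (64, "regular", 9.174, 9916, 2425.0, 22245.9, 90969.3),
    (64, "modified", 9.519, 8183, 2050.1, 19514.9, 77893.9),
]


def pdp(total_power_uw: float, delay_ns: float) -> float:
    """Power-delay product: uW * ns = 1e-15 J, the table's 10^-15 scale."""
    return total_power_uw * delay_ns


def adp(area: float, delay_ns: float) -> float:
    """Area-delay product in the table's scaled units (area * delay in ns)."""
    return area * delay_ns


for size, adder, delay, area, power, listed_pdp, listed_adp in ROWS:
    # Show recomputed values beside the tabulated ones.
    print(size, adder, round(pdp(power, delay), 1), listed_pdp,
          round(adp(area, delay), 1), listed_adp)
```

Small differences between recomputed and listed values are expected because the table entries are rounded.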

The comparison of the proposed system with the 32-bit RCA is shown in Table 4.4.

The delay is calculated by adding up the number of gates in the longest path of the logic block that contributes the maximum delay. The area evaluation is done by counting the total number of AOI gates required for each logic block. The main disadvantage of the regular CSLA is its high area usage, which can be overcome by using the modified CSLA. Table 4.1 shows the area and delay of the AND, OR, and NOT gates given in the 90-nm standard cell library datasheet, and the proposed system compared with the 32-bit RCA.
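The gate-counting method described above can be illustrated with a short Python sketch. It assumes a unit-gate model (every AND, OR and NOT gate has area 1 and delay 1) and the usual AOI decompositions of the XOR gate and the adders; the `Cost` class and the constant names are hypothetical and not taken from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    area: int   # total number of AOI gates in the block
    delay: int  # number of gates on the longest path


# XOR = a.b' + a'.b -> 2 NOT + 2 AND + 1 OR; longest path NOT -> AND -> OR.
XOR = Cost(area=2 + 2 + 1, delay=3)

# Half adder: sum = a XOR b, carry = a AND b.
HALF_ADDER = Cost(area=XOR.area + 1, delay=max(XOR.delay, 1))

# Full adder: sum = (a XOR b) XOR cin, carry = a.b + cin.(a XOR b).
FULL_ADDER = Cost(
    area=2 * XOR.area + 2 + 1,       # 2 XOR + 2 AND + 1 OR
    delay=max(2 * XOR.delay,         # sum path: XOR -> XOR
              XOR.delay + 1 + 1),    # carry path: XOR -> AND -> OR
)


def rca_area(n_bits: int) -> int:
    """AOI gate count of an n-bit ripple carry adder built from full adders."""
    return n_bits * FULL_ADDER.area


print(XOR, HALF_ADDER, FULL_ADDER, rca_area(4))
```

Replacing the unit costs with the per-gate figures of a real cell library turns the same counts into area and delay estimates.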

4.4 Applications

Arithmetic Logic units

High Speed Multiplication

Advanced Microprocessor Design

Digital Signal Processing


4.5 Advantages

Low Power Consumption

Less Area (Less Complexity)

Higher speed compared to regular CSLA


CHAPTER 5

CONCLUSION & FUTURE SCOPE

5.1 Conclusion

A simple approach has been used in this project to reduce the area and power of the SQRT CSLA architecture. The number of gates has been reduced, which offers a great advantage in area and power reduction. The simulation results indicate that the modified SQRT CSLA has a larger delay, whereas the area and power of the 32-bit modified SQRT CSLA are significantly reduced. The delay calculations used here can be computed using the Mentor Graphics tool.

Nowadays the Carry Select Adder (CSLA) is used in many data-processing processors to perform fast arithmetic functions. The speed of the regular SQRT CSLA is greater than that of the modified SQRT CSLA, but the area and power of the modified design are lower. The regular SQRT CSLA can therefore be replaced by the modified SQRT CSLA where area and power are more important constraints than speed.


REFERENCES

[1] B. Ramkumar and Harish M. Kittur, "Low-Power and Area-Efficient Carry Select Adder," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 2, Feb. 2012.

[2] G. A. Ruiz and M. Granda, "An area-efficient static CMOS carry-select adder based on a compact carry look-ahead unit," Microelectronics Journal, vol. 35, pp. 939-944, 2004.

[3] O. J. Bedrij, "Carry-select adder," IRE Transactions on Electronic Computers, pp. 340-344, 1962.

[4] Y. Kim and L.-S. Kim, "64-bit carry-select adder with reduced area," Electron. Lett., vol. 37, no. 10, pp. 614-615, May 2001.

[5] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective. Upper Saddle River, NJ: Prentice-Hall, 2001.

[6] Cadence, "Encounter user guide," Version 6.2.4, March 2008.

[7] T. Y. Chang and M. J. Hsiao, "Carry-select adder using single ripple-carry adder," Electronics Letters, vol. 34, no. 22, pp. 2101-2103, Oct. 1998.

[8] Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Designs.

[9] "Review on Carry Skip Adder and Gray/Black Cell Function," Lecture 18, Datapath Subsystems, Chapter 10. Copyright 2005 Pearson Addison-Wesley. All rights reserved.

[10] Gray Yeap and Gilbert, Practical Low Power Digital VLSI Design, Kluwer Academic Publishers, 1998.

[11] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, 2nd ed. New York, NY, USA: Oxford Univ. Press, 2010.


[12] K.K.Parhi, VLSI Digital Signal Processing. New York, NY, USA Wiley, 1998.

2003.

1962.

[15] I-Chyn Wey, Cheng-Chen Ho, Yi-Sheng Lin, and Chien-Chang Peng, "An Area-Efficient Carry Select Adder Design by Sharing the Common Boolean Logic Term," in Proceedings of the International MultiConference of Engineers and Computer Scientists 2012, vol. II, IMECS 2012, March 14-16, 2012, Hong Kong.

[16] A. P. Chandrakasan, N. Verma, and D. C. Daly, "Ultralow-power electronics for biomedical applications," Annu. Rev. Biomed. Eng., vol. 10, pp. 247-274, Aug. 2008.

[17] R. Uma, Vidya Vijayan, M. Mohanapriya, and Sharon Paul, "Area, Delay and Power Comparison of Adder Topologies," International Journal of VLSI Design & Communication Systems (VLSICS), vol. 3, no. 1, February 2012.

[18] S. Manju and V. Sornagopal, "An efficient SQRT architecture of carry select adder design by common Boolean logic," in Proc. VLSI ICEVENT, 2013, pp. 1-5.

Senior Member, IEEE, and Sujit Kumar Patel, IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 61, no. 6, June 2014.

[20] Y. He, C. H. Chang, and J. Gu, "An area efficient 64-bit square root carry-select adder for low power applications," in Proc. IEEE Int. Symp. Circuits Syst., 2005, vol. 4, pp. 4082-4085.


Technology, Management and Research

A Peer Reviewed Open Access International Journal

Power and Delay

Bhagya Sri Gutthikonda

PG Student,

Department of ECE,

Sri Mittapalli Institute of Technology for Women,

Guntur, Andhra Pradesh, India.

ABSTRACT:

Carry Select Adder (CSLA) is one of the fastest adders used in many data-processing processors to perform fast arithmetic functions. From the structure of the CSLA, it is clear that there is scope for reducing the area and power consumption in the CSLA. This work uses a simple and efficient gate-level modification to significantly reduce the area and power of the CSLA. Based on this modification, 8-bit, 16-bit, 32-bit and 64-bit square-root CSLA (SQRT CSLA) architectures have been developed and compared with the regular SQRT CSLA architecture. The proposed design has reduced area and power as compared with the regular SQRT CSLA, with only a slight increase in the delay. This work evaluates the performance of the proposed designs in terms of delay, area, power, and their products, by hand with logical effort and through custom design and layout in 0.18-μm CMOS process technology. The results analysis shows that the proposed CSLA structure is better than the regular SQRT CSLA.

Keywords:

SQRT CSLA, area efficient, CSLA, low power, delay efficient.

I. INTRODUCTION:

The design of area- and power-efficient high-speed data path logic systems is one of the most substantial areas of research in VLSI system design. In digital adders, the speed of addition is limited by the time required to propagate a carry through the adder. The sum for each bit position in an elementary adder is generated sequentially only after the previous bit position has been summed and a carry propagated into the next position. The CSLA is used in many computational systems to alleviate the problem of carry propagation delay by independently generating multiple carries and then selecting a carry to generate the sum [1].

www.ijmetmr.com

Professor,

Department of ECE,

Sri Mittapalli Institute of Technology for Women,

Guntur, Andhra Pradesh, India.

However, the CSLA [3] is not area efficient because it uses multiple pairs of Ripple Carry Adders (RCA) to generate the partial sum and carry by considering the carry input, and then the final sum and carry are selected by the multiplexers (mux). The basic idea of this work is to use a Binary to Excess-1 Converter (BEC) instead of the RCA within the regular CSLA to achieve lower area and power consumption [2]-[4]. The main advantage of this BEC logic comes from the smaller number of logic gates than the n-bit Full Adder (FA) structure.

This paper deals with the delay and area evaluation methodology of the basic adder blocks, and presents the detailed structure and function of the BEC logic. The SQRT CSLA has been chosen for comparison with the proposed design as it has a more balanced delay and requires lower power and area [5], [6]. The delay and area evaluation methodology of the regular and modified SQRT CSLA are presented.

The rest of the paper is organised as follows. In Section II, logic formulation is presented. In Section III, the proposed adder design is explained. In Section IV, the proposed scheme is compared to the previously proposed ones and results are shown. Finally, Section V concludes this paper.
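As an illustration of the BEC idea mentioned in the introduction, the sketch below models a Binary to Excess-1 Converter in Python. It assumes the common formulation in which each output bit is the input bit XORed with the AND of all lower input bits (so the output equals the input plus one); the function names are hypothetical and the gate-level structure used in this work may differ.

```python
def bec(bits):
    """Binary to excess-1 conversion on an LSB-first list of bits.

    Output bit i is bits[i] XOR (AND of all lower input bits); the LSB is
    simply inverted. The result equals the input plus one, modulo 2**len(bits).
    """
    out = []
    lower_all_ones = 1  # AND of all bits below the current position
    for b in bits:
        out.append(b ^ lower_all_ones)
        lower_all_ones &= b
    return out


def to_bits(value, width):
    """LSB-first bit list of a non-negative integer."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Integer value of an LSB-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


print(from_bits(bec(to_bits(0b0111, 4))))
```

In a BEC-based CSLA this block takes the result computed for carry-in 0 and produces the result for carry-in 1, so the second ripple carry adder is not needed.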

The CSLA has two units: 1) the sum and carry generator unit (SCG) and 2) the sum and carry selection unit

[9]. The SCG unit consumes most of the logic resources

of CSLA as shown in fig 1, and significantly contributes

to the critical path. Different logic designs have been

suggested for efficient implementation of the SCG unit.

We made a study of the logic designs suggested for the

SCG unit of conventional and BEC-based CSLAs of [6]

by suitable logic expressions. The main objective of this

study is to identify redundant logic operations and data

dependence. Accordingly, we remove all redundant logic

operations and sequence logic operations based on their

data dependence which are discussed below.

January 2016

Page 120


bit-width. (b) The logic operations of the RCA.

The SCG unit of the conventional CSLA as shown in

Fig.1(a), [3] is composed of two n-bit RCAs, where n is

the adder bit-width. The logic operation of the n-bit RCA

shown in fig.1(b) is performed in four stages:

Half-sum generation (HSG);

Half-carry generation (HCG);

Full-sum generation (FSG); and

Full carry generation (FCG).

Suppose two n-bit operands are added in the conventional CSLA; then RCA-1 and RCA-2 generate the n-bit sums (s^0 and s^1) and output-carries (c^0_out and c^1_out) corresponding to input-carry (cin = 0 and cin = 1), respectively. Logic expressions of RCA-1 and RCA-2 of the SCG unit of the n-bit CSLA are given as

to generate the final-sum (s). Using this method, one can have three design advantages:

1. Calculation of S10 is avoided in the SCG unit;
2. The n-bit select unit is required instead of the (n+1)-bit; and
3. Small output-carry delay.

All these features result in an area-delay and energy-efficient design for the CSLA. We have removed all the redundant logic operations of (2) and rearranged the logic expressions of (2) based on their dependence. The proposed logic formulation for the CSLA is given as

instead of the RCA with Cin = 1 in order to reduce the area and power consumption of the regular CSLA. To replace the n-bit RCA, an (n+1)-bit BEC is required.

BEC-Based CSLA


The proposed CSLA is based on the logic formulation given in equ. 4, and its structure is shown in Fig. 3(a), where n is the input operand bit-width, [*] represents delay (in the unit of inverter delay), and η = max(t, 3.5n + 2.7). It consists of an HSG unit, an FSG unit, a CG unit and one CS unit. The CG unit is composed of two CGs (CG0 and CG1) corresponding to input-carry 0 and 1. The HSG receives two n-bit operands (A and B) and generates the half-sum word s0 and the half-carry word c0, each n bits wide. Both CG0 and CG1 receive s0 and c0 from the HSG unit and generate two n-bit full-carry words c_1^0 and c_1^1, corresponding to input-carry 0 and 1, respectively. The logic diagram of the HSG unit is shown in Fig. 3(b). The logic circuits of CG0 and CG1 are optimized to take advantage of the fixed input-carry bits. The optimized designs of CG0 and CG1 are shown in Fig. 3(c) and (d), respectively.
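The data flow of this section (HSG, CG0 and CG1, followed by the CS and FSG units described next) can be sketched as a bit-level Python model. This is an illustrative reading of the prose, not the gate-level design of Fig. 3: the carry recurrence c(i) = c0(i) OR (s0(i) AND c(i-1)) and the AND-OR form of the carry selection are assumptions, and the function name `proposed_csla` is hypothetical.

```python
def proposed_csla(a: int, b: int, cin: int, n: int):
    """Bit-level model of one n-bit CSLA stage; returns (sum, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]

    # HSG: half-sum and half-carry words.
    s0 = [x ^ y for x, y in zip(a_bits, b_bits)]
    c0 = [x & y for x, y in zip(a_bits, b_bits)]

    # CG0 / CG1: full-carry words for a fixed input-carry of 0 or 1.
    def carry_word(fixed_cin):
        carries, carry = [], fixed_cin
        for i in range(n):
            carry = c0[i] | (s0[i] & carry)
            carries.append(carry)
        return carries

    c1_0 = carry_word(0)
    c1_1 = carry_word(1)

    # CS: because c1_0[i] = 1 implies c1_1[i] = 1, the 2-to-1 selection
    # reduces to one AND-OR per bit.
    c = [c1_0[i] | (cin & c1_1[i]) for i in range(n)]

    # FSG: LSB uses cin, the remaining bits use the selected carries.
    s_bits = [s0[0] ^ cin] + [s0[i] ^ c[i - 1] for i in range(1, n)]
    total = sum(bit << i for i, bit in enumerate(s_bits))
    return total, c[n - 1]


print(proposed_csla(0b1011, 0b0110, 1, 4))
```

Because cin enters only at the CS and FSG steps, both carry words can be computed before the incoming carry is known, which is the property that carry-select adders rely on.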

of the HSG. (c) Gate-level optimized design of (CG0) for input-carry = 0. (d) Gate-level optimized design of (CG1) for input-carry = 1. (e) Gate-level design of the CS unit. (f) Gate-level design of the final-sum generation (FSG) unit.

The CS unit selects one final carry word from the two carry words available at its input line using the control signal cin. It selects c_1^0 when cin = 0; otherwise, it selects c_1^1. The CS unit can be implemented using an n-bit 2-to-1 MUX. However, we find from the truth table of the CS unit that carry words c_1^0 and c_1^1 follow a specific bit pattern. If c_1^0(i) = 1, then c_1^1(i) = 1, irrespective of s0(i) and c0(i), for 0 ≤ i ≤ n − 1. This feature is used for logic optimization of the CS unit. The optimized design of the CS unit is shown in Fig. 3(e), which is composed of n AND-OR gates. The final carry word c is obtained from the CS unit. The MSB of c is sent to output as cout, and the (n − 1) LSBs are XORed with the (n − 1) MSBs of the half-sum (s0) in the FSG [shown in Fig. 3(f)] to obtain the (n − 1) MSBs of the final-sum (s). The LSB of s0 is XORed with cin to obtain the LSB of s.

We have considered all the gates to be made of 2-input AND, 2-input OR, and inverter (AOI). A 2-input XOR is composed of 2 AND, 1 OR, and 2 NOT gates. The area and delay of the 2-input AND, 2-input OR, and NOT gates (shown in Table I) are taken from the Synopsys Armenia Educational Department (SAED) 90-nm standard-cell library datasheet for theoretical estimation. The area and delay of a design are calculated using the following relations:

A = a·Na + r·No + i·Ni
T = na·Ta + no·To + ni·Ti

where (Na, No, Ni) and (na, no, ni), respectively, represent the (AND, OR, NOT) gate counts of the total design and its critical path, and (a, r, i) and (Ta, To, Ti), respectively, represent the area and delay of one (AND, OR, NOT) gate. We have calculated the AOI gate counts of each design for area and delay estimation. The area and delay of each design are calculated from the AOI gate counts (Na, No, Ni), (na, no, ni), and the cell details of Table I. Along the path of the proposed CSLA, the delay of each intermediate and output signal of the proposed n-bit CSLA design of Fig. 3 is shown in square brackets against each signal. We can observe that the proposed n-bit single-stage CSLA involves 6n fewer AOI gates than the CSLA of [6] and takes 2.7 and 6.6 units less delay to calculate the final-sum and output-carry. Compared with the CBL-based CSLA of [7], the proposed CSLA design involves n more AOI gates, and it takes (n − 4.7) units less delay to calculate the output-carry.
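The area and delay estimation described here amounts to two weighted sums: total gate counts weighted by per-gate area, and critical-path gate counts weighted by per-gate delay. A minimal Python sketch follows; the per-gate figures are placeholder values chosen for illustration and are not the Table I cell data, and the function names are hypothetical.

```python
GATES = ("and", "or", "not")


def design_area(total_counts, cell_area):
    """Total area: each gate count weighted by the area of one such gate."""
    return sum(total_counts[g] * cell_area[g] for g in GATES)


def design_delay(path_counts, cell_delay):
    """Critical-path delay: gate counts on the path weighted by gate delay."""
    return sum(path_counts[g] * cell_delay[g] for g in GATES)


# Placeholder cell data (assumed values, arbitrary units).
CELL_AREA = {"and": 2.0, "or": 2.0, "not": 1.0}
CELL_DELAY = {"and": 1.5, "or": 1.5, "not": 1.0}

# A 2-input XOR built from 2 AND, 1 OR and 2 NOT gates; its longest path is
# assumed to pass through one NOT, one AND and one OR gate.
XOR_TOTAL = {"and": 2, "or": 1, "not": 2}
XOR_PATH = {"and": 1, "or": 1, "not": 1}

print(design_area(XOR_TOTAL, CELL_AREA), design_delay(XOR_PATH, CELL_DELAY))
```

Substituting the library's actual cell areas and delays for the placeholders gives the theoretical estimates for any block whose AOI gate counts are known.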

CSLA (SQRT-CSLA)

The multipath carry propagation feature of the CSLA is

fully exploited in the SQRT-CSLA [5], which is composed

of a chain of CSLAs. CSLAs of increasing size are used

in the SQRT-CSLA to extract the maximum concurrence

in the carry propagation path. Using the SQRT-CSLA design, large-size adders are implemented with significantly

less delay than a single-stage CSLA of same size. However, carry propagation delay between the CSLA stages of

SQRT-CSLA is critical for the overall adder delay.

Fig.4. Proposed SQRT-CSLA for n = 16. All intermediate and output signals are labelled with delay


carry propagation feature, the proposed CSLA design is more favourable than the existing CSLA designs for area-delay [10] efficient implementation of SQRT-CSLA. A 16-bit SQRT-CSLA design using the proposed CSLA is shown in Fig. 4, where the 2-bit RCA, 2-bit CSLA, 3-bit CSLA, 4-bit CSLA, and 5-bit CSLA are used. We have considered the cascaded configurations of (2-bit RCA and 2-, 3-, 4-, 6-, 7-, and 8-bit CSLAs) and (2-bit RCA and 2-, 3-, 4-, 6-, 7-, 8-, 9-, 11-, and 12-bit CSLAs) for the 32-bit SQRT-CSLA and the 64-bit SQRT-CSLA, respectively, to optimize adder delay. To demonstrate the advantage of the proposed CSLA design in SQRT-CSLA, we have estimated the area and delay of the SQRT-CSLA using the proposed CSLA design, the BEC-based CSLA of [6], and the CBL-based CSLA of [7] for bit-widths 16 and 32.
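The stage configurations listed above can be checked mechanically. The sketch below records the three configurations (the first entry of each list is the 2-bit RCA, the rest are CSLA stages) and derives the bit range that each stage covers; the names `SQRT_CSLA_STAGES` and `stage_bit_ranges` are illustrative only.

```python
# Stage widths, LSB stage first, as listed in the text.
SQRT_CSLA_STAGES = {
    16: [2, 2, 3, 4, 5],
    32: [2, 2, 3, 4, 6, 7, 8],
    64: [2, 2, 3, 4, 6, 7, 8, 9, 11, 12],
}


def stage_bit_ranges(widths):
    """Return the (lowest bit, highest bit) covered by each stage."""
    ranges, low = [], 0
    for width in widths:
        ranges.append((low, low + width - 1))
        low += width
    return ranges


for total_width, widths in SQRT_CSLA_STAGES.items():
    # Each configuration must cover exactly the adder width.
    print(total_width, sum(widths), stage_bit_ranges(widths))
```

Each list sums to the adder width, so the stages tile the operand bits without gaps or overlap.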

IV. RESULTS & DISCUSSION:

In this section, we present the experimental results. In Section IV-A, the proposed method is compared with the conventional methods. Functional verification (simulation) and synthesis (conversion of the high-level description into RTL) are performed for all the adders, and the results are summarized. After the observation of simulation waveforms, synthesis is performed for calculation of delay and area; from these, the speed and power of the CSLAs are calculated, and the regular, modified and improved CSLAs are compared in terms of delay, area and power. The simulation results of both CSLA structures are compared in terms of delay, area and power.

The area indicates the total cell area of the design, and the total power is the sum of the leakage power, internal power and switching power. The percentage reductions in the cell area, total power, power-delay product and area-delay product as a function of the bit size are shown.

A. PERFORMANCE COMPARISON:

In this section, the proposed method is compared with the 32-bit ripple carry adder.

Area-Delay Estimation Method: The comparison of the proposed system with the 32-bit RCA is shown in Table 1. The delay is calculated by adding up the number of gates in the longest path of the logic block that contributes the maximum delay. The area evaluation is done by counting the total number of AOI gates required for each logic block. The main disadvantage of the regular CSLA is its high area usage, which can be overcome by using the modified CSLA. Table 1 shows the area and delay of the AND, OR, and NOT gates given in the 90-nm standard cell library datasheet, and the proposed system compared with the 32-bit RCA. Here, we show the results for power delay and critical path delay. The proposed SQRT CSLA has fewer gates and hence less area.


result

In this section, the synthesis and simulation results of the proposed method are reported.

Synthesis Report: The synthesis report shows the design summary, including the number of LUTs, I/O buffers, slice registers and flip-flop pairs, as shown in Table III.

V. CONCLUSION:

A simple approach is proposed in this paper to reduce the area and power of the SQRT CSLA architecture. The reduced number of gates in this work offers a great advantage in the reduction of area and also of total power. The compared results show that the modified SQRT CSLA has a slightly larger delay, but the area and power of the 32-b modified SQRT CSLA are significantly reduced, by 17.4% and 15.4% respectively. The power-delay product and the area-delay product of the proposed design show a decrease for the 16- and 32-b sizes, which indicates the success of the method and not a mere trade-off of delay for power and area. The modified CSLA architecture is therefore low area, low power, simple and efficient for VLSI hardware implementation. It would be interesting to test the design of the modified SQRT CSLA.

REFERENCES:

[1] B. Ramkumar and Harish M. Kittur, "Low-Power and Area-Efficient Carry Select Adder," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 2, February 2012.

[2] G. A. Ruiz and M. Granda, "An area-efficient static CMOS carry-select adder based on a compact carry look-ahead unit," Microelectronics Journal, vol. 35, pp. 939-944, 2004.

[3] O. J. Bedrij, "Carry-select adder," IRE Transactions on Electronic Computers, pp. 340-344, 1962.

[4] Y. Kim and L.-S. Kim, "64-bit carry-select adder with reduced area," Electron. Lett., vol. 37, no. 10, pp. 614-615, May 2001.

[5] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective. Upper Saddle River, NJ: Prentice-Hall, 2001.

[6] Cadence, "Encounter user guide," Version 6.2.4, March 2008.

[7] T. Y. Chang and M. J. Hsiao, "Carry-select adder using single ripple-carry adder," Electronics Letters, vol. 34, no. 22, pp. 2101-2103, Oct. 1998.


[9] "Review on Carry Skip Adder and Gray/Black Cell Function," Lecture 18, Datapath Subsystems, Chapter 10. Copyright 2005 Pearson Addison-Wesley. All rights reserved.

[10] Gray Yeap and Gilbert, Practical Low Power Digital VLSI Design, Kluwer Academic Publishers, 1998.

