INTRODUCTION
The world of computing is in transition. As chips become smaller and faster, they
dissipate more heat, which is energy that is entirely wasted. By some estimates, the difference
between the amount of energy required to carry out a computation and the amount that
today's computers actually use is some eight orders of magnitude. When conventional
logic gates produce several outputs, some of these are not used and the energy required to
generate them is simply lost. These are known as garbage states, and as the complexity
of the circuit increases, more power is dissipated. Minimization of garbage outputs and
power dissipation are the major goals of conventional logic design.
Low power has emerged as a principal theme in today's electronics industry. The need for
low power has caused a major paradigm shift in which power dissipation has become as
important a consideration as performance and area. The need for low power dissipation
systems has motivated the development of power-efficient designs for basic cells. The
most challenging part is to maintain high performance while attempting to reduce the
consumed power. Several techniques exist to keep power dissipation to a minimum. One
of the major techniques for reducing power dissipation is the use of reversible logic.
According to Landauer's principle, the loss of one bit of information dissipates kT ln 2
joules of energy, where k is Boltzmann's constant and T is the absolute temperature at
which the operation is performed. The heat generated due to the loss of one bit of
information is very small at room temperature, but when the number of bits is large, as in
high-speed computation, the heat dissipated becomes large enough
that it affects the performance and reduces the lifetime of the components.
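To give a feel for the magnitude of kT ln 2, the following minimal Python sketch evaluates the bound. The room temperature of 300 K and the erasure rate are assumptions chosen only for illustration; they do not come from the text.

```python
import math

BOLTZMANN_K = 1.380649e-23  # Boltzmann's constant in J/K (SI value)

def landauer_limit(temperature_kelvin: float) -> float:
    """Minimum energy in joules dissipated when one bit of information is lost."""
    return BOLTZMANN_K * temperature_kelvin * math.log(2)

room_temperature = 300.0          # assumed room temperature in kelvin
erasures_per_second = 1e18        # hypothetical erasure rate, illustration only

energy_per_bit = landauer_limit(room_temperature)
print(f"Energy per lost bit at {room_temperature} K: {energy_per_bit:.3e} J")
print(f"Power at {erasures_per_second:.0e} lost bits/s: "
      f"{energy_per_bit * erasures_per_second:.3e} W")
```

The per-bit figure is on the order of 10^-21 J, which is why the effect only matters when very many bits are lost per second.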
In 1973, Bennett showed that one can avoid kT ln 2 joules of energy dissipation by
constructing circuits using reversible logic gates. Neither feedback nor fan-out is allowed
in reversible circuits. The classical set of gates such as NAND, AND, NOR, OR, XOR
and XNOR are not reversible. Hence, these gates dissipate heat and may reduce the life
of the circuit. So, reversible logic is in demand in power-aware circuits. In recent years,
reversible logic has emerged as one of the most important approaches for power
optimization, with applications in low power CMOS, nanotechnology, quantum
computing and optical computing.
Reversible logic circuits provide less power dissipation as well as a distinct output
assignment for each distinct input. Attempts have been made to minimize the
circuits of all the logic gates using CMOS while making them reversible. Further,
reversible logic has been utilized to design the reversible full adder and half adder.
Computers play an important role in every walk of life. Modern computers based on
integrated circuits are millions to billions of times more capable than the early machines,
occupy a fraction of the space and dissipate far less heat. Many
arithmetic operations are performed on a computer's ALU. An Arithmetic Logic Unit
(ALU) is a digital circuit that performs arithmetic and logical operations. Arithmetic
operations are addition, subtraction, multiplication and division. Logical operations are
AND, OR, NOT, NAND, NOR, etc. Multiplier circuits play an important role in
computational operations on computers.
Different types of conventional multipliers are the array multiplier, Baugh-Wooley multiplier,
Booth multiplier and Wallace tree multiplier. Multipliers are of two types: unsigned and
signed. Conventional multipliers have high power dissipation. The design
and implementation of digital circuits using reversible logic has attracted attention as a
route into future computing technology. So, reversible multipliers are designed to
increase efficiency and decrease power dissipation.
Different types of reversible multipliers already exist, such as the HNG gate multiplier,
the Peres gate multiplier, and the multipliers of Islam et al. and Shams et al. These multipliers have
more garbage outputs and constant inputs, a higher quantum cost and a higher gate count. A logic
synthesis technique using reversible gates should use a minimum
number of garbage outputs and constant inputs, keep the length of cascaded gates
to a minimum and use a minimum number of gates. To meet these requirements, an improved design of a
reversible multiplier with respect to its previous counterparts is proposed. For this
multiplier, a new reversible gate known as the BVPPG gate is designed. The BVPPG
reversible multiplier has fewer garbage outputs and constant inputs. It is therefore
said to be an efficient multiplier which satisfies all the requirements of reversible logic.
The testing of a digital circuit is the process of exercising a digital system with stimuli
and then analyzing the output response to see whether the circuit works correctly or not.
In testing, an error refers to an instance of an incorrect operation of the circuit and a fault
is the cause of an error, which represents the physical difference between a good system
and a bad one. Testing digital circuits involves many important aspects such as fault
modeling, fault simulation, and test pattern generation (TPG) for different fault models
and different circuits.
While the synthesis of reversible logic circuits has been explored in some detail, the
testing of reversible circuits, although a critical aspect of this area, is still
underdeveloped. Related work has focused on fault detection and fault localization in
reversible logic circuits under different fault models. Besides the classical stuck-at fault
model, several new fault models, such as the missing-gate fault model, have been proposed and
analyzed. Different detection approaches and Automatic Test Pattern Generation (ATPG)
algorithms for these fault models have also been presented.
The structure of reversible logic circuits is regular and much simpler than that of
conventional irreversible circuits. Also there is no fan-out or feedback in the system.
These features make it easier to test reversible circuits. Although the classical stuck-at
fault model is widely used for testing conventional CMOS circuits, new fault models,
namely, single missing-gate fault (SMGF), repeated-gate fault (RGF), partial missing-gate
fault (PMGF), and multiple missing-gate faults (MMGF), have been found to be more
suitable for modeling defects in quantum k-CNOT gates. The missing-gate fault (MGF)
model based on quantum technology is defined as the complete removal of a gate from
the circuit. It was shown that missing-gate faults are highly detectable. Obviously, the
repeated gate fault model represents situations where a gate is duplicated and the partially
missing-gate fault represents situations where the size of a gate is reduced since part of
the gate is missing.
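The single missing-gate fault described above can be illustrated with a small simulation. The sketch below is a minimal example, assuming a circuit represented as a list of k-CNOT gates; the three-gate circuit and the helper names are hypothetical and are not taken from the text.

```python
from itertools import product

# A gate is a pair (controls, target): a k-CNOT that inverts wire `target`
# when every wire listed in `controls` carries 1.
def apply_gate(bits, gate):
    controls, target = gate
    out = list(bits)
    if all(out[c] == 1 for c in controls):
        out[target] ^= 1
    return tuple(out)

def run_circuit(bits, gates):
    for gate in gates:
        bits = apply_gate(bits, gate)
    return tuple(bits)

def detecting_vectors(gates, missing_index, width):
    """Input vectors whose output changes when one gate is removed (an SMGF)."""
    faulty = gates[:missing_index] + gates[missing_index + 1:]
    return [v for v in product((0, 1), repeat=width)
            if run_circuit(v, gates) != run_circuit(v, faulty)]

# Hypothetical 3-wire circuit: a Toffoli gate, then a CNOT, then a NOT.
circuit = [((0, 1), 2), ((0,), 1), ((), 2)]
for index in range(len(circuit)):
    tests = detecting_vectors(circuit, index, 3)
    print(f"gate {index} missing: detected by {len(tests)} of 8 input vectors")
```

A missing gate is detected by every input that sets all control lines of that gate to 1 when the gate is reached, which is consistent with the statement that missing-gate faults are highly detectable.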
All these faults can be tested by adding only one extra control line and
duplicating each gate. This yields an easily testable design, which admits a universal test set
of size (n+1) that detects all stuck-at faults, SMGFs, RGFs, PMGFs and MMGFs in the
circuit.
CHAPTER 2
REVERSIBLE LOGIC
In this chapter, reversible logic is explained. Some of the reversible gates and their
quantum representation are also given. The characteristics of reversible gates and
quantum circuits are also explained.
Reversible logic has received great attention in recent years due to its ability to
reduce power dissipation, which is the main requirement in low power VLSI design.
Quantum computers are constructed using reversible logic circuits. Reversible logic has wide
applications in low power CMOS, optical information processing, DNA computing,
quantum computation and nanotechnology. In 1961, R. Landauer demonstrated that high
technology circuits and systems constructed using irreversible hardware result in energy
dissipation due to information loss. One of the main constraints in reversible logic is to
minimize the number of reversible gates used and the number of unutilized outputs, called
garbage, that are produced. A garbage output is an output that is not used for further
computations. In other words, it is not used as a primary output or as an input to another
gate. Because the numbers of inputs and outputs are made equal, a number of
garbage outputs may be produced in certain reversible implementations.
Reversible gates have been studied since the 1960s. The original motivation was that
reversible gates dissipate less heat (or, in principle, no heat). In a normal gate, input states
are lost, since less information is present in the output than was present at the input. This
loss of information releases energy to the surrounding area as heat, because
of thermodynamic entropy. Another way to understand this is that charges on a circuit are
grounded and thus flow away, taking a small amount of energy with them when they change state. A
reversible gate only moves the states around, and since no information is lost, energy is
conserved.
In the last decades, great achievements have been made in the development of computing
machines. While computers consisting of a few thousands of components filled whole
rooms in the early 70s, nowadays billions of transistors are built on some square
expression limit. Energy consumption in computation is linked with the information loss
that occurs during operation of gates. If a logic gate is irreversible, then some information
is lost in gate operation (e.g. AND gate). Bennett has shown that almost zero switching
power dissipation can be obtained if the circuit is composed of reversible logic gates.
Reversible logic finds applications in low power computing, optical computing,
cryptography, etc.
Most gates used in classical digital design are not reversible. Reversible gates can be used
to design a reversible circuit, for example, the controlled NOT (CNOT) gate proposed
by Feynman. The synthesis of reversible logic circuits has been studied. Universal testability of
reversible logic circuits for detecting single and multiple missing-gate faults and stuck-at
faults has been investigated. Recently, several researchers have studied the
different fault models.
There are two aspects to testing a circuit. One is fault detection and the other is termed
fault localization. The former involves detecting the presence of a fault in a circuit, while
the latter is about finding the exact location of this fault. So far, nothing has been
published on self-repair and fault localization of reversible circuits, although it is
commonly agreed that future technologies will be both low power and fault-tolerant.
Figure 2.1 shows a pictorial representation of different gates. Fredkin and Toffoli are
universal logic gates. Fredkin can be configured to simulate AND, NOT, CROSSOVER
and FANOUT functions and thus can be cascaded to synthesize any classical or quantum
circuit shown in figure 2.2. Similarly, Toffoli gate can be configured to synthesize NAND
and FANOUT which together are universal for computation as shown in figure 2.3.
Figure 2.1: (a) NOT (b) CNOT (c) Toffoli (d) SWAP (e) Fredkin gate
Figure 2.2: Fredkin gate as (a) AND (b) NOT (c) CROSSOVER gate
Figure 2.3: Toffoli gate as (a) NAND and (b) FANOUT gate
Figure 2.4 shows an example of a reversible circuit.
0, and the outputs of any gate are at one plus the highest level of any of its inputs. The
depth d is used interchangeably with level and is defined as the maximum level, which
can be no longer than the number of gates in the circuit.
Definition 1: A gate is reversible if the (Boolean) function it computes is bijective.
If arbitrary signals are allowed on the inputs, a necessary condition for reversibility is that
the gate has the same number of input and output wires. If it has k input and output wires,
it is called a k × k gate, or a gate on k wires. The mth input wire and the mth output wire are
considered the same wire.
Definition 2: A k-CNOT is a (k+1) × (k+1) gate. It leaves the first k inputs unchanged,
and inverts the last if all others are 1.
The unchanged lines are referred to as control lines. Clearly the k-CNOT gates are all
reversible. The first three of these have special names. The 0-CNOT is just an inverter or
NOT gate, and is denoted by N. It performs the operation (x) → (x ⊕ 1), where ⊕ denotes
XOR. The 1-CNOT, which performs the operation (y, x) → (y, x ⊕ y), is referred to as a
controlled-NOT or CNOT (C). The 2-CNOT is normally called a TOFFOLI (T) gate, and
performs the operation (z, y, x) → (z, y, x ⊕ yz). Another reversible logic gate is also used,
called the SWAP (S) gate. It is a 2 × 2 gate which exchanges the inputs; that is, (x, y) → (y,
x). One reason for choosing these particular gates is that they appear often in the quantum
computing context, where no physical wires exist, and swapping two values requires
non-trivial effort.
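Definitions 1 and 2 can be checked directly by enumeration. The following is a minimal Python sketch, assuming bits are passed as tuples with the target as the last element; the function names are illustrative, and the padded AND gate is included only as an irreversible contrast.

```python
from itertools import product

def k_cnot(bits):
    """k-CNOT on k+1 wires: invert the last bit if all other bits are 1."""
    *controls, target = bits
    if all(controls):          # all(()) is True, so the 0-CNOT is a plain NOT
        target ^= 1
    return (*controls, target)

def swap(bits):
    """SWAP gate: (x, y) -> (y, x)."""
    x, y = bits
    return (y, x)

def padded_and(bits):
    """Irreversible contrast: AND with a constant second output."""
    a, b = bits
    return (a & b, 0)

def is_reversible(gate, width):
    """Definition 1: the gate is reversible if its function is bijective."""
    inputs = list(product((0, 1), repeat=width))
    return len({gate(v) for v in inputs}) == len(inputs)

for width, name in [(1, "NOT"), (2, "CNOT"), (3, "TOFFOLI")]:
    print(name, "reversible:", is_reversible(k_cnot, width))
print("SWAP reversible:", is_reversible(swap, 2))
print("padded AND reversible:", is_reversible(padded_and, 2))
```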
design, there are many parameters for determining the complexity and performance of
circuits.
In the design of reversible circuits, two restrictions should be considered: fan-out is
not allowed, and feedback is not allowed.
Figure 2.5: (a) NOT gate (b) Quantum implementation (c) Quantum representation
2.3.2 Feynman Gate: Figure 2.6 shows a 2 × 2 Feynman gate. The input vector is I(A, B)
and the output vector is O(P, Q). The outputs are defined by P = A, Q = A ⊕ B. The
implementation uses one primitive reversible gate, a single XOR gate. Hence the quantum cost of a Feynman
gate is 1. The controlled-NOT gate (also C-NOT or CNOT) is a quantum gate that is an
essential component in the construction of a quantum computer. Specifically, any
quantum circuit can be simulated to an arbitrary degree of accuracy using a combination
of CNOT gates and single-qubit rotations. The truth table for the CNOT gate is shown in table
2.1.
Operation: The CNOT gate flips the second qubit (the target qubit) if and only if the first
qubit (the control qubit) is 1.
Figure 2.6: (a) Feynman gate (b) Quantum implementation (c) Quantum
representation
Table 2.1: Truth Table for CNOT Gate
The resulting value of the second qubit corresponds to the result of a classical XOR gate.
The first experimental realization of a CNOT gate was accomplished in 1995. Here, a
single Beryllium ion in a trap was used. The two qubits were encoded into an optical state
and into the vibration state of the ion within the trap. At the time of the experiment, the
reliability of the CNOT-operation was measured to be on the order of 90%.
In addition to a regular controlled NOT gate, one could construct a function-controlled
NOT gate, which accepts an arbitrary number n+1 of qubits as input, where n+1 is greater
than or equal to 2 (a quantum register). This gate flips the last qubit of the register if and
only if a built-in function, with the first n qubits as input, returns a 1. The
function-controlled NOT gate is an essential element of the Deutsch-Jozsa algorithm. CNOT is
also described as a kind of universal gate (in the classical sense of the word), although, as
noted in Section 2.3.4, the set consisting of NOT and XOR gates alone is not universal. It is easy to see that if the
CONTROL is set to '1', the TARGET output is always the complement of the TARGET input. So a NOT gate can be
constructed using a CNOT. Table 2.2 shows a construction of an AND gate using two CNOTs.
2.3.3 Double Feynman Gate (F2G): Figure 2.7 shows a 3 × 3 Double Feynman gate. The
input vector is I(A, B, C) and the output vector is O(P, Q, R). The outputs are defined by
P = A, Q = A ⊕ B, R = A ⊕ C. The Double Feynman gate consists of two CNOT gates. As there are
two primitive gates, the quantum cost of the Double Feynman gate is 2.
Figure 2.7: (a) Double Feynman gate (b) Quantum implementation (c) Quantum
representation
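The output equations of the Feynman and Double Feynman gates can be exercised with a short Python sketch. The use of constant 0 inputs to copy a signal is a common application of these gates and is stated here as an illustration, not as part of the original text; the function names are illustrative.

```python
from itertools import product

def feynman(a, b):
    """2 x 2 Feynman (CNOT) gate: P = A, Q = A xor B."""
    return a, a ^ b

def double_feynman(a, b, c):
    """3 x 3 Double Feynman gate: P = A, Q = A xor B, R = A xor C."""
    return a, a ^ b, a ^ c

def is_self_inverse(gate, width):
    """Applying the gate twice must return the original input."""
    return all(gate(*gate(*v)) == v for v in product((0, 1), repeat=width))

# Truth table of the Feynman (CNOT) gate.
for a, b in product((0, 1), repeat=2):
    p, q = feynman(a, b)
    print(f"A={a} B={b} -> P={p} Q={q}")

# With B = 0 the gate copies A; with B = 1 it produces NOT A on Q.
print("copy of 1:", feynman(1, 0), " inverted 1:", feynman(1, 1))
```

Because fan-out is not allowed in reversible circuits, a Feynman gate with its second input held at 0 is one way to duplicate a signal.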
2.3.4 Toffoli Gate: The Toffoli gate (also called the CCNOT or controlled-controlled-NOT gate), invented by Tommaso Toffoli, is a universal reversible logic gate, which
means that any reversible circuit can be constructed from Toffoli gates.
Background
A logic gate L is reversible if, for any output y, there is a unique input x such that
L(x) = y. If a gate L is reversible, there is an inverse gate L′ which
maps y to x, that is, L′(y) = x. Among common logic gates, NOT is reversible. However,
the common AND gate is not reversible: the inputs 00, 01 and 10 all get mapped to the
output 0. More recent motivation comes from quantum computing. Quantum
mechanics requires the transformations to be reversible but allows more general states of
the computation.
Universality of Toffoli gates
Any reversible gate must have the same number of input and output bits, by
the pigeonhole principle. For one input bit, there are two possible reversible gates. One of
them is NOT. The other is the identity gate which maps its input to the output unchanged.
For two input bits, the only non-trivial gate is the controlled NOT gate which XORs the
first bit to the second bit and leaves the first bit unchanged.
Fig 2.8 shows a 3 × 3 Toffoli gate. The input vector is I(A, B, C) and the output vector is
O(P, Q, R). The outputs are defined by P = A, Q = B, R = AB ⊕ C. The quantum cost of a Toffoli
gate is 5, because there are 5 primitive gates in the quantum implementation.
Figure 2.8: (a) Toffoli gate (b) Quantum implementation (c) Quantum representation
Unfortunately, there are reversible functions which cannot be computed using just those
gates. In other terms, the set consisting of NOT and XOR gates is not universal. One
possibility for a universal gate is the Toffoli gate, proposed in 1980 by Toffoli. The truth table for the 2 × 2 Toffoli
gate is shown in table 2.3.
This gate has a 3-bit input and output. If the first two bits are set, it flips the third bit. It
can be also described as mapping bits a, b and c to a, b and c XOR (a AND b).
The Toffoli gate is universal. This means that for any Boolean function f(x1, x2, ..., xm),
there is a circuit consisting of Toffoli gates which takes x1, x2, ..., xm and some extra bits
set to 0 or 1 and outputs x1, x2, ..., xm, f(x1, x2, ..., xm), and some other extra bits (called
garbage). Essentially, this means that one can use Toffoli gates to build systems that will
perform any desired Boolean function computation in a reversible manner. The truth table
for a 3 × 3 Toffoli gate is shown in table 2.4.
Table 2.4: Truth Table for 3 × 3 Toffoli Gate
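The mapping of bits a, b and c to a, b and c XOR (a AND b) can be tabulated with a few lines of Python. The sketch below also shows the NAND and FANOUT configurations mentioned earlier, obtained by fixing some inputs to constants; the helper names are illustrative.

```python
from itertools import product

def toffoli(a, b, c):
    """3 x 3 Toffoli gate: P = A, Q = B, R = AB xor C."""
    return a, b, c ^ (a & b)

def nand_via_toffoli(a, b):
    """Constant input C = 1: the third output is NOT (A AND B)."""
    return toffoli(a, b, 1)[2]

def fanout_via_toffoli(a):
    """Constant inputs B = 1, C = 0: outputs P and R both carry A."""
    p, _, r = toffoli(a, 1, 0)
    return p, r

print(" A B C | P Q R")
for a, b, c in product((0, 1), repeat=3):
    p, q, r = toffoli(a, b, c)
    print(f" {a} {b} {c} | {p} {q} {r}")
```

The outputs that are not needed in each configuration (for example P and Q when only the NAND result is used) are the garbage outputs referred to in the text.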
2.3.5 Fredkin Gate: The Fredkin gate (also CSWAP gate) is a computational circuit
suitable for reversible computing, invented by Ed Fredkin. It is universal, which means
that any logical or arithmetic operation can be constructed entirely of Fredkin gates. The
Fredkin gate is the three-bit gate that swaps the last two bits if the first bit is 1.
Fig 2.9 shows a 3 × 3 Fredkin gate. The input vector is I(A, B, C) and the output vector is
O(P, Q, R). The output is defined by P = A, Q = A′B ⊕ AC and R = A′C ⊕ AB, where A′
denotes the complement of A. The quantum cost of
a Fredkin gate is 5, as there are 5 primitive gates in the quantum implementation.
Figure 2.9: (a) Fredkin gate (b) Quantum implementation (c) Quantum representation
The basic Fredkin gate is a controlled swap gate that maps three inputs (C, I1, I2) onto
three outputs (C, O1, O2). The C input is mapped directly to the C output. If C = 0, no
swap is performed. First input maps to first output, and second input maps to second
output. Otherwise, the two outputs are swapped so that first input maps to second output,
and second input maps to first output. It is easy to see that this circuit is reversible, i.e.,
"undoes itself" when run backwards. A generalized n × n Fredkin gate passes its first n-2
inputs unchanged to the corresponding outputs, and swaps its last two outputs if and only
if the first n-2 inputs are all 1.
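The controlled-swap behaviour can be sketched in Python as follows. The AND and NOT configurations use constant inputs and correspond to the uses listed for figure 2.2; the helper names are illustrative.

```python
from itertools import product

def fredkin(c, i1, i2):
    """Controlled swap: (C, I1, I2) -> (C, O1, O2), swapping when C = 1."""
    return (c, i2, i1) if c else (c, i1, i2)

def and_via_fredkin(a, b):
    """Constant third input 0: the last output equals A AND B."""
    return fredkin(a, b, 0)[2]

def not_via_fredkin(a):
    """Constant inputs 0 and 1: the last output equals NOT A."""
    return fredkin(a, 0, 1)[2]

for bits in product((0, 1), repeat=3):
    out = fredkin(*bits)
    # The gate undoes itself and never changes the number of 1s.
    assert fredkin(*out) == bits
    assert sum(out) == sum(bits)
    print(bits, "->", out)
```

The second assertion reflects that a swap only moves values between wires, so the count of 1s on the inputs always equals the count on the outputs.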
2.3.6 Peres Gate: Fig 2.10 shows a 3 × 3 Peres gate. The input vector is I(A, B, C) and the
output vector is O(P, Q, R). The output is defined by P = A, Q = A ⊕ B and R = AB ⊕ C.
There are 4 primitive gates in the quantum implementation. Hence the quantum cost of a
Peres gate is 4.
Figure 2.10: (a) Peres gate (b) Quantum implementation (c) Quantum
representation
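As an illustration of the Peres gate equations, the sketch below evaluates the gate with C held at 0, in which case Q and R are the sum and carry of a half adder. This half-adder reading follows directly from the equations above, but the function names are assumptions made for the example.

```python
from itertools import product

def peres(a, b, c):
    """3 x 3 Peres gate: P = A, Q = A xor B, R = AB xor C."""
    return a, a ^ b, (a & b) ^ c

def half_adder(a, b):
    """Peres gate with constant input C = 0: Q is the sum, R is the carry."""
    _, total, carry = peres(a, b, 0)   # P is a garbage output here
    return total, carry

for a, b in product((0, 1), repeat=2):
    total, carry = half_adder(a, b)
    print(f"{a} + {b} -> sum={total} carry={carry}")
```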
A quantum computer is a device for computation that makes direct use of quantum
mechanical phenomena, such as superposition and entanglement, to perform operations
on data. Quantum computers are different from traditional computers based on transistors.
The basic principle behind quantum computation is that quantum properties can be used
to represent data and perform operations on these data. A theoretical model is the
quantum Turing machine, also known as the universal quantum computer. Quantum
computers share theoretical similarities with non-deterministic and probabilistic
computers, like the ability to be in more than one state simultaneously.
2.4.2 Bits vs. Qubits
A quantum computer with a given number of qubits is exponentially more complex than a
classical computer with the same number of bits because describing the state of n qubits
requires 2^n complex coefficients. Measuring the qubits would produce a classical state of
only n bits, but such an action would also destroy the quantum state. For example, a
300-qubit quantum computer has a state described by 2^300 (approximately 10^90) complex
numbers, more than the number of atoms in the observable universe. Figure 2.11 shows
the examples of qubits.
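The growth in the number of coefficients can be checked with exact integer arithmetic. The following Python sketch is illustrative only; the equal-amplitude state for three qubits is a hypothetical example, not one taken from the text.

```python
import math

def amplitude_count(n_qubits: int) -> int:
    """Number of complex coefficients needed to describe n qubits."""
    return 2 ** n_qubits

def uniform_superposition(n_qubits: int) -> list:
    """Equal-amplitude state (illustrative): every coefficient is 1/sqrt(2^n)."""
    size = amplitude_count(n_qubits)
    return [1 / math.sqrt(size)] * size

count = amplitude_count(300)
print(f"2^300 has {len(str(count))} decimal digits, i.e. roughly 10^{len(str(count)) - 1}")

state = uniform_superposition(3)
print(f"3 qubits: {len(state)} coefficients, "
      f"squared magnitudes sum to {sum(abs(x) ** 2 for x in state):.6f}")
```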
For example: Consider first a classical computer that operates on a three-bit register. The
state of the computer at any time is a probability distribution over the 2^3 = 8 different
three-bit strings 000, 001, 010, 011, 100, 101, 110, 111. If it is a deterministic computer,
then it is in exactly one of these states with probability 1. However, if it is a probabilistic
computer, then there is a possibility of it being in any one of a number of different states.
The probabilistic state can be described by eight non-negative numbers a, b, c, d, e, f, g, h
(where a = probability that computer is in state 000, b = probability that computer is in
state 001, etc.). There is a restriction that these probabilities sum to 1.
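The classical three-bit example can be written out as a small Python sketch. The particular distributions below are hypothetical and chosen only to show the deterministic and probabilistic cases and the restriction that the probabilities sum to 1.

```python
from itertools import product

# The eight three-bit strings 000 ... 111.
STATES = ["".join(str(bit) for bit in bits) for bits in product((0, 1), repeat=3)]

def is_valid_distribution(dist, tolerance=1e-12):
    """Probabilities must be non-negative and sum to 1."""
    return (all(p >= 0 for p in dist.values())
            and abs(sum(dist.values()) - 1.0) < tolerance)

# Deterministic computer: exactly one state has probability 1 (here 101, arbitrarily).
deterministic = {state: 0.0 for state in STATES}
deterministic["101"] = 1.0

# Probabilistic computer: eight non-negative numbers a..h (here all equal).
probabilistic = {state: 1 / 8 for state in STATES}

print(STATES)
print("deterministic valid:", is_valid_distribution(deterministic))
print("probabilistic valid:", is_valid_distribution(probabilistic))
```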
In this chapter all the information regarding the reversible logic is given and in the next
chapter the design of reversible SPT multiplier is discussed.
CHAPTER 3
NEW REVERSIBLE LOGIC GATE
1. BVPPG gate
The BVPPG gate is a 5 × 5 reversible gate and its logic diagram is shown in
figure 6. Its quantum cost is 10. The Toffoli representation of the BVPPG gate is
shown in figure 7. The truth table of the BVPPG gate is shown in Table 1.
Figure: BVPPG gate
The BVPPG gate is used to construct the partial product generator, which
results in the least number of gates, least quantum cost and least number of
garbage outputs. The two product terms are available at the outputs R and T of
the BVPPG gate with the C and E inputs held constant at 0. The other outputs,
namely P, Q and S, are used for fan-out of the multiplier operands as shown in
figure 8. This reduces the number of external fan-out gates to zero in our design,
which is its main design feature. The proposed design is compared with the existing
designs [11-22].
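The truth table of the BVPPG gate is not reproduced in this text, so the sketch below ASSUMES the output mapping P = A, Q = B, R = AB ⊕ C, S = D, T = BD ⊕ E. This mapping is a hypothetical choice that is consistent with the description above (two product terms at R and T when C = E = 0, with P, Q and S passing operands through) and should be checked against Table 1 before being relied on.

```python
from itertools import product

def bvppg(a, b, c, d, e):
    """ASSUMED 5 x 5 BVPPG mapping (not confirmed by the text):
    P = A, Q = B, R = AB xor C, S = D, T = BD xor E."""
    return a, b, (a & b) ^ c, d, (b & d) ^ e

def partial_products(a, b, d):
    """With C = E = 0, outputs R and T carry the product terms AB and BD."""
    p, q, r, s, t = bvppg(a, b, 0, d, 0)
    return {"fanout": (p, q, s), "products": (r, t)}

# Reversibility check: all 32 input vectors must map to distinct outputs.
outputs = {bvppg(*v) for v in product((0, 1), repeat=5)}
print("distinct outputs:", len(outputs), "of 32")
print(partial_products(1, 1, 0))
```

Under this assumed mapping the operand bits reappear unchanged on P, Q and S, which is how the gate can supply fan-out without additional copying gates.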
CHAPTER 4
MODIFIED BVPPG GATE
The BVPPG gate contains AND gates and XOR gates for implementing reversible logic. The
main idea in this project is to increase speed and reduce the power consumption as well as
the area of the multiplier.
The schematic of the existing BVPPG gate in terms of AND and XOR gates is shown
below.
The high-volume Spartan series, with a cheaper EasyPath option for ramping to
volume production.
The company also provides two CPLD lines, the CoolRunner and the 9500 series.
Each model series has been released in multiple generations since its launch.
With the introduction of its 28 nm FPGAs in June 2010, Xilinx replaced the
high-volume Spartan family with the Kintex family and the low-cost Artix family. In newer FPGA
products, Xilinx minimized total power consumption by adopting a high-K metal gate
(HKMG) process, which allows for low static power consumption.
At the 28 nm node, static power is a significant portion of the total power dissipation
of a device and in some cases is the dominant factor. Through the use of a HKMG
process, Xilinx has reduced power use while increasing logic capacity. Virtex-6 and
Spartan-6 FPGA families are said to consume 50% less power and have up to twice the
logic capacity compared to the previous generation of Xilinx FPGAs.
In June 2010, Xilinx introduced the Xilinx 7 series, comprising the Virtex-7, Kintex-7, and
Artix-7 families, promising improvements in system power, performance, capacity and
price. These new FPGA families are manufactured using TSMC's 28 nm HKMG
contains two dimensional arrays of logic blocks and
interconnections between logic blocks. Both the logic blocks and
(CPU), all the sub functions implemented in logic blocks must be connected
and this is done by programming the interconnects.
FPGAs, an alternative to custom ICs, can be used to implement an
entire System on a Chip (SoC). The main advantage of an FPGA is the ability to
reprogram it. The user can reprogram an FPGA to implement a design, and this is
done after the FPGA is manufactured. This gives rise to the name
Field-Programmable.
Custom ICs are expensive and take a long time to design, so they are
useful when produced in bulk. FPGAs, by contrast, are easy to implement
within a short time with the help of Computer Aided Design (CAD) tools
(because there is no physical layout process, no mask making, and no IC
manufacturing).
Some disadvantages of FPGAs are that they are slow compared to custom
ICs, they cannot handle very complex designs, and they draw more
power.
A Xilinx logic block consists of one Look-Up Table (LUT) and one flip-flop.
A LUT is used to implement a number of different functions. The input lines
to the logic block go into the LUT and enable it. The output of the LUT gives
the result of the logic function that it implements, and the output of the logic block
is the registered or unregistered output from the LUT.
SRAM is used to implement a LUT. A k-input logic function is
implemented using an SRAM of size 2^k × 1. The number of different possible functions
for a k-input LUT is 2^(2^k). The advantage of such an architecture is that it supports
the implementation of many logic functions; the disadvantage is the
unusually large number of memory cells required to implement such a logic
block when the number of inputs is large.
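The relationship between a k-input function, its 2^k × 1 memory and the count of possible functions can be sketched in Python. The class below is a simplified software model written for illustration; it is not a description of any specific Xilinx primitive, and the address ordering (first input as most significant bit) is an assumption of the sketch.

```python
from itertools import product

class LookUpTable:
    """Software model of a k-input LUT backed by a 2^k x 1 memory."""

    def __init__(self, k, function):
        self.k = k
        # One stored bit per input combination, first input as the MSB of the address.
        self.memory = [function(*bits) & 1 for bits in product((0, 1), repeat=k)]

    def read(self, *inputs):
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return self.memory[address]

def function_count(k):
    """Number of distinct k-input Boolean functions: 2^(2^k)."""
    return 2 ** (2 ** k)

xor3 = LookUpTable(3, lambda a, b, c: a ^ b ^ c)
print("memory cells:", len(xor3.memory))
print("xor3(1, 0, 1) =", xor3.read(1, 0, 1))
for k in (2, 4, 6):
    print(f"k={k}: {2 ** k} memory cells, {function_count(k)} possible functions")
```

The memory grows as 2^k, which is the cost noted in the text for logic blocks with many inputs.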
Step 4:
The source file containing the counter module is displayed in the Workspace, and the
counter is displayed in the Sources tab, as shown in figure 4.5.
syntax, if the user tries to proceed, he/she will not be able to simulate or synthesize the
design.
1. Close the HDL file.
Step 6: Verifying Functionality using Behavioral Simulation
Create a test bench waveform containing input stimulus you can use to verify
the functionality of the counter module. The test bench waveform is a graphical view of a
test bench.
Create the test bench waveform as follows:
1. Select the counter HDL file in the Sources window.
2. Create a new test bench source by selecting Project → New Source.
3. In the New Source Wizard, select Test Bench WaveForm as the source type, and
type counter_tbw in the File Name field.
4. Click Next.
5. The Associated Source page shows that you are associating the test bench
waveform with the source file counter. Click Next.
6. The Summary page shows that the source will be added to the project, and it
displays the source directory, type, and name. Click Finish.
7. The clock frequency, setup time and output delay times must be set in the
Initialize Timing dialog box before the test bench waveform editing window opens.
The requirements for this design are the following:
The counter must operate correctly with an input clock frequency of 25 MHz.
The DIRECTION input will be valid 10 ns before the rising edge of CLOCK.
The output (COUNT_OUT) must be valid 10 ns after the rising edge of CLOCK.
Offset: 0 ns.
Click on the blue cell at approximately the 300 ns to assert DIRECTION high so
Design Entry
There are different techniques for design entry: schematic based,
Hardware Description Language (HDL) based, a combination of both, etc. The selection of a
method depends on the design and the designer. If the designer wants to deal
more with hardware, then schematic entry is the better choice. When the
design is complex, or the designer thinks of the design in an algorithmic way,
then HDL is the better choice. Language based entry is faster but lags in
performance and density.
HDLs represent a level of abstraction that can isolate designers from the
details of the hardware implementation. Schematic based entry gives designers much
more visibility into the hardware. It is the better choice for those who are hardware
oriented. Another method, but rarely used, is state machines. It is the better choice for
designers who think of the design as a series of states, but the tools for state machine entry
are limited. In this documentation we deal with HDL based design entry.
Synthesis
Synthesis is the process which translates VHDL or Verilog code into a device netlist
format, i.e. a complete circuit with logical elements (gates, flip-flops, etc.)
for the design. If the design contains more than one sub design (e.g. to
implement a processor, we need a CPU as one design element, RAM as
another, and so on), then the synthesis process generates a netlist for each
design element. The synthesis process checks code syntax and analyzes the
hierarchy of the design, which ensures that the design is optimized for the
design architecture the designer has selected. The resulting netlist(s) is
saved to an NGC (Native Generic Circuit) file (for Xilinx Synthesis
Technology (XST)).
FPGA Synthesis
Implementation:
This process consists of a sequence of three steps:
1. Translate
2. Map
3. Place and Route
Translate:
The Translate process combines all the input netlists and constraints into a logic design
file. This information is saved as an NGD (Native Generic Database) file. This
can be done using the NGDBuild program. Here, defining constraints means
assigning the ports in the design to the physical elements (e.g. pins,
switches, buttons, etc.) of the targeted device and specifying time
FPGA Translate
Map:
The Map process divides the whole circuit with logical elements into sub blocks
such that they can fit into the FPGA logic blocks. That means the Map process
fits the logic defined by the NGD file into the targeted FPGA elements
(Configurable Logic Blocks (CLBs), Input Output Blocks (IOBs)) and generates
an NCD (Native Circuit Description) file which physically represents the design
mapped to the components of the FPGA. The MAP program is used for this purpose.
FPGA Map
Place and Route:
The PAR program is used for this process. The place and route process
places the sub blocks from the map process into logic blocks according to the
constraints and connects the logic blocks. For example, if a sub block is placed in a
logic block which is very near to an IO pin, it may save time but it may
affect some other constraint. So the trade-off between all the constraints is taken
into account by the place and route process.
The PAR tool takes the mapped NCD file as input and produces a
completely routed NCD file as output. The output NCD file contains the routing
information.
Device Programming:
Now the design must be loaded onto the FPGA. First, the design must be
converted to a format that the FPGA can accept. The BITGEN program handles
this conversion. The routed NCD file is given to the BITGEN program
to generate a bitstream (a .BIT file) which can be used to configure the target
FPGA device. This can be done using a cable. The selection of the cable depends on
the design.
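The sequence of tools and intermediate files described in this section can be summarised as a small data structure. The Python sketch below only restates the stage, program and file names given in the text; it does not invoke any tool, and the `Stage` type is a hypothetical helper introduced for the example.

```python
from typing import NamedTuple

class Stage(NamedTuple):
    name: str      # design-flow step as named in the text
    program: str   # program that performs the step
    output: str    # file produced by the step

# Stage, program and file names are those given in the text above.
FLOW = [
    Stage("Synthesis", "XST", "NGC netlist"),
    Stage("Translate", "NGD Build", "NGD file"),
    Stage("Map", "MAP", "mapped NCD file"),
    Stage("Place and Route", "PAR", "routed NCD file"),
    Stage("Device Programming", "BITGEN", "BIT file"),
]

source = "HDL source"
for stage in FLOW:
    print(f"{stage.name:<20} {stage.program:<10} {source} -> {stage.output}")
    source = stage.output   # each stage consumes the previous stage's output
```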
Design Verification:
Verification can be done at different stages of the design flow.
Behavioral Simulation (RTL Simulation):
This is the first of the simulation steps encountered in the design flow. It
is performed before the synthesis process to verify the RTL (behavioral) code
and to confirm that the design functions as intended. Behavioral simulation can
be performed on either VHDL or Verilog designs. In this process, signals and
variables are observed, procedures and functions are traced, and breakpoints are
set. The simulation is very fast, so the designer can change the HDL code within
a short time if the required functionality is not met. Since the design is not
yet synthesized to gate level, timing and resource usage properties are still
unknown.
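As a hedged illustration of behavioral simulation, a self-checking testbench for the multiplier could look like the sketch below. It assumes the top-level entity is the `ppg` entity listed in the Code section, with inputs `x`, `y` and product `z`. The testbench name `ppg_tb`, the 10 ns settling time and the choice to leave the `p` and `g` ports unassociated are assumptions made for this example, not part of the original project.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity ppg_tb is
end ppg_tb;

architecture sim of ppg_tb is
    signal x, y : std_logic_vector(3 downto 0) := (others => '0');
    signal z    : std_logic_vector(7 downto 0);
begin
    -- Device under test. Only x, y and z are connected; the inout ports
    -- p00..p33 and g are left unassociated (an assumption of this sketch).
    dut : entity work.ppg
        port map (x => x, y => y, z => z);

    stimulus : process
    begin
        -- Apply all 16 x 16 operand pairs and compare z with the integer product.
        for i in 0 to 15 loop
            for j in 0 to 15 loop
                x <= std_logic_vector(to_unsigned(i, 4));
                y <= std_logic_vector(to_unsigned(j, 4));
                wait for 10 ns;  -- let the combinational outputs settle
                assert to_integer(unsigned(z)) = i * j
                    report "product mismatch for x=" & integer'image(i) &
                           " y=" & integer'image(j)
                    severity error;
            end loop;
        end loop;
        report "stimulus finished" severity note;
        wait;  -- stop the process
    end process;
end sim;
```

Any assertion message in the simulator console points to the operand pair for which the product is wrong; a run with no assertion messages means all 256 pairs matched.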
Functional simulation (Post Translate Simulation):
Functional simulation gives information about the logic operation of the
circuit. The designer can verify the functionality of the design using this
process after the Translate process. If the functionality is not as expected,
the designer has to make changes in the code and follow the design flow steps
again.
4.3.1 FPGA Flow
The basic implementation of a design on an FPGA has the following steps.
1. Design Entry
2. Simulation
3. Synthesis
4. Logic Optimization
5. Technology Mapping
6. Placement
7. Routing
8. Programming Unit
9. Configured FPGA
The initial design entry may be in VHDL, a schematic or a Boolean expression. The
Boolean expression is then optimized for area or speed.
In technology mapping, the optimized Boolean expression is transformed into
FPGA logic blocks, which are called slices; area and delay optimization take place at
this stage. During placement, algorithms are used to place each block in the FPGA array.
Routing then assigns the programmable FPGA wire segments to establish connections
among the FPGA blocks. The final chip is configured in the programming unit.
4.3.2 FPGA Implementation
Initially, market research should be carried out, covering the previous version
of the design and the current requirements on the design.
Translate
Map
Place and Route
The developed RTL model is translated into a set of logic equations in a format
that the tool understands. These translated equations are then mapped to the library,
which is in turn mapped to the hardware. Once the mapping is done, the gates are placed
and routed. Before these processes, constraints can be given in order to optimize the
design. Finally, the bitstream (BIT) file is generated; it holds the design information
in binary format and is downloaded to the FPGA board.
4.3.3 Synthesis Results
The developed multiplier design is simulated and its functionality verified. Once the
functional verification is done, the RTL model is taken through the synthesis process
using the Xilinx ISE tool. In the synthesis process, the RTL model is converted to a
gate-level netlist mapped to a specific technology library.
Many different devices of the Spartan 3E family are available in the Xilinx ISE
tool. To synthesize the two designs, the XC3S500E device has been chosen, with the
FG320 package and a speed grade of -4.
Simulation results
Simulation results of binary multiplier using BVPPG gate
Advantages
The advantages of this multiplier are higher speed, smaller area and lower power
consumption compared with other multipliers such as the parallel multiplier, array
multiplier, Booth multiplier and Vedic multiplier.
Conclusion results

Design    Speed (delay)    Area         Power
BVPPG     15.09 ns         40 slices    27 mW
VEDIC     17.734 ns        89 slices    32 mW
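Read as relative improvements of the BVPPG design over the Vedic design, the reported figures work out as follows. This is only an arithmetic restatement of the values in the table above, with the delay standing in for speed; it is not an additional measurement.

```latex
\[
\text{delay: } \frac{17.734 - 15.09}{17.734} \approx 14.9\%, \qquad
\text{area: } \frac{89 - 40}{89} \approx 55.1\%, \qquad
\text{power: } \frac{32 - 27}{32} \approx 15.6\%
\]
```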
Bibilography
[1] Gordon E. Moore, Cramming more components onto integrated circuits, Electronics
Volume 38,
number 8, April 19, 1965.
62
63
[20] Nidhi Syal, Dr. H.P. Sinha, Design of fault tolerant reversible multiplier, International
Journal of
Soft Computing and Engineering (IJSCE) ISSN: 2231-2307, Volume-1, Issue-6, January
2012.
[21] Somayeh Babazadeh and Majid Haghparast, Design of a Nanometric Fault Tolerant
Reversible
Multiplier Circuit, Journal of Basic and Applied Scientific Research, Text Road Publication,
2(2)1355-1361, 2012,ISSN 2090-4304,www.textroad.com
[22] H. Thapliyal and M.B. Srinivas, Novel Reversible Multiplier Architecture Using
Reversible TSG
Gate, Proc. IEEE International Conference on Computer Systems and Applications, pp.
100-103,
March 2006.
[23] M. Soeken,S. Frehse,R. Wille and R. Drechsler, RevKit: An Open Source Toolkit for
theDesign of
Reversible Circuits, Springer- Lecture Notes in Computer Science, vol- 7165, pages 64-76,
in
Reversible Computation 2011,RevKit is available at www.revkit.org.
[24] R. Wille, D. Grobe, G. W. Dueck and R. Drechsler, RevLib: An Online Resource for
Reversible
Functions and Reversible Circuits, International Symposium on Multiple-Valued Logic
(ISMVL),
pages 220225, 2008.
[25] R. Wille D. M. Miller and Z. Sasanian. Elementary quantum gate realizations for
multiple-control
toffoli gates. International Symposium on Multi-Valued Logic, IEEE, 2011.
[26] Md Belayet Ali , Md Mosharof Hossin and Md Eneyat Ullah, Design of Reversible
Sequential
Circuit Using Reversible Logic Synthesis , International Journal of VLSI design and
Communication Systems (VLSICS) Volume 2, No.4, December 2011.
[27] D. Groe, R. Wille, G.W. Dueck, and R. Drechsler, Exact multiple control Toffoli
network synthesis
with SAT techniques , IEEE Trans. On CAD, 28(5):703715, 2009.
Future scope
As future work, the number of transistors in the XOR gate could be reduced further,
to 2, so that area and power consumption decrease and the speed of operation increases.
The modified AND and XOR gates could also be used to construct other basic building
blocks of DSP processors, such as adders, to increase their speed of operation.
Synthesis Report
Release 8.2i - xst I.31
Copyright (c) 1995-2006 Xilinx, Inc. All rights reserved.
TABLE OF CONTENTS
1) Synthesis Options Summary
2) HDL Compilation
3) Design Hierarchy Analysis
4) HDL Analysis
5) HDL Synthesis
5.1) HDL Synthesis Report
6) Advanced HDL Synthesis
6.1) Advanced HDL Synthesis Report
7) Low Level Synthesis
8) Partition Report
9) Final Report
9.1) Device utilization summary
9.2) TIMING REPORT
=========================================================================
*                      Synthesis Options Summary                        *
=========================================================================
---- Source Parameters
Input File Name                    : "ppg.prj"
Input Format                       : mixed

Output File Name                   : "ppg"
Output Format                      : NGC
Target Device                      : xc3s500e-4-fg320

FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
Mux Style                          : Auto
Decoder Extraction                 : YES
XOR Collapsing                     : YES
ROM Style                          : Auto
Mux Extraction                     : YES
Resource Sharing                   : YES
Multiplier Style                   : auto

Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
RTL Output                         : Yes
Global Optimization                : AllClockNets
Case Specifier                     : maintain
use_sync_set                       : Yes
use_sync_reset                     : Yes
=========================================================================
=========================================================================
*                          HDL Compilation                              *
=========================================================================
Compiling vhdl file "E:/projects/4x4mul/bvppg/BVPPG.vhd" in Library work.
Architecture behavioral of Entity bvppg is up to date.
Compiling vhdl file "E:/projects/4x4mul/bvppg/pg.vhd" in Library work.
Architecture behavioral of Entity pg is up to date.
Compiling vhdl file "E:/projects/4x4mul/bvppg/dpg.vhd" in Library work.
Architecture behavioral of Entity dpg is up to date.
Compiling vhdl file "E:/projects/4x4mul/bvppg/ppg.vhd" in Library work.
Architecture behavioral of Entity ppg is up to date.
=========================================================================
*                     Design Hierarchy Analysis                         *
=========================================================================
Analyzing hierarchy for entity <ppg> in library <work> (architecture <behavioral>).
=========================================================================
*                            HDL Analysis                               *
=========================================================================
Analyzing Entity <ppg> in library <work> (Architecture <behavioral>).
Entity <ppg> analyzed. Unit <ppg> generated.
=========================================================================
*                           HDL Synthesis                               *
=========================================================================

=========================================================================
HDL Synthesis Report

Macro Statistics
# Xors                             : 48
 1-bit xor2                        : 40
 1-bit xor3                        : 8
=========================================================================
=========================================================================
*                       Advanced HDL Synthesis                          *
=========================================================================

=========================================================================
Advanced HDL Synthesis Report

Macro Statistics
# Xors                             : 48
 1-bit xor2                        : 40
 1-bit xor3                        : 8
=========================================================================
=========================================================================
*                         Low Level Synthesis                           *
=========================================================================

=========================================================================
Final Register Report

Found no macro
=========================================================================

=========================================================================
*                          Partition Report                             *
=========================================================================
=========================================================================
*                            Final Report                               *
=========================================================================
Final Results
RTL Top Level Output File Name     : ppg.ngr
Top Level Output File Name         : ppg
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
# IOs                              : 40

Cell Usage :
# BELS                             : 36
#      LUT2                        : 16
#      LUT3                        : 4
#      LUT4                        : 16
# IO Buffers                       : 40
#      IBUF                        : 8
#      OBUF                        : 32
=========================================================================

Device utilization summary:

 Number of Slices:                      21  out of   4656     0%
 Number of 4 input LUTs:                36  out of   9312     0%
 Number of IOs:                         40
 Number of bonded IOBs:                 40  out of    232    17%
=========================================================================
TIMING REPORT

Clock Information:
------------------
No clock signals found in this design

Timing Summary:
---------------
Speed Grade: -4

Timing Detail:
--------------
All values displayed in nanoseconds (ns)

=========================================================================
Timing constraint: Default path analysis
  Total number of paths / destination ports: 360 / 32
-------------------------------------------------------------------------
Cell:in->out        Gate Delay   Net Delay   Logical Name (Net Name)
-------------------------------------------------------------------------
IBUF:I->O
LUT2:I0->O             0.704       1.082     REVERSIBLE_GATE5/_xor00001 (REVERSIBLE_GATE5/_xor0000)
LUT4:I0->O
LUT4:I0->O                                   (dpg3<2>)
LUT3:I0->O
LUT3:I1->O
LUT4:I0->O
OBUF:I->O              3.272                 z_7_OBUF (z<7>)
-------------------------------------------------------------------------
=========================================================================
CPU : 5.71 / 5.87 s | Elapsed : 6.00 / 6.00 s
-->
Number of errors : 0 ( 0 filtered)
Code
Partial product generation (PPG)
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
--use UNISIM.VComponents.all;

entity ppg is
    Port ( x : in  STD_LOGIC_VECTOR (3 downto 0);
           y : in  STD_LOGIC_VECTOR (3 downto 0);
           -- partial products, pij = x(i) AND y(j); inout so they can be read back below
           p00,p01,p02,p03,p10,p11,p12,p13,
           p20,p21,p22,p23,p30,p31,p32,p33 : inout std_logic;
           g : inout std_logic_vector (8 downto 1);  -- otherwise unused BVPPG outputs
           z : out   std_logic_vector (7 downto 0)); -- 8-bit product
end ppg;

architecture Behavioral of ppg is

component bvppg is
    port(a,b,c,d,e : in  std_logic;
         p,q,r,s,t : out std_logic);
end component;

component pg is
    port(a,b,c : in  std_logic;
         p,q,r : out std_logic);
end component;

component dpg is
    port(a,b,c,d : in  std_logic;
         p,q,r,s : out std_logic);
end component;

-- outputs of the eight BVPPG gates, index 0 to 4 = P,Q,R,S,T
signal bvppg0, bvppg1, bvppg2, bvppg3 : std_logic_vector(4 downto 0);
signal bvppg4, bvppg5, bvppg6, bvppg7 : std_logic_vector(4 downto 0);
-- outputs of the four Peres gates, index 0 to 2 = P,Q,R
signal pg0, pg1, pg2, pg3 : std_logic_vector(2 downto 0);
-- outputs of the eight double Peres gates, index 0 to 3 = P,Q,R,S
signal dpg0, dpg1, dpg2, dpg3 : std_logic_vector(3 downto 0);
signal dpg4, dpg5, dpg6, dpg7 : std_logic_vector(3 downto 0);

begin

-- partial product generation: each BVPPG gate yields two partial products
REVERSIBLE_GATE0:bvppg port map (x(0),y(0),'0',y(1),'0',bvppg0(0),bvppg0(1),bvppg0(2),bvppg0(3),bvppg0(4));
REVERSIBLE_GATE1:bvppg port map (x(1),bvppg0(1),'0',bvppg0(3),'0',bvppg1(0),bvppg1(1),bvppg1(2),bvppg1(3),bvppg1(4));
REVERSIBLE_GATE2:bvppg port map (x(2),bvppg1(1),'0',bvppg1(3),'0',bvppg2(0),bvppg2(1),bvppg2(2),bvppg2(3),bvppg2(4));
REVERSIBLE_GATE3:bvppg port map (x(3),bvppg2(1),'0',bvppg2(3),'0',bvppg3(0),bvppg3(1),bvppg3(2),bvppg3(3),bvppg3(4));
REVERSIBLE_GATE4:bvppg port map (bvppg0(0),y(2),'0',y(3),'0',bvppg4(0),bvppg4(1),bvppg4(2),bvppg4(3),bvppg4(4));
REVERSIBLE_GATE5:bvppg port map (bvppg1(0),bvppg4(1),'0',bvppg4(3),'0',bvppg5(0),bvppg5(1),bvppg5(2),bvppg5(3),bvppg5(4));
REVERSIBLE_GATE6:bvppg port map (bvppg2(0),bvppg5(1),'0',bvppg5(3),'0',bvppg6(0),bvppg6(1),bvppg6(2),bvppg6(3),bvppg6(4));
REVERSIBLE_GATE7:bvppg port map (bvppg3(0),bvppg6(1),'0',bvppg6(3),'0',bvppg7(0),bvppg7(1),bvppg7(2),bvppg7(3),bvppg7(4));

-- partial product addition: Peres gates act as half adders, double Peres gates as full adders
PERES_GATE0:pg port map (p01,p10,'0',pg0(0),pg0(1),pg0(2));
PERES_GATE1:pg port map (p22,p13,'0',pg1(0),pg1(1),pg1(2));
PERES_GATE2:pg port map (pg0(2),dpg0(2),'0',pg2(0),pg2(1),pg2(2));
PERES_GATE3:pg port map (pg2(2),dpg2(2),'0',pg3(0),pg3(1),pg3(2));
DOUBLE_PERES_GATE0:dpg port map (p02,p11,p20,'0',dpg0(0),dpg0(1),dpg0(2),dpg0(3));
DOUBLE_PERES_GATE1:dpg port map (p30,p12,p21,'0',dpg1(0),dpg1(1),dpg1(2),dpg1(3));
DOUBLE_PERES_GATE2:dpg port map (p03,dpg0(3),dpg1(2),'0',dpg2(0),dpg2(1),dpg2(2),dpg2(3));
DOUBLE_PERES_GATE3:dpg port map (p31,dpg1(3),pg1(1),'0',dpg3(0),dpg3(1),dpg3(2),dpg3(3));
DOUBLE_PERES_GATE4:dpg port map (pg1(2),p23,p32,'0',dpg4(0),dpg4(1),dpg4(2),dpg4(3));
DOUBLE_PERES_GATE5:dpg port map (pg3(2),dpg2(3),dpg3(2),'0',dpg5(0),dpg5(1),dpg5(2),dpg5(3));
DOUBLE_PERES_GATE6:dpg port map (dpg5(3),dpg3(3),dpg4(2),'0',dpg6(0),dpg6(1),dpg6(2),dpg6(3));
DOUBLE_PERES_GATE7:dpg port map (dpg6(3),dpg4(3),p33,'0',dpg7(0),dpg7(1),dpg7(2),dpg7(3));

-- partial products taken from the R and T outputs of the BVPPG gates
p00<=bvppg0(2);
p01<=bvppg0(4);
p02<=bvppg4(2);
p03<=bvppg4(4);
p10<=bvppg1(2);
p11<=bvppg1(4);
p12<=bvppg5(2);
p13<=bvppg5(4);
p20<=bvppg2(2);
p21<=bvppg2(4);
p22<=bvppg6(2);
p23<=bvppg6(4);
p30<=bvppg3(2);
p31<=bvppg3(4);
p32<=bvppg7(2);
p33<=bvppg7(4);

-- remaining BVPPG outputs
g(1)<=bvppg3(1);
g(2)<=bvppg3(3);
g(3)<=bvppg4(0);
g(4)<=bvppg5(0);
g(5)<=bvppg6(0);
g(6)<=bvppg7(0);
g(7)<=bvppg7(1);
g(8)<=bvppg7(3);

-- product, most significant bit first
z<=dpg7(3)&dpg7(2)&dpg6(2)&dpg5(2)&pg3(1)&pg2(1)&pg0(1)&p00;
end Behavioral;
BVPPG gate
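library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity bvppg is
    Port ( A, B, C, D, E : in  STD_LOGIC;
           P, Q, R, S, T : out STD_LOGIC);
end bvppg;

architecture Behavioral of bvppg is
begin
-- A, B and D pass through; with C = E = '0', R = A AND B and T = A AND D
P<=A;
Q<=B;
R<=(A AND B) XOR C;
S<=D;
T<=(A AND D) XOR E;
end Behavioral;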
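The reversibility of the BVPPG gate can be checked by simulation: a gate is reversible only if each of the 32 input patterns produces a different output pattern. The testbench below is a minimal sketch of such a check. It assumes that the `bvppg` entity has inputs `A` to `E` and outputs `P` to `T`, as used in the architecture body above; the testbench name `bvppg_tb` and the 10 ns step are illustrative choices.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity bvppg_tb is
end bvppg_tb;

architecture sim of bvppg_tb is
    signal inp  : std_logic_vector(4 downto 0) := (others => '0');
    signal outp : std_logic_vector(4 downto 0);
begin
    -- Port names A..E and P..T are assumed from the gate equations.
    dut : entity work.bvppg
        port map (A => inp(4), B => inp(3), C => inp(2), D => inp(1), E => inp(0),
                  P => outp(4), Q => outp(3), R => outp(2), S => outp(1), T => outp(0));

    check : process
        type seen_t is array (0 to 31) of boolean;
        variable seen : seen_t := (others => false);
        variable code : integer;
    begin
        -- Apply every input pattern and record which output patterns occur.
        for i in 0 to 31 loop
            inp <= std_logic_vector(to_unsigned(i, 5));
            wait for 10 ns;
            code := to_integer(unsigned(outp));
            assert not seen(code)
                report "output pattern " & integer'image(code) &
                       " occurs twice: mapping is not one-to-one"
                severity error;
            seen(code) := true;
        end loop;
        report "reversibility check finished" severity note;
        wait;
    end process;
end sim;
```

A run without assertion errors indicates that the 32 input patterns map to 32 distinct output patterns, which is the defining property of a reversible gate.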
PG gate
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity pg is
    Port ( A : in  STD_LOGIC;
           B : in  STD_LOGIC;
           C : in  STD_LOGIC;
           P : out STD_LOGIC;
           Q : out STD_LOGIC;
           R : out STD_LOGIC);
end pg;

architecture Behavioral of pg is
begin
P<=A;
Q<=(A XOR B);
R<=((A AND B) XOR C);
end Behavioral;
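With `C` tied to '0', the equations above reduce to a half adder: `Q` is the sum bit and `R` the carry bit of `A + B`. The following testbench is a small sketch that checks this for all four input combinations; the name `pg_tb`, the helper table `val` and the 10 ns step are assumptions made for the example.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity pg_tb is
end pg_tb;

architecture sim of pg_tb is
    -- Helper table mapping the integers 0 and 1 to std_logic values.
    type bit_table is array (0 to 1) of std_logic;
    constant val : bit_table := ('0', '1');
    signal a, b    : std_logic := '0';
    signal p, q, r : std_logic;
begin
    -- C is tied to '0' so that R carries A AND B.
    dut : entity work.pg
        port map (A => a, B => b, C => '0', P => p, Q => q, R => r);

    check : process
    begin
        for i in 0 to 1 loop
            for j in 0 to 1 loop
                a <= val(i);
                b <= val(j);
                wait for 10 ns;
                assert p = val(i)
                    report "P does not copy A" severity error;
                assert q = val((i + j) mod 2)
                    report "Q is not the sum bit" severity error;
                assert r = val((i + j) / 2)
                    report "R is not the carry bit" severity error;
            end loop;
        end loop;
        wait;
    end process;
end sim;
```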
DPG gate
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity dpg is
    Port ( A : in  STD_LOGIC;
           B : in  STD_LOGIC;
           C : in  STD_LOGIC;
           D : in  STD_LOGIC;
           P : out STD_LOGIC;
           Q : out STD_LOGIC;
           R : out STD_LOGIC;
           S : out STD_LOGIC);
end dpg;

architecture Behavioral of dpg is
begin
P<=A;
Q<=(A XOR B);
R<=(A XOR B XOR C);
S<=(((A XOR B) AND C) XOR ((A AND B) XOR D));
end Behavioral;
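With `D` tied to '0', the double Peres gate behaves as a full adder: `R` is the sum bit and `S` the carry bit of `A + B + C`. The testbench below is a hedged sketch that checks all eight input combinations; the name `dpg_tb`, the helper table `val` and the 10 ns step are assumptions made for the example.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity dpg_tb is
end dpg_tb;

architecture sim of dpg_tb is
    -- Helper table mapping the integers 0 and 1 to std_logic values.
    type bit_table is array (0 to 1) of std_logic;
    constant val : bit_table := ('0', '1');
    signal a, b, c    : std_logic := '0';
    signal p, q, r, s : std_logic;
begin
    -- D is tied to '0' so that S carries the majority of A, B and C.
    dut : entity work.dpg
        port map (A => a, B => b, C => c, D => '0',
                  P => p, Q => q, R => r, S => s);

    check : process
    begin
        for i in 0 to 1 loop
            for j in 0 to 1 loop
                for k in 0 to 1 loop
                    a <= val(i);
                    b <= val(j);
                    c <= val(k);
                    wait for 10 ns;
                    assert r = val((i + j + k) mod 2)
                        report "R is not the sum bit" severity error;
                    assert s = val((i + j + k) / 2)
                        report "S is not the carry bit" severity error;
                end loop;
            end loop;
        end loop;
        wait;
    end process;
end sim;
```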