The figure shows the pull-down NMOS logic for a NOR gate. This pull-down structure is
used in dynamic gates.
How dynamic gates work:
In static gates, inputs switch and, after a finite input-to-output delay, the output possibly
switches to the expected state.
The biggest benefit of dynamic gates is that they can be cascaded together, and their
pull-down-only property can be leveraged to achieve a very fast delay through a chain of
multiple dynamic gate stages.
Posted in Circuits, CMOS theory | Leave a reply
NMOS and PMOS logic
Posted on August 16, 2012
CMOS is short for Complementary Metal Oxide Semiconductor.
Complementary stands for the fact that CMOS-technology-based logic uses both p-type
devices and n-type devices.
Logic circuits that use only p-type devices are referred to as PMOS logic, and similarly
circuits using only n-type devices are called NMOS logic. Before CMOS technology
became prevalent, NMOS logic was widely used. PMOS logic had also found its use in
specific applications.
Let's understand in more detail how NMOS logic works. As per the definition, we are only
allowed to use n-type devices as building blocks; no p-type devices are allowed. Let's take
an example to clarify this. Following is the truth table for a NOR gate.

A B | Out
0 0 | 1
0 1 | 0
1 0 | 0
1 1 | 0
Unlike a begin..end block, where statements are executed in the order they appear,
statements within a fork..join block are executed in parallel. This parallelism can be
the source of a race condition, as shown in the above example.
Both blocking assignments are scheduled to execute in parallel, and depending upon the
order of their execution, the eventual value of y could be either 2 or the previous value of
x; it cannot be determined beforehand.
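The fork..join example referred to above is not reproduced in this excerpt; assuming it forks the two blocking assignments x = 2 and y = x (with x previously 1, an arbitrary choice), a small Python sketch shows both outcomes the scheduler's ordering choice can produce:

```python
import itertools

# Model of:  fork  x = 2;  y = x;  join  (two parallel blocking assignments).
# The simulator may execute the two statements in either order.
def run(order, x_initial=1):
    state = {"x": x_initial, "y": None}
    statements = {
        "assign_x": lambda s: s.update(x=2),
        "assign_y": lambda s: s.update(y=s["x"]),
    }
    for name in order:
        statements[name](state)
    return state["y"]

# Try both legal execution orders the scheduler could pick.
outcomes = {run(list(order)) for order in itertools.permutations(["assign_x", "assign_y"])}
print(outcomes)  # y ends up as 2, or as the previous value of x
```

Both results are legal per the language rules, which is exactly why such code is a race.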
4) Race condition because of variable initialization.
reg clk = 0;
initial
  clk = 1;
In Verilog, a reg-type variable can be initialized within the declaration itself. This
initialization is executed at time step zero, just like an initial block, and if you happen to
have an initial block that assigns to the same reg variable, you have a race condition.
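The two time-zero writers above can be modeled in a few lines of Python (a toy model, not full Verilog semantics): the final value of clk depends on which time-zero process the simulator happens to run last.

```python
import itertools

# Two time-zero processes race: the declaration init (clk = 0) and the
# initial block (clk = 1). The standard does not order them relative to
# each other, so either one may run last.
def final_clk(order):
    clk = None
    for value in order:  # each "process" just assigns its value to clk
        clk = value
    return clk

outcomes = {final_clk(order) for order in itertools.permutations([0, 1])}
print(outcomes)  # clk can end up as 0 or as 1, depending on execution order
```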
There are a few other situations where race conditions can come up; for example, if a
function is invoked from more than one active block at the same time, the execution
order could become non-deterministic.
-SS.
driving gate can charge or discharge the load within reasonable time with reasonable
power dissipation.
Our aim is to find the nominal fanout value which gives the best speed with the least
possible power dissipation. To simplify our analysis we can focus on the leakage power,
which is proportional to the width, or size, of the gate. Hence our problem simplifies to:
how can we get the smallest delay through the gates while choosing the smallest possible
gate sizes?
The typical fanout value can be found using CMOS gate delay models. Some of the
CMOS gate models are very complicated in nature. Luckily, there are simplistic delay
models which are fairly accurate. For the sake of comprehending this issue, we will go
through an overly simplified delay model.
We know that the I-V curves of a CMOS transistor are not linear, and hence we can't really
treat a transistor as a resistor when it is ON; but, as mentioned earlier, we can
assume the transistor to be a resistor in a simplified model, for our understanding. The
following figure shows an NMOS and a PMOS device. Let's assume that the NMOS device
has unit gate width W, and that for such a unit-width device the resistance is R. If we
assume that the mobility of electrons is double that of holes, we get an approximate
P/N ratio of 2/1 to achieve the same delay (with very recent process technologies, the P/N
ratio needed to get the same rise and fall delay is getting close to 1/1). In other words, to
achieve the same resistance R in a PMOS device, we need the PMOS device to have double
the width compared to the NMOS device. That is why, to get resistance R through the
PMOS device, it needs to be 2W wide.
Let's also assume that for width W the gate capacitance is C. This means our NMOS
gate capacitance is C and our PMOS gate capacitance is 2C. Again, for the sake of
simplicity, let's assume the diffusion capacitance of the transistors to be zero.
Let's assume that an inverter with gate width W drives another inverter whose gate width
is a times the width of the driver transistor. This multiplier a is our fanout. For the
receiver (load) inverter, the NMOS gate capacitance would be a*C, as gate capacitance is
proportional to the width of the gate, and the PMOS gate capacitance would be 2*a*C.
For this RC circuit, we can calculate the delay at the driver output node using the Elmore
delay approximation. Recall that in the Elmore delay model one finds the total delay
through multiple nodes in a circuit like this: start with the first node of interest and keep
going downstream along the path where you want to find the delay. At each node along
the path, find the total resistance from that node to VDD/VSS and multiply that
resistance by the total capacitance on that node. Sum such R*C products over all
nodes.
In our circuit there is only one node of interest: the driver inverter's output, at the
end of resistance R. In this case the total resistance from the node to VDD/VSS is R, and
the total capacitance on the node is aC + 2aC = 3aC. Hence the delay can be approximated
as R * 3aC = 3aRC.
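The node-by-node procedure above can be written down directly; a minimal sketch (the unit values R = 1, C = 1 and fanout a = 2 are arbitrary choices for illustration):

```python
def elmore_delay(segments):
    """Elmore delay of an RC ladder: for each node, multiply the total
    resistance from that node back to the supply by the capacitance on
    the node, then sum the products over all nodes."""
    total = 0.0
    r_to_supply = 0.0
    for r, c in segments:
        r_to_supply += r
        total += r_to_supply * c
    return total

# Our single-node circuit: resistance R to the output node, which carries
# the receiver's gate capacitance aC + 2aC = 3aC.
R, C, a = 1.0, 1.0, 2.0
print(elmore_delay([(R, 3 * a * C)]))  # 3aRC = 6.0
```

With more than one (R, C) segment in the list, the same function handles a multi-node wire ladder.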
Now, to find the typical value of the fanout a, we can build a circuit with a chain of
back-to-back inverters like the following circuit.
If we want to find the minimum value of the total delay as a function of the
fanout a, we need to take the derivative of the total delay with respect to a and set it to
zero. That gives us the minimum of the total delay with respect to a.
D = 3*RC * ln(CL/C) * a/ln(a)
dD/da = 3*RC * ln(CL/C) * [ (ln(a) - 1)/ln(a)^2 ] = 0
For this to be true:
ln(a) - 1 = 0
Which means ln(a) = 1, the root of which is a = e.
This is how we derive the fanout of e to be an optimal fanout for a chain of inverters.
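The result a = e can also be checked numerically; a quick sketch with arbitrary values RC = 1 and CL/C = 1000 (both assumed; the location of the minimum does not depend on them):

```python
import math

RC = 1.0
ratio = 1000.0  # CL/C, an assumed load-to-input capacitance ratio

def total_delay(a):
    # D = 3*RC*ln(CL/C) * a/ln(a): per-stage delay 3aRC times
    # the number of stages ln(CL/C)/ln(a)
    return 3 * RC * math.log(ratio) * a / math.log(a)

# Sweep fanout values above 1 and pick the one with the smallest delay.
candidates = [1.01 + 0.001 * i for i in range(5000)]
best = min(candidates, key=total_delay)
print(best)  # close to e = 2.718...
```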
If one were to plot the value of the total delay D against a for such an inverter chain,
it looks like the following.
One more thing to remember here is that we assumed a chain of inverters. In practice,
many times you will find a gate driving a long wire. The theory still applies; one just
has to find the effective wire capacitance that the driving gate sees and use that to
come up with the fanout ratio.
-SS.
Posted in Circuits, CMOS theory | Leave a reply
Inverted Temperature Dependence.
Posted on July 21, 2012
It is known that with an increase in temperature, the resistivity of a metal wire (conductor)
increases. The reason for this phenomenon is that with an increase in temperature, thermal
vibrations in the lattice increase. This gives rise to increased electron scattering. One can
visualize this as electrons colliding with each other more, and hence contributing less to
the streamlined flow needed for the flow of electric current.
A similar effect happens in semiconductors: the mobility of the primary carrier
decreases with an increase in temperature. This applies equally to holes and electrons.
But in semiconductors, when the supply voltage of a MOS transistor is reduced, an
interesting effect is observed. At lower voltages the delay through the MOS device
decreases with increasing temperature, rather than increasing. After all, common wisdom
is that with increasing temperature the mobility decreases, and hence one would have
expected reduced current and subsequently increased delay. This effect is referred to
as low-voltage Inverted Temperature Dependence.
Let's first see what the delay of a MOS transistor depends upon, in a simplified
model.
Delay = (Cout * Vdd) / Id  [approx.]
Where
Cout = drain capacitance
Vdd = supply voltage
Id = drain current
Now let's see what the drain current depends upon.
Id = μ(T) * (Vdd - Vth(T))
Where
μ(T) = mobility, which decreases with increasing temperature, and Vth(T) = threshold
voltage, which also decreases with increasing temperature.
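These two competing temperature effects can be put into numbers. The sketch below uses a simplified first-order model with assumed parameters (mobility falling as (300/T)^1.5, threshold voltage dropping by 2 mV per kelvin from 0.5 V at 300 K); the numbers are illustrative only, chosen to show the sign of the trend, not exact device physics:

```python
def mobility(T):
    return (300.0 / T) ** 1.5       # normalized; decreases with temperature

def vth(T):
    return 0.5 - 0.002 * (T - 300)  # volts; decreases with temperature

def delay(vdd, T):
    # Delay ~ (Cout * Vdd) / Id, with Id ~ mobility(T) * (Vdd - Vth(T));
    # Cout cancels out of a same-circuit comparison, so it is dropped.
    return vdd / (mobility(T) * (vdd - vth(T)))

# At low Vdd the shrinking Vth wins: delay *decreases* as T rises.
print(delay(0.6, 300) > delay(0.6, 400))  # True  (inverted dependence)
# At high Vdd the falling mobility wins: delay increases as T rises.
print(delay(1.2, 300) < delay(1.2, 400))  # True  (normal dependence)
```

At low supply voltage the overdrive (Vdd - Vth) is small, so the Vth reduction with temperature dominates the mobility loss; at high supply voltage the mobility term dominates.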
long as these glitches don't happen near the active clock edges. In that sense it is not 100%
protection, as a random glitch could happen near the active clock edge, meet both setup
and hold requirements, and cause flops to reset when they are not expected to be
reset.
This type of random glitch is more likely to happen if the reset is generated by some
internal condition, which most of the time means the reset travels through some
combinational logic before it finally gets distributed throughout the system.
Synchronous-reset flops are smaller, as the reset is simply AND-ed with the data outside
the flop, but you need that extra AND gate per flop to accommodate the reset. An
asynchronous-reset flop, on the other hand, has to factor the reset into the flop design,
where typically one of the last inverters in the feedback loop of the slave latch is
converted into a NAND gate.
- Spurious glitches. With asynchronous reset, unintended glitches will cause the circuit to
go into the reset state. Usually a glitch filter has to be introduced right at the reset input
port, or one may have to switch to synchronous reset.
- If the reset is internally generated and is not coming directly from the chip input port, it
has to be excluded for DFT purposes. The reason is that, in order for the ATPG test vectors
to work correctly, the test program has to be able to control all flop inputs, including data,
clock, and all resets. During test-vector application, we cannot have any flop get reset. If
the reset comes externally, the test program holds it at its inactive value. If the master
asynchronous reset comes externally, the test program also holds it at its inactive state;
but if an asynchronous reset is generated internally, the test program has no control over
the final reset output, and hence the asynchronous reset net has to be removed for DFT
purposes.
One issue that is common to both types of reset is that the reset release has to happen
within one clock cycle. If the reset release happens in different clock cycles, then different
flops will come out of reset in different clock cycles, and this will corrupt the state of your
circuit. This could very well happen with large reset distribution trees, whereby some of
the receivers are closer to the master distribution point and others are farther away.
Thus reset tree distribution is non-trivial and almost as important as clock distribution.
Although you don't have to meet skew requirements as for a clock, the tree has to
guarantee that all its branches are balanced, such that the difference between the delays of
any two branches is not more than a clock cycle. This guarantees that reset removal
happens within one clock cycle and all flops in the design come out of reset within
one clock cycle, maintaining a coherent state in the design.
To address this problem with asynchronous reset, where it could be more severe, the
master asynchronous reset coming from off-chip is synchronized using a synchronizer.
The synchronizer essentially converts the asynchronous reset into something more like a
synchronous reset, and it becomes the master distribution point of the reset (the head of
the reset tree). By clocking this synchronizer with a clock similar to the clock for the flops
(the last-stage clock in the clock distribution), we can minimize the risk of the reset-tree
distribution not happening within one clock.
-SS.
Posted in Digital Design, sta | Leave a reply
Verilog execution order
Posted on July 18, 2012
The following three items are essential for getting to the bottom of Verilog execution
order.
1) Verilog event queues.
2) Determinism in Verilog.
3) Non-determinism in Verilog.
Verilog event queues:
To get a very good idea of the execution order of different statements and assignments,
especially the blocking and nonblocking assignments, one has to have a sound
comprehension of the inner workings of Verilog.
This is where the Verilog event queues come into the picture. Sometimes they are called
the stratified event queues of Verilog. The IEEE Verilog standard specifies how
different events are organized into logically segmented event queues during
Verilog simulation, and in what order they get executed.
Figure: Stratified Verilog Event Queues.
As per the standard, the event queue is logically segmented into four different regions. For
the sake of simplicity, we're showing the three main event queues. The "inactive" event
queue has been omitted, as the #0-delay events that it deals with are not a recommended
coding practice.
As you can see, at the top there is the active event queue. According to the IEEE Verilog
spec, events can be scheduled into any of the event queues, but events can be removed only
from the active event queue. As shown in the image, the active event queue holds
blocking assignments, continuous assignments, primitive I/O updates, and $write
commands. Within the active queue all events have the same priority, which is why they
can get executed in any order; this is the source of nondeterminism in Verilog.
There is a separate queue for the LHS updates of the nonblocking assignments. As you
can see, the LHS-update queue is taken up after the active events have been exhausted,
but LHS updates for the nonblocking assignments could re-trigger active events.
Lastly, once the looping through the active and nonblocking LHS-update queues has
settled down and finished, the postponed queue is taken up, where $strobe and $monitor
commands are executed, again without any particular order of preference.
At the end, simulation time is advanced and the whole cycle repeats.
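The loop described above can be modeled in a few lines. This is a toy Python scheduler, not the full LRM algorithm (NBA-triggered re-activation and the inactive region are left out); it only shows the active-queue then NBA-update ordering within one time step:

```python
from collections import deque

def time_step(statements, state):
    """One simulation time step: ('blocking', var, rhs_fn) runs immediately;
    ('nonblocking', var, rhs_fn) evaluates its RHS now but defers the LHS
    update until the active queue is empty."""
    active = deque(statements)
    nba_updates = deque()
    while active:
        kind, var, rhs = active.popleft()
        if kind == "blocking":
            state[var] = rhs(state)
        else:  # nonblocking: RHS evaluated now, LHS update deferred
            nba_updates.append((var, rhs(state)))
    while nba_updates:  # NBA region: apply deferred updates in order
        var, val = nba_updates.popleft()
        state[var] = val
    return state

# Model of: x = 0;  y <= x + 3;  x = 8;
state = time_step(
    [("blocking", "x", lambda s: 0),
     ("nonblocking", "y", lambda s: s["x"] + 3),  # RHS sees x == 0
     ("blocking", "x", lambda s: 8)],
    {},
)
print(state)  # {'x': 8, 'y': 3}: y used x's value at RHS-evaluation time
```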
Determinism in Verilog.
Based on the event queue diagram above we can make some obvious conclusions about
the determinism.
- $strobe and $monitor commands are executed after all the assignment updates for the
current simulation time have been done; hence $strobe and $monitor show the latest
values of the variables at the end of the current simulation time.
- Statements within a begin..end block are evaluated sequentially. This means the
statements within the begin..end block are executed in the order they appear within the
block. The current block's execution could get suspended for the execution of other active
process blocks, but the execution order within any begin..end block does not change under
any circumstances.
This is not to be confused with the fact that a nonblocking assignment's LHS update will
always happen after the blocking assignments, even if the blocking assignment appears
later in the begin..end order. Take the following example.
initial begin
  x = 0;
  y <= 3;
  z = 8;
end
Evaluation of a nonblocking assignment's RHS has the same priority as blocking statement
execution in general. Hence, in our example:
1) The first step is execution of the blocking statement x = 0.
2) The second step is evaluation of the RHS of the nonblocking statement y <= 3, and
scheduling of its LHS update.
3) The third step is execution of the last blocking statement z = 8.
4) The last step is the LHS update for the nonblocking statement, where y is assigned the
value 3.
As you can see, the begin..end block maintains the execution order among events of the
same priority.
- One obvious question that comes to mind, having gone through the previous example,
is: what would be the execution order of the nonblocking LHS updates? In the previous
example we only had one nonblocking statement. What if we had more than one
nonblocking statement within the begin..end block? We will look at two variations of this
problem: one where the two nonblocking assignments are to two different variables, and
one where the two nonblocking assignments are to the same variable.
First variation.
initial begin
  x = 0;
  y <= 3;
  z = 8;
  p <= 6;
end
For the above-mentioned case, the execution order still follows the order in which the
statements appear.
1) The blocking statement x = 0 is executed in a single go.
2) The RHS of the nonblocking assignment y <= 3 is evaluated and its LHS update is
scheduled.
3) The blocking statement z = 8 is executed.
4) The RHS of the nonblocking assignment p <= 6 is evaluated and its LHS update is
scheduled.
5) The LHS update from the first nonblocking assignment is carried out; y becomes 3.
6) The LHS update from the second nonblocking assignment is carried out; p becomes 6.
Second variation.
initial begin
  x = 0;
  y <= 3;
  z = 8;
  y <= 6;
end
For the above-mentioned case, the execution order still follows the order in which the
statements appear.
1) The blocking statement x = 0 is executed in a single go.
2) The RHS of the nonblocking assignment y <= 3 is evaluated and its LHS update is
scheduled.
3) The blocking statement z = 8 is executed.
4) The RHS of the nonblocking assignment y <= 6 is evaluated and its LHS update is
scheduled.
5) The LHS update from the first nonblocking assignment is carried out; y is 3 now.
6) The LHS update from the second nonblocking assignment is carried out; y is 6 now.
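The second variation boils down to the NBA updates being applied in the order they were scheduled; a minimal Python sketch of that bookkeeping:

```python
# Model of: x = 0;  y <= 3;  z = 8;  y <= 6;
scheduled = []              # the nonblocking LHS-update queue
x = 0                       # 1) blocking, executes immediately
scheduled.append(("y", 3))  # 2) y <= 3: RHS evaluated, update deferred
z = 8                       # 3) blocking, executes immediately
scheduled.append(("y", 6))  # 4) y <= 6: RHS evaluated, update deferred

state = {}
for var, val in scheduled:  # 5)-6) NBA updates applied in scheduling order
    state[var] = val        # y becomes 3, then 6
print(state["y"])  # 6: the later-scheduled update wins
```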
Non-determinism in Verilog.
One has to look at the active event queue in the Verilog event queues figure, to get an
idea as to where the non-determinism in Verilog stems from. You can see that within the
active event queue, items could be executed in any order. This means that blocking
assignments, continuous assignments, primitive output updates, and $display command,
all could be executed in any random order across all the active processes.
Non-determinism especially bits when race conditions occur. For example we know that
blocking assignments across all the active processes will be carried out in random order.
This is dandy as long as blocking assignments are happening to different variables. As
soon as one make blocking assignments to same variable from different active processes
one will run into issues and one can determine the order of execution. Similarly if two
active blocking assignments happen to read from and write to the same variable, youve a
read write race.
Well look at Verilog race conditions and overall good coding guidelines in a separate
post.
-SS.
Posted in Digital Design, Verilog | 1 Reply