Lecture 1

Advanced VLSI Design
MEL G623
ANU GUPTA, EEE
BITS Pilani, Pilani Campus
Introduction
Handout: http://nalanda.bits-pilani.ac.in/

Prerequisite: VLSI Design

Digital system engineering
The engineering problems of composing circuits into systems are only briefly touched upon in the prerequisite course.
Architecture deals with organizing a system and defining interfaces (e.g., instruction sets and channel protocols) to achieve cost and performance goals.
System-level engineering constrains what the architect can do and is a major determinant of the cost and performance of the resulting system.

Engineering view of digital systems
Speed, pin count
Signalling conventions
Timing and synchronization
Power distribution
Noise
Communication
IC-to-IC, block-to-block, and board-to-board communication
Logic-to-logic communication

Data transmission

Data can be sent either serially, one bit after another through a single wire, or in parallel, multiple bits at a time, through several parallel wires.
Most famously, these different paradigms are visible in the form of the common PC ports "serial port" and "parallel port".
Early parallel transmission schemes were often much faster than serial schemes (more wires = more data faster), but at the added cost and complexity of hardware (more wires, more complicated transmitters and receivers). Parallel transmission protocols are now mainly reserved for applications like a CPU bus or links between IC devices that are physically very close to each other, usually just a few centimeters apart.
Serial data transmission is much more common in new communication protocols because it reduces the I/O pin count, and hence the cost.
Common serial protocols include SPI and I2C. Surprisingly, serial transmission methods can transmit at much higher clock rates per bit, which tends to outweigh the primary advantage of parallel transmission.
Serial protocols are used for longer-distance communication systems, ranging from shared external devices like a digital camera to global networks or even interplanetary communication for space probes. However, some recent CPU bus architectures use serial methodologies as well.
Parallel vs. serial I/O

For years, parallel communication schemes offered clear advantages for moving data quickly from chip to chip, board to board, or system to system.
But parallel I/O could work only by applying significant engineering resources.
Stringent specifications in the PCI-X standard for rise and fall times, drive strengths, path delays, and skews, for example, have proven so expensive that it has been adopted only in high-end applications such as computer servers.
Other problems include:
Simultaneous switching outputs (SSO): when too many outputs switch simultaneously, ground bounce creates a lot of noise.
EMI
Cost

Industry analysts agree that the High-Speed Serial Initiative is inevitable because parallel I/O schemes reach physical limitations when data rates begin to exceed just 1 Gb/s.
Serial I/O-based designs offer many advantages over traditional parallel implementations, including:
fewer device pins,
reduced board space requirements,
fewer printed circuit board (PCB) layers,
easier layout of the PCB, smaller connectors,
lower electromagnetic interference, and better noise immunity.

Chip-to-chip and board-to-board communication can use either parallel I/O or serial I/O. Serial communication is preferred because it requires fewer pins and has fewer problems, but serializer/deserializer circuits are required.
Within a chip, parallel communication is not expensive because wire lengths are very short and no I/O pads are required.
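The serializer/deserializer pair mentioned above can be sketched behaviourally as follows. This is a minimal sketch assuming an LSB-first bit order and a fixed word width, both illustrative choices not taken from any standard; `serialize` and `deserialize` are hypothetical helpers, and real SerDes circuits also handle clocking, framing, and line coding.

```python
def serialize(words, width):
    """Parallel-to-serial: shift each width-bit word out LSB-first onto one line."""
    bits = []
    for word in words:
        bits.extend((word >> i) & 1 for i in range(width))
    return bits


def deserialize(bits, width):
    """Serial-to-parallel: regroup the bit stream into width-bit words."""
    words = []
    for start in range(0, len(bits) - width + 1, width):
        word = 0
        for i, bit in enumerate(bits[start:start + width]):
            word |= bit << i
        words.append(word)
    return words


# An 8-bit parallel bus needs 8 pins at rate r; one serial pin must run at 8 * r.
stream = serialize([0x5A, 0xC3], width=8)
print(len(stream))                   # 16 bits on a single wire
print(deserialize(stream, width=8))  # [90, 195]
```

The trade-off is visible in the comment: the pin count drops by the word width, but the single line must toggle that many times faster.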

Pin count is the first problem encountered when trying to move a lot of data
in and out of a chip. The number of input and output pins is always limited.
Although pin count tends to increase over time, it is never
enough to keep up.
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956

Engineer's view of digital systems

View a digital system in terms of information flow, power flow, and timing.
Bandwidth
Bandwidth: a range of frequencies
Analog: measured in Hz
Bandwidth = highest frequency - lowest frequency
Spectrum: a synonym, used only in analog measurements.
In the digital realm, bandwidth often refers to bits per second (bps).
1 Bps (byte per second) = 8 bps

Bit Rate
Most digital signals are aperiodic, so period and frequency are not appropriate to describe digital signals.
Bit interval: the time to send one bit.
Bit rate: the number of bits sent in one second, measured in bits per second (bps).
Do NOT use Hz when you mean bps, or vice versa.

Total die bandwidth is the product of signal pin count and
bandwidth / pin.
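The definitions above reduce to simple arithmetic, sketched below with hypothetical helper names. The 200-pin die running 400 Mb/s per pin is an illustrative assumption, not a figure from the lecture.

```python
def bit_interval_ns(bit_rate_bps):
    """Bit interval: the time to send one bit (1 / bit rate), in nanoseconds."""
    return 1e9 / bit_rate_bps


def bytes_to_bits_per_second(rate_Bps):
    """1 Bps (byte per second) = 8 bps."""
    return 8 * rate_Bps


def die_bandwidth_bps(signal_pins, bps_per_pin):
    """Total die bandwidth = signal pin count x bandwidth per pin."""
    return signal_pins * bps_per_pin


print(bit_interval_ns(400e6))           # 2.5 ns per bit at 400 Mb/s
print(bytes_to_bits_per_second(100e6))  # 100 MB/s expressed in bits per second
print(die_bandwidth_bps(200, 400e6))    # hypothetical 200 pins x 400 Mb/s
```

Raising total die bandwidth means raising either factor; since pin count grows slowly, most of the gain has to come from bandwidth per pin.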
System design issues
Timing and synchronization

Signalling
Power Distribution
Noise
Need for timing
A timing convention governs when a transmitter drives symbols onto the signal line and when they are sampled by the receiver.
A timing convention may be:
Periodic: a new symbol is driven on a signal line at regular time intervals. Here we may use a local clock source to determine when the next symbol is to arrive.
Aperiodic: new symbols arrive at irregular times. Here an explicit transition is required to signal the arrival of each symbol. This transition may be on a separate clock line that may be shared among several signals (bundled signaling).
Three basic timing models for communication between two ICs
Common clock: a common clock is applied to both the transmitting and the receiving block.
Forwarded clock: a copy of the clock is sent along with the data. The output time of the forwarded clock is adjusted so that the clock transitions in the middle of the bit cell.
Embedded clock: the data stream contains both the data and the clock.
Use of a global clock: synchronous system, periodic timing
Many systems, or parts of systems, use a global clock to which all signals in the subsystem are synchronized.
All signals change in response to a transition on the clock and are sampled in an aperture region surrounding a clock transition.
Using a global clock is convenient in a small subsystem because it places all signals in a single clock domain, obviating the need for the synchronization that would otherwise be required in moving between clock domains.

Drawbacks of global clock
In such a system, the maximum data rate is limited by the maximum delay of the system rather than only by the uncertainty in delay.
If any cables or logic modules have a delay longer than one clock period, the system will operate only over certain (often narrow) windows of clock frequencies.
The clock edge usually arrives earlier than the data, because the data is delayed along its path.
Hence, the global clock is usually not centered on the eye of the signals it is sampling, and this convention tends to be less tolerant of timing noise.

Good Timing Convention
A good timing convention is one that manages timing noise (skew and jitter) efficiently and allows the system to operate over a wide range of data rates and to achieve the maximum data rate.
Data rate

The rate at which we can send data over a line or through a block of combinational logic is governed by:
The timing properties (delay) of the transmitter and receiver. The rise (transition) time of the transmitter determines how fast a new symbol can be put on the line. This signal must then remain stable during the sampling window of the receiver to be reliably detected.
Timing noise (uncertainty), such as skew and jitter. The minimum bit cell is widened further by timing noise in the system: uncertainty in the arrival time of the signal at the receiver, and uncertainty in the timing of the transmit and receive clocks, must be compensated by widening the bit cell to allow for the worst-case timing plus a margin.
The larger the uncertainty, the lower the achievable data rate.
Pipeline timing convention

A pipeline timing convention overcomes the limitations of a global clock by generating a clock for each data signal that is nominally centered on the eye of the data signal.
The clock for the data at the output of a module is typically generated by delaying the clock used at the input of the module by the nominal delay of the module plus one-half clock cycle.
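The delay rule above can be sketched as a running sum. This assumes that each stage's input clock also launches that stage's data, so the derived output clock (input clock + nominal stage delay + half a clock cycle) lands on the nominal center of the output eye; `pipeline_clock_times` is a hypothetical helper and the 1.0-ns second stage is illustrative.

```python
def pipeline_clock_times(t_clk, stage_delays):
    """Sampling-clock time at the output of each pipeline stage.

    Assumes each stage's input clock also launches its data, so the output
    clock = input clock delayed by (nominal stage delay + t_clk / 2).
    """
    times = [0.0]  # launching clock of the first stage
    for delay in stage_delays:
        times.append(times[-1] + delay + t_clk / 2)
    return times


t_clk = 2.5                                     # ns, i.e. 400 Mb/s
clocks = pipeline_clock_times(t_clk, [6.25, 1.0])
print(clocks)                                   # [0.0, 7.5, 9.75]
print([t % t_clk for t in clocks])              # phase of each derived clock within a cycle
```

With a 6.25-ns wire at a 2.5-ns clock, the derived clock lands at 7.5 ns, exactly three bit times after launch, i.e. centered on the eye without relying on the global clock phase.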
Types of timing convention

A timing convention may operate either open loop or closed loop.
In an open-loop timing system, the frequencies and delays associated with system timing are not subject to control. The system must be designed to tolerate the worst-case variation in these parameters.
With closed-loop timing, one or more system timing parameters (delays and frequencies) are actively controlled.
The system measures a timing parameter such as skew and uses feedback control to adjust the variable parameters to reduce the skew.
This active control can greatly decrease the timing uncertainty in a system and hence increase the maximum data rate if pipeline timing is employed.

Open-loop, global-clock synchronous system
A 400-MHz clock is distributed to the transmitter and receiver from a master clock generator over transmission lines that are matched to 100 ps. A single-stage buffer at the clock generator, B1, introduces a timing uncertainty of 100 ps, and the four-stage on-chip clock generator, B4, adds 400 ps of timing uncertainty.
With conventional single-edge-triggered flip-flops, the clock frequency is twice the highest data frequency, which complicates the design of clock buffers and the clock distribution network and, to a first approximation, doubles the clock power.
Signals and events

At any point in time a signal carries a symbol, a digital value,
between two points in a system
Over time the signal carries a stream of symbols, one after the
other, in sequence. We need a source of timing information to
determine where one symbol stops and the next symbol,
possibly with the same value, begins
We refer to the combination of a symbol value and the time it
starts as an event.
Periodic events: events are equally spaced in time, so we can use an internal time base to count out the intervals between symbols.
Aperiodic events: an explicit transition is required to start every symbol, because there are three possibilities:
continue sending the current symbol,
start the next symbol with the same value, and
start the next symbol with the complement value.
So either a ternary signal or two binary signals are required to encode an aperiodic binary event.
Example
Timing of a signal traveling in a digital system with a required data rate of 400 Mb/s
Constraints:
The system operates at 400 Mb/s (bit-cell time tbit = 2.5 ns) over twire = 6.25 ns (2.5 bits) of cable.
The delay of individual wires is matched to within 100 ps, the equivalent of 1.5 cm.

TIMING BUDGET: components
There are three major components of a timing budget:
The uncertainty, tu, is the difference between the nominal waveform and the early or late waveform.
The aperture time, ta, is a property of the receiver, its flip-flops, or both.
The transition time, tr, is the time required for each waveform to switch states.

Nominal timing budget
Of the 2.5-ns bit cell (the time to send one bit), 1 ns is used for the transition time, leaving a 1.5-ns eye opening.
The 300-ps aperture, or sampling window, is nominally centered on the eye, leaving a 600-ps gross timing margin on either side ((1.5 ns - 0.3 ns)/2).
This gross timing margin is the amount of timing uncertainty (skew and jitter) that either system can tolerate.

Eye diagram
An eye diagram of a signal overlays the signal's waveform over many cycles.
An eye diagram provides a visual indication of the voltage and timing uncertainty associated with a signal.
[Figure: A signal and its eye diagram, with the maximum eye opening marked]
The size of the eye opening in the center of the diagram indicates the amount of voltage and timing margin available to sample this signal.
The vertical thickness of the line bunches in the eye diagram is an indication of AC voltage noise.
The horizontal thickness of the bunches where they cross over is an indication of AC timing noise, or jitter.
If a margin rectangle, with width equal to the required timing margin and height equal to the required voltage margin, fits in the opening, then the signal has adequate margins.
Abstract eye diagram for the minimum timing budget
[Figure: three superimposed waveform pairs (nominal timing, worst-case early timing, and worst-case late timing), showing the net timing margin]
The cycle time must be large enough to account for uncertainty, aperture, and rise time.
It is the transition-time component of the timing budget that is traded off against voltage margin when fitting a margin rectangle into the eye opening.
Minimum time required
In the absence of timing noise, the minimum bit-cell width = transmitter rise time + receiver sampling window (tbit,min = tr + ta).
In the presence of timing noise, the minimum bit-cell width = transmitter rise time + receiver sampling window + timing noise (tbit,min = tr + ta + tu).
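The bit-cell formula can be evaluated with the numbers of the 400-Mb/s example. This sketch assumes tu is the total (early plus late) timing uncertainty and uses integer picoseconds to avoid rounding; the 1200-ps value is the total uncertainty the nominal budget tolerates (600 ps on each side), and the helper names are hypothetical.

```python
def min_bit_cell_ps(t_r_ps, t_a_ps, t_u_ps=0):
    """Minimum bit-cell width = rise time + sampling window (+ timing noise)."""
    return t_r_ps + t_a_ps + t_u_ps


def max_bit_rate_mbps(t_bit_ps):
    """Bit rate is the reciprocal of the bit-cell time (1e6 ps per microsecond)."""
    return 1e6 / t_bit_ps


# Numbers from the 400 Mb/s example: tr = 1 ns, ta = 300 ps.
noiseless = min_bit_cell_ps(1000, 300)        # 1300 ps
budgeted = min_bit_cell_ps(1000, 300, 1200)   # 1.2 ns of total uncertainty added
print(noiseless, max_bit_rate_mbps(noiseless))
print(budgeted, max_bit_rate_mbps(budgeted))  # 2500 400.0
```

The gap between the two results shows how much of the 2.5-ns bit cell is spent absorbing timing noise rather than transitions and sampling.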

Differences in timing convention

Example
Timing of a signal traveling in a digital system with a required data rate of 400 Mb/s
Using open-loop timing: global-clock synchronous system

The open-loop timing case has a total timing uncertainty of 2.15 ns (1.55-ns skew + 0.6-ns jitter = 2150 ps), far greater than the 1200 ps allowed for correct operation.
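The open-loop budget check can be reproduced step by step from the nominal budget figures (2.5-ns bit cell, 1-ns transition, 300-ps aperture) and the open-loop skew and jitter totals. This is an illustrative calculation in integer picoseconds; the variable names are chosen here, not taken from the source.

```python
T_BIT_PS = 2500   # 400 Mb/s bit cell
T_R_PS = 1000     # transition time
T_A_PS = 300      # aperture (sampling window)

eye_ps = T_BIT_PS - T_R_PS                     # eye opening: 1500 ps
gross_margin_ps = (eye_ps - T_A_PS) // 2       # margin on either side of the aperture: 600 ps
allowed_uncertainty_ps = 2 * gross_margin_ps   # total skew + jitter tolerated: 1200 ps

skew_ps, jitter_ps = 1550, 600                 # open-loop figures
total_uncertainty_ps = skew_ps + jitter_ps     # 2150 ps

net_margin_ps = allowed_uncertainty_ps - total_uncertainty_ps
print(eye_ps, gross_margin_ps, allowed_uncertainty_ps)   # 1500 600 1200
print(total_uncertainty_ps, net_margin_ps)               # 2150 -950
print("meets budget" if net_margin_ps >= 0 else "fails timing budget")
```

A negative net margin means the open-loop system cannot reliably sample at 400 Mb/s with this budget.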

Allowable Clock Rates
An open-loop timing approach samples at the center of the eye only at particular clock rates.

Range of bit periods: parallel communication
Range of bit periods for which the system operates properly
For best performance, we want to center the sampling clock on the eye, so the line delay must be an odd multiple of half the clock period.
For example, data delayed by 0.5 tbit or 1.5 tbit can be sampled at the center of the eye.
Cable delay: negligible
Range of bit periods: serial communication
Range of bit periods for which the system operates properly
Let N be the number of bits in flight on the wire at any given time.
For best performance, we want to center the sampling clock on the eye, so the line delay must be an odd multiple of half the clock period.
Delay the clock by the same amount as the data, plus half a bit cell.
[Figure: data bits residing at different locations along the cable]

Range of bit periods: serial communication over a long wire
Range of bit periods for serial communication (no equipotential criterion)
In Approach A, a 400-MHz clock and single-edge-triggered (SET) flip-flops are not able to give a 400-Mb/s data rate.
N larger than 3 gives non-overlapping constraints (negative net timing margin). For N = 1 there is no maximum bit-cell time.
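The centering condition (line delay an odd multiple of half a bit period) gives t_wire = (N - 1/2) tbit, so the nominally centered bit periods for the 6.25-ns cable can be listed for each N. This sketch only finds these zero-margin points; it does not compute the full allowable ranges, which shrink once skew, jitter, and aperture are included. `centered_bit_periods` is a hypothetical helper, and exact fractions are used so 6.25 ns stays exact.

```python
from fractions import Fraction


def centered_bit_periods(t_wire_ns, max_bits_in_flight):
    """Bit periods that put the sampling edge at the center of the eye.

    t_wire = (N - 1/2) * t_bit  =>  t_bit = t_wire / (N - 1/2).
    These are nominal points only; timing margins shrink the usable range.
    """
    t_wire = Fraction(t_wire_ns)   # a string such as "6.25" converts exactly
    return {n: t_wire / (n - Fraction(1, 2)) for n in range(1, max_bits_in_flight + 1)}


for n, t_bit in centered_bit_periods("6.25", 4).items():
    print(n, float(t_bit), "ns ->", round(1000 / float(t_bit)), "Mb/s")
```

For N = 3 this gives tbit = 2.5 ns, the 400-Mb/s operating point of the example; other N give widely separated rates, which is why open-loop timing works only in narrow windows of clock frequency.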
Closed-Loop Timing: Measure and Cancel Skew
All skew can be canceled by a variable delay element in the clock or data path.
The delay line is adjusted to center the clock on the eye.
To adjust the delay, we need to measure the timing.
This is usually an iterative process: measure, adjust, measure, ...
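The measure-adjust-measure loop can be sketched as a simple feedback loop. This is a behavioural model under stated assumptions: the measurement is a bang-bang phase detector that only reports early or late, each adjustment moves the delay line by one fixed step, and the loop stops inside an assumed dead band of half a step. `center_clock` and its parameters are hypothetical.

```python
def center_clock(eye_center_ps, delay_ps=0, step_ps=10, max_iterations=500):
    """Measure-adjust loop that steers a variable delay line onto the eye center."""
    history = [delay_ps]
    for _ in range(max_iterations):
        error_ps = eye_center_ps - delay_ps       # measure: > 0 means the clock is early
        if abs(error_ps) <= step_ps / 2:
            break                                 # within half a step: treat as centered
        delay_ps += step_ps if error_ps > 0 else -step_ps   # adjust by one step
        history.append(delay_ps)
    return delay_ps, history


final, steps = center_clock(eye_center_ps=730)
print(final, len(steps) - 1)   # 730 73
final, _ = center_clock(eye_center_ps=735)
print(final)                   # 730: residual error of half a step
```

The step size sets a trade-off: coarse steps converge quickly but leave a larger residual error, which ends up in the timing budget much like jitter.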
Closed-loop timing
Approach B: closed-loop timing controller

Total timing uncertainty = 450 ps (250-ps skew and 200-ps jitter), leaving a net timing margin of 150 ps.
Data rates above 400 Mb/s can be achieved.
A further advantage is that it will work at any clock rate as long as the total timing uncertainty is less than the gross timing margin.
In Approach B, a 400-MHz clock and single-edge-triggered (SET) flip-flops give a 400-Mb/s data rate.
Approach B: advantages

Approach B sends a reference clock, RClk, along with the data and uses a feedback loop to center lClk on the eye of the reference clock.
This approach is insensitive to the static phase difference, or skew, between the transmit and receive clocks.
Thus, Approach B does not require matched lines for clock distribution.
This approach is still sensitive to dynamic phase difference, or jitter, in the receive clock before the B4 buffer, and to relative skew between the data lines and the reference clock.

Design changes in B
Other modifications:
A 200-MHz clock and double-edge-triggered (DET) flip-flops give a 400-Mb/s data rate.
The DET flip-flops sample their inputs on both edges of the clock.
With this approach, the maximum toggle frequency of the clock is identical to the maximum toggle frequency of the data.
Closed-loop timing cancels most sources of skew in a timing system, obviating the need to match the delay of components or wires and thus considerably simplifying system design.
With closed-loop timing, jitter becomes the major constraint on a timing budget.
Other considerations
Other variables in the design of a timing convention
Degree of synchronization
Periodicity
Encoding of events

Timing nomenclature
Delay and Transition Times
Other definitions
Relative phase between two periodic signals
When working with periodic signals, it is often convenient to express a delay as the relative phase between two signals.
Plesiochronous signals: if two signals A and B have frequencies that differ by $\Delta f_{AB} = |f_A - f_B| \ll f_A$, then for short periods they can be treated as if they were synchronized.
Over longer periods, however, the slow drift of phase must be accounted for.

Combinational logic delays

Timing properties of an edge-triggered FF
The setup time is the delay from the data's becoming valid to the rising edge of the clock.
Because delays are specified from the 50% point of the waveform, whereas the aperture time is specified from the 10% or 90% point of the waveform, the setup time is measured from tr/2 before the beginning of the aperture to the rising edge of the clock.
Setup and hold time

Both are necessary for storing the correct value in a register.
Setup time: the data should have its valid value ready before the register samples it at the clock edge.
Hold time: do not change the data immediately after the clock edge, because the register takes time to store the correct value at its internal nodes.
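The two requirements define a forbidden window around the clock edge, which the sketch below checks for a single data transition. The function name and the 200-ps/100-ps flip-flop are hypothetical; times are measured relative to a clock edge at t = 0.

```python
def classify_sample(t_change_ps, t_su_ps, t_hold_ps):
    """Check one data transition against a clock edge at t = 0.

    t_change_ps is when the data changes, relative to the clock edge
    (negative = before the edge). The data must be stable from t_su before
    the edge until t_hold after it.
    """
    if -t_su_ps < t_change_ps <= 0:
        return "setup violation"   # new value arrived too close to the edge
    if 0 < t_change_ps < t_hold_ps:
        return "hold violation"    # data changed before the register finished storing
    return "ok"


# Hypothetical flip-flop: tsu = 200 ps, thold = 100 ps.
for t in (-500, -100, 50, 150):
    print(t, classify_sample(t, 200, 100))
```

A transition well before the edge is captured cleanly, and one well after it leaves the previous value stored; only the window from -tsu to +thold is unsafe.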

Timing Properties of a Clocked Edge-Triggered FF
The aperture offset time, tao, is the time from the center of the aperture to the rising edge of the clock.
The output remains at its previous value until at least a contamination delay, tcCQ, after the clock, and the output changes to its new value (y) at most a propagation delay, tdCQ, after the clock.
Double edge triggered

Timing Properties of a Clocked Level-Sensitive Latch
FF/latch comparison
With a flip-flop, the aperture time and delay are both referenced to the positive edge of the clock, and there is no delay property referenced to the data input.
With a latch, however, the aperture and delay timing properties are referenced to different edges of the clock, and an additional delay property is referenced to the data input.
The aperture time of a latch is referenced to the falling edge of the clock because this is the edge that samples the data.

Signalling
Encoding Signals/Events
Over time the signal carries a stream of symbols, one after the other, in sequence.
We need a source of timing information to determine where one symbol stops and the next symbol, possibly with the same value, begins.
The combination of a symbol value and the time it starts is an event.

Encoding Aperiodic Events
For events that are not periodic, an explicit transition is required to start every symbol.
A binary event cannot be encoded on a single binary signal, as there are three possibilities:
continue sending the current symbol,
start the next symbol with the same value,
start the next symbol with the complement value.
Either a ternary signal or two binary signals are required to encode a binary event.
Return-to-Zero (RZ)
Non-return-to-Zero (NRZ) Signaling
Ternary Signaling
Dual-Rail Signaling
Clocked Signaling

NRZ signaling requires remembering the old state.
RZ/NRZ
RZ: Return-to-zero signaling is advantageous in cases where power is dissipated only when the line is high. Only the positive transitions are significant, which also simplifies the decoding logic. However, RZ doubles the number of transitions required to encode the events.
NRZ: Non-return-to-zero signaling requires remembering the old state of both lines to decode the present value.
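The two-line (dual-rail) encodings can be sketched to show both points: NRZ decoding needs the previous state of both lines, and RZ uses twice as many transitions. This sketch assumes one line ("one") carries 1 events and the other ("zero") carries 0 events, and that both lines start low; all function names are hypothetical.

```python
def encode_nrz(bits):
    """Dual-rail NRZ: toggle the 'one' line for a 1 event, the 'zero' line for a 0."""
    one = zero = 0
    states = []
    for b in bits:
        if b:
            one ^= 1
        else:
            zero ^= 1
        states.append((one, zero))
    return states


def decode_nrz(states):
    """Needs the previous state of BOTH lines to see which one toggled."""
    prev, bits = (0, 0), []
    for cur in states:
        if cur != prev:                      # an event happened
            bits.append(1 if cur[0] != prev[0] else 0)
        prev = cur
    return bits


def encode_rz(bits):
    """Dual-rail RZ: pulse the selected line high, then return both lines to zero."""
    states = []
    for b in bits:
        states.append((1, 0) if b else (0, 1))
        states.append((0, 0))
    return states


def decode_rz(states):
    """Only the high pulses matter; no memory of the old state is needed."""
    return [1 if s == (1, 0) else 0 for s in states if s != (0, 0)]


def count_transitions(states):
    prev, count = (0, 0), 0
    for cur in states:
        count += (cur[0] != prev[0]) + (cur[1] != prev[1])
        prev = cur
    return count


events = [1, 1, 0, 1]
print(count_transitions(encode_nrz(events)), count_transitions(encode_rz(events)))  # 4 8
```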

Clocked
Signals that are bundled with a single clock signal are in the same clock domain. That is, all of the signals change in response to the same source of events.
Signals in the same clock domain can be combined in combinational logic, giving an output in the same clock domain (but with greater delay).
Advantage of separating the timing from the symbol: it permits bundling of several values with a single event stream.
Disadvantage: it tightens the constraints on line-to-line skew.
[Figure: Bundling in clocked signaling]

Ternary
The three states of the logic signal are needed to encode the three possibilities:
staying in the current state denotes continuing the current symbol,
changing to the higher of the two other states signals the start of a new 1 symbol, and
changing to the lower of the two other states denotes starting a new 0 symbol.
Advantage: a single wire is required.
Disadvantage: it is difficult to distinguish a transition from state 0 (the lowest state) to state 2 (the highest state) from a transition from state 0 to state 1 followed by a transition from state 1 to state 2.
NRZ: only the time interval between the crossings of the two thresholds, U0 and U1, distinguishes these two cases.
RZ: because the signal returns to zero following each event, each transition crosses only a single threshold. There is no ambiguity or timing uncertainty associated with crossing multiple thresholds in a single transition; however, it does require twice as many transitions as ternary NRZ signaling.
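The ternary NRZ rules can be sketched directly. This assumes states 0, 1, 2 are increasing voltage levels, with threshold U0 between states 0 and 1 and U1 between states 1 and 2, and that the line starts in state 0; the helper names are hypothetical.

```python
def ternary_encode(bits, start=0):
    """Ternary NRZ: move to the higher of the two other states for a 1, the lower for a 0."""
    state, levels = start, []
    for b in bits:
        others = [s for s in (0, 1, 2) if s != state]
        state = max(others) if b else min(others)
        levels.append(state)
    return levels


def ternary_decode(levels, start=0):
    """Recover each bit by comparing the new state with the state not chosen."""
    prev, bits = start, []
    for cur in levels:
        other = 3 - prev - cur          # the remaining state (0 + 1 + 2 = 3)
        bits.append(1 if cur > other else 0)
        prev = cur
    return bits


def thresholds_crossed(levels, start=0):
    """A 0 <-> 2 step crosses both U0 and U1; the other steps cross one threshold."""
    prev, crossed = start, []
    for cur in levels:
        crossed.append(abs(cur - prev))
        prev = cur
    return crossed


levels = ternary_encode([1, 1, 0])
print(levels, ternary_decode(levels), thresholds_crossed(levels))  # [2, 1, 0] [1, 1, 0] [2, 1, 1]
```

The first event (0 to 2) crosses two thresholds, which is exactly the case the receiver must tell apart from a 0-to-1 step followed by a 1-to-2 step.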

Encoding Periodic Events
If symbols are transmitted at fixed timing intervals, timing information can be determined implicitly by measuring the intervals using a local time base.
All that is required in this case is that transitions on the signal be frequent enough to keep the transmit and receive time bases from drifting too far apart.
The maximum number of bit cells between transitions that still keeps the transmit and receive time bases synchronized is given by
$$N = \frac{t_d}{\Delta f \, t_c^{2}}$$
where td is the timing margin budgeted to phase drift, Δf is the maximum frequency difference in s^-1, and tc is the bit cell time.
Example
Consider a 1-Gb/s link (tc = 1 ns) where we have budgeted 100 ps to phase drift and the two time bases are matched to Δf = 100 kHz.
For this link, N is 1000: the link can run for 1,000 bit cells without a transition before the phase drifts by more than 100 ps.
If we need to tolerate single errors on the link, we must make sure that transitions to synchronize happen at least twice this often, so that we can miss one transition without drifting out of our phase budget.
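The example can be checked numerically. The sketch uses exact rational arithmetic (Python's `fractions.Fraction`) so that 1 ns and 100 ps are represented exactly; the helper name is hypothetical.

```python
from fractions import Fraction
import math


def max_cells_between_transitions(t_d, delta_f, t_c):
    """N = t_d / (delta_f * t_c**2).

    A frequency mismatch delta_f is a fractional error of delta_f * t_c, so the
    two time bases slip by delta_f * t_c**2 per bit cell; N cells use up t_d.
    """
    return math.floor(Fraction(t_d) / (Fraction(delta_f) * Fraction(t_c) ** 2))


# 1 Gb/s link: tc = 1 ns, td = 100 ps, delta_f = 100 kHz.
n = max_cells_between_transitions(Fraction(100, 10**12), 100_000, Fraction(1, 10**9))
print(n)        # 1000
print(n // 2)   # 500: spacing that tolerates one missed transition
```

Doubling the frequency mismatch halves N, so tighter oscillator matching directly buys longer runs without transitions.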

[Figure: transmitter and receiver clocks, with a dummy Nth bit providing the transitions needed to synchronize them]
Bit stuffing
To ensure that a transition is present at least once every N bit cells, in the worst case we must consume 1/N of the channel bandwidth by inserting a transition cell that carries no data after every N - 1 data bits.
The simplest approach is to insert this transition cell whether we need to or not.
After executing a startup protocol to synchronize at the bit level, the transmitter inserts a dummy transition bit every N bits, and the receiver ignores every Nth bit.
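The "insert whether we need to or not" scheme can be sketched as below. It assumes the dummy bit is the complement of the preceding data bit, which guarantees a transition; `stuff`, `unstuff`, and `longest_run` are hypothetical helpers.

```python
def stuff(data, n):
    """Transmitter: after every n-1 data bits, insert a dummy bit that forces a transition."""
    if n < 2:
        raise ValueError("n must be at least 2")
    line = []
    for i, bit in enumerate(data, start=1):
        line.append(bit)
        if i % (n - 1) == 0:
            line.append(1 - bit)   # complement of the previous bit => guaranteed transition
    return line


def unstuff(line, n):
    """Receiver: ignore every nth bit (positions n, 2n, ... counting from 1)."""
    return [bit for i, bit in enumerate(line, start=1) if i % n != 0]


def longest_run(bits):
    """Length of the longest stretch of identical bits (no transitions)."""
    best = run = 0
    prev = None
    for bit in bits:
        run = run + 1 if bit == prev else 1
        prev = bit
        best = max(best, run)
    return best


data = [0] * 10 + [1] * 10         # worst case: long runs with no transitions
line = stuff(data, 4)
print(longest_run(data), longest_run(line))   # the stuffed line never exceeds 4
print(unstuff(line, 4) == data)               # True
```

The cost is fixed: one cell in every n carries no data, i.e. 1/N of the channel bandwidth, regardless of the data pattern.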

Open-loop Synchronous Timing
All signals are synchronized with a common clock.
Every signal potentially makes a transition within some phase range relative to a master clock signal and is sampled within a (possibly different) phase range of the clock.
Such a system is open-loop if the delays or phase relationships of the signals are uncontrolled: no delay or phase adjustments can be made to compensate for measured mismatches.

Types of OLST (open-loop synchronous timing)
Using a single global clock to which all signals are synchronized.
Using a single-phase global clock driving edge-triggered flip-flops gives maximum and minimum delay constraints, both of which depend on skew.
Operating a pipeline using a global clock gives a maximum data rate that is related to the maximum delay of the longest pipeline stage.

Using a global clock to drive edge-triggered flip-flops is a popular approach to low-end logic design.
Edge-triggered flip-flops give a simple model of sequential logic design in which all signals are in the same clock domain, the output of the flip-flop is the current state, and the input of the flip-flop is the next state.
Edge-triggered flip-flops are widely available in most SSI logic families (CMOS, TTL, and ECL).

Global clock: edge-triggered state-feedback timing

Global clock: pipeline timing
The timing of such a pipeline is optimized by generating a clock for each pipeline stage that is centered on the eye.
Such a clock is easily generated by delaying the clock from the previous stage.
As long as there is no feedback in a pipeline, the throughput of the pipeline is limited primarily by the timing uncertainty of each stage.
The nominal delay of the stages affects latency but not throughput.
This gives a maximum data rate that is determined by the largest timing uncertainty of a pipeline stage.

Open-loop global-clock synchronous system: timing parameters of case 1
[Figure: register R1 → combinational logic → register R2, clocked by CLK arriving at tCLK1 and tCLK2, annotated with tc-q, tc-q,cd, tlogic, tlogic,cd, tsu, and thold]
Assume a positive-edge-triggered system with a one-bit data transition on a wire.
The setup and hold representation is more convenient in open-loop timing systems, where tsetup figures in the maximum-delay calculation and thold figures in the minimum-delay calculation.
Timing constraint
Ideal clock
Minimum cycle time:
$T \ge t_{c-q} + t_{su} + t_{logic}$
Hold time constraint:
$t_{hold} \le t_{c-q,cd} + t_{logic,cd}$
Clock Non-idealities
Clock skew
Spatial variation in arrival time of a clock transition.
It is caused by mismatches in clock path or clock load
It can be positive or negative depending upon routing
direction and position of clock source
Clock skew does not result in clock period variation but
only in phase shift
Positive and Negative Skew
[Figure: (a) positive skew, with CLK entering at R1 and delayed toward R3, in the direction of data flow; (b) negative skew, with CLK entering at R3 and delayed toward R1, against the data flow. In both, registers R1, R2, R3 are separated by combinational logic, and the clock arrives at tCLK1, tCLK2, tCLK3.]
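Positive Skew
[Figure: timing diagram of CLK1 (edges 1 and 3) and CLK2 (edges 2 and 4), each with period TCLK, with CLK2 delayed relative to CLK1 by the skew δ]
The launching edge arrives before the receiving edge.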
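Impact of positive clock skew
[Figure: R1 → combinational logic → R2, with clock arrival times tCLK1 and tCLK2 and delays tc-q and tlogic]
Minimum cycle time:
$T + \delta \ge t_{c-q} + t_{su} + t_{logic}$
Positive skew relaxes this constraint; the worst case is when the receiving edge arrives early (negative δ).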
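Race condition
Hold time constraint:
$t_{hold} + \delta \le t_{c-q,cd} + t_{logic,cd}$
Positive skew tightens the hold constraint: if δ is too large, new data launched from R1 can race through to R2 before R2 has captured the previous value.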
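Negative Skew
[Figure: timing diagram of CLK1 (edges 1 and 3) and CLK2 (edges 2 and 4), each with period TCLK, with CLK2 arriving earlier than CLK1 by δ]

The receiving edge arrives before the launching edge.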
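Impact of negative clock skew
[Figure: R1 → combinational logic → R2, with clock arrival times tCLK1 and tCLK2 and delays tc-q and tlogic]
Minimum cycle time, with δ here denoting the magnitude of the negative skew:
$T - \delta \ge t_{c-q} + t_{su} + t_{logic}$
The receiving edge arrives early by δ, which is the worst case for the cycle time.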

No race condition
The probability of a race condition is reduced or eliminated:
$t_{hold} - \delta \le t_{c-q,cd} + t_{logic,cd}$
New data latched into R1 cannot race through to R2, because R2's clock edge arrives first and R2 has already captured the previous value.
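The setup and hold inequalities for a skewed clock can be checked together in one function. This sketch uses a signed skew (delta > 0 is positive skew, delta < 0 negative skew), which covers the T - δ form above when δ is a magnitude; the function name and all path delays are hypothetical values in picoseconds.

```python
def check_path(T, t_cq, t_su, t_logic, t_hold, t_cq_cd, t_logic_cd, delta=0):
    """Setup and hold checks for R1 -> logic -> R2 with clock skew delta.

    delta > 0: positive skew (receiving edge later); delta < 0: negative skew.
    Setup: T + delta >= t_cq + t_su + t_logic
    Hold:  t_hold + delta <= t_cq_cd + t_logic_cd
    """
    setup_ok = T + delta >= t_cq + t_su + t_logic
    hold_ok = t_hold + delta <= t_cq_cd + t_logic_cd
    return setup_ok, hold_ok


# Hypothetical path: needs 950 ps for setup; hold tolerates at most +10 ps of skew.
path = dict(T=1000, t_cq=100, t_su=50, t_logic=800, t_hold=60, t_cq_cd=40, t_logic_cd=30)
for delta in (0, 50, -100):
    print(delta, check_path(delta=delta, **path))
```

The three cases mirror the slides: zero skew passes both checks, positive skew passes setup but creates a race, and negative skew is race-free but fails the cycle-time constraint.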

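Impact of jitter: it always slows the circuit down
In the worst case, jitter makes the launching edge arrive late and the receiving edge arrive early, reducing the time available for tc-q + tlogic + tsu.
[Figure: effect of jitter on the setup and hold windows]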
Pipeline Timing: Case 2
Pipeline with a cycle time shorter than the maximum delay, using single-edge-triggered FFs
A pipeline with a cycle time shorter than the maximum delay has several data bits in flight on a single wire.
[Figure: edge-triggered FF pipeline; minimum cycle time with clock timing uncertainty]
Centering the clock on the eye gives the largest timing margins.
The minimum possible cycle time
Logic circuits that are pipelined with several bits in flight along each path are often called wave-pipelined.

Min logic delay: tcd
An output may make several transitions before reaching the steady state. The extra transitions are called hazards or glitches.
We denote the minimum time for output j to make its first transition in response to a transition on input i as the contamination delay, tcdji. It is the minimum over all input states, s.
It is the delay to the first transition caused by an input transition or, equivalently, the minimum delay over all active paths in the circuit.
It is the minimum over process, temperature, and voltage variations.

Max Logic Delay: tlogic
It is the maximum over all input states, s.
It is the delay to the last transition caused by an input transition or, equivalently, the maximum delay over all active paths in the circuit.
It is the maximum over process, temperature, and voltage variations.
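Both delays can be computed over the paths of a gate network, as sketched below. Assumptions: the network is acyclic, each gate carries a (d_min, d_max) pair already taken over process, voltage, and temperature, and every structural path is treated as active (no false-path analysis). `path_delays` and the two-path circuit are hypothetical.

```python
import math


def path_delays(edges, src, dst):
    """Return (t_cd, t_logic) from src to dst in an acyclic gate network.

    edges maps a node to a list of (successor, d_min, d_max) gate delays.
    """
    memo = {}

    def walk(node):
        if node == dst:
            return 0, 0
        if node not in memo:
            best_min, best_max = math.inf, -math.inf
            for nxt, d_min, d_max in edges.get(node, []):
                sub_min, sub_max = walk(nxt)
                if sub_min == math.inf:
                    continue                # nxt never reaches dst
                best_min = min(best_min, d_min + sub_min)
                best_max = max(best_max, d_max + sub_max)
            memo[node] = (best_min, best_max)
        return memo[node]

    return walk(src)


# Hypothetical two-path circuit (delays in ps).
gates = {"in": [("a", 10, 20), ("b", 5, 8)], "a": [("out", 15, 30)], "b": [("out", 2, 4)]}
print(path_delays(gates, "in", "out"))   # (7, 50)
```

The short path through b sets the contamination delay used in hold checks, while the long path through a sets tlogic for the cycle-time check.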

Latch-based design

Level-Sensitive Clocking
Level-Sensitive Clocking: non-overlapping clocks
Using two-phase, non-overlapping clocks with level-sensitive latches:
This method results in very robust timing that is largely insensitive to skew; it does require that logic be partitioned across the two phases, and signals from the two partitions cannot be intermingled.
Because of its advantages, two-phase clocking is the most common timing discipline in use today. Using a two-phase clock eliminates both the minimum-delay constraint and the skew sensitivity.

Nominal operation
Performance Similar to
Slack borrowing
Enhanced performance due to flexible timing, yet no design changes
It is possible for a logic block to utilize time that is left over from the previous logic block.
The total logic delay can be more than one clock cycle.
The clock rate can be higher than the worst-case critical-path delay would suggest.

Non-overlap time of zero
Register-based vs. latch-based design: example
[Figure: register-based and latch-based implementations of the same logic; the latch-based version allows a smaller Tclk]
Level-Sensitive Pipeline Timing
A starts changing with the rising edge of φi, whereas B is sampled by the falling edge of φi+1.
The minimum cycle time

2-phase non-overlap clocking

CL1 can borrow slack from CL2 when CL2 has a very small delay.
The maximum-delay constraint puts a limit on slack borrowing.
If the non-overlapping region is 0, then the two back-to-back latches are equivalent to an edge-triggered flip-flop.
Time borrowing
Time borrowing can be performed across clock cycles as well as between the two phases of one cycle.
In general, a two-phase system will operate properly if, for every cycle or closed loop in the logic that crosses N pairs of latches, the total delay of the combinational logic around the loop is less than N clock cycles minus the latch delays.
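The loop condition can be written as a one-line check. This is a sketch under stated assumptions: the caller supplies the combinational delays around one closed loop, the number of latch pairs it crosses, and the delay of each latch on the loop; `two_phase_loop_ok` and the 5-ns latches are hypothetical.

```python
def two_phase_loop_ok(logic_delays, latch_pairs, t_clk, latch_delays):
    """Loop condition for a two-phase latch design.

    A closed loop that crosses `latch_pairs` pairs of latches works if the total
    combinational delay around it is less than latch_pairs clock cycles minus the
    delays of the latches on the loop.
    """
    return sum(logic_delays) < latch_pairs * t_clk - sum(latch_delays)


# Hypothetical loop through 2 latch pairs (4 latches of 5 ns each) at Tclk = 100 ns.
print(two_phase_loop_ok([60, 70, 30], 2, 100, [5] * 4))   # True: 160 < 180
print(two_phase_loop_ok([90, 80, 20], 2, 100, [5] * 4))   # False: 190 >= 180
```

Note that the first loop passes even though individual blocks exceed half a clock cycle: with time borrowing, only the total around the loop matters.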

Effect of skew
If a late φ1 overlaps an early φ2, then a minimum-delay constraint is required to meet the hold-time requirements on the falling edge of φ1.
Advantages
Performance advantage: it enables flexible timing by allowing one stage to pass slack to, or borrow time from, other stages.
A logic block can utilize time that is left over from the previous block (slack borrowing).
This is due to the transparency of the latch during its on time.
Drawback
Clock signals must be planned carefully.
CLB delays must be designed taking into account the on/off conditions of the latches.
Register-based design, Tclk = 125 ns
Assume the registers have zero delay.
[Figure: In → R1 → CLB A (75 ns) → CLB B (50 ns) → R2 → CLB C → CLB D → R3 → Out, clocked by Clk with T = 125 ns]
Latch-based design, case 1 (bad design): L1 active high, L2 active low, Tclk = 100 ns
[Figure: In → L1 → CLB A (75 ns) → L2 → CLB B (50 ns) → L3 → CLB C (25 ns) → L4 → CLB D (25 ns) → L5 → Out, with 50-ns Clk high and low phases and wait/idle intervals marked]
Latch-based design, case 2
L1: HI, L2: LO, Tclk = 100 ns
Max slack: 1/2 Tclk
Latch-based design, case 3
L1: LO, L2: HI, Tclk = 100 ns
[Figure: latch pipeline with CLB A (25 ns), CLB B (25 ns), CLB C (75 ns), CLB D (50 ns) and the Clk hi/lo waveform; annotations: 50 ns slack, borrow]
Latch-based design, case 4
L1: LO, L2: HI, Tclk = 100 ns
[Figure: latch pipeline with CLB A (75 ns), CLB B (50 ns), CLB C (75 ns), CLB D (50 ns) and the Clk hi/lo waveform; marked WRONG]
Maximum slack possible
The max time that can be borrowed is 0.5 Tclk
So the max logic delay in one cycle can be 1.5 Tclk
But for n stages the overall delay can still be at most n Tclk
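The borrowing rules above can be traced for one data item. This is a hypothetical model, not the lecture's method: 50% duty cycle, zero latch delay, latch k transparent from k·Tclk/2 to (k+1)·Tclk/2, and data leaving latch 0 at t = 0. Data that arrives early waits for its latch to open; data that arrives while the latch is open passes straight through and the excess is borrowed from the next stage.

```python
from dataclasses import dataclass


@dataclass
class LatchEvent:
    latch: int          # index of the latch fed by this CLB
    arrival_ns: float   # when data reaches the latch D input
    depart_ns: float    # when data leaves the latch Q output
    borrowed_ns: float  # time used after the latch opened (0 if data waited)


def simulate_latch_pipeline(clb_delays_ns, t_clk_ns, t_setup_ns=0.0):
    """Trace one data item through alternating two-phase latches.

    Hypothetical model: 50% duty cycle, zero latch delay, latch k is
    transparent from k*Tclk/2 to (k+1)*Tclk/2, data leaves latch 0 at t = 0.
    """
    half = t_clk_ns / 2
    depart = 0.0
    trace = []
    for k, delay in enumerate(clb_delays_ns, start=1):
        opens, closes = k * half, (k + 1) * half
        arrival = depart + delay
        if arrival + t_setup_ns > closes:
            raise ValueError(
                f"latch {k}: data arrives at {arrival} ns, latch closes at {closes} ns")
        depart = max(arrival, opens)  # wait if early, pass through if transparent
        trace.append(LatchEvent(k, arrival, depart, max(0.0, arrival - opens)))
    return trace


for event in simulate_latch_pipeline([25, 25, 75, 50], t_clk_ns=100):
    print(event)
```

Under this model two CLBs between latch 0 and latch 2 can together take up to 1.5 Tclk (for example 100 ns + 50 ns at Tclk = 100 ns), matching the bound above. With the delays listed for case 4 (75, 50, 75, 50 ns), data reaches latch 3 exactly as it closes, so any non-zero setup time makes the model report a failure.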
Drawbacks
We have to use a two-phase clocking scheme
Glitches: power dissipation increases
Single-Phase or Zero-Nonoverlap Clocking
Pipelines With Feedback
Often it is necessary to feed back information between pipeline stages.
For example, in many types of pipelines, including communication channels, flow-control information must propagate backwards from a late pipeline stage to an early pipeline stage. All the previous concepts can be applied here, with care.
Asynchronous systems: self-timed approach
Sync. systems
Logical ordering of events by the clock; it provides a time base
Physical timing constraint: the next edge comes when all blocks have reached steady state
Problem: a CLB has to wait even though it may finish earlier; a clock distribution network is needed
Asynch. design: meeting constraints
Adv.: the next block can start computation as soon as the previous block has finished.
Problem: when to latch the output? When is the output a correct value?
Remedy: the system has to meet timing constraints
Local signals
Logical ordering and physical timing:
REQUEST, ACKNOWLEDGE: logical ordering
START, DONE: physical timing

Self-timed system
The system generates its own timing signals
Handshake protocol
Handshaking: synchronize by mutual agreement
Adv.: timing signals are generated locally; less propagation delay, high speed, no clock routing
Disadv.: handshaking circuit design

2-phase protocol
Simple and fast
Transition signaling or event logic
Data transfer happens at both edges (rising and falling)
No reset state
Implementation of the handshake protocol: 2-phase (no return to zero)
[Figure: (a) sender-receiver configuration with Req, Ack and Data wires between SENDER and RECEIVER; (b) timing diagram over cycle 1 and cycle 2 showing the sender's and receiver's actions]
2-phase protocol (NRZ)
Data change → request → data acceptance → acknowledge
Data change → request → acknowledge → data acceptance
Proceeds in cyclic order

2 active cycles:
Sender: terminated by the request event (no change in output possible)
Receiver: terminated by the acknowledge event
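The event order listed above can be traced with a zero-delay model. This is a sketch under simplifying assumptions: sender and receiver take turns inside one loop, no wire or gate delays are modeled, and the function name is hypothetical.

```python
def two_phase_transfer(items):
    """Zero-delay trace of a 2-phase (NRZ) bundled-data handshake.

    Each transfer costs one transition on req and one on ack; the wires
    never return to zero, so rising and falling edges both carry events.
    """
    req = ack = 0
    received, events = [], []
    for item in items:
        data = item                 # sender changes the data first
        req ^= 1                    # request event: data is valid
        events.append(("req", req))
        received.append(data)       # receiver accepts the data
        ack ^= 1                    # acknowledge event: data taken
        events.append(("ack", ack))
    return received, events


received, events = two_phase_transfer(["x", "y"])
print(received)  # ['x', 'y']
print(events)    # [('req', 1), ('ack', 1), ('req', 0), ('ack', 0)]
```

The second transfer is signalled by falling transitions, which is why no reset state is needed.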
4-phase protocol
[Figure: 4-phase timing diagram with Req, Ack and Data over cycle 1 and cycle 2; the sender's and receiver's actions are numbered]
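A 4-phase handshake returns both wires to zero after every transfer. The zero-delay sketch below assumes the usual return-to-zero sequence (req up, ack up, req down, ack down); wire delays are not modeled and the function name is hypothetical.

```python
def four_phase_transfer(items):
    """Zero-delay trace of a 4-phase (return-to-zero) handshake.

    Each transfer costs four transitions: req up, ack up, req down, ack down.
    Only the rising edges signal a transfer; the falling edges reset the wires.
    """
    req = ack = 0
    received, events = [], []
    for item in items:
        data = item
        req = 1                     # sender: data valid
        events.append(("req", req))
        received.append(data)       # receiver: data accepted
        ack = 1
        events.append(("ack", ack))
        req = 0                     # sender: return to zero
        events.append(("req", req))
        ack = 0                     # receiver: ready for the next item
        events.append(("ack", ack))
    return received, events


received, events = four_phase_transfer(["x"])
print(events)  # [('req', 1), ('ack', 1), ('req', 0), ('ack', 0)]
```

The two extra transitions per transfer are the return-to-zero overhead that a 2-phase scheme avoids.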
Dual-rail coding
1 bit of information is coded using two wires
Request/done is merged with the data wires

Bundled data / single-rail coding
Req. should be issued when the data output is stable
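Because each dual-rail bit uses two wires, the receiver can tell from the data itself when a word is complete; this is how request/done merges into the data wires. A minimal sketch: the wire ordering (true wire, false wire) and the (0, 0) "empty" spacer follow a common 4-phase dual-rail convention and are assumptions, not details from the slide; the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Assumed convention: (true_wire, false_wire); (0, 0) is the empty spacer
# between code words and (1, 1) is not a legal code.
DualRail = Tuple[int, int]


def encode(bit: int) -> DualRail:
    return (1, 0) if bit else (0, 1)


def decode(pair: DualRail) -> Optional[int]:
    true_wire, false_wire = pair
    if true_wire and false_wire:
        raise ValueError("(1, 1) is not a valid dual-rail code")
    if true_wire:
        return 1
    if false_wire:
        return 0
    return None  # spacer: no data has arrived on this bit yet


def word_done(word: Sequence[DualRail]) -> bool:
    """Completion detection: the word is valid once every bit is non-empty."""
    return all(decode(pair) is not None for pair in word)


word = [encode(b) for b in (1, 0, 1)]
print(word, word_done(word))        # [(1, 0), (0, 1), (1, 0)] True
print(word_done([(1, 0), (0, 0)]))  # False
```

The cost of this self-describing data is the doubled wire count; bundled data keeps one wire per bit but needs a separate, delay-matched request.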

Correct signaling
For correct functionality, the signaling events must be proper
For this we require a handshaking module: the Muller C-element
An event must occur on both inputs to create an output, i.e., both req and ack should be high to enable the next function block

Event Logic: The Muller C-Element
(a) Schematic: [Figure: C-element with inputs A, B and output F]
(b) Truth table:
A B | F(n+1)
0 0 | 0
0 1 | F(n)
1 0 | F(n)
1 1 | 1
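The truth table translates directly into a behavioral model: the output follows the inputs when they agree and holds F(n) when they differ. A minimal sketch; the class name is illustrative and no transistor-level behavior is modeled.

```python
class CElement:
    """Behavioral Muller C-element: output follows the inputs when they agree
    and holds its previous value F(n) when they differ."""

    def __init__(self, initial: int = 0):
        self.out = initial

    def step(self, a: int, b: int) -> int:
        if a == b:
            self.out = a
        return self.out


c = CElement()
print([c.step(a, b) for a, b in [(0, 1), (1, 1), (1, 0), (0, 0)]])  # [0, 1, 1, 0]
```

The output only rises after both inputs have risen and only falls after both have fallen, which is the "event on both inputs" rule.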

Asynchronous pipeline example: transition signaling
The simplest asynchronous pipeline, in which the LOGIC blocks are just wires, acts as a first-in first-out (FIFO) buffer.
Here, data is clocked into the stages by the input request signal and clocked out by the output acknowledge signal.
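The FIFO behavior comes from the control chain. A common way to model it, assumed here because the slide does not show the circuit, is a Muller pipeline: a chain of C-elements where stage i sees the previous stage's output and the inverted output of the next stage. Data latches are omitted, and the function names are hypothetical; an item occupies stage i when its output differs from the next stage's.

```python
def settle_muller_pipeline(c, req_in, ack_out):
    """Settle a chain of C-elements used as 2-phase FIFO control.

    Stage i has inputs c[i-1] (req_in for the first stage) and the
    inverse of c[i+1] (ack_out for the last stage). Zero-delay model:
    fire any enabled stage until nothing changes.
    """
    c = list(c)
    changed = True
    while changed:
        changed = False
        for i in range(len(c)):
            left = req_in if i == 0 else c[i - 1]
            right = ack_out if i == len(c) - 1 else c[i + 1]
            # C(left, not right) changes only when both inputs agree,
            # i.e. when left != right, and then the output becomes left.
            if left != right and c[i] != left:
                c[i] = left
                changed = True
    return c


def full_stages(c, ack_out):
    """Stage i holds an item when its output differs from the next one."""
    nxt = list(c[1:]) + [ack_out]
    return [ci != ni for ci, ni in zip(c, nxt)]


state = settle_muller_pipeline([0, 0, 0], req_in=1, ack_out=0)  # item 1 enters
print(state, full_stages(state, ack_out=0))  # [1, 1, 1] [False, False, True]
state = settle_muller_pipeline(state, req_in=0, ack_out=0)      # item 2 enters
print(state, full_stages(state, ack_out=0))  # [0, 0, 1] [False, True, True]
state = settle_muller_pipeline(state, req_in=0, ack_out=1)      # consumer takes item 1
print(state, full_stages(state, ack_out=1))  # [0, 0, 0] [False, False, True]
```

Each request transition ripples forward until it meets an unacknowledged item, so items queue up at the output end in arrival order.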
Dual rail
Bundled data
2-phase bundled data (micropipelines)

2-Phase Bundled Data Protocol: FIFO

Capture-Pass transition-controlled latch
Transitions on C and P alternate
Micropipelines
Elegant, no RZ overhead
But the implementation (latches and other control circuits) is complex
Mousetrap pipeline

2-phase bundled data
Simple asynchronous implementation style; uses transparent latches
Simple control: 1 gate per pipeline stage
Target datapath = static logic blocks
MOUSETRAP uses a capture protocol:
Latches are normally transparent before new data arrives
They become opaque after data arrives (capture data)
Control signaling: transition signaling = 2-phase
Simple protocol: req/ack = only 2 events per handshake (not 4)
No return-to-zero
Each transition (up/down) signals a distinct operation
Our goal: very fast cycle time and simple inter-stage communication
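A minimal behavioral sketch of the one-gate controller, assuming (as in the published MOUSETRAP design) that a stage's latch enable is the XNOR of its own latched request ("done") and the next stage's acknowledge. Only one stage is modeled, delays are ignored, and the class and method names are hypothetical.

```python
class MousetrapStage:
    """One MOUSETRAP stage: a transparent latch plus a single XNOR controller."""

    def __init__(self):
        self.done = 0      # latched copy of the incoming 2-phase request
        self.data = None   # latched data

    def enabled(self, ack_next: int) -> bool:
        # XNOR: the latch is transparent while done equals the next stage's ack.
        return self.done == ack_next

    def offer(self, req_in: int, data_in, ack_next: int) -> None:
        if self.enabled(ack_next):
            # Transparent: request and data pass straight through; a new
            # request toggles done, which makes the latch opaque (capture).
            self.done = req_in
            self.data = data_in


stage = MousetrapStage()
stage.offer(req_in=1, data_in="x", ack_next=0)  # new item arrives and is captured
print(stage.data, stage.enabled(ack_next=0))    # x False
stage.offer(req_in=0, data_in="y", ack_next=0)  # next item must wait
print(stage.data)                               # x
stage.offer(req_in=0, data_in="y", ack_next=1)  # next stage has acknowledged
print(stage.data, stage.enabled(ack_next=1))    # y False
```

Because the latch is already transparent when data arrives, the forward path costs only a latch delay, while the XNOR closes it right behind the captured item.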
Dual rail
Memory element design
PERFORMANCE PARAMETERS
CLOCK LOAD
NO. OF TRANSISTORS
CLOCKING SCHEME
Latch versus Register
Latch: stores data when the clock is low/HIGH
Register: stores data when the clock rises
[Figure: latch and register symbols (D, Clk, Q) with Clk, D and Q waveforms]
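The difference shows up clearly on sampled waveforms. A minimal sketch, assuming a positive latch (transparent while the clock is high) and a rising-edge register; the clock and data samples are made up for illustration and the function names are hypothetical.

```python
def positive_latch(clk, d, q0=0):
    """Sampled model: transparent while clk == 1, holds while clk == 0."""
    q, out = q0, []
    for c, x in zip(clk, d):
        if c == 1:
            q = x
        out.append(q)
    return out


def rising_edge_register(clk, d, q0=0):
    """Sampled model: q takes d only on a 0 -> 1 clock transition."""
    q, out = q0, []
    prev = clk[0] if clk else 0
    for c, x in zip(clk, d):
        if prev == 0 and c == 1:
            q = x
        out.append(q)
        prev = c
    return out


clk = [0, 1, 1, 0, 0, 1, 1, 0]
d = [1, 1, 0, 0, 0, 0, 1, 1]
print(positive_latch(clk, d))        # [0, 1, 0, 0, 0, 0, 1, 1]
print(rising_edge_register(clk, d))  # [0, 1, 1, 1, 1, 0, 0, 0]
```

The latch passes the change of D at samples 2 and 6 because the clock is high; the register ignores it because no rising edge occurs there.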
Storage Mechanisms
Static
Dynamic (charge-based)
Static: Mux-Based Latch (1)
[Figure: mux-based latch with input D, output Q, select CLK]
$Q = \overline{CLK}\cdot Q + CLK\cdot D$
CLK LOAD: 4, 2-PHASE CLOCKING, 10 TRANSISTORS
Mux-Based Latch (2): LESS CLK LOAD
CLK LOAD: 2, 2-PHASE CLOCKING, 6 TRANSISTORS
Mux-Based Latch (3): LESS CLK LOAD, Vt DEGRADATION
Non-overlapping clocks
NMOS only

Master-Slave (Edge-Triggered) Register
[Figure: master mux latch (D → QM) followed by slave mux latch (QM → Q), both selected by CLK]
Two opposite latches trigger on the edge
Also called a master-slave latch pair
Master-Slave Register
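Composing two opposite latches gives edge-triggered behavior. A sampled-waveform sketch, assuming the master is transparent while CLK is low and the slave while CLK is high; the function names and waveforms are hypothetical.

```python
def level_latch(clk, d, transparent_level, q0=0):
    """Sampled latch: follows d while clk == transparent_level, else holds."""
    q, out = q0, []
    for c, x in zip(clk, d):
        if c == transparent_level:
            q = x
        out.append(q)
    return out


def master_slave_register(clk, d):
    # Master is transparent while CLK is low, slave while CLK is high, so
    # Q can only change when CLK goes high: a rising-edge register.
    qm = level_latch(clk, d, transparent_level=0)
    return level_latch(clk, qm, transparent_level=1)


clk = [0, 1, 1, 0, 0, 1, 1, 0]
d = [1, 1, 0, 0, 0, 0, 1, 1]
print(master_slave_register(clk, d))  # [0, 1, 1, 1, 1, 0, 0, 0]
```

Q takes the value D had just before each rising edge and ignores D while CLK is high, because the master is then opaque.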
Multiplexer-based latch pair
[Figure: transmission-gate implementation with inverters I1 to I6 and transmission gates T1 to T4; D → QM (master) → Q (slave), clocked by CLK]
TIMING METRICS
T_setup = I1 + T1 + I3 + I2
T_CLK-Q = T3 + I6
T_hold ≈ 0
EXACT VALUES CAN BE OBTAINED THROUGH SIMULATION
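These metrics feed the usual register-based constraint Tclk ≥ T_CLK-Q + T_logic + T_setup. A sketch with hypothetical delay values (real values come from simulation, as noted above); the dictionary and function names are illustrative.

```python
# Hypothetical element delays in ps (I = inverter, T = transmission gate).
DELAYS_PS = {"I1": 20, "I2": 20, "I3": 20, "I6": 20, "T1": 15, "T3": 15}


def setup_time(d):
    # T_setup = I1 + T1 + I3 + I2
    return d["I1"] + d["T1"] + d["I3"] + d["I2"]


def clk_to_q(d):
    # T_CLK-Q = T3 + I6
    return d["T3"] + d["I6"]


def min_clock_period(d, t_logic_max_ps):
    # Launch (clk-to-Q) + slowest logic path + capture (setup) must fit in one cycle.
    return clk_to_q(d) + t_logic_max_ps + setup_time(d)


print(setup_time(DELAYS_PS), clk_to_q(DELAYS_PS))       # 75 35
print(min_clock_period(DELAYS_PS, t_logic_max_ps=400))  # 510
```

With T_hold ≈ 0, the minimum-delay (hold) check is easy to satisfy for this register, so the maximum-delay constraint dominates.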
Reduced Clock Load Master-Slave Register
SIZING IMPORTANT: REVERSE CONDUCTION
I2 must be weak
When the slave is on, reverse conduction is possible
TIMING METRICS
T_setup = T1 + I1
T_CLK-Q = T2 + I3
T_hold ≈ 0 (or T1)
EXACT VALUES CAN BE OBTAINED THROUGH SIMULATION
[Figure: (a) schematic diagram with nodes D, A, X, B, Q, clocked by CLK and its complement; (b) overlapping clock pairs]
Avoiding Clock Overlap
Non-overlapping phases
TIMING METRICS
T_setup = T1 + I1
T_CLK-Q = T2 + I3
T_hold ≈ 0 (or T1)
EXACT VALUES CAN BE OBTAINED THROUGH SIMULATION
Overpowering the Feedback Loop
Cross-Coupled Pairs
NOR-based set-reset
[Figure: cross-coupled NOR gates with inputs S, R and outputs Q, Q']
S R | Q Q'
0 0 | Q Q' (hold)
1 0 | 1 0
0 1 | 0 1
1 1 | 0 0 (forbidden state)
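The table can be reproduced by iterating the two NOR equations, Q = NOR(R, Q') and Q' = NOR(S, Q), until the outputs stop changing. A zero-delay sketch in which both gates update together (an assumption); the function names are hypothetical.

```python
def nor(a: int, b: int) -> int:
    return int(not (a or b))


def sr_nor_latch(s, r, q, q_bar, max_iter=10):
    """Iterate Q = NOR(R, Q') and Q' = NOR(S, Q) until the pair is stable.

    Zero-delay model with both gates updating together (an assumption).
    """
    for _ in range(max_iter):
        new_q, new_q_bar = nor(r, q_bar), nor(s, q)
        if (new_q, new_q_bar) == (q, q_bar):
            return q, q_bar
        q, q_bar = new_q, new_q_bar
    raise RuntimeError("outputs did not settle (race)")


state = (0, 1)
for s, r in [(1, 0), (0, 0), (0, 1), (1, 1)]:
    state = sr_nor_latch(s, r, *state)
    print(s, r, state)
```

Starting from the forbidden (0, 0) output and releasing S and R together to 0 makes this model oscillate between (1, 1) and (0, 0) and raise, a zero-delay picture of why S = R = 1 is avoided.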
Cross-Coupled NAND
Added clock
Cross-coupled NANDs
[Figure: clocked cross-coupled NAND latch with transistors M1 to M8, inputs S, R and CLK, outputs Q and Q', supply VDD]
This is not used in datapaths any more, but it is a basic building block of memory cells
Dynamic registers
TIMING METRICS
T_setup = T1
T_CLK-Q = I1 + T2 + I2
T_hold ≈ 0 (or T1)
EXACT VALUES CAN BE OBTAINED THROUGH SIMULATION
IN OVERLAP:
OVERLAPS
Other Latches/Registers: C²MOS
[Figure: C²MOS register; master stage (M1 to M4, node X, load CL1) and slave stage (M5 to M8, output Q, load CL2), clocked by CLK and its complement, supply VDD]
Keepers can be added to make the circuit pseudo-static
Insensitive to Clock Overlap
Dual-edge registers
Single-phase clock
Latches/Registers: TSPC
Including Logic in TSPC
[Figure: TSPC latches with embedded logic: a pull-up network (PUN) and a pull-down network (PDN) with inputs In1 and In2, clocked by CLK, output Q, supply VDD]
Example: logic inside the latch (AND latch)
Reduced complexity
TSPC Register
[Figure: TSPC register with transistors M1 to M9, internal nodes X and Y, input D, output Q, single clock CLK, supply VDD]
Pulse-Triggered Latches: An Alternative Approach
Ways to design an edge-triggered sequential cell:
Master-slave latches (L1, L2)
Pulse-triggered latch (L)
[Figure: master-slave latch pair L1, L2 and a single pulse-triggered latch L, each with D, Q and Clk inputs, plus the Clk waveforms]
Pulsed register: avoids race, single latch
[Figure: (a) register (M1 to M6, clocked by CLKG, input D, output Q, supply VDD); (b) glitch generation (MP, MN, node X, input CLK, output CLKG); (c) glitch clock waveforms CLK and CLKG]
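The glitch clock can be pictured at the logic level as CLK ANDed with a delayed, inverted copy of itself, giving a short pulse after each rising edge whose width is set by the delay. This is a common pulse-generator idea assumed here for illustration; the slide's generator is a transistor circuit (MP, MN, node X), and the function name is hypothetical.

```python
def glitch_clock(clk, delay):
    """CLKG = CLK AND NOT(CLK delayed by `delay` samples).

    Produces a pulse `delay` samples wide after each rising edge of CLK.
    Assumes CLK held its first value before sample 0.
    """
    out = []
    for t, c in enumerate(clk):
        delayed = clk[t - delay] if t >= delay else clk[0]
        out.append(int(c == 1 and delayed == 0))
    return out


clk = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
print(glitch_clock(clk, delay=2))  # [0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
```

The pulse must be long enough to write the latch yet short enough that new data cannot race through it within the same pulse.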
Pulsed Latches
[Figure: pulsed latch with transistors M1 to M6 and P1 to P3, input D, internal node x, output Q, clocks CLK and CLKD]
Sense-amplifier-based register
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956