
Advanced VLSI Design

MEL G623
BITS Pilani, Pilani Campus
Anu Gupta, EEE

Introduction
HANDOUT

http://nalanda.bits-pilani.ac.in/



Prerequisite

VLSI DESIGN
Digital system engineering
Engineering problems of composing circuits into systems are
only briefly touched upon



Digital system engineering

Architecture deals with organizing a system and defining interfaces (e.g., instruction sets and channel protocols) to achieve cost and performance goals.

System-level engineering constrains what the architect can do and is a major determinant of the cost and performance of the resulting system.


Engineering view of digital systems
Speed, pin count
Signalling conventions
Timing and synchronization
Power distribution
Noise
Communication

IC-to-IC, block-to-block, and board-to-board communication
Logic-to-logic communication

Data transmission


Data can be sent either serially, one bit after another over a single wire, or in parallel, multiple bits at a time over several wires. The most familiar examples of the two paradigms are the PC "serial port" and "parallel port".

Early parallel transmission schemes were often much faster than serial schemes (more wires = more data per unit time), but at the cost of added hardware complexity (more wires, more complicated transmitters and receivers). Parallel transmission is now mainly reserved for applications such as a CPU bus, or links between ICs that are physically very close to each other, usually within a few centimeters.

Serial data transmission is much more common in new communication protocols because it reduces the I/O pin count and hence the cost. Common serial protocols include SPI and I2C. Serial links can also be clocked at much higher rates per wire, which tends to outweigh the primary advantage of parallel transmission.

Serial protocols are used for longer-distance communication, ranging from shared external devices such as a digital camera to global networks and even interplanetary links to space probes. Some recent CPU bus architectures also use serial methods.

Parallel vs. serial I/O


For years, parallel communication schemes offered clear advantages for moving data quickly from chip to chip, board to board, or system to system.

Parallel I/O can be made to work at high speed, but only by applying significant engineering resources. Meeting the stringent specifications of the PCI-X standard for rise and fall times, drive strengths, path delays, and skews, for example, has proven so expensive that the standard has been adopted only in high-end applications such as computer servers.

Other problems include
Simultaneous switching outputs (SSO): when too many outputs switch simultaneously, ground bounce creates a lot of noise
EMI
Cost


Industry analysts agree that the High-Speed Serial Initiative is inevitable because parallel I/O schemes reach physical limitations when data rates begin to exceed just 1 Gb/s.

Serial I/O-based designs offer many advantages over traditional parallel implementations, including
fewer device pins,
reduced board space requirements,
fewer printed circuit board (PCB) layers,
easier layout of the PCB,
smaller connectors,
lower electromagnetic interference, and
better noise immunity.


Chip-to-chip and board-to-board communication can use either parallel or serial I/O. Serial communication is preferred because it requires fewer pins and has fewer problems, but it requires serializer/deserializer circuits.

Within a chip, parallel communication is not expensive because wire lengths are very small and no I/O pads are required.

Pin count is the first problem encountered when trying to move a lot of data in and out of a chip. The number of input and output pins is always limited. Although pin count tends to increase over time, it is never enough to keep up.

BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956


Engineer's view of a digital system


View a digital system in terms of information flow, power flow, and timing.

Bandwidth
Bandwidth - a range of frequencies
Analog - measured in Hz
Bandwidth = highest frequency - lowest frequency
Spectrum - a synonym, used only in analog measurements
Bandwidth in the digital realm - often used to refer to bits per second (bps)
1 Bps (byte per second) = 8 bps

Bit Rate

Most digital signals are aperiodic.
Period and frequency are not appropriate to describe digital signals.
Bit interval - the time to send one bit
Bit rate - the number of bits sent in a second, measured in bits per second (bps)
Do NOT use Hz when you mean bps, or vice versa.

Total die bandwidth is the product of signal pin count and bandwidth per pin.
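
As a rough illustration of the definitions above, the following Python sketch converts a bit rate into a bit interval and multiplies a pin count by a per-pin bandwidth. The function names, the pin count, and the per-pin rate are assumptions chosen only for illustration.

```python
def bit_interval_s(bit_rate_bps: float) -> float:
    """Time to send one bit, in seconds, for a bit rate given in bits per second."""
    return 1.0 / bit_rate_bps


def die_bandwidth_bps(signal_pins: int, bandwidth_per_pin_bps: float) -> float:
    """Total die bandwidth = signal pin count x bandwidth per pin."""
    return signal_pins * bandwidth_per_pin_bps


if __name__ == "__main__":
    # Hypothetical chip: 200 signal pins, each running at 400 Mb/s.
    print(bit_interval_s(400e6))          # bit interval in seconds
    print(die_bandwidth_bps(200, 400e6))  # total bandwidth in bits per second
```

Note that both quantities are expressed in bits per second or seconds, never in Hz.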

System design issues

Timing and synchronization


Signalling
Power Distribution
Noise
Need for timing
A timing convention governs when a transmitter drives symbols onto the signal line and when they are sampled by the receiver.

A timing convention may be:

Periodic: a new symbol is driven onto a signal line at regular time intervals. Here we may use a local clock source to determine when the next symbol is to arrive.

Aperiodic: new symbols arrive at irregular times. Here an explicit transition is required to signal the arrival of each symbol. This transition may be on a separate clock line that may be shared among several signals (bundled signaling).

Three basic timing models for communication between two ICs

1. A common clock is applied to both the transmitting and the receiving block.
2. A copy of the clock is sent along with the data. The output time of the forwarded clock is adjusted so that the clock transitions in the middle of the bit cell.
3. The data stream contains both the data and the clock.

Use of a global clock: synchronous system, periodic timing

Many systems, or parts of systems, use a global clock to which all signals in the subsystem are synchronized.

All signals change in response to a transition on the clock and are sampled in an aperture region surrounding a clock transition.

Using a global clock is convenient in a small subsystem because it places all signals in a single clock domain, obviating the need for the synchronization that would otherwise be required in moving between clock domains.


Drawbacks of a global clock
In such a system the maximum data rate is limited by the maximum delay of the system rather than only by the uncertainty in delay.

If any cables or logic modules have a delay longer than one clock period, the system will operate only over certain (often narrow) windows of clock frequencies.

The clock edge usually arrives earlier than the data, because the data is delayed. Hence a global clock is usually not centered on the eye of the signals it is sampling, and this convention tends to be less tolerant of timing noise.


Good timing convention

A good timing convention is one that manages timing noise (skew and jitter) efficiently, allows the system to operate over a wide range of data rates, and achieves the maximum data rate.

Data rate


The rate at which we can send data over a line or through a block of combinational logic is governed by

the timing properties (delay) of the transmitter and receiver, and
timing noise (uncertainty) such as skew and jitter.

The rise (transition) time of the transmitter determines how fast a new symbol can be put on the line. The signal must then remain stable during the sampling window of the receiver to be reliably detected.

The minimum bit cell is widened further by timing noise in the system. Timing noise consists of uncertainty in the arrival time of the signal at the receiver and uncertainty in the timing of the transmit and receive clocks. It must be compensated for by widening the bit cell to allow for the worst-case timing plus a margin.

The larger the uncertainty, the smaller the achievable data rate.

Pipeline timing convention

A pipeline timing convention overcomes the limitations of a global clock by generating a clock for each data signal that is nominally centered on the eye of the data signal.

The clock for the data at the output of a module is typically generated by delaying the clock used at the input of the module by the nominal delay of the module plus one-half clock cycle.

Types of timing convention

A timing convention may operate either open loop or closed loop.

In an open-loop timing system, the frequencies and delays associated with system timing are not subject to control. The system must be designed to tolerate the worst-case variation in these parameters.

With closed-loop timing, one or more system timing parameters (delays and frequencies) are actively controlled. The system measures a timing parameter such as skew and uses feedback control to adjust the variable parameters to reduce the skew.

This active control can greatly decrease the timing uncertainty in a system and hence increase the maximum data rate if pipeline timing is employed.


Open-loop, global-clock synchronous system

A 400 MHz clock is distributed to the transmitter and receiver from a master clock generator over transmission lines that are matched to 100 ps. A single-stage buffer at the clock generator, B1, introduces a timing uncertainty of 100 ps, and the four-stage on-chip clock generator, B4, adds 400 ps of timing uncertainty.

With conventional single-edge-triggered flip-flops, the clock frequency is twice the highest data frequency, which complicates the design of clock buffers and the clock distribution network and, to a first approximation, doubles the clock power.

Signals and events

At any point in time a signal carries a symbol, a digital value, between two points in a system.
Over time the signal carries a stream of symbols, one after the other, in sequence. We need a source of timing information to determine where one symbol stops and the next symbol, possibly with the same value, begins.
We refer to the combination of a symbol value and the time it starts as an event.

Periodic events: equally spaced in time, so we can use an internal time base to count out the intervals between symbols.

Aperiodic events: an explicit transition is required to start every symbol. There are three possibilities:
continue sending the current symbol,
start the next symbol with the same value, and
start the next symbol with the complement value.

So either a ternary signal or two binary signals are required to encode an aperiodic binary event.


Example
Timing of a signal traveling in a digital system with a required data rate of 400 Mb/s

Constraints
The system operates at 400 Mb/s (bit-cell time tbit = 2.5 ns) over twire = 6.25 ns (2.5 bits) of cable.

The delay of individual wires is matched to within 100 ps, the equivalent of 1.5 cm.


Timing budget: components
Three major components of a timing budget

The uncertainty, tu, is the difference between the nominal waveform and the early or late waveform.

The aperture time, ta, is a property of the receiver, its flip-flops, or both.

The transition time, tr, is the time required for each waveform to switch states.


Nominal timing budget
Of the 2.5-ns bit cell (the time to send one bit), 1 ns is used for transition time, leaving a 1.5-ns eye opening.
The 300-ps aperture, or sampling window, is nominally centered on the eye, leaving a 600-ps gross timing margin on either side.
This gross timing margin is the amount of timing uncertainty (skew and jitter) that a system can tolerate.
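
The arithmetic of the nominal budget can be checked with a short Python sketch. The function name is illustrative; the numbers are the ones quoted in the budget (2.5-ns bit cell, 1-ns transition time, 300-ps aperture), expressed in picoseconds.

```python
def gross_timing_margin_ps(t_bit_ps: int, t_r_ps: int, t_a_ps: int) -> float:
    """Margin on each side of the aperture when it is centered on the eye."""
    eye_opening_ps = t_bit_ps - t_r_ps       # part of the bit cell where data is stable
    return (eye_opening_ps - t_a_ps) / 2     # split what is left equally, early and late


if __name__ == "__main__":
    print(gross_timing_margin_ps(2500, 1000, 300))  # margin per side, in ps
```

With these inputs the eye opening is 1500 ps and the margin on each side is 600 ps, matching the figures above.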


Eye diagram
An eye diagram of a signal overlays the signal's waveform over many cycles.
An eye diagram provides a visual indication of the voltage and timing uncertainty associated with a signal.

(Figure: a signal and its eye diagram)
(Figure: maximum eye opening)

The size of the eye opening in the center of the diagram indicates the amount of voltage and timing margin available to sample this signal.

The vertical thickness of the line bunches in the eye diagram is an indication of AC voltage noise.

The horizontal thickness of the bunches where they cross over is an indication of AC timing noise, or jitter.

If a margin rectangle with width equal to the required timing margin and height equal to the required voltage margin fits in the opening, then the signal has adequate margins.

Abstract eye diagram for the minimum timing budget

Three superimposed waveform pairs:
nominal timing,
worst-case early timing,
worst-case late timing.

(Figure label: net timing margin)

The cycle time must be large enough to account for uncertainty, aperture, and rise time.
It is the transition time component of the timing budget that is traded off against voltage margin when fitting a margin rectangle into the eye opening.

Minimum time required
In the absence of timing noise,
minimum bit-cell width = transmitter rise time + receiver sampling window.

In the presence of timing noise,
minimum bit-cell width = transmitter rise time + receiver sampling window + timing noise.

Differences between timing conventions

Example
Timing of a signal traveling in a digital system with a required data rate of 400 Mb/s, using open-loop timing in a global-clock synchronous system


Open-loop, global-clock synchronous system
The open-loop timing case has a total timing uncertainty of 2.15 ns (1.55-ns skew + 0.6-ns jitter), far greater than the 600-ps gross timing margin allowed for correct operation.
Allowable Clock Rates
An open-loop timing approach will only sample at the center of
the eye at clock rates



Range of bit periods: parallel communication
Range of bit periods for which the system operates properly

For best performance, we want to center the sampling clock on the eye, so the line delay must be an odd multiple of half the clock period.

The data can be sampled 0.5 tbit, 1.5 tbit, and so on after it is driven.

Cable delay: negligible

Range of bit periods: serial communication
Range of bit periods for which the system operates properly
N is the number of bits in flight on the wire at any given time.

For best performance, we want to center the sampling clock on the eye, so the line delay must be an odd multiple of half the clock period.
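
A small Python sketch of the centering condition, under the assumption that a clock edge falls mid-eye when the line delay equals (N - 1/2) bit periods, i.e. an odd multiple of half a bit period. The function name is illustrative; the 6.25-ns delay is the cable delay from the 400 Mb/s example.

```python
def centered_bit_periods_ns(line_delay_ns: float, max_bits_in_flight: int) -> dict:
    """Bit periods for which line_delay = (N - 1/2) * t_bit, for N = 1..max."""
    return {
        n: line_delay_ns / (n - 0.5)
        for n in range(1, max_bits_in_flight + 1)
    }


if __name__ == "__main__":
    for n, t_bit in centered_bit_periods_ns(6.25, 4).items():
        print(f"N = {n}: t_bit = {t_bit:.3f} ns")
```

For the 6.25-ns cable, N = 3 gives a 2.5-ns bit period, the same 400 Mb/s operating point used in the example. The sketch only lists the ideal centers; the usable range around each center depends on the timing uncertainty.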
Delay the clock by the same amount as the data plus half a bit cell.
Several data bits reside on the cable at different locations along it.

Range of bit periods: serial communication over a long wire
Range of bit periods for serial communication (no equipotential criterion)

In Approach A, a 400-MHz clock and single-edge-triggered (SET) flip-flops are not able to give a 400-Mb/s data rate.

N larger than 3 gives non-overlapping constraints (negative net timing margin). For N = 1 there is no maximum bit-cell time.

Closed-loop timing: measure and cancel skew

All skew can be canceled by a variable delay element in the clock or data path.
The delay line is adjusted to center the clock on the eye.
To adjust the delay, the timing must be measured.
This is usually an iterative process: measure, adjust, measure, ...
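
The measure-adjust-measure loop can be sketched in Python as follows. This is a toy model, not a description of any particular circuit: `measure_offset_ps` stands in for a hypothetical phase detector, and the gain, tolerance, and 730-ps eye center are made-up values.

```python
def center_clock(measure_offset_ps, initial_delay_ps=0.0, gain=0.5,
                 tolerance_ps=1.0, max_iterations=100):
    """Iteratively adjust a variable delay until the clock is centered on the eye.

    measure_offset_ps(delay) is a hypothetical phase detector: it returns how far
    (in ps) the clock edge is from the eye center for the given delay setting,
    positive when the clock is late.
    """
    delay = initial_delay_ps
    for _ in range(max_iterations):
        offset = measure_offset_ps(delay)   # measure
        if abs(offset) <= tolerance_ps:
            break                           # close enough to the eye center
        delay -= gain * offset              # adjust, then measure again
    return delay


if __name__ == "__main__":
    eye_center_ps = 730.0                   # made-up position of the eye center
    print(center_clock(lambda d: d - eye_center_ps))
```

Because each iteration removes a fraction of the measured offset, static skew of any size is driven toward zero, while fast variations (jitter) are not corrected by such a loop.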
Closed-loop timing: Approach B
(Figure: closed-loop timing controller)


The total timing uncertainty is 450 ps (250-ps skew and 200-ps jitter), leaving a net timing margin of 150 ps.
A data rate greater than 400 Mb/s can be achieved.

A further advantage is that the system will work at any clock rate as long as the total timing uncertainty is less than the gross timing margin.

In Approach B, a 400-MHz clock and single-edge-triggered (SET) flip-flops give a 400-Mb/s data rate.
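
The two approaches can be compared with a few lines of Python. The skew and jitter figures are those quoted for Approach A (1550 ps, 600 ps) and Approach B (250 ps, 200 ps). The minimum bit-cell formula assumes the uncertainty applies on both the early and the late side, which is consistent with a 600-ps margin on either side of the aperture but is an assumption of this sketch.

```python
def net_timing_margin_ps(gross_margin_ps: int, skew_ps: int, jitter_ps: int) -> int:
    """Gross margin minus total timing uncertainty (skew + jitter)."""
    return gross_margin_ps - (skew_ps + jitter_ps)


def min_bit_cell_ps(t_r_ps: int, t_a_ps: int, uncertainty_ps: int) -> int:
    """Transition time + aperture + uncertainty on both sides (assumed form)."""
    return t_r_ps + t_a_ps + 2 * uncertainty_ps


if __name__ == "__main__":
    gross = 600  # ps, from the nominal timing budget
    print("Approach A net margin:", net_timing_margin_ps(gross, 1550, 600))
    print("Approach B net margin:", net_timing_margin_ps(gross, 250, 200))
    t_bit = min_bit_cell_ps(1000, 300, 450)
    print("Approach B min bit cell (ps):", t_bit)
    print("Approach B max data rate (Mb/s):", 1e6 / t_bit)
```

Approach A ends up with a negative net margin, while Approach B keeps 150 ps in hand, so under these assumptions its bit cell could shrink below 2.5 ns.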

Advantages of Approach B

Approach B sends a reference clock, RClk, along with the data and uses a feedback loop to center the clock lClk on the eye of the reference clock.

This approach is insensitive to the static phase difference, or skew, between the transmit and receive clocks.

Thus, Approach B does not require matched lines for clock distribution.

This approach is still sensitive to dynamic phase difference, or jitter, in the receive clock before the B4 buffer and to relative skew between the data lines and the reference clock.


Design changes in B
Other modifications
A 200-MHz clock and double-edge-triggered (DET) flip-flops give a 400-Mb/s data rate.

The DET flip-flops sample their inputs on both edges of the clock.

With this approach the maximum toggle frequency of the clock is identical to the maximum toggle frequency of the data.

Closed-loop timing cancels most sources of skew in a timing system, obviating the need to match the delay of components or wires and thus considerably simplifying system design.

With closed-loop timing, jitter becomes the major constraint on a timing budget.

Other considerations
Other variables in the design of a timing convention
Degree of synchronization
Periodicity
Encoding of events


Timing nomenclature
Delay and transition times
Other definitions
Relative phase between two periodic signals: when working with periodic signals, it is often convenient to express a delay as the relative phase between the two signals.

Plesiochronous signals: if two signals A and B have frequencies that differ by $\Delta f_{AB} = |f_A - f_B| \ll f_A$, then for short periods they can be treated as if they were synchronized.

Over longer periods, however, the slow drift of phase must be accounted for.


Combinational logic delays



Timing properties of an edge-triggered FF
The setup time is the delay from the data becoming valid to the rising edge of the clock.

Delays are specified from the 50% point of the waveform, whereas the aperture time is specified from the 10% or 90% point of the waveform. The setup time therefore runs from tr/2 before the beginning of the aperture to the rising edge of the clock.

Setup and hold time
Necessary for storing the correct value in a register

Setup time: the data should have its valid value ready when the register opens.

Hold time: do not change the data immediately after the edge has arrived, because the register takes time to store the correct value at its intermediate nodes.


Timing properties of a clocked edge-triggered FF

The aperture offset time, tao, is the time from the center of the aperture to the rising edge of the clock.

The output remains at its previous value until at least a contamination delay, tcCQ, after the clock, and the output changes to its new value (y) at most a propagation delay, tdCQ, after the clock.

Double edge triggered

Timing properties of a clocked level-sensitive latch

FF / latch comparison
With a flip-flop, the aperture time and delay are both referenced to the positive edge of the clock, and there is no delay property referenced to the data input.

With a latch, however, the aperture and delay timing properties are referenced to different edges of the clock, and an additional delay property is referenced to the data input.

The aperture time of a latch is referenced to the falling edge of the clock because this is the edge that samples the data.

Signalling
Encoding signals and events
Over time the signal carries a stream of symbols, one after the other, in sequence.

We need a source of timing information to determine where one symbol stops and the next symbol, possibly with the same value, begins.

The combination of a symbol value and the time it starts is an event.


Encoding aperiodic events
For events that are not periodic, an explicit transition is required to start every symbol.

A binary event cannot be encoded on a single binary signal, as there are three possibilities:
continue sending the current symbol,
start the next symbol with the same value,
start the next symbol with the complement value.

Either a ternary signal or two binary signals are required to encode a binary event.

Encoding options (figure panels):
Return-to-zero (RZ) signaling
Non-return-to-zero (NRZ) signaling
Ternary signaling
Dual-rail signaling
Clocked signaling


RZ / NRZ
RZ: return-to-zero signaling is advantageous in cases where power is dissipated only when the line is high.
Only the positive transitions are significant.
It also simplifies the decoding logic.
It doubles the number of transitions required to encode the events.

NRZ: non-return-to-zero signaling requires remembering the old state of both lines to decode the present value.
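
A minimal Python sketch of the RZ/NRZ trade-off, assuming a dual-rail scheme with two wires. The rail names `a0` and `a1`, and the convention that activity on `a1` signals a 1 symbol and activity on `a0` signals a 0 symbol, are illustrative assumptions.

```python
def encode_nrz(bits):
    """Dual-rail NRZ: toggle rail a1 for a 1 symbol, rail a0 for a 0 symbol."""
    a0, a1 = 0, 0
    states = [(a0, a1)]
    for b in bits:
        if b:
            a1 ^= 1
        else:
            a0 ^= 1
        states.append((a0, a1))
    return states


def decode_nrz(states):
    """The decoder must compare each state with the previous one (old state of both rails)."""
    bits, prev = [], states[0]
    for cur in states[1:]:
        if cur[1] != prev[1]:
            bits.append(1)
        elif cur[0] != prev[0]:
            bits.append(0)
        prev = cur
    return bits


def encode_rz(bits):
    """Dual-rail RZ: pulse the selected rail high, then return both rails to zero."""
    states = [(0, 0)]
    for b in bits:
        states.append((0, 1) if b else (1, 0))
        states.append((0, 0))
    return states


def decode_rz(states):
    """Only positive (rising) transitions are significant."""
    bits, prev = [], states[0]
    for cur in states[1:]:
        if cur[1] == 1 and prev[1] == 0:
            bits.append(1)
        elif cur[0] == 1 and prev[0] == 0:
            bits.append(0)
        prev = cur
    return bits


def count_transitions(states):
    """Total number of rail toggles in a sequence of (a0, a1) states."""
    return sum((p[0] != c[0]) + (p[1] != c[1]) for p, c in zip(states, states[1:]))


if __name__ == "__main__":
    data = [1, 1, 0, 1, 0, 0]
    print(count_transitions(encode_nrz(data)), count_transitions(encode_rz(data)))
```

For any input, the RZ stream uses twice as many transitions as the NRZ stream, while its decoder only needs to watch for rising edges.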


Clocked signaling
Signals that are bundled with a single clock signal are in the same clock domain; that is, all of the signals change in response to the same source of events.

Signals in the same clock domain can be combined in combinational logic, giving an output in the same clock domain (but with greater delay).

Advantage of separating the timing from the symbol: it permits bundling of several values with a single event stream.
Disadvantage: it tightens the constraints on line-to-line skew.

(Figure: bundling in clocked signaling)

Ternary signaling
The three states of the logic signal are needed to encode the three possibilities:
staying in the current state denotes continuing the current symbol,
changing to the higher of the two other states signals the start of a new 1 symbol,
changing to the lower of the two other states denotes starting a new 0 symbol.

Advantage: only a single wire is required.

Disadvantage: it is difficult to distinguish a single transition from state 0 (the lowest state) to state 2 (the highest state) from two successive transitions, 0 to 1 and then 1 to 2.

NRZ: only the time interval between the crossing of the two thresholds U0 and U1 distinguishes these two cases.

RZ: because the signal returns to zero following each event, each transition crosses only a single threshold. There is no ambiguity or timing uncertainty associated with crossing multiple thresholds in a single transition; however, it does require twice the number of transitions as ternary NRZ signaling.
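
The ternary NRZ rule can be written out as a short Python sketch. The state numbering 0, 1, 2 (lowest to highest) follows the text; the function names and the starting state are assumptions for illustration.

```python
def encode_ternary_nrz(bits, start_state=1):
    """Each new symbol moves the line to one of the two other states:
    the higher one for a 1 symbol, the lower one for a 0 symbol."""
    state = start_state
    states = [state]
    for b in bits:
        others = sorted(s for s in (0, 1, 2) if s != state)
        state = others[1] if b else others[0]
        states.append(state)
    return states


def decode_ternary_nrz(states):
    """Staying in the same state continues the current symbol; a change starts a new one."""
    bits = []
    for prev, cur in zip(states, states[1:]):
        if cur == prev:
            continue
        others = sorted(s for s in (0, 1, 2) if s != prev)
        bits.append(1 if cur == others[1] else 0)
    return bits


if __name__ == "__main__":
    line = encode_ternary_nrz([1, 1, 0, 0, 1])
    print(line)
    print(decode_ternary_nrz(line))
```

Starting from state 0, a 1 symbol jumps directly to state 2, which is the double-threshold crossing that makes NRZ ternary decoding timing-sensitive.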


Encoding periodic events
If symbols are transmitted at fixed time intervals, timing information can be determined implicitly by measuring the intervals using a local time base.

All that is required in this case is that transitions on the signal be frequent enough to keep the transmit and receive time bases from drifting too far apart.

The maximum number of bit cells between transitions needed to synchronize the transmit and receive time bases is given by

$$N = \frac{t_d}{\Delta f \, t_c^2}$$

where $t_d$ is the timing margin budgeted to phase drift, $\Delta f$ is the maximum frequency difference in s$^{-1}$, and $t_c$ is the bit-cell time.

Example
Consider a 1 Gb/s link (tc = 1 ns) where we have budgeted 100 ps to phase drift and the two time bases are matched to Δf = 100 kHz.

For this link, N is 1000: the link can run for 1,000 bit cells without a transition before the phase drifts by more than 100 ps.

If we need to tolerate single errors on the link, we must make sure that synchronizing transitions happen at least twice this often, so that we can miss one transition without drifting out of our phase budget.
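
The example can be reproduced with a short Python sketch. It assumes the drift per bit cell is Δf · tc², an expression chosen because it is dimensionally consistent and reproduces N = 1000 for the numbers above; the function name is illustrative.

```python
def max_cells_between_transitions(t_d_s: float, delta_f_hz: float, t_c_s: float) -> float:
    """N = t_d / (delta_f * t_c**2): bit cells before drift uses up the budget t_d."""
    drift_per_cell_s = delta_f_hz * t_c_s ** 2   # assumed phase drift per bit cell
    return t_d_s / drift_per_cell_s


if __name__ == "__main__":
    n = max_cells_between_transitions(100e-12, 100e3, 1e-9)
    print(round(n))      # cells allowed between transitions
    print(round(n / 2))  # spacing needed to tolerate one missed transition
```

Halving N, as in the last line, corresponds to making transitions twice as frequent so that one of them can be lost.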


(Figure: transmitter and receiver clocks, with a dummy Nth bit providing the transitions to synchronize)

Bit stuffing
To ensure that a transition is present at least once every N bit cells, in the worst case we must consume 1/N of the channel bandwidth by inserting a transition cell that carries no data after every N - 1 data bits.

The simplest approach is to insert this transition cell whether we need to or not. After executing a startup protocol to synchronize to the bit level, the transmitter inserts a dummy transition bit every N bits, and the receiver ignores every Nth bit.
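
A minimal Python sketch of unconditional bit stuffing. It assumes the dummy bit is the complement of the data bit before it, which is one simple way to guarantee a transition; the slides do not specify the dummy bit's value. Bit-level synchronization is taken as already established.

```python
def stuff(data_bits, n):
    """After every n - 1 data bits, insert a dummy bit that forces a transition."""
    out = []
    for i, b in enumerate(data_bits):
        out.append(b)
        if (i + 1) % (n - 1) == 0:
            out.append(1 - b)   # assumed choice: complement of the preceding bit
    return out


def unstuff(line_bits, n):
    """The receiver ignores every nth bit."""
    return [b for i, b in enumerate(line_bits) if (i + 1) % n != 0]


def longest_run(bits):
    """Length of the longest run of identical consecutive bits."""
    best = run = 0
    prev = None
    for b in bits:
        run = run + 1 if b == prev else 1
        best = max(best, run)
        prev = b
    return best


if __name__ == "__main__":
    data = [0] * 12
    line = stuff(data, 4)
    print(line, longest_run(line))
    print(unstuff(line, 4) == data)
```

The cost is fixed: one cell in every n carries no data, so 1/n of the channel bandwidth is spent on synchronization.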


Open-loop synchronous timing
All signals are synchronized with a common clock.

Every signal potentially makes a transition within some phase range relative to a master clock signal and is sampled within a (possibly different) phase range of the clock.

Such a system is open loop if the delays or phase relationships of the signals are uncontrolled.

No delay or phase adjustments can be made to compensate for measured mismatches.


Types of open-loop synchronous timing (OLST)
Using a single global clock to which all signals are synchronized.

Using a single-phase global clock driving edge-triggered flip-flops gives maximum and minimum delay constraints, both of which depend on skew.

Operating a pipeline using a global clock gives a maximum data rate that is related to the maximum delay of the longest pipeline stage.


Using a global clock to drive edge-triggered flip-flops is a popular approach to low-end logic design.

The edge-triggered flip-flop gives a simple model of sequential logic design in which all signals are in the same clock domain, the output of the flip-flop is the current state, and the input of the flip-flop is the next state.

Edge-triggered flip-flops are widely available in most SSI logic families (CMOS, TTL, and ECL).


Global clock: edge-triggered state-feedback timing


Global clock: pipeline timing

The timing of such a pipeline is optimized by generating a clock for each pipeline stage that is centered on the eye.

Such a clock is easily generated by delaying the clock from the previous stage.

As long as there is no feedback in a pipeline, the throughput of the pipeline is limited primarily by the timing uncertainty of each stage.

The nominal delay of the stages affects latency but not throughput.

Pipeline timing gives a maximum data rate that is determined by the largest timing uncertainty of a pipeline stage.


Open-loop, global-clock synchronous system: timing parameters of case 1
(Figure: input In drives register R1, whose output passes through combinational logic to register R2. R1 is clocked at tCLK1 and R2 at tCLK2. Register parameters: tc-q, tc-q,cd, tsu, thold. Logic parameters: tlogic, tlogic,cd.)

Assume a positive-edge-triggered system with one data bit in transit on a wire.
The setup and hold representation is more convenient in open-loop timing systems, where tsu figures in the maximum delay calculation and thold figures in the minimum delay calculation.

Timing constraint
Ideal clock

Minimum cycle time:
T > tc-q + tsu + tlogic

Hold time constraint:
thold < t(c-q,cd) + t(logic,cd)

Clock non-idealities
Clock skew
Spatial variation in the arrival time of a clock transition.
It is caused by mismatches in the clock path or clock load.
It can be positive or negative depending on the routing direction and the position of the clock source.
Clock skew does not result in clock period variation, only in a phase shift.

Positive and negative skew
(Figure (a), positive skew: registers R1, R2, R3 separated by combinational logic blocks; CLK enters at the R1 end and reaches tCLK1, tCLK2, tCLK3 through successive delays.)
(Figure (b), negative skew: the same pipeline, with CLK entering at the R3 end and reaching tCLK3, tCLK2, tCLK1 through successive delays.)

Positive skew
(Figure: waveforms of CLK1 and CLK2 with period TCLK; edges 1 and 3 on CLK1, edges 2 and 4 on CLK2; th marked.)

The launching edge arrives before the receiving edge.

Impact of positive clock skew
(Figure: registers R1 and R2 separated by combinational logic, clocked at tCLK1 and tCLK2; parameters tc-q, tc-q,cd, tlogic, tlogic,cd, tsu, thold.)

Minimum cycle time:
T + δ = tc-q + tsu + tlogic

The worst case for the cycle time is when the receiving edge arrives early.

Race condition
Hold time constraint:
thold + δ < t(c-q,cd) + t(logic,cd)

Negative skew
(Figure: waveforms of CLK1 and CLK2 with period TCLK; edges 1 and 3 on CLK1, edges 2 and 4 on CLK2.)

The receiving edge arrives before the launching edge.

Impact of negative clock skew
(Figure: registers R1 and R2 separated by combinational logic, clocked at tCLK1 and tCLK2; parameters tc-q, tc-q,cd, tlogic, tlogic,cd, tsu, thold.)

Minimum cycle time (δ is the magnitude of the skew):
T - δ = tc-q + tsu + tlogic

Worst case is when the receiving edge arrives early (positive δ).


No race condition
The probability of a race condition is reduced or eliminated:
thold - δ < t(c-q,cd) + t(logic,cd)

The system never fails, because new data latched into R1 is never transferred to R2 in the same cycle: R2 has already turned off by the time the data arrives.
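
The effect of skew on both constraints can be sketched in Python. Here `skew` is a signed quantity, tCLK2 - tCLK1, so positive values mean the receiving edge arrives later; this sign convention and all delay values (in ps) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    t_cq: int        # clock-to-Q propagation delay
    t_cq_cd: int     # clock-to-Q contamination (minimum) delay
    t_logic: int     # maximum logic delay
    t_logic_cd: int  # minimum (contamination) logic delay
    t_su: int        # setup time of the receiving register
    t_hold: int      # hold time of the receiving register


def min_cycle_time(p: PathTiming, skew: int) -> int:
    """From T + skew = t_cq + t_su + t_logic, solved for T."""
    return p.t_cq + p.t_su + p.t_logic - skew


def hold_ok(p: PathTiming, skew: int) -> bool:
    """Race-free if t_hold + skew < t_cq_cd + t_logic_cd."""
    return p.t_hold + skew < p.t_cq_cd + p.t_logic_cd


if __name__ == "__main__":
    path = PathTiming(t_cq=200, t_cq_cd=100, t_logic=2000,
                      t_logic_cd=300, t_su=100, t_hold=150)
    for skew in (-200, 0, 200, 300):
        print(skew, min_cycle_time(path, skew), hold_ok(path, skew))
```

With these made-up numbers, positive skew shortens the minimum cycle time but eventually violates the hold constraint (a race), while negative skew lengthens the cycle time and leaves the hold constraint comfortably satisfied.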


Impact of jitter: always slows the system down
(Figure: register with input In and a clock subject to jitter; parameters tsu, thold, tjitter.)

Pipeline timing, case 2

A pipeline with a cycle time shorter than the maximum delay has several data bits in flight on a single wire.

Single-edge-triggered FF

Min. cycle time with clock timing uncertainty

Centering the clock on the eye gives the largest timing margins.

The minimum possible cycle time

Logic circuits that are pipelined with several bits in flight along each path are often called wave-pipelined.


Min logic delay: tcd
An output may make several transitions before reaching the steady state. The extra transitions are called hazards or glitches.

We denote the minimum time for output j to make its first transition in response to a transition on input i as the contamination delay $t_{cd,ji}$. It is the minimum over all input states, s.

It is the delay to the first transition caused by an input transition, or equivalently, the minimum delay over all active paths in the circuit.

It is the minimum over process, temperature, and voltage variations.


Max Logic Delay: Tlogic
It is the maximum over all input states, s.

It is the delay to the last transition caused by an input transition, or equivalently, the maximum delay over all active paths in the circuit.

It is the maximum over process, temperature, and voltage variations.
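The "minimum over all paths" and "maximum over all paths" definitions above can be sketched on a small gate graph. This is only an illustration: the graph, node names and delays are hypothetical, every path is assumed to be active (no false-path analysis), and each edge carries a (fastest corner, slowest corner) delay pair to stand in for process, temperature and voltage variation.

```python
from functools import lru_cache

# Hypothetical netlist: node -> list of (successor, (min_delay_ps, max_delay_ps)).
GRAPH = {
    "in": [("g1", (20, 30)), ("g2", (10, 15))],
    "g1": [("out", (25, 40))],
    "g2": [("g3", (10, 20)), ("out", (15, 25))],
    "g3": [("out", (30, 45))],
    "out": [],
}


def path_delays(graph, src, dst):
    """Return (t_cd, t_logic): min and max delay over all paths src -> dst."""

    @lru_cache(maxsize=None)
    def walk(node):
        if node == dst:
            return (0, 0)
        best = None
        for nxt, (dmin, dmax) in graph[node]:
            sub = walk(nxt)
            if sub is None:          # successor never reaches dst
                continue
            cand = (dmin + sub[0], dmax + sub[1])
            if best is None:
                best = cand
            else:
                best = (min(best[0], cand[0]), max(best[1], cand[1]))
        return best

    return walk(src)


t_cd, t_logic = path_delays(GRAPH, "in", "out")
print(t_cd, t_logic)
```

Here the contamination delay comes from the short path in -> g2 -> out and the maximum logic delay from the long path in -> g2 -> g3 -> out.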

Latch based design


Level-Sensitive Clocking: non-overlapping clocks
Using two-phase, non-overlapping clocks with level-sensitive latches.

This method results in very robust timing that is largely insensitive to skew. It does require that logic be partitioned across the two phases, and signals from the two partitions cannot be intermingled.

Because of its advantages, two-phase clocking is the most common timing discipline in use today. Using a two-phase clock eliminates both the minimum delay constraint and the skew sensitivity.


Nominal operation

Performance Similar to

Slack borrowing
Enhanced performance due to flexible timing, with no design changes.
A logic block can use time that is left over from the previous logic block.
Total logic delay can be more than one clock cycle.
Clock rates can be higher than the worst-case critical path delay would otherwise allow.

Non-overlapping phase: zero
Reg based vs. latch based: example
[Figure: register-based and latch-based implementations of the same datapath; the latch-based version needs a smaller Tclk.]

Level-Sensitive Pipeline Timing
A starts changing with the rising edge of φi, whereas B is sampled by the falling edge of φi+1.

The minimum cycle time

2 phase Non-overlap Clocking

CL1 can borrow slack from CL2 when CL2 has a very small delay.
The maximum delay constraint puts a limit on slack borrowing.
If the non-overlapping region is 0, then the two back-to-back latches are equivalent to an edge-triggered flip-flop.

Time borrowing
Time borrowing can be performed across clock cycles as well as between the two phases of one cycle.
In general, a two-phase system will operate properly if, for every cycle or closed loop in the logic that crosses N pairs of latches, the total delay of the combinational logic around the loop is less than N clock cycles less the latch delays.
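The loop condition above can be written as a one-line check. This is a minimal sketch, assuming that each latch pair contributes two latch delays of equal size `t_latch`; the function name and the numbers are hypothetical.

```python
def two_phase_loop_ok(logic_delays, n_latch_pairs, t_cycle, t_latch):
    """True when the logic around a loop fits in N cycles less the latch delays.

    Assumption: the loop crosses 2 * N latches, each with delay t_latch.
    """
    budget = n_latch_pairs * t_cycle - 2 * n_latch_pairs * t_latch
    return sum(logic_delays) < budget


# A loop crossing two latch pairs with Tclk = 100 ns and 5 ns latches.
print(two_phase_loop_ok([75, 50, 25, 25], n_latch_pairs=2, t_cycle=100, t_latch=5))
print(two_phase_loop_ok([75, 50, 75, 50], n_latch_pairs=2, t_cycle=100, t_latch=5))
```

The check only looks at the total around the loop. Individual blocks may still exceed half a cycle as long as neighbouring blocks leave enough slack to borrow.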


Effect of skew
If a late φ1 meets an early φ2, a minimum delay constraint is required to meet the hold-time requirement on the falling edge of φ1.

Advantages
Performance advantage: enables flexible timing by allowing one stage to pass slack to, or borrow time from, other stages.
A logic block can use time that is left over from the previous block (slack borrowing).
This is due to the transparency of the latch during its on time.

Drawback
Clock signals have to be planned carefully.
CLB delays have to be designed taking into account the on/off conditions of the latches.
Register based design, Tclk = 125 ns
Assume the registers have zero delay.
[Figure: In -> R1 -> CLB A (75 ns) -> CLB B (50 ns) -> R2 -> CLB C -> CLB D -> R3 -> Out, with nodes a to e and a common Clk. T = 125 ns.]
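Latch based design, case 1 (bad design)
L1: HI, L2: LO, Tclk = 100 ns
[Figure: L1 -> CLB A (75 ns) -> L2 -> CLB B (50 ns) -> L3 -> CLB C (25 ns) -> L4 -> CLB D (25 ns) -> L5 -> Out, with nodes a to e. The clock is high for 50 ns and low for 50 ns. The timing diagram for nodes b, c, d and e shows data waiting idle at a latch.]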
Latch based design, case 2
L1: HI, L2: LO, Tclk = 100 ns
Max slack: 1/2 Tclk
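Latch based design, case 3
L1: LO, L2: HI, Tclk = 100 ns
[Figure: L1 -> CLB A (25 ns) -> L2 -> CLB B (25 ns) -> L3 -> CLB C (75 ns) -> L4 -> CLB D (50 ns) -> L5 -> Out, with nodes a to e. The clock is high for 50 ns and low for 50 ns. The timing diagram for nodes b, c, d and e marks where slack is borrowed.]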
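Latch based design, case 4
L1: LO, L2: HI, Tclk = 100 ns
[Figure: L1 -> CLB A (75 ns) -> L2 -> CLB B (50 ns) -> L3 -> CLB C (75 ns) -> L4 -> CLB D (50 ns) -> L5 -> Out, with nodes a to e. The clock is high for 50 ns and low for 50 ns. The timing diagram is marked WRONG.]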
Maximum slack possible
The maximum time that can be borrowed is 0.5 Tclk.
So the maximum logic cycle delay can be 1.5 Tclk.
But for n stages the overall delay cannot exceed n Tclk.
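The latch-based cases above can be explored with a small single-token simulation. This is a sketch under stated assumptions, not the method used on the slides: latches have zero delay, the first latch opens at t = 0, each following latch opens half a cycle after its predecessor and stays transparent for half a cycle, and data must arrive no later than `t_setup` before a latch closes. The function name `propagate` and the `t_setup` parameter are hypothetical.

```python
def propagate(delays, t_clk, t_setup=0):
    """Follow one data item through alternating-phase latches.

    Returns the list of departure times from each latch, or None if the
    data misses a transparency window (setup violation).
    """
    half = t_clk / 2
    depart = 0.0                  # data leaves L1 the moment L1 opens
    times = [depart]
    for k, delay in enumerate(delays, start=1):
        opens = k * half          # next latch opens half a cycle later
        closes = opens + half
        arrival = depart + delay
        if arrival > closes - t_setup:
            return None           # too late: borrowed more than half a cycle
        depart = max(arrival, opens)   # wait if early, pass through if late
        times.append(depart)
    return times


print(propagate([75, 50, 25, 25], t_clk=100))   # case 1 delays
print(propagate([25, 25, 75, 50], t_clk=100))   # case 3 delays
print(propagate([80, 80, 80], t_clk=100))       # borrows too much
```

In this model a block may take up to a full Tclk between two adjacent latches (its own half cycle plus half a cycle borrowed), but the running total can never get ahead of the clock. With the case 4 delays the data reaches a latch exactly as it closes, so any nonzero setup time makes the check fail.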
Drawbacks
A two-phase clocking scheme has to be used.
Glitches: power dissipation increases.

Single-Phase or Zero Nonoverlap Clocking
Pipelines With Feedback

Often it is necessary to feed back information between pipeline stages.

For example, in many types of pipelines, including communications channels, flow-control information must propagate backwards from a late pipeline stage to an early pipeline stage. All the previous concepts can be applied here with care.

Asynchronous systems: self-timed approach
Sync. systems

Logical ordering of events is set by the clock, which provides a time base.

Physical timing constraint: the next edge comes only when all blocks have reached steady state.

Problem: a CLB has to wait even though it may finish earlier. A clock distribution network is needed.

Asynch. design: meeting constraints
Advantage: the next block can start computation as soon as the previous block has finished.

Problem: when to latch the output? When is the output a correct value?

Remedy: the system has to meet timing constraints.

Local signals
Logical ordering and physical timing:
REQUEST, ACKNOWLEDGE: logical ordering
START, DONE: physical timing

Self timed system
The system generates its own timing signals.
Handshake protocol
Handshaking: synchronize by mutual agreement.

Advantages: timing signals are generated locally, so there is less propagation delay, higher speed, and no clock routing.

Disadvantage: handshaking circuits have to be designed.

2 phase protocol
Simple and fast

Transition signaling or event logic

Data transfer happens at both edges (falling and rising)
No reset state

Implementation of HS protocol: 2 phase (no return to zero)
[Figure: (a) Sender-receiver configuration: SENDER drives Req and Data to RECEIVER, which returns Ack. (b) Timing diagram over cycle 1 and cycle 2: (1) Data changes, (2) Req makes a transition, (3) Ack makes a transition. Sender's and receiver's actions are marked separately.]
2 phase protocol (NRZ)
Data change -> request -> data acceptance -> acknowledge

Data change -> request -> acknowledge -> data acceptance

Proceeds in cyclic order

2 active cycles:
Sender: terminated by the request event (no change in output is possible after it)
Receiver: terminated by the acknowledge event
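The first event ordering above (data change, request, data acceptance, acknowledge) can be modelled in a few lines. This is a behavioural sketch only; the class and method names are hypothetical. Any transition on `req` or `ack` counts as an event, so neither wire returns to zero between transfers.

```python
class TwoPhaseChannel:
    """Transition-signaling channel: one req event and one ack event per transfer."""

    def __init__(self):
        self.req = 0
        self.ack = 0
        self.data = None

    def send(self, value):
        if self.req != self.ack:
            raise RuntimeError("previous transfer not yet acknowledged")
        self.data = value      # 1. data change
        self.req ^= 1          # 2. request event (rising or falling edge)

    def receive(self):
        if self.req == self.ack:
            raise RuntimeError("no pending request")
        value = self.data      # 3. data acceptance
        self.ack ^= 1          # 4. acknowledge event
        return value


ch = TwoPhaseChannel()
ch.send("a")
print(ch.receive(), ch.req, ch.ack)   # wires now rest at 1, 1
ch.send("b")
print(ch.receive(), ch.req, ch.ack)   # wires now rest at 0, 0
```

The sender's active cycle ends with the request event and the receiver's with the acknowledge event; the channel is idle whenever `req == ack`.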
4 phase protocol
[Figure: timing diagram of Req, Ack and Data over cycle 1 and cycle 2: (1) Data changes, (2) Req rises, (3) Ack rises, (4) Req falls, (5) Ack falls. Sender's and receiver's actions are marked separately.]
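The numbered events of the 4-phase (return-to-zero) handshake can be listed programmatically. This is a minimal sketch, assuming the usual ordering in which both wires return to zero before the next transfer; the function name and event labels are hypothetical.

```python
def four_phase_transfer():
    """Return the (event, req, ack) sequence of one return-to-zero handshake."""
    req, ack = 0, 0
    events = [("data valid", req, ack)]   # 1. sender places data
    req = 1
    events.append(("req rises", req, ack))   # 2. sender requests
    ack = 1
    events.append(("ack rises", req, ack))   # 3. receiver accepts data
    req = 0
    events.append(("req falls", req, ack))   # 4. sender resets
    ack = 0
    events.append(("ack falls", req, ack))   # 5. receiver resets
    return events


for event in four_phase_transfer():
    print(event)
```

Each transfer costs four signaling events on Req and Ack, compared with two for the 2-phase protocol, but the wires always start the next cycle from the same reset state.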
Dual rail coding
1 bit of information is coded using two wires.

Request/done is merged with the data wires.
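A short sketch of dual-rail coding with completion detection follows. The slides do not fix an encoding, so the mapping used here is an assumption: each bit is a (true rail, false rail) pair, (0, 0) is the empty "null" state, and (1, 1) is illegal. All names are hypothetical.

```python
NULL = (0, 0)   # no data present on this bit (assumed spacer state)


def encode(bit):
    """Map a logic value to a (true_rail, false_rail) pair."""
    return (1, 0) if bit else (0, 1)


def decode(pair):
    """Return 0 or 1 for valid data, None for the null state."""
    if pair == NULL:
        return None
    if pair == (1, 1):
        raise ValueError("illegal dual-rail code (1, 1)")
    return 1 if pair == (1, 0) else 0


def done(word):
    """Completion detection: every bit has exactly one rail high."""
    return all(t ^ f for t, f in word)


word = [encode(1), encode(0), encode(1)]
print(done(word), [decode(p) for p in word])
print(done([encode(1), NULL]))
```

Because validity is visible on the data wires themselves, `done` plays the role of the request signal and no separate Req wire is needed.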


Bundled data / single rail coding
Req. should be issued only when the data output is stable.
Correct signaling
For correct functionality, signalling events must occur in the proper order.

For this we require a handshaking module: the Muller C-element.

An event must occur on both inputs to create an output event, i.e. both req and ack should be high to enable the next function block.


Event Logic: The Muller-C Element
(a) Schematic: inputs A and B drive a C-element with output F.
(b) Truth table:

A  B  Fn+1
0  0  0
0  1  Fn
1  0  Fn
1  1  1
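The truth table above translates directly into a small behavioural model. This is an illustrative sketch; the class name and the initial state of 0 are assumptions.

```python
class CElement:
    """Muller C-element: output follows the inputs when they agree, else holds."""

    def __init__(self, state=0):
        self.f = state

    def update(self, a, b):
        if a == b:        # both 0 -> 0, both 1 -> 1
            self.f = a
        return self.f     # inputs differ -> previous value Fn is kept


c = CElement()
for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
    print(a, b, c.update(a, b))
```

The output changes only after an event has arrived on both inputs, which is the property the handshake logic relies on.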
Asynchronous pipeline example: transition signaling
The simplest asynchronous pipeline, in which the LOGIC blocks are just wires, acts as a first-in first-out (FIFO) buffer.

Here, data is clocked into the align stages by the input request signal and clocked out by the output acknowledge signal.

Dual rail

Bundled data

2-phase bundled data (micro-pipelines)

2-Phase bundled data protocol: FIFO
Capture-Pass transition-controlled latch
Transitions on C and P alternate.

Micropipelines
Elegant, no RZ overhead

But implementation (latches and other control circuits) is complex
Mouse trap pipeline

2 phase bundled data

Simple asynchronous implementation style, uses transparent latches
Simple control: 1 gate/pipeline stage
Target datapath = static logic blocks
MOUSETRAP: uses a capture protocol
Latches are normally transparent before new data arrives, and become opaque after data arrives (capture data)
Control signaling: transition signaling = 2-phase
Simple protocol: req/ack = only 2 events per handshake (not 4)
No return-to-zero
Each transition (up/down) signals a distinct operation
Goal: very fast cycle time, simple inter-stage communication
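The capture behaviour described above can be sketched with a one-gate controller. The slides only say "1 gate/pipeline stage"; the use of an XNOR of the stage's own latched request (`done`) and the acknowledge from the next stage is an assumption based on how MOUSETRAP is commonly described, and all names below are hypothetical.

```python
def xnor(a, b):
    return int(a == b)


class Stage:
    """One pipeline stage: a transparent latch plus a single XNOR control gate."""

    def __init__(self):
        self.done = 0        # latched request; also the ack sent to the previous stage
        self.data = None

    def enable(self, ack_next):
        # Transparent (1) while this stage's data has been acknowledged downstream.
        return xnor(self.done, ack_next)

    def step(self, req_in, data_in, ack_next):
        if self.enable(ack_next):
            self.done, self.data = req_in, data_in   # data flows through
        return self.done


s = Stage()
print(s.enable(ack_next=0))        # idle: transparent
s.step(req_in=1, data_in="x", ack_next=0)
print(s.enable(ack_next=0))        # data captured: opaque
s.step(req_in=0, data_in="y", ack_next=0)
print(s.data)                      # still "x", the trap is shut
print(s.enable(ack_next=1))        # next stage acknowledged: transparent again
```

The latch closes itself as soon as a request passes through, and reopens on the acknowledge transition from the next stage, so each transfer needs only two events.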
Dual rail
Memory element design

PERFORMANCE PARAMETERS
CLOCK LOAD
NO. OF TRANSISTORS
CLOCKING SCHEME

Latch versus Register
Latch: stores data when clock is low/high (level-sensitive)
Register: stores data when clock rises (edge-triggered)
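[Figure: latch and register symbols (D, Q, Clk) with the corresponding Clk, D and Q waveforms.]

Storage Mechanisms
Static
Dynamic (charge-based)

Static: Mux-Based Latch (1)
[Figure: multiplexer-based latch with input D, clock CLK and output Q.]

$Q = \overline{CLK} \cdot Q + CLK \cdot D$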
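The mux-based latch equation Q = (not CLK)·Q + CLK·D can be evaluated directly. This is a behavioural sketch of a positive (transparent-high) latch with ideal, zero-delay logic; the function name is hypothetical.

```python
def mux_latch(clk, d, q):
    """Next state of a positive latch: hold Q when CLK = 0, pass D when CLK = 1."""
    return ((1 - clk) & q) | (clk & d)


q = 0
for clk, d in [(0, 1), (1, 1), (1, 0), (0, 1), (0, 0)]:
    q = mux_latch(clk, d, q)
    print(f"CLK={clk} D={d} -> Q={q}")
```

While CLK is high the output tracks every change on D; once CLK falls, the last value is held regardless of D.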
Clock load: 4
2-phase clocking
10 transistors

Mux-Based Latch (2): less clock load
Clock load: 2, 2-phase clocking, 6 transistors

Mux-Based Latch (3): less clock load, Vt degradation

Non-overlapping clocks

NMOS only
Master-Slave (Edge-Triggered) Register
[Figure: a master latch (D to QM) followed by a slave latch (QM to Q), each built from a multiplexer and driven by opposite phases of CLK, with the CLK, D, QM and Q waveforms.]

Two opposite latches trigger on edge

Also called master-slave latch pair
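The statement that two opposite latches trigger on an edge can be checked with a behavioural model. This is a sketch assuming a positive edge-triggered register (master transparent while CLK is low, slave transparent while CLK is high) and ideal, zero-delay latches; the names are hypothetical.

```python
def latch(enable, d, q):
    """Level-sensitive latch: pass d when enabled, otherwise hold q."""
    return d if enable else q


class MasterSlaveRegister:
    def __init__(self):
        self.qm = 0   # master output
        self.q = 0    # slave output

    def tick(self, clk, d):
        self.qm = latch(not clk, d, self.qm)   # master open while CLK = 0
        self.q = latch(clk, self.qm, self.q)   # slave open while CLK = 1
        return self.q


reg = MasterSlaveRegister()
for clk, d in [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]:
    print(f"CLK={clk} D={d} -> Q={reg.tick(clk, d)}")
```

Q changes only when CLK goes from 0 to 1, and it takes the value D had just before that edge. Changes on D while CLK is high are ignored because the master is closed.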
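Multiplexer-based latch pair
[Figure: transmission-gate master-slave register built from inverters I1 to I6 and transmission gates T1 to T4, with input D, internal node QM, output Q and clock CLK.]

TIMING METRICS
T setup = I1 + T1 + I3 + I2
T clk-Q = T3 + I6
T hold = ~0
Exact values can be obtained through simulation.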
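The timing metrics above are sums of element delays along a path, which can be worked through with placeholder numbers. This is a sketch only: the inverter and transmission-gate delays are hypothetical, and real values come from simulation as the slide notes.

```python
T_INV_PS = 20   # hypothetical inverter delay in picoseconds
T_TX_PS = 15    # hypothetical transmission-gate delay in picoseconds

# Delay of each named element on the two paths used by the slide's formulas.
DELAY_PS = {
    "I1": T_INV_PS, "I2": T_INV_PS, "I3": T_INV_PS, "I6": T_INV_PS,
    "T1": T_TX_PS, "T3": T_TX_PS,
}


def path_delay(*elements):
    """Sum the delays of the named elements along one path."""
    return sum(DELAY_PS[name] for name in elements)


t_setup = path_delay("I1", "T1", "I3", "I2")   # three inverters + one gate
t_clk_q = path_delay("T3", "I6")               # one gate + one inverter
print(t_setup, t_clk_q)
```

With these placeholder values the setup time is roughly twice the clock-to-Q delay, because the data must settle through the whole master stage before the clock edge.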
Reduced Clock Load Master-Slave Register
Sizing is important: reverse conduction
I2 must be weak
When the slave is on, reverse conduction is possible
TIMING METRICS
T setup = T1 + I1
T clk-Q = T2 + I3
T hold = ~0 (or T1)
Exact values can be obtained through simulation.
[Figure: (a) Schematic diagram of the register with CLK and its complement, input D, internal nodes A, B and X, and output Q. (b) Overlapping clock pairs.]

Avoiding Clock Overlap

Non-overlapping phases
TIMING METRICS
T setup = T1 + I1
T clk-Q = T2 + I3
T hold = ~0 (or T1)
Exact values can be obtained through simulation.

Overpowering the Feedback Loop
Cross-Coupled Pairs
NOR-based set-reset
[Figure: two cross-coupled NOR gates with inputs S and R and outputs Q and Q'.]

S  R  Q  Q'
0  0  Q  Q'   (hold)
1  0  1  0    (set)
0  1  0  1    (reset)
1  1  0  0    (forbidden state)
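The truth table of the NOR-based set-reset latch can be reproduced by iterating the two cross-coupled gates until they settle. This is a behavioural sketch with hypothetical names. Both gates are updated together at each step, which is a modelling simplification rather than a statement about real gate delays.

```python
def nor(a, b):
    return int(not (a or b))


def sr_latch(s, r, q, qn):
    """Settle the cross-coupled NOR pair from state (q, qn) with inputs s, r."""
    for _ in range(4):                 # a few passes are enough to settle
        q_new = nor(r, qn)
        qn_new = nor(s, q)
        if (q_new, qn_new) == (q, qn):
            break
        q, qn = q_new, qn_new
    return q, qn


state = (0, 1)
for s, r in [(1, 0), (0, 0), (0, 1), (0, 0), (1, 1)]:
    state = sr_latch(s, r, *state)
    print(f"S={s} R={r} -> Q={state[0]} Q'={state[1]}")
```

The last line shows the forbidden state: with S = R = 1 both outputs are driven to 0, so they are no longer complementary.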

Cross-Coupled NAND
Added clock
[Figure: cross-coupled NAND latch, and a clocked transistor-level version with devices M1 to M8, inputs S, R and CLK, supply VDD, and outputs Q and Q'.]

This is not used in datapaths any more, but is a basic building block for memory cells.
Dynamic registers

TIMING METRICS
T setup = T1
T clk-Q = I1 + T2 + I2
T hold = ~0 (or T1)
Exact values can be obtained through simulation.

Clock overlaps
Other Latches/Registers: C2MOS
[Figure: C2MOS master-slave register. Master stage: M1 to M4 with input D, internal node X and capacitance CL1. Slave stage: M5 to M8 with output Q and capacitance CL2. Both stages are gated by CLK and its complement.]

Keepers can be added to make circuit pseudo-static

Insensitive to Clock-Overlap
Dual edge registers

Single phase clock Latches/Registers: TSPC
Including Logic in TSPC
[Figure: TSPC latch in which the input device is replaced by a pull-up network (PUN) and pull-down network (PDN), shown with inputs In1 and In2, clock CLK and output Q.]

Example: logic inside the latch (AND latch)

Reduced complexity
TSPC Register
[Figure: three-stage TSPC register with transistors M1 to M9, input D, internal nodes X and Y, single clock CLK and output Q.]
Pulse-Triggered Latches: An Alternative Approach
Ways to design an edge-triggered sequential cell:

Master-slave latches: two latches, L1 and L2, in series and clocked on opposite phases of Clk.
Pulse-triggered latch: a single latch, L, clocked by a short pulse derived from Clk.
Pulsed register: avoids race, single latch
[Figure: (a) register with transistors M1 to M6, input D, internal node X and output Q, clocked by CLKG; (b) glitch generation circuit with MP and MN deriving CLKG from CLK; (c) glitch clock waveforms for CLK and CLKG.]
Pulsed Latches
[Figure: pulsed latch with transistors M1 to M6 and P1 to P3, input D, internal node x, output Q, and clocks CLK and CLKD.]

Sense amplifier based register