Synchronus

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS
CHAPTER 1
INTRODUCTION
All sequential circuits have one property in commona well-defined ordering of
the switching events must be imposed if the circuit is to operate correctly. If this were not
the case, wrong data might be written into the memory elements, resulting in a functional
failure. The synchronous system approach, in which all memory elements in the system
are simultaneously updated using a globally distributed periodic synchronization signal
(that is, a global clock signal), represents an effective and popular way to enforce this
ordering. Functionality is ensured by imposing some strict constraints on the generation
of the clock signals and their distribution to the memory elements distributed over the
chip; non-compliance often leads to malfunction.
In a synchronous digital system, the clock signal is used to define a time reference
for the movement of data within that system. Since this function is vital to the operation
of a synchronous system, much attention has been given to the characteristics of these
clock signals and the networks used in their distribution. Clock signals are often regarded
as simple control signals; however, these signals have some very special characteristics
and attributes. Clock signals are typically loaded with the greatest fan-out, travel over the
longest distances, and operate at the highest speeds of any signal, either control or data,
within the entire system. Since the data signals are provided with a temporal reference by
the clock signals, the clock waveforms must be particularly clean and sharp. Furthermore,
these clock signals are particularly affected by technology scaling, in that long global
interconnect lines become much more highly resistive as line dimensions are decreased.
This increased line resistance is one of the primary reasons for the growing importance of
clock distribution on synchronous performance. Finally, the control of any differences in
the delay of the clock signals can severely limit the maximum performance of the entire
system as well as create catastrophic race conditions in which an incorrect data signal
may latch within a register.
1.1 Synchronous Digital Systems

Most synchronous digital systems consist of cascaded banks of sequential
registers with combinatorial logic between each set of registers. The functional
requirements of the digital system are satisfied by the logic stages. The global
Department of Electronics & Communication Engineering

performance and local timing requirements are satisfied by the careful insertion of
pipeline registers into equally spaced time windows to satisfy critical worst case timing
constraints. The proper design of the clock distribution network further ensures that these
critical timing requirements are satisfied and that no race conditions exist. With the
careful design of the clock distribution network system-level synchronous performance
can actually increase, surpassing the performance advantages of asynchronous systems by
permitting synchronous performance to be based on average path delays rather than worst
case path delays, without incurring the handshaking protocol delay penalties required in
most asynchronous systems.
In a synchronous system, each data signal is typically stored in a latched state
within a bistable register awaiting the incoming clock signal, which determines when the
data signal leaves the register. Once the enabling clock signal reaches the register, the data
signal leaves the bistable register and propagates through the combinatorial network and,
for a properly working system, enters the next register and is fully latched into that
register before the next clock signal appears. Thus, the delay components that make up a
general synchronous system are composed of the following three individual subsystems:
1) Memory storage elements
2) Logic elements
3) Clocking circuitry and distribution network.
1.2 Importance of Clock

Clock signals are typically loaded with the greatest fanout and operate at the
highest speeds of any signal within the synchronous system. Since the data signals are
provided with a temporal reference by the clock signals, the clock waveforms must be
particularly clean and sharp. Furthermore, these clock signals are particularly affected by
technology scaling (see Moore's law), in that long global interconnect lines become
significantly more resistive as line dimensions are decreased. This increased line
resistance is one of the primary reasons for the increasing significance of clock
distribution on synchronous performance. Finally, the control of any differences and
uncertainty in the arrival times of the clock signals can severely limit the maximum
performance of the entire system and create catastrophic race conditions in which an
incorrect data signal may latch within a register.

Most synchronous digital systems consist of cascaded banks of sequential
registers with combinational logic between each set of registers. The functional
requirements of the digital system are satisfied by the logic stages. Each logic stage
introduces delay that affects timing performance, and the timing performance of the
digital design can be evaluated relative to the timing requirements by a timing analysis.
Often special consideration must be made to meet the timing requirements. For example,
the global performance and local timing requirements may be satisfied by the careful
insertion of pipeline registers into equally spaced time windows to satisfy critical worstcase timing constraints. The proper design of the clock distribution network helps ensure
that critical timing requirements are satisfied and that no race conditions exist (see also
clock skew).
The delay components that make up a general synchronous system are composed
of the following three individual subsystems: the memory storage elements, the logic
elements, and the clocking circuitry and distribution network.
Novel structures are currently under development to ameliorate these issues and
provide effective solutions. Important areas of research include resonant clocking
techniques, on-chip optical interconnect, and local synchronization methodologies.
1.3 Organization of the Report

This report starts with an overview of the different timing methodologies. We
analyze the impact of spatial variations of the clock signal, called clock skew, and
temporal variations of the clock signal, called clock jitter, and introduce techniques to
cope with synchronous approach. These variations fundamentally limit the performance
that can be achieved using a conventional design methodology. The report is organized as
follows:
Chapter 1: Introduction - This chapter briefly explains the overview of the report.
Chapter 2: Timing Methodologies - This chapter describes the different timing
methodologies with respect to the clock system.
Chapter 3: Synchronous Timing Basics - This chapter discusses the timing parameters of
the combinational logic circuits and sequential logic circuits. This chapter also includes
the analysis of synchronous sequential circuit with relative to timing parameters.

Chapter 4: Clock Skew and Clock Jitter - This chapter explains sources of the clock skew
and clock jitter.
Chapter 5: Clock Distribution Techniques - This chapter describes the different clock
distribution networks used to minimize the clock skew.
Chapter 6: Conclusions - This chapter summarizes the major accomplishments of this
report.
CHAPTER 2
TIMING METHODOLOGIES
In digital systems, signals can be classified depending on how they are related to a
local clock. Signals that transition only at predetermined periods in time can be classified
as synchronous, mesochronous, or plesiochronous with respect to a system clock. A signal
that can transition at arbitrary times is considered asynchronous.
2.1 Synchronous Methodology

A synchronous signal is one that has the exact same frequency, and a known fixed
phase offset with respect to the local clock. In such a timing methodology, the signal is
synchronized with the clock, and the data can be sampled directly without any
uncertainty. In digital logic design, synchronous systems are the most straightforward
type of interconnect, where the flow of data in a circuit proceeds in lockstep with the
system clock as shown below.
Figure 2.1 Synchronous interconnect methodology

Here, the input data signal In is sampled with register R1 to give signal Cin, which
is synchronous with the system clock and then passed along to the combinational logic
block. After a suitable setting period, the output Cout becomes valid and can be sampled

by R2 which synchronizes the output with the clock. In a sense, the certainty period of
signal Cout, or the period where data is valid is synchronized with the system clock,
which allows register R2 to sample the data with complete confidence. The length of the
uncertainty period, or the period where data is not valid, places an upper bound on how
fast a synchronous interconnect system can be clocked.
2.2 Mesochronous Methodology

A mesochronous signal is one that has the same frequency but an unknown phase
offset with respect to the local clock (meso from Greek is middle). For example, if data
is being passed between two different clock domains, then the data signal transmitted
from the first module can have an unknown phase relationship to the clock of the
receiving module. In such a system, it is not possible to directly sample the output at the
receiving module because of the uncertainty in the phase offset. A (mesochronous)
synchronizer can be used to synchronize the data signal with the receiving clock as shown
below. The synchronizer serves to adjust the phase of the received signal to ensure proper
sampling.
Figure 2.2 Mesochronous approach using variable delay line.

In Figure 2.2, signal D1 is synchronous with respect to ClkA. However, D1 and
D2 are mesochronous with ClkB because of the unknown phase difference between ClkA
and ClkB and the unknown interconnect delay in the path between Block A and Block B.
The role of the synchronizer is to adjust the variable delay line such that the data signal
D3 (a delayed version of D2) is aligned properly with the system clock of block B. In this
example, the variable delay element is adjusted by measuring the phase difference
between the received signal and the local clock. After register R2 samples the incoming
data during the certainty period, then signal D4 becomes synchronous with ClkB.
2.3 Plesiochronous Methodology


A plesiochronous signal is one that has nominally the same, but slightly different
frequency as the local clock (plesio from Greek is near). In effect, the phase difference
drifts in time. This scenario can easily arise when two interacting modules have
independent clocks generated from separate crystal oscillators. Since the transmitted
signal can arrive at the receiving module at a different rate than the local clock, one needs
to utilize a buffering scheme to ensure all data is received. Typically, plesiochronous
interconnect only occurs in distributed systems like long distance communications, since
chip or even board level circuits typically utilize a common oscillator to derive local
clocks. A possible framework for plesiochronous interconnect is shown in Figure 2.3.
Figure 2.3 Plesiochronous communication using FIFO.

In this digital communications framework, the originating module issues data at
some unknown rate characterized by C1, which is plesiochronous with respect to C2. The
timing recovery unit is responsible for deriving clock C3 from the data sequence, and
buffering the data in a FIFO. As a result, C3 will be synchronous with the data at the input
of the FIFO and will be mesochronous with C1. Since the clock frequencies from the
originating and receiving modules are mismatched, data might have to be dropped if the
transmit frequency is faster, and data can be duplicated if the transmit frequency is slower
than the receive frequency. However, by making the FIFO large enough, and periodically
resetting the system whenever an overflow condition occurs, robust communication can
be achieved.
In telecommunications, a plesiochronous system is one where different parts of
the system are almost, but not quite, perfectly synchronized. According to ITU-T
standards, a pair of signals is plesiochronous if their significant instants occur at
nominally the same rate, with any variation in rate being constrained within specified
limits. A sender and receiver operate plesiosynchronously if they operate at the same
nominal frequency but may have a slight frequency mismatch, which leads to a drifting
phase. The mismatch between the two systems' clocks is known as the plesiochronous
difference.

In general, plesiochronous systems behave similarly to synchronous systems,
except they must employ some means in order to cope with "sync slips", which will
happen at intervals due to the plesiochronous nature of the system. The most common
example of a plesiochronous system design is the plesiochronous digital hierarchy
networking standard.
The asynchronous serial communication protocol is asynchronous on the byte
level, but plesiochronous on the bit level. The receiver detects the start of a byte by
detecting a transition that may occur at a random time after the preceding byte. The
indefinite wait and lack of external synchronization signals makes byte detection
asynchronous. Then the receiver samples at predefined intervals to determine the values
of the bits in the byte; this is plesiochronous since it depends on the transmitter to
transmit at roughly the same rate the receiver expects, without coordination of the rate
while the bits are being transmitted.
2.4 Asynchronous Methodology

Asynchronous signals can transition at any arbitrary time, and are not slaved to
any local clock. As a result, it is not straightforward to map these arbitrary transitions into
a synchronized data stream. Although it is possible to synchronize asynchronous signals
by detecting events and introducing latencies into a data stream synchronized to a local
clock, a more natural way to handle asynchronous signals is to simply eliminate the use of
local clocks and utilize a self-timed asynchronous design approach. In such an approach,
communication between modules is controlled through a handshaking protocol to perform
the proper ordering of commands.
Figure 2.4 Asynchronous methodology for simple pipeline interconnects.


When a logic block completes an operation, it will generate a completion signal
DV to indicate that output data is valid. The handshaking signals then initiate a data
transfer to the next block, which latches in the new data and begins a new computation by
asserting the initialization signal I. Asynchronous designs are advantageous because
computations are performed at the native speed of the logic, where block computations
occur whenever data becomes available. There is no need to manage clock skew, and the
design methodology leads to a very modular approach where interaction between blocks
simply occurs through a handshaking procedure. However, these handshaking protocols
result in increased complexity and overhead in communication that can reduce
performance.
CHAPTER 3
SYNCHRONOUS TIMING BASICS
3.1 Synchronous Sequential Systems
A digital synchronous circuit is composed of a network of functional logic
elements and globally clocked registers. For an arbitrary ordered pair of registers (R1, R2),
one of the following two situations can be observed: either the input of R 2 cannot be
reached from the output of R1 by propagating through a sequence of logical elements only
or there exists at least one sequence of logic blocks that connects the output of R 1 to the
input of R2. In the former case, switching events at the output of the register R 1 do not
affect the input of the register R2 during the same clock period. In the latter case
denoted by R1
R2signal switching at the output of R 1 will propagate to the input
of R2. In this case, (R1, R2) is called a sequentially-adjacent pair of registers which make
up a local data path (see Fig. 3.1).
Figure 3.1 Local data path.

Delay Components of Data Path: The minimum allowable clock period Tcp (min)between
any two registers in a sequential data path is given by
1
=T CP(min) T PD(max ) +T Skew
f clkMAX
(3.1)
+T Set up =D(i, f )
Where T PD (max)=T CQ +T Logic +T
(3.2)
and the total path delay of a data path T PD (max) is the sum of the maximum time required
for the data to leave the initial register once the clock signal Ci arrives, TC-Q, the time
necessary to propagate through the logic and interconnect, T Logic+TInt, and the time
required to successfully propagate to and latch within the final register of the data path,
TSet-up. Observe that the latest arrival time is given by TLogic (max) and the earliest arrival time
is given by TLogic (min), since data is latched into each register within the same clock period.
The sum of the delay components in (2) must satisfy the timing constraint of (1) in
order to support the clock period Tcp (min), which is the inverse of the maximum possible
clock frequency, fclkMAX. Note that the clock skew TSkewij can be positive or negative
depending on whether Cf leads or lags Ci, respectively. The clock period is chosen such
that the latest data signal generated by the initial register is latched in the final register by
the next clock edge after the clock edge that activated the initial register. Furthermore, in
order to avoid race conditions, the local path delay must be chosen such that, for any two
sequentially-adjacent registers in a multistage data path, the latest data signal must arrive
and be latched within the final register before the earliest data signal generated with the
next clock pulse arrives. The waveforms depicted in Fig. 2 show the timing requirement
of (1) being barely satisfied (i.e., the data signal arrives at R f just before the clock signal
arrives at Rf.
Figure 3.2 Timing diagram of clocked data path.

An example of a local data path Ri
Rf is shown in Fig. 3.1. The clock
signals Ci and Cf synchronize the sequentially- adjacent pair of registers R i and Rf,
respectively. Signal switching at the output of R i is triggered by the arrival of the clock
signal Ci. After propagating through the logic block L if, this signal will appear at the input
of Rf. Therefore, a nonzero amount of time elapses between the triggering event and the
signal switching at the input of Rf. The minimum and maximum values of this delay are
called the short and long delays, respectively, and are denoted by d(i,f) and D(i,f),
respectively. Note that both d(i,f) and D(i,f) are due to the accumulative effects of three
sources of delay. These sources are the clock-to-output delay of the register R i, a delay
introduced by the signal propagating through L if, and an interconnect delay due to the
presence of wires on the signal path, Ri
Rf.
3.2 Timing Parameters for Combinational Logic

Physically implemented combinational circuits (NAND or NOR gates for
example) exhibit certain timing characteristics.
A 0 or 1 applied at the input to a combinational circuit does not result in an
instantaneous change at the output because of various electrical constraints. Input-tooutput delay in combinational circuits can be expressed with two parameters, propagation
delay, tpd, and contamination delay, tcd. Both delays are important characteristics for
circuits. They determine the maximum clock rate.
3.2.1 Propagation delay (tpd)

The amount of time needed for a change in a logic input to result in a permanent
change at an output, that is, the combinational logic will not show any further output
changes in response to an input change after time tpd units.
3.2.2 Contamination delay (tcd)
The amount of time needed for a change in a logic input to result in an initial
change at an output, that is, the combinational logic is guaranteed not to show any output
change in response to an input change before tcd time units have passed.
Figure 3.3 Propagation and Contamination Delays in the Combinational Logic

Combinational propagation delays are additive and so the propagation delay of a larger
combinational circuit can be determined by adding the propagation delays of each of the
circuit components along the longest path. In contrast, finding the contamination delay of
the circuit requires identifying the shortest path of contamination delays from input to
output and adding the delay values along this path.
3.3 Timing Parameters for Sequential Logic

Sequential circuits can contain both Combinational Logic and edge triggered flip
flops. A synchronous circuit is a digital circuit in which the parts are synchronized by a
clock signal. In an ideal synchronous circuit, every change in the logical levels of its
storage components is simultaneous. These transitions follow the level change of a special
signal called the clock. Ideally, the input to each storage element has reached its final
value before the next clock occurs, so the behaviour of the whole circuit can be predicted
exactly. Practically, some delay is required for each logical operation, resulting in a
maximum speed at which each synchronous system can run.
To make these circuits work correctly, a great deal of care is needed in the design
of the Clock Distribution Networks. Static timing analysis is often used to determine the
maximum safe operating speed. When sequential circuits are physically implemented they
exhibit certain timing characteristics that unlike combinational circuits are specified in

relation to the clock input. A flip-flop is edge triggered. A flip-flop stores when the clock
rises and is mostly never transparent. Since flip-flops only change value in response to a
change in the clock value, timing parameters can be specified in relation to the rising (for
positive edge-triggered) or falling (for negative-edge triggered) clock edge. The following
parameters specify sequential circuit behavior. Note that these are all for positive edgetriggered flip-flops unless otherwise specified, but are easily applied to negative edge
triggered flip-flops as well.
3.3.1 Propagation delay (tclkq)
The amount of time needed for a change in the flip-flop clock input D to result in
a change at the flip-flop output Q. When the clock edge arrives, the D input value is
transferred to output Q. After time tclkq the output is guaranteed not to change value again
until another clock edge trigger arrives.
3.3.2 Contamination delay (tcd)
This value indicates the amount of time needed for a change in the flip-flop clock
input to result in the initial change at the flip-flop output Q. The output of the flip-flop
maintains its initial value until time tcd has passed and is guaranteed not to show any
output change in response to an input change until after tcd has passed.
Note: delays can be different for both rising and falling transitions.
3.3.3 Setup time (tsu)
The amount of time before the clock edge that data input D must be stable the
rising clock edge arrives.
3.3.4 Hold time (thold)
This indicates the amount of time after the clock edge arrives that data input D
must be held stable in order for the flip-flop to latch the correct value. Hold time is
always measured from the rising clock edge (for positive edge-triggered) to a point after
the clock edge.
Figure 3.4 Timing Parameters of Sequential Circuit

Setup and hold times are restrictions that a flip-flop places on combinational or
sequential circuitry that drives a flip-flop D input. The circuit has to be designed so the D
input signal arrives at least tsu time units before the clock edge and does not change until
at least thold time units after the clock edge. If either of these restrictions is violated for any
of the flip-flops in the circuit, the circuit will not operate correctly. These restrictions limit
the maximum clock frequency at which the circuit can operate.
3.4 Determination of the Maximum Clock Frequency

Most digital circuits contain both combinational components (gates, multiplexes,
adders, etc.) and sequential components (flip-flops). These components can be combined
to form sequential circuits that perform computation and store results. By using
combinational and sequential component parameters, it is possible to determine the
maximum clock frequency at which a circuit will operate and generate correct results.
This analysis can best be examined through use of an example.
Figure 3.5 Sequential Circuit with Corresponding Propagation Delays

The first step is model the given circuit so that the data paths between flip-flops
are characterized by the longest and shortest combinational delays between the flip-flops.
Direction of clock
Figure 3.6 Sequential Circuit characterized by the longest and shortest delays.
Let Tclk be the clock period. Then,
T clk t clkq (register )+ t pd logic (longest )+ t su(destinationregister )
(3.3)
T clk 0.6 ns +8 ns+ 0.4 ns

T clk 9 ns
Therefore the minimum clock period should be 9ns.

The hold time of the destination register must be shorter than the minimum
propagation delay through the logic network,
t h old <t c q ,cd +t logic , cd
(3.4)
The above analysis is simplistic since the clock is never ideal. As a result of
process and environmental variations, the clock signal can have spatial and temporal
variations.
CHAPTER 4
CLOCK SKEW AND CLOCK JITTER
4.1 Clock Skew
The spatial variation in arrival time of a clock transition on an integrated circuit is
commonly referred to as clock skew. The clock skew between two points i and j on a IC
is given by (i,j) = ti - tj, where ti and tj are the position of the rising edge of the clock
with respect to a reference. The clock skew can be positive or negative depending upon
the routing direction and position of the clock source. The timing diagram for the case
with positive skew is shown in Figure 10.6. As the figure illustrates, the rising clock edge
is delayed by a positive at the second register.
Figure 4.1 Timing diagram of positive clock skew.

Clock skew is caused by static path-length mismatches in the clock load and by
definition skew is constant from cycle to cycle. That is, if in one cycle CLK2 lagged
CLK1 by , then on the next cycle it will lag it by the same amount. It is important to note
that clock skew does not result in clock period variation, but rather phase shift.
Figure 4.2 Timing diagram of negative clock skew.

Skew has strong implications on performance and functionality of a sequential
system. First consider the impact of clock skew on performance. From Figure 4.1, a new
input In sampled by R1 at edge 1 will propagate through the combinational logic and be
sampled by R2 on edge 4. If the clock skew is positive, the time available for signal to
propagate from R1 to R2 is increased by the skew . The output of the combinational
logic must be valid one set-up time before the rising edge of CLK2 (point 4). The
constraint on the minimum clock period can then be derived as:
T + t cq +t logic +t su T t cq+ t logic +t su
(4.1)
The above equation suggests that clock skew actually has the potential to improve the
performance of the circuit. That is, the minimum clock period required to operate the
circuit reliably reduces with increasing clock skew! This is indeed correct, but
unfortunately, increasing skew makes the circuit more susceptible to race conditions may
and harm the correct operation of sequential systems.
As above, assume that input In is sampled on the rising edge of CLK1 at edge 1
into R1. The new values at the output of R1 propagate through the combinational logic
and should be valid before edge 4 at CLK2. However, if the minimum delay of the
combinational logic block is small, the inputs to R2 may change before the clock edge 2,
resulting in incorrect evaluation. To avoid races, we must ensure that the minimum
propagation delay through the register and logic must be long enough such that the inputs
to R2 are valid for a hold time after edge 2. The constraint can be formally stated as
+ t h old <t (cq , cd )+t (logic ,cd ) < t ( cq ,cd ) +t (logic, cd )t h old
(4.2)
Figure 4.2 shows the timing diagram for the case when < 0. For this case, the
rising edge of CLK2 happens before the rising edge of CLK1. On the rising edge of
CLK1, a new input is sampled by R1. The new sampled data propagates through the
combinational logic and is sampled by R2 on the rising edge of CLK2, which corresponds
to edge 4. As can be seen from Figure 10.7 and Eq. (4.1), a negative skew directly
impacts the performance of sequential system. However, a negative skew implies that the
system never fails, since edge 2 happens before edge 1. This can also be seen from Eq.
(4.2), which is always satisfied since < 0.
Example scenarios for positive and negative clock skew are shown in Figure 4.3.
Figure4.3 Positive and negative clock skew.

4.1.1 Positive Clock Skew
> 0 - This corresponds to a clock routed in the same direction as the flow of the
data through the pipeline (Figure 4.3a). In this case, the skew has to be strictly controlled
and satisfy Eq. (4.2). If this constraint is not met, the circuit does malfunction
independent of the clock period. Reducing the clock frequency of an edge-triggered
circuit does not help get around skew problems. On the other hand, positive skew
increases the throughput of the circuit as expressed by Eq. (4.1), because the clock period
can be shortened by . The extent of this improvement is limited as large values of soon
provoke violations of Eq. (4.2).
4.1.2 Negative Clock Skew
< 0 - When the clock is routed in the opposite direction of the data (Figure 4.3b),
the skew is negative and condition (4.2) is unconditionally met. The circuit operates
correctly independent of the skew. The skew reduces the time available for actual
computation so that the clock period has to be increased by ||. In summary, routing the
clock in the opposite direction of the data avoids disasters but hampers the circuit
performance.
Unfortunately, since a general logic circuit can have data flowing in both
directions (for example, circuits with feedback), this solution to eliminate races will not
always work (Figure 4.4). The skew can assume both positive and negative values
depending on the direction of the data transfer. Under these circumstances, the designer
has to account for the worst-case skew condition. In general, routing the clock so that
only negative skew occurs is not feasible.
Therefore, the design of a low-skew clock network is essential.
Figure4.4 Datapath structure with feedback.
4.2 Clock Jitter

Jitter is the undesired deviation from true periodicity of an assumed periodic
signal in electronics and telecommunications, often in relation to a reference clock source.
Jitter may be observed in characteristics such as the frequency of successive pulses, the
signal amplitude, or phase of periodic signals. Jitter is a significant, and usually
undesired, factor in the design of almost all communications links (e.g., USB, PCI-e,
SATA, OC-48). In clock recovery applications it is called timing jitter.
Jitter can be quantified in the same terms as all time-varying signals, e.g., RMS, or
peak-to-peak displacement. Also like other time-varying signals, jitter can be expressed in
terms of spectral density (frequency content).
Jitter period is the interval between two times of maximum effect (or minimum
effect) of a signal characteristic that varies regularly with time. Jitter frequency, the more
commonly quoted figure, is its inverse. ITU-T G.810 classifies jitter frequencies below
10 Hz as wander and frequencies at or above 10 Hz as jitter.
Jitter may be caused by electromagnetic interference (EMI) and crosstalk with
carriers of other signals. Jitter can cause a display monitor to flicker, affect the
performance of processors in personal computers, introduce clicks or other undesired
effects in audio signals, and loss of transmitted data between network devices. The
amount of tolerable jitter depends on the affected application.
Clock jitter refers to the temporal variation of the clock period at a given point
that is, the clock period can reduce or expand on a cycle-by-cycle basis. It is strictly a
temporal uncertainty measure and is often specified at a given point on the chip. Jitter can

be measured and cited in one of many ways. Cycle-to-cycle jitter refers to time varying
deviation of a single clock period and for a given spatial location i is given as Tjitter,i(n) =
Ti, n+1 - Ti,n T=, where Ti,n is the clock period for period n, Ti, n+1 is clock period for period
n+1, and TCLK is the nominal clock period.
Figure4.5 Circuit for studying the impact of jitter on performance.

4.2.1 Jitter metrics
For clock jitter, there are three commonly used metrics: absolute jitter, period
jitter, and cycle to cycle jitter.
Absolute jitter is the absolute difference in the position of a clock's edge from
where it would ideally be.
Period jitter (aka cycle jitter) is the difference between any one clock period and
the ideal/average clock period. Accordingly, it can be thought of as the discrete-time
derivative of absolute jitter. Period jitter tends to be important in synchronous circuitry
like digital state machines where the error-free operation of the circuitry is limited by the
shortest possible clock period, and the performance of the circuitry is limited by the
average clock period. Hence, synchronous circuitry benefits from minimizing period
jitter, so that the shortest clock period approaches the average clock period.
Cycle-to-cycle jitter is the difference in length/duration of any two adjacent clock
periods. Accordingly, it can be thought of as the discrete-time derivative of period jitter. It
can be important for some types of clock generation circuitry used in microprocessors and
RAM interfaces.

Since they have different generation mechanisms, different circuit effects, and
different measurement methodology, it is useful to quantify them separately.
4.3.2 Types of Jitter
There are three types of jitter. They are random jitter, deterministic jitter and total
jitter. They are discussed as follows.
Random jitter
Random Jitter, also called Gaussian jitter, is unpredictable electronic timing noise.
Random jitter typically follows a Gaussian distribution or Normal distribution. It is
believed to follow this pattern because most noise or jitter in a electrical circuit is caused
by thermal noise, which has a Gaussian distribution. Another reason for random jitter to
have a distribution like this is due to the central limit theorem. The central limit theorem
states that composite effect of many uncorrelated noise sources, regardless of the
distributions, approaches a Gaussian distribution. One of the main differences between
random and deterministic jitter is that deterministic jitter is bounded and random jitter is
unbounded.
Deterministic jitter
Deterministic jitter is a type of clock timing jitter or data signal jitter that is
predictable and reproducible. The peak-to-peak value of this jitter is bounded, and the
bounds can easily be observed and predicted. Deterministic jitter can either be correlated
to the data stream (data-dependent jitter) or uncorrelated to the data stream (bounded
uncorrelated jitter). Examples of data-dependent jitter are duty-cycle dependent jitter
(also known as duty-cycle distortion) and intersymbol interference. Deterministic jitter
(or DJ) has a known non-Gaussian probability distribution.
Total jitter
Total jitter (T) is the combination of random jitter (R) and deterministic jitter (D):
T = Dpeak-to-peak + 2 nRrms,
in which the value of n is based on the bit error rate (BER) required of the link.

A common bit error rate used in communication standards such as Ethernet is
1012. The relationship for n and bit error rate is given in the table below.
n
6.4
6.7
7
7.3
7.6
BER
1010
1011
1012
1013
1014
Table 4.1 Relationship for n and BER
4.3 Sources of Skew and Jitter

A perfect clock is defined as perfectly periodic signal that is simultaneous
triggered at various memory elements on the chip. However, due to a variety of process
and environmental variations, clocks are not ideal. To illustrate the sources of skew and
jitter, consider the simplistic view of clock generation and distribution as shown in Figure
4.8. Typically, a high frequency clock is either provided from off chip or generated onchip. From a central point, the clock is distributed using multiple matched paths to lowlevel memory elements registers. In this picture, two paths are shown. The clock paths
include wiring and the associated distributed buffers required to drive interconnects and
loads. A key point to realize in clock distribution is that the absolute delay through a clock
distribution path is not important; what matters is the relative arrival time between the
output of each path at the register points (i.e., it is perfectly acceptable for the clock signal
to take multiple cycles to get from a central distribution point to a low-level register as
long as all clocks arrive at the same time to different registers on the chip).
Figure 4.6 Skew and Jitter sources in synchronous clock distribution.

There are many reasons why the two parallel paths dont result in exactly the same
delay. The sources of clock uncertainty can be classified in several ways. First, errors can
be divided into systematic or random. Systematic errors are nominally identical from chip
to chip, and are typically predictable (e.g., variation in total load capacitance of each
clock path). In principle, such errors can be modeled and corrected at design time given
sufficiently good models and simulators. Failing that, systematic errors can be deduced
from measurements over a set of chips, and the design adjusted to compensate. Random
errors are due to manufacturing variations (e.g., dopant fluctuations that result in
threshold variations) that are difficult to model and eliminate. Mismatch may also be
characterized as static or time-varying. In practice, there is a continuum between changes
that are slower than the time constant of interest, and those that are faster. For example,
temperature variations on a chip vary on a millisecond time scale. A clock network tuned
by a one-time calibration or trimming would be vulnerable to time-varying mismatch due
to varying thermal gradients. On the other hand, to a feedback network with a bandwidth
of several mega hertz, thermal changes appear essentially static. For example, the clock
net is usually by far the largest single net on the chip, and simultaneous transitions on the
clock drivers induce noise on the power supply. However, this high speed effect does not
contribute to time-varying mismatch because it is the same on every clock cycle, affecting
each rising clock edge the same way. Of course, this power supply glitch may still cause
static mismatch if it is not the same throughout the chip. Below, the various sources of
skew and jitter, introduced in Figure 4.6, are described in detail.
4.3.1 Clock-Signal Generation
The generation of the clock signal itself causes jitter. A typical on-chip clock
generator, as described at the end of this chapter, takes a low-frequency reference clock

signal, and produces a high-frequency global reference for the processor. The core of such
a generator is a Voltage-Controlled Oscillator (VCO). This is an analog circuit, sensitive
to intrinsic device noise and power supply variations. A major problem is the coupling
from the surrounding noisy digital circuitry through the substrate. This is particularly a
problem in modern fabrication processes that combine a lightly-doped epitaxial layer and
a heavily doped substrate (to combat latch-up). This causes substrate noise to travel over
large distances on the chip. These noise source cause temporal variations of the clock
signal that propagate unfiltered through the clock drivers to the flip-flops, and result in
cycle-to-cycle clock-period variations. This jitter causes performance degradation.
4.3.2 Manufacturing Device Variations
Distributed buffers are integral components of the clock distribution networks, as they are
required to drive both the register loads as well as the global and local interconnects. The
matching of devices in the buffers along multiple clock paths is critical to minimizing
timing uncertainty. Unfortunately, as a result of process variations, devices parameters in
the buffers vary along different paths, resulting in static skew. There are many sources of
variations including oxide variations (that affects the gain and threshold), dopant
variations, and lateral dimension (width and length) variations. The doping variations can
affect the depth of junction and dopant profiles and cause variations in electrical
parameters such as device threshold and parasitic capacitances. The orientation of
polysilicon can also have a big impact on the device parameters. Keeping the orientation
the same across the chip for the clock drivers is critical.
Variation in the polysilicon critical dimension is particularly important as it
translates directly into MOS transistor channel length variation and resulting variations in
the drive current and switching characteristics. Spatial variation usually consists of wafer
level (or within-wafer) variation and die-level (or within-die) variation. At least part of
the variation is systematic and can be modeled and compensated for. The random
variations however, ultimately limit the matching and skew that can be achieved.
4.3.3 Interconnect Variations
Vertical and lateral dimension variations cause the interconnect capacitance and
resistance to vary across a chip. Since this variation is static, it causes skew between
different paths. One important source of interconnect variation is the Inter-level Dielectric
(ILD) thickness variations. In the formation of aluminum interconnect; layers of silicon
dioxide are interposed between layers of patterned metallization. The oxide layer is
deposited over a layer of patterned metal features, generally resulting in some remaining

step height or surface topography. Chemical-mechanical polishing (CMP) is used to
planarize the surface and remove topography resulting from deposition and etch (as
shown in Figure 4.7a).While at the feature scale (over an individual metal line), CMP can
achieve excellent planarity, there are limitations on the planarization that can be achieved
over a global range. This is primarily caused due to variations in polish rate that is a
function of the circuit layout density and pattern effects. Figure 4.7b shows this effect
where the polish rate is higher for the lower spatial density region, resulting in smaller
dielectric thickness and higher capacitance.
The assessment and control of variation is of critical importance in semiconductor
process development and manufacturing. Significant advances have been made to
develop analytical models for estimating the ILD thickness variations based on spatial
density. Since this component is often predictable from the layout, it is possible to
actually correct for the systematic component at design time (e.g., by adding appropriate
delays or making the density uniform by adding dummy fills). Figure 4.8 shows the
spatial pattern density and ILD thickness for a high performance microprocessor. The
graphs show that there is clear correlation between the density and the thickness of the
dielectric. So clock distribution networks must exploit such information to reduce clock
skew.
Figure 4.7 Inter-level Dielectric (ILD) thickness variation due to density.

Other interconnect variations include deviation in the width of the wires and line
spacing. This results from photolithography and etches dependencies. At the lower levels
of metallization, lithographic effects are important while at higher levels etch effects are
important that depend on width and layout. The width is a critical parameter as it directly

impacts the resistance of the line and the wire spacing affects the wire-to-wire
capacitance, which the dominant component of capacitance.
Figure 4.8 Pattern density and ILD thickness variation for a high performance
microprocessor.
4.3.4 Environmental Variations
Environmental variations are probably the most significant and primarily
contribute to skew and jitter. The two major sources of environmental variations are
temperature and power supply. Temperature gradients across the chip are a result of
variations in power dissipation across the die. This has particularly become an issue with
clock gating where some parts of the chip maybe idle while other parts of the chip might
be fully active. This results in large temperature variations. Since the device parameters
(such as threshold, mobility, etc.) depend strongly on temperature, buffer delay for a clock
distribution network along one path can vary drastically for another path. More
importantly, this component is time-varying since the temperature changes as the logic
activity of the circuit varies. As a result, it is not sufficient to simulate the clock networks
at worst case corners of temperature; instead, the worst-case variation in temperature
must be simulated. An interesting question is does temperature variation contribute to
skew or to jitter? Clearly the variation in temperature is time varying but the changes are
relatively slow (typical time constants for temperature on the order of milliseconds).
Therefore it usually considered as a skew component and the worst-case conditions are
used. Fortunately, using feedback, it is possible to calibrate the temperature and
compensate.
Power supply variations on the hand are the major source of jitter in clock
distribution networks. The delay through buffers is a very strong function of power

supply as it directly affects the drive of the transistors. As with temperature, the power
supply voltage is a strong function of the switching activity. Therefore, the buffer delay
along one path is very different than the buffer delay along another path. Power supply
variations can be classified into static (or slow) and high frequency variations. Static
power supply variations may result from fixed currents drawn from various modules,
while high-frequency variations result from instantaneous IR drops along the power grid
due to fluctuations in switching activity. Inductive issues on the power supply are also a
major concern since they cause voltage fluctuations. This has particularly become a
concern with clock gating as the load current can vary dramatically as the logic transitions
back and forth between the idle and active states. Since the power supply can change
rapidly, the period of the clock signal is modulated on a cycle-by-cycle basis, resulting in
jitter. The jitter on two different clock points maybe correlated or uncorrelated depending
on how the power network is configured and the profile of switching patterns.
Unfortunately, high-frequency power supply changes are difficult to compensate even
with feedback techniques. As a result, power supply noise fundamentally limits the
performance of clock networks.
4.3.5 Capacitive Coupling
The variation in capacitive load also contributes to timing uncertainty. There are
two major sources of capacitive load variations: coupling between the clock lines and
adjacent signal wires and variation in gate capacitance. The clock network includes both
the interconnect and the gate capacitance of latches and registers. Any coupling between
the clock wire and adjacent signal results in timing uncertainty. Since the adjacent signal
can transition in arbitrary directions and at arbitrary times, the exactly coupling to the
clock network is not fixed from cycle-to-cycle. This results in clock jitter. Another major
source of clock uncertainty is variation in the gate capacitance related to the sequential
elements. The load capacitance is highly non-linear and depends on the applied voltage.
In many latches and registers this translates to the clock load being a function of the
current state of the latch/register (this is, the values stored on the internal nodes of the
circuit), as well as the next state. This causes the delay through the clock buffers to vary
from cycle-to-cycle, causing jitter.
4.4 Impact of Skew and Jitter on Performance on Sequential Circuits

Consider the sequential circuit show in Figure 4.9. Assume that nominally ideal
clocks are distributed to both registers (the clock period is identical every cycle and the

skew is 0). In reality, there is static skew between the two clock signals (assume that >
0). Assume that CLK1 has a jitter of tjitter1 and CLK2 has a jitter of tjitter2. To determine
the constraint on the minimum clock period, we must look at the minimum available time
to perform the required computation. The worst case happens when the leading edge of
the current clock period on CLK1 happens late (edge 3) and the leading edge of the next
cycle of CLK2 happens early (edge 10). This results in the following constraint
T CLK + t jitter 1t jitter 2 t cq +t logic +t su
T t cq +t logic + t su +t jitter 1 +t jitter 2
(4.3)
Figure4.9 The impact of positive skew and jitter on edge-triggered systems.

As the above equation illustrates, while positive skew can provide potential
performance advantage, jitter has a negative impact on the minimum clock period. To
formulate the minimum delay constraint, consider the case when the leading edge of the
CLK1 cycle arrives early (edge 1) and the leading edge the current cycle of CLK2 arrives
late (edge 6). The separation between edge 1 and 6 should be smaller than the minimum
delay through the network. This result in
+ t h old+t jitter 1+ t jitter 2<t( c q , cd )+ t(logic , cd)
< t( c q , cd)+t (logic , cd ) t h old t jitter 1 t jitter 2
(4.4)
The above relation indicates that the acceptable skew is reduced by the jitter of the two
signals.

Now consider the case when the skew is negative ( <0) as shown in Figure 4.10.
For the timing shown, || > tjitter2. It can be easily verified that the worst case timing is
exactly the same as the previous analysis, with taking a negative value. That is, negative
skew reduces performance.
Figure 4.10 The impact of negative skew and jitter on edge-triggered systems.
4.5 Timing Constraints due to Clock Skew

The magnitude and polarity of the clock skew have a two-sided effect on system
performance and reliability. Depending upon whether
the magnitude of
T Skew
with respect to
T PD
Ci
leads or lags
Cf
and upon
, system performance and reliability can
either be degraded or enhanced. These cases are discussed below.
Figure 4.11 Timing diagram of clocked local data path.

4.5.1 Maximum Data Path/Clock Skew Constraint Relationship
For a design to meet its specified timing requirements, the greatest propagation
delay of any data path between a pair of data registers,
Ri
and
Rf
, being
synchronized by a clock distribution network must be less than the minimum clock period
(the inverse of the maximum clock frequency) of the circuit as shown in 4.11. If the time
of arrival of the clock signal at the final register of a data path
T cf
leads that of the time

of arrival of the clock signal at the initial register of the same sequential data path Tci [see
Fig. 4.12(A)], the clock skew is referred to as positive clock skew and, under this
condition, the maximum attainable operating frequency is decreased. Positive clock skew
is the additional amount of time which must be added to the minimum clock period to
reliably apply a new clock signal at the final register, where reliable operation implies that
the system will function correctly at low as well as at high frequencies (assuming fully
static logic). It should be noted that positive clock skew only affects the maximum
frequency of a system and cannot create race conditions.
Figure 4.12 Clock timing diagrams.

In the positive clock skew case, the clock signal arrives at Rf before it reaches Ri.
The maximum permissible positive clock skew can be expressed as
T
+T
Setup
T Skew T CPT PD(max)=T CP

For TCi > TC
Where
T PD (max )
(4.5)
is the maximum path delay between two sequentially-adjacent registers.
This situation is the typical critical path timing analysis requirement commonly seen in
most high-performance synchronous digital systems. If (4.5) is not satisfied, the system
will not operate correctly at that specific clock period (or clock frequency). Therefore,
T CP
must be increased for the circuit to operate correctly, thereby decreasing the
system performance. In circuits where the tolerance for positive clock skew is small [

T Skew
in (4.5) is small], the clock and data signals should be run in the same direction,
thereby forcing Cf to lag Ci and making the clock skew negative.

4.5.2 Minimum Data Path/Clock Skew Constraint Relationship
If the clock signal arrives at Ri before it reaches Rf [see Fig. 4.12(B)], the clock
skew is defined as being negative. Negative clock skew can be used to improve the
maximum performance of a synchronous system by decreasing the delay of a critical
path; however, a potential minimum constraint can occur, creating a race condition. In
this case, when Cf lags Ci, the clock skew must be less than the time required for the data
signal to leave the initial register, propagate through the interconnect, combinatorial logic,
and setup in the final register (see Fig.4.11). If this condition is not met, the data stored in
register Rf is overwritten by the data that had been stored in register Ri and has propagated
through the combinatorial logic. Furthermore, a circuit operating close to this condition
might pass system diagnostics but malfunction at unpredictable times due to fluctuations
in ambient temperature or power supply voltage. Correct operation requires Rf that latches
data which correspond to the data Ri latched during the previous clock period. This
constraint on clock skew is
+T Hold
|T Skew|T PD(min)=T CQ +T Logic (min) +T
For Tcf > Tci
(4.6)
Figure 4.13 K-bit shift register with positive clock skew

Where
and
T PD (min)
T Hold
is the minimum path delay between two sequentially-adjacent registers
is the amount of time the input data signal must be stable once the clock
signal changes state. An important example in which this minimum constraint can occur
is in those designs which use cascaded registers, such as a serial shift register or a -bit
counter, as shown in Fig. 4.13 (note that a distributed RC impedance is between Ci and
Cf). In cascaded register circuits,
T Logic (min)
is zero and
approaches zero (since
cascaded registers are typically designed, at the geometric level, to abut). If Tcf >Tci (i.e.,
negative clock skew), then the minimum constraint becomes
|T Skew|T CQ +T Hold
For Tcf >Tci
(4.7)
and all that is necessary for the system to malfunction is a poor relative placement of the
flip flops or a highly resistive connection between Ci and Cf. In a circuit configuration
such as a shift register or counter, where negative clock skew is a more serious problem
than positive clock skew, provisions should be made to force Cf to lead Ci, as shown in
Fig. 4.13.
As higher levels of integration are achieved in high-complexity VLSI circuits, onchip testability becomes necessary. Data registers, configured in the form of serial
set/scan chains when operating in the test mode, and are a common example of a design
for testability (DFT) technique. The placement of these circuits is typically optimized
around the functional flow of the data. When the system is reconfigured to use the
registers in the role of the set/scan function, different local path delays are possible. In
particular, the clock skew of the reconfigured local data path can be negative and greater
in magnitude than the local register delays. Therefore, with increased negative clock
skew, (7) may no longer be satisfied and incorrect data may latch into the final register of
the reconfigured local data path. Therefore, it is imperative that attention be placed on the
clock distribution of those paths that have nonstandard modes of operation.
In ideal scaling of MOS devices, all linear dimensions and voltages are multiplied
by the factor 1/S, where S>1. Device dependent delays, such as
T Logic
T C Q
scale as 1/S while interconnect dominated delays such as
T Setup
T Skew
, and
remain
constant to first order, and if fringing capacitance and electro migration are considered,
actually increase with decreasing dimensions. Therefore, when examining the effects of
dimensional scaling on system reliability, (4.6) and (4.7) should be considered carefully.
One straight forward method to avoid the effect of technology scaling on those data paths
particularly susceptible to negative clock skew is to not scale the clock distribution lines.

Svensson and Afghahi show that by using courser than ordinary lines for the global clock
distribution, 20-mm-wide chip sizes with
CMOS circuits scaled to 0.3 m polysilicon lines would have comparable logic and crosschip interconnect delays (on the order of 0.5 ns), making possible synchronous clock
frequencies of up to 1 GHz. Therefore, the scaling of device technologies can severely
affect the design and operation of clock distribution networks, necessitating specialized
strategies and compensation techniques.
4.5.3 Enhancement of Synchronous Performance
Localized clock skew can be used to improve synchronous performance by
providing more time for the critical worst case data paths. By forcing Ci to lead Cf at each
critical local data path, excess time is shifted from the neighboring less critical local data
paths to the critical local data paths. This negative clock skew represents the additional
amount of time that the data signal at Ri has to propagate through the logic stages and
interconnect sections and into the final register. Negative clock skew subtracts from the
logic path delay, thereby decreasing the minimum clock period. Thus, applying negative
clock skew, in effect, increases the total time that a given critical data path has to
accomplish its functional requirements by giving the data signal released from Ri more
time to propagate through the logic and interconnect stages and latch into Rf . Thus, the
differences in delay between each local data path are minimized, thereby compensating
for any inefficient partitioning of the global data path into local data paths that may have
occurred, a common situation in many practical systems.
The maximum permissible negative clock skew of a data path, however, is
dependent upon the clock period itself as well as the time delay of the previous data
paths. This results from the structure of the serially cascaded local data paths making up
the global data path. Since a particular clock
signal synchronizes a register which
functions in a dual role, as the initial register of the next local data path and as the final
register of the previous data path, the earlier Ci is for a given data path, the earlier that
same clock signal, now Cf, is for the previous data path. Thus, the use of negative clock
skew in the ith path results in a positive clock skew for the preceding path, which may
then establish the new upper limit for the system clock frequency.
CHAPTER 5
CLOCK DISTRIBUTION TECHNIQUES
The most effective way to get the clock signal to every part of a chip that needs it,
with the lowest skew, is a metal grid. In a large microprocessor, the power used to drive
the clock signal can be over 30% of the total power used by the entire chip. The whole
structure with the gates at the ends and all amplifiers in between have to be loaded and
unloaded every cycle. To save energy, clock gating temporarily shuts off part of the tree.
The clock distribution network (or clock tree, when this network forms a tree)
distributes the clock signal(s) from a common point to all the elements that need it. Since
this function is vital to the operation of a synchronous system, much attention has been
given to the characteristics of these clock signals and the electrical networks used in their
distribution. Clock signals are often regarded as simple control signals; however, these
signals have some very special characteristics and attributes.
5.1 Clock Distribution Strategies

Many different approaches, from ad hoc to algorithmic, have been developed for
designing clock distribution networks in synchronous digital integrated circuits. The
requirement of distributing a tightly controlled clock signal to each synchronous register
on a large nonredundant hierarchically structured integrated circuit (an example floorplan
is shown in Fig. 5.1) within specific temporal bounds is difficult and problematic.
Furthermore, the tradeoffs that exist among system speed, physical die area, and power
dissipation are greatly affected by the clock distribution network. The design
methodology and structural topology of the clock distribution network should be
considered in the development of a system for distributing the clock signals.
Therefore, various clock distribution strategies have been developed. The most
common and general approach to equipotential clock distribution is the use of buffered
trees. In contrast to these asymmetric structures, symmetric trees, such as H-trees, are
used to distribute high-speed clock signals. In developing structured custom integrated
circuits, such as is illustrated by the floorplan pictured in Fig. 5.1, specific circuit design
techniques are used to control the signal properties within the clock distribution network.
Low-power design techniques are an area of significant currency and importance. Some

examples of different strategies for reducing the power dissipated within the clock
distribution network.
Figure 5.1 Floorplan of structured custom VLSI circuit using synchronous clock
distribution.
5.2 Buffered Clock Distribution Trees

The most common strategy for distributing clock signals in VLSI-based systems is
to insert buffers either at the clock source and/or along a clock path, forming a tree
structure.
Thus, the unique clock source is frequently described as the root of the tree, the initial
portion of the tree as the trunk, individual paths driving each register as the branches, and
the registers being driven as the leaves. This metaphor for describing a clock distribution
network is commonly accepted and used throughout the literature and is illustrated in Fig.
5.2.
Figure 5.2 Tree structure of clock distribution network.

Occasionally, a mesh version of the clock tree structure is used in which shunt
paths further down the clock distribution network are placed to minimize the interconnect
resistance within the clock tree. This mesh structure effectively places the branch
resistances in parallel, minimizing the clock skew.
An example of this mesh structure is described and illustrated in Section IX-B.
The mesh version of the clock tree is considered in this paper as an extended version of
the standard, more commonly used clock tree depicted in Fig. 5.2. The clock distribution
network is therefore typically organized as a rooted tree structure, as illustrated in Fig.
5.2. Various forms of a clock distribution network, including a trunk, tree, mesh, and Htree structures are illustrated in Fig. 5.3. If the interconnect resistance of the buffer at the
clock source is small as compared to the buffer output resistance, a single buffer is often
used to drive the entire clock distribution network. This strategy may be appropriate if the
clock is distributed entirely on metal, making load balancing of the network less critical.
The primary requirement of a single buffer system is that the buffer should provide
sufficient current to drive the network capacitance (both interconnect and fanout) while
maintaining high-quality waveform shapes (i.e., short transition times) and minimizing
the effects of the interconnect resistance by ensuring that the output resistance of the
buffer is much greater than the resistance of the interconnect section being driven.
Figure 5.3 Structures of clock distribution networks including a trunk, tree, mesh, and Htree.

An alternative approach to using only a single buffer at the clock source is to
distribute buffers throughout the clock distribution network, as shown in Fig. 5.2. This
approach requires additional area but greatly improves the precision and control of the
clock signal waveforms and is necessary if the resistance of the interconnect lines is
nonnegligible. The distributed buffers serve the double function of amplifying the clock
signals degraded by the distributed interconnect impedances and isolating the local clock
nets from upstream load impedances. A three-level buffer clock distribution network
utilizing this strategy is shown in Fig. 5.4. In this approach a single buffer drives multiple
clock paths (and buffers). The number of buffer stages between the clock source and each
clocked register depends upon the total capacitive loading, in the form of registers and
interconnect, and the permissible clock skews. It is worth noting that the buffers are a
primary source of the total clock skew within a well-balanced clock distribution network
since the active device characteristics vary much more greatly than the passive device
characteristics. The maximum number of buffers driven by a single buffer is determined
by the current drive of the source buffer and the capacitive load (assuming an MOS
technology) of the destination buffers. The final buffer along each clock path provides the
control signal of the driven register.
Figure 5.4 Three-level buffer clock distribution network.

Historically, the primary design goal in clock distribution networks has been to
ensure that a clock signal arrives at every register within the entire synchronous system at
precisely the same time. This concept of zero clock skew design has been extended, to
provide either a positive or a negative clock skew at a magnitude depending upon the
temporal characteristics of each local data path in order to improve system performance
and enhance system reliability.
5.3 Symmetric H-Tree Clock Distribution Networks

Another approach for distributing clock signals, a subset of the distributed buffer
approach depicted in Fig. 5.2, utilizes a hierarchy of planar symmetric H-tree or X-tree
structures (see Fig. 5.5) to ensure zero clock skew by maintaining the distributed
interconnect and buffers to be identical from the clock signal source to the clocked
register of each clock path.
Figure 5.5 Symmetric H-tree and X-tree clock distribution networks.

In this approach, the primary clock driver is connected to the center of the main
H structure. The clock signal is transmitted to the four corners of the main H. These
four close to identical clock signals provide the inputs to the next level of the H-tree
hierarchy, represented by the four smaller H structures. The distribution process
continues through several levels of progressively smaller H structures. The final
destination points of the H-tree are used to drive the local registers or are amplified by
local buffers which drive the local registers. Thus, each clock path from the clock source
to a clocked register has practically the same delay. The primary delay difference between
the clock signal paths is due to variations in process parameters that affect the
interconnect impedance and, in particular, any active distributed buffer amplifiers. The
amount of clock skew within an H-tree structured clock distribution network is strongly
dependent upon the physical size, the control of the semiconductor process, and the
degree to which active buffers and clocked latches are distributed within the H-tree
structure.
The conductor widths in H-tree structures are designed to progressively decrease
as the signal propagates to lower levels of the hierarchy. This strategy minimizes
reflections of the high-speed clock signals at the branching points. Specifically, the
impedance of the conductor leaving each branch point Z K+1must be twice the impedance

of the conductor providing the signal to the branch point ZK for an H-tree structure and
four times the impedance for an X-tree structure. This tapered H-tree structure is
illustrated in Fig. 5.6:
ZK=ZK+1/2
for an H-tree structure
(5.1)
Figure 5.6 Tapered H-tree clock distribution network.

The planar H-tree structure places constraints on the physical layout of the clock
distribution network as well as on the design methodology used in the development of the
VLSI system. For example, in an H-tree network, clock lines must be routed in both the
vertical and horizontal directions. For a standard two-level metal CMOS process, this
manhattan structure creates added difficulty in routing the clock lines without using either
resistive interconnect or multiple high resistance vias between the two metal lines. This
aspect is a primary reason for the development of three or more layers of metal in logicbased CMOS processes. Furthermore, the interconnect capacitance (and therefore the
power dissipation) is much greater for the H-tree as compared with the standard clock tree
since the total wire length tends to be much greater. This increased capacitance of the Htree structure exemplifies an important tradeoff between clock delay and clock skew in
the design of high-speed clock distribution networks. Symmetric structures are used to
minimize clock skew; however, an increase in clock signal delay is incurred. Therefore,
the increased clock delay must be considered when choosing between buffered tree and
H-tree clock distribution networks. Also, since clock skew only affects sequentiallyadjacent registers, the obvious advantages to using highly symmetric structures to
distribute clock signals are significantly degraded. There may, however, be certain
sequentially-adjacent registers distributed across the integrated circuit. For this situation,

a symmetric H-tree structure may be appropriate, particularly to distribute the global
portion of the clock network.
CHAPTER 6
CONCLUSIONS
An in-depth analysis of the synchronous digital circuits and clocking approaches
was presented. Clock skew and jitter has a major impact on the functionality and
performance of a system. Important parameters are the clocking scheme used and the
nature of the clock-generation and distribution network. Alternative timing approaches,
such as self-timed design, are becoming attractive to deal with clock distribution
problems. Self-timed design uses completion signals and handshaking logic to isolate
physical timing constraints from event ordering.
The connection of synchronous and asynchronous components introduces the risk
of synchronization failure. The introduction of synchronizers helps to reduce that risk, but
can never eliminate it. The key message of is that synchronization and timing are among
the most intriguing challenges facing the digital designer of the next decade.
The different timing methodologies were discussed based how the digital systems
are related with the clock signal. Some information regarding the timing parameters of
sequential circuit with relative to the clock signal are discussed. The sequential circuits
are analyzed from the point of view of propagation delay, contamination delay, setup time
and hold time. The spatial variation in arrival time of a clock transition on an integrated
circuit (i.e., Clock Skew) and the temporal variation of the clock period at a given point
(i.e., Clock Jitter) are explained in detail. The various clock distribution networks are
suggested. In these, H-Tree clock distribution network is used to minimize the clock
skew.
It is the intention of this report to integrate these various topics and to provide
some sense of cohesiveness to the field of timing, clocking and clock distribution
networks.
REFERENCES
[1] Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic, Digital Integrated Circuits,
2nd edition, Prentice Hall of India, New Delhi, 2003.
[2] John F. Wakerly, Digital Design Principles and Practices, 4th Edition, Pearson
Education, India, 2006.
[3] V. G. Oklobdzija, V. M. Stojanovic, D. M. Markovic, and N. M. Nedovic, Digital
System Clocking: High-Performance and Low Power Aspects, ISBN 0-471-27447-X,
IEEE Press/Wiley-Interscience, 2003.
[4] E. G. Friedman, Clock Distribution Networks in Synchronous Digital Integrated
Circuits, Proceedings of the IEEE, Vol. 89, No. 5, pp. 665-692, May 2001.
[5] A.H. Ajami, K. Banerjee, and M. Pedram, Modeling and Analysis of Non-uniform
Substrate Temperature Effects on Global ULSI Interconnects, IEEE Transactions on
Computer Aided Design of Integrated Circuit and Systems, Vol. 24, No. 6, pp. 1121-1127,
June 2005.
[6] http://www.stanford.edu/class/ee273/

Synchronus

Загружено:

Сведения о документе

Оригинальное название

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

Synchronus

Загружено:

Авторское право:

Доступные форматы

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

1.1 Synchronous Digital Systems

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

1.2 Importance of Clock

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

1.3 Organization of the Report

Department of Electronics & Communication Engineering

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

2.1 Synchronous Methodology

Figure 2.1 Synchronous interconnect methodology

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

2.2 Mesochronous Methodology

Figure 2.2 Mesochronous approach using variable delay line.

2.3 Plesiochronous Methodology

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

Figure 2.3 Plesiochronous communication using FIFO.

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

2.4 Asynchronous Methodology

Figure 2.4 Asynchronous methodology for simple pipeline interconnects.

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

R2signal switching at the output of R 1 will propagate to the input

Department of Electronics & Communication Engineering

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

Figure 3.1 Local data path.

Department of Electronics & Communication Engineering

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

Figure 3.2 Timing diagram of clocked data path.

Rf is shown in Fig. 3.1. The clock

3.2 Timing Parameters for Combinational Logic

Department of Electronics & Communication Engineering

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

Figure 3.3 Propagation and Contamination Delays in the Combinational Logic

3.3 Timing Parameters for Sequential Logic

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

Department of Electronics & Communication Engineering

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

Figure 3.4 Timing Parameters of Sequential Circuit

3.4 Determination of the Maximum Clock Frequency

Figure 3.5 Sequential Circuit with Corresponding Propagation Delays

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

T clk 0.6 ns +8 ns+ 0.4 ns

Therefore the minimum clock period should be 9ns.

Department of Electronics & Communication Engineering

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

Figure 4.1 Timing diagram of positive clock skew.

Figure 4.2 Timing diagram of negative clock skew.

Department of Electronics & Communication Engineering

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

Department of Electronics & Communication Engineering

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

Figure4.3 Positive and negative clock skew.

Department of Electronics & Communication Engineering

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

Figure4.4 Datapath structure with feedback.

4.2 Clock Jitter

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

Figure4.5 Circuit for studying the impact of jitter on performance.

Department of Electronics & Communication Engineering

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

Department of Electronics & Communication Engineering

TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

Table 4.1 Relationship for n and BER

4.3 Sources of Skew and Jitter