Basics of multi-cycle & false paths

Nitin Singh, Neha Agarwal, Arjun Pal Chowdhury - August 07, 2014

1 Introduction

One of the significant challenges for RTL designers is to identify the complete set of timing
exceptions up front. In complicated designs this becomes an iterative process, where additional
timing exceptions are identified from critical path or failing path analysis of timing reports. This
approach leaves a significant number of timing paths that are not real but never get identified,
because they do not show up in the critical path report. Synthesis and timing tools nevertheless
continue to expend resources optimizing these paths when that is not needed, which can also
increase the area and power consumption of the device.

The intent of this document is to provide examples of false and multi cycle path exceptions that are
easily missed even by experienced designers, and are usually identified only through iterations on
timing reports.

1.1 False Path

False paths are those timing arcs in a design where changes in the source register are not expected
to be captured by the destination register within a particular time interval. False paths can be
grouped by the following design topologies.

1.1.1 Static False Path

Static false paths are those timing arcs in a design where excitation of the source register has no
impact on the destination register. The timing path in these topologies can’t be sensitized by any
input vector, even if both the source and destination flops run on the same clock domain.

● EXAMPLE – 1

The following topology depicts an example of a static false path. Here, a change in D1.Q will never
cause any change in D4.D. The value of the D4 flop will always be governed by the value of D2.
Hence, for this particular circuit, D1->D4 can be treated as a false path.

Figure 1: Static timing path
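
As a rough illustration of what “can’t be sensitized by any input vector” means, the sketch below
brute-forces every input vector of a small logic cone and checks whether toggling one source ever
changes the output. The function d4_d is a hypothetical stand-in, not the circuit of Figure 1: it
assumes D1 only selects between two branches that both carry D2.

```python
from itertools import product


def d4_d(d1: int, d2: int) -> int:
    """Hypothetical logic in front of D4.D (an assumption, not Figure 1).

    D1 selects between two branches, but both branches carry D2,
    so the output always equals D2.
    """
    return (d1 & d2) | ((1 - d1) & d2)


def can_sensitize(func, n_inputs: int, source: int) -> bool:
    """Return True if toggling input `source` changes the output for some vector."""
    for vec in product((0, 1), repeat=n_inputs):
        flipped = list(vec)
        flipped[source] ^= 1
        if func(*vec) != func(*flipped):
            return True
    return False


print(can_sensitize(d4_d, 2, source=0))  # False: D1 -> D4 is a static false path
print(can_sensitize(d4_d, 2, source=1))  # True:  D2 -> D4 is a real path
```

The exhaustive loop is only practical for a handful of inputs; it is meant to show the idea, not to
replace analysis of the real netlist.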

● EXAMPLE - 2

In a system a control bit is defined such that if the bit is SET, the clock going to DMA controller will
remain enabled in Low Power (STOP) mode, otherwise the clock will be gated in STOP mode.

Figure 2: DMA Clock gating

In the above circuit, when the system is in RUN mode, the clock gating cell always remains enabled,
and any change in the dma_en control register does not affect the clock gating enable generation
logic. The user is expected to write this control register in RUN mode, before entering the low
power STOP mode. Once the system enters STOP mode, the pre-programmed value of the dma_en register
governs the state of the clock gating cell. The single cycle timing path is the one from
“stop_mode_reg -> CG_cell”. The path “dma_en_reg -> CG_cell” is a false path. Clock skew and cell
placement in this topology could cause timing problems even if there is very little combinational
delay on the dma_en_reg -> CG_cell path. Giving the right exceptions to the timing tool helps it
optimize the cell placement within the first iteration.

● EXAMPLE-3

IO output timing arcs are usually critical and are a limiting factor in deciding the maximum baud
rate supported by a synchronous protocol interface. It is therefore important to identify the valid
REG->OUT timing arcs and to isolate the invalid REG->OUT paths by constraining them as false paths.
This can save a lot of the iteration time and effort that timing and placement tools spend on
meeting the IO spec of synchronous interfaces.

Usually, the protocol modules need to be configured properly before any data transaction starts.
These configuration registers are considered static during a data transaction. However, timing arcs
can still exist from these configuration registers to the output pads. This is another example
where upfront design constraints ease the timing tool’s task of meeting the target frequency.

In the topology given below, the configuration registers (config1, config2 and config3) are supposed
to be programmed with a particular value before any data transaction is made. Once the data
transaction has been initiated, the valid timing arc is shift register->combo3->IO buffer->PAD.
The timing arcs that originate from config1/2/3 and terminate at PAD can be disabled by applying
false path exceptions.

Figure 3: I/O path from configuration registers

However, the above exceptions need to be reviewed carefully so that no corner case is left
uncovered. For example, in a half duplex communication or memory interface the data direction can
change on the fly. In that case timing may have to be met on the “data direction register -> PAD”
path, and it can’t be treated as a false path.

1.1.2 False Reset Timing Arc

During device start-up, not all the application modules of a device need to remain enabled.
Hence, by default, the clock to those modules is gated. During the system reset de-assertion
sequence, the reset to those modules is de-asserted in the absence of the clock. As there is no
clock, there is no chance of metastability due to a reset de-assertion timing violation. Hence the
asynchronous reset recovery/de-assertion paths to those modules can be considered false paths.

1.1.3 Asynchronous False Path (CDC path)

If the clock domain of the source register is asynchronous to the clock domain of the destination
register, the path is considered an asynchronous or Clock Domain Crossing (CDC) path. It is not
possible to have any timing correlation on these paths because there is no defined relationship
between the clock edges of the launching and capturing domains. These paths can be treated as false
paths for timing analysis. In this case, it is the designer’s responsibility to guard against
setup/hold violations at the capturing domain registers. This section discusses some popular
synchronization techniques that help to avoid metastability problems in designs with asynchronous
clocks. Designers should provide these exceptions to the timing tool before synthesis.
Sometimes an asynchronous design is used with synchronous clocks in SOC integration. In such cases
these exceptions are particularly useful, since the timing tool doesn’t need to optimize the timing
paths where synchronization topologies are used.

● Two Flop Synchronizer:

When a signal crosses from one clock domain to another, it needs to be synchronized before it is
used in the destination domain. The destination register can experience setup/hold violations since
there is no relationship between the launching and capturing clock edges. Metastability in the
first flop of the synchronizer is blocked by the second flop, on the assumption that the first flop
settles to a stable value before the next clock edge arrives at the second flop. The probability of
metastability occurring on both flops depends directly on the frequency of the clock source.
The settling time after a violation depends on the flop characteristics of the particular technology
library, and it has to be shorter than the period of the fastest clock supported in the design.

Figure 4: Two Flop Synchronizer

Please note that the DB1 signal will not necessarily settle to a high value after the metastability
period. The design therefore has to keep the DA signal stable for more than one cycle of the
destination clock. If the design can’t ensure this stability, a change of the signal may go
unregistered in the destination domain.

A two flop synchronizer is generally used to synchronize control signals across clock domains.
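
The stability requirement on DA can be illustrated with a small behavioural model. It is a sketch
under simplifying assumptions: destination clock edges fall at integer multiples of the period, DA
is a single pulse, and metastability itself is not modelled (the first flop simply samples 0 or 1).
The function and parameter names are made up for this example.

```python
def sample_da(pulse_start, pulse_width, clk_period, n_edges):
    """Value of DA seen at each rising edge of the destination clock.

    Edges are assumed to fall at t = 0, clk_period, 2*clk_period, ...
    DA is high for pulse_start <= t < pulse_start + pulse_width.
    """
    pulse_end = pulse_start + pulse_width
    return [1 if pulse_start <= k * clk_period < pulse_end else 0
            for k in range(n_edges)]


def two_flop_sync(da_at_edges):
    """Return DB2 after each destination clock edge (both flops reset to 0)."""
    db1 = db2 = 0
    db2_trace = []
    for da in da_at_edges:
        # Both flops update on the same edge: DB2 takes the old DB1.
        db1, db2 = da, db1
        db2_trace.append(db2)
    return db2_trace


CLK_PERIOD = 10  # arbitrary time units

# Pulse shorter than one destination clock period, falling between two edges.
short_pulse = two_flop_sync(sample_da(12, 6, CLK_PERIOD, 6))
# Pulse longer than one destination clock period: at least one edge sees it.
long_pulse = two_flop_sync(sample_da(12, 15, CLK_PERIOD, 6))

print(short_pulse)  # [0, 0, 0, 0, 0, 0] -> the change is never registered
print(long_pulse)   # [0, 0, 0, 1, 0, 0] -> DB2 shows it one edge after DB1
```

In real silicon the first flop may resolve to either value when DA changes close to a clock edge,
which is one more reason to hold DA for longer than one destination clock cycle.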

● Mux Synchronizer

The mux synchronizer topology is generally used when the designer has to transfer a multi-bit
data bus across clock domains. In this topology, the clock domain crossing of the data bus
is controlled by the mux select signal, which is enabled once the input data has become valid at
the mux input. This ensures that the destination flop never becomes metastable due to a change in
its data input.

The designer has to ensure that the input data remains stable while the mux selects its input path
and changes only while the mux selects the feedback path.

Figure 6: Mux Synchronizer
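
The recirculating behaviour of the destination flop can be sketched as follows. This is a
cycle-based model with invented signal values, and it assumes the select signal has already been
synchronized into the destination domain; it does not model the synchronizer for the select itself.

```python
def mux_synchronizer(bus_at_edges, select_at_edges):
    """Return the destination flop value after each destination clock edge.

    select = 1 picks the source-domain bus, select = 0 picks the feedback path.
    The flop is assumed to reset to 0.
    """
    q = 0
    q_trace = []
    for data, select in zip(bus_at_edges, select_at_edges):
        q = data if select else q   # feedback path holds the old value
        q_trace.append(q)
    return q_trace


# The bus changes at edges 1 and 4, while select is low (feedback path chosen).
bus    = [0x00, 0xA5, 0xA5, 0xA5, 0x3C, 0x3C]
select = [0,    0,    1,    0,    0,    1]

print([hex(v) for v in mux_synchronizer(bus, select)])
# ['0x0', '0x0', '0xa5', '0xa5', '0xa5', '0x3c']
```

The destination flop only ever samples the bus on edges where the bus has been stable, which is the
property the designer has to guarantee.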

● FIFO

FIFOs are often used to safely pass data from one clock domain to another asynchronous clock
domain. Using a FIFO to pass data from one clock domain to another requires
multi-asynchronous clock design techniques.

Figure 7: Asynchronous Fifo

The write pointer points to the next memory location to be written. The read pointer points to the
current FIFO location to be read. The write pointer changes on the write clock and the read pointer
changes on the read clock; however, both pointers cross clock domains to feed the FIFO full/empty
logic. Hence it is important to synchronize them into the destination clock domain before using
them in the full/empty logic. Gray encoding is preferred for multi-bit signals before they cross a
clock domain, since only one bit changes between successive values.
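
The single-bit-change property of Gray encoding can be checked with a few lines of code. The sketch
below assumes a FIFO depth that is a power of two (8 entries here, chosen only for illustration), so
that the wrap-around from the last pointer value to zero also changes a single bit.

```python
def bin_to_gray(n: int) -> int:
    """Convert a binary pointer value to its Gray code."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Convert a Gray-coded value back to binary."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


ADDR_BITS = 3              # illustrative pointer width
DEPTH = 1 << ADDR_BITS     # power-of-two depth, including wrap-around

worst_binary = max(bits_changed(p, (p + 1) % DEPTH) for p in range(DEPTH))
worst_gray = max(bits_changed(bin_to_gray(p), bin_to_gray((p + 1) % DEPTH))
                 for p in range(DEPTH))

print(worst_binary)  # 3: e.g. 011 -> 100 toggles every bit
print(worst_gray)    # 1: every increment toggles exactly one bit
```

With only one bit in flight, a synchronizer that samples during the change sees either the old or
the new pointer value, never an unrelated one.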

● Handshaking

One of the popular methods of sending data from one clock domain to another is a handshaking
protocol, where the sender sends the data along with a request and the receiver acknowledges the
data by sending an acknowledgement. Both the request and the acknowledgement signals are
synchronized in their destination clock domain before being used inside that domain’s state machine.

Figure 8: Handshaking scheme

1.1.4 Pseudo Asynchronous False Path

Pseudo asynchronous false paths are those where the source and destination registers are both
clocked by the same clock, or by clocks derived from the same source, but are designed in such a way
that meeting the timing specification is impossible and timing can be ignored. The difference in
clock and data skews between the source and destination registers makes it virtually impossible to
meet timing. In such cases, designers need to ensure that there is no stringent timing requirement
on these paths.

Figure 9: Pseudo Async False Path

In the above design, the PLL clock is used to generate system_clk, along with other on-chip/off-chip
clock sources. Although system_clk is derived from the PLL, there is a large uncommon path between
the PLL output and the system clock because of the functional and test clock muxes, the postscaler
and the clock gating cell further down the clock path.

The circuit working on the PLL output can communicate with logic running on the system clock. Since
the system clock is derived from the PLL, both clocks are treated as synchronous and the tool will
try to meet timing between them. However, the large uncommon clock path can make timing very hard
for the tool to meet. This kind of pseudo asynchronous path can be treated as a false path. However,
treating it as a false path can’t prevent “capture_intc_reg” from becoming metastable, since no
timing is ensured, and this can lead to functional failure.

In the above design, a two flop synchronizer has been placed to block metastability from
propagating to the rest of the system. Now the “D1_reg->Sync1_reg” path can simply be treated as a
false path.

Note: In the above circuit, the designer should ensure that the delay added by the two flop
synchronizer won’t cause any protocol or functional issue in the design.

1.1 Multi Cycle Path

The maximum frequency of a synchronous system is determined by the delay of its longest
sensitizable path. However, in complex high frequency designs there can be some combinatorial paths
whose propagation delay exceeds the period of the maximum operating clock frequency. If there is no
requirement for the signal to propagate to the next sequential element within one clock cycle, then
proper design techniques need to be implemented to make it a valid Multicycle path. A Multicycle
path in a sequential circuit is a combinational path that doesn’t have to complete the propagation
of its signals within one clock cycle. For a Multicycle path of N, the design should ensure that
the signal transition propagates from source to destination within N clock cycles. So if the
longest path of a system is a Multicycle path, it doesn’t limit the operating frequency of the
system. Some popular Multicycle path designs are described below.
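
The effect of a Multicycle path of N on the setup check can be written as simple arithmetic. The
sketch below assumes an idealized path with no clock skew or uncertainty, and all delay values are
invented for illustration; a real timing tool accounts for many more terms.

```python
def setup_slack_ps(clk_period_ps, n_cycles, clk_to_q_ps, comb_delay_ps, setup_ps):
    """Setup slack when the capture edge is n_cycles after the launch edge.

    Idealized: no clock skew, jitter or uncertainty is included.
    """
    required = n_cycles * clk_period_ps - setup_ps   # latest allowed arrival
    arrival = clk_to_q_ps + comb_delay_ps            # actual data arrival
    return required - arrival


# Hypothetical numbers, in picoseconds.
PERIOD, CLK_TO_Q, COMB, SETUP = 2000, 150, 3200, 100

print(setup_slack_ps(PERIOD, 1, CLK_TO_Q, COMB, SETUP))  # -1450: fails as a single cycle path
print(setup_slack_ps(PERIOD, 2, CLK_TO_Q, COMB, SETUP))  # 550: passes as a Multicycle path of 2
```

The relaxed check is only valid if the design really holds the source data and delays the capture
for N cycles, which is what the circuits in the following sections are for.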

1.1.1 MCP of 2

If, in a high frequency system, two inter-communicating sub modules operate with a clock ratio
of 1:2 and there is no requirement to capture the source signal transition within one clock cycle
of the faster clock, a multi cycle of two circuit can be implemented to handle the situation.

Figure 10: MCP Example

In the above circuit, data is launched from flop D1 and captured by a flop that runs twice as fast
as the launching flop’s clock. This is a half cycle path of clk, or a single cycle path of clk_2x.
If the propagation delay of the combinatorial logic in between is large, it may not be possible to
meet timing on this half cycle path. If there is no requirement for the input data to be captured
by the very next positive edge of the 2x clock, a suitable circuit needs to be introduced to relax
the timing requirement.

In the above circuit, the output of F3 is used to enable the capture gate mux. The capture gate
enable (F3.Q) makes F1_reg->F2_reg a Multicycle path of two with respect to clk_fast (clk_2x). A
similar circuit can be implemented when the data signal crosses from clk_2x -> clk.

Note: In the above circuit it is assumed that the first positive edges of clk_fast and clock_slow
always occur together and remain aligned. A more complex but efficient “capture_gate enable”
circuit can be designed when the ratio is more than 2:1.
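
The behaviour of the capture gate enable can be sketched with a tiny cycle-based model. Several
things here are assumptions made for the sketch rather than details taken from Figure 10: F3 is
modelled as a flop that toggles on every clk_fast edge, its value before edge 0 is taken as 1, and
clk rising edges coincide with the even clk_fast edges.

```python
def capture_edges(n_fast_edges, gated=True):
    """clk_fast edge indices at which the capture flop loads new data."""
    f3_q = 1                      # assumed value of F3.Q before edge 0
    edges = []
    for edge in range(n_fast_edges):
        if f3_q or not gated:     # mux passes new data only when enabled
            edges.append(edge)
        f3_q ^= 1                 # F3 toggles on every clk_fast edge
    return edges


def launch_to_capture(n_fast_edges, gated=True):
    """Map each launch edge to the number of clk_fast cycles until capture."""
    captures = capture_edges(n_fast_edges, gated)
    cycles = {}
    for launch in range(0, n_fast_edges, 2):   # clk edges = even clk_fast edges
        later = [c for c in captures if c > launch]
        if later:
            cycles[launch] = later[0] - launch
    return cycles


print(launch_to_capture(8, gated=False))  # {0: 1, 2: 1, 4: 1, 6: 1}
print(launch_to_capture(8, gated=True))   # {0: 2, 2: 2, 4: 2}
```

Without the gate every launch is captured one clk_fast cycle later; with it, capture always happens
two clk_fast cycles after launch, which is what justifies constraining the path as an MCP of 2.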

1.1.2 MCP of n cycle


Figure 11: MCP Example

The above circuit is a Multicycle path of n-1 cycles from D1_reg->D2_reg. When the input data
becomes valid at the M1 mux input, the state machine (FSM) generates a pulse for one clock cycle.
The data is captured by the D1 flop in the next cycle. The pulse propagates through a shift register
consisting of ‘n’ flops, and after “n-1” clock cycles the M2 mux enables the input data to reach
the D2 flop. In the next cycle it is captured by the D2 flop. In this way the timing can be relaxed
to compensate for the large propagation delay of the combinatorial logic between the D1 and D2
flops.

In the above timing diagram N is equal to 3. So the path from D1_reg -> D2_reg can be constrained
as MCP of 2.

1.1.3 Multi Cycle path at peripheral read/write interface

In some low-performance, low-bandwidth peripheral register read/write interfaces (programmer’s
model interfaces), a data transaction happens in two phases. The first cycle of the transfer is
called the setup phase. During this phase ADDR, DATA (in case of a write) and the other control
signals become valid at the peripheral boundary. At the next clock edge the module enable signal is
asserted, indicating the second phase of the read/write transfer. The address and data signals are
latched by the peripheral at the end of the ENABLE cycle, which provides one extra cycle of margin
for the ADDR and DATA signals to reach the peripheral boundary.

WRITE CYCLE:

In a data write cycle, the address lines and the write data bus become valid at the 2nd clock edge
(beginning of the T2 cycle) and are registered by the peripheral at the 4th clock edge (beginning
of the T4 cycle). Let’s assume the source registers are named PERIPH_BUS_CONTROLLER_ADDR_reg and
PERIPH_BUS_CONTROLLER_WDATA_reg, and the destination register is PERIPH_WDATA_CAPT_reg.

An MCP of 2 for the write cycle can be applied on the paths below:

PERIPH_BUS_CONTROLLER_ADDR_reg -> PERIPH_WDATA_CAPT_reg

PERIPH_BUS_CONTROLLER_WDATA_reg -> PERIPH_WDATA_CAPT_reg

PERIPH_BUS_CONTROLLER_SEL_reg -> PERIPH_WDATA_CAPT_reg

READ CYCLE:

Similarly, for a read transfer the address to the peripheral becomes valid at the 2nd clock edge,
and the peripheral puts valid data on the read data bus in the T3 cycle, so the read data is valid
at the 4th clock edge. Here the source register is PERIPH_BUS_CONTROLLER_ADDR_reg and the
destination register is PERIPH_BUS_CONTROLLER_RDATA_CAPT_reg.

An MCP of 2 for the read cycle can be applied on the paths below:

PERIPH_BUS_CONTROLLER_ADDR_reg -> PERIPH_BUS_CONTROLLER_RDATA_CAPT_reg

PERIPH_BUS_CONTROLLER_SEL_reg -> PERIPH_BUS_CONTROLLER_RDATA_CAPT_reg

Note: The AMBA peripheral bus is one such interface that has a two cycle access to peripherals.

1.1.4 Multi Cycle access to slow peripheral

In a high performance peripheral bus, a data read/write access happens in a single cycle. However,
if some peripherals in that system are not required to run as fast as the rest of the system, each
read/write access to them can be turned into a Multicycle access by asserting wait states to the
peripheral bus controller.

In the above example, a peripheral runs at half the system clock frequency. A gasket has been
placed to generate a one cycle wait state to the peripheral bus controller for each read/write
access. Without any wait state (for fast access), the bus controller would keep the address,
control and data (for write) signals valid for one cycle of sys_clk. During a read/write access to
the slow peripheral, the gasket generates a wait state (“transfer_wait” in the diagram) for one
cycle, which extends the ADDR, DATA and control signals for one more cycle of sys_clk.

Address and write data become valid at the peripheral boundary at the 3rd edge of sys_clk. The
peripheral latches the address and write data bus on the 5th edge of sys_clk (3rd edge of periph
clk), when periph_module_en is asserted by the gasket. Hence the write access to this peripheral is
an MCP of 2 with respect to the system clock (sys_clk).

During a read operation, the peripheral puts valid data on the read data bus, which is captured by
the master at the 5th active clock edge of the system clock. Hence the path that originates from
bus_controller_address_reg and terminates at “bus_controller_read_data_capt_reg” is an MCP of 2.

Note: The path that originates from the periph_module_en generation register and terminates at the
peripheral’s write data capture register or the bus_controller_read_data capture register is still
a half cycle path of periph_clk, or a single cycle path of sys_clk. Similarly, the path from the
transfer_wait generation register to bus_controller is also a half cycle path of periph_clk, or a
single cycle path of sys_clk.

1.2 Conclusion

Designing proper false or multi cycle paths and using the constraints during timing analysis helps to
close the timing of a high frequency system. At the same time, providing wrong constraints can lead
to catastrophic failure of the device. Designers should be very careful while designing or providing
constraints for synthesis or timing analysis.

The authors work at Freescale Semiconductor, India.
