DDR interface gate-level simulation advantages

Abhinav Gaur, Kushagra Khorwal - July 28, 2015

No matter how advanced Static Timing Analysis (STA) tools become, running gate-level
simulation (GLS) still has many advantages, since it can uncover hidden design issues
which are difficult to find in RTL simulations. It gives a clear picture of how the design will behave at
the desired frequency with actual delays in place. Hence, although GLS has its own set of challenges
(like setup issues, long run time, etc.), it is still very much a part of the sign-off process, and acts as
a design-quality confidence booster.

Why GLS is important for DDR memory interface

The JEDEC standard defines a lot of timing parameters for DDR memories which need to be followed
for ensuring correct operation. The timing requirements are strict and need to be implemented with
a lot of caution. For example, the duty cycle requirement for data strobe signal (commonly known as
DQS) is around 45-55%; if this spec gets violated during either write (where the memory controller
will drive the strobe signals to the external memory) or read (where the memory drives the strobe
signals), then we cannot ensure data sanity. The physical implementation is a challenge in itself
since the designer has to take care that there is no deviation from the standard for various timing
parameters. DDR implementation is also difficult because it involves many custom procedures.
Hence, we cannot rely solely on STA tools for DDR, and running GLS becomes a “must” to
ensure correct physical implementation of the design. Running GLS for DDR interfaces is
challenging, since it involves testbench setup issues, multiple timing checks, skew checks,
various data transfer modes (burst lengths 4 and 8), various standards to conform to (like DDR3,
DDR3L, LPDDR2, etc.) and so on.

This paper highlights the types of issues that can be found by running gate level
simulations for DDR, how they can be fixed and implemented correctly in the design, and what
the Physical Design/STA team should do to avoid such issues.

Gate level simulations for DDR memory interface – Sample issues, challenges & Solutions

Duty cycle of clock coming to DDR memory controller

The quality of clock coming to the DDR controller is an important parameter since most of the
signals being driven to the external memory by the controller (like DDR clock, data strobes, etc.) are
derived from this clock itself.

Generic diagram for a SoC having DDR controller


The duty cycle of clock coming to the external memory should be in the range of 47-53% for
ensuring correct operation. If the duty cycle is out of range from the source itself (which is the clock
coming to the controller), one is bound to see huge difference between high period and low period of
the clock, which can lead to violation of various parameters like tdqsh_min, tdqsl_min, etc. STA has
checks for this, but any error or mismatch can lead to write/read failures.

The failure in case of WRITE looks quite apparent but this issue can lead to failure in READ
operation as well! It is important to know that during READ, the DQS being driven by the DDR
memory is internally derived from the DDR clock it receives. So if the memory is getting a clock
having bad duty cycle, the DQS it generates during read can violate specs like read postamble
period (trpst), etc.

Hence, the designer must ensure that the path from the clock source output (for example, a PLL) to
the clock input of the controller introduces minimum skew between the high and low
periods of the clock.
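
The duty cycle window discussed in this section can be turned into a simple check on edge times measured from a GLS waveform. The following is a minimal sketch only: the function names and the example edge times are hypothetical, and the 47-53% limits are simply the figures quoted above.

```python
def duty_cycle_percent(rise_ps, fall_ps, next_rise_ps):
    """Return the high-time percentage of one clock period (times in ps)."""
    if not (rise_ps < fall_ps < next_rise_ps):
        raise ValueError("edges must satisfy rise < fall < next_rise")
    period = next_rise_ps - rise_ps
    return 100.0 * (fall_ps - rise_ps) / period


def duty_cycle_ok(rise_ps, fall_ps, next_rise_ps, lo=47.0, hi=53.0):
    """Check the measured duty cycle against the window quoted in the text."""
    return lo <= duty_cycle_percent(rise_ps, fall_ps, next_rise_ps) <= hi


# Hypothetical 1250 ps period: a 560 ps high phase is 44.8% and fails,
# while a 620 ps high phase is 49.6% and passes.
print(duty_cycle_ok(0, 560, 1250))
print(duty_cycle_ok(0, 620, 1250))
```

Running the same check at the PLL output, at the controller clock input and at the pad would show where along the path the high and low periods start to diverge.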

Read DQS gating issues

Read DQS gating is a feature wherein the DQS is gated to the read circuit of the controller until a
read operation actually starts. This is done to prevent the read FIFOs of the controller from getting
corrupted. The read DQS gating is disabled at the beginning of READ and is again enabled at the
end of read operation. See below example:

First signal is the gated read DQS signal, second is the ungated DQS signal coming from PAD and
n_52 is the enable signal for read DQS gating. From above waveform, it can be clearly seen that late
enabling of read DQS gating at the end of read has caused “x” captured on the gated read DQS
signal.

Also, since it is an asynchronous event, the controller must be programmed correctly by software so
that read DQS gating is disabled at the right time.

If read DQS gating is getting disabled for a particular DQS byte late as compared to other DQS, it
can lead to a miss of complete read DQS pulse for that byte. See below example:

The first 4 signals correspond to DQS0 and next 4 correspond to DQS1. Since the read DQS gating is
getting disabled later for DQS0, it is leading to a complete miss of one DQS pulse (“rd_dqs” has 4
pulses whereas “rd_dqs_gated” has only 3). On the other hand, for DQS1 (the lower 4 signals), the
DQS gating enabling/disabling is happening correctly.

Hence, the timing team must ensure that the delay of the read DQS gating disable signal is the same for all
the bytes. There should be a timing check on the read DQS gating de-assertion time with respect to the
read preamble.
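
The missed-pulse effect described above can be illustrated with a small model. This is a simplified sketch, not the controller's actual gating logic: it assumes a pulse survives only if it lies entirely inside the window in which gating is disabled, and all names and times are hypothetical. In a real netlist a late gate opening could also produce a truncated pulse rather than a clean miss.

```python
def gated_pulses(pulses, gate_open_ps, gate_close_ps):
    """Return the DQS pulses that lie entirely inside the gate-open window.

    pulses: list of (rise_ps, fall_ps) tuples for the ungated strobe.
    """
    return [(r, f) for (r, f) in pulses
            if gate_open_ps <= r and f <= gate_close_ps]


# Four ungated strobe pulses with a hypothetical 1250 ps period.
rd_dqs = [(1000 + i * 1250, 1625 + i * 1250) for i in range(4)]

# Gating disabled before the first rising edge: all four pulses pass.
on_time = gated_pulses(rd_dqs, gate_open_ps=600, gate_close_ps=5800)
# Gating disabled 100 ps after the first rising edge: first pulse is lost.
late = gated_pulses(rd_dqs, gate_open_ps=1100, gate_close_ps=5800)

print(len(on_time), len(late))  # 4 3
```

Comparing the pulse count per byte lane in this way is a quick method to spot a lane whose gating disable arrives later than the others.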

Read circuit corruption in case Read DQS gating feature is not used

If read DQS gating is not used, a pull-down is used on the DQS pads to protect the read circuit
from getting corrupted during write operations. Generally, at the end of the write operation, the output
path of the DQS pad gets disabled and the input path gets enabled. However, if the input path gets enabled
just a bit earlier (at the end of write), it can lead to “x” corruption on the input path of the DQS pad, which in
turn corrupts the read circuit of the controller. Hence, the timing team must ensure that the
input path of the DQS pad gets enabled only after the write operation has completed.

Considerations in case of loopback operation

In case of loopback, there is no external memory attached to the SoC. The data is looped back from
the DDR pads and stored in the read FIFO. The loopback feature is used to ensure correct operation
of the read/write path and can be used for testing purposes. In case of loopback, whenever a WRITE
command is sent to the controller, it enables both the output path and the input path of the
DQS pads. But with delays in the picture, contention can occur at the end of data. The timing team
must ensure that de-assertion of “ibe” (input path enable of pad) of the DQS pad happens only when
the data has been captured completely by the read circuit, otherwise it can either lead to “x”
corruption of input path of DQS pad or complete miss of data while reading data during loopback.
See below example:

The signal ipp_ibe_DDR0_DQS1 is the input buffer enable of the DQS1 pad and ipp_obe_DDR0_DQS1
is the output buffer enable. For loopback operation, both are “1”. DDR0_DQS1 is the pad signal,
ipp_do_DDR0_DQS1 is the output path of the pad and ipp_ind_DDR0_DQS1 is the input path. Since
the ibe de-assertion is happening a bit earlier than the falling edge of data on the pad, it is leading to
corruption of input path of the pad (ipp_ind_DDR0_DQS1 going “x”).
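
The ordering requirement on “ibe” can be expressed as a check on event times taken from the waveform. The sketch below is illustrative only: the function name, the second pad name and all time values are hypothetical, and it assumes the time of the last capture by the read circuit is known for each pad.

```python
def early_ibe_pads(events):
    """Return pads whose ibe falls before the last looped-back edge is captured.

    events: dict mapping pad name -> (ibe_fall_ps, last_capture_ps).
    """
    return [pad for pad, (ibe_fall_ps, last_capture_ps) in events.items()
            if ibe_fall_ps < last_capture_ps]


# Hypothetical event times in ps: DQS1's ibe de-asserts 50 ps too early.
events = {
    "DDR0_DQS0": (10600, 10500),
    "DDR0_DQS1": (10450, 10500),
}
print(early_ibe_pads(events))  # ['DDR0_DQS1']
```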

Trimming and fine tuning options at PAD level

There are various trimming options available for DDR pads – like fine tuning the duty cycle of signal,
controlling the cross point in case of differential pads (DQS & CLK), controlling the delays for output
path of the pad, etc. However, these trimming options must be used in a minimal way during GLS
and most of the conditions must be met through timing itself. This is important so that we are
stressing the design fully without any trimming relaxations.

Paying attention to various timing violations signalled by the DDR memory model

It is important to review each and every warning/error being signaled by the DDR memory model in
GLS simulations. When actual delays come into picture in GLS, the model responds to violation of
various timing parameters which may seem trivial in nature, but it is important to get them resolved.
For example, the end of a WRITE burst operation is known as the write postamble period (twpst). The
general understanding is that the write postamble period should be greater than the twpst_min parameter.
However, this is sufficient only if twpst_min is not less than tdqsl_min (minimum DQS low period). In case
tdqsl_min > twpst_min, then tdqsl_min should be met for the write postamble period. This way, we
maintain both the tdqsl_min and the twpst_min parameters in implementation.
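
The postamble rule above amounts to taking the larger of the two minimums. A minimal sketch follows, with times expressed as fractions of the clock period; the function names and the numeric values are illustrative assumptions and should be replaced by the figures from the relevant memory datasheet.

```python
def required_write_postamble(twpst_min, tdqsl_min):
    """The postamble must satisfy both limits, so the larger one governs."""
    return max(twpst_min, tdqsl_min)


def postamble_ok(measured, twpst_min, tdqsl_min):
    """True if the measured postamble meets both twpst_min and tdqsl_min."""
    return measured >= required_write_postamble(twpst_min, tdqsl_min)


# Illustrative values in units of tCK (assumed, not taken from the article).
TWPST_MIN = 0.30
TDQSL_MIN = 0.45

print(required_write_postamble(TWPST_MIN, TDQSL_MIN))  # 0.45
print(postamble_ok(0.40, TWPST_MIN, TDQSL_MIN))        # False
print(postamble_ok(0.50, TWPST_MIN, TDQSL_MIN))        # True
```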

Considerations for the Physical Design team

Partitioned Design – Skew between chip_top and Block

Generally, DDR controller is implemented as a hard block in SoCs. The data channels and strobes
traverse through a long region physically based on the pad locations. DDR controller Block and Pads
need to be abutted to each other so that the data to/from the pads is fed directly from the DDR block
without any chance of net or cell variation. Many times this requirement may not be feasible due to
physical / floorplan constraints. In those cases, skew balancing should take into account the
channels routed both at chip_top and inside the block. For example, consider that channel A is placed far away,
while channel B is near the DDR controller block. The skew balancing algorithm should take the
maximum distance amongst all data channels for balancing. There should be buffers placed at
regular intervals both at chip-top and inside block. All the spiral routing on channel B (near the pad)
should be done inside DDR controller block looking at the max distance provided by channel B.
Consider Fig 1 which shows the arrangement where the skew needs to be met for all data channels
through the controller and PHY to PADS. Fig 2 presents a scenario where part of the channel is
sitting on chip-top and part inside block.

Fig 1: Data Transmission between controller and DDR PHY


Fig 2: Data Transmission between controller and DDR PHY
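
The balancing idea described in this section, where every channel is padded up to the longest one, can be sketched as follows. This is only an illustration under assumed numbers: the channel names match the example in the text, but the delay values and the function name are hypothetical, and a real flow would do this per corner in the place-and-route tool.

```python
def balancing_delays(channel_delay_ps):
    """Return the extra delay each channel needs to match the slowest one.

    channel_delay_ps: dict mapping channel name -> total delay in ps
    (chip_top portion plus the portion inside the block).
    """
    target = max(channel_delay_ps.values())
    return {name: target - delay for name, delay in channel_delay_ps.items()}


# Hypothetical total delays: channel A is far from the block, B is near.
delays = {"channel_A": 420, "channel_B": 150}
print(balancing_delays(delays))  # {'channel_A': 0, 'channel_B': 270}
```

In this example the 270 ps of padding for channel B corresponds to the extra routing or buffering that the text recommends placing inside the DDR controller block.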

OBE Assertion and De-assertion

OBE (Output Buffer Enable on pads) assertion and de-assertion should be timed properly across
corners with respect to Data/DQS. OBE assertion and de-assertion play an important role in read DQS
gating, preamble and postamble times. These timings could be modeled in terms of skew checks
between the DQS, Data and OBE signals.

Pads Configurability

The DDR pads should provide delay and duty cycle configurability across PVT. This
could be used for both debug and functional purposes in the field. The resolution of the delay should be
as fine as possible. This additional feature could help with:

- Improved Eye diagrams using Bit Skewing and De-Skewing on Data channels.

- Improved Duty Cycle for Skewed N and P lots of Silicon

Rise/Fall on both Data and Strobe

To keep skew across PVT within a certain limit, both the rise and fall edges of data and strobes must be
looked into. One might argue that since data and DQS are generated from the same clock edge, only the rise and fall
of data (from a symmetric eye) need to be considered against the respective edge of the strobe. But this isn’t
true. The reason is that the memory always captures the data launched by the controller at the next
T/4 shifted edge. So, if the controller launches the data on a rising edge, it gets captured on the
falling edge at the memory. Hence, the skew should be met across all rise and fall combinations. Fig 3
explains this.

Fig 3: Data flow between controller and memory. 2nd diagram shows T/4 shifted data at memory
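
The need to cover every edge combination can be shown with a short calculation. The sketch below assumes hypothetical path delays for the rising and falling edges of one data bit and its strobe; the function name and the values are illustrative and not taken from any real design.

```python
from itertools import product


def worst_skew(data_delay_ps, strobe_delay_ps):
    """Worst |data - strobe| delay over all rise/fall edge combinations."""
    return max(
        abs(data_delay_ps[d] - strobe_delay_ps[s])
        for d, s in product(("rise", "fall"), repeat=2)
    )


# Hypothetical path delays in ps for one data bit and its strobe.
data = {"rise": 310, "fall": 335}
dqs = {"rise": 300, "fall": 340}

# Comparing only like edges (rise-rise, fall-fall) suggests 10 ps of skew.
same_edge = max(abs(data[e] - dqs[e]) for e in ("rise", "fall"))
print(same_edge)              # 10
print(worst_skew(data, dqs))  # 35
```

With these numbers the like-edge comparison hides the 35 ps case where falling data is checked against a rising strobe, which is why all four combinations should be constrained.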

Duty Cycle Checks

JEDEC spec requires a 47-53% duty cycle clock and DQS. As shown in Fig 3, DDR controller block is
generally partitioned in chip-top. PLL or the main clock source should ideally be sitting quite close to
the controller to avoid the long clock path. Both Data and Strobes are generated from the same
clock edge. So, any rise / fall variation on the clock path would eventually disturb the DQS duty
cycle, and the eye diagram of Data. If the PLL is sitting quite far from the DDR controller, the following
recommendations help achieve a better duty cycle:

● Have similar type of cells in the path.
● Use symmetric cells if available.
● Use Inverters if feasible to avoid degradation due to skewed lots.
● Use inverters in front and last stages of a long buffer chain. This would reduce the degradation (Fig 4).
● Use same type of transition and loads across the clock path.
● For all the clock sources, be it PLL or an external clock source, duty cycle checks should be met till the pad.

Fig 4: Inverters at the beginning and end of a buffer chain to improve the duty cycle

Testbench issues to be addressed

The DDR GLS simulations are sensitive to various testbench setup parameters. If there is any
problem in testbench, it can lead to failures in GLS. Various points to be taken care of are:

One should ensure that all annotations are happening correctly, because if any annotation gets
missed, it can lead to violations of various timing parameters like duty cycle of clock, etc. Hence one
must ensure that all units of the design, clock gating cells, pads, etc. are properly annotated and
there are no annotation warnings pertaining to DDR.

The hookup of the DDR model to the SoC must be as per the standards. For example, if a
pull-down is required on DQS, it should be added beforehand to avoid unnecessary failures in GLS.

Sometimes the DDR models used in the verification environment are far stricter than the actual DDR
memory behavior. For example, the model may expect zero skew between the differential DQS signals,
which is impossible to achieve in actual implementation. So the verification engineer must be aware
of which model parameters can be relaxed and which must be kept as they are.

The DDR pads must be configured according to the values for which timing has been met by STA.
For example, the drive strength of pad needs to be programmed as per recommendation from STA.

The “pulse_e” & “pulse_r” settings must be such that they allow the DDR clock pulse to pass through.
The DDR clock usually runs at a high frequency, and if the pulse reject setting is not lowered, it will not allow
the DDR clock to pass through a delay path.
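
The effect of these settings can be sketched with a simplified model of pulse filtering. It assumes the common Verilog simulator convention in which the reject and error limits are percentages of the path delay: pulses narrower than the reject limit are dropped, pulses between the two limits propagate as “x”, and wider pulses pass. The function name and numbers are hypothetical, and exact option syntax and defaults vary by simulator.

```python
def classify_pulse(width_ps, path_delay_ps, pulse_r=100, pulse_e=100):
    """Classify a pulse on a delay path (simplified model, see assumptions).

    pulse_r / pulse_e: reject and error limits as a percentage of path delay.
    """
    reject_limit = path_delay_ps * pulse_r / 100
    error_limit = path_delay_ps * pulse_e / 100
    if width_ps < reject_limit:
        return "rejected"
    if width_ps < error_limit:
        return "x"
    return "passed"


# Hypothetical case: 625 ps clock high phase through a 900 ps delay path.
print(classify_pulse(625, 900))                          # rejected
print(classify_pulse(625, 900, pulse_r=50, pulse_e=80))  # x
print(classify_pulse(625, 900, pulse_r=0, pulse_e=0))    # passed
```

The first case shows why a fast DDR clock can vanish on a long delay path when the limits are left at 100% of the path delay.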

Synchronizer flops (if any) present in the DDR controller hard (custom) blocks must be identified
beforehand to avoid unnecessary GLS debug.

Conclusion

It is important to realize the importance of running GLS for DDR memory interfaces. It can uncover
a lot of hidden design problems which are otherwise invisible to the designer. With a lot of tight
checks for various DDR timing parameters, GLS ensures that the physical implementation is done
correctly. One must ensure the following points while running GLS for DDR interface:

● The design is running at the maximum desired frequency.
● The design is running with models for the actual memories to be used on board.
● The DDR pads are configured as per suggestions from STA and physical implementation team.
● The simulations should be run with the correct set of pad libraries for various DDR memories like DDR3L, DDR3, LPDDR2, etc.

Running gate level simulations for DDR interfaces requires a lot of interaction between the
verification and timing (STA) team. With correct feedback going to the timing team at each step, one
can ensure proper implementation of a DDR memory interface.

Also see:

● Gate level simulations: verification flow and challenges
● DDR simulation strategy catches bugs early
● DDR Memories Comparison & Overview
