WHITE PAPER
www.mentor.com
ABSTRACT
Besides faster data rates, the new DDR4 standard incorporates additional changes from
prior DDR technologies which impact the board design engineer. New factors in DDR4
such as an asymmetric termination scheme, data bus inversion and signal validation
using eye masks require new methods of validating designs through simulation. This
paper investigates the effects of DDR4's Pseudo Open Drain (POD) driver on data bus
signaling and describes methodologies for dynamically calculating the DRAM's internal
VrefDQ level required for data eye analysis, methodologies for generating and verifying
the data eye, as well as ways of incorporating write leveling and calibration into the
simulation. Evaluation of Simultaneous Switching Noise (SSN) by incorporating power
integrity effects into the signal integrity analysis is also critical to board design
and timing closure and will be elaborated with examples. A system design example using
IBIS 5.0 power-aware models will be described, including a simulation accuracy study
comparing the IBIS results with transistor-level models.
INTRODUCTION
DDR4 is the next step in JEDEC's family of DRAM parts. It has been developed to serve the market needs of higher
speeds and lower power consumption. These factors have contributed to new features in DDR4, as well as new
requirements which need to be accounted for while designing a DDR4 system.
The first sections of this paper investigate DDR4's Pseudo Open Drain driver and what its use means for power
consumption and Vref levels for the receivers. Subsequent sections of the paper look at a DDR4 system design
example and the need for simulating with IBIS power-aware models versus transistor-level models for Simultaneous
Switching Noise characterization.
To see the difference that the termination scheme makes in the total power consumption, the current draws in the
low and high states can be compared.
When in the low state, there is a current draw in both SSTL and in POD. In fact, POD might draw slightly higher
current since the termination is to the voltage rail whereas the termination is to only half the voltage rail in SSTL.
This is somewhat offset by a slightly lower voltage rail in DDR4.
However, the main difference between the two drive options is highlighted when a high is driven. Whereas SSTL
continues to draw current at a rate approximately equal to when driving low, POD draws no power when driving a
high.
Figure 3 - Current comparison when driving High
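The low-state and high-state comparison can be sketched numerically. This is a minimal sketch, not a circuit simulation: it assumes an ideal resistive driver and a single termination resistor, tied to Vdd/2 for SSTL and to Vdd for POD. The resistor values (34 Ohm driver, 60 Ohm termination) and the function names are hypothetical, chosen only for illustration. The rail voltages are the nominal 1.5V for DDR3 and 1.2V for DDR4.

```python
def sstl_current_ma(vdd, r_drv, r_term, driving_high):
    """DC current in mA for SSTL, with the termination tied to Vdd/2."""
    v_drv = vdd if driving_high else 0.0
    return abs(v_drv - vdd / 2) / (r_drv + r_term) * 1e3


def pod_current_ma(vdd, r_drv, r_term, driving_high):
    """DC current in mA for POD, with the termination tied to Vdd."""
    v_drv = vdd if driving_high else 0.0
    return abs(v_drv - vdd) / (r_drv + r_term) * 1e3


# Hypothetical resistances in ohms; nominal DDR3 and DDR4 rails in volts.
R_DRV, R_TERM = 34.0, 60.0
for label, high in (("low", False), ("high", True)):
    sstl = sstl_current_ma(1.5, R_DRV, R_TERM, high)
    pod = pod_current_ma(1.2, R_DRV, R_TERM, high)
    print(f"driving {label}: SSTL {sstl:.2f} mA, POD {pod:.2f} mA")
```

Under these assumptions SSTL draws the same current in both states, while POD draws more current when driving low and none when driving high.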
So, the way to decrease system power with DDR4 is to maximize the number of highs being driven. This is where
the Data Bus Inversion (DBI) feature comes in handy. If at least 5 of the DQ signals in an 8-bit lane would be
driven low, then all 8 bits are inverted, and the DBI signal is asserted low to indicate that the inversion has
taken place. This way, out of the total of 9 signals (8 DQ signals and one DBI), at least five are driven high. If
the original data contains four or more DQ signals being driven high, then the data is sent unchanged and the DBI
signal is de-asserted high, once again ensuring that at least 5 of the total 9 signals are driven high. On each
transaction, it is therefore guaranteed that at least 5 of the 9 signals are driven in the power-reduced high
state.
Figure 4 - With DBI, if 5 or more lows are driven, toggle the entire byte
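The DBI rule described above can be sketched in a few lines. This is an illustrative model of the encoding decision only, not controller or DRAM logic. The function names are hypothetical, and DBI is represented as 0 when asserted low and 1 when de-asserted high.

```python
def dbi_encode(byte):
    """Return (dq, dbi) for one 8-bit lane: invert when 5 or more bits are low."""
    byte &= 0xFF
    lows = 8 - bin(byte).count("1")
    if lows >= 5:
        return (~byte) & 0xFF, 0  # data inverted, DBI asserted low
    return byte, 1                # data unchanged, DBI de-asserted high


def dbi_decode(dq, dbi):
    """Recover the original byte from the lane value and the DBI signal."""
    return (~dq) & 0xFF if dbi == 0 else dq & 0xFF


def highs_on_wire(dq, dbi):
    """Count how many of the 9 signals (8 DQ plus DBI) are driven high."""
    return bin(dq & 0xFF).count("1") + dbi


# Check the guarantee over every possible byte value.
worst = min(highs_on_wire(*dbi_encode(b)) for b in range(256))
print("minimum number of highs per transfer:", worst)
```

Looping over all 256 byte values is a cheap way to confirm both the round trip and the at-least-5-of-9 guarantee.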
CALCULATION OF VREF
In DDR3, the input signal is compared against an external reference voltage to determine a high vs. a low. This
external voltage is often generated either by a voltage divider which is then filtered, or by an external precision
voltage regulator. DDR4, however, requires that Vref be generated within the DRAM and be adjustable. Vref is set
to a value on each powerup.
To calculate this center voltage a simple setup with driver and termination is analyzed. To simplify calculations, the
transmission line is taken to be very short, and the driver strength when driven high and low is assumed to be
equal.
We can first consider the DDR3 case.
When driving high, the voltage at the receiver will be the superposition of the effect of the two voltage sources,
Vdd/2 and Vdd.
When driving low, the voltage at the receiver will be a simple voltage divider
The center voltage for this DDR3 setup can then be obtained by taking the average of the two results.
This value is always half the rail voltage. It is constant with respect to all other aspects of the setup, including
termination values or drive strengths.
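The DDR3 steps above can be written out under the stated simplifications (very short line, equal high and low drive strength). The symbols are assumptions introduced here, not the paper's own notation: $R_D$ is the driver resistance, $R_T$ is the termination resistance to $V_{dd}/2$, and $V_H$ and $V_L$ are the receiver voltages when driving high and low.

```latex
\begin{align*}
V_H &= \frac{V_{dd}\,R_T + \tfrac{V_{dd}}{2}\,R_D}{R_D + R_T} \\
V_L &= \frac{\tfrac{V_{dd}}{2}\,R_D}{R_D + R_T} \\
V_{center} &= \frac{V_H + V_L}{2}
            = \frac{V_{dd}\,(R_T + R_D)}{2\,(R_D + R_T)}
            = \frac{V_{dd}}{2}
\end{align*}
```

Both $R_D$ and $R_T$ cancel in the last step, which is why the center does not move with termination or drive strength.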
We can then consider the DDR4 case, and apply the same sequence as above. When DDR4 is driven high, the
voltage at the receiver is simply Vdd since both the termination and the driver are strapped to Vdd. Similar to DDR3
when driving low, the receiver voltage is the result of the voltage divider.
Again, the center of the receiver eye will be the average of the two values.
Note that in this situation, the center voltage value is dependent on not only the supply voltage, but also on the
characteristics of the transmitter and receiver. This implies that the ideal voltage to be used at the receiver will
depend on the setup, on the silicon batch, read vs. write and other system variables.
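The DDR4 case can be written out in the same way. As before, the symbols are assumptions introduced for illustration: $R_D$ is the driver resistance, $R_T$ is the termination resistance (here tied to $V_{dd}$), and $V_H$ and $V_L$ are the receiver voltages when driving high and low.

```latex
\begin{align*}
V_H &= V_{dd} \\
V_L &= V_{dd}\,\frac{R_D}{R_D + R_T} \\
V_{center} &= \frac{V_H + V_L}{2}
            = \frac{V_{dd}}{2}\left(1 + \frac{R_D}{R_D + R_T}\right)
            = V_{dd}\,\frac{2R_D + R_T}{2\,(R_D + R_T)}
\end{align*}
```

Here the resistances do not cancel, so the center depends on both the driver and the termination.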
To see the effects, we can consider a simple driver, transmission line, receiver setup. The termination resistance of
the receiver is varied to see the effect on the eye for DDR3 and DDR4.
First, with DDR3, as the receiver termination is weakened from 40 Ohms to 60 Ohms to 120 Ohms, the signal is
allowed to swing freely towards greater extremes, both high and low. However, the center of the eye for all three
settings is always fixed at Vdd/2.
For the DDR4 setup, the receiver ODT is varied from 40 to 60 to 80 Ohms. With the weaker termination, the lows
are allowed to go lower, but the high value stays more or less fixed. This causes the center value of the eye to
increase with stronger (lower value) termination.
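The trend in both sweeps can be reproduced with a DC-only sketch. It assumes an ideal resistive driver, a very short line and equal high and low drive strength. The 34 Ohm driver resistance and the function names are hypothetical. Real eyes also include reflections and edge rates, which this ignores.

```python
def ddr3_levels(vdd, r_drv, r_term):
    """Receiver (low, high) voltages with the termination tied to Vdd/2."""
    v_high = (vdd * r_term + (vdd / 2) * r_drv) / (r_drv + r_term)
    v_low = (vdd / 2) * r_drv / (r_drv + r_term)
    return v_low, v_high


def ddr4_levels(vdd, r_drv, r_term):
    """Receiver (low, high) voltages with the termination tied to Vdd (POD)."""
    return vdd * r_drv / (r_drv + r_term), vdd


def center(levels):
    """Midpoint between the low and high receiver levels."""
    return (levels[0] + levels[1]) / 2


R_DRV = 34.0  # hypothetical driver resistance in ohms
ddr3_centers = [center(ddr3_levels(1.5, R_DRV, odt)) for odt in (40.0, 60.0, 120.0)]
ddr4_centers = [center(ddr4_levels(1.2, R_DRV, odt)) for odt in (40.0, 60.0, 80.0)]
print("DDR3 centers (V):", [round(v, 3) for v in ddr3_centers])
print("DDR4 centers (V):", [round(v, 3) for v in ddr4_centers])
```

The DDR3 list stays at Vdd/2 for every termination value, while the DDR4 list falls as the termination is weakened.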
To analyze the effects of using the two options, let us first consider the average margin loss for the device when
using each of these options.
Consider a pin x in a device d. The receiver eye at the pin might have a voltage center, Vx, which is optimal for
that pin. The device however has a different reference voltage, Vd, which it uses for several pins in that device.
Corresponding to each of these reference voltages will be a set of high and low threshold levels.
Note that the margin lost can be negative, which implies a margin gain. A margin lost on the high side is matched by
an equal margin gained on the low side, and vice versa.
Next, we can compare the average margin loss using two different algorithms to determine Vref.
Option 1 uses the average of all the signals as the reference voltage.
Therefore,
This is somewhat intuitively expected. When the reference voltage is obtained by using all the signals, the average
margin loss is zero because for each pin, the margin lost will be offset by the margin gained by another set of pins
in the group.
Now, considering option 2, the reference voltage for the device is taken as the average of the highest and lowest
voltages.
Resulting in,
The average margin loss is not zero. Either on the high or the low side, the average margin loss will be greater than
zero. However, although it may appear that option 1 might be the better course, option 2 actually works out better
when we consider the extreme signals.
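The two averages can be written out as follows. This is a sketch in notation assumed here rather than taken from the paper: $N$ is the number of pins sharing the device reference, $V_x$ is the optimal center of pin $x$, $V_d$ is the device reference, and the high-side margin lost at pin $x$ is taken to be $V_d - V_x$.

```latex
\begin{align*}
\text{Option 1:}\quad V_d &= \frac{1}{N}\sum_{x=1}^{N} V_x
  &\Rightarrow\quad \frac{1}{N}\sum_{x=1}^{N}\left(V_d - V_x\right) &= 0 \\
\text{Option 2:}\quad V_d &= \frac{V_{max} + V_{min}}{2}
  &\Rightarrow\quad \frac{1}{N}\sum_{x=1}^{N}\left(V_d - V_x\right)
  &= \frac{V_{max} + V_{min}}{2} - \frac{1}{N}\sum_{x=1}^{N} V_x
\end{align*}
```

The option 2 average is zero only when the midpoint of the extremes coincides with the mean of all pins.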
Let's take as an example an eye mask requirement of DDR4. The eye mask height requirement is 136mV, or a threshold
of ±68mV around the reference voltage.
So, if option 1 were used, then the requirement for the signal would be 730±68mV, or 662mV on the low side and
798mV on the high side. Similarly, with option 2, the requirement for the signal would be 750±68mV, or 682mV on the
low side and 818mV on the high side.
Next, we can take a look at the pin which has an optimal center of 800mV, and compare the results with a reference
voltage using option 1 vs. option 2. Let us assume that the signal arriving at this pin has a peak-to-peak swing of
260mV, or 800±130mV (930mV on the high side, and 670mV on the low side). This eye is tall enough to pass the mask
if an optimal threshold is selected.
As can be seen from the diagram below, this implies that the signal with option 1 (730mV) as Vref will have a large
margin on the high side, but will fail on the low side. The signal using option 2 (750mV) as Vref, however, will have
a smaller margin on the high side, but will pass on the low side as well.
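The arithmetic behind this comparison can be checked with a short sketch. It uses the numbers from the text (136mV mask height, a signal of 800±130mV, and Vref of 730mV for option 1 and 750mV for option 2). The function names are hypothetical, and a positive margin means the signal clears the mask on that side.

```python
def mask_limits(vref_mv, mask_height_mv=136):
    """Return the (low, high) mask thresholds centered on Vref."""
    half = mask_height_mv / 2
    return vref_mv - half, vref_mv + half


def margins(vref_mv, sig_low_mv, sig_high_mv, mask_height_mv=136):
    """Return (low-side, high-side) margins in mV; negative means a failure."""
    low_limit, high_limit = mask_limits(vref_mv, mask_height_mv)
    return low_limit - sig_low_mv, sig_high_mv - high_limit


SIG_LOW, SIG_HIGH = 800 - 130, 800 + 130  # 670 mV and 930 mV

for name, vref in (("option 1", 730), ("option 2", 750)):
    low_m, high_m = margins(vref, SIG_LOW, SIG_HIGH)
    verdict = "pass" if low_m >= 0 and high_m >= 0 else "fail"
    print(f"{name}: low margin {low_m:+.0f} mV, high margin {high_m:+.0f} mV -> {verdict}")
```

With Vref at 730mV the low side misses its threshold by 8mV, while at 750mV both sides clear the mask.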
In general, if the extreme high and low requirements across all pins of the device are given by Vh and Vl, the
incoming eye needs to have an eye-height of at least Vh-Vl if a common reference voltage is to be used. In this
case, assuming that the high and low requirements from the threshold are equal, the threshold needs to be at
(Vh+Vl)/2, which only considers the two extreme signals, and not the other signals. A threshold set to any other
level might cause issues with some signals even if the eye opening is at least Vh-Vl.
By taking only the extreme signals into account when calculating the reference voltage to be used, the margin of
the remaining signals may be reduced. However, by ensuring that the extreme high and low signals pass, it will be
ensured that all the other signals pass as well.
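The difference between the two selection rules can be illustrated on a small pin set. The five pin centers below are hypothetical, chosen only so that the mean is 730mV and the midpoint of the extremes is 750mV, as in the example above. The function names are also illustrative.

```python
def vref_mean(centers_mv):
    """Option 1: average of all pin centers."""
    return sum(centers_mv) / len(centers_mv)


def vref_midrange(centers_mv):
    """Option 2: average of the highest and lowest pin centers."""
    return (max(centers_mv) + min(centers_mv)) / 2


def average_offset(vref_mv, centers_mv):
    """Mean signed offset between the device Vref and each pin's optimum."""
    return sum(vref_mv - c for c in centers_mv) / len(centers_mv)


def worst_offset(vref_mv, centers_mv):
    """Largest absolute offset between the device Vref and any pin's optimum."""
    return max(abs(vref_mv - c) for c in centers_mv)


pin_centers_mv = [700, 710, 710, 730, 800]  # hypothetical per-pin optimal centers
opt1 = vref_mean(pin_centers_mv)
opt2 = vref_midrange(pin_centers_mv)
for name, vref in (("option 1", opt1), ("option 2", opt2)):
    print(name, vref, average_offset(vref, pin_centers_mv), worst_offset(vref, pin_centers_mv))
```

Option 1 gives a zero average offset but leaves the 800mV pin 70mV from the reference. Option 2 accepts a nonzero average offset in exchange for a worst-case offset of 50mV.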
One method to do so would be to sample the data signal for a predetermined time window around each strobe
crossing as in Figure 13. If the strobe is early, then some parts of the data signal might be shifted. If the strobe is
delayed, there might be parts of the data which are not visible in the window. This is how the actual device would
react, since any shift in the DQS will affect the sampling of the signal.
To illustrate this, the following is a simulation of a DQS and a DQ. The two parts of the DQS are intentionally
mismatched so as to create a non-ideal strobe at the receiver. The signal is run at 2400Mbps.
If the receiver signal is simply wrapped around at 416.67ps (one UI at 2400Mbps), then the eye has a jitter of about
12ps.
Figure 15 - Data eye with no strobe variation effect (left) vs. Data eye including strobe imperfections (right)
However, if the eye is created by sampling the signal around the strobe, then even discounting the runt signal
caused by the initial strobe transient, the jitter as seen by the data signal increases to 20ps.
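The difference between the two ways of building the eye can be sketched with synthetic edge times. This is a toy model, not the simulation described above. The data and strobe skews are hypothetical values, each data edge is paired by index with the strobe crossing that samples the following bit, and the function names are illustrative.

```python
def unit_interval_ps(rate_mbps):
    """One unit interval in picoseconds for a data rate given in Mbps."""
    return 1e6 / rate_mbps


def jitter_fixed_wrap(edge_times_ps, ui_ps):
    """Peak-to-peak edge spread when the waveform is wrapped at a fixed UI."""
    offsets = [((t + ui_ps / 2) % ui_ps) - ui_ps / 2 for t in edge_times_ps]
    return max(offsets) - min(offsets)


def jitter_strobe_referenced(edge_times_ps, strobe_times_ps):
    """Peak-to-peak edge spread measured relative to each sampling strobe crossing."""
    offsets = [e - s for e, s in zip(edge_times_ps, strobe_times_ps)]
    return max(offsets) - min(offsets)


ui = unit_interval_ps(2400)   # about 416.67 ps
data_skew = [-5, 5] * 4       # hypothetical data edge displacement in ps
strobe_skew = [3, -3] * 4     # hypothetical strobe crossing displacement in ps

edges = [k * ui + d for k, d in enumerate(data_skew, start=1)]
strobes = [(k + 0.5) * ui + s for k, s in enumerate(strobe_skew, start=1)]

fixed = jitter_fixed_wrap(edges, ui)
referenced = jitter_strobe_referenced(edges, strobes)
print(f"fixed wrap: {fixed:.1f} ps, strobe referenced: {referenced:.1f} ps")
```

Because the strobe skew in this toy input moves opposite to the data skew, the strobe-referenced spread is larger than the fixed-wrap spread, which is the effect a fixed wrap cannot show.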
Mitigating SSO issues in a system requires optimizing the design of the PDN of the printed circuit board (PCB),
package and on-die. Detailed circuit models are needed for each piece. Historically, these circuit models are
combined and simulated in SPICE based simulators to analyze SSO effects. These simulations are computationally
intensive and lead to lengthy simulation times of hours to days. For solution-space and what-if analysis, simulation
times are simply too long.
SPICE-based transistor level models of the on-die drivers are often the most complex part of the system model.
This is especially true for the most accurate models that include layout-based RC parasitic circuit elements. One
effective way to reduce simulation time is to use behavioral buffer models. Behavioral models use simpler
algorithms than SPICE models, enabling faster simulation with often similar levels of accuracy.
The I/O Buffer Information Specification (IBIS) is a behavioral modeling format used industry-wide for SI simulation.
Commonly used versions of the IBIS specification include IBIS 4.2 and IBIS 5.0. Figure 16 shows common
implementations of an IO circuit model using a SPICE netlist, IBIS 4.2 and IBIS 5.0.
Figure 16 - Setup comparison between SPICE, IBIS 4.2 and IBIS 5.0
IBIS 4.2 and IBIS 5.0 have tables of data that describe circuit characteristics of the final IO buffer.
IBIS 4.2 assumes ideal power connected to the buffer. Thus, the SSO noise cannot be taken into account in the
simulation. IBIS 5.0 extended the usefulness for Power Integrity (PI) simulation specifically enabling the simulation
of SSO noise. New keywords in IBIS 5.0 specific to PI include [Composite Current], [ISSO PU] and [ISSO PD].
[Composite Current] data are I-T tables that describe the shape of the rising and falling edge current waveforms
from the power reference terminal of the buffer (VDE). This switching current includes contributions from the
on-die decoupling circuit, crow-bar current, any termination current, signal driver current and pre-driver current.
Final driver current could be derived accurately by simulating IBIS 4.2 models, but this can significantly
underestimate the total driver current without details of the pre-driver contribution.
[ISSO PU] and [ISSO PD] data are tables describing the effective current of the pullup and pulldown driver
transistors as a function of the voltage on the pullup and pulldown supply reference nodes. The PI problem being
modeled is known as gate modulation and is caused by drooping power supply voltages on-die as the die PDN
attempts to pull current instantaneously through the inductive package PDN.
In addition to the [Composite Current], [ISSO PU] and [ISSO PD] data tables in the IBIS file, it is necessary to include
the characteristics of the on-die power supply decoupling structure. Due to limitations in the IBIS specification, a
model of the decoupling's electrical behavior must be included in SSO simulations external to the IBIS buffer
model, connected across the power and ground reference terminals.
                          SPICE netlist   IBIS 4.2   IBIS 5.0
Simulation time           Longer          Shorter    Shorter
SI simulation accuracy    High            High       High
PI simulation accuracy    High            Low        High
For improved SI, DDR3 used ZQ (Zero Quotient) calibration and ODT (On Die Termination). In addition to those,
DDR4 has Vref training functionality in the IO circuit, which can make SPICE netlists much larger. For example, in
DDR4 the number of elements (including both MOSFETs and parasitic RC) per SPICE netlist balloons to several
tens of thousands. In order to simulate SSO noise, it is necessary to model the full data channel, so the number of
elements can reach several hundreds of thousands. With this many elements, a simulation can take days, so DDR4
simulation with SPICE netlists of the IO is not realistic. Since IBIS models have only data tables modeling the output
circuits, simulation time is significantly shortened.
IBIS 4.2/5.0 both provide accurate simulation results with ideal power conditions. When SSO noise is imposed, IBIS
4.2 has accuracy issues, but IBIS 5.0 gives a good match to SPICE netlist simulation. As seen in Figure 17, there is a
trade-off between simulation time and accuracy when SPICE netlists and IBIS 4.2 models are the choices. However,
IBIS 5.0 balances both performance and accuracy well.
Figure 17 - Voltage noise comparison between SPICE netlist, IBIS 4.2 and IBIS 5.0
Figure 21 shows a comparison between two simulation engines. The blue line in the figure shows the waveform
generated by a traditional simulator. The correct waveform result in red is generated by a simulator that employed
the improved simulation technique.
First, a comparison was done between the SPICE netlist model and IBIS 5.0 model simulations that do not have SSO
noise or crosstalk noise. The DQS signal and one DQ bit were stimulated at 2400Mbps in Write mode. The
measurement was done at the die pad of the SDRAM with the DQS as the trigger. Both simulations matched well, as
seen in Figure 24. The eye widths, referred to as TdiVW, differed by less than 10ps. For this simulation, the IBIS 5.0
model provides enough accuracy.
Next, SSO noise was examined. The DQS signals and 32 DQ bits were operated at 2400Mbps in Write mode, and
the SDRAM die pad and VDE voltage at the controller were measured. The upper waveform in Figure 25 shows the
VDE waveform, and the lower waveform is a DQ signal's waveform at the SDRAM die pad. Because 32 DQ bits are
switching, the VDE voltage at the controller fluctuates, which is the SSO noise. The SPICE netlist model (blue
line) and IBIS 5.0 model (red line) match almost perfectly. This confirms that SSO noise was accurately simulated
using the IBIS 5.0 model.
Next, a comparison was done where SSO noise and crosstalk noise were imposed. The DQS signals and one DQ
signal (victim) were operated at 2400Mbps in Write mode with the other 31 bits (aggressors) operated both in
phase and out-of-phase with the DQ victim. The measurement was done at the SDRAM die pad with the DQS as
the trigger. Results are shown in Figure 26.
The eye widths in Figure 26 became generally smaller than the widths in Figure 24 due to the SSO noise.
Comparing results between the SPICE netlist model simulation and IBIS 5.0 model simulation shows that IBIS 5.0
eye width is larger (300ps versus 278ps). The IBIS 5.0 model simulation underestimated the SSO noise influence by
22ps (8%). This underestimation was caused by ignoring the delay fluctuations in the pre-driver circuitry. IBIS 5.0
models ignore the effects of voltage changes on pre-driver circuitry. Increasing voltage on-die will make transistors
in the pre-driver circuits switch faster; the opposite effect is seen with decreasing voltage. These voltage changes
can lead to mismatches in timing between pre-driver pullup and pulldown signal paths as well as overall increased
or decreased delay of the driver switching.
Finally, simulation times were compared. One cycle of PRBS7 stimulus for DDR4-2400Mbps is 60ns. It took 221
hours (9.2 days) to simulate the schematic shown in Figure 23 with the SPICE netlist model. The simulation of the
IBIS 5.0 model was completed in 3 hours, which is a 98.6% reduction from the SPICE netlist model. IBIS 5.0 is useful
for large scale simulation, which is required for chip-package-PCB level co-design.
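The run-time figures quoted above can be cross-checked with a few lines of arithmetic. The variable names are illustrative, and the speedup factor is derived here rather than stated in the paper.

```python
spice_hours = 221  # SPICE netlist run time reported above
ibis_hours = 3     # IBIS 5.0 run time reported above

spice_days = spice_hours / 24
reduction_pct = (1 - ibis_hours / spice_hours) * 100
speedup = spice_hours / ibis_hours

print(f"SPICE run: {spice_days:.1f} days")
print(f"reduction: {reduction_pct:.1f}%")
print(f"speedup:   about {speedup:.0f}x")
```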
CONCLUSION
A successful DDR4 board design can be accomplished using the analysis techniques described in this paper. EDA
software updated to support DDR4 simulation can help the designer properly use DBI, calculate the proper Vref
level for analysis, apply the DDR4 receiver mask for timing verification and generate data eyes with correct jitter
contributions. Using IBIS 5.0 power-aware models can significantly reduce simulation time while allowing for
reasonably accurate simulation of SSO jitter effects.
© 2015 Mentor Graphics Corporation, all rights reserved. This document contains information that is proprietary to Mentor Graphics Corporation and may
be duplicated in whole or in part by the original recipient for internal business purposes only, provided that this entire notice appears in all copies. In
accepting this document, the recipient agrees to make every reasonable effort to prevent unauthorized use of this information. On March 1st 2015,
system LSI businesses of Fujitsu Limited and Panasonic Corporation have been consolidated and transferred to Socionext Inc. The contents of this white
paper, which contain the company name Fujitsu Semiconductor, are still valid by replacing the name with Socionext. All trademarks mentioned in this
document are the trademarks of their respective owners.
MF 2-15
TECH12690-w