
Design of High-Speed Links:

A look at Modern VLSI Design


Vladimir Stojanović

Integrated Systems Group


Massachusetts Institute of Technology

Chip design is changing

Becoming constrained by power

Not so much by area/density


                 Pentium        Pentium 4
Transistors      3M             125M
Power density    30 mW/mm^2     850 mW/mm^2
Technology       0.6 um         90 nm
Power            4 W            103 W
Clock            0.1 GHz        3.4 GHz

Best systems trade off circuits, architecture, and system issues

Power-performance system optimization

Complex, many levels of hierarchy and variables


Individual components
Flops & latches
(power and timing critical)
[Figure: logic block between two clocked flip-flops (D, Q, Clk)]

V. Stojanović, V.G. Oklobdzija, "Comparative Analysis of MS Latches and Flip-Flops for High-Performance and Low-Power Systems," IEEE Journal of Solid-State Circuits, April 1999.
System level, VLSI blocks and circuits
-Physical (Vdd, Vth, sizing)
-Logic
-uArchitecture (parallelism, pipelining)

[Figure: pipeline of clocked flip-flops (D, Q, Clk) with logic blocks A and B, shown in serial and parallel arrangements, with separate supply and threshold voltages (Vdd1..Vdd5, Vth1..Vth5) per block]

V. Stojanovic, D. Markovic, B. Nikolic, M. A. Horowitz and R. W. Brodersen, "Energy-Delay Tradeoffs in Combinational Logic using Gate Sizing and Supply Voltage Optimization," European Solid-State Circuits Conference, September 2002

Look at system-level problem: links

Seems pretty simple:

Transmitter -> Channel -> Receiver

Challenging multi-disciplinary area
Circuits
Communications
Optimization

What makes it challenging

High speed
link chip

> 2 GHz signals

Now, the bandwidth limit is in wires



New link design


Dealing with bandwidth limited channels

This is an old research area
Textbooks on digital communications
Think modems, DSL

But can't directly apply their solutions
Standard approach requires high-speed A/Ds and digital signal processing
20Gs/s A/Ds are expensive

(Un)fortunately need to rethink issues

Outline

Show system level optimization for links

Create a framework to evaluate trade-offs

Background on high-speed links

High-speed link modeling

System level optimization

Practical implementation issues

Current / future work


Backplane environment

[Figure: signal path across a backplane: on-chip parasitic (termination resistance and device loading capacitance), package, package via, line card trace, line card via, backplane connector, backplane via, backplane trace]

Line attenuation
Reflections from stubs (vias)

Backplane channel

Loss is variable
Same backplane
Different lengths
Different stubs (top vs. bottom)

Attenuation is large
>30dB @ 3GHz
But is that bad?
Required signal amplitude set by noise

[Figure: attenuation [dB] vs. frequency [GHz] for 9" FR4 and 26" FR4 traces, each with and without a via stub]

Inter-symbol interference (ISI)

Channel is low pass
Our nice short pulse gets spread out

Dispersion: short latency (skin-effect, dielectric loss)
Reflections: long latency (impedance mismatches from connectors, via stubs, device parasitics, package)

[Figure: pulse response, amplitude vs. time [ns], Tsymbol=160ps]

ISI

[Figure: overlapping pulse responses of consecutive symbols, amplitude vs. symbol time, with the corrupted middle sample marked "Error!"]

Middle sample is corrupted by 0.2 trailing ISI (from the previous symbol) and 0.1 leading ISI (from the next symbol), resulting in 0.3 total ISI
As a result, the middle symbol is detected in error
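The arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: symbols are taken as +1/-1, the slicer threshold is zero, and the main-cursor amplitude is a hypothetical parameter. Only the 0.2 trailing and 0.1 leading ISI values come from the slide.

```python
def sample_with_isi(symbols, k, cursor, trailing=0.2, leading=0.1):
    """Received sample for symbol k of a +1/-1 sequence.

    trailing: ISI left behind by the previous symbol (0.2 on the slide)
    leading:  ISI arriving early from the next symbol (0.1 on the slide)
    cursor:   main-cursor amplitude (hypothetical, not given on the slide)
    """
    prev_sym = symbols[k - 1] if k > 0 else 0
    next_sym = symbols[k + 1] if k + 1 < len(symbols) else 0
    return cursor * symbols[k] + trailing * prev_sym + leading * next_sym


def detect(sample):
    """Slice at zero: positive samples are read as +1, the rest as -1."""
    return 1 if sample > 0 else -1


# Worst case for the middle symbol: both neighbors have the opposite sign.
symbols = [-1, +1, -1]

weak = sample_with_isi(symbols, 1, cursor=0.25)   # cursor below the 0.3 total ISI
strong = sample_with_isi(symbols, 1, cursor=0.5)  # cursor above the 0.3 total ISI

print(detect(weak), detect(strong))
```

With these assumed numbers the middle symbol is sliced incorrectly whenever the main cursor is smaller than the 0.3 total ISI, and correctly once it exceeds it.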

Prior state of high-speed links

[Figure: link block diagram: dataIn -> serializer -> driver/equalizer -> channel -> data slicer -> deserializer -> dataOut, with ref Clk and PLL for timing generation and clock/data recovery at the receiver]

Link components well developed
Fast multiplexed transmitters and receivers
Precise timing generation and data recovery

Starting to use equalization (1-2 taps)
Few taps set manually at the transmitter

Barriers to improving link performance

No good link system and noise models

Maximum achievable data rates unknown

Cannot predict the right architecture for a given set of channels
Need to make performance/power tradeoff
Limited link communication system design

Peak power constraint in the transmitter

No solution for optimal transmit equalization


No solution for automatic equalization

Previous system models

Mostly non-existent

Borrowed from computer systems
Worst case analysis
Can be too pessimistic in links

Borrowed from data communications
Gaussian distributions
Works well near mean
Often way off at tails
ISI distribution is bounded

Need accurate models
To relate the power/complexity to performance

How bad is Gaussian model?

[Figure, left: cumulative ISI distribution, log10 probability [cdf] vs. residual ISI [mV]; annotation: 40mV error @ 10^-10, 25% of eye height]
[Figure, right: impact on CDR phase, log10 steady-state phase probability vs. phase count; annotations: 9% Tsymbol and 4% Tsymbol, error @ 10^-10]

Gaussian model only good down to 10^-3 probability
Way pessimistic for much lower probabilities
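A small numerical sketch can make the bounded-ISI point concrete. It builds the exact ISI distribution by convolving the two-point distribution of each residual tap and compares its tail with a Gaussian of the same variance. The tap values are hypothetical, and independent equiprobable +1/-1 symbols are assumed; this is not the model used for the plots on the slide.

```python
import math
import numpy as np


def isi_pmf(taps_mv):
    """Exact pmf of sum(tap_k * b_k) for independent, equiprobable b_k = +/-1.

    taps_mv holds residual ISI tap magnitudes as integers in mV.
    Index i of the returned array corresponds to an ISI of (i - peak) mV.
    """
    taps = [abs(int(t)) for t in taps_mv]
    peak = sum(taps)
    pmf = np.zeros(2 * peak + 1)
    pmf[peak] = 1.0  # start with all probability at 0 mV
    n = len(pmf)
    for t in taps:
        nxt = np.zeros(n)
        nxt[t:] += 0.5 * pmf[:n - t]   # b_k = +1 shifts the pmf up by t
        nxt[:n - t] += 0.5 * pmf[t:]   # b_k = -1 shifts the pmf down by t
        pmf = nxt
    return pmf, peak


def exact_tail(pmf, peak, x_mv):
    """P(ISI > x_mv) from the exact pmf."""
    values = np.arange(-peak, peak + 1)
    return float(pmf[values > x_mv].sum())


def gaussian_tail(taps_mv, x_mv):
    """P(ISI > x_mv) if the ISI were Gaussian with the same variance."""
    sigma = math.sqrt(sum(t * t for t in taps_mv))
    return 0.5 * math.erfc(x_mv / (sigma * math.sqrt(2.0)))


taps = [20, 15, 10, 8, 5, 3, 2, 1]  # hypothetical residual ISI taps in mV
pmf, peak = isi_pmf(taps)
for x in (30, 60, peak):
    print(x, exact_tail(pmf, peak, x), gaussian_tail(taps, x))
```

The exact distribution has no probability beyond the peak distortion (the sum of the tap magnitudes), while the Gaussian model still assigns a nonzero probability there.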

A new model

Use direct noise and interference statistics

Main system impairments

Interference

Voltage noise (thermal, supply, offsets, quantization)

Timing noise always looked at separately

Key to integrate with voltage noise sources


Need to map from time to voltage


Effect of timing noise

[Figure: received waveform with ideal and jittered sampling instants; the difference between the two samples is the voltage noise when the receiver clock is off]

The effect depends on the size of the jitter, the input sequence, and the channel
Need effective voltage noise distribution

Example: Effect of transmitter jitter

[Figure: symbol b_k transmitted between kT and (k+1)T with jittered edges, decomposed into an ideal pulse plus noise pulses of height b_k at the leading and trailing edges]

Decompose output into ideal and noise
Noise are pulses at front and end of symbol
Width of pulse is equal to jitter
Approximate with deltas on bandlimited channels

V. Stojanović, M. Horowitz, "Modeling and Analysis of High-Speed Links," IEEE Custom Integrated Circuits Conference, September 2003. (invited)

Jitter effect on voltage noise

Transmitter jitter
High frequency (cycle-cycle) jitter is bad
Changes the energy (area) of the symbol
No correlation of noise sources that sum
Low frequency jitter is less bad
Effectively shifts waveform
Correlated noise gives partial cancellation

Receive jitter
Modeled by shift of transmit sequence
Same as low frequency transmitter jitter

Bandwidth of the jitter is critical
It sets the magnitude of the noise created
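The difference between high-frequency and low-frequency transmitter jitter can be sketched numerically. The sketch assumes the delta approximation from the previous slide: each jittered edge contributes a noise term proportional to the symbol value, the edge jitter, and the channel impulse response at that edge, with opposite signs on the leading and trailing edges. The impulse-response samples and jitter magnitude below are hypothetical.

```python
def jitter_noise_variance(h, sigma_jitter, correlated):
    """Variance of the voltage noise caused by transmitter edge jitter.

    h: channel impulse-response samples at consecutive symbol edges
       (h[k] for the leading edge of a symbol, h[k + 1] for its trailing edge)
    correlated=False: independent jitter on every edge (high-frequency jitter)
    correlated=True:  the same jitter on both edges of a symbol (low-frequency)
    Symbols are assumed to be independent +1/-1 values.
    """
    total = 0.0
    for k in range(len(h) - 1):
        if correlated:
            # Equal jitter on both edges: the two noise pulses partly cancel.
            total += (h[k] - h[k + 1]) ** 2
        else:
            # Independent jitter: the two noise pulses add in power.
            total += h[k] ** 2 + h[k + 1] ** 2
    return sigma_jitter ** 2 * total


h = [0.1, 0.5, 0.4, 0.2, 0.1]  # hypothetical impulse-response samples
high_freq = jitter_noise_variance(h, sigma_jitter=0.05, correlated=False)
low_freq = jitter_noise_variance(h, sigma_jitter=0.05, correlated=True)
print(high_freq, low_freq)
```

For a smooth low-pass response, where neighboring samples have the same sign, the correlated case gives the smaller variance, which matches the partial cancellation described on the slide.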

Jitter source from PLL clocks

[Figure: PLL block diagram: RefClk -> phase detector (Kpd) -> charge pump (Icp) -> loop filter (R, C) -> VCO (Kvco/s) -> clock buffer, with feedback divider (N)]
[Figure: noise transfer functions [dB] vs. frequency [Hz] from input clock, from VCO supply, and from clock buffer supply]

Noise sources
Reference clock phase noise
VCO supply noise
Clock buffer supply noise

M. Mansuri, C-K.K. Yang, "Jitter optimization based on phase-locked loop design parameters," IEEE Journal of Solid-State Circuits, Nov. 2002
E. Alon, V. Stojanovic, M. Horowitz, "Circuits and Techniques for High-Resolution Measurement of On-Chip Power Supply Noise," IEEE Symposium on VLSI Circuits, June 2004.

2x Oversampled bang-bang CDR

[Figure: receiver with slicers clocked by data Clk and edge Clk, deserializer to dataOut, phase detector (PD) using data samples dn, dn-1 and edge sample en, phase control, and phase mixer driven by the PLL from ref Clk]

Generate early/late from dn, dn-1, en
Simple 1st order loop, cancels receiver setup time

Now need jitter on data Clk, not PLL output
Base linear PLL jitter
Add non-linear phase selector noise from CDR
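The early/late decision from dn-1, en, and dn can be sketched as the usual bang-bang (Alexander-style) phase detector rule. The function name and the sign convention (+1 for early, -1 for late) are assumptions for illustration, not taken from the slides.

```python
def bang_bang_pd(d_prev, edge, d_curr):
    """Early/late vote from two data samples and the edge sample between them.

    Returns +1 if the clock is early, -1 if it is late, and 0 when there is
    no data transition (the edge sample then carries no timing information).
    """
    if d_prev == d_curr:
        return 0
    # With a transition, the edge sample still equals the old bit if the
    # edge clock fired before the data changed, i.e. the clock is early.
    return +1 if edge == d_prev else -1


print(bang_bang_pd(0, 0, 1), bang_bang_pd(0, 1, 1), bang_bang_pd(1, 1, 1))
```

A first-order loop would accumulate these votes and step the phase mixer by one phase count in the direction that cancels them.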

Bang-bang CDR model

Model CDR loop as a state machine (Markov chain)

[Figure: Markov chain over phase states i-1, i, i+1 with transition probabilities p_dn,i, p_hold,i, and p_up,i]
[Figure: log10 steady-state probability vs. phase count]

Gives the probability distribution of phase
Which is the CDR jitter distribution

A.E. Payzin, "Analysis of a Digital Bit Synchronizer," IEEE Transactions on Communications, April 1983.
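A minimal sketch of the steady-state computation for such a chain is shown below. It assumes a bounded phase range in which the loop moves at most one phase count per update, so the stationary distribution follows from balancing the probability flow between neighboring states. The transition probabilities are hypothetical; in a real link they would come from the data and edge sample statistics at each phase.

```python
import numpy as np


def steady_state(p_up, p_dn):
    """Stationary distribution of a chain that moves at most one step per update.

    p_up[i] / p_dn[i]: probability of stepping from phase i to i + 1 / i - 1.
    The phase range is assumed bounded: p_dn[0] == 0 and p_up[-1] == 0.
    """
    n = len(p_up)
    pi = np.ones(n)
    for i in range(n - 1):
        # Balance of probability flow between neighboring states i and i + 1.
        pi[i + 1] = pi[i] * p_up[i] / p_dn[i + 1]
    return pi / pi.sum()


def transition_matrix(p_up, p_dn):
    """Full transition matrix, used here only to check the result."""
    n = len(p_up)
    P = np.zeros((n, n))
    for i in range(n):
        if i + 1 < n:
            P[i, i + 1] = p_up[i]
        if i > 0:
            P[i, i - 1] = p_dn[i]
        P[i, i] = 1.0 - P[i].sum()  # p_hold,i
    return P


lock, n = 5, 11  # hypothetical lock point and number of phase positions
p_up = np.array([0.6 if i < lock else 0.3 if i == lock else 0.2 for i in range(n)])
p_dn = np.array([0.2 if i < lock else 0.3 if i == lock else 0.6 for i in range(n)])
p_dn[0] = 0.0
p_up[-1] = 0.0

pi = steady_state(p_up, p_dn)
print(np.log10(pi))
```

The resulting vector plays the role of the phase probability distribution on the slide, peaking at the assumed lock point and falling off on both sides.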

Outline

Show system level optimization for links

Create a framework to evaluate trade-offs

Background on high-speed links

High-speed link modeling

System level optimization

Limits: what is the capacity of these links?
Improving today's baseband signaling

Practical implementation issues

Current / future work



Baseline channels

[Figure: attenuation [dB] vs. frequency [GHz] for 26" NELCO, no stub and 26" FR4, via stub]

Legacy (FR4) - lots of reflections
Microwave engineered (NELCO)

Capacity with link-specific noise

[Figure: capacity [Gb/s] vs. log10(clipping probability) for the NELCO and FR4 channels; curves for thermal noise, thermal noise and LC PLL phase noise, and thermal noise and ring PLL phase noise]

Effective noise from phase noise
Proportional to signal energy
Decreases expected gains

Still, capacity much higher than data rates in today's links

Removing ISI

[Figure: linear transmit equalizer (anticausal taps, causal taps) driving the channel through 50 ohm terminations (outP, outN), and decision-feedback equalizer at the receiver (feedback taps, TapSel logic, deadband) operating on the sampled data]

Transmit and Receive Equalization
Changes signal to correct for ISI
Often easier to work at transmitter
DACs easier than ADCs

J. Zerbe et al, "Design, Equalization and Clock Recovery for a 2.5-10Gb/s 2-PAM/4-PAM Backplane Transceiver Cell," IEEE Journal of Solid-State Circuits, Dec. 2003.

Transmit equalization headroom constraint

[Figure: Tx data through anticausal and causal taps into the channel under a peak power constraint; attenuation [dB] vs. frequency [GHz] for the unequalized and equalized channel]

Transmit DAC has limited voltage headroom
Amplitude of equalized signal depends on the channel
Unknown target signal levels
Hard to formulate error or objective function
Need to tune the equalizer and receive comparator levels

Optimization example:
Power constrained linear precoding

[Figure: block diagram: symbols a_k -> precoder w (power constraint) -> channel pulse response -> additive noise -> gain g -> estimate of a_k, with error e_k]

$$\mathrm{MSE}(w, g) = E_a\left(1 - 2g\,w^T P \mathbf{1}_\Delta + g^2\, w^T P P^T w\right) + g^2 \sigma^2$$

$$\mathrm{SINR}_{unbiased}(w) = \frac{E_a \left(w^T P \mathbf{1}_\Delta\right)^2}{E_a\, w^T P \left(I - \mathbf{1}_\Delta \mathbf{1}_\Delta^T\right)\left(I - \mathbf{1}_\Delta \mathbf{1}_\Delta^T\right)^T P^T w + \sigma^2}$$

Add variable gain to amplify to known target level
Formulate the objective function from error
SINR is not concave in w in general
Change objective to quasiconcave SINR_unbiased

V. Stojanović, A. Amirkhany, M. Horowitz, "Optimal Linear Precoding with Theoretical and Practical Data Rates in High-Speed Serial-Link Backplane Communication," IEEE International Conference on Communications, June 2004
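The unbiased SINR objective can be evaluated numerically for a given set of precoder taps. The sketch below is an illustration under assumptions: P is taken to be the convolution matrix of the sampled pulse response, so that w @ P is the equalized pulse response, and the cursor index delta picks out the main sample. The two-sample pulse response, noise variance, and function names are hypothetical.

```python
import numpy as np


def convolution_matrix(pulse, n_taps):
    """Matrix P such that w @ P is the equalized pulse response for Tx taps w."""
    pulse = np.asarray(pulse, dtype=float)
    P = np.zeros((n_taps, len(pulse) + n_taps - 1))
    for i in range(n_taps):
        P[i, i:i + len(pulse)] = pulse
    return P


def sinr_unbiased(w, P, delta, Ea=1.0, sigma2=1e-4):
    """Signal power at the cursor over residual ISI power plus noise."""
    eq = np.asarray(w, dtype=float) @ P
    signal = Ea * eq[delta] ** 2
    isi = Ea * (np.sum(eq ** 2) - eq[delta] ** 2)  # everything except the cursor
    return signal / (isi + sigma2)


pulse = [1.0, 0.4]                      # hypothetical sampled pulse response
P = convolution_matrix(pulse, n_taps=2)

w_flat = np.array([1.0, 0.0])           # no equalization
w_eq = np.array([1.0, -0.4]) / 1.4      # cancels the first post-cursor, sum|w| = 1

print(sinr_unbiased(w_flat, P, delta=0), sinr_unbiased(w_eq, P, delta=0))
```

Both tap vectors satisfy the peak constraint sum|w| <= 1. The equalized one gives up signal amplitude at the cursor but removes most of the ISI, so its SINR is higher at this assumed noise level.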

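Optimal linear precoding

Still, does this objective really relate to link performance?
Need to look at noise and interference distributions

$$\underset{w}{\text{maximize}} \quad \frac{0.5\, d_{min}\, w^T P \mathbf{1}_\Delta - V_{peak} \left\lVert w^T P I_{PD} \right\rVert_1 - \text{offset}}{\left(E_a\, w^T P \left(I - \mathbf{1}_\Delta \mathbf{1}_\Delta^T - I_{PD}\right)\left(I - \mathbf{1}_\Delta \mathbf{1}_\Delta^T - I_{PD}\right)^T P^T w + \sigma^2\right)^{1/2}}$$

$$\text{s.t.} \quad \lVert w \rVert_1 \le 1$$

$$\sigma^2 = w^T S_0^{TX} w + w^T S_0^{RX} w + \sigma^2_{thermal}$$

Minimize BER
Residual dispersion into peak distortion
Reflections into mean distortion
Includes all link-specific noise sources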

Including feedback equalization

Feedback equalization (DFE)
Subtracts error from input
No attenuation

Problem with DFE
Need to know interfering bits
ISI must be causal
Problem - latency in the decision circuit
Receive latency + DAC settling < bit time

Can increase allowable time by loop unrolling
Receive next bit before the previous is resolved

[Figure: pulse response, amplitude vs. symbol time, with the trailing ISI removed by feedback equalization]

One-tap DFE with loop unrolling

[Figure: pulse response with one trailing ISI tap of amplitude α; the received levels split from +1 and -1 into +1+α, +1-α, -1+α, and -1-α, with slicer thresholds at +α and -α]

[Figure: two slicers with thresholds +α and -α, clocked by dClk, produce d_n | d_n-1 = 1 and d_n | d_n-1 = 0 from input x_n; a multiplexer controlled by d_n-1 selects the final decision]

Instead of subtracting the error
Move the slicer level to include the noise
Slice for each possible level, since previous value unknown

K.K. Parhi, "High-Speed architectures for algorithms with quantizer loops," IEEE International Symposium on Circuits and Systems, May 1990
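A behavioral sketch can show that slicing against both shifted thresholds and then selecting gives the same decisions as the direct feedback loop. The code assumes +1/-1 symbols (the slide labels the previous bit as 1/0), a pulse response of [1, alpha], and no noise; the function names are hypothetical.

```python
import random


def dfe_direct(samples, alpha, first_prev=1):
    """Classic one-tap DFE: subtract alpha * previous decision, slice at zero."""
    prev, out = first_prev, []
    for x in samples:
        d = 1 if x - alpha * prev > 0 else -1
        out.append(d)
        prev = d
    return out


def dfe_unrolled(samples, alpha, first_prev=1):
    """Loop-unrolled DFE: slice against both thresholds, then pick one."""
    prev, out = first_prev, []
    for x in samples:
        d_if_prev_hi = 1 if x > +alpha else -1  # correct if previous bit was +1
        d_if_prev_lo = 1 if x > -alpha else -1  # correct if previous bit was -1
        # Only this selection depends on the previous decision.
        d = d_if_prev_hi if prev == 1 else d_if_prev_lo
        out.append(d)
        prev = d
    return out


rng = random.Random(0)
alpha = 0.4  # hypothetical trailing ISI tap
bits = [rng.choice((-1, 1)) for _ in range(200)]
previous = [1] + bits[:-1]
samples = [b + alpha * p for b, p in zip(bits, previous)]

print(dfe_unrolled(samples, alpha) == dfe_direct(samples, alpha) == bits)
```

In hardware the benefit is in timing: both comparisons run in parallel without waiting for the previous decision, and only the final selection sits in the feedback path.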

BER contours

[Figure: log10(BER) contours, margin [mV] vs. time [ps], for 5 tap Tx Eq (left) and 5 tap Tx Eq + 1 tap DFE (right)]

Voltage margin
Min. distance between the receiver threshold and contours with same BER

Pulse amplitude modulation

Binary (NRZ)
1 bit / symbol
Symbol rate = bit rate

PAM4
2 bits / symbol
Symbol rate = bit rate/2

[Figure: binary and PAM4 signal levels; PAM4 levels labeled 00, 01, 11, 10]
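The PAM4 level labels on this slide (00, 01, 11, 10) form a Gray code, so neighboring levels differ in one bit. A minimal mapping sketch follows; the level values -3, -1, +1, +3 and the assignment of 00 to the lowest level are assumptions, since the slide gives only the order of the labels.

```python
# Level order taken from the slide (00, 01, 11, 10); the voltages are assumed.
PAM4_LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}
LEVEL_TO_BITS = {level: pair for pair, level in PAM4_LEVELS.items()}


def pam4_encode(bits):
    """Map a bit list to PAM4 symbols, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [PAM4_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def pam4_decode(symbols):
    """Map ideal PAM4 symbols back to bits."""
    out = []
    for s in symbols:
        out.extend(LEVEL_TO_BITS[s])
    return out


bits = [0, 0, 0, 1, 1, 1, 1, 0]
symbols = pam4_encode(bits)
print(symbols, pam4_decode(symbols) == bits)
```

Eight bits become four symbols, which is the "symbol rate = bit rate/2" relation on the slide.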

Multi-level: Offset and jitter are crucial

[Figure: three panels of data rate [Gb/s] vs. symbol rate [Gs/s] for PAM2, PAM4, PAM8, and PAM16: thermal noise; thermal noise + offset; thermal noise + offset + jitter]

To make better use of available bandwidth, need better circuits
PAM2/PAM4 robust candidate for next generation links

Full ISI compensation too costly

[Figure: three panels of data rate [Gb/s] vs. symbol rate [Gs/s] for PAM2, PAM4, PAM8, and PAM16: thermal noise; thermal noise + offset; thermal noise + offset + jitter]

Today's links cannot afford to compensate all ISI
Limits today's maximum achievable data rates

Outline

Show system level optimization for links

Create a framework to evaluate trade-offs

Background on high-speed links

High-speed link modeling

System level optimization

Practical implementation issues

Low-cost adaptation
Dual-mode link (hardware re-use)

Current / future work


Fully adaptive dual-mode link

[Figure: chip layout: transmitter, receiver, reflection canceller, PLL, phase mixers, CDR logic, config registers, backchannel RX, backchannel TX]

PAM2/PAM4
2-10Gb/s
0.13µm
40mW/Gb/s

Reconfigurable dual-mode PAM2/PAM4 link
Adaptive equalization
Transmit and receive equalization
DFE with loop unrolling

V. Stojanović et al., "Adaptive Equalization and Data Recovery in Dual-Mode (PAM2/4) Serial Link Transceiver," IEEE Symposium on VLSI Circuits, June 2004.

Adaptation with minimum overhead

[Figure: Tx data -> channel -> receiver with data samplers (dClk), edge samplers (eClk) feeding the CDR, and an adaptive sampler (aClk) comparing against dLev to produce the error; the adaptive macro generates thresholds and tap updates]

Adaptive sampler
Generates the error signal at reference level
Monitors the link
Adjustable voltage and time reference
On-chip sampling scope
Can replace any other sampler - calibration

Dual-loop adaptive algorithm

Data level reference loop
$$dLev_{n+1} = dLev_n - step_{dataLev}\,\mathrm{sign}(e_n), \quad x_n > 0$$

[Figure: eye diagrams for the initial eye (dLev_init, error_init), mid-way equalized (dLev_mid), and equalized (dLev_end), with the sign(e_n) and sign(x_n) regions marked]

Equalizer loop
$$w_{n+1} = w_n + step_w\,\mathrm{sign}(e_n)\,\mathrm{sign}(x_n)$$
Scale the equalizer - output Tx constraint
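The interaction of the two loops can be explored with a small behavioral simulation. This is a sketch under several assumptions that are not from the slides: the main Tx tap is fixed at 1, only one post-cursor tap adapts, the channel pulse response is a hypothetical [1.0, 0.4], decisions are error-free, the error is defined as e_n = dLev * d_n - x_n, and the equalizer scaling step is omitted.

```python
import numpy as np


def dual_loop(pulse=(1.0, 0.4), n_symbols=20000,
              step_dlev=0.01, step_w=0.002, seed=0):
    """Sign-sign adaptation of the data level and one post-cursor Tx tap."""
    rng = np.random.default_rng(seed)
    d = rng.choice([-1.0, 1.0], size=n_symbols)  # random +1/-1 data
    dlev, w_post = 0.0, 0.0
    for n in range(2, n_symbols):
        tx_now = d[n] + w_post * d[n - 1]         # 2-tap Tx FIR output
        tx_prev = d[n - 1] + w_post * d[n - 2]
        x = pulse[0] * tx_now + pulse[1] * tx_prev  # received sample
        e = dlev * d[n] - x                       # reference minus received
        if x > 0:
            dlev -= step_dlev * np.sign(e)        # data level reference loop
        w_post += step_w * np.sign(e) * d[n - 1]  # equalizer loop, post tap
    return dlev, w_post


dlev, w_post = dual_loop()
print(dlev, w_post, pulse_residual := 0.4 + w_post)
```

Because only the signs of the error and data are used, the loops settle into a region around the solution rather than at a single point: the tap moves toward cancelling the 0.4 post-cursor, and the data level tracks the received amplitude of the main cursor.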

Dual loop convergence - 4 tap example

[Figure: PAM2, 5Gb/s, 4 taps Tx equalization: dLev [mV] vs. number of updates (left) and tap weight [mV] vs. number of updates for main tap, pre1, post1, and post2 (right)]

Hard to estimate analytically
Experimental results show
Both loops are stable within wide range (0.1-10x) of relative speeds

Hardware re-use: Dual-mode receiver

[Figure: receiver front end with three slicer paths (thresholds thresh(+), 0, thresh(-)) clocked by dClk, followed by flip-flops and prDFE-enable multiplexers producing msb, lsb(+), and lsb(-); each slicer is a pre-amp with programmable offset (I/2 + I_thresh, I/2 - I_thresh) followed by a clocked comparator (inP, inN, outP, outN). The same hardware is shown configured for PAM4, for PAM2, and for PAM2 with a loop-unrolled DFE tap]

Leverage multi-level properties of signals in loop-unrolling
Re-use PAM4 receiver hardware (slicers and CDR)

Improvements with loop-unrolling

[Figure: signal as seen by the receiver (on-chip scope), voltage [V] vs. time [ps], shaded by log10(voltage probability distribution): unequalized, fully transmit equalized, and transmit equalized with one tap DFE]

Model and measurements

[Figure: log10(BER) vs. voltage margin [mV], model and measurements]

PAM4, 3 taps of transmit equalization, 5Gb/s, 26" FR4 channel

Outline

Show system level optimization for links

Create a framework to evaluate trade-offs

Background on high-speed links

High-speed link modeling

System level optimization

Practical implementation issues

Current / future work

Bridging the gap to link capacity

Other similar system optimizations


Bridging the gap: Multi-tone link

Multi-tone data rates with thermal noise
Nelco 64 Gb/s
FR4 38 Gb/s

[Figure: bits/dimension (# levels) vs. frequency [GHz]]
[Figure: multi-tone transceiver: data0 through dataN are low-pass filtered (LPF), mixed with e^{jw1t} through e^{jwNt}, band-pass filtered (BPF), and combined onto the channel; the receiver applies BPF, mixing, and LPF per tone to recover data0 through dataN]

Challenge: balancing the inter-symbol and inter-channel interference
Microwave filter techniques
Custom signal processing

A. Amirkhany, V. Stojanovic, M.A. Horowitz, "Multi-tone Signaling for High-speed Backplane Electrical Links," IEEE Global Telecommunications Conference, November 2004.

The Problem with Multi-Mode Fiber

Multi-modal dispersion

[Figure: pulse spreading caused by multi-modal dispersion (source: Corning)]

Modal dispersion limits the data rates to ~ 3Gb/s/km

Example Fiber Modes

[Figure: field profiles of example fiber modes over the fiber cross-section (axes scaled by 10^-5)]

SLMs for Equalization

Shape the E-field projected on the fiber
Lens performs Fourier Transform
SLMs adjust the spatial frequency of the light

[Figure: MEMS spatial light modulator and lens shaping the field launched into the fiber]

Optimize to reduce modal dispersion
Objective is intensity - makes optimization challenging

E. Alon, V. Stojanovic, J. M. Kahn, S. Boyd, M. Horowitz, "Equalization of Modal Dispersion in Multimode Fiber using Spatial Light Modulators," IEEE Global Telecommunications Conference, November 2004.

Conclusions

Interfaces are challenging system designs

Good space to explore system level optimization

Optimization leads to novel approaches

Baseband links

Still, far from the capacity of these links

PAM4 and simple DFE reduce effect of ISI


Low cost adaptive, self calibrating link
Looking into multi-tone to bridge the gap

Multimode fiber optics

Leverage multiple propagation modes rather than being limited by them
