
Interconnect Optimizations

A scaling primer
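Ideal process scaling:
Device geometries shrink by s (= 0.7x)
Device delay shrinks by s
Wire geometries shrink by s
R/m: cross-section $wh \to (ws)(hs)$, so $r \to r/s^2$
Cc/m: $(hs)\,\varepsilon/(Ss) = C_c$
C/m: similar
R/m doubles, C/m and Cc/m unchanged
[Figure: transistor (G, S, D) and wire cross-sections with width w, height h, spacing S, length l; after scaling: ws, hs, Ss, ls]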
Interconnect role
Short (local) interconnect
Used to connect nearby cells
Minimize wire C, i.e., use short min-width wires
Medium to long-distance (global) interconnect
Size wires to trade off area vs. delay
Increasing width: capacitance increases, resistance decreases
Need to find an acceptable tradeoff: the wire sizing problem
Fat wires
Thicker cross-sections in higher metal layers
Useful for reducing delays for global wires
Inductance issues, sharing of limited resource
Cross-Section of A Chip
Block scaling

Block area often stays same


# cells, # nets doubles
Wiring histogram shape invariant

Global interconnect lengths don't shrink


Local interconnect lengths shrink by s
Interconnect delay scaling
Delay of a wire of length l:
$t_{int} = (rl)(cl) = rcl^2$ (first order)

Local interconnects:
$t_{int}: (r/s^2)(c)(ls)^2 = rcl^2$
Local interconnect delay unchanged (compare to faster devices)

Global interconnects:
$t_{int}: (r/s^2)(c)(l)^2 = (rcl^2)/s^2$
Global interconnect delay doubles: unsustainable!

Interconnect delay becomes increasingly dominant
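The scaling argument above can be checked numerically. The following is a minimal sketch, assuming normalized values r = c = l = 1 and s = 0.7; the function names are illustrative and not part of the slides.

```python
def wire_delay(r, c, length):
    """First-order wire delay t = r * c * l^2 (constant factors dropped)."""
    return r * c * length ** 2


def scaled_delays(r, c, length, s=0.7):
    """Return (local, global) wire delay after one ideal scaling step by s."""
    r_scaled = r / s ** 2                           # R per unit length grows as 1/s^2
    local = wire_delay(r_scaled, c, length * s)     # local wires shrink by s
    global_ = wire_delay(r_scaled, c, length)       # global wires keep their length
    return local, global_


# Assumed normalized starting point: r = c = l = 1.
base = wire_delay(1.0, 1.0, 1.0)
local, global_ = scaled_delays(1.0, 1.0, 1.0)
print(local / base, global_ / base)
```

With s = 0.7 the local ratio stays at about 1 while the global ratio is 1/0.49, roughly 2, which is the doubling quoted on the slide.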


Buffer Insertion For Delay Reduction
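Analysis of Simple RC Circuit
[Figure: source $v_T(t)$ drives capacitor C through resistor R; current $i(t)$, capacitor voltage $v(t)$]
$R\,i(t) + v(t) = v_T(t)$
$i(t) = \frac{d(Cv(t))}{dt} = C\frac{dv(t)}{dt}$
$RC\frac{dv(t)}{dt} + v(t) = v_T(t)$
$v(t)$: state variable; $v_T(t)$: input waveform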
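Analysis of Simple RC Circuit
Step-input response: $RC\frac{dv(t)}{dt} + v(t) = v_0 u(t)$
$v(t) = K e^{-t/RC} + v_0 u(t)$
Match initial state: $v(0) = 0 \Rightarrow K + v_0 u(t) = 0$, i.e. $K = -v_0$
Output response for step input: $v(t) = v_0 (1 - e^{-t/RC}) u(t)$
[Figure: input step $v_0 u(t)$ and output waveform $v_0(1 - e^{-t/RC})u(t)$]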
Delays of Simple RC Circuit
$v(t) = v_0(1 - e^{-t/RC})$: waveform under step input $v_0 u(t)$

$v(t) = 0.5 v_0 \Rightarrow t = 0.69RC$
i.e., delay = 0.69RC (50% delay)

$v(t) = 0.1 v_0 \Rightarrow t = 0.1RC$
$v(t) = 0.9 v_0 \Rightarrow t = 2.3RC$
i.e., rise time = 2.2RC (if defined as time from 10% to 90% of Vdd)

Commonly used metric: $T_D = RC$ (= Elmore delay)
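The threshold times above follow from inverting the step response, $t = -RC \ln(1 - v/v_0)$. A minimal sketch, assuming RC is normalized to 1; `time_to_fraction` is an illustrative name:

```python
import math


def time_to_fraction(fraction, rc=1.0):
    """Time for v(t) = v0 * (1 - exp(-t/RC)) to reach fraction * v0."""
    return -rc * math.log(1.0 - fraction)


t10 = time_to_fraction(0.1)   # 10% crossing
t50 = time_to_fraction(0.5)   # 50% delay
t90 = time_to_fraction(0.9)   # 90% crossing
rise_time = t90 - t10         # 10%-to-90% rise time
print(t50, t10, t90, rise_time)
```

In units of RC these evaluate to about 0.69, 0.105, 2.30 and 2.20, matching the rounded values on the slide.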
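Elmore Delay
Driver is modeled as R
Driver intrinsic gate delay t(B)
Delay = $\sum_{\text{all } R_i} \; \sum_{\text{all } C_j \text{ downstream from } R_i} R_i C_j$
Elmore delay at n1 = R(B)*(C1+C2)
Elmore delay at n2 = R(B)*(C1+C2) + R(w)*C2
[Figure: driver B with resistance R(B) drives node n1 (capacitance C1); wire resistance R(w) connects n1 to node n2 (capacitance C2)]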
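The double sum can be evaluated on any RC tree by first accumulating downstream capacitance and then walking from the node back to the root. The sketch below is one possible way to do this; the dictionary layout, the root name `src`, and the numeric values R(B) = 2, R(w) = 3, C1 = 4, C2 = 5 are assumptions chosen for illustration. The intrinsic delay t(B) is not included.

```python
def elmore_delay(parent, resistance, cap, node):
    """Elmore delay from the root to `node` in an RC tree.

    parent[n]     : parent of node n (the root has parent None)
    resistance[n] : resistance of the edge from parent[n] to n
    cap[n]        : capacitance lumped at node n
    """
    def depth(n):
        d = 0
        while parent[n] is not None:
            n = parent[n]
            d += 1
        return d

    # Downstream capacitance = own capacitance + capacitance of all descendants.
    downstream = dict(cap)
    # Visit deepest nodes first so each child is complete before its parent.
    for n in sorted(cap, key=depth, reverse=True):
        if parent[n] is not None:
            downstream[parent[n]] += downstream[n]

    # Sum R_i * (capacitance downstream of R_i) along the root-to-node path.
    delay = 0.0
    while parent[node] is not None:
        delay += resistance[node] * downstream[node]
        node = parent[node]
    return delay


# Two-node example from the slide with assumed values:
# R(B) = 2, R(w) = 3, C1 = 4, C2 = 5.
parent = {"src": None, "n1": "src", "n2": "n1"}
resistance = {"n1": 2.0, "n2": 3.0}
cap = {"src": 0.0, "n1": 4.0, "n2": 5.0}
d_n1 = elmore_delay(parent, resistance, cap, "n1")   # R(B)*(C1+C2)
d_n2 = elmore_delay(parent, resistance, cap, "n2")   # R(B)*(C1+C2) + R(w)*C2
print(d_n1, d_n2)
```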
Elmore Delay
For uniform wire
unit wire capacitance c
unit wire resistance r
[Figure: uniform wire of length x driving load C]
No matter how the wire is lumped, the Elmore delay is the same
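The lumping claim can be illustrated by splitting a uniform wire into equal sections and summing the Elmore terms. This sketch assumes pi-sections (half of each section's capacitance at either end), which is the model used in the buffer slides that follow; the function name and the numeric values are hypothetical.

```python
def wire_elmore(r, c, x, load, segments):
    """Elmore delay of a uniform wire split into equal pi-sections.

    r, c : resistance and capacitance per unit length
    x    : wire length
    load : capacitance at the far end of the wire
    """
    seg_r = r * x / segments
    seg_c = c * x / segments
    delay = 0.0
    for i in range(segments):
        # Capacitance downstream of the i-th series resistor: the far half
        # of this section, every later section in full, and the load.
        downstream = seg_c / 2 + seg_c * (segments - 1 - i) + load
        delay += seg_r * downstream
    return delay


# Assumed values: r = 2, c = 3, x = 4, load = 5.
delays = [wire_elmore(2.0, 3.0, 4.0, 5.0, n) for n in (1, 2, 5, 50)]
print(delays)
```

Every entry agrees with the closed form rx(cx/2 + C) up to floating-point rounding, regardless of the number of sections.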
Delay for Buffer
[Figure: buffer between nodes u and v with load C; labels: input capacitance C(b), driver resistance, intrinsic buffer delay]
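Buffers Reduce Wire Delay
[Figure: wire of length x driven through resistance R into load C, then split into two halves of length x/2 with a buffer in between; each half is modeled with resistance rx/2 and capacitance cx/4 at both ends]
$t_{unbuf} = R(cx + C) + rx(cx/2 + C)$
$t_{buf} = 2R(cx/2 + C) + rx(cx/4 + C) + t_b$
$t_{buf} - t_{unbuf} = RC + t_b - rcx^2/4$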
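Setting the difference to zero gives the wire length beyond which a single mid-point buffer pays off, $x > 2\sqrt{(RC + t_b)/(rc)}$. A minimal sketch, assuming (as the formulas above do) that the buffer has the same output resistance R and input capacitance C as the driver and load; all numeric values are illustrative.

```python
import math


def t_unbuf(R, C, r, c, x):
    """Elmore delay of the unbuffered wire."""
    return R * (c * x + C) + r * x * (c * x / 2 + C)


def t_buf(R, C, r, c, x, tb):
    """Elmore delay with one buffer at the mid-point of the wire."""
    return 2 * R * (c * x / 2 + C) + r * x * (c * x / 4 + C) + tb


def break_even_length(R, C, r, c, tb):
    """Wire length above which the mid-point buffer reduces delay."""
    return 2 * math.sqrt((R * C + tb) / (r * c))


# Assumed normalized values: R = C = r = c = tb = 1.
x0 = break_even_length(1, 1, 1, 1, 1)
gain_at_4 = t_unbuf(1, 1, 1, 1, 4) - t_buf(1, 1, 1, 1, 4, 1)
print(x0, gain_at_4)
```

For these values the break-even length is 2·sqrt(2), and at x = 4 the buffered wire is faster by 2 time units.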
Combinational Logic Delay
[Figure: primary input -> register -> combinational logic -> register -> primary output; both registers are driven by the clock]
Combinational logic delay <= clock period
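Buffered global interconnects: Intuition
Unbuffered wire of length l:
Interconnect delay = $r c\, l^2$
Wire split by buffers into segments $l_1, l_2, l_3, \ldots, l_n$:
Now, interconnect delay = $\sum_i r c\, l_i^2 < r c\, l^2$ (where $l = \sum_j l_j$)
since $\sum_j l_j^2 < \left(\sum_j l_j\right)^2$
(Of course, account for buffer delay also)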
Optimal inter-buffer length
First order (lumped parasitic, Elmore delay) analysis
$R_d$: on resistance of inverter
$C_g$: gate input capacitance
r, c: resistance, capacitance per micron
Assume N identical buffers with equal inter-buffer length l (L = Nl)

$T = N\left[R_d(C_g + cl) + rl(C_g + cl/2)\right] = L\left[\frac{rcl}{2} + (rC_g + R_d c) + R_d C_g \frac{1}{l}\right]$

For minimum delay,
$\frac{dT}{dl} = 0 \Rightarrow L\left[\frac{rc}{2} - \frac{R_d C_g}{l_{opt}^2}\right] = 0 \Rightarrow l_{opt} = \sqrt{\frac{2 R_d C_g}{rc}}$
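Optimal interconnect delay
Substituting $l_{opt}$ back into the interconnect delay expression:

$T_{opt} = L\left[\frac{rc\,l_{opt}}{2} + (rC_g + R_d c) + R_d C_g \frac{1}{l_{opt}}\right]$
$= L\left[\frac{rc}{2}\sqrt{\frac{2 R_d C_g}{rc}} + (rC_g + R_d c) + R_d C_g \sqrt{\frac{rc}{2 R_d C_g}}\right]$
$T_{opt} = L\left[\sqrt{2 R_d C_g rc} + (rC_g + R_d c)\right]$
Delay grows linearly with L (instead of quadratically)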
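The closed forms for $l_{opt}$ and $T_{opt}$ can be sanity-checked against the general delay expression. This is a sketch under assumed parameter values (Rd = 100, Cg = 2, r = 0.5, c = 0.2, L = 1000, in arbitrary consistent units); none of these numbers come from the slides.

```python
import math


def total_delay(L, l, Rd, Cg, r, c):
    """Delay of a wire of length L buffered at equal spacing l."""
    return L * (r * c * l / 2 + (r * Cg + Rd * c) + Rd * Cg / l)


def optimal_spacing(Rd, Cg, r, c):
    """Inter-buffer length that minimizes total_delay."""
    return math.sqrt(2 * Rd * Cg / (r * c))


def optimal_delay(L, Rd, Cg, r, c):
    """Closed-form delay at the optimal spacing; linear in L."""
    return L * (math.sqrt(2 * Rd * Cg * r * c) + r * Cg + Rd * c)


# Assumed illustrative parameters.
Rd, Cg, r, c, L = 100.0, 2.0, 0.5, 0.2, 1000.0
l_opt = optimal_spacing(Rd, Cg, r, c)
t_opt = optimal_delay(L, Rd, Cg, r, c)
print(l_opt, t_opt, total_delay(L, l_opt, Rd, Cg, r, c))
```

Evaluating `total_delay` slightly above or below `l_opt` gives a larger value, and doubling L doubles `t_opt`, consistent with the linear growth stated above.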
Total buffer count
[Figure: bar chart of % cells used to buffer nets (series: clk-buf, buf, tot-buf) at 90nm, 65nm, 45nm, 32nm; vertical axis 0 to 80]
Ever-increasing fractions of total cell count will be buffers
70% in 32nm
ITRS projections
[Figure: relative delay (log scale, 0.1 to 100) vs. feature size (250, 180, 130, 90, 65, 45, 32 nm) for gate delay (fanout 4), local interconnect (M1,2), global interconnect with repeaters, and global interconnect without repeaters]
Source: ITRS, 2003
Buffers Improve Slack
RAT = Required Arrival Time
Slack = RAT - Delay
Without buffer:
Sink 1: RAT = 300, Delay = 350, Slack = -50
Sink 2: RAT = 700, Delay = 600, Slack = 100
slackmin = -50
With buffer (decouple capacitive load from critical path):
Sink 1: RAT = 300, Delay = 250, Slack = 50
Sink 2: RAT = 700, Delay = 400, Slack = 300
slackmin = 50
Timing Driven Buffering
Problem Formulation
Given
A Steiner tree
RAT at each sink
A buffer type
RC parameters
Candidate buffer locations
Find buffer insertion solution such that the
slack at the driver is maximized
Candidate Buffering Solutions
Candidate Solution Characteristics
Each candidate solution is associated with
vi: a node
ci: downstream capacitance
qi: RAT
If vi is a sink, ci is the sink capacitance; otherwise vi is an internal node
Van Ginneken's Algorithm
Candidate solutions are propagated toward the source
Dynamic Programming
Solution Propagation: Add Wire
A wire of length x turns (v1, c1, q1) into (v2, c2, q2)
$c_2 = c_1 + cx$
$q_2 = q_1 - rcx^2/2 - rxc_1$
r: wire resistance per unit length
c: wire capacitance per unit length
Solution Propagation: Insert Buffer
A buffer at v1 turns (v1, c1, q1) into (v1, c1b, q1b)
$c_{1b} = C_b$
$q_{1b} = q_1 - R_b c_1 - t_b$
Cb: buffer input capacitance
Rb: buffer output resistance
tb: buffer intrinsic delay
Solution Propagation: Merge

(v, cl , ql) (v, cr , qr)

cmerge = cl + cr
qmerge = min(ql , qr)
Solution Propagation: Add Driver
The driver at v0 turns (v0, c0, q0) into (v0, c0d, q0d)
$q_{0d} = q_0 - R_d c_0 = slack_{min}$
Rd: driver resistance
Pick solution with max slackmin
Example of Solution Propagation
r = 1, c = 1
Rb = 1, Cb = 1, tb = 1
Rd = 1
Two wire segments of length 2 each: sink v1 to v2, and v2 to v3
Sink: (v1, 1, 20)
Add wire: (v2, 3, 16)
Insert buffer at v2: (v2, 1, 12)
Add wire, unbuffered: (v3, 5, 8); buffered: (v3, 3, 8)
Add driver, unbuffered: slack = 3; buffered: slack = 5
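The numbers in this example can be reproduced by coding the three propagation rules directly. A minimal sketch, representing each candidate as a (c, q) pair and leaving the node label implicit; the function names are illustrative.

```python
def add_wire(cand, x, r, c):
    """Propagate (c, q) across a wire of length x toward the source."""
    cap, q = cand
    return (cap + c * x, q - r * c * x ** 2 / 2 - r * x * cap)


def insert_buffer(cand, Rb, Cb, tb):
    """Place a buffer in front of the candidate's load."""
    cap, q = cand
    return (Cb, q - Rb * cap - tb)


def add_driver(cand, Rd):
    """Slack at the driver for a candidate that reaches the source."""
    cap, q = cand
    return q - Rd * cap


# Parameters from the example slide.
r = c = 1
Rb = Cb = tb = 1
Rd = 1

sink = (1, 20)
at_v2 = add_wire(sink, 2, r, c)                  # unbuffered candidate at v2
at_v2_buf = insert_buffer(at_v2, Rb, Cb, tb)     # buffered candidate at v2
at_v3 = add_wire(at_v2, 2, r, c)
at_v3_buf = add_wire(at_v2_buf, 2, r, c)
slack_unbuf = add_driver(at_v3, Rd)
slack_buf = add_driver(at_v3_buf, Rd)
print(at_v2, at_v2_buf, at_v3, at_v3_buf, slack_unbuf, slack_buf)
```

The buffered path ends with the larger slack (5 versus 3), so it is the solution picked at the driver.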


Example of Merging
[Figure: left candidates and right candidates combined into merged candidates]
Solution Pruning
Two candidate solutions
(v, c1, q1)
(v, c2, q2)
Solution 1 is inferior if
c1 > c2 : larger load
and q1 < q2 : tighter timing
Pruning When Insert Buffer

They have the same load cap Cb, so only the one with max q is kept
Generating Candidates
[Figure: candidate buffering solutions generated in steps (1), (2), (3)]
From Dr. Charles Alpert
Pruning Candidates
[Figure: step (3) with candidates (a) and (b), followed by step (4)]
Both (a) and (b) look the same to the source.
Throw out the one with the worst slack
Candidate Example Continued
[Figure: candidates at steps (4) and (5)]
Candidate Example Continued
[Figure: candidates at step (5) after pruning]
At driver, compute which candidate maximizes slack. Result is optimal.
Merging Branches
[Figure: left candidates and right candidates at a branch point]
Pruning Merged Branches
[Figure: merged branch candidates with the critical branch marked, shown with pruning]
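Van Ginneken Example
Candidates are written (C, q); each wire or buffer step is labeled with its capacitance C and delay d
Sink: (20,400)
Wire (C=10, d=150): (20,400) -> (30,250)
Buffer (C=5, d=30): (30,250) -> (5,220)
Wire (C=15, d=200): (30,250) -> (45,50)
Wire (C=15, d=120): (5,220) -> (20,100)
Buffer (C=5, d=50): (45,50) -> (5,0)
Buffer (C=5, d=30): (20,100) -> (5,70)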
Van Ginneken Example Cont'd
Candidates: (45,50), (5,0), (20,100), (5,70)
(5,0) is inferior to (5,70). (45,50) is inferior to (20,100)
Wire C=10: (20,100) -> (30,10); (5,70) -> (15,-10)
Pick solution with largest slack, follow arrows to get solution
Basic Data Structure
(c1, q1) (c2, q2) (c3, q3)
Left to right: worse load cap, better timing
Sorted list such that c1 < c2 < c3
If there are no inferior candidates, q1 < q2 < q3
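Prune Solution List
Candidates in order of increasing c: (c1, q1) (c2, q2) (c3, q3) (c4, q4)
[Figure: flowchart. Walk the list in order of increasing c, comparing the last kept candidate i with the next candidate j: if qi < qj, keep j and continue from it; otherwise prune j and compare i with the following candidate]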
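The pruning pass can be written as a single scan over the sorted list. This is a sketch rather than the slides' exact procedure: it sorts first (which the slides assume is already done) and, as an added assumption, also drops a candidate whose q merely ties that of a smaller-c candidate. The function name is illustrative.

```python
def prune(candidates):
    """Remove inferior (c, q) candidates; survivors are sorted by c."""
    # Sort by increasing c; among equal c, put the largest q first.
    ordered = sorted(candidates, key=lambda cq: (cq[0], -cq[1]))
    kept = []
    for cap, q in ordered:
        # Keep a candidate only if its q exceeds that of the last survivor,
        # which has a smaller or equal c.
        if not kept or q > kept[-1][1]:
            kept.append((cap, q))
    return kept


# Candidates taken from the Van Ginneken example slides.
survivors = prune([(45, 50), (5, 0), (20, 100), (5, 70)])
print(survivors)
```

On the four candidates from the Van Ginneken example this keeps (5, 70) and (20, 100), the same two that survive on the slide. The scan itself is linear once the list is sorted.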
Pruning In Merging
Left candidates: (cl1, ql1), (cl2, ql2), (cl3, ql3)
Right candidates: (cr1, qr1), (cr2, qr2)
ql1 < ql2 < qr1 < ql3 < qr2
Merged candidates:
(cl1+cr1, ql1)
(cl2+cr1, ql2)
(cl3+cr1, qr1)
(cl3+cr2, ql3)
[Figure: four panels stepping through the left and right lists as each merged candidate is produced]
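The merged list above is what a two-pointer walk over the two sorted lists produces: emit (cl + cr, min(ql, qr)) and advance whichever side limits q. A minimal sketch, assuming both input lists are already pruned and sorted by increasing c (and hence increasing q); the numeric values are made up to satisfy ql1 < ql2 < qr1 < ql3 < qr2.

```python
def merge(left, right):
    """Merge two pruned candidate lists of (c, q) pairs at a branch point."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        (cl, ql), (cr, qr) = left[i], right[j]
        merged.append((cl + cr, min(ql, qr)))
        # Advance the side that limits q: pairing it with a larger c from
        # the other side could only add load without improving q.
        if ql < qr:
            i += 1
        elif qr < ql:
            j += 1
        else:
            i += 1
            j += 1
    return merged


# Assumed values with ql1 < ql2 < qr1 < ql3 < qr2.
left = [(1, 10), (2, 20), (3, 40)]
right = [(4, 30), (5, 50)]
result = merge(left, right)
print(result)
```

The output has four entries, one per step of the walk, so the merged list grows additively in the sizes of the two inputs rather than multiplicatively.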
Van Ginneken Complexity
Generate candidates from sinks to source
Quadratic runtime
Adding a wire does not change #candidates
Adding a buffer adds only one new candidate
Merging branches additive, not multiplicative
Linear time solution list pruning
Optimal for Elmore delay model
Multiple Buffer Types
r = 1, c = 1
Rb1 = 1, Cb1 = 1, tb1 = 1
Rb2 = 0.5, Cb2 = 2, tb2 = 0.5
Rd = 1
Wire segments of length 2
Sink: (v1, 1, 20)
Add wire: (v2, 3, 16)
Insert buffer type 1 at v2: (v2, 1, 12)
Insert buffer type 2 at v2: (v2, 2, 14)
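With several buffer types, the insert-buffer rule is simply applied once per type, yielding one new candidate each. A minimal sketch using the parameters from this slide; the function name and the (Rb, Cb, tb) tuple layout are illustrative.

```python
def insert_buffers(cand, buffer_types):
    """Return one new (c, q) candidate per buffer type (Rb, Cb, tb)."""
    cap, q = cand
    return [(Cb, q - Rb * cap - tb) for (Rb, Cb, tb) in buffer_types]


# Buffer types from the slide: (Rb, Cb, tb).
buffer_types = [(1, 1, 1), (0.5, 2, 0.5)]
new_candidates = insert_buffers((3, 16), buffer_types)
print(new_candidates)
```

Starting from the unbuffered candidate (3, 16) at v2 this gives (1, 12) for type 1 and (2, 14) for type 2, as on the slide. Neither is inferior to the other, so both are kept alongside the unbuffered candidate.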
