A scaling primer
[Figure: wire geometry under scaling, with labels G, ls, ws, Ss]
Interconnect role
Short (local) interconnect
Used to connect nearby cells
Minimize wire C, i.e., use short min-width wires
Medium to long-distance (global) interconnect
Size wires to trade off area vs. delay
Increasing width: capacitance increases, resistance decreases
Need to find an acceptable tradeoff (the wire sizing problem)
Fat wires
Thicker cross-sections in higher metal layers
Useful for reducing delays for global wires
Inductance issues, sharing of limited resource
Cross-Section of A Chip
Block scaling
Local interconnects :
$t_{int}$: $(r/s^2)(c)(ls)^2 = rcl^2$
Local interconnect delay unchanged (compare to faster devices)
Global interconnects :
$t_{int}$: $(r/s^2)(c)(l)^2 = (rcl^2)/s^2$
Global interconnect delay doubles: unsustainable!
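The two scaling cases can be checked numerically. This is a minimal sketch, assuming a shrink factor of s = 0.7 per generation (a value not stated on the slide) and unit values for r, c, and l; the function name `scaled_wire_delay` is hypothetical.

```python
def scaled_wire_delay(r, c, length, s, is_local):
    """Wire delay r*c*l^2 (up to a constant factor) after scaling by s.

    Resistance per unit length becomes r / s**2, capacitance per unit
    length stays c, and only local wires shrink to length * s.
    """
    scaled_length = length * s if is_local else length
    return (r / s**2) * c * scaled_length**2


s = 0.7                      # assumed shrink factor, not from the slides
base = 1.0 * 1.0 * 1.0**2    # r*c*l^2 with r = c = l = 1
local_ratio = scaled_wire_delay(1.0, 1.0, 1.0, s, is_local=True) / base
global_ratio = scaled_wire_delay(1.0, 1.0, 1.0, s, is_local=False) / base
print(f"local: {local_ratio:.2f}x, global: {global_ratio:.2f}x")
```

With s = 0.7 the global ratio is 1/s^2, roughly 2, which is where the "doubles" figure comes from under this assumption.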
$v(t) = 0.5\,v_0 \Rightarrow t = 0.69RC$
i.e., delay = 0.69RC (50% delay)
$v(t) = 0.1\,v_0 \Rightarrow t = 0.1RC$
$v(t) = 0.9\,v_0 \Rightarrow t = 2.3RC$
i.e., rise time = 2.2RC (if defined as time from 10% to 90% of Vdd)
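These constants follow from a single-pole RC step response. The sketch below assumes the charging form v(t) = v0(1 - exp(-t/RC)), which the slide does not write out; `time_to_fraction` is a hypothetical helper name.

```python
import math


def time_to_fraction(fraction, rc=1.0):
    """Time for v(t) = v0 * (1 - exp(-t / RC)) to reach fraction * v0."""
    return -rc * math.log(1.0 - fraction)


t50 = time_to_fraction(0.5)   # 50% delay, in units of RC
t10 = time_to_fraction(0.1)
t90 = time_to_fraction(0.9)
rise = t90 - t10              # 10% to 90% rise time
print(f"t50={t50:.2f}RC t10={t10:.2f}RC t90={t90:.2f}RC rise={rise:.2f}RC")
```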
Delay
Elmore Delay
Driver is modeled as R
Driver intrinsic gate delay t(B)
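Delay = $\sum_{\text{all } R_i} \; \sum_{\text{all } C_j \text{ downstream from } R_i} R_i C_j$
Elmore delay at n2 = $R(B)(C_1 + C_2) + R(w)\,C_2$
Elmore delay at n1 = $R(B)(C_1 + C_2)$
[Figure: driver B with output resistance R(B) drives node n1 (capacitance C1); wire resistance R(w) connects n1 to n2 (capacitance C2)]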
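The double sum can be evaluated on any RC tree by first accumulating downstream capacitance and then walking from the driver outward. The following is a sketch, not a reference implementation; the dictionary-based tree encoding and the numeric values for R(B), R(w), C1, and C2 are illustrative assumptions.

```python
def elmore_delays(parent, res, cap):
    """Return the Elmore delay at every node of an RC tree.

    parent[n]: upstream node of n (None for the node nearest the driver)
    res[n]:    resistance of the branch entering n
    cap[n]:    capacitance attached at n
    """
    def depth(n):
        d = 0
        while parent[n] is not None:
            n = parent[n]
            d += 1
        return d

    # Total capacitance downstream of the branch entering each node.
    down = dict(cap)
    for n in sorted(parent, key=depth, reverse=True):   # deepest first
        if parent[n] is not None:
            down[parent[n]] += down[n]

    # Delay at n = delay at its parent + R_n * (capacitance downstream of R_n).
    delay = {}
    for n in sorted(parent, key=depth):                 # driver side first
        upstream = delay[parent[n]] if parent[n] is not None else 0.0
        delay[n] = upstream + res[n] * down[n]
    return delay


# Two-node network from the slide, with made-up values.
parent = {"n1": None, "n2": "n1"}
res = {"n1": 2.0, "n2": 1.0}    # R(B) = 2, R(w) = 1
cap = {"n1": 3.0, "n2": 4.0}    # C1 = 3, C2 = 4
delays = elmore_delays(parent, res, cap)
print(delays)
```

For these values the result matches R(B)(C1+C2) = 14 at n1 and R(B)(C1+C2) + R(w)C2 = 18 at n2.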
Elmore Delay
For uniform wire
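[Figure: a driver with resistance R drives a uniform wire of length x into a load C; in the buffered case a buffer sits at the midpoint, splitting the wire into two halves of length x/2]
$t_{unbuf} = R(cx + C) + rx(cx/2 + C)$
$t_{buf} = 2R(cx/2 + C) + rx(cx/4 + C) + t_b$
$t_{buf} - t_{unbuf} = RC + t_b - rcx^2/4$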
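The difference shows that a midpoint buffer pays off only once rcx^2/4 exceeds RC + tb. The sketch below evaluates both delays and the break-even length; it assumes, as the formulas do, that the buffer has the same output resistance R and input capacitance C as the driver and load. All numeric values are illustrative.

```python
import math


def t_unbuf(R, C, r, c, x):
    """Elmore delay of a driver R, uniform wire of length x, and load C."""
    return R * (c * x + C) + r * x * (c * x / 2 + C)


def t_buf(R, C, r, c, x, tb):
    """Same wire with one buffer (resistance R, input cap C, delay tb) at x/2."""
    return 2 * R * (c * x / 2 + C) + r * x * (c * x / 4 + C) + tb


def break_even_length(R, C, r, c, tb):
    """Length at which t_buf equals t_unbuf: r*c*x^2/4 = R*C + tb."""
    return math.sqrt(4 * (R * C + tb) / (r * c))


R = C = r = c = tb = 1.0
for x in (2.0, 4.0):
    diff = t_buf(R, C, r, c, x, tb) - t_unbuf(R, C, r, c, x)
    print(f"x={x}: t_buf - t_unbuf = {diff}")
print("break-even x =", break_even_length(R, C, r, c, tb))
```

With unit parameters the buffer hurts at x = 2 and helps at x = 4, and the crossover is at x = sqrt(8).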
Combinational Logic Delay
[Figure: primary input, register, combinational logic, register, primary output; the registers are driven by the clock]
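[Figure: wire of total length L divided by buffers into segments l1, l2, l3, ..., ln]
$T_{opt} = L\left(\sqrt{2 R_d C_g r c} + r C_g + R_d c\right)$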
Delay grows linearly with L (instead of quadratically)
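The linear growth can be checked with a brute-force search over the number of equal stages. This is a sketch under stated assumptions: each stage is a driver of resistance Rd, a wire of length L/k, and a load Cg, evaluated with the Elmore model and with buffer intrinsic delay ignored. The closed form in the code is what minimizing the per-stage delay over the stage length gives under that model. Parameter values are illustrative.

```python
import math


def buffered_delay(L, k, Rd, Cg, r, c):
    """Elmore delay of a length-L wire split into k equal buffered stages."""
    x = L / k
    return k * (Rd * (c * x + Cg) + r * x * (c * x / 2 + Cg))


def best_delay(L, Rd, Cg, r, c, max_stages=500):
    """Smallest delay over all stage counts from 1 to max_stages."""
    return min(buffered_delay(L, k, Rd, Cg, r, c)
               for k in range(1, max_stages + 1))


Rd = Cg = r = c = 1.0
for L in (20.0, 40.0, 80.0):
    unbuffered = buffered_delay(L, 1, Rd, Cg, r, c)
    buffered = best_delay(L, Rd, Cg, r, c)
    # Continuous optimum of the same stage model (integer k gets close to it).
    closed_form = L * (math.sqrt(2 * Rd * Cg * r * c) + r * Cg + Rd * c)
    print(f"L={L}: unbuffered={unbuffered:.1f} "
          f"buffered={buffered:.2f} closed form={closed_form:.2f}")
```

Doubling L roughly doubles the buffered delay, while the unbuffered delay grows by close to four times.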
Total buffer count
[Figure: total buffer count by technology node (90nm, 65nm, 45nm, 32nm); vertical axis from 0 to 80; series label: clk-buf]
Ever-increasing fractions of total cell count will be buffers
70% in 32nm
ITRS projections
RAT = Required Arrival Time
Slack = RAT - Delay
Before:
Sink 1: RAT = 300, Delay = 350, Slack = -50
Sink 2: RAT = 700, Delay = 600, Slack = 100
slackmin = -50
After decoupling the capacitive load from the critical path:
Sink 1: RAT = 300, Delay = 250, Slack = 50
Sink 2: RAT = 700, Delay = 400, Slack = 300
slackmin = 50
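The slack figures above can be reproduced in a few lines. A minimal sketch; the function names are hypothetical and each sink is represented as a (RAT, delay) pair.

```python
def slack(rat, delay):
    """Slack = RAT - Delay."""
    return rat - delay


def min_slack(sinks):
    """Worst (smallest) slack over a list of (RAT, delay) pairs."""
    return min(slack(rat, delay) for rat, delay in sinks)


before = [(300, 350), (700, 600)]
after = [(300, 250), (700, 400)]   # delays after decoupling the load
print(min_slack(before), min_slack(after))
```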
Timing Driven Buffering
Problem Formulation
Given
A Steiner tree
RAT at each sink
A buffer type
RC parameters
Candidate buffer locations
Find a buffer insertion solution that maximizes the
slack at the driver
Candidate Buffering Solutions
Candidate Solution Characteristics
Each candidate solution is associated with
vi: a node
ci: downstream capacitance
qi: RAT
If vi is a sink, ci is the sink capacitance
Otherwise vi is an internal node
Van Ginneken's Algorithm
Dynamic Programming
Solution Propagation: Add Wire
$c_2 = c_1 + cx$
$q_2 = q_1 - rcx^2/2 - rxc_1$
r: wire resistance per unit length
c: wire capacitance per unit length
Solution Propagation: Insert Buffer
$c_{1b} = C_b$
$q_{1b} = q_1 - R_b c_1 - t_b$
Cb: buffer input capacitance
Rb: buffer output resistance
tb: buffer intrinsic delay
Solution Propagation: Merge
cmerge = cl + cr
qmerge = min(ql , qr)
Solution Propagation: Add Driver
Example: r = 1, c = 1; Rb = 1, Cb = 1, tb = 1; Rd = 1; two wire segments of length 2
Sink: (v1, 1, 20)
Add wire: (v2, 3, 16)
Insert buffer: (v2, 1, 12)
Add wire from (v2, 3, 16): (v3, 5, 8)
Add wire from (v2, 1, 12): (v3, 3, 8)
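The propagation steps can be replayed on this example. This is a sketch: `add_wire` and `insert_buffer` follow the update formulas given for those steps, while `add_driver` assumes the driver is modeled as a resistance Rd, so that the value at the driver is q - Rd*c. That driver formula is an assumption here, since the slide text does not spell it out.

```python
def add_wire(cand, x, r, c):
    """Propagate a (c1, q1) candidate across a wire of length x."""
    c1, q1 = cand
    return (c1 + c * x, q1 - r * c * x**2 / 2 - r * x * c1)


def insert_buffer(cand, Rb, Cb, tb):
    """Place a buffer in front of a (c1, q1) candidate."""
    c1, q1 = cand
    return (Cb, q1 - Rb * c1 - tb)


def add_driver(cand, Rd):
    """Assumed driver step: subtract the delay Rd * c1 of a resistive driver."""
    c1, q1 = cand
    return q1 - Rd * c1


r = c = 1
Rb = Cb = tb = 1
Rd = 1
v1 = (1, 20)
v2_plain = add_wire(v1, 2, r, c)                 # unbuffered candidate at v2
v2_buf = insert_buffer(v2_plain, Rb, Cb, tb)     # buffered candidate at v2
v3_plain = add_wire(v2_plain, 2, r, c)
v3_buf = add_wire(v2_buf, 2, r, c)
best = max(add_driver(v3_plain, Rd), add_driver(v3_buf, Rd))
print(v2_plain, v2_buf, v3_plain, v3_buf, best)
```

Both candidates reach v3 with q = 8, but the buffered one presents less load, so it wins once the driver delay is subtracted.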
[Figure: left candidates and right candidates combined into merged candidates]
Solution Pruning
Two candidate solutions
(v, c1, q1)
(v, c2, q2)
Solution 1 is inferior if
c1 > c2 : larger load
and q1 < q2 : tighter timing
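One way to apply this rule to a whole candidate list is to sort by load and keep only candidates whose q strictly improves. A minimal sketch; note that it also drops ties (equal c or equal q), which is slightly stronger than the strict inequalities stated above. The sample candidates are illustrative.

```python
def prune(cands):
    """Remove (c, q) candidates that another candidate matches or beats
    on both load (smaller c) and timing (larger q)."""
    kept = []
    # Increasing c; for equal c, the larger q comes first.
    for c, q in sorted(cands, key=lambda cq: (cq[0], -cq[1])):
        if not kept or q > kept[-1][1]:
            kept.append((c, q))
    return kept


print(prune([(45, 50), (5, 0), (20, 100), (5, 70)]))
```

The surviving list is sorted by increasing c and increasing q, which is the property that makes the later linear-time steps possible.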
Pruning When Insert Buffer
[Figure: candidate example, steps (2) to (4), panels (a) and (b)]
Candidate Example Continued
[Figure: candidate example, steps (4) and (5)]
Candidate Example Continued
After pruning
[Figure: candidate example, step (5)]
Merging Branches
[Figure: left candidates and right candidates at a branch point]
Pruning Merged Branches
[Figure: merged candidates with pruning; the critical branch is marked]
Van Ginneken Example
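Candidates are written as (c, q).
Sink: (20, 400)
Wire (C=10, d=150): (20, 400) -> (30, 250)
Buffer (C=5, d=30): (30, 250) -> (5, 220)
Wire (C=15, d=200): (30, 250) -> (45, 50)
Wire (C=15, d=120): (5, 220) -> (20, 100)
Buffer (C=5, d=50): (45, 50) -> (5, 0)
Buffer (C=5, d=30): (20, 100) -> (5, 70)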
Van Ginneken Example Contd
Candidates before pruning: (45, 50), (5, 0), (20, 100), (5, 70)
Candidates after pruning: (20, 100), (5, 70)
Wire (C=10): (20, 100) -> (30, 10); (5, 70) -> (15, -10)
Basic Data Structure
[Figure: pruning flowchart; decision "q3 < q4 ?" with Y/N branches, outcome "Prune 4"]
Pruning In Merging
Assume ql1 < ql2 < qr1 < ql3 < qr2
Left candidates: (cl1, ql1), (cl2, ql2), (cl3, ql3)
Right candidates: (cr1, qr1), (cr2, qr2)
Merged candidates:
(cl1+cr1, ql1)
(cl2+cr1, ql2)
(cl3+cr1, qr1)
(cl3+cr2, ql3)
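The merged list above is what a two-pointer walk over the sorted lists produces. A sketch, assuming both inputs are already pruned and sorted by increasing c and q; the function name and numeric values are illustrative, chosen to respect ql1 < ql2 < qr1 < ql3 < qr2.

```python
def merge_sorted(left, right):
    """Merge two pruned candidate lists, each sorted by increasing c and q,
    producing at most len(left) + len(right) - 1 candidates."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        cl, ql = left[i]
        cr, qr = right[j]
        merged.append((cl + cr, min(ql, qr)))
        # Advance the side that limits q: pairing it with a later candidate
        # from the other side would add load without improving q.
        if ql <= qr:
            i += 1
        if qr <= ql:
            j += 1
    return merged


left = [(1, 10), (2, 20), (3, 40)]    # (cl1, ql1), (cl2, ql2), (cl3, ql3)
right = [(4, 30), (6, 50)]            # (cr1, qr1), (cr2, qr2)
print(merge_sorted(left, right))
```

The output pairs follow the same pattern as the merged candidates listed above, and the candidate count is additive rather than the product of the two list sizes.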
Van Ginneken Complexity
Generate candidates from sinks to source
Quadratic runtime
Adding a wire does not change #candidates
Adding a buffer adds only one new candidate
Merging branches additive, not multiplicative
Linear time solution list pruning
Optimal for Elmore delay model
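Putting the steps together for the simplest case, a single sink on an unbranched wire, gives the sketch below. It is a simplified illustration, not the full tree algorithm: it assumes one buffer type, candidate buffer locations at every internal node between segments, and a resistive driver whose delay is Rd*c. The function names are hypothetical.

```python
def prune(cands):
    """Keep only candidates not matched or beaten on both c and q."""
    kept = []
    for c, q in sorted(cands, key=lambda cq: (cq[0], -cq[1])):
        if not kept or q > kept[-1][1]:
            kept.append((c, q))
    return kept


def van_ginneken_line(lengths, sink, r, c, Rb, Cb, tb, Rd):
    """Best q at the driver for a path; lengths run from the sink to the driver."""
    cands = [sink]
    for k, x in enumerate(lengths):
        # Add wire: the number of candidates does not change.
        cands = [(c1 + c * x, q1 - r * c * x * x / 2 - r * x * c1)
                 for c1, q1 in cands]
        if k < len(lengths) - 1:
            # Insert buffer: only the best buffered candidate is added.
            best_q = max(q1 - Rb * c1 - tb for c1, q1 in cands)
            cands.append((Cb, best_q))
        cands = prune(cands)
    # Assumed driver step: subtract Rd * c and keep the best result.
    return max(q1 - Rd * c1 for c1, q1 in cands)


result = van_ginneken_line([2, 2, 2], (1, 20), r=1, c=1, Rb=1, Cb=1, tb=1, Rd=1)
print(result)
```

Each node adds at most one candidate, so the list length stays linear in the number of buffer locations, which is the source of the quadratic overall runtime.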
Multiple Buffer Types
Example: r = 1, c = 1; wire segments of length 2 and 2; Rd = 1
Buffer type 1: Rb1 = 1, Cb1 = 1, tb1 = 1
Buffer type 2: Rb2 = 0.5, Cb2 = 2, tb2 = 0.5
Sink: (v1, 1, 20)
Add wire: (v2, 3, 16)
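With several buffer types, a common extension is to generate one new candidate per type at each location, namely the best existing candidate for that type. The sketch below applies this to the candidate at v2; the one-candidate-per-type policy and the function name are assumptions for illustration, not taken from the slide.

```python
def buffered_candidates(cands, buffers):
    """For each buffer type, return (Cb, q, name) built from the existing
    (c, q) candidate that gives the largest q after buffering."""
    out = []
    for name, (Rb, Cb, tb) in buffers.items():
        best_q = max(q - Rb * c for c, q in cands) - tb
        out.append((Cb, best_q, name))
    return out


buffers = {
    "b1": (1.0, 1.0, 1.0),   # Rb1, Cb1, tb1
    "b2": (0.5, 2.0, 0.5),   # Rb2, Cb2, tb2
}
at_v2 = [(3, 16)]            # unbuffered candidate (c, q) at v2
new_cands = buffered_candidates(at_v2, buffers)
print(new_cands)
```

Here the stronger buffer b2 yields a larger q (14 versus 12) at the cost of a larger input capacitance (2 versus 1), so neither new candidate prunes the other.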