
VLSI Scaling for Architects

Mark Horowitz
Computer Systems Laboratory
Stanford University
horowitz@stanford.edu

MAH VLSI Scaling for Architects 1


The Buzz is VLSI Wires are Bad

• A lot of talk about VLSI wires being a problem:
– Delay
– Noise coupling
• What are the characteristics of chip wires?
– How do they compare to scaled gates?
• And what does it really mean?
– To CAD developers
– To architects

[Figure: "A very popular figure"]



How Will Scaling Change Computer Design?

To answer this question:


• First look at what changes when technology scales
– Surprisingly, less changes than you might think
– Components get faster (both wires and gates)
– Mostly it allows one to build more complex devices
• Then look at how computing devices use silicon technology
– How architects and circuit designers use the transistors
– What are the looming problems with scaling
– What can be done to help

• Let’s start by looking at scaling CMOS technology


Predicting the Future
(without making a fool of yourself)
• Is very difficult
– The only guarantee is: the future will happen, and you will be wrong
• Two approaches
– Think about limitations
• SIA 1994 Roadmap
– Limited oxide thickness, small clock frequency growth, etc.
– Industry hit points above the curve

– Project from current trends


• SIA 1997 Roadmap
– Allow miracles to occur, continue trends
– Project clock rates higher than physically possible

• So use a range of technology scalings


– Better chance of covering the correct answer

Technology Scaling

In scaling there are really two issues


• Devices
– Can we build smaller devices
– What will their performance be
• Wires
– Try to avoid the wet noodle effect

• There is concern about our ability to scale both of these components



Device Scaling Limits

• Limitations to device scaling have been around since I started
– I started working in 3µm nMOS 22 years ago (actually bipolar)
• Worries were
– Short channel effect
– Punchthrough
• Drain controls the current rather than the gate
– Hot electrons
– Parasitic resistances
• Now worries are a little different
– Oxide tunnel currents
– Punchthrough
– Parameter control
– Parasitic resistances

Transistor Scaling

• People are building very short channel devices


– Shown are I-V curves for 15nm L pMOS
– And a short channel nMOS
• The structure is strange
– FinFET
– But you can make them work



Logic Gate Speed

• How does the speed of a gate depend on technology?


• Use a Fanout of 4 inverter metric
– Measure the delay of an inverter with Cout/Cin = 4

[Figure: chain of inverters sized 1, 4, 16, labeled FO4]

• Divide the delay of a circuit by the delay of an FO4 inverter
– Get the delay of the circuit measured in FO4 inverters
– Metric is pretty stable over process, temperature, and voltage



Delay Tracking

[Figure: inverter delays (log scale, ticks 1.4 to 12) versus process corner (TT, FF, SS, FS, SF) and versus power supply (2 to 3.5 V)]



Using FO4 Metric

• A very useful way to estimate and compare circuits and logic


– Can compare two designs and normalize out technology
– Can set lower bound on speed of a circuit block
• If you know fanout of gate, you can estimate delay
– Complete theory is called logical effort
• Total Fanout = Electrical fanout * Logical Effort of gate
• Use an adder as an example
– Fastest 64 bit adders run in around 7 FO4 delays
• Dynamic adders
• Hasn’t changed very much recently (HP at ISSCC in 96)
– Can find bound on the delay
• Need to have fanout of 64 for Cin
• L.E. has a 2 input XOR and some carry logic
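
The bound mentioned above can be sketched numerically. This is a rough illustration, not the author's calculation: it assumes the usual logical-effort rule of thumb that a chain sized for a stage effort of about 4 costs roughly one FO4 delay per stage and ignores parasitic delay. The function name and the logical-effort value in the example are hypothetical.

```python
import math


def min_fo4_delay(electrical_fanout, logical_effort=1.0, stage_effort=4.0):
    """Rough lower bound on path delay, in FO4 units.

    Assumption (not from the slides): each stage runs at a stage effort
    of about 4 and therefore costs about one FO4 delay; parasitics ignored.
    """
    # Total Fanout = Electrical fanout * Logical Effort of gate (from the slide)
    total_fanout = electrical_fanout * logical_effort
    return math.log(total_fanout, stage_effort)


if __name__ == "__main__":
    # Electrical fanout of 64 for Cin alone, no logical effort
    print(round(min_fo4_delay(64), 2))
    # Same fanout with an illustrative (made-up) logical effort of 4
    print(round(min_fo4_delay(64, logical_effort=4.0), 2))
```

Under these assumptions, the fanout of 64 alone accounts for about 3 FO4 delays, and any logical effort in the XOR and carry logic adds to that, which is in line with the quoted figure of around 7 FO4 for fast adders.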



Memory Performance

• Can use FO4 delays to look at memory access time too.
• Delay is log(Size) for an optimized design.
• Wire delay is important at larger memories, but not a dominant factor.
– Partition array, use fewer thicker wires.
• Notice access times
– 10-20 FO4

[Figure: delay (τ FO4, 0 to 40) versus memory size (64Kb to 16Mb); curves: total delay (no wire res.), decoder delay (no wire res.), output path delay (no wire res.), total delay (with wire res.)]



FO4 Inverter Delay Under Scaling

[Figure: FO4 delay (nS, 1 to 7) versus feature size (2 µm down to 0), with the line 0.36 * Ldrawn]

• Device performance will scale
– FO4 delay has been linear with tech
Approximately 0.36 nS/µm * Ldrawn at TT
(0.5 nS/µm under worst-case conditions)
• Easy to predict gate performance
– We can measure them
• Labs have built 0.04µm devices
– Key issue is voltage scaling
• Need to scale Vdd for power
• Hard since Vth does not scale

• Gate speed improvement might slow down


– Vth issues and gate oxide limitations
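
The linear rule quoted on this slide is easy to apply directly. The helper below is a minimal sketch using only the two coefficients given above (0.36 nS/µm at TT, 0.5 nS/µm worst case); the function name is made up for illustration, and the rule is only as good as the historical trend it was fitted to.

```python
def fo4_delay_ns(ldrawn_um, worst_case=False):
    """Estimate FO4 inverter delay in ns from drawn channel length in µm.

    Uses the slide's rule of thumb: 0.36 ns/µm at the typical (TT) corner,
    0.5 ns/µm under worst-case conditions.
    """
    ns_per_um = 0.5 if worst_case else 0.36
    return ns_per_um * ldrawn_um


if __name__ == "__main__":
    for ldrawn in (0.25, 0.18, 0.13, 0.10):
        typical_ps = 1000.0 * fo4_delay_ns(ldrawn)
        worst_ps = 1000.0 * fo4_delay_ns(ldrawn, worst_case=True)
        print(f"{ldrawn:.2f} um: {typical_ps:.0f} ps typical, {worst_ps:.0f} ps worst case")
```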



Circuit Power

• Is very much tied to voltage scaling


• If the power supply scales with technology, for a fixed-complexity circuit:
– Power scales down as α^3 if you run at the same frequency
– Power scales down as α^2 if you run it 1/α times faster
• Power scaling is a problem because
– Frequency has been scaling faster than 1/α
– Complexity of machine has been growing
• This will continue to be an issue in future chips

• Remember scaling the technology makes a chip lower power!
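
The α^3 and α^2 figures follow if one assumes the standard dynamic power relation P = C · V² · f, which the slide does not state explicitly. The sketch below makes that assumption, with capacitance and supply voltage both scaling by α for a fixed-complexity circuit; the function name is illustrative.

```python
def relative_power(alpha, speedup=1.0):
    """Power of a scaled, fixed-complexity circuit relative to the original.

    Assumes dynamic power P = C * V^2 * f (a standard model, not stated on
    the slide), with capacitance and supply voltage both scaling by alpha.
    `speedup` is the ratio of new to old clock frequency.
    """
    capacitance = alpha
    voltage = alpha
    return capacitance * voltage ** 2 * speedup


if __name__ == "__main__":
    alpha = 0.7
    print(relative_power(alpha))                       # same frequency: alpha^3
    print(relative_power(alpha, speedup=1.0 / alpha))  # 1/alpha faster: alpha^2
    print(relative_power(alpha, speedup=2.0))          # clock doubling eats most of the gain
```

The last line of the example hints at the problem named above: when frequency grows faster than 1/α and complexity grows too, the savings from scaling are quickly spent.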


Wire Scaling

• More uncertainty than transistor scaling


– Many options with complex trade-offs
• For each metal layer
– Need to set H, TT, TB, ε1, ε2, conductivity of the metal

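[Figure: wire cross-section showing metal width W and height H, spacings SL and SR to the left and right neighbors, dielectric thickness TT above (ILDTop) and TB below (ILDBot), permittivity ε1 above and below, and ε2 between wires (ILDmiddle)]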



Wire Capacitance

• Capacitance per micron is roughly constant


– Simple approx. = fringe (0.07fF/µm today) + 4 parallel plates
– Depends only on the ratio of the parameters
– Coupling becomes a large issue with increased εH/S ratio
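
The "fringe + 4 parallel plates" approximation can be written out as a small calculation. This is a sketch only: the parameter names follow the figure labels (W, H, SL, SR, TT, TB, ε1, ε2), the 0.07 fF/µm fringe term is the value quoted above, and the dimensions in the example are made up for illustration.

```python
EPS0_FF_PER_UM = 8.854e-3  # vacuum permittivity, 8.854e-12 F/m, expressed in fF/µm


def wire_cap_ff_per_um(w, h, s_left, s_right, t_top, t_bot,
                       eps1, eps2, fringe_ff_per_um=0.07):
    """Wire capacitance per µm of length: fringe plus four parallel plates.

    Top and bottom plates use relative permittivity eps1, the two sidewall
    plates use eps2. All dimensions must share one unit (only ratios matter).
    """
    plates = (eps1 * w / t_top       # plate to the layer above
              + eps1 * w / t_bot     # plate to the layer below
              + eps2 * h / s_left    # coupling to the left neighbor
              + eps2 * h / s_right)  # coupling to the right neighbor
    return fringe_ff_per_um + EPS0_FF_PER_UM * plates


if __name__ == "__main__":
    # Hypothetical dimensions in µm and an oxide-like permittivity
    dims = dict(w=0.4, h=0.6, s_left=0.4, s_right=0.4, t_top=0.6, t_bot=0.6)
    print(wire_cap_ff_per_um(eps1=3.9, eps2=3.9, **dims))
    # Shrinking every dimension by the same factor leaves the result unchanged
    shrunk = {name: 0.7 * value for name, value in dims.items()}
    print(wire_cap_ff_per_um(eps1=3.9, eps2=3.9, **shrunk))
```

Because every plate term is a ratio of two lengths, a uniform shrink does not change the capacitance per micron, which is the sense in which it is "roughly constant".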

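[Figure: wire cross-section annotated with the four parallel-plate terms: ε1W/TT to ILDTop, ε1W/TB to ILDBot, and ε2H/SL and ε2H/SR to the neighboring wires through ILDmiddle]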



Wire Resistance

• Resistance is simpler
– R/µm = ρ/(W·H)
– Scales up as the technology shrinks
– Main reason that wire height has not scaled much

[Figure: three wire cross-sections: W1 × H1 has resistance R; αW1 × H1 has 1.4R; αW1 × αH1 has 2R]

• Tradeoffs between height, width and wire pitch
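
The 1.4R and 2R figures follow directly from R/µm = ρ/(W·H) with a shrink factor of about 0.7. A minimal sketch, with the resistivity normalized to 1 since only ratios are of interest here (the function name is illustrative):

```python
def wire_res_per_um(rho, width, height):
    """Resistance per unit length of a rectangular wire: rho / (W * H)."""
    return rho / (width * height)


if __name__ == "__main__":
    alpha = 0.7               # linear shrink per generation
    rho, w1, h1 = 1.0, 1.0, 1.0  # normalized so the baseline resistance is 1

    base = wire_res_per_um(rho, w1, h1)
    width_only = wire_res_per_um(rho, alpha * w1, h1)
    width_and_height = wire_res_per_um(rho, alpha * w1, alpha * h1)

    print(width_only / base)        # about 1.4: only the width shrinks
    print(width_and_height / base)  # about 2: width and height both shrink
```

Holding the height fixed while the width shrinks is what keeps the resistance penalty at 1.4x rather than 2x, at the cost of taller, more strongly coupled wires.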



Wire Layers

Not all wiring layers have the same characteristics


• Today there are three types of levels
– M1
• Finest pitch, highest resistance, local interconnection in a cell
– M2-4?
• Normal routing level, most of the wires
– M5+
• Thick coarse metal, used for global wires
• When scaling forces thinner metal, create new top layer



Noise Issues

• Two main noise sources for wires


– Capacitance coupling
– Inductive coupling
• Capacitance coupling is mostly a nearest neighbor issue
– High aspect ratio wires make this worse
– Real push for low-κ dielectric between wires
• Inductive coupling
– Is much more complex to analyze
• Depends on where the return currents flow
– Reduce these problems through design constraints
• Gnd returns in buses, power and gnd planes (e.g. 21264)



Scaling Global Wires

• R gets quite a bit worse with scaling; C basically constant


[Figure: semi-global wire resistance (Kohms) and capacitance (pF) for a 1mm long wire versus technology Ldrawn (0.25 to 0.035 um), under aggressive and conservative scaling]

Scaling Module Wires

• R is basically constant, and C falls linearly with scaling


[Figure: semi-global wire resistance (Kohms) and capacitance (pF) for a wire of scaled length versus technology Ldrawn (0.25 to 0.035 um), under aggressive and conservative scaling]

Module Wires

These wires scale fairly well:


[Figure: wire delay / gate delay (0 to 0.5) for a semi-global wire of scaled length versus technology Ldrawn (0.25 to 0.035 um), under aggressive and conservative scaling]

Scaled wire delays stay pretty constant relative to gates


• Not a very big change

Is There a Module-Level Wire Problem?

This first cut seems to imply that scaled wires aren’t a problem
• Delay of these wires is scaling (mostly) with gate speed
• Long wires get worse, but pretty slowly
• So the job a designer (or CAD tool) sees stays the same, right?

So what are we missing?


• Why are people working so hard on developing new tools?

Die complexity: what if the number of modules doubles?



Implications of Complexity Growth

[Figure: die floorplans labeled "9 modules, 22 exceptions", "19 modules, 49 exceptions", and "19 modules, 24 exceptions"]

What can we do?


• Larger design teams?
• Longer design times?
• Better tools that have fewer exceptions per module

Keeping Design Time Reasonable

Suppose we want the total CAD tool exceptions to stay constant


• At a module level, we need 1/2 as many exceptions
• The required threshold for exceptions will grow
• The delay of those lines increases dramatically
[Figure: wire length in gate pitches (0 to 300) versus technology Ldrawn (0.25 to 0.035 um) for 50K, 75K, and 100K gate modules, compared with the current trend; annotations: "Is this important?" and "THIS is important!"]

• So CAD tools need to handle increasingly long wires


Global Wire Scaling

Now we examine global wire delay relative to gate delay


[Figure: wire delay / gate delay (0 to 7) for a 1mm long semi-global wire versus technology Ldrawn (0.25 to 0.035 um), under aggressive and conservative scaling]

Fixed-length wires, relative to gates, worsen by 2x per generation


• This is a big problem
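
One way to see where a factor of about 2x per generation can come from is a back-of-the-envelope sketch. The assumptions here are a reading of the earlier slides, not the author's stated model: a fixed-length wire keeps roughly constant capacitance, its resistance grows as 1/α when only the width shrinks (or 1/α^2 when height shrinks too), and gate delay shrinks by α.

```python
def wire_to_gate_growth(alpha, scale_height=False):
    """Per-generation growth of (wire RC delay / gate delay), fixed-length wire.

    Assumptions (a sketch, not the slide's exact model):
      - capacitance per unit length stays constant
      - resistance grows as 1/alpha (width shrinks) or 1/alpha^2 (height too)
      - gate delay shrinks by alpha
    """
    resistance_growth = 1.0 / alpha ** 2 if scale_height else 1.0 / alpha
    capacitance_growth = 1.0
    gate_delay_growth = alpha
    return resistance_growth * capacitance_growth / gate_delay_growth


if __name__ == "__main__":
    print(wire_to_gate_growth(0.7))                     # height held: about 2x
    print(wire_to_gate_growth(0.7, scale_height=True))  # height scaled: closer to 3x
```

With α around 0.7 and wire height held, the ratio grows by roughly 2x per generation, matching the trend stated above; scaling the height as well would make it worse.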



Designer Responses

• Wire engineering -- use wider wires or thicker wires


• Making wires wider will improve performance
• Resistance = k/W; Capacitance = C0 + b*W
• RC = kC0/W + kb; b << C0 so increasing W helps quite a bit
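
The diminishing return from widening can be checked with the two formulas above. The coefficient values below are invented purely for illustration (chosen so that b·W is small compared with C0 at the starting width); only the shape of the curve matters.

```python
def rc_product(width, k, c0, b):
    """RC of a wire with R = k / W and C = C0 + b * W, i.e. k*C0/W + k*b."""
    resistance = k / width
    capacitance = c0 + b * width
    return resistance * capacitance


if __name__ == "__main__":
    k, c0, b = 1.0, 10.0, 0.1  # hypothetical coefficients in arbitrary units
    for width in (4, 8, 16):
        print(width, rc_product(width, k, c0, b))
    # As W grows the k*C0/W term vanishes and RC approaches the floor k*b
    print("floor:", k * b)
```

Each doubling of the width roughly halves the RC product at first, then the gain flattens as the k·b term takes over.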
[Figure: delay (nS, 0.2 to 0.9) for a 1cm wire versus wire width in lambdas (4 to 16), for M3 and M5]

Designer Responses

• Circuit solution -- use repeaters


– Break the wire into segments
– Delay becomes linear with length
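– Delay per unit length = k (FO4 · Rw · Cw)^(1/2), with Rw and Cw per unit length (signal velocity is the inverse)

[Figure: repeated wire modeled as RC segments (series Rw/2, shunt Cw/4) between repeaters, ending in Cload]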



When to Add Repeaters

• Repeaters have overhead and can introduce logic complexity


– Add when they reduce the delay; wire RC > 8 FO4
– Add sooner to reduce noise coupling issues
• Plot for 0.25µm minimum-width wires:

[Figure: delay (FO4, 0 to 14) versus distance (0 to 20 mm) for M3 and M5 wires, with and without repeaters]
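
The repeater rule of thumb can be sketched as follows. Several things here are assumptions rather than values from the slides: the unrepeated delay uses a distributed-RC coefficient of 0.5, the constant k in the repeated-wire delay is left as a parameter defaulting to 1, and the wire parameters in the example (0.1 kΩ/mm, 0.2 pF/mm, FO4 of 0.09 ns) are illustrative numbers only.

```python
import math


def unrepeated_delay_ns(length_mm, r_kohm_per_mm, c_pf_per_mm, coeff=0.5):
    """Delay of a bare distributed RC wire; grows with length squared.

    kOhm * pF = ns, so the result is in ns. `coeff` is an assumed
    distributed-RC factor, not a value from the slides.
    """
    return coeff * r_kohm_per_mm * c_pf_per_mm * length_mm ** 2


def repeated_delay_ns(length_mm, r_kohm_per_mm, c_pf_per_mm, fo4_ns, k=1.0):
    """Delay of a repeated wire; linear in length.

    Delay per mm = k * sqrt(FO4 * Rw * Cw); k is an unspecified constant.
    """
    return k * math.sqrt(fo4_ns * r_kohm_per_mm * c_pf_per_mm) * length_mm


def should_add_repeaters(length_mm, r_kohm_per_mm, c_pf_per_mm, fo4_ns,
                         threshold_fo4=8.0):
    """Apply the slide's rule: add repeaters when wire RC exceeds 8 FO4."""
    wire_rc_ns = r_kohm_per_mm * c_pf_per_mm * length_mm ** 2
    return wire_rc_ns > threshold_fo4 * fo4_ns


if __name__ == "__main__":
    r, c, fo4 = 0.1, 0.2, 0.09  # illustrative values only
    for length in (2, 5, 10, 20):
        print(length,
              round(unrepeated_delay_ns(length, r, c), 3),
              round(repeated_delay_ns(length, r, c, fo4), 3),
              should_add_repeaters(length, r, c, fo4))
```

The quadratic and linear curves cross at some length; beyond it, repeaters win on delay, and the text above notes they may be added even earlier to limit noise coupling.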



Signal Velocity for Repeated wires

• Under SIA scaling, pretty constant over many generations


• Under conservative scaling, slow change at sub-0.1µm techs
– Makes wire delay increase slowly
[Figure: delay (ps/mm, 20 to 180) versus feature size (0.25 to 0.05 µm) for M1, M3, and M5 repeated wires, under conservative and SIA scaling]

A Different View

• Wires are not as bad as they are painted in the media


• If you take a chip and scale it
– Keeping the same exact design
– The ratio of the wire delay to gate delay will change slowly
• Currently they both scale by roughly the scaling factor
• Wires get a little slower each generation
– Not a big issue
– Key is that the logical span of the wire remains constant
• The problem is if you make the chip more complex
– Wires in general need to connect up more gates
– Logical span increases
– Get slower
• Circuit Tricks (repeaters) help some, but not enough

The World is Growing

• The problem associated with wires is really due to complexity


• Diagram shows the logical span you can reach in a cycle
– It also shows the logical span of a chip

Old view: a chip looks small to a wire

[Figure: logical chip size compared with the distance I can go in 1 cycle]

New view: a chip looks really big to a wire


How is this going to affect chip design?



Computer Performance

[Figure: ISPEC performance vs. year (log scale, 0.01 to 100), Jan-85 to Jan-97, for 80386, 80486, Pentium, and Pentium II; a second panel plots SpecInt95 (log scale, 1 to 100) for 1993 to 1999]

• How is scaling going to affect computer design?
– Use the Hennessy-Patterson formula
– Delay = Number of Instructions * Cycles/Inst * Clock Period
• Performance = Clock Frequency/CPI
– Look at how these factors have been improving, and why
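
The formula above is simple enough to state as code. The sketch uses only the relations given on the slide; the instruction count, CPI, and clock values in the example are made up to show the arithmetic.

```python
def execution_time_s(instructions, cycles_per_instruction, clock_hz):
    """Delay = Number of Instructions * Cycles/Inst * Clock Period."""
    clock_period_s = 1.0 / clock_hz
    return instructions * cycles_per_instruction * clock_period_s


def performance(clock_hz, cycles_per_instruction):
    """Performance = Clock Frequency / CPI (instructions per second)."""
    return clock_hz / cycles_per_instruction


if __name__ == "__main__":
    # Hypothetical workload: 1e9 instructions at CPI 2.0 on a 500 MHz clock
    print(execution_time_s(1e9, 2.0, 500e6))
    # Doubling the clock or halving the CPI helps equally
    print(performance(1000e6, 2.0), performance(500e6, 1.0))
```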

Computer Architect’s Job

• Convert transistors to performance


• Use transistors to
– Exploit parallelism
– Or create it (speculate)
• Processor generations
– Simple machine
• Reuse hardware
– Pipelined
• Separate hardware for each stage
– Super-scalar
• Multiple port mems, function units
– Out-of-order
• Mega-ports, complex scheduling
– Speculation
• Each design has more logic to accomplish the same task (but faster)

Architecture Scaling

• Plot of IPC
– Compiler + IPC
– 1.5x / generation
• Used all known tricks
– OOO is an old idea
– Uses lots of wires
• What next?
– Wider machines
– Threads
– Speculation
• Guess answers to create parallelism
– Have high wire costs

[Figure: SpecInt95 / MHz (0.00 to 0.05) versus date, Dec-83 to Dec-98, for 80386, 80486, Pentium, and Pentium II]

Architecture Scaling Issues

Wire scaling is an issue for architecture


• Need to support an old program model
– Modest number of global resources
• Registers, memory port
• To execute in parallel
– Need to find the parallelism in instruction stream
• Many instruction decoders, needing to communicate
– Need to predict instruction stream well
• Large memory for prediction tables
– Need multiple functional units
– Need large ported central register files
This will not scale:
• Machines are already starting to use internal clustering

Clock Frequency

[Figure: clock frequency (MHz, log scale, 10 to 1000) versus date, Dec-83 to Dec-98, for 80386, 80486, Pentium, and Pentium II]

• Most of performance comes from clock scaling


– Clock frequency doubles each generation
• Two factors contribute: technology (1.4x/gen), circuit design
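
The split between technology and circuit design can be made concrete by expressing a cycle in FO4 delays. This sketch assumes the rule of thumb that FO4 delay is about 0.36 ns per µm of drawn channel length at the typical corner; the clock rates and feature sizes in the example are hypothetical, not data for any particular processor.

```python
def fo4_per_cycle(clock_mhz, ldrawn_um, fo4_ns_per_um=0.36):
    """Number of FO4 inverter delays that fit in one clock cycle.

    Assumes FO4 delay = 0.36 ns/um * Ldrawn (typical corner rule of thumb).
    """
    period_ns = 1000.0 / clock_mhz
    fo4_ns = fo4_ns_per_um * ldrawn_um
    return period_ns / fo4_ns


if __name__ == "__main__":
    # Hypothetical parts one generation apart: the clock doubles
    # while the feature size shrinks by about 1.4x.
    older = fo4_per_cycle(clock_mhz=200, ldrawn_um=0.35)
    newer = fo4_per_cycle(clock_mhz=400, ldrawn_um=0.25)
    print(round(older, 1), round(newer, 1), round(older / newer, 2))
```

If the clock doubles while technology alone gives 1.4x, the remaining factor of about 1.4x has to come from fitting fewer FO4 delays into each cycle.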

Gates Per Clock

• Clock speed has been scaling faster than base technology
• Number of FO4 delays in a cycle has been falling
• Number of gates decreases 1.4x each generation
• Caused by:
– Faster circuit families (dynamic logic)
– Better optimization
– Better micro-architecture
– Better adder/mem arch
• All this generally requires more transistors

[Figure: FO4 inverter delays / cycle (log scale, 10 to 100) versus date, Dec-83 to Dec-98, for 80386, 80486, Pentium, and Pentium II]

Gates Per Clock Limits

• Current state-of-the-art (SOA) machines are at 16 FO4 gates per cycle


– Historical low values (Cray) were at this level
• Overhead for short tick machines grows rapidly
– Power
• Increases clock power per logic function
– Latency
• Flops are already 10-20% of cycle today
– Logic reach grows smaller
• What fits in a cycle (how many bits/gates) decreases
• Difficult to generate a clock at less than 8 FO4 gates

• Continued scaling of gates/clock will be hard


– Guessing slope will change soon

Performance Scaling

• Remember processor performance plots used to have two lines

[Figure: performance over time, with one line for mainframes and one for microprocessors (uP)]

• Microprocessors and mainframes


– Mainframes had maxed out and improved at technology rate
– What will happen with microprocessors?



Will Processor Performance Slow Down?

Yes and No
• Uniprocessor performance growth will slow down
– Latest jump is getting to the 16ish FO4 cycles
• People will change the benchmarks to fix this problem
– More data-parallel applications
• Multi-media / streaming applications
– More threaded applications

Explicitly parallel machines scale quite nicely


VLSI Scaling Summary

• Scaling allows people to build more complex machines


– That run faster too
• It does not to first order change the difficulty of module design
– Module wires will get worse, but only slowly
– You don’t need to rethink the wires in your adder, memory
• Or even your super-scalar processor core
– It does let you design more modules
• Continued scaling of uniprocessor performance is getting hard
– Machines using global resources run into wire limitations
– Getting faster clock ticks (in FO4) is very hard
• Power and logical reach issues in going much under 16 FO4s
• Machines will have to become more explicitly parallel