
ECE260B CSE241A

Winter 2005
Design Styles
Multi-Vdd/ Vth Designs
Website: http://vlsicad.ucsd.edu/courses/ece260bw05

ECE 260B CSE 241A Design Styles 1

http://vlsicad.ucsd.edu

The Design Problem

Source: sematech97
A growing gap between design complexity and design productivity

Design Methodology

Design process traverses iteratively between three abstractions: behavior, structure, and geometry
More and more automation for each of these steps

Behavioral Description of Accumulator

entity accumulator is
  port (
    DI  : in integer;
    DO  : inout integer := 0;
    CLK : in bit
  );
end accumulator;
architecture behavior of accumulator is
begin
  process(CLK)
    variable X : integer := 0; -- intermediate variable
  begin
    if CLK = '1' then
      X := DO + DI; -- variables are assigned with :=, not <=
      DO <= X;
    end if;
  end process;
end behavior;

Design described as set of input-output relations, regardless of chosen implementation
Data described at higher abstraction level (integer)

Structural Description of Accumulator

entity accumulator is
  port ( -- definition of input and output terminals
    DI  : in bit_vector(15 downto 0); -- a 16-bit-wide vector
    DO  : inout bit_vector(15 downto 0);
    CLK : in bit
  );
end accumulator;
architecture structure of accumulator is
  component reg -- definition of register ports
    port (
      DI  : in bit_vector(15 downto 0);
      DO  : out bit_vector(15 downto 0);
      CLK : in bit
    );
  end component;
  component add -- definition of adder ports
    port (
      IN0  : in bit_vector(15 downto 0);
      IN1  : in bit_vector(15 downto 0);
      OUT0 : out bit_vector(15 downto 0)
    );
  end component;
  -- definition of accumulator structure
  signal X : bit_vector(15 downto 0);
begin
  add1 : add
    port map (DI, DO, X); -- defines port connectivity
  reg1 : reg
    port map (X, DO, CLK);
end structure;

Design defined as composition of register and full-adder cells (netlist)
Data represented as {0,1,Z}
Time discretized and progresses with unit steps
Description language: VHDL
Other options: schematics, Verilog

Implementation Methodologies

Digital Circuit Implementation Approaches
  Custom
  Semi-custom
    Cell-Based
      Standard Cells
      Compiled Cells
      Macro Cells
    Array-Based
      Pre-diffused (Gate Arrays)
      Pre-wired (FPGA)

Full Custom

Hand drawn geometry


All layers customized
Digital and analog
Simulation at transistor level
High density
High performance
Long design time
Magic Layout Editor
(UC Berkeley)

Symbolic Layout

Dimensionless layout entities
Only topology is important
Final layout generated by compaction program

[Figure: stick diagram of inverter; labels: VDD, GND, In, Out]

Standard Cells

Organized in rows
Cells made as full custom by vendor (not user)
All layers customized
Digital with possible special analog cells
Simulation at gate level (digital)
Medium-high density
Medium-high performance
Reasonable design time

[Figure labels: Logic Cell, Feedthrough Cell, Rows of Cells, Routing Channel, Functional Module (RAM, multiplier, ...)]

Routing channel requirements are reduced by presence of more interconnect layers

Standard Cell Example

[Brodersen92]

Standard Cell - Example

3-input NAND cell (from Mississippi State Library), characterized for fanout of 4 and for three different technologies

Automatic Cell Generation

Random-logic layout
generated by CLEO
cell compiler (Digital)

Module Generators - Compiled Datapath

[Figure: compiled datapath built from bit-slices; labels: buffer, adder, reg1, reg0, mux, bus0, bus1, bus2, routing area, feed-through, bit-slice]

Advantages: One-dimensional placement/routing problem

Macrocell-Based Design

Predefined macro blocks (uP, RAM, etc.)
Macro blocks made as full custom by vendor (IP blocks)
All layers customized
Digital and some analog
Simulation at behavior or gate level
High density
High performance
Short design time
Use standard on-chip busses
System on a chip (SOC)

[Figure labels: Macrocell, Interconnect Bus, Routing Channel]

Macrocell Design Methodology

Floorplan: Defines overall topology of design, relative placement of modules, and global routes of busses, supplies, and clocks

[Figure: video-encoder chip floorplan; labels: SRAM, Routing Channel, SRAM, Data paths, Standard cells] [Brodersen92]

Gate Array

Predefined transistors connected via metal


Two types: channel based, sea of gates
Only metal layers customized
Fixed array sizes
Digital cells in library
Simulation at gate level (digital)
Medium density
Medium performance
Reasonable design time
[Figure labels: rows of uncommitted cells, routing channel]

Gate Array Primitive Cells

[Figure: uncommitted cell and committed cell (4-input NOR); labels: polysilicon, metal, possible contact, VDD, GND, In1, In2, In3, In4, Out]

Sea-of-gate Primitive Cells

[Figure: two primitive cells, one using oxide-isolation and one using gate-isolation; labels: oxide-isolation, PMOS, NMOS]

Sea-of-gates

[Figure: LSI Logic LEA300K (0.6 µm CMOS); labels: Random Logic, Memory Subsystem]

Prewired Arrays
Programmable logic blocks
Programmable connections between logic blocks
No layers customized (standard devices)
Digital only
Low-medium performance
Low-medium density
Programmable: SRAM, EPROM, Flash,
Anti-fuse, etc.

Easy and quick design changes


Cheap design tools
Low development cost
High device cost
NOT a real ASIC

Courtesy Altera Corp.

Programmable Logic Devices

[Figure panels: PLA, PROM, PAL]

EPLD Block Diagram

Primary inputs

Macrocell

Courtesy Altera Corp.



Field-Programmable Gate Arrays - Fuse-based

Standard-cell like floorplan

[Figure labels: I/O buffers (on all four sides), program/test/diagnostics, vertical routes, rows of logic modules, routing channels]

Interconnect

[Figure labels: programmed interconnection, input/output pin, cell, antifuse, horizontal tracks, vertical tracks]

Programming interconnect using anti-fuses

Field-Programmable Gate Arrays - RAM-based

[Figure labels: CLB (four blocks), switching matrix, horizontal routing channel, vertical routing channel, interconnect point]

RAM-based FPGA - Basic Cell (CLB)

[Figure: CLB with a combinational logic section (two blocks, each implementing any function of up to 4 variables) and a storage elements section (outputs Q1 and Q2; Clock and CE inputs)]

Courtesy of Xilinx

RAM-based FPGA

Xilinx XC4025

High Performance Devices


Mixture of full custom, standard cells and macros
Full custom for special blocks: Adder (data path), etc.
Macros for standard blocks: RAM, ROM, etc.
Standard cells for non-critical digital blocks


Global Signaling and Layout

Global signaling and layout optimization

Multi-Vdd
Static power analysis
Multi-Vth + Vdd + sizing


D. Sylvester, DAC-2001


Global Signaling
Current global signaling paradigm: insert large static CMOS repeaters to reduce wire RC delay
Impending problems:
  Too many repeaters
    - 180nm processors: 22K repeaters (Itanium), 70K (Power4)
    - Projected: 1-1.5M repeaters at 45-65nm technologies
  Too much power
    - Many large repeaters = significant static and dynamic power
  Too much noise
    - Repeater clustering complicates power distribution
    - Inductive coupling across wide bus structures
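
Why repeaters reduce wire RC delay can be sketched numerically. The example below is a simplified model, not taken from the slides: it assumes the common 0.38 R C estimate for the 50% delay of a distributed RC wire and charges a fixed delay per repeater. The wire parameters and the repeater delay are hypothetical values chosen only for illustration.

```python
def wire_delay(r_per_mm, c_per_mm, length_mm):
    """50% delay estimate of a distributed RC wire: 0.38 * R_total * C_total."""
    return 0.38 * (r_per_mm * length_mm) * (c_per_mm * length_mm)


def repeated_delay(r_per_mm, c_per_mm, length_mm, segments, t_repeater):
    """Delay of the same wire cut into equal segments, each driven by one repeater.

    Simplification: every repeater adds a fixed delay t_repeater; driver
    resistance and repeater input capacitance are ignored.
    """
    segment = wire_delay(r_per_mm, c_per_mm, length_mm / segments)
    return segments * (segment + t_repeater)


# Hypothetical numbers: 100 ohm/mm, 0.2 pF/mm, 10 mm wire, 20 ps per repeater.
R, C, LENGTH, T_REP = 100.0, 0.2e-12, 10.0, 20e-12

unrepeated = wire_delay(R, C, LENGTH)
best_k = min(range(1, 21), key=lambda k: repeated_delay(R, C, LENGTH, k, T_REP))
best = repeated_delay(R, C, LENGTH, best_k, T_REP)

print(f"unrepeated: {unrepeated * 1e12:.0f} ps")
print(f"best segment count: {best_k}, delay: {best * 1e12:.0f} ps")
```

Unrepeated delay grows with the square of wire length, while the repeated delay grows roughly linearly. That is why repeater counts climb so quickly as wires get relatively slower.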

D. Sylvester, DAC-2001

Cell Layout Optimization


Advanced layout techniques must allow
  Continuous individual device sizing
  Variable p/n ratios
  Tapered FET stacking sizes
  Arbitrary Vth assignments within gates
First cut: Cadabra, 15-22% power reduction using the first two approaches under a fixed footprint constraint
  Optimize specific instances of standard gates
  Ref: Hurat, Cadabra

[Figure labels: GDSII Import, Compact fixed width]

D. Sylvester, DAC-2001

Multi-Vdd
Global signaling and layout optimization

Multi-Vdd
Static power analysis
Multi-Vth + Vdd + sizing

D. Sylvester, DAC-2001

Multi-Vdd Status
Idea: Incorporate two Vdds to reduce dynamic power
Limited to a few recent Japanese multimedia processors
  Example: 0.3 µm, 75 MHz, 3.3 V media processor (Toshiba)
    - Total power savings of 47% in logic, 69% in clock
  Dynamic voltage scaling of mobile processors
    - Transmeta Crusoe, Intel Speedstep, etc.
    - Not considered in this talk
Very powerful technique currently applied only in low-performance designs
  Mentality: today's high-performance parts aren't limited by power

D. Sylvester, DAC-2001

Lower Power Via Rich Replacement


Media processors and other low speed designs have many non-critical paths
  60-70% of paths have delay less than half the clock period
After replacement, most paths become near critical
What about high-speed microprocessors?

[Figure: % of total paths vs. path delay (normalized to clock period)]

D. Sylvester, DAC-2001

Similar Story For High-Performance


IBM 480 MHz PowerPC shows over 50% of paths have delay less than half the clock period
  Implies that high-performance designs can benefit from multi-Vdd

Ref: Akrout, JSSC98

D. Sylvester, DAC-2001

Resizing Is Not The Right Answer

Post-synthesis optimizations resize gates to recover power on non-critical paths
  Looks similar to pre- and post-replacement figures in media processor

[Figure panels: before post-synthesis resizing; after post-synthesis resizing]

This is the wrong approach for nanometer design!

Ref: Sirichotiyakul, DAC99

D. Sylvester, DAC-2001

Multi-Vdd Instead of Sizing


Power ~ C Vdd^2 f, where f is fixed
Key: Reducing gate width impacts power sub-linearly
  Interconnect capacitance is not affected
Reducing supply voltage cuts power quadratically
  All capacitive loads have lower voltage swing
How can we minimize delay penalty at low Vdd?
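
The sub-linear versus quadratic contrast can be checked with a few lines of arithmetic. This is a minimal sketch of Power ~ C Vdd^2 f for a single net; the capacitance split, supply voltage, and frequency are hypothetical values chosen so that gate and wire capacitance are equal.

```python
def dynamic_power(c_gate, c_wire, vdd, freq):
    """Switching power estimate P = (C_gate + C_wire) * Vdd^2 * f."""
    return (c_gate + c_wire) * vdd ** 2 * freq


# Hypothetical baseline: 10 fF gate load, 10 fF wire load, 1.2 V, 1 GHz.
base = dynamic_power(10e-15, 10e-15, 1.2, 1e9)

# Option 1: make the gates 30% narrower. Wire capacitance is unchanged.
downsized = dynamic_power(7e-15, 10e-15, 1.2, 1e9)

# Option 2: lower the supply by 30%. Every capacitive load swings less.
low_vdd = dynamic_power(10e-15, 10e-15, 0.84, 1e9)

sizing_saving = 1 - downsized / base
vdd_saving = 1 - low_vdd / base

print(f"30% narrower gates: {sizing_saving:.0%} power saving")
print(f"30% lower Vdd:      {vdd_saving:.0%} power saving")
```

With these assumed numbers, the same 30% change saves about 15% through sizing and about 51% through the supply voltage.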

D. Sylvester, DAC-2001

Challenges For Multi-Vdd


Area overhead
  Toshiba reported 7% rise in area due to placement restrictions, level converters, additional power grid routing
EDA tool support for the above issues (placement, dual power routing)
Noise analysis
  Additional shielding required between Vdd,low and Vdd,high signals?
  Including clock network

D. Sylvester, DAC-2001

Static Power
Global signaling and layout optimization
Multi-Vdd

Static power
Multi-Vth + Vdd + sizing

D. Sylvester, DAC-2001

Static Power
Why do we care about static power in non-portable devices?
  Standby power is wasted -- leaves fewer Watts for computation
  Worsens reliability by raising die temperatures
Leakage current is a function of Vth and subthreshold swing (Ss) (x10 at operating vs. room temp!)
  Ioff ~ 10 x 10^(-Vth/Ss) µA/µm
  Ss expected to remain at 80-85 mV/dec (room temp)
    - Device technology may cut this by ~20%
Vth reductions are mandated by scaling Vdd
  Vth has been around Vdd/5

D. Sylvester, DAC-2001
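
The exponential dependence of leakage on Vth is easy to see numerically. The sketch below evaluates Ioff ~ 10 x 10^(-Vth/Ss) µA/µm with Ss = 85 mV/dec; the specific Vth and Vdd values are hypothetical and only illustrate the trend.

```python
def ioff_ua_per_um(vth, ss, i0=10.0):
    """Off-current estimate Ioff = i0 * 10^(-Vth/Ss), in uA/um.

    vth is in volts, ss is the subthreshold swing in volts per decade.
    """
    return i0 * 10 ** (-vth / ss)


SS = 0.085  # 85 mV/dec, the upper end of the room-temperature range quoted above

# Lowering Vth by exactly one subthreshold swing raises leakage tenfold.
high = ioff_ua_per_um(0.300, SS)
low = ioff_ua_per_um(0.215, SS)
ratio = low / high

# If Vth tracks Vdd/5, halving Vdd from 1.2 V to 0.6 V halves Vth as well.
leak_at_1v2 = ioff_ua_per_um(1.2 / 5, SS)
leak_at_0v6 = ioff_ua_per_um(0.6 / 5, SS)

print(f"leakage ratio for an 85 mV Vth reduction: {ratio:.1f}x")
print(f"Ioff at Vdd=1.2 V: {leak_at_1v2 * 1e3:.1f} nA/um")
print(f"Ioff at Vdd=0.6 V: {leak_at_0v6 * 1e3:.1f} nA/um")
```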

Current Status

No sub-1V technologies demonstrate good on/off current performance (yet; expect improvements in production)
Oxide scaling is running out of steam; overall 3~x Ioff per node

Reference    ITRS node   Tox (Å) (electrical)   Vdd    Ion (µA/µm)   Ioff (nA/µm)
Intel,00     50-70       18                     0.85   514           100
Samsung,00   100         21                     1.2    860           10
NEC,00       70          25                     1.2    697           10
TI,99        100         27                     1.2    800           10
Intel,99     70          32                     1.2    650
NEC,00       100         13 (physical)          1.0    723           16
ITRS 2000    100         12-15 (physical)       1.2    750           13
ITRS 2000    70          8-12 (physical)        0.9    750           40
ITRS 2000    50          6-8 (physical)         0.6    750           80
ITRS 2001    45          11 (uses high-k)       0.6    1250          3000

[Annotation: Working numbers]

D. Sylvester, DAC-2001

Leakage Suppression Approaches


Dual-Vth (most common)
  Low-Vth on critical paths, high-Vth off
  Only cost is additional masks
MTCMOS
  Series inserted high-Vth device cuts leakage current when off (sleep mode)
  Delay and area penalties, control device sizing is critical
Other techniques
  Substrate biasing to control Vth
  Dual-Vth domino
    - Use low-Vth devices only in evaluate paths

[Figure: MTCMOS gate; labels: Vdd, Pull Up, Vout, Pull Down, Vcontrol, Parasitic Node, High Vth Device]

D. Sylvester, DAC-2001

Can Gate-length biasing help leakage reduction?


Reduce leakage?
  Variation of leakage and delay (each normalized to 1) for an NMOS device in an industrial 130nm technology
  [Figure: leakage and delay vs. gate-length (nm), for gate-lengths from 130 to 140 nm]
Reduce leakage variability?
  [Figure panels: leakage vs. gate-length, comparing leakage variability with and without biasing]

Gate-length Biasing
First proposed by Sirisantana et al.
  Comparative study of effect of doping, tox and gate-length
  Large bias used, significant slow down
Small bias
  Little reduction in leakage beyond 10% bias while delay degrades linearly
  Preserves pin compatibility
  Technique applicable as post-RET step
Salient features
  Does not interfere with the design cycle
  Zero cost (no additional masks)

Granularity
Technology-level
All devices in all cells have one biased gate-length

Cell-level
All devices in a cell have one biased gate-length

Device-level
All devices have independent biased gate-length
Simplification: In each cell, NMOS devices have one gate-length and PMOS
devices have another


Device-Level Leakage Reduction

Leakage saving with a delay penalty of up to 10% (simplified device-level biasing)

[Figure: bar chart of leakage saving (axis ticks 0 to 40) for cells INVX4, NANDX4, BUFX4, ANDX6; series: Low Vt, Nom Vt, High Vt]

Circuit level

Bias gate-length for non-critical cells
Library extended with each cell having a biased version
Benefits analyzed in conjunction with Multi-VT assignment and in isolation

[Figure labels: SVT-SGL, DVT-SGL, SVT-DGL, DVT-DGL]
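
The idea of swapping non-critical cells for their biased library versions can be sketched as a per-cell slack check. This is a deliberately naive illustration, not the flow used for the results in these slides: it treats each cell's slack as independent, whereas a real optimizer must account for cells sharing slack along a path. The `Cell` type, the 5% delay penalty, and the 30% leakage saving are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    delay_ps: float
    leakage_nw: float
    slack_ps: float  # timing slack available to this cell (assumed independent)


def bias_noncritical(cells, delay_penalty=0.05, leakage_saving=0.30):
    """Swap a cell for its biased version when its slack covers the extra delay.

    Returns the names of the biased cells and the total leakage afterwards.
    """
    biased, total_leakage = [], 0.0
    for cell in cells:
        extra_delay = cell.delay_ps * delay_penalty
        if extra_delay <= cell.slack_ps:
            biased.append(cell.name)
            total_leakage += cell.leakage_nw * (1 - leakage_saving)
        else:
            total_leakage += cell.leakage_nw
    return biased, total_leakage


cells = [
    Cell("u1", delay_ps=40.0, leakage_nw=10.0, slack_ps=0.0),  # critical
    Cell("u2", delay_ps=60.0, leakage_nw=20.0, slack_ps=5.0),  # enough slack
    Cell("u3", delay_ps=50.0, leakage_nw=10.0, slack_ps=1.0),  # too little slack
]
biased, total = bias_noncritical(cells)
print(biased, total)
```

Only cells whose slack absorbs the added delay are swapped, so the critical path is left untouched.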


Results: Leakage Reduction

[Figure: normalized leakage (0 to 1) for SVT-SGL, SVT-DGL, DVT-SGL, and DVT-DGL on testcases c5315, c6288, c7552, alu128]

With less than 2.5% delay penalty
Design Compiler used for VT assignment and gate-length biasing
Better results expected with Duet (academic sizer from Michigan)

Results: Leakage Variability


Leakage distribution for the testcase alu128
Traces shown
  Unbiased circuit
  Technology level biasing
  Uniform biasing

[Figure: percentage reduction in leakage spread (0% to 60%) for c5315, c6288, c7552, alu128]

Futures

Construction of effective biasing based leakage optimization heuristics
Gate-length selection at true device-level granularity
Evaluation of gate-length biasing at future technology nodes

Multi-Vth + Vdd + Sizing


Global signaling and layout optimization
Multi-Vdd
Static power analysis

Multi-Vth + Vdd + sizing

D. Sylvester, DAC-2001

Multi-Everything
Need an approach that selects between speed, static power, and dynamic power
  Should be scalable to nanometer design
  Rules out dual-Vth domino or other dynamic logic families (low supplies kill performance advantages)
Techniques mentioned so far
  Flexible, optimized cell layouts
  Multi-Vdd
  Dual-Vth
Put them all together

D. Sylvester, DAC-2001

Multi-Vdd Can Leverage Vths


Existing designs using multi-Vdd do not alter Vth in low-Vdd cells
  Highly sub-optimal, delay is fully penalized
  Limits cell replacement, which limits power savings
Much better solution: reduce Vth in low-Vdd cells to carefully balance delay, static power, and dynamic power
  Enforce technology scaling within a chip: whenever we reduce Vdd, we also reduce Vth to maintain speed

D. Sylvester, DAC-2001

Multi-Vdd + Vth Negates Delay Penalty


Delay ~ C Vdd / Ion
Scenarios
  Constant Vth (current paradigm)
  Scale Vth to maintain constant static power
  Scale Vth to reduce static power linearly with Vdd
Delay penalty is substantially offset
  Ion is very sensitive to Vth at Vdd < 1V
Pstatic reduces with Vdd due to linear term and smaller Ioff (Ion and DIBL )
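
How lowering Vth offsets the delay penalty can be illustrated with Delay ~ C Vdd / Ion. The slides do not give a model for Ion, so this sketch assumes the alpha-power law, Ion ~ (Vdd - Vth)^alpha with alpha = 1.3. That model, the exponent, and all voltages below are assumptions made for illustration.

```python
def relative_delay(vdd, vth, alpha=1.3, c=1.0):
    """Delay ~ C * Vdd / Ion, with Ion modeled as (Vdd - Vth)^alpha (assumed)."""
    ion = (vdd - vth) ** alpha
    return c * vdd / ion


# Hypothetical operating points.
nominal = relative_delay(vdd=1.2, vth=0.3)

# Scenario 1: drop to Vdd,low but keep Vth constant (current paradigm).
constant_vth = relative_delay(vdd=0.8, vth=0.3) / nominal

# Scenario 2: drop to Vdd,low and also lower Vth.
scaled_vth = relative_delay(vdd=0.8, vth=0.2) / nominal

print(f"delay at Vdd,low with constant Vth: {constant_vth:.2f}x nominal")
print(f"delay at Vdd,low with reduced Vth:  {scaled_vth:.2f}x nominal")
```

Under these assumptions the low-Vdd cell is still slower than nominal, but reducing Vth recovers most of the penalty. The price is higher leakage, which is the balance the scenarios above trade off.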

D. Sylvester, DAC-2001

Now Add Sizing


Multi-Vdd + multi-Vth + sizing/cell layout optimization attacks power from many angles (multi-dimensional)
Depending on criticality and switching activities, non-critical gates can be:
  Assigned Vdd,low
  Assigned Vdd,low + lower Vth
  Assigned Vth,high
  Downsized (at the individual transistor level if advantageous)
  Assigned Vdd,low and upsized
    - For gates that cannot tolerate Vdd,low delay, this can be power efficient
  And others

D. Sylvester, DAC-2001

Summary
Power density must saturate to maintain affordable packaging options
  50 W/cm^2 means 200-250W for future large MPUs
  Dynamic thermal management saves 25% on packaging power budget
Multi-Vdd will leverage multiple Vths to offset delay penalty at low Vdd
  More widespread re-assignment to Vdd,low
  Use Vdd first instead of re-sizing to take advantage of large path slacks
  Anticipated power savings of 50-80%
Static power also addressed through multi-Vth + Vdd + sizing
  Vth difficult to control in ultra-short channels
  Intra-cell Vth assignment + MTCMOS/variants + sleep modes

D. Sylvester, DAC-2001

Next Week: Project Meetings

D. Sylvester, DAC-2001