

Low Power ICD Talks


Vivek Shukla

Topics

• Introduction
• Power Dissipation Basics
• Existing Low Power Techniques and Issues for
• Advanced LP Techniques (under exploration)

October 24, 2007

Introduction

[Figure: processor power in watts (log scale, 0.1 to 1000) versus year (1970 to 2020), marking the 8080, 8086, 386, Pentium and Pentium 4 processors; the extrapolated trend is annotated "1000's of Watts?"]

Unconstrained power will reach 1,000s of watts

Power Density will Get Even Worse

[Figure: power density in W/cm2 (log scale, 1 to 10,000) versus year ('70 to '10) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486 and Pentium processors, with reference levels for a hot plate, a nuclear reactor, a rocket nozzle and the Sun's surface]

Motivation

Portability
Extending battery life
Battery technology scales slowly: 150 Wh/kg today vs. 75 Wh/kg in 1990
A 1 kg NiCd battery could power a P4 for 1 hour, or a Centrino for 4 hours
Low power dissipation as a product feature in itself
Enabling portable devices to be more powerful and feature-rich

Packaging
High power dissipation leads to expensive packaging and cooling systems
~ 1W: inexpensive plastic package limit
~ 10W: Ceramic package limit
~ 10W/cm2: limit for convection cooling
~ 50W/cm2: limit for forced-air cooling

Reliability
Long product lifetime


Sources of Power Dissipation in CMOS


$$P_{total} = C_L V_{DD}^2 f_{clk}\, a_{0\to1} + V_{DD} I_{short\text{-}circuit} + V_{DD} I_{leakage}$$
Power in a CMOS inverter is governed by the three-part equation above

Dynamic (switching) power
  Currently the largest part, but its percentage is getting smaller

Short-circuit (crowbar) current
  Both pull-up and pull-down devices are partially conducting for a small but finite amount of time
  Can be modeled as some fraction of the dynamic current

Leakage power
  Subthreshold conduction is growing due to aggressive scaling, temperature, etc.
  Reverse leakage of diodes (relatively small)
  Possible gate tunneling current in future technologies


Sources of Power Dissipation: Switching

One half of the energy drawn from the supply is dissipated in the pull-up network (PMOS) and the remaining half is stored in CL when Vout makes a 0→1 transition

During the 1→0 transition the charge stored in CL is dumped via the pull-down network (NMOS)

Power = (Energy/Transition) * (Transition Rate)
$$P = C_L V_{DD}^2 f_{0\to1} = C_L V_{DD}^2 f_{clk}\, a_{0\to1} = C_{switched} V_{DD}^2 f_{clk}$$
where $C_{switched} = C_L\, a_{0\to1}$ and $a_{0\to1}$ = probability of a 0→1 transition

Dynamic power can therefore be reduced by
  Scaling down the supply voltage VDD
  Reducing the switching probability through architectural means
  Scaling down the frequency as per throughput demands
  Optimizing/reducing the load capacitance (device scaling)
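
The switching-power expression on this slide can be evaluated directly. The sketch below is a minimal illustration only; the capacitance, frequency and activity values are hypothetical and are not taken from the slides.

```python
def dynamic_power(c_load, vdd, f_clk, activity):
    """Switching power in watts: C_L * VDD^2 * f_clk * a(0->1)."""
    return c_load * vdd ** 2 * f_clk * activity


# Hypothetical example values (not from the slides):
# 1 nF of total load capacitance, 100 MHz clock, 20% switching probability.
base = dynamic_power(c_load=1e-9, vdd=1.2, f_clk=100e6, activity=0.2)
scaled = dynamic_power(c_load=1e-9, vdd=1.0, f_clk=100e6, activity=0.2)

print(f"VDD = 1.2 V: {base * 1e3:.1f} mW")
print(f"VDD = 1.0 V: {scaled * 1e3:.1f} mW")
```

Because VDD enters quadratically, lowering the supply is the strongest of the four levers listed, while frequency, activity and capacitance each scale the result linearly.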


Sources of Power Dissipation: Short-Circuit

Due to the finite input transition time, both NMOS and PMOS conduct for a small but finite duration, providing a resistive path between VDD and GND

Typically less than 10% of the total dynamic power

The short-circuit current Isc depends on the ratio of input to output transition times (the higher the ratio, the longer both devices are ON and the higher the dissipation due to short-circuit current)

Can be minimized by balancing the input and output rise times

Can be virtually eliminated by making VDD less than (VTN+|VTP|)



Sources of Power Dissipation: Leakage


I1 : pn junction reverse bias current
I2 : Subthreshold conduction due to weak inversion
I3 : Drain-induced barrier lowering (DIBL)
I4 : Gate-induced drain leakage (GIDL)
I5 : Punchthrough
I6 : Narrow width effect
I7 : Gate oxide tunneling
I8 : Hot carrier injection (HCI)

Significant contributor to standby power

The most dominant of these is the subthreshold leakage current (I2), because VTH is lowered continually with scaling (note its exponential dependence on VTH and its sensitivity to temperature)

There are several techniques to contain this, viz. dual-VT and multi-VT libraries, MTCMOS technology, VTCMOS technology, back-biasing, etc.
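
The exponential dependence on VTH mentioned above can be illustrated with the usual first-order textbook model, in which subthreshold current falls by one decade for every "subthreshold swing" of threshold voltage. This is a sketch under stated assumptions: the ideality factor `n = 1.5` and the example voltages are hypothetical, and the model ignores the temperature dependence of VTH itself.

```python
import math

K_OVER_Q = 8.617333e-5  # Boltzmann constant / electron charge, in V/K


def subthreshold_swing(temp_k, n=1.5):
    """Subthreshold swing in V/decade: n * (kT/q) * ln(10)."""
    return n * K_OVER_Q * temp_k * math.log(10)


def relative_leakage(vth, temp_k, n=1.5):
    """Leakage at VGS = 0 relative to a device with VTH = 0 (first-order model)."""
    return 10 ** (-vth / subthreshold_swing(temp_k, n))


# Hypothetical comparison: lowering VTH from 0.4 V to 0.3 V at 300 K.
ratio = relative_leakage(0.3, 300) / relative_leakage(0.4, 300)
print(f"swing at 300 K: {subthreshold_swing(300) * 1e3:.0f} mV/decade")
print(f"leakage increase for 100 mV lower VTH: {ratio:.1f}x")
```

Under these assumptions a 100 mV reduction in VTH costs roughly an order of magnitude in leakage, which is why multi-VT libraries reserve low-VT cells for timing-critical paths.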


Low-Power Techniques (Basic and Advanced)

Power reduction technique | Leakage power | Dynamic power | Timing penalty | Area penalty | Implement. impact | Synth, Formal & Test impact | Verification impact
Area optimization | 1.1X | 10% | 0% | -10% | None | None | None
Multi-Vt optimization | 6X | 0% | 0% | 2 to -2% | Low | None | None
Clock gating | 0X | 20% | 0% | <2% | Low | Low | None
Multi-supply voltage (MSV) | 2X | 40-50% | 0% | <10% | Medium | Medium | Low
Power shut-off (PSO) | 10-50X | ~0% | 4-8% | 5-15% | Medium-high | High | High
Dynamic and Adaptive Voltage Frequency Scaling (DVFS and AVS) | 2-3X | 40-70% | 0% | <10% | High | High | High

Substrate Biasing

10X

10%

<10%

High

Medium-High

Medium


PSO in std cell based design


[Figure: fine grain power switches (example) and coarse grain power switches. Labels: SLEEP, Real VSS, Virtual Vss, A1, A2, Buffered Switch, Un-buffered Switch]

 | Fine grain | Coarse grain
Power gate size | Worst case switching (30% area) | Actual switching (5% area)
Gate control slew rate | Always-on buffer network | Always-on buffer by abutment
Simultaneous switching capacitance | No issue | Needs to be addressed
Power gate leakage | >30% | <5%


PSO in std cell based design (contd..)


[Figure: Switchable Power Domain (PD1) with its switch cells. Labels: Vdd, VddC, Vss, PSE, shutoff, iso_en, Iso., SRPG FF, FF, En_in, En_out, En_out_1, En_out_2, En_out_3, Left Offset (150um), Column Pitch (200um); states "B: PD3 Shut down" and "D: Active"]

Note: switch cell has 2 buffers built-in with different directions

Filler
  Forms contiguous ring
  Prevents additional leakage
Breaker
  Divides into separate gate control groups
  Used with feed-thru enable signal
Corner
  Acts like corner cell in pad ring
Buffer-only (no switch) / switch-only (no buffer)
  Allows flexible control of buffer tree

PSO in std cell based design (contd..)

Ring
  Ring(s) of switches enclose the power domain fully or partially
  Switches placed outside the power domain
  Switch cell treated as hard macro
  Often used with hard macros (not allowed to touch inside)
  More IR drop
  Better current distribution
Column
  Columns of switches inside the power domain
  Switches placed in the standard cell rows
  Switch is a standard cell
  Often used inside hierarchical (soft) blocks
  Less IR drop
  More prone to rush current issues
  Needs careful EM checks

PSO in std cell based design (contd..)

Key issues in PSO design apart from PSO insertion

Power up and rush current
  Optimum number of switches
  Smooth power up
Verification
  RTL simulations
  Low power insertion checks
  CPF verification
More power rails
  Stacked via requirements
  EM
Testability
  Coverage of the logic on the Restore and Save signals
DFM
Dynamic IR drop analysis is becoming a must
ESD
IR-drop-aware timing analysis

PSO in std cell based design (contd)


Low Power Insertion Checks

[Figure: power domain crossings between an OFF domain and an ON domain. Labels: PMM, PD, PDM1, PDM2, iA, iB, ISO_EN, ISO, LH, 0.8V, 1.2V; one crossing marked "Good", one marked "Missing"]

Structural/Rule Checking
  User defines rules for crossings, isolation type, and location
  Conformal LP reports missing or redundant isolation / level shifter cells
  Conformal LP reports wrong isolation cell type
  Conformal LP reports bad level shifter direction
  Conformal LP reports wrong isolation cell / level shifter domain location
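
The kind of structural rule check listed on this slide (missing or redundant isolation and level shifter cells on a domain crossing) can be sketched in a few lines. This is an illustration only and does not reflect how Conformal LP is implemented; the `Domain` and `Crossing` classes and the two rules are hypothetical simplifications (isolation is required whenever the driving domain is switchable, a level shifter whenever the two voltages differ).

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    voltage: float
    switchable: bool


@dataclass
class Crossing:
    net: str
    src: Domain
    dst: Domain
    has_isolation: bool = False
    has_level_shifter: bool = False


def check_crossing(c):
    """Return the list of rule violations for one domain crossing."""
    needs_iso = c.src.switchable and c.src is not c.dst
    needs_ls = c.src.voltage != c.dst.voltage
    issues = []
    if needs_iso and not c.has_isolation:
        issues.append("missing isolation")
    if c.has_isolation and not needs_iso:
        issues.append("redundant isolation")
    if needs_ls and not c.has_level_shifter:
        issues.append("missing level shifter")
    if c.has_level_shifter and not needs_ls:
        issues.append("redundant level shifter")
    return issues


# Hypothetical domains loosely following the figure labels (0.8V and 1.2V).
pdm1 = Domain("PDM1", 0.8, switchable=True)
pdm2 = Domain("PDM2", 1.2, switchable=False)

good = Crossing("iA", pdm1, pdm2, has_isolation=True, has_level_shifter=True)
bad = Crossing("iB", pdm1, pdm2)

for crossing in (good, bad):
    print(crossing.net, check_crossing(crossing) or "OK")
```

A production checker would also verify isolation type, level shifter direction and placement domain, which the slide lists as separate reports.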

PSO in std cell based design


Low Power Insertion Checks

Implementation flow
  Top-down Single-pass Synthesis
  Silicon Virtual Prototype
  Power Grid Synthesis
  Switch cell Insertion (for MTCMOS)
  Placement including SRPG / Level shifters / Isol. cells
  Power Routing
  Low Power Clock Tree Synthesis
  Domain-aware Post-CTS Optimization
  Domain Aware NanoRoute
  IR-Aware Timing/SI Opt.
  Decap insertion
  Sign-off

Conformal Low Power

Equivalence checking for Low Power design
  Ensure low power optimizations do not introduce logical errors
  Verify gated clocks, gated signals, de-cloning, and re-cloning of gated clocks
  Check State Retention mapping from RTL to gate
  Check corresponding presence of isolation and level shifter cells during implementation

Power domain structural and functional checks
  Ensure proper insertion of low power cells
  Ensure proper connectivity of low power cells
  Formally validate isolation function
  Formally validate state retention function
  Supports both logical and physical (power aware) netlists

Transistor Electrical Verification
  Detect sneak (leakage) paths across power domain boundaries

PSO in std cell based design (contd...)


Low Power Test
Test the Low Power Design, Reduce Power During Test

Power Aware DFT
  Insert the required power-aware DFT and test access mechanism (PTAM)
  Power-aware scan chains
ATPG for Power Structures
  Isolation logic stressed
  Retention flops
  Level shifters are tested
Power-Aware Test Model
  Encounter Test Model has test modes that reflect power modes
  Power domains verified for isolation and scan integrity
Reduce Power during Test
  ATPG can process each power mode
  Low power scan vectors reduce scan-shift power
  Runtime MBIST scheduling reduces memory test power
  Limited pin testing reduces IO power switching

[Figure: block diagram. Labels: Top, PD1, PD2, PD3, PD4, Core, Mem, PMU, PTAM, ISO, SR, v1, v2]

PSO in std cell based design (cont)


Power Analysis

Power-gating goals/tasks
  Effect of power switches on overall IR drop
    Model the power switches as on
    Run IR drop on the entire power grid, both global and switched, at once
  Power-up
    Simulate as the power switch is turned on
    Capture power-switch current behavior
  IR drop effect on global grid and neighbors when a block is powering up
    Use the captured current behavior from the previous step and feed it into rail analysis

PSO in std cell based design (cont)


Power Analysis

Impact on IR drop and EM
  Power switches modeled as resistors in power grid view
  Solution flags if switches enter saturation (I/Idsat PI)
  Support steady-state on and off
  Off-state uses leakage value of switch

Power consumption of steady-state on and off
  Power savings in different modes

Power-up analysis
  Fastmos simulator used for power-up simulation (UltraSim)
  Dynamic currents captured through power switches

Impact of power-up on global grid
  Dynamic VSDG rail analysis uses captured currents from power-up analysis to show the impact of power-up on surrounding logic

How Many Power Switches?

Two-part approach
1. Steady state analysis
   To monitor IR drop through switches
   VoltageStorm analyzes for IR drop
   VoltageStorm reports power switches operating in saturation
2. Dynamic analysis
   To monitor & control power ramp-up
   VoltageStorm reports block power-on time
   Too fast → latch-up
   Too slow → limits performance
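
For the steady-state part, a first-order estimate of the switch count follows from treating the switches as identical parallel resistors and keeping each one out of saturation. The sketch below is a back-of-the-envelope illustration, not the VoltageStorm method; the function name and all numeric values are hypothetical.

```python
import math


def min_switch_count(i_domain, r_on, v_drop_max, i_dsat=None):
    """Smallest number of parallel switches that meets both limits.

    i_domain    total steady-state domain current (A)
    r_on        on-resistance of one switch (ohm)
    v_drop_max  allowed IR drop across the switch network (V)
    i_dsat      optional saturation current of one switch (A)
    """
    # n switches in parallel give r_on / n, so drop = i_domain * r_on / n.
    n_ir = math.ceil(i_domain * r_on / v_drop_max)
    # Each switch carries i_domain / n, which must stay below i_dsat.
    n_sat = math.ceil(i_domain / i_dsat) if i_dsat else 1
    return max(n_ir, n_sat, 1)


# Hypothetical numbers: 0.5 A domain, 20 ohm switches, 30 mV drop budget.
print(min_switch_count(0.5, 20.0, 0.03))                  # IR-drop limited
print(min_switch_count(0.5, 20.0, 0.03, i_dsat=0.0012))   # saturation limited
```

The dynamic part of the analysis (ramp-up time and rush current) still needs simulation, since it depends on how the enable signal propagates through the switches.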


Block Power-up/Down Analysis and


Global Grid Verification

[Figure: VDD, control logic and circuit netlist shown for each of the four steps]

1. Create circuit netlist
   Inputs clamped
   Outputs correctly loaded
2. Simulate with UltraSim
3. Create Dynamic Power Grid Views
   Capture dynamic current in PGV
4. Analyze top-level grid in VSDG
   Load full-chip power RC network with PGVs and analyze

MTCMOS Power-On

PowerMeter generates data to drive SPICE simulation using UltraSim
  Netlist sensitized to the virtual power domain
    Use existing sub-circuit netlist
    Generate sub-circuit netlist from .cl
  Signal loading DSPF (lumped C)
  Voltage source file
  Template stimulus file
    DC voltages
    PWL for control logic derived from TWF file
QX can generate the RC network of the virtual power net
  Potential capacity limitations
  Analyse to see Ton differences
  Not used to date
QX generates the RC network of the control signals
  Important to capture delay in controlling switches
User simulates power-on conditions
  Analyzes ramp-up time to steady-state
UltraSim also captures current behavior through power transistors
  Leverages existing UltraSim commands used within integration inside VST (.usim_ir)
  Generates binary current data files (.pti)

[Figure: data flow. PowerMeter and QX provide the top-level circuit file, RC grid, signal loading, netlist, template stimulus and voltage sources to UltraSim, which outputs SPICE waveforms and results and power-transistor dynamic currents (pti)]

PSO for memories

Why Memory Shut-off

On-chip memory is increasing
More memory results in higher leakage
The activity factor of large memories is low, so their active power is low
Memories already use higher-L devices (less sub-threshold leakage)
Below the 65nm process, junction leakage starts to become the dominant factor

Reducing standby/average power by powering down is absolutely necessary


PSO for memories


• Memory Shut-off can be
  - Selective shut-off
  - Retention Memory
  - Complete memory shut-off
• Memory Shut-off implemented at SOC level
  - Tools are competent enough for implementation

• Key Challenges
  - Performance hit
  - RTL functional verification
  - Yield is an issue
  - Testing is a big issue
  - Support for IR-drop-aware timing models


IO power shut-off

IOs are to be grouped together based on architecture
  Set of IO voltages can be shut down

Issues
  Board design and pad selection
  ESD

Dynamic Voltage Frequency Scaling Requires MultiMode Analysis

[Figure: design blocks DULL, DROWSY and CORE]

Mode | Core | Drowsy | Dull
Baseline | 1.08V, 125MHz | 1.08V, 125MHz | 1.08V, 125MHz
Slow | 1.08V, 125MHz | 1.08V, 125MHz | 0.9V, 66MHz
Standby | 0.0V | 1.08V, 125MHz | 0.0V

Multiple constraints (.sdc)
  Example: baseline.sdc, ios.sdc, dull.sdc, drowsy.sdc

Libraries
  stdcell_1.08sl.lib, stdcell_0.9sl.lib, stdcell_1.08fs.lib, stdcell_0.9fs.lib

Multiple modes need to be analyzed/optimized for multiple corners

Setup analysis for (WC, 1,125C) corner
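
To see what the slow operating point buys, the dynamic-power relation P ∝ VDD² · f can be applied to the slide's voltage and frequency pairs. The sketch below assumes the mode table reads Core, Drowsy, Dull from left to right (an assumption about the flattened table), and it covers switching power only, not leakage.

```python
# (voltage in V, frequency in Hz) per block and mode, as read from the slide.
# The column-to-block mapping is an assumption about the extracted table.
MODES = {
    "Baseline": {"Core": (1.08, 125e6), "Drowsy": (1.08, 125e6), "Dull": (1.08, 125e6)},
    "Slow":     {"Core": (1.08, 125e6), "Drowsy": (1.08, 125e6), "Dull": (0.9, 66e6)},
    "Standby":  {"Core": (0.0, 0.0),    "Drowsy": (1.08, 125e6), "Dull": (0.0, 0.0)},
}


def relative_dynamic_power(voltage, freq, v_ref=1.08, f_ref=125e6):
    """Switching power relative to the reference point, using P ~ V^2 * f."""
    return (voltage / v_ref) ** 2 * (freq / f_ref)


for mode, blocks in MODES.items():
    summary = ", ".join(
        f"{block}: {relative_dynamic_power(v, f):.2f}" for block, (v, f) in blocks.items()
    )
    print(f"{mode:9s} {summary}")
```

With these numbers the 0.9V, 66MHz point draws roughly 37% of the baseline switching power for the same capacitance and activity.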

DVFS: Multi-Mode Multi-Corner Flow

Create library set
  The library set can be a single library or a collection of libraries (ECSM)
Define various RC corners
  Specify PVT condition for each corner. Specify SPEF for each corner
Define constraint modes
  Specify SDC file for each mode. The same SDC file may be used, or specify one SDC file per domain
Create analysis views
  Associate a corner with a mode; a design may have 5 corners and 3 modes, but only 10 views
optDesign/timeDesign
  Run optimization and timing checks for concurrent handling of views

Primary Concerns:
1. Timing Closure
2. Verification
3. Mixed Scenario for Power Saving (DVFS and PSO together)
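
The "5 corners and 3 modes, but only 10 views" point can be illustrated by enumerating every mode/corner pair and pruning the ones that are not needed. This is a sketch only: the mode names follow the slide's example SDC files, while the corner names and the pruned pairs are hypothetical, and nothing here is tool command syntax.

```python
from itertools import product

MODES = ["baseline", "dull", "drowsy"]
# Hypothetical corner names, for illustration only.
CORNERS = ["wc_125c", "wc_m40c", "bc_125c", "bc_m40c", "typ_25c"]


def build_views(modes, corners, skip=frozenset()):
    """Pair every mode with every corner, dropping the pairs listed in skip."""
    return [(m, c) for m, c in product(modes, corners) if (m, c) not in skip]


# Hypothetical pruning: combinations that are not signed off.
SKIP = frozenset({
    ("dull", "bc_125c"), ("dull", "typ_25c"),
    ("drowsy", "bc_125c"), ("drowsy", "bc_m40c"), ("drowsy", "typ_25c"),
})

all_views = build_views(MODES, CORNERS)
views = build_views(MODES, CORNERS, SKIP)
print(f"{len(all_views)} possible views, {len(views)} analyzed")
```

Keeping the view list short matters because optimization and timing checks run concurrently over every active view.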

Advanced Low Power Techniques

• Pulsed Latch Design Methodology
  - Traditional FF is replaced with a pulsed latch
  - Pulse generator is shared by several pulsed latches
  - Dummy clock delay cell is used to balance the clock tree

[Figure: timing waveforms (cp, d, q) of a traditional register versus a pulsed latch; clock tree with pulse generator and pulse clock, negative edge FF, memory and dummy delay]

Advanced Low Power Techniques (Contd..)

• Pulsed Latch: Results
  - 25% active power reduction by swapping to pulsed latches
    50% of active power is consumed by FFs -> cut in half by pulsed latches
  - Power consumption overhead: ~5% overhead
    Slew control after pulse generator cell: slew needs to be faster at the pulse clock tree
    Pulse generator cell insertion (addition): required # of PG cells is controllable

[Figure: clock-tree image. General clock-tree structure with pulse generator insertion point; latency control: slow slew; skew control: fast slew]

Advanced Low Power Techniques (Contd..)

Low Power Arithmetic Units:

Topology | Delay | Power | PDP | Area
Ripple Carry | 1.00 | 1.00 | 1.00 | 1.00
Constant Block Width Carry Skip | 0.56 | 1.06 | 0.59 | 1.27
Variable Block Width Carry Skip | 0.44 | 1.29 | 0.57 | 1.88
Carry Look Ahead | 0.44 | 1.59 | 0.70 | 2.04
Carry Select | 0.36 | 2.24 | 0.81 | 3.38
Conditional Sum | 0.41 | 3.18 | 1.30 | 4.38

Delay, power, PDP and area of 16-bit adders normalized to the delay, the power, the PDP and the area, respectively, of the Ripple Carry Adder

Topology | Delay | Power | PDP | Area
Array | 1.00 | 1.00 | 1.00 | 1.00
Split Array | 0.68 | 0.87 | 0.59 | 1.43
Wallace Tree | 0.58 | 0.74 | 0.43 | 1.93
Modified Booth | 0.49 | 0.95 | 0.47 | 2.02

Delay, power, PDP and area of 16-bit multipliers normalized to the delay, the power, the PDP and the area, respectively, of the Array Multiplier

Source: T. Callaway and E. Swartzlander, The power consumption of CMOS adders and multipliers
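
Since PDP is the power-delay product, each normalized PDP entry should equal normalized delay times normalized power, up to rounding. The short check below uses the values from the two tables; the helper names are hypothetical.

```python
# (delay, power, PDP), normalized, copied from the adder and multiplier tables.
ADDERS = {
    "Constant Block Width Carry Skip": (0.56, 1.06, 0.59),
    "Variable Block Width Carry Skip": (0.44, 1.29, 0.57),
    "Carry Look Ahead": (0.44, 1.59, 0.70),
    "Carry Select": (0.36, 2.24, 0.81),
    "Conditional Sum": (0.41, 3.18, 1.30),
}
MULTIPLIERS = {
    "Split Array": (0.68, 0.87, 0.59),
    "Wallace Tree": (0.58, 0.74, 0.43),
    "Modified Booth": (0.49, 0.95, 0.47),
}


def pdp_mismatches(table, tol=0.01):
    """Names whose listed PDP differs from delay * power by more than tol."""
    return [name for name, (d, p, pdp) in table.items() if abs(d * p - pdp) > tol]


def lowest_pdp(table):
    """Topology with the smallest power-delay product in the table."""
    return min(table, key=lambda name: table[name][2])


print("adder mismatches:", pdp_mismatches(ADDERS))
print("multiplier mismatches:", pdp_mismatches(MULTIPLIERS))
print("best adder by PDP:", lowest_pdp(ADDERS))
print("best multiplier by PDP:", lowest_pdp(MULTIPLIERS))
```

The fastest topologies are not the most energy-efficient ones: Carry Select has the lowest adder delay but a PDP of 0.81, while the variable-block carry-skip adder reaches 0.57.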



Advanced Low Power Techniques (Contd..)

Double-Edge Triggered F/Fs

Double-edge triggered F/Fs (DETFF) can ideally save 50% of clock network power by halving the required clock frequency
However, the stringent 50% duty-cycle constraint on the clock and the area overhead of DETFFs can significantly offset the power saved
Slower than normal F/Fs due to increased internal and/or output node capacitance

[Figure: clock waveforms. Clock for single-edge F/F with period T; clock for DETFF with period 2T and 50% duty-cycle; clock for DETFF with period 2T and <50% duty-cycle; clock for DETFF with period >2T and <50% duty-cycle]



Advanced Low Power Techniques (Contd..)

• Several other techniques are under exploration or in use
  - Thermal Throttling
  - Clock Swing Controls
  - Clock-on Demand
  - Dynamic Threshold
  - Generic Bus power reduction IPs


Q&A

BACK-UP


Development goals

ARM 1136JF-S IC
Power optimization methodology leverageable to synthesized digital designs
Collaborative development: Silicon design chain (Applied Materials, ARM,
Cadence, TSMC)

ARM 1136JF-S IC PSO


Power switch-off (PSO) enhancement: Methodology and implementation

ARM 1176JZF-S IC
PSO and dynamic voltage and frequency scaling (DVFS) enhancement:
Methodology and implementation
Facilitate comprehensive methodology across design, verification and
implementation
Power Forward Initiative (Common Power Format, CPF)
ARM, AMD, ATI, Applied Materials, Cadence, Calypto, Freescale, Fujitsu, Golden Gate
Technology, NEC Electronics, NXP, Sequence, TSMC


ARM1136JF-S IC architecture

[Figure: block diagram. 1V VDD domain (~100K cells + 44 SRAMs) and 0.8V VDD domain (~200K cells) with ~3,400 voltage level-shifting cells; blocks labeled Trace, Full AHB, LSU and Fetch]

ARM1136JF-S microprocessor
  16k I+D cache, 16 kB TCM; Tag RAMs, TLBs
  ARM, Thumb, DSP instructions; Java
ETM11 trace macro, ETB11 trace buffer
Advanced high-performance bus (AHB)
  Core AHB Lite ports → AHB I/F (pin access.)
  Access to 128 KB on-chip test RAM: enable concurrent data transfers from any four ports
300 K standard cell instances; 22M transistors; 44 SRAMs
IC: 355 MHz typical (90nm standard CMOS: TSMC 90G)
Dual VDD domains, dual VT library

Design methodology overview (1)

Microprocessor verification
Set microprocessor code,
memory configurations

Timing, Power and Area Optimization

Verify RAM functionality in 90nm process


Verify microprocessor functionality (RTL)

Test cases (>135K vectors)

Vector sets generated used subsequently for power


dissipation analysis
VCD and TCF formats

Fully verified RTL golden reference for Regression


tests / functional verification

ARM1136JF-S IC
VDD domain selection and voltage level
shifting cells (VLS) design considerations
MSV RTL synthesis
Clock gating
Timing closure in multi-VDD designs
Dynamic/static IR drop analysis/optimization
System-level validation


Design methodology overview (2)

ARM1136JF-S IC PSO
PSO design, verification
Structured PSO ring methodology
VLS/isolation cells insertion in synthesis

Timing, Power and Area Optimization

Automated placement / insertion: VLS cells, switch cells,


state retention registers
Automated power stitching
Automated multi-domain clocks
Power switch-off, switch-on voltage drop
and transients analysis

ARM1176JZF-S IC
PSO management, verification
Integrate dynamic voltage and frequency scaling function
(DVFS)
Physical synthesis / optimization and timing analysis
(DVFS)
Functional integrity verification and test insertion with
power-optimization features

Vsoc, Vram → 1.0V libraries; Vcore → 0.8V libraries; ~800 test cases

Multiple Supply Voltage (MSV) RTL


synthesis

Multiple supply voltage synthesis


Newly-developed technology
Single-pass concurrent optimization for timing, area and power
0.8 and 1.0 VDD domains, dual-VT cell libraries

Power optimization in synthesis


Logic restructuring
Logic resizing (before clock tree synthesis)

Pin Swapping (CA<CC)

A
B
C

A
B
C

Buffer introduced
A to reduce slew


Minimize duration in which both pFET and nFET conduct


simultaneously

Apply high transition rate signal nets to low capacitance inputs

Z

Transition rate buffering (Buffer slow transition nets)

Pin swapping
Y

B
C
D
E

Buffer removal/resizing

ARM1136JF-S IC cells: 62%, 38% in 0.8V, 1.0V

VDD domains, clock gating


Cell Delays (normalized)

0.8V, 1.0V VDD domains


Analyze standard cells delay, leakage, standby and dynamic power (2.5x delta)
Adequate performance for timing critical nets
Customization → further improvements feasible

Architectural clock gating included in uP RTL

Automated design flow → addl. clock gating
Inferred from RTL through low-power synthesis
~1,000 clock gated cells identified and managed → 85% registers gated
Shut off dynamic current in quiescent logic

Clock decloning: 1,112 → 703 cells (1136 IC)

Move clock gating to highest hierarchical node of logic tree → reduce power, insertion delay


MSV electrical/timing closure (1)

Automated (VLS) insertion


For nets traversing VDD domains
Align cells to avoid n-well spacing violations (domain
perimeter placements)
Automated multi-VDD power distribution and cell
placements, antenna diode insertion
ARM1176JZF-S IC: Automated in synthesis

VLS placement directly affects electrical


performance
Optimal or detoured routing
Power-supply-aware timing and multi-VDD supply constraints → drive placement
ARM1136JF-S IC: Netlist modified to insert VLS cells
where needed
ARM1136JF-S IC PSO, ARM1176JZF-S IC: Automated
VLS cells insertion, placement, timing


MSV electrical/timing closure (2)


X
C
mm.
(X
AB C
Y
mm.)
mm. D
AB

Cell substitution with timing constraint


Replace standard-VT with high-VT cells
Net by net basis; same footprint as original cell

SPICE
ECSM

Signal integrity addressed within P&R


~10 of 500K nets required post-layout optimization

Length, Y/X Ratio

Distribution (%)


Effective current source model (ECSM)


instance-specific multiple VDD delay
calculation
Standard cell libraries characterized for multiple VDD
values at outset
Numerical model <2% deviation vs. full circuit
simulation

IR drop analysis and optimization

Grid-specific resistor meshes

Dynamic power (manage di/dt)

22 mV

1.0V VDD

VSS
19 mV

0.8V VDD

Dynamic IR
drop analysis

ARM1136JF-S IC validation

ARM RealView Validation System (instrumented system)
  Run applications, measure performance
~15,000 system-level validation tests
Linux (2.4.7, 2.4.19, v6 backport and 2.6.x), WinCE .NET 4.2 and Symbian OS7 operating systems
Applications: X-windows, Doom, Pocket Word, and Pocket Explorer, etc.

[Figure: normalized power dissipation (%, 0% to 100%) for Std. Power versus Low Power, broken down into Leakage (Total), Switching (Total), Leakage (Logic) and Switching (Logic)]

~40% overall and 46% leakage power reduction

Dynamic Power Dissipation (mW/MHz)

IC Block | Sim. Baseline (90nm) | Sim. LP (90nm) | Meas. LP (90nm) | Meas. Power (130 nm; ARM)
Core | 0.28 | 0.14 | 0.10 | 0.60
Other | 0.36 | 0.32 | 0.21 |
Total | 0.64 | 0.46 | 0.31 |

ARM1136JF-S IC PSO design

Automated PSO implementation


PSO design, functional verification (VLS cells)
Power, clock distribution

PSO
domain

Static and dynamic power analysis

Structured ring methodology

Switchable
Power Domain

Filler, breaker, corner, switch- or buffer-only

Switches and
Fillers forming
the ring

Switch cell has 2 buffers built-in


with different directions
En_out
En_in


Internal
power
mesh

1.0V

PSO switched block

VLS cells integration

Level shifter/Isolation
cell placement

Multiple height VLS/Isolation cells

Automatic placement (at domain edge)

Automatic power/enable connection


0.8V supply
connects to
M4

1V VDD
VSS
3-row high
isolation cell


Standard
Cell

MSV optimization
Power Domain 0.8V
Libraries A

Power Domain 1.0V


Libraries B

Cross-domain timing optimization


Automatically handle conditions shown

Don't touch
nets
Power Domain 0.8V
Libraries A

Domain-aware clock tree synthesis


Automatically handle multi-domain clocks

Power Domain 1.0V


Libraries B

Automatic insertion of state retention


registers
RTL synthesis, implement., verification
Capability not implemented in this work
VDD

VDDC (not switched) PG


Power Domain 0.8V
Libraries A

Power Domain 1.0V


Libraries B

VDD (switched)

Shutdown block
0.8V
I/O

SRPG
SRPG
FF
FF
FF

0.8V
I/O

RET
VSSC (not switched)
VSS

PG

ARM1176JZF-S IC architecture
1176
MBIST

JTAG, TAP
Boundary Scan
Test Logic

PLL
Clock & Reset

ARM 1176_IC

ARM1176Main
Peripheral
AXI

TestChip
RAM

Debug
interface

AXI &
AXI to AHB
Bridge

DMA
AXI
Clocks and
resets

Data
AXI

Voltage
level-shifting cells

VIC

Vsoc
1.0V

Validation
Coprocessor

Validation
Coprocessor

Dormant Mode
Sequencer

Instruction
AXI
VLS/Clamps
Cache and TCM
RAMs
Vram

Vcore
0.8V

ETB11 MBIST
TAP I/F

ETB11 RAM

Trace I/F

Vram
1.0V

TPIU

CP14 I/F

ARM1176JZFSImp

ETM11
CS
ETBM11CS

IARS: IEM Asynchronous Register Slices


ARM1176JZF-S microprocessor

16k I+D cache, 16 kB TCM; Tag RAMs, TLBs

ARM, Thumb, DSP instructions; Java, IEM

ARM1176JZF-S IC

ETM11 trace macro, ETB11 trace buffer

AHB bus I/F through AXI to AHB bridge


360K standard cell instances; 22M


transistors; 46 SRAMs
IC: 340MHz typical (90nm standard
CMOS: TSMC 90G)
3 power domains defined
Dual VDD domains, dual VT library

Intelligent energy manager (IEM)


ARM1176JZF-S RTL structure

ARM1176 IEM: Ease of implementation in present design methodologies


Asynchronous between voltage domains at different voltages, frequencies
IEM Asynchronous AXI Register Slices required

Has logical partitioning for voltage domains


No logic at the top-level of the design

Has logical partitioning for level shifters


Implementer must replace with specific library cells or rely on implementation tools to add

Has separate clocks and resets per voltage domain


ARM1176JZF-S IEM configuration overview


RAM Interface
Clamps for
dormant mode
support
Always
Synchronous
IEM Register
Slices
Asynchronous for
DVS
Synchronous when
Vsoc = Vcore

Additional IEM enabled components

Level 2 Cache Controller


Embedded Trace Macrocell

Level 2 Cache Controller


L220
Vcore


Instr. IARS Vcore


VLS

Data IARS Vcore


VLS

Instr. IARS Vsoc

Data IARS Vsoc

VLS and standard cells placement and clock


design

Leverage ARM1136JF-S IC PSO design methodology


Automatic placement (at domain edge)
Non-integral multiple height rows
7, 9, 11-track cells, etc. in the same design

Clock skew: 122 ps (worst-case, global)

VCORE-VSOC
VLS
cells

PSO
cells


VRAM-VSOC
VLS
cells

An effective power management solution

Power Forward Initiative: Common Power


Format (CPF)
New method to capture design and
constraint information
Facilitates comprehensive
methodology across design,
verification, and implementation
Enables automation and what-if
exploration
Collaboration/integration across
design/supply chain
Foundation for an integrated
methodology
R. Goering, "EDA spec describes power," EETimes, May 22, 2006

Design methodology with CPF


Spec

Design Creation

Chip Integration
Prototyping
Physical Synthesis

Routing

Sign-off

GDSII

Automatic test scheduling & ATPG for power gating cells


Automatic scan stitching for power domains

Analysis

Gate
Gate+CPF

Synthesis

ATPG

supply voltage
synthesis
Level shifter
and power
gate insertion

Gates+CPF

DFT

RTL
RTL+CPF

Multiple

Automatic partitioning
of physical design

Physical Implementation

LVS/DRC/Ext

SVP

Acceleration
& Emulation

RTL+CPF

EC

Design for Test

Simulation

Verify low power


implementation

Constraint Design

Synthesis

SDC Constraint Validation

Equivalence Checking

SDC Constraint
Generation

Formal
Analysis

Testbench Automation

Iterate

Iterate

Verification Coverage

Verification
RTL
RTL+CPF
Coding
Coding

MPD, MSV, DVFS

CPF

Summary

[Figure: VDD 1.0V and VDD 0.8V domains with VLS cells and a PSO region]

Power optimization methodology
  Delivered ~40% overall and 46% leakage power reduction (ARM1136JF-S IC)
  Single-pass synthesis with concurrent optimization (timing, power, area); multi-VDD, multi-VT designs

ARM1136JF-S IC PSO implementation
  Normalized ~98.5% (66x) reduction of leakage power in the low power region (typical conditions)
  Automated PSO implementation
  Structured ring methodology

ARM1176JZF-S IC development
  Dynamic voltage and frequency scaling enhancement methodology and implementation
  Power optimization methodology enhancements
  IEM; synthesis, test, formal verification, clocks, timing closure, electrical/physical design; CPF
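
The two ways of quoting the PSO leakage result on this slide, a percentage reduction and a reduction factor, are related by factor = 1 / (1 - reduction). The small sketch below shows that ~98.5% and 66x agree to within rounding; the function names are hypothetical.

```python
def reduction_factor(percent_reduction):
    """Convert a percentage reduction into an 'N times lower' factor."""
    return 100.0 / (100.0 - percent_reduction)


def percent_from_factor(factor):
    """Convert an 'N times lower' factor into a percentage reduction."""
    return 100.0 * (1.0 - 1.0 / factor)


print(f"98.5% reduction = {reduction_factor(98.5):.1f}x lower")
print(f"66x lower = {percent_from_factor(66):.2f}% reduction")
```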


Acknowledgments

We thank C. Chu, A. Gupta, J. Goodenough, A. Harry, C. Hopkins, L. Jensen, T. Valind, L. Milano, A. Iyer, P. Mamtora, J. Willis, M. McAweeney, R. Williams, I. Devereux and the ARM Physical IP team for their contributions

PSO in std cell based design


Logical Netlist

Physical Netlist

Level shifters
Placement
Location
Connectivity

Level shifters
RTL Model

Isolation cells
Placement
Isolation type
Isolation function

Isolation cells
Synthesis

EC

State retention cells

Floating nets / pins

Gate Netlist

Place & Route

Gate Netlist


Placement
Power connectivity
Retention function

Miscellaneous


Placement/type
Power connectivity
Isolation function

State retention cells

Placement
Retention function

Miscellaneous

Placement/Location
Power connectivity
Level Shifter function

EC

Power switches
Shorts between VDD/VSS

