Topics
Introduction
Power Dissipation Basics
Existing Low Power Techniques and Issues for
Advanced LP Techniques (under exploration)
Introduction
[Figure: processor power (Watts, log scale from 0.1 to 1000) vs. year, 1970 to 2020; points for the 8080, 8086, 386, Pentium proc, and Pentium 4 proc; the trend is extrapolated to "1000's of Watts?"]
[Figure: power density (W/cm2, log scale from 1 to 10,000) vs. year, '70 to '10, for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, and Pentium processors; reference levels: Hot Plate, Nuclear Reactor, Rocket Nozzle, Sun's Surface]
Motivation
Portability
Extending battery life
Battery technology scales slowly: 150 Wh/kg today vs. 75 Wh/kg in 1990
A 1 kg NiCd battery could power a P4 for 1 hour, but can power a Centrino for 4 hours
Low power dissipation as a product feature in itself
Enabling portable devices to be more powerful and feature-rich
Packaging
High power dissipation leads to expensive packaging and cooling systems
~ 1W: inexpensive plastic package limit
~ 10W: Ceramic package limit
~ 10W/cm2: limit for convection cooling
~ 50W/cm2: limit for forced-air cooling
Reliability
Long product lifetime
Leakage Power
Short-Circuit Power
Both pull-up and pull-down devices are partially conducting for a small but finite amount of time
Can be modeled as some fraction of dynamic current
Because the input transition time is finite, both NMOS and PMOS conduct for a short duration, providing a resistive path between VDD and GND
The short-circuit current Isc depends on the ratio of input to output transition times (the higher the ratio, the longer both devices are ON and the higher the dissipation due to short-circuit current)
Can be minimized by balancing the input and output rise times
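As a rough illustration of how the input transition time drives short-circuit dissipation, the sketch below uses Veendrick's first-order estimate for an unloaded, symmetric CMOS inverter, P_sc = (beta/12) * (VDD - 2*VT)^3 * (tau/T). The formula, the function name, and every numeric value are assumptions chosen for illustration; none of them come from the slides, and a loaded gate will behave differently.

```python
def short_circuit_power(beta, vdd, vt, tau, period):
    """First-order short-circuit power (W) of an unloaded, symmetric inverter.

    beta   : transistor gain factor in A/V^2 (assumed equal for NMOS and PMOS)
    vdd    : supply voltage in V
    vt     : threshold voltage magnitude in V (assumed equal for both devices)
    tau    : input rise/fall time in s
    period : clock period in s
    """
    overlap = vdd - 2.0 * vt
    if overlap <= 0.0:
        # With VDD <= 2*VT the two devices are never ON at the same time.
        return 0.0
    return (beta / 12.0) * overlap ** 3 * (tau / period)


# Hypothetical values, for illustration only.
fast_edge = short_circuit_power(beta=1e-4, vdd=1.2, vt=0.3, tau=100e-12, period=10e-9)
slow_edge = short_circuit_power(beta=1e-4, vdd=1.2, vt=0.3, tau=200e-12, period=10e-9)
print(f"fast edge: {fast_edge:.3e} W, slow edge: {slow_edge:.3e} W")
```

In this model, doubling the input transition time doubles the short-circuit power, which is one way to read the advice to keep input and output edges balanced.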
Low-Power Techniques (Basic to Advanced)
Power reduction technique | Leakage power | Dynamic power | Timing penalty | Area penalty | Implement. impact | Verification impact | (unlabeled)
Area optimization | 1.1X | 10% | 0% | -10% | None | None | None
Multi-Vt optimization | 6X | 0% | 0% | 2 to -2% | Low | None | None
Clock gating | 0X | 20% | 0% | <2% | Low | Low | None
(name missing) | 2X | 40-50% | 0% | <10% | Medium | Medium | Low
(name missing) | 10-50X | ~0% | 4-8% | 5-15% | Medium-high | High | High
(name missing) | 2-3X | 40-70% | 0% | <10% | High | High | High
Substrate Biasing | 10X | 10% | <10% | High | Medium-High | Medium
[Figure: buffered and un-buffered power switches; labels: A1, A2, SLEEP, Virtual Vss, Real VSS]
 | Fine grain | Coarse grain
Simultaneous switching capacitance | No issue | Needs to be addressed
(label missing) | >30% | <5%
[Figure: power domain PD1 with columns of power switches; labels: VddC, D: Active, Iso., SRP, GFF, FF, shutoff, iso_en, Vss, En_in, En_out, En_out_1, En_out_2, En_out_3, Filler, Breaker, Corner, Left Offset (150um), Column Pitch (200um)]
Note: switch cell has 2 buffers built-in with different directions
Ring
Ring(s) of switches enclose the power domain fully or partially
Switches placed outside the power domain
Switch cell treated as hard macro
Often used with hard macros (not allowed to touch inside)
More IR drop
Better current distribution
Column
Columns of switches inside power domain
Switches placed in the standard cell rows
Switch is a standard cell
Often used inside hierarchical (soft) blocks
Less IR drop
More prone to rush-current issues
Needs careful EM checks
Verification
Testability
RTL simulations
Low power insertion checks
CPF verification
DFM
ESD
IR-drop-aware timing analysis
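Structural/Rule Checking
[Figure: structural checks on signals crossing power domains; labels: PMM, PD, PDM1, PDM2, OFF, ON, iA, iB, ISO_EN, X, Good, Missing, LH, ISO, 0, 1, 0.8V, 1.2V]
[Figure: hierarchy Top, PD1, PD2 with SR Retention Flops; labels: v1, v2]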
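The kind of structural rule check suggested by the labels above (good vs. missing isolation and level shifting between an OFF and an ON domain) can be sketched as follows. This is a minimal illustration under stated assumptions, not the behavior of any particular tool: the `Domain` and `Crossing` types, the two rules, and the pairing of the names PD1, Top, iA, and iB with specific voltages are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    voltage: float        # nominal supply in volts
    switchable: bool      # True if the domain can be powered OFF


@dataclass(frozen=True)
class Crossing:
    net: str
    source: Domain
    sink: Domain
    has_isolation: bool
    has_level_shifter: bool


def check_crossing(c):
    """Return a list of rule violations for one domain-crossing net."""
    issues = []
    # Simplified rule: a switchable source can float when OFF, so its
    # outputs need an isolation cell.
    if c.source.switchable and not c.has_isolation:
        issues.append(f"{c.net}: missing isolation ({c.source.name} -> {c.sink.name})")
    # Simplified rule: different supply voltages need a level shifter.
    if c.source.voltage != c.sink.voltage and not c.has_level_shifter:
        issues.append(f"{c.net}: missing level shifter ({c.source.voltage}V -> {c.sink.voltage}V)")
    return issues


# Hypothetical design data, for illustration only.
pd1 = Domain("PD1", 0.8, switchable=True)
top = Domain("Top", 1.2, switchable=False)
crossings = [
    Crossing("iA", pd1, top, has_isolation=True, has_level_shifter=True),
    Crossing("iB", pd1, top, has_isolation=False, has_level_shifter=False),
]
for c in crossings:
    for issue in check_crossing(c):
        print(issue)
```

A production checker would also consider which power modes are legal, so that isolation is only demanded when the sink can be ON while the source is OFF.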
[Figure: Power-Aware Test Model (PTAM); blocks: PMU, Core, Mem, ISO; power domains PD2, PD3, PD4]
Reduce Power during Test
Power-gating goals/tasks
Power switch on: overall IR drop
Model the power switch as on
Run IR drop on the entire power grid, both global and switched, at once
Power-up
Simulate as the power switch is turned on
Capture power-switch current behavior
IR drop effect on the global grid and neighboring blocks while a block is powering up
Use the captured current behavior from the previous step and feed it into rail analysis
Power-up analysis
Fast-MOS simulator (UltraSim) used for power-up simulation
Dynamic currents captured through power switches
Two-part approach
1. Steady state analysis
To monitor IR drop through switches
VoltageStorm analyzes for IR drop
VoltageStorm reports power switches operating in saturation
2. Dynamic analysis
To monitor & control power ramp-up
VoltageStorm reports block power-on time
Too fast: risk of latch-up
Too slow: limits performance
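The trade-off between ramping up too fast and too slow can be illustrated with a first-order estimate: charging a domain capacitance C to VDD in time t_ramp takes an average current of C * VDD / t_ramp. The sketch below assumes an ideal linear ramp and uses hypothetical capacitance and current-budget values; it is not output from VoltageStorm or UltraSim, and real rush-current profiles depend on how the switches are staged.

```python
def average_rush_current(c_domain, vdd, t_ramp):
    """Average current (A) needed to charge c_domain (F) to vdd (V) in t_ramp (s)."""
    if t_ramp <= 0:
        raise ValueError("t_ramp must be positive")
    return c_domain * vdd / t_ramp


def min_ramp_time(c_domain, vdd, i_max):
    """Shortest ramp time (s) that keeps the average rush current at or below i_max (A)."""
    if i_max <= 0:
        raise ValueError("i_max must be positive")
    return c_domain * vdd / i_max


# Hypothetical block: 10 nF of switched capacitance on a 1.0 V rail.
C_DOMAIN = 10e-9
VDD = 1.0
for t_ramp in (10e-9, 100e-9, 1e-6):
    i_avg = average_rush_current(C_DOMAIN, VDD, t_ramp)
    print(f"ramp {t_ramp * 1e9:7.1f} ns -> average rush current {i_avg * 1e3:8.2f} mA")

budget_ns = min_ramp_time(C_DOMAIN, VDD, 0.05) * 1e9
print(f"ramp time for a 50 mA budget: {budget_ns:.1f} ns")
```

A shorter ramp raises the current drawn through the switches and the shared grid, while a longer ramp lengthens the block's wake-up latency.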
[Figure: MTCMOS power-on simulation setup; blocks: VDD, control logic, circuit netlists with inputs clamped and outputs correctly loaded; capture dynamic current in PGV; other label: QX]
[Figure: power-up simulation flow; blocks: PowerMeter, top-level circuit file, RC grid, signal loading netlist, template stimulus, voltage sources, UltraSim, SPICE waveforms & results, power-transistor dynamic currents (pti)]
Key Challenges
Performance Hit
RTL functional verification
Yield is an issue
Testing is a big issue
Support for IR-drop-aware timing models
IO power shut-off
Issues
[Figure: power domains DULL, DROWSY, CORE]
Multiple constraints (.sdc)
Example: baseline.sdc, ios.sdc, dull.sdc, drowsy.sdc
Mode | Core | Drowsy | Dull
Baseline | 1.08V, 125MHz | 1.08V, 125MHz | 1.08V, 125MHz
Slow | 1.08V, 125MHz | 1.08V, 125MHz | 0.9V, 66MHz
Standby | 0.0V | 1.08V, 125MHz | 0.0V
Libraries: stdcell_1.08sl.lib, stdcell_0.9sl.lib, stdcell_1.08fs.lib, stdcell_0.9fs.lib
Define various RC corners
optDesign/timeDesign
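To get a feel for what the 0.9V/66MHz operating point buys relative to 1.08V/125MHz, the sketch below applies the usual first-order CMOS model in which dynamic power scales with V^2 * f at fixed switched capacitance and activity. That model and the function name are assumptions for illustration; the slides list only the voltages and frequencies, and leakage is ignored here.

```python
def dynamic_power_ratio(v_new, f_new, v_ref, f_ref):
    """Ratio of dynamic power at (v_new, f_new) to dynamic power at (v_ref, f_ref).

    Assumes P_dyn is proportional to V^2 * f with unchanged capacitance and activity.
    """
    return (v_new / v_ref) ** 2 * (f_new / f_ref)


# Operating points listed in the mode table (volts, hertz).
baseline = (1.08, 125e6)
slow = (0.9, 66e6)

ratio = dynamic_power_ratio(slow[0], slow[1], baseline[0], baseline[1])
print(f"dynamic power at the slow point relative to baseline: {ratio:.2f}")
```

Under these assumptions the slow point draws a little over a third of the baseline dynamic power, with the frequency drop contributing more than the voltage drop.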
Primary Concerns:
1. Timing Closure
2. Verification
3. Mixed Scenario for Power Saving (DVFS and PSO together)
Pulsed latch
[Figure: pulsed latch driven by a pulse generator (pulse clock), shown alongside a negative edge FF; timing labels: d, q, cp, t]
[Figure: clock-tree image; labels: memory, dummy delay, ~5% overhead]
Topology | Delay | Power | PDP | Area
Ripple Carry | 0.56 | 1.06 | 0.59 | 1.27
(name missing) | 0.44 | 1.29 | 0.57 | 1.88
(name missing) | 0.44 | 1.59 | 0.70 | 2.04
Carry Select | 0.36 | 2.24 | 0.81 | 3.38
Conditional Sum | 0.41 | 3.18 | 1.30 | 4.38

Topology | Delay | Power | PDP | Area
Array | (values missing)
Split Array | 0.68 | 0.87 | 0.59 | 1.43
Wallace Tree | 0.58 | 0.74 | 0.43 | 1.93
Modified Booth | 0.49 | 0.95 | 0.47 | 2.02
Source: T. Callaway and E. Swartzlander, "The power consumption of CMOS adders and multipliers"
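The PDP column appears to be the product of the Delay and Power columns, which makes it a convenient single figure for comparing topologies. The sketch below recomputes it for the named rows and ranks them; the delay and power values are copied from the tables, while the function names and the choice to rank by PDP alone are assumptions for illustration.

```python
# (delay, power) pairs copied from the tables above; units as given in the source.
adders = {
    "Ripple Carry": (0.56, 1.06),
    "Carry Select": (0.36, 2.24),
    "Conditional Sum": (0.41, 3.18),
}
multipliers = {
    "Split Array": (0.68, 0.87),
    "Wallace Tree": (0.58, 0.74),
    "Modified Booth": (0.49, 0.95),
}


def power_delay_product(delay, power):
    """PDP is delay times power: a proxy for energy per operation."""
    return delay * power


def rank_by_pdp(table):
    """Return (name, pdp) pairs sorted from lowest to highest PDP."""
    pdps = {name: power_delay_product(d, p) for name, (d, p) in table.items()}
    return sorted(pdps.items(), key=lambda item: item[1])


for label, table in (("adders", adders), ("multipliers", multipliers)):
    print(label)
    for name, pdp in rank_by_pdp(table):
        print(f"  {name:16s} PDP = {pdp:.2f}")
```

Ranking by PDP alone ignores area, so a topology with the best PDP may still be a poor fit where silicon area is the tighter constraint.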
Double-edge-triggered flip-flops (DETFFs) can ideally save 50% of clock network power by halving the required clock frequency
However, the stringent 50% duty-cycle constraint on the clock and the area overhead of DETFFs can significantly offset the power saved
Slower than normal flip-flops due to increased internal and/or output node capacitance
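How quickly overhead erodes the ideal 50% saving can be sketched with a simple model. It assumes clock network power scales with clock capacitance times frequency at fixed VDD, and it lumps the DETFF overhead into a single hypothetical parameter, the fractional increase in clock load; neither the model nor the parameter values come from the slides.

```python
def detff_clock_power_saving(cap_overhead):
    """Fractional clock-power saving from double-edge triggering.

    cap_overhead: extra clock-network capacitance of the DETFF design as a
    fraction of the single-edge design (hypothetical parameter, 0.3 = +30%).
    Assumes clock power scales with capacitance times frequency at fixed VDD,
    and that the DETFF design runs the clock at half the frequency.
    """
    if cap_overhead < 0:
        raise ValueError("cap_overhead must be non-negative")
    return 1.0 - (1.0 + cap_overhead) * 0.5


for overhead in (0.0, 0.3, 0.6, 1.0):
    saving = detff_clock_power_saving(overhead)
    print(f"clock load overhead {overhead:.1f} -> saving {saving:.2f}")
```

In this model the saving falls linearly with the added clock load and disappears entirely once the load doubles.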
Q&A
BACK-UP
Development goals
ARM 1136JF-S IC
Power optimization methodology applicable to synthesized digital designs
Collaborative development: Silicon design chain (Applied Materials, ARM, Cadence, TSMC)
ARM 1176JZF-S IC
Power shut-off (PSO) and dynamic voltage and frequency scaling (DVFS) enhancement: methodology and implementation
Facilitate comprehensive methodology across design, verification and implementation
Power Forward Initiative (Common Power Format, CPF)
ARM, AMD, ATI, Applied Materials, Cadence, Calypto, Freescale, Fujitsu, Golden Gate Technology, NEC Electronics, NXP, Sequence, TSMC
ARM1136JF-S IC architecture
[Figure: block diagram; labels: Trace, Full AHB, LSU, Fetch]
1V VDD: ~100K cells + 44 SRAMs
0.8V VDD: ~200K cells (ARM1136JF-S microprocessor)
~3,400 voltage level-shifting cells
Microprocessor verification
Set microprocessor code, memory configurations
ARM1136JF-S IC
VDD domain selection and voltage level shifting cells (VLS) design considerations
MSV RTL synthesis
Clock gating
Timing closure in multi-VDD designs
Dynamic/static IR drop analysis/optimization
System-level validation
ARM1136JF-S IC PSO
PSO design, verification
Structured PSO ring methodology
VLS/isolation cells insertion in synthesis
ARM1176JZF-S IC
PSO management, verification
Integrate dynamic voltage and frequency scaling (DVFS) function
Physical synthesis/optimization and timing analysis (DVFS)
Functional integrity verification and test insertion with power-optimization features
[Figure: netlist optimizations on gates with pins A, B, C, D, E and outputs Y, Z: buffer introduced to reduce slew, pin swapping, buffer removal/resizing]
October 24, 2007
[Figure: SPICE vs. ECSM; axis: Distribution (%)]
[Figure: IR drop on the supply rails; labels: 22 mV, 1.0V VDD, VSS, 19 mV, 0.8V VDD]
Dynamic IR drop analysis
ARM1136JF-S IC validation
[Figure: bar chart, 0% to 100%, Std. Power vs. Low Power; series: Leakage (Total), Switching (Total), Leakage (Logic), Switching (Logic)]
 | Core | Other | Total
Sim. Baseline (90nm) | 0.28 | 0.36 | 0.64
Sim. LP (90nm) | 0.14 | 0.32 | 0.46
Meas. LP (90nm) | 0.10 | 0.21 | 0.31
Meas. Power (130 nm; ARM) | 0.60
[Figure: PSO domain layout; labels: Switchable Power Domain, Switches and Fillers forming the ring, Internal power mesh, 1.0V PSO switched block]
Level shifter/Isolation cell placement
[Figure: 3-row high isolation cell next to a standard cell; rails: 1V VDD, VSS]
MSV optimization
[Figure: labels: Power Domain 0.8V Libraries A (twice), Don't touch nets, Shutdown block, VDD (switched), 0.8V I/O, SRPG, FF, RET, PG, VSSC (not switched), VSS]
ARM1176JZF-S IC architecture
[Figure: block diagram of the ARM1176JZF-S IC (ARM 1176_IC, ARM1176Main, ARM1176JZFSImp) around the ARM1176JZF-S microprocessor. Test and clocking: 1176 MBIST, JTAG, TAP, Boundary Scan, Test Logic, PLL, Clock & Reset, Clocks and resets. Buses and peripherals: Instruction AXI, Data AXI, DMA AXI, Peripheral AXI, AXI & AXI to AHB Bridge, VIC, TestChip RAM, Debug interface. Processor support: Cache and TCM RAMs, Validation Coprocessor, Dormant Mode Sequencer, VLS/Clamps, voltage level-shifting cells. Trace: ETM11 CS, ETB11 RAM, ETB11 MBIST, TAP I/F, Trace I/F, TPIU, CP14 I/F. Supplies: Vsoc 1.0V, Vcore 0.8V, Vram 1.0V]
RAM Interface: clamps for dormant mode support; always synchronous
IEM Register Slices: asynchronous for DVS; synchronous when Vsoc = Vcore
[Figure: floorplan; labels: VCORE-VSOC VLS cells, PSO cells, VRAM-VSOC VLS cells]
[Figure: CPF-based design flow. Design Creation: RTL and RTL+CPF coding, simulation, formal analysis, testbench automation, acceleration & emulation, verification coverage (iterate). Synthesis: constraint design, SDC constraint generation, multiple supply voltage synthesis, level shifter and power gate insertion, DFT, ATPG, equivalence checking (EC), Gate+CPF. Physical Implementation: chip integration, prototyping, automatic partitioning of physical design, physical synthesis, routing, analysis, sign-off (LVS/DRC/Ext, SVP), GDSII]
Summary
[Figure: labels: VDD 1.0V, VDD 0.8V, VLS, PSO]
ARM1176JZF-S IC development
Dynamic voltage and frequency scaling enhancement methodology and implementation
Power optimization methodology enhancements
IEM; synthesis, test, formal verification, clocks, timing closure, electrical/physical design; CPF
Acknowledgments
We thank C. Chu, A. Gupta, J. Goodenough, A. Harry, C. Hopkins, L. Jensen, T. Valind, L. Milano, A. Iyer, P. Mamtora, J. Willis, M. McAweeney, R. Williams, I. Devereux and the ARM Physical IP team for their contributions
[Figure: low-power checks by design stage: RTL Model, Synthesis, Gate Netlist, Physical Netlist, with EC between stages]
Level shifters: placement/location, connectivity, power connectivity, level shifter function
Isolation cells: placement/type, isolation type, power connectivity, isolation function
Retention cells: placement, power connectivity, retention function
Miscellaneous: power switches, shorts between VDD/VSS