
Chapter 4

Low-Power VLSI Design
Jin-Fu Li
Advanced Reliable Systems (ARES) Lab.
Department of Electrical Engineering
National Central University
Jhongli, Taiwan
Outline
Introduction
Low-Power Gate-Level Design
Low-Power Architecture-Level Design
Algorithmic-Level Power Reduction
RTL Techniques for Optimizing Power
EE4012VLSI Design National Central University

Introduction
Most SOC design teams now regard power as one of their top design concerns
Why low-power design?
Battery lifetime (especially for portable devices)
Reliability
Power consumption
Peak power
Average power

Overview of Power Consumption
Average power consumption
Dynamic power consumption
Short-circuit power consumption
Leakage power consumption
Static power consumption
Dynamic power dissipation during switching
[Figure: capacitances at a switching node: C_drain, C_interconnect, and C_input]

Overview of Power Consumption
Generic representation of a CMOS logic gate for switching power calculation
[Figure: CMOS gate with a pMOS network and an nMOS network driven by inputs V_A and V_B; the output V_out is loaded by C_load]
$$C_{load} = C_{drain} + C_{interconnect} + C_{input}$$
$$P_{avg} = \frac{1}{T}\left[\int_{0}^{T/2} V_{out}\left(-C_{load}\frac{dV_{out}}{dt}\right)dt + \int_{T/2}^{T} (V_{DD}-V_{out})\left(C_{load}\frac{dV_{out}}{dt}\right)dt\right]$$
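As a worked step, the two integrals of the average-power expression can be evaluated under the usual assumption (not stated on the slide) that the output falls from V_DD to 0 during the first half-period and rises from 0 to V_DD during the second. A sketch of the derivation:

$$\int_{0}^{T/2} V_{out}\left(-C_{load}\frac{dV_{out}}{dt}\right)dt = -C_{load}\int_{V_{DD}}^{0} V_{out}\,dV_{out} = \frac{1}{2} C_{load} V_{DD}^{2}$$
$$\int_{T/2}^{T} (V_{DD}-V_{out})\left(C_{load}\frac{dV_{out}}{dt}\right)dt = C_{load}\int_{0}^{V_{DD}} (V_{DD}-V_{out})\,dV_{out} = \frac{1}{2} C_{load} V_{DD}^{2}$$
$$P_{avg} = \frac{1}{T}\left[\frac{1}{2} C_{load} V_{DD}^{2} + \frac{1}{2} C_{load} V_{DD}^{2}\right] = \frac{1}{T}\,C_{load} V_{DD}^{2}$$

Each half-period contributes the same energy, so one full switching cycle costs C_load V_DD^2 regardless of the waveform shape.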

Overview of Power Consumption
The average power consumption can be expressed as
$$P_{avg} = \frac{1}{T} C_{load} V_{DD}^{2} = C_{load} V_{DD}^{2} f_{CLK}$$
The node transition rate can be slower than the clock rate. To better represent this behavior, a node transition factor ($\alpha_T$) should be introduced:
$$P_{avg} = \alpha_T C_{load} V_{DD}^{2} f_{CLK}$$
The switching power expressions above are derived by taking into account the output node load capacitance.

Overview of Power Consumption
[Figure: two-input CMOS gate with inputs V_A and V_B, an internal node V_internal with capacitance C_internal, and an output V_out with capacitance C_load]
The generalized expression for the average power dissipation can be rewritten as
$$P_{avg} = \left(\sum_{i=1}^{\#\,of\,nodes} \alpha_{Ti}\, C_i\, V_i\right) V_{DD}\, f_{CLK}$$
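The generalized expression sums the switched charge over every node, not only the output. A minimal Python sketch of that sum follows; the function name, the node list layout, and all numeric values are hypothetical and chosen only for illustration.

```python
def switching_power(nodes, vdd, f_clk):
    """Average switching power: (sum of alpha_i * C_i * V_i) * VDD * f_clk.

    nodes is a list of (alpha, capacitance_in_farads, voltage_swing) tuples.
    """
    return sum(alpha * cap * swing for alpha, cap, swing in nodes) * vdd * f_clk


# Hypothetical example values (not taken from the slides).
vdd = 1.8        # supply voltage in volts
f_clk = 100e6    # clock frequency in hertz
nodes = [
    (0.5, 20e-15, 1.8),   # output node with full swing
    (0.25, 5e-15, 1.2),   # internal node with reduced swing
]

p_avg = switching_power(nodes, vdd, f_clk)
print(f"P_avg = {p_avg * 1e6:.2f} uW")
```

With a single full-swing node the function reduces to alpha * C * VDD^2 * f_clk, the single-node form of the same expression.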

Gate-Level Design: Technology Mapping
The objective of logic minimization is to reduce the Boolean function.
For low-power design, the signal switching activity is minimized by restructuring a logic circuit.
The power minimization is constrained by the delay; however, the area may increase.
During this phase of logic minimization, the function to be minimized is
$$\sum_{i} P_i\,(1 - P_i)\,C_i$$

Gate-Level Design: Technology Mapping
The first step in technology mapping is to decompose each logic function into two-input gates.
The objective of this decomposition is to minimize the total power dissipation by reducing the total switching activity.
[Figure: two decompositions of a four-input gate with input signal probabilities A = 0.2, B = 0.2, C = 0.5, D = 0.5. One decomposition has node activities α = 0.0384, 0.0196, and 0.0099; the other has node activities α = 0.0384, 0.1875, and 0.0099.]
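The activity values in the figure are consistent with a four-input AND function with independent inputs, where each node's activity is p(1 - p) and p is the probability that the node is 1. That reading is an inference from the numbers, not something the slide states. Under that assumption, a short Python sketch reproduces both decompositions:

```python
def and_prob(p_a, p_b):
    """Probability that a 2-input AND output is 1, for independent inputs."""
    return p_a * p_b


def activity(p):
    """Switching activity of a node whose signal probability is p."""
    return p * (1.0 - p)


p_a, p_b, p_c, p_d = 0.2, 0.2, 0.5, 0.5

# Chain decomposition: ((A.B).C).D
p_ab = and_prob(p_a, p_b)
p_abc = and_prob(p_ab, p_c)
p_out = and_prob(p_abc, p_d)
chain = [activity(p_ab), activity(p_abc), activity(p_out)]

# Tree decomposition: (A.B).(C.D)
p_cd = and_prob(p_c, p_d)
tree = [activity(p_ab), activity(p_cd), activity(and_prob(p_ab, p_cd))]

print("chain:", [round(a, 4) for a in chain], "total", round(sum(chain), 4))
print("tree: ", [round(a, 4) for a in tree], "total", round(sum(tree), 4))
```

For these input probabilities the chain keeps the high-probability inputs C and D away from a shared intermediate node, so its total activity is lower than that of the tree.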

Gate-Level Design: Phase Assignment
[Figure: phase assignment example with inputs A, B, and C; the high-activity node is marked in each version of the circuit]

Gate-Level Design: Pin Swapping
[Figure: pin swapping among the inputs a, b, c, and d of a gate; each panel is annotated with "Switching activity"]

Gate-Level Design: Glitching Power
Glitches
- Spurious transitions due to imbalanced path delays
A design with more balanced delay paths has fewer glitches, and thus less power dissipation.
Note that there are no glitches in dynamic CMOS logic.
[Figure: circuit and waveforms for signals A, B, C, D, and E illustrating a glitch]

Gate-Level Design: Glitching Power
A chain structure has more glitches.
A tree structure has fewer glitches.
[Figure: chain structure and tree structure, each combining inputs A, B, C, and D]
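The difference between the chain and the tree can be illustrated with a small unit-delay simulation. This is only a sketch under stated assumptions: every gate is taken to be a 2-input XOR with a delay of one time step (the slide does not name the gate type), all four inputs switch from 0 to 1 at the same instant, and the node names n1, n2, and out are hypothetical.

```python
def count_toggles(gates, before, after):
    """Unit-delay simulation of 2-input XOR gates.

    gates maps an output name to its two input names. Returns how many
    times each gate output changes after the primary inputs switch.
    """
    values = dict(before)
    values.update({g: 0 for g in gates})
    # Let the circuit settle on the old input values.
    for _ in range(len(gates) + 1):
        values.update({g: values[a] ^ values[b] for g, (a, b) in gates.items()})
    # Apply the new inputs and count output changes step by step.
    values.update(after)
    toggles = {g: 0 for g in gates}
    for _ in range(len(gates) + 1):
        new = {g: values[a] ^ values[b] for g, (a, b) in gates.items()}
        for g in gates:
            toggles[g] += int(new[g] != values[g])
        values.update(new)
    return toggles


chain = {"n1": ("A", "B"), "n2": ("n1", "C"), "out": ("n2", "D")}
tree = {"n1": ("A", "B"), "n2": ("C", "D"), "out": ("n1", "n2")}
before = {"A": 0, "B": 0, "C": 0, "D": 0}
after = {"A": 1, "B": 1, "C": 1, "D": 1}

chain_toggles = count_toggles(chain, before, after)
tree_toggles = count_toggles(tree, before, after)
print("chain:", chain_toggles)
print("tree: ", tree_toggles)
```

In the chain, the output starts and ends at 0 but toggles twice on the way, because its two inputs arrive at different times. In the tree, the inputs of every gate arrive together and no output toggles at all.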

Gate-Level Design: Precomputation
[Figure: original circuit with combinational logic between registers R1 and R2, and the same circuit with precomputation logic added]

Gate-Level Design: Precomputation
[Figure: precomputation for an n-bit comparator. Registers R1 to R4 latch the inputs A<n-1>, B<n-1>, A<n-2:0>, and B<n-2:0>. A 1-bit comparator (MSB) and an (n-1)-bit comparator produce the output F, and the precomputation logic generates the Enable signal.]
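The comparator example can be modeled behaviorally. The sketch below assumes the circuit computes A > B on unsigned values and that the lower-bit registers are enabled only when the two MSBs are equal; both points are assumptions about the figure, and the function name is hypothetical.

```python
def compare_with_precomputation(a, b, n):
    """Return (a > b, lower_bits_loaded) for n-bit unsigned a and b."""
    msb_a = (a >> (n - 1)) & 1
    msb_b = (b >> (n - 1)) & 1
    if msb_a != msb_b:
        # The MSBs alone decide the result, so the registers holding
        # the lower n-1 bits can stay disabled for this cycle.
        return msb_a > msb_b, False
    mask = (1 << (n - 1)) - 1
    return (a & mask) > (b & mask), True


n = 4
pairs = [(a, b) for a in range(2 ** n) for b in range(2 ** n)]
results = [compare_with_precomputation(a, b, n) for a, b in pairs]

all_correct = all(r[0] == (a > b) for (a, b), r in zip(pairs, results))
load_fraction = sum(r[1] for r in results) / len(pairs)
print("all results correct:", all_correct)
print("fraction of cycles loading the lower bits:", load_fraction)
```

For uniformly distributed inputs the MSBs differ half of the time, so the (n-1)-bit comparator inputs stay frozen in about half of the cycles.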

Gate-Level Design: Gating Clock
[Figure: a row of D flip-flops driven by a gated clock clk]
Fails DFT rule checking
[Figure: the same flip-flops with a control pin T added to the clock gating]
Add a control pin to solve the DFT violation problem

Gate-Level Design: Input Gating
[Figure: input gating with signals f1, f2, clk, and select]

Clock-Gating in Low-Power Flip-Flop
[Figure: D flip-flop with inputs D and CK and output Q]
Source: Prof. V. D. Agrawal

Reduced-Power Shift Register
[Figure: input D feeds parallel rows of D flip-flops clocked by CK(f/2); a multiplexer selects the Output]
Flip-flops are operated at full voltage and half the clock frequency.
Source: Prof. V. D. Agrawal

Power Consumption of Shift Register
$$P = C V_{DD}^{2} f / n$$
16-bit shift register, 2µ CMOS
Deg. of parallelism  Freq (MHz)  Power (µW)
1                    33.0        1535
2                    16.5        887
4                    8.25        738
[Figure: normalized power (0.0 to 1.0) versus degree of parallelism, n (1, 2, 4)]
C. Piguet, "Circuit and Logic Level Design," pages 103-133 in W. Nebel and J. Mermet (ed.), Low Power Design in Deep Submicron Electronics, Springer, 1997.
Source: Prof. V. D. Agrawal
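The normalized-power axis of the plot can be recomputed from the three table rows. The Python sketch below does that and compares the result with the ideal 1/n trend implied by P = C V_DD^2 f / n at constant C and V_DD; the variable and function names are hypothetical.

```python
# Power values as listed in the slide's table, indexed by degree of parallelism n.
measured_power = {1: 1535.0, 2: 887.0, 4: 738.0}


def normalize(power_by_n):
    """Scale every entry by the power of the non-parallel design (n = 1)."""
    base = power_by_n[1]
    return {n: p / base for n, p in power_by_n.items()}


def ideal_scaling(degrees):
    """Ideal trend from P = C * VDD^2 * f / n with C and VDD held constant."""
    return {n: 1.0 / n for n in degrees}


normalized = normalize(measured_power)
ideal = ideal_scaling(measured_power)
for n in sorted(measured_power):
    print(f"n={n}: measured {normalized[n]:.3f}, ideal {ideal[n]:.3f}")
```

The measured savings are smaller than the ideal trend, presumably because the formula does not account for the multiplexer and the other added circuitry.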

Architecture-Level Design: Parallelism
[Figure: reference design with 16-bit input registers A and B feeding a 16x16 multiplier clocked at f_ref and producing a 32-bit result; parallel design with two 16x16 multipliers clocked at f_ref/2 and a MUX that delivers the 32-bit result at f_ref]
Assume that, with the same 16x16 multiplier, the power supply can be reduced from V_ref to V_ref/1.83.
$$P_{parallel} = 2.2\,C_{ref} \left(\frac{V_{ref}}{1.83}\right)^{2} \frac{f_{ref}}{2} \approx 0.33\,P_{ref}$$
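The 0.33 factor follows directly from P = C V^2 f. A minimal Python sketch of the arithmetic, assuming P_ref = C_ref V_ref^2 f_ref and treating the 2.2 capacitance factor as given by the slide (the function name is hypothetical):

```python
def relative_power(cap_factor, voltage_divisor, freq_factor):
    """Power relative to the reference design, using P = C * V^2 * f."""
    return cap_factor * (1.0 / voltage_divisor) ** 2 * freq_factor


# Parallel design: 2.2x capacitance, V_ref / 1.83, each unit at f_ref / 2.
parallel = relative_power(cap_factor=2.2, voltage_divisor=1.83, freq_factor=0.5)
print(f"P_parallel / P_ref = {parallel:.3f}")
```

The quadratic dependence on voltage is what makes the trade worthwhile: the capacitance more than doubles, yet the power drops to roughly one third.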

Architecture-Level Design: Pipelining
When the hardware between the pipeline stages is reduced, the reference voltage V_ref can be reduced to V_new while maintaining the same worst-case delay. For example, let a 50 MHz multiplier be broken into two equal parts as shown below. The pipeline stages can still be clocked at 50 MHz when the voltage V_new is equal to V_ref/1.83.
[Figure: REG (A, B), half multiplier, REG, half multiplier, with 32-bit paths, clocked at f_ref]
$$P_{pipeline} = 1.2\,C_{ref} \left(\frac{V_{ref}}{1.83}\right)^{2} f_{ref} \approx 0.36\,P_{ref}$$
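As a quick arithmetic check of the 0.36 factor, assume P_ref = C_ref V_ref^2 f_ref and read the 1.2 multiplier as the extra capacitance of the added pipeline register (an interpretation; the slide does not say):

$$\frac{P_{pipeline}}{P_{ref}} = \frac{1.2\,C_{ref}\,(V_{ref}/1.83)^{2}\,f_{ref}}{C_{ref}\,V_{ref}^{2}\,f_{ref}} = \frac{1.2}{1.83^{2}} = \frac{1.2}{3.3489} \approx 0.36$$

The clock frequency is unchanged, so the entire saving comes from the voltage reduction.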

Architecture-Level Design: Retiming
Retiming is a transformation technique used to change the locations of delay elements in a circuit without affecting the input/output characteristics of the circuit.
Two versions of an IIR filter.
[Figure: IIR filter with input x(n), output y(n), internal signal w(n), delay elements D and 2D, and coefficients a and b, with nodes annotated (1) and (2); retiming produces a second version with internal signals w_1(n) and w_2(n)]

Architecture-Level Design: Retiming
Retiming for pipeline design
[Figure: two versions of a pipeline clocked at f_ref, built from registers (REG) and combinational blocks C1 (6 ns), C2 (2 ns), and C3 (4 ns)]

Architecture-Level Design: Retiming
Clock cycle is 4 gate delays
Clock cycle is 2 gate delays

Architecture-Level Design: Power Management
[Figure: blocks C_1 and C_2 with the control signals C_1_FREEZE and C_2_FREEZE]

Architecture-Level Design: Bus Segmentation
Avoid the sharing of resources
Reduce the switched capacitance
For example: a global system bus
A single shared bus is connected to all modules. This structure results in a large bus capacitance due to
- The large number of drivers and receivers sharing the same bus
- The parasitic capacitance of the long bus line
A segmented bus structure
Switched capacitance during each bus access is significantly reduced
Overall routing area may be increased

Architecture-Level Design: Bus Segmentation
[Figure: a single shared bus with capacitance C_bus, and a segmented bus whose segments (labeled C_bus1) are joined by a bus interface]

Algorithmic-Level Design: Activity Reduction
Minimizing the switching activity at a high level is one way to reduce the power dissipation of digital processors.
One method to minimize signal switching at the algorithmic level is to use an appropriate coding for the signals rather than straight binary code.
The table below compares the 3-bit representations of the binary and Gray codes.
Binary Code  Gray Code  Decimal Equivalent
000          000        0
001          001        1
010          011        2
011          010        3
100          110        4
101          111        5
110          101        6
111          100        7
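The Gray column of the table can be generated with the standard binary-to-Gray conversion, n XOR (n >> 1). A short Python sketch (the function name is hypothetical):

```python
def binary_to_gray(n):
    """Convert a non-negative integer to its reflected Gray code."""
    return n ^ (n >> 1)


rows = [(format(i, "03b"), format(binary_to_gray(i), "03b"), i) for i in range(8)]
for binary, gray, decimal in rows:
    print(binary, gray, decimal)

# Consecutive Gray codes differ in exactly one bit position.
gray_values = [binary_to_gray(i) for i in range(8)]
single_bit_steps = all(
    bin(gray_values[i] ^ gray_values[i + 1]).count("1") == 1 for i in range(7)
)
```

The single-bit step between neighboring codes is the property that reduces switching when values change sequentially.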

State Encoding for a Counter
Two-bit binary counter:
- State sequence, 00 → 01 → 10 → 11 → 00
- Six bit transitions in four clock cycles
- 6/4 = 1.5 transitions per clock
Two-bit Gray-code counter:
- State sequence, 00 → 01 → 11 → 10 → 00
- Four bit transitions in four clock cycles
- 4/4 = 1.0 transition per clock
Gray-code counter is more power efficient.
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers (now Springer), 1998.
Source: Prof. V. D. Agrawal

Binary Counter: Original Encoding
Present state  Next state
a b            A B
0 0            0 1
0 1            1 0
1 0            1 1
1 1            0 0
A = a'b + ab'
B = a'b' + ab'
[Figure: two flip-flops with outputs a and b and next-state inputs A and B, with CK and CLR]
Source: Prof. V. D. Agrawal

Binary Counter: Gray Encoding
Present state  Next state
a b            A B
0 0            0 1
0 1            1 1
1 0            0 0
1 1            1 0
A = a'b + ab
B = a'b' + a'b
[Figure: two flip-flops with outputs a and b and next-state inputs A and B, with CK and CLR]
Source: Prof. V. D. Agrawal

Three-Bit Counters
Binary                   Gray-code
State  No. of toggles    State  No. of toggles
000    -                 000    -
001    1                 001    1
010    2                 011    1
011    1                 010    1
100    3                 110    1
101    1                 111    1
110    2                 101    1
111    1                 100    1
000    3                 000    1
Av. transitions/clock = 1.75    Av. transitions/clock = 1
Source: Prof. V. D. Agrawal

N-Bit Counter: Toggles in Counting Cycle
Binary counter: $T(binary) = 2(2^{N} - 1)$
Gray-code counter: $T(gray) = 2^{N}$
$T(gray)/T(binary) = 2^{N-1}/(2^{N} - 1) \to 0.5$
Bits  T(binary)  T(gray)  T(gray)/T(binary)
1     2          2        1.0
2     6          4        0.6667
3     14         8        0.5714
4     30         16       0.5333
5     62         32       0.5161
6     126        64       0.5079
∞     -          -        0.5000
Source: Prof. V. D. Agrawal
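The two toggle formulas can be checked by brute force. The Python sketch below counts bit flips over one full counting cycle, including the wrap-around to zero, and prints the ratio column of the table; the helper names are hypothetical.

```python
def toggles_per_cycle(codes):
    """Total bit flips over one full counting cycle, including the wrap-around."""
    total = 0
    for i, code in enumerate(codes):
        nxt = codes[(i + 1) % len(codes)]
        total += bin(code ^ nxt).count("1")
    return total


def binary_codes(n_bits):
    return list(range(2 ** n_bits))


def gray_codes(n_bits):
    return [i ^ (i >> 1) for i in range(2 ** n_bits)]


table = []
for n_bits in range(1, 7):
    t_bin = toggles_per_cycle(binary_codes(n_bits))
    t_gray = toggles_per_cycle(gray_codes(n_bits))
    table.append((n_bits, t_bin, t_gray, round(t_gray / t_bin, 4)))
    print(n_bits, t_bin, t_gray, f"{t_gray / t_bin:.4f}")
```

The ratio approaches 0.5 from above, so for wide counters Gray coding roughly halves the number of state-bit toggles.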

FSM State Encoding
[Figure: a three-state FSM drawn with two different state encodings; the edges are labeled with transition probabilities based on PI statistics. The first encoding uses the codes 11, 01, and 00; the second uses 01, 11, and 00 for the same states.]
Expected number of state-bit transitions:
First encoding: 1(0.3+0.4+0.1) + 2(0.1) = 1.0
Second encoding: 2(0.3+0.4) + 1(0.1+0.1) = 1.6
State encoding can be selected using a power-based cost function.
Source: Prof. V. D. Agrawal
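A power-based cost function of this kind weights each transition probability by the Hamming distance between the codes of its source and destination states. The Python sketch below reproduces the two totals. The state names X, Y, and Z and the edge directions are hypothetical, since the extracted figure does not preserve them; only the grouping of probabilities by Hamming distance is taken from the slide's arithmetic.

```python
def hamming(code_a, code_b):
    """Number of bit positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(code_a, code_b))


def expected_bit_transitions(edges, encoding):
    """Sum of transition probability times Hamming distance over all edges."""
    return sum(
        prob * hamming(encoding[src], encoding[dst])
        for (src, dst), prob in edges.items()
    )


# Assumed edge assignment; self-loops are omitted because they flip no bits.
edges = {("Y", "Z"): 0.3, ("Z", "Y"): 0.4, ("X", "Y"): 0.1, ("X", "Z"): 0.1}

encoding_1 = {"X": "11", "Y": "01", "Z": "00"}
encoding_2 = {"X": "01", "Y": "11", "Z": "00"}

cost_1 = expected_bit_transitions(edges, encoding_1)
cost_2 = expected_bit_transitions(edges, encoding_2)
print(f"encoding 1: {cost_1:.1f}, encoding 2: {cost_2:.1f}")
```

The first encoding is cheaper because the most frequent transitions connect states whose codes differ in a single bit.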

FSM: Clock-Gating
Moore machine: Outputs depend only on the state variables.
If a state has a self-loop in the state transition graph (STG), then the clock can be stopped whenever a self-loop is to be executed.
[Figure: STG fragment with states Si, Sj, and Sk and edges labeled Xi/Zk, Xj/Zk, and Xk/Zk]
Clock can be stopped when the (Xk, Sk) combination occurs.
Source: Prof. V. D. Agrawal

Clock-Gating in Moore FSM
[Figure: Moore FSM with PI, combinational logic, flip-flops, and PO; clock activation logic and a latch gate the clock CK]
L. Benini and G. De Micheli, Dynamic Power Management, Boston: Springer, 1998.
Source: Prof. V. D. Agrawal

Bus Encoding for Reduced Power
Example: Four bit bus
- 0000 → 1110 has three transitions.
- If bits of second pattern are inverted, then 0000 → 0001 will have only one transition.
Bit-inversion encoding for N-bit bus:
[Figure: number of bit transitions after inversion encoding (0, N/2, N) versus number of bit transitions (0, N/2, N)]
Source: Prof. V. D. Agrawal

Bus-Inversion Encoding Logic
[Figure: blocks labeled Sent data, Polarity decision logic, Bus register, Polarity bit, and Received data]
M. Stan and W. Burleson, "Bus-Invert Coding for Low Power I/O," IEEE Trans. VLSI Systems, vol. 3, no. 1, pp. 49-58, March 1995.
Source: Prof. V. D. Agrawal
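The polarity decision can be sketched in a few lines of Python. This is a behavioral model under the assumption that a word is inverted whenever it would flip more than half of the bus lines; the function names are hypothetical and the sketch is not a description of the circuit in the figure.

```python
def bus_invert_encode(prev_bus, data, n_bits):
    """Return (bus_value, polarity) for the next word on an n-bit bus."""
    mask = (1 << n_bits) - 1
    transitions = bin((prev_bus ^ data) & mask).count("1")
    if 2 * transitions > n_bits:
        # More than half of the lines would toggle: send the inverted word.
        return (~data) & mask, 1
    return data & mask, 0


def bus_invert_decode(bus_value, polarity, n_bits):
    """Recover the original word from the bus value and the polarity bit."""
    mask = (1 << n_bits) - 1
    return (~bus_value) & mask if polarity else bus_value & mask


# The slide's example: 0000 followed by 1110 on a four-bit bus.
bus_value, polarity = bus_invert_encode(0b0000, 0b1110, 4)
decoded = bus_invert_decode(bus_value, polarity, 4)
print(format(bus_value, "04b"), polarity, format(decoded, "04b"))
```

The data lines then see one transition instead of three. The polarity line is one extra wire that may itself toggle, which this count does not include.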

RTL-Level Design: Signal Gating
Simple Decoder
  module decoder (a, sel);
    input  [1:0] a;
    output [3:0] sel;
    reg    [3:0] sel;
    // One-hot decode; sel changes whenever a changes.
    always @(a) begin
      case (a)
        2'b00: sel = 4'b0001;
        2'b01: sel = 4'b0010;
        2'b10: sel = 4'b0100;
        2'b11: sel = 4'b1000;
      endcase
    end
  endmodule
Decoder with enable
  module decoder (en, a, sel);
    input        en;
    input  [1:0] a;
    output [3:0] sel;
    reg    [3:0] sel;
    // When en is 0, sel is held at 0000 so changes on a do not propagate.
    always @(en or a) begin
      case ({en, a})
        3'b100: sel = 4'b0001;
        3'b101: sel = 4'b0010;
        3'b110: sel = 4'b0100;
        3'b111: sel = 4'b1000;
        default: sel = 4'b0000;
      endcase
    end
  endmodule

RTL-Level Design: Datapath Reordering
[Figure: initial and reordered datapaths, each with two multiplexers and an A<B comparator; signals are marked as stable or glitchy]

RTL-Level Design: Memory Partition
[Figure: memory partitioning with two 128x32 memories (ports din, addr, write, noe, dout), the address signals addr[7:0], addr[7:1], addr0, and pre_addr (registered on clk), and a MUX producing the 32-bit dout]

RTL-Level Design: Memory Partition
Application-driven memory partition
[Figure: an ARM core (Data, Addr, R/W) connected to a 64K-byte memory, and a plot of reads versus address range over 64K, divided into 28K, 4K, and 32K regions]

RTL-Level Design: Memory Partition
A power-optimal partitioned memory organization
[Figure: an ARM core and a decoder drive three memory blocks, each with Data, Addr, R/W, and CS ports]
