Low-Power Design Techniques
Implementation at different levels of abstraction
Circuit/Logic
-Logic style: Adiabatic
-Multi & variable supply
-Glitch minimization
Architecture
-Parallelization & pipelining
-Retiming
System
-Partitioning
-Component choice
-Power management
Principles
Adiabatic Logic
Adiabatic: a reversible thermodynamic process that occurs without gain or loss of heat
Ideal adiabatic logic: charge can be recycled (reused) an infinite number of times
Real adiabatic logic: charge is recycled many times, so a significant power reduction is possible
To achieve the charge savings expected from adiabatic logic, one of two kinds of power supply must be used:
Adiabatic Charging Principle
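To make the principle concrete, the sketch below compares two standard first-order estimates. Charging a load C to V through a resistance R with an abrupt step dissipates ½CV² whatever R is. Ramping the supply linearly over a time T much longer than RC dissipates only about (RC/T)·CV². The component values are hypothetical, and the ramp formula is valid only in the T ≫ RC regime.

```python
def step_charging_loss(c: float, v: float) -> float:
    """Energy (J) lost in the resistor when C is charged to V by an abrupt step.

    Independent of R: half of the energy drawn from the supply is dissipated.
    """
    return 0.5 * c * v ** 2


def ramp_charging_loss(r: float, c: float, v: float, t_ramp: float) -> float:
    """Approximate energy (J) lost when the supply ramps linearly 0 -> V over t_ramp.

    First-order estimate (R*C / t_ramp) * C * V**2, valid only for t_ramp >> R*C.
    """
    return (r * c / t_ramp) * c * v ** 2


if __name__ == "__main__":
    # Hypothetical values: 1 kOhm switch resistance, 100 fF load, 1 V swing, 10 ns ramp
    R, C, V, T = 1e3, 100e-15, 1.0, 10e-9
    e_step = step_charging_loss(C, V)
    e_ramp = ramp_charging_loss(R, C, V, T)
    print(f"step: {e_step:.3e} J, ramp: {e_ramp:.3e} J, ratio: {e_step / e_ramp:.1f}")
```

In this regime, doubling the ramp time halves the ramp loss. That is the trade adiabatic logic makes: less energy per operation in exchange for slower transitions.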
Adiabatic Recovery Requirements
Three principles:
- Turn a device on only when Vds = 0
- Change the source-drain voltage only while the device is off
- Change voltages gradually
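The three principles can be turned into a mechanical check over a switching trace. The sketch below is a hypothetical checker: the `Event` record, the trace format, and the "gradual" threshold of 10·RC are illustrative assumptions, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One step in a hypothetical switching trace for a single transistor."""
    kind: str                # "turn_on", "turn_off", or "vds_change"
    ramp_time: float         # duration (s) of the voltage transition
    vds: float = 0.0         # |Vds| (V) across the device when it is turned on
    device_on: bool = False  # device state while Vds changes


def check_adiabatic(trace, rc, v_tol=1e-3, min_ramp_ratio=10.0):
    """Return (index, message) pairs for every violated principle in the trace.

    rc is the charging time constant. A transition counts as "gradual" here
    only if it lasts at least min_ramp_ratio * rc (an arbitrary threshold).
    """
    problems = []
    for i, ev in enumerate(trace):
        # Principle 1: turn a device on only when Vds = 0
        if ev.kind == "turn_on" and abs(ev.vds) > v_tol:
            problems.append((i, "device turned on with Vds != 0"))
        # Principle 2: change Vds only while the device is off
        if ev.kind == "vds_change" and ev.device_on:
            problems.append((i, "Vds changed while device is on"))
        # Principle 3: every voltage change must be gradual
        if ev.ramp_time < min_ramp_ratio * rc:
            problems.append((i, "voltage change not gradual"))
    return problems
```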
Carnot Cycle
In 1822-24, Sadi Carnot analyzed the efficiency of an ideal
heat engine all of whose steps were reversible, and
furthermore proved that:
Any reversible engine (regardless of details) would
- Thermally-activated leakage of electrons over potential-energy barriers
- Quantum tunneling of electrons through narrow barriers (sub-Fermi wavelength)
- Losses due to intentional treatment of known physical information as entropy (bit erasure)
Some Ways to Reduce Losses
- EM interference/emission: add shielding, use high-Q MEMS/NEMS oscillators
- Scattering/resistance: ballistic FETs, superconductors
- Thermal leakage: avoid low VT and/or high temperatures
- Tunneling: thick tunnel barriers, high-κ dielectrics, conductors w. low Fermi level/high electron affinity, vacuum-gap barriers?
- Intentional bit erasure: reduce voltages, use mostly-reversible adiabatic logic designs
Adiabatic electronics & CMOS implementations
Conventional Gates are Irreversible
Logic gate behavior (on receiving new input):
- Many-to-one transformation of local state!
- Required to dissipate at least kT ln 2 per bit of information erased, by Landauer's principle
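One way to quantify the many-to-one point is to count the bits a gate destroys, H(inputs) − H(output), and multiply by the Landauer bound of kT ln 2 per bit. The sketch assumes uniformly random inputs that are discarded once the gate evaluates. It gives a lower bound only, far below what real CMOS dissipates.

```python
import math
from collections import Counter
from itertools import product

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)


def landauer_bound(temp_k: float, bits: float = 1.0) -> float:
    """Minimum heat (J) dissipated when `bits` of information are erased at temp_k."""
    return bits * K_B * temp_k * math.log(2)


def bits_lost(func, n_inputs: int) -> float:
    """Entropy (bits) destroyed by func on uniformly random inputs: H(X) - H(f(X))."""
    counts = Counter(func(*x) for x in product((0, 1), repeat=n_inputs))
    total = 2 ** n_inputs
    h_out = -sum(c / total * math.log2(c / total) for c in counts.values())
    return n_inputs - h_out


if __name__ == "__main__":
    lost = bits_lost(lambda a, b: a & b, 2)  # 2-input AND is many-to-one
    print(f"AND destroys {lost:.3f} bits -> >= {landauer_bound(300.0, lost):.3e} J at 300 K")
```

A one-to-one function such as NOT destroys no information, so `bits_lost(lambda a: 1 - a, 1)` is 0.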
Complementary Pass-Transistor Energy Recovery Adiabatic Logic (CPERL)
CPERL (1)
All NMOS
Gate consists of two parts:
1) Charge/discharge function part (M1 – M6)
2) Logic function part (M9 – M12)
CPERL (2)
CPERL (3)
Assume that ϕ1 and IN are in phase
As ϕ1 ramps up, IN also rises
INbar remains low
M9 and M11 turn on
BN1 is precharged to (Vdd – Vth)
BN2 stays at a low voltage
CPERL (4)
When ϕ1 ramps down, IN also goes down, causing M9 and M11 to turn off
As ϕ2 ramps up, the gate-to-channel capacitance of M1 couples BN1 above Vdd, turning M1 on
ϕ2 charges node OUT adiabatically to Vdd
As ϕ2 ramps down, OUT also goes down
The charge stored on OUT is recovered to the supply through the discharge process
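The bootstrapping step above can be estimated with a simple capacitive-divider model. BN1 floats at Vdd − Vth, and the rising channel of M1 couples a fraction c_gc/(c_gc + c_par) of the Vdd swing onto it. `c_par` (all other capacitance on BN1) and the numbers below are hypothetical. The model ignores body effect and any change of Vth with source voltage.

```python
def bootstrapped_voltage(vdd: float, vth: float, c_gc: float, c_par: float) -> float:
    """Estimated voltage of floating node BN1 after phi2 ramps 0 -> Vdd.

    BN1 starts at Vdd - Vth (precharged through an NMOS) and is coupled to the
    rising channel through M1's gate-to-channel capacitance c_gc; c_par lumps
    the remaining capacitance on BN1. Simple capacitive-divider model.
    """
    return (vdd - vth) + vdd * c_gc / (c_gc + c_par)


def passes_full_vdd(vdd: float, vth: float, c_gc: float, c_par: float) -> bool:
    """True if M1 still has gate overdrive above Vth when OUT reaches Vdd."""
    return bootstrapped_voltage(vdd, vth, c_gc, c_par) - vdd > vth


if __name__ == "__main__":
    # Hypothetical values: Vdd = 1.8 V, Vth = 0.5 V, capacitances in fF
    print(bootstrapped_voltage(1.8, 0.5, 2.0, 1.0), passes_full_vdd(1.8, 0.5, 2.0, 1.0))
```

The coupling ratio therefore matters. If parasitic capacitance on BN1 dominates, the boost is too small, and OUT cannot be charged all the way to Vdd.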
CPERL (5)
Two stages of a CPERL inverter chain are shown; for simplicity, only half of each circuit is drawn
During period t1, A is assumed high and BN2 is at (Vdd – Vth)
During t2, ϕ2 ramps down and the charge is trapped at BN2
During t3, ϕ2 rises again
Assuming that A is low and /A is high, M10 of stage 2 turns on
CPERL (6)
M3 of stage 1 also turns on
Current flows through M10 and M3 due to the voltage difference
This charge sharing stops when a voltage balance occurs between the nodes
M5 works as a diode-connected transistor
If the voltage difference is still higher than Vth, M5 turns on until the difference falls below Vth
If the difference is already less than Vth, M5 stays off
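The charge sharing described above can be modeled as two lumped capacitors connected through a resistive path. In that textbook model the nodes settle at the charge-weighted average voltage and dissipate ½·(C1C2/(C1+C2))·ΔV², whatever the path resistance (here M10 and M3). The diode-connected M5 is reduced to "conducts until at most Vth remains". Both simplifications are assumptions of this sketch.

```python
def share_charge(c1: float, v1: float, c2: float, v2: float) -> tuple[float, float]:
    """Connect two capacitors: return (final voltage, energy dissipated in J)."""
    v_final = (c1 * v1 + c2 * v2) / (c1 + c2)
    c_series = c1 * c2 / (c1 + c2)
    return v_final, 0.5 * c_series * (v1 - v2) ** 2


def diode_clamp(v_high: float, v_low: float, vth: float) -> float:
    """Voltage left across a diode-connected NMOS (like M5) after it conducts.

    The device conducts only while the difference exceeds vth, so at most vth remains.
    """
    return min(v_high - v_low, vth)


if __name__ == "__main__":
    v, e = share_charge(1.0, 1.0, 1.0, 0.0)  # equal capacitors, 1 V apart (normalized)
    print(f"final voltage {v}, energy lost {e} (normalized units)")
```

Because the loss does not shrink with slower ramps, charge sharing between nodes at different voltages is a non-adiabatic loss.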
CPERL (7)
A Brent-Kung adder has three units:
- Propagate and Generate unit
- Carry parallel prefix unit
- Sum unit
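The three units listed above map directly onto a bit-level model of the Brent-Kung adder. This captures only the Boolean dataflow, not the CPERL transistor-level implementation. The list-of-bits representation, the zero carry-in, and the power-of-two width restriction are simplifications of the sketch.

```python
def brent_kung_add(a: int, b: int, n: int) -> tuple[int, int]:
    """Add two n-bit unsigned integers; n must be a power of two.

    Returns (sum mod 2**n, carry_out). Carry-in is assumed to be 0.
    """
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    # 1) Propagate and generate unit: g_i = a_i AND b_i, p_i = a_i XOR b_i
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G, P = g[:], p[:]

    def combine(i: int, j: int) -> None:
        # Prefix operator: (G, P)[i] <- (G, P)[i] o (G, P)[j]
        G[i] = G[i] | (P[i] & G[j])
        P[i] = P[i] & P[j]

    # 2) Carry parallel-prefix unit. Up-sweep: groups of 2, 4, 8, ... bits
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            combine(i, i - d)
        d *= 2
    # Down-sweep: fill in the prefixes the up-sweep skipped
    d = n // 4
    while d >= 1:
        for i in range(3 * d - 1, n, 2 * d):
            combine(i, i - d)
        d //= 2
    # Now G[i] is the carry out of bit i

    # 3) Sum unit: s_i = p_i XOR c_i, with c_i = G[i-1] and c_0 = 0
    s = 0
    for i in range(n):
        carry_in = G[i - 1] if i > 0 else 0
        s |= (p[i] ^ carry_in) << i
    return s, G[n - 1]


if __name__ == "__main__":
    print(brent_kung_add(13, 7, 4))  # 13 + 7 = 20 = 16 + 4
```

Brent-Kung uses about 2·log2(n) − 1 prefix levels with few prefix cells and low fan-out. That economy of hardware is what makes it attractive for an area-sensitive adiabatic implementation.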
CPERL (8)
CPERL (9)
CPERL (10)
SCRL: Split-level Charge Recovery Logic
Examples: Hall’s logic, SCRL gates, Rod logic interlocks
[Figure: input barrier height under an applied clocked force φ]
Retractile Logic w. SCRL gates
Simple combinational logic of any depth N:
- Requires N timing phases
- Non-pipelined
- No sequential reuse of HW (even worse)
We need sequential logic!
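A simplified timing model shows why a retractile cascade is non-pipelined. Stage k may evaluate only after stage k−1 has, and it must retract before stage k−1 does. One full cycle is therefore N evaluate steps followed by N retract steps, and a new input can enter only after the whole cascade has retracted. The one-tick-per-transition assumption is illustrative.

```python
def retractile_schedule(n_stages: int) -> list[tuple[int, int, str]]:
    """(tick, stage, action) events for one cycle of an N-deep retractile cascade.

    Stages evaluate in order 1..N, then retract in reverse order N..1.
    Assumes one tick per stage transition.
    """
    events = [(k, k + 1, "evaluate") for k in range(n_stages)]
    events += [(n_stages + k, n_stages - k, "retract") for k in range(n_stages)]
    return events


def ticks_per_result(n_stages: int) -> int:
    """A new input can enter only after the whole cascade has retracted."""
    return len(retractile_schedule(n_stages))


if __name__ == "__main__":
    for tick, stage, action in retractile_schedule(3):
        print(tick, stage, action)
```

Throughput in this model falls as 1/(2N) with depth, which is the problem the sequential approaches below address.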
Sequential Retractile Logic
Approach #1 (Hall ’92):
After every N stages, invoke an irreversible latch
Problems:
Reduces dissipation by at most a factor of N
Also reduces HW efficiency by order N! (in the worst case, compared to a pipelined, sequential circuit)
Approach #2 (Knight & Younis, ’93):
The “store output” stage can also be reversible!
Gives fully-adiabatic, sequential, pipelined circuits!
N can be as small as 1 or 2 and still have arbitrarily high Q
Simple Reversible CMOS Latch
Uses a standard CMOS transmission gate
Sequence of operation:
(1) input initially matches latch contents (output)
(2) input changes → output changes
(3) latch closes
(4) input removed
Retracts the copy stored in the latch also.
Input-Bias Clocked-Barrier Logic
Cycle of operation:
- Data input applies bias
- Clock signal raises barrier
Add forces to do logic.
Can amplify/restore input signal in clocking step.
Ticks/cycle: reciprocal to rate of data throughput
Transistor-ticks/cycle: reciprocal to HW cost-efficiency
Number of required clock/power input signals:
Q: What is the minimum number of externally provided timing signals you can get away with?
A: ≤4 (≤12 if split levels are used)
[Figure: 2LAL buffer waveforms; clock phases φ0 and φ1 ramp up and down while out follows in]
2LAL Shift Register Structure
1-tick delay per logic stage:
[Figure: chain of stages from in to out, clocked by phases φ0, φ1, φ2, φ3 in turn]
Logic pulse timing & propagation:
[Figure: pulses advancing one stage per tick, ticks 0 1 2 3 ... 0 1 2 3 ...]
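The 1-tick-per-stage structure can be tabulated to show why four clock phases suffice for any depth. This sketch assumes that a new data item enters every 4 ticks (the 4 ticks/cycle quoted for 2LAL), that each stage adds one tick of delay, and that stage k is driven by phase φ(k mod 4). The phase assignment is read from the figure labels and is an assumption.

```python
PHASES = 4  # four externally supplied clock phases, phi0..phi3


def pulse_schedule(n_stages: int, n_items: int, ticks_per_cycle: int = 4) -> dict:
    """Map (item, stage) -> (tick, phase index) for a 2LAL-style shift register.

    Assumptions: a new item enters every ticks_per_cycle ticks, each stage adds
    a 1-tick delay, and stage k is clocked by phase k mod 4.
    """
    schedule = {}
    for item in range(n_items):
        first_tick = item * ticks_per_cycle
        for stage in range(n_stages):
            schedule[(item, stage)] = (first_tick + stage, stage % PHASES)
    return schedule


if __name__ == "__main__":
    sched = pulse_schedule(n_stages=6, n_items=3)
    for stage in range(6):
        ticks = [sched[(item, stage)][0] for item in range(3)]
        print(f"stage {stage} (phi{stage % PHASES}): pulses at ticks {ticks}")
```

In this model every stage fires only on ticks congruent to its phase index mod 4. The same four waveforms can therefore be reused down an arbitrarily long pipeline.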
More complex logic functions
Non-inverting Boolean functions:
[Figure: clocked (φ) gates computing A∨B and AB from inputs A and B]
For inverting functions, must use quad-rail logic encoding:
[Figure: rails A0 and A1 and their complements, shown for A=0 and A=1]
To invert, just swap the rails! Zero-transistor “inverters.”
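The rail-swapping idea can be modeled at the logic level. This sketch assumes that a bit A is carried, during its active pulse, on rails A0, ¬A0, A1, ¬A1, where A1 is high when A = 1 and A0 is high when A = 0. The rail naming and the gate equations are assumptions, not taken from the slide. Inversion just relabels rails. AND (and therefore NAND) needs only monotone functions of the rails, consistent with 2LAL gates being non-inverting.

```python
from typing import NamedTuple


class QuadRail(NamedTuple):
    """A logical bit during its active pulse: rails A0, ~A0, A1, ~A1 (1 = high)."""
    a0: int
    a0_bar: int
    a1: int
    a1_bar: int


def encode(bit: int) -> QuadRail:
    a1 = 1 if bit else 0
    a0 = 1 - a1
    return QuadRail(a0, 1 - a0, a1, 1 - a1)


def decode(q: QuadRail) -> int:
    if q.a0 == q.a1 or q.a0_bar != 1 - q.a0 or q.a1_bar != 1 - q.a1:
        raise ValueError(f"invalid quad-rail code {q}")
    return q.a1


def invert(q: QuadRail) -> QuadRail:
    """Zero-transistor inverter: relabel the 0-rails as the 1-rails and vice versa."""
    return QuadRail(q.a1, q.a1_bar, q.a0, q.a0_bar)


def and_gate(a: QuadRail, b: QuadRail) -> QuadRail:
    """AND built only from non-inverting (monotone) functions of the input rails."""
    out1 = a.a1 & b.a1              # A AND B is true only if both 1-rails are high
    out0 = a.a0 | b.a0              # ... and false if either 0-rail is high
    out1_bar = a.a1_bar | b.a1_bar  # complement rails by De Morgan, still monotone
    out0_bar = a.a0_bar & b.a0_bar
    return QuadRail(out0, out0_bar, out1, out1_bar)


if __name__ == "__main__":
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, "NAND =", decode(invert(and_gate(encode(x), encode(y)))))
```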
Hardware Efficiency Issues
Hardware efficiency: # of logic operations / hardware / time
Hardware space-time complexity: how much hardware for how much time per logic op?
Minimizing: (# of transistors) × (# of ticks) / (gate cycle)
SCRL inverter, w. return path:
(8 transistors) × (6 ticks) = 48 transistor-ticks
Quad-rail 2LAL buffer stage:
(16 transistors) × (4 ticks) = 64 transistor-ticks
More SCRL vs. 2LAL
SCRL reversible NAND, w. all inverters:
(23 transistors) × (6 ticks) = 138 transistor-ticks
Quad-rail 2LAL AND:
(48 transistors) × (4 ticks) = 192 transistor-ticks
Result of comparison: although 2LAL minimizes the # of rails and the # of ticks/cycle, it does not minimize overall spacetime complexity.
Whether 6-tick SCRL minimizes per-op spacetime complexity among pipelined adiabatic CMOS logics is still an open question.
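The transistor-tick figures above can be recomputed and ranked in a few lines. The transistor and tick counts are exactly those quoted on the slides; the dictionary keys are just labels.

```python
# (transistors, ticks per gate cycle), as quoted on the slides above
DESIGNS = {
    "SCRL inverter (w. return path)": (8, 6),
    "Quad-rail 2LAL buffer stage": (16, 4),
    "SCRL reversible NAND (w. all inverters)": (23, 6),
    "Quad-rail 2LAL AND": (48, 4),
}


def transistor_ticks(transistors: int, ticks: int) -> int:
    """Hardware spacetime cost per gate cycle (lower means more cost-efficient)."""
    return transistors * ticks


costs = {name: transistor_ticks(*spec) for name, spec in DESIGNS.items()}

if __name__ == "__main__":
    for name, tt in sorted(costs.items(), key=lambda kv: kv[1]):
        print(f"{tt:4d} transistor-ticks  {name}")
```

Despite its fewer ticks per cycle, 2LAL costs more in both comparisons, because its quad-rail gates need roughly twice as many transistors.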
Minimizing Power-Clock Signals
How many external clock signals are required?
N-level-deep retractile cascade logic:
2N waveforms × 1 phase = 2N signals
6 tick/cycle, 6-phase dynamic SCRL:
6 waveforms × 6 phases = 36 signals
4 tick/cycle, 2LAL:
2
GCAL: General CMOS Adiabatic Logic
A general CMOS adiabatic design methodology
- Currently under development at UF
- Combines best features of SCRL, 2LAL, and retractile logics:
  - Permits designs attaining asymptotically optimal cost-efficiency for any combination of time, space, spacetime, and energy costs
  - Arbitrarily high degree of reversibility
  - Permits using minimal 2-level and 3-level adiabatic gates
  - Requires only 4 externally supplied clock/power signals for 2-level logic, and only 12 total for mixed 2-level + 3-level logic
  - Supports mixtures of fully-pipelined and retractile logic
  - Supports quiescent dynamic/static latches & RAM cells
- Tools currently under development:
  - A new HDL specialized for describing adiabatic designs
  - Digital circuit simulator with adiabaticity checker
  - Adiabatic logic synthesis tool, with automatic legacy design converter
GCAL DRAM/SRAM cells
GCAL DRAM cell: 4 transistors, 4 word lines/row, 2 bit lines/col (or 1)
GCAL SRAM cell: 8 transistors, 6 word lines/row, 2 bit lines/col (or 1)
DRAM Cell Write Cycle
1. All nodes initially ½.
   T-gate initially closed (off).
2. Transmission gate opens.
   Internal node is connected to bit line (at matching voltage).
3. Bit line transitions to 0 or 1.
   Pulls internal node to matching level.
4. Transmission gate closes.
   Internal node latched to new level.
5. Bit line transitions back to ½.
   Prepares for a new cycle.
6. Use the reverse sequence of operations to unwrite.
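The write cycle and its reverse ("unwrite") can be modeled as a short operation sequence on a toy cell. Voltages are normalized so that ½ is the idle level, and bit-line transitions are assumed to be gradual ramps. The model refuses to switch the T-gate when there is a voltage across it. Note that unwriting needs a copy of the stored bit: the reverse sequence fails if given the wrong value.

```python
HALF = 0.5  # the "1/2" idle level, normalized to a 1 V swing


class DramCell:
    """Toy model of the write path: bit line, internal storage node, and T-gate."""

    def __init__(self) -> None:
        self.bit_line = HALF   # step 1: all nodes initially at 1/2
        self.node = HALF
        self.gate_on = False   # T-gate initially closed (off)

    def set_gate(self, on: bool) -> None:
        # Only switch the T-gate with no voltage across it (adiabatic rule).
        if self.bit_line != self.node:
            raise RuntimeError("T-gate switched with a voltage across it")
        self.gate_on = on

    def set_bit_line(self, level: float) -> None:
        # Assumed to be a gradual ramp; through an open T-gate the node follows.
        self.bit_line = level
        if self.gate_on:
            self.node = level


def write_ops(bit: int):
    """Steps 2-5 of the write cycle as (method name, argument) pairs."""
    return [("set_gate", True), ("set_bit_line", float(bit)),
            ("set_gate", False), ("set_bit_line", HALF)]


def unwrite_ops(bit: int):
    """Step 6: each write step inverted and applied in reverse order."""
    return [("set_bit_line", float(bit)), ("set_gate", True),
            ("set_bit_line", HALF), ("set_gate", False)]


def run(cell: DramCell, ops) -> None:
    for name, arg in ops:
        getattr(cell, name)(arg)
```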
DRAM Cell Read Cycle
1. All external nodes initially ½.
   T-gate initially off.
   Internal node contains data.
2. Inverter rails split.
   Bit line set to (inverted) data.
3. T-gate at end of column latches bit-line data.
4. Inverter rails merge.
   Bit line restored to ½ level.
5. Can use the reverse sequence of operations to unread the copy of data available at end of column.
Fully-Adiabatic DRAM cell
6T, 6 lines/row, 1 line/column (in/out together)
Read cycle:
Initially: φ lines neutral, out neutral, R off
Write cycle:
First, do read cycle.
in is set to out
W turns on
[Figure: cell schematic with nodes N1, N2, transistors T1, M, T2, T3, and in/out]
Limits of Adiabatics
Structured Systems
A structured system is defined as a system about whose state we have some knowledge.
Some of its physical information is known.
those states, in proportion to their probability.
[Figure: all states of the abstract system having energy E; states with prob. > 0; the system’s energy is “in here”]
Desired Trajectories
Any structured system we build to serve some purpose has some desired trajectory, or set of trajectories, through its configuration space that we would ideally like it to follow at all times.
[Figure: configuration vs. time, with desired trajectories]
Think of any given state as having a specific
[Figure: configuration vs. time for a DRAM cell example; the system’s energy has departed from the desired trajectory by a small amount; energy that has departed from desired trajectories]