
Adiabatic Logic

Power Consumption in CMOS Circuits


-Dynamic power dissipation
1. Switching Power
2. Short circuit power

-Static power dissipation


1. DC power
2. Leakage power

Low-Power Design Techniques
Implementation at different levels of abstraction

Technology
-Scaling
-VT optimization
-Alternative technology

Physical Design
-Power driven P&R
-Low power layout optimization

Circuit/Logic
-Logic style: Adiabatic
-Multi & variable supply
-Glitch minimization

Architecture
-Parallelization & pipelining
-Retiming

System
-Partitioning
-Component choice
-Power management
Principles
Adiabatic Logic
„ Adiabatic : reversible thermodynamic process that
occurs without gain or loss of heat
„ Ideal adiabatic logic: charge can be recycled (reused)
for an infinite number of times
„ Real adiabatic logic: charge recycled many times so that
significant power reduction can be possible
„ To achieve the charge saving expected from adiabatic
logic, one of two power supplies is to be used:

- Constant current power supply


- Variable voltage supply

Adiabatic Charging Principle

Energy can be traded for delay by increasing charge transport time

Adiabatic Recovery Requirements
„ Three principles :
- Device turn on under Vds=0
- Source-drain voltage change under device off
- Gradual voltage change

Carnot Cycle
„ In 1822-24, Sadi Carnot analyzed the efficiency of an ideal
heat engine all of whose steps were reversible, and
furthermore proved that:
„ Any reversible engine (regardless of details) would have the same efficiency $(T_H - T_L)/T_H$.
„ No engine could have greater efficiency than a reversible engine w/o producing work from nothing
„ Temperature itself could be defined on a thermodynamic scale based on heat recoverable by a reversible engine operating between $T_H$ and $T_L$
Degree of Reversibility
„ The degree of reversibility (a.k.a. reversibility, a.k.a. thermodynamic efficiency) of any quasi-adiabatic process is defined as the ratio of:
„ the total free energy at the start of the process, $E_{\mathrm{free}}(0)$
„ ÷ by the total energy spent in the process, $\Delta E_{\mathrm{spent}}(t)$
„ Or, equivalently:
„ the known, accessible information at the start, $K_0$
„ ÷ by the amount that is converted to entropy, $\Delta S_t$
„ This same quantity is referred to as the (per-cycle)
“quality factor” Q for any resonant element (e.g., LC
oscillator) in EE.
Electrical Resistance
„ $P_{\mathrm{spent}} = I^2 R = (Q/t)^2 R$, or $E_{\mathrm{spent}} = Pt = Q^2 R/t$ ← scaling with $1/t$
„ Charge transfer through a resistor obeys the adiabatic
principle!
„ Why?
„ Conduction electrons have a large Fermi velocity or thermal velocity relative to drift velocity.
„ Scatter off of lattice-atom cross-sections with a mean free time $t_f$ that is fairly independent of drift velocity
„ Each scattering event thermalizes the electron’s drift kinetic energy - a frac. of current’s total $E_k$
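
The 1/t scaling stated on this slide can be checked numerically. The sketch below assumes a constant current I = Q/t for the whole transfer; the function name and the values of Q, R and t are hypothetical, chosen only for illustration.

```python
def resistive_loss(charge_c, resistance_ohm, transfer_time_s):
    """Energy (J) dissipated moving charge Q through R at constant current in time t."""
    current = charge_c / transfer_time_s        # I = Q / t
    power = current ** 2 * resistance_ohm       # P = I^2 R
    return power * transfer_time_s              # E = P t = Q^2 R / t

# Hypothetical values, for illustration only
Q = 1e-15   # 1 fC of transferred charge
R = 1e3     # 1 kOhm of channel/wire resistance

for t in (1e-12, 1e-11, 1e-10):
    print(f"t = {t:.0e} s -> E_spent = {resistive_loss(Q, R, t):.3e} J")
```

Each tenfold increase in transfer time cuts the dissipated energy by a factor of ten, which is the energy-for-delay trade that adiabatic charging relies on.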
Some Loss-Inducing Interactions
For ordinary voltage-coded electronics:
„ Interactions whose dissipation scales with speed:
„ Parasitic EM emission from dynamic (C,L) reactances
„ Scattering of ballistic electrons from lattice imperfections, causing Ohmic resistance
„ Interactions having different scaling laws:
„ Interference from outside EM sources
„ Thermally-activated leakage of electrons over potential energy barriers
„ Quantum tunneling of electrons through narrow
barriers (sub-Fermi wavelength)
„ Losses due to intentional treatment of known
physical information as entropy (bit erasure)
Some Ways to Reduce Losses
„ EM interference / emission: Add shielding, use high-Q MEMS/NEMS oscillators
„ Scattering/resistance: Ballistic FETs, superconductors
„ Thermal leakage: avoid low VT and/or high temps
„ Tunneling: thick tunnel barriers, high-κ dielectrics, conductors w. low Fermi-level/high electron affinity, vacuum-gap barriers?
„ Intentional bit erasure: reduce voltages, use mostly-reversible adiabatic logic designs
Adiabatic electronics &
CMOS implementations
Conventional Gates are Irreversible
„ Logic gate behavior (on receiving new input):
„ Many-to-one transformation of local state!
„ Required to dissipate at least kT ln 2 per erased bit, by Landauer's principle
„ Incurs $\tfrac{1}{2}CV^2$ dissipation in 2 out of 4 cases.

Example: Static CMOS Inverter
Transformation of local state:

Just before transition:    After transition:
in  out                    in  out
0   0                      0   1
0   1                      0   1
1   0                      1   0
1   1                      1   0
Exact formula:
$E_{\mathrm{diss}} = f\,(1 + f\,(e^{-1/f} - 1)) \cdot CV^2$
for frequency reduction $f :\equiv RC/t$
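
The exact formula above can be evaluated to see its two limits. This is a minimal sketch: the function name is hypothetical, and the result is expressed in units of CV² so no component values need to be assumed. For a slow ramp (f much less than 1) the loss is approximately f·CV², and for a fast ramp (f much greater than 1) it approaches the conventional ½CV².

```python
import math

def ramp_dissipation(f):
    """Dissipation in units of C*V^2 for speed fraction f = RC/t (t = ramp time)."""
    # expm1(x) computes e^x - 1 accurately for small |x|
    return f * (1.0 + f * math.expm1(-1.0 / f))

for f in (0.001, 0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"f = {f:>7}: E_diss / CV^2 = {ramp_dissipation(f):.5f}")
```

At f = 1 (ramp time equal to RC) the expression reduces to 1/e of CV², already below the ½CV² of an abrupt step.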
Adiabatic Logic Families
„ Partially adiabatic circuits
- Some energy is recovered
‹ 2N2P / 2N-2N2P
‹ CAL (Clocked CMOS Adiabatic Logic)
‹ TSEL (True Single Phase Adiabatic)
‹ SCAL (Source-coupled Adiabatic Logic)
„ Fully adiabatic circuits
- Dissipate little energy, very slow
‹ PAL (Pass-transistor Adiabatic Logic)
‹ Split-level Charge Recovery Logic (SCRL)

Complementary Pass-Transistor Energy Recovery Adiabatic Logic (CPERL)

CPERL (1)
„ All NMOS
„ Gate consists of two parts
1) Charge/discharge
function part (M1 – M6)
2) Logic function part
(M9 – M12)

CPERL (2)

CPERL (3)
„ An assumption was made that ϕ1 and IN are
in the same phase
„ As ϕ1 ramps up, IN rises also
„ Inbar remains low
„ M9 & M11 turn on
„ BN1 is precharged to (Vdd – Vth)
„ BN2 is still at low voltage

CPERL (4)
„ When ϕ1 ramps down, IN goes down also, causing M9 & M11 to turn off
„ As ϕ2 ramps up, the gate-to-channel capacitance in M1 drives BN1 higher than Vdd, causing M1 to turn on
„ ϕ2 will charge the node OUT in an adiabatic manner to Vdd
„ As ϕ2 ramps down, OUT goes down also
„ The charge stored on OUT is recovered to the supply through the discharge process

CPERL (5)
„ Two stages of a CPERL inverter chain are shown, with only half of the circuit drawn for simplicity
„ During period t1, A is assumed high and BN2 is at (Vdd-Vth)
„ During t2, ϕ2 ramps down and the charge will be trapped at BN2
„ During t3, ϕ2 rises again
„ Assuming that A is low and /A is high, M10 of stage 2 will turn on

CPERL (6)
„ M3 of stage 1 will turn on also
„ Current will flow through M10 & M3 due to the voltage difference
„ This charge sharing will stop when a voltage balance occurs between the nodes
„ M5 is diode-connected
„ If the voltage difference is still higher than Vth, M5 will turn on until the voltage difference becomes lower than Vth
„ If this difference is already less than Vth, M5 will stay off

CPERL (7)
„ Brent Kung adder has three units:
- Propagate and Generate unit
- Carry parallel prefix unit
- Sum unit

CPERL (8)

CPERL (9)

CPERL (10)

SCRL: Split-level Charge
Recovery Logic
Transformation of local state:

Just before transition:    After transition:
in  out                    in  out
0   ½                      0   1
1   ½                      1   0
Input-Barrier, Clocked-Bias Retractile
„ Cycle of operation:
„ Inputs raise or lower barriers
„ Do logic w. series/parallel barriers
„ Clock applies bias force which changes state, or not
* Must reset output prior to input.
* Combinational logic only!
Examples: Hall’s logic, SCRL gates, Rod logic interlocks
[Figure: potential-energy wells for states 0, N, 1; axes: input barrier height, clocked force applied →]
Retractile Logic w. SCRL gates
„ Simple combinational logic of any depth N:
„ Requires N timing phases
„ Non-pipelined
„ No sequential reuse of HW (even worse)
„ We need sequential logic!
[Figure: timing diagram of the N phases, Time →]
Sequential Retractile Logic
„ Approach #1 (Hall ‘92):
„ After every N stages, invoke an irreversible latch
„ stores the output of the last stage
„ Then, retract all the stages,
„ and begin a new cycle
„ Problems:
„ Reduces dissipation by at most a factor of N
„ Also reduces HW efficiency by order N!
„ In worst case, compared to a pipelined, sequential circuit
„ Approach #2 (Knight & Younis, ‘93):
„ The “store output” stage can also be reversible!
„ Gives fully-adiabatic, sequential, pipelined circuits!
„ N can be as small as 1 or 2 & still have arbitrarily high Q
Simple Reversible CMOS Latch
„ Uses a standard CMOS transmission gate
„ Sequence of operation:
(1) input initially matches latch contents (output)
(2) input changes→output changes (3) latch closes
(4) input removed

[Figure: transmission-gate latch with control signal P]

Before input:    Input arrived:    Input removed:
in  out          in  out           in  out
a   a            a   a             a   a
                 b   b             a   b
Resetting a Reversible Latch
„ Can reversibly unlatch data as follows:
(exactly the reverse of the latching process)
„ (1) Data value d stored on memory node M.
„ (2) Present an exact copy of d on input.
„ (3) Open the latch (connecting input to M).
„ No dissipation since voltage levels match
„ (4) Retract the copy of d from the input.
„ Retracts copy stored in latch also.
Input-Bias Clocked-Barrier Logic
„ Cycle of operation:
„ Data input applies bias
„ Add forces to do logic
„ Clock signal raises barrier
„ Data input bias removed
Can amplify/restore input signal in clocking step.
Can reset latch reversibly given copy of input contents.
Examples: Adiabatic QDCA, SCRL latch, Rod logic latch, PQ logic, Buckled logic
[Figure: potential-energy wells for states 0, N, 1 under Input “0” and Input “1”; steps: clock barrier up, retract input]
SCRL 6-tick clock cycle
Initial state: All gates off, all nodes neutral.
Tick #1: Input goes valid, forward T-gate opens.
Tick #2: Forward gate charges, output goes valid. (Tick #1 of subsequent gate.)
Tick #3: Forward T-gate closes, reverse gate charges.
Tick #4: Reverse T-gate opens, forward gate discharges.
Tick #5: Reverse gate discharges, input goes neutral.
Tick #6: Reverse T-gate closes, output goes neutral. Ready for next input!

24 ticks/cycle in this version - includes 2-level retractile stages
Some Timing Terminology
For sequential adiabatic circuits:
„ 1 Tick: Time for a single ramp transition
„ adiabatic speed fraction f times the RC gate delay.
„ 1 Phase: Latency for a data value to propagate forward by 1 pipeline stage.
„ 1 Cycle: Minimum period for all timing information to return back to its initial state.
„ Diadic: Two retractile levels per gate
„ permits inverting or non-inverting logic.
„ Dual rail: Two wires per logic value
„ permits universal logic with monadic gates
„ Monadic: only 1 level


Some Figures of Demerit
„ Some quantities we may wish to minimize:
„ Ticks/phase:
„ proportional to logic propagation latency
„ Ticks/cycle:
„ reciprocal to rate of data throughput
„ Transistor-ticks/cycle:
„ reciprocal to HW cost-efficiency
„ Number of required clock/power input signals:
„ supplying these may be a significant component of system cost
„ Number of distinct voltage levels required:
„ may affect reliability/power tradeoff
Some Interesting Questions
„ About pipelined, sequential, fully-adiabatic CMOS logic:
„ Q: Does it require an intermediate voltage level?
„ A: No, you can get by with only 2 different levels.
„ Q: What is the minimum number of externally provided timing signals you can get away with?
„ A: ≤4 (≤12 if split levels are used)
„ Q: Can the order-N different timing signals needed for long retractile cascades be internally generated within an adiabatic circuit?
„ A: Yes, but not statically, unless $N^2$ hardware is used
„ where N is the number of stages per full sequential cycle
„ We now demonstrate these answers.
Some SCRL timing diagrams
2LAL: 2-Level Adiabatic Logic
(Implementable using ordinary CMOS transistors)
„ Use simplified T-gate symbol
„ Basic buffer element:
„ cross-coupled T-gates
„ Only 4 timing signals, 4 ticks per cycle:
„ φi rises during tick i
„ φi falls during tick (i+2) mod 4
[Figure: simplified T-gate symbol with control P; buffer element with in, out, φ0, φ1; waveforms φ0 to φ3 over ticks 0 to 3]
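
The clocking rule on this slide (φi rises during tick i and falls during tick (i+2) mod 4) can be tabulated with a short sketch. It assumes ideal ramps and reports the level of each clock at the end of a tick; the function name `phi_level` is hypothetical.

```python
def phi_level(i, tick):
    """Level of clock phi_i at the END of the given tick (0 = low, 1 = high).

    phi_i rises during tick i and falls during tick (i + 2) mod 4,
    so it is high at the end of ticks i and i + 1 (mod 4).
    """
    return 1 if (tick - i) % 4 in (0, 1) else 0

for i in range(4):
    levels = [phi_level(i, t) for t in range(8)]   # two full 4-tick cycles
    print(f"phi{i}:", " ".join(str(v) for v in levels))
```

Under this reading exactly two of the four clocks are high at the end of every tick, and each φi is the same waveform shifted by one tick.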
2LAL Cycle of Operation
[Figure: 2LAL buffer over ticks #0 to #3; upper case shows in→1, out→1, in→0, out→0 as φ1 and φ0 rise and fall; lower case shows in=0, out=0 while φ1 and φ0 pulse]
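2LAL Shift Register Structure
„ 1-tick delay per logic stage:
[Figure: chain of four buffer stages from in to out; upper clocks φ1, φ2, φ3, φ0; lower clocks φ0, φ1, φ2, φ3]
„ Logic pulse timing & propagation:
[Figure: pulse on in propagating stage by stage over ticks 0 1 2 3 ...]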
More complex logic functions
„ Non-inverting Boolean functions:
[Figure: T-gate networks clocked by φ computing A∨B and AB from inputs A, B]
„ For inverting functions, must use quad-rail logic encoding:
„ To invert, just swap the rails!
„ Zero-transistor “inverters.”
[Figure: rails A0 and A1 for A=0 and A=1]
Hardware Efficiency issues
„ Hardware efficiency:
# of logic operations / hardware / time
„ Hardware space-time complexity: How much hardware for how much time per logic op?
„ Minimizing:
(# of transistors) × (# of ticks) / (gate cycle)
„ SCRL inverter, w. return path:
„ (8 transistors) × (6 ticks) = 48 transistor-ticks
„ Quad-rail 2LAL buffer stage:
„ (16 transistors) × (4 ticks) = 64 transistor-ticks
More SCRL vs. 2LAL
„ SCRL reversible NAND, w. all inverters:
„ (23 transistors) × (6 ticks) = 138 transistor-ticks
„ Quad-rail 2LAL AND:
„ (48 transistors) × (4 ticks) = 192 transistor-ticks
„ Result of comparison: Although 2LAL minimizes # of rails, and # ticks/cycle, it does not minimize overall spacetime complexity.
„ The question of whether 6-tick SCRL minimizes per-op spacetime complexity among pipelined adiabatic CMOS logics is still open.
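
The spacetime comparison between SCRL and 2LAL can be reproduced with a few lines. The transistor and tick counts are the ones quoted on these slides; the dictionary layout and the function name are hypothetical.

```python
def transistor_ticks(transistors, ticks_per_cycle):
    """Spacetime cost of one gate cycle: (# of transistors) x (# of ticks)."""
    return transistors * ticks_per_cycle

# (transistors, ticks per cycle), as quoted on the slides
designs = {
    "SCRL inverter, w. return path": (8, 6),
    "Quad-rail 2LAL buffer stage": (16, 4),
    "SCRL reversible NAND, w. all inverters": (23, 6),
    "Quad-rail 2LAL AND": (48, 4),
}

for name, (n_transistors, n_ticks) in designs.items():
    print(f"{name}: {transistor_ticks(n_transistors, n_ticks)} transistor-ticks")
```

In both pairings the SCRL gate has the lower product even though it needs more ticks per cycle, which is the point of the comparison.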
Minimizing Power-Clock Signals
„ How many external clock signals required?
„ N-level-deep retractile cascade logic:
„ 2N waveforms × 1 phase = 2N signals
„ 6 tick/cycle, 6-phase dynamic SCRL:
„ 6 waveforms × 6 phases = 36 signals
„ 24 tick/cycle, 3-phase static SCRL:
„ 12 waveforms × 3 phases = 36 signals
„ 4 tick/cycle, 2LAL:
„ 1 waveform × 4 phases = 4 signals!
„ It turns out that 12 signals are sufficient to implement any combination of 2-level or 3-level logics (including retractile) on-chip!
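
The signal counts on this slide all follow the same rule: distinct waveform shapes multiplied by the number of phase-shifted copies of each. A minimal sketch, with hypothetical function names and the waveform and phase counts taken from the slide:

```python
def external_clock_signals(waveforms, phases):
    """Signals to supply = distinct waveform shapes x phase-shifted copies of each."""
    return waveforms * phases

def retractile_signals(depth_n):
    """N-level-deep retractile cascade: 2N waveforms, 1 phase."""
    return external_clock_signals(2 * depth_n, 1)

# (waveforms, phases), as quoted on the slide
schemes = {
    "6 tick/cycle, 6-phase dynamic SCRL": (6, 6),
    "24 tick/cycle, 3-phase static SCRL": (12, 3),
    "4 tick/cycle, 2LAL": (1, 4),
}

for name, (waveforms, phases) in schemes.items():
    print(f"{name}: {external_clock_signals(waveforms, phases)} signals")
print("retractile cascade, N = 8:", retractile_signals(8), "signals")
```

The retractile count grows linearly with logic depth, while the pipelined schemes need a fixed number of signals regardless of depth.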
How to Do It
„ Circular 2LAL shifter; pulse-gated clocks
[Figure: circular 2LAL shift register with stage signals P0 to P3 gating the clocks φ0 to φ3; waveforms P0 to P3 over ticks 0 to 3]
GCAL: General CMOS Adiabatic Logic
„ A general CMOS adiabatic design methodology
„ Currently under development at UF
„ Combines best features of SCRL, 2LAL, and retractile logics:
„ Permits designs attaining asymptotically optimal cost-efficiency
„ For any combination of time, space, spacetime, energy costs
„ Arbitrarily high degree of reversibility
„ Permits using minimal 2-level and 3-level adiabatic gates
„ Requires only 4 externally supplied clock/power signals for 2-level
logic
„ And only 12 total for mixed 2-level + 3-level logic
„ Supports mixtures of fully-pipelined and retractile logic.
„ Supports quiescent dynamic/static latches & RAM cells
„ Tools currently under development:
„ A new HDL specialized for describing adiabatic designs
„ Digital circuit simulator with adiabaticity checker
„ Adiabatic logic synthesis tool, with automatic legacy design
converter
GCAL DRAM/SRAM cells
„ GCAL DRAM cell
„ 4 transistors
„ 4 word lines/row
„ 2 bit lines/col (or 1)
„ GCAL SRAM cell
„ 8 transistors
„ 6 word lines/row
„ 2 bit lines/col (or 1)
DRAM Cell Write Cycle
1. All nodes initially ½.
„ T-gate initially closed (off).
2. Transmission gate opens.
„ Internal node is connected to
bit-line (at matching voltage).
3. Bit line transitions to 0 or 1.
„ Pulls internal node to matching level.
4. Transmission gate closes.
„ Internal node latched to new level.
5. Bit line transitions back to ½.
„ Prepares for a new cycle.
6. Use the reverse sequence of operations to unwrite.
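
The write and unwrite sequences above can be mimicked with a toy model that enforces the adiabatic rule of never turning a gate on across a voltage difference. This is only a sketch under stated assumptions: the class and function names are hypothetical, voltages are idealized to the levels 0, ½ and 1, and "opens" in the slide is taken to mean that the transmission gate starts conducting.

```python
HALF = 0.5  # neutral level, written as 1/2 on the slide

class DramCell:
    """Toy model: one internal node behind a transmission gate to a bit line."""

    def __init__(self):
        self.node = HALF        # internal storage node
        self.bit_line = HALF
        self.gate_on = False    # True while the T-gate conducts

    def turn_gate_on(self):
        # Adiabatic rule: only connect nodes that already sit at the same level.
        if self.node != self.bit_line:
            raise ValueError("non-adiabatic: gate turned on across a voltage difference")
        self.gate_on = True

    def turn_gate_off(self):
        self.gate_on = False

    def ramp_bit_line(self, level):
        # Gradual ramp; the internal node follows only while the gate conducts.
        self.bit_line = level
        if self.gate_on:
            self.node = level

def write(cell, bit):
    cell.turn_gate_on()         # step 2
    cell.ramp_bit_line(bit)     # step 3
    cell.turn_gate_off()        # step 4
    cell.ramp_bit_line(HALF)    # step 5

def unwrite(cell, bit):
    # Step 6: the same operations in reverse order; needs a copy of the stored bit.
    cell.ramp_bit_line(bit)
    cell.turn_gate_on()
    cell.ramp_bit_line(HALF)
    cell.turn_gate_off()

cell = DramCell()
write(cell, 1)
print("after write:", cell.node)
unwrite(cell, 1)
print("after unwrite:", cell.node)
```

Calling `unwrite` with the wrong value raises an error in this model, which reflects why reversible erasure needs an exact copy of the stored data.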
DRAM Cell Read Cycle
1. All external nodes initially ½.
„ T-gate initially off.
„ Internal node contains data.
2. Inverter rails split.
„ Bit line set to (inverted) data.
3. T-gate at end of column latches bit-line data.
4. Inverter rails merge.
„ Bit line restored to ½ level.
5. Can use the reverse sequence of operations to unread copy of data available at end of column.
Fully-Adiabatic DRAM cell
„ 6T, 6 lines/row, 1 line/column (in/out together)
„ Read cycle:
„ Initially: φ lines neutral, out neutral, R off

„ R for desired row turns on

„ φ for desired row splits, driving out column

„ R turns off, out is read

„ φ merges, out is reset

„ Write cycle:
„ First, do read cycle.

„ in is set to out

„ W turns on

„ in changed to new value...
Fully-Adiabatic SRAM
„ 10-T, 10 lines/row, 1 line/column
„ Operation similar to DRAM, except:
„ Read-out:
T2 off; N2 retracts; T3 on; N2 asserts; T2 on, T3 off
„ Write:
T2 off; N2 retracts; N1 retracts, copy of M presented on input; T1 on; in changes; T1 off, N1 asserts; N2 asserts; T2 on
[Figure: SRAM cell with T-gates T1, T2, T3, stages N1, N2, memory node M, terminals in, out]
Limits of Adiabatics
Structured Systems
„ A structured system is defined as a system about whose state we have some knowledge.
„ Some of its physical information is known.
„ ∴ Its entropy is not at a maximum (by defn.).
„ ∴ It is not at equilibrium (by defn.).
„ For states with a given energy E,
„ we say the system’s energy is distributed among those states, in proportion to their probability.
[Figure: the set of all states of the abstract system having energy E, containing the subset of states w. prob. > 0; the system’s energy is “in here”]
Desired Trajectories
„ Any structured system we build to serve some purpose has some desired trajectory, or set of trajectories, through its configuration space that we would ideally like it to follow at all times.
[Figure: desired trajectories plotted as configuration vs. time]
„ Think of any given state as having a specific “desirability” at any given time.


Energy Losses
„ Energy dissipation can be viewed as a departure of part of the system’s energy away from the system’s desired trajectory.
„ E.g., 1 of $10^6$ electrons leaks out of a DRAM cell = system’s energy has departed from desired trajectory (all $10^6$ stay) by a small amount
[Figure: configuration vs. time, showing energy that has departed from desired trajectories]
