VLSI Arithmetic

Lecture 5

Prof. Vojin G. Oklobdzija


University of California

http://www.ece.ucdavis.edu/acsel
Review

Lecture 4
Ling’s Adder
Huey Ling, “High-Speed Binary Adder”,
IBM Journal of Research and Development, Vol. 25, No. 3, May 1981.

Used in: IBM 3033, IBM S370/168, Amdahl V6, HP etc.


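Ling’s Derivations

Truth table for one bit position:

  a_i  b_i | p_i  g_i  t_i
   0    0  |  0    0    0
   0    1  |  1    0    1
   1    0  |  1    0    1
   1    1  |  0    1    1

Define:
  $C_{i+1} = g_i + p_i C_i$
  $H_{i+1} = C_{i+1} + C_i$

$g_i$ implies $C_{i+1}$, which implies $H_{i+1}$; thus: $g_i = g_i H_{i+1}$

  $p_i C_i = p_i C_i + p_i g_i + p_i p_i C_i$
  $\phantom{p_i C_i} = p_i C_i + p_i C_{i+1} = p_i H_{i+1}$

  $p_i C_i = p_i H_{i+1}$

  $C_{i+1} = g_i + p_i C_i = g_i H_{i+1} + p_i C_i$
  $\phantom{C_{i+1}} = g_i H_{i+1} + p_i H_{i+1} = t_i H_{i+1}$

  $C_{i+1} = t_i H_{i+1}$

[Figure: one bit cell with inputs a_i, b_i, carries c_{i+1} and c_i, and sum output s_i]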
Oklobdzija 2004 Computer Arithmetic 4
Ling’s Derivations
From: $H_{i+1} = C_{i+1} + C_i$ and $C_{i+1} = g_i + p_i C_i$

$H_{i+1} = C_{i+1} + C_i = g_i + p_i C_i + C_i = g_i + C_i$

$H_{i+1} = g_i + t_{i-1} H_i$    because: $C_{i+1} = t_i H_{i+1}$ (applied one position lower, $C_i = t_{i-1} H_i$)
(fundamental expansion)
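
The identities used in this derivation can be checked exhaustively for a single bit position. The Python sketch below is not part of the original slides; the function name `check_ling_identities` is illustrative, and it assumes `p` is the XOR propagate and `t` the OR transfer, as in the truth table.

```python
from itertools import product

def check_ling_identities():
    """Check the single-bit Ling identities for every (a_i, b_i, C_i)."""
    for a, b, c in product((0, 1), repeat=3):
        g = a & b                    # generate
        p = a ^ b                    # propagate (XOR form, assumed)
        t = a | b                    # transfer (OR form)
        c_next = g | (p & c)         # C_{i+1} = g_i + p_i C_i
        h_next = c_next | c          # H_{i+1} = C_{i+1} + C_i
        assert g == (g & h_next)             # g_i = g_i H_{i+1}
        assert (p & c) == (p & h_next)       # p_i C_i = p_i H_{i+1}
        assert c_next == (t & h_next)        # C_{i+1} = t_i H_{i+1}
        assert h_next == (g | c)             # H_{i+1} = g_i + C_i
    return True
```

Calling `check_ling_identities()` runs through all eight input combinations and raises an `AssertionError` if any identity fails.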

Now we need to derive the Sum equation.


Ling Adder

Variation of CLA:                 Ling’s equations:

$p_i = a_i \oplus b_i$            $t_i = a_i + b_i$

$g_i = a_i b_i$                   $g_i = a_i b_i$

$C_{i+1} = g_i + p_i C_i$         $H_{i+1} = g_i + t_{i-1} H_i$

$S_i = p_i \oplus C_i$            $S_i = (t_i \oplus H_{i+1}) + g_i t_{i-1} H_i$

Ling, IBM J. Res. Dev, 5/81
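
To see that Ling’s equations really produce a sum, here is a minimal bit-serial Python sketch. It is an illustration only, not the circuit from the paper: the name `ling_add` is hypothetical, and the boundary condition `t_{-1} = 1`, `H_0 = cin` is an assumption chosen so that `C_0 = t_{-1} H_0 = cin`.

```python
def ling_add(a, b, n, cin=0):
    """Add two n-bit integers with Ling's recurrence, one bit at a time."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]       # g_i = a_i b_i
    t = [((a >> i) | (b >> i)) & 1 for i in range(n)]     # t_i = a_i + b_i
    # Boundary assumption (not on the slide): t_{-1} = 1 and H_0 = cin.
    t_prev = [1] + t                                       # t_prev[i] = t_{i-1}
    h = [cin]
    for i in range(n):
        h.append(g[i] | (t_prev[i] & h[i]))                # H_{i+1} = g_i + t_{i-1} H_i
    total = 0
    for i in range(n):
        # S_i = (t_i XOR H_{i+1}) + g_i t_{i-1} H_i
        s = (t[i] ^ h[i + 1]) | (g[i] & t_prev[i] & h[i])
        total |= s << i
    carry_out = t[n - 1] & h[n]                            # C_n = t_{n-1} H_n
    return total | (carry_out << n)
```

For example, `ling_add(5, 3, 4)` can be compared against `5 + 3`; an exhaustive loop over all 4-bit operands is a quick way to gain confidence in the sum equation.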



Ling Adder

Variation of CLA: Ling’s equation:

$C_{i+1} = g_i + g_i C_i + p_i C_i$
$\phantom{C_{i+1}} = g_i + (g_i + p_i) C_i$

$C_{i+1} = g_i + t_i C_i$         $H_{i+1} = g_i + t_{i-1} H_i$

[Figure: two adjacent bit cells with inputs a_i, b_i and a_{i-1}, b_{i-1}; signals g_i, t_i and g_{i-1}, t_{i-1}; carries c_{i+1}, c_i, c_{i-1}; sums s_i, s_{i-1}; H_{i+1} and H_i marked]

Ling uses a different transfer function.
Four such functions have the desired properties (Ling’s is one of them);
see: Doran, IEEE Trans. on Computers, Vol. 37, No. 9, Sept. 1988.
Ling Adder

Conventional: Fan-in of 5

$C_4 = g_3 + t_3 g_2 + t_3 t_2 g_1 + t_3 t_2 t_1 g_0 + t_3 t_2 t_1 t_0 C_{in}$

Ling:
$H_4 = g_3 + t_2 g_2 + t_2 t_1 g_1 + t_2 t_1 t_0 g_0 + t_2 t_1 t_0 t_{-1} C_{in}$
$H_4 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0 + t_2 t_1 t_0 C_{in}$
Fan-in of 4
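
The two 4-bit expansions can be compared mechanically. The Python sketch below (names such as `C4_TERMS` and `check_four_bit_group` are illustrative) writes each expression as a list of product terms, checks both against the true carry out of a 4-bit addition, and reports the widest product term, which is the fan-in the slide refers to under that reading.

```python
from itertools import product

# Sum-of-products forms from the slide; each product term is a tuple of literals.
C4_TERMS = [("g3",), ("t3", "g2"), ("t3", "t2", "g1"),
            ("t3", "t2", "t1", "g0"), ("t3", "t2", "t1", "t0", "cin")]
H4_TERMS = [("g3",), ("g2",), ("t2", "g1"),
            ("t2", "t1", "g0"), ("t2", "t1", "t0", "cin")]

def evaluate(terms, values):
    """Evaluate an OR of AND terms for the given signal values."""
    return int(any(all(values[name] for name in term) for term in terms))

def max_fan_in(terms):
    """Number of literals in the widest product term."""
    return max(len(term) for term in terms)

def check_four_bit_group():
    for a, b, cin in product(range(16), range(16), (0, 1)):
        values = {"cin": cin}
        for i in range(4):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            values[f"g{i}"] = ai & bi
            values[f"t{i}"] = ai | bi
        true_c4 = (a + b + cin) >> 4
        assert evaluate(C4_TERMS, values) == true_c4
        # Ling: the real carry is recovered as C_4 = t_3 H_4.
        assert (values["t3"] & evaluate(H4_TERMS, values)) == true_c4
    return max_fan_in(C4_TERMS), max_fan_in(H4_TERMS)
```

The function returns the pair of widest-term sizes for the conventional and the Ling form.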



Advantages of Ling’s Adder
• Uniform loading in fan-in and fan-out
• H16 contains 8 terms, compared with 15 in G16.
• H16 can be implemented with one level of logic (in ECL), while G16 cannot (with 8-way wire-OR).

(Ling’s adder takes full advantage of wired-OR, which is of special importance when ECL technology is used; his IBM limitation was a fan-in of 4 and a wire-OR of 8.)


Ling: Weinberger Notes



Advantage of Ling’s Adder
• 32-bit adder used in: IBM 3033, IBM S370/Model 168, Amdahl V6.
• Implements 32-bit addition in 3 levels of logic
• Implements 32-bit AGEN (B + Index + Disp) in 4 levels of logic (rather than 6)
• 5 levels of logic for the 64-bit adder used in an HP processor


Implementation of Ling’s Adder in CMOS
(S. Naffziger, “A Subnanosecond 64-b Adder”, ISSCC ’96)


S. Naffziger, ISSCC ’96

$H_4 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$

$C_{i+1} = t_i H_{i+1}$




$C_{16} = p_{15} H_{16} = p_{15} (g_{15} + g_{11} + t_{11} g_7 + t_{11} t_7 g_0)$
S. Naffziger, ISSCC ’96


Ling Adder Critical Path



Ling Adder: Circuits
[Figure: clocked (CK) dynamic circuit schematics for the G3, G4 and P4 gates with inputs A0-A3, B0-B3; the long-carry (LC) gate producing LCH/LCL; and the SumL/SumH conditional-sum outputs selected by C1L, C1H, C0L, C0H from G, K, P]


LCS4 – Critical Path

[Figure: block diagram of the critical path: a 4b stage producing (k,p) or (g,p), then P4, G3, G4; a 12b stage producing C15; a 32b stage producing C47, C31, C15; a 16b sum stage producing S63, S62, S48]


LCS4 – Logical Effort Delay

Prefix-4 Ling/Conditional-Sum (Dynamic - Long Carry Path)

Stages          Branch   LE     Parasitic
dg3# (dg3)       4.0     0.98   2.97
g4 (NAND2)       2.0     1.11   1.84
C15# (GG4)       1.0     1.01   1.80
C15 (INV)        1.0     1.00   1.00
C47# (LC)        3.0     1.03   3.32
C47 (INV)        1.0     1.00   1.00
C47#b (INV)      1.0     1.00   1.00
C47b (INV)       1.0     1.00   1.00
S63# (SUM)      16.0     0.86   1.36
S63 (INV)        1.0     1.00   1.00

Path totals:
Total Branch            3.84E+02
Total LE                9.73E-01
Path Effort             3.74E+02
fo, opt                 1.81
Effort Delay (ps)       66
Parasitic Delay (ps)    70
Total Delay (ps)        136
Total Delay (FO4)       7.2
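
The path totals in this table can be reproduced from the per-stage columns. The Python sketch below is an illustration under stated assumptions: each row is read as (stage, branch, LE, parasitic), the path electrical effort is taken as 1 so that path effort is simply branch product times LE product, and the names `STAGES` and `path_summary` are hypothetical. The delays in ps depend on a technology constant that the slide does not give, so they are not recomputed here.

```python
import math

# (stage, branch, logical effort, parasitic), as read from the table
STAGES = [
    ("dg3# (dg3)", 4.0, 0.98, 2.97),
    ("g4 (NAND2)", 2.0, 1.11, 1.84),
    ("C15# (GG4)", 1.0, 1.01, 1.80),
    ("C15 (INV)", 1.0, 1.00, 1.00),
    ("C47# (LC)", 3.0, 1.03, 3.32),
    ("C47 (INV)", 1.0, 1.00, 1.00),
    ("C47#b (INV)", 1.0, 1.00, 1.00),
    ("C47b (INV)", 1.0, 1.00, 1.00),
    ("S63# (SUM)", 16.0, 0.86, 1.36),
    ("S63 (INV)", 1.0, 1.00, 1.00),
]

def path_summary(stages):
    """Return (total branch, total LE, path effort, optimal stage effort)."""
    total_branch = math.prod(s[1] for s in stages)
    total_le = math.prod(s[2] for s in stages)
    path_effort = total_branch * total_le          # electrical effort assumed to be 1
    stage_effort = path_effort ** (1.0 / len(stages))
    return total_branch, total_le, path_effort, stage_effort
```

With these ten stages the result agrees with the tabulated Total Branch, Total LE, Path Effort and fo, opt to the precision shown.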



Results:

• 0.5 µm technology
• Speed: 0.930 ns
• Nominal process, 80 °C, V = 3.3 V

See: S. Naffziger, “A Subnanosecond 64-b Adder”, ISSCC ’96


Prefix Adders
and
Parallel Prefix Adders
from: Ercegovac-Lang
Prefix Adders
The following recurrence operation is defined:

$(g, p) \circ (g', p') = (g + p g', \; p p')$

such that:

$(G_i, P_i) = (g_0, p_0)$ for $i = 0$
$(G_i, P_i) = (g_i, p_i) \circ (G_{i-1}, P_{i-1})$ for $1 \le i \le n$

$c_{i+1} = G_i$ for $i = 0, 1, \ldots, n$

$c_1 = g_0 + p_0 c_{in}$, with $(g_{-1}, p_{-1}) = (c_{in}, c_{in})$

This operation is associative, but not commutative.

It can also span a range of bits (overlapping and adjacent).
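
The recurrence translates directly into code. The following Python sketch is illustrative only (the names `prefix_op` and `carries` are not from the slides); it evaluates the recurrence serially, folding the carry-in in as the pair at position -1, and uses the XOR form of `p`, which is an assumption since the slide does not say which propagate is meant.

```python
def prefix_op(hi, lo):
    """(g, p) o (g', p') = (g + p g', p p'); hi is the more significant pair."""
    g, p = hi
    g_lo, p_lo = lo
    return (g | (p & g_lo), p & p_lo)

def carries(a, b, n, cin=0):
    """Return [c_1, ..., c_n] for n-bit operands a and b."""
    acc = (cin, cin)                    # (g_{-1}, p_{-1}) = (cin, cin)
    out = []
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        acc = prefix_op((ai & bi, ai ^ bi), acc)   # (G_i, P_i)
        out.append(acc[0])                         # c_{i+1} = G_i
    return out
```

A parallel prefix adder computes the same `(G_i, P_i)` values but regroups the applications of `prefix_op` into a tree, which is allowed because the operation is associative.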
from: Ercegovac-Lang


Parallel Prefix Adders: variety of possibilities
from: Ercegovac-Lang



Pyramid Adder:
M. Lehman, “A Comparative Study of Propagation Speed-up Circuits in Binary Arithmetic
Units”, IFIP Congress, Munich, Germany, 1962.





Hybrid BK-KS Adder



Parallel Prefix Adders: S. Knowles 1999

operation is associative: h>i≥j≥k

operation is idempotent: h>i≥j≥k

produces carry: cin=0
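
The formulas for these properties did not survive on this slide, so the Python sketch below states one reading of them as an assumption: associativity is checked on arbitrary triples, idempotency as `x o x = x`, and the range form as "the group over bits h..j combined with the group over bits i..k equals the group over bits h..k whenever h > i ≥ j ≥ k". The names `prefix_op`, `group` and `check_properties` are illustrative.

```python
from itertools import product

def prefix_op(hi, lo):
    """(g, p) o (g', p') = (g + p g', p p')."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])

def group(pairs, h, k):
    """(G, P) over bits h down to k, folded serially from bit k upward."""
    acc = pairs[k]
    for i in range(k + 1, h + 1):
        acc = prefix_op(pairs[i], acc)
    return acc

def check_properties(n=4):
    values = list(product((0, 1), repeat=2))
    for x, y, z in product(values, repeat=3):          # associativity
        assert prefix_op(prefix_op(x, y), z) == prefix_op(x, prefix_op(y, z))
    for x in values:                                   # idempotency
        assert prefix_op(x, x) == x
    for pairs in product(values, repeat=n):            # overlapping ranges
        for h in range(n):
            for i in range(h):                         # h > i
                for j in range(i + 1):                 # i >= j
                    for k in range(j + 1):             # j >= k
                        combined = prefix_op(group(pairs, h, j), group(pairs, i, k))
                        assert combined == group(pairs, h, k)
    return True
```

The overlapping-range check is what lets a network such as Kogge-Stone reuse partially overlapping groups without corrupting the carries.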



Parallel Prefix Adders: Ladner-Fischer

Exploits associativity, but not idempotency.

Produces minimal logical depth.


Parallel Prefix Adders: Ladner-Fischer
(16,8,4,2,1)

Two wires at each level. Uniform fan-in of two.

Large fan-out (of 16; n/2); large capacitive loading combined with the long wires (in the last stages).
Parallel Prefix Adders: Kogge-Stone
Exploits idempotency to limit the fan-out to 1. Dramatic increase in wires. The wire span remains the same as in Ladner-Fischer.

Buffers are needed in both cases: K-S and L-F.


Kogge-Stone Adder



Parallel Prefix Adders: Brent-Kung

• Sets the fan-out to one
• Avoids explosion of wires (as in K-S)
• Makes no sense in CMOS:
– fan-out = 1 limit is arbitrary and extreme
– much of the capacitive load is due to wire (anyway)
• It is more efficient to insert buffers in L-F than to use the B-K scheme
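
For reference, the two-phase structure of a Brent-Kung prefix computation can be sketched in a few lines of Python. This is a generic textbook formulation, not taken from the slides: the name `brent_kung_prefix` is illustrative, the input length is assumed to be a power of two, and `op(hi, lo)` is any associative operator (the carry operator, or string concatenation for testing).

```python
def brent_kung_prefix(items, op):
    """Return (all prefixes, number of operator levels) for a power-of-two input."""
    x = list(items)
    n = len(x)
    assert n >= 1 and n & (n - 1) == 0, "length must be a power of two"
    levels = 0
    d = 1
    while d < n:                                  # up-sweep: build a binary tree
        for i in range(2 * d - 1, n, 2 * d):
            x[i] = op(x[i], x[i - d])
        d *= 2
        levels += 1
    d = n // 4
    while d >= 1:                                 # down-sweep: fill in the gaps
        for i in range(3 * d - 1, n, 2 * d):
            x[i] = op(x[i], x[i - d])
        d //= 2
        levels += 1
    return x, levels
```

For n = 16 this uses 7 operator levels, against 4 for the minimal-depth schemes, which is the depth penalty paid for the small fan-out and wire count.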



Brent-Kung Adder



Parallel Prefix Adders: Han-Carlson

• Is a hybrid synthesis of L-F and K-S
• Trades increase in logic depth for a reduction in fan-out:
– effectively a higher-radix variant of K-S.
– others do it similarly by serializing the prefix computation at the higher fan-out nodes.
• Others similarly trade logical depth for a reduction of fan-out and wire.


Parallel Prefix Adders: variety of possibilities
from: Knowles

bounded by L-F and K-S at ends



Parallel Prefix Adders: variety of possibilities
Knowles 1999

The following rules are used:

• Lateral wires at the jth level span $2^j$ bits
• Lateral fan-out at the jth level is a power of 2, up to $2^j$
• Lateral fan-out at the jth level cannot exceed that at the (j+1)th level.
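
These rules can be turned into a small network generator. The Python sketch below is one possible reading, not Knowles’ own construction: it assumes that at level j every bit i ≥ 2^j takes its lateral input from the top bit of the f-aligned block containing bit i - 2^j, where f is that level’s fan-out. With this single rule the L-F end point contains some redundant nodes that a minimal drawing would omit. All names (`knowles_levels`, `prefix_carries`, `max_lateral_fanout`) are hypothetical, and the carry-in is taken as 0.

```python
def prefix_op(hi, lo):
    """(g, p) o (g', p') = (g + p g', p p')."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])

def knowles_levels(fanouts):
    """fanouts is written last level first, e.g. [4, 4, 2, 2, 1] for 32 bits."""
    per_level = list(reversed(fanouts))          # per_level[j] = fan-out at level j
    n = 1 << len(per_level)
    levels = []
    for j, f in enumerate(per_level):
        span = 1 << j                            # lateral wires span 2^j bits
        assert f & (f - 1) == 0 and 1 <= f <= span, "power of 2, at most 2^j"
        assert j == 0 or per_level[j - 1] <= f, "fan-out must not shrink with level"
        # Assumed wiring rule: source = top of the f-aligned block holding i - 2^j.
        levels.append({i: ((i - span) // f) * f + f - 1 for i in range(span, n)})
    return levels

def prefix_carries(a, b, fanouts):
    """Return [c_1, ..., c_n] computed through the generated network (cin = 0)."""
    levels = knowles_levels(fanouts)
    n = 1 << len(levels)
    x = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    for level in levels:
        prev = x[:]                              # every node reads the previous level
        for i, src in level.items():
            x[i] = prefix_op(prev[i], prev[src])
    return [g for g, _ in x]

def max_lateral_fanout(level):
    """Largest number of nodes in one level that share the same lateral source."""
    counts = {}
    for src in level.values():
        counts[src] = counts.get(src, 0) + 1
    return max(counts.values())
```

Passing `[1, 1, 1, 1, 1]` gives a Kogge-Stone style network and `[16, 8, 4, 2, 1]` the Ladner-Fischer fan-out profile; intermediate vectors such as `[4, 4, 2, 2, 1]` fall between the two.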



Parallel Prefix Adders: variety of possibilities
Knowles 1999
• The number of minimal depth graphs of this type
is given in:

• At 4 bits there are only K-S and L-F; beyond that there are several new possibilities.
Parallel Prefix Adders: variety of possibilities
Knowles 1999

Example of a new 32-bit adder [4,4,2,2,1]



Parallel Prefix Adders: variety of possibilities
Knowles 1999

• Delay is given in terms of FO4 inverter delay, worst case (the nominal case is 40-50% faster)
• K-S is the fastest
• K-S adders are wire limited (requiring 80% more area)
• The difference between the examined schemes is less than 15%


Parallel Prefix Adders: variety of possibilities
Knowles 1999
Conclusion
• Irregular, hybrid schemes are possible
• The speed-up of 15% is achieved at the cost of large wiring, hence area and power
• Circuits close in speed to K-S are available at significantly lower wiring cost
