Академический Документы
Профессиональный Документы
Культура Документы
111 Lecture 13
Today: Arithmetic: Multiplication
1.Simple multiplication
2.Twos complement mult.
3.Speed: CSA & Pipelining
4.Booth recoding
5.Behavioral transformations:
Fixed-coef. mult., Canonical Signed Digits, Retiming
Acknowledgements:
• R. Katz, “Contemporary Logic Design”, Addison Wesley Publishing Company, Reading, MA, 1993. (Chapter 5)
• J. Rabaey, A. Chandrakasan, B. Nikolic, “Digital Integrated Circuits: A Design Perspective” Prentice Hall, 2003.
• Kevin Atkinson, Alice Wang, Rex Min
A3 A2 A1 A0
x B3 B2 B1 B0
SN SN-1…S0
Init: P←0, load A and B
LSB
P B NC A Repeat M times {
M bits
P ← P + (BLSB==1 ? A : 0)
N 1 N
+ xN
shift P/B right one bit
}
N+1
y0
x3 x2 x1 x0
y1
x3 x2 x1 x0
¾ Partial product computations z0
are simple (single AND gates)
HA FA FA HA
FA FA FA HA
x3 x2 x1 y3
x0
z2
FA FA FA HA
z7 z6 z5 z4 z3
6.111 Fall 2006 Lecture 13, Slide 4
2’s Complement Multiplication
(Baugh-Wooley)
Step 1: two’s complement operands so Step 3: add the ones to the partial
high order bit is –2N-1. Must sign extend products and propagate the carries. All
partial products and subtract the last one the sign extension bits go away!
Step 2: don’t want all those extra additions, so Step 4: finish computing the constants…
add a carefully chosen constant, remembering
to subtract it at the end. Convert subtraction
into add of (complement + 1).
X3Y0 X2Y0 X1Y0 X0Y0
X3Y0 X3Y0 X3Y0 X3Y0 X3Y0 X2Y0 X1Y0 X0Y0 + X3Y1 X2Y1 X1Y1 X0Y1
+ 1 + X2Y2 X1Y2 X0Y2
+ X3Y1 X3Y1 X3Y1 X3Y1 X2Y1 X1Y1 X0Y1 + X3Y3 X2Y3 X1Y3 X0Y3
+ 1 + 1 1
+ X3Y2 X3Y2 X3Y2 X2Y2 X1Y2 X0Y2
+ 1
+ X3Y3 X3Y3 X2Y3 X1Y3 X0Y3 –B = ~B + 1 Result: multiplying 2’s complement operands
+ 1
+ 1
takes just about same amount of hardware as
- 1 1 1 1 multiplying unsigned operands!
6.111 Fall 2006 Lecture 13, Slide 5
2’s Complement Multiplication
y0
x3 x2 x1 x0
y1
x3 x2 x1 x0
1 z0
FA FA FA HA
x3 x2 y2
x1 x0
z1
FA FA FA HA
x3 x2 x1 y3
x0
1 z2
HA FA FA FA HA
z7 z6 z5 z4 z3
The ISE tools will often use these hardware multipliers when
you use the “*” operator in Verilog. Or can you instantiate
them directly yourself:
wire signed [17:0] a,b;
wire signed [35:0] result;
MULT18X18 mymult(.A(a),.B(b),.P(result));
CSA
= register
Throughput = 1 result per clock cycle (period is now 4*tPD,FA instead of 8*tPD,FA)
6.111 Fall 2006 Lecture 13, Slide 10
Wallace Tree Multiplier
...
hackers: counts
number of 1’s on the
3 inputs, outputs 2-
bit result. CSA CSA
three bits at a
time CSA
M/2 2
...
BK+1,K*A = 0*A → 0
Booth’s insight: rewrite = 1*A → A
2*A and 3*A cases, = 2*A → 4A – 2A
leave 4A for next partial = 3*A → 4A – A
product to do!
6.111 Fall 2006 Lecture 13, Slide 12
Booth recoding
current bit pair from previous bit pair
X Z
Y = (1001)2 = 23 + 20
<< 3
shifts using wiring
6.111 Fall 2006 Lecture 13, Slide 15
Transform: Canonical Signed Digits (CSD)
Canonical signed digit representation is used to increase the number of
zeros. It uses digits {-1, 0, 1} instead of only {0, 1}.
0 1 1 0 1 1 1 1 0 1 1 1 0 0 0 -1
1 0 0 -1 0 0 0 -1
X << 7 Z
<< 4
Shift translates to re-wiring
6.111 Fall 2006 Lecture 13, Slide 16
Algebraic Transformations
Commutativity Distributivity
A C B
A B A B
B A
C
⇔
⇔
A + B = B + A (A + B) C = AB + BC
A B
A B
(A + B) + C = A + (B+C)
6.111 Fall 2006 Lecture 13, Slide 17
Transforms for Efficient Resource Utilization
distributivity
A C B D E FG H I
Reduce number of
operators to 2 multipliers
1
and 2 adders
D
D
D
D
D
Cutset retiming: A cutset intersects the edges, such that this would result in
two disjoint partitions of these edges being cut. To retime, delays are moved
from the ingoing to the outgoing edges or vice versa.
Benefits of retiming:
• Modify critical path delay
• Reduce total number of registers
6.111 Fall 2006 Lecture 13, Slide 19
Pipelining, Just Another Transformation
(Pipelining = Adding Delays + Retiming)
Contrary to retiming,
add input pipelining adds extra
registers registers to the system
D D
D D
How to pipeline:
1. Add extra registers at all
retime inputs (or, equivalently, all
outputs)
2. Retime
D
D D
D D
D 2D
A A A
associativity
x(n) y(n)
x(n) y(n)
retiming
A D D D D 2D
A A2
A2
precomputed
6.111 Fall 2006 Lecture 13, Slide 21
Summary x3
x3
x2
x2
x1
x1
x0
y1
x0
y0
1 z0
FA FA FA HA
x3 x2 y2
x1 x0
z1
• Simple multiplication:
FA FA FA HA
x3 x2 x1 y3
x0
1 z2
– O(N) delay
HA FA FA FA HA
z7 z6 z5 z4 z3
• Faster multipliers:
– Wallace Tree O(log N)
• Booth recoding:
– Add using 2 bits at a time x(n) y(n)
• Behavioral Transformations: A D D D