C 411 L 20 Multiplier

CMPEN 411
VLSI Digital Circuits

Spring 2011
Lecture 20: Multiplier Design
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
Sp11 CMPEN 411 L20 S.1

Review: Basic Building Blocks
Datapath
Execution units
- Adder, multiplier, divider, shifter, etc.
Register file and pipeline registers
Multiplexers, decoders
Control
Finite state machines (PLA, ROM, random logic)
Interconnect
Switches, arbiters, buses
Memory
Caches (SRAMs), TLBs, DRAMs, buffers

The Binary Multiplication
          1 0 1 0 1 0    Multiplicand
  x           1 0 1 1    Multiplier
          1 0 1 0 1 0
        1 0 1 0 1 0      Partial products
      0 0 0 0 0 0
  + 1 0 1 0 1 0
    1 1 1 0 0 1 1 1 0    Result
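
The worked example above can be checked with a minimal sketch. The function name `partial_products` and its parameters are illustrative assumptions, not part of the slides; it forms one shifted row per multiplier bit and then sums the rows.

```python
def partial_products(multiplicand: int, multiplier: int, n_bits: int) -> list[int]:
    """Return one shifted partial product per multiplier bit, LSB first."""
    rows = []
    for i in range(n_bits):
        bit = (multiplier >> i) & 1
        # Each row is the multiplicand ANDed with one multiplier bit,
        # shifted left by the weight of that bit.
        rows.append((multiplicand if bit else 0) << i)
    return rows


rows = partial_products(0b101010, 0b1011, n_bits=4)
product = sum(rows)
print([format(r, "b") for r in rows])
print(format(product, "b"))  # 111001110
```

The four rows correspond to the four partial products in the example, and their sum is the 9-bit result shown there.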

Multiply Operation
Multiplication is just a lot of additions
[Figure: an N-bit multiplicand and an N-bit multiplier form a partial product array of N rows, which can be formed in parallel; summing the array gives the 2N-bit double precision product.]

Multiplication Approaches
Right shift and add
Partial product array rows are accumulated from top to bottom on an N-bit adder
- After each addition, right shift (by one bit) the accumulated partial product to align it with the next row to add
Time for N bits: $T_{serial\_mult} = O(N \cdot T_{adder}) = O(N^2)$ for an RCA
Making it faster
Use a faster adder
Use higher radix (e.g., base 4) multiplication: $O(N/2 \cdot T_{adder})$
- Use multiplier recoding to simplify multiple formation (Booth)
Form the partial product array in parallel and add it in parallel
Making it smaller (i.e., slower)
Use a serial-parallel multiplier
Use an array multiplier
- Very regular structure with only short wires to nearest neighbor cells. Thus, very simple and efficient layout in VLSI.
- Can be easily and efficiently pipelined
Serial-parallel multiplier structure
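
The right shift and add scheme listed above can be sketched as follows. This is a behavioral model under stated assumptions, not a circuit description: the name `shift_add_multiply` and the split of the product into `acc` and `low` are hypothetical choices made for illustration.

```python
def shift_add_multiply(x: int, y: int, n: int) -> int:
    """Multiply two unsigned n-bit integers with n add-then-right-shift steps."""
    mask = (1 << n) - 1
    acc = 0   # accumulated partial product (n bits plus the adder's carry out)
    low = 0   # product bits that have already been shifted out of acc
    for i in range(n):
        if (y >> i) & 1:
            acc += x & mask          # one pass through the n-bit adder
        # Right shift by one bit: the LSB of acc is a finished product bit.
        low |= (acc & 1) << i
        acc >>= 1
    return (acc << n) | low


print(shift_add_multiply(0b101010, 0b1011, 6))  # 462
```

Each loop iteration uses the adder once, which is where the N adder delays in the serial time estimate come from.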

The Array Multiplier
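[Figure: 4x4 array multiplier. Each row of AND gates forms X3 X2 X1 X0 times one multiplier bit (Y0 to Y3). Three adder rows (HA FA FA HA, then FA FA FA HA, then FA FA FA HA) accumulate the partial products; Z0, Z1 and Z2 leave from successive rows and Z7 to Z3 come out of the last adder row.]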

The MxN Array Multiplier: Critical Path
[Figure: the adder array (HA FA FA HA; FA FA FA HA; FA FA FA HA) with Critical Path 1 and Critical Path 2 marked; part of the route is shared by both (Critical Path 1 & 2).]
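
For reference, the Rabaey text that these slides adapt gives an approximate propagation delay along a critical path of the MxN array multiplier. The symbols below are assumptions here, since the slide itself shows only the figure: $t_{carry}$ and $t_{sum}$ are the input-to-carry and input-to-sum delays of a full adder, and $t_{and}$ is the delay of the AND gate that forms a partial product bit.

```latex
t_{mult} \approx \left[(M-1) + (N-2)\right] t_{carry} + (N-1)\, t_{sum} + t_{and}
```

Both $t_{carry}$ and $t_{sum}$ appear with weights that grow linearly in the operand widths, so the delay of this structure grows linearly with word length.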

Carry-Save Multiplier
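[Figure: 4x4 carry-save multiplier built from rows of adder cells (HA HA HA HA; HA FA FA FA; HA FA FA FA; HA FA FA HA), with the final stage labeled Vector Merging Adder.]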
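
A word-level sketch of the carry-save idea, assuming unsigned operands: carries are passed down to the next row instead of rippling along a row, and only the final vector merging adder propagates carries. The name `carry_save_multiply` is hypothetical, and the model works on whole rows at once rather than on individual HA/FA cells.

```python
def carry_save_multiply(x: int, y: int, n_bits: int) -> int:
    """Multiply unsigned integers, keeping the running total as (sum, carry) rows."""
    sum_row, carry_row = 0, 0
    for i in range(n_bits):
        pp = (x << i) if (y >> i) & 1 else 0   # partial product row i
        # One row of full adders applied bitwise: carries are saved, not rippled.
        new_sum = sum_row ^ carry_row ^ pp
        new_carry = ((sum_row & carry_row) | (sum_row & pp) | (carry_row & pp)) << 1
        sum_row, carry_row = new_sum, new_carry
    # Vector merging adder: the only carry-propagate addition.
    return sum_row + carry_row


print(carry_save_multiply(0b101010, 0b1011, 6))  # 462
```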

Multiplier Floorplan
[Figure: multiplier floorplan. X3 to X0 and Y0 to Y3 feed rows of cells; each row passes its carry (C) and sum (S) outputs to the next row. Z0, Z1 and Z2 come from successive rows and Z7 to Z3 from the last row. Legend: HA multiplier cell, FA multiplier cell, vector merging cell.]
X and Y signals are broadcast through the complete array.

Booth multiplier
Encoding scheme to reduce the number of stages in multiplication.
Performs two bits of multiplication at once, so it requires half the stages.
Each stage is slightly more complex than in a simple multiplier, but an adder/subtracter is almost as small and fast as an adder.

Booth encoding
Two’s-complement form of multiplier:
$y = -2^n y_n + 2^{n-1} y_{n-1} + 2^{n-2} y_{n-2} + \dots$ (first bit is the sign bit)
(example: y = 18 = 010010, y = -18 = 101110)
Rewrite using $2^a = 2^{a+1} - 2^a$:
$y = 2^n (y_{n-1} - y_n) + 2^{n-1} (y_{n-2} - y_{n-1}) + 2^{n-2} (y_{n-3} - y_{n-2}) + \dots$
Consider first two terms: by looking at three bits of y, we can determine whether to add x or 2x to the partial product.

Booth actions
$y = 2^n (y_{n-1} - y_n) + 2^{n-1} (y_{n-2} - y_{n-1}) + 2^{n-2} (y_{n-3} - y_{n-2}) + \dots$
Consider first two terms: by looking at three bits of y, we can determine whether to add x or 2x to the partial product.
y_i  y_{i-1}  y_{i-2}    increment
 0      0        0         0
 0      0        1         x
 0      1        0         x
 0      1        1         2x
 1      0        0        -2x
 1      0        1        -x
 1      1        0        -x
 1      1        1         0
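
The table can be expressed as a small recoding routine. This is a sketch under assumptions: the name `booth_radix4_digits` is hypothetical, the word length must be even, and an implicit 0 is appended below the least significant bit. Each overlapping group of three bits maps to the increment -2·(high bit) + (middle bit) + (low bit), which reproduces every row of the table.

```python
def booth_radix4_digits(y: int, n_bits: int) -> list[int]:
    """Recode an n_bits-wide two's-complement multiplier into radix-4 digits.

    Digits are returned least significant first, each in {-2, -1, 0, 1, 2}.
    """
    if n_bits % 2:
        raise ValueError("n_bits must be even")
    padded = (y & ((1 << n_bits) - 1)) << 1   # append the implicit 0 below the LSB
    digits = []
    for i in range(0, n_bits, 2):
        triple = (padded >> i) & 0b111        # three overlapping multiplier bits
        high, mid, low = (triple >> 2) & 1, (triple >> 1) & 1, triple & 1
        digits.append(-2 * high + mid + low)  # same mapping as the table rows
    return digits


print(booth_radix4_digits(18, 6))   # [-2, 1, 1]
print(booth_radix4_digits(-18, 6))  # [-2, 0, -1]
```

Summing digit k times 4^k gives back the original value, for example -2 + 4 + 16 = 18.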
Booth example
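x = 1001 ($9_{10}$), y = 0111 ($7_{10}$).
$P_0$ = 00000000
$y_3 y_2 y_1$ = 011, $y_1 y_0 y_{-1}$ = 11(0), where the appended bit $y_{-1}$ is 0
$y_1 y_0 y_{-1}$ = 110: $P_1 = P_0 - 1001$ = 11110111
x is shifted left by 2 bits to give 100100
$y_3 y_2 y_1$ = 011: $P_2 = P_1 + (10 \times 100100)$ = 11110111 + 01001000 = 00111111 ($63_{10}$), discarding the carry out of the 8-bit sum
An array multiplier needs N additions; a Booth multiplier needs only N/2 additions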
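
The example can be replayed with a short behavioral sketch. The function `booth_multiply` and its `trace` flag are hypothetical names introduced for illustration; the multiplicand is treated as an unsigned value (as in the example, where 1001 means 9), the multiplier as an even-width two's-complement value, and the partial product is displayed in 2N bits.

```python
def booth_multiply(x: int, y: int, n_bits: int, trace: bool = False) -> int:
    """Multiply x by the n_bits-wide two's-complement value y, two bits per step."""
    width = 2 * n_bits
    mask = (1 << width) - 1
    padded = (y & ((1 << n_bits) - 1)) << 1   # append the implicit 0 below the LSB
    product = 0
    for i in range(0, n_bits, 2):
        triple = (padded >> i) & 0b111
        digit = -2 * (triple >> 2) + ((triple >> 1) & 1) + (triple & 1)
        product += digit * (x << i)           # add 0, +/-x or +/-2x, already aligned
        if trace:
            shown = product & mask            # two's-complement view in 2N bits
            print(f"bits {triple:03b} -> {digit:+d}x, P = {shown:0{width}b}")
    return product


print(booth_multiply(0b1001, 0b0111, 4, trace=True))  # 63
```

With `trace=True` the two steps show P = 11110111 and then P = 00111111, matching the partial products in the example.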

Review: A 64-bit Adder/Subtractor
Ripple Carry Adder (RCA) built out of 64 FAs
Subtraction – complement all subtrahend bits (xor gates) and set the low order carry-in
RCA
- advantage: simple logic, so small (low cost)
- disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption)
[Figure: chain of 64 1-bit FAs. FA i takes A_i, B_i and carry C_i and produces S_i and C_{i+1}; the add/subt control sets C0 = Cin, and C64 = Cout.]
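
A bit-level sketch of the adder/subtractor reviewed above, assuming unsigned n-bit operands. The names `full_adder` and `add_sub` are illustrative; the XOR on each B bit models the complementing gates, and the same control value is used as the low order carry-in.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """1-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def add_sub(a: int, b: int, subtract: bool, n_bits: int = 64) -> tuple[int, int]:
    """Ripple-carry add or subtract; returns (n-bit result, carry out)."""
    sub = 1 if subtract else 0
    carry = sub                          # add/subt control drives C0
    result = 0
    for i in range(n_bits):
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ sub       # XOR gate complements B when subtracting
        s_i, carry = full_adder(a_i, b_i, carry)
        result |= s_i << i
    return result, carry


print(add_sub(5, 3, subtract=False, n_bits=8))  # (8, 0)
print(add_sub(5, 3, subtract=True, n_bits=8))   # (2, 1)
```

The loop visits the bits in order because each carry depends on the previous one, which is the O(N) delay noted on the slide.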

Booth structure

Wallace-Tree Multiplier
[Figure: dot diagrams of the Wallace-tree reduction over bit positions 6 to 0: (a) partial products, (b) first stage, (c) second stage, (d) final adder. Legend: grouped dots mark FA and HA cells.]

Wallace-Tree Multiplier
[Figure: 4x4 Wallace-tree multiplier. The partial products x0y0 to x3y3 feed a first stage (HA HA), a second stage (FA FA FA FA) and a final adder that produces z7 to z0.]
Full adder = (3,2) compressor
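
A word-level sketch of tree reduction with (3,2) compressors, assuming unsigned operands. It groups whole rows three at a time, so it illustrates the logarithmic number of stages but does not reproduce the exact per-column HA/FA placement of the figure. The names `compress_3_2` and `wallace_multiply` are hypothetical.

```python
def compress_3_2(a: int, b: int, c: int) -> tuple[int, int]:
    """Bitwise full adders: three rows in, a sum row and a shifted carry row out."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_multiply(x: int, y: int, n_bits: int) -> tuple[int, int]:
    """Return (product, number of reduction stages) for unsigned n-bit inputs."""
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(n_bits)]
    stages = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3       # rows that form complete groups of three
        reduced = []
        for k in range(0, full, 3):
            reduced.extend(compress_3_2(rows[k], rows[k + 1], rows[k + 2]))
        reduced.extend(rows[full:])            # leftover rows pass through unchanged
        rows = reduced
        stages += 1
    return sum(rows), stages                   # final carry-propagate adder


print(wallace_multiply(0b1011, 0b1101, 4))  # (143, 2)
```

For four rows the loop runs twice (4 rows to 3, then 3 to 2), in line with the first and second stage shown for the 4x4 case.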

Making it Faster: Tree Multiplier Structure
[Figure: tree multiplier structure. The multiplier Q (‘ier) controls multiple forming circuits (muxes that select 0 or the multiplicand D, ‘icand) to build the partial product array; a reduction tree (log N) compresses the array; a fast carry propagate adder, CPA (log N), produces P (product).]

(4,2) Counter
Built out of two (3,2) counters (just FA’s!)
- all of the inputs (4 external plus one internal) have the same weight (i.e., are in the same bit position)
- the internal carry output is fed to the next higher weight position (indicated in the figure)
Note: Two carry outs, one “internal” and one “external”
[Figure: two stacked (3,2) counters forming one (4,2) counter.]
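
A minimal sketch of one (4,2) counter built from two full adders, as described above. The names `full_adder` and `counter_4_2` and the argument order are assumptions made for illustration. The first full adder produces the internal carry out from external inputs only; the second combines the intermediate sum, the fourth external input and the internal carry in.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """(3,2) counter: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def counter_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """Return (sum, external carry, internal carry out) for five same-weight bits."""
    s1, cout = full_adder(x1, x2, x3)    # internal carry out, independent of cin
    s, carry = full_adder(s1, x4, cin)   # external carry goes to the next weight
    return s, carry, cout


# Both carries have twice the weight of the sum bit.
s, carry, cout = counter_4_2(1, 1, 1, 1, 1)
print(s + 2 * (carry + cout))  # 5
```

Because the internal carry out does not depend on the internal carry in, a row of such counters has no ripple path along the row.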

Tiling (4,2) Counters
[Figure: a row of neighboring (4,2) counters, each drawn as two stacked (3,2) counters.]
Reduces columns four high to columns only two high
Tiles with neighboring (4,2) counters
- Internal carry in at same “level” (i.e., bit position weight) as the internal carry out
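
The tiling can be sketched by placing one (4,2) counter per bit position and wiring each internal carry out to the internal carry in of the neighbor at the next higher weight. The helper names below (`full_adder`, `counter_4_2`, `reduce_four_rows`) are hypothetical, and the two bits of headroom are an assumption that keeps the top carries at zero for unsigned rows.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def counter_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def reduce_four_rows(rows: list[int]) -> tuple[int, int]:
    """Reduce four unsigned rows to a (sum row, carry row) pair with equal total."""
    width = max(r.bit_length() for r in rows) + 2   # headroom: top carries stay 0
    sum_row = carry_row = 0
    cin = 0
    for bit in range(width):
        x1, x2, x3, x4 = ((r >> bit) & 1 for r in rows)
        s, carry, cout = counter_4_2(x1, x2, x3, x4, cin)
        sum_row |= s << bit
        carry_row |= carry << (bit + 1)   # external carry: next higher weight
        cin = cout                        # internal carry: neighbor at next weight
    return sum_row, carry_row


s_row, c_row = reduce_four_rows([42, 84, 0, 336])
print(s_row + c_row)  # 462
```

The two returned rows are what a carry propagate adder would then add to obtain the final product.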


4x4 Partial Product Array Reduction
Fast 4x4 multiplication using (4,2) counters
How would you lay it out?
[Figure: the multiplicand and multiplier form the partial product array; five (4,2) counters reduce it to the reduced pp array; a 5-bit CPA adds the reduced array to give the 8-bit double precision product.]

Wallace tree multiplier
[Figure: the multiplicand (‘icand) and multiplier (‘ier) form the partial product array; two rows of nine (4,2) counters reduce it to the reduced partial product array; one row of thirteen (4,2) counters reduces it again, feeding a 13-bit fast CPA.]

An 8x8 Multiplier Layout
How should it be laid out?
multiplicand
multiplier
nine (4,2) counters
nine (4,2) counters
thirteen (4,2) counters
13-bit CPA

Why Not Recode?
Multiplier recoding (modified Booth’s, canonical, P)
- recode the multiplier to allow base 4 multiplication with simple multiple formation
- with recoding have the base 4 multiplier digit set of -2, -1, 0, 1, 2
Thus, with recoding the initial partial product array is only N/2 high
But, the first level of (4,2) counters also reduces the partial product array to N/2 high
Which is better depends on the logic delay (recoding wins) and interconnect complexity (counters win big)
[Figure: a partial product array N high and 2N wide, reduced to N/2 high.]
Hitachi 54X54b Multiplier
A 4.4 ns CMOS 54X54 multiplier using pass-transistor multiplexer

Hitachi Multiplier: Booth encoder and PPG


Hitachi multiplier: 4-2 compressor


What is the state of the art?
ISSCC 2003
Multipliers: Summary
• Optimization goals differ from those of the binary adder
• Once again: identify the critical path
• Other possible techniques
- Logarithmic versus linear (Wallace tree multiplier)
- Data encoding (Booth)
- Pipelining
FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION

Next Lecture and Reminders
Next lecture
Shifters, decoders, and multiplexers
- Reading assignment – Rabaey, et al, 11.5-11.6