
CMPEN 411

VLSI Digital Circuits


Spring 2011

Lecture 20: Multiplier Design

[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]

Sp11 CMPEN 411 L20 S.1


Review: Basic Building Blocks
• Datapath
  - Execution units
    - Adder, multiplier, divider, shifter, etc.
  - Register file and pipeline registers
  - Multiplexers, decoders

• Control
  - Finite state machines (PLA, ROM, random logic)

• Interconnect
  - Switches, arbiters, buses

• Memory
  - Caches (SRAMs), TLBs, DRAMs, buffers



The Binary Multiplication

        1 0 1 0 1 0    Multiplicand
x           1 0 1 1    Multiplier

        1 0 1 0 1 0
      1 0 1 0 1 0
    0 0 0 0 0 0        Partial products
+ 1 0 1 0 1 0

  1 1 1 0 0 1 1 1 0    Result
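The worked example above can be reproduced in a few lines. This is a minimal sketch, not part of the original slides; the function name `partial_products` and its parameters are illustrative assumptions.

```python
def partial_products(multiplicand: int, multiplier: int, n_bits: int) -> list[int]:
    """Return one shifted partial product per multiplier bit, LSB first."""
    rows = []
    for j in range(n_bits):
        bit = (multiplier >> j) & 1
        # A multiplier bit of 1 contributes the multiplicand shifted left by j.
        rows.append((multiplicand << j) if bit else 0)
    return rows


rows = partial_products(0b101010, 0b1011, 4)
product = sum(rows)                      # adding the rows gives the result
print([format(r, "b") for r in rows])
print(format(product, "b"))              # 111001110
```

The four rows correspond to the four partial products on the slide; only their summation order differs from a hardware implementation.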



Multiply Operation
• Multiplication is just a lot of additions

[Figure: an N-bit multiplicand and an N-bit multiplier generate a partial product array of N rows, which can be formed in parallel; summing the array gives the 2N-bit double-precision product]



Multiplication Approaches
• Right shift and add
  - Partial product array rows are accumulated from top to bottom on an N-bit adder
    - After each addition, right shift (by one bit) the accumulated partial product to align it with the next row to add
  - Time for N bits: T_serial_mult = O(N T_adder) = O(N^2) for a ripple carry adder (RCA)

• Making it faster
  - Use a faster adder
  - Use higher radix (e.g., base 4) multiplication: O(N/2 T_adder)
    - Use multiplier recoding to simplify multiple formation (Booth)
  - Form the partial product array in parallel and add it in parallel
• Making it smaller (i.e., slower)
  - Use a serial-parallel multiplier
  - Use an array multiplier
    - Very regular structure with only short wires to nearest-neighbor cells, so the VLSI layout is very simple and efficient
    - Can be easily and efficiently pipelined
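The right-shift-and-add scheme can be sketched in software as follows. This is an illustrative model under stated assumptions (unsigned operands, a register pair named `acc`/`low`), not a description of any particular circuit.

```python
def shift_add_multiply(x: int, y: int, n: int) -> int:
    """Right-shift-and-add multiplication of two unsigned n-bit operands."""
    mask = (1 << n) - 1
    assert 0 <= x <= mask and 0 <= y <= mask
    acc = 0    # upper half of the product register (n bits plus carry)
    low = y    # lower half, initially holding the multiplier
    for _ in range(n):
        if low & 1:
            acc += x    # one pass through the N-bit adder
        # Shift the (acc, low) register pair right by one bit.
        low = (low >> 1) | ((acc & 1) << (n - 1))
        acc >>= 1
    return (acc << n) | low


print(shift_add_multiply(42, 11, 6))  # 462
```

The loop runs N times and each iteration uses the adder at most once, which is where the O(N T_adder) estimate comes from.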
Serial-parallel multiplier structure



The Array Multiplier

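[Figure: 4x4 array multiplier. Each row ANDs X3 X2 X1 X0 with one multiplier bit (Y0 to Y3). Three adder rows (HA FA FA HA; FA FA FA HA; FA FA FA HA) accumulate the partial products. The low-order bits Z0, Z1 and Z2 leave at the right edge of the array, and Z7 Z6 Z5 Z4 Z3 leave at the bottom.]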
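A bit-level model of the array multiplier is sketched below. It is an assumption-laden illustration: every cell is modeled as a full adder (a half adder is simply a full adder with one input tied to 0), and the helper names `to_bits` and `from_bits` are hypothetical.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def array_multiply(x_bits: list[int], y_bits: list[int]) -> list[int]:
    """Multiply two unsigned numbers given as LSB-first bit lists."""
    m = len(x_bits)
    row = [xb & y_bits[0] for xb in x_bits]   # first row: X AND Y0
    z = [row[0]]                              # Z0 leaves the array immediately
    upper = row[1:] + [0]                     # bits aligned with the next row
    for yb in y_bits[1:]:
        pp = [xb & yb for xb in x_bits]       # next partial product row
        carry = 0
        sums = []
        for i in range(m):                    # carry ripples along the row
            s, carry = full_adder(pp[i], upper[i], carry)
            sums.append(s)
        z.append(sums[0])                     # lowest sum bit is a product bit
        upper = sums[1:] + [carry]
    return z + upper


def to_bits(value: int, width: int) -> list[int]:
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    return sum(b << i for i, b in enumerate(bits))


print(from_bits(array_multiply(to_bits(13, 4), to_bits(11, 4))))  # 143
```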



The MxN Array Multiplier: Critical Path

[Figure: the adder array (rows HA FA FA HA; FA FA FA HA; FA FA FA HA) with two highlighted paths, Critical Path 1 and Critical Path 2, and a shared segment labeled Critical Path 1 & 2]



Carry-Save Multiplier

[Figure: carry-save multiplier. Adder rows as drawn: HA HA HA HA; HA FA FA FA; HA FA FA FA; HA FA FA HA. Label on the final stage: Vector Merging Adder.]
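The carry-save idea can be sketched at word level: each stage keeps a separate sum word and carry word, and only the final vector merging adder propagates carries. The function names below are illustrative assumptions, and Python integers stand in for the bit vectors.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Bitwise (3,2) compression: a + b + c == sum_word + carry_word."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1   # carries move one bit up
    return sum_word, carry_word


def carry_save_multiply(x: int, y: int, n: int) -> int:
    """Unsigned n-bit multiply that defers carry propagation to the end."""
    sum_word, carry_word = 0, 0
    for j in range(n):
        pp = (x << j) if (y >> j) & 1 else 0          # partial product row j
        sum_word, carry_word = carry_save_add(sum_word, carry_word, pp)
    return sum_word + carry_word                       # vector merging adder


print(carry_save_multiply(13, 11, 4))  # 143
```

No stage in the loop waits for a carry to ripple; the only carry-propagate addition is the last line.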



Multiplier Floorplan

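[Figure: multiplier floorplan. X3 X2 X1 X0 enter along the top and Y0 to Y3 along the side; each cell passes its C (carry) and S (sum) outputs to the row below; Z0, Z1 and Z2 leave at the edge of the array and the bottom row produces Z7 Z6 Z5 Z4 Z3. Legend: HA Multiplier Cell, FA Multiplier Cell, Vector Merging Cell.]

X and Y signals are broadcast through the complete array.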



Booth multiplier
• Encoding scheme to reduce the number of stages in multiplication.
• Performs two bits of multiplication at once, so it requires half the stages.
• Each stage is slightly more complex than in a simple multiplier, but an adder/subtractor is almost as small and fast as an adder.



Booth encoding
• Two’s-complement form of multiplier:
  - y = -2^{n} y_{n} + 2^{n-1} y_{n-1} + 2^{n-2} y_{n-2} + ... (first bit is the sign bit)
    (example: y = 18 = 010010, y = -18 = 101110)

• Rewrite using 2^{a} = 2^{a+1} - 2^{a}:
  - y = 2^{n}(y_{n-1} - y_{n}) + 2^{n-1}(y_{n-2} - y_{n-1}) + 2^{n-2}(y_{n-3} - y_{n-2}) + ...

• Consider the first two terms: by looking at three bits of y, we can determine whether to add 0, ±x, or ±2x to the partial product.
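As a worked step, the first two terms of the rewritten sum can be combined as follows. This derivation is added for illustration and uses only the symbols already defined on this slide.

```latex
\begin{align*}
2^{n}(y_{n-1} - y_{n}) + 2^{n-1}(y_{n-2} - y_{n-1})
  &= 2^{n-1}\bigl(2y_{n-1} - 2y_{n} + y_{n-2} - y_{n-1}\bigr) \\
  &= 2^{n-1}\bigl(-2y_{n} + y_{n-1} + y_{n-2}\bigr)
\end{align*}
```

The bracketed factor depends on three adjacent bits and can only take the values -2, -1, 0, 1 or 2, which is why each pair of terms contributes a multiple of x from that set.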



Booth actions

y_i  y_{i-1}  y_{i-2}   increment
 0      0        0         0
 0      0        1         x
 0      1        0         x
 0      1        1         2x
 1      0        0        -2x
 1      0        1        -x
 1      1        0        -x
 1      1        1         0

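The table of Booth actions amounts to computing -2·y_i + y_{i-1} + y_{i-2} for each overlapping group of three bits. The sketch below recodes a multiplier into such digits; the function name and the choice to return digits least-significant group first are assumptions made for illustration.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Radix-4 Booth digits of an n-bit two's-complement y (n even), LSB group first."""
    bits = [(y >> i) & 1 for i in range(n)]
    digits = []
    prev = 0                                   # y_{-1} is defined as 0
    for i in range(0, n, 2):
        # Group (y_{i+1}, y_i, y_{i-1}) maps to a digit in {-2, -1, 0, 1, 2}.
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
        prev = bits[i + 1]
    return digits


print(booth_radix4_digits(7, 4))     # [-1, 2]
print(booth_radix4_digits(18, 6))    # [-2, 1, 1]
print(booth_radix4_digits(-18, 6))   # [-2, 0, -1]
```

Each digit d at group k contributes d · 4^k, so the digits of 7 read as -1 + 2·4 = 7.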
Booth example
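• x = 1001 (9 in decimal), y = 0111 (7 in decimal).
• P0 = 00000000
• Multiplier bit groups (with y_{-1} = 0): y3 y2 y1 = 011 and y1 y0 y_{-1} = 110
• y1 y0 y_{-1} = 110: subtract x, so P1 = P0 - 1001 = 11110111
• Shift x left by 2 bits to get 100100
• y3 y2 y1 = 011: add 2x, so P2 = P1 + (2 × 100100) = 11110111 + 01001000 = 00111111 (63 in decimal), discarding the carry out of the 8-bit sum
• An array multiplier needs N additions; a Booth multiplier needs only N/2 additions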
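The example can be traced step by step with a short script. This is a sketch under assumptions: x is treated as an unsigned value, partial sums are kept in a fixed-width register of `width` bits, and the function name `booth_multiply_trace` is hypothetical.

```python
def booth_multiply_trace(x: int, y: int, n: int, width: int) -> list[str]:
    """Radix-4 Booth multiplication; returns the partial sums P1, P2, ... as bit strings."""
    mask = (1 << width) - 1
    p = 0
    prev = 0                                   # y_{-1} = 0
    trace = []
    for i in range(0, n, 2):
        hi = (y >> (i + 1)) & 1
        lo = (y >> i) & 1
        digit = -2 * hi + lo + prev            # one of -2, -1, 0, 1, 2
        prev = hi
        # x << i is x aligned to this group; the mask drops any carry out.
        p = (p + digit * (x << i)) & mask
        trace.append(format(p, f"0{width}b"))
    return trace


print(booth_multiply_trace(0b1001, 0b0111, 4, 8))
# ['11110111', '00111111']
```

Two additions are performed for the 4-bit multiplier, in line with the N/2 count stated above.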



Review: A 64-bit Adder/Subtractor
• Ripple Carry Adder (RCA) built out of 64 FAs
• Subtraction: complement all subtrahend bits (XOR gates) and set the low-order carry-in
• RCA
  - advantage: simple logic, so small (low cost)
  - disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption)

[Figure: a chain of 64 1-bit FAs. The add/subt control drives C0 = Cin; FA i takes A_i, B_i and C_i and produces S_i and C_{i+1}, from A0, B0, S0 up to A63, B63, S63; C64 = Cout.]
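The adder/subtractor behavior can be modeled bit by bit. The sketch below is illustrative only; the function name and the default width parameter are assumptions, and operands are taken as non-negative bit patterns.

```python
def ripple_add_sub(a: int, b: int, subtract: bool, n: int = 64) -> tuple[int, int]:
    """n-bit ripple carry add/subtract; returns (result bits, carry out)."""
    ctl = 1 if subtract else 0
    carry = ctl                          # C0 = Cin is the add/subt control
    result = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ ctl        # XOR gate complements B when subtracting
        s = ai ^ bi ^ carry              # 1-bit full adder sum
        carry = (ai & bi) | (carry & (ai ^ bi))
        result |= s << i
    return result, carry


print(ripple_add_sub(5, 3, False, 8))   # (8, 0)
print(ripple_add_sub(5, 3, True, 8))    # (2, 1)
print(ripple_add_sub(3, 5, True, 8))    # (254, 0), i.e. -2 in 8-bit two's complement
```

The carry variable is updated once per bit position, which mirrors the O(N) delay noted on the slide.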



Booth structure



Wallace-Tree Multiplier

[Figure: dot diagrams of Wallace-tree reduction over bit positions 6 to 0: (a) partial products, (b) first stage, (c) second stage, (d) final adder. Legend: FA, HA.]



Wallace-Tree Multiplier

[Figure: Wallace tree for a 4x4 multiplier. The sixteen partial products x_i y_j (x0y0 through x3y3) enter at the top. Stage labels as drawn: First stage (HA HA), Second stage (FA FA FA FA), Final adder, with outputs z7 z6 z5 z4 z3 z2 z1 z0.]

Full adder = (3,2) compressor
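Column compression with (3,2) compressors can be sketched as below. This is a simplified illustration, not the exact tree in the figure: it applies full adders greedily and never uses half adders, so its stage count can differ from the slide's. The function name is an assumption.

```python
def wallace_multiply(x: int, y: int, n: int) -> int:
    """Unsigned n-bit multiply by column compression, then one final addition."""
    # cols[k] holds every bit of weight 2**k that still has to be added.
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((x >> i) & 1) & ((y >> j) & 1))

    # Reduction stages: compress until no column is more than two bits high.
    while any(len(c) > 2 for c in cols):
        nxt = [[] for _ in range(len(cols) + 1)]
        for k, c in enumerate(cols):
            while len(c) >= 3:
                a, b, d = c.pop(), c.pop(), c.pop()
                nxt[k].append(a ^ b ^ d)                        # sum stays in column k
                nxt[k + 1].append((a & b) | (a & d) | (b & d))  # carry moves up one column
            nxt[k].extend(c)                                    # leftover bits pass through
        cols = nxt

    # Final carry-propagate addition of the two remaining rows.
    row_a = sum(c[0] << k for k, c in enumerate(cols) if len(c) > 0)
    row_b = sum(c[1] << k for k, c in enumerate(cols) if len(c) > 1)
    return row_a + row_b


print(wallace_multiply(13, 11, 4))  # 143
```

Carries produced in one stage are only consumed in the next stage, so all compressors within a stage operate in parallel.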


Making it Faster: Tree Multiplier Structure
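[Figure: tree multiplier structure. Multiple-forming circuits select 0 or D (the multiplicand, ‘icand) under control of Q (the multiplier, ‘ier); the resulting partial product array feeds a reduction tree, and a fast carry propagate adder (CPA) produces P (the product). Annotations: mux, interconnect, reduction tree (log N), CPA (log N).]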



(4,2) Counter
• Built out of two (3,2) counters (just FAs!)
  - all of the inputs (4 external plus one internal) have the same weight (i.e., are in the same bit position)
  - the internal carry output is fed to the next higher weight position (indicated by the →)

[Figure: two stacked (3,2) counters forming one (4,2) counter]

Note: Two carry outs, one “internal” and one “external”
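A (4,2) counter built from two full adders can be modeled directly. The names `counter_4_2`, `carry` and `cout_internal` are illustrative assumptions; the wiring follows the description above (three external inputs into the first adder, the fourth input and the internal carry-in into the second).

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """(3,2) counter: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def counter_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """Returns (sum, external carry, internal carry out)."""
    s1, cout_internal = full_adder(x1, x2, x3)   # first (3,2) counter
    s, carry = full_adder(s1, x4, cin)           # second (3,2) counter
    return s, carry, cout_internal


# All five inputs have weight 1; both carry outputs have weight 2.
print(counter_4_2(1, 1, 1, 1, 1))  # (1, 1, 1), i.e. 5 = 1 + 2 + 2
```

The internal carry out depends only on x1, x2 and x3, never on cin, so a row of these cells has no ripple path through the internal carries.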



Tiling (4,2) Counters

[Figure: three (4,2) counters side by side, each built from two stacked (3,2) counters]

• Reduces columns four high to columns only two high
• Tiles with neighboring (4,2) counters
  - Internal carry in at same “level” (i.e., bit position weight) as the internal carry out
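Tiling can be illustrated by reducing four rows to two with one (4,2) counter per bit position. This is a sketch under assumptions: rows are unsigned values below 2**n, the function names are hypothetical, and one extra bit position is added to absorb the last internal carry.

```python
def majority(a: int, b: int, c: int) -> int:
    return (a & b) | (a & c) | (b & c)


def compress_4_to_2(rows: tuple[int, int, int, int], n: int) -> tuple[int, int]:
    """Reduce four n-bit rows to a sum row and a carry row with the same total."""
    sum_row = 0
    carry_row = 0
    cin = 0                                # internal carry into bit position 0
    for k in range(n + 1):                 # extra position takes the last internal carry
        a, b, c, d = ((r >> k) & 1 for r in rows)
        s1 = a ^ b ^ c                     # first (3,2) counter
        cout_internal = majority(a, b, c)
        s = s1 ^ d ^ cin                   # second (3,2) counter
        carry = majority(s1, d, cin)
        sum_row |= s << k
        carry_row |= carry << (k + 1)      # external carry, one weight higher
        cin = cout_internal                # handed sideways to the neighboring tile
    return sum_row, carry_row


s, c = compress_4_to_2((13, 7, 9, 15), 4)
print(s + c)  # 44
```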







4x4 Partial Product Array Reduction
• Fast 4x4 multiplication using (4,2) counters
• How would you lay it out?

[Figure: the multiplicand and multiplier form the partial product array; five (4,2) counters reduce it to a reduced pp array (to CPA); a 5-bit CPA produces the 8-bit double-precision product]



8x8 Partial Product Array Reduction
• Wallace tree multiplier

[Figure: the ‘icand and ‘ier form the 8x8 partial product array; two rows of nine (4,2) counters reduce it to a reduced partial product array; one row of thirteen (4,2) counters reduces that again, and the result goes to a 13-bit fast CPA]


An 8x8 Multiplier Layout
• How should it be laid out?

[Figure: layout stack, top to bottom: multiplicand and multiplier inputs; nine (4,2) counters; nine (4,2) counters; thirteen (4,2) counters; 13-bit CPA]



Why Not Recode?
• Multiplier recoding (modified Booth’s, canonical, P) recodes the multiplier to allow base 4 multiplication with simple multiple formation
  - with recoding, the base 4 multiplier digit set is -2, -1, 0, 1, 2

• Thus, with recoding the initial partial product array is only N/2 high

• But the first level of (4,2) counters also reduces the partial product array to N/2 high

[Figure: an N-high, 2N-wide partial product array reduced to N/2 high]

• Which is better depends on the logic delay (recoding wins) and interconnect complexity (counters win big)

Hitachi 54x54b Multiplier
• A 4.4 ns CMOS 54x54 multiplier using pass-transistor multiplexer



Hitachi Multiplier: Booth encoder and PPG




Hitachi multiplier: 4-2 compressor




What is the state of the art?

ISSCC 2003

Multipliers: Summary

• Optimization goals differ from those of the binary adder

• Once Again: Identify Critical Path

• Other possible techniques


- Logarithmic versus Linear (Wallace Tree Mult)
- Data encoding (Booth)
- Pipelining
FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION



Next Lecture and Reminders
• Next lecture
  - Shifters, decoders, and multiplexers
    - Reading assignment: Rabaey et al., 11.5-11.6

