ALU for Computers (MIPS)


design a fast ALU for the MIPS ISA
requirements?
support the arithmetic/logic operations: add, addi, addiu,
sub, subu, and, or, andi, ori, xor, xori, slt, slti, sltu, sltiu

design a multiplier
design a divider

Review Digital Logic


Gates:

Combinational Logic

Review Digital Logic


PLA: AND array, OR array

Review Digital Logic

A D latch implemented with NOR gates.

A D flip-flop with a falling-edge trigger.

Review Digital Logic


[Figure: D flip-flop with input D, output Q, and clock CLK]

Value of D is sampled on the positive clock edge.
Q outputs the sampled value for the rest of the cycle.

Review: Edge-Triggering in Verilog


Module code has two bugs. Where?

module ff(D, Q, CLK);
  input D, CLK;
  output Q;
  always @ (CLK)
    Q <= D;
endmodule

Corrected version (Q is declared as reg, and the always block triggers only on the positive edge of CLK):

module ff(D, Q, CLK);
  input D, CLK;
  output Q;
  reg Q;
  always @ (posedge CLK)
    Q <= D;
endmodule

[Figure: traffic light with inputs CLK, Rst, and Change, and outputs R (red), Y (yellow), and G (green)]

If Rst == 1 on a positive CLK edge, RYG = 100.
If Change == 1 on a positive CLK edge, the traffic light changes.

[Figure: state diagram. Rst == 1 leads to RYG = 100; each Change == 1 advances RYG 100 -> 001 -> 010 -> 100]

[Figure: the same state diagram with a timing diagram of Change and RYG; on successive Change pulses RYG steps through 100, 001, 010, 100]

One-Hot Encoding

[Figure: the same state diagram alongside the D flip-flops that hold the state]

[Figure: the same state diagram alongside a block diagram: Rst and Change feed the Next State Combinational Logic, whose outputs drive the D inputs of the flip-flops with Q outputs R, Y, and G]

State Elements: Traffic Light Controller

wire next_R, next_Y, next_G;
output R, Y, G;

???

Value of D is sampled on the positive clock edge.
Q outputs the sampled value for the rest of the cycle.

module ff(Q, D, CLK);
  input D, CLK;
  output Q;
  reg Q;
  always @ (posedge CLK)
    Q <= D;
endmodule

State Elements: Traffic Light Controller

wire next_R, next_Y, next_G;
output R, Y, G;
ff ff_R(R, next_R, CLK);
ff ff_Y(Y, next_Y, CLK);
ff ff_G(G, next_G, CLK);

Next State Logic: Traffic Light Controller

[Figure: inputs Rst and Change feed the Next State Combinational Logic block, which drives next_R, next_Y, and next_G]

wire next_R, next_Y, next_G;

assign next_R = rst ? 1'b1 : (change ? Y : R);
assign next_Y = rst ? 1'b0 : (change ? G : Y);
assign next_G = rst ? 1'b0 : (change ? R : G);

State elements and next-state logic together (RYG steps 100 -> 001 -> 010 -> 100, matching the state diagram):

wire next_R, next_Y, next_G;
output R, Y, G;
assign next_R = rst ? 1'b1 : (change ? Y : R);
assign next_Y = rst ? 1'b0 : (change ? G : Y);
assign next_G = rst ? 1'b0 : (change ? R : G);
ff ff_R(R, next_R, CLK);
ff ff_Y(Y, next_Y, CLK);
ff ff_G(G, next_G, CLK);

Logic Diagram: Traffic Light Controller


[Figure: the state diagram (Rst == 1 gives RYG = 100; each Change == 1 steps RYG 100 -> 001 -> 010) alongside the Next State Combinational Logic block driving the D flip-flops for R, Y, and G]

ALU for MIPS ISA


design a 1-bit ALU using an AND gate, an OR gate, a full
adder, and a mux

ALU for MIPS ISA


design a 32-bit ALU
by cascading 32 1-bit ALUs

ALU for MIPS


a 1-bit ALU performing AND, OR, addition, and
subtraction

If we set Binvert = Carryin = 1,
then we can perform a - b
(in 2's complement, a + ~b + 1 = a - b)

ALU for MIPS


include a Less input for set-on-less-than (slt)

ALU for MIPS


design the most significant bit ALU
the most significant bit needs to do more work (detect
overflow; the MSB can also be used for slt)
how to detect an overflow
overflow = carryin[MSB] xor carryout[MSB]
overflow = 1 ; means overflow
overflow = 0 ; means no overflow

set-on-less-than
slt $1, $2, $3 ; if $2 < $3 then $1 = 1, else $1 = 0
               ; if the MSB of $2 - $3 is 1, then $1 = 1
               ; in 2's complement, the MSB of a negative number is 1
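
The cascade of 1-bit slices, the Binvert/Carryin trick, the overflow rule, and slt can be sketched in Python as follows. This is only an illustrative model, not the slides' hardware: the function names (`full_adder`, `alu_add_sub`, `slt`) and the `width` parameter are assumptions made for the example, and `slt` follows the slide's simplified rule (MSB of the difference), which ignores overflow.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def alu_add_sub(a, b, subtract, width=32):
    """Ripple-carry add/subtract built from 1-bit slices.

    subtract=1 models Binvert = Carryin = 1, so the adder computes
    a + ~b + 1 = a - b. Returns (result, overflow).
    """
    carry = subtract                # carry-in of bit 0
    result = 0
    carry_into_msb = 0
    for i in range(width):
        abit = (a >> i) & 1
        bbit = ((b >> i) & 1) ^ subtract    # Binvert
        if i == width - 1:
            carry_into_msb = carry
        s, carry = full_adder(abit, bbit, carry)
        result |= s << i
    overflow = carry_into_msb ^ carry       # carryin[MSB] xor carryout[MSB]
    return result, overflow


def slt(a, b, width=32):
    """Set-on-less-than as described on the slide: MSB of a - b."""
    diff, _ = alu_add_sub(a, b, 1, width)
    return (diff >> (width - 1)) & 1
```

With `width=4`, `alu_add_sub(0b0111, 0b0001, 0, 4)` returns `(0b1000, 1)`: 7 + 1 does not fit in a signed 4-bit value, so the overflow bit is set.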


ALU for MIPS


a 1-bit ALU for the MSB

Overflow = Carryin XOR Carryout

A 32-bit ALU
constructed from
32 1-bit ALUs


A 32-bit ALU
with zero detector


A Verilog behavioral definition of a MIPS ALU.


ALU for MIPS


the critical path of a 32-bit ripple carry adder is 32 x the carry
propagation delay
how to solve this problem?
design trick: use more hardware
design trick: look ahead, peek
carry lookahead adder (CLA)

CLA
a  b  cout
0  0  0      nothing happens
0  1  cin    propagate cin
1  0  cin    propagate cin
1  1  1      generate

propagate = a + b
generate = ab

ALU for MIPS


CLA using 4 bits as an example
two 4-bit numbers: a3a2a1a0, b3b2b1b0
pi = ai + bi; gi = aibi (for example, p0 = a0 + b0; g0 = a0b0)
c1 = g0 + p0c0
c2 = g1 + p1c1
c3 = g2 + p2c2
c4 = g3 + p3c3
larger CLA adders can be constructed by cascading 4-bit CLA adders
other adders: carry select adder, carry skip adder
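
As a rough illustration of the 4-bit CLA equations, the sketch below expands each carry so that it depends only on the p and g signals and c0, which is what lets the hardware compute all carries in parallel. The function name `cla4` and its return convention are assumptions made for this example.

```python
def cla4(a, b, c0=0):
    """4-bit carry-lookahead addition. Returns (4-bit sum, carry-out c4)."""
    abits = [(a >> i) & 1 for i in range(4)]
    bbits = [(b >> i) & 1 for i in range(4)]
    p = [x | y for x, y in zip(abits, bbits)]   # propagate = a + b (OR)
    g = [x & y for x, y in zip(abits, bbits)]   # generate  = ab   (AND)

    # c(i+1) = g(i) + p(i)c(i), with c(i) substituted all the way down to c0
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (abits[i] ^ bbits[i] ^ carries[i]) << i
    return total, c4
```

For instance, `cla4(0b0110, 0b0111)` gives `(0b1101, 0)`, that is, 6 + 7 = 13 with no carry out.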


Design Process
Divide and Conquer
using simple components
glue simple components together
work on the things you know how to do. The unknown
will become obvious as you make progress

Successive Refinement
multiplier design
divider design


Multiplier
paper and pencil method

multiplicand        0110
multiplier        x 1001
                    0110
                   0000
                  0000
                 0110
product          0110110

n bits x m bits = m+n bits
binary: 0 -> place 0
        1 -> place a copy of the multiplicand

Multiply Hardware Version 1


32 bits x 32 bits, using a 64-bit multiplicand register, a 64-bit ALU, a 64-bit product register, and a 32-bit multiplier register

[Figure: the 64-bit multiplicand register (shift left) and the 64-bit product register (write) feed the 64-bit ALU (ADD); the multiplier register shifts right; a control block drives them]

Control provides four control signals
Check the rightmost bit of the multiplier (Mr) to decide whether to add 0 or the multiplicand

Multiply Algorithm Version 1


1. test multiplier0 (i.e., bit 0 of the multiplier)
1a. if multiplier0 = 1, add the multiplicand to the product and place the result in the product register
2. shift the multiplicand left 1 bit
3. shift the multiplier right 1 bit
4. 32nd repetition? if yes, done;
if no, go to 1.
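
The version 1 loop can be modeled in a few lines of Python. This is a sketch for unsigned operands only; the name `multiply_v1` and the `n` parameter (operand width, 32 on the slides) are assumptions for the example.

```python
def multiply_v1(multiplicand, multiplier, n=32):
    """Unsigned shift-and-add multiply with a 2n-bit multiplicand register."""
    mask = (1 << (2 * n)) - 1
    mcand = multiplicand & ((1 << n) - 1)   # starts in the right half
    mplier = multiplier & ((1 << n) - 1)
    product = 0
    for _ in range(n):
        if mplier & 1:                      # steps 1 and 1a
            product = (product + mcand) & mask
        mcand = (mcand << 1) & mask         # step 2: shift multiplicand left
        mplier >>= 1                        # step 3: shift multiplier right
    return product
```

Calling `multiply_v1(0b0010, 0b0101, n=4)` yields `0b00001010`, i.e. 2 x 5 = 10.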


Multiply Algorithm Version 1 Example


0010 x 0101 = 0000 1010

iter.  step     multiplier  multiplicand  product
0      initial  0101        0000 0010     0000 0000
1      1.a      0101        0000 0010     0000 0010
       2        0101        0000 0100     0000 0010
       3        0010        0000 0100     0000 0010
2      2        0010        0000 1000     0000 0010
       3        0001        0000 1000     0000 0010
3      1.a      0001        0000 1000     0000 1010
       2        0001        0001 0000     0000 1010
       3        0000        0001 0000     0000 1010
4      2        0000        0010 0000     0000 1010
       3        0000        0010 0000     0000 1010

Multiplier Algorithm Version 1

observations from version 1


1/2 of the bits in the multiplicand are always 0,
so a 64-bit adder is wasted (for 32 bits x 32 bits)
0s are inserted into the multiplicand as it is shifted left; the least
significant bits of the product do not change once
formed
3 steps per bit
shift the product to the right instead of shifting the multiplicand to the
left? (by adding to the left half of the product register)

Multiply Hardware Version 2


32-bit multiplicand register, 32-bit ALU, 64-bit product register, 32-bit multiplier register

[Figure: the 32-bit multiplicand register and the left 32 bits of the product register feed the 32-bit ALU (ADD); the product register shifts right and is written under control; the multiplier register shifts right]

Check the rightmost bit of the multiplier (Mr) to decide whether to add 0 or the multiplicand
Write into the left half of the product register

Multiply Algorithm Version 2


1. test multiplier0 (i.e., bit 0 of the multiplier)
1a. if multiplier0 = 1, add the multiplicand to the left half of the product and place the result in the left half of the product register
2. shift the product register right 1 bit
3. shift the multiplier register right 1 bit
4. 32nd repetition? if yes, done;
if no, go to 1.
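
A minimal Python model of version 2 follows, again for unsigned operands. The name `multiply_v2` and the `n` parameter are assumptions for the example. A Python integer silently keeps the carry out of the left-half addition; real hardware has to save that carry bit before the shift.

```python
def multiply_v2(multiplicand, multiplier, n=32):
    """Unsigned multiply: n-bit multiplicand, 2n-bit product shifted right."""
    half = (1 << n) - 1
    mcand = multiplicand & half
    mplier = multiplier & half
    product = 0
    for _ in range(n):
        if mplier & 1:              # steps 1 and 1a
            product += mcand << n   # add into the left half (carry is kept)
        product >>= 1               # step 2: shift product right
        mplier >>= 1                # step 3: shift multiplier right
    return product
```

`multiply_v2(0b0010, 0b0011, n=4)` returns `0b00000110`, i.e. 2 x 3 = 6.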


Multiply Algorithm Version 2 Example


0010 x 0011 = 0000 0110

iter.  step     multiplier  multiplicand  product
0      initial  0011        0010          0000 0000
1      1.a      0011        0010          0010 0000
       2        0011        0010          0001 0000
       3        0001        0010          0001 0000
2      1.a      0001        0010          0011 0000
       2        0001        0010          0001 1000
       3        0000        0010          0001 1000
3      2        0000        0010          0000 1100
       3        0000        0010          0000 1100
4      2        0000        0010          0000 0110
       3        0000        0010          0000 0110

Multiply Version 2
Observations
the product register wastes space that exactly matches the size
of the multiplier
3 steps per bit
combine the multiplier register and the product register

Multiply Hardware Version 3


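32-bit multiplicand register, 32-bit ALU, 64-bit product
register; the multiplier register is part of the product register

[Figure: the multiplicand register feeds the 32-bit ALU (ADD), which writes into the left half of the product (multiplier) register; the product register shifts right under control]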

Multiply Algorithm Version 3


1. test product0 (the multiplier is in the right half of the product register)
1a. if product0 = 1, add the multiplicand to the left half of the product and place the result in the left half of the product register
2. shift the product register right 1 bit
3. 32nd repetition? if yes, done;
if no, go to 1.
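
Version 3 can be sketched the same way; the only change is that the multiplier starts out in the right half of the product register. The name `multiply_v3` and the `n` parameter are assumptions for the example, and the Python integer again stands in for the saved carry bit.

```python
def multiply_v3(multiplicand, multiplier, n=32):
    """Unsigned multiply with the multiplier held in the product register."""
    half = (1 << n) - 1
    mcand = multiplicand & half
    product = multiplier & half     # multiplier occupies the right half
    for _ in range(n):
        if product & 1:             # steps 1 and 1a: test product0
            product += mcand << n   # add into the left half (carry is kept)
        product >>= 1               # step 2: shift product right
    return product
```

`multiply_v3(0b1110, 0b1011, n=4)` returns `0b10011010`, i.e. 14 x 11 = 154.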


Multiply Algorithm Version 3 Example


1110 x 1011 = 1001 1010 (14 x 11 = 154)

iter.  step     multiplicand  product
0      initial  1110          0000 1011
1      1.a      1110          1110 1011
       2        1110          0111 0101
2      1.a      1110          10101 0101   (need to save the carry)
       2        1110          1010 1010
3      2        1110          0101 0101
4      1.a      1110          10011 0101   (need to save the carry)
       2        1110          1001 1010

Multiply Algorithm Version 3


Observations
2 steps per bit: because the multiplier and product share one register,
only one right shift is needed (rather than two in version 1 and version
2)
MIPS registers Hi and Lo correspond to the left and right halves of the
product
MIPS has the instruction multu
How about signed numbers in multiplication?
method 1: keep the sign of both numbers and use the magnitudes
for multiplication; after 32 repetitions, change the product to the
appropriate sign.
method 2: Booth's algorithm
Booth's algorithm is more elegant for signed number multiplication
Booth's algorithm uses the same hardware as version 3

Booth's Algorithm
Motivation for Booth's algorithm is speed
example: 2 x 6 = 0010 x 0110
normal approach
0010
0110

Booth's approach
0010
0110

Booth's approach: replace a string of 1s in the multiplier by two actions

action 1: beginning of a string of 1s, subtract the multiplicand
action 2: end of a string of 1s, add the multiplicand

Booth's Algorithm

[Figure: the bit string 011111111111111111110 with the end of run, middle of run, and beginning of run marked]

current bit  bit to the right (previous bit)  explanation               action
1            0                                beginning of a run of 1s  subtract multiplicand from left half of product
1            1                                middle of a run of 1s     no arithmetic operation
0            1                                end of a run of 1s        add multiplicand to left half of product
0            0                                middle of a run of 0s     no arithmetic operation
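
The four cases above can be sketched in Python on top of the version 3 register layout. This is an illustrative model under stated assumptions: the name `booth_multiply` and the `n` parameter are invented for the example, the right shift is arithmetic (the sign bit is replicated), and the left half is only n bits wide, so the most negative multiplicand (-2^(n-1)) is not handled.

```python
def booth_multiply(multiplicand, multiplier, n=32):
    """Signed n-bit multiply using Booth's algorithm; returns a Python int."""
    half = (1 << n) - 1
    full = (1 << (2 * n)) - 1
    mcand = multiplicand & half
    product = multiplier & half     # multiplier starts in the right half
    prev = 0                        # the bit to the right of bit 0
    for _ in range(n):
        cur = product & 1
        if (cur, prev) == (1, 0):   # beginning of a run of 1s: subtract
            product = (product - (mcand << n)) & full
        elif (cur, prev) == (0, 1): # end of a run of 1s: add
            product = (product + (mcand << n)) & full
        prev = cur
        sign = product >> (2 * n - 1)
        product = (product >> 1) | (sign << (2 * n - 1))  # arithmetic shift
    if product >> (2 * n - 1):      # reinterpret as a 2's complement value
        product -= 1 << (2 * n)
    return product
```

`booth_multiply(-2, 7, n=4)` returns `-14`, and the intermediate register values are 0010 0111, 0001 0011, 0000 1001, 0000 0100, 1110 0100, and 1111 0010.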


Booth's Algorithm Example

-2 x 7 = -14; in signed binary, 1110 x 0111 = 1111 0010
To begin with, we put the multiplier in the right half of the product register.

iteration  step         multiplicand  product    previous bit
0          initial      1110          0000 0111  0
1          sub.         1110          0010 0111  0
           shift right  1110          0001 0011  1
2          shift right  1110          0000 1001  1
3          shift right  1110          0000 0100  1
4          add          1110          1110 0100  1
           shift right  1110          1111 0010  0

Divide Algorithm
Paper and pencil
divisor 1011

1010101010

quotient
dividend

remainder (modulo )


Divide Hardware Version 1


64-bit divisor register, 64-bit ALU, 32-bit quotient register, 64-bit remainder register

[Figure: the divisor register (shift right) and the remainder register (write) feed the 64-bit ALU; the quotient register shifts left; a control block drives them]

put the dividend in the remainder register initially

Divide Algorithm Version 1


start: place the dividend in the remainder register
1. subtract the divisor from the remainder and place the result in the remainder
2. test the remainder
2a. if remainder >= 0, shift the quotient left, setting the new rightmost bit to 1
2b. if remainder < 0, restore the original value by adding the divisor to the remainder and placing the sum in the remainder; shift the quotient left, setting the new least significant bit to 0
3. shift the divisor right 1 bit
4. n+1 repetitions? if yes, done; if no, go to 1.
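
A short Python sketch of version 1 is given below for unsigned operands and a nonzero divisor. The name `divide_v1` and the `n` parameter are assumptions for the example, and a signed Python integer stands in for the 2's complement remainder register when testing for a negative result.

```python
def divide_v1(dividend, divisor, n=32):
    """Restoring division with a 2n-bit divisor register shifted right.

    Returns (quotient, remainder). The divisor must be nonzero.
    """
    remainder = dividend            # start: dividend in the remainder register
    div = divisor << n              # divisor starts in the left half
    quotient = 0
    for _ in range(n + 1):
        remainder -= div                    # step 1
        if remainder >= 0:                  # step 2a
            quotient = (quotient << 1) | 1
        else:                               # step 2b: restore
            remainder += div
            quotient <<= 1
        div >>= 1                           # step 3
    return quotient & ((1 << n) - 1), remainder
```

`divide_v1(0b0111, 0b0010, n=4)` returns `(3, 1)`: 7 / 2 = 3 with remainder 1, after n + 1 = 5 iterations.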

Divide Algorithm Version 1 Example


0000 0111 / 0010 (7 / 2): quotient 0011, remainder 0001

iter.  step     quotient  divisor    remainder
0      initial  0000      0010 0000  0000 0111
1      1        0000      0010 0000  1110 0111
       2b       0000      0010 0000  0000 0111
       3        0000      0001 0000  0000 0111
2      1        0000      0001 0000  1111 0111
       2b       0000      0001 0000  0000 0111
       3        0000      0000 1000  0000 0111
3      1        0000      0000 1000  1111 1111
       2b       0000      0000 1000  0000 0111
       3        0000      0000 0100  0000 0111
4      1        0000      0000 0100  0000 0011
       2a       0001      0000 0100  0000 0011
       3        0001      0000 0010  0000 0011
5      1        0001      0000 0010  0000 0001
       2a       0011      0000 0010  0000 0001
       3        0011      0000 0001  0000 0001

Divide Algorithm Version 1


Observations
1/2 of the bits in the divisor are always 0
1/2 of the divisor register is wasted
1/2 of the 64-bit ALU is wasted

Possible improvement
instead of shifting the divisor to the right, shift the remainder to the
left?
the first step cannot produce a 1 in the quotient, so switch the order
to shift first and then subtract. This can save one
iteration

Divide Hardware Version 2


32-bit divisor register, 32-bit ALU, 32-bit quotient register, 64-bit
remainder register

[Figure: the divisor register and the remainder register (shift left) feed the 32-bit ALU; the quotient register shifts left; a control block drives them]

Divide Algorithm Version 2


start: place the dividend in the remainder register
1. shift the remainder left 1 bit
2. subtract the divisor from the left half of the remainder and place the result in the left half of the remainder
3. test the remainder
3a. if remainder >= 0, shift the quotient left, setting the new rightmost bit to 1
3b. if remainder < 0, restore the original value by adding the divisor to the left half of the remainder and placing the sum in the left half of the remainder; also shift the quotient left, setting the new least significant bit to 0
4. n repetitions? if yes, done;
if no, go to 1.
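
Version 2 can be modeled as below, for unsigned operands and a nonzero divisor. The name `divide_v2` and the `n` parameter are assumptions for the example; as before, a signed Python integer replaces the sign test on a fixed-width register.

```python
def divide_v2(dividend, divisor, n=32):
    """Restoring division: n-bit divisor, 2n-bit remainder shifted left.

    Returns (quotient, remainder). The divisor must be nonzero.
    """
    remainder = dividend                    # start
    quotient = 0
    for _ in range(n):
        remainder <<= 1                     # step 1
        remainder -= divisor << n           # step 2: subtract from left half
        if remainder >= 0:                  # step 3a
            quotient = (quotient << 1) | 1
        else:                               # step 3b: restore
            remainder += divisor << n
            quotient <<= 1
    return quotient, remainder >> n         # remainder ends in the left half
```

`divide_v2(0b1111, 0b0011, n=4)` returns `(5, 0)`: 15 / 3 = 5 with remainder 0.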


Divide Algorithm Version 2 Example


0000 1111 / 0011 (15 / 3): quotient 0101, remainder 0000

iter.  step     quotient  divisor  remainder
0      initial  0000      0011     0000 1111
1      1        0000      0011     0001 1110
       2        0000      0011     1110 1110
       3b       0000      0011     0001 1110
2      1        0000      0011     0011 1100
       2        0000      0011     0000 1100
       3a       0001      0011     0000 1100
3      1        0001      0011     0001 1000
       2        0001      0011     1110 1000
       3b       0010      0011     0001 1000
4      1        0010      0011     0011 0000
       2        0010      0011     0000 0000
       3a       0101      0011     0000 0000

Divide Algorithm Version 2


Observations
3 steps (shift remainder left, subtract, shift quotient left)

Further improvement (version 3)

eliminate the quotient register by combining it with the remainder
register as it is shifted left
the loop then contains only two steps, because shifting the remainder
register shifts the remainder in the left half and
the quotient in the right half at the same time
a consequence of combining the two registers is that
the remainder is shifted one time too many in the last
iteration
final correction step: shift back the remainder in the left
half of the remainder register (i.e., shift only the left half
right 1 bit)

Divide Hardware Version 3


32-bit divisor register, 32-bit ALU, 64-bit remainder
register, 0-bit quotient register (quotient bits shift into the
remainder register as the remainder register shifts left)

[Figure: the 32-bit divisor register feeds the 32-bit ALU; the 64-bit remainder/quotient register shifts left and is written under control]

Divide Algorithm Version 3


start: place the dividend in the remainder register
1. shift the remainder left 1 bit
2. subtract the divisor from the left half of the remainder and place the result in the left half of the remainder
3. test the remainder
3a. if remainder >= 0, shift the remainder left, setting the new rightmost bit to 1
3b. if remainder < 0, restore the original value by adding the divisor to the left half of the remainder and placing the sum in the left half of the remainder; also shift the remainder left, setting the new least significant bit to 0
4. n repetitions? if yes, done;
if no, go to 2.
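
The combined remainder/quotient register of version 3 can be sketched as follows, for unsigned operands and a nonzero divisor. The name `divide_v3` and the `n` parameter are assumptions for the example. A Python integer can grow past 2n bits on the last shift, which stands in for the extra bit a real register would need; the final correction shifts only the left half right by one.

```python
def divide_v3(dividend, divisor, n=32):
    """Restoring division with quotient bits shifted into the remainder register.

    Returns (quotient, remainder). The divisor must be nonzero.
    """
    half = (1 << n) - 1
    reg = dividend << 1                 # start, then step 1: one shift left
    for _ in range(n):
        reg -= divisor << n             # step 2: subtract from left half
        if reg >= 0:                    # step 3a
            reg = (reg << 1) | 1
        else:                           # step 3b: restore, then shift in a 0
            reg += divisor << n
            reg <<= 1
    quotient = reg & half               # right half
    remainder = (reg >> n) >> 1         # correction: left half shifted right
    return quotient, remainder
```

`divide_v3(0b1110, 0b0101, n=4)` returns `(2, 4)`: 14 / 5 = 2 with remainder 4, and the register passes through 0001 1100, 0011 1000, 0111 0000, 0100 0001, and 1000 0010 on the way.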


Divide Algorithm Version 3 Example


0000 1110 / 0101 (14 / 5)

iter.  step     divisor  remainder
0      initial  0101     0000 1110
       1        0101     0001 1100
1      2        0101     1100 1100
       3b       0101     0011 1000
2      2        0101     1110 1000
       3b       0101     0111 0000
3      2        0101     0010 0000
       3a       0101     0100 0001
4      2        0101     1111 0001
       3b       0101     1000 0010
correction step: shift the left half of the remainder register right 1 bit: 0100 0010
(remainder 0100 in the left half, quotient 0010 in the right half)

Divide Algorithm Version 3


Observations
same hardware as multiply: need a 32-bit ALU to add and
subtract and a 64-bit register to shift left and right
divide algorithm version 3 is called the restoring division algorithm
for unsigned numbers

Signed numbers divide

simplest method
remember the signs of the dividend and divisor, make them positive, and
finally complement the quotient and remainder as necessary
dividend and remainder must have the same sign
quotient is negative if the dividend sign and divisor sign
disagree
SRT (named after three persons) method
an efficient algorithm

Floating Point Numbers


What can be represented in N bits?
unsigned         0               to  2^N - 1
2's complement   -2^(N-1)        to  2^(N-1) - 1
1's complement   -2^(N-1) + 1    to  2^(N-1) - 1
BCD              0               to  10^(N/4) - 1

How about
very small numbers, very large numbers
rationals, such as 2/3; irrationals, such as sqrt(2);
transcendentals, such as e, pi

Floating Point Numbers


Mantissa (aka Significand), Exponent (using radix of
10)
6.12 x 10^23

1.M x 2^(E-127)
IEEE standard F.P.
single precision: S (1 bit), E (8 bits), M (23 bits)
mantissa = sign + magnitude; magnitude is normalized with
hidden integer bit: 1.M
exponent = E - 127 (excess 127), 0 < E < 255
a FP number N = (-1)^S x 2^(E-127) x (1.M)
0 = 0 00000000 00000000000000000000000
-1.5 = 1 01111111 10000000000000000000000

Floating Point Numbers


Single Precision FP numbers
-0.75 = __________________________________
-5.0 = ___________________________________
7 = ____________________________________

-0.75 = -0.11b = -1.1 x 2^-1     E = 126
-5.0 = -101.0b = -1.01 x 2^2     E = 129
7 = 111b = 1.11 x 2^2            E = 129

-0.75 = 1 01111110 10000.......0
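
The S, E, and M fields of these values can be checked with a small Python sketch that relies on the standard `struct` module to obtain the single-precision bit pattern. The helper names `float_to_fields` and `fields_to_value` are assumptions for this example, and `fields_to_value` only covers normalized numbers (0 < E < 255).

```python
import struct


def float_to_fields(x):
    """Return (S, E, M) of x encoded as an IEEE 754 single-precision number."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def fields_to_value(s, e, m):
    """Evaluate (-1)^S x 2^(E-127) x (1.M) for a normalized number."""
    return (-1) ** s * 2.0 ** (e - 127) * (1 + m / 2 ** 23)


for value in (-0.75, -5.0, 7.0):
    s, e, m = float_to_fields(value)
    # Print the three fields as sign, 8-bit exponent, 23-bit significand
    print(f"{value}: {s} {e:08b} {m:023b}")
```

For -0.75 this reports S = 1, E = 126 (01111110), and M = 100...0, matching the encoding above; `fields_to_value(1, 126, 1 << 22)` evaluates back to -0.75.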


Floating Point Numbers


Single precision FP number
What is the smallest number in magnitude?
(1.0) x 2^-126
What is the largest number in magnitude?
(1.11111111111111111111111)binary x 2^127 = (2 - 2^-23) x 2^127

Floating Point Numbers


single precision FP numbers

Exponent   Significand   Object represented
0          0             0
0          nonzero       denormalized numbers
1 to 254   anything      floating point numbers
255        0             infinity
255        nonzero       NaN (Not a Number)

other topics in FP numbers
1. extra bits for rounding
2. guard bit, sticky bit
3. algorithms for FP numbers

Floating Point Numbers


Double precision
64 bits total
52-bit significand
11-bit exponent (excess 1023 bias)
Number is: (-1)^S x (1.M) x 2^(E-1023)

Basic Addition Algorithm


Steps for Y + X, assuming Y >= X
1. Align binary points (denormalize smaller number)
   a. compute Diff = Exp(Y) - Exp(X); Exp = Exp(Y)
   b. Sig(X) = Sig(X) >> Diff
2. Add the aligned components
   Sig = Sig(X) + Sig(Y)
3. Normalize the sum
   a. shift Sig right/left until the leading bit is 1, incrementing
      or decrementing Exp
   b. check for overflow in Exp
   c. round
   d. repeat step 3 if not still normalized
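
The align, add, and normalize steps can be sketched in Python for positive operands. This is a simplified model under stated assumptions: the name `fp_add` and the `frac_bits` parameter are invented for the example, a significand 1.xxxx is held as an integer with `frac_bits` fraction bits, bits shifted out during alignment are simply discarded, and the overflow check and rounding of step 3 are left out.

```python
def fp_add(sig_y, exp_y, sig_x, exp_x, frac_bits=4):
    """Add Y + X (Y >= X > 0); each value is sig * 2**(exp - frac_bits).

    Returns the normalized (sig, exp) of the sum.
    """
    diff = exp_y - exp_x                # step 1a
    exp = exp_y
    sig_x >>= diff                      # step 1b: align (low bits discarded)
    sig = sig_y + sig_x                 # step 2: add aligned significands
    while sig >= (2 << frac_bits):      # step 3: sum is 10.xxxx or larger
        sig >>= 1
        exp += 1
    while sig and sig < (1 << frac_bits):   # step 3: leading bit is not 1
        sig <<= 1
        exp -= 1
    return sig, exp
```

`fp_add(0b10110, 3, 0b11000, 2)` returns `(0b10001, 4)`, i.e. 1.0110 x 2^3 + 1.1000 x 2^2 = 1.0001 x 2^4 (11 + 6 = 17).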


Addition Example
4-bit significand
1.0110 x 2^3 + 1.1000 x 2^2

align binary points (denormalize smaller number)
   1.0110 x 2^3
   0.1100 x 2^3

Add the aligned components
  10.0010 x 2^3

Normalize the sum
   1.0001 x 2^4
No overflow, no rounding

Another Addition Example


1.0001 x 2^3 - 1.1110 x 2^1
4-bit significand; extra bit needed for accuracy

1. Align binary point:
     1.0001  x 2^3
   - 0.01111 x 2^3
2. Subtract the aligned components
     0.10011 x 2^3
3. Normalize
     1.0011 x 2^2 = 4.75
Without the extra bit, the result would be 0.1001 x 2^3 = 100.1b =
4.5, which is off by 0.25. This is too much!

Accuracy and Rounding


Want arithmetic to be fully precise
IEEE 754 keeps two extra digits on the right during
intermediate calculations (guard digit, round digit)

Alignment step can cause data to be discarded (shifted
out on right)
2.56 x 10^0 + 2.34 x 10^2
    2.3400 x 10^2
  + 0.0256 x 10^2
    2.3656 x 10^2    (5 is the guard digit, 6 is the round digit)
Answer = 2.37 x 10^2
Without using guard and round digits,
the answer would be 2.36 x 10^2
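
The effect of the extra digits can be reproduced with integer arithmetic. The sketch below is a decimal toy model, not IEEE 754 itself: the name `add_aligned` and its parameters are assumptions for the example, significands are three-digit integers (2.34 is passed as 234), rounding is round-half-up, and a carry out of the leading digit is not renormalized.

```python
def add_aligned(a_sig, b_sig, shift, extra_digits):
    """Add two 3-digit decimal significands after aligning the smaller one.

    b_sig is shifted right by `shift` decimal digits; only `extra_digits`
    digits beyond the third are kept, and anything further is discarded.
    Returns the 3-digit significand of the rounded sum.
    """
    scale = 10 ** extra_digits
    a = a_sig * scale
    b = (b_sig * scale) // (10 ** shift)    # alignment drops low digits
    total = a + b
    return (total + scale // 2) // scale    # round half up to 3 digits


with_extra = add_aligned(234, 256, shift=2, extra_digits=2)
without_extra = add_aligned(234, 256, shift=2, extra_digits=0)
print(with_extra, without_extra)
```

With two extra digits the sum 2.3656 rounds to 237 (2.37 x 10^2); with none, 0.0256 is cut to 0.02 before the add and the result is 236 (2.36 x 10^2).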
