Вы находитесь на странице: 1из 49

Chapter 3

Arithmetic for Computers

3.1 Introduction

Arithmetic for Computers


n

Operations on integers
n n n

Addition and subtraction Multiplication and division Dealing with overflow Representation and operations

Floating-point real numbers


n

Chapter 3 Arithmetic for Computers 2

3.2 Addition and Subtraction

Integer Addition
n

Example: 7 + 6

Overflow if result out of range


n n

Adding +ve and ve operands, no overflow Adding two +ve operands


n

Overflow if result sign is 1 Overflow if result sign is 0


Chapter 3 Arithmetic for Computers 3

Adding two ve operands


n

Integer Subtraction
n n

Add negation of second operand Example: 7 6 = 7 + (6)


+7: 6: +1: 0000 0000 0000 0111 1111 1111 1111 1010 0000 0000 0000 0001

Overflow if result out of range


n n

Subtracting two +ve or two ve operands, no overflow Subtracting +ve from ve operand
n

Overflow if result sign is 0 Overflow if result sign is 1

Subtracting ve from +ve operand


n

Chapter 3 Arithmetic for Computers 4

Dealing with Overflow


n

Some languages (e.g., C) ignore overflow


n

Use MIPS addu, addui, subu instructions

Other languages (e.g., Ada, Fortran) require raising an exception


n n

Use MIPS add, addi, sub instructions On overflow, invoke exception handler
n

n n

Save PC in exception program counter (EPC) register Jump to predefined handler address mfc0 (move from coprocessor reg) instruction can retrieve EPC value, to return after corrective action
Chapter 3 Arithmetic for Computers 5

Arithmetic for Multimedia


n

Graphics and media processing operates on vectors of 8-bit and 16-bit data
n

Use 64-bit adder, with partitioned carry chain


n

Operate on 88-bit, 416-bit, or 232-bit vectors

SIMD (single-instruction, multiple-data) On overflow, result is largest representable value


n

Saturating operations
n

c.f. 2s-complement modulo arithmetic

E.g., clipping in audio, saturation in video


Chapter 3 Arithmetic for Computers 6

3.3 Multiplication

Multiplication
n

Start with long-multiplication approach


1000 1001 1000 0000 0000 1000 1001000

multiplicand multiplier

product

Length of product is the sum of operand lengths

Chapter 3 Arithmetic for Computers 7

FIGURE 3.7 Multiply example using algorithm in Figure 3.5. The bit examined to determine the next step is circled in color. Copyright 2009 Elsevier, Inc. All rights reserved.

Chapter 3 Arithmetic for Computers 8

Multiplication Hardware

Initially 0

Chapter 3 Arithmetic for Computers 9

Optimized Multiplier
n

Perform steps in parallel: add/shift

One cycle per partial-product addition


n

Thats ok, if frequency of multiplications is low


Chapter 3 Arithmetic for Computers 10

Faster Multiplier
n

Uses multiple adders


n

Cost/performance tradeoff

Can be pipelined
n

Several multiplication performed in parallel


Chapter 3 Arithmetic for Computers 11

MIPS Multiplication
n

Two 32-bit registers for product


n n

HI: most-significant 32 bits LO: least-significant 32-bits mult rs, rt


n

Instructions
n

multu rs, rt

64-bit product in HI/LO

mfhi rd
n n

mflo rd

Move from HI/LO to rd Can test HI value to see if product overflows 32 bits Least-significant 32 bits of product > rd

mul rd, rs, rt


n

Chapter 3 Arithmetic for Computers 12

3.4 Division

Division
n n

Check for 0 divisor Long division approach


n

quotient dividend

If divisor dividend bits


n

1 bit in quotient, subtract 0 bit in quotient, bring down next dividend bit

divisor

1001 1000 1001010 -1000 10 101 1010 -1000 10 remainder

Otherwise
n

Restoring division
n

Do the subtract, and if remainder goes < 0, add divisor back Divide using absolute values Adjust sign of quotient and remainder as required

Signed division
n n

n-bit operands yield n-bit quotient and remainder

Chapter 3 Arithmetic for Computers 13

Division Hardware
Initially divisor in left half

Initially dividend

Chapter 3 Arithmetic for Computers 14

Optimized Divider

n n

One cycle per partial-remainder subtraction Looks a lot like a multiplier!


n

Same hardware can be used for both


Chapter 3 Arithmetic for Computers 15

Faster Division
n

Cant use parallel hardware as in multiplier


n

Subtraction is conditional on sign of remainder

Faster dividers (e.g. SRT devision) generate multiple quotient bits per step
n

Still require multiple steps

Chapter 3 Arithmetic for Computers 16

MIPS Division
n

Use HI/LO registers for result


n n

HI: 32-bit remainder LO: 32-bit quotient div rs, rt / divu rs, rt No overflow or divide-by-0 checking
n

Instructions
n n

Software must perform checks if required

Use mfhi, mflo to access result

Chapter 3 Arithmetic for Computers 17

3.5 Floating Point

Floating Point
n

Representation for non-integral numbers


n

Including very small and very large numbers 2.34 1056 +0.002 104 +987.02 109 1.xxxxxxx2 2yyyy
normalized not normalized

Like scientific notation


n n n

In binary
n

Types float and double in C


Chapter 3 Arithmetic for Computers 18

Floating Point Standard


n n

Defined by IEEE Std 754-1985 Developed in response to divergence of representations


n

Portability issues for scientific code

n n

Now almost universally adopted Two representations


n n

Single precision (32-bit) Double precision (64-bit)

Chapter 3 Arithmetic for Computers 19

IEEE Floating-Point Format


single: 8 bits double: 11 bits single: 23 bits double: 52 bits

S Exponent

Fraction

x = (1)S (1+ Fraction) 2(ExponentBias)


n n

S: sign bit (0 ! non-negative, 1 ! negative) Normalize significand: 1.0 |significand| < 2.0
n n

Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit) Significand is Fraction with the 1. restored Ensures exponent is unsigned Single: Bias = 127; Double: Bias = 1203
Chapter 3 Arithmetic for Computers 20

Exponent: excess representation: actual exponent + Bias


n n

Single-Precision Range
n n

Exponents 00000000 and 11111111 reserved Smallest value


n

n n

Exponent: 00000001 ! actual exponent = 1 127 = 126 Fraction: 00000 ! significand = 1.0 1.0 2126 1.2 1038 exponent: 11111110 ! actual exponent = 254 127 = +127 Fraction: 11111 ! significand 2.0 2.0 2+127 3.4 10+38
Chapter 3 Arithmetic for Computers 21

Largest value
n

n n

Double-Precision Range
n n

Exponents 000000 and 111111 reserved Smallest value


n

n n

Exponent: 00000000001 ! actual exponent = 1 1023 = 1022 Fraction: 00000 ! significand = 1.0 1.0 21022 2.2 10308 Exponent: 11111111110 ! actual exponent = 2046 1023 = +1023 Fraction: 11111 ! significand 2.0 2.0 2+1023 1.8 10+308
Chapter 3 Arithmetic for Computers 22

Largest value
n

n n

Floating-Point Precision
n

Relative precision
n n

all fraction bits are significant Single: approx 223


n

Equivalent to 23 log102 23 0.3 6 decimal digits of precision Equivalent to 52 log102 52 0.3 16 decimal digits of precision

Double: approx 252


n

Chapter 3 Arithmetic for Computers 23

Floating-Point Example
n

Represent 0.75
n n n n

0.75 = (1)1 1.12 21 S=1 Fraction = 1000002 Exponent = 1 + Bias


n n

Single: 1 + 127 = 126 = 011111102 Double: 1 + 1023 = 1022 = 011111111102

n n

Single: 101111110100000 Double: 101111111110100000


Chapter 3 Arithmetic for Computers 24

Floating-Point Example
n

What number is represented by the singleprecision float 1100000010100000


n n n

S=1 Fraction = 01000002 Fxponent = 100000012 = 129 = (1) 1.25 22 = 5.0

x = (1)1 (1 + 012) 2(129 127)

Chapter 3 Arithmetic for Computers 25

Denormal Numbers
n

Exponent = 000...0 ! hidden bit is 0


x = (1)S (0 + Fraction) 2Bias

Smaller than normal numbers


n

allow for gradual underflow, with diminishing precision

Denormal with fraction = 000...0


x = (1)S (0 + 0) 2Bias = 0.0
Two representations of 0.0!
Chapter 3 Arithmetic for Computers 26

Infinities and NaNs


n

Exponent = 111...1, Fraction = 000...0


Infinity n Can be used in subsequent calculations, avoiding need for overflow check
n

Exponent = 111...1, Fraction 000...0


n n

Not-a-Number (NaN) Indicates illegal or undefined result


n

e.g., 0.0 / 0.0

Can be used in subsequent calculations


Chapter 3 Arithmetic for Computers 27

Floating-Point Addition
n

Consider a 4-digit decimal example


n

9.999 101 + 1.610 101 Shift number with smaller exponent 9.999 101 + 0.016 101 9.999 101 + 0.016 101 = 10.015 101 1.0015 102 1.002 102
Chapter 3 Arithmetic for Computers 28

1. Align decimal points


n n

2. Add significands
n

3. Normalize result & check for over/underflow


n

4. Round and renormalize if necessary


n

Floating-Point Addition
n

Now consider a 4-digit binary example


n

1.0002 21 + 1.1102 22 (0.5 + 0.4375) Shift number with smaller exponent 1.0002 21 + 0.1112 21 1.0002 21 + 0.1112 21 = 0.0012 21 1.0002 24, with no over/underflow 1.0002 24 (no change) = 0.0625
Chapter 3 Arithmetic for Computers 29

1. Align binary points


n n

2. Add significands
n

3. Normalize result & check for over/underflow


n

4. Round and renormalize if necessary


n

FP Adder Hardware
n n

Much more complex than integer adder Doing it in one clock cycle would take too long
n n

Much longer than integer operations Slower clock would penalize all instructions Can be pipelined

FP adder usually takes several cycles


n

Chapter 3 Arithmetic for Computers 30

FP Adder Hardware

Step 1

Step 2

Step 3 Step 4

Chapter 3 Arithmetic for Computers 31

Floating-Point Multiplication
n

Consider a 4-digit decimal example


n

1.110 1010 9.200 105 For biased exponents, subtract bias from sum New exponent = 10 + 5 = 5 1.110 9.200 = 10.212 ! 10.212 105 1.0212 106 1.021 106 +1.021 106

1. Add exponents
n n

2. Multiply significands
n

3. Normalize result & check for over/underflow


n

4. Round and renormalize if necessary


n

5. Determine sign of result from signs of operands


n

Chapter 3 Arithmetic for Computers 32

Floating-Point Multiplication
n

Now consider a 4-digit binary example


n

1.0002 21 1.1102 22 (0.5 0.4375) Unbiased: 1 + 2 = 3 Biased: (1 + 127) + (2 + 127) = 3 + 254 127 = 3 + 127 1.0002 1.1102 = 1.1102 ! 1.1102 23 1.1102 23 (no change) with no over/underflow 1.1102 23 (no change) 1.1102 23 = 0.21875
Chapter 3 Arithmetic for Computers 33

1. Add exponents
n n

2. Multiply significands
n

3. Normalize result & check for over/underflow


n

4. Round and renormalize if necessary


n

5. Determine sign: +ve ve ! ve


n

FP Arithmetic Hardware
n

FP multiplier is of similar complexity to FP adder


n

But uses a multiplier for significands instead of an adder

FP arithmetic hardware usually does


Addition, subtraction, multiplication, division, reciprocal, square-root n FP ! integer conversion
n

Operations usually takes several cycles


n

Can be pipelined
Chapter 3 Arithmetic for Computers 34

FP Instructions in MIPS
n

FP hardware is coprocessor 1
n

Adjunct processor that extends the ISA 32 single-precision: $f0, $f1, $f31 Paired for double-precision: $f0/$f1, $f2/$f3,
n

Separate FP registers
n n

Release 2 of MIPs ISA supports 32 64-bit FP regs

FP instructions operate only on FP registers


n n

Programs generally dont do integer ops on FP data, or vice versa More registers with minimal code-size impact lwc1, ldc1, swc1, sdc1
n

FP load and store instructions


n

e.g., ldc1 $f8, 32($sp)


Chapter 3 Arithmetic for Computers 35

FP Instructions in MIPS
n

Single-precision arithmetic
n

add.s, sub.s, mul.s, div.s


n

e.g., add.s $f0, $f1, $f6

Double-precision arithmetic
n

add.d, sub.d, mul.d, div.d


n

e.g., mul.d $f4, $f4, $f6

Single- and double-precision comparison


n n

c.xx.s, c.xx.d (xx is eq, lt, le, ) Sets or clears FP condition-code bit
n

e.g. c.lt.s $f3, $f4

Branch on FP condition code true or false


n

bc1t, bc1f
n

e.g., bc1t TargetLabel

Chapter 3 Arithmetic for Computers 36

FP Example: F to C
n

C code:
float f2c (float fahr) { return ((5.0/9.0)*(fahr - 32.0)); } n fahr in $f12, result in $f0, literals in global memory space

Compiled MIPS code:


f2c: lwc1 lwc2 div.s lwc1 sub.s mul.s jr $f16, $f18, $f16, $f18, $f18, $f0, $ra const5($gp) const9($gp) $f16, $f18 const32($gp) $f12, $f18 $f16, $f18
Chapter 3 Arithmetic for Computers 37

FP Example: Array Multiplication


n

X=X+YZ
n

All 32 32 matrices, 64-bit double-precision elements

C code:
void mm (double x[][], double y[][], double z[][]) { int i, j, k; for (i = 0; i! = 32; i = i + 1) for (j = 0; j! = 32; j = j + 1) for (k = 0; k! = 32; k = k + 1) x[i][j] = x[i][j] + y[i][k] * z[k][j]; } n Addresses of x, y, z in $a0, $a1, $a2, and i, j, k in $s0, $s1, $s2
Chapter 3 Arithmetic for Computers 38

FP Example: Array Multiplication


n

MIPS code:
$t1, 32 $s0, 0 $s1, 0 $s2, 0 $t2, $s0, 5 $t2, $t2, $s1 $t2, $t2, 3 $t2, $a0, $t2 $f4, 0($t2) $t0, $s2, 5 $t0, $t0, $s1 $t0, $t0, 3 $t0, $a2, $t0 $f16, 0($t0) # # # # # # # # # # # # # # $t1 = 32 (row size/loop end) i = 0; initialize 1st for loop j = 0; restart 2nd for loop k = 0; restart 3rd for loop $t2 = i * 32 (size of row of x) $t2 = i * size(row) + j $t2 = byte offset of [i][j] $t2 = byte address of x[i][j] $f4 = 8 bytes of x[i][j] $t0 = k * 32 (size of row of z) $t0 = k * size(row) + j $t0 = byte offset of [k][j] $t0 = byte address of z[k][j] $f16 = 8 bytes of z[k][j]

li li L1: li L2: li sll addu sll addu l.d L3: sll addu sll addu l.d

Chapter 3 Arithmetic for Computers 39

FP Example: Array Multiplication


sll $t0, $s0, 5 addu $t0, $t0, $s2 sll $t0, $t0, 3 addu $t0, $a1, $t0 l.d $f18, 0($t0) mul.d $f16, $f18, $f16 add.d $f4, $f4, $f16 addiu $s2, $s2, 1 bne $s2, $t1, L3 s.d $f4, 0($t2) addiu $s1, $s1, 1 bne $s1, $t1, L2 addiu $s0, $s0, 1 bne $s0, $t1, L1 # # # # # # # # # # # # # # $t0 = i*32 (size of row of y) $t0 = i*size(row) + k $t0 = byte offset of [i][k] $t0 = byte address of y[i][k] $f18 = 8 bytes of y[i][k] $f16 = y[i][k] * z[k][j] f4=x[i][j] + y[i][k]*z[k][j] $k k + 1 if (k != 32) go to L3 x[i][j] = $f4 $j = j + 1 if (j != 32) go to L2 $i = i + 1 if (i != 32) go to L1

Chapter 3 Arithmetic for Computers 40

Accurate Arithmetic
n

IEEE Std 754 specifies additional rounding control


n n n

Extra bits of precision (guard, round, sticky) Choice of rounding modes Allows programmer to fine-tune numerical behavior of a computation Most programming languages and FP libraries just use defaults

Not all FP units implement all options


n

Trade-off between hardware complexity, performance, and market requirements

Chapter 3 Arithmetic for Computers 41

Interpretation of Data
The BIG Picture
n

Bits have no inherent meaning


n

Interpretation depends on the instructions applied Finite range and precision Need to account for this in programs

Computer representations of numbers


n n

Chapter 3 Arithmetic for Computers 42

3.6 Parallelism and Computer Arithmetic: Associativity

Associativity
n

Parallel programs may interleave operations in unexpected orders


n

Assumptions of associativity may fail


(x+y)+z x -1.50E+38 y 1.50E+38 0.00E+00 z 1.0 1.0 1.50E+38 1.00E+00 0.00E+00 x+(y+z) -1.50E+38

Need to validate parallel programs under varying degrees of parallelism

Chapter 3 Arithmetic for Computers 43

3.7 Real Stuff: Floating Point in the x86

x86 FP Architecture
n

Originally based on 8087 FP coprocessor


n n n

8 80-bit extended-precision registers Used as a push-down stack Registers indexed from TOS: ST(0), ST(1), Converted on load/store of memory operand Integer operands can also be converted on load/store Result: poor FP performance

FP values are 32-bit or 64 in memory


n n

Very difficult to generate and optimize code


n

Chapter 3 Arithmetic for Computers 44

x86 FP Instructions
Data transfer
FILD mem/ST(i) FISTP mem/ST(i) FLDPI FLD1 FLDZ

Arithmetic
FIADDP FISUBRP FIMULP FIDIVRP FSQRT FABS FRNDINT mem/ST(i) mem/ST(i) mem/ST(i) mem/ST(i)

Compare
FICOMP FIUCOMP FSTSW AX/mem

Transcendental
FPATAN F2XMI FCOS FPTAN FPREM FPSIN FYL2X

Optional variations
n n n n

I: integer operand P: pop operand from stack R: reverse operand order But not all combinations allowed
Chapter 3 Arithmetic for Computers 45

Streaming SIMD Extension 2 (SSE2)


n

Adds 4 128-bit registers


n

Extended to 8 registers in AMD64/EM64T 2 64-bit double precision 4 32-bit double precision Instructions operate on them simultaneously
n

Can be used for multiple FP operands


n n n

Single-Instruction Multiple-Data

Chapter 3 Arithmetic for Computers 46

3.8 Fallacies and Pitfalls

Right Shift and Division


n

Left shift by i places multiplies an integer by 2i Right shift divides by 2i?


n

Only for unsigned integers Arithmetic right shift: replicate the sign bit e.g., 5 / 4
n n

For signed integers


n n

111110112 >> 2 = 111111102 = 2 Rounds toward

c.f. 111110112 >>> 2 = 001111102 = +62


Chapter 3 Arithmetic for Computers 47

Who Cares About FP Accuracy?


n

Important for scientific code


n

But for everyday consumer use?


n

My bank balance is out by 0.0002! L

The Intel Pentium FDIV bug


n n

The market expects accuracy See Colwell, The Pentium Chronicles

Chapter 3 Arithmetic for Computers 48

3.9 Concluding Remarks

Concluding Remarks
n

ISAs support arithmetic


n n

Signed and unsigned integers Floating-point approximation to reals Operations can overflow and underflow Core instructions: 54 most frequently used
n

Bounded range and precision


n

MIPS ISA
n

100% of SPECINT, 97% of SPECFP

Other instructions: less frequent


Chapter 3 Arithmetic for Computers 49

Вам также может понравиться