COMP370 Introduction to Computer Architecture
Floating Point Representation
Binary Fractions
Each position is twice the value of the position to the right.
Position   2^3   2^2   2^1   2^0   .   2^-1   2^-2   2^-3
Value      8     4     2     1     .   1/2    1/4    1/8
Digit      1     1     1     0     .   0      1      0
Adding the powers of 2 gives
8 + 4 + 2 + 0.25 = 14.25
What is 111.11 in decimal?
1. 7.75
2. 31
3. 7.375
4. 15.25
What is 8.5 in binary?
1. 11111111.11111
2. 1000.01
3. 0.100011
4. 1000.10
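The positional weights from the Binary Fractions slide can be sketched in Python. This is a minimal illustration, not part of the original slides; the function name `binary_to_decimal` is an assumption chosen for this example.

```python
def binary_to_decimal(text):
    """Convert a binary string such as '1110.010' to a float."""
    whole, _, frac = text.partition(".")
    value = 0.0
    # Digits left of the radix point have weights 1, 2, 4, ... from the right.
    for power, digit in enumerate(reversed(whole)):
        value += int(digit) * 2 ** power
    # Digits right of the radix point have weights 1/2, 1/4, 1/8, ...
    for power, digit in enumerate(frac, start=1):
        value += int(digit) * 2 ** -power
    return value


print(binary_to_decimal("1110.010"))  # 14.25
```

The same function can be used to check an answer to either quiz question above.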
1/21/2009
Range of Values
Unsigned integers: 0 to 2^n - 1
For byte, from 0 to 255
For a 32-bit unsigned int, from 0 to about 4.29 x 10^9
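A quick check of the 2^n - 1 formula, as a sketch. The helper name `unsigned_max` is hypothetical, and the 8-bit and 32-bit widths are the usual sizes assumed for a byte and an int.

```python
def unsigned_max(bits):
    """Largest unsigned integer that fits in the given number of bits."""
    return 2 ** bits - 1


print(unsigned_max(8))   # 255
print(unsigned_max(32))  # 4294967295
```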
Shifting Exponents
241,506,800 can be written as
2.415068 x 10^8
24.15068 x 10^7
241.5068 x 10^6
2415.068 x 10^5
24150.68 x 10^4
241506.8 x 10^3
etc.
Scientific Notation
241,506,800 = 0.2415068 x 10^9
Mantissa: 0.2415068
Exponent: 9
Binary Scientific Notation
A binary number, such as 10110011, can be expressed as:
1.0110011 x 2^7
Note the exponent is a power of two, not ten.
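For a positive integer, the binary scientific form can be read directly off its bit string. The sketch below assumes a hypothetical helper `to_binary_scientific`; it only handles positive integers.

```python
def to_binary_scientific(n):
    """Return (mantissa_string, exponent) for a positive integer n."""
    bits = format(n, "b")                  # e.g. '10110011'
    exponent = len(bits) - 1               # places the radix point moves left
    mantissa = bits[0] + "." + (bits[1:] or "0")
    return mantissa, exponent


print(to_binary_scientific(0b10110011))  # ('1.0110011', 7)
```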
Shifting Binary Exponents
A binary number can be expressed in scientific notation in several ways, like decimal numbers.
0.110010 x 2^5 = 0.78125 x 32 = 25
1.10010 x 2^4 = 1.5625 x 16 = 25
11.0010 x 2^3 = 3.125 x 8 = 25
110.010 x 2^2 = 6.25 x 4 = 25
1100.10 x 2^1 = 12.5 x 2 = 25
11001.0 x 2^0 = 25 x 1 = 25
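The six forms above can be checked mechanically. This is a small sketch; `binary_value` and the `forms` list are names introduced here for illustration only.

```python
def binary_value(text):
    """Value of a binary string with a radix point, e.g. '110.010' -> 6.25."""
    whole, _, frac = text.partition(".")
    # Read all digits as one integer, then divide by 2 for each fractional digit.
    return int(whole + frac, 2) / 2 ** len(frac)


forms = [("0.110010", 5), ("1.10010", 4), ("11.0010", 3),
         ("110.010", 2), ("1100.10", 1), ("11001.0", 0)]

values = [binary_value(mantissa) * 2 ** exponent for mantissa, exponent in forms]
print(values)  # [25.0, 25.0, 25.0, 25.0, 25.0, 25.0]
```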
Standard Format
Most computers (including Intel Pentiums) follow the IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-1985.
Before the standard, different computers used different formats for floating point numbers.
The standard defines the format, accuracy and action taken when errors are detected.
110.010 is equivalent to
1. 11001.0 x 2^-2
2. 0.110010 x 2^3
3. 6.25
4. All of the above
5. None of the above
Floating Point Sizes
ANSI/IEEE Standard 754-1985
Single precision (32 bits)
Double precision (64 bits)
Extended precision (80 bits)
Single Precision Floating Point Numbers
float variables in C++ or Java
A little more than 7 decimal digits of accuracy
Single Precision Float Range
From -3.4 x 10^38 to 3.4 x 10^38
Positive numbers can be as small as 1.18 x 10^-38 (the smallest normalized value) before going to zero.
Signed Magnitude
For positive numbers, the sign bit is zero.
For negative numbers, the sign bit is one and everything else is the same.
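The signed magnitude property can be observed with Python's standard `struct` module, which packs a value as a 32-bit single precision float. The helper name `float_bits` is an assumption for this sketch.

```python
import struct


def float_bits(x):
    """Return the 32-bit single precision pattern of x as a bit string."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return format(raw, "032b")


pos = float_bits(4.5)
neg = float_bits(-4.5)
print(pos[0], pos[1:])    # sign bit 0, then the other 31 bits
print(neg[0], neg[1:])    # sign bit 1, then the same 31 bits
print(pos[1:] == neg[1:])  # True
```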
Double Precision Floating Point
double variables in C++ or Java
Approximately 16 decimal digits of accuracy
From -1.798 x 10^308 to 1.798 x 10^308
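The single and double precision limits quoted above can be reproduced from the bit patterns. This sketch assumes Python's `float` is an IEEE double, which holds on mainstream platforms; `single_from_bits` is a hypothetical helper name.

```python
import struct
import sys


def single_from_bits(raw):
    """Interpret a 32-bit integer as a single precision float."""
    return struct.unpack(">f", struct.pack(">I", raw))[0]


# Largest finite single: exponent field 11111110, mantissa all ones.
largest_single = single_from_bits(0x7F7FFFFF)
# Smallest normalized single: exponent field 00000001, mantissa all zeros.
smallest_normal_single = single_from_bits(0x00800000)

print(f"{largest_single:.3e}")          # 3.403e+38
print(f"{smallest_normal_single:.3e}")  # 1.175e-38
print(f"{sys.float_info.max:.3e}")      # 1.798e+308
```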
Extended Precision Floating Point
Almost 20 decimal digits of accuracy
From 3.37 x 10^-4932 to 1.18 x 10^4932
Not directly supported in C++ or Java
Often used internally for calculations which are then rounded to the desired precision
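Exponent Bias
The exponent represents the power of 2.
The single precision exponent is biased by adding 127 to the actual exponent.
This avoids an extra sign bit for the exponent.
The exponent range is -126 to +127 for normalized numbers (the all-zeros and all-ones exponent fields are reserved for zero and the special values).
Exponent value   Biased exponent (decimal)   Biased exponent (binary)
2^0              127                         01111111
2^5              132                         10000100
2^-5             122                         01111010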
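The bias table can be regenerated with a few lines of Python. The names `BIAS` and `exponent_field` are assumptions introduced for this sketch.

```python
BIAS = 127  # single precision exponent bias


def exponent_field(actual_exponent):
    """Return the 8-bit biased exponent field as a bit string."""
    return format(actual_exponent + BIAS, "08b")


for e in (0, 5, -5):
    print(e, e + BIAS, exponent_field(e))
# 0 127 01111111
# 5 132 10000100
# -5 122 01111010
```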
Normalization
Floating point numbers are adjusted so the mantissa or fractional part has a single 1 bit before the radix point.
Decimal   Binary             Normalized
5.75      101.11 x 2^0       1.0111 x 2^2
0.125     0.001 x 2^0        1.0 x 2^-3
32.0      100000.0 x 2^0     1.0 x 2^5
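Normalization can be sketched with `math.frexp` from the Python standard library. Note that `frexp` returns a fraction in the range 0.5 to 1, so the sketch shifts it by one bit to get the 1.xxx form used on the slide. The function name `normalize` is an assumption, and only positive inputs are handled.

```python
import math


def normalize(x):
    """Return (significand, exponent) with 1 <= significand < 2, for x > 0."""
    m, e = math.frexp(x)   # x == m * 2**e with 0.5 <= m < 1
    return m * 2, e - 1    # shift one bit so a single 1 precedes the point


print(normalize(5.75))   # (1.4375, 2)   1.0111 in binary is 1.4375
print(normalize(0.125))  # (1.0, -3)
print(normalize(32.0))   # (1.0, 5)
```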
Saving a Bit
The fractional part or mantissa is always adjusted so the leftmost bit is a one.
Since this bit is always a one, it is not actually stored in the floating point number.
The mantissa is stored without the leading one bit, although the one bit is assumed in calculating the value of the number.
Creating a Floating Point Number
1. Write the number in binary with a fractional part as necessary.
2. Adjust the exponent so the radix point is to the right of the first one bit.
3. The mantissa is the binary number without the leading one bit.
4. The exponent field is created by adding 127 to the binary exponent.
5. The sign is the same as the number's sign.
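The five steps above can be sketched as a short Python function. This is an illustration under stated assumptions, not a full encoder: the name `encode_single` is hypothetical, the input must be a nonzero normalized value whose fraction fits exactly in 23 bits, and rounding and special values are not handled.

```python
import math


def encode_single(x):
    """Return (sign, exponent_field, mantissa_field) as bit strings.

    Sketch only: assumes x is nonzero, normalized, and exactly
    representable in single precision (no rounding is performed).
    """
    sign = "1" if x < 0 else "0"                      # step 5
    m, e = math.frexp(abs(x))                         # abs(x) == m * 2**e, 0.5 <= m < 1
    significand, exponent = m * 2, e - 1              # steps 1-2: 1.xxx x 2^exponent
    fraction = significand - 1                        # step 3: drop the leading one bit
    mantissa_field = format(int(fraction * 2 ** 23), "023b")
    exponent_field = format(exponent + 127, "08b")    # step 4: add the bias
    return sign, exponent_field, mantissa_field


print(encode_single(-0.15625))
# ('1', '01111100', '01000000000000000000000')
```

Here -0.15625 is -0.00101 in binary, or -1.01 x 2^-3, so the exponent field is 127 - 3 = 124.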
Decimal to Floating Point Example
Convert 4.5 to single precision floating point
Decimal 4.5 is 100.1 in binary
Adjust radix to get 1.001 x 2^2
The exponent field is 127 + 2 = 129 = 10000001
The floating point number in binary is
S   Exponent   Mantissa
0   10000001   00100000000000000000000
Convert 15.375 to Floating Point
Decimal 15.375 is 1111.011 in binary
Adjust the exponent to 1.111011 x 2^3
Exponent field is 3 + 127 = 130 (decimal) = 10000010 (binary)
S   Exponent   Mantissa
0   10000010   11101100000000000000000
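Both worked conversions can be cross-checked against the machine's own encoding using the standard `struct` module. The helper name `fields` is an assumption for this sketch.

```python
import struct


def fields(x):
    """Split the single precision encoding of x into (S, Exponent, Mantissa)."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]


print(fields(4.5))     # ('0', '10000001', '00100000000000000000000')
print(fields(15.375))  # ('0', '10000010', '11101100000000000000000')
```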
Floating Point to Decimal Example
S   Exponent   Mantissa
1   10000011   01001000000000000000000
What is the decimal value of this number?
Exponent 10000011 = 131, and 131 - 127 = 4
Mantissa is 1.01001000000000000000000
The sign bit is 1, so the number is negative
-1.01001 x 2^4 = -10100.1
-10100.1 is -20.5 in decimal
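The decoding steps can be sketched in Python as follows. The function name `decode_single` is hypothetical, and the sketch assumes a normalized number (exponent field neither all zeros nor all ones).

```python
def decode_single(sign, exponent_field, mantissa_field):
    """Decode normalized single precision fields given as bit strings."""
    exponent = int(exponent_field, 2) - 127              # remove the bias
    significand = 1 + int(mantissa_field, 2) / 2 ** 23   # restore the hidden one bit
    value = significand * 2 ** exponent
    return -value if sign == "1" else value


print(decode_single("1", "10000011", "01001000000000000000000"))  # -20.5
```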
What is the decimal value of
S   Exponent   Mantissa
0   10000001   10100000000000000000000
1. 4.5
2. 3.25
3. 6.5
4. 13.0
Special Value Representation
Value   Sign     Exponent   Mantissa
Zero    0        00000000   0
+INF    0        11111111   0
-INF    1        11111111   0
NaN     0 or 1   11111111   Not zero
Special Floating Point Values
Zero is represented as all zero bits.
Not a Number (NaN) is a special value that indicates a floating point error, such as taking the square root of a negative number.
Infinity (INF), both positive and negative.
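The special bit patterns can be decoded with the standard `struct` module to confirm what they represent. This is a sketch; `single_from_bits` is a hypothetical helper, and the NaN pattern shown is just one of many valid NaN encodings.

```python
import math
import struct


def single_from_bits(bit_string):
    """Interpret a 32-character bit string as a single precision float."""
    return struct.unpack(">f", struct.pack(">I", int(bit_string, 2)))[0]


zero = single_from_bits("0" + "00000000" + "0" * 23)
pos_inf = single_from_bits("0" + "11111111" + "0" * 23)
neg_inf = single_from_bits("1" + "11111111" + "0" * 23)
nan = single_from_bits("0" + "11111111" + "1" + "0" * 22)  # any nonzero mantissa

print(zero, pos_inf, neg_inf, nan)  # 0.0 inf -inf nan
```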
Overflow and Underflow
When you calculate a number that is too big to fit into the floating point format, the result is infinity.
Calculating a number that is too small (a positive number smaller than 1.18 x 10^-38 for single precision) produces zero.
Dividing a nonzero number by zero produces infinity with the proper sign.
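Overflow and underflow can be observed in Python, with two caveats that are assumptions of this sketch: Python's `float` is a double, so the thresholds are near 10^308 and 10^-308 rather than the single precision values, and Python raises an exception on division by zero instead of returning infinity as IEEE hardware arithmetic would.

```python
import math

big = 1e308 * 10         # too large for a double: overflows to infinity
tiny = 1e-300 * 1e-300   # too small for a double: underflows to zero

print(big)   # inf
print(tiny)  # 0.0

try:
    1.0 / 0.0
except ZeroDivisionError:
    # Python's own rule, not IEEE behavior: it refuses to divide by zero.
    print("Python raises ZeroDivisionError instead of returning infinity")
```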
Calculating with Infinity
(+INF) + (+7) = (+INF)
(+INF) x (-2) = (-INF)
(+INF) x 0 = NaN, a meaningless result
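These rules can be tried directly in Python using `math.inf`. This is a brief sketch assuming Python's `float` follows IEEE arithmetic for these operations, as it does on mainstream platforms.

```python
import math

inf = math.inf

print(inf + 7)    # inf
print(inf * -2)   # -inf
print(inf * 0)    # nan
print(math.isnan(inf * 0))  # True
```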