Floating Point Numbers
Topics Covered
7632135 763.2135
1794821 179.4821
9426956 942.6956
Fixed Point (Binary) Numbers
Example: Add 3.625 and 6.5
1. Convert the numbers to 8-bit form (4-bit int, 4-bit fraction):
3.625 = 11.101 (binary) = 0011.1010
6.500 = 110.10 (binary) = 0110.1000
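The addition itself can be sketched in Python by treating each 8-bit pattern as an integer scaled by 2^4. The helper names `to_fixed` and `fmt_fixed` are hypothetical, and the sketch assumes unsigned values that are exactly representable with a 4-bit fraction.

```python
FRAC_BITS = 4  # 4-bit fraction, as in the example above

def to_fixed(x):
    """Scale x by 2**FRAC_BITS (assumes x is exactly representable)."""
    return int(x * (1 << FRAC_BITS))

def fmt_fixed(n, int_bits=4):
    """Render the scaled integer n as a binary string with the binary point shown."""
    s = format(n, f"0{int_bits + FRAC_BITS}b")
    return s[:-FRAC_BITS] + "." + s[-FRAC_BITS:]

a = to_fixed(3.625)          # 0011.1010
b = to_fixed(6.5)            # 0110.1000
total = a + b                # plain integer addition
print(fmt_fixed(a), "+", fmt_fixed(b), "=", fmt_fixed(total))
print(total / (1 << FRAC_BITS))  # back to decimal: 10.125
```

The binary point never moves, so the hardware only needs an ordinary integer adder; the cost is a fixed, narrow range.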
1. Mass of sun:
1990000000000000000000000000000000 grams (1.99 × 10³³ g)
Requires about 14 bytes
2. Mass of electron:
0.000000000000000000000000000910956 grams (9.10956 × 10⁻²⁸ g)
Requires about 12 bytes
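As a rough check of the storage figure for the sun's mass, and to show why a floating point format is attractive here, the sketch below counts the bits of the integer and then packs both masses as 4-byte single precision values. It assumes Python's standard `struct` module; the variable names are illustrative only.

```python
import struct

sun_grams = 199 * 10**31          # 1.99e33 written as an exact integer
bits = sun_grams.bit_length()     # bits needed for the integer form
bytes_needed = (bits + 7) // 8    # round up to whole bytes
print(bits, bytes_needed)         # 111 14

# Both masses fit in 4 bytes each as single precision values,
# at the cost of keeping only about 7 significant decimal digits.
for mass in (1.99e33, 9.10956e-28):
    packed = struct.pack(">f", mass)
    restored = struct.unpack(">f", packed)[0]
    print(len(packed), restored)
```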
Floating Point Numbers
Definitions
Range
How small and how large the numbers can be.
Precision
The number of significant figures used to represent the
number.
A measure of a number’s exactness.
PI = 3.141592 is more precise than PI = 3.14
Accuracy
A measure of the correctness of a number.
PI = 3.241592 is more precise than PI = 3.14, but
PI = 3.14 is more accurate.
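The distinction can be made concrete with a small sketch that counts significant figures (precision) and measures the distance from the true value (accuracy). It uses `math.pi` as the reference; the dictionary and variable names are illustrative assumptions.

```python
import math

candidates = ["3.141592", "3.241592", "3.14"]

for text in candidates:
    digits = len(text.replace(".", ""))       # significant figures written down
    error = abs(float(text) - math.pi)        # distance from the true value
    print(f"PI = {text}: {digits} digits, error {error:.6f}")

errors = {text: abs(float(text) - math.pi) for text in candidates}
```

Here 3.241592 carries seven digits but is off by about 0.1, while 3.14 carries three digits and is off by less than 0.002.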
IEEE Floating Point Numbers
Single Precision Format
Exponent bias: B = 127
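A minimal sketch of how the fields of a single precision value can be inspected in Python. It assumes the standard IEEE 754 layout of 1 sign bit, 8 exponent bits, and 23 fraction bits (the layout figure is not in the surviving text), and `decode_single` is a hypothetical helper name.

```python
import struct

def decode_single(x):
    """Return (sign, biased_exponent, fraction) for x stored as a 32-bit float."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                       # top bit
    biased_exponent = (bits >> 23) & 0xFF   # next 8 bits
    fraction = bits & 0x7FFFFF              # low 23 bits
    return sign, biased_exponent, fraction

sign, exp, frac = decode_single(6.5)
print(sign, format(exp, "08b"), format(frac, "023b"))
# 0 10000001 10100000000000000000000

# Rebuild the value for a normalized number: (-1)^S * 1.F * 2^(E - 127)
value = (-1) ** sign * (1 + frac / 2**23) * 2.0 ** (exp - 127)
print(value)  # 6.5
```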
IEEE Floating Point Numbers
Range of Mantissa
-2 < x <= -1
x = 0
+1 <= x < +2
IEEE Floating Point Numbers
Exponent
b’ = b + 127
where b’ is the biased exponent, and b is the true
exponent.
Examples:
If the true exponent is 2, the exponent is stored in biased form as
2 + 127 = 129 = 1000 0001 (binary).
If the stored exponent is 0000 0001, the true exponent is
1 – 127 = -126.
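The two examples can be reproduced with a short sketch; `to_biased` and `from_biased` are hypothetical helper names for the formula b’ = b + 127 and its inverse.

```python
BIAS = 127

def to_biased(true_exponent):
    """b' = b + 127"""
    return true_exponent + BIAS

def from_biased(stored_exponent):
    """b = b' - 127"""
    return stored_exponent - BIAS

print(format(to_biased(2), "08b"))   # 10000001
print(from_biased(0b00000001))       # -126
```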
IEEE Floating Point Numbers
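Representation of Zero
Zero is stored with an all-zero exponent field and an all-zero fraction; the sign bit may be 0 or 1.
Example: Convert -2345.125 to single precision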
-2345.125₁₀ = -100100101001.001₂ = -1.00100101001001₂ × 2¹¹
S = 1 (negative)
The biased exponent is 11 + 127 = 138 = 10001010₂
The fractional part of the mantissa is .00100101001001000000000
Therefore, -2345.125₁₀ =
1 10001010 00100101001001000000000
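The conversion steps can be sketched in Python using `math.frexp` to normalize the value, then checked against the bit pattern that `struct` produces. The helper `to_single_fields` is hypothetical and assumes a normalized, exactly representable input (no rounding, no zero, infinity, or NaN handling).

```python
import math
import struct

def to_single_fields(x):
    """Return (sign, biased_exponent, fraction) by following the manual steps."""
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))        # abs(x) = m * 2**e with 0.5 <= m < 1
    mantissa = m * 2                 # rescale so that 1 <= mantissa < 2
    true_exponent = e - 1
    biased = true_exponent + 127
    fraction = int((mantissa - 1) * 2**23)   # drop the hidden leading 1
    return sign, biased, fraction

s, e, f = to_single_fields(-2345.125)
print(s, format(e, "08b"), format(f, "023b"))
# 1 10001010 00100101001001000000000

# Cross-check against the machine encoding.
bits = struct.unpack(">I", struct.pack(">f", -2345.125))[0]
print(hex(bits))  # 0xc5129200
```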
IEEE Floating Point Numbers
Addition and Subtraction
[Figure: flowchart for floating point addition and subtraction]
IEEE Floating Point Numbers
Arithmetic Example #1
1. Convert the decimal numbers 123.5 and 100.25 into the IEEE
32-bit floating point number representation. Then carry out the
subtraction of 123.5 – 100.25 and express the result as a
normalized 32-bit floating point number.
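One way to work through this exercise is sketched below: extract the fields, restore the hidden bit, align the exponents, subtract the significands, and normalize. This is an illustrative sketch rather than the worked solution from the slides; `fields`, `fmt`, and `subtract` are hypothetical names, and the code assumes positive normalized inputs with a >= b and an exactly representable result (it truncates instead of rounding).

```python
import struct

def fields(x):
    """Return (sign, biased_exponent, fraction) of x as a 32-bit float."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def fmt(sign, exp, frac):
    return f"{sign} {exp:08b} {frac:023b}"

def subtract(a, b):
    """Compute a - b on the raw fields (assumes a >= b > 0)."""
    _, ea, fa = fields(a)
    _, eb, fb = fields(b)
    sig_a = (1 << 23) | fa           # restore the hidden leading 1
    sig_b = (1 << 23) | fb
    sig_b >>= ea - eb                # align to the larger exponent
    diff = sig_a - sig_b
    exp = ea
    if diff == 0:
        return 0, 0, 0               # exact zero
    while diff < (1 << 23):          # normalize: shift left until 1.xxx
        diff <<= 1
        exp -= 1
    return 0, exp, diff & 0x7FFFFF   # drop the hidden bit again

print(fmt(*fields(123.5)))            # 0 10000101 11101110000000000000000
print(fmt(*fields(100.25)))           # 0 10000101 10010001000000000000000
print(fmt(*subtract(123.5, 100.25)))  # 0 10000011 01110100000000000000000
```

Both operands share the exponent 6, so no alignment shift is needed; the difference 23.25 needs two left shifts to normalize, which lowers the exponent to 4.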
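2. …
+42.6875₁₀: 0 10000100 101010101100000000000000
-0.09375₁₀: 1 01111011 110000000000000000000000
(significands shown as 24 bits, including the hidden leading 1)
After shifting the smaller operand right by 9 places to match exponents:
+42.6875₁₀: 0 10000100 101010101100000000000000
-0.09375₁₀: 1 10000100 000000000110000000000000000000000
Result (+42.59375₁₀, 23-bit fraction with the hidden bit dropped):
0 10000100 01010100110000000000000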
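The alignment step in this example can be checked with a short sketch. To avoid losing the bits shifted out of the smaller operand, it widens the larger significand by the exponent difference instead of truncating the smaller one; that choice, and the name `single_fields`, are assumptions made for illustration.

```python
import struct

def single_fields(x):
    """Return (sign, biased_exponent, fraction) of x as a 32-bit float."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

s1, e1, f1 = single_fields(42.6875)
s2, e2, f2 = single_fields(-0.09375)

sig1 = (1 << 23) | f1            # 24-bit significands with the hidden bit
sig2 = (1 << 23) | f2
shift = e1 - e2                  # exponent difference: 132 - 123 = 9

# Signs differ, so the magnitudes subtract. Work at the finer scale
# (sig1 widened by `shift` bits) and scale back afterwards.
wide = (sig1 << shift) - sig2
result_sig = wide >> shift       # already in [2**23, 2**24), so normalized
result_frac = result_sig & 0x7FFFFF

print(shift)                                      # 9
print(f"0 {e1:08b} {result_frac:023b}")
# 0 10000100 01010100110000000000000
```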