
Floating Point Numbers

Uploaded by

Khaled Alshurman
Floating Point Numbers
Topics Covered
• Fixed point Numbers
• Representation of Floating Point Numbers
  • IEEE 32-bit floating point number
• Floating point Arithmetic


Fixed Point Numbers
• The binary (or decimal) point is assumed to be in a fixed position.
• Base 10 fixed point arithmetic:

    7632135       763.2135
  + 1794821     + 179.4821
  ---------     ----------
    9426956       942.6956
Fixed Point (Binary) Numbers
• Example: Add 3.625 and 6.5
1. Convert the numbers to 8-bit form (4-bit integer part, 4-bit fraction):
   3.625 → 11.101 → 0011.1010
   6.500 → 110.10 → 0110.1000
2. Treat the numbers as integers with an imaginary binary point and add them in the normal way:
   00111010 + 01101000 = 10100010
3. Reinsert the binary point: 1010.0010. The integer part 1010 converts to 10, and the fractional part .0010 is interpreted as .125. Therefore, the result is 10.125.
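
The three steps above can be sketched in Python as follows. This is only an illustration under the slide's assumptions (unsigned values, 4 integer bits and 4 fraction bits); the helper names `to_fixed` and `from_fixed` are hypothetical and not part of the original slides.

```python
FRACTION_BITS = 4  # 4-bit integer part and 4-bit fraction, as in the example


def to_fixed(value: float) -> int:
    """Encode a non-negative value as an 8-bit fixed-point integer (4.4 format)."""
    raw = round(value * (1 << FRACTION_BITS))
    if not 0 <= raw <= 0xFF:
        raise ValueError("value does not fit in the 4.4 fixed-point format")
    return raw


def from_fixed(raw: int) -> float:
    """Decode an 8-bit fixed-point integer by restoring the implied binary point."""
    return raw / (1 << FRACTION_BITS)


a = to_fixed(3.625)  # 0011.1010
b = to_fixed(6.5)    # 0110.1000
total = a + b        # ordinary integer addition; the binary point is only implied

print(f"{a:08b} + {b:08b} = {total:08b}")  # 00111010 + 01101000 = 10100010
print(from_fixed(total))                    # 10.125
```

The addition itself never looks at the binary point, which is why fixed point arithmetic can reuse plain integer hardware.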
Problem with Fixed Point (Binary) Numbers
• Some systems require a large range of numbers:
1. Mass of the sun:
   1990000000000000000000000000000000 grams (about 1.99 x 10^33 g)
   Requires about 14 bytes
2. Mass of an electron:
   0.000000000000000000000000000910956 grams (about 9.11 x 10^-28 g)
   Requires about 12 bytes
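
As a rough check of the storage figures above, the sketch below counts the bits a fixed point format would need. The constants are the values quoted on the slide, and the helper names `bytes_for_integer` and `first_fraction_bit` are hypothetical.

```python
import math

SUN_MASS_GRAMS = 199 * 10**31       # 1.99e33 g, the value quoted on the slide
ELECTRON_MASS_GRAMS = 9.10956e-28   # value quoted on the slide


def bytes_for_integer(n: int) -> int:
    """Bytes needed to hold a non-negative integer exactly."""
    return math.ceil(n.bit_length() / 8)


def first_fraction_bit(x: float) -> int:
    """Position of the first 1 bit after the binary point (1 means the 2**-1 place).

    Assumes 0 < x < 1.
    """
    _, exponent = math.frexp(x)  # x = m * 2**exponent with 0.5 <= m < 1
    return 1 - exponent


print(bytes_for_integer(SUN_MASS_GRAMS))        # 14
print(first_fraction_bit(ELECTRON_MASS_GRAMS))  # 90
```

The electron's first nonzero bit sits 90 places after the binary point, so a 12-byte (96-bit) fraction leaves only about seven significant bits for it.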
Floating Point Numbers
Definitions
• Range
  • How small and how large the numbers can be.
• Precision
  • The number of significant figures used to represent the number.
  • A measure of a number's exactness.
  • PI = 3.141592 is more precise than PI = 3.14
• Accuracy
  • A measure of the correctness of a number.
  • PI = 3.241592 is more precise than PI = 3.14, but PI = 3.14 is more accurate.
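IEEE Floating Point Numbers
Single Precision Format

$(-1)^S \times 2^{E-B} \times 1.F$

B = 127

S is the 1-bit sign, E is the 8-bit stored (biased) exponent, F is the 23-bit fraction, and B is the bias.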
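
To see the format in action, the following sketch splits a value into its S, E, and F fields and applies the formula above. It uses Python's standard `struct` module to obtain the single precision bit pattern; the function names are hypothetical, and the formula as written covers only normalized numbers.

```python
import struct

BIAS = 127


def decode_float32(x: float) -> tuple[int, int, int]:
    """Split x (rounded to single precision) into sign, biased exponent, fraction."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction


def value_from_fields(sign: int, exponent: int, fraction: int) -> float:
    """Apply (-1)^S * 2^(E-B) * 1.F; valid for normalized numbers (1 <= E <= 254)."""
    significand = 1 + fraction / 2**23  # restore the suppressed leading 1
    return (-1) ** sign * 2.0 ** (exponent - BIAS) * significand


s, e, f = decode_float32(6.5)
print(s, f"{e:08b}", f"{f:023b}")  # 0 10000001 10100000000000000000000
print(value_from_fields(s, e, f))  # 6.5
```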
IEEE Floating Point Numbers
Range of Mantissa
• A floating point mantissa is limited to one of three ranges:
  -2 < x <= -1
  x = 0
  +1 <= x < +2
IEEE Floating Point Numbers
Exponent

Binary Value   True Exponent   Biased Exponent   Special Numbers
0000 0000      -127            0                 zero
0000 0001      -126            1
0000 0010      -125            2
0000 0011      -124            3
. . .
0111 1111      0               127
1000 0000      1               128
. . .
1111 1100      125             252
1111 1101      126             253
1111 1110      127             254
1111 1111      128             255               +- Infinity
IEEE Floating Point Numbers
Excess - n
• The stored exponent is also called excess-n, or excess-127 for the IEEE single precision format.
• The stored exponent exceeds the true exponent by 127, the bias.
• b' = b + 127, where b' is the biased exponent and b is the true exponent.
• Examples:
  • If the true exponent is 2, the exponent is stored in biased form as 2 + 127 = 129 = 1000 0001.
  • If the stored exponent is 0000 0001, the true exponent is 1 - 127 = -126.
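
The two examples above can be reproduced with a couple of one-line helpers. This is a minimal sketch; the names `to_biased` and `to_true` are hypothetical, and the range check assumes that the stored values 0000 0000 and 1111 1111 are reserved for special numbers.

```python
BIAS = 127


def to_biased(true_exponent: int) -> int:
    """Return the stored (excess-127) exponent for an ordinary number."""
    if not -126 <= true_exponent <= 127:
        raise ValueError("stored values 0 and 255 are reserved for special numbers")
    return true_exponent + BIAS


def to_true(stored_exponent: int) -> int:
    """Return the true exponent for a stored exponent field."""
    return stored_exponent - BIAS


print(f"{to_biased(2):08b}")  # 10000001
print(to_true(0b00000001))    # -126
```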
IEEE Floating Point Numbers
Representation of Zero
• The smallest stored exponent 0000 0000 (in biased form), corresponding to a true exponent of -127, is used together with an all-zero mantissa field to represent zero.
IEEE Floating Point Numbers
Infinity and Not a Number (NaN)
• Stored exponent 1111 1111 and mantissa field = 0 → used as +- infinity (the sign bit selects + or -).
• Stored exponent 1111 1111 and mantissa field != 0 → used as NaN.
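
The rules for zero, infinity, and NaN can be collected into a small classifier over the 32-bit pattern. This is a sketch with hypothetical function names; the "denormalized" label for a zero exponent with a nonzero mantissa field is an addition beyond what the slides cover.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the single precision bit pattern of x as an unsigned integer."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits


def classify_float32(bits: int) -> str:
    """Name the kind of value encoded by a 32-bit IEEE single precision pattern."""
    sign = "-" if bits >> 31 else "+"
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:  # stored exponent 1111 1111
        return f"{sign}infinity" if fraction == 0 else "NaN"
    if exponent == 0:     # stored exponent 0000 0000
        return f"{sign}zero" if fraction == 0 else "denormalized"  # not on the slides
    return "normalized"


print(classify_float32(float32_bits(0.0)))            # +zero
print(classify_float32(float32_bits(float("-inf"))))  # -infinity
print(classify_float32(0x7FC00000))                   # NaN
```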
IEEE Floating Point Numbers
Example Representation
• Represent -2345.125 as a single precision IEEE floating point number.
• $-2345.125_{10} = -100100101001.001_2$
• $-2345.125_{10} = -1.00100101001001_2 \times 2^{11}$
• S = 1 (negative)
• The biased exponent is $11 + 127 = 138 = 10001010_2$
• The fractional part of the mantissa is .00100101001001000000000
• Therefore, $-2345.125_{10}$ =
  1 10001010 00100101001001000000000
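
The conversion steps can be followed in code. The sketch below normalizes the magnitude by repeated halving or doubling, adds the bias, and drops the leading 1. The function names are hypothetical, and the encoder is deliberately simplified: it assumes a nonzero value in the normalized range and truncates any fraction bits beyond 23 instead of rounding.

```python
import struct

BIAS = 127


def encode_float32(x: float) -> str:
    """Return 'S EEEEEEEE FFF...' for a nonzero value in the normalized range."""
    if x == 0:
        raise ValueError("zero is a special case and is not handled here")
    sign = 1 if x < 0 else 0
    magnitude = abs(x)
    exponent = 0
    while magnitude >= 2:  # move the binary point left
        magnitude /= 2
        exponent += 1
    while magnitude < 1:   # move the binary point right
        magnitude *= 2
        exponent -= 1
    fraction = int((magnitude - 1) * 2**23)  # suppress the leading 1, keep 23 bits
    return f"{sign} {exponent + BIAS:08b} {fraction:023b}"


def reference_float32(x: float) -> str:
    """Same layout, taken from the bit pattern produced by the struct module."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = f"{bits:032b}"
    return f"{s[0]} {s[1:9]} {s[9:]}"


print(encode_float32(-2345.125))     # 1 10001010 00100101001001000000000
print(reference_float32(-2345.125))  # 1 10001010 00100101001001000000000
```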
Floating Point Numbers
Addition and Subtraction Flowchart
[Figure: flowchart for floating point addition and subtraction]
IEEE Floating Point Numbers
Arithmetic Example #1

1. Convert the decimal numbers 123.5 and 100.25 into the IEEE 32-bit floating point number representation. Then carry out the subtraction 123.5 - 100.25 and express the result as a normalized 32-bit floating point number.

• $123.5_{10} = 1111011.1_2 = 1.1110111_2 \times 2^6$
• The mantissa is positive, and so S = 0.
• The exponent is +6, which is stored in biased form as $6 + 127 = 133_{10} = 10000101_2$.
• The mantissa is 1.1110111, which is stored in 23 bits, with the leading '1' suppressed.
• Therefore, $123.5_{10}$ is stored in IEEE format as:
  0 10000101 11101110000000000000000
IEEE Floating Point Numbers
Arithmetic Example #1 (Continued)

• $100.25_{10} = 1100100.01_2 = 1.10010001_2 \times 2^6$
• The mantissa is positive, and so S = 0.
• The exponent is +6, which is stored in biased form as $6 + 127 = 133_{10} = 10000101_2$.
• The mantissa is 1.10010001, which is stored in 23 bits, with the leading '1' suppressed.
• Therefore, $100.25_{10}$ is stored in IEEE format as:
  0 10000101 10010001000000000000000
IEEE Floating Point Numbers
Arithmetic Example #1 (Continued)

• The two IEEE numbers are first unpacked: the sign, exponent, and mantissa must be reconstituted.
• The two exponents are compared. If they are the same, the mantissas are added. If they are not, the number with the smaller exponent is denormalized by shifting its mantissa right (i.e., dividing by 2) and incrementing its exponent (i.e., multiplying by 2) until the two exponents are equal. Then the numbers are added.
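
The exponent alignment step described above might look like this in code. It is a sketch only: `align` is a hypothetical helper, the mantissas are held as 24-bit integers with the leading '1' restored, and bits shifted off the right end are simply discarded, whereas real hardware keeps extra bits for rounding. The demonstration reuses 6.5 and 3.625 from the fixed point example.

```python
BIAS = 127


def align(exp_a: int, sig_a: int, exp_b: int, sig_b: int) -> tuple[int, int, int]:
    """Shift the operand with the smaller biased exponent until both exponents match.

    sig_a and sig_b are 24-bit mantissas with the leading 1 already restored.
    """
    while exp_a < exp_b:
        sig_a >>= 1  # divide the mantissa by 2 ...
        exp_a += 1   # ... and multiply by 2 through the exponent
    while exp_b < exp_a:
        sig_b >>= 1
        exp_b += 1
    return exp_a, sig_a, sig_b


# 6.5   = 1.101  x 2^2 -> biased exponent 129, mantissa 1101 followed by 20 zeros
# 3.625 = 1.1101 x 2^1 -> biased exponent 128, mantissa 11101 followed by 19 zeros
exp, sig_a, sig_b = align(129, 0b1101 << 20, 128, 0b11101 << 19)
total = sig_a + sig_b  # mantissas can be added once the exponents agree

print(f"{exp:08b} {sig_a:024b} {sig_b:024b}")
print(total / 2**23 * 2.0 ** (exp - BIAS))  # 10.125
```

The sum here exceeds 24 bits, so a full adder would still have to renormalize it by one right shift before packing the result.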
IEEE Floating Point Numbers
Arithmetic Example #1 (Continued)

• After unpacking, insert the leading '1' and perform the subtraction.
    1.11101110000000000000000
  - 1.10010001000000000000000
  ---------------------------
    0.01011101000000000000000
• Normalize the result by shifting the mantissa left 2 places:
    1.01110100000000000000000
IEEE Floating Point Numbers
Arithmetic Example #1 (Continued)

• Normalizing shifted the mantissa left 2 places, so the exponent must be decreased by 2.
• $10000101_2 - 2_{10} = 10000011_2$
• The result expressed in IEEE format is:
  0 10000011 01110100000000000000000
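
The subtraction and normalization in Example #1 can be traced with integer arithmetic on the unpacked fields. This is a sketch for this special case only (two positive operands with the same exponent, the first larger than the second); the function name `subtract_same_exponent` is hypothetical.

```python
HIDDEN_ONE = 1 << 23  # the suppressed leading 1 of a 24-bit mantissa


def subtract_same_exponent(exp: int, frac_a: int, frac_b: int) -> tuple[int, int]:
    """Compute a - b for positive operands that share the biased exponent exp.

    Returns the biased exponent and 23-bit fraction of the normalized result.
    """
    sig = (HIDDEN_ONE | frac_a) - (HIDDEN_ONE | frac_b)  # restore the 1s, subtract
    if sig <= 0:
        raise ValueError("this sketch only handles a > b")
    while sig < HIDDEN_ONE:  # normalize: shift left until the leading 1 is back
        sig <<= 1
        exp -= 1
    return exp, sig & (HIDDEN_ONE - 1)  # suppress the leading 1 again


frac_123_5 = 0b1110111 << 16    # fraction field of 123.5
frac_100_25 = 0b10010001 << 15  # fraction field of 100.25
exp, frac = subtract_same_exponent(0b10000101, frac_123_5, frac_100_25)

print(f"0 {exp:08b} {frac:023b}")  # 0 10000011 01110100000000000000000
```

The printed pattern decodes to 1.011101 x 2^4 = 23.25, which matches 123.5 - 100.25.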
IEEE Floating Point Numbers
Arithmetic Example #2

2. Convert the decimal numbers 42.6875 and -0.09375 into the IEEE 32-bit floating point number representation. Then carry out the addition of 42.6875 and -0.09375 and express the result as a normalized 32-bit floating point number.

• $42.6875_{10} = 101010.1011_2 = 1.010101011_2 \times 2^5$
• The mantissa is positive, and so S = 0.
• The exponent is +5, which is stored in biased form as $5 + 127 = 132_{10} = 10000100_2$.
• The mantissa is 1.010101011, which is stored in 23 bits, with the leading '1' suppressed.
• Therefore, $42.6875_{10}$ is stored in IEEE format as:
  0 10000100 01010101100000000000000
IEEE Floating Point Numbers
Arithmetic Example #2 (Continued)

• $-0.09375_{10} = -0.00011_2 = -1.1_2 \times 2^{-4}$
• The mantissa is negative, and so S = 1.
• The exponent is -4, which is stored in biased form as $-4 + 127 = 123_{10} = 01111011_2$.
• The mantissa is 1.1, which is stored in 23 bits, with the leading '1' suppressed.
• Therefore, $-0.09375_{10}$ is stored in IEEE format as:
  1 01111011 10000000000000000000000
IEEE Floating Point Numbers
Arithmetic Example #2 (Continued)

• Unpacked, with the leading '1' restored, the two numbers are:
  $+42.6875_{10}$: 0 10000100 101010101100000000000000
  $-0.09375_{10}$: 1 01111011 110000000000000000000000
• In order to perform the addition, the exponents must be the same.
• Increase the second exponent by 9 (123 + 9 = 132) and shift its mantissa right 9 times to get:
  $+42.6875_{10}$: 0 10000100 101010101100000000000000
  $-0.09375_{10}$: 1 10000100 000000000110000000000000000000000
• The 9 bits shifted beyond the 24-bit mantissa are all zero, so nothing is lost.
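IEEE Floating Point Numbers
Arithmetic Example #2 (Continued)

• The aligned numbers are:
  $+42.6875_{10}$: 0 10000100 101010101100000000000000
  $-0.09375_{10}$: 1 10000100 000000000110000000000000000000000
• The signs differ, so adding the mantissas means subtracting the smaller magnitude from the larger. We get:
  101010100110000000000000
• The result is positive with a biased exponent of 10000100.
• The mantissa is already normalized. Suppressing the leading '1' leaves a 23-bit fraction, so the result is stored as:
  0 10000100 01010100110000000000000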
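
Example #2 can be cross-checked against the single precision patterns that Python's `struct` module produces. This is only a verification sketch, not the hand method shown on the slides; `float32_fields` is a hypothetical helper, and the sum is computed in double precision first, which is exact for these operands.

```python
import struct


def float32_fields(x: float) -> str:
    """Format x as 'S EEEEEEEE FFF...' using its single precision bit pattern."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = f"{bits:032b}"
    return f"{s[0]} {s[1:9]} {s[9:]}"


a, b = 42.6875, -0.09375
print(float32_fields(a))      # sign 0, exponent 10000100
print(float32_fields(b))      # sign 1, exponent 01111011
print(a + b)                  # 42.59375
print(float32_fields(a + b))  # sign 0, exponent 10000100, 23-bit fraction
```

The last line shows a fraction field of 0101010011 followed by 13 zeros, that is, 1.0101010011 x 2^5 = 42.59375.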
