BCS302 Unit-2 (Part-III)
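A floating-point number is represented in the form m x r^e, where m is the mantissa, r is the radix, and e is the exponent.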
The mantissa may be a fraction or an integer. The location of the radix point and the value of the radix r are assumed and are not
included in the registers.
A floating-point number is normalized if the most significant digit of the mantissa is nonzero. In this way the mantissa
contains the maximum possible number of significant digits.
A zero cannot be normalized because it does not have a nonzero digit. It is represented in floating-point by all 0's in the mantissa
and exponent.
Consider the sum of the following floating-point numbers:
.5372400 x 10^2
+ .1580000 x 10^-1
It is necessary that the two exponents be equal before the mantissas can be added.
The usual alignment procedure is to shift the mantissa that has the smaller exponent to the right by a number of places equal to the
difference between the exponents. After this is done, the mantissas can be added:
.5372400 x 10^2
+ .0001580 x 10^2
= .5373980 x 10^2
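The alignment step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each mantissa is held as an integer of 7 decimal fraction digits (so 5372400 stands for .5372400), digits shifted out on the right are simply dropped, and the function name `align_and_add` is hypothetical.

```python
def align_and_add(m1, e1, m2, e2):
    """Add two decimal floating-point numbers given as (mantissa, exponent).

    Each mantissa is an integer holding 7 fraction digits,
    so 5372400 means .5372400.
    """
    # Shift the mantissa with the smaller exponent to the right by the
    # difference between the exponents (digits shifted out are lost).
    if e1 < e2:
        m1, e1 = m1 // 10 ** (e2 - e1), e2
    elif e2 < e1:
        m2, e2 = m2 // 10 ** (e1 - e2), e1
    # The exponents are now equal, so the mantissas can be added directly.
    return m1 + m2, e1


print(align_and_add(5372400, 2, 1580000, -1))  # (5373980, 2)
```

The sketch does not handle a carry out of the most significant digit; that case would need one more right shift and an exponent increment.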
When two numbers are subtracted, the result may contain most significant zeros as shown in the following example:
.56780 x 10^5
- .56430 x 10^5
= .00350 x 10^5
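A result with leading zeros like this one can be normalized by shifting the mantissa left and decrementing the exponent until the first digit is nonzero. The following Python sketch illustrates that idea; it assumes a 5-digit decimal fraction mantissa stored as an integer, and the function name `normalize` is hypothetical.

```python
def normalize(mantissa, exponent, digits=5):
    """Shift a decimal fraction mantissa left until its leading digit is nonzero."""
    if mantissa == 0:
        return 0, 0  # zero cannot be normalized; use all 0's
    while mantissa < 10 ** (digits - 1):
        mantissa *= 10   # shift left by one decimal digit
        exponent -= 1    # compensate so the value is unchanged
    return mantissa, exponent


diff = 56780 - 56430       # .56780 - .56430 = .00350
print(normalize(diff, 5))  # (35000, 3), i.e. .35000 x 10^3
```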
The register organization for floating-point operations is shown in Fig. 10-14. There are three registers, BR, AC, and QR. Each
register is subdivided into two parts. The mantissa part has the same uppercase letter symbols as in fixed-point representation. The
exponent part uses the corresponding lowercase letter symbol.
NOTE: It is assumed that each floating-point number has a mantissa in signed-magnitude representation and a biased exponent
(i.e. the exponent field contains only positive numbers).
Register AC holds a mantissa whose sign is in AS and whose magnitude is in A (the most significant bit of A is denoted A1), along with
the corresponding exponent in a (represented by the lowercase letter).
Similarly, register BR is subdivided into BS, B, and b, and QR into QS, Q, and q.
A parallel-adder adds the two mantissas and transfers the sum into A and the carry into E.
A separate parallel-adder is used for the exponents.
The addition and subtraction algorithm can be divided into four consecutive parts:
1. Check for zeros: we check for zeros at the beginning and terminate the process if necessary.
2. Align the mantissas: the alignment of the mantissas must be carried out prior to their operation.
3. Add or subtract the mantissas: after the mantissas are added or subtracted, the result may be unnormalized.
4. Normalize the result: the normalization procedure ensures that the result is normalized prior to its transfer to memory.
The flowchart for adding or subtracting two floating-point binary numbers is shown in Fig. 10-15.
Add or Subtract
Figure 10-15: Flowchart for addition and subtraction of floating point numbers
Compiled by- Durgesh Pandey (CSED)
PSIT, Kanpur
The algorithm shown in the flowchart proceeds in the following steps:
1. If BR = 0, the operation is terminated, with the value in the AC being the result.
If AC = 0, we transfer the content of BR into AC and also complement its sign if the numbers are to be subtracted.
If neither number is equal to zero, we proceed to align the mantissas.
2. The magnitude comparator attached to exponents a and b provides three outputs that indicate their relative magnitude.
If a = b, we go to perform the arithmetic operation.
If a ≠ b, the mantissa having the smaller exponent is shifted to the right and its exponent incremented.
This process is repeated until the two exponents are equal.
3. The addition and subtraction of the two mantissas is identical to the fixed-point addition and subtraction algorithm.
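4. The result is then normalized. If the addition produces a carry (E = 1), the mantissa is shifted right and the exponent incremented.
If A1 = 0, the mantissa is shifted left and the exponent decremented until A1 = 1.
When A1 = 1, the mantissa is normalized and the operation is completed.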
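The four steps can be sketched in Python as follows. This is a simplified software model, not the register-level hardware: mantissas are n-bit binary fractions stored as integers (bit n-1 plays the role of A1), exponents are plain integers rather than biased values, and the function name `fp_add` and its parameters are hypothetical. For subtraction, the caller would complement `b_sign` before calling.

```python
def fp_add(a_sign, a_mant, a_exp, b_sign, b_mant, b_exp, n=8):
    """Add AC + BR, each given as (sign, n-bit fraction mantissa, exponent)."""
    # Step 1: check for zeros.
    if b_mant == 0:
        return a_sign, a_mant, a_exp
    if a_mant == 0:
        return b_sign, b_mant, b_exp
    # Step 2: align, shifting the mantissa with the smaller exponent right.
    while a_exp < b_exp:
        a_mant >>= 1
        a_exp += 1
    while b_exp < a_exp:
        b_mant >>= 1
        b_exp += 1
    # Step 3: signed-magnitude addition of the mantissas.
    total = (-a_mant if a_sign else a_mant) + (-b_mant if b_sign else b_mant)
    sign, mant, exp = int(total < 0), abs(total), a_exp
    if mant == 0:
        return 0, 0, 0  # zero is represented by all 0's
    # Step 4: normalize the result.
    if mant >> n:                     # carry out of the mantissa (E = 1)
        mant >>= 1
        exp += 1
    while not (mant >> (n - 1)) & 1:  # A1 = 0: shift left, decrement exponent
        mant <<= 1
        exp -= 1
    return sign, mant, exp


# 0.5 x 2^1 + 0.5 x 2^1 = 0.5 x 2^2
print(fp_add(0, 0b10000000, 1, 0, 0b10000000, 1))  # (0, 128, 2)
```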
Multiplication
The multiplication of two floating-point numbers requires that we multiply the mantissas and add the exponents. No comparison
of exponents or alignment of mantissas is necessary.
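A minimal Python sketch of this idea is given below. It assumes unsigned, normalized n-bit fraction mantissas stored as integers, plain (unbiased) integer exponents, and truncation of the low-order product bits; the function name `fp_multiply` is hypothetical. With biased exponents, the bias would have to be subtracted once after the exponents are added.

```python
def fp_multiply(a_mant, a_exp, b_mant, b_exp, n=8):
    """Multiply two numbers given as (n-bit fraction mantissa, exponent)."""
    # Check for zeros first: the product is zero if either operand is zero.
    if a_mant == 0 or b_mant == 0:
        return 0, 0
    exp = a_exp + b_exp          # add the exponents
    product = a_mant * b_mant    # multiply the mantissas (2n-bit fraction)
    # Two normalized fractions give a product in [1/4, 1), so at most one
    # left shift is needed to normalize it.
    if not (product >> (2 * n - 1)) & 1:
        product <<= 1
        exp -= 1
    return product >> n, exp     # keep the upper n bits (truncation)


# (0.75 x 2^2) * (0.75 x 2^1) = 3 * 1.5 = 4.5 = 0.5625 x 2^3
print(fp_multiply(0b11000000, 2, 0b11000000, 1))  # (144, 3)
```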
The multiplication algorithm can also be subdivided into four parts:
Division
Floating-point division requires that the exponents be subtracted and the mantissas divided. The mantissa division is done as in
fixed-point except that the dividend has a single-precision mantissa that is placed in the AC. Remember that the mantissa dividend
is a fraction and not an integer.
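The division procedure can be sketched in Python in the same style. The assumptions are the same as before and are not taken from the source: unsigned, normalized n-bit fraction mantissas stored as integers, unbiased integer exponents, and a truncated quotient. The function name `fp_divide` is hypothetical. The dividend is shifted right once when it is not smaller than the divisor, so that the quotient stays a fraction.

```python
def fp_divide(a_mant, a_exp, b_mant, b_exp, n=8):
    """Divide (a_mant, a_exp) by (b_mant, b_exp); mantissas are n-bit fractions."""
    if b_mant == 0:
        raise ZeroDivisionError("divisor mantissa is zero")
    if a_mant == 0:
        return 0, 0
    exp = a_exp - b_exp        # subtract the exponents
    dividend = a_mant << n     # extend so the quotient has n fraction bits
    if a_mant >= b_mant:       # dividend alignment: keep the quotient below 1
        dividend >>= 1
        exp += 1
    return dividend // b_mant, exp  # divide the mantissas (truncated)


# (0.75 x 2^2) / (0.75 x 2^1) = 3 / 1.5 = 2 = 0.5 x 2^2
print(fp_divide(0b11000000, 2, 0b11000000, 1))  # (128, 2)
```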
The division algorithm can be subdivided into five parts:
IEEE 754 numbers are represented in two standard formats: single precision and double precision.
Format             Sign           Exponent                Mantissa               Bias
Single precision   1 (bit 31)     8 bits (bits 30-23)     23 bits (bits 22-0)    127
Double precision   1 (bit 63)     11 bits (bits 62-52)    52 bits (bits 51-0)    1023
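The field layout of the two formats can be inspected with Python's standard `struct` module. The sketch below packs a number in each format and extracts the sign, biased exponent, and mantissa fields by shifting and masking; the helper names `decode_single` and `decode_double` are hypothetical.

```python
import struct


def decode_single(x):
    """Return (sign, biased exponent, mantissa) of x as a 32-bit IEEE 754 value."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def decode_double(x):
    """Return (sign, biased exponent, mantissa) of x as a 64-bit IEEE 754 value."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


print(decode_single(1.0))  # (0, 127, 0): the exponent 0 is stored as the bias 127
print(decode_double(1.0))  # (0, 1023, 0): the exponent 0 is stored as the bias 1023
```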
Step 1
Start with the positive version of the number:
|-14.625| = 14.625
Step 2
Convert the integer part, 14, to binary (base 2).
Divide the number repeatedly by 2.
Keep track of each remainder. We stop when we get a quotient that is equal to zero.
division = quotient + remainder;
14 ÷ 2 = 7 + 0;
7 ÷ 2 = 3 + 1;
3 ÷ 2 = 1 + 1;
1 ÷ 2 = 0 + 1;
Now, take all the remainders starting from the bottom of the list constructed above.
14(10) = 1110(2)
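The repeated-division procedure can be written as a short Python function. This is a sketch for non-negative integers only, and the name `int_to_binary` is hypothetical.

```python
def int_to_binary(n):
    """Convert a non-negative integer to a binary string by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)        # quotient and remainder of n / 2
        remainders.append(str(r))  # keep track of each remainder
    # Read the remainders from the bottom of the list to the top.
    return "".join(reversed(remainders))


print(int_to_binary(14))  # 1110
```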
Step 3
Convert the fractional part, 0.625, to binary (base 2).
Multiply it repeatedly by 2.
Keep track of the integer part of each result. Stop when we get a fractional part that is equal to zero.
multiplication = integer part + fractional part;
0.625 × 2 = 1 + 0.25;
0.25 × 2 = 0 + 0.5;
0.5 × 2 = 1 + 0;
Now, take all the integer parts of the multiplications, starting from the top of the list constructed above:
0.625(10) = 0.101(2)
Hence, positive number before normalization: 14.625(10) = 1110.101(2)
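The repeated-multiplication procedure for the fractional part can be sketched in Python as well. `Fraction` is used here so that decimal inputs such as 0.625 are handled exactly; the function name `frac_to_binary` and the `max_bits` cutoff (needed for fractions that never terminate in binary) are assumptions made for this illustration.

```python
from fractions import Fraction


def frac_to_binary(frac, max_bits=23):
    """Convert a fraction in [0, 1) to its binary digits by repeated multiplication by 2."""
    frac = Fraction(frac)
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        bit = int(frac)        # integer part of the result (0 or 1)
        bits.append(str(bit))  # keep track of each integer part
        frac -= bit            # continue with the fractional part
    # Read the integer parts from the top of the list to the bottom.
    return "".join(bits)


print(frac_to_binary("0.625"))  # 101
```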
Step 4
Normalize the binary representation of the number, i.e. shift the binary point 3 positions to the left so that only one nonzero
digit remains to the left of it:
14.625(10) = 1110.101(2) = 1110.101(2) × 2^0 = 1.110101(2) × 2^3
At this point, the following elements feed into the 32-bit single-precision IEEE 754 binary
floating-point representation:
Sign: 1 (a negative number)
Exponent (unadjusted): 3
Mantissa (not normalized): 1.110101
Step 5
Adjust the exponent using the 8-bit excess (bias) notation:
Exponent (adjusted) = Exponent (unadjusted) + 2^(8-1) - 1 = 3 + 2^(8-1) - 1 = (3 + 127)(10) = 130(10)
Step 6
Convert the adjusted exponent from decimal (base 10) to 8-bit binary.
Use the same technique of repeatedly dividing by 2:
division = quotient + remainder;
130 ÷ 2 = 65 + 0;
65 ÷ 2 = 32 + 1;
32 ÷ 2 = 16 + 0;
16 ÷ 2 = 8 + 0;
8 ÷ 2 = 4 + 0;
4 ÷ 2 = 2 + 0;
2 ÷ 2 = 1 + 0;
1 ÷ 2 = 0 + 1;
Now construct the base 2 representation of the adjusted exponent, i.e. take all the remainders starting from the bottom of
the list constructed above:
Exponent (adjusted) = 130(10) = 1000 0010(2)
Step 7
Normalize the mantissa, i.e.
a) Remove the leading (leftmost) bit, since it is always 1, together with the binary point.
b) Adjust its length to 23 bits by appending the necessary number of zeros on the right.
Therefore,
Mantissa (normalized): 1.110101 becomes 110 1010 0000 0000 0000 0000
Step 8
The three elements that make up the number's 32 bit single precision IEEE 754 binary floating point representation:
Sign (1 bit) = 1 (a negative number)
Exponent (8 bits) = 1000 0010
Mantissa (23 bits) = 110 1010 0000 0000 0000 0000
Number -14.625 converted from decimal system (base 10) to 32 bit single precision IEEE 754 binary floating point:
1 1000 0010 110 1010 0000 0000 0000 0000 Final Answer.
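The whole conversion can be cross-checked in Python. The sketch below rebuilds the three fields with `math.frexp` and compares them with the bit pattern produced by the standard `struct` module. It is an illustration only: the function name `encode_single` is hypothetical, it handles normal (nonzero, finite) numbers only, and it truncates rather than rounds the mantissa, which makes no difference for values such as -14.625 that fit exactly in 23 bits.

```python
import math
import struct


def encode_single(x):
    """Return the (sign, exponent, mantissa) bit strings of x in single precision."""
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))            # abs(x) = m * 2**e with 0.5 <= m < 1
    exponent = (e - 1) + 127             # rewrite as 1.f * 2**(e-1), then add the bias
    fraction = int((m * 2 - 1) * 2**23)  # the 23 bits after the hidden leading 1
    return f"{sign:b}", f"{exponent:08b}", f"{fraction:023b}"


fields = encode_single(-14.625)
print(fields)  # ('1', '10000010', '11010100000000000000000')

# Cross-check against the bit pattern produced by struct.
packed = struct.unpack(">I", struct.pack(">f", -14.625))[0]
print(f"{packed:032b}" == "".join(fields))  # True
```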
More Examples –
Q. Convert 85.125 into 32 bit single precision and 64 bit double precision IEEE 754 binary
floating point.
Solution
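85(10) = 1010101(2)
0.125(10) = 0.001(2)
85.125(10) = 1010101.001(2)
= 1.010101001(2) x 2^6
Sign bit = 0
1. Single precision
Biased exponent = 6 + 127 = 133
133 = 10000101
Normalized mantissa = 010101001 (we append 0's on the right to complete the 23 bits)
= 0 10000101 01010100100000000000000
2. Double precision
Biased exponent = 6 + 1023 = 1029
1029 = 10000000101
Normalized mantissa = 010101001 (we append 0's on the right to complete the 52 bits)
= 0 10000000101 0101010010000000000000000000000000000000000000000000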
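Results like this can be verified with Python's standard `struct` module, which packs a number in either IEEE 754 format. The helper name `ieee_bits` below is hypothetical; it returns the sign, exponent, and mantissa fields separated by spaces.

```python
import struct


def ieee_bits(x, double=False):
    """Return the IEEE 754 bit pattern of x as 'sign exponent mantissa'."""
    if double:
        bits = struct.unpack(">Q", struct.pack(">d", x))[0]
        s = f"{bits:064b}"
        return f"{s[0]} {s[1:12]} {s[12:]}"   # 1 + 11 + 52 bits
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = f"{bits:032b}"
    return f"{s[0]} {s[1:9]} {s[9:]}"         # 1 + 8 + 23 bits


print(ieee_bits(85.125))               # 0 10000101 01010100100000000000000
print(ieee_bits(85.125, double=True))  # sign 0, exponent 10000000101, then 52 mantissa bits
```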