9-Algorithms For Floating Point Arithmetic Operations-22-01-2024
Real Numbers
• Numbers with fractions
• Could be done in pure binary
• $1001.1010_2 = 2^3 + 2^0 + 2^{-1} + 2^{-3} = 9.625$
• Radix point: Fixed or Moving?
• Fixed radix point: can’t represent very large or very small
numbers.
• Dynamically sliding the radix point -
• a range of very large and very small numbers can be represented.
In mathematics, radix point refers to the symbol used in numerical representations to separate the integral part of the number (to the
left of the radix) from its fractional part (to the right of the radix). The radix point is usually a small dot, either placed on the baseline
or halfway between the baseline and the top of the numerals. In base 10, the radix point is more commonly called the decimal
point. ... From en.wikipedia.org/wiki/Radix_point
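The positional expansion of 1001.1010 can be checked mechanically. The following is a minimal sketch, assuming an unsigned binary string with an optional radix point; the function name `binary_to_fraction` is illustrative and not from any library.

```python
from fractions import Fraction

def binary_to_fraction(bits: str) -> Fraction:
    """Evaluate an unsigned binary string such as '1001.1010' exactly."""
    int_part, _, frac_part = bits.partition(".")
    value = Fraction(int(int_part, 2) if int_part else 0)
    # The i-th bit after the radix point has weight 2**-i.
    for i, bit in enumerate(frac_part, start=1):
        if bit == "1":
            value += Fraction(1, 2 ** i)
    return value

print(float(binary_to_fraction("1001.1010")))  # 9.625
```

Using `Fraction` keeps the result exact, so the check does not itself depend on floating-point rounding.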
Floating Point
[Figure: floating-point format, with the sign bit labelled]
Signs for Floating Point
• Mantissa is stored in 2's complement.
• Exponent is in excess or biased notation.
• Excess (biased exponent) 128 means
• 8 bit exponent field
• Pure value range 0-255
• Subtract the bias $2^{k-1} = 128$ to get the correct value
• Range -128 to +127
• (IEEE 754 uses a bias of $2^{k-1} - 1 = 127$ instead)
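Excess (biased) notation can be sketched in a few lines. The helper names `encode_excess` and `decode_excess` are hypothetical, and the bias is passed in explicitly so that both an excess-128 field and a bias-127 field can be tried.

```python
def encode_excess(true_exp: int, bias: int, k: int = 8) -> int:
    """Return the stored (biased) value of true_exp in a k-bit field."""
    stored = true_exp + bias
    if not 0 <= stored < 2 ** k:
        raise ValueError("exponent does not fit in the field")
    return stored

def decode_excess(stored: int, bias: int) -> int:
    """Recover the true exponent by subtracting the bias."""
    return stored - bias

# Excess-128 in an 8-bit field: stored 0..255 maps to -128..+127.
print(decode_excess(0, 128), decode_excess(255, 128))  # -128 127
# With a bias of 127, an exponent of 20 is stored as 147.
print(encode_excess(20, 127))  # 147
```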
Normalization
• FP numbers are usually normalized
• The exponent is adjusted so that the leading bit (MSB) of the mantissa is 1
• Since it is always 1 there is no need to store it
• (Compare scientific notation, where numbers are normalized to give a single digit before
the decimal point, e.g. $3.123 \times 10^3$)
• A floating-point format does not represent more individual values than a
fixed-point format of the same width; it spreads them over a wider range.
IEEE 754
• Standard for floating point storage
• 32 and 64 bit standards
• 8 and 11 bit exponent respectively
• Extended formats (both mantissa and exponent) for intermediate
results
IEEE Floating-point Format
• IEEE has introduced a standard floating-point format for
arithmetic operations in mini- and microcomputers, which
is defined in IEEE Standard 754
• In this format, the numbers are normalized so that the
significand or mantissa lies in the range $1 \le 1.F < 2$, which
corresponds to an integer part equal to 1
• An IEEE format floating-point number X is formally
defined as:
$X = (-1)^S \times 2^{E-B} \times 1.F$
where S = sign bit [0 → +, 1 → −]
E = exponent biased by B
F = fractional mantissa
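The definition of X can be applied directly to a 32-bit word. This is a sketch only: it assumes the single-precision layout (1 sign bit, 8 exponent bits, 23 fraction bits, B = 127) and handles normalized numbers only; `decode_single` is an illustrative name.

```python
import struct

def decode_single(word: int) -> float:
    """Apply X = (-1)**S * 2**(E - B) * 1.F to a 32-bit pattern."""
    s = word >> 31                 # sign bit
    e = (word >> 23) & 0xFF        # biased exponent
    f = word & 0x7FFFFF            # 23-bit fraction
    if e in (0, 255):
        raise ValueError("zero, subnormal, infinity and NaN are not handled here")
    return (-1) ** s * 2.0 ** (e - 127) * (1 + f / 2 ** 23)

word = 0x411A0000                  # 0 | 1000 0010 | 0011 0100 ...
print(decode_single(word))         # 9.625
# Cross-check against the machine's own interpretation of the same bits.
print(struct.unpack(">f", struct.pack(">I", word))[0])  # 9.625
```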
Biased Exponent Representation
32-bit format: sign bit (1 bit) | biased exponent (8 bits) | significand (23 bits)
64-bit format: sign bit (1 bit) | biased exponent (11 bits) | significand (52 bits)
Floating Point Examples
• Exponent 20: biased exponent = 127 + 20 = 147
• Exponent -20: biased exponent = 127 - 20 = 107
• Example word (negative, normalized):
exponent 6: biased exponent = 6 + 127 = $133_{10}$ = 1000 0101
stored significand [23 bits]: 0011 0110 110 ...
1 | 1000 0101 | 0011 0110 1100 1100 1100 110
sign | biased exponent | significand
Floating-point Arithmetic
• Some problems that may arise during arithmetic operations are:
i. Exponent overflow: A positive exponent exceeds the
maximum possible exponent value; in some systems this may
be reported as $+\infty$ or $-\infty$
ii. Exponent underflow: A negative exponent is less than
the minimum possible exponent value (e.g. $2^{-200}$); the
number is too small to be represented and may be
reported as 0
iii. Significand underflow: In the process of aligning
significands, the smaller number's significand may be shifted
so far right that its digits are too small to be represented
iv. Significand overflow: The addition of two
significands of the same sign may result in a carry out
of the most significant bit
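The four conditions can be observed with ordinary Python floats. This sketch assumes the interpreter's `float` is an IEEE 754 64-bit value (true for CPython on common hardware), so the limits differ from the 8-bit exponent case but the behaviour is the same in kind.

```python
import math

# Exponent overflow: the true result needs an exponent above the maximum.
overflowed = 1e200 * 1e200          # reported as +infinity

# Exponent underflow: the true result is too small to be represented.
underflowed = 1e-200 * 1e-200       # reported as 0.0

# Significand underflow: aligning 2**-60 with 1.0 shifts all of its bits
# out of the 53-bit significand, so the smaller operand is lost.
absorbed = 1.0 + 2.0 ** -60

# Significand overflow: 1.5 + 1.5 carries out of the most significant bit;
# the result is renormalized and the exponent grows by one.
_, exp_before = math.frexp(1.5)
_, exp_after = math.frexp(1.5 + 1.5)

print(overflowed, underflowed, absorbed == 1.0, exp_after - exp_before)
```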
FP Arithmetic +/-
• Unlike integer and fixed-point number representation,
floating-point numbers cannot be added in one simple
operation
• Consider adding two decimal numbers:
A = 12345
B = 567.89
If these numbers are normalized and added in floating-point
format, we will have
$0.12345 \times 10^5$
$+\ 0.56789 \times 10^3$
$=\ ?.????? \times 10^?$
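The reason the two numbers cannot be added directly is that their exponents differ, so the smaller one has to be aligned first. A minimal sketch follows, assuming non-negative operands, five-digit significands of the form 0.ddddd held as integers, and truncation instead of rounding; `add_decimal_fp` is a hypothetical helper.

```python
def add_decimal_fp(sig_a, exp_a, sig_b, exp_b, digits=5):
    """Add 0.sig_a x 10**exp_a and 0.sig_b x 10**exp_b (digit-limited)."""
    # Let operand a be the one with the larger exponent.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    # Align: shift the smaller significand right; shifted-out digits are lost.
    sig_b //= 10 ** (exp_a - exp_b)
    total, exp = sig_a + sig_b, exp_a
    # Significand overflow: a carry out of the top digit needs one more shift.
    if total >= 10 ** digits:
        total //= 10
        exp += 1
    return total, exp

print(add_decimal_fp(12345, 5, 56789, 3))  # (12912, 5), i.e. 0.12912 x 10^5
```

The exact sum is 12912.89; the digits 89 are dropped during alignment, which is the significand underflow effect in miniature.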
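[Flowchart: floating-point addition and subtraction, X ± Y = Z. Boxes recovered: Change sign of Y (for subtraction), ADD, X = 0?, Y = 0?, Exponents equal?, Shift significand right, Add signed significands, Significand overflow?, Results normalized?, Decrement exponent, Round result, RETURN]
Example:
$X = 1.01101_2 \times 2^7$
$Y = 0.110101_2 \times 2^7$
$X + Y = 10.001111_2 \times 2^7 = 1.0001111_2 \times 2^8$ (significand overflow: shift right)
$X - Y = 0.100101_2 \times 2^7 = 1.00101_2 \times 2^6$ (not normalized: decrement exponent)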
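The add/subtract procedure can be sketched with exact rational significands. This is an illustration under stated assumptions, not the hardware algorithm: significands are signed `Fraction` values, so the rounding step never triggers, and exponent overflow/underflow checks are omitted. The name `fp_add` is hypothetical.

```python
from fractions import Fraction

def fp_add(sx, ex, sy, ey):
    """Return (significand, exponent) of sx*2**ex + sy*2**ey, normalized."""
    if sx == 0:
        return sy, ey
    if sy == 0:
        return sx, ex
    # Align: shift the significand with the smaller exponent to the right.
    while ex < ey:
        sx, ex = sx / 2, ex + 1
    while ey < ex:
        sy, ey = sy / 2, ey + 1
    s, e = sx + sy, ex                 # add signed significands
    if s == 0:
        return Fraction(0), 0
    while abs(s) >= 2:                 # significand overflow: shift right
        s, e = s / 2, e + 1
    while abs(s) < 1:                  # not normalized: shift left
        s, e = s * 2, e - 1
    return s, e

X = Fraction(0b101101, 2 ** 5)         # 1.01101 in binary
Y = Fraction(0b110101, 2 ** 6)         # 0.110101 in binary
print(fp_add(X, 7, Y, 7))              # (Fraction(143, 128), 8) = 1.0001111 x 2^8
print(fp_add(X, 7, -Y, 7))             # (Fraction(37, 32), 6)   = 1.00101 x 2^6
```

Subtraction is handled by negating Y before the call, mirroring the "change sign of Y" step.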
...
only 24 bits can be stored
32-bit register: 1000 1000 1001 1001 1001 1001 | 1001 1001
The discarded low-order bits are more than half of the LSB, so 1 is added to the LSB.
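Rounding a 32-bit register down to a 24-bit significand can be sketched as follows. The "more than half of the LSB" rule is from the slide; the handling of an exact tie (round to even) is an assumption added here, and `round_to_24` is an illustrative name. A carry out of the 24 bits would need renormalization, which this sketch does not do.

```python
def round_to_24(reg32: int) -> int:
    """Keep the top 24 bits of a 32-bit register, rounding to nearest."""
    kept = reg32 >> 8            # the 24 bits that can be stored
    dropped = reg32 & 0xFF       # the 8 bits that are discarded
    half = 0x80                  # half of the LSB of the kept part
    # Assumption: exact ties round to the even neighbour.
    if dropped > half or (dropped == half and kept & 1):
        kept += 1
    return kept

print(hex(round_to_24(0x88999999)))  # 0x88999a (dropped 0x99 > half, so +1)
print(hex(round_to_24(0xC3333333)))  # 0xc33333 (dropped 0x33 < half)
```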
continue ...
$12.2_{10} = 1100.0011\,0011\ldots_2 = 1.100\,0011\,0011\ldots_2 \times 2^3$
Integer part: $12_{10} = 1100_2$
Fractional part ($0.2_{10}$), by repeated multiplication by 2:
0.2 x 2 = 0.4 → 0
0.4 x 2 = 0.8 → 0
0.8 x 2 = 1.6 → 1
0.6 x 2 = 1.2 → 1
0.2 x 2 = 0.4 → 0
...
only 24 bits can be stored
1100 0011 0011 0011 0011 0011 | 0011 0011
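The repeated-doubling procedure for the fractional part can be written as a short loop. A minimal sketch, assuming the fraction is given exactly as a `Fraction` between 0 and 1; `fraction_bits` is a hypothetical helper.

```python
from fractions import Fraction

def fraction_bits(frac: Fraction, n: int) -> str:
    """Return the first n binary digits of a fraction in [0, 1)."""
    bits = []
    for _ in range(n):
        frac *= 2
        bit = int(frac >= 1)     # the integer part produced by doubling
        bits.append(str(bit))
        frac -= bit              # keep only the fractional part
    return "".join(bits)

print(fraction_bits(Fraction(1, 5), 8))   # 00110011
```

Because 0.2 returns to itself after four steps, the pattern 0011 repeats forever, which is why the stored value has to be cut off at 24 bits.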
continue ...
Align the smaller number with the larger number by
shifting its significand to the right [increasing the exponent]
Before: exponent 1000 0010, mantissa 1.100 0011 0011 0011 0011
After: exponent 1000 0101, mantissa 0.001 1000 0110 0110 0110 011
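At the bit level, alignment is a right shift by the difference of the biased exponents. The sketch below assumes the significand is held as a 24-bit integer with the hidden 1 made explicit; `align` is an illustrative name, and bits shifted out on the right are simply dropped.

```python
def align(exp_small: int, sig_small: int, exp_large: int):
    """Shift the smaller operand right until its exponent matches exp_large."""
    shift = exp_large - exp_small
    return exp_large, sig_small >> shift

# 12.2 as a 24-bit significand 1.100 0011 0011 ... with biased exponent 130,
# aligned to an operand whose biased exponent is 133.
exp, sig = align(0b10000010, 0xC33333, 0b10000101)
print(exp, format(sig, "024b"))   # 133 000110000110011001100110
```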
Floating-point Multiplication
[Flowchart: X × Y = Z. Boxes recovered: MULTIPLY, X = 0?, Y = 0?, Z ← 0, Add exponents, Subtract bias, Multiply significands, Normalize, Round, RETURN]
Example:
$X = 6.25_{10} = 110.01_2 = 1.1001_2 \times 2^2$
$Y = 12.5_{10} = 1100.1_2 = 1.1001_2 \times 2^3$
Add exponents: $E_1 = 127 + 2 = 129$, $E_2 = 127 + 3 = 130$, $E_1 + E_2 = 259$
Subtract bias: $E_T = 259 - 127 = 132$
Multiply significands: $1.1001_2 \times 1.1001_2 = 10.01110001_2$
Normalize: $10.01110001_2 \times 2^5 = 1.001110001_2 \times 2^6$
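The multiplication steps (add exponents, subtract bias, multiply significands, normalize) can be sketched as below. Assumptions: biased exponents with a bias of 127, positive normalized significands given as exact `Fraction` values in [1, 2), and no rounding or overflow checks; `fp_multiply` is a hypothetical name.

```python
from fractions import Fraction

BIAS = 127

def fp_multiply(e1, s1, e2, s2):
    """Return (biased exponent, significand) of the product."""
    if s1 == 0 or s2 == 0:
        return 0, Fraction(0)          # Z <- 0
    e = e1 + e2 - BIAS                 # add exponents, subtract bias once
    s = s1 * s2                        # product lies in [1, 4)
    if s >= 2:                         # normalize: at most one right shift
        s, e = s / 2, e + 1
    return e, s

sig = Fraction(0b11001, 2 ** 4)        # 1.1001 in binary
print(fp_multiply(129, sig, 130, sig)) # (133, Fraction(625, 512))
```

The result 625/512 is 1.001110001 in binary with a true exponent of 133 - 127 = 6, which is 78.125 = 6.25 × 12.5.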
Floating-point Division
[Flowchart: Y ÷ X = Z. Boxes recovered: DIVIDE, X = 0?, Y = 0?, Z ← 0, Z ← ∞, Subtract exponents, Add bias, Divide significands, Normalize, Round, RETURN]
Example:
$X = 3.75_{10} = 11.11_2 = 1.111_2 \times 2^1$
$Y = 95.625_{10} = 1011111.101_2 = 1.011111101_2 \times 2^6$
Subtract exponents: $E_1 = 127 + 1 = 128$, $E_2 = 127 + 6 = 133$, $E_2 - E_1 = 5$
Add bias: $E_T = 127 + 5 = 132$
Divide significands: $1.011111101_2 \div 1.111_2 = 0.110011_2$
Normalize: $0.110011_2 \times 2^5 = 1.10011_2 \times 2^4$
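The division steps (subtract exponents, add bias, divide significands, normalize) can be sketched in the same style. Assumptions: a bias of 127, positive normalized significands as exact `Fraction` values in [1, 2), no rounding, and a Python exception standing in for the infinity result on a zero divisor; `fp_divide` is a hypothetical name.

```python
from fractions import Fraction

BIAS = 127

def fp_divide(e_num, s_num, e_den, s_den):
    """Return (biased exponent, significand) of numerator / denominator."""
    if s_den == 0:
        raise ZeroDivisionError("hardware would report infinity here")
    if s_num == 0:
        return 0, Fraction(0)          # Z <- 0
    e = e_num - e_den + BIAS           # subtract exponents, add bias back
    s = s_num / s_den                  # quotient lies in (1/2, 2)
    if s < 1:                          # normalize: at most one left shift
        s, e = s * 2, e - 1
    return e, s

Y = Fraction(0b1011111101, 2 ** 9)     # 1.011111101 in binary
X = Fraction(0b1111, 2 ** 3)           # 1.111 in binary
print(fp_divide(133, Y, 128, X))       # (131, Fraction(51, 32))
```

The result 51/32 is 1.10011 in binary with a true exponent of 131 - 127 = 4, which is 25.5 = 95.625 ÷ 3.75.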
PROBLEM (1)
• Express the number $-(640.5)_{10}$ in IEEE 32 bit and 64 bit floating point
format
SOLUTION (1)….
• IEEE 32 BIT FLOATING POINT FORMAT
$1010000000.1_2 \times 2^0 = 1.0100000001_2 \times 2^9$
Once normalized, every number has a 1 as its leftmost bit, so the IEEE format does not
store this bit. The significand to be stored in the allotted 23 bits is therefore
0100 0000 0100 0000 0000 000
SOLUTION (1)…….
• Step 3: For the 8 bit biased exponent field, the bias
used is
$2^{k-1} - 1 = 2^{8-1} - 1 = 127$
Add the bias 127 to the exponent 9 and
convert the sum to binary to store it in the 8-bit biased
exponent field: $127 + 9 = 136$
(1000 1000)
• Step 4: Since the given number is negative, set the sign bit (MSB)
to 1
• Step 5: Pack the result into the proper format (IEEE 32 bit)
1 | 1000 1000 | 0100 0000 0100 0000 0000 000
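• IEEE 64 BIT FLOATING POINT FORMAT
$1010000000.1_2 \times 2^0 = 1.0100000001_2 \times 2^9$
Once normalized, every number has a 1 as its leftmost bit, so the IEEE format does not
store this bit. The significand to be stored in the allotted 52 bits is therefore
0100 0000 0100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
For the 11 bit biased exponent field, the bias is $2^{11-1} - 1 = 1023$, so the stored exponent is $1023 + 9 = 1032$ (100 0000 1000)
Packed result (sign | biased exponent | significand):
1 | 100 0000 1000 | 0100 0000 0100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000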
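The hand-packed fields for -640.5 can be cross-checked against the machine's own encoding. This sketch uses the standard `struct` module; the helper names `fields32` and `fields64` are illustrative.

```python
import struct

def fields32(x: float):
    """Return (sign, exponent, fraction) bit strings of x as a 32-bit float."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(word, "032b")
    return bits[0], bits[1:9], bits[9:]

def fields64(x: float):
    """Return (sign, exponent, fraction) bit strings of x as a 64-bit float."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(word, "064b")
    return bits[0], bits[1:12], bits[12:]

print(fields32(-640.5))  # ('1', '10001000', '01000000010000000000000')
print(fields64(-640.5)[:2])  # ('1', '10000001000')
```

In both formats the fraction field starts with 0100000001 and is zero-filled to the field width, since 640.5 is exactly representable.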
Character Representation ASCII
ASCII (American Standard Code for Information Interchange) Code
[Table: ASCII code chart; columns are indexed by the 3 most significant bits (0-7)]
• Applications
• EBCDIC is exclusively used on IBM machines such as mainframes,
midrange personal computers, and peripheral devices. Since most
IBM machines include extensive processing capabilities and some
support for modern encoding languages, they are able to keep up
with and even outperform devices from other brands. However, most
machines and operating systems depend on ASCII and Unicode as
their default encoding format.