9-Algorithms For Floating Point Arithmetic Operations-22-01-2024

The document discusses floating point representation and arithmetic. It covers: 1) Floating point numbers represent real numbers using a sign bit, biased exponent, and significand (mantissa). The radix point is fixed between the sign and significand. 2) Common floating point formats like IEEE 754 use biased exponent representation, where the exponent value is the stored exponent minus a bias. 3) Floating point arithmetic operations like addition and multiplication require aligning operands, adjusting exponents, performing operations, then normalizing the result. Special cases like overflow and underflow must be handled.

Uploaded by

Baladhithya T


Floating Point Representation

Real Numbers
• Numbers with fractions
• Could be done in pure binary
• 1001.1010 = 2^3 + 2^0 + 2^-1 + 2^-3 = 9.625
• Radix point: Fixed or Moving?
• Fixed radix point: can’t represent very large or very small numbers.
• Dynamically sliding the radix point: a range of very large and very small numbers can be represented.
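
The positional weights in the pure binary example can be checked with a short sketch. The function name `binary_to_decimal` is hypothetical, introduced only for this illustration; it assumes a string of 0s and 1s with at most one radix point.

```python
def binary_to_decimal(bits: str) -> float:
    """Evaluate a binary string with an optional radix point, e.g. '1001.1010'."""
    integer_part, _, fraction_part = bits.partition(".")
    value = 0.0
    # Integer digits carry weights 2^0, 2^1, ... counted from the radix point leftwards.
    for position, digit in enumerate(reversed(integer_part)):
        value += int(digit) * 2 ** position
    # Fraction digits carry weights 2^-1, 2^-2, ... counted from the radix point rightwards.
    for position, digit in enumerate(fraction_part, start=1):
        value += int(digit) * 2 ** -position
    return value


print(binary_to_decimal("1001.1010"))  # 9.625
```

With a fixed radix point, the number of digits on each side limits both the largest and the smallest magnitude that fits, which is the motivation for letting the point move.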

In mathematics, radix point refers to the symbol used in numerical representations to separate the integral part of the number (to the
left of the radix) from its fractional part (to the right of the radix). The radix point is usually a small dot, either placed on the baseline
or halfway between the baseline and the top of the numerals. In base 10, the radix point is more commonly called the decimal
point. ... From en.wikipedia.org/wiki/Radix_point

Floating Point

[Figure: word layout, left to right: Sign bit | Biased Exponent | Significand or Mantissa]
• +/- significand x 2^exponent
• Point is actually fixed between sign bit and body of mantissa
• Exponent indicates place value (point position)

Signs for Floating Point
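• Mantissa: the general format here stores it in 2's complement; IEEE 754 instead uses a sign bit with an unsigned magnitude.
• Exponent is in excess or biased notation.
• Excess (biased exponent) 128 means
• 8 bit exponent field
• Pure value range 0-255
• Subtract the bias to get the correct value: bias 128 gives range -128 to +127
• IEEE 754 uses bias 2^(k-1) - 1 = 127 for k = 8, giving range -127 to +128
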
Normalization
• FP numbers are usually normalized
• exponent is adjusted so that leading bit (MSB) of mantissa is 1
• Since it is always 1 there is no need to store it
• (Compare scientific notation, where numbers are normalized to give a single digit before the decimal point, e.g. 3.123 x 10^3)
• An FP format does not represent more individual values than a fixed-point format of the same width; it spreads the same number of values over a wider range.
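
Normalization can be illustrated with Python's standard `math.frexp`, which returns a significand in [0.5, 1); the sketch below rescales it to the 1.f form used on this slide. The helper name `normalize` is an assumption made for illustration.

```python
import math


def normalize(x: float) -> tuple[float, int]:
    """Return (significand, exponent) with 1 <= |significand| < 2 and x == significand * 2**exponent."""
    if x == 0:
        raise ValueError("zero has no normalized form")
    m, e = math.frexp(x)   # frexp gives 0.5 <= |m| < 1 with x == m * 2**e
    return m * 2, e - 1    # rescale so the leading 1 sits before the point


print(normalize(9.625))    # (1.203125, 3), i.e. 1.001101 x 2^3
```

Because the leading bit of the rescaled significand is always 1, only the fraction after the point needs to be stored.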

IEEE 754
• Standard for floating point storage
• 32 and 64 bit standards
• 8 and 11 bit exponent respectively
• Extended formats (both mantissa and exponent) for intermediate results

IEEE Floating-point Format
• IEEE has introduced a standard floating-point format for arithmetic operations in mini and microcomputers, which is defined in IEEE Standard 754
• In this format, the numbers are normalized so that the significand or mantissa lies in the range 1 <= 1.F < 2, which corresponds to an integer part equal to 1
• An IEEE format floating-point number X is formally defined as:
X = (-1)^S x 2^(E-B) x 1.F
where S = sign bit [0 = +, 1 = -]
E = exponent biased by B
F = fractional mantissa
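
As a rough check of the definition, the sketch below evaluates (-1)^S x 2^(E-B) x 1.F for a 32-bit pattern and compares it with the value Python's `struct` module assigns to the same bits. The function name `decode_single` is hypothetical, and only normalized numbers are handled.

```python
import struct


def decode_single(bits: int) -> float:
    """Evaluate X = (-1)^S x 2^(E-B) x 1.F for a normalized 32-bit pattern."""
    s = bits >> 31             # sign bit S
    e = (bits >> 23) & 0xFF    # biased exponent E
    f = bits & 0x7FFFFF        # 23 fraction bits F
    if e == 0 or e == 255:
        raise ValueError("reserved exponent: not a normalized number")
    bias = 127                 # B for single precision
    return (-1) ** s * 2.0 ** (e - bias) * (1 + f / 2 ** 23)


pattern = 0xC14A0000           # 1 10000010 10010100000000000000000
print(decode_single(pattern))  # -12.625
# Cross-check against the machine's own interpretation of the same bits.
assert decode_single(pattern) == struct.unpack(">f", struct.pack(">I", pattern))[0]
```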
Biased Exponent Representation

How to represent a signed exponent? The choices are:

• Sign + magnitude representation for the exponent
• Two’s complement representation
• Biased representation
IEEE 754 uses biased representation for the exponent (32-bit format shown)
• Value of exponent = val(E) = E – Bias (Bias is a constant)
• Recall that exponent field is 8 bits for single precision
• E can be in the range 0 to 255
• E = 0 and E = 255 are reserved for special use
• E = 1 to 254 are used for normalized floating point numbers
• Bias = 127 (half of 254), val(E) = E – 127
• val(E=1) = –126, val(E=127) = 0, val(E=254) = 127
• Two basic formats are defined in the IEEE Standard 754
• These are the 32-bit single and 64-bit double formats, with 8-bit and 11-bit exponents respectively
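
The bias rule can be written down in a few lines. This is a minimal sketch; the names `bias` and `stored_exponent` are assumptions for illustration, and the range check reflects the reserved all-zeros and all-ones fields listed above.

```python
def bias(k: int) -> int:
    """Bias for a k-bit exponent field: 2^(k-1) - 1."""
    return 2 ** (k - 1) - 1


def stored_exponent(true_exponent: int, k: int = 8) -> int:
    """Return the biased field E for a normalized number, or raise if it does not fit."""
    e = true_exponent + bias(k)
    if not 1 <= e <= 2 ** k - 2:   # E = 0 and E = 2^k - 1 are reserved
        raise OverflowError(f"exponent {true_exponent} does not fit in {k} bits")
    return e


print(bias(8), bias(11))           # 127 1023
print(stored_exponent(-126), stored_exponent(0), stored_exponent(127))  # 1 127 254
```

One practical consequence of the biased form is that stored exponents are plain unsigned integers, so larger exponents compare as larger bit patterns.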

[Figure: (a) Single format: Sign bit (1 bit) | Biased Exponent (8 bits) | Significand (23 bits)]
[Figure: (b) Double format: Sign bit (1 bit) | Biased Exponent (11 bits) | Significand (52 bits)]

• A sign-magnitude representation has been adopted for the mantissa; mantissa is negative if S = 1, and positive if S = 0

Floating Point Examples

[Figure: example bit patterns. A true exponent of 20 is stored as 127 + 20 = 147; a true exponent of -20 is stored as 127 - 20 = 107; negative values have sign bit 1.]

The bias equals (2^(K-1) - 1): 2^(8-1) - 1 = 127


Example
Convert these numbers to IEEE single precision format:
(a) 199.953125 (decimal) = 1100 0111.1111 01 (binary)
= 1.1000 1111 1110 1 x 2^7
sign: + (0)
biased exponent: 7 + 127 = 134 = 1000 0110
stored significand [23 bits]: 1000 1111 1110 1000 0000 000 (the leading 1 is not stored)
0 | 1000 0110 | 1000 1111 1110 1000 0000 000
sign | biased exponent | significand

(b) -77.7 (decimal) = -100 1101.1 0110 0110 ... (binary)
77 = 100 1101 (binary); the fraction 0.7 is converted by repeated multiplication by 2:
0.7 x 2 = 1.4 -> 1
0.4 x 2 = 0.8 -> 0
0.8 x 2 = 1.6 -> 1
0.6 x 2 = 1.2 -> 1
0.2 x 2 = 0.4 -> 0 (from here the digits 0110 repeat)
...
= -1.0011 0110 1100 1100 ... x 2^6
sign: - (1)
biased exponent: 6 + 127 = 133 = 1000 0101
stored significand [23 bits]: 0011 0110 1100 1100 1100 110
1 | 1000 0101 | 0011 0110 1100 1100 1100 110
sign | biased exponent | significand
Slides adapted from Tan Wooi Haw’s lecture notes (FOE)
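
The two conversions can be cross-checked with Python's standard `struct` module, which rounds a value to single precision when packing with the `f` format. The helper name `single_fields` is an assumption made for this sketch.

```python
import struct


def single_fields(x: float) -> tuple[str, str, str]:
    """Return (sign, biased exponent, stored significand) bit strings for x as a single."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))  # round to 32 bits, reinterpret as an integer
    bits = f"{raw:032b}"
    return bits[0], bits[1:9], bits[9:]


print(single_fields(199.953125))  # ('0', '10000110', '10001111111010000000000')
print(single_fields(-77.7))       # ('1', '10000101', '00110110110011001100110')
```

199.953125 is represented exactly, while -77.7 has a repeating binary fraction and is rounded to the nearest 23-bit significand.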
FP Arithmetic +/-

• Check for zeros

• Align significands (adjusting exponents)
• Add or subtract significands
• Normalize result

FP Arithmetic x/÷

• Check for zero

• Add/subtract exponents
• Multiply/divide significands (watch sign)
• Normalize
• Round
• All intermediate results should be in double length storage

Floating-point Arithmetic (cont.)

Some basic floating-point arithmetic operations are shown in the table

Floating-point Arithmetic (cont.)

• For addition and subtraction, it is necessary to ensure that both operand exponents have the same value
• This may involve shifting the radix point of one of the operands to achieve alignment

Floating-point Arithmetic (cont.)
• Some problems that may arise during arithmetic operations are:
i. Exponent overflow: A positive exponent exceeds the maximum possible exponent value; in some systems the result is reported as +∞ or -∞
ii. Exponent underflow: A negative exponent is less than the minimum possible exponent value (e.g. 2^-200); the number is too small to be represented and may be reported as 0
iii. Significand underflow: In the process of aligning significands, the smaller number's significand may be shifted so far right that its digits are lost
iv. Significand overflow: The addition of two significands of the same sign may result in a carry out from the most significant bit
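
Three of these cases can be observed directly with Python floats, which are IEEE 754 doubles on common platforms (an assumption of this sketch). The variable names are illustrative only.

```python
import math
import sys

big = sys.float_info.max   # largest finite double
tiny = 5e-324              # smallest positive (subnormal) double, 2^-1074

# Exponent overflow: the product is too large, so it is reported as +infinity.
print(big * 2)             # inf
assert math.isinf(big * 2)

# Exponent underflow: the quotient is too small, so it is reported as 0.
print(tiny / 2)            # 0.0

# Significand underflow: after alignment, every bit of the smaller operand is shifted out.
print(2.0 ** 53 + 1.0 == 2.0 ** 53)  # True
```

Significand overflow (case iv) is handled inside the adder by shifting right and incrementing the exponent, so it is not visible from the outside.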

FP Arithmetic +/-
• Unlike integer and fixed-point number representation, floating-point numbers cannot be added in one simple operation
• Consider adding two decimal numbers:
A = 12345
B = 567.89
If these numbers are normalized and added in floating-point format, we will have

0.12345 x 10^5
+ 0.56789 x 10^3
?.????? x 10^?

Direct addition cannot take place because the exponents are different

FP Arithmetic +/- (cont.)
• Floating-point addition and subtraction will typically involve the following steps:
i. Align the significands
ii. Add or subtract the significands
iii. Normalize the result
• Since addition and subtraction are identical except for a sign change, the process begins by changing the sign of the subtrahend if it is a subtract operation
• The floating-point numbers can only be added if the two exponents are equal
• This can be done by aligning the smaller number with the bigger number [increasing its exponent] or vice-versa, so that both numbers have the same exponent

FP Arithmetic +/- (cont.)
• As the aligning operation may result in the loss of digits, it is the smaller number that is shifted, so that any loss is relatively insignificant
• Example with 8 bits remaining: shifting the larger number left to match the smaller exponent loses its leading digit
1.1001 x 2^9 -> shift left -> 1 1001 0000 x 2^1 (the leading 1 x 2^9 is lost)
1.0111 x 2^1 -> 1.0111000 x 2^1
• Hence, the smaller number is shifted right by increasing its exponent until the two exponents are the same
• If both numbers have exponents that differ significantly, the smaller number is lost as a result of shifting
1.1001001 x 2^9 -> 1.1001001 x 2^9
1.0110001 x 2^1 -> shift right -> 0.0000000 x 2^9

FP Arithmetic +/- (cont.)
• After the numbers have been aligned, they are added together taking into account their signs
• There might be a possibility of significand overflow due to a carry out from the most significant bit
• If this occurs, the significand of the result is shifted right and the exponent is incremented
1.1101 x 2^4
+ 0.0101 x 2^4
= 10.0010 x 2^4 -> shift right -> 1.0001 x 2^5
• As the exponent is incremented, it might overflow, in which case the operation stops
• Lastly, the result is normalized by shifting significand digits left until the most significant digit is non-zero
• Each shift causes a decrement of the exponent and thus could cause an exponent underflow
• Finally, the result is rounded off and reported
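
The align, add, and normalize steps can be sketched on a toy format. Everything here is an assumption made for illustration: the name `fp_add`, the 7-bit fraction width `P`, the (significand, exponent) pair layout, and the choice to truncate instead of round. Exponent overflow and underflow checks are left out to keep the sketch short.

```python
P = 7  # fraction bits kept in the significand (toy format, not IEEE 754)


def fp_add(x, y):
    """Add two toy floats given as (signed integer significand, exponent).

    A pair (m, e) stands for m / 2**P * 2**e; a normalized magnitude
    satisfies 2**P <= |m| < 2**(P + 1). Bits shifted out are truncated.
    """
    (mx, ex), (my, ey) = x, y
    if mx == 0:
        return y
    if my == 0:
        return x
    # Align: shift the significand of the operand with the smaller exponent right.
    if ex < ey:
        (mx, ex), (my, ey) = (my, ey), (mx, ex)
    shift = ex - ey
    my = (abs(my) >> shift) * (1 if my > 0 else -1)
    # Add the signed significands.
    m, e = mx + my, ex
    if m == 0:
        return (0, 0)
    sign, mag = (1 if m > 0 else -1), abs(m)
    # Significand overflow: shift right and increment the exponent.
    while mag >= 2 ** (P + 1):
        mag >>= 1
        e += 1
    # Normalize: shift left and decrement the exponent.
    while mag < 2 ** P:
        mag <<= 1
        e -= 1
    return (sign * mag, e)


x = (0b10110100, 7)               # 1.0110100 x 2^7
y = (0b11010100, 6)               # 1.1010100 x 2^6
print(fp_add(x, y))               # (143, 8): 1.0001111 x 2^8
print(fp_add(x, (-y[0], y[1])))   # (148, 6): 1.0010100 x 2^6
```

When the exponents differ by more than the significand width, the shifted operand becomes 0 and the larger operand is returned unchanged, which matches the "smaller number is lost" case above.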
[Figure: flowchart for floating-point addition and subtraction (X + Y = Z, X – Y = Z), annotated with a worked example]
Flowchart steps:
1. SUBTRACT: change the sign of Y, then continue as for ADD
2. ADD: if X = 0, Z ← Y and RETURN; if Y = 0, Z ← X and RETURN
3. Exponents equal? If not, increment the smaller exponent and shift its significand right; if that significand becomes 0, put the other number in Z and RETURN; otherwise repeat this step
4. Add the signed significands; if the resulting significand = 0, Z ← 0 and RETURN
5. Significand overflow? If yes, shift the significand right and increment the exponent; on exponent overflow, report overflow and RETURN
6. Result normalized? If not, shift the significand left and decrement the exponent; on exponent underflow, report underflow and RETURN; otherwise repeat this step
7. Round the result and RETURN

Worked example: X = 1.01101 x 2^7, Y = 1.10101 x 2^6
Align: Y = 0.110101 x 2^7
X + Y: 1.01101 x 2^7 + 0.110101 x 2^7 = 10.001111 x 2^7 = 1.0001111 x 2^8 (significand overflow, so shift right)
X – Y: 1.01101 x 2^7 – 0.110101 x 2^7 = 0.100101 x 2^7 = 1.00101 x 2^6 (not normalized, so shift left)

FP Arithmetic +/- (cont.)
• Some floating-point arithmetic operations lead to an increase in the number of bits in the mantissa
• For example, consider adding these floating-point numbers with 5 significant bits:
A = 0.11001 x 2^4
B = 0.10001 x 2^3
Align B to the exponent of A, add, then normalize:
A = 0.11001 x 2^4
B = 0.010001 x 2^4
A + B = 1.000011 x 2^4 = 0.1000011 x 2^5
• The result has two extra bits of precision which cannot be fitted into the floating point format
• For simplicity, the number can be truncated to give 0.10000 x 2^5

FP Arithmetic +/- (cont.)
• Truncation is the simplest method which involves nothing more than taking away the extra bits
• A much better technique is rounding: if the value of the extra bits is greater than half the least significant bit of the retained bits, 1 is added to the LSB of the retained bits
• For example, consider rounding these numbers to 4 significant bits:
i. 0.1101101
retained bits: 0.1101
extra bits: 0.0000101
LSB of retained bits: 0.0001 (half the LSB: 0.00001)
The extra bits are more than half the LSB, so add 1 to the LSB:
0.1101 + 0.0001 = 0.1110

FP Arithmetic +/- (cont.)
ii. 0.1101011
retained bits: 0.1101
extra bits: 0.0000011
LSB of retained bits: 0.0001 (half the LSB: 0.00001)
The extra bits are less than half the LSB, so they are truncated: 0.1101
• Truncation always undervalues the magnitude of the result, leading to a systematic error, whereas rounding sometimes reduces the result and sometimes increases it
• Rounding is generally preferred to truncation, partly because it is more accurate and partly because it gives rise to an unbiased error
• The major disadvantage of rounding is that it requires a further arithmetic operation on the result
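
The two rounding examples can be reproduced on bit strings. This is a sketch under stated assumptions: the names `truncate` and `round_bits` are hypothetical, the inputs are the fraction bits after "0.", and the rule is the one stated on the slide (add 1 only when the extra bits exceed half an LSB).

```python
def truncate(frac_bits: str, keep: int) -> str:
    """Drop every bit after the first `keep` fraction bits."""
    return frac_bits[:keep]


def round_bits(frac_bits: str, keep: int) -> str:
    """Add 1 to the LSB of the retained bits when the extra bits exceed half an LSB.

    Assumes the rounded value still fits in `keep` bits (no carry out of the fraction).
    """
    retained, extra = frac_bits[:keep], frac_bits[keep:]
    half = "1" + "0" * (len(extra) - 1)   # bit pattern of exactly half an LSB
    if extra and int(extra, 2) > int(half, 2):
        retained = format(int(retained, 2) + 1, f"0{keep}b")
    return retained


print(round_bits("1101101", 4))   # 1110  (extra bits 101 > 100)
print(round_bits("1101011", 4))   # 1101  (extra bits 011 < 100)
print(truncate("1101101", 4))     # 1101
```

The extra comparison and conditional increment in `round_bits` is the "further arithmetic operation" that truncation avoids.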
continue ...
ii. 68.3 + 12.2 (decimal)
68.3 = 100 0100.0 1001 1001 ... (binary); 68 = 100 0100 (binary)
The fraction 0.3 is converted by repeated multiplication by 2:
0.3 x 2 = 0.6 -> 0
0.6 x 2 = 1.2 -> 1
0.2 x 2 = 0.4 -> 0
0.4 x 2 = 0.8 -> 0
0.8 x 2 = 1.6 -> 1
0.6 x 2 = 1.2 -> 1
...
= 1.0001 0001 0011 0011 ... x 2^6
Only 24 significand bits can be stored. In a 32-bit register the significand reads
1000 1000 1001 1001 1001 1001 | 1001 1001
The 8 extra bits are more than half of the LSB, so 1 is added to the LSB:
1000 1000 1001 1001 1001 1010
sign: + (0)
biased exponent: 6 + 127 = 133 = 1000 0101
stored significand [23 bits]: 0001 0001 0011 0011 0011 010
0 | 1000 0101 | 0001 0001 0011 0011 0011 010
sign | biased exponent | significand

continue ...
12.2 = 1100.0011 0011 ... (binary); 12 = 1100 (binary)
The fraction 0.2 is converted by repeated multiplication by 2:
0.2 x 2 = 0.4 -> 0
0.4 x 2 = 0.8 -> 0
0.8 x 2 = 1.6 -> 1
0.6 x 2 = 1.2 -> 1
0.2 x 2 = 0.4 -> 0
...
= 1.1000 0110 0110 ... x 2^3
Only 24 significand bits can be stored. In a 32-bit register the significand reads
1100 0011 0011 0011 0011 0011 | 0011 0011
The 8 extra bits are less than half of the LSB, so they are truncated.
sign: + (0)
biased exponent: 3 + 127 = 130 = 1000 0010
stored significand [23 bits]: 1000 0110 0110 0110 0110 011
0 | 1000 0010 | 1000 0110 0110 0110 0110 011
sign | biased exponent | significand

continue ...
Align the smaller number with the larger number by shifting it to the right [increasing the exponent]
exponent 1000 0010, mantissa 1.10000110011001100110011
-> exponent 1000 0101, mantissa 0.00110000110011001100110 011

ADD the mantissas
  1.00010001001100110011010
+ 0.00110000110011001100110011
= 1.01000010000000000000000011
The 3 extra bits (011) are less than half of the LSB, so they are truncated.

Store the result in IEEE single-precision format
0 | 1000 0101 | 0100 0010 0000 0000 0000 000
sign | biased exponent | significand
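
The worked sum can be checked with Python's standard `struct` module. This is a sketch, not a hardware model: the helpers `to_single` and `single_bits` are hypothetical names, and the addition is done in double precision and then rounded once to single precision.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def single_bits(x: float) -> str:
    """32-bit pattern of x as a single, split into sign, exponent and significand."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    b = f"{raw:032b}"
    return f"{b[0]} {b[1:9]} {b[9:]}"


a, b = to_single(68.3), to_single(12.2)
# The double-precision sum of these two singles is exact, so packing it rounds only once.
total = to_single(a + b)
print(single_bits(a))      # 0 10000101 00010001001100110011010
print(single_bits(b))      # 0 10000010 10000110011001100110011
print(single_bits(total))  # 0 10000101 01000010000000000000000
print(total)               # 80.5
```

The stored result is exactly 80.5 because the rounding errors of the two operands happen to cancel below the last kept bit.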

Floating-point Multiplication
X x Y = Z
[Figure: flowchart for floating-point multiplication, annotated with a worked example]
Flowchart steps:
1. MULTIPLY: if X = 0 or Y = 0, Z ← 0 and RETURN
2. Add exponents
3. Subtract bias
4. Exponent overflow? If yes, report overflow. Exponent underflow? If yes, report underflow
5. Multiply significands
6. Normalize
7. Round and RETURN

Worked example:
X = 6.25 (decimal) = 110.01 (binary) = 1.1001 x 2^2
Y = 12.5 (decimal) = 1100.1 (binary) = 1.1001 x 2^3
E1 = 127 + 2 = 129
E2 = 127 + 3 = 130
Add exponents: E1 + E2 = 259
Subtract bias: ET = 259 – 127 = 132
Multiply significands: 1.1001 x 1.1001 = 10.01110001
Normalize: 10.01110001 x 2^5 = 1.001110001 x 2^6
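
The multiplication steps can be sketched with integer significands. The function name `fp_multiply` and the (biased exponent, 24-bit significand) pair layout are assumptions for illustration; signs, zero operands, rounding, and special values are left out, and extra product bits are truncated.

```python
BIAS = 127
P = 23  # fraction bits in single precision


def fp_multiply(x, y):
    """Multiply two positive normalized numbers given as (biased exponent, significand).

    Each significand is an integer in [2**P, 2**(P + 1)), i.e. 1.F scaled by 2**P.
    """
    (ex, mx), (ey, my) = x, y
    e = ex + ey - BIAS             # add exponents, subtract the bias once
    m = mx * my                    # double-length product, scaled by 2**(2 * P)
    if m >= 2 ** (2 * P + 1):      # product in [2, 4): shift right to normalize
        m >>= 1
        e += 1
    m >>= P                        # drop the low half, back to a P-bit fraction
    if e >= 255:
        raise ArithmeticError("exponent overflow")
    if e <= 0:
        raise ArithmeticError("exponent underflow")
    return e, m


x = (129, 0b11001 << (P - 4))          # 1.1001 x 2^2 = 6.25
y = (130, 0b11001 << (P - 4))          # 1.1001 x 2^3 = 12.5
e, m = fp_multiply(x, y)
print(e, m / 2 ** P)                   # 133 1.220703125
print(m / 2 ** P * 2.0 ** (e - BIAS))  # 78.125
```

The bias is subtracted once because each stored exponent already contains it; adding two biased exponents would otherwise count it twice.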
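Floating-point Division
Y ÷ X = Z
[Figure: flowchart for floating-point division, annotated with a worked example]
Flowchart steps:
1. DIVIDE: if the dividend = 0, Z ← 0 and RETURN; if the divisor = 0, Z ← ∞ and RETURN
2. Subtract exponents
3. Add bias
4. Exponent overflow? If yes, report overflow. Exponent underflow? If yes, report underflow
5. Divide significands
6. Normalize
7. Round and RETURN

Worked example:
X = 3.75 (decimal) = 11.11 (binary) = 1.111 x 2^1
Y = 95.625 (decimal) = 101 1111.101 (binary) = 1.011111101 x 2^6
E1 = 127 + 1 = 128
E2 = 127 + 6 = 133
Subtract exponents: E2 – E1 = 5
Add bias: ET = 127 + 5 = 132
Divide significands: 1.011111101 ÷ 1.111 = 0.110011
Normalize: 0.110011 x 2^5 = 1.10011 x 2^4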
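
Division follows the same pattern with the exponent steps reversed. As before, this is only a sketch: `fp_divide` and the pair layout are assumed names, operands are positive and normalized, and quotient bits beyond the fraction width are truncated.

```python
BIAS = 127
P = 23  # fraction bits in single precision


def fp_divide(y, x):
    """Compute y / x for positive normalized (biased exponent, significand) pairs.

    Each significand is an integer in [2**P, 2**(P + 1)), i.e. 1.F scaled by 2**P.
    """
    (ey, my), (ex, mx) = y, x
    e = ey - ex + BIAS             # subtract exponents, add the bias back
    q = (my << P) // mx            # quotient scaled by 2**P, between 0.5 and 2
    if q < 2 ** P:                 # quotient below 1: normalize with one more bit
        q = (my << (P + 1)) // mx
        e -= 1
    if e >= 255:
        raise ArithmeticError("exponent overflow")
    if e <= 0:
        raise ArithmeticError("exponent underflow")
    return e, q


x = (128, 0b1111 << (P - 3))           # 1.111 x 2^1 = 3.75
y = (133, 0b1011111101 << (P - 9))     # 1.011111101 x 2^6 = 95.625
e, q = fp_divide(y, x)
print(e, q / 2 ** P)                   # 131 1.59375
print(q / 2 ** P * 2.0 ** (e - BIAS))  # 25.5
```

The biased exponent 131 corresponds to the normalized result 1.10011 x 2^4 in the worked example.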
PROBLEM (1)

• Express the number -(640.5) (decimal) in IEEE 32-bit and 64-bit floating point format

SOLUTION (1)….
• IEEE 32 BIT FLOATING POINT FORMAT

[Figure: field layout: sign (MSB, 1 bit) | Biased Exponent (8 bits) | Mantissa/Significand, normalized (23 bits)]
Step 1: Express the given number in binary form
(640.5) = 1010000000.1 x 2^0
Step 2: Normalize the number into the form 1.bbbbbbb
1010000000.1 x 2^0 = 1.0100000001 x 2^9
Once normalized, every number has 1 as its leftmost bit, so the IEEE format does not store this bit. Therefore the significand to be stored is 0100 0000 0100 0000 0000 000 in the allotted 23 bits
SOLUTION (1)…….
• Step 3: For the 8 bit biased exponent field, the bias used is
2^(k-1) - 1 = 2^(8-1) - 1 = 127
Add the bias 127 to the exponent 9 and convert the sum to binary to store it in the 8-bit biased exponent field:
127 + 9 = 136 (1000 1000)
• Step 4: Since the given number is negative, put MSB as 1
• Step 5: Pack the result into proper format (IEEE 32 bit)

1 1000 1000 0100 0000 0100 0000 0000 000

IEEE-754 Conversion Example
Represent -12.625 (decimal) in single precision IEEE-754 format.
• Step #1: Convert to target base. -12.625 (decimal) = -1100.101 (binary)
• Step #2: Normalize. -1100.101 = -1.100101 x 2^3
• Step #3: Fill in bit fields. Sign is negative, so sign bit is 1. Exponent is in excess 127 (not excess 128!), so exponent is represented as the unsigned integer 3 + 127 = 130. Leading 1 of significand is hidden, so final bit pattern is:
1 1000 0010 . 1001 0100 0000 0000 0000 000
SOLUTION (1)…...
• IEEE 64 BIT FLOATING POINT FORMAT

[Figure: field layout: sign (MSB, 1 bit) | Biased Exponent (11 bits) | Mantissa/Significand, normalized (52 bits)]
Step 1: Express the given number in binary form
(640.5) = 1010000000.1 x 2^0
Step 2: Normalize the number into the form 1.bbbbbbb
1010000000.1 x 2^0 = 1.0100000001 x 2^9
Once normalized, every number has 1 as its leftmost bit, so the IEEE format does not store this bit. Therefore the significand to be stored is 0100 0000 0100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 in the allotted 52 bits

SOLUTION (1)…
• Step 3: For the 11 bit biased exponent field, the bias used is
2^(k-1) - 1 = 2^(11-1) - 1 = 1023
Add the bias 1023 to the exponent 9 and convert the sum to binary to store it in the 11-bit biased exponent field:
1023 + 9 = 1032 (1000 0001 000)
• Step 4: Since the given number is negative, put MSB as 1
• Step 5: Pack the result into proper format (IEEE 64 bit)

1 1000 0001 000 0100 0000 0100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
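
Both packed results for -640.5 can be confirmed with Python's standard `struct` module, which exposes the 32-bit (`f`) and 64-bit (`d`) encodings. The variable names are illustrative only.

```python
import struct

x = -640.5
(single,) = struct.unpack(">I", struct.pack(">f", x))  # 32-bit pattern as an integer
(double,) = struct.unpack(">Q", struct.pack(">d", x))  # 64-bit pattern as an integer

s = f"{single:032b}"
d = f"{double:064b}"
print(s[0], s[1:9], s[9:])    # 1 10001000 01000000010000000000000
print(d[0], d[1:12], d[12:])  # sign 1, exponent 10000001000, fraction 0100000001 then 42 zeros
print(hex(single), hex(double))  # 0xc4202000 0xc084040000000000
```

The fraction field begins with the same ten bits, 0100000001, in both formats; only the exponent width, the bias, and the number of trailing zeros differ.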
Character Representation ASCII
ASCII (American Standard Code for Information Interchange) Code

Columns: MSB (3 bits); rows: LSB (4 bits)

LSB   0    1    2    3    4    5    6    7
0     NUL  DLE  SP   0    @    P    `    p
1     SOH  DC1  !    1    A    Q    a    q
2     STX  DC2  "    2    B    R    b    r
3     ETX  DC3  #    3    C    S    c    s
4     EOT  DC4  $    4    D    T    d    t
5     ENQ  NAK  %    5    E    U    e    u
6     ACK  SYN  &    6    F    V    f    v
7     BEL  ETB  '    7    G    W    g    w
8     BS   CAN  (    8    H    X    h    x
9     HT   EM   )    9    I    Y    i    y
A     LF   SUB  *    :    J    Z    j    z
B     VT   ESC  +    ;    K    [    k    {
C     FF   FS   ,    <    L    \    l    |
D     CR   GS   -    =    M    ]    m    }
E     SO   RS   .    >    N    ^    n    ~
F     SI   US   /    ?    O    _    o    DEL
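
The row and column of a character in the ASCII table follow directly from its 7-bit code. The sketch below uses Python's built-in `ord` and `chr`; the helper name `ascii_position` is an assumption for illustration.

```python
def ascii_position(ch: str) -> tuple[int, str]:
    """Return (MSB column, LSB row) of a 7-bit ASCII character in the ASCII code table."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    msb = code >> 4        # upper 3 bits select the column
    lsb = code & 0x0F      # lower 4 bits select the row
    return msb, f"{lsb:X}"


print(ascii_position("A"))   # (4, '1')
print(ascii_position("z"))   # (7, 'A')
print(chr((5 << 4) | 0xB))   # [
```

Upper and lowercase letters share the same row and sit two columns apart, so they differ in a single bit (0x20).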
Control Character Representation
(ASCII)
NUL Null DC1 Device Control 1
SOH Start of Heading (CC) DC2 Device Control 2
STX Start of Text (CC) DC3 Device Control 3
ETX End of Text (CC) DC4 Device Control 4
EOT End of Transmission (CC) NAK Negative Acknowledge (CC)
ENQ Enquiry (CC) SYN Synchronous Idle (CC)
ACK Acknowledge (CC) ETB End of Transmission Block (CC)
BEL Bell CAN Cancel
BS Backspace (FE) EM End of Medium
HT Horizontal Tab. (FE) SUB Substitute
LF Line Feed (FE) ESC Escape
VT Vertical Tab. (FE) FS File Separator (IS)
FF Form Feed (FE) GS Group Separator (IS)
CR Carriage Return (FE) RS Record Separator (IS)
SO Shift Out US Unit Separator (IS)
SI Shift In DEL Delete
DLE Data Link Escape (CC)
(CC) Communication Control
(FE) Format Effector
(IS) Information Separator
[Figure: The EBCDIC character code, shown with hexadecimal indices]
[Figure: The EBCDIC control character representation]
EBCDIC

• EBCDIC (Extended Binary Coded Decimal Interchange Code) is an 8-bit character code for IBM mainframes, mid-range computers, and peripheral devices; it extends the original 6-bit BCD format.
• Although EBCDIC is still used today, more modern encoding forms, such as ASCII and Unicode, exist. IBM mainframe and mid-range systems use EBCDIC as their default encoding format, but most also include support for modern formats, allowing them to take advantage of newer features that EBCDIC does not provide.

• Applications
• EBCDIC is used mainly on IBM machines such as mainframes, midrange computers, and peripheral devices. Since most IBM machines include extensive processing capabilities and some support for modern encodings, they remain competitive with devices from other brands. However, most other machines and operating systems depend on ASCII and Unicode as their default encoding format.

EBCDIC

• Advantages and Disadvantages

• EBCDIC is advantageous because it is an 8-bit character code rather than the old 6-bit character code used with punch card encoding systems.
• This allows EBCDIC to provide IBM machines with support for a wide variety of functions that punch card encoding systems did not provide

ASCII

• ASCII stands for American Standard Code for Information Interchange. It is the standard binary code used to represent alphanumeric characters.
• Alphanumeric characters are used for the transfer of information between the computer and its I/O devices. The standard uses seven bits to code 128 characters. In an 8-bit byte, the additional leftmost bit is commonly set to 0.
• The ASCII code consists of 94 printable characters and 34 nonprinting characters used for various control operations. The printable characters are the 26 uppercase letters A through Z, the 26 lowercase letters a through z, the numerals 0 to 9, and 32 special printable characters such as % and *.
• The control characters are used to route the data and arrange the printed text into a prescribed format.
• In general, data is stored in a computer in the form of bits (1 or 0). Various coding schemes specify the bit pattern used to represent each character.
• ASCII − Stands for American Standard Code for Information Interchange. It was developed by the American Standards Association and is a widely used coding system. It represents characters using 7 bits and includes 128 characters: the upper and lowercase Latin alphabet, the numbers 0-9, and some extra characters.
• Unicode (UTF) − UTF stands for Unicode Transformation Format. Unicode is developed by The Unicode Consortium. If you want to create documents that use characters from multiple character sets, you can do so using a single Unicode character encoding. It provides 3 types of encodings.
• UTF-8 − It comes in 8-bit units (bytes); a character in UTF-8 can be from 1 to 4 bytes long, making UTF-8 variable width.
• UTF-16 − It comes in 16-bit units (shorts); a character can be 1 or 2 shorts long, making UTF-16 variable width.
• UTF-32 − It comes in 32-bit units (longs). It is a fixed-width format and is always 1 "long" in length.
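
The unit counts for the three encodings can be observed with Python's built-in `str.encode`. The sample characters are chosen for illustration; the `-le` codec variants are used so that no byte-order mark is added to the output.

```python
for ch in ["A", "é", "€", "😀"]:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-le")   # "-le" avoids the leading byte-order mark
    utf32 = ch.encode("utf-32-le")
    print(f"U+{ord(ch):04X}: UTF-8 {len(utf8)} bytes, "
          f"UTF-16 {len(utf16) // 2} units, UTF-32 {len(utf32) // 4} units")
```

The ASCII letter takes 1 byte in UTF-8, the accented letter 2, the euro sign 3, and the emoji 4; in UTF-16 only the emoji needs 2 units, and in UTF-32 every character is exactly 1 unit.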
