Module 2 - PART D Floating

Steps to divide 2.5 by 0.5 in IEEE 754 single-precision format:
1) Write both operands in normalized binary form: 2.5 = 1.01 x 2^1 and 0.5 = 1.0 x 2^(-1)
2) Subtract the exponents: 1 - (-1) = 2
3) Add the bias 127: 2 + 127 = 129 = 10000001 in binary
4) Divide the mantissas: 1.01 / 1.0 = 1.01
5) Both operands are positive, so the sign bit is 0
6) The result 1.01 x 2^2 = 5 is already normalized
The final result in binary32 format is 0 10000001 01000000000000000000000, which represents 5.

Uploaded by

Phani Thota

Module 2

Data Representation and Computer Arithmetic

Dr. A. Bhuvaneswari
Assistant Professor, SCOPE, VIT Chennai
Going Beyond Integers : Floating Point

Programming languages support real numbers, i.e., numbers with fractional parts.

These numbers are represented as floating point numbers.

Examples:
π (pi) = 3.14159…
e (Euler's number) = 2.71828…
Nanoseconds in a day = 86,400,000,000,000 ns
The last number is an integer too large to fit in a 32-bit integer.
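
The last example can be checked directly. The short sketch below is only an illustration (the names `INT32_MAX` and `ns_per_day` are made up here); it assumes a signed 32-bit integer and a Python float that is an IEEE 754 double.

```python
# Largest value of a signed 32-bit integer (assumed integer type).
INT32_MAX = 2**31 - 1

# Nanoseconds in a day: 24 h x 60 min x 60 s x 10^9 ns.
ns_per_day = 24 * 60 * 60 * 10**9

print(ns_per_day)                 # the integer itself
print(ns_per_day > INT32_MAX)     # does it overflow 32 bits?
print(ns_per_day.bit_length())    # bits needed for the magnitude

# A floating point number can still hold this particular value exactly.
print(float(ns_per_day) == ns_per_day)
```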
Distribution of Floating Point Numbers
• 3 bit mantissa
• exponent {-1,0,1}

e = -1                  e = 0                 e = 1
1.00 X 2^(-1) = 1/2     1.00 X 2^0 = 1        1.00 X 2^1 = 2
1.01 X 2^(-1) = 5/8     1.01 X 2^0 = 5/4      1.01 X 2^1 = 5/2
1.10 X 2^(-1) = 3/4     1.10 X 2^0 = 3/2      1.10 X 2^1 = 3
1.11 X 2^(-1) = 7/8     1.11 X 2^0 = 7/4      1.11 X 2^1 = 7/2

[Figure: the values above marked on a number line with ticks at 0, 1, 2, 3]
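
The twelve values of this toy format can be enumerated with a few lines of code. This is a minimal sketch, assuming the 3 bit mantissa means an implied leading 1 plus 2 fraction bits; the function name `toy_floats` is hypothetical.

```python
from fractions import Fraction

def toy_floats(fraction_bits=2, exponents=(-1, 0, 1)):
    """List every value 1.xx X 2^e of the toy format as an exact fraction."""
    values = []
    for e in exponents:
        for f in range(2 ** fraction_bits):
            significand = 1 + Fraction(f, 2 ** fraction_bits)  # 1.00, 1.01, 1.10, 1.11
            values.append(significand * Fraction(2) ** e)
    return values

values = toy_floats()
print([str(v) for v in values])

# The gap between neighbours is constant within one exponent
# and doubles when the exponent increases by 1.
gaps = [values[i + 1] - values[i] for i in range(len(values) - 1)]
print([str(g) for g in gaps])
```

The printed gaps make the uneven spacing visible: values are dense near 1/2 and sparse near 7/2.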
Floating Point
• An IEEE floating point representation consists of
– A Sign Bit (no surprise)
– An Exponent (“times 2 to the what?”)
– Mantissa (“Significand”), which is assumed to be 1.xxxxx (thus, one
bit of the mantissa is implied as 1)
– This is called a normalized representation
• So a stored mantissa of all 0s is interpreted as 1.0, and a
stored mantissa of all 1s (1111) is interpreted as 1.1111
• Special cases are used to represent denormalized
mantissas (implied leading bit = 0), NaN, etc., as will be
discussed.
Floating Point Standard
• Defined by IEEE Std 754-1985
• Developed in response to divergence of representations
– Portability issues for scientific code
• Now almost universally adopted
• Two representations
– Single precision (32-bit)
– Double precision (64-bit)
IEEE Floating-Point Format

Field:    S        Exponent    Fraction
single:   1 bit    8 bits      23 bits
double:   1 bit    11 bits     52 bits

x = (-1)^S × (1 + Fraction) × 2^(Exponent - Bias)

• S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)


• Normalize significand: 1.0 ≤ |significand| < 2.0
– Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly
(hidden bit)
– Significand is Fraction with the “1.” restored
• Exponent: excess representation: stored exponent = actual exponent + Bias
– Ensures the stored exponent is unsigned
– Single: Bias = 127; Double: Bias = 1023
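
The field layout and the value formula can be tried out on real single-precision bit patterns. The sketch below is illustrative only: the helper names `decode_binary32` and `value_from_fields` are invented here, it uses Python's standard `struct` module to obtain the bits, and it handles normalized numbers only.

```python
import struct

def decode_binary32(x):
    """Split a number, stored as single precision, into (S, Exponent, Fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # 32-bit pattern as an integer
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased
    fraction = bits & 0x7FFFFF       # 23 bits, hidden 1 not stored
    return sign, exponent, fraction

def value_from_fields(sign, exponent, fraction, bias=127):
    """Apply (-1)^S x (1 + Fraction) x 2^(Exponent - Bias); normalized numbers only."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - bias)

s, e, f = decode_binary32(-0.75)
print(s, format(e, "08b"), format(f, "023b"))
print(value_from_fields(s, e, f))
```

For -0.75 = -1.1 X 2^(-1) the stored exponent is -1 + 127 = 126 and the fraction starts with a single 1 bit.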
Single-Precision Range
• Exponents 00000000 and 11111111 reserved
• Smallest value
– Exponent: 00000001
⇒ actual exponent = 1 - 127 = -126
– Fraction: 000…00 ⇒ significand = 1.0
– ±1.0 × 2^(-126) ≈ ±1.2 × 10^(-38)
• Largest value
– Exponent: 11111110
⇒ actual exponent = 254 - 127 = +127
– Fraction: 111…11 ⇒ significand ≈ 2.0
– ±2.0 × 2^(+127) ≈ ±3.4 × 10^(+38)
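
These two limits can be reproduced by building the bit patterns by hand. This is a small sketch, assuming the standard `struct` module is used to reinterpret a 32-bit integer as a single-precision value; `from_bits` is a hypothetical helper name.

```python
import struct

def from_bits(bits):
    """Reinterpret a 32-bit integer pattern as a single-precision number."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# sign 0, exponent 00000001, fraction all zeros
smallest = from_bits(0x00800000)
# sign 0, exponent 11111110, fraction all ones
largest = from_bits(0x7F7FFFFF)

print(smallest, smallest == 2.0 ** -126)
print(largest, largest == (2 - 2.0 ** -23) * 2.0 ** 127)
```

The exact largest significand is 2 - 2^(-23), which the slide rounds to 2.0.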
Double-Precision Range
• Exponents 0000…00 and 1111…11 reserved
• Smallest value
– Exponent: 00000000001
⇒ actual exponent = 1 - 1023 = -1022
– Fraction: 000…00 ⇒ significand = 1.0
– ±1.0 × 2^(-1022) ≈ ±2.2 × 10^(-308)
• Largest value
– Exponent: 11111111110
⇒ actual exponent = 2046 - 1023 = +1023
– Fraction: 111…11 ⇒ significand ≈ 2.0
– ±2.0 × 2^(+1023) ≈ ±1.8 × 10^(+308)
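
The double-precision limits can be compared against what a language runtime reports. The sketch below assumes that a Python float is an IEEE 754 double (true on common platforms, but an assumption nonetheless) and reads the limits from the standard `sys.float_info`.

```python
import sys

# Smallest positive normalized double and largest finite double.
smallest = sys.float_info.min
largest = sys.float_info.max

print(smallest, smallest == 2.0 ** -1022)
print(largest, largest == (2 - 2.0 ** -52) * 2.0 ** 1023)
```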
Representation of Floating Point Numbers
• IEEE 754 single precision

Bit 31    Bits 30-23         Bits 22-0
Sign      Biased exponent    Normalized mantissa (implicit 24th bit = 1)

Exponent   Mantissa    Object represented
0          0           0
0          non-zero    denormalized
1-254      anything    FP number: (-1)^S × F × 2^(E-127)
255        0           ± infinity
255        non-zero    NaN
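
The table translates almost line by line into a classifier over the exponent and mantissa fields. The following is a minimal sketch for single precision; the function name `classify_binary32` and the returned labels are chosen here for illustration.

```python
def classify_binary32(bits):
    """Classify a 32-bit pattern using its exponent and mantissa fields."""
    exponent = (bits >> 23) & 0xFF   # bits 30-23
    mantissa = bits & 0x7FFFFF       # bits 22-0
    if exponent == 0:
        return "zero" if mantissa == 0 else "denormalized"
    if exponent == 255:
        return "infinity" if mantissa == 0 else "NaN"
    return "normalized"              # exponent in 1-254

for pattern in (0x00000000, 0x00000001, 0x3F800000, 0x7F800000, 0x7FC00000):
    print(format(pattern, "032b"), classify_binary32(pattern))
```

The sign bit is ignored, so +0 and -0 are both reported as zero, and both infinities as infinity.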
Why biased exponent?

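• For faster comparisons (for sorting, etc.), allow integer comparisons of floating point numbers.

• Unbiased (two's complement) exponent: the bit pattern of 1/2 is the larger integer, so an integer comparison would wrongly order 1/2 above 2.

Value   Sign   Exponent    Fraction
1/2     0      1111 1111   000 0000 0000 0000 0000 0000
2       0      0000 0001   000 0000 0000 0000 0000 0000

• Biased exponent: the bit pattern of 2 is the larger integer, matching the numeric order.

Value   Sign   Exponent    Fraction
1/2     0      0111 1110   000 0000 0000 0000 0000 0000
2       0      1000 0000   000 0000 0000 0000 0000 0000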
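
The ordering property can be observed on actual single-precision bit patterns. This sketch is restricted to non-negative numbers, for which plain unsigned integer comparison is enough; `bits_of` is a hypothetical helper built on the standard `struct` module.

```python
import struct

def bits_of(x):
    """Return the 32-bit single-precision pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

samples = [2.0, 0.5, 3.5, 0.75, 1.0]

# Sorting by value and sorting by raw bit pattern give the same order.
by_value = sorted(samples)
by_bits = sorted(samples, key=bits_of)
print(by_value == by_bits)

print(format(bits_of(0.5), "032b"))
print(format(bits_of(2.0), "032b"))
```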
Floating Point Numbers : Scientific Notation
We use scientific notation to represent fractions and floating point numbers.
Scientific notation has a single digit to the left of the decimal point.

The number 0.000000001_ten can be represented as 1.0_ten x 10^(-9)

Small numbers:
Mass of a neutron = 1.67 x 10^(-27) kg
Seconds in a nanosecond = 1.0 x 10^(-9) s

Large numbers:
Mass of the Earth = 5.97 x 10^24 kg
Seconds in a century = 3.16 x 10^9 s
Nanoseconds in a day = 8.64 x 10^13 ns
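
Most languages can print a number in this notation. As a small illustration, the sketch below uses Python's standard `e` format specifier; the wrapper name `to_scientific` is made up for this example.

```python
def to_scientific(x, digits=2):
    """Format x with one digit before the decimal point and `digits` after it."""
    return f"{x:.{digits}e}"

print(to_scientific(86_400_000_000_000))   # nanoseconds in a day
print(to_scientific(0.000000001, 1))       # seconds in a nanosecond
```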
Floating Point Numbers :Normalized Scientific Notation
Floating Point Representation: IEEE 754 Standard
A floating point number is represented using three fields: sign bit, exponent
and fraction field
IEEE 754 Floating Point Standard Cntd…
Possibility : numbers(large)
IEEE 754 Floating Point Standard Cntd…
Underflow and overflow problems are reduced by using a larger exponent field.
The IEEE 754 double precision floating point format is as follows.

A double precision number occupies two 32-bit words: S is a single sign bit,
followed by an 11-bit exponent field and a 52-bit fraction field.
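
The three double-precision fields can be pulled out of a 64-bit pattern in the same way as for single precision. This is a sketch only, with the hypothetical helper name `decode_binary64`, relying on the standard `struct` module.

```python
import struct

def decode_binary64(x):
    """Split a double into (sign, 11-bit biased exponent, 52-bit fraction)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]  # 64-bit pattern
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, exponent, fraction

s, e, f = decode_binary64(-2.5)   # -2.5 = -1.01 X 2^1
print(s, format(e, "011b"), format(f, "052b"))
```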
Biased Exponent Representation

IEEE 754 : exponent -biased representation

For single precision, IEEE 754 uses a bias of 127.

So -1 is represented as -1 + 127 = 126_ten = 0111 1110_two (exponent field)

+1 is represented as 1 + 127 = 128_ten = 1000 0000_two

For double precision, IEEE 754 uses a bias of 1023.
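
The biasing rule is a one-line addition in each direction. The sketch below uses invented helper names (`encode_exponent`, `decode_exponent`) and takes the bias and field width as parameters so that both precisions can be tried.

```python
def encode_exponent(actual, bias=127, width=8):
    """Return the stored (biased) exponent as a bit string."""
    stored = actual + bias
    return format(stored, f"0{width}b")

def decode_exponent(bit_string, bias=127):
    """Recover the actual exponent from the stored bit string."""
    return int(bit_string, 2) - bias

print(encode_exponent(-1))                        # single precision
print(encode_exponent(+1))
print(encode_exponent(-1, bias=1023, width=11))   # double precision
print(decode_exponent("10000001"))
```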


Exercise 1:
Exercise 2:
Exercise 3: Find the decimal value for the given IEEE
754 binary representation
Floating Point Addition
Question: Add 0.5_ten and -0.4375_ten in binary.
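
One way to work through this question is to follow the usual addition steps: normalize both operands, align the smaller exponent to the larger one, add the signed significands, and renormalize. The sketch below does this with exact fractions, so it shows no rounding step; the function names `normalize` and `fp_add` are hypothetical and the operands must be non-zero.

```python
from fractions import Fraction

def normalize(value):
    """Return (sign, significand, exponent) with 1 <= significand < 2."""
    sign = -1 if value < 0 else 1
    magnitude = abs(Fraction(value))
    exponent = 0
    while magnitude >= 2:
        magnitude /= 2
        exponent += 1
    while magnitude < 1:
        magnitude *= 2
        exponent -= 1
    return sign, magnitude, exponent

def fp_add(a, b):
    sa, ma, ea = normalize(a)
    sb, mb, eb = normalize(b)
    e = max(ea, eb)                               # common exponent
    ma_aligned = sa * ma / Fraction(2) ** (e - ea)  # shift right if needed
    mb_aligned = sb * mb / Fraction(2) ** (e - eb)
    total = ma_aligned + mb_aligned               # add signed significands
    if total == 0:
        return Fraction(0)
    s, m, shift = normalize(total)                # renormalize the sum
    return s * m * Fraction(2) ** (e + shift)

print(normalize(Fraction(1, 2)))      # 0.5     =  1.000 x 2^(-1)
print(normalize(Fraction(-7, 16)))    # -0.4375 = -1.110 x 2^(-2)
print(fp_add(Fraction(1, 2), Fraction(-7, 16)))
```

After alignment the operands are 1.000 x 2^(-1) and -0.111 x 2^(-1); their sum 0.001 x 2^(-1) renormalizes to 1.000 x 2^(-4) = 0.0625.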
Floating Point Multiplication
Floating point arithmetic: DIV rule
• Subtract the exponents.
• Add the bias.
• Divide the mantissas and determine the sign of the result.
• Normalize the result if necessary.
• Truncate/round the mantissa of the result.
Note: Multiplication and division do not require alignment of the
mantissas the way addition and subtraction do.
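
The rule can be sketched on single-precision bit fields as follows. This is an illustration under stated assumptions, not a complete divider: the names `fields` and `fp_div` are invented, the quotient mantissa is truncated rather than rounded, and zero, infinity, NaN, denormalized operands, and exponent overflow or underflow are not handled.

```python
import struct

BIAS = 127

def fields(x):
    """Return (sign, biased exponent, fraction) of x as single precision."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def fp_div(a, b):
    sa, ea, fa = fields(a)
    sb, eb, fb = fields(b)
    sign = sa ^ sb                        # signs differ -> negative result
    exponent = (ea - eb) + BIAS           # subtract exponents, add the bias
    sig_a = (1 << 23) | fa                # restore the hidden 1
    sig_b = (1 << 23) | fb
    quotient = (sig_a << 23) // sig_b     # mantissa division, truncated
    if quotient < (1 << 23):              # quotient below 1.0: normalize
        quotient = (sig_a << 24) // sig_b
        exponent -= 1
    bits = (sign << 31) | (exponent << 23) | (quotient & 0x7FFFFF)
    return bits, struct.unpack(">f", struct.pack(">I", bits))[0]

bits, value = fp_div(2.5, 0.5)
print(format(bits, "032b"), value)
```

For 2.5 / 0.5 the stored exponents are 128 and 126, so the result exponent is 2 + 127 = 129, and the mantissa 1.01 / 1.0 = 1.01 needs no normalization.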
