0% found this document useful (0 votes)
8 views

4.4_1 New Floating Point.pptx

Uploaded by

pes2ug23cs007
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views

4.4_1 New Floating Point.pptx

Uploaded by

pes2ug23cs007
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 22

DIGITAL DESIGN & COMPUTER

ORGANISATION
Floating Point

Sudarshan T S B., Ph.D.


Department of Computer Science
& Engineering
DIGITAL DESIGN & COMPUTER ORGANISATION

Floating Point

Sudarshan T S B., Ph.D.


Department of Computer Science & Engineering
FLOATING POINT
Course Outline

🔵 Digital Design
► Combinational logic design
► Sequential logic design Concepts covered
★ Floating Point
🔵 Floating Point Representation
🔵 Computer Organisation
► Architecture (microprocessor
instruction set)
► Microarchitecture (microprocessor
operation)
FLOATING POINT
Not Just Integers

🔵 Real numbers can be represented using:


► Fixed point
► Floating point

🔵 Fixed point notation is where the decimal point is fixed and numbers to the right
of decimal point are the fraction portion and to the left is the integer portion.
► Limited by the digits used
► Not suitable to represent very small are very large numbers

🔵 Programming languages support fraction called floating point numbers


► Example: 3.14159265… (𝜋); 2.71828… (𝑒)
► Data type used float , double
Number Systems
Fixed–Point Number
Systems
❖ Signed fixed–point numbers can use either two’s complement or
sign/magnitude notation.
❖ Figure shows the fixed–point representation of −2.375 using both
notations with four integer and four fraction bits.
FLOATING POINT
Fixed Point Example

🔵 Represent 6.75 using 4 integer bits and 4 fraction bits:


► 6 => 0110 (22+21)
► 0.75 => 0.1100 (2-1 + 2-2)
► 6.75 => 0110.1100

► Here binary point is implied and the number of bits used is decided before
hand
► Fixed point
► Floating point

🔵 Represent -7.5 using 4 integer and 4 fraction bits


► +7.5 => 0111.1000
► 2’s complement -7.5 => 1000.1000
FLOATING POINT
Fixed Point Example

🔵 Perform the following operation: 7.5 – 0.625 => 7.5 + (-0.625)


► 7.5 => 0111.1000
► -0.625 => 111.0110 (2’scomplement)
► 0111.1000 + 1111.0110 = 0110.1110 (6.875)

🔵 The range and accuracy is very limited.


► Ex: 8.9375 + 8.3125 = 17.2495
► 8.9375 => 1000.1111
► 8.3125 => 1000.0101
► Add: 0001.0100 (1.25) which is the result of limited range and limited
accuracy

🔵 How to increase the range and improve the accuracy?


► Go for Floating Point Representation
FLOATING POINT
Not Just Integers

🔵 Floating point notation is used to represent real numbers which are from small
to large numbers

🔵 We use scientific notation to represent these numbers


► ± d.f1f2f3… x 10 ± 𝑒1e2
► ±MxB±E

🔵 This representation is to include very small numbers like 1.0 x 10-23 and very
larger numbers like 9.546 x 1012

🔵 Floating point numbers should be normalized


► Use one non-zero digit as integer
► In decimal it will be from 1 to 9
► In binary this should be 1
► Ex:
Normalised floating point: 2.234 x 103 or 1.101 x 2-4
Non-normalized floating point: 0.0234 x 105 or 110.1 x 2-6
FLOATING POINT
IEEE 754-2008 Standard

🔵 IEEE Standard defines structure of floating point number representation


🔵 Developed in response to divergence of representations and arithmetic operations
► Portability issues for scientific code
► Universally adopted
🔵 Defines four representations:
► Single Precision (32-bits)
► Double Precision (64-bits)
► Extended Double Precision 10 bytes (80-bits)
► Quadruple Precision 16 bytes(128-bits)
🔵 Real Number is represented in IEEE 754-2008 standard as three parts:
► Sign bit
► Exponent bits
► Mantissa bits or Significand bits
FLOATING POINT
IEEE 754-2008 Standard (Single Precision)

🔵 Sign bit 0 indicates positive and 1 indicates negative number


🔵 Mantissa represents fraction and signifies accuracy of the number
🔵 Exponent represents range of the numbers that shall be represented
🔵 General form: ± 1.Mantissa x 2 Exponent
🔵 For Single precision (32-bits) representation:
► Biased Exponent is 8-bits
► Mantissa is 23 bits
FLOATING POINT
IEEE 754-2008 Standard (Double Precision)

🔵 Sign bit 0 indicates positive and 1 indicates negative number


🔵 Mantissa represents fraction and signifies accuracy of the number
🔵 Exponent represents range of the numbers that shall be represented
🔵 General form: ± 1.Mantissa x 2 Exponent
🔵 For Double precision (64-bits) representation:
► Biased Exponent is 11-bits
► Mantissa is 52 bits
FLOATING POINT
Biased Exponent

🔵 Biased Exponent, BE = Bias + Exponent


🔵 Actual Exponent, E = Biased Exponent – Bias
🔵 Recall for Single precision Biased Exponent is 8-bits (Range: 0 to 255)
🔵 BE = 0 and BE = 255 are reserved for special use
🔵 So, BE = 1 to 254 are used for normalized floating point numbers.
🔵 Bias = 127 => (2n-1 – 1)
🔵 Therefore Range of Actual exponent that could be represented is:
► Min = 1 – 127 = -126

► Max = 254 – 127 = 127

► So range is from -126 to +127

🔵 FP Representation:
s (BE-Bias)
Bias = 127 for SP
N = (-1) * (1+.M)*2 Bias = 1023 for DP
FLOATING POINT
FP Example

What is the value of the following number:


0 100 0011 0 110 0100 0000 0000 0000 0000

In Hexadecimal this is represented as 0x43640000

Solution:
Sign = 0; Positive number
Biased Exponent = (1000 0110)2 = 134;
Actual Exponent = 134 – 127 = 7
Mantissa = (1. 1100 10…000)2 = 1.78125 (1. is implicit)

So, the value of the decimal = 1.1100100 x 27 = 11100100 = 228


FLOATING POINT
FP Example

What is the value of the following number:


1 011 1110 0 010 0000 0000 0000 0000 0000

In Hexadecimal this is represented as 0xBE200000

Solution:
Sign = 1;
Biased Exponent = (0111 1100)2 = 124;
Actual Exponent = 124 – 127 = -3
Mantissa = (1. 0100 00…000)2 = 1.25 (1. is implicit)

So, the value of the decimal = -1.25 x 2-3 = - 0.15625


FLOATING POINT
FP Example

Write -58.2510 in Single Precision Floating Point (IEEE 754)

1. Convert decimal to binary:


58.2510 = 111010.012
2. Write in normalized scientific notation:
1.1101001 × 25 (1. is implicit)
3. Fill in fields:
Sign bit: 1 (negative)
8 exponent bits: (127 + 5) = 132 = 100001002
23 fraction bits: 110 1001 0000 0000 0000 0000

1 100 0010 0 110 1001 0000 0000 0000 0000

In Hexadecimal this is represented as 0xC2690000


FLOATING POINT
FP Example

Write -58.2510 in Double Precision Floating Point (IEEE 754)

1. Convert decimal to binary:


58.2510 = 111010.012
2. Write in normalized scientific notation:
1.1101001 × 25 (1. is implicit)
3. Fill in fields:
Sign bit: 1 (negative)
11 exponent bits: (1023 + 5) = 1028 = 100 0000 01002
52 fraction bits: 1101 0010 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

1 100 0000 0100 1101 0010 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

In Hexadecimal this is represented as 0xC04D200000000000


FLOATING POINT
Smallest and Largest Normalised FP value

Single Precision FP:

Exponents 00000000 and 11111111 are reserved

Smallest value
Bias Exponent: 00000001
⇒ Actual Exponent = 1 – 127 = –126
Fraction: 000…00 ⇒ significand = 1.0
±1.0 × 2–126 ≈ ±1.17549… × 10–38

Largest value
Biased Exponent: 11111110
⇒ Actual Exponent = 254 – 127 = +127
Fraction: 111…11 ⇒ significand ≈ 2.0
±2.0 × 2+127 = 2-128 ≈ ±3.4028… × 10+38
FLOATING POINT
Special Cases

Number Sign Exponent Fraction


0 X 00000000 00000000000000000000000
∞ 0 11111111 00000000000000000000000
-∞ 1 11111111 00000000000000000000000
NaN X 11111111 non-zero

*NaN is Not a Number


Ex: ÷ by zero, √-ve no.
FLOATING POINT
Special Cases

Source: Computer Organisation & Design by


Patterson & Hennessy, Morgan Kaufmann
FLOATING POINT
Rounding Modes

🔵 Overflow: number too large to be represented


🔵 Underflow: number too small to be represented
🔵 Rounding modes:
► Down
► Up
► Toward zero
► To nearest
🔵 Example: round 1.100101 (1.578125) to only 3 fraction bits
► Down: 1.100
► Up: 1.101
► Toward zero: 1.100
► To nearest: 1.101 (1.625 is closer to 1.578125 than 1.5 is)
FLOATING POINT
Think about it

🔵 What are the largest normalized double precision FP numbers?


► Hint: double precision exponent is 11 bits and mantissa is 52
bits

🔵 What is the relative precision in terms of decimal fractional digits


that single precision and double precision offer?
► Hint: mantissa bits

🔵 An example to represent denormalized valid floating point


number?
► Hint: Biased Exponent = 0 & Mantissa = Nonzero
THANK YOU

Sudarshan T S B. Ph.D.,
Department of Computer Science & Engineering
[email protected]
+91 80 6666 3333 Extn 215

You might also like