
Floating Point Formats

This document discusses floating point number formats and arithmetic. It begins by explaining scientific notation and the components of floating point representation, including sign, significand, base, and exponent. It then discusses properties like the scaling of gaps between numbers, relative resolution given by machine epsilon, and the existence of a representable number close to any real number. Special quantities like infinity and NaN are also introduced. The rest of the document focuses on details of the IEEE single and double precision floating point formats, including exponent and significand sizes, valid number ranges, and example calculations. It concludes by noting properties of floating point arithmetic operations like rounding errors bounded by machine epsilon.

Uploaded by

VIJAYPUTRA
Copyright
© Attribution Non-Commercial (BY-NC)

Lecture 8 - Floating Point Arithmetic, The IEEE Standard

MIT 18.335J / 6.337J Introduction to Numerical Methods

Per-Olof Persson
October 3, 2006

Floating Point Formats

Scientific notation: $-1.602 \times 10^{-19}$, with sign $-$, significand $1.602$, base $10$, and exponent $-19$

Floating point representation with base $\beta$ and precision $p$:

$$\pm \left( d_0 + d_1 \beta^{-1} + \cdots + d_{p-1} \beta^{-(p-1)} \right) \beta^{e}, \qquad 0 \le d_i < \beta$$

Exponent range $[e_{\min}, e_{\max}]$
Normalized if $d_0 \ne 0$ (use $e = e_{\min} - 1$ to represent 0)
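The representation above can be evaluated directly from its digits. The following is a minimal sketch in Python, using exact rational arithmetic so that no rounding hides the structure; the helper names `fp_value` and `is_normalized` are illustrative and not part of the lecture.

```python
from fractions import Fraction

def fp_value(sign, digits, beta, e):
    """Exact value of sign * (d0 + d1*beta^-1 + ... + d_{p-1}*beta^-(p-1)) * beta^e."""
    if any(not 0 <= d < beta for d in digits):
        raise ValueError("each digit must satisfy 0 <= d_i < beta")
    # Digit d_i carries weight beta^-i; the precision p is len(digits).
    significand = sum(Fraction(d, beta**i) for i, d in enumerate(digits))
    return sign * significand * Fraction(beta) ** e

def is_normalized(digits):
    """A representation is normalized when its leading digit d0 is nonzero."""
    return digits[0] != 0

# The scientific-notation example: -1.602 x 10^-19 (beta = 10, p = 4).
charge = fp_value(-1, [1, 6, 0, 2], 10, -19)

# A binary example: +1.101 (base 2) x 2^2 (beta = 2, p = 4).
six_and_a_half = fp_value(+1, [1, 1, 0, 1], 2, 2)

print(float(charge), float(six_and_a_half))
```

Keeping the digits as a list makes the precision $p$ explicit, and the same function covers decimal and binary formats.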



Floating Point Numbers


The gaps between adjacent numbers scale with the size of the numbers
Relative resolution given by machine epsilon, $\epsilon_{\text{machine}} = 0.5\,\beta^{1-p}$
For all $x$, there exists a floating point $x'$ such that $|x - x'| \le \epsilon_{\text{machine}} |x|$
Example: $\beta = 2$, $p = 3$, $e_{\min} = -1$, $e_{\max} = 2$
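The example system is small enough to enumerate. The sketch below lists its positive normalized numbers, computes the gaps between neighbours, and checks the relative-error bound on a grid of sample points. The function names and the sampling grid are assumptions made for illustration.

```python
from fractions import Fraction

def toy_floats(beta=2, p=3, e_min=-1, e_max=2):
    """All positive normalized numbers d0.d1...d_{p-1} * beta^e with d0 != 0."""
    values = set()
    for e in range(e_min, e_max + 1):
        # Integer significands m with beta^(p-1) <= m < beta^p have d0 != 0.
        for m in range(beta ** (p - 1), beta ** p):
            values.add(Fraction(m, beta ** (p - 1)) * Fraction(beta) ** e)
    return sorted(values)

def nearest(x, numbers):
    """Closest element of the toy system to x."""
    return min(numbers, key=lambda v: abs(v - x))

numbers = toy_floats()
gaps = [b - a for a, b in zip(numbers, numbers[1:])]
eps_machine = Fraction(1, 2) * Fraction(2) ** (1 - 3)  # 0.5 * beta^(1 - p)

# Worst relative error over sample points between the smallest and largest number.
samples = [Fraction(k, 100) for k in range(50, 701)]
worst = max(abs(nearest(x, numbers) - x) / x for x in samples)

print([float(v) for v in numbers])
print([float(g) for g in gaps])
print(float(worst), float(eps_machine))
```

The system has 16 positive normalized numbers between 0.5 and 7. The gap doubles each time the exponent increases (1/8, 1/4, 1/2, 1), while the relative error of the nearest number stays below $\epsilon_{\text{machine}} = 1/8$.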

Special Quantities
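$\pm\infty$ is returned when an operation overflows
$x / \pm\infty = 0$ for any finite number $x$, $x / 0 = \pm\infty$ for any nonzero number $x$
Operations with infinity are defined as limits, e.g. $4 - \infty = \lim_{x \to \infty} (4 - x) = -\infty$
NaN (Not a Number) is returned when an operation has no well-defined finite or infinite result
Examples: $\infty - \infty$, $\infty / \infty$, $0/0$, $\sqrt{-1}$, $\text{NaN} \circledast x$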
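Most of these rules can be observed with Python floats, which are assumed here to be IEEE doubles. Python does not follow the IEEE default everywhere: float division by zero raises `ZeroDivisionError`, and `math.sqrt(-1.0)` raises `ValueError`, instead of returning an infinity or NaN. The sketch below therefore covers only the cases Python evaluates silently.

```python
import math

inf = math.inf
nan = math.nan

overflow = 1e308 * 10        # exceeds the largest double, so the product is +inf
zero_from_inf = 5.0 / inf    # finite x divided by infinity gives 0
limit_example = 4 - inf      # defined as a limit: -inf

# Operations with no well-defined finite or infinite result give NaN.
nan_examples = [inf - inf, inf / inf, nan * 3.0]

# NaN compares unequal to everything, including itself.
nan_is_unequal_to_itself = nan != nan

# Python-specific behaviour: x / 0.0 raises instead of returning +-inf.
try:
    1.0 / 0.0
    division_raised = False
except ZeroDivisionError:
    division_raised = True

print(overflow, zero_from_inf, limit_example, nan_examples)
```

Because NaN is unequal to itself, tests for it should use `math.isnan` rather than `==`.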

Denormalized Numbers
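With normalized significand there is a gap between 0 and $\beta^{e_{\min}}$
This can result in $x - y = 0$ even though $x \ne y$, and code fragments like if $x \ne y$ then $z = 1/(x - y)$ might break
Solution: Allow non-normalized significand when the exponent is $e_{\min}$
This gradual underflow guarantees that $x = y \iff x - y = 0$
[Figure: number line marking 0, $\beta^{e_{\min}}$, $\beta^{e_{\min}+1}$, $\beta^{e_{\min}+2}$, $\beta^{e_{\min}+3}$]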
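Gradual underflow can be observed in double precision. The sketch below assumes Python floats are IEEE doubles and that the platform does not flush denormalized results to zero, which holds for standard CPython builds on mainstream hardware. The values of `x` and `y` are chosen for illustration.

```python
import sys

tiny = sys.float_info.min   # smallest positive normalized double, 2^-1022
x = 1.5 * tiny
y = tiny

# The exact difference is 2^-1023, which lies below the normalized range.
diff = x - y
is_denormalized = 0.0 < diff < tiny

# The guarded division from the text is safe because diff is not zero.
z = 1.0 / diff if x != y else None

print(diff, is_denormalized, z)
```

Without denormalized numbers, `diff` would have to round to 0 or to `tiny`, and in the first case the guarded division would fail even though `x != y`.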

IEEE Single Precision

1 sign bit, 8 exponent bits, 23 significand bits:

S (1 bit)   E (8 bits)   M (23 bits)
0           00000000     00000000000000000000000

Represented number: $(-1)^S \times 1.M \times 2^{E-127}$

Special cases:

             $E = 0$        $0 < E < 255$      $E = 255$
$M = 0$      $\pm 0$        Powers of 2        $\pm\infty$
$M \ne 0$    Denormalized   Ordinary numbers   NaN
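The formula and the special cases translate into a short decoder. The sketch below is an illustration rather than a reference implementation: `decode_single`, `reinterpret_single`, and `from_fields` are hypothetical helper names, and the standard `struct` module is used only to cross-check against the machine's own interpretation of the same 32 bits.

```python
import struct

def decode_single(bits):
    """Decode a 32-bit pattern using (-1)^S * 1.M * 2^(E-127) and the special cases."""
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    m = bits & 0x7FFFFF
    sign = -1.0 if s else 1.0
    if e == 255:
        # M = 0 encodes +-infinity, any other M encodes NaN.
        return sign * float("inf") if m == 0 else float("nan")
    if e == 0:
        # Zero or denormalized: no implicit leading 1, exponent fixed at -126.
        return sign * (m / 2**23) * 2.0**-126
    return sign * (1 + m / 2**23) * 2.0 ** (e - 127)

def reinterpret_single(bits):
    """Reinterpret the same 32 bits as a single-precision float via struct."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def from_fields(s, e, m):
    """Assemble a bit pattern from binary strings for S, E and M."""
    return (int(s, 2) << 31) | (int(e, 2) << 23) | int(m, 2)

pattern = from_fields("0", "10000001", "10100000000000000000000")
print(decode_single(pattern), reinterpret_single(pattern))
```

All intermediate values are exactly representable in double precision, so the decoder reproduces single-precision values without rounding.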



IEEE Single Precision, Examples

S   E          M                         Quantity
0   11111111   00000100000000000000000   NaN
1   11111111   00100010001001010101010   NaN
0   11111111   00000000000000000000000   $\infty$
0   10000001   10100000000000000000000   $+1 \cdot 2^{129-127} \cdot 1.101 = 6.5$
0   10000000   00000000000000000000000   $+1 \cdot 2^{128-127} \cdot 1.0 = 2$
0   00000001   00000000000000000000000   $+1 \cdot 2^{1-127} \cdot 1.0 = 2^{-126}$
0   00000000   10000000000000000000000   $+1 \cdot 2^{-126} \cdot 0.1 = 2^{-127}$
0   00000000   00000000000000000000001   $+1 \cdot 2^{-126} \cdot 2^{-23} = 2^{-149}$
0   00000000   00000000000000000000000   $0$
1   00000000   00000000000000000000000   $-0$
1   10000001   10100000000000000000000   $-1 \cdot 2^{129-127} \cdot 1.101 = -6.5$
1   11111111   00000000000000000000000   $-\infty$

The significands 1.101, 1.0 and 0.1 in the Quantity column are written in binary.

IEEE Floating Point Data Types

                              Single precision                    Double precision
Significand size (p)          24 bits                             53 bits
Exponent size                 8 bits                              11 bits
Total size                    32 bits                             64 bits
$e_{\max}$                    +127                                +1023
$e_{\min}$                    -126                                -1022
Smallest normalized           $2^{-126} \approx 10^{-38}$         $2^{-1022} \approx 10^{-308}$
Largest normalized            $2^{127} \approx 10^{38}$           $2^{1023} \approx 10^{308}$
$\epsilon_{\text{machine}}$   $2^{-24} \approx 6 \cdot 10^{-8}$   $2^{-53} \approx 10^{-16}$
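The double precision column can be compared with what a running interpreter reports. The sketch below assumes Python's `float` is an IEEE double, as it is on mainstream platforms. Two conventions differ from the table and are adjusted in the code: `sys.float_info` follows the C convention in which both exponent limits are larger by one, and its `epsilon` is the gap between 1 and the next float, which is twice $\epsilon_{\text{machine}}$ as defined in this lecture.

```python
import sys

info = sys.float_info

p = info.mant_dig             # significand size, counting the implicit leading bit
e_max = info.max_exp - 1      # shift from the C convention to the table's e_max
e_min = info.min_exp - 1      # shift from the C convention to the table's e_min

smallest_normalized = info.min
largest_finite = info.max     # just below 2^(e_max + 1)

# Lecture definition: eps_machine = 0.5 * beta^(1 - p), with beta = 2.
eps_machine = 0.5 * 2.0 ** (1 - p)

print(p, e_max, e_min)
print(smallest_normalized, largest_finite, eps_machine)
```

The largest finite double is $(2 - 2^{-52}) \cdot 2^{1023}$, which agrees with the table's order-of-magnitude entry of about $10^{308}$.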

Floating Point Arithmetic


Define $\text{fl}(x)$ as the closest floating point approximation to $x$
By the definition of $\epsilon_{\text{machine}}$, we have for the relative error:
For all $x \in \mathbb{R}$, there exists $\epsilon$ with $|\epsilon| \le \epsilon_{\text{machine}}$ such that $\text{fl}(x) = x(1 + \epsilon)$
The result of an operation $\circledast$ using floating point numbers is $\text{fl}(a * b)$
If $\text{fl}(a * b)$ is the nearest floating point number to $a * b$, the arithmetic rounds correctly (IEEE does), which leads to the following property:
For all floating point $x, y$, there exists $\epsilon$ with $|\epsilon| \le \epsilon_{\text{machine}}$ such that $x \circledast y = (x * y)(1 + \epsilon)$
Round to nearest even in the case of ties
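These bounds can be checked exactly with rational arithmetic. The sketch below assumes Python floats are IEEE doubles with round-to-nearest-even, so $\epsilon_{\text{machine}} = 2^{-53}$; the operands 0.1 and 0.2 and the helper `rel_error` are chosen for illustration.

```python
from fractions import Fraction

EPS_MACHINE = Fraction(1, 2**53)

def rel_error(exact, computed):
    """Exact relative error |computed - exact| / |exact| as a rational number."""
    return abs(Fraction(computed) - exact) / abs(exact)

# fl(x) = x(1 + eps): 1/10 has no finite binary expansion, so fl(1/10) != 1/10.
err_fl = rel_error(Fraction(1, 10), 0.1)

# x (*) y = (x * y)(1 + eps) for floating point operands x and y.
x, y = 0.1, 0.2
err_add = rel_error(Fraction(x) + Fraction(y), x + y)
err_mul = rel_error(Fraction(x) * Fraction(y), x * y)
err_div = rel_error(Fraction(x) / Fraction(y), x / y)

# Ties go to the neighbour whose last significand bit is even.
tie_rounds_down = 1.0 + 2.0**-53 == 1.0
tie_rounds_up = (1.0 + 2.0**-52) + 2.0**-53 == 1.0 + 2.0**-51

print(float(err_fl), float(err_add), float(err_mul), float(err_div))
print(tie_rounds_down, tie_rounds_up)
```

`Fraction(x)` converts a float to its exact rational value, so the comparison itself introduces no rounding. The two tie cases show rounding down to 1 and rounding up to $1 + 2^{-51}$, in both cases toward the even neighbour.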



MIT OpenCourseWare
http://ocw.mit.edu

18.335J / 6.337J Introduction to Numerical Methods
Fall 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
