MA 214 Lecture 4

The document discusses floating point representation and errors that can occur in numerical computations using floating point numbers. Floating point numbers approximate real numbers using a sign, mantissa, and exponent. Computations can result in underflow or overflow errors due to memory limitations, and rounding or chopping errors due to the finite precision of mantissa representation.

Uploaded by

Harsh Shah


Introduction to Numerical Analysis

(Arithmetic Errors: Floating-point Approximation)

MA 214, Spring 2023-24.

Spring 2022-23 1 / 38
Arithmetic Errors

Floating-Point Representation
Let $\beta \in \mathbb{N}$ and $\beta \ge 2$.
Any real number can be represented exactly in base $\beta$ as

$$(-1)^s \times (.d_1 d_2 \cdots d_n d_{n+1} \cdots)_\beta \times \beta^e,$$

called the floating-point representation, where

$d_i \in \{0, 1, \cdots, \beta - 1\}$, $d_1 \neq 0$ or $d_1 = d_2 = d_3 = \cdots = 0$,
$s = 0$ or $1$ is called the sign,
an appropriate integer $e$ called the exponent,
$$(.d_1 d_2 \cdots d_n d_{n+1} \cdots)_\beta = \frac{d_1}{\beta} + \frac{d_2}{\beta^2} + \cdots + \frac{d_n}{\beta^n} + \frac{d_{n+1}}{\beta^{n+1}} + \cdots$$
is a $\beta$-fraction called the mantissa.

Error Analysis: Floating-Point Representation (contd.)
Example:
When $\beta = 2$, the floating-point representation

$$(-1)^s \times (.d_1 d_2 \cdots d_n d_{n+1} \cdots)_2 \times 2^e$$

is called the binary floating-point representation.

When $\beta = 10$, the floating-point representation

$$(-1)^s \times (.d_1 d_2 \cdots d_n d_{n+1} \cdots)_{10} \times 10^e$$

is called the decimal floating-point representation.

Error Analysis: Floating-Point Approximation
Definition (n-Digit Floating-point Number)
Let $\beta \in \mathbb{N}$ and $\beta \ge 2$. An n-digit floating-point number in base $\beta$ is of the form

$$(-1)^s \times (.d_1 d_2 \cdots d_n)_\beta \times \beta^e$$

where
$$(.d_1 d_2 \cdots d_n)_\beta = \frac{d_1}{\beta} + \frac{d_2}{\beta^2} + \cdots + \frac{d_n}{\beta^n}$$
with $d_i \in \{0, 1, \cdots, \beta - 1\}$, $d_1 \neq 0$ or $d_2 = d_3 = \cdots = 0$, $s = 0$ or $1$, and an appropriate exponent $e$.

Error Analysis: Floating-Point Approximation (contd.)
Example: The following are examples of real numbers in the decimal floating-point representation.
The real number $x = 6.238$ can be represented as
$$6.238 = (-1)^0 \times 0.6238 \times 10^1,$$
in which case we have $s = 0$, $\beta = 10$, $e = 1$, $d_1 = 6$, $d_2 = 2$, $d_3 = 3$ and $d_4 = 8$.
The real number $x = -0.0014$ can be represented in the decimal floating-point representation as
$$x = (-1)^1 \times 0.14 \times 10^{-2}.$$
Here $s = 1$, $\beta = 10$, $e = -2$, $d_1 = 1$ and $d_2 = 4$.

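The two decompositions in this example can be reproduced mechanically. The following is a minimal sketch, assuming the input is supplied as an exact rational number through Python's `fractions.Fraction`; the helper name `float_repr` is hypothetical and is not part of the lecture.

```python
from fractions import Fraction


def float_repr(x, beta=10, n=8):
    """Return (s, digits, e) such that x is approximately
    (-1)^s * (.d1 d2 ... dn)_beta * beta^e, keeping the first n digits.

    Hypothetical helper: x is converted to an exact Fraction so that the
    digits are not disturbed by Python's own binary floating point.
    """
    x = Fraction(x)
    if x == 0:
        return 0, [0] * n, 0          # the case d1 = d2 = ... = 0
    s = 1 if x < 0 else 0
    m, e = abs(x), 0
    # Normalise the mantissa so that 1/beta <= m < 1, i.e. d1 != 0.
    while m >= 1:
        m /= beta
        e += 1
    while m < Fraction(1, beta):
        m *= beta
        e -= 1
    digits = []
    for _ in range(n):
        m *= beta
        d = int(m)                    # next base-beta digit
        digits.append(d)
        m -= d
    return s, digits, e


print(float_repr(Fraction("6.238"), beta=10, n=4))    # (0, [6, 2, 3, 8], 1)
print(float_repr(Fraction("-0.0014"), beta=10, n=2))  # (1, [1, 4], -2)
```

Passing the numbers as strings (rather than as Python floats) keeps the decimal digits exact, which is why the output agrees digit for digit with the example.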
Error Analysis: Floating-Point Approximation (contd.)
Remark:
Note that there are only a finite number of digits in the n-digit floating-point representation.
But a real number may have infinitely many digits, and therefore infinitely many digits in its mantissa.
For instance,
$$\frac{1}{3} = 0.33333\cdots = (-1)^0 \times (0.33333\cdots)_{10} \times 10^0.$$
Therefore, the representation
$$(-1)^s \times (.d_1 d_2 \cdots d_n)_\beta \times \beta^e$$
is (in general) only an approximation to a real number.

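The same effect is easy to observe on an actual machine. The sketch below assumes Python's built-in `float` is an IEEE binary double (true on common platforms) and uses the standard `fractions` and `decimal` modules; it is an illustration, not part of the lecture.

```python
from decimal import Context, Decimal
from fractions import Fraction

# In base 2 the number 1/10 has infinitely many digits, so the stored
# double is only an approximation of it.
stored = Fraction(0.1)        # exact value of the double nearest to 0.1
exact = Fraction(1, 10)
print(stored == exact)        # False
print(float(stored - exact))  # a tiny but nonzero representation error

# In base 10 the number 1/3 has infinitely many digits; with a six-digit
# mantissa only 0.333333 survives.
six_digits = Context(prec=6)
third = six_digits.divide(Decimal(1), Decimal(3))
print(third)                  # 0.333333
```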
Error Analysis: Underflow and Overflow of Memory

Any computing device can store only a limited range of exponents $e$, say,

$$m \le e \le M.$$

During the calculation,

if some computed number has an exponent $e > M$, then we say that memory overflow occurs, and
if $e < m$, we say that memory underflow occurs.

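As a concrete illustration, the exponent limits of Python's `float` can be read from `sys.float_info`. This sketch assumes IEEE binary doubles; note that such hardware also supports subnormal numbers, so underflow there is gradual rather than abrupt, which goes slightly beyond the simple model above.

```python
import math
import sys

info = sys.float_info

# Exponent limits in the (.d1 d2 ...)_2 x 2^e convention used above
# (typically m = -1021 and M = 1024 for IEEE doubles).
print("m =", info.min_exp, " M =", info.max_exp)

# Overflow: the exponent of the exact product exceeds M, so the result
# is replaced by infinity.
overflowed = info.max * 2.0
print(overflowed)

# Underflow: the exponent of the exact product is far below m, so the
# result is flushed to zero.
underflowed = info.min * info.min
print(underflowed)
```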
Error Analysis: Chopping and Rounding a Number

Definition (Precision)
The number of digits n in the mantissa as given in the definition of the
n-digit floating point representation is called the precision or length of
the floating-point number.

Error Analysis: Chopping and Rounding a Number (contd.)

Definition (Chopped Numbers)

Let $x$ be a real number given in the floating-point representation as

$$x = (-1)^s \times (.d_1 d_2 \cdots d_n d_{n+1} \cdots)_\beta \times \beta^e.$$

The chopped approximation of $x$ is given by

$$\mathrm{fl}(x) = (-1)^s \times (.d_1 d_2 \cdots d_n)_\beta \times \beta^e.$$

Error Analysis: Chopping and Rounding a Number (contd.)
Definition (Rounded Numbers)
Let $x$ be a real number given in the floating-point representation as
$$x = (-1)^s \times (.d_1 d_2 \cdots d_n d_{n+1} \cdots)_\beta \times \beta^e.$$
The rounded approximation of $x$ is given by
$$\mathrm{fl}(x) = \begin{cases} (-1)^s \times (.d_1 d_2 \cdots d_n)_\beta \times \beta^e, & 0 \le d_{n+1} < \frac{\beta}{2}, \\ (-1)^s \times (.d_1 d_2 \cdots (d_n + 1))_\beta \times \beta^e, & \frac{\beta}{2} \le d_{n+1} < \beta, \end{cases}$$
where
$$(-1)^s \times (.d_1 d_2 \cdots (d_n + 1))_\beta \times \beta^e := (-1)^s \times \Big( (.d_1 d_2 \cdots d_n)_\beta + (.\underbrace{0\,0 \cdots 0}_{(n-1)\text{ times}}\,1)_\beta \Big) \times \beta^e.$$

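The chopping and rounding rules can be written as a single routine. The following is a minimal sketch in exact rational arithmetic; the function name `fl` mirrors the notation above, while its signature (`n`, `beta`, `mode`) is an assumption made for illustration.

```python
from fractions import Fraction


def fl(x, n, beta=10, mode="round"):
    """n-digit base-beta approximation of x by chopping or rounding.

    Sketch only: works on exact Fractions and ignores the exponent
    limits m <= e <= M (no overflow or underflow handling).
    """
    x = Fraction(x)
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    m, e = abs(x), 0
    # Normalise so that 1/beta <= m < 1 (d1 != 0).
    while m >= 1:
        m /= beta
        e += 1
    while m < Fraction(1, beta):
        m *= beta
        e -= 1
    scaled = m * beta ** n                        # d1 d2 ... dn . d(n+1) ...
    kept = scaled.numerator // scaled.denominator  # the integer d1 d2 ... dn
    if mode == "round":
        next_digit = int((scaled - kept) * beta)   # d(n+1)
        if 2 * next_digit >= beta:                 # beta/2 <= d(n+1) < beta
            kept += 1                              # add (.0...01)_beta
    return sign * Fraction(kept, beta ** n) * Fraction(beta) ** e


two_thirds = Fraction(2, 3)
print(fl(two_thirds, 4, mode="chop"))   # 3333/5000, i.e. 0.6666
print(fl(two_thirds, 4, mode="round"))  # 6667/10000, i.e. 0.6667
```

A carry out of the last digit (for example rounding 0.99996 to four digits) is handled automatically, because the result is assembled as an exact fraction rather than as a digit list.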
Error Analysis: Arithmetic Using n-Digit Rounding and Chopping

Procedure for performing arithmetic operations using n-digit rounding:
Let $\odot$ denote any one of the basic arithmetic operations ‘+’, ‘−’, ‘×’ and ‘÷’.
Let $x$ and $y$ be real numbers.
The process of computing $x \odot y$ using n-digit rounding is as follows.

Step 1
Get the n-digit floating-point approximations $\mathrm{fl}(x)$ and $\mathrm{fl}(y)$ of the numbers $x$ and $y$, respectively.

Step 2
Perform the calculation $\mathrm{fl}(x) \odot \mathrm{fl}(y)$ using exact arithmetic.

Step 3
Get the n-digit floating-point approximation $\mathrm{fl}(\mathrm{fl}(x) \odot \mathrm{fl}(y))$ of $\mathrm{fl}(x) \odot \mathrm{fl}(y)$.
The result from Step 3 is the value of $x \odot y$ using n-digit rounding.

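The three steps map naturally onto Python's standard `decimal` module, whose `Context` arithmetic methods return the exact result of an operation rounded once to the context precision. The sketch below is one possible decimal (base 10) realisation; the helper name `fl_arith` and its parameters are hypothetical.

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_UP


def fl_arith(x, y, op, n, chop=False):
    """Compute x (op) y with n-digit decimal rounding (or chopping)."""
    ctx = Context(prec=n, rounding=ROUND_DOWN if chop else ROUND_HALF_UP)
    # Step 1: n-digit approximations of the operands.
    fx = ctx.create_decimal(x)
    fy = ctx.create_decimal(y)
    # Steps 2 and 3: each Context method forms the exact result of the
    # operation on fx and fy and then rounds it once to n digits.
    operations = {
        "+": ctx.add,
        "-": ctx.subtract,
        "*": ctx.multiply,
        "/": ctx.divide,
    }
    return operations[op](fx, fy)


# fl(2.71828) = 2.718 and fl(3.14159) = 3.142 under four-digit rounding.
print(fl_arith("2.71828", "3.14159", "*", 4))             # 8.540
print(fl_arith("2.71828", "3.14159", "*", 4, chop=True))  # 8.537
```

`ROUND_HALF_UP` is chosen because it matches the rounding rule defined earlier (round up in magnitude when the first discarded digit is at least 5), and `ROUND_DOWN` truncates toward zero, which is chopping.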
Error Analysis: Arithmetic Using n-Digit Rounding and Chopping (contd.)
Example: Consider the function $f(x) = x(\sqrt{x+1} - \sqrt{x})$.
Let us evaluate $f(100000)$ using six-digit rounding.
$$f(100000) = 100000\,(\sqrt{100001} - \sqrt{100000}).$$
We have
$$\sqrt{100001} \approx 0.316229347 \times 10^3 \implies \mathrm{fl}(\sqrt{100001}) = 0.316229 \times 10^3.$$

Error Analysis: Arithmetic Using n-Digit Rounding and Chopping (contd.)

Example:
Similarly, $\mathrm{fl}(\sqrt{100000}) = 0.316228 \times 10^3$. (Here we use rounding, not chopping.)
Therefore, $\mathrm{fl}(\mathrm{fl}(\sqrt{100001}) - \mathrm{fl}(\sqrt{100000})) = 0.1 \times 10^{-2}$.
Finally, we have

$$\mathrm{fl}(f(100000)) = \mathrm{fl}(100000) \times (0.1 \times 10^{-2}) = (0.1 \times 10^6) \times (0.1 \times 10^{-2}) = 100.$$

Using six-digit chopping, the value of $\mathrm{fl}(f(100000))$ is 200.

Justification:
With six-digit chopping, $\mathrm{fl}(\sqrt{100001}) = 0.316229 \times 10^3$ and $\mathrm{fl}(\sqrt{100000}) = 0.316227 \times 10^3$, so $\sqrt{100001} - \sqrt{100000} = 0.002$. Thus $100000(\sqrt{100001} - \sqrt{100000})$ equals 200 after six-digit chopping.

The true value is 158.113.

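Both hand computations can be checked with a short script. This is a sketch using Python's `decimal` module: the square roots are first computed to 50 digits as a stand-in for their exact values and then reduced to six digits, because the module's own `sqrt` does not follow the chopping mode. The function name `f_naive` is hypothetical.

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_UP

WORK = Context(prec=50)  # high precision, standing in for exact square roots


def f_naive(x, n, rounding):
    """Evaluate x * (sqrt(x + 1) - sqrt(x)) with n-digit arithmetic."""
    ctx = Context(prec=n, rounding=rounding)
    x = Decimal(x)
    fx = ctx.create_decimal(x)                          # fl(x)
    root_hi = ctx.create_decimal((x + 1).sqrt(WORK))    # fl(sqrt(x + 1))
    root_lo = ctx.create_decimal(x.sqrt(WORK))          # fl(sqrt(x))
    diff = ctx.subtract(root_hi, root_lo)               # fl of the difference
    return ctx.multiply(fx, diff)                       # fl of the product


rounded = f_naive(100000, 6, ROUND_HALF_UP)
chopped = f_naive(100000, 6, ROUND_DOWN)
print(rounded)  # 100.000
print(chopped)  # 200.000
```

Neither result is close to the true value 158.113: the subtraction leaves a single significant digit, and the multiplication by 100000 only magnifies that.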
Error Analysis: Machine Epsilon

Definition (Machine Epsilon)

The machine epsilon of a computer is the smallest positive real number $\delta$ such that $\mathrm{fl}(1 + \delta) > 1$. Thus, for any real number $0 < \hat{\delta} < \delta$, we have $\mathrm{fl}(1 + \hat{\delta}) = 1$, and $1 + \hat{\delta}$ and $1$ are identical within the computer's arithmetic.

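A common way to estimate the machine epsilon experimentally is to halve a candidate until adding it to 1 no longer changes the result. The sketch below assumes Python floats are IEEE binary doubles. It searches only over powers of two, so it returns the spacing between 1 and the next larger double (the convention used by `sys.float_info.epsilon`), which may differ by a small factor from the smallest real $\delta$ in the definition above.

```python
import sys


def machine_epsilon():
    """Return the smallest power of two eps with 1.0 + eps > 1.0."""
    eps = 1.0
    # Keep halving while the halved value still changes 1.0 when added.
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps


eps = machine_epsilon()
print(eps)
print(eps == sys.float_info.epsilon)
```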
Error Analysis: Types of Errors
Definition (Errors)
1. The error in a computed quantity is defined as
Error = True Value − Approximate Value.
2. The absolute value of an error is called the absolute error.
3. The relative error is a measure of the error in relation to the size of the true value, as given by
$$\text{Relative Error} = \frac{\text{Error}}{\text{True Value}}, \quad \text{True Value} \neq 0.$$

Remark: Let $x_A$ denote the approximation to the real number $x$. We use the following notation:

$$E(x_A) := \text{Error}(x_A) = x - x_A,$$
$$E_a(x_A) := \text{Absolute Error}(x_A) = |E(x_A)|,$$
$$E_r(x_A) := \text{Relative Error}(x_A) = \frac{E(x_A)}{x}, \quad x \neq 0.$$

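These three quantities translate directly into code. The following is a small sketch in exact rational arithmetic, so that the errors being measured are not themselves polluted by rounding; the function names are chosen here for illustration.

```python
from fractions import Fraction


def error(x, x_a):
    """E(x_A) = x - x_A."""
    return x - x_a


def absolute_error(x, x_a):
    """E_a(x_A) = |E(x_A)|."""
    return abs(error(x, x_a))


def relative_error(x, x_a):
    """E_r(x_A) = E(x_A) / x, defined only for x != 0."""
    if x == 0:
        raise ValueError("relative error is undefined for x = 0")
    return error(x, x_a) / x


x = Fraction(1, 3)
x_a = Fraction(333, 1000)            # the approximation 0.333
print(error(x, x_a))                 # 1/3000
print(absolute_error(x, x_a))        # 1/3000
print(relative_error(x, x_a))        # 1/1000
```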
Error Analysis: Significant Digits
Example:
Consider the number $x = 1/3 = 0.3333\cdots$.
The number $x_A = 0.333$ has three significant digits when compared to $x$.
Definition (Significant β-Digits)
Let $\beta$ be a radix.
If $x_A$ is an approximation to $x$, then we say that $x_A$ approximates $x$ to $r$ significant $\beta$-digits if $r$ is the largest non-negative integer such that
$$\frac{|x - x_A|}{|x|} \le \frac{1}{2} \beta^{-r+1}.$$
Here we assume that $x \neq 0$.

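The definition can be turned into a direct search for the largest admissible $r$. This is a minimal sketch in exact rational arithmetic; the function name `significant_digits` and the cap `max_r` (needed because an exact approximation satisfies the inequality for every $r$) are assumptions made here.

```python
from fractions import Fraction


def significant_digits(x, x_a, beta=10, max_r=1000):
    """Largest r >= 0 with |x - x_a| / |x| <= (1/2) * beta**(-r + 1).

    Returns max_r if x_a equals x exactly, since every r then qualifies.
    """
    x, x_a = Fraction(x), Fraction(x_a)
    if x == 0:
        raise ValueError("the definition assumes x != 0")
    rel = abs(x - x_a) / abs(x)
    r = 0
    # The bound for r + 1 digits is (1/2) * beta**(-(r + 1) + 1),
    # which simplifies to (1/2) * beta**(-r).
    while r < max_r and rel <= Fraction(1, 2) * Fraction(beta) ** (-r):
        r += 1
    return r


print(significant_digits(Fraction(1, 3), Fraction("0.333")))  # 3
```

For $x = 1/3$ and $x_A = 0.333$ the relative error is $10^{-3}$, which satisfies the bound $\frac{1}{2} \times 10^{-2}$ for $r = 3$ but not the bound $\frac{1}{2} \times 10^{-3}$ for $r = 4$.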
Error Analysis: Loss of Significant Digits
Example:
Consider two real numbers
$$x = 7.6545428 = 0.76545428 \times 10^1, \quad y = 7.6544201 = 0.76544201 \times 10^1.$$
The numbers
$$x_A = 7.6545421 = 0.76545421 \times 10^1, \quad y_A = 7.6544200 = 0.76544200 \times 10^1$$
are approximations to $x$ and $y$, correct to seven and eight significant digits, respectively. Their difference $z_A$ and the exact difference $z$ are
$$z_A = x_A - y_A = 0.12210000 \times 10^{-3}, \quad z = x - y = 0.12270000 \times 10^{-3},$$
$$\implies \frac{|z - z_A|}{|z|} \approx 0.0049 < 0.5 \times 10^{-2}.$$

Error Analysis: Loss of Significant Digits (contd.)

$x_A$ has seven significant digits with respect to $x$;
$y_A$ has eight significant digits with respect to $y$;
Their difference $z_A$ has only three significant digits with respect to $z$.

There is a loss of significant digits in the process of subtraction.

The loss of significant digits in the process of calculating $z_A$, compared to the significant digits in $x_A$, is 4.

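The digit counts in this example can be verified numerically. The sketch below restates the significant-digit definition from the earlier slide as a small helper (the names `rel_err` and `sig_digits` are hypothetical) and applies it to $x_A$, $y_A$ and $z_A$ in exact rational arithmetic.

```python
from fractions import Fraction


def rel_err(true, approx):
    """Relative error |true - approx| / |true| (true must be nonzero)."""
    return abs(true - approx) / abs(true)


def sig_digits(true, approx, beta=10, max_r=100):
    """Largest r with rel_err <= (1/2) * beta**(-r + 1), capped at max_r."""
    rel = rel_err(true, approx)
    r = 0
    while r < max_r and rel <= Fraction(1, 2) * Fraction(beta) ** (-r):
        r += 1
    return r


x, x_a = Fraction("7.6545428"), Fraction("7.6545421")
y, y_a = Fraction("7.6544201"), Fraction("7.6544200")
z, z_a = x - y, x_a - y_a

print(sig_digits(x, x_a))  # 7
print(sig_digits(y, y_a))  # 8
print(sig_digits(z, z_a))  # 3

# Factor by which the subtraction amplifies the relative error of x_A.
amplification = rel_err(z, z_a) / rel_err(x, x_a)
print(float(amplification))
```

The absolute errors of $x_A$ and $y_A$ are tiny, but $z$ itself is about five orders of magnitude smaller than $x$, so the same absolute error becomes a much larger relative error.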
Error Analysis: Loss of Significant Digits (contd.)

Since $E_r(x_A) \approx 9.145 \times 10^{-8}$ and $E_r(z_A) \approx 4.890 \times 10^{-3}$, a simple calculation shows that

$$E_r(z_A) \approx 53472 \times E_r(x_A),$$

and similarly for $y$.

Loss of significant digits is therefore dangerous.

The loss of significant digits in the process of calculation is referred to as Loss of Significance.

Error Analysis: Loss of Significant Digits (contd.)

Example: Consider the function

$$f(x) = x(\sqrt{x+1} - \sqrt{x}).$$

The value of $f(100000)$ using six-digit rounding is 100.

The true value is 158.113.

There is a drastic error in the value of the function, which is due to the loss of significant digits.

Error Analysis: Loss of Significant Digits (contd.)

How to avoid Loss of Significance?

By rewriting the expression of the function $f$ as

$$f(x) = \frac{x}{\sqrt{x+1} + \sqrt{x}}.$$

With this new form of $f$, we obtain

$$f(100000) = 158.114$$

using six-digit rounding.

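The rewritten formula can be evaluated with the same six-digit arithmetic. The sketch below again uses Python's `decimal` module, with square roots computed to 50 digits as a stand-in for exact values before being rounded to six digits; the function name `f_stable` is hypothetical.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

WORK = Context(prec=50)  # high precision, standing in for exact square roots


def f_stable(x, n, rounding=ROUND_HALF_UP):
    """Evaluate x / (sqrt(x + 1) + sqrt(x)) with n-digit arithmetic."""
    ctx = Context(prec=n, rounding=rounding)
    x = Decimal(x)
    fx = ctx.create_decimal(x)                          # fl(x)
    root_hi = ctx.create_decimal((x + 1).sqrt(WORK))    # fl(sqrt(x + 1))
    root_lo = ctx.create_decimal(x.sqrt(WORK))          # fl(sqrt(x))
    total = ctx.add(root_hi, root_lo)                   # no cancellation here
    return ctx.divide(fx, total)


value = f_stable(100000, 6)
print(value)  # 158.114
```

The sum of two nearly equal positive numbers keeps all six digits, so the only errors left are the ordinary rounding errors of each step, and the result agrees with the true value 158.113 to about five significant digits.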
Disasters Caused by Numerical Errors

Patriot Missile Failure:

On February 25, 1991, during the Gulf War, an American Patriot missile battery in Dhahran, Saudi Arabia, failed to intercept an incoming Iraqi Scud missile. The Scud struck an American Army barracks and killed 28 soldiers, injuring around a hundred others.
A report of the General Accounting Office stated that the cause was an inaccurate calculation of the time since boot due to computer arithmetic errors.

Reference: https://apps.dtic.mil/sti/citations/ADA344865
More details can also be found here: https://www-users.cse.umn.edu/~arnold/disasters/patriot.html

Total Error (contd.)