Errors and Propagation

The document discusses the numerical representation of numbers in computers. It explains that numbers are written in either fixed-point or floating-point representation, where a number is expressed as the product of a mantissa and a power of the base. In computers, numbers are approximated because only a finite number of bits is available for the mantissa and the exponent, which causes rounding or truncation errors. The document then discusses how errors in the initial values propagate through the arithmetic operations of addition, subtraction, multiplication and division. In particular, absolute error bounds add under addition and subtraction, while relative error bounds add under multiplication and division.


1 Numerical representation

1.1 Fixed-point and floating-point representation

Any number x has a unique representation in a numerical base b:

x = ±a_n … a_1 a_0 . a_{−1} … a_{−s} …    (1)

and henceforth we assume b = 10.

We can also write this number as

x = ±0.a_n … a_1 a_0 a_{−1} … a_{−s} … · 10^q    (2)

with 0 ≤ a_j < 10 and imposing a_n > 0. For example, real numbers may be represented in what is known as standard scientific notation:

37.21829 = 0.3721829 · 10^2

0.002271828 = 0.2271828 · 10^{−2}
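
As an illustration, the normalized mantissa and exponent of representation (2) can be recovered in a few lines of Python; this is a minimal sketch (the helper name to_floating_point is ours, and it assumes x ≠ 0):

    import math

    def to_floating_point(x):
        # Return (m, q) such that x = m * 10**q with |m| in [0.1, 1); assumes x != 0.
        q = math.floor(math.log10(abs(x))) + 1
        return x / 10**q, q

    print(to_floating_point(37.21829))     # approximately (0.3721829, 2)
    print(to_floating_point(0.002271828))  # approximately (0.2271828, -2)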

The representation (1) is known as fixed-point representation and (2) is called floating-point representation. In representation (1), the digits, except those that only serve to locate the position of the decimal point, are called significant digits. For example, the number 0.000341020 has 9 decimal digits, but only 6 of them are significant. In representation (2), q is called the exponent, m ≡ 0.a_n … a_0 a_{−1} … is called the mantissa, and it satisfies m ∈ [0.1, 1).

In a computer a similar representation is used, but the mantissa has a finite length. As a consequence, not all real numbers can be represented in a computer. In particular, it is impossible to represent irrational numbers like π. In a standard computer with a word length of 64 bits, the bits can be distributed as follows. The first two bits can be used for the signs: the first for the sign of the mantissa and the second for the sign of the exponent. The following 14 bits can be used to describe the exponent. Finally, the remaining 48 bits can be used for the mantissa. This distribution can change between computers.

The most important point is that we only have a finite number of digits for both the mantissa and the exponent. Hence, if we only have s digits for the mantissa, we cannot represent the number

x = 0.a_1 a_2 … a_s a_{s+1} · 10^q

We will have to make an approximation, which can be done in two different ways: by truncation or by rounding. We can simply remove the digits that do not fit in the mantissa to obtain the number

x^* = 0.a_1 a_2 … a_s · 10^q

This process is known as cutting off or truncation, but we can also use the rounding-off process, which follows this rule:

Rounding-off rule (decimal)

• If the first rejected digit is smaller than 5, the s retained digits remain unchanged.

• If the first rejected digit is greater than 5, the last retained digit is increased by one.

• If the first rejected digit is 5:

  – If any of the other rejected digits is different from 0, the last retained digit is increased by one.
  – Otherwise, if the last retained digit is even, it remains unchanged, but if it is odd, it is increased by one.

Note that the error made when rounding may be by excess or by defect. In contrast, cutting off always errs by defect, and since this error is systematic it can become serious in some cases.
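
Both processes can be reproduced with Python's decimal module, whose ROUND_DOWN and ROUND_HALF_EVEN modes correspond to cutting off and to the rounding-off rule above; a minimal sketch keeping s = 4 decimal digits of the mantissa:

    from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

    step = Decimal("0.0001")  # keep s = 4 decimal digits
    print(Decimal("0.12345").quantize(step, rounding=ROUND_DOWN))        # 0.1234 (cut off)
    print(Decimal("0.12345").quantize(step, rounding=ROUND_HALF_EVEN))   # 0.1234 (only a 5 is rejected and 4 is even)
    print(Decimal("0.12335").quantize(step, rounding=ROUND_HALF_EVEN))   # 0.1234 (3 is odd, so it is increased)
    print(Decimal("0.123451").quantize(step, rounding=ROUND_HALF_EVEN))  # 0.1235 (a nonzero digit follows the 5)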

The exponent can only be an integer −M ≤ q ≤ M, where M depends on the computer. In some operations, the exponent obtained can be larger than M. In these cases the phenomenon known as overflow occurs and normally the calculations stop. In the case that the exponent is smaller than −M, the number is approximated by 0. This process is known as underflow.
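
With the IEEE 754 double-precision numbers used by Python (whose bit layout differs from the illustrative distribution above), both phenomena are easy to observe; note that here overflow produces the special value inf rather than stopping the calculation:

    print(1e200 * 1e200)    # inf: the exponent would exceed the maximum (overflow)
    print(1e-200 * 1e-200)  # 0.0: the result is approximated by 0 (underflow)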

1.2 Error in the floating-point number representation

We represent the exact value by

x = ±m · 10^{n+1}

and the mantissa has the form m = 0.a_n … a_1 a_0 a_{−1} … a_{−s} a_{−s−1} …, with t = n + 1 + s figures, of which s are decimals.

1.2.1 Cut-off error

In this case, we represent the approximated value by

x′ = ±m′ · 10^{n+1}

and the mantissa is

m′ = 0.a_n … a_1 a_0 a_{−1} … a_{−s}

As the mantissa is in fixed-point representation, its absolute error is

ϵ_a(m) = |m − m′| = 0.0…0a_{−s−1}…   (the first t decimal places are zeros)

hence, an upper bound of its absolute error is

E_a(m) = 0.0…01 = 10^{−t}   (the 1 occupies the t-th decimal place)

So

|x − x′| = |m − m′| · 10^{n+1}

therefore, an upper bound of the absolute error of x is

E_a(x) = E_a(m) · 10^{n+1} = 10^{−t} · 10^{n+1} = 10^{−s}    (3)

Finally, an upper bound of the relative error is

E_r(x) = 10^{−s} / (|m′| · 10^{n+1}) = 10^{−(n+s+1)} / |m′| = 10^{−t} / |m′| ≤ 10^{−t} / 10^{−1} = 10^{1−t}    (4)

1.2.2 Round-off error

In this case, we represent the approximated value by

x″ = ±m″ · 10^{n+1}

and the mantissa is m″ = 0.a_n … a_1 a_0 a_{−1} … a″_{−s}, where a″_{−s} follows the rounding-off rule.

The absolute error is

ϵ_a(m) = |m − m″| = 0.0…0[(a_{−s} a_{−s−1} …) − a″_{−s}]   (with t − 1 zeros after the point)

hence, an upper bound of its absolute error is
E_a(m) = 0.0…05 = 0.5 · 10^{−t}   (with t zeros followed by a 5)

So

|x − x″| = |m − m″| · 10^{n+1}

therefore, an upper bound of the absolute error of x is

E_a(x) = E_a(m) · 10^{n+1} = 0.5 · 10^{−t} · 10^{n+1} = 0.5 · 10^{−s}    (5)

An upper bound of the relative error is


E_r(x) = (0.5 · 10^{−s}) / (|m″| · 10^{n+1}) = (0.5 · 10^{−(n+s+1)}) / |m″| = (0.5 · 10^{−t}) / |m″| ≤ (0.5 · 10^{−t}) / 10^{−1} = 0.5 · 10^{1−t}    (6)

which is just half the relative error bound of truncation.
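
The two bounds (4) and (6) are easy to check empirically; a sketch assuming a hypothetical t = 5 retained figures, so the bounds are 10^{1−t} = 10^{−4} for cutting off and 0.5 · 10^{−4} for rounding:

    from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN
    import random

    step = Decimal("0.00001")  # mantissas lie in [0.1, 1), so t = 5 figures = 5 decimals
    for _ in range(10_000):
        m = Decimal(str(random.uniform(0.1, 1.0)))
        for mode, bound in ((ROUND_DOWN, Decimal("1e-4")),          # bound (4)
                            (ROUND_HALF_EVEN, Decimal("0.5e-4"))):  # bound (6)
            assert abs(m - m.quantize(step, rounding=mode)) / m <= bound
    print("all relative errors are within bounds (4) and (6)")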

2 Propagation of errors of the input data

The study of the propagation of errors in the calculations is absolutely necessary before
attempting to implement a numerical method. Before choosing an algorithm to solve a
given problem, we need to consider whether it magnifies the initial errors. For example,
given an algorithm, one must take into account whether double precision is necessary: the error is smaller, but normally calculation speed is lost.

2.1 Arithmetic operations (+, −, ∗, /)

We will study how the errors of the input data propagate through the four arithmetic operations (+, −, ∗, /). We will suppose that the operations themselves are exact and that the only source of error is in the initial data.

2.1.1 Sum and difference

We have two approximated numbers and we sum them:


x = x^* + ϵ_a(x),   y = y^* + ϵ_a(y)   ⟹   x ± y = x^* ± y^* + ϵ_a(x) ± ϵ_a(y)

hence,

ϵ_a(x ± y) = ϵ_a(x) ± ϵ_a(y)

and looking for an absolute error upper bound:

|ϵ_a(x ± y)| = |ϵ_a(x) ± ϵ_a(y)| ≤ |ϵ_a(x)| + |ϵ_a(y)| ≤ E_a(x) + E_a(y)

that is

E_a(x ± y) = E_a(x) + E_a(y)

To obtain a relative error upper bound, we use the definition

E_r(x ± y) = E_a(x ± y) / |x^* ± y^*|

so

E_r(x ± y) = (E_a(x) + E_a(y)) / |x^* ± y^*|
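
These bounds are simple enough to carry along with the data. A minimal sketch, where each approximate number is represented by a hypothetical (value, absolute error bound) pair:

    def add_bounds(x, y):
        # For a sum, the absolute error bounds add.
        return (x[0] + y[0], x[1] + y[1])

    def sub_bounds(x, y):
        # For a difference, the bounds ALSO add; they never cancel.
        return (x[0] - y[0], x[1] + y[1])

    x = (3.1416, 0.5e-4)  # pi rounded to t = 5 figures, with bound (5)
    y = (2.7183, 0.5e-4)  # e rounded to t = 5 figures
    v, e = sub_bounds(x, y)
    print(v, e, e / abs(v))  # value, E_a(x - y), and E_r(x - y)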

Note that, if x^* ≃ y^*, the number of significant digits in the result x^* − y^* decreases, which makes the relative error increase abruptly. This phenomenon is called loss of significance and we must avoid it at all costs, rewriting the formula or the algorithm so that the subtraction is dodged.
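
A classical illustration: for large x, evaluating √(x+1) − √x directly subtracts two nearly equal numbers, while the algebraically equivalent form 1/(√(x+1) + √x) contains no subtraction and keeps full precision.

    import math

    x = 1e16
    naive = math.sqrt(x + 1) - math.sqrt(x)            # subtracts nearly equal numbers
    stable = 1.0 / (math.sqrt(x + 1) + math.sqrt(x))   # rewritten, no subtraction
    print(naive)   # 0.0   -> all significant digits are lost
    print(stable)  # 5e-09 -> essentially the exact value

Here the direct formula returns a 100% relative error, while the rewritten one is correct to full double precision.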

2.1.2 Product

The product of two approximated numbers gives a new error

xy = (x^* + ϵ_a(x))(y^* + ϵ_a(y)) = x^* y^* + x^* ϵ_a(y) + y^* ϵ_a(x) + ϵ_a(x)ϵ_a(y)

So, the absolute error for the product is

ϵ_a(xy) = xy − x^* y^* = x^* ϵ_a(y) + y^* ϵ_a(x) + ϵ_a(x)ϵ_a(y)

By the triangle inequality:

|ϵ_a(xy)| = |x^* ϵ_a(y) + y^* ϵ_a(x) + ϵ_a(x)ϵ_a(y)| ≤ |x^*| · |ϵ_a(y)| + |y^*| · |ϵ_a(x)| + |ϵ_a(x)| · |ϵ_a(y)|

and we obtain an absolute error upper bound

E_a(xy) = |y^*| E_a(x) + |x^*| E_a(y) + E_a(x)E_a(y)

if we neglect the second-order term, we obtain a simpler expression

E_a(xy) ≃ |y^*| E_a(x) + |x^*| E_a(y)

We proceed similarly to obtain the relative error:

ϵ_r(xy) = ϵ_a(xy) / (x^* y^*)
        = (x^* ϵ_a(y) + y^* ϵ_a(x) + ϵ_a(x)ϵ_a(y)) / (x^* y^*)
        = ϵ_a(x)/x^* + ϵ_a(y)/y^* + (ϵ_a(x)/x^*)(ϵ_a(y)/y^*)
        = ϵ_r(x) + ϵ_r(y) + ϵ_r(x)ϵ_r(y)

and an upper bound is

E_r(xy) = E_r(x) + E_r(y) + E_r(x)E_r(y) ≃ E_r(x) + E_r(y)

So, the relative error bound of the product is the sum of the relative error bounds of the two data.
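
A quick numerical check, taking π and e as the exact values and five-figure approximations of them as the input data:

    import math

    x_star, y_star = 3.1416, 2.7183       # approximations of pi and e
    rx = abs(x_star - math.pi) / math.pi  # E_r(x), about 2.3e-6
    ry = abs(y_star - math.e) / math.e    # E_r(y), about 6.7e-6
    r_prod = abs(x_star * y_star - math.pi * math.e) / (math.pi * math.e)
    print(r_prod, rx + ry)  # nearly equal: the term E_r(x)E_r(y) is negligible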

2.1.3 Division

The division of two approximated numbers gives a new error

x/y = (x^* + ϵ_a(x)) / (y^* + ϵ_a(y))

whose absolute error is


ϵ_a(x/y) = (x^* + ϵ_a(x)) / (y^* + ϵ_a(y)) − x^*/y^*
         = (x^* y^* + y^* ϵ_a(x) − x^* y^* − x^* ϵ_a(y)) / (y^* (y^* + ϵ_a(y)))
         = (y^* ϵ_a(x) − x^* ϵ_a(y)) / (y^{*2} (1 + ϵ_a(y)/y^*))

and using the triangle inequality:
|ϵ_a(x/y)| ≤ (|y^*| · |ϵ_a(x)| + |x^*| · |ϵ_a(y)|) / (y^{*2} (1 − |ϵ_a(y)|/|y^*|))

that gives us an absolute error upper bound


E_a(x/y) = (|y^*| E_a(x) + |x^*| E_a(y)) / (y^{*2} (1 − E_a(y)/|y^*|)) ≃ (|y^*| E_a(x) + |x^*| E_a(y)) / y^{*2}

For the relative error:


ϵ_r(x/y) = ϵ_a(x/y) / (x^*/y^*)
         = (y^{*2} ϵ_a(x) − x^* y^* ϵ_a(y)) / (x^* y^{*2} (1 + ϵ_a(y)/y^*))
         = (ϵ_a(x)/x^* − ϵ_a(y)/y^*) / (1 + ϵ_a(y)/y^*)
         = (ϵ_r(x) − ϵ_r(y)) / (1 + ϵ_r(y))
And, finally, bounding the numerator by E_r(x) + E_r(y) and the denominator from below by 1 − E_r(y),

E_r(x/y) = (E_r(x) + E_r(y)) / (1 − E_r(y))

If E_r(y) is small compared with 1, we can neglect it in the denominator and we obtain the final formula

E_r(x/y) ≃ E_r(x) + E_r(y)
So, in this case we also sum the relative errors.
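
The same check for the quotient, with the same hypothetical input data as in the product example:

    import math

    x_star, y_star = 3.1416, 2.7183
    rx = abs(x_star - math.pi) / math.pi
    ry = abs(y_star - math.e) / math.e
    r_div = abs(x_star / y_star - math.pi / math.e) / (math.pi / math.e)
    print(r_div, rx + ry)  # r_div respects the bound rx + ry; here both errors
                           # have the same sign, so they even partly cancel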

2.2 General formula of error propagation

One-variable functions

We consider a differentiable function of one real variable. When we evaluate it at x^*, the absolute error of the result is

ϵ_a(f(x)) = f(x) − f(x^*)

Applying the mean value theorem, we can write this expression as

ϵ_a(f(x)) = f′(ξ)(x − x^*)

with an unknown ξ between x and x^*. So,

|ϵ_a(f(x))| = |f′(ξ)| |x − x^*| = |f′(ξ)| |ϵ_a(x)|

and we have an absolute error upper bound:

E_a(f(x)) = |f′(ξ)| E_a(x)

Using the approximation f′(ξ) ≃ f′(x^*), the final formula for error propagation is

E_a(f(x)) ≃ |f′(x^*)| E_a(x)

So, if the function is, for example, f(x) = sin(x), this formula gives us

E_a(sin x) ≃ |cos(x^*)| E_a(x) ≤ E_a(x)
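
This final formula translates directly into code; a minimal sketch (the helper name propagated_bound is ours):

    import math

    def propagated_bound(fprime, x_star, Ea_x):
        # First-order error propagation: E_a(f(x)) ~ |f'(x*)| * E_a(x)
        return abs(fprime(x_star)) * Ea_x

    print(propagated_bound(math.cos, 1.0, 1e-4))  # about 5.4e-05 for f = sin at x* = 1

Since |cos(x^*)| ≤ 1, the sine can never amplify the error of its argument.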
