Errors and Propagation
A real number $x$ can be written in the form

\[
x = \pm\, a_n a_{n-1} \dots a_1 a_0 \,.\, a_{-1} a_{-2} \dots \tag{1}
\]

with $0 \le a_j < 10$ and imposing $a_n > 0$. For example, the real numbers may be represented in what is known as standard scientific notation:

\[
x = \pm\, 0.a_n a_{n-1} \dots a_0 a_{-1} \dots \cdot 10^{q} \tag{2}
\]
The representation (1) is known as the fixed-point representation and (2) as the floating-point representation. In representation (1), the digits, except those that only serve to locate the position of the decimal point, are called significant digits. For example, the number 0.000341020 has 9 decimal digits, but only 6 of them are significant. In representation (2), $q$ is called the exponent, $m \equiv 0.a_n \dots a_0 \dots$ is called the mantissa, and it satisfies $m \in [0.1, 1)$.
A similar representation is used in a computer, but the mantissa has a finite length. As a consequence, not all real numbers can be represented in a computer; in particular, it is impossible to represent irrational numbers like $\pi$ exactly. In a standard computer with a word length of 64 bits, the bits can be distributed as follows: the first two bits for the signs (the first for the sign of the mantissa and the second for the sign of the exponent), the following 14 bits for the exponent, and the remaining 48 bits for the mantissa. This distribution can change between computers.
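As a minimal illustration, Python floats follow the IEEE 754 double-precision layout (1 sign bit, 11 exponent bits, 52 mantissa bits, a different split from the hypothetical one above, but the same idea): the finite mantissa means $\pi$ is stored only approximately.

```python
import math
import sys

# IEEE 754 double precision: 53 significant binary digits in the mantissa
# (52 stored plus one implicit), about 15-16 reliable decimal digits.
print(sys.float_info.mant_dig)   # 53
print(sys.float_info.dig)        # 15

# pi cannot be represented exactly: digits beyond roughly the 16th are meaningless.
print(f"{math.pi:.30f}")
```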
The most important point is that we only have a finite number of digits for both the mantissa and the exponent. Hence, if we only have $s$ digits for the mantissa, we cannot represent the number

\[
x = 0.a_1 a_2 \dots a_s a_{s+1} \dots \cdot 10^{q}
\]
We will have to make an approximation, which can be done in two different ways: by truncation or by rounding. We can simply remove the digits that do not fit in the mantissa to obtain the number

\[
x^* = 0.a_1 a_2 \dots a_s \cdot 10^{q}
\]
This process is known as cutting-off or truncation. Alternatively, we can use the rounding-off process, which follows this rule:

• If the first rejected digit is smaller than 5, the $s$ retained digits remain unchanged.
• If the first rejected digit is greater than 5, the last retained digit is increased by one.
• If the first rejected digit is equal to 5:
  – If any of the following rejected digits is different from 0, the last retained digit is increased by one.
  – Otherwise, if the last retained digit is even, it remains unchanged; if it is odd, it is increased by one.
Note that the rounding error may be by excess or by defect: the result may be larger or smaller than the true value. By contrast, cutting-off always errs by defect in magnitude (toward zero); since this error is systematic, it can become serious in some cases.
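Python's standard decimal module implements both policies, so we can sketch the two processes on hypothetical inputs (ROUND_DOWN is cutting-off; ROUND_HALF_EVEN is exactly the tie-to-even rule above):

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

x = Decimal("0.123456789")
step = Decimal("1e-5")  # keep s = 5 decimal digits

print(x.quantize(step, rounding=ROUND_DOWN))       # 0.12345  (cut-off)
print(x.quantize(step, rounding=ROUND_HALF_EVEN))  # 0.12346  (first rejected digit 6 > 5)

# Tie cases: first rejected digit is 5 and the remaining ones are 0.
print(Decimal("0.123455").quantize(step, rounding=ROUND_HALF_EVEN))  # 0.12346 (5 is odd -> increased)
print(Decimal("0.123445").quantize(step, rounding=ROUND_HALF_EVEN))  # 0.12344 (4 is even -> unchanged)
```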
The exponent can only be an integer $-M \le q \le M$, where $M$ depends on the computer. In some operations, the resulting exponent can be larger than $M$; in these cases the phenomenon known as overflow occurs and normally the calculation stops. In the case that the exponent is smaller than $-M$, the number is approximated by 0. This process is known as underflow.
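A quick sketch with Python floats (where, unlike the behaviour described above, overflow in arithmetic produces the special value inf rather than stopping the calculation):

```python
# Overflow: the exponent exceeds the largest representable one.
print(1e308 * 10)     # inf

# Underflow: the exponent falls below the smallest one; the number is approximated by 0.
print(1e-320 / 1e10)  # 0.0
```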
1.2.1 Cut-off error

Write $x = \pm\, m \cdot 10^{n+1}$ with mantissa $m = 0.a_n \dots a_0 a_{-1} \dots$, and suppose the computer keeps $t = n + s + 1$ digits of the mantissa, that is, down to $a_{-s}$. The truncated number is

\[
x' = \pm\, m' \cdot 10^{n+1}, \qquad m' = 0.a_n \dots a_0 a_{-1} \dots a_{-s}
\]

The absolute error of the mantissa is

\[
\epsilon_a(m) = |m - m'| = 0.\underbrace{0 \dots 0}_{t \text{ figures}} a_{-s-1} \dots
\]

so an upper bound is

\[
E_a(m) = 0.\underbrace{0 \dots 0 1}_{t \text{ figures}} = 10^{-t}
\]

So

\[
|x - x'| = |m - m'| \cdot 10^{n+1}
\]

therefore, an upper bound of the absolute error of $x$ is

\[
E_a(x) = E_a(m) \cdot 10^{n+1} = 10^{-t} \cdot 10^{n+1} = 10^{-s}
\]

1.2.2 Rounding-off error

In this case the approximation is $x'' = \pm\, m'' \cdot 10^{n+1}$ and the mantissa is $m'' = 0.a_n \dots a_1 a_0 a_{-1} \dots a''_{-s}$, where $a''_{-s}$ follows the rounding-off rule.
hence, an upper bound of its absolute error is

\[
E_a(m) = \tfrac{1}{2} \cdot 0.\underbrace{0 \dots 0 1}_{t \text{ figures}} = 0.5 \cdot 10^{-t}
\]

So

\[
|x - x''| = |m - m''| \cdot 10^{n+1}
\]

therefore, an upper bound of the absolute error of $x$ is

\[
E_a(x) = E_a(m) \cdot 10^{n+1} = 0.5 \cdot 10^{-t} \cdot 10^{n+1} = 0.5 \cdot 10^{-s} \tag{5}
\]
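A numerical sanity check of the two bounds, as a sketch using Python's decimal module with $\pi$ as a hypothetical input and $s = 5$:

```python
import math
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

x = Decimal(str(math.pi))
step = Decimal("1e-5")  # keep digits down to the 10^-5 place, i.e. s = 5

cut = x.quantize(step, rounding=ROUND_DOWN)
rnd = x.quantize(step, rounding=ROUND_HALF_EVEN)

print(abs(x - cut) < Decimal("1e-5"))     # True: cut-off bound 10^-s
print(abs(x - rnd) <= Decimal("0.5e-5"))  # True: rounding bound 0.5 * 10^-s, eq. (5)
```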
The study of the propagation of errors in calculations is absolutely necessary before attempting to implement a numerical method. Before choosing an algorithm to solve a given problem, we need to consider whether it magnifies the initial errors. For example, for a given algorithm one must decide at computation time whether double precision is necessary: the error is smaller, but calculation speed is normally lost.

We will study how the errors of the input data propagate through the four arithmetic operations (+, −, ∗, /). We will suppose that the operations are exact and that the only source of error is in the initial data.
2.1.1 Addition and subtraction

Writing $x = x^* + \epsilon_a(x)$ and $y = y^* + \epsilon_a(y)$, we have

\[
x \pm y = (x^* \pm y^*) + (\epsilon_a(x) \pm \epsilon_a(y))
\]

hence,

\[
\epsilon_a(x \pm y) = \epsilon_a(x) \pm \epsilon_a(y)
\]

and looking for an absolute error upper bound:

\[
|\epsilon_a(x \pm y)| = |\epsilon_a(x) \pm \epsilon_a(y)| \le |\epsilon_a(x)| + |\epsilon_a(y)| \le E_a(x) + E_a(y)
\]

that is

\[
E_a(x \pm y) = E_a(x) + E_a(y)
\]
\[
E_r(x \pm y) = \frac{E_a(x \pm y)}{|x^* \pm y^*|}
\]

so

\[
E_r(x \pm y) = \frac{E_a(x) + E_a(y)}{|x^* \pm y^*|}
\]
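Note that when $x^* \pm y^*$ is close to zero, as in the subtraction of nearly equal numbers, this bound blows up. A minimal Python sketch of this cancellation effect, with hypothetical data rounded to 8 decimal places:

```python
# Subtracting nearly equal numbers: the denominator |x* - y*| is tiny,
# so the relative error bound E_r(x - y) becomes huge (cancellation).
x_true, y_true = 1.234567891, 1.234567890
x_star = round(x_true, 8)        # stored data, absolute error up to 0.5e-8
y_star = round(y_true, 8)

exact = x_true - y_true          # about 1e-9
approx = x_star - y_star         # 0.0: every significant digit is lost

print(abs(approx - exact) / abs(exact))  # relative error of order 1
```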
2.1.2 Product
Writing again $x = x^* + \epsilon_a(x)$ and $y = y^* + \epsilon_a(y)$,

\[
xy = x^* y^* + x^* \epsilon_a(y) + y^* \epsilon_a(x) + \epsilon_a(x)\,\epsilon_a(y)
\]

so

\[
\epsilon_a(xy) = x^* \epsilon_a(y) + y^* \epsilon_a(x) + \epsilon_a(x)\,\epsilon_a(y)
\]

and we obtain an absolute error upper bound

\[
E_a(xy) = |x^*|\,E_a(y) + |y^*|\,E_a(x) + E_a(x)\,E_a(y)
\]

For the relative error,
\[
\epsilon_r(xy) = \frac{\epsilon_a(xy)}{x^* y^*}
= \frac{x^* \epsilon_a(y) + y^* \epsilon_a(x) + \epsilon_a(x)\,\epsilon_a(y)}{x^* y^*}
= \frac{\epsilon_a(x)}{x^*} + \frac{\epsilon_a(y)}{y^*} + \frac{\epsilon_a(x)}{x^*}\,\frac{\epsilon_a(y)}{y^*}
= \epsilon_r(x) + \epsilon_r(y) + \epsilon_r(x)\,\epsilon_r(y)
\]
So, neglecting the second-order term $\epsilon_r(x)\,\epsilon_r(y)$, the relative error upper bound for the product is the sum of the relative error upper bounds of the two factors: $E_r(xy) \approx E_r(x) + E_r(y)$.
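A quick check of this first-order rule, with hypothetical data rounded to five significant digits:

```python
# Relative errors (approximately) add under multiplication.
x_true, y_true = 3.141592653589793, 2.718281828459045
x_star, y_star = 3.1416, 2.7183  # data rounded to 5 significant digits

r_x = abs(x_true - x_star) / abs(x_true)
r_y = abs(y_true - y_star) / abs(y_true)
r_xy = abs(x_true * y_true - x_star * y_star) / abs(x_true * y_true)

print(r_xy, r_x + r_y)  # nearly equal: the cross term r_x * r_y is negligible
```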
2.1.3 Division
\[
\frac{x}{y} = \frac{x^* + \epsilon_a(x)}{y^* + \epsilon_a(y)}
\]
Subtracting $x^*/y^*$,

\[
\epsilon_a\!\left(\frac{x}{y}\right) = \frac{x^* + \epsilon_a(x)}{y^* + \epsilon_a(y)} - \frac{x^*}{y^*}
= \frac{y^* \epsilon_a(x) - x^* \epsilon_a(y)}{y^*\,(y^* + \epsilon_a(y))}
\]

and using the triangle inequality:

\[
\left|\epsilon_a\!\left(\frac{x}{y}\right)\right| \le
\frac{|y^*|\,|\epsilon_a(x)| + |x^*|\,|\epsilon_a(y)|}
{y^{*2}\left(1 - \dfrac{|\epsilon_a(y)|}{|y^*|}\right)}
\]
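A numerical sanity check of this bound, as a sketch with hypothetical data rounded to five significant digits:

```python
# Verifying the division error bound above.
x_true, y_true = 1.23456789, 0.98765432
x_star, y_star = 1.2346, 0.9877      # rounded data
e_x = abs(x_true - x_star)           # |epsilon_a(x)|
e_y = abs(y_true - y_star)           # |epsilon_a(y)|

err = abs(x_true / y_true - x_star / y_star)
bound = (abs(y_star) * e_x + abs(x_star) * e_y) / (y_star**2 * (1 - e_y / abs(y_star)))

print(err <= bound)  # True: the computed error stays below the bound
print(err, bound)
```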
Finally, we consider a differentiable function of one real variable. When we evaluate the function at $x^*$ instead of $x$, the absolute error of the result is

\[
\epsilon_a(f(x)) = f(x) - f(x^*)
\]
Applying the mean value theorem, we can write this expression as

\[
\epsilon_a(f(x)) = f'(\xi)\,(x - x^*)
\]

with $\xi$ between $x$ and $x^*$. Using the approximation $f'(\xi) \simeq f'(x^*)$, the final formula for the error propagation is

\[
\epsilon_a(f(x)) \simeq f'(x^*)\,\epsilon_a(x), \qquad E_a(f(x)) = |f'(x^*)|\,E_a(x)
\]

So, if the function is for example $f(x) = \sin(x)$, this formula gives us:

\[
E_a(\sin(x)) = |\cos(x^*)|\,E_a(x)
\]
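A minimal numerical check of this formula, with a hypothetical data error of $10^{-3}$:

```python
import math

# First-order propagation through f(x) = sin(x): E_a(f) ~ |cos(x*)| * E_a(x).
x_true = 1.0
x_star = 1.001                       # data with absolute error 1e-3

actual = abs(math.sin(x_true) - math.sin(x_star))
estimate = abs(math.cos(x_star)) * abs(x_true - x_star)

print(actual, estimate)  # agree to first order; the difference is O(E_a(x)^2)
```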