
Lesson 3

Approximations and Round-off Errors

The concept of error is central to the effective use of numerical methods. Ideally, we can compare a numerical result with the analytical solution. However, when the analytical solution is not available (which is usually the case), we have to estimate the error. The first step to minimizing errors is to simplify the problem and use formulations that can be solved analytically. Sometimes, however, the results are far from reality, and more complex formulations are needed; as a consequence, they are more difficult to solve analytically, and solving them is then only possible with numerical methods.

However, the problem with numerical methods is that they yield approximate results. It is therefore important to develop criteria to determine whether our approximation of the solution is acceptable.

Accuracy and precision

The errors associated with computations or measurements can be characterized by their accuracy and their precision.
Accuracy: how closely a computed or measured value agrees with the true value.
Precision: how closely individual computed or measured values agree with each other.

Figure 3.1. Accuracy and precision: (a) inaccurate and imprecise; (b) accurate and imprecise; (c) inaccurate and precise; (d) accurate and precise.

In engineering problems, we try to minimize both imprecision and inaccuracy.

Errors definition


The errors encountered in numerical methods can be classified into:

Truncation errors: errors due to the fact that we use an approximation to solve the problem instead of solving it analytically.
Round-off errors: errors that appear when numbers with a limited number of significant figures are used to represent exact numbers (examples: π, e, ...).

When considering the errors due to the use of numerical methods, the true value of the solution can be written as:

True value = approximation + error
Hence, the error can be computed as:

E_t = True value − approximation

E_t is the true error, since we are comparing the approximation with the true value.

To take into account the magnitude of the quantities involved, it is preferable to normalize the error to the true value:

True fractional relative error = true error / true value

We can express this as a percentage:

ε_t = |(True value − Approximation) / True value| × 100%

where ε_t is the true percent relative error.

An important point to notice is that the definition of the true error uses the true value of the solution. However, the true value is not always available, and we then have to compute an estimate of the error, normalized by the best available estimate of the true value:

ε_a = (approximate error / approximation) × 100%


In practice, however, the approximate error itself is not directly known. Since many numerical methods involve an iterative process, we instead define the error in terms of successive approximations:

ε_a = |(Current approximation − Previous approximation) / Current approximation| × 100%

Notice from the above formulation that the error can be negative or positive. In practice, what matters most is that the absolute error be lower than a certain limit ε_s (a limit that depends strongly on the application and on the acceptable computational time):

|ε_a| < ε_s
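As a minimal sketch, these definitions translate directly into Python (the helper names are ours, not the lesson's; the usage lines anticipate Example 3.1 below):

```python
def true_error(true_value, approximation):
    """E_t = true value - approximation."""
    return true_value - approximation

def true_percent_relative_error(true_value, approximation):
    """epsilon_t = |(true value - approximation) / true value| * 100%."""
    return abs((true_value - approximation) / true_value) * 100.0

def approx_percent_relative_error(current, previous):
    """epsilon_a = |(current - previous) / current| * 100%, used in
    iterative methods until |epsilon_a| < epsilon_s."""
    return abs((current - previous) / current) * 100.0

print(true_error(10_000, 9999))                   # 1 (cm)
print(true_percent_relative_error(10_000, 9999))  # 0.01 (%)
print(true_percent_relative_error(10, 9))         # 10.0 (%)
```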
Example 3.1
Suppose that you have the task of measuring the lengths of a bridge and a
rivet and come up with 9999 and 9 cm, respectively. If the true values are
10,000 and 10 cm, respectively, compute (a) the true error and (b) the true
percent relative error for each case.
Solution:
(a) The error for measuring the bridge is

E_t = True value − approximation = 10,000 − 9999 = 1 cm

and for the rivet it is

E_t = True value − approximation = 10 − 9 = 1 cm

(b) The percent relative error for the bridge is

ε_t = |(True value − Approximation) / True value| × 100% = (1/10,000) × 100% = 0.01%

and for the rivet it is

ε_t = |(True value − Approximation) / True value| × 100% = (1/10) × 100% = 10%

Thus, although both measurements have an error of 1 cm, the relative error for the rivet is much greater. We would conclude that we have done an adequate job of measuring the bridge, whereas our estimate for the rivet leaves something to be desired.


Round-off errors

These errors originate from the fact that computers retain only a fixed number of significant figures during calculations. They are therefore directly related to the manner in which numbers are stored in a computer. Recall that instead of the decimal (base-10) number system we use, a computer uses a binary (base-2) system, because this corresponds to the on/off states of its electronic components. The discrepancy introduced by this omission of significant figures is called round-off error.
A. Computer Representation of Numbers
Numerical round-off errors are directly related to the manner in which
numbers are stored in a computer. The fundamental unit whereby information
is represented is called a word. This is an entity that consists of a string of
binary digits, or bits. Numbers are typically stored in one or more words.
Number Systems. A number system is merely a convention for representing quantities. Because we have 10 fingers and 10 toes, the number system we are most familiar with is the decimal, or base-10, number system. A base is the number used as the reference for constructing the system. The base-10 system uses 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) to represent numbers. By themselves, these digits are satisfactory for counting from 0 to 9.
For larger quantities, combinations of these basic digits are used, with
the position or place value specifying the magnitude. The right-most digit in
a whole number represents a number from 0 to 9. The second digit from the
right represents a multiple of 10. The third digit from the right represents a
multiple of 100 and so on. For example, if we have the number 86,409 then
we have eight groups of 10,000, six groups of 1000, four groups of 100, zero
groups of 10, and nine more units, or
(8 × 10^4) + (6 × 10^3) + (4 × 10^2) + (0 × 10^1) + (9 × 10^0) = 86,409
Figure 3.1a provides a visual representation of how a number is
formulated in the base-10 system. This type of representation is called
positional notation.
Because the decimal system is so familiar, it is not commonly realized
that there are alternatives. For example, if human beings happened to have
had eight fingers and eight toes, we would undoubtedly have developed an
octal, or base-8, representation. In the same sense, our friend the computer
is like a two-fingered animal who is limited to two states—either 0 or 1. This
relates to the fact that the primary logic units of digital computers are on/off
electronic components. Hence, numbers on the computer are represented
with a binary, or base-2, system. Just as with the decimal system, quantities
can be represented using positional notation. For example, the binary number


11 is equivalent to (1 × 2^1) + (1 × 2^0) = 2 + 1 = 3 in the decimal system. Figure 3.1b illustrates a more complicated example.

FIGURE 3.1
How the (a) decimal (base-10) and the (b) binary (base-2) systems work. In
(b) the binary number 10101101 is equivalent to the decimal number 173.
In a 16-bit computer word, an integer can be stored in signed magnitude form: the first bit holds the sign, and the remaining 15 bits hold the magnitude of the binary number.

Example 3.2
Determine the range of integers in base-10 that can be represented on a 16-
bit computer.
Solution:
Of the 16 bits, the first bit holds the sign. The remaining 15 bits can hold
binary numbers from 0 to 111111111111111. The upper limit can be converted
to a decimal integer, as in
(1 × 2^14) + (1 × 2^13) + ⋯ + (1 × 2^1) + (1 × 2^0)


which equals 32,767 (note that this expression can be evaluated simply as 2^15 − 1). Thus, a 16-bit computer word can store decimal integers ranging from −32,767 to 32,767. In addition, because zero is already defined as 0000000000000000, it would be redundant to use the pattern 1000000000000000 to define a "minus zero." It is therefore usually employed to represent an additional negative number, −32,768, and the range becomes −32,768 to 32,767.
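A short Python sketch (our own illustration, not from the lesson) confirms both the positional-notation conversion of Figure 3.1b and the 16-bit ranges derived above:

```python
# Positional notation in base 2: 10101101 -> 173 (Figure 3.1b).
bits = "10101101"
value = sum(int(b) * 2**i for i, b in enumerate(reversed(bits)))
print(value, int(bits, 2))   # 173 173

# Signed magnitude in a 16-bit word: 1 sign bit + 15 magnitude bits.
largest = 2**15 - 1
print(-largest, largest)     # -32767 32767

# Reusing the "minus zero" pattern for one extra negative number gives
# the usual range -32768 .. 32767.
print(-2**15, 2**15 - 1)     # -32768 32767
```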

B. Floating point representation

Floating-point representation is used to store fractional quantities. A number is expressed in the form:

m · b^e

where m is the mantissa, b is the base of the number system used, and e is the exponent. As an example, the number 156.76 could be represented as 0.15676 × 10^3 in a floating-point base-10 system.
Usually, for the storage of fractional quantities, the first bit is reserved for the sign, followed by the signed exponent, and the last bits for the mantissa. Therefore, for optimal storage, any leading zero digits in the mantissa are removed and transferred to the exponent (the number is normalized).
Example 3.3

1/34 = 0.0294117...

would be stored (with a 4-digit mantissa) as 0.0294 × 10^0. However, because of the leading zero, we lose the fourth significant digit, the 1. A better storage is:

0.2941 × 10^-1
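A hypothetical normalize helper in Python (our own sketch) illustrates the idea: shift leading zeros out of the mantissa and into the exponent, then chop to the available digits.

```python
def normalize(x, digits=4, base=10):
    """Return (mantissa, exponent) with the mantissa in [1/base, 1),
    chopped to `digits` digits, so no leading zeros are stored."""
    if x == 0:
        return 0.0, 0
    exponent = 0
    while abs(x) >= 1:
        x /= base
        exponent += 1
    while abs(x) < 1 / base:
        x *= base
        exponent -= 1
    mantissa = int(x * base**digits) / base**digits  # chop the mantissa
    return mantissa, exponent

print(normalize(1 / 34))   # (0.2941, -1): the fourth digit, 1, is preserved
print(normalize(156.76))   # (0.1567, 3) with only four mantissa digits
```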

Floating-point representation allows both fractions and very large numbers to be stored. However, this has a computational cost, since floating-point numbers take more time to process than integers, and a precision cost, since only a finite number of figures can be stored in the mantissa.


In 64 bits, the IEEE 754 standard assigns 1 bit to the sign, 11 bits to the signed exponent, and 52 bits to the mantissa.

A. Limited range of quantities that may be represented

Because the number of bits is limited, some very large or very small numbers cannot be represented. Trying to store a number larger than the representable range generates an overflow error (and numbers smaller in magnitude than the smallest representable value cause underflow).

- How do we deal with the problem of π?

π = 3.14159265358979...

To store it on a base-10 system carrying seven significant figures, we can either:

omit the figures after the seventh: π ≈ 3.141592; this is called chopping, and it generates an error of 0.00000065;
or round using the eighth figure: π ≈ 3.141593, which generates an error of −0.00000035.

Therefore, rounding reduces the error.
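The two storage choices are easy to reproduce in Python (a sketch; the floor-based chopping is our own shorthand for dropping figures after the seventh):

```python
import math

pi = math.pi                               # 3.141592653589793

chopped = math.floor(pi * 10**6) / 10**6   # 3.141592 -> chopping
rounded = round(pi, 6)                     # 3.141593 -> rounding

print(pi - chopped)    # ~ 6.5e-07
print(pi - rounded)    # ~ -3.5e-07: rounding roughly halves the error
```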

B. Comparison between two numbers

When comparing two numbers, it is wiser to test that their difference is smaller than an acceptably small tolerance rather than to test for equality.
Example 3.4
If you want to test whether a = b, the best solution is to test in your program whether

|a − b| ≤ ε
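In Python, for instance, the test can be written as below (the tolerance 1e-9 is an arbitrary choice for illustration):

```python
import math

def nearly_equal(a, b, tol=1e-9):
    """Test |a - b| <= tol instead of a == b."""
    return abs(a - b) <= tol

a = 0.1 + 0.2
b = 0.3
print(a == b)              # False: round-off defeats exact equality
print(nearly_equal(a, b))  # True
print(math.isclose(a, b))  # the standard-library version of the same idea
```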


Machine epsilon can be used as the tolerance ε to ensure a certain portability of the code, since it does not depend on the storage characteristics of the particular machine used:


ε = base^(1−t), where t is the number of digits in the mantissa.
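Machine epsilon can also be computed at run time with the classic halving loop, shown here as a Python sketch:

```python
import sys

# Halve eps until adding eps/2 to 1.0 no longer changes it.
eps = 1.0
while 1.0 + eps / 2 > 1.0:
    eps /= 2

print(eps)                     # 2.220446049250313e-16 for 64-bit floats
print(sys.float_info.epsilon)  # the value Python itself reports
# Consistent with eps = base**(1 - t): 2**(1 - 53) == 2.22e-16
```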

Arithmetic manipulation of computer numbers

Basic arithmetic operations such as addition, subtraction, and multiplication can lead to significant round-off errors.

- Addition

The mantissa of the number with the smaller exponent is adjusted so that the two exponents match. Consider a computer with just a 4-digit mantissa and a 1-digit exponent. If we add 0.1557 × 10^1 and 0.4381 × 10^-1, the following happens when chopping is used:

0.4381 × 10^-1 → 0.004381 × 10^1 → 0.0043 × 10^1 (chopped)

Then,

  0.1557 × 10^1
+ 0.0043 × 10^1
---------------
  0.1600 × 10^1

- Subtraction

The same alignment happens with subtraction:

  0.3641 × 10^2
- 0.2686 × 10^2
---------------
  0.0955 × 10^2

Due to the presence of the zero just before the 9, the result is normalized:

0.0955 × 10^2 → 0.9550 × 10^1


Note that we added a zero to fill the space of the 4-digit mantissa.

- Multiplication

The exponents are added and the mantissas multiplied; the result is then normalized and chopped:

0.1363 × 10^3 × 0.6423 × 10^-1 = 0.08754549 × 10^2
→ 0.8754549 × 10^1 (normalization)
→ 0.8754 × 10^1 (chopping)
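Python's standard decimal module can emulate such a machine directly: a Context with prec=4 and ROUND_DOWN behaves like a 4-digit mantissa with chopping, and, as a sketch, it reproduces all three examples above.

```python
from decimal import Decimal, Context, ROUND_DOWN

machine = Context(prec=4, rounding=ROUND_DOWN)  # 4-digit mantissa, chopping

# Addition: 0.1557 x 10^1 + 0.4381 x 10^-1
print(machine.add(Decimal("0.1557E1"), Decimal("0.4381E-1")))       # 1.600

# Subtraction: 0.3641 x 10^2 - 0.2686 x 10^2
print(machine.subtract(Decimal("0.3641E2"), Decimal("0.2686E2")))   # 9.55, i.e. 0.9550 x 10^1

# Multiplication: 0.1363 x 10^3 * 0.6423 x 10^-1
print(machine.multiply(Decimal("0.1363E3"), Decimal("0.6423E-1")))  # 8.754
```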

The errors produced by these arithmetic manipulations may seem negligible, but many engineering methods require an iterative process to find the solution. The computations are then interdependent, and this can lead to a dramatic accumulation of round-off errors.

Errors due to addition of large and small numbers

Example 3.5

4000 + 0.0010 is computed as:

  0.4000000 × 10^4
+ 0.0000001 × 10^4
------------------
  0.4000001 × 10^4 → 0.4000 × 10^4 (chopping)

The small number is completely ignored.

This kind of problem usually occurs in the computation of infinite series whose first terms are large. To avoid it, compute the series in ascending order, adding the smallest terms first, as the sketch below shows.
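With the same 4-digit decimal "machine" as above, a sketch shows both the lost small term and the benefit of summing the small terms first:

```python
from decimal import Decimal, Context, ROUND_DOWN

machine = Context(prec=4, rounding=ROUND_DOWN)

# 4000 + 0.0010 on a 4-digit machine: the small term disappears.
print(machine.add(Decimal(4000), Decimal("0.0010")))   # 4000

# One large term plus 1000 small ones, summed in each order.
terms = [Decimal(4000)] + [Decimal("0.0010")] * 1000

total = Decimal(0)
for t in terms:                 # descending: large term first
    total = machine.add(total, t)
print(total)                    # 4000 -> the small terms never register

total = Decimal(0)
for t in reversed(terms):       # ascending: small terms first
    total = machine.add(total, t)
print(total)                    # 4001 -> their sum survives
```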

Subtractive cancellation

This error occurs when we perform the subtraction of nearly equal floating-point numbers.


Example 3.6

Calculate √9.01 − 3 on a 3-decimal-digit computer.
On such a machine, √9.01 = 3.00166... is stored as 3.00, so the subtraction gives 3.00 − 3.00 = 0.00, while the true value is about 0.00167: all significant digits are lost. A standard remedy is to reformulate the expression to avoid the subtraction: √9.01 − 3 = (9.01 − 9)/(√9.01 + 3) = 0.01/6.00 = 0.00167.
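A Python sketch with a hypothetical sig helper (our own) that keeps three significant figures makes the loss, and the fix, explicit:

```python
from math import sqrt

def sig(x, n=3):
    """Round x to n significant figures (0 stays 0)."""
    return float(f"{x:.{n - 1}e}")

# Direct form: both operands become 3.00 and the difference vanishes.
naive = sig(sig(sqrt(9.01)) - 3.0)
print(naive)        # 0.0: all significant digits lost

# Rationalized form avoids subtracting nearly equal numbers:
# sqrt(9.01) - 3 = (9.01 - 9) / (sqrt(9.01) + 3)
better = sig(sig(9.01 - 9.0) / sig(sqrt(9.01) + 3.0))
print(better)       # 0.00167 (true value ~ 0.0016662)
```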

Another way to reduce the effect of subtractive cancellation is to use double precision (e.g., the double(X) function in Scilab or MATLAB).

Single precision [32 bits]: 24 bits assigned to the mantissa (the first bit is assumed = 1 and not stored), 8 bits to the signed exponent, and 1 sign bit.

Double precision [64 bits]: 53 bits assigned to the mantissa (again with an implicit leading 1), 11 bits to the exponent, and 1 sign bit (per the IEEE 754 layout noted earlier).
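Assuming NumPy is available for a 32-bit float type, the effect of precision on Example 3.6 can be demonstrated as follows (a sketch, not part of the lesson):

```python
import numpy as np  # assumed available; used only for its float32 type

x = 9.01
single = np.float32(np.sqrt(np.float32(x))) - np.float32(3.0)
double = np.sqrt(np.float64(x)) - 3.0

print(single)   # roughly 0.00167: only a few digits are trustworthy
print(double)   # 0.0016662039607267..., much closer to the true value
```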
