EEPC102-Module_1-Lesson-3
However, the problem with numerical methods is that they yield approximate
results. It is, therefore, important to develop criteria to determine if our
approximation of the solution is acceptable.
Errors definition
In engineering problems, we try to minimize both imprecision and inaccuracy.
EEPC102 Module I
The true error is $E_t = \text{true value} - \text{approximation}$. It is called
the true error because we are comparing the approximation with the true value.
$$\varepsilon_a = \frac{\text{approximation error}}{\text{approximation}} \times 100\%$$
Note from the formulation above that the error can be negative or positive.
In practice, what matters is that its absolute value is lower than a
prescribed limit $\varepsilon_s$ (this limit depends strongly on the
application and on the available computational time):
$$|\varepsilon_a| < \varepsilon_s$$
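The stopping criterion can be illustrated with a minimal sketch. It assumes the approximation error is estimated as the change between the current and the previous approximation, which the text above does not spell out. The function name `exp_series`, the test value x = 0.5, and the limit of 0.05% are hypothetical choices made only for illustration.

```python
import math


def exp_series(x, eps_s=0.05, max_terms=50):
    """Sum the Maclaurin series of e**x until |eps_a| < eps_s (both in percent).

    Assumption: eps_a = (current - previous) / current * 100.
    Returns (approximation, number_of_terms_used, eps_a).
    """
    total = 1.0           # first term of the series
    term = 1.0
    eps_a = float("inf")  # no error estimate exists before the second term
    for n in range(1, max_terms + 1):
        term *= x / n     # x**n / n!
        previous = total
        total += term
        eps_a = abs((total - previous) / total) * 100.0
        if eps_a < eps_s:
            return total, n + 1, eps_a
    return total, max_terms + 1, eps_a


approx, terms, eps_a = exp_series(0.5, eps_s=0.05)
eps_t = abs((math.exp(0.5) - approx) / math.exp(0.5)) * 100.0
print(f"approximation = {approx:.9f} after {terms} terms")
print(f"eps_a = {eps_a:.4f} %, eps_t = {eps_t:.5f} %")
```

The loop stops as soon as the estimated error drops below the limit, without needing to know the true value.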
Example 3.1
Suppose that you have the task of measuring the lengths of a bridge and a
rivet and come up with 9999 and 9 cm, respectively. If the true values are
10,000 and 10 cm, respectively, compute (a) the true error and (b) the true
percent relative error for each case.
Solution:
(a) The error for measuring the bridge is
$$E_t = \text{true value} - \text{approximation} = 10{,}000 - 9999 = 1\ \text{cm}$$
and for the rivet it is
$$E_t = \text{true value} - \text{approximation} = 10 - 9 = 1\ \text{cm}$$
(b) The percent relative error for the bridge is
$$\varepsilon_t = \left|\frac{\text{true value} - \text{approximation}}{\text{true value}}\right| \times 100\% = \frac{1}{10{,}000} \times 100\% = 0.01\%$$
and for the rivet it is
$$\varepsilon_t = \left|\frac{\text{true value} - \text{approximation}}{\text{true value}}\right| \times 100\% = \frac{1}{10} \times 100\% = 10\%$$
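The two calculations of Example 3.1 can be checked with a short sketch. The function names are hypothetical and chosen only for illustration.

```python
def true_error(true_value, approximation):
    """E_t = true value - approximation."""
    return true_value - approximation


def true_percent_relative_error(true_value, approximation):
    """eps_t = |(true value - approximation) / true value| * 100 %."""
    return abs((true_value - approximation) / true_value) * 100.0


# Values from Example 3.1, in centimetres.
cases = [("bridge", 10_000, 9_999), ("rivet", 10, 9)]
for name, true_value, measured in cases:
    e_t = true_error(true_value, measured)
    eps_t = true_percent_relative_error(true_value, measured)
    print(f"{name}: E_t = {e_t} cm, eps_t = {eps_t:.2f} %")
```

Both measurements have the same true error of 1 cm, but the relative error shows that the rivet measurement is far worse in proportion to the quantity measured.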
Round-off errors
These errors originate from the fact that computers retain only a fixed
number of significant figures during calculation. These errors are, therefore,
directly related to the manner in which numbers are stored in a computer.
Recall that, instead of the decimal (base-10) number system that we use, a
computer uses a binary (base-2) system, because this corresponds to the
on/off positions of electronic components. The discrepancy introduced by
omitting significant figures in this way is called round-off error.
A. Computer Representation of Numbers
Numerical round-off errors are directly related to the manner in which
numbers are stored in a computer. The fundamental unit whereby information
is represented is called a word. This is an entity that consists of a string of
binary digits, or bits. Numbers are typically stored in one or more words.
Number Systems. A number system is merely a convention for representing
quantities. Because we have 10 fingers and 10 toes, the number system that
we are most familiar with is the decimal, or base-10, number system. A base
is the number used as the reference for constructing the system. The base-10
system uses the 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) to represent numbers.
By themselves, these digits are satisfactory for counting from 0 to 9.
For larger quantities, combinations of these basic digits are used, with
the position or place value specifying the magnitude. The right-most digit in
a whole number represents a number from 0 to 9. The second digit from the
right represents a multiple of 10. The third digit from the right represents a
multiple of 100 and so on. For example, if we have the number 86,409 then
we have eight groups of 10,000, six groups of 1000, four groups of 100, zero
groups of 10, and nine more units, or
$$(8 \times 10^{4}) + (6 \times 10^{3}) + (4 \times 10^{2}) + (0 \times 10^{1}) + (9 \times 10^{0}) = 86{,}409$$
Figure 3.1a provides a visual representation of how a number is
formulated in the base-10 system. This type of representation is called
positional notation.
Because the decimal system is so familiar, it is not commonly realized
that there are alternatives. For example, if human beings happened to have
had eight fingers and eight toes, we would undoubtedly have developed an
octal, or base-8, representation. In the same sense, our friend the computer
is like a two-fingered animal who is limited to two states—either 0 or 1. This
relates to the fact that the primary logic units of digital computers are on/off
electronic components. Hence, numbers on the computer are represented
with a binary, or base-2, system. Just as with the decimal system, quantities
can be represented using positional notation. For example, the binary number
10101101 is equivalent to the decimal number 173, as illustrated in Figure 3.1b.
FIGURE 3.1
How the (a) decimal (base-10) and the (b) binary (base-2) systems work. In
(b) the binary number 10101101 is equivalent to the decimal number 173.
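Positional notation can be sketched in a few lines of code. The helper name `from_digits` is hypothetical; it simply multiplies each digit by the base raised to the power of its position, counting from the right.

```python
def from_digits(digits: str, base: int) -> int:
    """Evaluate a digit string using positional notation in the given base."""
    value = 0
    # The right-most digit has position 0, the next one position 1, and so on.
    for position, digit in enumerate(reversed(digits)):
        value += int(digit, base) * base**position
    return value


print(from_digits("86409", 10))    # decimal example from the text
print(from_digits("10101101", 2))  # binary example from Figure 3.1b
print(int("10101101", 2))          # Python's built-in conversion, for comparison
```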
In a 16-bit computer word, a signed integer is stored with the first bit
holding the sign and the remaining 15 bits holding the magnitude.
Example 3.2
Determine the range of integers in base-10 that can be represented on a 16-
bit computer.
Solution:
Of the 16 bits, the first bit holds the sign. The remaining 15 bits can hold
binary numbers from 0 to 111111111111111. The upper limit can be converted
to a decimal integer, as in
$$(1 \times 2^{14}) + (1 \times 2^{13}) + \cdots + (1 \times 2^{1}) + (1 \times 2^{0})$$
which equals 32,767 (note that this expression can be evaluated simply as
$2^{15} - 1$). Thus, a 16-bit computer word can store decimal integers ranging
from -32,767 to 32,767. In addition, because zero is already defined as
0000000000000000, it is redundant to use the number 1000000000000000 to
define a "minus zero." Therefore, it is usually employed to represent an
additional negative number, -32,768, and the range is from -32,768 to
32,767.
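The range found in Example 3.2 can be reproduced with a short sketch. The helper `fits_in_16_bits` is hypothetical. It relies on Python's `int.to_bytes` with `signed=True`, which uses two's-complement encoding; that is an assumption about the storage scheme, chosen because it gives the same -32,768 to 32,767 range.

```python
BITS = 16
MAGNITUDE_BITS = BITS - 1  # one bit is reserved for the sign

# Upper limit: fifteen 1-bits, summed by positional notation.
largest = sum(1 * 2**k for k in range(MAGNITUDE_BITS))
# The pattern that would be "minus zero" is reused for one extra negative number.
smallest = -(2**MAGNITUDE_BITS)
print(smallest, largest)


def fits_in_16_bits(n: int) -> bool:
    """Return True if n can be stored in a signed 16-bit word."""
    try:
        n.to_bytes(2, "big", signed=True)
        return True
    except OverflowError:
        return False


for n in (32_767, 32_768, -32_768, -32_769):
    print(n, fits_in_16_bits(n))
```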
Fractional quantities are typically represented in floating-point form,
$$m \cdot b^{e}$$
where m is the mantissa, b is the base of the number system used, and e is
the exponent. As an example, the number 156.76 could be represented as
$0.15676 \times 10^{3}$ in a floating-point base-10 system.
Usually, for the storage of fractional quantities, the first bit is reserved
for the sign, the next bits for the signed exponent, and the last bits for the
mantissa. For optimal storage, the mantissa is normalized: if it has leading
zero digits, they are removed and the exponent is adjusted accordingly.
Example 3.3
$1/34 = 0.0294117\ldots$
With a four-digit mantissa, this would be stored as $0.0294 \times 10^{0}$.
However, because of the zero before the 2, we lose the digit 1. A better
storage is:
$$0.2941 \times 10^{-1}$$
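Normalization can be sketched with Python's `decimal` module, which works in base 10 and so mirrors the example directly. The function name `normalize`, the four-digit mantissa, and the use of chopping (rounding toward zero) are assumptions made for illustration.

```python
from decimal import Decimal, ROUND_DOWN


def normalize(value: str, digits: int = 4):
    """Return (mantissa, exponent) with 0.1 <= |mantissa| < 1.

    The mantissa is chopped (not rounded) to `digits` decimal digits.
    """
    x = Decimal(value)
    if x == 0:
        return Decimal(0), 0
    # adjusted() gives the exponent of the leading digit; adding 1 moves
    # the decimal point so that it sits just before that digit.
    exponent = x.adjusted() + 1
    mantissa = x.scaleb(-exponent)
    step = Decimal(1).scaleb(-digits)  # e.g. 0.0001 for four digits
    return mantissa.quantize(step, rounding=ROUND_DOWN), exponent


mantissa, exponent = normalize("0.0294117")
print(f"{mantissa} x 10^{exponent}")
```

Shifting the leading zero into the exponent lets the four available mantissa digits all carry significant figures.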
Because the number of bits is limited, some very large or very small numbers
cannot be represented. If you try to store a number outside this range, you
will generate an overflow error.
$\pi = 3.14159265358\ldots$
When comparing two numbers, it is wiser to test that the difference is less
than an acceptably small tolerance rather than to test for equality:
Example 3.4
To test whether a = b in a program, it is better to write the condition as:
if $|a - b| \le \varepsilon$
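A minimal sketch of this test follows. The helper name `nearly_equal` and the tolerance value are hypothetical; the standard-library function `math.isclose` offers a similar check based on a relative tolerance.

```python
import math


def nearly_equal(a: float, b: float, eps: float = 1e-9) -> bool:
    """Return True when |a - b| <= eps."""
    return abs(a - b) <= eps


a = 0.1 + 0.2
b = 0.3

print(a == b)              # exact equality fails because of round-off
print(nearly_equal(a, b))  # tolerance-based comparison succeeds
print(math.isclose(a, b))  # standard-library alternative
```

An absolute tolerance such as `eps` suits values of moderate size; for very large or very small values a relative tolerance is usually a better choice.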
- Addition
The mantissa of the number with the smaller exponent is modified so that the
exponents are the same. Consider a computer with a 4-digit mantissa and a
1-digit exponent. If we add $0.1557 \times 10^{1}$ to $0.4381 \times 10^{-1}$,
the following process occurs if chopping is used:
$$0.4381 \times 10^{-1} \rightarrow 0.004381 \times 10^{1}$$
Then,
$$\begin{array}{r} 0.1557 \times 10^{1} \\ +\; 0.0043 \times 10^{1} \\ \hline 0.1600 \times 10^{1} \end{array}$$
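The alignment and chopping steps can be sketched with the `decimal` module. This is an illustration under stated assumptions, not a model of real hardware: the function names are hypothetical, the mantissa is limited to four decimal digits, and renormalization of a sum that reaches 1.0 or more is not handled.

```python
from decimal import Decimal, ROUND_DOWN

DIGITS = 4  # assumed mantissa length


def chop(mantissa: Decimal, digits: int = DIGITS) -> Decimal:
    """Keep only `digits` decimal places, discarding the rest (chopping)."""
    return mantissa.quantize(Decimal(1).scaleb(-digits), rounding=ROUND_DOWN)


def add_chopped(m1: str, e1: int, m2: str, e2: int, digits: int = DIGITS):
    """Add m1*10**e1 and m2*10**e2 on a toy machine with a chopped mantissa."""
    e = max(e1, e2)
    # Shift each mantissa to the common (larger) exponent, then chop it.
    a = chop(Decimal(m1).scaleb(e1 - e), digits)
    b = chop(Decimal(m2).scaleb(e2 - e), digits)
    return chop(a + b, digits), e


mantissa, exponent = add_chopped("0.1557", 1, "0.4381", -1)
print(f"{mantissa} x 10^{exponent}")
```

The digits of the smaller operand that fall beyond the fourth place after alignment are lost before the addition even takes place.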
- Subtraction
Due to the presence of the zero just before the 9, the result is normalized:
$$0.0955 \times 10^{2} \rightarrow 0.9550 \times 10^{1}$$
Note that we added a zero to fill the space of the 4-digit mantissa.
- Multiplication
After normalization:
$$0.8754 \times 10^{1}$$
Example 3.5
$4000 + 0.0010$ is computed as:
$$0.4000 \times 10^{4} + 0.0000001 \times 10^{4} = 0.4000001 \times 10^{4}$$
which, after chopping, becomes
$$0.4000 \times 10^{4}$$
The small number is completely ignored.
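The same effect appears in ordinary double-precision arithmetic, only at a much larger ratio between the two operands. The sketch below assumes IEEE 754 double precision, which is what Python's `float` uses on common platforms; the specific values are chosen for illustration.

```python
import sys

big = 1.0e16
small = 1.0

# With about 16 significant decimal digits, adding 1 to 1e16 changes nothing.
print(big + small == big)

# Example 3.5 itself is harmless in double precision: the mantissa is long
# enough to keep the small term.
print(4000.0 + 0.0010 == 4000.0)

# Machine epsilon: the gap between 1.0 and the next representable number.
print(sys.float_info.epsilon)
```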
Subtractive cancellation
This error occurs when we subtract two nearly equal floating-point
numbers.
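A minimal sketch of subtractive cancellation in double precision follows. The value of `x` is a hypothetical choice: adding it to 1.0 and subtracting 1.0 again should return `x`, but the leading digits cancel and only the rounded trailing digits remain.

```python
x = 1.0e-8

# 1.0 + x is rounded to the nearest representable number near 1.0.
# Subtracting 1.0 then cancels the leading digits and exposes that rounding.
computed = (1.0 + x) - 1.0
relative_error = abs(computed - x) / x

print(f"expected  {x:.17e}")
print(f"computed  {computed:.17e}")
print(f"relative error = {relative_error:.2e}")
```

Each individual operation is accurate to roughly 16 significant digits, yet the final result keeps only about half of them.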
Example 3.6