EEPC-102-Module-1
This module discusses the mathematical preliminaries behind the theorems needed for solving numerical problems. It also gives a background on the number system used by computers, how computers solve numerical problems, and the errors that occur during computation and affect the precision and accuracy of results. It further reviews the operations of the binary number system.
1. To introduce the student to mathematical theorems such as the Intermediate Value Theorem, Rolle’s Theorem, Mean Value Theorem, Fundamental Theorem of Algebra, and Taylor’s Formula.
2. To let students realize how errors occur in numerical computation.
3. To let students realize the importance of numbers and their accuracy.
4. To let students review the conversion of binary numbers and the operations performed on them.
Where equations or functions become too complicated to be solved by simple, ordinary methods, numerical methods are extremely powerful problem-solving tools. They can handle large systems of equations, nonlinearities, and complicated geometries that are not uncommon in engineering practice and that are often impossible to solve analytically. As such, they greatly enhance your problem-solving skills. [1]
I.1.A Intermediate Value Theorem
Theorem: If 𝑓 is continuous on a closed interval [𝑎, 𝑏], and 𝑦₀ is any number between 𝑓(𝑎) and 𝑓(𝑏) inclusive, then there is at least one number 𝑥₀ in the closed interval such that 𝑓(𝑥₀) = 𝑦₀. [2]
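The Intermediate Value Theorem is the foundation of root-finding: if a continuous 𝑓 changes sign on [𝑎, 𝑏], some 𝑥₀ in the interval satisfies 𝑓(𝑥₀) = 0. A minimal sketch of the bisection method, which exploits exactly this guarantee (the function and interval below are example choices, not from the text):

```python
# Bisection sketch: the Intermediate Value Theorem guarantees a root of a
# continuous f on [a, b] whenever f(a) and f(b) differ in sign.

def bisect(f, a, b, tol=1e-10):
    """Find x in [a, b] with f(x) near 0, assuming f(a) and f(b) have opposite signs."""
    fa, fb = f(a), f(b)
    assert fa * fb <= 0, "f(a) and f(b) must differ in sign"
    while b - a > tol:
        m = (a + b) / 2.0           # midpoint of the current bracket
        fm = f(m)
        if fa * fm <= 0:            # sign change in [a, m]: keep left half
            b, fb = m, fm
        else:                       # otherwise the root lies in [m, b]
            a, fa = m, fm
    return (a + b) / 2.0

root = bisect(lambda x: x**2 - 2, 0.0, 2.0)  # approximates sqrt(2)
```

Each iteration halves the bracket, so the method always converges, if slowly; this is why the theorem matters computationally.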
I.1.B Rolle’s Theorem
Theorem: Suppose f is continuous on [a, b] and differentiable on (a, b), and suppose that f(a) = f(b). Then there is a number c in the interval (a, b) with f′(c) = 0. [3]
It can be observed from the figure that 𝑓(𝑎) = 𝑓(𝑏), and that within the interval [a, b] there is a number 𝑐 at which the slope is zero. Since the slope is the first derivative of the function, we write 𝑓′(𝑐) = 0.
I.1.C Mean Value Theorem
Theorem: Suppose f is continuous on [a, b] and differentiable on (a, b). Then there is a number c within the interval [a, b] with

f′(c) = (f(b) − f(a)) / (b − a)  [3]
I.1.D Fundamental Theorem of Algebra
It states that every polynomial equation of degree at least 1 has a complex solution. Polynomial equations are of the form 𝑃(𝑥) = 𝑎ₙ𝑥ⁿ + 𝑎ₙ₋₁𝑥ⁿ⁻¹ + ⋯ + 𝑎₁𝑥 + 𝑎₀, where 𝑎ₙ is assumed non-zero, in which case 𝑛 is called the degree of the polynomial 𝑃(𝑥); the 𝑎ᵢ are known coefficients and 𝑥 is an unknown number. [4]
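As a quick illustration of the theorem, 𝑥² + 1 = 0 has no real solution, yet it does have complex solutions. A small sketch using Python's standard `cmath` module and the quadratic formula (the coefficients are an example, not from the text):

```python
# The Fundamental Theorem of Algebra in action: every polynomial of degree >= 1
# has a complex root, even when it has no real root.
import cmath

a, b, c = 1, 0, 1                 # example: P(x) = x^2 + 1
d = cmath.sqrt(b * b - 4 * a * c)  # cmath.sqrt handles negative discriminants
r1 = (-b + d) / (2 * a)
r2 = (-b - d) / (2 * a)
print(r1, r2)                      # the roots i and -i
```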
I.1.E Taylor’s Formula
Let 𝑓(𝑥) have 𝑛 + 1 continuous derivatives on [𝑎, 𝑏] for some 𝑛 ≥ 0, and let 𝑥, 𝑥₀ ∈ [𝑎, 𝑏]. Then

𝑅ₙ(𝑥) = ((𝑥 − 𝑥₀)ⁿ⁺¹ / (𝑛 + 1)!) 𝑓⁽ⁿ⁺¹⁾(ξₓ)

In the above formula, 𝑛! denotes the factorial of 𝑛, and 𝑅ₙ is the remainder term denoting the difference between the Taylor polynomial of degree 𝑛 and the original function. The remainder term 𝑅ₙ depends on 𝑥, and its value is small if 𝑥 is close enough to 𝑥₀. [5]
Taylor’s Theorem is important because it allows us to represent fairly general functions exactly in terms of polynomials with a known, boundable error. This allows us to replace, in a computational setting, these general functions with something much simpler (a polynomial), while still being able to bound the error that is made. [7]
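A small sketch of this idea for 𝑓(𝑥) = 𝑒ˣ about 𝑥₀ = 0: since every derivative of 𝑒ˣ is 𝑒ˣ, the remainder for 𝑥 > 0 is bounded by 𝑒ˣ · 𝑥ⁿ⁺¹/(𝑛 + 1)!, and the actual error of the degree-𝑛 Taylor polynomial never exceeds that bound (the choice of 𝑥 = 1, 𝑛 = 6 below is just an example):

```python
# Taylor polynomial of e^x about 0, compared with the remainder bound
# |R_n(x)| <= e^x * x^(n+1) / (n+1)!  for x > 0 (all derivatives of e^x are e^x).
import math

def taylor_exp(x, n):
    """Degree-n Taylor polynomial of e^x about 0: sum of x^k / k!."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

x, n = 1.0, 6
approx = taylor_exp(x, n)
actual_error = abs(math.exp(x) - approx)
bound = math.exp(x) * x**(n + 1) / math.factorial(n + 1)
# The actual error is small and, as the theorem promises, never exceeds the bound.
```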
LEARNING CONTENTS I.2 NUMBERS AND THEIR ACCURACY
Numerical computation is performed by tedious and repetitive methods that involve the manipulation of numbers through operations such as addition and multiplication, while keeping as many significant figures as possible in order to get a more accurate solution. Because of this, computer applications have come to be used to hasten the solving of numerical problems.
Digital computers use the binary number system to represent all types of information internally. Alphanumeric characters are represented using binary bits (i.e., 0 and 1). Digital representations are easier to design, storage is easy, and accuracy and precision are greater.
There are two major approaches to storing real numbers (i.e., numbers with a fractional component) in modern computing: (i) fixed-point notation and (ii) floating-point notation. In fixed-point notation, there is a fixed number of digits after the decimal point, whereas floating-point notation allows for a varying number of digits after the decimal point.
Fixed-Point Representation
This representation has a fixed number of bits for the integer part and for the fractional part. For example, if the fixed-point representation is AAAA.FFFF, then the minimum value that can be stored is 0000.0001 and the maximum value is 9999.9999. A fixed-point number representation has three parts: the sign field, the integer field, and the fractional field.
Example: Assume that a number uses a 32-bit format which reserves 1 bit for the sign, 15 bits for the integer part, and 16 bits for the fractional part.
1 000000000101011 1010000000000000
Sign bit Integer part Fractional part
Here, 0 is used to represent + and 1 is used to represent −; 000000000101011 is the 15-bit binary value for the decimal integer 43, and 1010000000000000 is the 16-bit binary value for the fraction 0.625. The number represented is therefore −43.625.
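The encoding above can be reproduced in a few lines. This is a sketch of the specific 1-bit sign / 15-bit integer / 16-bit fraction layout described in the example (the helper name `to_fixed_point` is ours, not a standard function):

```python
# Sketch of the 1 + 15 + 16 bit fixed-point format from the example,
# encoding -43.625 as sign 1, integer 43, fraction 0.625.

def to_fixed_point(value):
    sign = 0 if value >= 0 else 1
    magnitude = abs(value)
    integer_part = int(magnitude)
    # Store the fraction in units of 2^-16 (truncating, as a fixed format must).
    frac_part = int((magnitude - integer_part) * 2**16)
    return f"{sign:01b} {integer_part:015b} {frac_part:016b}"

print(to_fixed_point(-43.625))  # 1 000000000101011 1010000000000000
```

Note how 0.625 = 0.101₂ lands in the top bits of the fraction field, matching the bit pattern shown above.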
The advantage of a fixed-point representation is performance; its disadvantage is the relatively limited range of values it can represent. It is therefore usually inadequate for numerical analysis, as it does not allow enough range and accuracy. A number whose representation exceeds 32 bits would have to be stored inexactly.
Floating-Point Representation
This representation does not reserve a specific number of bits for the integer part or the fractional part. Instead, the floating-point representation of a number has two parts: the first part represents a signed fixed-point number called the mantissa; the second part designates the position of the decimal (or binary) point and is called the exponent. The fixed-point mantissa may be a fraction or an integer. A floating-point number is always interpreted to represent a number of the form M × rᵉ.
Only the mantissa M and the exponent e are physically represented in the register (including their signs). A floating-point binary number is represented in a similar manner except that it uses base 2 for the exponent. A floating-point binary number is said to be normalized if the most significant digit of the mantissa is 1.
[Bit layout, from bit n down to bit 0: sign bit | exponent (biased form) | mantissa]
Example: Suppose a number uses a 32-bit format: 1 sign bit, 8 bits for the signed exponent, and 23 bits for the fractional part. The leading bit 1 is not stored (as it is always 1 for a normalized number) and is referred to as a “hidden bit”.
1 00000101 10101100000000000000000
The precision of a floating-point format is the number of positions reserved for binary digits plus one (for the hidden bit). In the example considered here, the precision is 23 + 1 = 24.
The gap between 1 and the next normalized floating-point number is known as machine epsilon. For the example above, the gap is (1 + 2⁻²³) − 1 = 2⁻²³. Note that this is not the same as the smallest positive floating-point number, because floating-point numbers are non-uniformly spaced, unlike in the fixed-point scenario.
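The example above uses a 32-bit format, where machine epsilon is 2⁻²³; Python's built-in floats are IEEE 754 doubles with a 53-bit significand, so the same idea can be checked directly with the standard library:

```python
# Machine epsilon is the gap between 1.0 and the next representable float.
# Python floats are IEEE 754 double precision, so eps = 2^-52 (not 2^-23 as
# in the 32-bit example in the text).
import sys

eps = sys.float_info.epsilon
print(eps == 2.0**-52)        # True

# The smallest positive normalized double is far smaller than eps,
# illustrating the non-uniform spacing of floating-point numbers.
print(sys.float_info.min == 2.0**-1022)  # True
```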
Note that numbers with non-terminating binary expansions cannot be represented exactly in floating point; e.g., 1/3 = (0.010101…)₂ cannot be a floating-point number, as its binary representation is non-terminating.
IEEE (Institute of Electrical and Electronics Engineers) has standardized Floating-Point Representation
as following diagram.
[Bit layout, from bit n down to bit 0: sign bit | exponent | mantissa]
So the actual number is (−1)ˢ(1 + m) × 2⁽ᵉ ⁻ ᴮⁱᵃˢ⁾, where s is the sign bit, m is the mantissa, e is the stored exponent value, and Bias is the bias number. The sign bit is 0 for a positive number and 1 for a negative number. Exponents are stored in biased (excess) representation.
According to the IEEE 754 standard, floating-point numbers are represented in the following ways:
• Half Precision (16 bit): 1 sign bit, 5 bit exponent, and 10 bit mantissa
• Single Precision (32 bit): 1 sign bit, 8 bit exponent, and 23 bit mantissa
• Double Precision (64 bit): 1 sign bit, 11 bit exponent, and 52 bit mantissa
• Quadruple Precision (128 bit): 1 sign bit, 15 bit exponent, and 112 bit mantissa
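The formula (−1)ˢ(1 + m) × 2⁽ᵉ ⁻ ᴮⁱᵃˢ⁾ can be verified by unpacking the raw bits of a single-precision value, where the bias is 127. A sketch using Python's standard `struct` module (valid for normalized inputs only; zeros, subnormals, infinities, and NaNs follow different rules):

```python
# Decode an IEEE 754 single-precision number by hand using
# (-1)^s * (1 + m) * 2^(e - Bias), with Bias = 127 for single precision.
import struct

def decode_single(value):
    (bits,) = struct.unpack(">I", struct.pack(">f", value))  # raw 32 bits
    s = bits >> 31                    # 1 sign bit
    e = (bits >> 23) & 0xFF           # 8-bit biased exponent
    m = (bits & 0x7FFFFF) / 2**23     # 23-bit mantissa as a fraction in [0, 1)
    return (-1)**s * (1 + m) * 2**(e - 127)

print(decode_single(-43.625))  # -43.625 (exactly representable, so it round-trips)
```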
LEARNING CONTENTS I.3 ERRORS
However, even though computers compute reliably, errors may still occur; hence, precision and accuracy should be considered.
Sources of Errors:
1. Modelling error
a. Input error – this occurs when the person doing the calculations or computations makes a mistake while loading or inputting the numbers.
2. Numerical error – this occurs while using a calculator or a computer to achieve the solution.
a. Truncation error – this occurs when a mathematical procedure is cut off or trimmed.
Example: In implementing Taylor’s formula, because of a lengthy solution, the remainder 𝑅ₙ is cut off since its value is too small.

𝑓(𝑥) = 𝑓(𝑎) + (𝑓′(𝑎)/1!)(𝑥 − 𝑎) + (𝑓″(𝑎)/2!)(𝑥 − 𝑎)² + ⋯ + (𝑓⁽ⁿ⁾(𝑎)/𝑛!)(𝑥 − 𝑎)ⁿ + 𝑅ₙ(𝑥)
b. Round-off error – this occurs when long numbers are rounded off for ease of computation.
Example: the values of 𝜋, 1/3, √2
For numerical error, the relationship between the exact, or true, result and the approximation is formulated as

Relative Error = True Error / True Value

Percent Relative Error (𝜀ₜ) = (True Error / True Value) × 100
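The formulas above can be applied directly; as a worked example (our choice, not from the text), take the common approximation 𝜋 ≈ 22/7:

```python
# True, relative, and percent relative error for the approximation pi ~ 22/7.
import math

true_value = math.pi
approximation = 22 / 7
true_error = true_value - approximation          # exact minus approximate
relative_error = true_error / true_value
percent_relative_error = relative_error * 100
# |percent relative error| comes out to roughly 0.04%
```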
Numbers are represented in digital systems using number systems such as the decimal (base 10), octal (base 8), and binary (base 2) systems. However, digital systems such as computers use the binary number system, which is the most relevant.
For example, suppose 18 + 2 is entered into the computer. The computer first converts the decimal numbers to binary, then adds the binary numbers. The result of the binary addition is then converted back to a decimal number, and that decimal number is what appears on the screen.
Illustration:
  18₁₀ → 10010₂
+  2₁₀ → 00010₂
         10100₂ → 20₁₀
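The illustration above can be reproduced with Python's built-in base conversions, which mirror what the machine does: convert each decimal operand to binary, add, and convert back.

```python
# The 18 + 2 illustration: decimal -> binary, binary addition, binary -> decimal.

a = int("10010", 2)      # 18 in binary
b = int("00010", 2)      # 2 in binary
total = a + b            # the machine adds the binary values
print(f"{total:05b}")    # 10100
print(total)             # 20
```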
The most common binary number size is 32 bits, which can represent approximately seven digits of a decimal number. Some digital computers have 64-bit binary numbers, which can represent 13 to 14 decimal digits. In many engineering and scientific calculations, 32-bit arithmetic is adequate. However, in many other applications, 64-bit arithmetic is required, and in a few special situations, 128-bit arithmetic may be required. On 32-bit computers, 64-bit arithmetic, or even 128-bit arithmetic, can be accomplished using software enhancements. [6]
SUMMARY
There are mathematical preliminaries that should be taken into consideration before having a deeper
study in numerical methods.
1. Intermediate Value Theorem
2. Rolle’s Theorem
3. Mean Value Theorem
4. Fundamental Theorem of Algebra
5. Taylor’s Formula
Various number representation techniques are used for digital numbers, for example the binary, octal, decimal, and hexadecimal number systems, but the binary number system is the most relevant for representing numbers in a digital computer system.
In digital systems, binary numbers are represented using either a fixed-point representation or a floating-point representation. But even though computers use binary number representations and can perform computations very fast, errors in computation may still occur. These errors may arise from modelling error, truncation error, or round-off error.
REFERENCES
[1] Steven C. Chapra and Raymond P. Canale, Numerical Methods for Engineers, 7th Edition, McGraw-Hill Education, NY 101121, copyright 2015
[2] mathsisfun.com/algebra/intermediate-value-theorem.html
[3] Alex Svirin, math24.net/rolles-theorem, copyright 2020 math24.net
[4] cut-the-knot.org/do-you-know/fundamentaltheorem.shtml
[5] math.info/Calculus/Taylor_Formula
[6] Joe D. Hoffman, Numerical Methods for Engineers and Scientists, 2nd Edition, copyright 2001, Marcel Dekker, Inc., USA
[7] James F. Epperson, An Introduction to Numerical Methods and Analysis, 2nd Edition, John Wiley & Sons, Inc., New Jersey, copyright 2013.