Numerical Analysis
Methods
Notes on the lectures

Contents

1 Introduction
1.1 Numerical solving
1.2 Floating point
1.3 Numerical errors
1.3.1 Numerical stability
1.4 Some educational examples
2 Nonlinear Equations
2.1 Bisection
2.2 Simple iterative method
2.3 Newton or tangent method
2.4 Secant method
2.5 Other methods
2.6 Systems of nonlinear equations
2.6.1 Generalization of a simple iterative method
2.6.2 Newton's method
2.7 Polynomial equations
2.7.1 Laguerre's method
3 Linear Systems
3.1 Introduction
3.2 Vector and Matrix norms
3.3 Sensitivity of linear systems
3.4 LU decomposition
3.4.1 Permutation matrices
3.4.2 LU decomposition
3.4.3 LU decomposition with pivoting
3.5 Symmetric positive definite matrices
3.6 Iterative methods for solving linear systems
5 Eigenvalue problem
5.1 Power method
5.2 QR method
6 Interpolation
6.1 Polynomial interpolation
6.2 Divided differences
6.2.1 Convergence of the interpolation polynomial
6.3 Spline interpolation
8 Bézier curves
8.1 Introduction
8.1.1 Affine transformations in R3
8.1.2 Linear Interpolation
8.1.3 Parametric curves
8.2 Bézier curves
8.2.1 Bernstein polynomials
– The infinite-dimensional spaces are replaced with finite-dimensional ones; e.g., instead of general functions we compute the solution in the space of polynomials of degree at most 5 (interpolation).
– Complicated functions are replaced with simpler ones, e.g. with polynomials (integration).
– Matrices are replaced with matrices of a simpler form; e.g., when solving linear systems we transform the given matrix into an upper triangular matrix.
• x̄ = x(1 + dr), or dr = (x̄ − x)/x, where dr is the relative error.
1.2 Floating point
The numerical errors are, among other things, a consequence of the fact that we cannot represent all real numbers in the computer, but only finitely many. Therefore, all numbers that are not representable are approximated by representable ones, and each such approximation introduces some error.
In computers, numbers are written in floating-point arithmetic as

x = ±m · b^e,   m = 0.c1 c2 . . . ct,

where:
• b is the base (2; 10 for calculators; 16 for IBM),
• m is the significand or mantissa,
• t is the length of the mantissa,
• e is the exponent, L ≤ e ≤ U,
• ci are digits between 0 and b − 1.
For normalized numbers we assume c1 ≠ 0. Such a system is denoted by P(b, t, L, U).
In Figure 1.1 one can observe that these representable numbers are not uniformly distributed on the real axis. Observe also that there is a large gap between 0 and the first representable number. This gap can be reduced if we introduce denormalized numbers; namely, in the set P(2, 3, −1, 1) we additionally obtain numbers

Figure 1.1: The representable normalized (positive) numbers from the set P(2, 3, −1, 1).
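The set P(2, 3, −1, 1) can also be enumerated directly; the following is a small sketch (the function name and return convention are our own):

```python
def representable(b=2, t=3, L=-1, U=1):
    """Positive numbers of the floating-point system P(b, t, L, U)."""
    # normalized: mantissa 0.c1...ct with c1 != 0, exponent L <= e <= U
    normalized = sorted({(c / b**t) * b**e
                         for e in range(L, U + 1)
                         for c in range(b**(t - 1), b**t)})
    # denormalized: c1 == 0, exponent fixed at L
    denormalized = sorted({(c / b**t) * b**L
                           for c in range(1, b**(t - 1))})
    return normalized, denormalized

norm, denorm = representable()
```

The smallest and largest normalized numbers come out as 0.25 and 1.75, and the three denormalized numbers fill part of the gap between 0 and 0.25.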
exponent bits   mantissa bits             number
10000010        01100000000000000000000   x = (1 + 2^{-2} + 2^{-3}) · 2^{130−127} = 11
11111111        00000000000000000000000   x = ∞
11111111        01011010100000000000000   x = NaN
00000000        00000000000000000000000   x = 0
00000000        00000100000000000000000   x = 2^{-6} · 2^{-126} = 2^{-132}
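The rows of this table can be checked in a few lines of Python; `f32_from_bits` is a helper of our own that assembles an IEEE-754 single-precision number from its bit fields:

```python
import math
import struct

def f32_from_bits(sign, exponent, mantissa):
    """Assemble a binary32 value from its sign bit, 8 exponent bits
    and 23 mantissa bits (given as bit strings)."""
    word = (sign << 31) | (int(exponent, 2) << 23) | int(mantissa, 2)
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

x = f32_from_bits(0, "10000010", "01100000000000000000000")
# (1 + 2**-2 + 2**-3) * 2**(130 - 127) = 11.0
```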
Let x be a real number. By fl(x) we denote the representable number nearest to x:

fl(x) = x(1 + δ),   |δ| ≤ u,   u = (1/2) b^{1−t},

and

fl(√x) = √x (1 + δ),   |δ| ≤ u.

The exception is when we get overflow (⇒ ±∞) or underflow (⇒ 0).
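For IEEE double precision (b = 2, t = 53) the bound |δ| ≤ u = (1/2)·2^{1−53} = 2^{−53} can be verified exactly with Python's `fractions` module; a small sketch:

```python
from fractions import Fraction

u = Fraction(1, 2) * Fraction(2) ** (1 - 53)   # u = (1/2) b^(1-t) = 2^-53
x = Fraction(1, 10)                            # the real number 0.1
fl_x = Fraction(0.1)                           # exact value of the stored double
delta = (fl_x - x) / x                         # fl(x) = x (1 + delta)
```

The relative error `delta` is nonzero (0.1 is not representable in base 2), yet its magnitude stays below u.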
Dn = y − ȳ.
Perturbation theory deals with the analysis of Dn. The question is how the result changes if the initial data are slightly changed. We say that the problem is sensitive (ill-conditioned) if the changes are big, and insensitive (well-conditioned) if the changes are small.
Example: We are looking for the intersection of two given lines.
a) The intersection of
x + y = 2,
x − y = 0,
is x = y = 1. If we slightly change the right-hand side, namely,
x + y = 1.9999,
x − y = 0.0002,
the solution is x = 1.00005, y = 0.99985. Therefore the system is
insensitive, see Fig. 1.2 (left).
b) The intersection of
x + 0.99y = 1.99,
0.99x + 0.98y = 1.97,
is x = y = 1. If we slightly change the right-hand side, namely,
x + 0.99y = 1.9899,
0.99x + 0.98y = 1.9701,
the solution is x = 2.97, y = −0.99. Therefore the system is very
sensitive, see Fig. 1.2 (right).
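Both computations can be repeated numerically; a short sketch:

```python
import numpy as np

A1 = np.array([[1.0, 1.0], [1.0, -1.0]])      # the insensitive system a)
A2 = np.array([[1.0, 0.99], [0.99, 0.98]])    # the sensitive system b)

x1 = np.linalg.solve(A1, [2.0, 0.0])
x1p = np.linalg.solve(A1, [1.9999, 0.0002])   # perturbed right-hand side
x2 = np.linalg.solve(A2, [1.99, 1.97])
x2p = np.linalg.solve(A2, [1.9899, 1.9701])   # perturbed right-hand side
```

A perturbation of order 10^{-4} barely moves the solution of the first system, but moves the solution of the second one by about 2.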
Figure 1.2: An example of an insensitive (well-conditioned) system (left) and a sensitive (ill-conditioned) system (right).
Dm = ȳ − ỹ.
Dz = ỹ − ŷ.
• The method error: instead of sin(x̄) we compute g(x̄) = x̄ − x̄³/6,

Dm = ȳ − ỹ = 2.5 · 10^{-5}.
Dz = ỹ − ŷ = 3.0 · 10^{-5}.
When analyzing the rounding errors we distinguish forward and backward (stability) analysis. The forward error measures

|y − ŷ| / |y|,

while the backward error measures

|x − x̂| / |x|.
Example: Let us verify that numerical computation is not the same as exact computation.
a) Let us compute 100 · (100/3 − 33) − 100/3. Exactly we obtain the result 0, while numerically (e.g. in Octave or Matlab) we obtain 2.3448 · 10^{-13}.
b) What about the associativity? For
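The computation from a) can be reproduced in any double-precision environment, e.g.:

```python
# exact arithmetic gives 0; double-precision rounding leaves a small residue
val = 100 * (100 / 3 - 33) - 100 / 3
```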
Example: One way to compute π is to use the fact that π is the limit of the perimeter of the regular n-sided polygon Sn inscribed in the circle with radius r = 1/2. Let an be the side of Sn. One can derive the following relation between an and a2n:
a_{2n} = √( 1/2 − √( 1/4 − an²/4 ) ) = √( (1 − √(1 − an²)) / 2 ),        (1.1)

and

π = lim_{n→∞} Sn,

where Sn = n an is the perimeter of the polygon.
But it turns out that the formula fails both in single and double precision, see
Table 1.1.
The question is, what is wrong with the formula (1.1)? In the limit Sn should go to π, but when n is very big, the value of an is very small; the inner square root √(1 − an²) is then rounded to 1, the subtraction 1 − √(1 − an²) cancels, and consequently S2n → 0.
Table 1.1: The values Sn computed in single precision with formula (1.1) (first table) and with an equivalent, numerically stable rewriting of it (second table).

  n       Sn            n        Sn
  6    3.0000000       768    3.1417003
 12    3.1058285      1536    3.1430793
 24    3.1326280      3072    3.1374769
 48    3.1393509      6144    3.1819811
 96    3.1410384     12288    3.3541021
192    3.1414828     24576    3.0000000
384    3.1414297     49152    0.0000000

  n       Sn            n        Sn
  6    3.0000000       768    3.1415837
 12    3.1058285      1536    3.1375901
 24    3.1326284      3072    3.1415918
 48    3.1393499      6144    3.1415923
 96    3.1410317     12288    3.1415925
192    3.1414523     24576    3.1415925
384    3.1415575     49152    3.1415925
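Both recurrences can be compared in simulated single precision (numpy's float32); the stable rewriting used below, a_{2n} = a_n / √(2(1 + √(1 − a_n²))), is algebraically equal to (1.1) but avoids the cancellation:

```python
import numpy as np

def archimedes(n_doublings, stable, dtype=np.float32):
    """Perimeter S_n of the regular n-gon inscribed in a circle of radius
    1/2, starting from the hexagon (a_6 = 1/2, S_6 = 3) and doubling n."""
    one, two = dtype(1), dtype(2)
    a = dtype(0.5)
    n = 6
    out = []
    for _ in range(n_doublings):
        if stable:
            # equivalent form without the cancellation 1 - sqrt(1 - a^2)
            a = a / np.sqrt(two * (one + np.sqrt(one - a * a)))
        else:
            # formula (1.1) as stated
            a = np.sqrt((one - np.sqrt(one - a * a)) / two)
        n *= 2
        out.append((n, n * float(a)))
    return out

res_bad = archimedes(14, stable=False)
res_good = archimedes(14, stable=True)
```

The unstable variant collapses for large n, while the stable one settles near the single-precision value of π.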
If we sum the series (1.3) term by term, then for x > 0 we do not obtain good results. Namely, in single precision for x = 10 we get the value −7.265709 · 10^{-5} instead of 4.539993 · 10^{-5}. The reason is that the terms in the series are alternating and, furthermore, their absolute values grow for some time and only then start falling towards 0.
When summing the series of e^x there are no such problems, as all the terms are positive, the final result is big and the relative error is small. Therefore the solution for summing the series of e^{-x} is to compute it as e^{-x} = 1/e^x.
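The effect can be reproduced by summing the Taylor series of e^x in simulated single precision; a sketch:

```python
import math
import numpy as np

def exp_series(x, terms=60, dtype=np.float32):
    """Sum the Taylor series of e^x term by term in the given precision."""
    s = dtype(0)
    t = dtype(1)                      # current term x^k / k!
    for k in range(1, terms + 1):
        s = s + t
        t = t * dtype(x) / dtype(k)
    return float(s)

naive = exp_series(-10.0)             # alternating terms: cancellation
better = 1.0 / exp_series(10.0)       # e^-x = 1 / e^x: all terms positive
true = math.exp(-10.0)
```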
2.1 Bisection
We are searching for an interval [a, b], as small as possible, on which f changes sign, i.e. such that f(a) and f(b) have different signs. Let us present the bisection algorithm.
Algorithm 1 Bisection
Input: function f, interval [a, b] ⊂ R and precision ε ∈ R.
Output: c ∈ R presenting the (approximate) zero of f.
1: while |b − a| > ε do
2:   c = a + (b − a)/2;
3:   if sign(f(a)) == sign(f(c)) then
4:     a = c;
5:   else
6:     b = c;
7:   end if
8: end while
Example: Let us find the zeros of the function f(x) = x − tan(x) using bisection. Except for x = 0 it is hard to find a good initial interval, as we have problems with the poles of tan. The solution to this problem is to find the zeros of g(x) = x cos(x) − sin(x) instead.
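Algorithm 1, applied to g(x) = x cos(x) − sin(x), can be sketched as follows; the bracket [4, 4.6] around the first positive solution of x = tan(x) is our own choice:

```python
import math

def bisection(f, a, b, eps=1e-12):
    """Algorithm 1; assumes f(a) and f(b) have different signs."""
    fa = f(a)
    while abs(b - a) > eps:
        c = a + (b - a) / 2
        fc = f(c)
        if (fa > 0) == (fc > 0):     # same sign: move a
            a, fa = c, fc
        else:
            b = c
    return a + (b - a) / 2

g = lambda x: x * math.cos(x) - math.sin(x)
root = bisection(g, 4.0, 4.6)        # first nonzero root of x = tan(x)
```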
We rewrite f (x) = 0 into x = g(x) and then we proceed with the iteration
xr+1 = g(xr ), r = 0, 1, . . .
Example:

g(x) = x − f(x),
g(x) = x − c f(x),   c ≠ 0,
g(x) = x − h(x) f(x),   h(x) ≠ 0.
Example: Let us find the roots of p(x) = x³ − 5x + 1. From the graph of the function we can see that one root is close to 0. We rewrite x = (1 + x³)/5, namely

g(x) = (1 + x³)/5,

and iterate

x0 = 0,
xr+1 = (1 + xr³)/5,   r = 0, 1, . . .

This sequence converges to α = 0.201639678... This iteration works only for a root close to 0. To obtain the remaining two roots we can use the iteration xr+1 = (5xr − 1)^{1/3}, e.g.
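The iteration can be sketched in a few lines (`fixed_point` is our own helper name):

```python
def fixed_point(g, x0, tol=1e-12, max_iter=200):
    """Simple iteration x_{r+1} = g(x_r); stops when two consecutive
    approximations differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence")

alpha = fixed_point(lambda x: (1 + x**3) / 5, 0.0)   # root near 0
```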
If we start close enough to α then, because |g′(α)| < 1, there exists C < 1 such that |g′(x)| ≤ C < 1 in a neighborhood of α. We obtain

Table 2.1: The number of steps needed to get two consecutive approximations that differ by less than 10^{-12}.
Remark 2.2.1
• A method with quadratic convergence order in the neighborhood of the solution may converge very slowly far away from the solution, slower than methods that have linear convergence order in the neighborhood of the solution.
• Usually the methods with high convergence order converge very fast in the neighborhood of the solution; but if we do not have a good enough initial approximation, it is better to use a method with linear convergence order and switch to the faster methods only in the vicinity of the solution.
Let us approximate the given function f by its tangent at the point (xr, f(xr)) and let us take the intersection of the tangent with the x-axis as the next approximation, namely
xr+1 = xr − f(xr)/f′(xr),   r = 0, 1, . . . .

The same formula follows from the slope of the tangent,

f′(xr) = k = Δy/Δx = (f(xr) − 0)/(xr − xr+1),

from which we obtain

xr − xr+1 = f(xr)/f′(xr).
(Figure: a few steps of the tangent method with the successive approximations xr+2, xr+1, xr.)
The tangent method is actually just a special form of a simple iterative method where g(x) = x − f(x)/f′(x). The derivative of g is equal to

g′(x) = f(x) f″(x) / f′(x)²,

so g′(α) = 0. Further,

g″(α) = f″(α)/f′(α),

and observe that if f″(α) ≠ 0 then g″(α) ≠ 0 and we obtain a quadratic convergence order, otherwise at least a cubic convergence order.
If α is a zero of multiplicity m, we obtain lim_{x→α} g′(x) = 1 − 1/m, so we have linear convergence.
The next theorem says that all simple roots are attractive points for the tangent method. If we take a good enough initial guess, the tangent method will converge.
Under some assumptions one can also prove global convergence for an arbitrary initial approximation.
The above theorems assure the convergence of the tangent method close to the zero or for functions of a nice shape; otherwise the behaviour can be very different, and blind use of the tangent method is not recommended.
Example: Figure 2.3 shows two examples of divergence of the tangent method.
Figure 2.3: Divergence because the initial approximation was not close enough to the zero (left) and cyclic repetition of the approximations (right).
xr+1 = xr − f(xr) (xr − xr−1) / (f(xr) − f(xr−1)),   r = 1, 2, . . .
(Figure: one step of the secant method through the points xr−1 and xr, giving xr+1.)
b) The tangent method is faster, but it also requires the derivative of the function; therefore the secant method is sometimes a better choice than the tangent method, as it requires less work per step.
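The secant iteration above can be sketched as follows, applied to p(x) = x³ − 5x + 1 from the earlier example (the starting points are our own choice; no safeguards are included):

```python
def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """x_{r+1} = x_r - f(x_r)(x_r - x_{r-1}) / (f(x_r) - f(x_{r-1}))."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    raise RuntimeError("no convergence")

root = secant(lambda x: x**3 - 5 * x + 1, 0.0, 0.5)
```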
Let us list some other methods one can use to solve the nonlinear equation:
• Muller’s method:
F (x) = 0, x ∈ Rn , F : Rn → Rn .
We obtain

∂f1/∂x1(x)(α1 − x1) + · · · + ∂f1/∂xn(x)(αn − xn) = −f1(x) (+ · · · ),
⋮
∂fn/∂x1(x)(α1 − x1) + · · · + ∂fn/∂xn(x)(αn − xn) = −fn(x) (+ · · · ).
In one step of Newton's method we have to solve the following system,

[ ∂f1/∂x1(x^(r))  · · ·  ∂f1/∂xn(x^(r)) ] [ Δx1^(r) ]       [ f1(x^(r)) ]
[       ⋮          ⋱           ⋮        ] [    ⋮    ]  = −  [     ⋮     ]
[ ∂fn/∂x1(x^(r))  · · ·  ∂fn/∂xn(x^(r)) ] [ Δxn^(r) ]       [ fn(x^(r)) ]

or equivalently

JF(x^(r)) Δx^(r) = −F(x^(r)).
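One step of the method solves the linear system above and updates x^(r+1) = x^(r) + Δx^(r); a sketch, with a hypothetical test system of our own (the circle x² + y² = 1 intersected with the parabola y = x²):

```python
import numpy as np

def newton_system(F, JF, x0, tol=1e-12, max_iter=50):
    """Newton's method for F(x) = 0: solve J_F(x) dx = -F(x), set x += dx."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = np.linalg.solve(JF(x), -F(x))
        x = x + dx
        if np.linalg.norm(dx) < tol:
            return x
    raise RuntimeError("no convergence")

F = lambda v: np.array([v[0]**2 + v[1]**2 - 1, v[1] - v[0]**2])
JF = lambda v: np.array([[2 * v[0], 2 * v[1]], [-2 * v[0], 1.0]])
sol = newton_system(F, JF, [1.0, 1.0])
```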
We can use any of the methods previously described (bisection, tangent method, . . . ), but in such a way we do not use the advantage that p is a polynomial.
In practice we usually require all roots of p. When the first one (let us denote it by xk) is computed, we form the quotient q(x) = p(x)/(x − xk), which is a polynomial of a lower degree, and we further search for the roots of q.
     [ 0        1                             ]
     [           ⋱      ⋱                     ]
An = [                   0        1           ]
     [ −an/a0   −an−1/a0   · · ·   −a1/a0     ]
S1 = p′(zr)/p(zr),

S2 = ( p′(zr)² − p(zr) p″(zr) ) / p(zr)²,

zr+1 = zr − n / ( S1 ± √( (n − 1)(n S2 − S1²) ) ),

where p is the given polynomial. The sign in the last equation is selected such that the denominator has the larger absolute value.
Theorem 2.7.1 If the polynomial p has only real roots, then for an arbitrary initial approximation Laguerre's method converges to the nearby root on the left or on the right. In the case of a simple root, the order of convergence in the vicinity of the root is cubic, otherwise it is linear.
Remark 2.7.1 The method also works for complex roots, but in this case it is not necessarily convergent for an arbitrary initial guess.
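A sketch of Laguerre's method, evaluating p, p′ and p″ by Horner's scheme and applied to the earlier example p(x) = x³ − 5x + 1:

```python
import cmath

def laguerre(coeffs, z0, tol=1e-12, max_iter=100):
    """Laguerre's method for p(z) = 0; coeffs = [a_0, ..., a_n] with
    p(z) = a_0 z^n + ... + a_n (highest power first)."""
    n = len(coeffs) - 1
    z = complex(z0)
    for _ in range(max_iter):
        p = pp = ppp = 0            # p(z), p'(z), p''(z) via Horner
        for c in coeffs:
            ppp = ppp * z + 2 * pp
            pp = pp * z + p
            p = p * z + c
        if abs(p) < tol:
            return z
        S1 = pp / p
        S2 = (pp * pp - p * ppp) / (p * p)
        root = cmath.sqrt((n - 1) * (n * S2 - S1 * S1))
        # pick the sign giving the denominator of larger absolute value
        den = S1 + root if abs(S1 + root) >= abs(S1 - root) else S1 - root
        z = z - n / den
    return z

z = laguerre([1, 0, -5, 1], 0.0)    # p(x) = x^3 - 5x + 1, all roots real
```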
3 Linear Systems
3.1 Introduction
A system of linear equations can be written as

Ax = b,

where A is a real (or a complex) matrix and x, b are real (or complex) vectors. Here ai,j denotes the (i, j)-th element of the matrix A and xi denotes the i-th element of the vector x.
The vector x can be written as

x = [x1, x2, . . . , xn]ᵀ.
Let us denote by ei the vector

ei = [0, . . . , 0, 1, 0, . . . , 0]ᵀ   (with the 1 in the i-th place);

then A ek is the k-th column of the matrix A, eiᵀ A is the i-th row of A, and eiᵀ A ek is the (i, k)-th element of A.
By A> we denote the transposed matrix of the matrix A and AH = Ā> is the
hermitian transpose of A. Note that AH is in some literature denoted as A∗ .
• det A ≠ 0,
• rank A = n,
If there exists a scalar λ and a nonzero vector x such that Ax = λx, then λ is an eigenvalue and x is an eigenvector of A.
ci,j = αiᵀ bj   (αiᵀ the i-th row of A, bj the j-th column of B),

cj = A bj   (the j-th column of C),

C = Σ_{i=1}^{n} ai βiᵀ   (ai the columns of A, βiᵀ the rows of B).

The matrix xyᵀ is called a dyadic matrix and has rank equal to 1.
A real matrix Q is called orthogonal if Q⁻¹ = Qᵀ (i.e. QQᵀ = QᵀQ = I). A complex matrix U is called unitary if U⁻¹ = Uᴴ (i.e. UUᴴ = UᴴU = I).
We say that two vector norms are equivalent if there exist C1, C2 > 0 such that

C1 ‖x‖a ≤ ‖x‖b ≤ C2 ‖x‖a   for all x ∈ Cⁿ.
1) ‖A‖ ≥ 0, and ‖A‖ = 0 ⇔ A = 0,
2) ‖αA‖ = |α| ‖A‖,

‖A‖F := ( Σ_{i=1}^{m} Σ_{j=1}^{n} |ai,j|² )^{1/2}   (extended vector 2-norm).
Definition 3.2.3 The operator matrix norm based on a vector norm ‖·‖ is given by

‖A‖ = max{ ‖Ax‖ : x ∈ Cⁿ, ‖x‖ = 1 } = max{ ‖Ax‖/‖x‖ : x ∈ Cⁿ, x ≠ 0 }.
We obtain:

‖A‖1 = max_{‖x‖1=1} ‖Ax‖1 = max_{j=1,...,n} Σ_{i=1}^{m} |ai,j|,   the 1-norm (maximal column sum),

‖A‖2 = max_{‖x‖2=1} ‖Ax‖2 = √( max_i λi(AᴴA) ),   the spectral or 2-norm,

if A = Aᴴ, then ‖A‖2 = √( max_{i=1,...,n} λi(A²) ) = max_{i=1,...,n} |λi(A)|,

‖A‖∞ = max_{‖x‖∞=1} ‖Ax‖∞ = max_{i=1,...,m} Σ_{j=1}^{n} |ai,j|,   the ∞-norm (maximal row sum).
‖Ax‖ ≤ ‖A‖ ‖x‖.

Note that the Frobenius norm is not a subordinate matrix norm, as ‖I‖F = √n.
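The three formulas above can be checked against numpy's built-in norms; the example matrix is our own:

```python
import numpy as np

A = np.array([[2.0, 2.0, 3.0],
              [4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0]])

one_norm = max(np.abs(A).sum(axis=0))      # largest column sum
inf_norm = max(np.abs(A).sum(axis=1))      # largest row sum
two_norm = np.sqrt(max(np.linalg.eigvalsh(A.T @ A)))   # sqrt of max eigenvalue of A^T A
```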
Theorem 3.2.1 For every matrix norm and an arbitrary eigenvalue λ of A the inequality

|λ| ≤ ‖A‖

holds.

and therefore

|λ| ≤ ‖A‖.
where

K(A) = ‖A‖ ‖A⁻¹‖.

K(A) is called the sensitivity of the matrix A, or its condition number. So the accuracy of the obtained solution depends on the sensitivity of the given matrix A. For K(A) we can derive the following estimate:

1 ≤ ‖I‖ = ‖AA⁻¹‖ ≤ ‖A‖ ‖A⁻¹‖ = K(A),

and therefore K(A) ≥ 1.
Example:
a) The least sensitive matrices are the unitary (orthogonal) matrices multiplied by a scalar, since for a unitary matrix U we derive (in the 2-norm)

K(U) = ‖U‖2 ‖U⁻¹‖2 = ‖U‖2 ‖Uᴴ‖2 = 1 · 1 = 1.
For the matrices H7 and H10 we obtain K(H7) = 1.8 · 10^8 and K(H10) = 1.6 · 10^{13}.
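Condition numbers are computed directly by `numpy.linalg.cond`. In the sketch below we assume that H_n denotes the Hilbert matrix with entries 1/(i + j − 1) — this is an assumption, as the matrices H7 and H10 are not defined in this excerpt:

```python
import numpy as np

def hilbert(n):
    # assumed definition: entries 1/(i + j - 1) with 1-based indices
    i, j = np.indices((n, n))
    return 1.0 / (i + j + 1.0)

K7 = np.linalg.cond(hilbert(7))      # 2-norm condition number, very large
K_eye = np.linalg.cond(np.eye(5))    # identity: perfectly conditioned
```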
3.4 LU decomposition

3.4.1 Permutation matrices

     [ e_{σ1}ᵀ ]
Pσ = [    ⋮    ]
     [ e_{σn}ᵀ ]
Properties:
• Pσ⁻¹ = Pσᵀ,
• Pσ A is the matrix A with its rows permuted by σ,
• A Pσᵀ is the matrix A with its columns permuted by σ.
For example, for σ = (2, 1, 3) and

    [ 1 0 3 ]
A = [ 2 2 1 ]
    [ 3 2 1 ]

we obtain

     [ 0 1 0 ]          [ 2 2 1 ]           [ 0 1 3 ]
Pσ = [ 1 0 0 ],  Pσ A = [ 1 0 3 ],  A Pσᵀ = [ 2 2 1 ].
     [ 0 0 1 ]          [ 3 2 1 ]           [ 2 3 1 ]
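The two properties — rows permuted by Pσ A, columns by A Pσᵀ — can be checked in numpy:

```python
import numpy as np

sigma = [1, 0, 2]            # sigma = (2, 1, 3) in 0-based indexing
P = np.eye(3)[sigma]         # rows of the identity, permuted by sigma
A = np.array([[1, 0, 3],
              [2, 2, 1],
              [3, 2, 1]])

rows_permuted = P @ A        # equals A[sigma]
cols_permuted = A @ P.T      # equals A[:, sigma]
```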
3.4.2 LU decomposition

For the matrix

    [ 2 2 3 ]
A = [ 4 5 6 ]
    [ 1 2 4 ]

we obtain the following LU decomposition

     [ 1   0 0 ] [ 2 2 3   ]
LU = [ 2   1 0 ] [ 0 1 0   ].
     [ 1/2 1 1 ] [ 0 0 5/2 ]
• The matrix U is the upper triangular matrix obtained from A at the end of the Algorithm 3.
• The elements a11, a22^(1), . . . , a_{n−1,n−1}^(n−2) by which we divide are called pivots. They must not be equal to zero, otherwise the factorization fails to materialize. This is a procedural problem; it can be removed by simply reordering the rows of A so that the pivot of the permuted matrix is nonzero.
• Computing the LU decomposition using the Algorithm 3 requires

Σ_{j=1}^{n−1} Σ_{i=j+1}^{n} ( 1 + Σ_{k=j+1}^{n} 2 ) = (2/3)n³ − (1/2)n² − (1/6)n = (2/3)n³ + O(n²)

operations.
                       [ 1  0 0 ] [ 2 2 3   ]   [ 2 2 3   ]
A(2) = (L(2))⁻¹ A(1) = [ 0  1 0 ] [ 0 1 0   ] = [ 0 1 0   ].
                       [ 0 −1 1 ] [ 0 1 5/2 ]   [ 0 0 5/2 ]

We obtain

    [ 1   0 0 ]       [ 2 2 3   ]
L = [ 2   1 0 ],  U = [ 0 1 0   ].
    [ 1/2 1 1 ]       [ 0 0 5/2 ]
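The elimination can be sketched in a few lines of Python; applied to the example matrix above it reproduces the factors L and U:

```python
import numpy as np

def lu_nopivot(A):
    """LU decomposition without pivoting: A = L U."""
    A = A.astype(float).copy()
    n = A.shape[0]
    L = np.eye(n)
    for j in range(n - 1):
        for i in range(j + 1, n):
            L[i, j] = A[i, j] / A[j, j]        # multiplier l_ij
            A[i, j:] -= L[i, j] * A[j, j:]     # eliminate below the pivot
    return L, np.triu(A)

A = np.array([[2, 2, 3], [4, 5, 6], [1, 2, 4]])
L, U = lu_nopivot(A)
```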
First we solve the lower triangular system

[ 1                    ] [ y1 ]   [ b1 ]
[ ℓ21  1               ] [ y2 ] = [ b2 ]        (3.2)
[  ⋮    ⋮   ⋱          ] [ ⋮  ]   [ ⋮  ]
[ ℓn1  ℓn2  · · ·   1  ] [ yn ]   [ bn ]

and then the upper triangular system

[ u11  u12  · · ·  u1n ] [ x1 ]   [ y1 ]
[      u22  · · ·  u2n ] [ x2 ] = [ y2 ]        (3.3)
[              ⋱    ⋮  ] [ ⋮  ]   [ ⋮  ]
[                  unn ] [ xn ]   [ yn ]

Let us present the Algorithm 5 for its computation. The cost of solving such a triangular system is

Σ_{j=1}^{n} ( 2 + 2(j − 1) ) = n² + n

operations.
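The two triangular solves (3.2) and (3.3) can be sketched as follows; the factors used below are those of the LU example above:

```python
import numpy as np

def forward_sub(L, b):
    """Solve L y = b for unit lower triangular L (system (3.2))."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = b[i] - L[i, :i] @ y[:i]
    return y

def back_sub(U, y):
    """Solve U x = y for upper triangular U (system (3.3))."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

L = np.array([[1.0, 0.0, 0.0], [2.0, 1.0, 0.0], [0.5, 1.0, 1.0]])
U = np.array([[2.0, 2.0, 3.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.5]])
b = np.array([1.0, 2.0, 3.0])
x = back_sub(U, forward_sub(L, b))   # solves (L U) x = b
```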
The method for the LU decomposition without pivoting fails if any of the pivots equals 0; numerically it fails even if a pivot is close to 0.
This problem can be removed by simply reordering the rows of A (partial pivoting) or also the columns of A (complete pivoting). The latter is not very useful in practice, as it requires too many operations.
Using partial pivoting, before we start eliminating the entries below the main diagonal in the j-th column, we compare the values |aj,j|, |aj+1,j|, |aj+2,j|, . . . , |an,j| and exchange the j-th row with the one that contains the maximal element. As a result one obtains the decomposition P A = LU, where P denotes a permutation matrix.
Because of the pivoting, the absolute value of all entries in the matrix L is
≤ 1.
Theorem 3.4.1 If the matrix A is nonsingular, then there exists a permutation matrix P such that the LU decomposition with partial pivoting

P A = LU

exists, where L is a lower triangular matrix with ones on the main diagonal and U is an upper triangular matrix.
j = 1 :  No row exchange is needed,

     [ 1 2 3 ]          [ 1 0 0 ]
Ã  = [ 1 3 5 ],  L(1) = [ 1 1 0 ],
     [ 1 0 1 ]          [ 1 0 1 ]

                     [ 1  2  3 ]
A(1) = (L(1))⁻¹ Ã  = [ 0  1  2 ],
                     [ 0 −2 −2 ]

              [ 1  2  3 ]          [ 1 0 0 ]
j = 2 :  Ã(1) = [ 0 −2 −2 ],  P(1) = [ 0 0 1 ],
              [ 0  1  2 ]          [ 0 1 0 ]

       [ 1  0   0 ]                           [ 1  2  3 ]
L(2) = [ 0  1   0 ],  A(2) = (L(2))⁻¹ Ã(1) =  [ 0 −2 −2 ].
       [ 0 −1/2 1 ]                           [ 0  0  1 ]

We obtain

    [ 1  0   0 ]      [ 1 0 0 ]      [ 1  2  3 ]
L = [ 1  1   0 ], P = [ 0 0 1 ], U = [ 0 −2 −2 ].
    [ 1 −1/2 1 ]      [ 0 1 0 ]      [ 0  0  1 ]
• If A is s.p.d., then aii > 0 for all i and max_{i,j} |aij| = max_i |aii|,
• A is s.p.d. iff the LU decomposition without pivoting does not fail and all the pivots are positive,
• A is s.p.d. iff there exists a nonsingular lower triangular matrix V with real and positive diagonal entries such that A = V Vᵀ.
The decomposition A = V Vᵀ is called the Cholesky decomposition and V is called the Cholesky factor.
Now, let us present the Algorithm 7 for the Cholesky decomposition. The number of required operations is

Σ_{k=1}^{n} ( 2(k − 1) + 2 + 2(n − k)k ) = (1/3)n³ + O(n²),

which is half of the number of operations required for the LU decomposition.
Example: Let us compute the Cholesky decomposition of the given matrix

    [  4 −2  4 −2  4 ]
    [ −2 10  1 −5 −5 ]
A = [  4  1  9 −2  1 ]
    [ −2 −5 −2 22  7 ]
    [  4 −5  1  7 14 ]
Note that the Cholesky decomposition is the cheapest way to determine whether a given matrix is s.p.d.: if the algorithm for its computation does not fail, then the matrix is s.p.d.
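A sketch of the Cholesky algorithm, applied to the example matrix above; it raises an error exactly when a non-s.p.d. matrix forces the square root of a nonpositive number:

```python
import numpy as np

def cholesky(A):
    """Cholesky decomposition A = V V^T for a symmetric positive definite A."""
    n = A.shape[0]
    V = np.zeros_like(A, dtype=float)
    for k in range(n):
        d = A[k, k] - V[k, :k] @ V[k, :k]
        if d <= 0:
            raise ValueError("matrix is not s.p.d.")
        V[k, k] = np.sqrt(d)
        for i in range(k + 1, n):
            V[i, k] = (A[i, k] - V[i, :k] @ V[k, :k]) / V[k, k]
    return V

A = np.array([[ 4, -2,  4, -2,  4],
              [-2, 10,  1, -5, -5],
              [ 4,  1,  9, -2,  1],
              [-2, -5, -2, 22,  7],
              [ 4, -5,  1,  7, 14]], dtype=float)
V = cholesky(A)
```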
x(r+1) = Rx(r) + c.
Proof: Let us denote the exact solution by x̂. From x̂ = Rx̂ + c and x^(r+1) = Rx^(r) + c it follows that

x^(r+1) − x̂ = R (x^(r) − x̂),

and therefore

x^(r) − x̂ = R^r (x^(0) − x̂).

Obviously, the necessary and sufficient condition for convergence is lim_{r→∞} R^r = 0, which is equivalent to the condition ρ(R) < 1.
M x^(r+1) = −N x^(r) + b,   r = 0, 1, . . . ,

where the matrix M is chosen such that the system with the matrix M can be solved faster than the initial system Ax = b.
In which cases are iterative methods used?
• When we are dealing with large systems, i.e. when A is a large matrix.
• When the matrix A is sparse, i.e. most of its elements are zero.
Jacobi method

If the diagonal entries of the matrix A are nonzero, i.e. aii ≠ 0 for all i, the system Ax = b can be written as

x1 = (1/a11)(b1 − a12 x2 − a13 x3 − · · · − a1n xn),
⋮
xn = (1/ann)(bn − an1 x1 − an2 x2 − · · · − a_{n,n−1} x_{n−1}).

This yields the Jacobi method

xk^(r+1) = (1/akk) ( bk − Σ_{i=1, i≠k}^{n} aki xi^(r) ),   k = 1, 2, . . . , n.
RJ = −D−1 (L + U ).
Gauss-Seidel method

We compute the values x1^(r+1), . . . , xn^(r+1) in order, and when computing the component xk^(r+1) we can already take into account the values x1^(r+1), . . . , x_{k−1}^(r+1). Considering this we obtain the Gauss-Seidel method:

xk^(r+1) = (1/akk) ( bk − Σ_{i=1}^{k−1} aki xi^(r+1) − Σ_{i=k+1}^{n} aki xi^(r) ),   k = 1, 2, . . . , n.
and the initial approximation x^(0) = [1, 0, 1]ᵀ, let us determine the first two steps of the Jacobi and the Gauss-Seidel method.
The Jacobi method:
The exact solution is equal to x = [1, 1, 1]ᵀ. Observe that A is a strictly diagonally dominant matrix, therefore both methods converge.
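Both iterations can be sketched generically. The strictly diagonally dominant system below is a hypothetical example of our own (the matrix of the original example is not shown in this excerpt):

```python
import numpy as np

def jacobi_step(A, b, x):
    """One Jacobi step: every component uses only the previous iterate."""
    d = np.diag(A)
    return (b - A @ x + d * x) / d

def gauss_seidel_step(A, b, x):
    """One Gauss-Seidel step: updated components are used immediately."""
    x = x.copy()
    for k in range(len(b)):
        x[k] = (b[k] - A[k, :k] @ x[:k] - A[k, k+1:] @ x[k+1:]) / A[k, k]
    return x

# hypothetical strictly diagonally dominant system with solution [1, 1, 1]
A = np.array([[4.0, 1.0, 1.0], [1.0, 5.0, 2.0], [1.0, 1.0, 3.0]])
b = A @ np.ones(3)
x_j = x_gs = np.array([1.0, 0.0, 1.0])    # x^(0) = [1, 0, 1]^T
for _ in range(60):
    x_j = jacobi_step(A, b, x_j)
    x_gs = gauss_seidel_step(A, b, x_gs)
```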
xk^(r+1) = xk^(r) + ω ( xk^(r+1)GS − xk^(r) ),   k = 1, 2, . . . , n,

where ω is called the relaxation factor and xk^(r+1)GS is the approximation obtained by the Gauss-Seidel method. Note that for ω = 1, SOR is equal to the Gauss-Seidel method.
Definition 3.6.1 Let us define that the matrix A has the "property A" if there exists a permutation matrix P such that

P A Pᵀ = [ A11  A12 ]
         [ A21  A22 ],
Theorem 3.6.4 Let the matrix A have the property A and let μ = ρ(RJ). Then
a) ρ(RGS) = μ²,
b) ω_opt = 2 / (1 + √(1 − μ²)),   ρ(R_SOR(ω_opt)) = ω_opt − 1,
c)
ρ(R_SOR(ω)) = { ω − 1,                                                  ω_opt ≤ ω < 2,
              { 1 − ω + (1/2)ω²μ² + ωμ √(1 − ω + (1/4)ω²μ²),   0 < ω ≤ ω_opt.
    [  4 −1  0 −1  0  0 ]
    [ −1  4 −1  0 −1  0 ]
A = [  0 −1  4  0  0 −1 ]        (3.4)
    [ −1  0  0  4 −1  0 ]
    [  0 −1  0 −1  4 −1 ]
    [  0  0 −1  0 −1  4 ]
Figure 3.1: The graph of ρ(ω) that corresponds to the matrix (3.4).
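The quantities of Theorem 3.6.4 can be computed numerically for the matrix (3.4); the entries below follow our reading of the garbled display (a 2×3-grid Poisson-type matrix):

```python
import numpy as np

A = np.array([[ 4, -1,  0, -1,  0,  0],
              [-1,  4, -1,  0, -1,  0],
              [ 0, -1,  4,  0,  0, -1],
              [-1,  0,  0,  4, -1,  0],
              [ 0, -1,  0, -1,  4, -1],
              [ 0,  0, -1,  0, -1,  4]], dtype=float)

D = np.diag(np.diag(A))
L = np.tril(A, -1)
U = np.triu(A, 1)

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

mu = spectral_radius(-np.linalg.inv(D) @ (L + U))      # rho(R_J)
rho_gs = spectral_radius(-np.linalg.inv(D + L) @ U)    # rho(R_GS)
w_opt = 2 / (1 + np.sqrt(1 - mu**2))
```

For this matrix one indeed observes ρ(R_GS) = μ², as the theorem predicts.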
4 Linear least-squares problem
(Overdetermined systems)
4.1 Introduction
• If we consider ‖Ax − b‖2, then the x that minimizes this norm is called the solution of the least-squares method.
[ 1  x1  x1²  · · ·  x1ⁿ ] [ a0 ]   [ y1 ]
[ 1  x2  x2²  · · ·  x2ⁿ ] [ a1 ] = [ y2 ]        i.e.  A a = y.
[ ⋮   ⋮    ⋮          ⋮  ] [ ⋮  ]   [ ⋮  ]
[ 1  xm  xm²  · · ·  xmⁿ ] [ an ]   [ ym ]

Using the least-squares method one wants to find such a that ‖Aa − y‖2 is minimal. In this case, ‖Aa − y‖2² is also minimal, i.e.

‖Aa − y‖2² = Σ_{i=1}^{m} ( p(xi) − yi )².
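Polynomial least-squares fitting is a one-liner with `numpy.linalg.lstsq`; the data points below are our own example:

```python
import numpy as np

# fit a degree-2 polynomial to m = 5 points in the least-squares sense
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
A = np.vander(x, 3, increasing=True)        # columns 1, x, x^2
a = np.linalg.lstsq(A, y, rcond=None)[0]    # coefficients a_0, a_1, a_2
```

At the minimum the residual r = Aa − y is orthogonal to the columns of A, i.e. Aᵀr = 0 — exactly the normal equations of the next section.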
Example: In statistics one wants to estimate the parameters of a given model from the obtained measurements. Let us assume that the success b of a student depends on
• a1, the points achieved on the matura exam,
• a2, the points achieved on the entrance exam,
• a3, the high school success.
The hypothesis: the linear model b = x1 a1 + x2 a2 + x3 a3, where x1, x2, x3 are the unknowns. For m students we obtain the overdetermined system

[ a11  a12  a13 ] [ x1 ]   [ b1 ]
[  ⋮    ⋮    ⋮  ] [ x2 ] = [ ⋮  ]
[ am1  am2  am3 ] [ x3 ]   [ bm ]
The system of linear equations (4.1) can be transformed into the system of the normal equations, given as

AᵀA x = Aᵀ b.

Theorem 4.2.1 The solution of the system of normal equations AᵀAx = Aᵀb minimizes the norm ‖Ax − b‖2.
Proof: Let B = AᵀA and C = Aᵀb, and let A have full rank. The matrix B is an s.p.d. matrix, as
Observe that
Therefore
Remark 4.2.1 Solving the system of normal equations is the cheapest but also the least stable option. Therefore one usually uses one of the other methods.
4.3 QR decomposition
A> Ax = A> b
(QR)> QRx = (QR)> b
R> Q> QRx = R> Q> b
R> Rx = R> Q> b
Rx = Q> b.
where AᵀA is an s.p.d. matrix and therefore Rᵀ is the Cholesky factor. As the Cholesky decomposition is uniquely defined, R is also unique. Thus we can conclude that Q = AR⁻¹ is also unique.
The solution of the least-squares method is obtained, if one solves the upper
triangular system
Rx = Q> b.
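Solving the least-squares problem through the QR decomposition can be sketched directly (the random test data are our own):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))           # overdetermined: m = 6, n = 3
b = rng.normal(size=6)

Q, R = np.linalg.qr(A)                # reduced QR decomposition, A = Q R
x = np.linalg.solve(R, Q.T @ b)       # solve the triangular system R x = Q^T b
```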
Note that in exact arithmetic the results obtained by CGS and MGS are the same, but numerically MGS is more stable than CGS.
Example: Let

    [ 1+ε  1    1   ]
A = [ 1    1+ε  1   ],   ε = 10^{-10}.
    [ 1    1    1+ε ]
Q̃ = [Q Q1 ] ∈ Rm×m ,
In the plane, the vector x = [x1, x2]ᵀ is rotated by the angle φ in the negative direction when it is multiplied by the matrix

Rᵀ = [  c  s ],   c = cos φ, s = sin φ.
     [ −s  c ]
where c and s appear at the intersections of the i-th and k-th rows and columns. The matrix Rikᵀ is an orthogonal matrix called a Givens rotation. Note that

yj = xj,   for all j ≠ i, k,
yi = c xi + s xk,
yk = −s xi + c xk.

The rotation Rikᵀ can be chosen such that yk = 0 and yi = r. This is the case when r = √(xi² + xk²), c = xi/r, s = xk/r. Therefore, using appropriate Givens rotations one may zero all subdiagonal elements of the given matrix A. The number of operations is then equal to 3mn² − n³.
[ × × × ]        [ × × × ]        [ × × × ]        [ × × × ]
[ × × × ]  R12ᵀ  [ 0 × × ]  R13ᵀ  [ 0 × × ]  R14ᵀ  [ 0 × × ]
[ × × × ]  −−→   [ × × × ]  −−→   [ 0 × × ]  −−→   [ 0 × × ]
[ × × × ]        [ × × × ]        [ × × × ]        [ 0 × × ]

       [ × × × ]        [ × × × ]        [ × × × ]
 R23ᵀ  [ 0 × × ]  R24ᵀ  [ 0 × × ]  R34ᵀ  [ 0 × × ]
 −−→   [ 0 0 × ]  −−→   [ 0 0 × ]  −−→   [ 0 0 × ]  = R̃.
       [ 0 × × ]        [ 0 0 × ]        [ 0 0 0 ]

Let us define

Q̃ := R12 R13 R14 R23 R24 R34.

If we denote the upper 3 × 3 part of R̃ by R and the first n columns of Q̃ by Q, we obtain the QR decomposition.
The number of the operations is equal to 2mn² − (2/3)n³.
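The whole elimination pattern above can be sketched as a small QR routine built from Givens rotations; the example matrix is our own:

```python
import numpy as np

def givens_qr(A):
    """QR decomposition by Givens rotations: returns Q, R with A = Q R."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for j in range(n):
        for k in range(j + 1, m):
            if R[k, j] != 0.0:
                r = np.hypot(R[j, j], R[k, j])
                c, s = R[j, j] / r, R[k, j] / r
                G = np.array([[c, s], [-s, c]])   # acts on rows j and k
                R[[j, k], :] = G @ R[[j, k], :]   # zeroes R[k, j]
                Q[:, [j, k]] = Q[:, [j, k]] @ G.T # accumulate Q
    return Q, R

A = np.array([[3.0, 1.0], [4.0, 2.0], [0.0, 5.0]])
Q, R = givens_qr(A)
```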
and P² = I, as P² = P P = P Pᵀ = I,
• P represents the reflection over the hyperplane with the normal w: every vector x can be presented as x = αw + u, u ⊥ w, and therefore

P x = x − (2/(wᵀw)) w wᵀ (αw + u) = x − 2αw = −αw + u.
The idea is to find the matrix P such that we zero all components of the given
vector x except the first one:
P x = ±ke1 .
Proof:

P x = x − (2/(wᵀw)) w wᵀ x = x − ( 2 xᵀ(x − y) / ((x − y)ᵀ(x − y)) ) (x − y)
    = x − ( 2(xᵀx − xᵀy) / (2(xᵀx − xᵀy)) ) (x − y) = y.

We would like y = ∓k e1 and therefore w = x ± k e1. What is the value of k? The assumption of the Theorem 4.3.1 is ‖x‖2 = ‖y‖2, therefore k = ‖x‖2 and

w = x ± ‖x‖2 e1.
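The reflection can be checked directly; the vector x below is our own example, and the sign of k is chosen, as is customary, to avoid cancellation in the first component:

```python
import numpy as np

x = np.array([3.0, 4.0, 0.0])
w = x.copy()
w[0] += np.copysign(np.linalg.norm(x), x[0])   # w = x + sign(x1) ||x||_2 e_1

P = np.eye(3) - 2.0 * np.outer(w, w) / (w @ w) # Householder reflection
Px = P @ x                                     # = -sign(x1) ||x||_2 e_1
```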
where Pi is the Householder reflection that zeroes all elements under the main
diagonal in the i−th column.
and
P̃n P̃n−1 · · · P̃1 = Q> .
Since P̃i = P̃i> , the matrix Q is equal to Q = P̃1 P̃2 · · · P̃n . As the matrices P̃i
are orthogonal, Q is also orthogonal,
Applications that employ the SVD include computing the pseudoinverse, least
squares fitting of data, multivariable control, matrix approximation, and deter-
mining the rank, range and null space of a matrix.
• In the case n > m, the SVD is obtained by transposing the SVD of the matrix Aᴴ.
• If σr > 0 and σ_{r+1} = σ_{r+2} = · · · = σn = 0, then r is the rank of the matrix A.
Then

‖Ax − b‖2 = ‖ [ S Vᵀ x − U1ᵀ b,  U2ᵀ b ]ᵀ ‖2

and the minimum is obtained for S Vᵀ x = U1ᵀ b, or equivalently

x = V S⁻¹ U1ᵀ b = Σ_{i=1}^{n} ( uiᵀ b / σi ) vi.
Pseudoinverse

In the case m < n and rank(A) = m, we have A⁺ = Aᵀ (A Aᵀ)⁻¹.

Remark 4.4.1 If m = n, then A⁺ = (AᵀA)⁻¹ Aᵀ = A⁻¹ A⁻ᵀ Aᵀ = A⁻¹.
If rank(A) = r, then A = Σ_{i=1}^{r} σi ui viᵀ.

Theorem 4.4.2 Let A = U Σ Vᵀ, rank(A) > k, and let Ak = Σ_{i=1}^{k} σi ui viᵀ, or equivalently Ak = U Σk Vᵀ, where

Σk = diag(σ1, . . . , σk, 0, . . . , 0).

Then

min_{rank(B)=k} ‖B − A‖2 = ‖Ak − A‖2 = σ_{k+1}.
‖Ak − A‖2 = ‖U(Σk − Σ)Vᵀ‖2 ≤ ‖U‖2 ‖Σk − Σ‖2 ‖Vᵀ‖2 = ‖Σk − Σ‖2 = σ_{k+1}.

Suppose that we are given a matrix B such that rank(B) = k; therefore dim ker(B) = n − k. Let us define V_{k+1} = [v1, v2, . . . , v_{k+1}]. Then

By the latter theorem, Ak is the best approximation of the matrix A by a matrix of rank k. The value σ_{k+1} tells us how far the matrix A is from the nearest matrix of rank k.
The singular value decomposition can be used for the compression of images. The image can be presented by a matrix A whose elements represent the grayscale level (or RGB). Instead of the matrix A (or, in the color case, the matrices Ar, Ag, Ab), we use its best approximation by a matrix of rank k. As a consequence, instead of m · n data we require only (m + n)k data, for [u1, . . . , uk] and [σ1 v1, . . . , σk vk].
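The rank-k approximation and the bound of Theorem 4.4.2 can be verified in numpy (the test matrix is random, our own choice):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 6))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Ak = U[:, :k] * s[:k] @ Vt[:k]        # rank-k approximation A_k
err = np.linalg.norm(A - Ak, 2)       # should equal sigma_{k+1}
```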
1. We compute AᵀA.
2. We determine the eigenvalues of the matrix AᵀA, e.g. as the roots of the polynomial det(AᵀA − λI) = 0.
3. We sort the eigenvalues as λ1 ≥ λ2 ≥ · · · ≥ λn (as AᵀA is s.p.d., λi ≥ 0 and λi ∈ R) and set σi = √λi, i = 1, 2, . . . , n.
4. We compute the eigenvectors of AᵀA from (AᵀA − λi I)vi = 0, and by normalization we obtain the matrix V.
5. We compute ui = (1/σi) A vi and finally obtain U.
5 Eigenvalue problem
A non-zero vector v ∈ Cn is an eigenvector of a square n × n matrix A if it
satisfies the linear equation
Av = λv,
where λ ∈ C is a scalar, termed the eigenvalue corresponding to v. That is, the
eigenvectors are the vectors that the linear transformation A merely elongates
or shrinks, and the amount that they elongate/shrink by is the eigenvalue. The
above equation is called the eigenvalue equation or the eigenvalue problem.
The vector y ≠ 0 that satisfies yᴴA = λyᴴ is called a left eigenvector. If y is a left eigenvector of A with the corresponding eigenvalue λ, then y is a right eigenvector of Aᴴ with the corresponding eigenvalue λ̄. So hereafter we will consider only the right eigenvectors.
The eigenvalues are the roots of the characteristic polynomial det(A − λI).
Which algorithm is suitable, depends on the needs of the user and the charac-
teristics of the given matrix. The following facts are important:
• is the given matrix symmetric (or hermitian),
• does the user require all eigenvalues or just some,
• does the user require also the eigenvectors.
converges towards the eigenvector that corresponds to the largest (in absolute value) eigenvalue.
Theorem 5.1.1 Let the eigenvalues of the matrix A satisfy

|λ1| > |λ2| ≥ |λ3| ≥ · · · ≥ |λn|.

If k → ∞, then xk converges towards the eigenvector that corresponds to the eigenvalue λ1.
Note that the order of convergence is linear and that in practical applications the method converges for every x0.
We also have convergence in the case |λ1| = |λ2| ≥ |λ3| ≥ · · · ≥ |λn| with λ1 = λ2 (a multiple dominant eigenvalue). The method can be modified so that it works also in the cases |λ1| = |λ2| ≥ |λ3| ≥ · · · ≥ |λn| with λ1 = −λ2 or λ1 = λ̄2.
How can one obtain the eigenvalue, if an approximation for the eigenvector is known? The best approximate value is the λ that minimizes ‖Ax − λx‖2. The solution is the Rayleigh quotient

ρ(A, x) = xᵀAx / xᵀx,   x ≠ 0.
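The power method with Rayleigh-quotient eigenvalue extraction can be sketched as follows; the 2 × 2 test matrix is our own:

```python
import numpy as np

def power_method(A, x0, iters=100):
    """Power method: x_{k+1} = A x_k / ||A x_k||_2; the Rayleigh quotient
    of the limit vector approximates the dominant eigenvalue."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)        # normalize to avoid over/underflow
    return (x @ A @ x) / (x @ x), x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam, v = power_method(A, [1.0, 0.0])  # dominant eigenvalue (5 + sqrt(5))/2
```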
5.2 QR method

This method is nowadays the one mostly used in numerical packages. Here we present the Algorithm 10 for its computation.
Note that:
• The matrices Ak, k = 0, 1, . . . , are similar to the matrix A, as Ak+1 = Qkᵀ Ak Qk, and therefore they have the same eigenvalues.
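A bare-bones (unshifted) version of the iteration — practical implementations add shifts and a Hessenberg reduction — can be sketched as follows; the symmetric test matrix is our own:

```python
import numpy as np

def qr_method(A, iters=200):
    """Unshifted QR iteration: A_k = Q_k R_k, then A_{k+1} = R_k Q_k."""
    Ak = A.astype(float).copy()
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q                    # = Q^T A_k Q, similar to A_k
    return Ak

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
Ak = qr_method(A)                     # converges to a (nearly) diagonal matrix
```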
that matches the values of the function f at the points xi. Such a polynomial is called the interpolation polynomial.
How can one compute the interpolation polynomial? The naive approach would be:
Let us set up the linear system for the unknown coefficients of the polynomial p(x) = a0 + a1 x + · · · + a_{n−1} x^{n−1} + an xⁿ, namely

a0 + a1 x0 + · · · + a_{n−1} x0^{n−1} + an x0ⁿ = y0,
⋮
a0 + a1 xn + · · · + a_{n−1} xn^{n−1} + an xnⁿ = yn.

This approach, i.e. solving the obtained system, is time-consuming (it is better to use an explicit method); in addition, the system is ill-conditioned.
Proof:
• Let us first show the existence of such a polynomial by construction. Let
us define the polynomials

    L_{n,i}(x) = ∏_{k=0, k≠i}^{n} (x − x_k)/(x_i − x_k)
    f(x) − I_n(x) = (f^{(n+1)}(ξ)/(n+1)!) ω(x),        (6.1)

Proof: If x = x_i, i ∈ {0, 1, . . . , n}, then f(x_i) − I_n(x_i) = 0 and
(f^{(n+1)}(ξ)/(n+1)!) ω(x_i) = 0. Let us now assume x ≠ x_i for all
i ∈ {0, 1, . . . , n}. We define an auxiliary function and, by repeated application
of Rolle’s theorem, obtain

    R = f^{(n+1)}(ξ)/(n+1)!,

and the equation (6.2) at z = x reads as (6.1).
a) The polynomial
Proof:
a.1) The base case, n = 0: f[x_0] is the leading coefficient of the polyno-
mial of degree 0 that matches f in x_0, namely this polynomial is the
line y = f(x_0). Therefore f[x_0] = f(x_0).
b) Obviously.
c) Obviously.
d) Let p_0 be the interpolation polynomial that matches f in x_0, x_1, . . . , x_{k−1}
and let p_1 be the interpolation polynomial that matches f in x_1, x_2, . . . , x_k.
Then the interpolation polynomial p that matches f in x_0, x_1, . . . , x_k has
the form

    p(x) = ((x − x_k) p_0(x) − (x − x_0) p_1(x)) / (x_0 − x_k).

If we compare the leading coefficients, we obtain (6.4).
    f[x_0] = f(x_0),   f[x_0, x_1] = (f(x_1) − f(x_0))/(x_1 − x_0).
What happens if x_1 → x_0? From

    p(x) = f[x_0] + f[x_0, x_1](x − x_0) = f(x_0) + ((f(x_1) − f(x_0))/(x_1 − x_0))(x − x_0)

and

    f′(x_0) = lim_{h→0} (f(x_0 + h) − f(x_0))/h,
one derives
p(x) = f (x0 ) + f 0 (x0 )(x − x0 )
and therefore
f [x0 , x0 ] = f 0 (x0 ).
Therefore the definition of the interpolation polynomial and the divided differ-
ences can be extended to polynomials that, besides the values, also match the
derivatives. For example: for the given interpolation points x_1, x_2, x_2, x_2, x_3, x_3,
we are searching for the polynomial p for which

    p(x_1) = y_1, p(x_2) = y_2, p′(x_2) = y_2′, p″(x_2) = y_2″, p(x_3) = y_3, p′(x_3) = y_3′.

The divided differences can be computed in a triangular scheme from which,
at the end, we can quickly read off the Newton form of the interpolation
polynomial or calculate its value at a given point.
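A short sketch of the triangular scheme and the evaluation of the Newton form in Python (the function names are my own; the variant below handles only distinct points, i.e. no repeated-point derivative data):

```python
def divided_differences(xs, ys):
    """Triangular scheme of divided differences; returns its top edge
    f[x0], f[x0,x1], ..., the coefficients of the Newton form."""
    n = len(xs)
    col = list(ys)
    coeffs = [col[0]]
    for k in range(1, n):
        col = [(col[i + 1] - col[i]) / (xs[i + k] - xs[i]) for i in range(n - k)]
        coeffs.append(col[0])
    return coeffs

def newton_eval(coeffs, xs, x):
    """Horner-like evaluation of the Newton form at the point x."""
    p = coeffs[-1]
    for c, xi in zip(reversed(coeffs[:-1]), reversed(xs[:len(coeffs) - 1])):
        p = p * (x - xi) + c
    return p

xs = [0.0, 1.0, 2.0, 3.0]
ys = [x**3 for x in xs]          # interpolating f(x) = x^3 reproduces it exactly
c = divided_differences(xs, ys)
print(c)                         # [0.0, 1.0, 3.0, 1.0]
print(newton_eval(c, xs, 1.5))   # 3.375 = 1.5^3
```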
    f[x_0, x_1, . . . , x_k] = f^{(k)}(ξ)/k!,

where

    min_{i=0,1,...,k} x_i < ξ < max_{i=0,1,...,k} x_i.
    f(x) − I_n(x) = (f^{(n+1)}(ξ)/(n+1)!) ω(x),

where

    min(x, x_0, . . . , x_n) < ξ < max(x, x_0, . . . , x_n).

The advantage of the new estimate is that it can be used also when some of the
interpolation points coincide.
One would expect that by raising the degree of the interpolation polynomial
I_n (and therefore the number of interpolation points, n + 1), the interpolation
polynomial would converge to the interpolated function.
But it turns out that for an arbitrary selection of interpolation points there exists
a function for which the interpolation polynomials diverge as n increases.
Let us interpolate the function f(x) = 1/(1 + x²) on [−5, 5] at equidistant points
and observe what happens when raising the number of the interpolation points.
Figure 6.1: The graph of the interpolation polynomial (red) that interpolates
the function 1/(1 + x²) on the interval [−5, 5] at the given equidistant points,
shown for n = 3, 7, 11, 15, 19, 23.
Note that if we take the Chebyshev points x_k = 5 cos((2k + 1)π/(2n + 2)),
then with raising the degree n of the interpolating polynomial, the interpolation
polynomials converge towards the given function f (see Figure 6.2).
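The experiment behind Figures 6.1 and 6.2 can be repeated with a short script. A sketch in Python (the helper name is my own; the polynomial is evaluated directly in Lagrange form to avoid the ill-conditioned Vandermonde system mentioned earlier):

```python
import numpy as np

def lagrange_eval(nodes, values, t):
    """Evaluate the interpolation polynomial in Lagrange form at points t."""
    t = np.asarray(t, dtype=float)
    result = np.zeros_like(t)
    for i, (xi, yi) in enumerate(zip(nodes, values)):
        li = np.ones_like(t)
        for k, xk in enumerate(nodes):
            if k != i:
                li *= (t - xk) / (xi - xk)   # Lagrange basis polynomial L_{n,i}
        result += yi * li
    return result

f = lambda x: 1.0 / (1.0 + x**2)
t = np.linspace(-5, 5, 2001)                 # fine grid to measure the error
n = 15
equi = np.linspace(-5, 5, n + 1)
cheb = 5 * np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * n + 2))
err_equi = np.max(np.abs(lagrange_eval(equi, f(equi), t) - f(t)))
err_cheb = np.max(np.abs(lagrange_eval(cheb, f(cheb), t) - f(t)))
print(err_equi, err_cheb)   # equidistant error is large (Runge), Chebyshev small
```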
Figure 6.2: The graph of the interpolation polynomial (red) that interpolates
the function 1/(1 + x²) on the interval [−5, 5] at the given Chebyshev points,
shown for n = 3, 7, 11, 15, 19, 23.
As polynomials of a higher degree may diverge when one wants to interpolate
more points, we choose a different solution to this problem. Instead of one
interpolation polynomial of a higher degree, the interpolant can be composed
of several polynomials of a lower degree. The obtained function is called a
piecewise-polynomial function or a spline.
In practice, cubic splines work best.
    p_i(x) = f(x_i) + f[x_i, x_{i+1}](x − x_i) + f[x_i, x_{i+1}, x_{i+1/2}](x − x_i)(x − x_{i+1})
           = f(x_i) + ((f(x_{i+1}) − f(x_i))/(x_{i+1} − x_i))(x − x_i)
             + ((f[x_i, x_{i+1/2}] − f[x_{i+1}, x_{i+1/2}])/(x_i − x_{i+1}))(x − x_i)(x − x_{i+1})
           = f(x_i) + ((f(x_{i+1}) − f(x_i))/(x_{i+1} − x_i))(x − x_i)
             + ((2(f(x_i) − v_i)/(x_i − x_{i+1}) − 2(v_i − f(x_{i+1}))/(x_i − x_{i+1}))/(x_i − x_{i+1}))(x − x_i)(x − x_{i+1})
           = f(x_i) + ((f(x_{i+1}) − f(x_i))/(x_{i+1} − x_i))(x − x_i)
             + ((x − x_i)(x − x_{i+1})/(x_{i+1} − x_i)²)(2f(x_i) + 2f(x_{i+1}) − 4v_i),

where v_i denotes the prescribed value at the midpoint x_{i+1/2} = (x_i + x_{i+1})/2.
Therefore

    p_i′(x_{i+1}) = (f(x_{i+1}) − f(x_i))/(x_{i+1} − x_i) + (2f(x_i) + 2f(x_{i+1}) − 4v_i)/(x_{i+1} − x_i),
    p_{i+1}′(x_{i+1}) = (f(x_{i+2}) − f(x_{i+1}))/(x_{i+2} − x_{i+1})
                      + ((2f(x_{i+1}) + 2f(x_{i+2}) − 4v_{i+1})/(x_{i+2} − x_{i+1})²)(x_{i+1} − x_{i+2}),

and the continuity condition p_i′(x_{i+1}) = p_{i+1}′(x_{i+1}) gives one linear
equation for v_i and v_{i+1}.
One v_i can be arbitrarily chosen. Let us assume that v_0 = A. For the rest of the
unknowns v_i, i = 1, 2, . . . , n − 1, we obtain the (n − 1) × (n − 1) linear system

    [ 4 0 0 · · · 0 0 ] [ v_1     ]   [ 6f(x_1) + f(x_0) + f(x_2) − 4A    ]
    [ 4 4 0 · · · 0 0 ] [ v_2     ]   [ 6f(x_2) + f(x_1) + f(x_3)         ]
    [ 0 4 4 · · · 0 0 ] [ v_3     ] = [ 6f(x_3) + f(x_2) + f(x_4)         ]
    [ ⋮        ⋱      ] [ ⋮       ]   [ ⋮                                 ]
    [ 0 0 0 · · · 4 4 ] [ v_{n−1} ]   [ 6f(x_{n−1}) + f(x_{n−2}) + f(x_n) ].
7 Numerical differentiation and
integration
From function values we would like to determine the approximate value of its
derivative. An idea is to use the derivative of the corresponding interpolation
polynomial. By differentiating
    f(x) = I_n(x) + (ω(x)/(n+1)!) f^{(n+1)}(ξ_x),

we obtain

    f′(x) = I_n′(x) + (ω′(x)/(n+1)!) f^{(n+1)}(ξ_x) + (ω(x)/(n+1)!) (d/dx) f^{(n+1)}(ξ_x).

The error can be estimated if x = x_k for some k ∈ {0, 1, . . . , n}. In this case
ω(x_k) = 0 and the second part of the error cancels. We derive

    f′(x_k) = I_n′(x_k) + (ω′(x_k)/(n+1)!) f^{(n+1)}(ξ),

where I_n′(x_k) represents the derivative of the interpolation polynomial of degree
n at the given point x_k.
Example: Let us derive the formula for f′(x_0), if we know the values f(x_0) and
f(x_1):

    f′(x_0) = I_1′(x_0) + (ω′(x_0)/2!) f″(ξ_0)
            = (f[x_0] + f[x_0, x_1](x − x_0))′|_{x=x_0} + (f″(ξ_0)/2) ((x − x_0)(x − x_1))′|_{x=x_0}
            = f[x_0, x_1] + ((x_0 − x_1)/2) f″(ξ_0)
            = (f(x_1) − f(x_0))/h − (h/2) f″(ξ_0).
n = 1 (points x_0, x_1):

    f′(x_0) = (1/h)(f(x_1) − f(x_0)) − (1/2) h f″(ξ_0),
    f′(x_1) = (1/h)(f(x_1) − f(x_0)) + (1/2) h f″(ξ_1),

n = 2 (points x_0, x_1, x_2):

    f′(x_0) = (1/(2h))(−3f(x_0) + 4f(x_1) − f(x_2)) + (1/3) h² f‴(ξ_0),
    f′(x_1) = (1/(2h))(−f(x_0) + f(x_2)) − (1/6) h² f‴(ξ_1)   (the symmetric difference),
    f′(x_2) = (1/(2h))(f(x_0) − 4f(x_1) + 3f(x_2)) + (1/3) h² f‴(ξ_2).
Example: Let us use the formulas

    f′(0) = (1/(2h))(f(h) − f(−h)) + O(h²),   f″(0) = (1/h²)(f(−h) − 2f(0) + f(h)) + O(h²)

to compute f′(0) and f″(0) for the function f(x) = e^x. We obtain the results
presented in Table 7.1. Although the error of the methods is O(h²), we obtain
worse results for smaller values of h. Why?
    h        f′(0)        f″(0)
    1        1.1752012    1.0861612
    0.1      1.00166730   1.00083334
    0.01     1.00001610   1.0000169
    0.001    0.99999454   0.9983778
    0.0001   0.99994244   1.4901161

Table 7.1: Numerical values of the derivatives f′(0) and f″(0) for the function
f(x) = e^x.
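The experiment is easy to repeat. A sketch in Python (the exact digits obtained for small h depend on the machine arithmetic and the exponential routine, so they need not reproduce Table 7.1 exactly, but the qualitative behavior is the same):

```python
import math

f = math.exp
results = {}
for h in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    d1 = (f(h) - f(-h)) / (2 * h)             # symmetric difference for f'(0)
    d2 = (f(-h) - 2 * f(0) + f(h)) / h**2     # second difference for f''(0)
    results[h] = (d1, d2)
    print(h, d1, d2)
```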
The results of the previous example, where smaller values of the step h
resulted in worse approximations of the first and second derivative of the given
function, can be explained on the formula

    f″(x_1) = (1/h²)(y_0 − 2y_1 + y_2) − (h²/12) f^{(4)}(ξ),   y_i = f(x_i).

If we ignore the rounding error D_z, the whole error is composed of the error
that corresponds to the method and the irreducible error:
• the irreducible error: instead of computing with the exact values f(x_i) we
compute with approximate values ỹ_i, such that |f(x_i) − ỹ_i| ≤ ε. We
obtain

    |D_n| ≤ 4ε/h².

Altogether,

    |D| ≤ |D_m| + |D_n| ≤ (h²/12) ‖f^{(4)}‖ + 4ε/h².
If we know estimates for f^{(4)} and ε, we can, from the estimate for D,
determine the optimal value of h, for which the bound on the whole error is the
smallest (see Figure 7.1).
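The minimization can be done in closed form: setting the derivative of the bound D(h) = (h²/12)M + 4ε/h² to zero, hM/6 − 8ε/h³ = 0, gives h_opt = (48ε/M)^{1/4}. A tiny sketch in Python (the values of ε and M are rough illustrative assumptions for f(x) = e^x in double precision, not data from the notes):

```python
# Bound for the whole error: D(h) = (h**2 / 12) * M + 4 * eps / h**2.
# D'(h) = h*M/6 - 8*eps/h**3 = 0  gives  h_opt = (48*eps/M) ** 0.25.
eps = 2.0**-52          # rough accuracy of the computed function values
M = 1.0                 # rough bound for ||f^(4)|| when f(x) = e^x near 0
h_opt = (48 * eps / M) ** 0.25
D = lambda h: h**2 * M / 12 + 4 * eps / h**2
print(h_opt, D(h_opt))  # h_opt is a few times 1e-4 for these assumptions
```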
Figure 7.1: The optimal step h can be estimated from the graph of the estimation
for the whole error (the whole error is the sum of the method error and the
irreducible error).
    ∫_a^b f(x) dx = Σ_{i=0}^{n} A_i f(x_i) + R(f),

where the sum Σ_{i=0}^{n} A_i f(x_i) is called the quadrature rule and R(f) is the
error of the quadrature rule. The points x_i are called the integration points or
breakpoints and A_i are the weights (coefficients). We require that the formula is
accurate for polynomials of the highest possible degree; therefore we integrate
the interpolation polynomial instead of f.
If the breakpoints x_i, i = 0, 1, . . . , n, are determined, then the weights A_i are
determined in one of the following ways. We obtain

    A_i = ∫_a^b L_{n,i}(x) dx.
n = 1: Trapezoid rule:

    ∫_{x_0}^{x_1} f(x) dx = (h/2)(y_0 + y_1) − (h³/12) f″(ξ).

How do we obtain this rule? We compute

    A_0 = ∫_{x_0}^{x_1} (x − x_1)/(x_0 − x_1) dx = h/2,   A_1 = ∫_{x_0}^{x_1} (x − x_0)/(x_1 − x_0) dx = h/2,
n = 2: Simpson’s rule:

    ∫_{x_0}^{x_2} f(x) dx = (h/3)(y_0 + 4y_1 + y_2) − (h⁵/90) f^{(4)}(ξ).

This rule is exact for all polynomials of degree ≤ 3.
n = 3: Simpson’s 3/8 rule:

    ∫_{x_0}^{x_3} f(x) dx = (3h/8)(y_0 + 3y_1 + 3y_2 + y_3) − (3h⁵/80) f^{(4)}(ξ),

n = 4: Boole’s rule:

    ∫_{x_0}^{x_4} f(x) dx = (2h/45)(7y_0 + 32y_1 + 12y_2 + 32y_3 + 7y_4) − (8h⁷/945) f^{(6)}(ξ),
n = 3:

    ∫_{x_0}^{x_3} f(x) dx = (3h/2)(y_1 + y_2) + (3h³/4) f″(ξ),

n = 4: Milne’s rule:

    ∫_{x_0}^{x_4} f(x) dx = (4h/3)(2y_1 − y_2 + 2y_3) + (28h⁵/90) f^{(4)}(ξ).
Let us assume that every time we compute f(x_i), the obtained error is ≤ ε. The
estimate for the whole error is then |D_n| ≤ ε Σ_{i=0}^{n} |A_i|.
As

    Σ_{i=0}^{n} A_i = Σ_{i=0}^{n} ∫_a^b L_{n,i}(x) dx = ∫_a^b (Σ_{i=0}^{n} L_{n,i}(x)) dx = ∫_a^b dx = b − a = nh,

in the case when all A_i ≥ 0 we get |D_n| ≤ ε(b − a), i.e. the irreducible error is
small. Unfortunately, the closed rules for n > 8 contain some negative weights
and Σ_{i=0}^{n} |A_i| can be very big (although Σ_{i=0}^{n} A_i = b − a = const.).
The interval of integration can be divided into subintervals, and for every
subinterval one may use some Newton–Cotes rule of a lower degree.
Here are some basic composite rules:
• Composite Trapezoid rule:

    ∫_{x_0}^{x_n} f(x) dx = (h/2)(y_0 + 2y_1 + 2y_2 + · · · + 2y_{n−1} + y_n) − ((x_n − x_0)h²/12) f″(ξ).

Obtained from:

    ∫_{x_0}^{x_n} f(x) dx = Σ_{i=0}^{n−1} ∫_{x_i}^{x_{i+1}} f(x) dx = Σ_{i=0}^{n−1} ((h/2)(y_i + y_{i+1}) − (h³/12) f″(ξ_i))
                          = (h/2)(y_0 + 2y_1 + 2y_2 + · · · + 2y_{n−1} + y_n) − Σ_{i=0}^{n−1} (h³/12) f″(ξ_i)
                          = (h/2)(y_0 + 2y_1 + 2y_2 + · · · + 2y_{n−1} + y_n) − (nh³/12) f″(ξ).
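The composite Trapezoid rule can be sketched in a few lines of Python (the function name is my own). The last formula above predicts an O(h²) error, so halving h should divide the error by approximately 4:

```python
import math

def composite_trapezoid(f, a, b, n):
    """Composite Trapezoid rule with n subintervals of width h = (b-a)/n."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return h * s

exact = 2.0                                   # ∫_0^π sin x dx = 2
e1 = abs(composite_trapezoid(math.sin, 0.0, math.pi, 10) - exact)
e2 = abs(composite_trapezoid(math.sin, 0.0, math.pi, 20) - exact)
print(e1 / e2)                                # ≈ 4, consistent with O(h^2)
```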
The method evaluates the integrand at equally spaced points. The integrand
must have continuous derivatives, though fairly good results may be obtained
if only a few derivatives exist.
The method is named after Werner Romberg (1909–2003), who published the
method in 1955.
holds.
Let us multiply the equation that corresponds to h/2 by 4 and subtract from it
the equation that corresponds to h, in order to get rid of the h² term and thus
obtain a better approximation:

    I(f) = T_{h/2}^{(1)}(f) + a_{2,1} h⁴ + a_{3,1} h⁶ + · · ·,
    I(f) = T_{h/4}^{(1)}(f) + a_{2,1} (h/2)⁴ + a_{3,1} (h/2)⁶ + · · ·,

where

    T_{h/2}^{(1)}(f) = (4 T_{h/2}(f) − T_h(f))/3,   T_{h/4}^{(1)}(f) = (4 T_{h/4}(f) − T_{h/2}(f))/3.
We continue with the procedure:

    I(f) = T_{h/4}^{(2)}(f) + a_{3,2} h⁶ + a_{4,2} h⁸ + · · ·,

where

    T_{h/4}^{(2)}(f) = (16 T_{h/4}^{(1)}(f) − T_{h/2}^{(1)}(f))/15.
By this we generate the triangular array presented by Table 7.2.

Table 7.2: The estimates of the definite integral obtained by the Romberg’s
method.

The general formula is

    T_{h/2^k}^{(j)}(f) = (4^j T_{h/2^k}^{(j−1)}(f) − T_{h/2^{k−1}}^{(j−1)}(f)) / (4^j − 1).
Let the initial step be equal to h = 0.6. Compute two halvings of the step.
One obtains the following results:

    T_h^{(0)}     = 0.6 ((1/2) ln 1.0 + ln 1.6 + (1/2) ln 2.2) = 0.5185394,
    T_{h/2}^{(0)} = (1/2) T_h^{(0)} + 0.3 (ln 1.3 + ln 1.9) = 0.5305351,
    T_{h/4}^{(0)} = (1/2) T_{h/2}^{(0)} + 0.15 (ln 1.15 + ln 1.45 + ln 1.75 + ln 2.05) = 0.5335847.
    T_{h/2^k}^{(0)} = (1/2) T_{h/2^{k−1}}^{(0)} + (h/2^k)(y_1 + y_3 + · · · + y_{2^k−1}).

In such a way each function value is computed only once, so using the
Romberg method results in negligible additional computation in comparison
with computing T_{h/2^k}^{(0)} alone, while the obtained result can be much more
accurate.
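The whole procedure can be sketched compactly in Python (the function name is my own; the example reproduces the trapezoid values computed above for ∫ ln x dx over [1, 2.2]):

```python
import math

def romberg(f, a, b, k_max):
    """Romberg's method: the first column holds trapezoid estimates with
    halved steps (reusing old function values), the j-th column applies
    T^(j) = (4^j * T_half^(j-1) - T^(j-1)) / (4^j - 1)."""
    h = b - a
    T = [[0.5 * h * (f(a) + f(b))]]
    for k in range(1, k_max + 1):
        h /= 2
        # only the new midpoints have to be evaluated
        midsum = sum(f(a + (2 * i + 1) * h) for i in range(2 ** (k - 1)))
        row = [0.5 * T[-1][0] + h * midsum]
        for j in range(1, k + 1):
            row.append((4**j * row[j - 1] - T[-1][j - 1]) / (4**j - 1))
        T.append(row)
    return T

T = romberg(math.log, 1.0, 2.2, 3)
print(T[1][0], T[2][0], T[3][0])   # 0.5185394, 0.5305351, 0.5335847
print(T[3][3])                      # close to the exact value 0.5346062
```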
Suppose that one would like to compute the integral of a function of two vari-
ables on a rectangular area Ω = [a, b] × [c, d],

    ∫∫_Ω f(x, y) dx dy = ∫_a^b dx ∫_c^d f(x, y) dy.
Example: Suppose that we have the 5 × 4 grid, see Table 7.3. Then the co-
efficients for the composite Trapezoid rule generalized to the double integral
computation are presented in Table 7.4:

    (hk/4) ·   1 2 2 2 1
               2 4 4 4 2
               2 4 4 4 2
               1 2 2 2 1

Table 7.4: The 5 × 4 grid with the coefficients that correspond to the composite
Trapezoid rule.
If one would use e.g. the composite Simpson’s rule on a 5 × 5 grid, the obtained
coefficients are presented in Table 7.5:

    (hk/9) ·   1  4 2  4 1
               4 16 8 16 4
               2  8 4  8 2
               4 16 8 16 4
               1  4 2  4 1

Table 7.5: The 5 × 5 grid with the coefficients that correspond to the composite
Simpson’s rule.
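The weight of a grid point is simply the product of the two one-dimensional weights, which can be sketched in Python as follows (the function name is my own; the rule is exact for bilinear integrands, which the example uses as a check):

```python
def trapezoid_2d(f, a, b, c, d, n, m):
    """Composite Trapezoid rule on [a, b] x [c, d]: each grid point gets
    the product of the 1D weights (1 at the ends, 2 inside), with the
    common factor h*k/4, as in Table 7.4."""
    h, k = (b - a) / n, (d - c) / m
    total = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            w = (2 if 0 < i < n else 1) * (2 if 0 < j < m else 1)
            total += w * f(a + i * h, c + j * k)
    return total * h * k / 4.0

# ∫∫ xy dx dy over [0,1]^2 = 1/4; exact for bilinear integrands
I = trapezoid_2d(lambda x, y: x * y, 0.0, 1.0, 0.0, 1.0, 4, 3)
print(I)   # ≈ 0.25
```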
is the variance of f.
• For arbitrary d: X_i are random vectors from [0, 1]^d, where each
component is a random number on [0, 1].
The Gaussian quadrature rule is named after Carl Friedrich Gauss. The definite
integral of a function,

    ∫_a^b f(x) ρ(x) dx,   ρ ≥ 0,

is approximated by a rule Σ_{i=0}^{n} A_i f(x_i), where the points x_i are not
known in advance. The weights are determined by the points x_i since

    A_i = ∫_a^b L_{n,i}(x) ρ(x) dx,

and the rule will be, for an arbitrary choice of x_i, accurate for polynomials
of degree up to n. With an appropriate choice of x_i one may achieve that the
rule is accurate for polynomials of degree up to 2n + 1.
In the background there are orthogonal polynomials, where the scalar product
of functions f, g is defined as

    ⟨f, g⟩ := ∫_a^b f(x) g(x) ρ(x) dx,   f ⊥ g ⇔ ⟨f, g⟩ = 0.
Example: The Gauss–Legendre quadrature rules on two and three points are

    ∫_{−1}^{1} f(x) dx = f(−1/√3) + f(1/√3) + (1/135) f^{(4)}(ξ),
    ∫_{−1}^{1} f(x) dx = (5/9) f(−√(3/5)) + (8/9) f(0) + (5/9) f(√(3/5)) + (1/15750) f^{(6)}(ξ).

Compare these two rules with the Trapezoid and Simpson’s quadrature.
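A quick comparison in Python on ∫_{−1}^{1} e^x dx (the function names are my own; with the same number of evaluations, the Gauss rules come out more accurate than the Newton–Cotes rules for this smooth integrand):

```python
import math

def gauss2(f):
    """Two-point Gauss–Legendre rule on [-1, 1], exact for degree <= 3."""
    c = 1.0 / math.sqrt(3.0)
    return f(-c) + f(c)

def gauss3(f):
    """Three-point Gauss–Legendre rule on [-1, 1], exact for degree <= 5."""
    c = math.sqrt(3.0 / 5.0)
    return (5.0 * f(-c) + 8.0 * f(0.0) + 5.0 * f(c)) / 9.0

def trapezoid(f):
    """Trapezoid rule on [-1, 1], two evaluations."""
    return f(-1.0) + f(1.0)

def simpson(f):
    """Simpson's rule on [-1, 1], three evaluations."""
    return (f(-1.0) + 4.0 * f(0.0) + f(1.0)) / 3.0

exact = math.e - 1.0 / math.e            # ∫_{-1}^{1} e^x dx
for rule in (trapezoid, gauss2, simpson, gauss3):
    print(rule.__name__, abs(rule(math.exp) - exact))
```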
8 Bézier curves
They were independently discovered by P. E. Bézier (1910–1999, Renault) and
P. de Casteljau (1930–2022, Citroën). In the research area, the most notable
was Bézier’s work, therefore the curves are named Bézier curves. A very
important algorithm, however, is named after the other author, de Casteljau
(although it was also independently discovered in Bézier’s research group).
Bézier curves are important objects in Computer Aided Geometric Design, or
shortly CAGD. They are used e.g. for writing letters, car, ship and plane design,
motion design, object animation (e.g. cartoons), etc.
8.1 Introduction
A mapping Φ is affine if it preserves affine combinations of the points a_j with
the weights α_j: the point x is mapped by Φ in such a way that first the points
a_j are mapped and then we take the affine combination with the same weights α_j.

Example: The middle point of the line segment a_1a_2 is mapped into the middle
point of the mapped line segment Φ(a_1)Φ(a_2):

    x = (1/2)(a_1 + a_2),   Φ(x) = (1/2)(Φ(a_1) + Φ(a_2)).
Example: The centroid (the geometric center of a plane figure, or the arithmetic
mean position of all the points in the shape) of a discrete set of points is mapped
into the centroid of the mapped set of points.
Theorem 8.1.1 If we fix the coordinate system, then each affine mapping has
the form

    Φ(x) = Ax + v,   A ∈ R^{3×3}, v ∈ R³.        (8.1)

Proof: Let us show that (8.1) is an affine mapping. Let x = Σ_{j=1}^{n} α_j a_j,
Σ_{j=1}^{n} α_j = 1. Then

    Φ(Σ_{j=1}^{n} α_j a_j) = Φ(x) = A Σ_{j=1}^{n} α_j a_j + Σ_{j=1}^{n} α_j v
                           = Σ_{j=1}^{n} α_j (A a_j) + Σ_{j=1}^{n} α_j v
                           = Σ_{j=1}^{n} α_j (A a_j + v) = Σ_{j=1}^{n} α_j Φ(a_j).
Examples:
• Identity: A = I, v = 0.
• Translation: A = I, v is the translation vector.
• Scaling: v = 0, A is a diagonal matrix.
• Rotation: v = 0, A is an orthogonal matrix (AᵀA = I). For example, for
the rotation around the z-axis,

        [ cos α  −sin α  0 ]
    A = [ sin α   cos α  0 ].
        [   0       0    1 ]

• Shear (i.e. the deformation in which parallel layers slide along the cross
section): v = 0, A is an upper triangular matrix with ones on the main diag-
onal.
• Rigid motion: A is an orthogonal matrix, v is an arbitrary vector.
Remark 8.1.1 Every affine mapping can be written as a composition of trans-
lations, rotations, scalings and shears.
Remark 8.1.2 The point x(t) = (1 − t) a + t b is an affine combination of the
points a and b. As t can be written as

    t = (1 − t) · 0 + t · 1,

it follows that we have an affine mapping that maps the line segment [0, 1] ⊂ R
into the line segment in R³ between a and b. So linear interpolation is actually
affine interpolation.
Linear interpolation will be the main building block of the de Casteljau algo-
rithm.
So the curve is the image of the mapping p (i.e. the set p(I)), where I is called
the domain of the parametrization.
In CAGD the type of the parametric curve is important, since the goal is usually
the implementation of the algorithms in practice. Therefore we mostly require
polynomial or rational parametric curves. Hereafter we will consider only
polynomial parametric curves, i.e. curves whose components p_i are polynomials
of degree ≤ n. Such a curve is called a polynomial curve of degree n.
Example: The Bernstein basis polynomials are

    B_i^n(t) = (n choose i) t^i (1 − t)^{n−i},   t ∈ [0, 1].

The Bernstein basis polynomials of degree four are presented in Figure 8.1.
Theorem 8.2.1 The Bernstein basis polynomials satisfy the following recur-
rence formula:

    B_i^n(t) = (1 − t) B_i^{n−1}(t) + t B_{i−1}^{n−1}(t),

where

    B_0^0(t) = 1        (8.2)

and

    B_j^n(t) = 0,   ∀j ∉ {0, 1, . . . , n}.
Theorem 8.2.2 The Bernstein basis polynomials form a partition of unity:

    Σ_{j=0}^{n} B_j^n(t) ≡ 1,   B_j^n(t) ≥ 0.

The Bernstein polynomials were first used in a constructive proof of the Weier-
strass approximation theorem.
The curve

    b^n(t) = Σ_{j=0}^{n} b_j B_j^n(t)

is called the Bézier curve. The points b_j are called Bézier control points and
the polygon that connects them is called the Bézier control polygon.
Figure 8.2: Cubic Bézier curve with the corresponding control polygon.
The geometric interpretation can be seen in Figure 8.3. Geometrically, this
algorithm represents the repetition of linear interpolations.
Usually, the coefficients b_i^r are written in a triangular form, called the de
Casteljau scheme.

Figure 8.3: De Casteljau algorithm for the cubic curve at t = 0.7. The point on
the curve is obtained with the repetition of linear interpolations.
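The repetition of linear interpolations can be sketched in a few lines of Python (the function name is my own; points are given as coordinate tuples):

```python
def de_casteljau(points, t):
    """De Casteljau algorithm: b_i^r = (1 - t) b_i^{r-1} + t b_{i+1}^{r-1};
    the single point left after n steps is the curve point b^n(t)."""
    b = [tuple(p) for p in points]
    n = len(b) - 1
    for r in range(1, n + 1):
        b = [tuple((1 - t) * u + t * v for u, v in zip(b[i], b[i + 1]))
             for i in range(n - r + 1)]
    return b[0]

ctrl = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
print(de_casteljau(ctrl, 0.0))   # (0.0, 0.0): the curve starts at b_0
print(de_casteljau(ctrl, 1.0))   # (4.0, 0.0): ... and ends at b_n
print(de_casteljau(ctrl, 0.5))   # (2.0, 1.5)
```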
Proof: We will use induction. For r = 0, the (8.3) holds by definition. Let
us assume that (8.3) holds for r − 1 and let us use the recurrence relation for b_i^r
given in Algorithm 11 and the recurrence relation that holds for the Bernstein
polynomials,
Remark 8.2.1 With the intermediate points b_i^r the Bézier curve can be written
as

    b^n(t) = Σ_{i=0}^{r} b_i^{n−r} B_i^r(t).

The interpretation for this can be: first we compute n − r steps of the de Casteljau
algorithm for the given t, then we take the calculated points b_i^{n−r} as the
control points of a Bézier curve of degree r and compute its point at the
parameter t.
1. Affine invariance
Bézier curves are invariant under affine mappings, i.e. the next two
processes lead to the same result:
a) we compute the point b^n(t), and then we apply the affine mapping
on it,
b) we apply the affine mapping on the control points, and then we
evaluate the obtained Bézier curve at t.
Theorem 8.2.4 The Bézier curve bn lies in the convex hull of its control
polygon.
    b_i^r(u) = ((b − u)/(b − a)) b_i^{r−1}(u) + ((u − a)/(b − a)) b_{i+1}^{r−1}(u).
As the mapping u ↦ t = (u − a)/(b − a) is affine, we say that the Bézier
curves are invariant with respect to affine transformations of the parameter.
5. Symmetry
The Bézier curves that correspond to the two sequences of control points
b_0, b_1, . . . , b_n and b_n, b_{n−1}, . . . , b_0 are the same; the only difference is
in the direction of the parametrization:

    Σ_{j=0}^{n} b_j B_j^n(t) = Σ_{j=0}^{n} b_{n−j} B_j^n(1 − t).

Indeed,

    Σ_{j=0}^{n} b_j B_j^n(t) = Σ_{j=0}^{n} b_j B_{n−j}^n(1 − t) = Σ_{i=0}^{n} b_{n−i} B_i^n(1 − t) = Σ_{j=0}^{n} b_{n−j} B_j^n(1 − t).
7. Pseudo-local control
Let us find the extrema of the Bernstein basis polynomial B_i^n. Let us
compute the derivative

    (d/dt) B_i^n(t) = (n choose i) (i t^{i−1} (1 − t)^{n−i} − (n − i) t^i (1 − t)^{n−i−1})
                    = (n choose i) t^{i−1} (1 − t)^{n−i−1} (i − nt).

So the solutions of the equation

    (d/dt) B_i^n(t) = 0

are t_1 = 0, t_2 = 1 and t_3 = i/n. As the Bernstein polynomials are non-
negative, the maximum of B_i^n on [0, 1] is attained at t = i/n. The practical
importance of this fact is that if one changes one of the control points b_i,
the change of the curve is the biggest in the neighborhood of the point
that corresponds to t = i/n, although the change affects the whole curve.
where

    ∆b_j := b_{j+1} − b_j   and   ∆^r b_j = ∆(∆^{r−1} b_j).
r → r + 1:

    (d^{r+1}/dt^{r+1}) b^n(t) = (d/dt) ((d^r/dt^r) b^n(t))
        = n(n − 1) · · · (n − r + 1) (d/dt) Σ_{j=0}^{n−r} ∆^r b_j B_j^{n−r}(t)
        = n(n − 1) · · · (n − r + 1) Σ_{j=0}^{n−r} ∆^r b_j (d/dt) B_j^{n−r}(t)
        = n(n − 1) · · · (n − r) Σ_{j=0}^{n−r} ∆^r b_j (B_{j−1}^{n−r−1}(t) − B_j^{n−r−1}(t))
        = n(n − 1) · · · (n − r) Σ_{j=0}^{n−r−1} ∆^{r+1} b_j B_j^{n−r−1}(t).
8.2.5 Subdivision
Until now, the Bézier curve was defined on the interval [0, 1]. Sometimes one
would like to express just one part of the Bézier curve (e.g. on the interval [0, c],
c < 1) as an independent curve. Searching the control points of this curve is
called subdivision.
Let the control points bj , j = 0, 1, . . . , n, correspond to the basic curve while
cj , j = 0, 1, . . . , n, correspond to the curve we are looking for. We would like
that
    c_0 = b_0,   c_n = b^n(c).
The part of the curve bn on the interval [0, c] must be exactly the curve cn .
One can show the following theorem.
So one can read the control points of the curve c^n from the de Casteljau scheme
for computing b^n(c) (i.e. we read the diagonal elements of this scheme, c_j =
b_0^j(c)). Because of the symmetry, the control points of the Bézier curve on the
interval [c, 1] are b_i^{n−i}(c) (i.e. the bottom line in the scheme, read from right
to left).
The subdivision can be repeated: e.g. first we perform it at c = 1/2, then on
both halves of the curve, and so on. After k steps we obtain 2^k control polygons
(small arcs), which together describe the whole basic Bézier curve.
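Both subcurves can be read off one run of the de Casteljau scheme, which can be sketched in Python as follows (the function name is my own; points are coordinate tuples):

```python
def subdivide(points, c):
    """Subdivision at parameter c: return the control points of the two
    subcurves, on [0, c] (the diagonal of the de Casteljau scheme, b_0^r)
    and on [c, 1] (the bottom line b_i^{n-i}, read from right to left)."""
    left, right = [], []
    b = [tuple(p) for p in points]
    while b:
        left.append(b[0])
        right.append(b[-1])
        b = [tuple((1 - c) * u + c * v for u, v in zip(b[i], b[i + 1]))
             for i in range(len(b) - 1)]
    return left, right[::-1]

ctrl = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
left, right = subdivide(ctrl, 0.5)
print(left)    # [(0.0, 0.0), (0.5, 1.0), (1.25, 1.5), (2.0, 1.5)]
print(right)   # [(2.0, 1.5), (2.75, 1.5), (3.5, 1.0), (4.0, 0.0)]
```

Note that the two subcurves share the point b^n(c), which is the last point of the left polygon and the first point of the right one.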
Example: Suppose that we would like to find the intersection of a given curve
with a given line. We can use min-max boxes. If the box and the line do not
intersect, then there is no intersection. If they do intersect, we make a curve
subdivision at t = 1/2 and consider the min-max boxes of both subcurves,
namely, we check whether any of them intersects the given line. We continue
with this process. As the boxes quickly shrink, after some steps of the iteration
we take the center of the box that the line intersects as the approximation of
the intersection.
(i) Let us take an arbitrary curve in the plane and choose two points on it.
If the curve between these two points is replaced by the line segment
between them, we obtain a curve that clearly intersects an arbitrary line
no more often than the initial curve.
(ii) Now let us consider the control polygon P of the Bézier curve b^n. We
make the subdivision at t = 1/2 and obtain two new control polygons that
together compose one piecewise linear curve L_1. Each knot of L_1 is a
control point of one of the control polygons, so each such point is one of
the points in the de Casteljau scheme for t = 1/2 (from the diagonal or the
bottom line). As the de Casteljau algorithm actually repeats the procedure
in (i), the piecewise linear curve L_1 intersects an arbitrary line no more
often than the control polygon P.
(iii) We repeat the subdivision. At step k we obtain a piecewise linear curve
L_k composed of 2^k control polygons, which intersects an arbitrary line
no more often than the control polygon P.
(iv) Using some analysis and the fact that lim_{k→∞} L_k = b^n, we prove the
theorem.
8.3 Other interesting topics for further studying
Figure 8.7: An example of using polynomial and rational Bézier surfaces to
construct the plane model.

Figure 8.8: Geri’s game, a 1997 computer animated short film made by Pixar.
Left: the control mesh for Geri’s head, created by digitizing a full-scale model
sculpted out of clay; (a)–(d): recursive subdivision of the topologically compli-
cated mesh: (a) the control mesh, (b) after one subdivision step, (c) after two
subdivision steps, (d) the limit surface. Right: Geri. The film won an Academy
Award for Best Animated Short Film in 1998.
List of Figures

6.1 The graph of the interpolation polynomial (red) that interpolates
    the function 1/(1 + x²) on the interval [−5, 5] at the given equidistant
    points. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.2 The graph of the interpolation polynomial (red) that interpolates
    the function 1/(1 + x²) on the interval [−5, 5] at the given Chebyshev
    points. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
7.1 The optimal step h can be estimated from the graph of the esti-
    mation for the whole error. . . . . . . . . . . . . . . . . . . . . 86
7.2 The numerical integration. . . . . . . . . . . . . . . . . . . . . 87
7.3 The numerical integration: Trapezoid rule. . . . . . . . . . . . . 88
7.4 The numerical integration: Simpson’s rule. . . . . . . . . . . . . 89
7.5 The numerical integration: Midpoint rule. . . . . . . . . . . . . 90

List of Tables

7.1 Numerical values of the derivatives f′(0) and f″(0) for the func-
    tion f(x) = e^x. . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7.2 The estimates of the definite integral obtained by the Romberg’s
    method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.3 The 5 × 4 grid on the given rectangular area. . . . . . . . . . . . 95
7.4 The 5 × 4 grid with the coefficients that correspond to the com-
    posite Trapezoid rule. . . . . . . . . . . . . . . . . . . . . . . . 95
7.5 The 5 × 5 grid with the coefficients that correspond to the com-
    posite Simpson’s rule. . . . . . . . . . . . . . . . . . . . . . . . 96