MATH3511 Assignment 4
Phoebe Grosser, u7294693
May 10, 2023
Question 1

a.

We want to determine the accuracy, in floating point arithmetic, of computing the points $x_i$ via repeated addition of the step size $h$:

$$x_i = x_{i-1} + h \tag{1}$$
We will assume that $a$ and $b$ (wherein $x_i \in [a, b]$, and $h = (b - a)/n$) are floating point numbers, but $h$ is not. We will first determine the value of $x_1$ in floating point arithmetic.
Using the floating point model
$$fl(x) = x(1 + \delta),$$
wherein $|\delta| < \epsilon$ ("machine epsilon"), we can substitute into our previous equation to get an expression for $fl(x_1)$:
$$fl(x_1) = fl(fl(x_0) + fl(h)) = (x_0(1 + \delta_0) + h(1 + \delta_1))(1 + \delta_2)$$
Here, $\delta_0$ may be 0 if we're considering the very beginning of our domain (since $x_0 = a$ is stored exactly); however, in general it will not be. Expanding the above, we attain:
$$fl(x_1) = x_0 + h + x_0(\delta_0 + \delta_2 + \delta_0\delta_2) + h(\delta_1 + \delta_2 + \delta_1\delta_2)$$
Because $|\delta| < \epsilon$, we can determine an upper bound on the magnitude of the error terms $(\delta_0 + \delta_2 + \delta_0\delta_2)$ and $(\delta_1 + \delta_2 + \delta_1\delta_2)$ (which are effectively identical, as the $\delta$ terms are indistinguishable from one another):
$$\begin{aligned}
|\delta_0 + \delta_2 + \delta_0\delta_2| &\leq |\delta_0| + |\delta_2| + |\delta_0\delta_2| \quad \text{(triangle inequality)} \\
&\leq \epsilon + \epsilon + \epsilon^2 \\
&= 2\epsilon + \epsilon^2 \approx 2\epsilon
\end{aligned}$$
where we have taken the extreme values of $|\delta|$ in the last two lines. Hence, we can see that:
$$fl(x_1) = (x_0 + h)(1 + \phi_1) \tag{2}$$
with $|\phi_1| \leq 2\epsilon$ (and $\phi$ as our new error term). So $fl(x_1)$ has an error on the order of $2\epsilon$. We can prove that, in general, $fl(x_i)$ has an error on the order of $(i + 1)\epsilon$ via mathematical induction. The base case has already been proven above. Our inductive assumption will be that:
$$fl(x_{i-1}) = x_{i-1}(1 + \phi_{i-1}), \qquad |\phi_{i-1}| \leq i\epsilon$$
Then, $fl(x_i)$ will be given by:
$$\begin{aligned}
fl(x_i) &= fl(fl(x_{i-1}) + fl(h)) \\
&= (x_{i-1}(1 + \phi_{i-1}) + h(1 + \delta_0))(1 + \delta_1) \\
&= (x_{i-1} + \phi_{i-1}x_{i-1} + h + \delta_0 h)(1 + \delta_1) \\
&= x_{i-1} + \phi_{i-1}x_{i-1} + h + \delta_0 h + \delta_1 x_{i-1} + \delta_1\phi_{i-1}x_{i-1} + \delta_1 h + \delta_1\delta_0 h \\
&= x_{i-1} + h + x_{i-1}(\phi_{i-1} + \delta_1 + \delta_1\phi_{i-1}) + h(\delta_0 + \delta_1 + \delta_1\delta_0)
\end{aligned}$$
We already have that the upper bound on the $(\delta_0 + \delta_1 + \delta_1\delta_0)$ error term is $2\epsilon$. We can determine an upper bound on the magnitude of the error term $(\phi_{i-1} + \delta_1 + \delta_1\phi_{i-1})$ by noting that $|\delta| < \epsilon$ and $|\phi_{i-1}| \leq i\epsilon$ (our inductive assumption):
$$\begin{aligned}
|\phi_{i-1} + \delta_1 + \delta_1\phi_{i-1}| &\leq |\phi_{i-1}| + |\delta_1| + |\delta_1\phi_{i-1}| \quad \text{(triangle inequality)} \\
&\leq i\epsilon + \epsilon + i\epsilon^2 \\
&= (i + 1)\epsilon + i\epsilon^2 \approx (i + 1)\epsilon
\end{aligned}$$
where we have taken the extreme values of $|\delta|$ and $|\phi_{i-1}|$ in the last two lines. Hence, by noting that the above error term will in general dominate the $2\epsilon$ error term, we can see that:
$$fl(x_i) = (x_{i-1} + h)(1 + \phi_i) \tag{3}$$
with $|\phi_i| \leq (i + 1)\epsilon$, which completes the induction.
b.
We now want to determine the accuracy of the following alternative method of determining the floating point value of $x_i$:
$$x_i = a + ih \tag{4}$$
To implement this in the floating point model, we can note that $a$ will not incur a floating point error, as we assume that it has already been stored as a floating point number. $i$ is an integer, so it can be stored exactly. However, we need to notice that not only does $h$ incur a floating point error, but multiplication by $i$ also introduces an additional floating point error. This can be shown from the section in the lecture notes where we conclude that multiplication in the floating point model is given by:
$$\begin{aligned}
fl(fl(x) \times fl(y)) &= (x(1 + \delta_1) \times y(1 + \delta_2))(1 + \delta_3) \\
&= (xy)(1 + \delta_4)
\end{aligned}$$
where
$$(1 + \delta_4) = (1 + \delta_1)(1 + \delta_2)(1 + \delta_3)$$
If, say, $x$ is stored exactly, then $\delta_1 = 0$. This still leaves:
$$fl(x \times fl(y)) = (xy)(1 + \delta_4)$$
where
$$(1 + \delta_4) = (1 + \delta_2)(1 + \delta_3)$$
Thus, to implement our equation $x_i = a + ih$ in the floating point model, we must use:
$$\begin{aligned}
fl(x_i) &= fl(a + fl(ih)) \\
&= (a + ih(1 + \delta_0)(1 + \delta_1))(1 + \delta_3) \\
&= (a + ih + \delta_0 ih + \delta_1 ih + \delta_0\delta_1 ih)(1 + \delta_3) \\
&= a + ih + \delta_0 ih + \delta_1 ih + \delta_0\delta_1 ih + \delta_3 a + \delta_3 ih + \delta_0\delta_3 ih + \delta_1\delta_3 ih + \delta_0\delta_1\delta_3 ih \\
&= a + ih + a\delta_3 + ih(\delta_0 + \delta_1 + \delta_0\delta_1 + \delta_3 + \delta_0\delta_3 + \delta_1\delta_3 + \delta_0\delta_1\delta_3)
\end{aligned}$$
Again, because $|\delta| < \epsilon$, we can determine an upper bound on the $(\delta_0 + \delta_1 + \delta_0\delta_1 + \delta_3 + \delta_0\delta_3 + \delta_1\delta_3 + \delta_0\delta_1\delta_3)$ error term. We can clearly see that the upper bound on the $\delta_3$ error term will be $\epsilon$. Taking the extreme values of $|\delta|$, we attain as our error bound:
$$\begin{aligned}
|\delta_0 + \delta_1 + \delta_0\delta_1 + \delta_3 + \delta_0\delta_3 + \delta_1\delta_3 + \delta_0\delta_1\delta_3| &\leq |\delta_0| + |\delta_1| + |\delta_0\delta_1| + |\delta_3| + |\delta_0\delta_3| + |\delta_1\delta_3| + |\delta_0\delta_1\delta_3| \\
&\leq \epsilon + \epsilon + \epsilon^2 + \epsilon + \epsilon^2 + \epsilon^2 + \epsilon^3 \\
&= 3\epsilon + 3\epsilon^2 + \epsilon^3 \approx 3\epsilon
\end{aligned}$$
Hence, by noting that the above error term will in general dominate the $\epsilon$ error term, we can see that:
$$fl(x_i) = (a + ih)(1 + \phi)$$
with $|\phi| \leq 3\epsilon$. So the error in this method is bounded by $3\epsilon$ independently of $i$, whereas the error in the first method grows like $(i + 1)\epsilon$. Hence, for $i > 2$ (and in general), the second method will minimise the magnitude of the floating point error.
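To illustrate the difference, here is a minimal MATLAB sketch (my own, not part of the assignment brief) comparing the two methods in single precision, where the rounding error is large enough to observe directly:

```matlab
% Compare the two grid constructions in single precision (illustrative
% sketch; the parameters and variable names are my own choices).
a = single(0); b = single(1); n = 1000;
h = (b - a) / n;              % h = 0.001 is not exactly representable
x_rep = a;                    % Method 1: repeated addition, x_i = x_{i-1} + h
for i = 1:n
    x_rep = x_rep + h;        % error compounds roughly like (i+1)*eps
end
x_mul = a + n * h;            % Method 2: x_i = a + i*h, error bounded by ~3*eps
fprintf('repeated addition error: %g\n', abs(double(x_rep) - 1));
fprintf('a + i*h error:           %g\n', abs(double(x_mul) - 1));
```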
Question 2: Matrix Norms
a.
To prove the identity $A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}$, we simply expand out the RHS:
$$A^{-1}(B - A)B^{-1} = A^{-1}\underbrace{BB^{-1}}_{=I} - \underbrace{A^{-1}A}_{=I}B^{-1} = A^{-1} - B^{-1}$$
Then, taking the norm of both sides and applying norm properties (sub-multiplicativity):
$$\|A^{-1} - B^{-1}\| = \|A^{-1}(B - A)B^{-1}\| \leq \|A^{-1}\|\|B - A\|\|B^{-1}\|$$
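As a quick numerical sanity check (my own illustration, not part of the assignment), the identity can be verified on a random pair of invertible matrices:

```matlab
% Verify A^-1 - B^-1 = A^-1 (B - A) B^-1 on random matrices (illustrative).
rng(0);                               % reproducible example
A = eye(3) + 0.2 * randn(3);
B = eye(3) + 0.2 * randn(3);
lhs = inv(A) - inv(B);
rhs = inv(A) * (B - A) * inv(B);
fprintf('discrepancy: %g\n', norm(lhs - rhs));   % ~1e-16 (rounding only)
```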
b.
We let $B = A + E$ and $\delta = \|E\|\|A^{-1}\| < 1$. We have from the identity that we proved in the previous section:
$$A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}$$
However, the labelling of our matrices is completely arbitrary. If we simply interchange the $A$ and $B$ matrices, we attain:
$$B^{-1} = B^{-1}(A - B)A^{-1} + A^{-1}$$
By applying matrix norm properties, we can determine the following inequality for $\|B^{-1}\|$ from the above expression:
$$\begin{aligned}
\|B^{-1}\| &= \|B^{-1}(A - B)A^{-1} + A^{-1}\| \\
&\leq \|B^{-1}(A - B)A^{-1}\| + \|A^{-1}\| \quad \text{(triangle inequality)} \\
&\leq \|B^{-1}\|\|(A - B)A^{-1}\| + \|A^{-1}\| \quad \text{(sub-multiplicativity)} \\
&\leq \|B^{-1}\|\|A - B\|\|A^{-1}\| + \|A^{-1}\| \quad \text{(sub-multiplicativity)} \\
&= \|B^{-1}\|\|{-E}\|\|A^{-1}\| + \|A^{-1}\| \quad \text{(by def. of } E\text{)} \\
&= \|B^{-1}\|\|E\|\|A^{-1}\| + \|A^{-1}\| \\
&= \|B^{-1}\|\delta + \|A^{-1}\| \quad \text{(by def. of } \delta\text{)}
\end{aligned}$$
So we have attained that $\|B^{-1}\| \leq \|B^{-1}\|\delta + \|A^{-1}\|$. Now, rearranging to isolate $\|B^{-1}\|$ on the LHS:
$$\begin{aligned}
\|B^{-1}\| - \|B^{-1}\|\delta &\leq \|A^{-1}\| \\
\|B^{-1}\|(1 - \delta) &\leq \|A^{-1}\| \\
\|B^{-1}\| &\leq \frac{1}{1 - \delta}\|A^{-1}\|
\end{aligned}$$
We note that the inequality sign does not change on the last line because $1 - \delta > 0$. Hence, we have proven that:
$$\|B^{-1}\| \leq \frac{1}{1 - \delta}\|A^{-1}\| \tag{9}$$
Now, if we take the norm of our original identity from part a, we have:
$$\begin{aligned}
\|A^{-1} - B^{-1}\| &= \|A^{-1}(B - A)B^{-1}\| \\
&\leq \|A^{-1}(B - A)\|\|B^{-1}\| \quad \text{(sub-multiplicativity)} \\
&\leq \|A^{-1}\|\|B - A\|\|B^{-1}\| \quad \text{(sub-multiplicativity)} \\
&= \|A^{-1}\|\|E\|\|B^{-1}\| \quad \text{(by def. of } E\text{)} \\
&= \delta\|B^{-1}\| \quad \text{(by def. of } \delta\text{)}
\end{aligned}$$
Now, if $\|A^{-1} - B^{-1}\| \leq \delta\|B^{-1}\|$ and $\|B^{-1}\| \leq \frac{1}{1 - \delta}\|A^{-1}\|$ (from the previous result), then:
$$\|A^{-1} - B^{-1}\| \leq \delta\|B^{-1}\| \leq \frac{\delta}{1 - \delta}\|A^{-1}\| \tag{10}$$
which gives the result:
$$\|A^{-1} - B^{-1}\| \leq \frac{\delta}{1 - \delta}\|A^{-1}\| \tag{11}$$
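A quick numerical check of the bound in Equation 11 (my own illustration; the 2-norm and the test matrices are arbitrary choices):

```matlab
% Check ||A^-1 - B^-1|| <= delta/(1-delta) * ||A^-1|| for a small random
% perturbation (illustrative sketch).
rng(1);                              % reproducible example
A = eye(4) + 0.1 * randn(4);         % well-conditioned test matrix
E = 1e-3 * randn(4);                 % small perturbation, so delta < 1
B = A + E;
delta = norm(E) * norm(inv(A));      % delta = ||E|| ||A^-1||
lhs = norm(inv(A) - inv(B));
rhs = delta / (1 - delta) * norm(inv(A));
fprintf('delta = %g: %g <= %g\n', delta, lhs, rhs);
```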
c.
We suppose that $Ax = b$ and $(A + E)(x + e) = b$. To first show that $Ae = -E(x + e)$, we can expand out the LHS of the given original equation:
$$(A + E)(x + e) = b$$
$$Ax + Ae + Ex + Ee = b$$
Substituting $Ax = b$ on the LHS:
$$\begin{aligned}
b + Ae + Ex + Ee &= b \\
\implies Ae &= -Ex - Ee \\
\implies Ae &= -E(x + e)
\end{aligned}$$
If we solve for $e$ by multiplying both sides by $A^{-1}$, then we attain:
$$e = -A^{-1}E(x + e)$$
By taking the norm of both sides and applying matrix norm properties, we can determine the following inequality for $\|e\|$ from the above expression:
$$\begin{aligned}
\|e\| &= \|{-A^{-1}}E(x + e)\| \\
&= \|{-A^{-1}}Ex - A^{-1}Ee\| \\
&\leq \|A^{-1}Ex\| + \|A^{-1}Ee\| \quad \text{(triangle inequality)} \\
&\leq \|A^{-1}\|\|Ex\| + \|A^{-1}\|\|Ee\| \quad \text{(sub-multiplicativity)} \\
&\leq \|A^{-1}\|\|E\|\|x\| + \|A^{-1}\|\|E\|\|e\| \quad \text{(sub-multiplicativity)} \\
&= \delta\|x\| + \delta\|e\| \quad \text{(by def. of } \delta\text{)}
\end{aligned}$$
So we have attained that $\|e\| \leq \delta\|x\| + \delta\|e\|$. Now, rearranging to isolate $\|e\|$ on the LHS:
$$\begin{aligned}
\|e\| - \delta\|e\| &\leq \delta\|x\| \\
\|e\|(1 - \delta) &\leq \delta\|x\| \\
\|e\| &\leq \frac{\delta}{1 - \delta}\|x\|
\end{aligned}$$
Again, we note that the inequality sign does not change on the last line because $1 - \delta > 0$. Hence, we have proven that:
$$\|e\| \leq \frac{\delta}{1 - \delta}\|x\| \tag{12}$$
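The bound in Equation 12 can be checked in the same way (again my own illustration):

```matlab
% Check ||e|| <= delta/(1-delta) * ||x|| for a perturbed linear system
% (illustrative sketch).
rng(2);
A = eye(4) + 0.1 * randn(4);
x = ones(4, 1); b = A * x;           % exact system Ax = b
E = 1e-3 * randn(4);
e = (A + E) \ b - x;                 % x + e solves the perturbed system
delta = norm(E) * norm(inv(A));
fprintf('%g <= %g\n', norm(e), delta / (1 - delta) * norm(x));
```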
Question 3: Stability
a.
We define:
$$x_n = \int_0^1 t^n (t + 7)^{-1}\, dt \tag{13}$$
We can also determine a recurrence relation for the above formula, as shown below:
$$\begin{aligned}
x_n &= \int_0^1 \frac{t^n}{t + 7}\, dt \\
&= \int_0^1 \frac{t^n + 7t^{n-1} - 7t^{n-1}}{t + 7}\, dt \\
&= \int_0^1 \frac{t^{n-1}(t + 7) - 7t^{n-1}}{t + 7}\, dt \\
&= \int_0^1 \left( t^{n-1} - \frac{7t^{n-1}}{t + 7} \right) dt \\
&= \int_0^1 t^{n-1}\, dt - 7\int_0^1 \frac{t^{n-1}}{t + 7}\, dt
\end{aligned}$$
Now we can recognise that the second integral is simply $x_{n-1}$, via our original definition of $x_n$:
$$\begin{aligned}
\implies x_n &= \int_0^1 t^{n-1}\, dt - 7x_{n-1} \\
&= \left[ \frac{t^n}{n} \right]_0^1 - 7x_{n-1} \\
&= \frac{1}{n}(1^n - 0^n) - 7x_{n-1} \\
&= \frac{1}{n} - 7x_{n-1}
\end{aligned}$$
So our recurrence relation is:
$$x_n = \frac{1}{n} - 7x_{n-1} \tag{14}$$
The plot of $x_n$ for $n = 0$ to $n = 18$ is shown below in Figure 1.
Figure 1: Plot of $x_n$ using the forward recurrence relation
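A minimal MATLAB sketch of how such a plot can be generated (my own reconstruction, not the submitted code; the starting value $x_0 = \ln(8/7)$ follows from Equation 13 with $n = 0$):

```matlab
% Forward recurrence for x_n (illustrative sketch).
N = 18;
x = zeros(1, N + 1);
x(1) = log(8/7);                 % x_0 = integral of 1/(t+7) over [0,1]
for n = 1:N
    x(n + 1) = 1/n - 7 * x(n);   % Equation (14)
end
plot(0:N, x, 'o-'); xlabel('n'); ylabel('x_n');
```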
b.
The following figure (Figure 2) displays the difference between the value of $x_n$ computed using the forward recurrence relation and using MATLAB's integral function.

Figure 2: Plot of the absolute error in $x_n$ against $n$ for the forward recurrence relation

It is slightly clearer to see this error on a log-linear scale. Figure 3 displays a log-linear plot of the same error alongside a plot of $7^n$ (the reason for which will be discussed in Q3d).
Figure 3: Log-linear plot of the absolute error in $x_n$ against $n$ for the forward recurrence relation
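A sketch of how the reference values and the error plot can be produced (my own reconstruction, continuing the previous snippet; scaling the $7^n$ curve by machine epsilon is an illustrative choice):

```matlab
% Compare the forward recurrence against MATLAB's adaptive quadrature
% (illustrative; assumes N and x from the previous snippet).
x_ref = zeros(1, N + 1);
for n = 0:N
    x_ref(n + 1) = integral(@(t) t.^n ./ (t + 7), 0, 1);
end
err = abs(x - x_ref);
semilogy(0:N, err, 'o-'); hold on;
semilogy(0:N, eps * 7.^(0:N), '--');   % expected O(7^n) error growth
xlabel('n'); ylabel('absolute error'); hold off;
```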
c.
We can apply the same recurrence relation that we derived above in Equation 14 backwards. Rearranging terms, we find that:
$$x_{n-1} = \frac{1}{7}\left( \frac{1}{n} - x_n \right) \tag{15}$$
The following figure (Figure 4) displays the difference between the value of $x_n$ computed using the backward recurrence relation and using MATLAB's integral function.
Note for marker: To produce the following graph, the factor of $1/7$ was taken inside the bracketed expression in Equation 15 and applied to each term individually. I am aware that retaining the brackets produces a different error plot to the one shown below in Figure 4, but I believe the one shown more clearly displays the error associated with carrying through a perturbed starting value $x_{18} + \epsilon$ (as will be discussed in Q3d).
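A sketch of the distributed form described in the note above (my own reconstruction):

```matlab
% Backward recurrence (illustrative): start from x_18 computed by
% quadrature and iterate down, with the factor 1/7 applied to each term.
N = 18;
xb = zeros(1, N + 1);
xb(N + 1) = integral(@(t) t.^N ./ (t + 7), 0, 1);   % starting value x_18
for n = N:-1:1
    xb(n) = 1/(7*n) - xb(n + 1)/7;   % Equation (15), 1/7 distributed
end
```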
As can be seen in Figure 4, there is no consistent pattern to the errors, unlike in Figure 2. However, the order of magnitude of the errors is very small ($10^{-18}$) for all $n$ values, so what we're observing is an error that remains around what is effectively numerical 0 (or as close to it as we can get).
Figure 4: Plot of the absolute error in $x_n$ against $n$ for the backward recurrence relation
d.
To determine whether the forward or the backward recurrence relation (or neither) gives us the correct answer, we can observe what happens if we carry through a small error in our starting values.

For the forward recurrence relation, suppose that our initial value, $x_0$, is perturbed by some small error, $\epsilon$, such that our true starting value is actually $x_0' = x_0 + \epsilon$. Then, our next iteration, $x_1'$, will produce:
$$\begin{aligned}
x_1' &= \frac{1}{1} - 7x_0' \\
&= 1 - 7(x_0 + \epsilon) \\
&= 1 - 7x_0 - 7\epsilon \\
&= x_1 - 7\epsilon
\end{aligned}$$
So, the iteration $x_1'$ is perturbed by an error $-7\epsilon$. Continuing on, the next iteration, $x_2'$, will produce:
$$\begin{aligned}
x_2' &= \frac{1}{2} - 7x_1' \\
&= \frac{1}{2} - 7(x_1 - 7\epsilon) \\
&= \frac{1}{2} - 7x_1 + 49\epsilon \\
&= x_2 + 49\epsilon
\end{aligned}$$
If we continue this on, then we will see that, in general, the iteration $x_n'$ will be perturbed by an error $(-7)^n\epsilon$ for $n \geq 1$.
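This amplification is easy to confirm numerically (an illustrative check; the perturbation $10^{-10}$ is an arbitrary choice):

```matlab
% Track an artificial perturbation through the forward recurrence
% (illustrative sketch).
eps0 = 1e-10;
xt = log(8/7);            % unperturbed x_0
xp = xt + eps0;           % perturbed x_0' = x_0 + eps0
for n = 1:5
    xt = 1/n - 7 * xt;
    xp = 1/n - 7 * xp;
    fprintf('n = %d: error = %+.3e, predicted = %+.3e\n', ...
            n, xp - xt, (-7)^n * eps0);
end
```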
We can now apply this same thinking to the backward recurrence relation. Suppose that our initial value, $x_{18}$, is perturbed by the same error, such that the true starting value is $x_{18}' = x_{18} + \epsilon$. Our next iteration, $x_{17}'$, will produce:
$$\begin{aligned}
x_{17}' &= \frac{1}{7}\left( \frac{1}{18} - x_{18}' \right) \\
&= \frac{1}{7}\left( \frac{1}{18} - (x_{18} + \epsilon) \right) \\
&= \frac{1}{7}\cdot\frac{1}{18} - \frac{1}{7}x_{18} - \frac{1}{7}\epsilon \\
&= x_{17} - \frac{1}{7}\epsilon
\end{aligned}$$
So, the iteration $x_{17}'$ is perturbed by an error $-\frac{1}{7}\epsilon$. Continuing on, the next iteration, $x_{16}'$, will produce:
$$\begin{aligned}
x_{16}' &= \frac{1}{7}\left( \frac{1}{17} - x_{17}' \right) \\
&= \frac{1}{7}\left( \frac{1}{17} - \left( x_{17} - \frac{1}{7}\epsilon \right) \right) \\
&= \frac{1}{7}\cdot\frac{1}{17} - \frac{1}{7}x_{17} + \frac{1}{49}\epsilon \\
&= x_{16} + \frac{1}{49}\epsilon
\end{aligned}$$
If we continue this on, we will see that the iteration $x_n'$ will be perturbed by an error $\left( -\frac{1}{7} \right)^{18-n}\epsilon$ for $n \leq 17$.
Thus, what we expect with the forward recurrence relation is that, if we have some small error $\epsilon$ in our starting value, then the absolute value of the error will be multiplied by 7 on each subsequent iteration. In fact, this is precisely what we can see in the log-linear plot in Figure 3: the gradient of our absolute error is exactly the same as that of $7^n$ as we iterate in $n$ (i.e. it is increasing at exactly the predicted rate). So, the forward recurrence relation does not give us the correct answer, as it multiplies the error by a factor of 7 on each iteration.
Conversely, we can see with the backward recurrence relation that, if we have some small error $\epsilon$ in our starting value, then the absolute value of the error will be multiplied by $1/7$ on each subsequent iteration. In contrast to the forward recurrence relation, we actually expect the error to decrease on subsequent iterations. While there will still be some floating point rounding errors, the backward recurrence relation should therefore converge towards the true value of $x_n$ and give us close to the correct answer.
The reason we cannot observe this in the Figure 4 plot is that we start out with negligible error in the first place (we found $x_{18}$ using the integral function), so the error just remains around numerical 0 for all iterations. The multiplication by a factor of $1/7$ at each iteration is a "worst case scenario"; for the precise backward recurrence relation that we're examining, the error behaves far more nicely than that.
Question 4: Iterative Solvers
a.
We would like to determine the true solution to the matrix equation $Ax = b$, where:
$$A = \begin{pmatrix} I_{n,n} & A_1 \\ A_2 & I_{n,n} \end{pmatrix} \tag{17}$$
$$b = \begin{pmatrix} 0.5 \\ 0.5 \\ \vdots \\ 0.5 \end{pmatrix} \tag{18}$$
$A_1$ is a square matrix with all 0 entries except the last two columns, where all entries are $-0.25$. Similarly, $A_2$ is a square matrix with all 0 entries except the first two columns, where all entries are $-0.25$. If we write out the matrix system that we'd like to solve with $n = 8$, we can see that it will resemble the form:
$$\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & -0.25 & -0.25 \\
0 & 1 & 0 & 0 & 0 & 0 & -0.25 & -0.25 \\
0 & 0 & 1 & 0 & 0 & 0 & -0.25 & -0.25 \\
0 & 0 & 0 & 1 & 0 & 0 & -0.25 & -0.25 \\
-0.25 & -0.25 & 0 & 0 & 1 & 0 & 0 & 0 \\
-0.25 & -0.25 & 0 & 0 & 0 & 1 & 0 & 0 \\
-0.25 & -0.25 & 0 & 0 & 0 & 0 & 1 & 0 \\
-0.25 & -0.25 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix} x = \begin{pmatrix} 0.5 \\ 0.5 \\ 0.5 \\ 0.5 \\ 0.5 \\ 0.5 \\ 0.5 \\ 0.5 \end{pmatrix} \tag{19}$$
Just by inspecting the above system, it is clear that our true solution $x$ will be a column vector of 1's. This is because, for each row, we want to pick up the 1 on the diagonal of the matrix and then subtract both of the $-0.25$ entries in that row to attain a final value of 0.5. So, in general, the true solution $x$ will be:
$$x = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \tag{20}$$
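A quick numerical confirmation (my own sketch; the blocks are taken as 4 by 4 so that $A$ matches Equation 19):

```matlab
% Build the n = 8 system from Equation (19) and verify that the solution
% is a vector of ones (illustrative).
m = 4;                                  % block size
A1 = zeros(m); A1(:, m-1:m) = -0.25;    % last two columns of A1
A2 = zeros(m); A2(:, 1:2)   = -0.25;    % first two columns of A2
A = [eye(m), A1; A2, eye(m)];
b = 0.5 * ones(2*m, 1);
fprintf('max deviation from ones: %g\n', norm(A \ b - ones(2*m, 1), inf));
```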
b.
The plots on the following page display the number of iterations required by both the Jacobi and Gauss-Seidel methods to achieve convergence to the true solution, against the dimension of the matrix $A$. Figure 5 uses the infinity norm to determine convergence, whereas Figure 6 uses the Euclidean 2-norm. A sketch of the two iterations follows the figures.
Figure 5: Number of iterations against n for the Jacobi and Gauss-Seidel methods, using the infinity norm for convergence
Figure 6: Number of iterations against n for the Jacobi and Gauss-Seidel methods, using the Euclidean 2-norm for
convergence
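A sketch of how the two iterations might be implemented (my own reconstruction; the stopping rule follows the description in part c, terminating when the norm of the update falls below $10^{-8}$):

```matlab
% Jacobi and Gauss-Seidel sketches for Ax = b (illustrative; assumes A and
% b built as in the previous snippet).
tol = 1e-8;
D = diag(diag(A));                 % diagonal part of A
L = tril(A, -1); U = triu(A, 1);   % strictly lower/upper parts

x = zeros(size(b)); kJ = 0;        % Jacobi: x_{k+1} = D^{-1}(b - (L+U) x_k)
while true
    x_new = D \ (b - (L + U) * x);
    kJ = kJ + 1;
    if norm(x_new - x, inf) < tol, x = x_new; break; end
    x = x_new;
end

x = zeros(size(b)); kGS = 0;       % Gauss-Seidel: (D+L) x_{k+1} = b - U x_k
while true
    x_new = (D + L) \ (b - U * x);
    kGS = kGS + 1;
    if norm(x_new - x, inf) < tol, x = x_new; break; end
    x = x_new;
end
fprintf('Jacobi: %d iterations, Gauss-Seidel: %d iterations\n', kJ, kGS);
```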
c.
We can see that, in Figures 5 and 6, the number of iterations is only very weakly dependent on the matrix dimension, $n$. In fact, if the infinity norm is used for convergence, then the number of iterations is not dependent on the dimension of the matrix $A$ at all. For the Euclidean 2-norm, the number of iterations increases by only approximately 2 for every order of magnitude of $n$: a very marginal increase.
The difference in the number of iterations can be explained by examining the spectral radius of $E$ for the two different iterative methods. We are told that $\rho(E_J) = 0.5$ and $\rho(E_{GS}) = 0.25$ for the error matrices of the Jacobi and Gauss-Seidel algorithms, respectively. It was derived in the lecture notes that the number of iterations, $k$, required for the error to be reduced by a factor of $10^{-m}$ is:
$$k \geq \frac{-m}{\log_{10}(\rho(E))} \tag{21}$$
In our algorithms, we required the schemes to terminate when the error on subsequent iterations was less than $10^{-8}$, so $m = 8$. Substituting $m = 8$ into Equation 21, along with the spectral radii specified above, we see that the number of iterations for the Jacobi routine should obey:
$$k_J \geq \frac{-m}{\log_{10}(\rho(E_J))} = \frac{-8}{\log_{10}(0.5)} = 26.5754\ldots$$
and the number of iterations for the Gauss-Seidel routine should obey:
$$k_{GS} \geq \frac{-m}{\log_{10}(\rho(E_{GS}))} = \frac{-8}{\log_{10}(0.25)} = 13.2877\ldots$$
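These bounds are a one-line computation to verify (illustrative):

```matlab
% Lower bounds on the iteration counts from Equation (21) with m = 8.
kJ  = -8 / log10(0.5)     % 26.5754...
kGS = -8 / log10(0.25)    % 13.2877...
```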
The number of iterations displayed in Figures 5 and 6 shows exactly this. For the infinity norm, the number of iterations required for convergence is the ceiling of both of the values above (i.e. $k_J = 27$ and $k_{GS} = 14$). For the Euclidean 2-norm, the number of iterations starts slightly above the ceiling at $k_J = 28$ and $k_{GS} = 15$, and then increases marginally from there. For the specific matrix $A$ that we are examining, the spectral radii $\rho(E_J)$ and $\rho(E_{GS})$ are only very marginally dependent on the matrix dimension $n$, and this is what can be observed in the marginal increase in the number of iterations, $k$, for the Euclidean 2-norm. It is a special property of this matrix that the spectral radius is only weakly dependent on $n$; in general, it will be much more variable for an arbitrary matrix $A$.