Analysis of Numerical Methods
About this ebook
Based on each author's more than 40 years of experience in teaching university courses, this book offers lucid, carefully presented coverage of norms, numerical solution of linear systems and matrix factoring, iterative solutions of nonlinear equations, eigenvalues and eigenvectors, polynomial approximation, numerical solution of differential equations, and more. No mathematical preparation beyond advanced calculus and elementary linear algebra (or matrix theory) is assumed. Examples and problems are given that extend or amplify the analysis in many cases.
Analysis of Numerical Methods - Eugene Isaacson
1
Norms, Arithmetic, and Well-Posed Computations
0.INTRODUCTION
In this chapter, we treat three topics that are generally useful for the analysis of the various numerical methods studied throughout the book. In Section 1, we give the elements of the theory of norms of finite dimensional vectors and matrices. This subject properly belongs to the field of linear algebra. In later chapters, we may occasionally employ the notion of the norm of a function. This is a straightforward extension of the notion of a vector norm to the infinite-dimensional case. On the other hand, we shall not introduce the corresponding natural generalization, i.e., the notion of the norm of a linear transformation that acts on a space of functions. Such ideas are dealt with in functional analysis, and might profitably be used in a more sophisticated study of numerical methods.
We study briefly, in Section 2, the practical problem of the effect of rounding errors on the basic operations of arithmetic. Except for calculations involving only exact integer arithmetic, rounding errors are invariably present in any computation. A most important feature of the later analysis of numerical methods is the incorporation of a treatment of the effects of such rounding errors.
Finally, in Section 3, we describe the computational problems that are "reasonable" in some general sense. In effect, a numerical method which produces a solution insensitive to small changes in data or to rounding errors is said to yield a well-posed computation. How to determine the sensitivity of a numerical procedure is dealt with in special cases throughout the book. We indicate heuristically that any convergent algorithm is a well-posed computation.
1.NORMS OF VECTORS AND MATRICES
We assume that the reader is familiar with the basic theory of linear algebra, not necessarily in its abstract setting, but at least with specific reference to finite-dimensional linear vector spaces over the field of complex scalars. By "basic theory" we of course include: the theory of linear systems of equations, some elementary theory of determinants, and the theory of matrices or linear transformations to about the Jordan normal form. We hardly employ the Jordan form in the present study. In fact a much weaker result can frequently be used in its place (when the divisor theory or invariant subspaces are not actually involved). This result is all too frequently skipped in basic linear algebra courses, so we present it as
THEOREM 1. For any square matrix A of order n there exists a non-singular matrix P, of order n, such that
B = P−1AP
is upper triangular and has the eigenvalues of A, say λj ≡ λj(A), j = 1, 2, …, n, on the principal diagonal (i.e., any square matrix is equivalent to a triangular matrix).
Proof.We sketch the proof of this result. The reader should have no difficulty in completing the proof in detail.
Let λ1 be an eigenvalue of A with corresponding eigenvector u1.† Then pick a basis for the n-dimensional complex vector space, Cn, with u1 as the first such vector. Let the independent basis vectors be the columns of a non-singular matrix P1, which then determines the transformation to the new basis. In this new basis the transformation determined by A is given by B1 ≡ P1−1AP1 and, since Au1 = λ1u1, the first column of B1 is (λ1, 0, …, 0)T. That is, B1 has λ1 in the upper left corner, a row (α1, α2, …, αn−1) to its right, zeros below λ1, and in the lower right block some matrix A2 of order n − 1.
The characteristic polynomial of B1 is clearly
det (λIn − B1) = (λ − λ1) det (λIn−1 − A2),
where In is the identity matrix of order n. Now pick some eigenvalue λ2 of A2 and corresponding (n − 1)-dimensional eigenvector, v2; i.e.,
A2v2 = λ2v2.
With this vector we define the independent n-dimensional vectors û1 ≡ e1 and û2, where û2 has first component zero and the components v1,2, v2,2, …, vn−1,2 of v2 below it.
Note that, with the scalar α = α1v1,2 + α2v2,2 + … + αn−1vn−1,2,
B1û1 = λ1û1,B1û2 = λ2û2 + αû1,
and thus if we set u1 = P1û1,u2 = P1û2, then
Au1 = λ1u1,Au2 = λ2u2 + αu1.
Now we introduce a new basis of Cn with the first two vectors being u1 and u2. The non-singular matrix P2 which determines this change of basis has u1 and u2 as its first two columns; and the original linear transformation in the new basis has the representation
where A3 is some matrix of order n − 2.
The theorem clearly follows by the above procedure; a formal inductive proof could be given.
It is easy to prove the related stronger result of Schur stated in Theorem 2.4 of Chapter 4 (see Problem 2.13(b) of Chapter 4). We turn now to the basic content of this section, which is concerned with the generalization of the concept of distance in n-dimensional linear vector spaces.
The "distance" between a vector and the null vector, i.e., the origin, is a measure of the "size" or "length" of the vector. This generalized notion of distance or size is called a norm. In particular, all such generalizations are required to have the following properties:
(0)To each vector x in the linear space, , say, a unique real number is assigned; this number, denoted by ||x|| or N(x), is called the norm of x iff:
(i)||x|| ≥ 0 for all x ∈ and ||x|| = 0 iff x = o; where o denotes the zero vector (if ≡ Cn, then oi = 0);
(ii)||αx|| = |α| · ||x|| for all scalars α and all x ∈ ;
(iii)||x + y|| ≤ ||x|| + ||y||, the triangle inequality,† for all x, y ∈ .
Some examples of norms in the complex n-dimensional space Cn are
(1a)   ||x||1 ≡ |x1| + |x2| + ⋯ + |xn|;
(1b)   ||x||2 ≡ (|x1|² + |x2|² + ⋯ + |xn|²)¹/²;
(1c)   ||x||p ≡ (|x1|p + |x2|p + ⋯ + |xn|p)¹/p,   p ≥ 1;
(1d)   ||x||∞ ≡ max1≤j≤n |xj|.
It is an easy exercise for the reader to justify the use of the notation in (1d) by verifying that ||x||p → ||x||∞ as p → ∞.
The norm, ||·||2, is frequently called the Euclidean norm as it is just the formula for distance in ordinary three-dimensional Euclidean space extended to dimension n. The norm, ||·||∞, is called the maximum norm or occasionally the uniform norm. In general, ||·||p, for p ≥ 1 is termed the p-norm.
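As a quick illustration of the three norms used most often in what follows, the short Python sketch below (NumPy assumed; the vector x is a hypothetical example, not from the text) evaluates (1a), (1b), and (1d) for a complex vector; note the resulting ordering ||x||∞ ≤ ||x||2 ≤ ||x||1.

```python
import numpy as np

def norm_1(x):
    """1-norm (1a): sum of the absolute values of the components."""
    return np.sum(np.abs(x))

def norm_2(x):
    """Euclidean norm (1b): square root of the sum of squared moduli."""
    return np.sqrt(np.sum(np.abs(x) ** 2))

def norm_inf(x):
    """Maximum norm (1d): largest absolute component."""
    return np.max(np.abs(x))

x = np.array([3.0 - 4.0j, 1.0, -2.0])
print(norm_1(x), norm_2(x), norm_inf(x))
# The three values come out ordered: ||x||_inf <= ||x||_2 <= ||x||_1.
```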
To verify that (1) actually defines norms, we observe that conditions (0), (i), and (ii) are trivially satisfied. Only the triangle inequality, (iii), offers any difficulty. However,
and
so (1a) and (1d) define norms.
The proof of (iii) for (1b), the Euclidean norm, is based on the well known Cauchy-Schwarz inequality which states that
To prove this basic result, let |x| and |y| be the n-dimensional vectors with components |xj| and |yj|, j = 1, 2, …, n, respectively. Then for any real scalar, ξ,
But since the real quadratic polynomial in ξ above does not change sign its discriminant must be non-positive; i.e.,
However, we note that
and (2) follows from the above pair of inequalities.
Now we form
An application of the Cauchy-Schwarz inequality yields finally
N2(x + y) ≤ N2(x) + N2(y)
and so the Euclidean norm also satisfies the triangle inequality.
The statement that
(3)   ||x + y||p ≤ ||x||p + ||y||p
is known as Minkowski’s inequality. We do not derive it here as general p-norms will not be employed further. (A proof of (3) can be found in most advanced calculus texts.)
We can show quite generally that all vector norms are continuous functions in Cn. That is,
LEMMA 1. Every vector norm, N(x), is a continuous function of x1, x2, …, xn, the components of x.
Proof.For any vectors x and δ we have by (iii)
N(x + δ) ≤ N(x) + N(δ),
so that
N(x + δ) − N(x) ≤ N(δ).
On the other hand, by (ii) and (iii),
so that
−N(δ) ≤ N(x + δ) − N(x).
Thus, in general
|N(x + δ) − N(x)| ≤ N(δ).
With the unit vectors † {ek}, any δ has the representation
Using (ii) and (iii) repeatedly implies
where
Using this result in the previous inequality yields, for any ε > 0 and all δ with N∞(δ) ≤ ε/M,
|N(x + δ) − N(x)| ≤ ε.
This is essentially the definition of continuity for a function of the n variables x1, x2, …, xn.
See Problem 6 for a mild generalization.
Now we can show that all vector norms are equivalent in the sense of
THEOREM 2. For each pair of vector norms, say N(x) and N′(x), there exist positive constants m and M such that for all x ∈ Cn:
mN′(x) ≤ N(x) ≤ MN′(x).
Proof.The proof need only be given when one of the norms is N∞, since N and N′ are equivalent if they are each equivalent to N∞. Let S ⊂ Cn be defined by
S ≡ {x | N∞(x) = 1, x ∈ Cn}
(this is frequently called the surface of the unit ball in Cn). S is a closed bounded set of points. Then since N(x) is a continuous function (see Lemma 1), we conclude by a theorem of Weierstrass that N(x) attains its minimum and its maximum on S at some points of S. That is, for some x⁰ ∈ S and x¹ ∈ S
or
0 < N(x⁰) ≤ N(x) ≤ N(x¹) < ∞
for all x ∈ S.
For any y ≠ o we see that y/N∞(y) is in S and so
or
N(x⁰)N∞(y) ≤ N(y) ≤ N(x¹)N∞(y).
The last two inequalities yield
mN∞(y) ≤ N(y) ≤ MN∞(y),
where m ≡ N(x⁰) and M ≡ N(x¹).
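For the particular pair N(x) = ||x||1 and N′(x) = ||x||∞, the constants in Theorem 2 may be taken to be m = 1 and M = n, since ||x||∞ ≤ ||x||1 ≤ n||x||∞. The following sketch (an illustration only; NumPy assumed) checks these bounds on random complex vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
for _ in range(1000):
    # random complex vector in C^n
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    n1, ninf = np.sum(np.abs(x)), np.max(np.abs(x))
    assert ninf <= n1 <= n * ninf        # m = 1, M = n in Theorem 2
print("equivalence constants m = 1, M = n verified on random samples")
```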
A matrix of order n could be treated as a vector in a space of dimension n² (with some fixed convention as to the manner of listing its elements). Then matrix norms satisfying the conditions (0)–(iii) could be defined as in (1). However, since the product of two matrices of order n is also such a matrix, we impose an additional condition on matrix norms, namely that
(iv)||AB|| ≤ ||A|| · ||B||.
With this requirement the vector norms (1) do not all become matrix norms (see Problem 2). However, there is a more natural, geometric, way in which the norm of a matrix can be defined. Thus, if x ∈ Cn and ||·|| is some vector norm on Cn, then ||x|| is the "length" of x, ||Ax|| is the "length" of Ax, and we define a norm of A, written as ||A|| or N(A), by the maximum relative stretching,
(5)   ||A|| ≡ sup {||Ax||/||x||; x ≠ o}.
Note that we use the same notation, ||·||, to denote vector and matrix norms; the context will always clarify which is implied. We call (5) a natural norm or the matrix norm induced by the vector norm, ||·||. This is also known as the operator norm in functional analysis. Since for any x ≠ o we can define u = x/||x|| so that ||u|| = 1, the definition (5) is equivalent to
(6)   ||A|| = max {||Au||; ||u|| = 1}.
That is, by Problems 6 and 7, ||Au|| is a continuous function of u and hence the maximum is attained for some y, with ||y|| = 1.
Before verifying the fact that (5) or (6) defines a matrix norm, we note that they imply, for any vector x, that
(7)   ||Ax|| ≤ ||A|| · ||x||.
There are many other ways in which matrix norms may be defined. But if (7) holds for some such norm then it is said to be compatible with the vector norm employed in (7). The natural norm (5) is essentially the "smallest" matrix norm compatible with a given vector norm.
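Definition (6) can also be explored numerically. The sketch below (an illustration, not a prescribed method; NumPy assumed, test matrix arbitrary) samples random unit vectors in the maximum norm and records the largest ||Au||∞ found; every sample gives a lower bound for the natural norm, and the estimate is compared with the exact value, the maximum absolute row sum, derived as (8) below.

```python
import numpy as np

def induced_norm_estimate(A, vec_norm, samples=20000, seed=0):
    """Estimate the natural norm max_{||u|| = 1} ||A u|| by sampling
    random unit vectors; the result is always a lower bound."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    best = 0.0
    for _ in range(samples):
        u = rng.standard_normal(n)
        u /= vec_norm(u)                    # normalize so ||u|| = 1
        best = max(best, vec_norm(A @ u))
    return best

A = np.array([[1.0, 2.0], [-3.0, 0.5]])
est = induced_norm_estimate(A, lambda v: np.max(np.abs(v)))
print(est, np.max(np.sum(np.abs(A), axis=1)))  # estimate vs. exact max row sum
```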
To see that (5) yields a norm, we first note that conditions (i) and (ii) are trivially verified. For checking the triangle inequality, let y be such that ||y|| = 1 and from (6),
||(A + B)|| = ||(A + B)y||.
But then, upon recalling (7),
Finally, to verify (iv), let y with ||y|| = 1 now be such that
||(AB)|| = ||(AB)y||.
Again by (7), we have
so that (5) and (6) do define a matrix norm.
We shall now determine the natural matrix norms induced by some of the vector p-norms (p = 1, 2, ∞) defined in (1). Let the nth order matrix A have elements ajk, j, k = 1, 2, …, n.
(A) The matrix norm induced by the maximum norm (1d) is
(8)   ||A||∞ = maxj (|aj1| + |aj2| + ⋯ + |ajn|),
i.e., the maximum absolute row sum. To prove (8), let y be such that ||y||∞ = 1 and
||A||∞ = ||Ay||∞.
Then,
so the right-hand side of (8) is an upper bound of ||A||∞. Now if the maximum row sum occurs for, say, j = J then let x have the components
Clearly ||x||∞ = 1, if A is non-trivial, and
so (8) holds. [If A ≡ O, property (ii) implies ||A|| = 0 for any natural norm.]
(B) Next, we claim that
(9)   ||A||1 = maxm (|a1m| + |a2m| + ⋯ + |anm|),
i.e., the maximum absolute column sum. Now let y with ||y||1 = 1 be such that
||A||1 = ||Ay||1.
Then,
and the right-hand side of (9) is an upper bound of ||A||1. If the maximum is attained for m = K, then this bound is actually attained for x = ek, the Kth unit vector, since ||ek|| = 1 and
Thus (9) is established.
(C) Finally, we consider the Euclidean norm, for which case we recall the notation for the Hermitian transpose or conjugate transpose of any rectangular matrix A ≡ (aij),
A* ≡ ĀT,
i.e., if A* ≡ (bij), then bij = āij. Further, the spectral radius of any square matrix A is defined by
(10)   ρ(A) ≡ maxs |λs(A)|,
where λs(A) denotes the sth eigenvalue of A. Now we can state that
(11)   ||A||2 = ρ¹/²(A*A).
To prove (11), we again pick y such that ||y||2 = 1 and
||A||2 = ||Ay||2.
From (1b) it is clear that ||x||2² = x*x, since x* ≡ (x̄1, x̄2, …, x̄n). Therefore, from the identity (Ay)* = y*A*, we find
But since A*A is Hermitian it has a complete set of n orthonormal eigenvectors, say u1, u2, …, un, such that
The multiplication of (13b) by us* on the left yields further
λs = us*A*Aus ≥ 0.
Every vector has a unique expansion in the basis {us}. Say in particular that
and then (12) becomes, upon recalling (13),
Thus ρ¹/²(A*A) is an upper bound of ||A||2. However, using y = us, where λs = ρ(A*A), we get
and so (11) follows.
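The formulas (8), (9), and (11) are easily evaluated in practice. The sketch below (illustrative only; NumPy assumed, and the test matrix is arbitrary) computes the three induced norms directly from those formulas and checks them against NumPy's built-in operator norms.

```python
import numpy as np

def matrix_norm_inf(A):
    """(8): maximum absolute row sum."""
    return np.max(np.sum(np.abs(A), axis=1))

def matrix_norm_1(A):
    """(9): maximum absolute column sum."""
    return np.max(np.sum(np.abs(A), axis=0))

def matrix_norm_2(A):
    """(11): square root of the spectral radius of A* A."""
    eigvals = np.linalg.eigvalsh(A.conj().T @ A)   # A*A is Hermitian, eigenvalues >= 0
    return np.sqrt(np.max(eigvals))

A = np.array([[1.0, -2.0, 0.0],
              [0.5,  3.0, 1.0],
              [2.0,  0.0, -1.0]])
for ours, ord_ in [(matrix_norm_inf, np.inf), (matrix_norm_1, 1), (matrix_norm_2, 2)]:
    print(ours(A), np.linalg.norm(A, ord=ord_))   # the two columns should agree
```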
We have observed that a matrix of order n can be considered as a vector of dimension n². But since every matrix norm satisfies the conditions (0)–(iii) of a vector norm the results of Lemma 1 and Theorem 2 also apply to matrix norms. Thus we have
LEMMA 1′. Every matrix norm, ||A||, is a continuous function of the n² elements aij of A.
THEOREM 2′. For each pair of matrix norms, say ||A|| and ||A||′ there exist positive constants m and M such that for all nth order matrices A
m||A||′ ≤ ||A|| ≤ M||A||′.
The proofs of these results follow exactly the corresponding proofs for vector norms so we leave their detailed exposition to the reader.
There is frequently confusion between the spectral radius (10) of a matrix and the Euclidean norm (11) of a matrix. (To add to this confusion, ||A||2 is sometimes called the spectral norm of A.) It should be observed that if A is Hermitian, i.e., A* = A, then λs(A*A) = λs²(A) and so the spectral radius is equal to the Euclidean norm for Hermitian matrices. However, in general this is not true, but we have
LEMMA 2. For any natural norm, ||·||, and square matrix, A,
ρ(A) ≤ ||A||.
Proof.For each eigenvalue λs(A) there is a corresponding eigenvector, say us, which can be chosen to be normalized for any particular vector norm, ||us|| = 1. But then for the corresponding natural matrix norm
As this holds for all s = 1, 2, …, n, the result follows.
On the other hand, for each matrix some natural norm is arbitrarily close to the spectral radius. More precisely we have
THEOREM 3. For each nth order matrix A and each arbitrary ε > 0 a natural norm, ||A||, can be found such that
ρ(A) ≤ ||A|| ≤ ρ(A) + ε.
Proof.The left-hand inequality has been verified above. We shall show how to construct a norm satisfying the right-hand inequality. By Theorem 1 we can find a non-singular matrix P such that
PAP−1 ≡ B ≡ Λ + U
where Λ = (λj(A)δij) and U ≡ (uij) has zeros on and below the diagonal. With δ > 0, a "sufficiently small" positive number, we form the diagonal matrix of order n
Now consider
C = DBD−1 = Λ + E,
where E ≡ (eij) = DUD−1 has elements
Note that the elements eij can be made arbitrarily small in magnitude by choosing δ appropriately. Also we have that
A = P−1D−1CDP.
Since DP is non-singular, a vector norm can be defined by
||x|| ≡ N2(DPx) = (x*P*D*DPx)¹/².
The proof of this fact is left to the reader in Problem 5. The natural matrix norm induced by this vector norm is of course
However, from the above form for A, we have, for any y,
||Ay|| = N2(DPAy) = N2(CDPy).
If we let z ≡ DPy, this becomes
||Ay|| = N2(Cz) = (z*C*Cz)¹/².
Now observe that
Here the term 𝒪(δ) represents an nth order matrix each of whose terms is 𝒪(δ).† Thus, we can conclude that
since
|z*𝒪(δ)z| ≤ n²z*z 𝒪(δ) = z*z 𝒪(δ).
Recalling ||y|| = N2(z), we find from ||y|| = 1 that z*z = 1. Then it follows that
For δ sufficiently small, 𝒪(δ) < ε.
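The construction in this proof can be imitated numerically. The sketch below is an illustration only; it assumes NumPy and SciPy and uses a Schur factorization to supply the triangularizing matrix of Theorem 1 (the proof only requires some non-singular P). It forms the scaled vector norm N2(DPx) and shows the induced norm of A approaching ρ(A) as δ decreases.

```python
import numpy as np
from scipy.linalg import schur

def norm_close_to_spectral_radius(A, delta):
    """Induced matrix norm of A for the vector norm ||x|| = N2(D P x),
    where P A P^{-1} is upper triangular (via a Schur factorization)
    and D is a diagonal scaling that damps the strict upper part.
    The returned value tends to rho(A) as delta -> 0."""
    n = A.shape[0]
    T, Q = schur(A.astype(complex), output='complex')   # A = Q T Q*, T upper triangular
    P = Q.conj().T                                       # so P A P^{-1} = T
    D = np.diag(delta ** -np.arange(n, dtype=float))     # d_i / d_j = delta^(j - i), small above the diagonal
    W = D @ P                                            # induced norm of A equals ||W A W^{-1}||_2
    C = W @ A @ np.linalg.inv(W)
    return np.linalg.norm(C, 2)

A = np.array([[2.0, 100.0], [0.0, 0.5]])
rho = np.max(np.abs(np.linalg.eigvals(A)))
for delta in [1.0, 0.1, 0.01]:
    print(delta, norm_close_to_spectral_radius(A, delta), rho)
```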
It should be observed that the natural norm employed in Theorem 3 depends upon the matrix A as well as the arbitrary small parameter ε. However, this result leads to an interesting characterization of the spectral radius of any matrix; namely,
COROLLARY. For any square matrix A
where the inf is taken over all vector norms, N(·); or equivalently
where the inf is taken over all natural norms, ||·||.
Proof.By using Lemma 2 and Theorem 3, since ε > 0 is arbitrary and the natural norm there depends upon ε, the result follows from the definition of inf.
1.1.Convergent Matrices
To study the convergence of various iteration procedures as well as for many other purposes, we investigate matrices A for which
(14)   Am → O as m → ∞,
where O denotes the zero matrix all of whose entries are 0. Any square matrix satisfying condition (14) is said to be convergent. Equivalent conditions are contained in
THEOREM 4. The following three statements are equivalent:
(a)A is convergent;
(b)||Am|| → 0 as m → ∞, for some matrix norm;
(c)ρ(A) < 1.
Proof.We first show that (a) and (b) are equivalent. Since ||·|| is continuous, by Lemma 1′, and ||O|| = 0, then (a) implies (b). But if (b) holds for some norm, then Theorem 2′ implies there exists an M such that
||Am||∞ ≤ M||Am|| → 0.
Hence, (a) holds.
Next we show that (b) and (c) are equivalent. Note that by Theorem 2′ there is no loss in generality if we assume the norm to be a natural norm. But then, by Lemma 2 and the fact that λ(Am) = λm(A), we have
||Am|| ≥ ρ(Am) = ρm(A),
so that (b) implies (c). On the other hand, if (c) holds, then by Theorem 3 we can find an ε > 0 and a natural norm, say N(·), such that
N(A) ≤ ρ(A) + ε ≡ θ < 1.
Now use the property (iv) of matrix norms to get
N(Am) ≤ [N(A)]m ≤ θm
so that N(Am) → 0 as m → ∞ and hence (b) holds.
A test for convergent matrices which is frequently easy to apply is the content of the
COROLLARY. A is convergent if for some matrix norm
||A|| < 1.
Proof.Again by (iv) we have
||Am|| ≤ ||A||m
so that condition (b) of Theorem 4 holds.
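A small numerical illustration of Theorem 4 and its corollary (a sketch only; NumPy assumed, matrix chosen arbitrarily): the matrix below has ||A||∞ < 1, hence ρ(A) < 1, and its powers decay to the zero matrix.

```python
import numpy as np

def is_convergent(A):
    """Theorem 4(c): A is convergent iff its spectral radius is < 1."""
    return np.max(np.abs(np.linalg.eigvals(A))) < 1.0

A = np.array([[0.5, 0.4],
              [0.1, 0.3]])
print(is_convergent(A), np.max(np.sum(np.abs(A), axis=1)))  # True, and ||A||_inf = 0.9 < 1

Am = np.eye(2)
for m in range(40):
    Am = Am @ A
print(np.max(np.abs(Am)))   # entries of A^40 are essentially zero
```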
Another important characterization and property of convergent matrices is contained in
THEOREM 5. (a) The geometric series
I + A + A² + A³ + …,
converges iff A is convergent.
(b) If A is convergent, then I − A is non-singular and
(I − A)−1 = I + A + A² + A³ + ···.
Proof.A necessary condition for the series in part (a) to converge is that Am → O as m → ∞, i.e., that A be convergent. The sufficiency will follow from part (b).
Let A be convergent, whence by Theorem 4 we know that ρ(A) < 1. Since the eigenvalues of I − A are 1 − λ(A), it follows that det (I − A) ≠ 0 and hence this matrix is non-singular. Now consider the identity
(I − A)(I + A + A² + ··· + Am) = I − Am+1
which is valid for all integers m. Since A is convergent, the limit as m → ∞ of the right-hand side exists. The limit, after multiplying both sides on the left by (I − A)−1, yields
(I + A + A² + ···) = (I − A)−1
and part (b) follows.
A useful corollary to this theorem is
COROLLARY. If in some natural norm, ||A|| < 1, then I − A is non-singular and
1/(1 + ||A||) ≤ ||(I − A)−1|| ≤ 1/(1 − ||A||).
Proof.By the corollary to Theorem 4 and part (b) of Theorem 5 it follows that I − A is non-singular. For a natural norm we note that ||I|| = 1 and so taking the norm of the identity
I = (I − A)(I − A)−1
yields
Thus the left-hand inequality is established.
Now write the identity as
(I − A)−1 = I + A(I − A)−1
and take the norm to get
||(I − A)−1|| ≤ 1 + ||A|| · ||(I − A)−1||.
Since ||A|| < 1 this yields
It should be observed that if A is convergent, so is (−A), and ||A|| = ||−A||. Thus Theorem 5 and its corollary are immediately applicable to matrices of the form I + A. That is, if in some natural norm, ||A|| < 1, then
1/(1 + ||A||) ≤ ||(I + A)−1|| ≤ 1/(1 − ||A||).
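The following sketch (illustrative only; NumPy assumed, matrix arbitrary) sums the geometric series of Theorem 5(b) for a convergent matrix and checks the two-sided bound of the corollary in the ∞-norm.

```python
import numpy as np

def neumann_partial_sum(A, terms):
    """Partial sum I + A + A^2 + ... + A^(terms-1), which converges to
    (I - A)^{-1} when A is convergent (Theorem 5(b))."""
    n = A.shape[0]
    S = np.eye(n)
    P = np.eye(n)
    for _ in range(terms - 1):
        P = P @ A
        S = S + P
    return S

A = np.array([[0.2, -0.3],
              [0.1,  0.4]])                     # ||A||_inf = 0.5 < 1, so A is convergent
exact = np.linalg.inv(np.eye(2) - A)
print(np.max(np.abs(neumann_partial_sum(A, 50) - exact)))   # difference near machine precision

# Corollary bound: 1/(1 + ||A||) <= ||(I - A)^{-1}|| <= 1/(1 - ||A||) in the same norm.
norm_A = np.linalg.norm(A, np.inf)
norm_inv = np.linalg.norm(exact, np.inf)
print(1 / (1 + norm_A), norm_inv, 1 / (1 - norm_A))
```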
PROBLEMS, SECTION 1
1. (a) Verify that (1b) defines a norm in the linear space of square matrices of order n; i.e., check properties (i)–(iv), for ||A||E² = Σi,j |aij|².
(b) Similarly, verify that (1a) defines a matrix norm, i.e., ||A|| = Σi,j |aij|.
2. Show by example that the maximum vector norm, η(A) = maxi,j |aij|, when applied to a matrix, does not satisfy condition (iv) that we impose on a matrix norm.
3. Show that if A is non-singular, then B ≡ A*A is Hermitian and positive definite. That is, x*Bx > 0 if x ≠ o. Hence the eigenvalues of B are all positive.
4. Show for any non-singular matrix A and any matrix norm that ||I|| ≥ 1 and ||A−1|| ≥ 1/||A||.
[Hint: ||I|| = ||II|| ≤ ||I||²; ||A−1A|| ≤ ||A−1|| · ||A||.]
5. Show that if η(x) is a norm and A is any non-singular matrix, then N(x) defined by
N(x) ≡ η(Ax),
is a (vector) norm.
6. We call η(x) a semi-norm iff η(x) satisfies all of the conditions, (0)–(iii), for a norm with condition (i) replaced by the weaker condition
(i′): η(x) ≥ 0 for all x ∈ .
We say that η(x) is non-trivial iff η(x) > 0 for some x ∈ . Prove the following generalization of Lemma 1:
LEMMA 1″. Every non-trivial semi-norm, η(x), is a continuous function of x1, x2, …, xn, the components of x. Hence every semi-norm is continuous.
7. Show that if η(x) is a semi-norm and A any square matrix, then N(x) ≡ η(Ax) defines a semi-norm.
2.FLOATING-POINT ARITHMETIC AND ROUNDING ERRORS
In the following chapters we will have to refer, on occasion, to the errors due to "rounding" in the basic arithmetic operations. Such errors are inherent in all computations in which only a fixed number of digits are retained. This is, of course, the case with all modern digital computers and we consider here an example of one way in which many of them do or can do arithmetic; so-called floating-point arithmetic. Although most electronic computers operate with numbers in some kind of binary representation, most humans still think in terms of a decimal representation and so we shall employ the latter here.
Suppose the number a ≠ 0 has the exact decimal representation
(1)   a = ±10q(.d1d2d3 ···),
where q is an integer and the d1, d2, …, are digits with d1 ≠ 0. Then the "t-digit floating-decimal representation of a," or for brevity the "floating a," used in the machine, is of the form
(2)   fl(a) = ±10q(.δ1δ2 ··· δt),
where δ1 ≠ 0 and δ1, δ2, …, δt are digits. The number (.δ1δ2 … δt) is called the mantissa and q is called the exponent of fl(a). There is usually a restriction on the exponent, of the form
(3)   −N ≤ q ≤ M,
for some large positive integers N, M. If a number a ≠ 0 has an exponent outside of this range it cannot be represented in the form (2), (3). If, during the course of a calculation, some computed quantity has an exponent q > M (called overflow) or q < − N (called underflow), meaningless results usually follow. However, special precautions can be taken on most computers to at least detect the occurrence of such over- or underflows. We do not consider these practical difficulties further; rather, we shall assume that they do not occur or are somehow taken into account.
There are two popular ways in which the floating digits δj are obtained from the exact digits, dj. The obvious chopping representation takes
(4a)   δj = dj,   j = 1, 2, …, t.
Thus the exact mantissa is chopped off after the tth decimal digit to get the floating mantissa. The other and preferable procedure is to round, in which case †
and the brackets on the right-hand side indicate the integral part. The error in either of these procedures can be bounded as in
LEMMA 1. The error in t-digit floating-decimal representation of a number a ≠ 0 is bounded by
(5)   |a − fl(a)| ≤ 10¹−t|a| for chopping and |a − fl(a)| ≤ ½·10¹−t|a| for rounding.
Proof.From (1), (2), and (4) we have
But since 1 ≤ d1 ≤ 9 and 0.dt + 1dt + 2 ··· ≤ 1 this implies
|a − fl(a)| ≤ 10¹ − t|a|,
which is the bound for the chopped representation. For the case of rounding we have, similarly,
We shall assume that our idealized computer performs each basic arithmetic operation correctly to 2t digits and then either rounds or chops the result to a t-digit floating number. With such operations it clearly follows from Lemma 1 that
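The t-digit decimal model of this section is easy to simulate. The sketch below is an idealized illustration, not a description of actual machine arithmetic; the helper fl is hypothetical. It chops or rounds to t significant decimal digits and verifies the bounds of Lemma 1 on a few sample numbers.

```python
import math

def fl(a, t, mode="round"):
    """t-digit floating-decimal representation of a (a != 0): keep t
    significant decimal digits, by chopping or rounding.
    (The carry case d1 = ... = dt = 9 is ignored, as in the footnote.)"""
    q = math.floor(math.log10(abs(a))) + 1         # exponent with mantissa in [0.1, 1)
    mantissa = abs(a) / 10.0 ** q
    scaled = mantissa * 10 ** t
    digits = math.floor(scaled) if mode == "chop" else math.floor(scaled + 0.5)
    return math.copysign(digits * 10.0 ** (q - t), a)

t = 4
for a in [math.pi, -12345.678, 0.000123456]:
    for mode, factor in [("chop", 1.0), ("round", 0.5)]:
        err = abs(a - fl(a, t, mode))
        bound = factor * 10.0 ** (1 - t) * abs(a)    # Lemma 1 bounds (5)
        print(a, mode, err <= bound * (1 + 1e-12))   # tiny slack for double-precision effects
```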
In many calculations, particularly those concerned with linear systems, the accumulation of products is required (e.g., the inner product of two vectors). We assume that rounding (or chopping) is done after each multiplication and after each successive addition. That is,
and in general
The result of such computations can be represented as an exact inner product with, say, the ai slightly altered. We state this as
LEMMA 2. Let the floating-point inner product (7) be computed with rounding. Then if n and t satisfy
it follows that
where
Proof.By (6b) we can write
fl(akbk) = akbk(1 + φk10−t),|φk| ≤ 5,
since rounding is assumed. Similarly from (6a) and (7b) with n = k we have
where
θ1 = 0;|θk| ≤ 5,k = 2, 3, ….
Now a simple recursive application of the above yields
where we have introduced Ek by
A formal verification of this result is easily obtained by induction.
Since θ1 = 0, it follows that
(1 − 5·10−t)n − k + ² ≤ 1 + Ek ≤ (1 + 5·10−t)n − k + ²,k = 2, 3, …, n,
and
(1 − 5·10−t)n ≤ 1 + E1 ≤ (1 + 5·10−t)n.
Hence, with = 5·10−t,
But, for p ≤ n,(8) implies that p ≤ , so that
Therefore,
|Ek| ≤ (n − k + 2)10¹ − t,k = 2, 3, …, n.
Clearly for k = 1 we find, as above with k = 2, that
|E1| ≤ n·10¹ − t.
The result now follows upon setting
δak = akEk.
(Note that we could just as well have set δbk = bkEk.)
Obviously a similar result can be obtained for the error due to chopping if condition (8) is strengthened slightly; see Problem 1.
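A crude simulation of the floating inner product (7) is sketched below (illustrative only; NumPy assumed). Products and partial sums are formed in double precision and then rounded to t decimal digits, which only approximates the 2t-digit model described above; the observed error is compared with a bound of the form given by Lemma 2.

```python
import math
import numpy as np

def fl(x, t=6):
    """Round x to t significant decimal digits (crude model of this section)."""
    if x == 0.0:
        return 0.0
    q = math.floor(math.log10(abs(x))) + 1
    return math.copysign(math.floor(abs(x) * 10 ** (t - q) + 0.5) * 10.0 ** (q - t), x)

def fl_inner(a, b, t=6):
    """Floating inner product (7): round after each product and each partial sum."""
    s = 0.0
    for ak, bk in zip(a, b):
        s = fl(s + fl(ak * bk, t), t)
    return s

rng = np.random.default_rng(1)
n, t = 200, 6
a = rng.uniform(0.1, 1.0, n)
b = rng.uniform(0.1, 1.0, n)
err = abs(fl_inner(a, b, t) - float(np.dot(a, b)))
# Lemma 2: the computed value is an exact inner product with each a_k perturbed
# by a relative amount of order (n - k + 2) * 10^(1 - t).
bound = sum((n - k + 2) * 10.0 ** (1 - t) * abs(a[k - 1] * b[k - 1]) for k in range(1, n + 1))
print(err, bound, err <= bound)
```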
PROBLEMS, SECTION 2
1. Determine the result analogous to Lemma 2, when "chopping" replaces "rounding" in the statement.
[Hint: The factor 10¹ − t need only be replaced by 2·10¹ − t, throughout.]
2. (a) Find a representation for .
(b) If c1 > c2 > … > cn > 0, in what order should the sum c1 + c2 + ⋯ + cn be calculated to minimize the effect of rounding?
3. What are the analogues of equations (6a, b, c) in the binary representation:
fl(a) = ±2q(.δ1δ2 ··· δt)
where δ1 = 1 and δj = 0 or 1?
3.WELL-POSED COMPUTATIONS
Hadamard introduced the notion of well-posed or properly posed problems in the theory of partial differential equations (see Section 0 of Chapter 9). However, it seems that a related concept is quite useful in discussing computational problems of almost all kinds. We refer to this as the notion of a well-posed computing problem.
First, we must clarify what is meant by a "computing problem" in general. Here we shall take it to mean an algorithm or equivalently: a set of rules specifying the order and kind of arithmetic operations (i.e., rounding rules) to be used on specified data. Such a computing problem may have as its object, for example, the determination of the roots of a quadratic equation or of an approximation to the solution of a nonlinear partial differential equation. How any such rules are determined for a particular purpose need not concern us at present (this is, in fact, what much of the rest of this book is about).
Suppose the specified data for some particular computing problem are the quantities a1, a2, …, am, which we denote as the m-dimensional vector a. Then if the quantities to be computed are x1, x2, …, xn, we can write
(1)   x = f(a),
where of course the n-dimensional function f(·) is determined by the rules.
Now we will define a computing problem to be well-posed iff the algorithm meets three requirements. The first requirement is that a "solution," x, should exist for the given data, a. This is implied by the notation (1). However, if we recall that (1) represents the evaluation of some algorithm it would seem that a solution (i.e., a result of using the algorithm) must always exist. But this is not true, a trivial example being given by data that lead to a division by zero in the algorithm. (The algorithm in this case is not properly specified since it should have provided for such a possibility. If it did not, then the corresponding computing problem is not well-posed for data that lead to this difficulty.) There are other, more subtle situations that result in algorithms which cannot be evaluated and it is by no means easy, a priori, to determine that x is indeed defined by (1).
The second requirement is that the computation be unique. That is, when performed several times (with the same data) identical results are obtained. This is quite invariably true of algorithms which can be evaluated. If in actual practice it seems to be violated, the trouble usually lies with faulty calculations (i.e., machine errors). The functions f(a) must be single valued to insure uniqueness.
The third requirement is that the result of the computation should depend Lipschitz continuously on the data with a constant that is not too large. That is, "small" changes in the data, a, should result in only "small" changes in the computed x. For example, let the computation represented by (1) satisfy the first two requirements for all data a in some set, say a ∈ D. If we change the data a by a small amount δa so that (a + δa) ∈ D, then we can write the result of the computation with the altered data as
(2)   x + δx ≡ f(a + δa).
Now if there exists a constant M such that for any δa,
(3)   ||δx|| ≤ M||δa||,
we say that the computation depends Lipschitz continuously on the data. Finally, we say (1) is well-posed iff the three requirements are satisfied and (3) holds with a not too large constant, M = M(a, η), for some not too small η > 0 and all δa such that ||δa|| ≤ η. Since the Lipschitz constant M depends on (a, η) we see that a computing problem or algorithm may be well-posed for some data, a, but not for all data.
Let (a) denote the original problem which the algorithm (1) was devised to solve.
This problem is also said to be well-posed if it has a unique solution, say
y = g(a),
which depends Lipschitz continuously on the data. That is, (a) is well-posed if for all δa satisfying ||δa|| ≤ ζ, there is a constant N = N(a, ζ) such that
(4)   ||g(a + δa) − g(a)|| ≤ N||δa||.
We call the algorithm (1) convergent iff f depends on a parameter, say (e.g., may determine the size of the rounding errors), so that for any small > 0,
for all δa such that ||δa|| ≤. Now, if (a) is well-posed and (1) is convergent, then (4) and (5) yield
Thus, recalling (3), we are led to the heuristic
OBSERVATION 1. If (a) is a well-posed problem, then a necessary condition that (1) be a convergent algorithm is that (1) be a well-posed computation.
Therefore we are interested in determining whether a given algorithm (1) is a well-posed computation simply because only such an algorithm is sure to be convergent for all problems of the form (a + δa), when (a) is well-posed and ||δa|| ≤ δ.
Similarly, by interchanging f and g in (6), we may justify
OBSERVATION 2. If is a not well-posed problem, then a necessary condition that (1) be an accurate algorithm is that (1) be a not well-posed computation.
In fact, for certain problems of linear algebra (see SubSection 1.2 of Chapter 2), it has been possible to prove that the commonly used algorithms, (1), produce approximations, x, which are exact solutions of slightly perturbed original mathematical problems. In these algebraic cases, the accuracy of the solution x, as measured in (5), is seen to depend on the well-posedness of the original mathematical problem. In algorithms, (1), that arise from differential equation problems, other techniques are developed to estimate the accuracy of the approximation. For differential equation problems the well-posedness of the resulting algorithms (1) is referred to as the stability of the finite difference schemes (see Chapters 8 and 9).
We now consider two elementary examples to illustrate some of the previous notions.
The most overworked example of how a simple change in the algorithm can affect the accuracy of a single precision calculation is the case of determining the smallest root of a quadratic equation. If in
x² + 2bx + c = 0,
b < 0 and c are given to t digits, but |c|/b² < 10−t, then the smallest root, x2, should be found from x2 = c/x1, after finding x1 = −b + (b² − c)¹/² in single precision arithmetic. Using x2 = −b − (b² − c)¹/² in single precision arithmetic would be disastrous!
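The point is easily demonstrated. In the sketch below (illustrative only; NumPy's float32 is used as a stand-in for "single precision," and the data b, c are hypothetical), the subtractive formula loses every significant digit while x2 = c/x1 gives the small root correctly.

```python
import numpy as np

def small_root_naive(b, c):
    """x2 = -b - sqrt(b^2 - c): suffers cancellation when |c|/b^2 is tiny."""
    f32 = np.float32
    return f32(-b) - np.sqrt(f32(b) * f32(b) - f32(c))

def small_root_stable(b, c):
    """x1 = -b + sqrt(b^2 - c) (no cancellation for b < 0), then x2 = c / x1."""
    f32 = np.float32
    x1 = f32(-b) + np.sqrt(f32(b) * f32(b) - f32(c))
    return f32(c) / x1

b, c = -1.0e4, 1.0                     # |c|/b^2 = 1e-8, below single precision
print(small_root_naive(b, c))          # 0.0 -- all significant digits lost
print(small_root_stable(b, c))         # ~5.0e-5, correct to single precision
# exact small root is c / (-b + sqrt(b^2 - c)) ~ 5.000000e-05
```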
A more sophisticated well-posedness discussion, without reference to the type of arithmetic, is afforded by the problem of determining the zeros of a polynomial
Pn(z) = zn + an−1zn−1 + … + a1z + a0.
If Qn(z) ≡ zn + bn−1zn−1 + … + b1z + b0, then the zeros of Pn(z; ε) ≡ Pn(z) + εQn(z) are "close" to the zeros of Pn(z). That is, in the theory of functions of a complex variable it is shown that
LEMMA. If z = z1 is a simple zero of Pn(z), then for |ε| sufficiently small Pn(z; ε) has a zero z1(ε), such that
z1(ε) = z1 − εQn(z1)/Pn′(z1) + 𝒪(ε²).
If z1 is a zero of multiplicity r of Pn(z), there are r neighboring zeros of Pn(z; ε) with
|z1(ε) − z1| = 𝒪(|ε|¹/r).
Now it is clear that in the case of a simple zero, z1, the computing problem, to determine the zero, might be well-posed if Pn′(z1) were not too small and Qn(z1) not too large, since then |z1(ε) − z1|/|ε| would not be large for small |ε|. On the other hand, the determination of the multiple root would most likely lead to a not well-posed computing problem.
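A numerical illustration of the Lemma (a sketch; NumPy assumed, and the polynomials are chosen only for illustration): a perturbation of size ε ≈ 10⁻⁹ moves a simple zero by about 10⁻⁹, but moves a triple zero by about ε¹/³ ≈ 10⁻³.

```python
import numpy as np

eps = 1.0e-9

# Simple zero: P(z) = z - 1 perturbed to z - 1 + eps; the zero moves by ~eps.
simple_shift = abs(np.roots([1.0, -1.0 + eps])[0] - 1.0)

# Triple zero: P(z) = (z - 1)^3 perturbed by +eps; the zeros move by ~eps^(1/3).
coeffs = [1.0, -3.0, 3.0, -1.0 + eps]           # coefficients of (z - 1)^3 + eps
triple_shift = max(abs(r - 1.0) for r in np.roots(coeffs))

print(simple_shift)    # ~1e-9
print(triple_shift)    # ~1e-3: about a million times larger response
```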
The latter example illustrates Observation (2), that is, a computing problem is not well-posed if the original mathematical problem is not well-posed. On the other hand, the example of the quadratic equation indicates how an ill-chosen formulation of an algorithm may be well-posed but yet inaccurate in single precision.
Given an ε > 0 and a problem (a) we do not, in general, know how to determine an algorithm, (1), that requires the least amount of work to find x so that ||x − y|| ≤ ε. This is an important aspect of algorithms for which there is no general mathematical theory. For most of the algorithms that are described in later chapters, we estimate the number of arithmetic operations required to find x.
PROBLEM, SECTION 3
1. For the quadratic equation
x² + 2bx + c = 0,
find the small root by using single precision arithmetic in the iterative schemes
and
If your computer has a mantissa with approximately t = 2p digits, use
c = 1, b = −10p
for the two initial values
Which scheme gives the smaller root to approximately t digits with the smaller number of iterations? Which scheme requires less work?
† Unless otherwise indicated, boldface type denotes column vectors. For example, an n-dimensional vector uk has the components uik; i.e.,
† For complex numbers x and y the elementary inequality |x + y| ≤ |x| + |y| expresses the fact that the length of any side of a triangle is not greater than the sum of the lengths of the other two sides.
† ek has the components eik, where eik = 0, i ≠ k; ekk = 1.
† A quantity, say f, is said to be 𝒪(δ), or briefly f = 𝒪(δ), iff for some constants K ≥ 0 and δ0 > 0,
|f| ≤ K|δ|,   for |δ| ≤ δ0.
† For simplicity we are neglecting the special case that occurs when d1 = d2 = … = dt = 9 and dt + 1 ≥ 5. Here we would increase the exponent q in (2) by unity and set δ1 = 1, δj = 0, j > 1. Note that when dt + 1 = 5, if we were to round up iff dt is odd, then an unbiased rounding procedure would result. Some electronic computers employ an unbiased rounding procedure (in a binary system).
2
Numerical Solution of Linear Systems and Matrix Inversion
0.INTRODUCTION
Finding the solution of a linear algebraic equation system of "large" order and calculating the inverse of a matrix of "large" order can be difficult numerical tasks. While in principle there are standard methods for solving such problems, the difficulties are practical and stem from
(a) the labor required in a lengthy sequence of calculations, and
(b) the possible loss of accuracy in such lengthy calculations performed with a fixed number of decimal places.
The first difficulty renders manual computation impractical and the second limits the applicability of high speed digital computers with fixed "word length." Thus to determine the feasibility of solving a particular problem with given equipment, several questions should be answered:
(i)How many arithmetic operations are required to apply a proposed method?
(ii)What will be the accuracy of a solution to be found by the proposed method (a priori estimate)?
(iii)How can the accuracy of the computed answer be checked (a posteriori estimate)?
The first question can frequently† be answered in a straightforward manner and this is done, by means of an "operational count," for most of the methods in this chapter. The third question can be easily answered if we have a bound for the norm of the inverse matrix. We therefore indicate, in SubSection 1.3, how such a bound may be obtained if we have an approximate inverse. However, the second question has only been recently answered for some methods. After discussing the notions of "well-posed problem" and "condition number" of a matrix, we give an account of Wilkinson’s a priori estimate for the Gaussian elimination method in SubSection 1.2. This treatment, in Section 1, of the Gaussian elimination method is followed, in Section 2, by a discussion of some modifications of the procedure. Direct factorization methods, which include Gaussian elimination as a special case, are described in Section 3. Iterative methods and techniques for accelerating them are studied in the remaining three sections.
The matrix inversion problem may be formulated as follows: Given a square matrix of order n,
find its inverse, i.e., a square matrix of order n, say A−1, such that
AA−1 = A−1A = I.
Here I is the nth order identity matrix whose elements are given by the Kronecker delta:
δij = 1 if i = j,   δij = 0 if i ≠ j.
It is well known that this problem has one and only one solution iff the determinant of A is non-zero (det A ≠ 0), i.e., iff A is non-singular.
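Both tasks are routinely handled by standard library routines. The sketch below (illustrative only; NumPy assumed, matrix and right-hand side hypothetical) computes an inverse A−1 satisfying AA−1 = A−1A = I and, for comparison, solves a linear system Ax = f directly, without forming the inverse.

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])       # diagonally dominant, hence non-singular
f = np.array([1.0, 2.0, 3.0])

# Matrix inversion problem: find A_inv with A @ A_inv = A_inv @ A = I.
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(3)))        # True (up to rounding)

# Linear system problem: find x with A x = f, solved without forming A_inv.
x = np.linalg.solve(A, f)
print(np.allclose(A @ x, f), np.allclose(x, A_inv @ f))
```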
The problem of solving a general linear system is formulated as follows: Given a square matrix A and an arbitrary n-component column vector f, find a vector x which satisfies
Ax = f,
or, in component form,
ai1x1 + ai2x2 + ⋯ + ainxn = fi,   i = 1, 2, …, n.
Again it is known that this problem has a solution which is unique for every inhomogeneous term f, iff A is non-singular. [If A is singular the system (4) will have a solution only for special vectors f and such a solution is not unique. The numerical solution of such "singular" problems is briefly touched on