Vector Extrapolation Methods
with Applications
Editor-in-Chief
Donald Estep, Colorado State University

Editorial Board
Daniela Calvetti, Case Western Reserve University
Paul Constantine, Colorado School of Mines
Omar Ghattas, University of Texas at Austin
Chen Greif, University of British Columbia
Jan S. Hesthaven, Ecole Polytechnique Fédérale de Lausanne
Johan Hoffman, KTH Royal Institute of Technology
David Keyes, Columbia University
J. Nathan Kutz, University of Washington
Ralph C. Smith, North Carolina State University
Charles F. Van Loan, Cornell University
Karen Willcox, Massachusetts Institute of Technology
Series Volumes
Sidi, Avram, Vector Extrapolation Methods with Applications
Borzì, A., Ciaramella, G., and Sprengel, M., Formulation and Numerical Solution of Quantum Control Problems
Benner, Peter, Cohen, Albert, Ohlberger, Mario, and Willcox, Karen, editors, Model Reduction and
Approximation: Theory and Algorithms
Kuzmin, Dmitri and Hämäläinen, Jari, Finite Element Methods for Computational Fluid Dynamics:
A Practical Guide
Rostamian, Rouben, Programming Projects in C for Students of Engineering, Science, and Mathematics
Smith, Ralph C., Uncertainty Quantification: Theory, Implementation, and Applications
Dankowicz, Harry and Schilder, Frank, Recipes for Continuation
Mueller, Jennifer L. and Siltanen, Samuli, Linear and Nonlinear Inverse Problems with Practical
Applications
Shapira, Yair, Solving PDEs in C++: Numerical Methods in a Unified Object-Oriented Approach,
Second Edition
Borzì, Alfio and Schulz, Volker, Computational Optimization of Systems Governed by Partial
Differential Equations
Ascher, Uri M. and Greif, Chen, A First Course in Numerical Methods
Layton, William, Introduction to the Numerical Analysis of Incompressible Viscous Flows
Ascher, Uri M., Numerical Methods for Evolutionary Differential Equations
Zohdi, T. I., An Introduction to Modeling and Simulation of Particulate Flows
Biegler, Lorenz T., Ghattas, Omar, Heinkenschloss, Matthias, Keyes, David, and van Bloemen Waanders, Bart,
editors, Real-Time PDE-Constrained Optimization
Chen, Zhangxin, Huan, Guanren, and Ma, Yuanle, Computational Methods for Multiphase Flows
in Porous Media
Shapira, Yair, Solving PDEs in C++: Numerical Methods in a Unified Object-Oriented Approach
10 9 8 7 6 5 4 3 2 1
All rights reserved. Printed in the United States of America. No part of this book may be
reproduced, stored, or transmitted in any manner without the written permission of the
publisher. For information, write to the Society for Industrial and Applied Mathematics,
3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA.
Trademarked names may be used in this book without the inclusion of a trademark
symbol. These names are used in an editorial context only; no infringement of trademark
is intended.
Preface xi
5 Epsilon Algorithms 99
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.2 SEA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.3 VEA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.4 TEA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.5 Implementation of epsilon algorithms in cycling mode . . . . . . . 114
5.6 ETEA: An economical implementation of TEA . . . . . . . . . . . 115
5.7 Comparison of epsilon algorithms with polynomial methods . . . 117
IV Appendices 359
A QR Factorization 361
A.1 Gram–Schmidt orthogonalization (GS) and QR factorization . . . 361
A.2 Modified Gram–Schmidt orthogonalization (MGS) . . . . . . . . . 364
A.3 MGS with reorthogonalization . . . . . . . . . . . . . . . . . . . . . . 365
Bibliography 405
Index 427
Preface
developments that took place in vector extrapolation methods up to the 1980s. Both
books treat vector extrapolation methods as part of the general topic of convergence
acceleration. The more recent book by Gander, Gander, and Kwok [93] briefly dis-
cusses a few of these methods as tools of scientific computing. So far, however, there
has not been a book that is fully dedicated to the subject of vector extrapolation meth-
ods and their applications. The present book will hopefully help to fill this void.
The main purpose of this book is to present a unified and systematic account of the
existing literature, old and new, on the theory and applications of vector extrapolation
methods that is as comprehensive and up-to-date as possible. In this account, I include
much of the original and relevant literature that deals with methods of practical impor-
tance whose effectiveness has been amply verified in various surveys and comparative
studies. I discuss the algebraic, computational/algorithmic, and analytical aspects of
the methods covered. The discussions are rigorous, and complete proofs are provided
in most places to make the reading flow better. I believe this treatment will help the
reader understand the thought process leading to the development of the individual
methods, why these methods work, how they work, and how they should be applied
for best results. Inevitably, the contents and the perspective of this book reflect my
personal interests and taste. Therefore, I apologize to those colleagues whose work
has not been covered.
Following the introduction and a review of substantial general and numerical lin-
ear algebra background in Chapter 0, which is needed throughout, this book is divided
into four parts:
(i) Part I reviews several vector extrapolation methods that are in use and that have
proved to be efficient convergence accelerators. These methods are divided into
two groups: (i) polynomial-type methods and (ii) epsilon algorithms.
Chapter 1 presents the development and algebraic properties of four polynomial-
type methods: minimal polynomial extrapolation (MPE), reduced rank extrapola-
tion (RRE), modified minimal polynomial extrapolation (MMPE), and the most re-
cent singular value decomposition–based minimal polynomial extrapolation (SVD-
MPE). Chapters 2 and 4 present computationally efficient and numerically stable
algorithms for these methods. The algorithms presented are also very econom-
ical as far as computer storage requirements are concerned; this issue is crucial
since most major applications of vector extrapolation methods are to very high
dimensional problems. Chapter 3 discusses some interesting relations between
MPE and RRE that were discovered recently. (Note that MPE and RRE are
essentially different from each other.)
Chapter 5 covers the three known epsilon algorithms: the scalar epsilon algo-
rithm (SEA), the vector epsilon algorithm (VEA), and the topological epsilon algo-
rithm (TEA).
Chapters 6 and 7 present unified convergence and convergence acceleration theo-
ries for MPE, RRE, MMPE, and TEA. Technically speaking, the contents of the
two chapters are quite involved. In these chapters, I have given detailed proofs
of some of the convergence theorems. I have decided to present the complete
proofs as part of this book since their techniques are quite general and are im-
mediately applicable in other problems as well. For example, the techniques of
proof developed in Chapter 6 have been used to prove some of the results pre-
sented in Chapters 12, 13, 14, and 16. (Of course, readers who do not want to
spend their time on the proofs can simply skip them and study only the state-
ments of the relevant convergence theorems and the remarks and explanations
that follow the latter.)
Chapter 8 discusses some interesting recursion relations that exist among the
vectors obtained from each of the methods MPE, RRE, MMPE, and TEA.
(ii) Part II reviews some of the developments related to Krylov subspace methods for
matrix problems, a most interesting topic of numerical linear algebra, to which
vector extrapolation methods are closely related.
Chapter 9 deals with Krylov subspace methods for solving linear systems since
these are related to MPE, RRE, and TEA when the latter are applied to vector
sequences that are generated by fixed-point iterative procedures for linear sys-
tems. In particular, it reviews the method of Arnoldi that is also known as the
full orthogonalization method (FOM), the method of generalized minimal residuals
(GMR), and the method of Lanczos. It discusses the method of conjugate gradi-
ents (CG) and the method of conjugate residuals (CR) in a unified manner. It also
discusses the biconjugate gradient algorithm (Bi-CG).
Chapter 10 deals with Krylov subspace methods for solving matrix eigenvalue
problems. It reviews the method of Arnoldi and the method of Lanczos for these
problems. These methods are also closely related to MPE and TEA.
(iii) Part III reviews some of the applications of vector extrapolation methods.
Chapter 11 presents some nonstandard uses for computing eigenvectors corre-
sponding to known eigenvalues (such as the PageRank of the Google Web ma-
trix) and for computing derivatives of eigenpairs. Another interesting applica-
tion concerns multidimensional scaling.
Chapter 12 deals with applying vector extrapolation methods to vector-valued
power series. When MPE, MMPE, and TEA are applied to sequences of vector-
valued polynomials that form the partial sums of vector-valued Maclaurin series,
they produce vector-valued rational approximations to the sums of these series.
Chapter 12 discusses the properties of these rational approximations. Chapter
13 presents additional methods based on ideas from Padé approximants for ob-
taining rational approximations from vector-valued power series. Chapter 14
gives some interesting applications of vector-valued rational approximations.
Chapter 15 briefly presents some of the current knowledge about vector gener-
alizations of some scalar extrapolation methods, a subject that has not yet been
explored fully.
Chapter 16 discusses vector-valued rational interpolation procedures in the com-
plex plane that are closely related to the methods developed in Chapter 12.
(iv) Part IV contains a number of appendices that supply background material used throughout the book. Among other topics, they cover the Rayleigh quotient inverse power method with variable shifts for the eigenvalue problem, a subject not treated in most books on numerical linear algebra.
A well-documented and well-tested FORTRAN 77 code for implementing MPE
and RRE in a unified manner is included in Appendix H.
The informed reader may pay attention to the fact that I have not included matrix
extrapolation methods in this book, even though they are definitely related to vector
extrapolation methods; I have pointed out some of the relevant literature on this topic,
however. In my discussion of Krylov subspace methods for linear systems, I have also
excluded the topic of semi-iterative methods, with Chebyshev iteration being the most
interesting representative. This subject is covered very extensively in the various books
and papers referred to in the relevant chapters. The main reason for both omissions,
which I regret very much, was the necessity to keep the size of the book in check.
Finally, I have not included any numerical examples since there are many of these in the
existing literature; the limitation I imposed on the size of the book was again the reason
for this omission. Nevertheless, I have pointed out some papers containing numerical
examples that illustrate the theoretical results presented in the different chapters.
I hope this book will serve as a reference for the more mathematically inclined re-
searchers in the area of vector extrapolation methods and for scientists and engineers
in different computational disciplines and as a textbook for students interested in un-
dertaking to study the subject seriously. Most of the mathematical background needed
to cope with the material is summarized in Chapter 0 and the appendices, and some is
provided as needed in the relevant chapters.
Before closing, I would like to express my deepest gratitude and appreciation to
my dear friends and colleagues Dr. William F. Ford of NASA Lewis Research Center
(today, NASA John H. Glenn Research Center at Lewis Field) and Professor David A.
Smith of Duke University, who introduced me to the general topic of vector extrap-
olation methods. Our fruitful collaboration began after I was invited by Dr. Ford to
Lewis Research Center to spend a sabbatical there during 1981–1983. Our first joint
work was summarized very briefly in the NASA technical memorandum [297] and
presented at the Thirtieth Anniversary Meeting of the Society for Industrial and Ap-
plied Mathematics, Stanford, California, July 19–23, 1982. This work was eventually
published as the NASA technical paper [298] and, later, as the journal paper [299]. I
consider it a privilege to acknowledge their friendship and their influence on my career
in this most interesting topic.
Lastly, I owe a debt of gratitude to my dear wife Carmella for her constant patience,
understanding, support, and encouragement while this book was being written. I ded-
icate this book to her with love.
Avram Sidi
Technion, Haifa
December 2016
Chapter 0

Introduction and Review of Linear Algebra
Now, a good way to approach and motivate vector extrapolation methods is within
the context of the fixed-point iterative solution of systems of equations. Because the
actual development of these methods proceeds via the solution of linear systems, we
devote the next section to a brief review of linear algebra, where we introduce the
notation that we employ throughout this work and state some important results from
matrix theory that we recall as we go along. Following these, in Sections 0.3 and 0.4 of
this chapter, we review the essentials of the fixed-point iterative solution of nonlinear
and, especially, linear systems in some detail. We advise the reader to study this chapter
with some care and become familiar with its contents before proceeding to the next
chapters.
0.2 Some linear algebra notation and background

Subspaces
• A subset $\mathcal{Y}$ of a vector space $\mathcal{X}$ is a subspace of $\mathcal{X}$ if it is a vector space itself.
• If $\mathcal{Y}$ and $\mathcal{Z}$ are subspaces of the vector space $\mathcal{X}$, then the set $\mathcal{Y} \cap \mathcal{Z}$ is a subspace of $\mathcal{X}$, and so is the set
$$\mathcal{Y} + \mathcal{Z} = \{\, z \in \mathcal{X} : z = x + y,\ x \in \mathcal{Y},\ y \in \mathcal{Z} \,\}.$$
We will denote the standard basis vectors in $\mathbb{C}^s$ by $e_i$. Thus $e_i$ has one as its $i$th component, the remaining components being zero. We will also denote by $e$ the vector whose components are all unity.
We denote the transpose of $x$ and the Hermitian conjugate of $x$, both row vectors, by $x^T$ and $x^*$, respectively, and these are given as $x^T = [x^{(1)}, \ldots, x^{(s)}]$ and $x^* = \bar{x}^T = [\bar{x}^{(1)}, \ldots, \bar{x}^{(s)}]$. Similarly, the transpose and the Hermitian conjugate of a matrix $A$ are denoted by $A^T$ and $A^*$, and
$$(A^T)_{ij} = (A)_{ji} \quad \forall\, i, j.$$
In addition,
$$A^* = \bar{A}^T = \overline{A^T}.$$
For a square matrix $A$, the matrix $A_H = \tfrac{1}{2}(A + A^*)$ is the Hermitian part of $A$, and $A_S = \tfrac{1}{2}(A - A^*)$ is the skew-Hermitian part of $A$. The symmetric part and the skew-symmetric part of a real square matrix can be defined analogously.
If $A = [a_{ij}]_{1 \le i, j \le s}$ is a square matrix, we will define the matrix $\mathrm{diag}(A) \in \mathbb{C}^{s \times s}$ via
$$\mathrm{diag}(A) = \mathrm{diag}(a_{11}, a_{22}, \ldots, a_{ss}),$$
and we will define $\mathrm{tr}(A)$, the trace of $A$, as
$$\mathrm{tr}(A) = \sum_{i=1}^{s} a_{ii}.$$
• A matrix $A \in \mathbb{C}^{r \times s}$ whose elements below (above) the main diagonal are all zero is said to be upper triangular (lower triangular).²
• If $A = [a_{ij}]_{1 \le i,j \le s}$ and $B = [b_{ij}]_{1 \le i,j \le s}$ are upper (lower) triangular, then $AB = C = [c_{ij}]_{1 \le i,j \le s}$ is upper (lower) triangular too, and $c_{ii} = a_{ii} b_{ii}$ for all $i$.
Partitioning of matrices
We will make frequent use of matrix partitionings in different places.
• If $A = [a_{ij}]_{1\le i\le r,\ 1\le j\le s} \in \mathbb{C}^{r \times s}$, then we will denote the $i$th row and the $j$th column of $A$ by $\boldsymbol{a}_i^T$ and $\boldsymbol{a}_j$, respectively.

Matrix-vector multiplication
Let $x = [x^{(1)}, \ldots, x^{(s)}]^T$ and let $A = [a_{ij}]_{1\le i\le r,\ 1\le j\le s} \in \mathbb{C}^{r \times s}$, with the columnwise partitioning in (0.3). Then $z = Ax$ can be computed as follows:
• Row version: $z^{(i)} = \sum_{j=1}^{s} a_{ij} x^{(j)}$, $i = 1, \ldots, r$.
• Column version: $z = \sum_{j=1}^{s} x^{(j)} \boldsymbol{a}_j$.

² When $r < s$ ($r > s$), $A$ is said to be upper trapezoidal (lower trapezoidal) too.
³ Note that if $x = [x^{(1)}, \ldots, x^{(r)}]^T \in \mathbb{C}^r$ and $y = [y^{(1)}, \ldots, y^{(s)}]^T \in \mathbb{C}^s$, then $Z = x y^T \in \mathbb{C}^{r \times s}$, with $z_{ij} = (Z)_{ij} = x^{(i)} y^{(j)}$.
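To make the two formulations concrete, here is a small illustrative sketch in Python/NumPy (not part of the original text) that computes $z = Ax$ both ways and checks the results against the built-in product.

```python
import numpy as np

rng = np.random.default_rng(0)
r, s = 4, 3
A = rng.standard_normal((r, s))
x = rng.standard_normal(s)

# Row version: z^(i) = sum_j a_ij x^(j)
z_row = np.array([A[i, :] @ x for i in range(r)])

# Column version: z = sum_j x^(j) a_j, where a_j is the jth column of A
z_col = sum(x[j] * A[:, j] for j in range(s))

assert np.allclose(z_row, A @ x) and np.allclose(z_col, A @ x)
```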
• We also have
$$\mathcal{R}(A^*A) = \mathcal{R}(A^*), \qquad \mathcal{N}(A^*A) = \mathcal{N}(A).$$
• For $A \in \mathbb{C}^{r \times s}$, the orthogonal complement of $\mathcal{R}(A)$, denoted by $\mathcal{R}(A)^\perp$, is defined as
$$\mathcal{R}(A)^\perp = \{\, y \in \mathbb{C}^r : y^* x = 0 \text{ for all } x \in \mathcal{R}(A) \,\}.$$
Then every vector in $\mathbb{C}^r$ is the sum of a vector in $\mathcal{R}(A)$ and another vector in $\mathcal{R}(A)^\perp$; that is,
$$\mathbb{C}^r = \mathcal{R}(A) \oplus \mathcal{R}(A)^\perp.$$
In addition,
$$\mathcal{R}(A)^\perp = \mathcal{N}(A^*).$$
• $A \in \mathbb{C}^{s \times s}$ has exactly $s$ eigenvalues, which are the roots of its characteristic polynomial $R(\lambda)$, defined by
$$R(\lambda) = \det(\lambda I - A).$$
Note that $R(\lambda)$ is of degree exactly $s$ with leading coefficient one. Thus, if $\lambda_1, \ldots, \lambda_q$ are the distinct roots of $R(\lambda)$, then $R(\lambda) = \prod_{i=1}^{q} (\lambda - \lambda_i)^{r_i}$, where $\lambda_i \ne \lambda_j$ if $i \ne j$ and $\sum_{i=1}^{q} r_i = s$. $r_i$ is called the algebraic multiplicity of $\lambda_i$.
The eigenvectors of A∗A (of AA∗ ) are called the right (left) singular vectors of A.
Thus, σi (A) = |λi (A)| if A is normal.
⁴ When $A$ is nondiagonalizable or defective, there is a nonsingular matrix $V$ such that the matrix $J = V^{-1} A V$, called the Jordan canonical form of $A$, is almost diagonal. We will deal with this general case in detail later via Theorem 0.1 in Section 0.4.
Vector norms
• We use $\|\cdot\|$ to denote vector norms in $\mathbb{C}^s$. Vector norms satisfy the following conditions:
1. $\|x\| \ge 0$ for all $x \in \mathbb{C}^s$; $\|x\| = 0$ if and only if $x = 0$.
2. $\|\gamma x\| = |\gamma|\, \|x\|$ for every $\gamma \in \mathbb{C}$ and $x \in \mathbb{C}^s$.
3. $\|x + y\| \le \|x\| + \|y\|$ for every $x, y \in \mathbb{C}^s$.
• With $x = [x^{(1)}, \ldots, x^{(s)}]^T$, the $l_p$-norm in $\mathbb{C}^s$ is defined via
$$\|x\|_p = \bigg( \sum_{i=1}^{s} |x^{(i)}|^p \bigg)^{1/p}, \quad 1 \le p < \infty; \qquad \|x\|_\infty = \max_i |x^{(i)}|, \quad p = \infty.$$
• If $A \in \mathbb{C}^{r \times s}$ and $x \in \mathbb{C}^s$, $x \ne 0$, then
$$\min_i \sigma_i(A) \le \frac{\|Ax\|_2}{\|x\|_2} \le \max_i \sigma_i(A).$$
Matrix norms
• Matrix norms will also be denoted by $\|\cdot\|$. Since matrices in $\mathbb{C}^{r \times s}$ can be viewed also as vectors in $\mathbb{C}^{rs}$ (that is, the matrix space $\mathbb{C}^{r \times s}$ is isomorphic to the vector space $\mathbb{C}^{rs}$), we can define matrix norms just as we define vector norms, by the following three conditions:
1. $\|A\| \ge 0$ for all $A \in \mathbb{C}^{r \times s}$; $\|A\| = 0$ if and only if $A = O$.
2. $\|\gamma A\| = |\gamma|\, \|A\|$ for every $\gamma \in \mathbb{C}$ and $A \in \mathbb{C}^{r \times s}$.
3. $\|A + B\| \le \|A\| + \|B\|$ for every $A, B \in \mathbb{C}^{r \times s}$.
The matrix norms we will be using are generally the natural norms (or induced norms or subordinate norms) that are defined via
$$\|A\|_{(a,b)} = \max_{x \ne 0} \frac{\|Ax\|_{(a)}}{\|x\|_{(b)}},$$
where $A \in \mathbb{C}^{r \times s}$ and $x \in \mathbb{C}^s$, and $\|Ax\|_{(a)}$ and $\|x\|_{(b)}$ are the vector norms in $\mathbb{C}^r$ and $\mathbb{C}^s$, respectively. Note that this maximum exists and is achieved for some nonzero vector $x_0$. We say that the matrix norm $\|\cdot\|_{(a,b)}$ is induced by, or is subordinate to, the vector norms $\|\cdot\|_{(a)}$ and $\|\cdot\|_{(b)}$. (Here, $\|\cdot\|_{(a)}$ and $\|\cdot\|_{(b)}$ are not to be confused with the $l_p$-norms.) With this notation, induced matrix norms satisfy the following fourth condition, in addition to the above three:
4. $\|AB\|_{(a,c)} \le \|A\|_{(a,b)}\, \|B\|_{(b,c)}$, with $A \in \mathbb{C}^{r \times s}$ and $B \in \mathbb{C}^{s \times t}$.
There are also matrix norms that are not natural norms and that satisfy the fourth condition. Matrix norms, whether natural or not, that satisfy the fourth condition are said to be multiplicative. In addition, multiplicative matrix norms satisfy
$$\|A_1 A_2 \cdots A_k\| \le \prod_{i=1}^{k} \|A_i\| \quad \text{and} \quad \|I\| \ge 1.$$
• When $A$ is a square matrix and the vector norms $\|\cdot\|_{(a)}$ and $\|\cdot\|_{(b)}$ are the same, we let $\|A\|_{(a)}$ stand for $\|A\|_{(a,a)}$; in this case, we also say that the matrix norm $\|\cdot\|_{(a)}$ is induced by the vector norm $\|\cdot\|_{(a)}$.
• A matrix norm that is multiplicative but not natural and that is used frequently in applications is the Frobenius or Schur norm. For $A \in \mathbb{C}^{r \times s}$, $A = [a_{ij}]$ as in (0.2), this norm is defined by
$$\|A\|_F = \sqrt{\sum_{i=1}^{r} \sum_{j=1}^{s} |a_{ij}|^2} = \sqrt{\mathrm{tr}(A^*A)} = \sqrt{\mathrm{tr}(AA^*)}.$$
We also have
$$\|UAV\|_F = \|A\|_F \quad \text{if } U \in \mathbb{C}^{r \times r} \text{ and } V \in \mathbb{C}^{s \times s} \text{ are unitary.}$$
• The natural matrix norms and spectral radii of square matrices satisfy
$$\rho(A) \le \|A\|.$$
In addition, given $\varepsilon > 0$, there exists a vector norm $\|\cdot\|$ that depends on $A$ and $\varepsilon$ such that the matrix norm induced by it satisfies
$$\|A\| \le \rho(A) + \varepsilon.$$
The two inequalities between $\|A\|$ and $\rho(A)$ become an equality when we have the following:
(i) $A$ is normal and $\|A\| = \|A\|_2$, since $\|A\|_2 = \rho(A)$ in this case.
(ii) $A = \mathrm{diag}(d_1, \ldots, d_s)$ and $\|A\| = \|A\|_p$, with arbitrary $p$, for, in this case, $\|A\|_p = \max_i |d_i| = \rho(A)$.
• The natural matrix norms and spectral radii of square matrices also satisfy
$$\rho(A) = \lim_{k \to \infty} \|A^k\|^{1/k}.$$
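As an illustration (a small Python/NumPy sketch, not part of the original text), the snippet below checks numerically that $\rho(A) \le \|A\|$ for a few natural norms and that $\|A^k\|^{1/k}$ approaches $\rho(A)$ as $k$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
A *= 0.9 / max(abs(np.linalg.eigvals(A)))      # rescale so that rho(A) = 0.9
rho = max(abs(np.linalg.eigvals(A)))

# rho(A) <= ||A|| for the induced 1-, 2-, and infinity-norms
print([bool(rho <= np.linalg.norm(A, p)) for p in (1, 2, np.inf)])

# ||A^k||^(1/k) -> rho(A) as k -> infinity
for k in (10, 50, 200):
    print(k, np.linalg.norm(np.linalg.matrix_power(A, k), 2) ** (1.0 / k), rho)
```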
Condition numbers
• The condition number of a nonsingular square matrix $A$ relative to a natural matrix norm $\|\cdot\|$ is defined by
$$\kappa(A) = \|A\|\, \|A^{-1}\|.$$
• If the natural norm is induced by the vector $l_p$-norm, we denote $\kappa(A)$ by $\kappa_p(A)$. For $l_2$-norms, we have the following:
$$\kappa_2(A) = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}, \quad A \text{ arbitrary}, \qquad \kappa_2(A) = \frac{|\lambda_{\max}(A)|}{|\lambda_{\min}(A)|}, \quad A \text{ normal}.$$
Here $\sigma_{\min}(A)$ and $\sigma_{\max}(A)$ are the smallest and largest singular values of $A$. Similarly, $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ are the smallest and largest eigenvalues of $A$ in modulus. In addition,
$$\kappa_p(A) \ge \frac{\max_i |a_{ii}|}{\min_i |a_{ii}|}, \quad 1 \le p \le \infty.$$
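A quick numerical check of these formulas (an illustrative Python/NumPy sketch, not from the book): for a random matrix, $\kappa_2(A)$ computed from the singular values agrees with the built-in condition number, and for a Hermitian (hence normal) matrix it equals the ratio of the extreme eigenvalue moduli.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
sv = np.linalg.svd(A, compute_uv=False)
print(np.isclose(sv.max() / sv.min(), np.linalg.cond(A, 2)))

# For a normal (here real symmetric) matrix, kappa_2 = |lambda_max| / |lambda_min|
H = A + A.T
ev = np.abs(np.linalg.eigvals(H))
print(np.isclose(ev.max() / ev.min(), np.linalg.cond(H, 2)))
```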
Inner products
• We will use $(\cdot\,,\cdot)$ to denote inner products (or scalar products). Thus, $(x, y)$, with $x, y \in \mathbb{C}^s$, denotes an inner product in $\mathbb{C}^s$. Inner products satisfy the following conditions:
1. $(y, x) = \overline{(x, y)}$ for all $x, y \in \mathbb{C}^s$.
2. $(x, x) \ge 0$ for all $x \in \mathbb{C}^s$, and $(x, x) = 0$ if and only if $x = 0$.
3. $(\alpha x, \beta y) = \bar{\alpha} \beta\, (x, y)$ for $x, y \in \mathbb{C}^s$ and $\alpha, \beta \in \mathbb{C}$.
4. $(x, \beta y + \gamma z) = \beta (x, y) + \gamma (x, z)$ for $x, y, z \in \mathbb{C}^s$ and $\beta, \gamma \in \mathbb{C}$.
• We say that the vectors $x$ and $y$ are orthogonal to each other if $(x, y) = 0$.
• For any inner product $(\cdot\,,\cdot)$, the vector norm $\|\cdot\|$ induced by it, and any two vectors $x, y \in \mathbb{C}^s$, we have
$$|(x, y)| \le \|x\|\, \|y\|.$$
Equality holds if and only if $x$ and $y$ are linearly dependent. This is a more general version of the Cauchy–Schwarz inequality mentioned earlier.
• The standard Euclidean inner product in $\mathbb{C}^s$ and the vector norm induced by it are defined by, respectively,
$$(x, y) = x^* y \equiv \langle x, y \rangle \quad \text{and} \quad \|z\| = \sqrt{\langle z, z \rangle} \equiv \|z\|_2.$$
The standard Euclidean inner product is used to define the angle $\angle(x, y)$ between two nonzero vectors $x$ and $y$ as follows:
$$\cos \angle(x, y) = \frac{|\langle x, y \rangle|}{\|x\|_2\, \|y\|_2}, \qquad 0 \le \angle(x, y) \le \frac{\pi}{2}.$$
• The most general inner product in $\mathbb{C}^s$ and the vector norm induced by it are, respectively,
$$(x, y) = x^* M y \equiv (x, y)_M \quad \text{and} \quad \|z\| = \sqrt{(z, z)_M} \equiv \|z\|_M,$$
where $M \in \mathbb{C}^{s \times s}$ is a Hermitian positive definite matrix.
Least-squares solution
• Consider the linear system $Ax = b$, with $A \in \mathbb{C}^{r \times s}$, $r \ge s$, of full column rank. Its least-squares solution, which minimizes $\|Ax - b\|_2$, satisfies the normal equations
$$A^*A\, x = A^* b,$$
and hence is given by
$$x = (A^*A)^{-1} A^* b.$$
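The following sketch (illustrative only, not from the book; it assumes $A$ has full column rank) solves an overdetermined system via the normal equations and compares the result with a library least-squares solver. In practice, forming $A^*A$ squares the condition number, which is why QR- or SVD-based solvers are usually preferred.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((20, 5))      # full column rank with probability 1
b = rng.standard_normal(20)

# Normal equations: (A* A) x = A* b
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# Reference: SVD-based least-squares solution
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_ne, x_ls))
```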
0.3 Fixed-point iterative methods for nonlinear systems
Consider the system of nonlinear equations
$$\psi(x) = 0, \qquad \psi : \mathbb{C}^N \to \mathbb{C}^N, \tag{0.4}$$
where
$$\psi(x) = \big[\psi_1(x), \ldots, \psi_N(x)\big]^T; \quad \psi_i(x) = \psi_i\big(x^{(1)}, \ldots, x^{(N)}\big) \ \text{scalar functions}.$$
Such a system can be solved by a fixed-point iterative method of the form
$$x_{m+1} = f(x_m), \quad m = 0, 1, \ldots, \tag{0.5}$$
with
$$f(x) = \big[f_1(x), \ldots, f_N(x)\big]^T; \quad f_i(x) = f_i\big(x^{(1)}, \ldots, x^{(N)}\big) \ \text{scalar functions},$$
where $f$ is chosen such that the solution $s$ of (0.4) is a fixed point of $f$; for example,
$$f(x) = x + C(x)\, \psi(x),$$
with $C(x)$ a nonsingular matrix. Expanding $f(x)$ about the solution $s$, and letting
$$f_{i,j}(s) = \frac{\partial f_i}{\partial x^{(j)}}\bigg|_{x = s}, \quad i, j = 1, \ldots, N,$$
we consequently have
$$f(x) = f(s) + F(s)(x - s) + O\big(\|x - s\|^2\big) \quad \text{as } x \to s, \tag{0.6}$$
where $F(x)$ is the Jacobian matrix of the vector-valued function $f(x)$, given as
$$F(x) = \begin{bmatrix}
f_{1,1}(x) & f_{1,2}(x) & \cdots & f_{1,N}(x) \\
f_{2,1}(x) & f_{2,2}(x) & \cdots & f_{2,N}(x) \\
\vdots & \vdots & & \vdots \\
f_{N,1}(x) & f_{N,2}(x) & \cdots & f_{N,N}(x)
\end{bmatrix}. \tag{0.7}$$
By (0.8), we realize that the vectors $x_m$ and $x_{m+1}$ satisfy the approximate equality
$$x_{m+1} - s \approx F(s)(x_m - s), \quad \text{equivalently} \quad x_{m+1} \approx F(s)\, x_m + [I - F(s)]\, s, \quad \text{for all large } m.$$
That is, for all large $m$, the sequence $\{x_m\}$ behaves as if it were being generated by an $N$-dimensional linear system of the form $(I - T)x = d$ through
$$x_{m+1} = T x_m + d, \quad m = 0, 1, \ldots, \tag{0.9}$$
where $T = F(s)$ and $d = [I - F(s)]\, s$. This suggests that we should study those sequences $\{x_m\}$ that arise from linear systems of equations to derive and study vector extrapolation methods. We undertake this task in the next section. (A small numerical sketch of such a linearly generated sequence is given below.)
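The following Python/NumPy sketch (illustrative only, not part of the original text) generates $x_{m+1} = T x_m + d$ for a matrix $T$ with $\rho(T) < 1$ and verifies that the iterates approach $s = (I - T)^{-1} d$, with the error decaying roughly like $\rho(T)^m$.

```python
import numpy as np

def fixed_point_iterates(T, d, x0, m):
    """Return x_0, ..., x_m generated by x_{k+1} = T x_k + d."""
    xs = [x0]
    for _ in range(m):
        xs.append(T @ xs[-1] + d)
    return np.array(xs)

rng = np.random.default_rng(4)
N = 8
T = rng.standard_normal((N, N))
T *= 0.6 / max(abs(np.linalg.eigvals(T)))     # scale so that rho(T) = 0.6
d = rng.standard_normal(N)
s = np.linalg.solve(np.eye(N) - T, d)         # exact fixed point

xs = fixed_point_iterates(T, d, np.zeros(N), 60)
errors = np.linalg.norm(xs - s, axis=1)
print(errors[::10])                           # decays roughly like 0.6**m
```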
Now, the rate of convergence (to $s$) of the sequence $\{x_m\}$ above is determined by $\rho(F(s))$, the spectral radius of $F(s)$. It is known that $\rho(F(s)) < 1$ must hold for convergence to take place and that, the closer $\rho(F(s))$ is to zero, the faster the convergence. The rate of convergence deteriorates as $\rho(F(s))$ becomes closer to one, however.
As an example, let us consider the cases in which (0.4) and (0.5) arise from finite-
difference or finite-element discretizations of continuum problems. For s [the solution
to (0.4) and (0.5)] to be a reasonable approximation to the solution of the continuum
problem, the mesh size of the discretization must be small enough. However, a small
mesh size means a large N. In addition, as the mesh size tends to zero, hence N → ∞, generally $\rho(F(s))$ tends to one, as can be shown rigorously in some cases. All
this means that, when the mesh size decreases, not only does the dimension of the
problem increase, the convergence of the fixed-point method in (0.5) deteriorates as
well. As mentioned above, this problem of slow convergence can be treated efficiently
via vector extrapolation methods.
Remark: From our discussion of the nature of the iterative methods for nonlinear
systems, it is clear that the vector-valued functions ψ(x) and f (x) above are assumed
to be differentiable at least twice in a neighborhood of the solution s.
0.4 Fixed-point iterative methods for linear systems
Consider now the linear system $Ax = b$ [referred to as (0.10)], where $A \in \mathbb{C}^{N \times N}$ is nonsingular, and split $A$ as $A = M - N$ with $M$ nonsingular. Starting with an initial vector $x_0$, this splitting gives rise to the fixed-point iteration
$$M x_{m+1} = N x_m + b, \quad m = 0, 1, \ldots, \tag{0.14}$$
that is,
$$x_{m+1} = T x_m + d, \quad m = 0, 1, \ldots; \qquad T = M^{-1} N, \quad d = M^{-1} b. \tag{0.15}$$
Note that the matrix T cannot have one as one of its eigenvalues since A = M (I − T )
and A is nonsingular. [Since we have to solve the equations in (0.14) many times, we
need to choose M such that the solution of these equations is much less expensive than
the solution of (0.10).]
Now, we would like the sequence {x m } to converge to s. The subject of conver-
gence can be addressed in terms of ρ(T ), the spectral radius of T , among others. Ac-
tually, we have the following result.
Theorem 0.1. Let s be the (unique) solution to the system x = T x + d, where T does not
have one as one of its eigenvalues, and let the sequence {x m } be generated as in
x m+1 = T x m + d, m = 0, 1, . . . , (0.16)
starting with some initial vector x 0 . A necessary and sufficient condition for {x m } to
converge to s from arbitrary x 0 is ρ(T ) < 1.
Indeed, since $s$ solves $x = T x + d$, we have
$$s = T s + d. \tag{0.17}$$
Subtracting (0.17) from (0.16) gives $x_{m+1} - s = T(x_m - s)$, and hence
$$x_m - s = T^m (x_0 - s), \quad m = 0, 1, \ldots. \tag{0.19}$$
Thus, $\{x_m\}$ converges to $s$ for every $x_0$ if and only if $\lim_{m\to\infty} T^m = O$, and this holds if and only if $\rho(T) < 1$, as the following analysis of $T^m$ shows.
Let $J = V^{-1} T V$ be the Jordan canonical form of $T$, with Jordan blocks $J_{r_i}(\lambda_i)$ of sizes $r_i$, $i = 1, \ldots, q$. Then
$$T^m = (V J V^{-1})^m = V J^m V^{-1}, \tag{0.23}$$
where
$$J^m = \begin{bmatrix}
[J_{r_1}(\lambda_1)]^m & & & \\
& [J_{r_2}(\lambda_2)]^m & & \\
& & \ddots & \\
& & & [J_{r_q}(\lambda_q)]^m
\end{bmatrix}. \tag{0.24}$$
It is clear that $\lim_{m\to\infty} J^m = O$ implies that $\lim_{m\to\infty} T^m = O$ by (0.23). Conversely, by $J^m = V^{-1} T^m V$, $\lim_{m\to\infty} T^m = O$ implies that $\lim_{m\to\infty} J^m = O$, which implies that $\lim_{m\to\infty} [J_{r_i}(\lambda_i)]^m = O$ for each $i$.⁶ Therefore, it is enough to study $[J_r(\lambda)]^m$.
First, when $r = 1$,
$$[J_1(\lambda)]^m = [\lambda^m]. \tag{0.25}$$
As a result, $\lim_{m\to\infty} [J_1(\lambda)]^m = O$ if and only if $|\lambda| < 1$.
For $r > 1$, let us write
$$J_r(\lambda) = \lambda I_r + E_r, \qquad E_r = \begin{bmatrix}
0 & 1 & & \\
& 0 & \ddots & \\
& & \ddots & 1 \\
& & & 0
\end{bmatrix} \in \mathbb{C}^{r \times r}, \tag{0.26}$$
so that
$$[J_r(\lambda)]^m = (\lambda I_r + E_r)^m = \lambda^m I_r + \sum_{i=1}^{m} \binom{m}{i} \lambda^{m-i} E_r^i. \tag{0.27}$$
Now, observe that, when $k < r$, the only nonzero elements of $E_r^k$ are $(E_r^k)_{i,k+i} = 1$, $i = 1, \ldots, r-k$, and that $E_r^r = O$.⁷ Then
$$E_r^m = O \quad \text{if } m \ge r, \tag{0.28}$$

⁶ For a sequence of matrices $\{B_m\}_{m=0}^{\infty} \subset \mathbb{C}^{r \times s}$, by $\lim_{m\to\infty} B_m = O$ we mean that $B_m \to O$ as $m \to \infty$ entrywise, that is, $\lim_{m\to\infty} (B_m)_{ij} = 0$, $1 \le i \le r$, $1 \le j \le s$, simultaneously.
⁷ For example,
$$E_4 = \begin{bmatrix} 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1\\ 0&0&0&0 \end{bmatrix}, \quad
E_4^2 = \begin{bmatrix} 0&0&1&0\\ 0&0&0&1\\ 0&0&0&0\\ 0&0&0&0 \end{bmatrix}, \quad
E_4^3 = \begin{bmatrix} 0&0&0&1\\ 0&0&0&0\\ 0&0&0&0\\ 0&0&0&0 \end{bmatrix}, \quad
E_4^4 = O.$$
and hence
$$[J_r(\lambda)]^m = \lambda^m I_r + \sum_{i=1}^{r-1} \binom{m}{i} \lambda^{m-i} E_r^i
= \begin{bmatrix}
\lambda^m & \binom{m}{1}\lambda^{m-1} & \binom{m}{2}\lambda^{m-2} & \cdots & \binom{m}{r-1}\lambda^{m-r+1} \\
& \lambda^m & \binom{m}{1}\lambda^{m-1} & \cdots & \binom{m}{r-2}\lambda^{m-r+2} \\
& & \lambda^m & \cdots & \binom{m}{r-3}\lambda^{m-r+3} \\
& & & \ddots & \vdots \\
& & & & \lambda^m
\end{bmatrix}. \tag{0.29}$$
By the fact that $\binom{m}{l} = 0$ when $l > m$, note that (0.29) is valid for all $m = 0, 1, \ldots$. In addition, because
$$\binom{m}{k} = \frac{m(m-1)\cdots(m-k+1)}{k!} = \sum_{j=0}^{k} c_j m^j, \qquad c_k = \frac{1}{k!},$$
for $\lambda \ne 0$, the most dominant entry as $m \to \infty$ in $[J_r(\lambda)]^m$ is $\binom{m}{r-1} \lambda^{m-r+1}$, which is $O(m^{r-1} \lambda^m)$. Therefore, $\lim_{m\to\infty} [J_r(\lambda)]^m = O$ if and only if $|\lambda| < 1$.
Remarks:
1. Jordan blocks of size ri > 1 occur when the matrix T is not diagonalizable.
(Nondiagonalizable matrices are also said to be defective.)
By (0.26) and (0.28),
$$[J_r(\lambda) - \lambda I_r]^k = E_r^k = \begin{cases} \ne O & \text{if } k < r, \\ O & \text{if } k \ge r. \end{cases} \tag{0.30}$$
4. It is clear from (0.19) that, the faster T m tends to O, the faster the convergence
of {x m } to s. The rate of convergence of T m to O improves as ρ(T ) decreases,
as is clear from (0.24) and (0.29).
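The behavior in (0.29) can be observed directly. The following sketch (illustrative Python/NumPy, not from the book) forms $J_r(\lambda)$ for $r = 4$ and $|\lambda| < 1$ and confirms that $\|[J_r(\lambda)]^m\|$ behaves like $m^{r-1} |\lambda|^m$, so it still tends to zero despite the polynomial factor.

```python
import numpy as np

def jordan_block(lam, r):
    """r x r Jordan block J_r(lambda) = lambda*I_r + E_r."""
    return lam * np.eye(r) + np.diag(np.ones(r - 1), k=1)

lam, r = 0.8, 4
J = jordan_block(lam, r)
for m in (10, 50, 100, 200):
    Jm = np.linalg.matrix_power(J, m)
    ratio = np.linalg.norm(Jm, 2) / (m ** (r - 1) * abs(lam) ** m)
    print(m, np.linalg.norm(Jm, 2), ratio)   # the ratio settles near a constant as m grows
```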
When $T$ is diagonalizable, it has $N$ linearly independent eigenvectors $v_i$ with corresponding eigenvalues $\lambda_i$,
$$T v_i = \lambda_i v_i. \tag{0.31}$$
Expanding $x_0 - s = \sum_{i=1}^{N} \alpha_i v_i$ in this eigenvector basis and invoking (0.19), we then have
$$x_m - s = \sum_{i=1}^{N} \alpha_i \lambda_i^m v_i, \quad m = 1, 2, \ldots. \tag{0.33}$$
In the general (possibly defective) case, let $V$ be the matrix of the Jordan canonical form $J = V^{-1} T V$, partitioned by Jordan blocks as
$$V = [\, V_1 \,|\, V_2 \,|\, \cdots \,|\, V_q \,], \quad V_i \in \mathbb{C}^{N \times r_i}, \quad i = 1, \ldots, q, \tag{0.34}$$
$$V_i = [\, v_{i1} \,|\, v_{i2} \,|\, \cdots \,|\, v_{i r_i} \,] \in \mathbb{C}^{N \times r_i}, \quad i = 1, \ldots, q. \tag{0.35}$$
The columns of $V_i$ are the eigenvector and principal vectors associated with $\lambda_i$ and satisfy
$$T v_{i1} = \lambda_i v_{i1}, \quad r_i \ge 1; \qquad T v_{ij} = \lambda_i v_{ij} + v_{i,j-1}, \quad j = 2, \ldots, r_i, \quad r_i > 1. \tag{0.36}$$
Consequently,
$$\big(T - \lambda_i I\big)^k v_{ij} = \begin{cases} v_{i,j-k} \ne 0 & \text{if } k < j, \\ 0 & \text{if } k \ge j. \end{cases} \tag{0.37}$$
Expanding the initial error $x_0 - s$ in terms of the eigenvectors and principal vectors of $T$ as
$$x_0 - s = \sum_{i=1}^{q} \sum_{j=1}^{r_i} \alpha_{ij} v_{ij}, \tag{0.38}$$
we have the following: there exist a vector $z_m$ associated with the zero eigenvalues and vector-valued polynomials $p_i(m)$ associated with the respective nonzero eigenvalues $\lambda_i$, given by
$$z_m = \sum_{\substack{i=1\\ \lambda_i = 0}}^{q} \sum_{j=m+1}^{r_i} \alpha_{ij}\, v_{i,j-m}, \tag{0.39}$$
$$p_i(m) = \sum_{l=0}^{r_i - 1} \binom{m}{l} a_{il}, \qquad a_{il} = \lambda_i^{-l} \sum_{j=l+1}^{r_i} \alpha_{ij}\, v_{i,j-l}, \quad \lambda_i \ne 0, \tag{0.40}$$
such that
$$x_m - s = z_m + \sum_{\substack{i=1\\ \lambda_i \ne 0}}^{q} p_i(m)\, \lambda_i^m. \tag{0.41}$$
λi =0
The first double summation represents the contribution of the zero eigenvalues of T
to x m − s, while the second represents the contribution of the nonzero eigenvalues of
T . We therefore need to analyze T m v i j for λi = 0 and for λi = 0 separately.
We start with the contribution of the zero eigenvalues λi . Letting λi = 0 in (0.37),
we have T m v i j = v i , j −m , where we also mean that v i k = 0 when k ≤ 0. Substituting
q r
this into the summation i =1 j i=1 , we obtain the vector z m as the contribution of
the zero eigenvalues. λi =0
We now turn to the contribution of the nonzero eigenvalues λi , which turns out
to be more complicated. By (0.23), we have T m V = V J m . Now, by (0.34),
T m V = [ T m V 1 | T m V 2 | · · · | T m V q ],
T m V i = V i [J ri (λi )] m , i = 1, . . . , q.
Now,
$$T^m V_i = [\, T^m v_{i1} \,|\, T^m v_{i2} \,|\, \cdots \,|\, T^m v_{i r_i} \,].$$
Equating the $j$th column of the matrix $T^m V_i$, namely, the vector $T^m v_{ij}$, with the $j$th column of $V_i [J_{r_i}(\lambda_i)]^m$ by invoking (0.29), we obtain
$$T^m v_{ij} = \sum_{k=1}^{j} \binom{m}{j-k} v_{ik}\, \lambda_i^{m-j+k}. \tag{0.43}$$
Substituting (0.43) into the summation $\sum_{i=1,\ \lambda_i \ne 0}^{q} \sum_{j=1}^{r_i}$ and rearranging, we obtain
$$\sum_{\substack{i=1\\ \lambda_i \ne 0}}^{q} \sum_{l=0}^{r_i - 1} \binom{m}{l} \sum_{j=l+1}^{r_i} \alpha_{ij}\, v_{i,j-l}\, \lambda_i^{m-l} = \sum_{\substack{i=1\\ \lambda_i \ne 0}}^{q} p_i(m)\, \lambda_i^m.$$
Remarks:
1. Since $\binom{m}{l}$ is a polynomial in $m$ of degree exactly $l$, $p_i(m)$ is a polynomial in $m$ of degree at most $r_i - 1$.
2. Naturally, z m = 0 for all m when zero is not an eigenvalue of T .
3. In addition, z m = 0 for m ≥ max{ri : λi = 0}, since the summations on j in
(0.39) are empty for such m.
4. Clearly, z m is in the subspace spanned by the eigenvectors and principal vec-
tors corresponding to the zero eigenvalues. Similarly, p i (m) is in the subspace
spanned by the eigenvector and principal vectors corresponding to the eigen-
value λi = 0.
5. It is clear from (0.38) that, because x 0 is chosen arbitrarily, x 0 − s is also arbi-
trary, and so are the αi j . Therefore, by (0.41), for convergence of {x m } from any
x 0 , we need to have |λi | < 1 for all i. In addition, the smaller ρ(T ) is, the faster
{x m } converges to s.
A= D −E −F, (0.44)
where
D = diag (A) = diag (a11 , a22 , . . . , aN N ), (0.45)
while −E is the lower triangular part of A excluding the main diagonal, and −F is the
upper triangular part of A excluding the main diagonal. Hence, both E and F have
zero diagonals.
3. Gauss–Seidel method: provided that $a_{ii} \ne 0$ for all $i$, here we first compute $x_{m+1}^{(1)}$, then $x_{m+1}^{(2)}$, and so on. Thus, we have
$$M = D - E, \qquad N = F \tag{0.54}$$
so that
$$T = (D - E)^{-1} F. \tag{0.55}$$
4. Symmetric Gauss–Seidel method: Let us recall that in each step of the Gauss–Seidel method, the $x_m^{(i)}$ are being updated in the order $i = 1, 2, \ldots, N$. We will call this updating a forward sweep. We can also perform the updating of the $x_m^{(i)}$ in the order $i = N, N-1, \ldots, 1$. One such step is called a backward sweep, and it amounts to taking
$$M = D - F, \qquad N = E \tag{0.56}$$
so that
$$T = (D - F)^{-1} E. \tag{0.57}$$
If we obtain $x_{m+1}$ by applying to $x_m$ a forward sweep followed by a backward sweep, we obtain a method called the symmetric Gauss–Seidel method. The matrix of iteration relevant to this method is thus
$$T = (D - F)^{-1} E\, (D - E)^{-1} F. \tag{0.58}$$
5. Successive overrelaxation (SOR): provided that $a_{ii} \ne 0$ for all $i$, here too we first compute $x_{m+1}^{(1)}$, then $x_{m+1}^{(2)}$, and so on. Decomposing $A$ as in (0.44), for SOR we have
$$M = \frac{1}{\omega}(D - \omega E), \qquad N = \frac{1}{\omega}\big[(1 - \omega) D + \omega F\big]; \tag{0.60}$$
hence,
$$T = (D - \omega E)^{-1} \big[(1 - \omega) D + \omega F\big]. \tag{0.61}$$
6. Symmetric SOR (SSOR): Note that, as in the case of the Gauss–Seidel method, in SOR too the $x_m^{(i)}$ are being updated in the order $i = 1, 2, \ldots, N$. We will call this updating a forward sweep. We can also perform the updating of the $x_m^{(i)}$ in the order $i = N, N-1, \ldots, 1$. One such step is called a backward sweep, and it amounts to taking
$$M = \frac{1}{\omega}(D - \omega F), \qquad N = \frac{1}{\omega}\big[(1 - \omega) D + \omega E\big] \tag{0.62}$$
so that
$$T = (D - \omega F)^{-1} \big[(1 - \omega) D + \omega E\big]. \tag{0.63}$$
7. Alternating direction implicit (ADI) method: This method was developed to solve linear systems arising from finite-difference or finite-element solution of elliptic and parabolic partial differential equations. In the linear system $Ax = b$, we have $A = H + V$. Expressing this linear system in the forms
$$(H + \mu I)x = (\mu I - V)x + b \quad \text{and} \quad (V + \mu I)x = (\mu I - H)x + b,$$
with $\mu$ a scalar parameter, and alternating between the two, we compute $x_{m+1}$ from $x_m$ via the two half-steps
$$(H + \mu I)\, x_{m+1/2} = (\mu I - V)\, x_m + b, \qquad (V + \mu I)\, x_{m+1} = (\mu I - H)\, x_{m+1/2} + b,$$
so that
$$T = (V + \mu I)^{-1} (H - \mu I)(H + \mu I)^{-1} (V - \mu I). \tag{0.67}$$

Remark: In the Jacobi, Gauss–Seidel, and SOR methods mentioned above, the $x_m^{(i)}$ are updated one at a time. Because of this, these methods are called point methods. We can choose to update several of the $x_m^{(i)}$ simultaneously; that is, we can choose to update the $x_m^{(i)}$ in blocks. The resulting methods are said to be block methods. (A small numerical comparison of the Jacobi, Gauss–Seidel, and SOR iteration matrices is sketched below.)
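The following sketch (illustrative Python/NumPy, not part of the book) builds the Gauss–Seidel and SOR iteration matrices from the splitting (0.44) for a small strictly diagonally dominant matrix and compares their spectral radii with that of the Jacobi method; the Jacobi splitting $M = D$ used here is the standard one and is an assumption, since its derivation is not shown above. A smaller spectral radius means faster convergence.

```python
import numpy as np

def splitting(A):
    """Return D, E, F with A = D - E - F as in (0.44)."""
    D = np.diag(np.diag(A))
    E = -np.tril(A, k=-1)
    F = -np.triu(A, k=1)
    return D, E, F

def spectral_radius(T):
    return max(abs(np.linalg.eigvals(T)))

# A small strictly diagonally dominant test matrix
A = np.array([[ 4.0, -1.0,  0.0, -1.0],
              [-1.0,  4.0, -1.0,  0.0],
              [ 0.0, -1.0,  4.0, -1.0],
              [-1.0,  0.0, -1.0,  4.0]])
D, E, F = splitting(A)

T_jacobi = np.linalg.solve(D, E + F)          # D^{-1}(E + F), assumed Jacobi splitting
T_gs     = np.linalg.solve(D - E, F)          # (D - E)^{-1} F, as in (0.55)
omega    = 1.2
T_sor    = np.linalg.solve(D - omega * E, (1 - omega) * D + omega * F)   # as in (0.61)

for name, T in [("Jacobi", T_jacobi), ("Gauss-Seidel", T_gs), ("SOR(1.2)", T_sor)]:
    print(name, spectral_radius(T))
```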
Let the eigenvalues $\lambda_i$ of $A$ be real and satisfy
$$0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_N.$$
Then the Richardson iterative method converges provided $0 < \omega < 2/\lambda_N$. Denoting $T$ by $T(\omega)$, we also have the following optimal result:
$$\omega_0 = \frac{2}{\lambda_N + \lambda_1}, \qquad \min_{\omega} \rho(T(\omega)) = \rho(T(\omega_0)) = \frac{\lambda_N - \lambda_1}{\lambda_N + \lambda_1} < 1.$$
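A quick numerical check of this optimality statement (a sketch, not from the book; it assumes the Richardson iteration matrix has the standard form $T(\omega) = I - \omega A$, which is consistent with the convergence condition $0 < \omega < 2/\lambda_N$):

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((6, 6))
A = B.T @ B + np.eye(6)                 # symmetric positive definite test matrix
lam = np.sort(np.linalg.eigvalsh(A))
lam1, lamN = lam[0], lam[-1]

def rho(omega):
    """Spectral radius of the (assumed) Richardson iteration matrix I - omega*A."""
    return max(abs(np.linalg.eigvals(np.eye(6) - omega * A)))

omega0 = 2.0 / (lamN + lam1)
omegas = np.linspace(0.01 / lamN, 1.99 / lamN, 400)
# The grid minimum, rho(omega0), and (lamN - lam1)/(lamN + lam1) should all agree closely.
print(min(rho(w) for w in omegas), rho(omega0), (lamN - lam1) / (lamN + lam1))
```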
Theorem 0.5. Let the matrix $A$ be strictly diagonally dominant. Then both the Jacobi and the Gauss–Seidel methods converge. Define
$$\mu_i = \sum_{j=1}^{i-1} \frac{|a_{ij}|}{|a_{ii}|}, \qquad \nu_i = \sum_{j=i+1}^{N} \frac{|a_{ij}|}{|a_{ii}|}, \quad i = 1, \ldots, N.$$
We next state the Stein–Rosenberg theorem that pertains to the convergence of the
Jacobi and Gauss–Seidel methods.
Theorem 0.6. Denote by T J and T G-S the iteration matrices for the Jacobi and Gauss–
Seidel iterative methods for the linear system Ax = b. If T J ≥ O, then precisely one of the
following takes place:
1. $\rho(T_{G\text{-}S}) = \rho(T_J) = 0$.
2. $0 < \rho(T_{G\text{-}S}) < \rho(T_J) < 1$.
3. $\rho(T_{G\text{-}S}) = \rho(T_J) = 1$.
4. $1 < \rho(T_J) < \rho(T_{G\text{-}S})$.
Thus, the Jacobi and Gauss–Seidel methods converge together and diverge together. In the case of convergence, the Gauss–Seidel method converges faster than the Jacobi method.
Theorem 0.7. Let the matrix A be irreducibly diagonally dominant. Then both the Jacobi
and the Gauss–Seidel methods converge.
Theorem 0.9. Let $A = M - N$ be a regular splitting of the matrix $A$, and let $A^{-1} \ge O$. Then
$$\rho(M^{-1} N) = \frac{\rho(A^{-1} N)}{1 + \rho(A^{-1} N)} < 1.$$
Hence, the iterative method $x_{m+1} = T x_m + d$, where $T = M^{-1} N$, converges. Conversely, $\rho(M^{-1} N) < 1$ implies that $A$ is nonsingular and $A^{-1} \ge O$.
Theorem 0.10. Let the matrix A be Hermitian positive definite. Then we have the fol-
lowing:
2. The Jacobi method converges if and only if the matrix 2D −A is Hermitian positive
definite. Here D = diag(A).
Theorem 0.11. For SOR to converge, it is necessary (but not sufficient) that 0 < ω < 2.
Theorem 0.12. Let A be Hermitian positive definite. Then SOR converges if and only if
0 < ω < 2.
Theorem 0.13. Let the matrices $H$ and $V$ in the ADI method be normal. Then
$$\rho(T) \le \max_i \left| \frac{\mu - \lambda_i(H)}{\mu + \lambda_i(H)} \right| \cdot \max_i \left| \frac{\mu - \lambda_i(V)}{\mu + \lambda_i(V)} \right|.$$
Then the ADI method converges if $H$ and $V$ are Hermitian positive definite and $\mu > 0$.
Finally, with $A_H$ and $A_S$ denoting the Hermitian and skew-Hermitian parts of $A$, consider the following two-stage fixed-point iterative method for the system $Ax = b$: Pick $x_0$ and a scalar $\mu$, and compute $x_1, x_2, \ldots$ as in
$$\begin{cases}
(\mu I + A_H)\, x_{m+1/2} = (\mu I - A_S)\, x_m + b, \\
(\mu I + A_S)\, x_{m+1} = (\mu I - A_H)\, x_{m+1/2} + b,
\end{cases} \qquad m = 0, 1, \ldots.$$
If $A_H$ is positive definite, then $A$ is nonsingular. If, in addition, $\mu$ is real and $\mu > 0$, then
$$\rho(T) \le \max_i \left| \frac{\mu - \lambda_i(A_H)}{\mu + \lambda_i(A_H)} \right| < 1.$$
Thus, the iterative method converges. (Note that the method described here is an ADI method.)
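A minimal sketch of this two-stage iteration (illustrative Python/NumPy, not from the book; the splitting into $A_H = \tfrac{1}{2}(A + A^*)$ and $A_S = \tfrac{1}{2}(A - A^*)$ follows the Hermitian and skew-Hermitian parts defined earlier, and the choice $\mu = 1$ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 6
A = rng.standard_normal((N, N)) + N * np.eye(N)   # shift so the Hermitian part is positive definite
b = rng.standard_normal(N)

AH = 0.5 * (A + A.T)      # Hermitian (here symmetric) part
AS = 0.5 * (A - A.T)      # skew-Hermitian part
mu = 1.0
I = np.eye(N)

x = np.zeros(N)
for m in range(200):
    x_half = np.linalg.solve(mu * I + AH, (mu * I - AS) @ x + b)
    x = np.linalg.solve(mu * I + AS, (mu * I - AH) @ x_half + b)

print(np.linalg.norm(A @ x - b))   # small residual: the two-stage iteration converged
```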
Chapter 1

Development of Polynomial Extrapolation Methods

1.1 Preliminaries
1.1.1 Motivation
In this chapter, we present the derivation of four polynomial extrapolation methods:
minimal polynomial extrapolation (MPE), reduced rank extrapolation (RRE), modified
minimal polynomial extrapolation (MMPE), and SVD-based minimal polynomial ex-
trapolation (SVD-MPE). Of these, MPE, RRE, and MMPE date back to the 1970s,
while SVD-MPE was published in 2016. MPE was introduced by Cabay and Jackson
[52]; RRE was introduced independently by Kaniel and Stein [155], Eddy [74], and Mešina [185]; and MMPE was introduced independently by Brezinski [28], Pugachev
[211], and Sidi, Ford, and Smith [299]. SVD-MPE is a new method by the author
[290].8
MPE and RRE, along with the epsilon algorithms (to be described in Chapter 5),
have been reviewed by Skelboe [305] and by Smith, Ford, and Sidi [306]. Since the
publication of these reviews, quite a few developments have taken place on the sub-
ject of vector extrapolation, and some of the newer developments have been reviewed
by Sidi [286, 289]; for still newer developments, see Sidi [290, 292]. Our purpose
here is to cover as many of these developments as possible and to present a broad
perspective.
Given a vector sequence $\{x_m\}$ that converges slowly, our aim in this chapter is to develop extrapolation methods whose only input is the sequence $\{x_m\}$ itself. As we
mentioned in the preceding chapter, a major area of application of vector extrapo-
lation methods is that of iterative solution of systems of equations. We have also
seen that nonlinear systems of equations “behave” linearly close to their solutions.
Therefore, in our derivation of polynomial extrapolation methods, we will go
through the iterative solution of linear systems of equations. That is, we will derive
the methods within the context of linear systems, making sure that these methods
⁸ The extrapolation methods we discuss in this book apply to vector sequences, as already mentioned.
Block versions of some of the methods we describe here, which apply to sequences of vectors and matrices,
have been given in Brezinski and Redivo Zaglia [40] and Messaoudi [186] and in the recent papers by Jbilou
and Sadok [151] and Jbilou and Messaoudi [146]. See also Baron and Wajc [15]. We do not discuss these
methods here.
involve only the sequence of approximations {x m } that result from the iterative
methods used.9 Following their derivation (definition), we will present a detailed
discussion of their algebraic properties. We will not address the important issues
of (i) actual algorithms for their numerical implementation and (ii) their analytical
(convergence) properties in this chapter; we leave these topics to Chapters 2, 4, 6, and 7.
Important note: Starting with this chapter, and throughout this book, we will fix our notation for the inner products in $\mathbb{C}^s$ and the vector norms induced by them as follows:
• For general or weighted inner products,
$$(y, z) = y^* M z, \qquad \|z\| = \sqrt{(z, z)}, \tag{1.1}$$
where $M \in \mathbb{C}^{s \times s}$ is a fixed Hermitian positive definite matrix. Recall that all inner products in $\mathbb{C}^s$ are weighted inner products unless they are Euclidean.
• For the standard $l_2$ or Euclidean inner product,
$$\langle y, z \rangle = y^* z, \qquad \|z\|_2 = \sqrt{\langle z, z \rangle}. \tag{1.2}$$
We will use this more explicit notation whenever confusion may arise.
Throughout, we use the fact that, for any square matrix H and analytic functions
f (λ) and g (λ), we have f (H ) g (H ) = g (H ) f (H ).
We will also need the following definition throughout this work.
Definition 1.1. The polynomial $A(z) = \sum_{i=0}^{m} a_i z^i$ is monic of degree $m$ if $a_m = 1$. We also denote the set of monic polynomials of degree $m$ by $\mathcal{P}_m$.
The following theorem is known as the Cayley–Hamilton theorem, and there are dif-
ferent proofs of it. The proof we present here employs the Jordan canonical form and
should provide a good exercise in the subject.
Theorem 1.4. The minimal polynomial Q(λ) of T exists and is unique. Moreover, if
Q1 (T ) = O for some polynomial Q1 (λ) with deg Q1 > deg Q, then Q(λ) divides Q1 (λ).
In particular, Q(λ) divides R(λ), the characteristic polynomial of T . [Thus, the degree of
Q(λ) is at most N , and its zeros are some or all of the eigenvalues of T .]
Proof. Since the characteristic polynomial R(λ) satisfies R(T ) = O, there is also a
monic polynomial Q(λ) of smallest degree m, m ≤ N , satisfying Q(T ) = O. Suppose
that there is another monic polynomial $\tilde{Q}(\lambda)$ of degree $m$ that satisfies $\tilde{Q}(T) = O$. Then the difference $S(\lambda) = Q(\lambda) - \tilde{Q}(\lambda)$ also satisfies $S(T) = O$, and its degree is less than $m$, which is impossible. Therefore, $Q(\lambda)$ is unique.
Let Q1 (λ) be of degree m1 such that m1 > m and Q1 (T ) = O. Then there exist
polynomials a(λ) of degree m1 − m and r (λ) of degree at most m − 1 such that Q1 (λ) =
a(λ)Q(λ) + r (λ). Therefore,
O = Q1 (T ) = a(T )Q(T ) + r (T ) = r (T ).
Since r (T ) = O, but r (λ) has degree less than m, r (λ) must be the zero polynomial.
Therefore, Q(λ) divides Q1 (λ). Letting Q1 (λ) = R(λ), we realize that Q(λ) divides
R(λ), meaning that its zeros are some or all of the eigenvalues of T .
To see how $Q(\lambda)$ factorizes, let us consider the case in which $\lambda_1 = \lambda_2 = a$, with $a$ different from the rest of the $\lambda_i$. Assume also that $r_1 \ge r_2$. Then, by (0.30), we have that $[J_{r_j}(\lambda_j) - aI_{r_j}]^k = O$ only when $k \ge r_j$, $j = 1, 2$. This means that $Q(\lambda)$ will have $(\lambda - a)^{r_1}$ as one of its factors.
Definition 1.5. Given a nonzero vector u ∈ N , the monic polynomial P (λ) is said to
be a minimal polynomial of T with respect to u if P (T )u = 0 and if P (λ) has smallest
degree.
Theorem 1.6. The minimal polynomial P (λ) of T with respect to u exists and is unique.
Moreover, if P1 (T )u = 0 for some polynomial P1 (λ) with deg P1 > deg P , then P (λ)
divides P1 (λ). In particular, P (λ) divides Q(λ), the minimal polynomial of T , which in
turn divides R(λ), the characteristic polynomial of T . [Thus, the degree of P (λ) is at most
N , and its zeros are some or all of the eigenvalues of T .]
Proof. Since the minimal polynomial Q(λ) satisfies Q(T ) = O, it also satisfies
Q(T )u = 0. Therefore, there is a monic polynomial P (λ) of smallest degree k, k ≤ m,
where m is the degree of Q(λ), satisfying P (T )u = 0. Suppose that there is another
monic polynomial $\tilde{P}(\lambda)$ of degree $k$ that satisfies $\tilde{P}(T)u = 0$. Then the difference $S(\lambda) = P(\lambda) - \tilde{P}(\lambda)$ also satisfies $S(T)u = 0$, and its degree is less than $k$, which is
impossible. Therefore, P (λ) is unique.
Let P1 (λ) be of degree k1 such that k1 > k and P1 (T )u = 0. Then there exist
polynomials a(λ) of degree k1 − k and r (λ) of degree at most k − 1 such that P1 (λ) =
a(λ)P (λ) + r (λ). Therefore,
0 = P1 (T )u = a(T )P (T )u + r (T )u = r (T )u.
Since r (T )u = 0, but r (λ) has degree less than k, r (λ) must be the zero polynomial.
Therefore, P (λ) divides P1 (λ). Letting P1 (λ) = Q(λ) and P1 (λ) = R(λ), we realize that
P (λ) divides Q(λ) and R(λ), meaning that its zeros are some or all of the eigenvalues
of T .
To see how $P(\lambda)$ factorizes, let us consider the case in which $\lambda_1 = \lambda_2 = a$, with $a$ different from the rest of the $\lambda_i$. Assume also that $r_1 \ge r_2$. Recall that the eigenvectors and principal vectors $v_{ij}$ of $T$ span $\mathbb{C}^N$. Therefore, $u$ can be expressed as a linear combination of the $v_{ij}$. Suppose that
$$u = \sum_{j=1}^{r_1} \alpha_{1j} v_{1j} + \sum_{j=1}^{r_2} \alpha_{2j} v_{2j} + (\text{a linear combination of } \{v_{ij}\},\ i \ge 3),$$
with $\alpha_{1 r_1} \ne 0$. Then, by (0.37),
$$\big(T - aI\big)^k \bigg( \sum_{j=1}^{r_1} \alpha_{1j} v_{1j} + \sum_{j=1}^{r_2} \alpha_{2j} v_{2j} \bigg) = 0 \quad \text{only when } k \ge r_1.$$
This means that $P(\lambda)$ will have $(\lambda - a)^{r_1}$ as one of its factors.
Note that $J$ has five Jordan blocks with $(\lambda_1 = a, r_1 = 3)$, $(\lambda_2 = a, r_2 = 2)$, $(\lambda_3 = b, r_3 = 2)$, $(\lambda_4 = b, r_4 = 1)$, and $(\lambda_5 = b, r_5 = 1)$. Thus, the characteristic polynomial $R(\lambda)$ and the minimal polynomial $Q(\lambda)$ are
$$R(\lambda) = (\lambda - a)^5 (\lambda - b)^4, \qquad Q(\lambda) = (\lambda - a)^3 (\lambda - b)^2.$$
If
$$u = 2 v_{11} - v_{12} + 3 v_{21} + 4 v_{32} - 2 v_{41} + v_{51},$$
then $(T - aI)^2$ and $(T - bI)^2$ annihilate the vectors $2 v_{11} - v_{12} + 3 v_{21}$ and $4 v_{32} - 2 v_{41} + v_{51}$, respectively. Consequently, the minimal polynomial of $T$ with respect to $u$ is
$$P(\lambda) = (\lambda - a)^2 (\lambda - b)^2.$$
Remark: From the examples given here, it must be clear that the minimal polyno-
mial of T with respect to u is determined by the eigenvectors and principal vectors of
T that are present in the spectral decomposition of u. This means that if two vectors
d 1 and d 2 , d 1 = d 2 , have the same eigenvectors and principal vectors in their spec-
tral decompositions, then the minimal polynomial of T with respect to d 1 is also the
minimal polynomial of T with respect to d 2 .
Consider the $N$-dimensional linear system
$$x = T x + d. \tag{1.5}$$
Writing this system in the form (I − T )x = d, it becomes clear that the uniqueness
of the solution is guaranteed when the matrix I − T is nonsingular or, equivalently,
when T does not have one as its eigenvalue. Starting with an arbitrary vector x 0 , let
the vector sequence {x m } be generated via the iterative scheme
x m+1 = T x m + d, m = 0, 1, . . . . (1.6)
As we have shown already, provided ρ(T ) < 1, limm→∞ x m exists and equals s.
Making use of what we already know about minimal polynomials, we can actually
construct s as a linear combination of a finite number (at most N + 1) of the vectors
x m , whether {x m } converges or not. This is the subject of Theorem 1.8 below. Before
we state this theorem, we introduce some notation and a few simple, but useful, facts.
Given the sequence {x m }, generated as in (1.6), let
u m = Δx m , w m = Δu m = Δ2 x m , m = 0, 1, . . . , (1.7)
ε m = x m − s, m = 0, 1, . . . . (1.8)
u m = T m u 0, w m = T m w 0, m = 0, 1, . . . . (1.9)
Similarly, by (1.6) and by the fact that s = T s + d, one can relate the error in x m+1 to
the error in x m via
ε m+1 = (T x m + d) − (T s + d) = T (x m − s) = T ε m , (1.10)
ε m = T m ε0 , m = 0, 1, . . . . (1.11)
Theorem 1.8. Let $P(\lambda) = \sum_{i=0}^{k} c_i \lambda^i$, with $c_k = 1$, be the minimal polynomial of $T$ with respect to $\varepsilon_n = x_n - s$. Then $\sum_{i=0}^{k} c_i \ne 0$, and $s$ can be expressed as
$$s = \frac{\sum_{i=0}^{k} c_i x_{n+i}}{\sum_{i=0}^{k} c_i}. \tag{1.16}$$

Proof. By the definition of $P(\lambda)$ and by (1.11),
$$0 = P(T)\varepsilon_n = \sum_{i=0}^{k} c_i T^i \varepsilon_n = \sum_{i=0}^{k} c_i \varepsilon_{n+i}. \tag{1.17}$$
Since $\varepsilon_{n+i} = x_{n+i} - s$, this gives
$$0 = \sum_{i=0}^{k} c_i \varepsilon_{n+i} = \sum_{i=0}^{k} c_i x_{n+i} - \bigg( \sum_{i=0}^{k} c_i \bigg) s,$$
and solving this for $s$, we obtain (1.16), provided $\sum_{i=0}^{k} c_i \ne 0$. Now, $\sum_{i=0}^{k} c_i = P(1) \ne 0$ since one is not an eigenvalue of $T$, and hence $(\lambda - 1)$ is not a factor of $P(\lambda)$. This completes the proof.
By Theorem 1.8, we need to determine P (λ) to construct s. By the fact that P (λ)
is uniquely defined via P (T )εn = 0, it seems that we have to actually know εn to
know P (λ). However, since εn = x n − s and since s is unknown, we have no way
of knowing εn . Fortunately, we can obtain P (λ) solely from our knowledge of the
vectors x m . This we achieve with the help of Theorem 1.9.
Theorem 1.9. The minimal polynomial of T with respect to εn is also the minimal
polynomial of T with respect to u n = x n+1 − x n .
Proof. Let P (λ) be the minimal polynomial of T with respect to εn as before, and
denote by S(λ) the minimal polynomial of T with respect to u n . Thus,
P (T )εn = 0 (1.18)
and
S(T )u n = 0. (1.19)
Multiplying (1.18) by (T − I ), and recalling from (1.12) that (T − I )εn = u n , we
obtain P (T )u n = 0. By Theorem 1.6, this implies that S(λ) divides P (λ). Next, again
by (1.12), we can rewrite (1.19) as (T − I )S(T )εn = 0, which, upon multiplying by
(T − I )−1 , gives S(T )εn = 0. By Theorem 1.6, this implies that P (λ) divides S(λ).
Therefore, P (λ) ≡ S(λ).
In other words, $P(\lambda)$ is the monic polynomial that satisfies
$$P(T)\, u_n = 0 \tag{1.20}$$
and has smallest degree. Now, since all the vectors $x_m$ are available to us, so are the
vectors u m = x m+1 − x m . Thus, the polynomial P (λ) can now be determined from
(1.20), as we show next.
Next, recalling that $c_k = 1$, we rewrite
$$\sum_{i=0}^{k} c_i u_{n+i} = 0 \tag{1.21}$$
in the form
$$\sum_{i=0}^{k-1} c_i u_{n+i} = -u_{n+k}. \tag{1.22}$$
Let us express (1.21) and (1.22) more conveniently in matrix form. For this, let us define the matrices $U_j$ as
$$U_j = [\, u_n \,|\, u_{n+1} \,|\, \cdots \,|\, u_{n+j} \,]. \tag{1.23}$$
Then (1.21) and (1.22) can be written as, respectively,
$$U_k\, c = 0, \qquad c = [c_0, c_1, \ldots, c_k]^T, \tag{1.24}$$
and
$$U_{k-1}\, c = -u_{n+k}, \qquad c = [c_0, c_1, \ldots, c_{k-1}]^T. \tag{1.25}$$
We will continue to use this notation without further explanation below.
Clearly, (1.25) is a system of N linear equations in the k unknowns c0 , c1 , . . . , ck−1
and is in general overdetermined since k ≤ N . Nevertheless, by Theorem 1.9, it is
consistent and has a unique solution for the ci . With this, we see that the solution s in
(1.16) is determined completely by the k + 2 vectors x n , x n+1 , . . . , x n+k+1 .
We now express $s$ in a form that is slightly different from that in (1.16). With $c_k = 1$ again, let us set
$$\gamma_i = \frac{c_i}{\sum_{j=0}^{k} c_j}, \quad i = 0, 1, \ldots, k. \tag{1.26}$$
This is allowed because $\sum_{j=0}^{k} c_j = P(1) \ne 0$ by Theorem 1.8. Obviously,
$$\sum_{i=0}^{k} \gamma_i = 1. \tag{1.27}$$
In terms of the $\gamma_i$, (1.16) becomes
$$s = \sum_{i=0}^{k} \gamma_i x_{n+i}, \tag{1.28}$$
where, by (1.24) and (1.26), the vector $\gamma$ satisfies
$$U_k\, \gamma = 0 \quad \text{and} \quad \sum_{i=0}^{k} \gamma_i = 1, \qquad \gamma = [\gamma_0, \gamma_1, \ldots, \gamma_k]^T. \tag{1.29}$$
Proposition 1.10. Let k be the degree of P (λ), the minimal polynomial of T with respect
to u n . Then the sets {u n , u n+1 , . . . , u n+ j }, j < k, are linearly independent, while the set
{u n , u n+1 , . . . , u n+k } is not. The vector u n+k is a linear combination of u n+i , 0 ≤ i ≤
k − 1, as shown in (1.22).
So far, we have shown how $s$ can be constructed once $P(\lambda)$, the minimal polynomial of $T$ with respect to $\varepsilon_n$, has been determined. We have seen that $P(\lambda)$ is also the minimal polyno-
mial of T with respect to u n and can be determined uniquely by solving the generally
overdetermined, but consistent, system of linear equations in (1.22) or, equivalently,
(1.25). However, the degree of the minimal polynomial of T with respect to εn can
be as large as N . Because N can be a very large integer in general, determining s in the
way we have described here becomes prohibitively expensive as far as computation
time and storage requirements are concerned. [Note that we need to store the vectors
u n , u n+1 , . . . , u n+k and solve the N × k linear system in (1.22).] Thus, we conclude
that computing s via a combination of the iteration vectors x m , as described above,
may not be feasible after all. Nevertheless, with a twist, we can use the framework
developed thus far to approximate s effectively. To do this, we replace the minimal
polynomial of T with respect to u n (or εn ) by another unknown polynomial, whose
degree is smaller—in fact, much smaller—than N and is at our disposal.
Let us denote the degree of the minimal polynomial of T with respect to u n by k0 ;
of course, k0 ≤ N . Then, by Definition 1.5, it is clear that the sets {u n , u n+1 , . . . , u n+k },
0 ≤ k ≤ k0 − 1, are linearly independent, but the set {u n , u n+1 , . . . , u n+k0 } is not. This
implies that the matrices U k , k = 0, 1, . . . , k0 − 1, are of full rank, but U k0 is not; that
is,
rank(U k ) = k + 1, 0 ≤ k ≤ k0 − 1, rank(U k0 ) = k0 . (1.30)
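To illustrate the construction described above (an illustrative Python/NumPy sketch, not from the book), the snippet below generates $x_{m+1} = T x_m + d$ for a small linear system, solves the overdetermined but consistent system (1.25) in the least-squares sense with $k = N$, and recovers $s$ from (1.16) far more accurately than the last iterate itself, even though the iteration converges slowly.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 6
T = rng.standard_normal((N, N))
T *= 0.95 / max(abs(np.linalg.eigvals(T)))     # rho(T) = 0.95: slowly convergent iteration
d = rng.standard_normal(N)
s = np.linalg.solve(np.eye(N) - T, d)          # exact solution, for comparison only

# Generate x_0, ..., x_{n+k+1} and the differences u_m = x_{m+1} - x_m
n, k = 0, N                                    # k = N is at least the degree of the minimal polynomial
xs = [np.zeros(N)]
for _ in range(n + k + 1):
    xs.append(T @ xs[-1] + d)
u = [xs[m + 1] - xs[m] for m in range(n + k + 1)]

# Solve U_{k-1} c = -u_{n+k} by least squares (the system is consistent), then set c_k = 1
U = np.column_stack(u[n:n + k])
c, *_ = np.linalg.lstsq(U, -u[n + k], rcond=None)
c = np.append(c, 1.0)

# Form s from (1.16): s = sum_i c_i x_{n+i} / sum_i c_i
s_extrap = sum(c[i] * xs[n + i] for i in range(k + 1)) / c.sum()
print(np.linalg.norm(s_extrap - s), np.linalg.norm(xs[n + k + 1] - s))
```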