Vector Extrapolation Methods
with Applications



Computational Science & Engineering
The SIAM series on Computational Science and Engineering publishes research monographs, advanced
undergraduate- or graduate-level textbooks, and other volumes of interest to an interdisciplinary CS&E
community of computational mathematicians, computer scientists, scientists, and engineers. The series
includes both introductory volumes aimed at a broad audience of mathematically motivated readers
interested in understanding methods and applications within computational science and engineering and
monographs reporting on the most recent developments in the field. The series also includes volumes
addressed to specific groups of professionals whose work relies extensively on computational science and
engineering.
SIAM created the CS&E series to support access to the rapid and far-ranging advances in computer
modeling and simulation of complex problems in science and engineering, to promote the interdisciplinary
culture required to meet these large-scale challenges, and to provide the means to the next generation of
computational scientists and engineers.

Editor-in-Chief
Donald Estep, Colorado State University

Editorial Board
Daniela Calvetti, Case Western Reserve University
Paul Constantine, Colorado School of Mines
Omar Ghattas, University of Texas at Austin
Chen Greif, University of British Columbia
Jan S. Hesthaven, Ecole Polytechnique Fédérale de Lausanne
Johan Hoffman, KTH Royal Institute of Technology
David Keyes, Columbia University
J. Nathan Kutz, University of Washington
Ralph C. Smith, North Carolina State University
Charles F. Van Loan, Cornell University
Karen Willcox, Massachusetts Institute of Technology

Series Volumes
Sidi, Avram, Vector Extrapolation Methods with Applications
Borzì, A., Ciaramella, G., and Sprengel, M., Formulation and Numerical Solution of Quantum Control Problems
Benner, Peter, Cohen, Albert, Ohlberger, Mario, and Willcox, Karen, editors, Model Reduction and
Approximation: Theory and Algorithms
Kuzmin, Dmitri and Hämäläinen, Jari, Finite Element Methods for Computational Fluid Dynamics:
A Practical Guide
Rostamian, Rouben, Programming Projects in C for Students of Engineering, Science, and Mathematics
Smith, Ralph C., Uncertainty Quantification: Theory, Implementation, and Applications
Dankowicz, Harry and Schilder, Frank, Recipes for Continuation
Mueller, Jennifer L. and Siltanen, Samuli, Linear and Nonlinear Inverse Problems with Practical
Applications
Shapira, Yair, Solving PDEs in C++: Numerical Methods in a Unified Object-Oriented Approach,
Second Edition
Borzì, Alfio and Schulz, Volker, Computational Optimization of Systems Governed by Partial
Differential Equations
Ascher, Uri M. and Greif, Chen, A First Course in Numerical Methods
Layton, William, Introduction to the Numerical Analysis of Incompressible Viscous Flows
Ascher, Uri M., Numerical Methods for Evolutionary Differential Equations
Zohdi, T. I., An Introduction to Modeling and Simulation of Particulate Flows
Biegler, Lorenz T., Ghattas, Omar, Heinkenschloss, Matthias, Keyes, David, and van Bloemen Waanders, Bart,
editors, Real-Time PDE-Constrained Optimization
Chen, Zhangxin, Huan, Guanren, and Ma, Yuanle, Computational Methods for Multiphase Flows
in Porous Media
Shapira, Yair, Solving PDEs in C++: Numerical Methods in a Unified Object-Oriented Approach



AVRAM SIDI
Technion – Israel Institute of Technology
Haifa, Israel

Vector Extrapolation Methods


with Applications

Society for Industrial and Applied Mathematics


Philadelphia



Copyright © 2017 by the Society for Industrial and Applied Mathematics

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book may be
reproduced, stored, or transmitted in any manner without the written permission of the
publisher. For information, write to the Society for Industrial and Applied Mathematics,
3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA.

Trademarked names may be used in this book without the inclusion of a trademark
symbol. These names are used in an editorial context only; no infringement of trademark
is intended.

Publisher David Marshall and Kivmars Bowling


Executive Editor Elizabeth Greenspan
Developmental Editor Gina Rinelli Harris
Managing Editor Kelly Thomas
Production Editor Ann Manning Allen
Copy Editor Julia Cochrane
Production Manager Donna Witzleben
Production Coordinator Cally Shrader
Compositor Cheryl Hufnagle
Graphic Designer Lois Sellers

Library of Congress Cataloging-in-Publication Data


Names: Sidi, Avram, author.
Title: Vector extrapolation methods with applications / Avram Sidi,
Technion – Israel Institute of Technology, Haifa, Israel.
Description: Philadelphia : Society for Industrial and Applied Mathematics,
[2017] | Series: Computational science and engineering series ; 17 |
Includes bibliographical references and index.
Identifiers: LCCN 2017026889 (print) | LCCN 2017031009 (ebook) | ISBN
9781611974966 (e-book) | ISBN 9781611974959 (print)
Subjects: LCSH: Vector analysis. | Extrapolation.
Classification: LCC QA433 (ebook) | LCC QA433 .S525 2017 (print) | DDC
515/.63--dc23
LC record available at https://lccn.loc.gov/2017026889

SIAM is a registered trademark.



Contents

Preface xi

0 Introduction and Review of Linear Algebra 1


0.1 General background and motivation . . . . . . . . . . . . . . . . . . . 1
0.2 Some linear algebra notation and background . . . . . . . . . . . . . 2
0.3 Fixed-point iterative methods for nonlinear systems . . . . . . . . . 15
0.4 Fixed-point iterative methods for linear systems . . . . . . . . . . . . 16

I Vector Extrapolation Methods 29

1 Development of Polynomial Extrapolation Methods 31


1.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.2 Solution to x = T x + d from {x m } . . . . . . . . . . . . . . . . . . . . 35
1.3 Derivation of MPE, RRE, MMPE, and SVD-MPE . . . . . . . . . . 39
1.4 Finite termination property . . . . . . . . . . . . . . . . . . . . . . . . . 45
1.5 Application of polynomial extrapolation methods to arbitrary {x m } 47
1.6 Further developments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
1.7 Determinant representations . . . . . . . . . . . . . . . . . . . . . . . . 49
1.8 Compact representations . . . . . . . . . . . . . . . . . . . . . . . . . . 55
1.9 Numerical stability in polynomial extrapolation . . . . . . . . . . . 56
1.10 Solution of x = f (x) by polynomial extrapolation methods in
cycling mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
1.11 Final remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

2 Unified Algorithms for MPE and RRE 65


2.1 General considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.2 QR factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
2.3 Solution of least-squares problems by QR factorization . . . . . . . 67
2.4 Algorithms for MPE and RRE . . . . . . . . . . . . . . . . . . . . . . . 71
2.5 Error estimation via algorithms . . . . . . . . . . . . . . . . . . . . . . 74
2.6 Further algorithms for MPE and RRE . . . . . . . . . . . . . . . . . . 78

3 MPE and RRE Are Related 83


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.2 R_k γ_k for MPE and RRE . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.3 First main result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.4 Second main result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.5 Peak-plateau phenomenon . . . . . . . . . . . . . . . . . . . . . . . . . . 88


4 Algorithms for MMPE and SVD-MPE 91


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.2 LU factorization and SVD . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.3 Algorithm for MMPE . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.4 Error estimation via algorithm for MMPE . . . . . . . . . . . . . . . 95
4.5 Algorithm for SVD-MPE . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.6 Error estimation via algorithm for SVD-MPE . . . . . . . . . . . . . 98

5 Epsilon Algorithms 99
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.2 SEA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.3 VEA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.4 TEA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.5 Implementation of epsilon algorithms in cycling mode . . . . . . . 114
5.6 ETEA: An economical implementation of TEA . . . . . . . . . . . 115
5.7 Comparison of epsilon algorithms with polynomial methods . . . 117

6 Convergence Study of Extrapolation Methods: Part I 119


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.2 Convergence and stability of rows in the extrapolation table . . . 121
6.3 Technical preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
6.4 Proof of Theorem 6.6 for MMPE and TEA . . . . . . . . . . . . . . . 130
6.5 Proof of Theorem 6.6 for MPE and RRE . . . . . . . . . . . . . . . . 137
6.6 Another proof of (6.13) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
6.7 Proof of Theorem 6.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
6.8 Generalizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
6.9 Extension to infinite-dimensional spaces . . . . . . . . . . . . . . . . . 153

7 Convergence Study of Extrapolation Methods: Part II 155


7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.2 Simple error bounds for s_{n,k}^{RRE} . . . . . . . . . . . . . . . . . . . . 156
7.3 Error bounds for s_{n,k}^{MPE} and s_{n,k}^{RRE} via orthogonal polynomials . . . . . 158
7.4 Appraisal of the upper bounds on Γ_{n,k}^{D} and conclusions . . . . . . . 170
7.5 Justification of cycling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
7.6 Cycling and nonlinear systems of equations . . . . . . . . . . . . . . 174

8 Recursion Relations for Vector Extrapolation Methods 177


8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
8.2 Recursions for fixed m . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
8.3 Recursions for m = n + q with fixed q . . . . . . . . . . . . . . . . . . 181
8.4 Recursions for fixed n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
8.5 qd-type algorithms and the matrix eigenvalue problem . . . . . . . 184

II Krylov Subspace Methods 189

9 Krylov Subspace Methods for Linear Systems 191


9.1 Projection methods for linear systems . . . . . . . . . . . . . . . . . . 191
9.2 Krylov subspace methods: General discussion . . . . . . . . . . . . . 197
9.3 Method of Arnoldi: Full orthogonalization method (FOM) . . . . 203

9.4 Method of generalized minimal residuals (GMR) . . . . . . . . . . . 206


9.5 FOM and GMR are related . . . . . . . . . . . . . . . . . . . . . . . . . 208
9.6 Recursive algorithms for FOM and GMR with A Hermitian pos-
itive definite: A unified treatment of conjugate gradients and con-
jugate residuals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
9.7 On the existence of short recurrences for FOM and GMR . . . . . 219
9.8 The method of Lanczos . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
9.9 Preconditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
9.10 FOM and GMR with prior iterations . . . . . . . . . . . . . . . . . . 229
9.11 Krylov subspace methods for nonlinear systems . . . . . . . . . . . 230

10 Krylov Subspace Methods for Eigenvalue Problems 233


10.1 Projection methods for eigenvalue problems . . . . . . . . . . . . . . 233
10.2 Krylov subspace methods . . . . . . . . . . . . . . . . . . . . . . . . . . 235
10.3 The method of Arnoldi . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
10.4 The method of Lanczos . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
10.5 The case of Hermitian A . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
10.6 Methods of Arnoldi and Lanczos for eigenvalues with special prop-
erties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250

III Applications and Generalizations 261

11 Miscellaneous Applications of Vector Extrapolation Methods 263


11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
11.2 Computation of steady-state solutions . . . . . . . . . . . . . . . . . . 263
11.3 Computation of eigenvectors with known eigenvalues . . . . . . . 266
11.4 Computation of eigenpair derivatives . . . . . . . . . . . . . . . . . . 273
11.5 Application to solution of singular linear systems . . . . . . . . . . 277
11.6 Application to multidimensional scaling . . . . . . . . . . . . . . . . 279

12 Rational Approximations from Vector-Valued Power Series: Part I 285


12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
12.2 Derivation of vector-valued rational approximations . . . . . . . . . 286
12.3 A compact formula for sn,k (z) from SMPE . . . . . . . . . . . . . . . 289
12.4 Algebraic properties of sn,k (z) . . . . . . . . . . . . . . . . . . . . . . . 290
12.5 Efficient computation of sn,k (z) from SMPE . . . . . . . . . . . . . . 294
12.6 Some sources of vector-valued power series . . . . . . . . . . . . . . . 296
12.7 Convergence study of sn,k (z): A constructive theory of de Montes-
sus type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297

13 Rational Approximations from Vector-Valued Power Series: Part II 305


13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
13.2 Generalized inverse vector-valued Padé approximants . . . . . . . . 305
13.3 Simultaneous Padé approximants . . . . . . . . . . . . . . . . . . . . . 308
13.4 Directed simultaneous Padé approximants . . . . . . . . . . . . . . . 310

14 Applications of SMPE, SMMPE, and STEA 313


14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
14.2 Application to the solution of x = zAx + b versus Krylov sub-
space methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313

14.3 A related application to Fredholm integral equations . . . . . . . . 316


14.4 Application to reanalysis of structures . . . . . . . . . . . . . . . . . . 317
14.5 Application to nonlinear differential equations . . . . . . . . . . . . 318
14.6 Application to computation of f (A)b . . . . . . . . . . . . . . . . . . 320

15 Vector Generalizations of Scalar Extrapolation Methods 327


15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
15.2 Review of a generalized Richardson extrapolation process . . . . . 327
15.3 Vectorization of FGREP . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
15.4 A convergence theory for FGREP2 . . . . . . . . . . . . . . . . . . . . 332
15.5 Vector extrapolation methods from scalar sequence
transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341

16 Vector-Valued Rational Interpolation Methods 345


16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
16.2 Development of vector-valued rational interpolation methods . . 345
16.3 Algebraic properties of interpolants . . . . . . . . . . . . . . . . . . . 351
16.4 Convergence study of r p,k (z): A constructive theory of de Montes-
sus type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354

IV Appendices 359

A QR Factorization 361
A.1 Gram–Schmidt orthogonalization (GS) and QR factorization . . . 361
A.2 Modified Gram–Schmidt orthogonalization (MGS) . . . . . . . . . 364
A.3 MGS with reorthogonalization . . . . . . . . . . . . . . . . . . . . . . 365

B Singular Value Decomposition (SVD) 367


B.1 Full SVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
B.2 Reduced SVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
B.3 SVD as a sum of rank-one matrices . . . . . . . . . . . . . . . . . . . . 370

C Moore–Penrose Generalized Inverse 373


C.1 Definition of the Moore–Penrose generalized inverse . . . . . . . . 373
C.2 SVD representation of A+ . . . . . . . . . . . . . . . . . . . . . . . . . . 374
C.3 Connection with linear least-squares problems . . . . . . . . . . . . 374
C.4 Moore–Penrose inverse of a normal matrix . . . . . . . . . . . . . . . 374

D Basics of Orthogonal Polynomials 377


D.1 Definition and basic properties of orthogonal polynomials . . . . 377
D.2 Some bounds related to orthogonal polynomials . . . . . . . . . . . 378

E Chebyshev Polynomials: Basic Properties 381


E.1 Definition of Chebyshev polynomials . . . . . . . . . . . . . . . . . . 381
E.2 Zeros, extrema, and min-max properties . . . . . . . . . . . . . . . . . 382
E.3 Orthogonality properties . . . . . . . . . . . . . . . . . . . . . . . . . . 383

F Useful Formulas and Results for Jacobi Polynomials 385


F.1 Definition of Jacobi polynomials . . . . . . . . . . . . . . . . . . . . . 385
F.2 Some consequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385

G Rayleigh Quotient and Power Method 387


G.1 Properties of the Rayleigh quotient . . . . . . . . . . . . . . . . . . . . 387
G.2 The power method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
G.3 Inverse power method or inverse iteration . . . . . . . . . . . . . . . 391
G.4 Inverse power method with variable shifts . . . . . . . . . . . . . . . 392

H Unified FORTRAN 77 Code for MPE and RRE 397

Bibliography 405

Index 427
Preface

An important problem that arises in different areas of science and engineering is that
of computing limits of sequences of vectors {x_m}, where x_m ∈ ℂ^N, the dimension N
being very large. Such sequences arise, for example, in the solution by fixed-point it-
erative methods of systems of linear or nonlinear algebraic equations. Given such a
system of equations with solution s and a sequence {x_m} obtained from this system
by a fixed-point iterative method, we have lim_{m→∞} x_m = s if lim_{m→∞} x_m exists. Now,
in most cases of interest, {x_m} converges to s extremely slowly, making direct use of
the x_m to approximate s with reasonable accuracy quite expensive. This will especially
be the case when computing each x_m is very time-consuming. One practical way to
remedy this problem is to apply a suitable extrapolation (or convergence acceleration)
method to the available x_m. An extrapolation method takes a finite (and preferably
small) number of the vectors x_m and processes them to obtain an approximation to s
that is better than the individual x_m used in the process. A good method is in general
nonlinear in the x_m and takes into account, either implicitly or explicitly, the asymp-
totic behavior of the x_m as m → ∞. If the sequence {x_m} does not converge, we may
think that no use can be made of it to approximate s. However, at least in some cases,
a suitable vector extrapolation method can still be applied to the divergent sequence
{x_m} to produce good approximations to s, the solution of the system of equations
being solved. In this case, we call s the antilimit of {x_m}; figuratively speaking, we
may view the sequence {x_m} as diverging from its antilimit s.
One nice feature of the methods we study is that they take the vector sequence
{x_m} as their only input, nothing else being needed. As such, they can be applied to
arbitrary vector sequences, whether these are obtained from linear systems or from
nonlinear systems or in any other way.
The subject of vector extrapolation methods was initiated by Peter Wynn in the
1960s with an interesting and successful generalization of his famous epsilon algorithm,
which implements the transformation of Daniel Shanks for accelerating the conver-
gence of sequences of scalars. The works of Shanks and Wynn had a great impact and
paved the way for more research into convergence acceleration. With the addition of
more methods and their detailed study, the subject of vector extrapolation methods
has come a long way since then. Today, it is an independent research area of numeri-
cal analysis. It has many practical applications. It also has connections to approxima-
tion theory. The relevance of vector extrapolation methods as effective computational
tools for solving problems of very high dimension has long been recognized, as can be
ascertained by doing a literature search in different computational disciplines.
There are a few books that discuss different aspects of vector extrapolation meth-
ods: The 1977 monograph of Brezinski [29] contains one chapter that deals with the
epsilon algorithm, its vectorized versions, and a matrix version. The 1991 book by
Brezinski and Redivo Zaglia [36] contains one chapter that discusses some of the de-

velopments that took place in vector extrapolation methods up to the 1980s. Both
books treat vector extrapolation methods as part of the general topic of convergence
acceleration. The more recent book by Gander, Gander, and Kwok [93] briefly dis-
cusses a few of these methods as tools of scientific computing. So far, however, there
has not been a book that is fully dedicated to the subject of vector extrapolation meth-
ods and their applications. The present book will hopefully help to fill this void.
The main purpose of this book is to present a unified and systematic account of the
existing literature, old and new, on the theory and applications of vector extrapolation
methods that is as comprehensive and up-to-date as possible. In this account, I include
much of the original and relevant literature that deals with methods of practical impor-
tance whose effectiveness has been amply verified in various surveys and comparative
studies. I discuss the algebraic, computational/algorithmic, and analytical aspects of
the methods covered. The discussions are rigorous, and complete proofs are provided
in most places to make the reading flow better. I believe this treatment will help the
reader understand the thought process leading to the development of the individual
methods, why these methods work, how they work, and how they should be applied
for best results. Inevitably, the contents and the perspective of this book reflect my
personal interests and taste. Therefore, I apologize to those colleagues whose work
has not been covered.
Following the introduction and a review of substantial general and numerical lin-
ear algebra background in Chapter 0, which is needed throughout, this book is divided
into four parts:

(i) Part I reviews several vector extrapolation methods that are in use and that have
proved to be efficient convergence accelerators. These methods are divided into
two groups: (i) polynomial-type methods and (ii) epsilon algorithms.
Chapter 1 presents the development and algebraic properties of four polynomial-
type methods: minimal polynomial extrapolation (MPE), reduced rank extrapola-
tion (RRE), modified minimal polynomial extrapolation (MMPE), and the most re-
cent singular value decomposition–based minimal polynomial extrapolation (SVD-
MPE). Chapters 2 and 4 present computationally efficient and numerically stable
algorithms for these methods. The algorithms presented are also very econom-
ical as far as computer storage requirements are concerned; this issue is crucial
since most major applications of vector extrapolation methods are to very high
dimensional problems. Chapter 3 discusses some interesting relations between
MPE and RRE that were discovered recently. (Note that MPE and RRE are
essentially different from each other.)
Chapter 5 covers the three known epsilon algorithms: the scalar epsilon algo-
rithm (SEA), the vector epsilon algorithm (VEA), and the topological epsilon algo-
rithm (TEA).
Chapters 6 and 7 present unified convergence and convergence acceleration theo-
ries for MPE, RRE, MMPE, and TEA. Technically speaking, the contents of the
two chapters are quite involved. In these chapters, I have given detailed proofs
of some of the convergence theorems. I have decided to present the complete
proofs as part of this book since their techniques are quite general and are im-
mediately applicable in other problems as well. For example, the techniques of
proof developed in Chapter 6 have been used to prove some of the results pre-
sented in Chapters 12, 13, 14, and 16. (Of course, readers who do not want to
spend their time on the proofs can simply skip them and study only the state-
ments of the relevant convergence theorems and the remarks and explanations
that follow the latter.)
Chapter 8 discusses some interesting recursion relations that exist among the
vectors obtained from each of the methods MPE, RRE, MMPE, and TEA.

(ii) Part II reviews some of the developments related to Krylov subspace methods for
matrix problems, a most interesting topic of numerical linear algebra, to which
vector extrapolation methods are closely related.
Chapter 9 deals with Krylov subspace methods for solving linear systems since
these are related to MPE, RRE, and TEA when the latter are applied to vector
sequences that are generated by fixed-point iterative procedures for linear sys-
tems. In particular, it reviews the method of Arnoldi that is also known as the
full orthogonalization method (FOM), the method of generalized minimal residuals
(GMR), and the method of Lanczos. It discusses the method of conjugate gradi-
ents (CG) and the method of conjugate residuals (CR) in a unified manner. It also
discusses the biconjugate gradient algorithm (Bi-CG).
Chapter 10 deals with Krylov subspace methods for solving matrix eigenvalue
problems. It reviews the method of Arnoldi and the method of Lanczos for these
problems. These methods are also closely related to MPE and TEA.

(iii) Part III reviews some of the applications of vector extrapolation methods.
Chapter 11 presents some nonstandard uses for computing eigenvectors corre-
sponding to known eigenvalues (such as the PageRank of the Google Web ma-
trix) and for computing derivatives of eigenpairs. Another interesting applica-
tion concerns multidimensional scaling.
Chapter 12 deals with applying vector extrapolation methods to vector-valued
power series. When MPE, MMPE, and TEA are applied to sequences of vector-
valued polynomials that form the partial sums of vector-valued Maclaurin series,
they produce vector-valued rational approximations to the sums of these series.
Chapter 12 discusses the properties of these rational approximations. Chapter
13 presents additional methods based on ideas from Padé approximants for ob-
taining rational approximations from vector-valued power series. Chapter 14
gives some interesting applications of vector-valued rational approximations.
Chapter 15 briefly presents some of the current knowledge about vector gener-
alizations of some scalar extrapolation methods, a subject that has not yet been
explored fully.
Chapter 16 discusses vector-valued rational interpolation procedures in the com-
plex plane that are closely related to the methods developed in Chapter 12.

(iv) Part IV is a compendium of eight appendices covering topics that we refer to


in Parts I–III. The topics covered are QR factorization in Appendix A; singular
value decomposition (SVD) in Appendix B; the Moore–Penrose inverse in Ap-
pendix C; fundamental properties of orthogonal polynomials and special prop-
erties of Chebyshev and Jacobi polynomials in Appendices D, E, and F; and
the Rayleigh quotient and the power method in Appendix G. Appendix G also
gives an independent rigorous treatment of the local convergence properties of
the Rayleigh quotient inverse power method with variable shifts for the eigen-
value problem, a subject not treated in most books on numerical linear algebra.
A well-documented and well-tested FORTRAN 77 code for implementing MPE
and RRE in a unified manner is included in Appendix H.

The informed reader may pay attention to the fact that I have not included matrix
extrapolation methods in this book, even though they are definitely related to vector
extrapolation methods; I have pointed out some of the relevant literature on this topic,
however. In my discussion of Krylov subspace methods for linear systems, I have also
excluded the topic of semi-iterative methods, with Chebyshev iteration being the most
interesting representative. This subject is covered very extensively in the various books
and papers referred to in the relevant chapters. The main reason for both omissions,
which I regret very much, was the necessity to keep the size of the book in check.
Finally, I have not included any numerical examples since there are many of these in the
existing literature; the limitation I imposed on the size of the book was again the reason
for this omission. Nevertheless, I have pointed out some papers containing numerical
examples that illustrate the theoretical results presented in the different chapters.
I hope this book will serve as a reference for the more mathematically inclined re-
searchers in the area of vector extrapolation methods and for scientists and engineers
in different computational disciplines and as a textbook for students interested in un-
dertaking to study the subject seriously. Most of the mathematical background needed
to cope with the material is summarized in Chapter 0 and the appendices, and some is
provided as needed in the relevant chapters.
Before closing, I would like to express my deepest gratitude and appreciation to
my dear friends and colleagues Dr. William F. Ford of NASA Lewis Research Center
(today, NASA John H. Glenn Research Center at Lewis Field) and Professor David A.
Smith of Duke University, who introduced me to the general topic of vector extrap-
olation methods. Our fruitful collaboration began after I was invited by Dr. Ford to
Lewis Research Center to spend a sabbatical there during 1981–1983. Our first joint
work was summarized very briefly in the NASA technical memorandum [297] and
presented at the Thirtieth Anniversary Meeting of the Society for Industrial and Ap-
plied Mathematics, Stanford, California, July 19–23, 1982. This work was eventually
published as the NASA technical paper [298] and, later, as the journal paper [299]. I
consider it a privilege to acknowledge their friendship and their influence on my career
in this most interesting topic.
Lastly, I owe a debt of gratitude to my dear wife Carmella for her constant patience,
understanding, support, and encouragement while this book was being written. I ded-
icate this book to her with love.

Avram Sidi
Technion, Haifa
December 2016
Chapter 0

Introduction and Review of Linear Algebra

0.1 General background and motivation


An important problem that arises in different areas of science and engineering is that
of computing limits of sequences of vectors {x_m},¹ where x_m ∈ ℂ^N, the dimension
N being very large in many applications. Such vector sequences arise, for example, in
the numerical solution of very large systems of linear or nonlinear equations by fixed-
point iterative methods, and lim_{m→∞} x_m are simply the required solutions to these
systems. One common source of such systems is the finite-difference or finite-element
discretization of continuum problems. In later chapters, we will discuss further prob-
lems that give rise to vector sequences whose limits are needed.
In most cases of interest, however, the sequences {x_m} converge to their limits ex-
tremely slowly. That is, to approximate s = lim_{m→∞} x_m with a reasonable prescribed
level of accuracy by x_m, we need to consider very large values of m. Since the vec-
tors x_m are normally computed in the order m = 0, 1, 2, . . . , it is clear that we have
to compute many such vectors until we reach one that has acceptable accuracy. Thus,
this way of approximating s via the x_m becomes very expensive computationally.
Nevertheless, we may ask whether we can do something with those x_m that are
already available, to somehow obtain new approximations to s that are better than
each individual available x_m. The answer to this question is yes for at least a large
class of sequences that arise from fixed-point iteration of linear and nonlinear systems
of equations. One practical way of achieving this is to apply to the sequence {x_m} a
suitable convergence acceleration method (or extrapolation method).
Of course, if lim_{m→∞} x_m does not exist, it seems that no use can be made of the
x_m. Now, if the sequence {x_m} is generated by an iterative solution of a linear or
nonlinear system of equations, it can be thought of as “diverging from” the solution
s of this system. We call s the antilimit of {x_m} in this case. It turns out that vector
extrapolation methods can be applied to such divergent sequences {x_m} to obtain good
approximations to the relevant antilimits, at least in some cases.
As we will see later, a vector extrapolation method computes as an approximation
to the limit or antilimit of {x_m} a “weighted average” of a certain number of the vectors
x_m. This approximation is of the general form Σ_{i=0}^{k} γ_i x_{n+i}, where n and k are integers
chosen by the user and the scalars γ_i, which can be complex, satisfy Σ_{i=0}^{k} γ_i = 1. Of
course, the methods differ in the way they determine the γ_i.
¹ Unless otherwise stated, {c_m} will mean {c_m}_{m=0}^{∞} throughout this book.
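For illustration only, the following is a minimal NumPy sketch of such a weighted average; the matrix, vector, and parameters below are arbitrary, and the coefficients γ_i are chosen in an RRE-like way, by minimizing the norm of the corresponding weighted average of the differences x_{m+1} − x_m subject to Σ_{i=0}^{k} γ_i = 1. The naive normal-equations solve used here is only a stand-in for the numerically stable algorithms developed in Chapter 2.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 50
    # A made-up fixed-point iteration x_{m+1} = T x_m + d with a few slowly
    # decaying modes, so that plain iteration converges slowly.
    Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
    D = np.diag(np.concatenate([[0.99, 0.95, 0.9, 0.8, 0.7], 0.05 * rng.random(N - 5)]))
    T = Q @ D @ Q.T
    d = rng.standard_normal(N)
    s = np.linalg.solve(np.eye(N) - T, d)        # exact solution of x = T x + d

    n, k = 2, 8
    xs = [np.zeros(N)]
    for m in range(n + k + 1):
        xs.append(T @ xs[-1] + d)                # iterates x_0, x_1, ..., x_{n+k+1}

    # Differences u_m = x_{m+1} - x_m, m = n, ..., n+k, stacked as columns.
    U = np.column_stack([xs[m + 1] - xs[m] for m in range(n, n + k + 1)])
    e = np.ones(k + 1)
    # Minimize ||U @ gamma||_2 subject to sum(gamma) = 1 (naive normal equations).
    y = np.linalg.solve(U.T @ U, e)
    gamma = y / (e @ y)
    approx = sum(g * xs[n + i] for i, g in enumerate(gamma))   # the weighted average

    print("error of last iterate    :", np.linalg.norm(xs[n + k] - s))
    print("error of weighted average:", np.linalg.norm(approx - s))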


Now, a good way to approach and motivate vector extrapolation methods is within
the context of the fixed-point iterative solution of systems of equations. Because the
actual development of these methods proceeds via the solution of linear systems, we
devote the next section to a brief review of linear algebra, where we introduce the
notation that we employ throughout this work and state some important results from
matrix theory that we recall as we go along. Following these, in Sections 0.3 and 0.4 of
this chapter, we review the essentials of the fixed-point iterative solution of nonlinear
and, especially, linear systems in some detail. We advise the reader to study this chapter
with some care and become familiar with its contents before proceeding to the next
chapters.

0.2 Some linear algebra notation and background


In this section, we provide the necessary background in matrix analysis and numerical
linear algebra that we will need for this book; in addition, we establish most of the
notation we will be using throughout. The rigorous treatments of these subjects are
to be found in various books. For matrix analysis, we refer the reader to Gantmacher
[95], Horn and Johnson [138, 139], Householder [142], Berman and Plemmons [21],
and Varga [333], for example.
For numerical linear algebra, we refer the reader (in alphabetical order) to Axels-
son [12], Barrett et al. [16], Björck [23, 24], Brezinski [33, 34], Ciarlet [60], Datta
[64], Demmel [73], Golub and Van Loan [103], Greenbaum [118], Hageman and
Young [125], Ipsen [144], Kelley [157], Liesen and Strakoš [174], Meurant [187],
Meyer [188], Parlett [208], Saad [239, 240], Stewart [309, 310, 311], Trefethen and
Bau [324], van der Vorst [326], Watkins [339], and Wilkinson [344], for example. See
also the numerical analysis books by Ralston and Rabinowitz [214] and Stoer and Bu-
lirsch [313], and the more recent book by Gander, Gander, and Kwok [93], which also
contain extensive treatments of numerical linear algebra.

Vector and matrix spaces


We assume that the reader is familiar with the basic properties of vector spaces. We
will be using the following standard notation:

• ℂ: the field of complex numbers.

• ℝ: the field of real numbers.

• ℂ^s: the complex vector space of dimension s (over ℂ).

• ℝ^s: the real vector space of dimension s (over ℝ).

• ℂ^{r×s}: the space of r × s matrices with complex entries.

• ℝ^{r×s}: the space of r × s matrices with real entries.

We denote the dimension of any vector space 𝒴 by dim 𝒴.

Subspaces
• A subset 𝒰 of a vector space 𝒴 is a subspace of 𝒴 if it is a vector space itself.

• If 𝒰 and 𝒲 are subspaces of the vector space 𝒴, then the set 𝒰 ∩ 𝒲 is a subspace
of 𝒴, and we have

dim 𝒰 + dim 𝒲 − dim 𝒴 ≤ dim(𝒰 ∩ 𝒲) ≤ min{dim 𝒰, dim 𝒲}.

The set 𝒰 ∪ 𝒲 is not necessarily a subspace.

• Define the sum of the subspaces 𝒰 and 𝒲 of 𝒴 via

𝒰 + 𝒲 = { z ∈ 𝒴 : z = x + y, x ∈ 𝒰, y ∈ 𝒲 }.

𝒰 + 𝒲 is a subspace of 𝒴, and

max{dim 𝒰, dim 𝒲} ≤ dim(𝒰 + 𝒲) ≤ dim 𝒰 + dim 𝒲,

dim(𝒰 + 𝒲) = dim 𝒰 + dim 𝒲 − dim(𝒰 ∩ 𝒲).
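As a small numerical illustration (the basis matrices below are arbitrary), the last identity can be checked with ranks: if the columns of BU and BW span 𝒰 and 𝒲, the rank of [BU | BW] gives dim(𝒰 + 𝒲), and the nullity of [BU | −BW] gives dim(𝒰 ∩ 𝒲) whenever both bases have full column rank.

    import numpy as np

    rng = np.random.default_rng(1)
    s = 10
    BU = rng.standard_normal((s, 4))                                 # basis of U, dim U = 4
    BW = np.column_stack([BU[:, :2], rng.standard_normal((s, 3))])   # dim W = 5, shares 2 directions

    dim_U = np.linalg.matrix_rank(BU)
    dim_W = np.linalg.matrix_rank(BW)
    dim_sum = np.linalg.matrix_rank(np.column_stack([BU, BW]))       # dim(U + W)
    # Nullity of [BU | -BW] equals dim(U ∩ W) for full-column-rank bases.
    dim_cap = (BU.shape[1] + BW.shape[1]) - np.linalg.matrix_rank(np.column_stack([BU, -BW]))

    print(dim_U, dim_W, dim_sum, dim_cap)          # expected: 4 5 7 2
    print(dim_sum == dim_U + dim_W - dim_cap)      # the identity above: True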

Vectors and matrices


We will use lowercase boldface italic letters to denote column vectors. We will also
use 0 to denote the zero column vector. Thus,

x ∈ ℂ^s if x = [x^(1), x^(2), . . . , x^(s)]^T, x^(i) ∈ ℂ ∀ i. (0.1)

We will denote the standard basis vectors in ℂ^s by e_i. Thus e_i has one as its ith com-
ponent, the remaining components being zero. We will also denote by e the vector
whose components are all unity.
We denote the transpose of x and the Hermitian conjugate of x, both row vectors,
by x^T and x^*, respectively, and these are given as

x^T = [x^(1), x^(2), . . . , x^(s)] and x^* = [x̄^(1), x̄^(2), . . . , x̄^(s)].

Here, ā stands for the complex conjugate of a.

We will use uppercase boldface italic letters to denote matrices. We will also use
O to denote the zero matrix. Of course, I denotes the identity matrix. Sometimes
it becomes necessary to emphasize the dimension of the identity matrix, and in such
cases we will also write I_s to denote the identity matrix in ℂ^{s×s}. Thus,

A ∈ ℂ^{r×s} if A = [a_{ij}]_{1≤i≤r, 1≤j≤s}, a_{ij} ∈ ℂ ∀ i, j. (0.2)

Sometimes, we will also denote a_{ij} by (A)_{ij}.

• We denote by A^T the transpose of A. Thus, if A ∈ ℂ^{r×s}, then A^T ∈ ℂ^{s×r}, and

(A^T)_{ij} = (A)_{ji} = a_{ji} ∀ i, j.

Similarly, we denote by A^* the Hermitian conjugate of A. Thus, if A ∈ ℂ^{r×s},
then A^* ∈ ℂ^{s×r}, and

(A^*)_{ij} = ā_{ji} ∀ i, j.

Here too, ā stands for the complex conjugate of a. Thus, A^* is the transpose of
the entrywise complex conjugate Ā of A; that is, A^* = Ā^T.

• A square matrix A is said to be symmetric if A^T = A. It is said to be skew symmetric
if A^T = −A.
If A is real skew symmetric, then x^T A x = 0 for every real vector x.
Similarly, a square matrix A is said to be Hermitian if A^* = A. It is said to be skew
Hermitian if A^* = −A.
If A is Hermitian (skew Hermitian), then x^* A x is real (purely imaginary or zero)
for every complex vector x.
If A = [a_{ij}]_{1≤i,j≤s} is Hermitian (skew Hermitian), then the a_{ii} are all real (purely
imaginary or zero).

• A square matrix A is said to be normal if it satisfies A^*A = AA^*. Thus, Hermi-
tian, skew-Hermitian, real symmetric, and real skew-symmetric matrices are all
normal.

• Any complex square matrix A can be written in the form A = A_H + A_S, where
A_H = (1/2)(A + A^*) is the Hermitian part of A and A_S = (1/2)(A − A^*) is the
skew-Hermitian part of A. The symmetric part and the skew-symmetric part of a real
square matrix can be defined analogously.

• A square matrix Q is said to be unitary if it satisfies Q^*Q = QQ^* = I.

• A matrix Q ∈ ℂ^{r×s}, r > s, is also said to be unitary if Q^*Q = I_s. (Note that, in
this case, QQ^* is a singular r × r matrix, hence not equal to I_r.)

• If we denote the ith column of Q ∈ ℂ^{r×s}, r ≥ s, by q_i, then Q is unitary if
q_i^* q_j = δ_{ij} for all i, j.

• A square matrix P is said to be a permutation matrix if it is obtained by permut-
ing the rows (or the columns) of the identity matrix I.
If P is a permutation matrix, then so is P^T. In addition, P^T P = P P^T = I; that
is, P is also unitary.
Let A ∈ ℂ^{r×s}, and let P ∈ ℂ^{s×s} and P′ ∈ ℂ^{r×r} be two permutation matrices.
Then the matrices AP and P′A are obtained by permuting, respectively, the
columns and the rows of A.

• A ∈ ℂ^{r×s} is said to be diagonal if a_{ij} = 0 for i ≠ j, whether r ≥ s or r ≤ s.

• If A ∈ ℂ^{r×s} is a diagonal matrix with elements d_1, d_2, . . . , d_p along its main
diagonal, where p = min{r, s}, then we will write

A = diag(d_1, d_2, . . . , d_p), p = min{r, s}.

If A = [a_{ij}]_{1≤i,j≤s} is a square matrix, we will define the matrix diag(A) ∈ ℂ^{s×s}
via
diag(A) = diag(a_{11}, a_{22}, . . . , a_{ss}),
and we will define tr(A), the trace of A, as

tr(A) = Σ_{i=1}^{s} a_{ii}.

• A matrix A ∈ ℂ^{r×s} whose elements below (above) the main diagonal are all zero
is said to be upper triangular (lower triangular).²
• If A = [a_{ij}]_{1≤i,j≤s} and B = [b_{ij}]_{1≤i,j≤s} are upper (lower) triangular, then AB =
C = [c_{ij}]_{1≤i,j≤s} is upper (lower) triangular too, and c_{ii} = a_{ii} b_{ii} for all i.

• If A = [a_{ij}]_{1≤i,j≤s} is upper (lower) triangular with a_{ii} ≠ 0 for all i, then A is
nonsingular. A^{-1} is upper (lower) triangular too, and (A^{-1})_{ii} = 1/a_{ii} for all i.
• A matrix whose elements below the subdiagonal (above the superdiagonal) are
all zero is said to be upper Hessenberg (lower Hessenberg).

Partitioning of matrices
We will make frequent use of matrix partitionings in different places.
• If A = [a_{ij}]_{1≤i≤r, 1≤j≤s} ∈ ℂ^{r×s}, then we will denote the ith row and jth column of A
by a_i^T and a_j, respectively; that is,

a_i^T = [a_{i1}, a_{i2}, . . . , a_{is}] and a_j = [a_{1j}, a_{2j}, . . . , a_{rj}]^T.

We will also write

A = [ a_1^T ; a_2^T ; . . . ; a_r^T ] (rows stacked) and A = [ a_1 | a_2 | · · · | a_s ] (columns). (0.3)

• If A ∈ ℂ^{r×s} and B ∈ ℂ^{s×t}, then we have³

AB = [ Ab_1 | Ab_2 | · · · | Ab_t ] and also AB = Σ_{i=1}^{s} a_i b_i^T.
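A quick NumPy check of the second identity (the sizes below are arbitrary): the product AB is reproduced as the sum of outer products of the columns a_i of A with the rows b_i^T of B.

    import numpy as np

    rng = np.random.default_rng(2)
    r, s, t = 4, 3, 5
    A = rng.standard_normal((r, s))
    B = rng.standard_normal((s, t))

    # Sum over i of (i-th column of A) times (i-th row of B), as an outer product.
    outer_sum = sum(np.outer(A[:, i], B[i, :]) for i in range(s))
    print(np.allclose(A @ B, outer_sum))        # True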

Matrix-vector multiplication
Let x = [x^(1), . . . , x^(s)]^T and let A = [a_{ij}]_{1≤i≤r, 1≤j≤s} ∈ ℂ^{r×s}, with the columnwise
partitioning in (0.3). Then z = Ax can be computed as follows:

• Row version: z^(i) = Σ_{j=1}^{s} a_{ij} x^(j), i = 1, . . . , r.

• Column version: z = Σ_{j=1}^{s} x^(j) a_j.

² When r < s (r > s), A is said to be upper trapezoidal (lower trapezoidal) too.
³ Note that if x = [x^(1), . . . , x^(r)]^T ∈ ℂ^r and y = [y^(1), . . . , y^(s)]^T ∈ ℂ^s, then Z = x y^T ∈ ℂ^{r×s}, with
z_{ij} = (Z)_{ij} = x^(i) y^(j).
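For illustration, a minimal NumPy sketch of the two versions (with arbitrary random data): the row version computes each component of z as an inner product, while the column version builds z as a linear combination of the columns of A; both reproduce z = Ax.

    import numpy as np

    rng = np.random.default_rng(3)
    r, s = 4, 3
    A = rng.standard_normal((r, s))
    x = rng.standard_normal(s)

    z_row = np.array([A[i, :] @ x for i in range(r)])     # row version: r inner products
    z_col = sum(x[j] * A[:, j] for j in range(s))         # column version: combination of columns
    print(np.allclose(z_row, A @ x), np.allclose(z_col, A @ x))   # True True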

Linear independence and rank


• A set of vectors {a_1, . . . , a_k} is said to be linearly dependent if there exist scalars
α_i, not all zero, such that Σ_{i=1}^{k} α_i a_i = 0. Otherwise, it is said to be linearly
independent.
• The number of linearly independent rows of a matrix A is equal to the number
of its linearly independent columns, and this number is called the rank of A. We
denote this number by rank(A).
• Thus, if A ∈ ℂ^{r×s}, then rank(A) ≤ min{r, s}.
• We also have rank(A) = rank(A^T) = rank(A^*).
• If A ∈ ℂ^{r×s} and rank(A) = min{r, s}, then A is said to be of full rank.
If A ∈ ℂ^{r×s}, r ≥ s, and is of full column rank, that is, rank(A) = s, then Ax = 0
if and only if x = 0.
• A ∈ ℂ^{s×s} is nonsingular if and only if rank(A) = s.
• rank(AB) ≤ min{rank(A), rank(B)}.
• If A ∈ ℂ^{r×r} and C ∈ ℂ^{s×s} are nonsingular, and B ∈ ℂ^{r×s}, then rank(AB) =
rank(BC) = rank(ABC) = rank(B).
• rank(A^*A) = rank(AA^*) = rank(A).
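These rank facts are easy to probe numerically; the sketch below (illustration only, with a deliberately rank-deficient A) checks that A, A^T, A^*A, and AA^* all have the same rank and that rank(AB) does not exceed the ranks of the factors.

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))    # rank 3, size 6 x 5
    B = rng.standard_normal((5, 4))

    ranks = [np.linalg.matrix_rank(M) for M in (A, A.T, A.conj().T @ A, A @ A.conj().T)]
    print(ranks)                                                     # [3, 3, 3, 3]
    print(np.linalg.matrix_rank(A @ B) <= min(ranks[0], np.linalg.matrix_rank(B)))   # True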

Range and null space

Let A ∈ ℂ^{r×s}, with the columnwise partitioning in (0.3).
• The range of A is the subspace
ℛ(A) = { y ∈ ℂ^r : y = Ax for some x ∈ ℂ^s }.

Thus, ℛ(A) is the set of all vectors of the form y = Σ_{i=1}^{s} x^(i) a_i. Hence, the
dimension of ℛ(A) satisfies dim ℛ(A) = rank(A).
• The null space of A is the subspace
𝒩(A) = { x ∈ ℂ^s : Ax = 0 }.
The dimension of 𝒩(A) satisfies dim 𝒩(A) = s − rank(A).
• For any two matrices A ∈ ℂ^{r×s} and B ∈ ℂ^{s×p},
ℛ(AB) ⊆ ℛ(A) and dim ℛ(AB) = dim ℛ(B) − dim[𝒩(A) ∩ ℛ(B)].

• We also have
ℛ(A^*A) = ℛ(A^*), 𝒩(A^*A) = 𝒩(A).
• For A ∈ ℂ^{r×s}, the orthogonal complement of ℛ(A), denoted by ℛ(A)^⊥, is defined
as
ℛ(A)^⊥ = { y ∈ ℂ^r : y^*x = 0 for all x ∈ ℛ(A) }.
Then every vector in ℂ^r is the sum of a vector in ℛ(A) and another vector in
ℛ(A)^⊥; that is,
ℂ^r = ℛ(A) ⊕ ℛ(A)^⊥.
In addition,
ℛ(A)^⊥ = 𝒩(A^*).
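The decomposition ℂ^r = ℛ(A) ⊕ ℛ(A)^⊥ = ℛ(A) ⊕ 𝒩(A^*) can be carried out numerically with a least-squares solve; the sketch below (illustration only, real data for simplicity) splits a vector y into a component in ℛ(A) and a residual that is checked to lie in 𝒩(A^T).

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((7, 3))
    y = rng.standard_normal(7)

    x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_range = A @ x_ls                        # component of y in R(A)
    y_perp = y - y_range                      # remaining component
    print(np.allclose(A.T @ y_perp, 0))       # y_perp lies in N(A^T) = R(A)^perp: True
    print(np.allclose(y_range + y_perp, y))   # the two pieces recover y: True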

Eigenvalues and eigenvectors


• We say (λ, v), where λ ∈ ℂ and v ∈ ℂ^s, v ≠ 0, is an eigenpair of a square matrix
A ∈ ℂ^{s×s} if
Av = λv.
λ is an eigenvalue of A and v is a right eigenvector, or simply an eigenvector of A,
corresponding to the eigenvalue λ. To emphasize that λ is an eigenvalue of A,
we will sometimes write λ(A) instead of λ.
The left eigenvector w of A corresponding to its eigenvalue λ is defined via
w^T A = λ w^T.

• A ∈ ℂ^{s×s} has exactly s eigenvalues that are the roots of its characteristic polyno-
mial R(λ), which is defined by
R(λ) = det(λI − A).
Note that R(λ) is of degree exactly s with leading coefficient one. Thus, if
λ_1, . . . , λ_q are the distinct roots of R(λ), then R(λ) = Π_{i=1}^{q} (λ − λ_i)^{r_i}, where
λ_i ≠ λ_j if i ≠ j and Σ_{i=1}^{q} r_i = s. r_i is called the algebraic multiplicity of λ_i.

• An eigenvalue λ_i of A is said to be simple if r_i = 1; otherwise, λ_i is multiple.

• Eigenvectors corresponding to different eigenvalues are linearly independent.
• Denote by r̃_i the number of linearly independent eigenvectors corresponding
to λ_i. r̃_i is called the geometric multiplicity of λ_i, and it satisfies 1 ≤ r̃_i ≤ r_i. If
r̃_i = r_i, we say that λ_i is nondefective; otherwise, λ_i is defective. If r̃_i = r_i for all
i, then A is diagonalizable or nondefective; otherwise, A is nondiagonalizable or
defective.
• The set {λ_1, . . . , λ_s} of all the eigenvalues of A is called the spectrum of A and will
be denoted by σ(A).

• Let us express the characteristic polynomial of A in the form R(λ) = Σ_{i=0}^{s} c_i λ^i,
c_s = 1. Then

−c_{s−1} = tr(A) = Σ_{i=1}^{s} λ_i and (−1)^s c_0 = det A = Π_{i=1}^{s} λ_i.

• By the Cayley–Hamilton theorem,

R(A) = O,

where R(λ) is the characteristic polynomial of A. Thus, with R(λ) = Σ_{i=0}^{s} c_i λ^i,
we have R(A) = Σ_{i=0}^{s} c_i A^i, with A^0 ≡ I.
• If A ∈ ℂ^{r×s} and B ∈ ℂ^{s×r}, then
λ^s det(λI_r − AB) = λ^r det(λI_s − BA).
As a result, when r = s, AB and BA have the same characteristic polynomials
and hence the same eigenvalues. This can be extended to the product of an arbi-
trary number of square matrices of the same dimension. For example, if A, B, C
are all in ℂ^{r×r}, then ABC, CAB, and BCA all have the same characteristic poly-
nomials and hence the same eigenvalues.

• If A ∈ ℂ^{s×s} has exactly s linearly independent eigenvectors, then A can be diag-
onalized by a nonsingular matrix V as in

V^{-1} A V = diag(λ_1, . . . , λ_s).

Here
V = [ v_1 | · · · | v_s ], A v_i = λ_i v_i, i = 1, . . . , s,
and
V^{-1} = [ w_1 | · · · | w_s ]^T, w_i^T A = λ_i w_i^T, i = 1, . . . , s.
In addition, w_i^T v_j = δ_{ij}.⁴

• A ∈ ℂ^{s×s} is a normal matrix if and only if it can be diagonalized by a unitary
matrix V = [ v_1 | · · · | v_s ], that is,

V^* A V = diag(λ_1, . . . , λ_s), V^*V = VV^* = I.

Thus, A v_i = λ_i v_i, i = 1, . . . , s, and v_i^* v_j = δ_{ij} for all i, j.


Special cases of normal matrices are Hermitian and skew-Hermitian matrices.
(i) If A ∈ ℂ^{s×s} is Hermitian, then all its eigenvalues are real.
(ii) If A ∈ ℂ^{s×s} is skew Hermitian, then all its eigenvalues are purely imaginary
or zero.
• The spectral radius of a square matrix A is defined as

ρ(A) = max_i |λ_i(A)|, λ_i(A) the eigenvalues of A.

• A square matrix A is said to be positive definite if

x^* A x > 0 for all x ≠ 0.

It is said to be positive semidefinite if

x^* A x ≥ 0 for all x ≠ 0.

A matrix A is positive definite (positive semidefinite) if and only if (i) it is Hermi-
tian and (ii) all its eigenvalues are positive (nonnegative). Note that the require-
ment that x^* A x be real for all x already forces A to be Hermitian.
• If A ∈ ℂ^{r×s}, then A^*A ∈ ℂ^{s×s} and AA^* ∈ ℂ^{r×r} are both Hermitian and positive
semidefinite. If r ≥ s and rank(A) = s, A^*A is positive definite. If r ≤ s and
rank(A) = r, AA^* is positive definite.
• The singular values of a matrix A ∈ ℂ^{r×s} are defined as

σ_i(A) = √(λ_i(A^*A)) = √(λ_i(AA^*)), i = 1, . . . , min{r, s}.

The eigenvectors of A^*A (of AA^*) are called the right (left) singular vectors of A.
Thus, σ_i(A) = |λ_i(A)| if A is normal.
⁴ When A is nondiagonalizable or defective, there is a nonsingular matrix V such that the matrix J =
V^{-1} A V, called the Jordan canonical form of A, is almost diagonal. We will deal with this general case in
detail later via Theorem 0.1 in Section 0.4.
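For illustration, the following small NumPy sketch (arbitrary random matrices) checks several of the facts listed in this subsection: the trace is the sum of the eigenvalues, the determinant is their product, the Cayley–Hamilton theorem gives R(A) = O, and AB and BA share the same characteristic polynomial. It relies on np.poly returning the coefficients of det(λI − A) with leading coefficient one.

    import numpy as np

    rng = np.random.default_rng(6)
    s = 5
    A = 0.5 * rng.standard_normal((s, s))
    B = rng.standard_normal((s, s))
    lam = np.linalg.eigvals(A)

    print(np.isclose(np.trace(A), lam.sum()))           # trace = sum of eigenvalues
    print(np.isclose(np.linalg.det(A), lam.prod()))     # determinant = product of eigenvalues

    c = np.poly(A)                                      # characteristic polynomial coefficients, c[0] = 1
    R_of_A = sum(ci * np.linalg.matrix_power(A, s - i) for i, ci in enumerate(c))
    print(np.allclose(R_of_A, 0))                       # Cayley-Hamilton: R(A) = O

    print(np.allclose(np.poly(A @ B), np.poly(B @ A)))  # AB and BA share the characteristic polynomial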

Vector norms
• We use ‖·‖ to denote vector norms in ℂ^s. Vector norms satisfy the following
conditions:
1. ‖x‖ ≥ 0 for all x ∈ ℂ^s; ‖x‖ = 0 if and only if x = 0.
2. ‖γ x‖ = |γ| ‖x‖ for every γ ∈ ℂ and x ∈ ℂ^s.
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for every x, y ∈ ℂ^s.
• With x = [x^(1), . . . , x^(s)]^T, the l_p-norm in ℂ^s is defined via

‖x‖_p = ( Σ_{i=1}^{s} |x^(i)|^p )^{1/p}, 1 ≤ p < ∞; ‖x‖_∞ = max_i |x^(i)|, p = ∞.

Thus, the l_2-norm (also called the Euclidean norm) is simply

‖x‖_2 = √(x^* x).

• If A ∈ ℂ^{r×s}, r ≥ s, is unitary and x ∈ ℂ^s, then

‖Ax‖_2 = ‖x‖_2.

• If A ∈ ℂ^{r×s} and x ∈ ℂ^s, then

min_i σ_i(A) ≤ ‖Ax‖_2 / ‖x‖_2 ≤ max_i σ_i(A).

• If A ∈ ℂ^{s×s} is normal and x ∈ ℂ^s, then

min_i |λ_i(A)| ≤ ‖Ax‖_2 / ‖x‖_2 ≤ max_i |λ_i(A)|.

• The following is known as the Hölder inequality:

|x^* y| ≤ ‖x‖_p ‖y‖_q, 1/p + 1/q = 1, 1 ≤ p, q ≤ ∞.

The special case p = q = 2 of the Hölder inequality is called the Cauchy–Schwarz
inequality.
• If ‖·‖ is a norm on ℂ^s and A ∈ ℂ^{s×s} is nonsingular, then ‖x‖′ ≡ ‖Ax‖ is a norm
on ℂ^s too.
• If ‖·‖ is a norm on ℂ^s, then ‖x‖ is a continuous function of x. Thus, with x =
[x^(1), . . . , x^(s)]^T, ‖x‖ is a continuous function in the s scalar variables x^(1), . . . , x^(s).
• All vector norms are equivalent. That is, given any two vector norms ‖·‖_(a) and
‖·‖_(b) in ℂ^s, there exist positive constants C_{ab} and D_{ab} such that

C_{ab} ‖x‖_(a) ≤ ‖x‖_(b) ≤ D_{ab} ‖x‖_(a) ∀ x ∈ ℂ^s.

This implies that if a vector sequence {x_m} converges to s in one norm, it con-
verges to s in every norm. Thus, if lim_{m→∞} ‖x_m − s‖_(a) = 0, then lim_{m→∞} ‖x_m −
s‖_(b) = 0 too, and vice versa, because

C_{ab} ‖x_m − s‖_(a) ≤ ‖x_m − s‖_(b) ≤ D_{ab} ‖x_m − s‖_(a).


Matrix norms
• Matrix norms will also be denoted by ‖·‖. Since matrices in ℂ^{r×s} can be viewed
also as vectors in ℂ^{rs} (that is, the matrix space ℂ^{r×s} is isomorphic to the vector
space ℂ^{rs}), we can define matrix norms just as we define vector norms, by the
following three conditions:
1. ‖A‖ ≥ 0 for all A ∈ ℂ^{r×s}. ‖A‖ = 0 if and only if A = O.
2. ‖γA‖ = |γ| ‖A‖ for every γ ∈ ℂ and A ∈ ℂ^{r×s}.
3. ‖A + B‖ ≤ ‖A‖ + ‖B‖ for every A, B ∈ ℂ^{r×s}.

The matrix norms we will be using are generally the natural norms (or induced
norms or subordinate norms) that are defined via

‖A‖_(a,b) = max_{x≠0} ‖Ax‖_(a) / ‖x‖_(b),

where A ∈ ℂ^{r×s} and x ∈ ℂ^s, and ‖Ax‖_(a) and ‖x‖_(b) are the vector norms in ℂ^r
and ℂ^s, respectively. Note that this maximum exists and is achieved for some
nonzero vector x_0. We say that the matrix norm ‖·‖_(a,b) is induced by, or is
subordinate to, the vector norms ‖·‖_(a) and ‖·‖_(b). (Here, ‖·‖_(a) and ‖·‖_(b)
are not to be confused with the l_p-norms.) With this notation, induced matrix
norms satisfy the following fourth condition, in addition to the above three:
4. ‖AB‖_(a,c) ≤ ‖A‖_(a,b) ‖B‖_(b,c), with A ∈ ℂ^{r×s} and B ∈ ℂ^{s×t}.

There are matrix norms that are not natural norms and that satisfy the fourth
condition. Matrix norms, whether natural or not, that satisfy the fourth condi-
tion are said to be multiplicative.

• In view of the definition above, natural norms satisfy

‖Ax‖_(a) ≤ ‖A‖_(a,b) ‖x‖_(b) for all x.

In addition,

‖Ax‖_(a) ≤ M ‖x‖_(b) for all x ⇒ ‖A‖_(a,b) ≤ M.

• When A is a square matrix and the vector norms ‖·‖_(a) and ‖·‖_(b) are the same,
we let ‖A‖_(a) stand for ‖A‖_(a,a). In this case, we have

‖Ax‖_(a) ≤ ‖A‖_(a) ‖x‖_(a) for all x.

Also, we say that the matrix norm ‖·‖_(a) is induced by the vector norm ‖·‖_(a).

• In ℂ^{s×s}, for any natural norm ‖A‖_(a,a) ≡ ‖A‖_(a), we have ‖I‖_(a) = 1.

• If A_i ∈ ℂ^{s×s}, i = 1, . . . , k, and ‖·‖ is a multiplicative norm on ℂ^{s×s}, then

‖A_1 · · · A_k‖ ≤ Π_{i=1}^{k} ‖A_i‖ and ‖I‖ ≥ 1.

• With A ∈ ℂ^{r×s} as in (0.2), if we let ‖Ax‖ and ‖x‖ be the vector l_p-norms in ℂ^r
  and ℂ^s, respectively, the natural norm ‖A‖ of A becomes

      ‖A‖ = ‖A‖_1 = max_{1≤j≤s} Σ_{i=1}^{r} |a_ij|,   p = 1,
      ‖A‖ = ‖A‖_∞ = max_{1≤i≤r} Σ_{j=1}^{s} |a_ij|,   p = ∞,
      ‖A‖ = ‖A‖_2 = max_i σ_i(A),   p = 2;   σ_i(A) singular values of A.

  In view of these, we have the following:

      ‖P‖_p = 1, 1 ≤ p ≤ ∞, if P is a permutation matrix.
      ‖AP‖_1 = ‖A‖_1 if P is a permutation matrix.
      ‖PA‖_∞ = ‖A‖_∞ if P is a permutation matrix.
      ‖U‖_2 = 1 if U ∈ ℂ^{r×s} and is unitary.
      ‖UA‖_2 = ‖A‖_2 if A ∈ ℂ^{r×s}, U ∈ ℂ^{r×r} and is unitary.
      ‖UAV‖_2 = ‖A‖_2 if A ∈ ℂ^{r×s}, U ∈ ℂ^{r×r}, V ∈ ℂ^{s×s}, and U, V are unitary.
      ‖A‖_2 = max_i |λ_i(A)| = ρ(A) if A ∈ ℂ^{s×s} and is normal.

• A matrix norm that is multiplicative but not natural and that is used frequently
  in applications is the Frobenius or Schur norm. For A ∈ ℂ^{r×s}, A = [a_ij]_{1≤i≤r, 1≤j≤s}
  as in (0.2), this norm is defined by

      ‖A‖_F = ( Σ_{i=1}^{r} Σ_{j=1}^{s} |a_ij|² )^{1/2} = √tr(A*A) = √tr(AA*).

  We also have

      ‖UAV‖_F = ‖A‖_F   if U ∈ ℂ^{r×r} and V ∈ ℂ^{s×s} are unitary.

• The natural matrix norms and spectral radii for square matrices satisfy

      ρ(A) ≤ ‖A‖.

  In addition, given ε > 0, there exists a vector norm ‖·‖ that depends on A and ε
  such that the matrix norm induced by it satisfies

      ‖A‖ ≤ ρ(A) + ε.

  The two inequalities between ‖A‖ and ρ(A) become an equality when we have
  the following:
  (i) A is normal and ‖A‖ = ‖A‖_2, since ‖A‖_2 = ρ(A) in this case.
  (ii) A = diag(d_1, ..., d_s) and ‖A‖ = ‖A‖_p, with arbitrary p, for, in this case,

      ‖A‖_p = max_{1≤i≤s} |d_i| = ρ(A),   1 ≤ p ≤ ∞.

• The natural matrix norms and spectral radii for square matrices also satisfy

      ρ(A) = lim_{k→∞} ‖A^k‖^{1/k}.
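
The following short Python/NumPy sketch is not from the book; the test matrix A is an arbitrary choice. It merely checks the induced-norm formulas above, the Frobenius norm, the inequality ρ(A) ≤ ‖A‖, and the limit ρ(A) = lim_k ‖A^k‖^{1/k} numerically:

# NumPy sketch (illustration only; A is arbitrary)
import numpy as np

A = np.array([[0.6, -0.3, 0.1],
              [0.2,  0.5, 0.0],
              [0.1, -0.2, 0.4]])

norm1   = np.abs(A).sum(axis=0).max()                 # max column sum = ||A||_1
norminf = np.abs(A).sum(axis=1).max()                 # max row sum    = ||A||_inf
norm2   = np.linalg.svd(A, compute_uv=False).max()    # largest singular value = ||A||_2
normF   = np.sqrt((np.abs(A) ** 2).sum())             # Frobenius norm

print(norm1, np.linalg.norm(A, 1))                    # the two values agree
print(norminf, np.linalg.norm(A, np.inf))
print(norm2, np.linalg.norm(A, 2))
print(normF, np.linalg.norm(A, 'fro'))

rho = np.abs(np.linalg.eigvals(A)).max()              # spectral radius
assert rho <= norm1 + 1e-12 and rho <= norminf + 1e-12 and rho <= norm2 + 1e-12

# rho(A) = lim_{k->inf} ||A^k||^{1/k}: the estimate approaches rho as k grows
for k in (5, 20, 80):
    est = np.linalg.norm(np.linalg.matrix_power(A, k), 2) ** (1.0 / k)
    print(k, est, rho)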

Condition numbers
• The condition number of a nonsingular square matrix A relative to a natural
matrix norm  ·  is defined by

      κ(A) = ‖A‖ ‖A^{-1}‖.

• Relative to every natural norm, κ(A) ≥ 1.

• If the natural norm is induced by the vector l p -norm, we denote κ(A) by κ p (A).
For l2 -norms, we have the following:

      κ_2(A) = σ_max(A) / σ_min(A),   A arbitrary,   and
      κ_2(A) = |λ_max(A)| / |λ_min(A)|,   A normal.

Here σmin (A) and σmax (A) are the smallest and largest singular values of A. Sim-
ilarly, λmin (A) and λmax (A) are the smallest and largest eigenvalues of A in mod-
ulus.

• If A = [a_ij]_{1≤i,j≤s} is upper or lower triangular and nonsingular, then a_ii ≠ 0,
  1 ≤ i ≤ s, necessarily, and

      κ_p(A) ≥ max_i |a_ii| / min_i |a_ii|,   1 ≤ p ≤ ∞.
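
A small NumPy sketch, not from the book, with an arbitrary triangular test matrix: it computes κ_2(A) = σ_max/σ_min and checks the lower bound just stated for several l_p condition numbers:

# illustration only; R is an arbitrary nonsingular upper triangular matrix
import numpy as np

R = np.triu(np.random.default_rng(0).normal(size=(5, 5))) + 4.0 * np.eye(5)

sigma = np.linalg.svd(R, compute_uv=False)
kappa2 = sigma.max() / sigma.min()
print(kappa2, np.linalg.cond(R, 2))                   # same number

diag_bound = np.abs(np.diag(R)).max() / np.abs(np.diag(R)).min()
for p in (1, 2, np.inf):
    kappa_p = np.linalg.cond(R, p)
    assert kappa_p >= diag_bound - 1e-12              # the stated lower bound holds
    print(p, kappa_p, diag_bound)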

Inner products
• We will use (· , ·) to denote inner products (or scalar products).
Thus, (x, y), with x, y ∈  s , denotes an inner product in  s . Inner products
satisfy the following conditions:
  1. (y, x) = \overline{(x, y)} (complex conjugate) for all x, y ∈ ℂ^s.
  2. (x, x) ≥ 0 for all x ∈ ℂ^s and (x, x) = 0 if and only if x = 0.
  3. (αx, βy) = \bar{α}β (x, y) for x, y ∈ ℂ^s and α, β ∈ ℂ.
  4. (x, βy + γz) = β(x, y) + γ(x, z) for x, y, z ∈ ℂ^s and β, γ ∈ ℂ.

• We say that the vectors x and y are orthogonal to each other if (x, y) = 0.

• An inner product (·, ·) in ℂ^s can also be used to define a vector norm ‖·‖ in ℂ^s as

      ‖x‖ = √(x, x).

• For any inner product (· , ·) and vector norm  ·  induced by it, and any two
vectors x, y ∈  s , we have

      |(x, y)| ≤ ‖x‖ ‖y‖.

Equality holds if and only if x and y are linearly dependent. This is a more
general version of the Cauchy–Schwarz inequality mentioned earlier.

• The standard Euclidean inner product in ℂ^s and the vector norm induced by it
  are defined by, respectively,

      (x, y) = x*y ≡ 〈x, y〉   and   ‖z‖ = √〈z, z〉 ≡ ‖z‖_2.

  The standard Euclidean inner product is used to define the angle ∠(x, y) between
  two nonzero vectors x and y as follows:

      cos ∠(x, y) = |〈x, y〉| / (‖x‖_2 ‖y‖_2),   0 ≤ ∠(x, y) ≤ π/2.

• The most general inner product in ℂ^s and the vector norm induced by it are,
  respectively,

      (x, y) = x*My ≡ (x, y)_M   and   ‖z‖ = √(z, z)_M ≡ ‖z‖_M,

  where M ∈ ℂ^{s×s} is Hermitian positive definite. Such an inner product is also
  called a weighted inner product.
  (Throughout this book, we will be using both the Euclidean inner product and
  the weighted inner product and the norms induced by them. Normally, we will
  use the notation (y, z) for all inner products, whether weighted or not. We will use
  the notation 〈y, z〉 = y*z and ‖z‖_2 = √〈z, z〉 to avoid confusion when both
  weighted and standard Euclidean inner products and norms induced by them
  are being used simultaneously.)
• Unitary matrices preserve the Euclidean inner product 〈· , ·〉. That is, if U is
unitary, then 〈U x, U y〉 = 〈x, y〉.

Linear least-squares problems


The least-squares solution x to the problem Ax = b, where A ∈ ℂ^{r×s}, b ∈ ℂ^r, and
x ∈ ℂ^s, is defined to be the solution to the minimization problem

      min_x ‖Ax − b‖,

where ‖z‖ = √(z, z), (·, ·) being an arbitrary inner product in ℂ^r.
• Let the column partitioning of A be A = [a_1 | a_2 | ⋯ | a_s]. It is known that x
  also satisfies the normal equations

      Σ_{j=1}^{s} (a_i, a_j) x^(j) = (a_i, b),   i = 1, ..., s.

• When (y, z ) = y ∗ z , the normal equations become

A∗Ax = A∗ b.

If r ≥ s and rank(A) = s, A∗A is nonsingular, and hence the solution is given by

x = (A∗A)−1A∗ b.

In addition, this solution is unique.
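
A brief NumPy sketch, not taken from the book, with arbitrary data A and b: it compares the normal-equations solution x = (A*A)^{-1}A*b with NumPy's least-squares solver for the Euclidean inner product. (In practice an orthogonalization-based solver is preferred numerically, which is why both are shown.)

# illustration only; A and b are arbitrary, with r >= s and full column rank
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 3))
b = rng.normal(size=8)

x_normal = np.linalg.solve(A.conj().T @ A, A.conj().T @ b)   # normal equations
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)              # SVD-based solver

print(np.allclose(x_normal, x_lstsq))   # True: both minimize ||Ax - b||_2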



Some special classes of square matrices


Next, we discuss some special classes of square matrices A = [ai j ]1≤i , j ≤s ∈  s ×s .

• A is said to be strictly diagonally dominant if it satisfies

      |a_ii| > Σ_{j=1, j≠i}^{s} |a_ij|,   i = 1, ..., s.

It is known that if A is strictly diagonally dominant, then A is also nonsingular.


• A is said to be reducible if there exists a permutation matrix P such that

      P^T A P = \begin{bmatrix} A_{11} & A_{12} \\ O & A_{22} \end{bmatrix},

  where A_11 and A_22 are square matrices. Otherwise, A is irreducible.


  Whether A ∈ ℂ^{s×s} is irreducible or not can also be determined by looking at
  its directed graph G(A). This graph is obtained as follows: Let P_1, ..., P_s be s
  points in the plane; these are called nodes. If a_ij ≠ 0, connect P_i to P_j by a path
  directed from P_i to P_j. G(A) is said to be strongly connected if, for any pair of
  nodes P_i and P_j, there exists a directed path P_{l_0}P_{l_1}, P_{l_1}P_{l_2}, ..., P_{l_{r−1}}P_{l_r},
  with l_0 = i and l_r = j, connecting P_i to P_j.
  Then A is irreducible if and only if G(A) is strongly connected.
• A is said to be irreducibly diagonally dominant if it is irreducible and satisfies

      |a_ii| ≥ Σ_{j=1, j≠i}^{s} |a_ij|,   i = 1, ..., s,

  with strict inequality occurring at least once.


  It is known that if A is irreducibly diagonally dominant, then A is also nonsingular.
  In addition, a_ii ≠ 0 for all i.
• A is said to be nonnegative (positive) if ai j ≥ 0 (ai j > 0) for all i, j , and we write
A ≥ O (A > O).
If A is nonnegative, then ρ(A) is an eigenvalue of A, and the corresponding eigen-
vector v is nonnegative, that is, v ≥ 0. If A is irreducible, in addition to being
nonnegative, then ρ(A) is a simple eigenvalue of A, and the corresponding eigen-
vector v is positive, that is, v > 0.
• A is said to be an M-matrix if (i) a_ij ≤ 0 for i ≠ j and (ii) A^{-1} ≥ O.
  It is known that A is an M-matrix if and only if (i) a_ii > 0 for all i and (ii) the
  matrix B = I − D^{-1}A, where D = diag(A), satisfies ρ(B) < 1.
• A = M − N is said to be a regular splitting of A if M is nonsingular with M −1 ≥ O
and N ≥ O.
Thus, the splitting A = M − N , with M = diag (A) and N = M − A, is a regular
splitting if A is an M-matrix.
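
The following NumPy sketch is not from the book; the 3 × 3 matrix is an arbitrary example. It checks strict diagonal dominance and the M-matrix criterion ρ(I − D^{-1}A) < 1 stated above:

# illustration only; A is an arbitrary tridiagonal test matrix
import numpy as np

A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])

off = np.abs(A) - np.diag(np.abs(np.diag(A)))
print("strictly diagonally dominant:", np.all(np.abs(np.diag(A)) > off.sum(axis=1)))

D = np.diag(np.diag(A))
B = np.eye(3) - np.linalg.solve(D, A)                  # B = I - D^{-1} A
rho_B = np.abs(np.linalg.eigvals(B)).max()
is_M = np.all(np.diag(A) > 0) and rho_B < 1 and np.all(A - np.diag(np.diag(A)) <= 0)
print("rho(B) =", rho_B, "; M-matrix:", is_M)
print("A^{-1} >= O:", np.all(np.linalg.inv(A) >= -1e-12))   # consistent with the definition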

0.3 Fixed-point iterative methods for nonlinear systems


Consider the nonlinear system of equations

      ψ(x) = 0,   ψ : ℂ^N → ℂ^N,          (0.4)

whose solution we denote by s. What is meant by x, s, and ψ(x) is

      x = [x^(1), ..., x^(N)]^T,   s = [s^(1), ..., s^(N)]^T;   x^(i), s^(i) scalars,

and

      ψ(x) = [ψ_1(x), ..., ψ_N(x)]^T;   ψ_i(x) = ψ_i(x^(1), ..., x^(N)) scalar functions.

Then, starting with a suitable vector x 0 , an initial approximation to s, the sequence


{x m } of approximations can be generated by some fixed-point iterative method as

x m+1 = f (x m ), m = 0, 1, . . . , (0.5)

with

      f(x) = [f_1(x), ..., f_N(x)]^T;   f_i(x) = f_i(x^(1), ..., x^(N)) scalar functions.

Here x − f (x) = 0 is a possibly “preconditioned” form of (0.4); hence, it has the


same solution s [that is, ψ(s) = 0 and also s = f (s)], and, in the case of convergence,
lim m→∞ x m = s. One possible form of f (x) is

f (x) = x + C(x)ψ(x),

where C(x) is an N × N matrix such that C(s) is nonsingular.


We now want to study the nature of the vectors x m that arise from the iterative
method of (0.5), the function f (x) there being nonlinear in general. Assuming that
lim m→∞ x m exists, hence that x m ≈ s for all large m [recall that s is the solution to
the system ψ(x) = 0 and hence to the system x = f (x)], we expand f (x m ) in a Taylor
series about s. Expanding each of the functions fi (x m ), we have
      f_i(x_m) = f_i(s) + Σ_{j=1}^{N} f_{i,j}(s) (x_m^(j) − s^(j)) + O(‖x_m − s‖²)   as m → ∞,

where

      f_{i,j}(s) = (∂f_i/∂x^(j))|_{x=s},   i, j = 1, ..., N.
Consequently,

      x_{m+1} = f(s) + F(s)(x_m − s) + O(‖x_m − s‖²)   as m → ∞,          (0.6)

where F(x) is the Jacobian matrix of the vector-valued function f(x) given as

      F(x) = \begin{bmatrix}
               f_{1,1}(x) & f_{1,2}(x) & \cdots & f_{1,N}(x) \\
               f_{2,1}(x) & f_{2,2}(x) & \cdots & f_{2,N}(x) \\
               \vdots     & \vdots     &        & \vdots     \\
               f_{N,1}(x) & f_{N,2}(x) & \cdots & f_{N,N}(x)
             \end{bmatrix}.          (0.7)

Recalling that s = f (s), we rewrite (0.6) in the form

      x_{m+1} = s + F(s)(x_m − s) + O(‖x_m − s‖²)   as m → ∞.          (0.8)

By (0.8), we realize that the vectors x m and x m+1 satisfy the approximate equality

x m+1 ≈ s + F (s)(x m − s) = F (s)x m + [I − F (s)]s for all large m.

That is, for all large m, the sequence {x m } behaves as if it were being generated by an
N -dimensional linear system of the form (I − T )x = d through

x m+1 = T x m + d, m = 0, 1, . . . , (0.9)

where T = F (s) and d = [I − F (s)]s. This suggests that we should study those se-
quences {x m } that arise from linear systems of equations to derive and study vector
extrapolation methods. We undertake this task in the next section.
 the rate of convergence (to s) of the sequence
 Now,  {x m } above is determined by
ρ F (s) , the spectral radius of F (s). It is known that ρ F (s) < 1 must hold for conver-
 
gence to take place and that, the closer ρ F (s) is to zero, the faster the convergence.
 
The rate of convergence deteriorates as ρ F (s) becomes closer to one, however.
As an example, let us consider the cases in which (0.4) and (0.5) arise from finite-
difference or finite-element discretizations of continuum problems. For s [the solution
to (0.4) and (0.5)] to be a reasonable approximation to the solution of the continuum
problem, the mesh size of the discretization must be small enough. However, a small
mesh size means a large N . In addition, as the mesh size tends to zero, hence N →
∞, generally, ρ F (s) tends to one, as can be shown rigorously in some cases. All
this means that, when the mesh size decreases, not only does the dimension of the
problem increase, the convergence of the fixed-point method in (0.5) deteriorates as
well. As mentioned above, this problem of slow convergence can be treated efficiently
via vector extrapolation methods.
Remark: From our discussion of the nature of the iterative methods for nonlinear
systems, it is clear that the vector-valued functions ψ(x) and f (x) above are assumed
to be differentiable at least twice in a neighborhood of the solution s.
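
A minimal Python sketch, not from the book, of the fixed-point iteration (0.5) on an arbitrary two-dimensional contractive example: the ratios of successive steps ‖x_{m+1} − x_m‖ settle near ρ(F(s)), illustrating the linear convergence just discussed.

# illustration only; the map f and all constants are arbitrary choices
import numpy as np

def f(x):
    return np.array([0.4 * x[0] + 0.1 * np.sin(x[1]) + 0.2,
                     0.1 * np.sin(x[0]) + 0.3 * x[1] - 0.1])

x, steps = np.zeros(2), []
for m in range(30):
    x_new = f(x)
    steps.append(np.linalg.norm(x_new - x))
    x = x_new                                    # x_{m+1} = f(x_m)

# Jacobian F(s) at the (numerically converged) fixed point, by central differences
h, N = 1e-6, 2
F = np.zeros((N, N))
for j in range(N):
    e = np.zeros(N); e[j] = h
    F[:, j] = (f(x + e) - f(x - e)) / (2 * h)

print("rho(F(s)) =", np.abs(np.linalg.eigvals(F)).max())
print("late step ratios:", [steps[m + 1] / steps[m] for m in range(20, 25)])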

0.4 Fixed-point iterative methods for linear systems


0.4.1 General treatment
Let A ∈ N ×N be a nonsingular matrix and b ∈ N be a given vector, and consider the
linear system of equations
Ax = b, (0.10)
whose solution we denote by s. As already mentioned, what is meant by s, x, and b,
is

s = [s (1) , . . . , s (N ) ]T , x = [x (1) , . . . , x (N ) ]T , b = [b (1) , . . . , b (N ) ]T ;


s (i ) , x (i ) , b (i ) scalars. (0.11)

We now split the matrix as in5

A=M −N, M nonsingular. (0.12)


5 Note that once M is chosen, N is determined (by A and M ) as N = M − A.

Rewriting (0.10) in the form


Mx = Nx + b (0.13)
and choosing an initial vector x 0 , we generate the vectors x 1 , x 2 , . . . by solving (for
x m+1 ) the linear systems

M x m+1 = N x m + b, m = 0, 1, . . . . (0.14)

Now, (0.14) can also be written in the form

x m+1 = T x m + d, m = 0, 1, . . . ; T = M −1 N , d = M −1 b. (0.15)

Note that the matrix T cannot have one as one of its eigenvalues since A = M (I − T )
and A is nonsingular. [Since we have to solve the equations in (0.14) many times, we
need to choose M such that the solution of these equations is much less expensive than
the solution of (0.10).]
Now, we would like the sequence {x m } to converge to s. The subject of conver-
gence can be addressed in terms of ρ(T ), the spectral radius of T , among others. Ac-
tually, we have the following result.

Theorem 0.1. Let s be the (unique) solution to the system x = T x + d, where T does not
have one as one of its eigenvalues, and let the sequence {x m } be generated as in

x m+1 = T x m + d, m = 0, 1, . . . , (0.16)

starting with some initial vector x 0 . A necessary and sufficient condition for {x m } to
converge to s from arbitrary x 0 is ρ(T ) < 1.

Proof. Since s is the solution to the system x = T x + d, it satisfies

s = T s + d. (0.17)

Subtracting (0.17) from (0.16), we obtain

x m+1 − s = T (x m − s), m = 0, 1, . . . . (0.18)

Then, by induction, we have

x m − s = T m (x 0 − s), m = 0, 1, . . . . (0.19)

Since x 0 is arbitrary, so is x 0 − s, and hence lim m→∞ (x m − s) = 0 with arbitrary x 0 if


and only if lim m→∞ T m = O.
We now turn to the study of T m . For this, we will make use of the Jordan factor-
ization of T given as
T = V J V −1 , (0.20)
where V is a nonsingular matrix and J is the Jordan canonical form of T and is a block
diagonal matrix given by
⎡ ⎤
J r1 (λ1 )
⎢ J r2 (λ2 ) ⎥
⎢ ⎥
J =⎢⎢ .. ⎥,
⎥ (0.21)
⎣ . ⎦
J rq (λq )

with the Jordan blocks J_{r_i}(λ_i) defined as

      J_1(λ) = [λ] ∈ ℂ^{1×1},   J_r(λ) = \begin{bmatrix} λ & 1 & & \\ & λ & \ddots & \\ & & \ddots & 1 \\ & & & λ \end{bmatrix} ∈ ℂ^{r×r},   r > 1.          (0.22)

Here λ_i are the (not necessarily distinct) eigenvalues of T and Σ_{i=1}^{q} r_i = N. Therefore,

      T^m = (V J V^{-1})^m = V J^m V^{-1},          (0.23)

where

      J^m = \begin{bmatrix}
              [J_{r_1}(λ_1)]^m &                  &        & \\
                               & [J_{r_2}(λ_2)]^m &        & \\
                               &                  & \ddots & \\
                               &                  &        & [J_{r_q}(λ_q)]^m
            \end{bmatrix}.          (0.24)

It is clear that lim m→∞ J m = O implies that lim m→∞ T m = O by (0.23). Conversely,
by J m = V −1 T m V , lim m→∞ T m = O implies that limm→∞ J m = O, which implies
that lim m→∞ [J ri (λi )] m = O for each i.6 Therefore, it is enough to study [J r (λ)] m .
First, when r = 1,
[J 1 (λ)] m = [λ m ]. (0.25)
As a result, lim m→∞ [J 1 (λ)] m = O if and only if |λ| < 1.
For r > 1, let us write
      J_r(λ) = λI_r + E_r,   E_r = \begin{bmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{bmatrix} ∈ ℂ^{r×r},          (0.26)

so that

      [J_r(λ)]^m = (λI_r + E_r)^m = λ^m I_r + Σ_{i=1}^{m} \binom{m}{i} λ^{m−i} E_r^i.          (0.27)

Now, observe that, when k < r, the only nonzero elements of E_r^k are (E_r^k)_{i,k+i} = 1,
i = 1, ..., r − k, and that E_r^r = O.⁷ Then

      E_r^m = O   if m ≥ r,          (0.28)

⁶ For a sequence of matrices {B_m}_{m=0}^∞ ⊂ ℂ^{r×s}, by lim_{m→∞} B_m = O we mean that B_m → O as m → ∞
entrywise, that is, lim_{m→∞} (B_m)_{ij} = 0, 1 ≤ i ≤ r, 1 ≤ j ≤ s, simultaneously.
⁷ For example,

      E_4 = \begin{bmatrix} 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \\ 0&0&0&0 \end{bmatrix},   E_4^2 = \begin{bmatrix} 0&0&1&0 \\ 0&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \end{bmatrix},   E_4^3 = \begin{bmatrix} 0&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{bmatrix},   E_4^4 = O.

and, therefore, (0.27) becomes

      [J_r(λ)]^m = λ^m I_r + Σ_{i=1}^{r−1} \binom{m}{i} λ^{m−i} E_r^i
                 = \begin{bmatrix}
                     λ^m & \binom{m}{1} λ^{m−1} & \binom{m}{2} λ^{m−2} & \cdots & \binom{m}{r−1} λ^{m−r+1} \\
                         & λ^m                  & \binom{m}{1} λ^{m−1} & \cdots & \binom{m}{r−2} λ^{m−r+2} \\
                         &                      & λ^m                  & \cdots & \binom{m}{r−3} λ^{m−r+3} \\
                         &                      &                      & \ddots & \vdots \\
                         &                      &                      &        & λ^m
                   \end{bmatrix}.          (0.29)

By the fact that \binom{m}{l} = 0 when l > m, note that (0.29) is valid for all m = 0, 1, .... In
addition, because

      \binom{m}{k} = m(m−1) ⋯ (m−k+1)/k! = Σ_{j=0}^{k} c_j m^j,   c_k = 1/k!,

for λ ≠ 0, the most dominant entry as m → ∞ in [J_r(λ)]^m is \binom{m}{r−1} λ^{m−r+1}, which is
O(m^{r−1} λ^m). Therefore, lim_{m→∞} [J_r(λ)]^m = O if and only if |λ| < 1.

We have shown that, whether r = 1 or r > 1, limm→∞ [J r (λ)] m = O if and only if


|λ| < 1. Going back to (0.24), we realize that limm→∞ J m = O if and only if |λi | < 1,
i = 1, . . . , q. The result now follows.

Remarks:

1. Jordan blocks of size ri > 1 occur when the matrix T is not diagonalizable.
(Nondiagonalizable matrices are also said to be defective.)

2. If T is diagonalizable, then ri = 1 for all i = 1, . . . , q, and q = N necessarily, that


is, J is a diagonal matrix, λi being the ith element along the diagonal. Therefore,
J m is diagonal too and is given by J m = diag (λ1m , . . . , λNm ). In this case, the ith
column of V is an eigenvector of T corresponding to the eigenvalue λi .

   3. It is important to observe that

            [J_r(λ) − λI_r]^k = E_r^k  \begin{cases} ≠ O & if k < r, \\ = O & if k ≥ r. \end{cases}          (0.30)

4. It is clear from (0.19) that, the faster T m tends to O, the faster the convergence
of {x m } to s. The rate of convergence of T m to O improves as ρ(T ) decreases,
as is clear from (0.24) and (0.29).
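
A small NumPy sketch, not from the book, verifying formula (0.29) on an arbitrary Jordan block: the power computed by repeated multiplication matches the binomial-coefficient expression entry by entry.

# illustration only; lambda, r, and m are arbitrary choices
import numpy as np
from math import comb

lam, r, m = 0.8, 4, 25
J = lam * np.eye(r) + np.diag(np.ones(r - 1), 1)      # J_r(lambda)

Jm_power = np.linalg.matrix_power(J, m)

Jm_formula = np.zeros((r, r))
for i in range(r):
    for j in range(i, r):
        k = j - i                                     # super-diagonal index
        Jm_formula[i, j] = comb(m, k) * lam ** (m - k)

print(np.allclose(Jm_power, Jm_formula))              # True
print("dominant entry C(m, r-1) * lam^(m-r+1):", comb(m, r - 1) * lam ** (m - r + 1))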

0.4.2 Error formulas for x m


It is important to also analyze the structure of the error vector x m − s as a function of
m. We will need this to analyze the behavior of the error in extrapolation. In addition,
the result of Theorem 0.1 can also be obtained by looking at this structure. We treat
the error x m − s next.

Error when T is diagonalizable


To start, we will look at the case where T is diagonalizable, that is, ri = 1 for all i.
Of course, in this case, q = N . As already mentioned, the ith column of V , which we
will denote by v i , is an eigenvector corresponding to the eigenvalue λi of T , that is,

T v i = λi v i . (0.31)

In addition, v 1 , . . . , v N form a basis for N . In view of this, we have the following


theorem.

Theorem 0.2. Let

      x_0 − s = Σ_{i=1}^{N} α_i v_i   for some α_i ∈ ℂ.          (0.32)

Then

      x_m − s = Σ_{i=1}^{N} α_i λ_i^m v_i,   m = 1, 2, ....          (0.33)

This result is valid whether the sequence {x m } converges or not.

Proof. By (0.19) and (0.32), we have

      x_m − s = T^m Σ_{i=1}^{N} α_i v_i = Σ_{i=1}^{N} α_i (T^m v_i).

Invoking (0.31), the result follows.

Remark: As x 0 is chosen arbitrarily, x 0 − s is also arbitrary, and so are the αi . There-


fore, by (0.33), for convergence of {x m } from any x 0 , we need to have |λi | < 1 for
all i.
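
The next NumPy sketch, not from the book, checks Theorem 0.2 numerically for an arbitrary diagonalizable T: the error x_m − s equals Σ_i α_i λ_i^m v_i at every step.

# illustration only; V, the eigenvalues, d, and x_0 are arbitrary
import numpy as np

rng = np.random.default_rng(2)
N = 4
V = rng.normal(size=(N, N))                      # columns: eigenvectors v_i
lam = np.array([0.9, 0.5, -0.3, 0.1])            # eigenvalues, all of modulus < 1
T = V @ np.diag(lam) @ np.linalg.inv(V)
d = rng.normal(size=N)
s = np.linalg.solve(np.eye(N) - T, d)            # solution of (I - T)x = d

x = rng.normal(size=N)                           # arbitrary x_0
alpha = np.linalg.solve(V, x - s)                # x_0 - s = sum_i alpha_i v_i

for m in range(1, 6):
    x = T @ x + d
    predicted = V @ (alpha * lam ** m)           # sum_i alpha_i lambda_i^m v_i
    print(m, np.allclose(x - s, predicted))      # True at every step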

Error when T is nondiagonalizable (defective)


When T is nondiagonalizable (or defective), the treatment of x m − s becomes much
more involved. For this, we need to recall some facts and details concerning the matrix
V in (0.20). We have first the partitioning

V = [V 1 |V 2 | · · · |V q ], V i ∈ N ×ri , i = 1, . . . , q, (0.34)

where the matrices V i have the columnwise partitionings given in

V i = [ v i 1 | v i 2 | · · · | v i ri ] ∈ N ×ri , i = 1, . . . , q. (0.35)

Here, v i 1 is an eigenvector of T corresponding to the eigenvalue λi , whether ri = 1 or


ri > 1. When ri > 1, the vectors v i j , j = 2, . . . , ri , are principal vectors (or generalized
eigenvectors) corresponding to λi . The v i j satisfy

T v i 1 = λi v i 1 , ri ≥ 1; T v i j = λi v i j + v i , j −1 , j = 2, . . . , ri , ri > 1. (0.36)

Consequently, we also have

      (T − λ_i I)^k v_{ij} = \begin{cases} v_{i,j−k} ≠ 0 & if k < j, \\ 0 & if k ≥ j. \end{cases}          (0.37)

      Generally, a nonzero vector u that satisfies

      (T − λ_i I)^r u = 0   but   (T − λ_i I)^{r−1} u ≠ 0

is said to be a generalized eigenvector of T of rank r , with associated eigenvalue λi . Thus,


v i j is of rank j . In addition, the N vectors v i j are linearly independent, and hence
they form a basis for N . That is, every vector in N can be expressed as a linear
combination of the v i j . We make use of these facts to prove the next theorem.

Theorem 0.3. Let

      x_0 − s = Σ_{i=1}^{q} Σ_{j=1}^{r_i} α_{ij} v_{ij}   for some α_{ij} ∈ ℂ.          (0.38)

Then there exist a vector z_m associated with the zero eigenvalues and vector-valued poly-
nomials p_i(m) associated with the respective nonzero eigenvalues λ_i, given by

      z_m = Σ_{i=1, λ_i=0}^{q} Σ_{j=m+1}^{r_i} α_{ij} v_{i,j−m},          (0.39)

      p_i(m) = Σ_{l=0}^{r_i−1} \binom{m}{l} a_{il},   a_{il} = λ_i^{−l} Σ_{j=l+1}^{r_i} α_{ij} v_{i,j−l},   λ_i ≠ 0,          (0.40)

such that

      x_m − s = z_m + Σ_{i=1, λ_i≠0}^{q} p_i(m) λ_i^m.          (0.41)

This result is valid whether the sequence {x m } converges or not.

Proof. By (0.38) and (0.19), we first have

      x_m − s = Σ_{i=1}^{q} Σ_{j=1}^{r_i} α_{ij} (T^m v_{ij})
              = Σ_{i=1, λ_i=0}^{q} Σ_{j=1}^{r_i} α_{ij} (T^m v_{ij}) + Σ_{i=1, λ_i≠0}^{q} Σ_{j=1}^{r_i} α_{ij} (T^m v_{ij}).          (0.42)

The first double summation represents the contribution of the zero eigenvalues of T
to x_m − s, while the second represents the contribution of the nonzero eigenvalues of
T. We therefore need to analyze T^m v_{ij} for λ_i = 0 and for λ_i ≠ 0 separately.
      We start with the contribution of the zero eigenvalues λ_i. Letting λ_i = 0 in (0.37),
we have T^m v_{ij} = v_{i,j−m}, where we also mean that v_{ik} = 0 when k ≤ 0. Substituting
this into the summation Σ_{i=1, λ_i=0}^{q} Σ_{j=1}^{r_i}, we obtain the vector z_m as the contribution of
the zero eigenvalues.

We now turn to the contribution of the nonzero eigenvalues λi , which turns out
to be more complicated. By (0.23), we have T m V = V J m . Now, by (0.34),

T m V = [ T m V 1 | T m V 2 | · · · | T m V q ],

and, invoking (0.24), we have

T m V i = V i [J ri (λi )] m , i = 1, . . . , q.

Now,
T m V i = [ T m v i 1 | T m v i 2 | · · · | T m v i ri ].
Equating the j th column of the matrix T m V i , namely, the vector T m v i j , with the
j th column of V i [J ri (λi )] m by invoking (0.29), we obtain

      T^m v_{ij} = Σ_{k=1}^{j} \binom{m}{j−k} v_{ik} λ_i^{m−j+k}.          (0.43)

Substituting (0.43) into the summation Σ_{i=1, λ_i≠0}^{q} Σ_{j=1}^{r_i} and rearranging, we obtain

      Σ_{i=1, λ_i≠0}^{q} [ Σ_{l=0}^{r_i−1} \binom{m}{l} ( Σ_{j=l+1}^{r_i} α_{ij} v_{i,j−l} ) λ_i^{m−l} ] = Σ_{i=1, λ_i≠0}^{q} p_i(m) λ_i^m

as the contribution of the nonzero eigenvalues to x m − s. This completes the


proof.

Remarks:
 
   1. Since \binom{m}{l} is a polynomial in m of degree exactly l, p_i(m) is a polynomial in m
      of degree at most r_i − 1.
2. Naturally, z m = 0 for all m when zero is not an eigenvalue of T .
3. In addition, z m = 0 for m ≥ max{ri : λi = 0}, since the summations on j in
(0.39) are empty for such m.
4. Clearly, z m is in the subspace spanned by the eigenvectors and principal vec-
tors corresponding to the zero eigenvalues. Similarly, p i (m) is in the subspace
spanned by the eigenvector and principal vectors corresponding to the eigen-
value λi = 0.
5. It is clear from (0.38) that, because x 0 is chosen arbitrarily, x 0 − s is also arbi-
trary, and so are the αi j . Therefore, by (0.41), for convergence of {x m } from any
x 0 , we need to have |λi | < 1 for all i. In addition, the smaller ρ(T ) is, the faster
{x m } converges to s.

0.4.3 Some basic iterative methods


We now turn to a few known examples of iterative methods. Of course, all the matrices
A below are assumed to be nonsingular. To see what the matrices M and N in (0.12)
are in some of the cases below, we decompose A as

A= D −E −F, (0.44)

where
D = diag (A) = diag (a11 , a22 , . . . , aN N ), (0.45)
while −E is the lower triangular part of A excluding the main diagonal, and −F is the
upper triangular part of A excluding the main diagonal. Hence, both E and F have
zero diagonals.

   1. Richardson iteration: In this method, we first rewrite (0.10) in the form

            x = x + ω(b − Ax)   for some ω ∈ ℂ          (0.46)

      and iterate as in

            x_{m+1} = x_m + ω(b − Ax_m),   m = 0, 1, ....          (0.47)

      That is,

            M = (1/ω) I,   N = (1/ω)(I − ωA)          (0.48)

      in (0.12), and hence

            T = I − ωA,   d = ωb          (0.49)

      in (0.15).
   2. Jacobi method: In this method, x_{m+1} is computed from x_m as follows:

            x_{m+1}^(i) = (1/a_ii) ( b^(i) − Σ_{j=1, j≠i}^{N} a_ij x_m^(j) ),   i = 1, ..., N,          (0.50)

      provided that a_ii ≠ 0 for all i. This amounts to setting

            M = D,   N = D − A = E + F.          (0.51)

      Therefore,

            T = I − D^{-1}A = D^{-1}(E + F).          (0.52)
   3. Gauss–Seidel method: In this method, x_{m+1} is computed from x_m as follows:

            x_{m+1}^(i) = (1/a_ii) ( b^(i) − Σ_{j=1}^{i−1} a_ij x_{m+1}^(j) − Σ_{j=i+1}^{N} a_ij x_m^(j) ),   i = 1, ..., N,          (0.53)

      provided that a_ii ≠ 0 for all i. Here, we first compute x_{m+1}^(1), then x_{m+1}^(2), and so
      on. Thus, we have

            M = D − E,   N = F          (0.54)

      so that

            T = (D − E)^{-1} F.          (0.55)
   4. Symmetric Gauss–Seidel method: Let us recall that in each step of the Gauss–
      Seidel method, the x_m^(i) are being updated in the order i = 1, 2, ..., N. We will
      call this updating a forward sweep. We can also perform the updating of the x_m^(i)
in the order i = N , N − 1, . . . , 1. One such step is called a backward sweep, and it
amounts to taking
M = D −F, N = E (0.56)
so that
T = (D − F )−1 E . (0.57)
If we obtain x m+1 by applying to x m a forward sweep followed by a backward
sweep, we obtain a method called the symmetric Gauss–Seidel method. The ma-
trix of iteration relevant to this method is thus
T = (D − F )−1 E (D − E )−1 F . (0.58)

   5. Successive overrelaxation (SOR): This method is similar to the Gauss–Seidel
      method but involves a scalar ω called the relaxation parameter. In this method,
      x_{m+1} is computed from x_m as follows:

            x̂_{m+1}^(i) = (1/a_ii) ( b^(i) − Σ_{j=1}^{i−1} a_ij x_{m+1}^(j) − Σ_{j=i+1}^{N} a_ij x_m^(j) ),
            x_{m+1}^(i) = ω x̂_{m+1}^(i) + (1 − ω) x_m^(i),                       i = 1, ..., N,          (0.59)

      provided that a_ii ≠ 0 for all i. Here too, we first compute x_{m+1}^(1), then x_{m+1}^(2), and
      so on. Decomposing A as in (0.44), for SOR, we have

            M = (1/ω)(D − ωE),   N = (1/ω)[(1 − ω)D + ωF];          (0.60)

      hence,

            T = (D − ωE)^{-1} [(1 − ω)D + ωF].          (0.61)

Note that SOR reduces to the Gauss–Seidel method when ω = 1.


By adjusting the parameter ω, we can minimize ρ(T ) and thus optimize the rate
of convergence of SOR. In this case, SOR is denoted optimal SOR.

   6. Symmetric SOR (SSOR): Note that, as in the case of the Gauss–Seidel method,
      in SOR too the x_m^(i) are being updated in the order i = 1, 2, ..., N. We will call
      this updating a forward sweep. We can also perform the updating of the x_m^(i) in
the order i = N , N − 1, . . . , 1. One such step is called a backward sweep, and it
amounts to taking

            M = (1/ω)(D − ωF),   N = (1/ω)[(1 − ω)D + ωE]          (0.62)

      so that

            T = (D − ωF)^{-1} [(1 − ω)D + ωE].          (0.63)

If we obtain x m+1 by applying to x m a forward sweep followed by a backward


sweep, we obtain a method called SSOR. The matrix of iteration relevant to this
method is thus

T = (D − ωF )−1 [(1 − ω)D + ωE](D − ωE)−1 [(1 − ω)D + ωF ]. (0.64)

7. Alternating direction implicit (ADI) method: This method was developed to solve
linear systems arising from finite-difference or finite-element solution of elliptic
and parabolic partial differential equations. In the linear system Ax = b, we
have A = H + V . Expressing this linear system in the forms

(H + μI )x = (−V + μI )x + b and (V + μI )x = (−H + μI )x + b,

ADI is defined via the following two-stage iterative method:


            (H + μI) x_{m+1/2} = (−V + μI) x_m + b,
            (V + μI) x_{m+1}   = (−H + μI) x_{m+1/2} + b,       m = 0, 1, ...,          (0.65)

where μ is some appropriate scalar. In this case,


            M = (1/(2μ))(H + μI)(V + μI),   N = (1/(2μ))(H − μI)(V − μI)          (0.66)

      so that

            T = (V + μI)^{-1}(H − μI)(H + μI)^{-1}(V − μI).          (0.67)
Remark: In the Jacobi, Gauss–Seidel, and SOR methods mentioned above, the x_m^(i)
are updated one at a time. Because of this, these methods are called point methods.
We can choose to update several of the x_m^(i) simultaneously, that is, we can choose to
update the x_m^(i) in blocks. The resulting methods are said to be block methods.
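
The following Python sketch is not from the book; it simply implements the Jacobi, Gauss–Seidel, and SOR iterations described above through their splittings A = M − N, so that each step solves M x_{m+1} = N x_m + b. The test matrix, right-hand side, and ω are arbitrary choices.

# illustration only
import numpy as np

def iterate(M, N, b, x0, m_steps):
    x = x0.copy()
    for _ in range(m_steps):
        x = np.linalg.solve(M, N @ x + b)        # x_{m+1} = M^{-1}(N x_m + b)
    return x

A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x0 = np.zeros(3)

D = np.diag(np.diag(A))
E = -np.tril(A, -1)          # strictly lower part with sign flipped, as in (0.44)
F = -np.triu(A, 1)           # strictly upper part with sign flipped
omega = 1.2

splittings = {
    "Jacobi":       (D, E + F),
    "Gauss-Seidel": (D - E, F),
    "SOR":          ((D - omega * E) / omega, ((1 - omega) * D + omega * F) / omega),
}

exact = np.linalg.solve(A, b)
for name, (M, N) in splittings.items():
    x = iterate(M, N, b, x0, 50)
    T = np.linalg.solve(M, N)
    print(name, "rho(T) =", np.abs(np.linalg.eigvals(T)).max(),
          "error =", np.linalg.norm(x - exact))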

0.4.4 Some convergence results for basic iterative methods


We now state without proof some convergence results for the iterative methods de-
scribed in the preceding subsection. For the proofs, we refer the reader to the relevant
literature.

Theorem 0.4. Let A have eigenvalues λi that satisfy

0 < λ1 ≤ λ2 ≤ · · · ≤ λN .

Then the Richardson iterative method converges provided 0 < ω < 2/λ_N. Denoting T by
T(ω), we also have the following optimal result:

      ω_0 = 2/(λ_N + λ_1),   min_ω ρ(T(ω)) = ρ(T(ω_0)) = (λ_N − λ_1)/(λ_N + λ_1) < 1.
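
A small NumPy sketch, not from the book, checking Theorem 0.4 on an arbitrary symmetric positive definite A: ω_0 = 2/(λ_N + λ_1) (numerically) minimizes ρ(I − ωA).

# illustration only; A is an arbitrary SPD matrix
import numpy as np

rng = np.random.default_rng(3)
B = rng.normal(size=(5, 5))
A = B @ B.T + 0.5 * np.eye(5)                    # eigenvalues 0 < l1 <= ... <= lN

lam = np.sort(np.linalg.eigvalsh(A))
l1, lN = lam[0], lam[-1]
omega0 = 2.0 / (lN + l1)

def rho_T(omega):
    return np.abs(np.linalg.eigvals(np.eye(5) - omega * A)).max()

print("rho(T(omega0)) =", rho_T(omega0), "predicted:", (lN - l1) / (lN + l1))
omegas = np.linspace(0.01, 2.0 / lN - 0.01, 200)
print("best scanned omega:", omegas[np.argmin([rho_T(w) for w in omegas])],
      "omega0 =", omega0)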

Theorem 0.5. Let the matrix A be strictly diagonally dominant. Then both the Jacobi
and the Gauss–Seidel methods converge. Define
      μ_i = Σ_{j=1}^{i−1} |a_ij| / |a_ii|,   ν_i = Σ_{j=i+1}^{N} |a_ij| / |a_ii|,   i = 1, ..., N.

   1. For the Jacobi method,

            ρ(T) ≤ ‖T‖_∞ = max_i (μ_i + ν_i) < 1.

   2. For the Gauss–Seidel method,

            ρ(T) ≤ ‖T‖_∞ ≤ max_i [ ν_i / (1 − μ_i) ] < 1.

We next state the Stein–Rosenberg theorem that pertains to the convergence of the
Jacobi and Gauss–Seidel methods.

Theorem 0.6. Denote by T J and T G-S the iteration matrices for the Jacobi and Gauss–
Seidel iterative methods for the linear system Ax = b. If T J ≥ O, then precisely one of the
following takes place:

1. ρ(TG-S ) = ρ(TJ ) = 0.

2. 0 < ρ(T G-S ) < ρ(T J ) < 1.

3. ρ(TG-S ) = ρ(TJ ) = 1.

4. ρ(TG-S ) > ρ(TJ ) > 1.

Thus, the Jacobi and Gauss–Seidel methods converge together and diverge together. In the
case of convergence, the Gauss–Seidel method converges faster than the Jacobi method.

Theorem 0.7. Let the matrix A be irreducibly diagonally dominant. Then both the Jacobi
and the Gauss–Seidel methods converge.

Theorem 0.8. Let A be an M-matrix. Then the Jacobi method converges.

Theorem 0.9. Let A = M − N be a regular splitting of the matrix A, and let A−1 ≥ O.
Then
      ρ(M^{-1}N) = ρ(A^{-1}N) / (1 + ρ(A^{-1}N)) < 1.
Hence, the iterative method x m+1 = T x m +d, where T = M −1 N , converges. Conversely,
ρ(M −1 N ) < 1 implies that A is nonsingular and A−1 ≥ O.

Theorem 0.10. Let the matrix A be Hermitian positive definite. Then we have the fol-
lowing:

1. The Gauss–Seidel method converges.

2. The Jacobi method converges if and only if the matrix 2D −A is Hermitian positive
definite. Here D = diag(A).

Theorem 0.11. For SOR to converge, it is necessary (but not sufficient) that 0 < ω < 2.

Theorem 0.12. Let A be Hermitian positive definite. Then SOR converges if and only if
0 < ω < 2.

Theorem 0.13. Let the matrices H and V in the ADI method be normal. Then

      ρ(T) ≤ [ max_i | (μ − λ_i(H)) / (μ + λ_i(H)) | ] [ max_i | (μ − λ_i(V)) / (μ + λ_i(V)) | ].

Then the ADI method converges if H and V are Hermitian positive definite and μ > 0.

Theorem 0.14. Denote the Hermitian and skew-Hermitian parts of A by A_H and A_S,
respectively. That is, A_H = (1/2)(A + A*) and A_S = (1/2)(A − A*), and A = A_H + A_S. Consider
the following two-stage fixed-point iterative method for the system Ax = b: Pick x_0 and a
scalar μ, and compute x_1, x_2, ... as in

      (μI + A_H) x_{m+1/2} = (μI − A_S) x_m + b,
      (μI + A_S) x_{m+1}   = (μI − A_H) x_{m+1/2} + b,       m = 0, 1, ....

The matrix of iteration T in x m+1 = T x m + d is then given by

T = (μI + AS )−1 (μI − AH )(μI + AH )−1 (μI − AS ).

If A_H is positive definite, then A is nonsingular. If, in addition, μ is real and μ > 0, then

      ρ(T) ≤ max_i | (μ − λ_i(A_H)) / (μ + λ_i(A_H)) | < 1.

Thus, the iterative method converges. (Note that the method described here is an ADI
method.)
Chapter 1

Development of Polynomial Extrapolation Methods

1.1 Preliminaries
1.1.1 Motivation
In this chapter, we present the derivation of four polynomial extrapolation methods:
minimal polynomial extrapolation (MPE), reduced rank extrapolation (RRE), modified
minimal polynomial extrapolation (MMPE), and SVD-based minimal polynomial ex-
trapolation (SVD-MPE). Of these, MPE, RRE, and MMPE date back to the 1970s,
while SVD-MPE was published in 2016. MPE was introduced by Cabay and Jackson
[52]; RRE was introduced independently by Kaniel and Stein [155], Eddy [74], and
Mes̆ina [185]; and MMPE was introduced independently by Brezinski [28], Pugachev
[211], and Sidi, Ford, and Smith [299]. SVD-MPE is a new method by the author
[290].8
MPE and RRE, along with the epsilon algorithms (to be described in Chapter 5),
have been reviewed by Skelboe [305] and by Smith, Ford, and Sidi [306]. Since the
publication of these reviews, quite a few developments have taken place on the sub-
ject of vector extrapolation, and some of the newer developments have been reviewed
by Sidi [286, 289]; for still newer developments, see Sidi [290, 292]. Our purpose
here is to cover as many of these developments as possible and to present a broad
perspective.
Given a vector sequence that converges slowly, our aim in this chapter is to
develop extrapolation methods whose only input is the sequence {x m } itself. As we
mentioned in the preceding chapter, a major area of application of vector extrapo-
lation methods is that of iterative solution of systems of equations. We have also
seen that nonlinear systems of equations “behave” linearly close to their solutions.
Therefore, in our derivation of polynomial extrapolation methods, we will go
through the iterative solution of linear systems of equations. That is, we will derive
the methods within the context of linear systems, making sure that these methods

8
The extrapolation methods we discuss in this book apply to vector sequences, as already mentioned.
Block versions of some of the methods we describe here, which apply to sequences of vectors and matrices,
have been given in Brezinski and Redivo Zaglia [40] and Messaoudi [186] and in the recent papers by Jbilou
and Sadok [151] and Jbilou and Messaoudi [146]. See also Baron and Wajc [15]. We do not discuss these
methods here.


involve only the sequence of approximations {x m } that result from the iterative
methods used.9 Following their derivation (definition), we will present a detailed
discussion of their algebraic properties. We will not address the important issues
of (i) actual algorithms for their numerical implementation and (ii) their analytical
(convergence) properties in this chapter; we leave these topics to Chapters 2, 4, 6, and 7.

Important note: Starting with this chapter, and throughout this book, we will fix
our notation for the inner products in  s and the vector norms induced by them as
follows:
• For general or weighted inner products,

      (y, z) = y*Mz,   ‖z‖ = √(z, z),          (1.1)

  where M ∈ ℂ^{s×s} is a fixed Hermitian positive definite matrix. Recall that all
  inner products in ℂ^s are weighted inner products unless they are Euclidean.
• For the standard l_2 or Euclidean inner product,

      〈y, z〉 = y*z,   ‖z‖_2 = √〈z, z〉,          (1.2)
whenever confusion may arise.
Throughout, we use the fact that, for any square matrix H and analytic functions
f (λ) and g (λ), we have f (H ) g (H ) = g (H ) f (H ).
We will also need the following definition throughout this work.

Definition 1.1. The polynomial A(z) = Σ_{i=0}^{m} a_i z^i is monic of degree m if a_m = 1. We
also denote the set of monic polynomials of degree m by  m .

1.1.2 Minimal polynomials of matrices


We start by discussing minimal polynomials of matrices. We already know that the
characteristic polynomial R(λ) = det(λI − T ) of a square matrix T ∈ N ×N is a monic
polynomial of degree exactly N and its roots are the eigenvalues of T . If λi and ri are
the eigenvalues of T and their corresponding multiplicities, precisely as in (0.21) and
(0.22) in the proof of Theorem 0.1, then
      R(λ) = Σ_{i=0}^{N} e_i λ^i = ∏_{i=1}^{q} (λ − λ_i)^{r_i},   Σ_{i=1}^{q} r_i = N,   e_N = 1.          (1.3)

The following theorem is known as the Cayley–Hamilton theorem, and there are dif-
ferent proofs of it. The proof we present here employs the Jordan canonical form and
should provide a good exercise in the subject.

Theorem 1.2. The matrix T satisfies


      R(T) = Σ_{i=0}^{N} e_i T^i = ∏_{i=1}^{q} (T − λ_i I)^{r_i} = O,   T^0 = I.          (1.4)

In other words, the characteristic polynomial of T annihilates the matrix T .


9
A completely different derivation of vector extrapolation methods can be given starting with the Shanks
transformation; this was done by Sidi, Ford, and Smith [299]. We summarize the Shanks transformation in
Section 5.2. For yet another approach that proceeds through kernels, see Brezinski and Redivo Zaglia [42].

Proof. We will recall (0.20)–(0.29), concerning the Jordan canonical form, and use the
same notation. First, substituting (0.20) into R(T) = ∏_{i=1}^{q} (T − λ_i I)^{r_i}, we have

      R(T) = Σ_{i=0}^{N} e_i T^i = ∏_{i=1}^{q} [ V (J − λ_i I)^{r_i} V^{-1} ] = V [ ∏_{i=1}^{q} (J − λ_i I)^{r_i} ] V^{-1} = V R(J) V^{-1}.

Next, by (0.24), we have

      R(J) = Σ_{i=0}^{N} e_i J^i = \begin{bmatrix}
               R(J_{r_1}(λ_1)) &                 &        & \\
                               & R(J_{r_2}(λ_2)) &        & \\
                               &                 & \ddots & \\
                               &                 &        & R(J_{r_q}(λ_q))
             \end{bmatrix}.

Now, for each j = 1, ..., q,

      R(J_{r_j}(λ_j)) = [ ∏_{i=1, i≠j}^{q} (J_{r_j}(λ_j) − λ_i I_{r_j})^{r_i} ] (J_{r_j}(λ_j) − λ_j I_{r_j})^{r_j} = O,

since, by (0.30), (J_{r_j}(λ_j) − λ_j I_{r_j})^{r_j} = O. Therefore, R(J) = O. Consequently, R(T) =
O as well.

Definition 1.3. The monic polynomial Q(λ) is said to be a minimal polynomial of T if


Q(T ) = O and if Q(λ) has smallest degree.

Theorem 1.4. The minimal polynomial Q(λ) of T exists and is unique. Moreover, if
Q1 (T ) = O for some polynomial Q1 (λ) with deg Q1 > deg Q, then Q(λ) divides Q1 (λ).
In particular, Q(λ) divides R(λ), the characteristic polynomial of T . [Thus, the degree of
Q(λ) is at most N , and its zeros are some or all of the eigenvalues of T .]

Proof. Since the characteristic polynomial R(λ) satisfies R(T ) = O, there is also a
monic polynomial Q(λ) of smallest degree m, m ≤ N , satisfying Q(T ) = O. Suppose
&
that there is another monic polynomial Q(λ) & ) = O.
of degree m that satisfies Q(T
&
Then the difference S(λ) = Q(λ) − Q(λ) also satisfies S(T ) = O, and its degree is less
than m, which is impossible. Therefore, Q(λ) is unique.
Let Q1 (λ) be of degree m1 such that m1 > m and Q1 (T ) = O. Then there exist
polynomials a(λ) of degree m1 − m and r (λ) of degree at most m − 1 such that Q1 (λ) =
a(λ)Q(λ) + r (λ). Therefore,
O = Q1 (T ) = a(T )Q(T ) + r (T ) = r (T ).
Since r (T ) = O, but r (λ) has degree less than m, r (λ) must be the zero polynomial.
Therefore, Q(λ) divides Q1 (λ). Letting Q1 (λ) = R(λ), we realize that Q(λ) divides
R(λ), meaning that its zeros are some or all of the eigenvalues of T .

      Note that, with T = V J V^{-1} and J as before, we have Q(T) = V Q(J) V^{-1}, where

      Q(J) = \begin{bmatrix}
               Q(J_{r_1}(λ_1)) &                 &        & \\
                               & Q(J_{r_2}(λ_2)) &        & \\
                               &                 & \ddots & \\
                               &                 &        & Q(J_{r_q}(λ_q))
             \end{bmatrix}.

To see how Q(λ) factorizes, let us consider the case in which λ1 = a = λ2 and are
different from the rest of the λi . Assume also that r1 ≥ r2 . Then, by (0.30), we have
that [J r j (λ j ) −aI r j ]k = O only when k ≥ r j , j = 1, 2. This means that Q(λ) will have
(λ − a) r1 as one of its factors.

Definition 1.5. Given a nonzero vector u ∈ N , the monic polynomial P (λ) is said to
be a minimal polynomial of T with respect to u if P (T )u = 0 and if P (λ) has smallest
degree.

Theorem 1.6. The minimal polynomial P (λ) of T with respect to u exists and is unique.
Moreover, if P1 (T )u = 0 for some polynomial P1 (λ) with deg P1 > deg P , then P (λ)
divides P1 (λ). In particular, P (λ) divides Q(λ), the minimal polynomial of T , which in
turn divides R(λ), the characteristic polynomial of T . [Thus, the degree of P (λ) is at most
N , and its zeros are some or all of the eigenvalues of T .]

Proof. Since the minimal polynomial Q(λ) satisfies Q(T ) = O, it also satisfies
Q(T )u = 0. Therefore, there is a monic polynomial P (λ) of smallest degree k, k ≤ m,
where m is the degree of Q(λ), satisfying P (T )u = 0. Suppose that there is another
monic polynomial P (λ) of degree k that satisfies P (T )u = 0. Then the difference
S(λ) = P (λ) − P (λ) also satisfies S(T )u = 0, and its degree is less than k, which is
impossible. Therefore, P (λ) is unique.
Let P1 (λ) be of degree k1 such that k1 > k and P1 (T )u = 0. Then there exist
polynomials a(λ) of degree k1 − k and r (λ) of degree at most k − 1 such that P1 (λ) =
a(λ)P (λ) + r (λ). Therefore,

0 = P1 (T )u = a(T )P (T )u + r (T )u = r (T )u.

Since r (T )u = 0, but r (λ) has degree less than k, r (λ) must be the zero polynomial.
Therefore, P (λ) divides P1 (λ). Letting P1 (λ) = Q(λ) and P1 (λ) = R(λ), we realize that
P (λ) divides Q(λ) and R(λ), meaning that its zeros are some or all of the eigenvalues
of T .

      Again, with T = V J V^{-1} and J as before, we have P(T) = V P(J) V^{-1}, where

      P(J) = \begin{bmatrix}
               P(J_{r_1}(λ_1)) &                 &        & \\
                               & P(J_{r_2}(λ_2)) &        & \\
                               &                 & \ddots & \\
                               &                 &        & P(J_{r_q}(λ_q))
             \end{bmatrix}.

To see how P (λ) factorizes, let us consider the case in which λ1 = a = λ2 and are
different from the rest of the λi . Assume also that r1 ≥ r2 . Recall that the eigenvectors
and principal vectors v i j of T span N . Therefore, u can be expressed as a linear
combination of the v_ij. Suppose that

      u = Σ_{j=1}^{r_1'} α_{1j} v_{1j} + Σ_{j=1}^{r_2'} α_{2j} v_{2j} + (a linear combination of {v_ij}, i ≥ 3),

      r_1' ≤ r_1,   r_2' ≤ r_2,   r_1' ≥ r_2' ≥ 1,   α_{1r_1'}, α_{2r_2'} ≠ 0.

Then, by (0.37), we have that

      (T − aI)^k ( Σ_{j=1}^{r_1'} α_{1j} v_{1j} + Σ_{j=1}^{r_2'} α_{2j} v_{2j} ) = 0   only when k ≥ r_1'.

This means that P(λ) will have (λ − a)^{r_1'} as one of its factors.

Example 1.7. Let T = V J V −1 , where J is the Jordan canonical form of T , given as


      J = \begin{bmatrix}
            a & 1 &   &   &   &   &   &   &   \\
              & a & 1 &   &   &   &   &   &   \\
              &   & a &   &   &   &   &   &   \\
              &   &   & a & 1 &   &   &   &   \\
              &   &   &   & a &   &   &   &   \\
              &   &   &   &   & b & 1 &   &   \\
              &   &   &   &   &   & b &   &   \\
              &   &   &   &   &   &   & b &   \\
              &   &   &   &   &   &   &   & b
          \end{bmatrix},   a ≠ b.

Note that J has five Jordan blocks with (λ1 = a, r1 = 3), (λ2 = a, r2 = 2), (λ3 = b , r3 =
2), (λ4 = b , r4 = 1), and (λ5 = b , r5 = 1). Thus, the characteristic polynomial R(λ)
and the minimal polynomial Q(λ) are

R(λ) = (λ − a)5 (λ − b )4 , Q(λ) = (λ − a)3 (λ − b )2 .

If
u = 2v 11 − v 12 + 3v 21 + 4v 32 − 2v 41 + v 51 ,
then (T − aI )2 and (T − b I )2 annihilate the vectors 2v 11 − v 12 + 3v 21 and 4v 32 −
2v 41 + v 51 , respectively. Consequently, the minimal polynomial of T with respect to
u is
P (λ) = (λ − a)2 (λ − b )2 .

Remark: From the examples given here, it must be clear that the minimal polyno-
mial of T with respect to u is determined by the eigenvectors and principal vectors of
T that are present in the spectral decomposition of u. This means that if two vectors
d 1 and d 2 , d 1 = d 2 , have the same eigenvectors and principal vectors in their spec-
tral decompositions, then the minimal polynomial of T with respect to d 1 is also the
minimal polynomial of T with respect to d 2 .

1.2 Solution to x = T x + d from {x m }


1.2.1 General considerations and notation
Let s be the unique solution to the N -dimensional linear system

x = T x + d. (1.5)

Writing this system in the form (I − T )x = d, it becomes clear that the uniqueness
of the solution is guaranteed when the matrix I − T is nonsingular or, equivalently,

when T does not have one as its eigenvalue. Starting with an arbitrary vector x 0 , let
the vector sequence {x m } be generated via the iterative scheme

x m+1 = T x m + d, m = 0, 1, . . . . (1.6)

As we have shown already, provided ρ(T ) < 1, limm→∞ x m exists and equals s.
Making use of what we already know about minimal polynomials, we can actually
construct s as a linear combination of a finite number (at most N + 1) of the vectors
x m , whether {x m } converges or not. This is the subject of Theorem 1.8 below. Before
we state this theorem, we introduce some notation and a few simple, but useful, facts.
Given the sequence {x m }, generated as in (1.6), let

u m = Δx m , w m = Δu m = Δ2 x m , m = 0, 1, . . . , (1.7)

where Δx m = x m+1 − x m and Δ2 x m = Δ(Δx m ) = x m+2 − 2x m+1 + x m , and define


the error vectors ε m as in

ε m = x m − s, m = 0, 1, . . . . (1.8)

Using (1.6), it is easy to show by induction that

u m = T m u 0, w m = T m w 0, m = 0, 1, . . . . (1.9)

Similarly, by (1.6) and by the fact that s = T s + d, one can relate the error in x m+1 to
the error in x m via

ε m+1 = (T x m + d) − (T s + d) = T (x m − s) = T ε m , (1.10)

which, by induction, gives

ε m = T m ε0 , m = 0, 1, . . . . (1.11)

In addition, we can relate ε m to u m , and vice versa, via

u m = (T − I )εm and ε m = (T − I )−1 u m , m = 0, 1, . . . . (1.12)

Similarly, we can relate u m to w m , and vice versa, via

w m = (T − I )u m and u m = (T − I )−1 w m , m = 0, 1, . . . . (1.13)

Finally, by T m+i = T i T m , we can also rewrite (1.9) and (1.11) as in

T m+i u 0 = T i u m , T m+i w 0 = T i w m , T m+i ε0 = T i ε m m = 0, 1, . . . . (1.14)

As usual, T 0 = I in (1.9) and (1.11) and throughout.

1.2.2 Construction of solution via minimal polynomials

Theorem 1.8. Let P (λ) be the minimal polynomial of T with respect to εn = x n − s,


given as
      P(λ) = Σ_{i=0}^{k} c_i λ^i,   c_k = 1.          (1.15)

Then Σ_{i=0}^{k} c_i ≠ 0, and s can be expressed as

      s = ( Σ_{i=0}^{k} c_i x_{n+i} ) / ( Σ_{i=0}^{k} c_i ).          (1.16)

Proof. By definition of P(λ), P(T)ε_n = 0. Therefore,

      0 = P(T)ε_n = Σ_{i=0}^{k} c_i T^i ε_n = Σ_{i=0}^{k} c_i ε_{n+i},          (1.17)

the last equality following from (1.14). Therefore,

      0 = Σ_{i=0}^{k} c_i ε_{n+i} = Σ_{i=0}^{k} c_i x_{n+i} − ( Σ_{i=0}^{k} c_i ) s,

and solving this for s, we obtain (1.16), provided Σ_{i=0}^{k} c_i ≠ 0. Now, Σ_{i=0}^{k} c_i = P(1) ≠ 0
since one is not an eigenvalue of T, and hence (λ − 1) is not a factor of P(λ). This
completes the proof.

By Theorem 1.8, we need to determine P (λ) to construct s. By the fact that P (λ)
is uniquely defined via P (T )εn = 0, it seems that we have to actually know εn to
know P (λ). However, since εn = x n − s and since s is unknown, we have no way
of knowing εn . Fortunately, we can obtain P (λ) solely from our knowledge of the
vectors x m . This we achieve with the help of Theorem 1.9.

Theorem 1.9. The minimal polynomial of T with respect to εn is also the minimal
polynomial of T with respect to u n = x n+1 − x n .

Proof. Let P (λ) be the minimal polynomial of T with respect to εn as before, and
denote by S(λ) the minimal polynomial of T with respect to u n . Thus,

P (T )εn = 0 (1.18)

and
S(T )u n = 0. (1.19)
Multiplying (1.18) by (T − I ), and recalling from (1.12) that (T − I )εn = u n , we
obtain P (T )u n = 0. By Theorem 1.6, this implies that S(λ) divides P (λ). Next, again
by (1.12), we can rewrite (1.19) as (T − I )S(T )εn = 0, which, upon multiplying by
(T − I )−1 , gives S(T )εn = 0. By Theorem 1.6, this implies that P (λ) divides S(λ).
Therefore, P (λ) ≡ S(λ).

What Theorem 1.9 says is that P (λ) in Theorem 1.8 satisfies

P (T )u n = 0 (1.20)

and has smallest degree. Now, since all the vectors x m are available to us, so are the
vectors u m = x m+1 − x m . Thus, the polynomial P (λ) can now be determined from
(1.20), as we show next.

      First, by (1.20), (1.15), and (1.14), we have that

      0 = P(T)u_n = Σ_{i=0}^{k} c_i T^i u_n = Σ_{i=0}^{k} c_i u_{n+i}.          (1.21)

Next, recalling that c_k = 1, we rewrite Σ_{i=0}^{k} c_i u_{n+i} = 0 in the form

      Σ_{i=0}^{k−1} c_i u_{n+i} = −u_{n+k}.          (1.22)

Let us express (1.21) and (1.22) more conveniently in matrix form. For this, let us
define the matrices U j as

U j = [ u n | u n+1 | · · · | u n+ j ]. (1.23)

Thus, U j is an N × ( j + 1) matrix, u n , u n+1 , . . . , u n+ j being its columns. In this nota-


tion, (1.21) and (1.22) read, respectively,

U k c = 0, c = [c0 , c1 , . . . , ck ]T , (1.24)

and
      U_{k−1} c′ = −u_{n+k},   c′ = [c_0, c_1, ..., c_{k−1}]^T.          (1.25)
We will continue to use this notation without further explanation below.
Clearly, (1.25) is a system of N linear equations in the k unknowns c0 , c1 , . . . , ck−1
and is in general overdetermined since k ≤ N . Nevertheless, by Theorem 1.9, it is
consistent and has a unique solution for the ci . With this, we see that the solution s in
(1.16) is determined completely by the k + 2 vectors x n , x n+1 , . . . , x n+k+1 .
      We now express s in a form that is slightly different from that in (1.16). With c_k = 1
again, let us set

      γ_i = c_i / Σ_{j=0}^{k} c_j,   i = 0, 1, ..., k.          (1.26)

This is allowed because Σ_{j=0}^{k} c_j = P(1) ≠ 0 by Theorem 1.8. Obviously,

      Σ_{i=0}^{k} γ_i = 1.          (1.27)

Thus, (1.16) becomes

      s = Σ_{i=0}^{k} γ_i x_{n+i}.          (1.28)

Dividing (1.24) by Σ_{j=0}^{k} c_j, and invoking (1.26), we realize that the γ_i satisfy the system

      U_k γ = 0   and   Σ_{i=0}^{k} γ_i = 1,   γ = [γ_0, γ_1, ..., γ_k]^T.          (1.29)

This is a linear system of N + 1 equations in the k + 1 unknowns γ0 , γ1 , . . . , γk . It is


generally overdetermined, but consistent, and has a unique solution.

      At this point, we note again that s is the solution to (I − T)x = d, whether
ρ(T) < 1 or not. Thus, with the γ_i as determined above, s = Σ_{i=0}^{k} γ_i x_{n+i}, whether
lim_{m→∞} x_m exists or not.
We close this section with an observation concerning the polynomial P (λ) in The-
orem 1.9.

Proposition 1.10. Let k be the degree of P (λ), the minimal polynomial of T with respect
to u n . Then the sets {u n , u n+1 , . . . , u n+ j }, j < k, are linearly independent, while the set
{u n , u n+1 , . . . , u n+k } is not. The vector u n+k is a linear combination of u n+i , 0 ≤ i ≤
k − 1, as shown in (1.22).
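
The following Python sketch is not from the book; it illustrates the construction of this section on arbitrary data. From the iterates of x_{m+1} = T x_m + d, the degree k of the minimal polynomial of T with respect to u_n is detected from rank(U_k), the c_i are obtained from the consistent system (1.25), and s is then recovered exactly via (1.26)–(1.28). The matrix T below is built with only three distinct eigenvalues so that k = 3 rather than N.

# illustration only; all data are arbitrary
import numpy as np

rng = np.random.default_rng(4)
N = 6
V = rng.normal(size=(N, N))
T = V @ np.diag([0.8, 0.8, 0.5, 0.5, 0.3, 0.3]) @ np.linalg.inv(V)
d = rng.normal(size=N)
s = np.linalg.solve(np.eye(N) - T, d)

n = 1
X = [rng.normal(size=N)]                                      # arbitrary x_0
for _ in range(10):
    X.append(T @ X[-1] + d)
U = np.column_stack([X[m + 1] - X[m] for m in range(n, 8)])   # u_n, u_{n+1}, ...

# smallest k with u_{n+k} in span{u_n, ..., u_{n+k-1}}
k = next(k for k in range(1, U.shape[1])
         if np.linalg.matrix_rank(U[:, :k + 1], tol=1e-8) == k)
print("degree of the minimal polynomial of T w.r.t. u_n:", k)   # 3 here

c, *_ = np.linalg.lstsq(U[:, :k], -U[:, k], rcond=None)   # consistent system (1.25)
c = np.append(c, 1.0)                                     # c_k = 1
gamma = c / c.sum()                                       # (1.26); c.sum() = P(1) != 0
s_rec = sum(g * X[n + i] for i, g in enumerate(gamma))    # s = sum_i gamma_i x_{n+i}, (1.28)
print(np.allclose(s_rec, s))                              # True (up to roundoff)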

1.3 Derivation of MPE, RRE, MMPE, and SVD-MPE


1.3.1 General remarks

So far, we have seen that s can be computed via a sum of the form Σ_{i=0}^{k} γ_i x_{n+i}, with
Σ_{i=0}^{k} γ_i = 1, once P(λ) = Σ_{i=0}^{k} c_i λ^i (c_k = 1), the minimal polynomial of T with re-
spect to ε_n, has been determined.
spect to εn , has been determined. We have seen that P (λ) is also the minimal polyno-
mial of T with respect to u n and can be determined uniquely by solving the generally
overdetermined, but consistent, system of linear equations in (1.22) or, equivalently,
(1.25). However, the degree of the minimal polynomial of T with respect to εn can
be as large as N . Because N can be a very large integer in general, determining s in the
way we have described here becomes prohibitively expensive as far as computation
time and storage requirements are concerned. [Note that we need to store the vectors
u n , u n+1 , . . . , u n+k and solve the N × k linear system in (1.22).] Thus, we conclude
that computing s via a combination of the iteration vectors x m , as described above,
may not be feasible after all. Nevertheless, with a twist, we can use the framework
developed thus far to approximate s effectively. To do this, we replace the minimal
polynomial of T with respect to u n (or εn ) by another unknown polynomial, whose
degree is smaller—in fact, much smaller—than N and is at our disposal.
Let us denote the degree of the minimal polynomial of T with respect to u n by k0 ;
of course, k0 ≤ N . Then, by Definition 1.5, it is clear that the sets {u n , u n+1 , . . . , u n+k },
0 ≤ k ≤ k0 − 1, are linearly independent, but the set {u n , u n+1 , . . . , u n+k0 } is not. This
implies that the matrices U k , k = 0, 1, . . . , k0 − 1, are of full rank, but U k0 is not; that
is,
rank(U k ) = k + 1, 0 ≤ k ≤ k0 − 1, rank(U k0 ) = k0 . (1.30)

1.3.2 Derivation of MPE


Let us choose k to be an arbitrary positive integer that is normally much smaller than
the degree of the minimal polynomial of T with respect to u n (hence also εn ) and,
therefore, also much smaller than N . In view of Proposition 1.10, the overdetermined
linear system U k−1 c  = −u n+k in (1.25) is now clearly inconsistent and hence has no
solution for c0 , c1 , . . . , ck−1 in the ordinary sense. To get around this problem, we solve
this system in the least-squares sense, since such a solution always exists. Following

that, we let ck = 1 and, provided ki=0 ci = 0, we compute γ0 , γ1 , . . . , γk precisely as in

(1.26) and then compute the vector sn,k = ki=0 γi x n+i as our approximation to s. The
resulting method is MPE.

We can summarize the definition of MPE through the following steps:


   1. Choose the integers k and n and input the vectors x_n, x_{n+1}, ..., x_{n+k+1}.
   2. Compute the vectors u_n, u_{n+1}, ..., u_{n+k} and form the N × k matrix U_{k−1}. (Recall
      that u_m = x_{m+1} − x_m.)
   3. Solve the overdetermined linear system U_{k−1} c′ = −u_{n+k} in the least-squares
      sense for c′ = [c_0, c_1, ..., c_{k−1}]^T. This amounts to solving the optimization problem

            min_{c_0, c_1, ..., c_{k−1}} ‖ Σ_{i=0}^{k−1} c_i u_{n+i} + u_{n+k} ‖,          (1.31)

      which can also be expressed as

            min_{c′} ‖ U_{k−1} c′ + u_{n+k} ‖,   c′ = [c_0, c_1, ..., c_{k−1}]^T.          (1.32)

      Here the vector norm ‖·‖ that we are using is defined via ‖z‖ = √(z, z), where
      (·, ·) is an arbitrary inner product at our disposal, as defined in (1.1).¹⁰ With
      c_0, c_1, ..., c_{k−1} available, set c_k = 1 and compute γ_i = c_i / Σ_{j=0}^{k} c_j, i = 0, 1, ..., k,
      provided Σ_{j=0}^{k} c_j ≠ 0.
   4. Compute s_{n,k} = Σ_{i=0}^{k} γ_i x_{n+i} as an approximation to lim_{m→∞} x_m = s.
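
A minimal Python sketch, not from the book's algorithmic chapters, of the MPE steps just listed, using the standard Euclidean inner product in step 3. Its only input is the sequence x_n, ..., x_{n+k+1}; the test data are arbitrary, and typically the extrapolated error is noticeably smaller than that of the last iterate.

# illustration only
import numpy as np

def mpe(X):
    """X: k+2 vectors x_n, ..., x_{n+k+1}. Returns s_{n,k} (Euclidean norm)."""
    X = np.asarray(X, dtype=float)
    U = np.diff(X, axis=0).T                 # columns u_n, ..., u_{n+k}
    k = U.shape[1] - 1
    c, *_ = np.linalg.lstsq(U[:, :k], -U[:, k], rcond=None)   # least squares, (1.32)
    c = np.append(c, 1.0)                    # c_k = 1
    if abs(c.sum()) < 1e-14:
        raise ValueError("sum of c_i vanishes; s_{n,k} is not defined")
    gamma = c / c.sum()                      # (1.26)
    return X[:k + 1].T @ gamma               # s_{n,k} = sum_i gamma_i x_{n+i}

# demonstration on a linearly generated sequence x_{m+1} = T x_m + d
rng = np.random.default_rng(5)
N = 30
T = 0.9 * rng.normal(size=(N, N)) / np.sqrt(N)
d = rng.normal(size=N)
s = np.linalg.solve(np.eye(N) - T, d)

X = [rng.normal(size=N)]
for _ in range(12):
    X.append(T @ X[-1] + d)

n, k = 1, 8
print(np.linalg.norm(X[n + k + 1] - s),                 # plain iteration error
      np.linalg.norm(mpe(X[n:n + k + 2]) - s))          # MPE error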

1.3.3 Derivation of RRE


Again, let us choose k to be an arbitrary positive integer that is normally much smaller
than the degree of the minimal polynomial of T with respect to u_n (hence also ε_n)
and, therefore, also much smaller than N. In view of Proposition 1.10, the overdeter-
mined linear system U_k γ = 0 in (1.29), subject to Σ_{i=0}^{k} γ_i = 1, is inconsistent, hence
has no solution for γ_0, γ_1, ..., γ_k in the ordinary sense. Therefore, we solve the system
U_k γ = 0 in the least-squares sense, with the equation Σ_{i=0}^{k} γ_i = 1 serving as a con-
straint. Note that such a solution always exists. Following that, we compute the vector
s_{n,k} = Σ_{i=0}^{k} γ_i x_{n+i} as our approximation to s. The resulting method is RRE. This ap-
proach to RRE was essentially given by Kaniel and Stein [155] and by Mes̆ina [185];¹¹
however, their motivations are different from the one that we have used here, which
goes through the minimal polynomial of a matrix with respect to a vector.
We can summarize the definition of RRE through the following steps:
1. Choose the integers k and n and input the vectors x n , x n+1 , . . . , x n+k+1 .
2. Compute the vectors u n , u n+1 , . . . , u n+k and form the N × (k + 1) matrix U k .
(Recall that u m = x m+1 − x m .)
   3. Solve the overdetermined linear system U_k γ = 0 in the least-squares sense, subject
      to the constraint Σ_{i=0}^{k} γ_i = 1. This amounts to solving the optimization problem

            min_{γ_0, γ_1, ..., γ_k} ‖ Σ_{i=0}^{k} γ_i u_{n+i} ‖   subject to   Σ_{i=0}^{k} γ_i = 1,          (1.33)
¹⁰ For the use of other norms, see Section 1.6.
¹¹ The two works compute the γ_i in the same way but differ in the way they compute s_{n,k}; namely,
s_{n,k} = Σ_{i=0}^{k} γ_i x_{n+i} in [185], while s_{n,k} = Σ_{i=0}^{k} γ_i x_{n+i+1} in [155]. Note that, in both methods, only the
vectors x_n, x_{n+1}, ..., x_{n+k+1} are used to compute s_{n,k}.
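
A minimal Python sketch, not from the book's algorithmic chapters, of the RRE step just described: minimize ‖U_k γ‖_2 subject to Σ_i γ_i = 1. Here the constrained problem is solved through the normal equations (U_k^* U_k) y = e, γ = y / Σ_i y_i, which is one simple (though not the most numerically stable) way to impose the constraint; all test data are arbitrary.

# illustration only
import numpy as np

def rre(X):
    """X: k+2 vectors x_n, ..., x_{n+k+1}. Returns s_{n,k} for RRE (Euclidean norm)."""
    X = np.asarray(X, dtype=float)
    U = np.diff(X, axis=0).T                    # columns u_n, ..., u_{n+k}
    G = U.T @ U                                 # Gram matrix U_k^* U_k
    y = np.linalg.solve(G, np.ones(G.shape[0]))
    gamma = y / y.sum()                         # enforces sum(gamma) = 1
    return X[:U.shape[1]].T @ gamma             # s_{n,k} = sum_i gamma_i x_{n+i}

rng = np.random.default_rng(6)
N = 30
T = 0.9 * rng.normal(size=(N, N)) / np.sqrt(N)
d = rng.normal(size=N)
s = np.linalg.solve(np.eye(N) - T, d)
X = [rng.normal(size=N)]
for _ in range(12):
    X.append(T @ X[-1] + d)

n, k = 1, 8
print(np.linalg.norm(X[n + k + 1] - s),                 # plain iteration error
      np.linalg.norm(rre(X[n:n + k + 2]) - s))          # RRE error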


connecting you with timeless cultural and intellectual values. With an
elegant, user-friendly interface and a smart search system, you can
quickly find the books that best suit your interests. Additionally,
our special promotions and home delivery services help you save time
and fully enjoy the joy of reading.

Join us on a journey of knowledge exploration, passion nurturing, and


personal growth every day!

ebookbell.com

You might also like